I think there is no need to argue that automated testing is a must in software development. Tests improve the quality of the code, give you confidence in it, and make it more stable. However, not all tests are created equal: they differ both in what they test and in how hard they are to set up and maintain.

For instance, unit tests are relatively easy to write, but they tell you very little about how your system will behave in production. Do not get me wrong, they are very useful, especially for testing a complicated piece of business logic. However, they are too low-level to give much confidence that the code will run properly in production.

This is where end-to-end tests come in: you test how your system as a whole responds to simulated user actions. Below we will look at the main advantages and disadvantages of this kind of test and how to build such tests for a Telegram bot, using listOK – a bot for shopping and simple to-do lists (project page) – as an example.

Pros and cons of end-to-end tests

Main advantages of end-to-end tests:

  1. They test the system as a whole. Your system likely consists of many moving parts: the code itself, a load balancer, a reverse proxy, a database, a web server, caches, queues, storage, etc., not to mention microservices. If any of these parts breaks, the whole system may fail.

  2. They test the code in an environment close to production. Sometimes an obscure difference in environment configuration changes the behavior of the system.

  3. They can test interactions with third parties. If your product consumes other services (e.g. APIs), it is beneficial to test that it works for the user. Of course, you can mock them, but mocks tell you little about the real-life performance of the system.

  4. They test real user scenarios. Users are not interested in whether some getter or setter deep in your code works properly; they need results. Some parts of your code may even fail, but it is still a success if users get what they came for.

That is why end-to-end tests tell you much more about how your code behaves in the wild and give you more confidence that it will work as planned.

However, everything comes at a cost and end-to-end tests are no exception:

  1. End-to-end tests can be a pain to set up and maintain. They also often require a broader set of technologies that are not needed in production.

  2. They test a lot of things at once, so if they fail, it can be difficult to untangle what happened.

  3. These tests often involve third-party products over which you have little to no control. Then again, the same is true in production: users also rely on third-party tools to access your product.

  4. They can easily break when you change secondary details of the product, since the tests rely on those details (for instance, on a particular text or HTML element being present on a web page).

  5. End-to-end tests can take much more time than other types of tests. For instance, unit tests in the bot described below take 10-20 seconds, while the end-to-end tests take around four minutes (adjust these numbers to your own product). No surprise here: the scope of these tests is very different. Still, this is more of a nuisance than a real problem.

I would argue that, despite these difficulties, end-to-end tests are worth every penny, especially if the system is actively developed. Of course, the depth and breadth of the tests will depend on the project, its reliability requirements, the cost of errors, etc., but they are definitely something to consider.

The architecture of chatbot end-to-end tests

It is relatively easy to write end-to-end tests for a backend API – you spin up a test server and run requests against it. It is a bit trickier for the web frontend – you have to use some kind of browser automation tool, for instance Selenium or Cypress. In addition, the frontend is less formalized than an API, and these tests tend to break on layout changes that do not even affect functionality.

The approach to end-to-end testing of chatbots is similar to that of the web frontend. The closest things to end-to-end tests for Telegram bots I could find on the internet were How to Write Integration Tests for a Telegram Bot and some JS mocking libraries.

To start building tests we need to answer two design questions.

How to run end-to-end tests?

When building end-to-end tests we have two options: mock the Telegram API used by the bot or use the real Telegram API. Using the real API is much better:

  • creating a full-scale mock is cumbersome and can be a project of its own

  • a mock is too artificial, making the testing environment nothing like production. After running mocked tests, you cannot be sure that the bot would behave the same way in the real world.

Therefore, we will use the real Telegram integration and test as a user would – not unlike Selenium frontend testing, which automates real browsers.

How to run the bot for tests?

The bot and the tests can run together – the tests spin up the project themselves – or separately, with a test instance running independently. Both approaches have their pros and cons, and the choice depends on the nature of the project, the team size, and development practices. It may even make sense to use both: one for day-to-day development and the other for testing releases.

I opted for running the bot and the tests together – in my case it was easier to set up and made it simpler to integrate these tests into the CI/CD pipeline.

Setting up the testing environment

The examples below require at least Python 3.8 with the following libraries:
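  • python-telegram-bot – the framework the bot itself is built with (judging by the synchronous Updater-based API in the snippets below, a pre-v20 version such as 13.x)

  • telethon – a Telegram client library used to act as the user side

  • pytest and pytest-asyncio – the test runner with async support

  • python-dotenv – for loading configuration from env files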

ℹ️ The code below is not intended to be run as-is. I omitted some parts for clarity and brevity since they are too project-dependent and not required to understand the code.

Get credentials for Telegram API

To set up the testing environment we will need two sets of credentials:

  • for the test bot via BotFather, and

  • for the Telegram API to log in as a user. For safety reasons, I would strongly recommend registering a separate Telegram account for testing.

To get credentials for accessing the Telegram API as a user, you first need to log into the Telegram developer portal using the phone number of your Telegram account and an access code sent to that account. After that, you can find api_id and api_hash in the API development tools section.

On the first log-in, Telegram will ask for an access code (a second factor), sending it to the account that is requesting access. To skip this step in the future, we save the session and provide it for later log-ins:

# tests/e2e/get_session_string.py

from telethon import TelegramClient
from telethon.sessions import StringSession


api_id = "YOUR API ID"
api_hash = "YOUR API HASH"


with TelegramClient(StringSession(), api_id, api_hash) as client:
    print("Session string:", client.session.save())

You need to run this code once, save the session string and use it in the tests.

⚠️ api_id, api_hash, and the session string must be kept secret. Treat them as your login and password.
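For reference, a minimal env file for the tests might look like the sketch below. The TELEGRAM_APP_* variable names match what the conftest.py fixture later in this article reads; the bot-token variable name and the file location are assumptions – use whatever your config module and INTEGRATION_ENV_PATH actually expect.

# Example env file at the path config.INTEGRATION_ENV_PATH points to.
# Keep it out of version control.

TELEGRAM_APP_ID=1234567
TELEGRAM_APP_HASH=0123456789abcdef0123456789abcdef
TELEGRAM_APP_SESSION=<session string printed by get_session_string.py>

# Bot token from BotFather; the variable name here is an assumption,
# your config.load_config() may expect a different one.
TELEGRAM_TOKEN=<token from BotFather>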

Prepare bot for tests

First, we need to modify the bot slightly. While it can be run in a separate thread as is, I found no clean way of programmatically stopping it after all the tests have completed. This is why we introduce a stop_event: it will not be used in production, but when running tests it allows stopping the bot from the main thread.

# list_bot/main.py

from threading import Event
from typing import Optional

from dotenv import load_dotenv
from telegram import Update
from telegram.ext import Updater

from list_bot.core import config, db
from list_bot.logic.conversations import (
    start_conversation_handler
)


def start_bot(stop_event: Optional[Event] = None) -> None:
    # Load config and connect to DB
    config.load_config()
    db.connect()

    # Set up bot
    updater = Updater(token=config.TELEGRAM_TOKEN)
    dispatcher = updater.dispatcher

    # Register handlers
    dispatcher.add_handler(start_conversation_handler)
    # Other handlers

    # Start listening for updates
    updater.start_polling()

    # Stop manually on event or use built-in idle()
    if stop_event is not None:
        if stop_event.wait():
            updater.stop()
    else:
        updater.idle()

Test fixture for running the bot

The next step is to set up the bot fixture that will be run automatically for the whole duration of the tests.

# tests/conftest.py

import asyncio
import os
import threading

import pytest
from dotenv import load_dotenv
from list_bot.main import start_bot
from telethon import TelegramClient
from telethon.sessions import StringSession
from telethon.tl.custom.conversation import Conversation

from tests import config
from tests.helpers import wait


@pytest.fixture(autouse=True, scope="session")
def bot():
    """Start bot to be tested."""
    os.environ["ENV_FILE"] = config.INTEGRATION_ENV_PATH
    stop_event = threading.Event()
    thread = threading.Thread(target=start_bot, kwargs={"stop_event": stop_event})
    thread.start()
    yield
    stop_event.set()
    thread.join()


# The default event_loop fixture has "function" scope and will
# throw a ScopeMismatch exception since the related fixtures
# have "session" scope. Override it to set the scope.
# https://github.com/pytest-dev/pytest-asyncio#event_loop
@pytest.fixture(scope="session")
def event_loop():
    loop = asyncio.get_event_loop()
    yield loop
    loop.close()

Telegram client fixtures

Now we need fixtures that connect to the Telegram API on the other side – as a user.

@pytest.fixture(scope="session")
async def telegram_client():
    """Connect to Telegram user for testing."""
    load_dotenv(dotenv_path=config.INTEGRATION_ENV_PATH)
    api_id = int(os.environ.get("TELEGRAM_APP_ID"))
    api_hash = os.environ.get("TELEGRAM_APP_HASH")
    session_str = os.environ.get("TELEGRAM_APP_SESSION")

    client = TelegramClient(
        StringSession(session_str), api_id, api_hash, sequential_updates=True
    )
    await client.connect()
    await client.get_me()
    await client.get_dialogs()

    yield client

    await client.disconnect()
    await client.disconnected


@pytest.fixture(scope="session")
async def conv(telegram_client) -> Conversation:
    """Open conversation with the bot."""
    async with telegram_client.conversation(
        config.BOT_NAME, timeout=10, max_messages=10000
    ) as conv:
        conv: Conversation

        # These are bot-specific preparation steps. In listOK /start
        # command registers the user and sends a welcome message and
        # a list of user's lists with the main menu. These messages
        # must be awaited before yielding control to tests.
        await conv.send_message("/start")
        await conv.get_response()  # Welcome message
        await conv.get_response()  # User lists
        wait()
        yield conv

Notes about conv fixture:

  • Strictly speaking, it is not required: each test could open a conversation on its own. However, doing it here avoids code repetition.

  • With scope="session" all tests will use the same conversation object. It is a tradeoff between tests speed and their isolation. In my case, it’s worth it since isolation is already imperfect: I do not clean the database on each test.

  • timeout=10 seconds – needed to fail a test faster if the bot becomes unresponsive. The default timeout is 60 seconds.

  • max_messages=10000 – by default a telethon conversation can handle only 100 messages and throws a ValueError after that (documentation). 10,000 messages should be more than enough for the tests.

Helper functions

The code above should be enough to start writing end-to-end tests. However, many tests involve similar steps and actions, which can be moved into helper functions to avoid repetition and make the tests more versatile.

I wrote the first tests without these functions and abstracted the common actions later, when the code repetition became painful enough.

General-purpose helpers

# tests/helpers.py

import random
import string
from time import sleep
from typing import Optional

from telethon.tl.custom.message import Message, MessageButton

from tests import config


# Used to generate random unique names for lists and items.
# This way you do not need to think about clashing names
# when you do not clear DB after each test.
def random_string(length=16):
    """Return a random string of upper- and lowercase ASCII letters."""
    return ''.join(
        random.choice(string.ascii_letters) for _ in range(length)
    )


# It is useful to pause from time to time, otherwise
# tests may start failing due to weird latency effects.
# Experimentally I arrived at a delay of 0.5 seconds.
def wait():
    """Sleep of fixed duration (in tests.config)."""
    sleep(config.DELAY)


# Simplifies the most frequent action - look for a button
# with a given text either to check that it exists or click it.
def get_button_with_text(
    message: Message, text: str, strict: bool = False
) -> Optional[MessageButton]:
    """Return MessageButton from Message with text or None."""
    if message.buttons is None:
        return None

    for row in message.buttons:
        for button in row:
            if strict:
                is_match = text == button.text
            else:
                is_match = text in button.text
            if is_match:
                return button

    return None

Bot-specific helpers

In addition to general-purpose helpers I use bot-specific functions for frequent bot actions: get user lists, create a list, open a list, etc.

# tests/e2e/test_lists.py

import pytest
from telethon.tl.custom.conversation import Conversation
from telethon.tl.custom.message import Message, MessageButton

from tests import config
from tests.helpers import get_button_with_text, random_string, wait


async def get_user_lists(conv: Conversation) -> Message:
    """Return message with user's lists and main menu."""
    await conv.send_message("/my_lists")
    wait()
    return await conv.get_response()


async def create_list(conv: Conversation) -> str:
    """Create list with random name and return it."""
    user_lists = await get_user_lists(conv)

    await user_lists.buttons[-1][0].click()
    await conv.get_edit()

    list_name = random_string()
    await conv.send_message(list_name)

    # Wait for confirmation
    await conv.get_response()
    # Wait for user's lists
    await conv.get_response()

    return list_name


async def open_list(conv: Conversation, list_name: str) -> Message:
    """Open list with a given name and return message with its contents."""
    await conv.send_message("/my_lists")
    wait()
    user_lists: Message = await conv.get_response()

    button = get_button_with_text(user_lists, list_name)
    await button.click()
    wait()
    message: Message = await conv.get_edit()
    wait()

    return message

Writing end-to-end tests

Finally, we have everything ready for our first test:

@pytest.mark.asyncio
async def test_command_my_lists(conv: Conversation):
    """Test /my_lists bot command."""
    await conv.send_message("/my_lists")
    user_lists: Message = await conv.get_response()

    # Check that the message contains necessary text
    assert "Choose list or action" in user_lists.text
    # Check that there is a button inviting a user to create a list
    assert get_button_with_text(user_lists, "Create new list") is not None

This is a very basic test: it just checks that the command works and that the bot responds with a message containing the user's lists.

Here is something more involved, with several helpers:

@pytest.mark.asyncio
async def test_create_list_item(conv: Conversation):
    """Test creating an item in a list."""
    list_name = await create_list(conv)
    await open_list(conv, list_name)

    item_name = random_string()
    await conv.send_message(item_name)

    # Wait for confirmation
    await conv.get_response()
    # Wait for updated list contents
    list_contents: Message = await conv.get_response()

    assert list_contents.button_count == 3
    assert get_button_with_text(list_contents, item_name) is not None

While listOK is a rather small bot, it is covered by about fifty end-to-end tests that take up to four minutes to run. If I decreased the wait() duration they would run faster, but strange effects happen from time to time, especially in the CI/CD pipeline, so I decided not to push for speed.

It took me several hours to set up this testing environment and then some more to write the tests, but it was worth it. Now it is easy to write new tests, and they catch nearly all issues in the covered user scenarios, which allows me to confidently refactor and add new functionality.

What is more, these tests are highly readable, and I caught myself gravitating towards test-driven development more than usual: first modeling the integration in tests and then implementing it.