Chapter 7 Doctest in Python

In Python, there are multiple packages for defining unit tests or conducting advanced testing procedures. In my opinion, the easiest to use is doctest, which allows us to define interactive Python examples that are included in the documentation and used as tests.

7.1 Configuration

Before we dive into technical details, let us think about the usage scenarios with user-defined functions.

  • On one hand, we develop functions: program and test them.
  • On the other hand, we want to use or reuse them in other programs.
    In this situation, we will import function definitions, but we do not want to test them.

To make this distinction, we can use a conditional statement on the value of the built-in variable __name__ (see the lecture on functions). In the file where we define the functions, we add a conditional execution of the tests: only if __name__ == "__main__", which means that we are running the file with the function definitions directly, do we import doctest and run the tests.

# Here will come function definitions including test cases...   

if __name__ == "__main__":
    import doctest     # importing the doctest module
    doctest.testmod()  # running the tests found in this module's docstrings

Remark: Typically, all modules are imported at the top of a file. Here, we need doctest only for a particular usage scenario, so we can make an exception and import this module inside the if-branch.
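To see what the condition actually tests, the following minimal sketch prints the value of __name__ for an imported module and for the running file. The doctest module is used here only as an example of something imported, and the variable name imported_name is chosen for illustration.

```python
import doctest  # any imported module would do; doctest is just an example

# For an imported module, __name__ holds the module's own name.
imported_name = doctest.__name__
print(imported_name)

# For the file being executed, __name__ is "__main__" when the file is run
# directly, and the module name when it is imported from another program.
print(__name__)
```

This is why the tests behind the if-branch stay silent when the functions are only imported.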

We can determine how much information we want to see when we run the test cases by setting the verbose parameter to either True or False when calling testmod().

If verbose is set to True, we will get a full report on all test cases. For the example cases from the previous chapter, I used verbose mode to obtain a full report, as here:

# test case in Python (doctest)
"""
>>> 1 + 2
3
>>> 1 + 2
2
"""
## Finding tests in Example
## Trying:
##     1 + 2
## Expecting:
##     3
## ok
## Trying:
##     1 + 2
## Expecting:
##     2
## **********************************************************************
## Line 4, in Example
## Failed example:
##     1 + 2
## Expected:
##     2
## Got:
##     3

If verbose is set to False, we will only get a report on failed cases. For the same example test cases, we will get the following report:

## **********************************************************************
## Line 4, in Example
## Failed example:
##     1 + 2
## Expected:
##     2
## Got:
##     3

If all test cases pass, we get no report at all. Therefore, to be sure that the tests were actually run, we can set verbose to True.
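The difference between the two settings can also be checked in code. The snippet below is a minimal sketch under some assumptions: the helper run_examples is a hypothetical name, and instead of testmod() it uses the DocTestParser and DocTestRunner classes of the doctest module, so that the report can be captured as text rather than printed.

```python
import doctest

EXAMPLES = """
>>> 1 + 2
3
>>> 1 + 2
2
"""


def run_examples(text, verbose):
    """Run the examples in text and return the result counts and the report."""
    test = doctest.DocTestParser().get_doctest(text, {}, "Example", None, 0)
    runner = doctest.DocTestRunner(verbose=verbose)
    lines = []
    # The runner passes every piece of the report to the out callable.
    results = runner.run(test, out=lines.append)
    return results, "".join(lines)


quiet_results, quiet_report = run_examples(EXAMPLES, verbose=False)
full_results, full_report = run_examples(EXAMPLES, verbose=True)

print(quiet_results.failed, quiet_results.attempted)
print(full_report)
```

The returned object holds the number of failed and attempted examples, which is the same information testmod() returns after a run.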

It is also possible to have test cases defined in separate file(s) and use doctest.testfile().
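As a minimal sketch of this variant: the file name examples.txt and its content are made up for illustration, and the file is written to a temporary folder only to keep the snippet self-contained. The argument module_relative=False tells doctest to treat the name as an ordinary file path.

```python
import doctest
import os
import tempfile

# Content of a plain text file with test cases (illustrative only).
TEST_TEXT = """\
Examples kept outside the source code:

>>> 1 + 2
3
>>> int('32')
32
"""

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "examples.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(TEST_TEXT)
    # Run all examples found in the text file.
    results = doctest.testfile(path, module_relative=False, verbose=False)

print(results.failed, results.attempted)
```

In a text file the examples are not wrapped in triple quotation marks; the whole file is treated like one big docstring.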

7.2 Test case

Let us take a closer look at the definition, interpretation and result of a test case, using a very simple example.

Test case definition

A test case in doctest looks like a step in an interactive Python interpreter session. Python statements come after the >>> prompt, and below them we have the interpreter’s response. The difference is that the test cases are defined within a docstring, which means they are inside a multiline string in triple quotation marks.

Let us start with the same example as in the previous chapter.

"""
>>> 1 + 2
3
"""

In this simple test case, we have one line with a Python statement to be evaluated and a second line with the expected result.

Test case interpretation

When doctest is imported and testmod() is called, doctest searches the documentation strings of the module for lines starting with >>>. For each such string, a new environment (frame) is created with a copy of all globally visible names (variables, functions). Therefore, tests defined in separate documentation strings do not interfere with each other.

A sequence of lines starting with >>> is evaluated and the environment is updated.

In our example, there is only one line, namely 1 + 2. It is evaluated and 3 is stored as the obtained result.

A sequence of lines not starting with >>> is interpreted as the expected result. The sequence ends with the next line starting with >>>, a blank line or the end of the documentation string.

In our example, there is only one line, namely 3, which is stored as the expected result.

Next, the obtained and expected results are compared. Results are compared as strings, not as objects. Therefore, they must match character for character. If they are equal, the test case will have the status pass, otherwise fail.

In our example, 3 (as the obtained result) will be compared with 3 (as the expected result). As they are equal, the test case will be OK (pass).
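The steps described above can be looked at one by one. The following sketch uses two helper classes from the doctest module, DocTestParser and OutputChecker, which testmod() relies on internally; the variable names are chosen only for illustration.

```python
import doctest

TEXT = """
>>> 1 + 2
3
"""

# The parser splits each example into the statement and the expected text.
example = doctest.DocTestParser().get_examples(TEXT)[0]
print(repr(example.source))
print(repr(example.want))

# The obtained output is captured as text and compared with the expected text.
checker = doctest.OutputChecker()
same = checker.check_output(example.want, "3\n", 0)
different = checker.check_output(example.want, "3.0\n", 0)
print(same, different)
```

Both the statement and the expected result are plain strings ending with a newline, which is why the comparison is a comparison of text.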

Test case result / report

doctest will show a report for a single test case or for multiple test cases. For multiple test cases, a summary can be shown as well, containing the number of failed test cases and the total number of test cases. Interactive reports are also shown by integrated development environments, e.g. PyCharm, where you can click on a test case and jump to its definition.

In our example, the test case was OK (passed), as can be seen in the verbose report below.

## Finding tests in Example
## Trying:
##     1 + 2
## Expecting:
##     3
## ok

7.3 Example test cases

In this section, we will use very basic Python statements to focus on the testing mechanism. Again, you do not need to test built-in functionality; it is used here only to keep the examples as simple as possible. In the next section, we will have examples of testing a user-defined function.

Multiple statements

As in an interactive Python interpreter, we can execute several statements before we obtain a result.

"""
>>> a = 1 
>>> b = 2
>>> a + b
3
"""
## Finding tests in Example
## Trying:
##     a = 1 
## Expecting nothing
## ok
## Trying:
##     b = 2
## Expecting nothing
## ok
## Trying:
##     a + b
## Expecting:
##     3
## ok

From the verbose report, we can see that the first two statements are followed directly by the next statement, without an expected result. This means that we expected nothing, and indeed we got nothing from the interpreter. When a test case consists of several statements, the report lists each statement as a separate passed test. In this example, the only relevant information is that the last result was as expected.

Multi-line expected result

Similarly, we can have a multiline expected result. The end of the expected result is either an empty line or a new test case. For example:

"""
>>> print("Hello\nworld") 
Hello
world
"""
## Finding tests in Example
## Trying:
##     print('Hello\nworld') 
## Expecting:
##     Hello
##     world
## ok

One important remark here: doctest compares the expected result and the obtained result as strings. By default, the comparison is character exact, so leading or trailing spaces make the results differ. For example, when expecting Hello but getting Hello followed by a space, the test will fail.

A more lenient comparison is possible: the NORMALIZE_WHITESPACE option flag makes doctest treat all sequences of whitespace as equal.
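The effect of this flag can be sketched as follows. The helper count_failures is a hypothetical name, and the flag is switched on for a single example with a directive comment; the snippet uses DocTestParser and DocTestRunner from the doctest module so that the result can be inspected in code.

```python
import doctest

# Three spaces are printed, but only one is expected.
STRICT = """
>>> print("Hello   world")
Hello world
"""

# The same example with the option flag enabled through a directive.
RELAXED = """
>>> print("Hello   world")  # doctest: +NORMALIZE_WHITESPACE
Hello world
"""


def count_failures(text):
    """Run the examples in text silently and return the number of failures."""
    test = doctest.DocTestParser().get_doctest(text, {}, "Example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test, out=lambda message: None).failed


strict_failures = count_failures(STRICT)
relaxed_failures = count_failures(RELAXED)
print(strict_failures, relaxed_failures)
```

The flag can also be set for all examples at once by passing optionflags=doctest.NORMALIZE_WHITESPACE to testmod().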

Calling a function

Obviously, in the doctest string, we can call any visible function, for example a built-in function:

"""
>>> int('32')
32
"""
## Finding tests in Example
## Trying:
##     int('32')
## Expecting:
##     32
## ok

Here again, it is important to remember that the expected and obtained results are compared as strings. Even though 32 == 32.0 is True, this test would fail if we expected 32.0.
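The difference between comparing values and comparing their textual form can be seen in a few lines. This is only an illustration; the variable names are made up.

```python
# The values are equal as numbers ...
values_equal = (32 == 32.0)

# ... but their textual forms, which doctest compares, are not.
texts_equal = (repr(32) == repr(32.0))

print(values_equal, texts_equal)
print(repr(int('32')), repr(32.0))
```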

Expecting an exception

We can also test whether a function raises an exception. To do so, we need to let doctest know that traceback information will follow. The expected result therefore consists of:

  • the traceback header line as the first line,
  • the details of the traceback, which may be skipped or replaced by a placeholder,
  • the last line of the traceback with the error type and the error message.

For example:

"""
>>> 1 / 0
Traceback (most recent call last):
    ...
ZeroDivisionError: division by zero
"""
## Finding tests in Example
## Trying:
##     1 / 0
## Expecting:
##     Traceback (most recent call last):
##         ...
##     ZeroDivisionError: division by zero
## ok

Comparing unordered collections

This paragraph goes a bit ahead, so you may come back to this example later.

As doctest compares results as strings, it is a bit tricky to compare unordered collections such as dictionaries.

If the obtained and expected results are equal dictionaries but their items are written in a different order, our test case will fail. For example:

"""
>>> {1: "first", 2: "second"}
{2: "second", 1: "first"}
"""
## Finding tests in Example
## Trying:
##     {1: "first", 2: "second"}
## Expecting:
##     {2: "second", 1: "first"}
## **********************************************************************
## Line 2, in Example
## Failed example:
##     {1: "first", 2: "second"}
## Expected:
##     {2: "second", 1: "first"}
## Got:
##     {1: 'first', 2: 'second'}

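A way around this is to let Python, not doctest, compare the structures. This also avoids a second pitfall visible in the report above: Python displays strings in single quotes, so an expected result written with double quotes would not match even if the order were the same. For the above example, we can define an alternative test case as follows: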

"""
>>> {1: "first", 2: "second"} == {2: "second", 1: "first"}
True
"""
## Finding tests in Example
## Trying:
##     {1: "first", 2: "second"} == {2: "second", 1: "first"}
## Expecting:
##     True
## ok

Of course, in a real test case the first dictionary will be obtained as a result of calling a user-defined function.
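Such a test case could look like the following sketch. The function count_letters is a hypothetical example, not part of this chapter's exercise, and the examples in its docstring are run with DocTestParser and DocTestRunner from the doctest module so that the outcome is available as numbers.

```python
import doctest


def count_letters(a_word):
    """
    Counts how often each letter occurs in a word (hypothetical example).

    >>> count_letters("abba") == {"b": 2, "a": 2}
    True
    """
    counts = {}
    for letter in a_word:
        counts[letter] = counts.get(letter, 0) + 1
    return counts


# Run the examples from the docstring; the function must be visible to them.
test = doctest.DocTestParser().get_doctest(
    count_letters.__doc__, {"count_letters": count_letters}, "count_letters", None, 0
)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed, results.attempted)
```

Because the comparison happens inside the example, the expected result is simply True, regardless of the order in which the dictionary is displayed.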

7.4 Test driven development by example

Now let us take a more sophisticated example and define a conversion function from meters to feet. To understand the development process of this function, we will do it step by step. First, we need to read the description (docstring) and start defining test cases and a program.

def meters2feet(a_distance):
    """
    Calculates feet based on a distance (parameter) given in meters.
    
    :param a_distance: distance in meters 
    :type a_distance: float

    :return: distance in feet
    :rtype: float
    """
    pass

Core functionality

We can start with a functional test and an empty function.

def meters2feet(a_distance):
    """
    Calculates feet based on a distance (parameter) given in meters.
    
    :param a_distance: distance in meters 
    :type a_distance: float

    :return: distance in feet
    :rtype: float
    
    >>> meters2feet(1.0)
    3.28084

    """
    pass
## Finding tests in meters2feet
## Trying:
##     meters2feet(1.0)
## Expecting:
##     3.28084
## **********************************************************************
## Line 2, in meters2feet
## Failed example:
##     meters2feet(1.0)
## Expected:
##     3.28084
## Got nothing

The function fails the test case (1/1 failed).

We can just print a hard-coded number (in most cases, a very bad practice!) to pass the test.

def meters2feet(a_distance):
    """
    Calculates feet based on a distance (parameter) given in meters.
    
    :param a_distance: distance in meters 
    :type a_distance: float

    :return: distance in feet
    :rtype: float
    
    >>> meters2feet(1.0)
    3.28084

    """
    print(3.28084)
## Finding tests in meters2feet
## Trying:
##     meters2feet(1.0)
## Expecting:
##     3.28084
## ok

The function now passes the test case (0/1 failed), but we are obviously not done. The function is rather useless, and one test case is usually not enough. Let’s add more functional tests and include a calculation.

def meters2feet(a_distance):
    """
    Calculates feet based on a distance (parameter) given in meters.
    
    :param a_distance: distance in meters 
    :type a_distance: float

    :return: distance in feet
    :rtype: float
    
    Examples:

    specific values
    
    >>> meters2feet(0)
    0.0

    >>> meters2feet(1/3.28084)
    1.0

    >>> meters2feet(1.0)
    3.28084

    """
    print(3.28084 * a_distance) 
## Finding tests in meters2feet
## Trying:
##     meters2feet(0)
## Expecting:
##     0.0
## ok
## Trying:
##     meters2feet(1/3.28084)
## Expecting:
##     1.0
## ok
## Trying:
##     meters2feet(1.0)
## Expecting:
##     3.28084
## ok

It seems that we are doing well (0/3 failed). But we are just printing our result, not returning it, so we will not be able to use the converted value in our program (outside the function). We need to add a test case to check the type of the returned value. It should be float, as specified in the docstring.

def meters2feet(a_distance):
    """
    Calculates feet based on a distance (parameter) given in meters.
    
    :param a_distance: distance in meters 
    :type a_distance: float

    :return: distance in feet
    :rtype: float
    
    Examples:

    specific values
    
    >>> meters2feet(0)
    0.0

    >>> meters2feet(1/3.28084)
    1.0

    >>> meters2feet(1.0)
    3.28084

    type of returned value 

    >>> type(meters2feet(1)) == float 
    True
    
    """
    print(3.28084 * a_distance) 
## Finding tests in meters2feet
## Trying:
##     meters2feet(0)
## Expecting:
##     0.0
## ok
## Trying:
##     meters2feet(1/3.28084)
## Expecting:
##     1.0
## ok
## Trying:
##     meters2feet(1.0)
## Expecting:
##     3.28084
## ok
## Trying:
##     type(meters2feet(1)) == float 
## Expecting:
##     True
## **********************************************************************
## Line 8, in meters2feet
## Failed example:
##     type(meters2feet(1)) == float 
## Expected:
##     True
## Got:
##     3.28084
##     False
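The two lines shown under Got in the report can be reproduced in isolation. In the following sketch, the function names printing_version and returning_version are hypothetical and used only for comparison: a function that prints shows the number on the screen but returns None, so the type check on its result is False.

```python
def printing_version(a_distance):
    # Shows the value on the screen, but hands nothing back to the caller.
    print(3.28084 * a_distance)


def returning_version(a_distance):
    # Hands the value back, so the caller can store or reuse it.
    return 3.28084 * a_distance


printed = printing_version(1)     # the number appears as a side effect
returned = returning_version(1)   # nothing is displayed here

print(type(printed) == float)     # printed is None
print(type(returned) == float)
```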

After adding this new test case, our function fails again (1/4 failed). We need to return the calculated value instead of printing it.

def meters2feet(a_distance):
    """
    Calculates feet based on a distance (parameter) given in meters.
    
    :param a_distance: distance in meters 
    :type a_distance: float

    :return: distance in feet
    :rtype: float
    
    Examples:

    specific values
    
    >>> meters2feet(0)
    0.0

    >>> meters2feet(1/3.28084)
    1.0

    >>> meters2feet(1.0)
    3.28084

    type of returned value 

    >>> type(meters2feet(1)) == float 
    True
    
    """
    return 3.28084 * a_distance
## Finding tests in meters2feet
## Trying:
##     meters2feet(0)
## Expecting:
##     0.0
## ok
## Trying:
##     meters2feet(1/3.28084)
## Expecting:
##     1.0
## ok
## Trying:
##     meters2feet(1.0)
## Expecting:
##     3.28084
## ok
## Trying:
##     type(meters2feet(1)) == float 
## Expecting:
##     True
## ok

Now we pass all test cases defined so far (0/4 failed).

Error handling

If we want to have domain-specific messages for wrong arguments of our function, we need to define error handling.

We will start with an invalid value, adding a test case and extending the function accordingly.

def meters2feet(a_distance):
    """
    Calculates feet based on a distance (parameter) given in meters.
    
    :param a_distance: distance in meters 
    :type a_distance: float

    :return: distance in feet
    :rtype: float
    
    Examples:

    invalid argument value 
    >>> meters2feet(-0.1)
    Traceback (most recent call last):
    ValueError: The distance must be a non-negative number.
    
    specific values
    
    >>> meters2feet(0)
    0.0

    >>> meters2feet(1/3.28084)
    1.0

    >>> meters2feet(1.0)
    3.28084

    type of returned value 

    >>> type(meters2feet(1)) == float 
    True
    
    """
    if a_distance < 0:
        raise ValueError("The distance must be a non-negative number.")
    return 3.28084 * a_distance
## Finding tests in meters2feet
## Trying:
##     meters2feet(-0.1)
## Expecting:
##     Traceback (most recent call last):
##     ValueError: The distance must be a non-negative number.
## ok
## Trying:
##     meters2feet(0)
## Expecting:
##     0.0
## ok
## Trying:
##     meters2feet(1/3.28084)
## Expecting:
##     1.0
## ok
## Trying:
##     meters2feet(1.0)
## Expecting:
##     3.28084
## ok
## Trying:
##     type(meters2feet(1)) == float 
## Expecting:
##     True
## ok

Now, we pass all test cases again (0/5 failed).

Next, we deal with an argument of the wrong type. We accept both numeric types, float and int, and reject everything else with a TypeError.

def meters2feet(a_distance):
    """
    Calculates feet based on a distance (parameter) given in meters.
    
    :param a_distance: distance in meters 
    :type a_distance: float

    :return: distance in feet
    :rtype: float
    
    Examples:

    invalid argument type   
    >>> meters2feet("1")
    Traceback (most recent call last):
    TypeError: The distance must be a number.

    invalid argument value 
    >>> meters2feet(-0.1)
    Traceback (most recent call last):
    ValueError: The distance must be a non-negative number.
    
    specific values
    
    >>> meters2feet(0)
    0.0

    >>> meters2feet(1/3.28084)
    1.0

    >>> meters2feet(1.0)
    3.28084

    type of returned value 

    >>> type(meters2feet(1)) == float 
    True
    
    """
    if a_distance < 0:
        raise ValueError("The distance must be a non-negative number.")
    if type(a_distance) != float and type(a_distance) != int:
        raise TypeError("The distance must be a number.")
    return 3.28084 * a_distance 
## Finding tests in meters2feet
## Trying:
##     meters2feet("1")
## Expecting:
##     Traceback (most recent call last):
##     TypeError: The distance must be a number.
## **********************************************************************
## Line 2, in meters2feet
## Failed example:
##     meters2feet("1")
## Expected:
##     Traceback (most recent call last):
##     TypeError: The distance must be a number.
## Got:
##     Traceback (most recent call last):
##       File "//usr/lib/python3.8/doctest.py", line 1336, in __run
##         exec(compile(example.source, filename, "single",
##       File "<doctest meters2feet[0]>", line 1, in <module>
##         meters2feet("1")
##       File "<string>", line 40, in meters2feet
##     TypeError: '<' not supported between instances of 'str' and 'int'
## Trying:
##     meters2feet(-0.1)
## Expecting:
##     Traceback (most recent call last):
##     ValueError: The distance must be a non-negative number.
## ok
## Trying:
##     meters2feet(0)
## Expecting:
##     0.0
## ok
## Trying:
##     meters2feet(1/3.28084)
## Expecting:
##     1.0
## ok
## Trying:
##     meters2feet(1.0)
## Expecting:
##     3.28084
## ok
## Trying:
##     type(meters2feet(1)) == float 
## Expecting:
##     True
## ok

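Our function now fails one test case (1/6 failed). Any idea why?

The comparison a_distance < 0 is evaluated before the type check. For a string argument, Python itself raises a TypeError with its own message, so our message is never produced. The order should always be: first check the types, then the values (content). For the final solution, we need to swap the order of the two checks in the function definition.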
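The effect of the order can be isolated in a small sketch. The helper names check_value_first, check_type_first and error_message are hypothetical; the two checks themselves are copied from the function above.

```python
def check_value_first(a_distance):
    # Value check before type check (the failing order).
    if a_distance < 0:
        raise ValueError("The distance must be a non-negative number.")
    if type(a_distance) != float and type(a_distance) != int:
        raise TypeError("The distance must be a number.")


def check_type_first(a_distance):
    # Type check before value check (the corrected order).
    if type(a_distance) != float and type(a_distance) != int:
        raise TypeError("The distance must be a number.")
    if a_distance < 0:
        raise ValueError("The distance must be a non-negative number.")


def error_message(check, argument):
    """Return the message of the TypeError raised by check(argument)."""
    try:
        check(argument)
    except TypeError as error:
        return str(error)
    return None


print(error_message(check_value_first, "1"))
print(error_message(check_type_first, "1"))
```

Both versions raise a TypeError for the string "1", but only the second one carries our own message, and the message is what doctest compares.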

Final solution

We can consider the following definition as the final solution for this example (0/6 failed). If you would like to extend it, you could add a test case and modify the function.

def meters2feet(a_distance):
    """
    Calculates feet based on a distance (parameter) given in meters.
    
    :param a_distance: distance in meters 
    :type a_distance: float

    :return: distance in feet
    :rtype: float
    
    Examples:

    invalid argument type   
    >>> meters2feet("1")
    Traceback (most recent call last):
    TypeError: The distance must be a number.

    invalid argument value 
    >>> meters2feet(-0.1)
    Traceback (most recent call last):
    ValueError: The distance must be a non-negative number.
    
    specific values
    
    >>> meters2feet(0)
    0.0

    >>> meters2feet(1/3.28084)
    1.0

    >>> meters2feet(1.0)
    3.28084

    type of returned value 

    >>> type(meters2feet(1)) == float 
    True
    
    """
    if type(a_distance) != float and type(a_distance) != int:
        raise TypeError("The distance must be a number.")
    if a_distance < 0:
        raise ValueError("The distance must be a non-negative number.")
    return 3.28084 * a_distance


if __name__ == "__main__":
    import doctest     # importing the doctest module
    doctest.testmod()  # running the tests found in this module's docstrings
## Finding tests in meters2feet
## Trying:
##     meters2feet("1")
## Expecting:
##     Traceback (most recent call last):
##     TypeError: The distance must be a number.
## ok
## Trying:
##     meters2feet(-0.1)
## Expecting:
##     Traceback (most recent call last):
##     ValueError: The distance must be a non-negative number.
## ok
## Trying:
##     meters2feet(0)
## Expecting:
##     0.0
## ok
## Trying:
##     meters2feet(1/3.28084)
## Expecting:
##     1.0
## ok
## Trying:
##     meters2feet(1.0)
## Expecting:
##     3.28084
## ok
## Trying:
##     type(meters2feet(1)) == float 
## Expecting:
##     True
## ok