Python comprehensions are believed to be faster at generating collections than for loops. But why are they faster? Which part of a comprehension outperforms the equivalent loop? Below we answer these questions and consider whether the performance difference is actually worth caring about.

This will be more of an exploratory exercise, where the journey and methodology are more interesting than the destination. If you need only the conclusions, you can jump there.

For Python comprehension syntax, please see Python comprehensions breakdown.

Are comprehensions faster than loops?

First, we need to establish a baseline: are comprehensions indeed faster than loops, and if so, by how much? We will use two functions that do the same thing with each method – generate a list of integers from 0 to 999,999 – and the timeit module to gauge their performance; it runs the code a given number of times and returns the total time taken:

from timeit import timeit

def create_list_with_loop():
    lst = []
    for i in range(1000000):
        lst.append(i)
    return lst

print(timeit(create_list_with_loop, number=100))
# 8.161124781006947 (seconds)


def create_list_with_comprehensions():
    # Assign list to intermediate variable instead of returning it directly
    # to make the comparison more fair
    lst = [i for i in range(1000000)]
    return lst

print(timeit(create_list_with_comprehensions, number=100))
# 5.451697067008354

This is a 33% difference in performance, nothing to sneeze at. The exact numbers will differ depending on the hardware and its load at the time of the experiment. We will look at set and dictionary comprehensions later.
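
For clarity, here is how that 33% figure falls out of the two timeit results above (a quick sketch with throwaway variable names; your exact timings will differ):

loop_time = 8.161124781006947
comp_time = 5.451697067008354
# Relative difference, with the loop timing as the baseline
print(f"{(loop_time - comp_time) / loop_time:.0%}")
# 33%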

Experiment 1: Speed up adding items to a list

To add an item to the list, we call the list.append method. However, this method is a function reference that the interpreter has to look up on every iteration of the loop. To confirm that this is the case, we can use another module from the standard library – dis and its function of the same name:

from dis import dis

dis(create_list_with_loop)

dis prints the bytecode instructions that Python compiles your code into – the instructions the interpreter actually executes.

  4           0 BUILD_LIST                0
              2 STORE_FAST                0 (lst)

  5           4 LOAD_GLOBAL               0 (range)
              6 LOAD_CONST                1 (1000000)
              8 CALL_FUNCTION             1
              10 GET_ITER
         >>   12 FOR_ITER                14 (to 28)
              14 STORE_FAST               1 (i)

  6           16 LOAD_FAST                0 (lst)
              18 LOAD_METHOD              1 (append)
              20 LOAD_FAST                1 (i)
              22 CALL_METHOD              1
              24 POP_TOP
              26 JUMP_ABSOLUTE           12

  7      >>   28 LOAD_FAST                0 (lst)
              30 RETURN_VALUE

Here, in the block for source line 6 (offset 18), we see LOAD_METHOD (append) – the instruction that loads a method of an object. It sits between the FOR_ITER and JUMP_ABSOLUTE instructions, so it is executed on every iteration of the loop.

We can speed this up by assigning lst.append to a variable before the loop and calling that variable instead of looking up the method every time (idea from here):

def create_list_with_loop_without_call():
    lst = []
    append = lst.append
    for i in range(1000000):
        append(i)
    return lst

print(timeit(create_list_with_loop_without_call, number=100))
# 7.231394850998186

Calling the cached append turned out to be about 11% faster than looking up and calling list.append each time. However, this explains only about a third of the difference between the for loop and the list comprehension. There is something else going on.
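
As a back-of-the-envelope check of that "only a third" claim, we can compare the gap closed by caching append with the full gap between the loop and the comprehension (throwaway variable names, numbers taken from the runs above):

full_gap = 8.161124781006947 - 5.451697067008354    # loop vs. comprehension
explained = 8.161124781006947 - 7.231394850998186   # loop vs. cached append
print(f"{explained / full_gap:.0%}")
# 34%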

Note: Assigning the list.append method to a variable reduces readability and, therefore, is not recommended, unless it gives a meaningful performance boost.

Experiment 2: Test pure list iteration

To isolate the difference in performance, let's test pure iteration without adding anything to the list. To do that, we will add an if False condition, which should leave the resulting lists empty:

def do_nothing_in_for_loop():
    lst = []
    for i in range(1000000):
        if False:
            lst.append(i)
    return lst

print(timeit(do_nothing_in_for_loop, number=100))
# 1.9555912280338816


def do_nothing_in_comprehension():
    lst = [i for i in range(1000000) if False]
    return lst

print(timeit(do_nothing_in_comprehension, number=100))
# 2.382393566018436

What? The for loop turned out to be faster. How come?

It turns out that Python is smart: when the compiler sees an if False statement, it drops the statement and everything inside it, as if they were not there. We can see it in the dis output:

  2           0 BUILD_LIST               0
              2 STORE_FAST               0 (lst)

  3           4 LOAD_GLOBAL              0 (range)
              6 LOAD_CONST               1 (1000000)
              8 CALL_FUNCTION            1
             10 GET_ITER
        >>   12 FOR_ITER                 4 (to 18)
             14 STORE_FAST               1 (i)

  5          16 JUMP_ABSOLUTE           12

  6     >>   18 LOAD_FAST                0 (lst)
             20 RETURN_VALUE

The compiler simply skipped lines 4 and 5 of the function (the if False check and the append call). Now the list comprehension does more work than the stripped-down loop, which explains its slower performance.
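
To see this dead-code elimination in isolation, here is a minimal sketch with a throwaway function (the bytecode shown in this article looks like CPython 3.7–3.10; other versions may behave differently):

from dis import dis

def dead_branch():
    if False:
        print("never reached")
    return 1

# The disassembly shows only the instructions for `return 1`
# (LOAD_CONST and RETURN_VALUE) – the whole if False block is gone.
dis(dead_branch)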

We can prevent this optimization by assigning False to a variable, so the compiler can no longer tell at compile time that the condition is always false:

false = False

def do_nothing_in_for_loop_v2():
    lst = []
    for i in range(1000000):
        if false:
            lst.append(i)
    return lst

print(timeit(do_nothing_in_for_loop_v2, number=100))
# 3.3703378870268352


def do_nothing_in_comprehension_v2():
    lst = [i for i in range(1000000) if false]
    return lst

print(timeit(do_nothing_in_comprehension_v2, number=100))
# 3.2949654700350948

Now the difference is negligible, if present at all; sometimes the comprehension even comes out slower, and the outcome seems to depend mostly on measurement noise.
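
If you want to reduce the influence of background load, one common technique is timeit.repeat, which runs the whole measurement several times so you can compare the fastest runs – the ones least disturbed by other processes (a minimal sketch):

from timeit import repeat

# Take the minimum of 5 measurements of 100 runs each.
print(min(repeat(do_nothing_in_for_loop_v2, number=100, repeat=5)))
print(min(repeat(do_nothing_in_comprehension_v2, number=100, repeat=5)))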

Isolating and analyzing instructions

To understand what is going on, we need to dive deeper into the dis output of both functions. To make it more readable, let's look only at the iterative part:

dis(do_nothing_in_for_loop_v2)

        >>   12 FOR_ITER                18 (to 32)
             14 STORE_FAST               1 (i)

  6          16 LOAD_GLOBAL              1 (false)
             18 POP_JUMP_IF_FALSE       12

  7          20 LOAD_FAST                0 (lst)
             22 LOAD_METHOD              2 (append)
             24 LOAD_FAST                1 (i)
             26 CALL_METHOD              1
             28 POP_TOP
             30 JUMP_ABSOLUTE           12

  8     >>   32 LOAD_FAST                0 (lst)
             34 RETURN_VALUE

dis(do_nothing_in_comprehension_v2)

        >>    4 FOR_ITER                12 (to 18)
              6 STORE_FAST               1 (i)
              8 LOAD_GLOBAL              0 (false)
             10 POP_JUMP_IF_FALSE        4
             12 LOAD_FAST                1 (i)
             14 LIST_APPEND              2
             16 JUMP_ABSOLUTE            4
        >>   18 RETURN_VALUE

Thanks to the if statement, we can isolate the instructions responsible for the performance difference: they sit between the POP_JUMP_IF_FALSE and JUMP_ABSOLUTE instructions.

The for loop approach executes the following instructions:

  1. LOAD_FAST (lst) – loads the target list
  2. LOAD_METHOD (append) – loads the list.append method
  3. LOAD_FAST (i) – loads the loop variable i
  4. CALL_METHOD – calls append with i, adding it to the list and pushing the return value (None) onto the stack
  5. POP_TOP – removes that return value (None) from the stack

The comprehension executes only these:

  1. LOAD_FAST (i) – loads the loop variable i
  2. LIST_APPEND – a special instruction emitted for list comprehensions that does roughly what LOAD_METHOD, CALL_METHOD, and POP_TOP do combined. It does not even need to load the list first, because, according to the official documentation, the list stays on the stack:

For all of the SET_ADD, LIST_APPEND and MAP_ADD instructions, while the added value or key/value pair is popped off, the container object remains on the stack so that it is available for further iterations of the loop.

LIST_APPEND is the crucial difference between the for loop and comprehension implementations: the latter does not load the collection (be it a list, dict, or set) on each iteration – it is always on the stack, ready to receive the next item.
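
As a quick cross-check, dis also accepts a string of source code, and on the CPython versions whose bytecode is shown here it recursively disassembles the nested <listcomp> code object, which is where LIST_APPEND lives (a sketch):

from dis import dis

# The nested <listcomp> code object in the output contains
# the LIST_APPEND instruction discussed above.
dis("[x for x in range(3)]")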

Why doesn't the for loop behave the same way? Python knows that the sole purpose of comprehensions is creating collections, so keeping the target collection on the stack and using a special implementation makes sense. The for loop, at the same time, is a multi-purpose tool that can carry any number and kind of instructions inside, and it is neither easy nor efficient to predict what to keep in memory.

Performance of set and dictionary comprehensions

In the interest of completeness, let's check whether set and dictionary comprehensions behave the same way as list comprehensions. First, (a) a for loop vs. (b) a for loop with set.add assigned to a variable vs. (c) a set comprehension:

def create_set_with_loop():
    st = set()
    for i in range(1000000):
        st.add(i)
    return st

print(timeit(create_set_with_loop, number=100))
# 10.810808806971181


def create_set_with_loop_without_call():
    st = set()
    add = st.add
    for i in range(1000000):
        add(i)
    return st

print(timeit(create_set_with_loop_without_call, number=100))
# 8.41455348400632


def create_set_with_comprehensions():
    st = {i for i in range(1000000)}
    return st

print(timeit(create_set_with_comprehensions, number=100))
# 7.449792545987293

The performance distribution is similar to that of the list versions.
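
The bytecode tells the same story: disassembling the set comprehension should reveal a SET_ADD instruction inside the nested <setcomp> code object – the set counterpart of LIST_APPEND (a quick sketch, output omitted):

from dis import dis

# Look for SET_ADD in the <setcomp> code object.
dis(create_set_with_comprehensions)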

Now (a) a for loop vs. (b) a dictionary comprehension (without the method-to-variable shenanigans, since dct[i] = i does not go through a separate method):

def create_dict_with_loop():
    dct = {}
    for i in range(1000000):
        dct[i] = i
    return dct

print(timeit(create_dict_with_loop, number=100))
# 11.555924688000232


def create_dict_with_comprehension():
    dct = {i:i for i in range(1000000)}
    return dct

print(timeit(create_dict_with_comprehension, number=100))
# 10.100809580006171

Same result here, although the difference is less pronounced.
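
The disassembly also backs up the "no separate method" remark: in the loop, dct[i] = i compiles to a STORE_SUBSCR instruction with no method lookup, while the comprehension's nested <dictcomp> code object uses MAP_ADD (a quick sketch, output omitted):

from dis import dis

# The loop body stores items via STORE_SUBSCR; the <dictcomp>
# code object uses MAP_ADD, the dict counterpart of LIST_APPEND.
dis(create_dict_with_loop)
dis(create_dict_with_comprehension)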

Does this performance difference matter?

We have established that comprehensions are faster than for loops and why. But does it matter? I would argue it does not:

  1. Comprehensions were introduced to increase the readability of code. Speed is only a byproduct of optimizing a narrow use case.

  2. The performance difference matters only if generating collections is a major part of what your program does. To get a difference of about three seconds, we had to build a list of 1 million items 100 times. That is not much, and there may be better optimizations anyway (generators, for instance – see the sketch after this list).

  3. If you are doing anything else in the loop – for example, generating or requesting the item to be added – the performance difference will be much smaller relative to the runtime of the program as a whole.
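
A minimal sketch of the generator point from item 2, for the case where the values only need to be consumed once rather than kept around as a list:

# A generator expression produces values lazily, so the million-item
# list is never materialized in memory.
total = sum(i for i in range(1000000))

# The list-comprehension equivalent builds the full list first.
total_from_list = sum([i for i in range(1000000)])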

Conclusions

  1. All three types of comprehensions are indeed faster than the same logic in a for loop.

  2. The performance difference comes from two places:

    • comprehensions do not look up and call the list.append or set.add method on every iteration, which is a relatively costly operation

    • thanks to their special implementation, they do not load and discard the collection they are building on each iteration: it stays on the stack, always ready to receive the next item

  3. Better performance is not important in most real-life use cases and should not be the sole reason for using comprehensions.