Python comprehensions are believed to be faster for generating collections than for
loops. The question is why are they faster? What part of a comprehension is faster than in the loop? Below we answer this question and whether their faster performance is something worth considering.
This will be more of an exploratory exercise, where the journey and methodology are more interesting than the destination. If you need only the conclusions, you can jump there.
For Python comprehensions syntax please see Python comprehensions breakdown
Are comprehensions faster than loops?
First, we need to establish the baseline: are comprehensions indeed faster than loops, and if yes, by how much. We will use two functions doing the same thing with both methods – generating a list of integers from 0 to 999,999 – and timeit
module to gauge their performance which allows to run code multiple times and returns how long it took in total:
from timeit import timeit
def create_list_with_loop():
lst = []
for i in range(1000000):
lst.append(i)
return lst
print(timeit(create_list_with_loop, number=100))
# 8.161124781006947 (seconds)
def create_list_with_comprehensions():
# Assign list to intermediate variable instead of returning it directly
# to make the comparison more fair
lst = [i for i in range(1000000)]
return lst
print(timeit(create_list_with_comprehensions, number=100))
# 5.451697067008354
This is a 33% difference in performance, nothing to sneeze at. The exact numbers will differ depending on the hardware and its load when conducting the experiments. We will see how set and dictionary comprehensions behave later.
Experiment 1: Speed up adding items to a list
To add an item to a list we are calling the list.append
method. However, this method is a function reference that the interpreter has to look up on every iteration of the loop. To make sure that this is the case we need to use another Python library – dis
and its function under the same name:
from dis import dis
dis(create_list_with_loop)
dis
returns instructions that the Python interpreter generates from your code.
4 0 BUILD_LIST 0
2 STORE_FAST 0 (lst)
5 4 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (1000000)
8 CALL_FUNCTION 1
10 GET_ITER
>> 12 FOR_ITER 14 (to 28)
14 STORE_FAST 1 (i)
6 16 LOAD_FAST 0 (lst)
18 LOAD_METHOD 1 (append)
20 LOAD_FAST 1 (i)
22 CALL_METHOD 1
24 POP_TOP
26 JUMP_ABSOLUTE 12
7 >> 28 LOAD_FAST 0 (lst)
30 RETURN_VALUE
Here on line 12, we see LOAD_METHOD (append)
which is an instruction to load method of an object. It is between the instructions FOR_ITER
and JUMP_ABSOLUTE
so it repeats on each iteration.
We can speed it up by assigning list.append
to a variable and calling it instead of the method (idea from here):
def create_list_with_loop_without_call():
lst = []
append = lst.append
for i in range(1000000):
append(i)
return lst
print(timeit(create_list_with_loop_without_call, number=100))
# 7.231394850998186
Calling append
turned out to be 11% faster than calling a list.append
method. However, it explains only a third of the difference between the for
loop and a list comprehension. There is something else.
Note: Assigning the
list.append
method to a variable reduces readability and, therefore, is not recommended, unless it gives a meaningful performance boost.
Experiment 2: Test pure list iteration
To isolate the difference in performance, let’s test pure iteration without adding anything to the list. To do it we will add an if False
statement which should lead to the resulting lists being empty:
def do_nothing_in_for_loop():
lst = []
for i in range(1000000):
if False:
lst.append(i)
return lst
print(timeit(do_nothing_in_for_loop, number=100))
# 1.9555912280338816
def do_nothing_in_comprehension():
lst = [i for i in range(1000000) if False]
return lst
print(timeit(do_nothing_in_comprehension, number=100))
# 2.382393566018436
What? The for
loop turned out to be faster. How come?
It turns out that the Python interpreter is smart: if it sees an if False
statement, it skips it and all of its contents altogether as if they were not there. We can see it in the dis
output:
2 0 BUILD_LIST 0
2 STORE_FAST 0 (lst)
3 4 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (1000000)
8 CALL_FUNCTION 1
10 GET_ITER
>> 12 FOR_ITER 4 (to 18)
14 STORE_FAST 1 (i)
5 16 JUMP_ABSOLUTE 12
6 >> 18 LOAD_FAST 0 (lst)
20 RETURN_VALUE
The interpreter just skipped lines 4 and 5. Now a list comprehension does more stuff compared to the loop which explains its slower performance.
We can fix it by assigning False
to a variable which would prevent this optimization:
false = False
def do_nothing_in_for_loop_v2():
lst = []
for i in range(1000000):
if false:
lst.append(i)
return lst
print(timeit(do_nothing_in_for_loop_v2, number=100))
# 3.3703378870268352
def do_nothing_in_comprehension_v2():
lst = [i for i in range(1000000) if false]
return lst
print(timeit(do_nothing_in_comprehension_v2, number=100))
# 3.2949654700350948
Now the difference is negligible, if at all present. Sometimes the comprehension is even slower, looks like it depends mostly on chance.
Isolating and analyzing instructions
To understand what is going on we need to dive deeper into dis
of both functions. To make it more readable, let’s look only at the iterative part:
dis(do_nothing_in_for_loop_v2)
>> 12 FOR_ITER 18 (to 32)
14 STORE_FAST 1 (i)
6 16 LOAD_GLOBAL 1 (false)
18 POP_JUMP_IF_FALSE 12
7 20 LOAD_FAST 0 (lst)
22 LOAD_METHOD 2 (append)
24 LOAD_FAST 1 (i)
26 CALL_METHOD 1
28 POP_TOP
30 JUMP_ABSOLUTE 12
8 >> 32 LOAD_FAST 0 (lst)
34 RETURN_VALUE
dis(do_nothing_in_comprehension_v2)
>> 4 FOR_ITER 12 (to 18)
6 STORE_FAST 1 (i)
8 LOAD_GLOBAL 0 (false)
10 POP_JUMP_IF_FALSE 4
12 LOAD_FAST 1 (i)
14 LIST_APPEND 2
16 JUMP_ABSOLUTE 4
>> 18 RETURN_VALUE
Thanks to the if
statement we can isolate commands which are responsible for the performance difference: they are between POP_JUMP_IF_FALSE
and JUMP_ABSOLUTE
instructions.
for
loop approach performs the following commands:
LOAD_FAST (lst)
– loads the target listLOAD_METHOD (append)
– loads list.append methodLOAD_FAST (i)
– loads the counter variablei
CALL_METHOD
– appendsi
to the listPOP_TOP
– removes from the stack the last object added there –i
Comprehension does these:
LOAD_FAST (i)
– loads the counter variablei
LIST_APPEND
– special instruction available only in a list comprehension which does similar stuff asLOAD_METHOD
,CALL_METHOD
, andPOP_TOP
combined. It does not need to load the list itself, since according to the official documentation it is always in the memory:
For all of the SET_ADD, LIST_APPEND and MAP_ADD instructions, while the added value or key/value pair is popped off, the container object remains on the stack so that it is available for further iterations of the loop.
LIST_APPEND
is the crucial difference between for
loop and comprehensions implementations: the latter does not load the collection (be it list, dict, or set) for each iteration, it is always on the stack, plus it is always ready to add it to the collection.
Why the for
loop does not behave the same way? The interpreter knows that the sole purpose of comprehensions is creating collections, so keeping the target collection in memory and having a special implementation make sense. The for
loop, at the same time, is a multi-purpose tool, that can carry any number and kind of instructions inside, and it is not easy or efficient to predict what to keep memory.
Performance of set and dictionary comprehensions
In the interest of completeness, let’s check that set and dictionary comprehensions behave the same way as list comprehensions. First, (a) for
loop vs. (b) for
loop with set.add
assigned to a variable vs. (c) set comprehension:
def create_set_with_loop():
st = set()
for i in range(1000000):
st.add(i)
return st
print(timeit(create_set_with_loop, number=100))
# 10.810808806971181
def create_set_with_loop_without_call():
st = set()
add = st.add
for i in range(1000000):
add(i)
return st
print(timeit(create_set_with_loop_without_call, number=100))
# 8.41455348400632
def create_set_with_comprehensions():
st = {i for i in range(1000000)}
return st
print(timeit(create_set_with_comprehensions, number=100))
# 7.449792545987293
Performance distribution is similar to that of list comprehension.
Now (a) for
loop vs. (b) dictionary comprehension (without method-to-variable shenanigans since dict does not use a separate method):
def create_dict_with_loop():
dct = {}
for i in range(1000000):
dct[i] = i
return dct
print(timeit(create_dict_with_loop, number=100))
# 11.555924688000232
def create_dict_with_comprehension():
dct = {i:i for i in range(1000000)}
return dct
print(timeit(create_dict_with_comprehension, number=100))
# 10.100809580006171
Same result here, although the difference is less pronounced.
Does this performance difference matter?
We have established that comprehensions are faster than for
loops and why. But does it matter? I would argue it does not:
-
Comprehensions were introduced to increase the readability of code. Speed is only a byproduct of optimizing a narrow use case.
-
Performance difference makes sense only if generating lists is the major part of what a program does. To achieve about three seconds performance difference we had to create a list of 1 million items 100 times. This is not much and there may be better optimizations (generators, for instance).
-
If you are doing something else in the loop, for example, generating or requesting an item to be added, the performance difference will be much smaller relative to the duration of the program as a whole.
Conclusions
-
All three types of comprehensions are indeed faster than the same logic in a
for
loop. -
The performance difference comes from two places:
-
comprehensions do not call method
list.append
orset.add
which is a relatively costly operation -
due to special implementation, they do not load/offload the collection they are building on each iteration and always ready to add an item to it
-
-
Better performance is not important in most real-life use cases and should not be the sole reason for using comprehensions.