How do I loop in Python?
The Python for loop has the following general form:
```python
for myvar in myiterator:
    pass  # do something with myvar
```
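The thing you loop over does not have to be a list; any iterable works. Here is a small sketch with made-up example data, showing that a string yields its characters, a dictionary yields its keys, and range() yields integers:

```python
# Hypothetical example data, chosen only for illustration
letters = []
for ch in "ACGT":             # a string yields its characters
    letters.append(ch)

keys = []
for key in {"a": 1, "b": 2}:  # a dict yields its keys (insertion order in Python 3.7+)
    keys.append(key)

numbers = []
for n in range(3):            # range(3) yields 0, 1, 2
    numbers.append(n)

print(letters, keys, numbers)  # ['A', 'C', 'G', 'T'] ['a', 'b'] [0, 1, 2]
```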
How do I loop over a list?
If you are coming from other languages such as C, you might be used to:
```python
for i in range(len(mylist)):
    myvar = mylist[i]
    # do something with myvar
```
In Python, it's much better (that is, the intention is clearer) if you do the following instead:
```python
for myvar in mylist:
    pass  # do something with myvar
```
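As a quick check on a made-up list, the index-based loop and the direct loop visit exactly the same items in the same order; the direct loop just says so more plainly:

```python
mylist = ["spam", "eggs", "ham"]  # hypothetical example data

# C-style: index into the list
via_index = []
for i in range(len(mylist)):
    via_index.append(mylist[i])

# Pythonic: iterate over the items directly
direct = []
for myvar in mylist:
    direct.append(myvar)

assert via_index == direct
print(direct)  # ['spam', 'eggs', 'ham']
```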
If you suddenly realise you want the value of the index i, you can easily adapt the previous loop:
```python
for i, myvar in enumerate(mylist):
    # do something with myvar
    # do something with i
    pass
```
How do I loop over two lists simultaneously?
Let's say you have a list of people's firstnames and another list containing their surnames, and you want to print each full name. Don't do the following:
```python
for i in range(len(firstnames)):
    print("%s %s" % (firstnames[i], surnames[i]))
```
Instead, you should make the intention of the code clear by using zip as follows:
```python
for firstname, surname in zip(firstnames, surnames):
    print("%s %s" % (firstname, surname))
```
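One detail worth knowing: zip stops as soon as the shortest input runs out, so if the two lists differ in length the extra items are silently dropped. A sketch with made-up names follows; itertools.zip_longest (Python 3 name) pads the shorter list instead, using a fill value of your choice:

```python
from itertools import zip_longest

firstnames = ["Ada", "Alan", "Grace"]  # hypothetical data
surnames = ["Lovelace", "Turing"]      # deliberately one name short

# zip stops at the shorter list, so 'Grace' is dropped
paired = list(zip(firstnames, surnames))
print(paired)  # [('Ada', 'Lovelace'), ('Alan', 'Turing')]

# zip_longest keeps going and fills the gap with fillvalue
padded = list(zip_longest(firstnames, surnames, fillvalue="?"))
print(padded)  # [('Ada', 'Lovelace'), ('Alan', 'Turing'), ('Grace', '?')]
```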
Loop pattern #1 - Building a new list from an old one
```python
newlist = []
for mynum in mylist:
    newnum = mynum * 2
    newlist.append(newnum)
```
In the case of this simple example, it should be done as a list comprehension instead:
```python
newlist = [mynum * 2 for mynum in mylist]
```
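To convince yourself that the two forms build the same list, here is a quick side-by-side comparison on a made-up list of numbers:

```python
mylist = [1, 2, 3, 4, 5]  # hypothetical example data

# Explicit loop with append
newlist = []
for mynum in mylist:
    newnum = mynum * 2
    newlist.append(newnum)

# Equivalent list comprehension
comprehension = [mynum * 2 for mynum in mylist]

assert newlist == comprehension
print(comprehension)  # [2, 4, 6, 8, 10]
```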
Here is a slightly more complicated example, involving an "if" statement that acts as a filter for even numbers:
```python
newlist = []
for mynum in mylist:
    if mynum % 2 == 0:
        newnum = mynum * 2
        newlist.append(newnum)
```
Again, this can be done instead as a list comprehension:
```python
newlist = [mynum * 2 for mynum in mylist if mynum % 2 == 0]
```
Loop pattern #2 - Summing things up
Another common pattern is totalling things up using a loop:
```python
total = 0
for mynum in mylist:
    total += mynum
```
In this simple case, though, it is easier to use the built-in sum function:
```python
total = sum(mylist)
```
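sum also accepts a generator expression, so you can total a derived quantity without first building an intermediate list. A small sketch with made-up numbers:

```python
mylist = [3, 1, 4, 1, 5]  # hypothetical example data

total = sum(mylist)                          # 3 + 1 + 4 + 1 + 5 = 14
sum_of_squares = sum(x * x for x in mylist)  # 9 + 1 + 16 + 1 + 25 = 52
count_odd = sum(1 for x in mylist if x % 2)  # 3, 1, 1 and 5 are odd, so 4
print(total, sum_of_squares, count_odd)      # 14 52 4
```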
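One thing to be careful of, though, is that building up a long string with "+" (or "+=") inside a loop can be slow: strings are immutable, so each addition may copy everything accumulated so far. If speed is important, avoid the following:
```python
longstring = ""
for myvar in mylist:
    smallstring = str(myvar)  # create smallstring here somehow (str() is just a stand-in)
    longstring += smallstring
```
Instead, use pattern #1 to create a list of strings, and join them at the end:
```python
stringlist = []
for myvar in mylist:
    smallstring = str(myvar)  # create smallstring here somehow (str() is just a stand-in)
    stringlist.append(smallstring)
longstring = "".join(stringlist)
```
Loop pattern #3 - Filling in a dictionary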
A common pattern is to update information in a dictionary using a loop. Let's take an example of counting up how many people have a particular firstname given a list of firstnames.
The problem here is that it is necessary to check whether a particular key is in the dictionary before updating it. This can lead to awkward code like the following:
```python
name_freq = {}
for firstname in firstnames:
    if firstname in name_freq:
        name_freq[firstname] += 1
    else:
        name_freq[firstname] = 1
```
While this can be improved somewhat by using name_freq.get(firstname, 0), that's getting a bit complicated (and doesn't extend to dictionaries of lists). Instead you should use a defaultdict, a dictionary that calls a factory function to create a default value whenever a missing key is accessed (here int, which returns 0), as follows:
```python
from collections import defaultdict

name_freq = defaultdict(int)
for firstname in firstnames:
    name_freq[firstname] += 1
```
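For the specific job of counting, the standard library also offers collections.Counter, a different tool from defaultdict that is named here as an alternative: it counts every item of an iterable in one call. A sketch with made-up names, checking that both approaches agree:

```python
from collections import Counter, defaultdict

firstnames = ["John", "Mary", "John", "Anne", "Mary", "John"]  # hypothetical data

# defaultdict(int) version, as above
name_freq = defaultdict(int)
for firstname in firstnames:
    name_freq[firstname] += 1

# Counter version: does the counting loop for you
counts = Counter(firstnames)

assert dict(name_freq) == dict(counts)
print(counts.most_common(1))  # [('John', 3)]
```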
And what if you want to store the corresponding surnames in a dictionary keyed by firstname? Use a defaultdict(list), of course:
```python
from collections import defaultdict

same_firstname = defaultdict(list)
for firstname, surname in zip(firstnames, surnames):
    same_firstname[firstname].append(surname)
```
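With some made-up data, the resulting dictionary maps each firstname to the list of surnames seen with it, in input order:

```python
from collections import defaultdict

firstnames = ["John", "Mary", "John"]   # hypothetical data
surnames = ["Smith", "Jones", "Brown"]

same_firstname = defaultdict(list)
for firstname, surname in zip(firstnames, surnames):
    # A missing key is created with an empty list before .append runs
    same_firstname[firstname].append(surname)

print(dict(same_firstname))  # {'John': ['Smith', 'Brown'], 'Mary': ['Jones']}
```

Note that merely looking up a missing key (for example same_firstname["Zoe"]) also inserts it with an empty list, so use "in" or .get() to test for membership without that side effect.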
Loop pattern #4 - Don't use a loop
Think dictionary, set, and sort. A lot of tricky algorithms can be implemented in a few lines with one or two of these guys.
A trivial example is finding the unique items in a list: set(mylist). Want to check whether that genome sequence contains exactly four distinct letters? assert len(set(mygenome)) == 4
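Note that len(set(mygenome)) == 4 counts distinct characters but does not say which ones they are. If what you actually want is to check that a sequence uses only the nucleotide letters, a set subset test does that. A minimal sketch, assuming an upper-case "ACGT" alphabet; the helper name only_acgt is hypothetical:

```python
VALID = set("ACGT")  # assumed alphabet: upper-case DNA nucleotides only

def only_acgt(seq):
    """Hypothetical helper: True if every character of seq is A, C, G or T."""
    return set(seq) <= VALID  # <= on sets means "is a subset of"

print(only_acgt("GATTACA"))   # True
print(only_acgt("GATTNACA"))  # False: 'N' is not in the alphabet
print(only_acgt("AAAA"))      # True, even though len(set("AAAA")) == 1
```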
The following example uses sort. Given a set of docking scores for 10 poses, find which poses have the top three scores. Here's a solution using the so-called decorate-sort-undecorate paradigm:
```python
# Decorate ("stick your data in a tuple with other stuff")
tmp = [(x, i) for i, x in enumerate(pose_scores)]
# Sort (tuples compare by their first item, the score; ties fall back to the index)
tmp.sort(reverse=True)
# Undecorate ("get your data back out of the tuple")
top_poses = [z[1] for z in tmp]
print(top_poses[:3])
```
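In current Python the decorate step can usually be handed to the key argument of sorted, and heapq.nlargest picks the top n items without sorting everything. A sketch with ten made-up scores, assuming (as the example above does) that a higher score is better; if lower is better, drop reverse=True and use heapq.nsmallest instead:

```python
import heapq

# Hypothetical docking scores for poses 0..9 (higher = better, an assumption)
pose_scores = [5.1, 7.3, 2.2, 9.0, 6.4, 8.8, 1.5, 7.9, 3.3, 4.0]

# sorted with key: sort the pose indices by their score, highest first
top_sorted = sorted(range(len(pose_scores)),
                    key=lambda i: pose_scores[i], reverse=True)[:3]

# heapq.nlargest: same result without a full sort
top_heap = heapq.nlargest(3, range(len(pose_scores)), key=pose_scores.__getitem__)

print(top_sorted, top_heap)  # [3, 5, 7] [3, 5, 7]
```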
Image credit: Loop by Panca Satrio Nugroho (CC BY-ND 2.0)