Monday 19 September 2022

I am the walrus operator

During my time at NextMove Software I learnt quite a bit about writing efficient C++. I applied some of these learnings to Open Babel during that period, e.g. a 55X improvement in SMILES reading/writing speeds. This was gratifying at the time, but of course the flip side of a 55X improvement is that it was 55X slower before that. Really, code should be written to be 'lean' in the first place.

These days, when I write code, it's in Python. Even though it's slower than C++ (or maybe because it's slower), I still try to write efficient code. For example, consider the following:

if key in mydict:
    myfunction(mydict[key])
else:
    notfound += 1

The first and second lines do the same work twice: each one hashes the key and looks it up in the dictionary. These days, I just can't bring myself to write code that does something twice. Instead, I want to look it up once and use the result twice:

val = mydict.get(key, None)
if val:
    myfunction(val)
else:
    notfound += 1
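
One way to see the difference between the two versions is to count how often the key gets hashed. The following is only a sketch: CountingKey, lookups_with_in and lookups_with_get are hypothetical names made up for the illustration, not anything from real code. In CPython, built-in strings cache their hash, so the real saving is smaller than this suggests, but the membership test and the subscript are still two separate probes of the hash table.

```python
class CountingKey:
    """Hypothetical key type that counts how often it is hashed."""

    def __init__(self, name):
        self.name = name
        self.hash_calls = 0

    def __hash__(self):
        self.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.name == other.name


def lookups_with_in(mydict, key):
    """Count hash calls for the 'if key in mydict: mydict[key]' pattern."""
    key.hash_calls = 0
    if key in mydict:       # first lookup
        _ = mydict[key]     # second lookup
    return key.hash_calls


def lookups_with_get(mydict, key):
    """Count hash calls for the 'mydict.get(key)' pattern."""
    key.hash_calls = 0
    _ = mydict.get(key)     # single lookup
    return key.hash_calls


key = CountingKey("benzene")
mydict = {key: "c1ccccc1"}

hashes_with_in = lookups_with_in(mydict, key)
hashes_with_get = lookups_with_get(mydict, key)
print(hashes_with_in, hashes_with_get)
```

On a hit, the first pattern hashes the key twice and the second hashes it once.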

A downside to this second version is that it loses a bit of readability. In addition, if 'val' is not used elsewhere then it seems untidy to define it with an apparently broader scope than it needs. (Note that some care is needed with the 'if' statement: a stored value that evaluates False, such as 0 or an empty string, would be treated as not found. "if val is not None" can help here.)
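
To make the point about False-like values concrete, here is a small sketch comparing the two tests. The function count_hits and the charges dictionary are invented for the example; the only assumption is that the dictionary never stores None itself as a value (if it might, a dedicated sentinel object passed as the default to get() would be needed instead).

```python
def count_hits(mydict, keys):
    """Return (hits using 'if val', hits using 'if val is not None')."""
    truthy_hits = 0
    not_none_hits = 0
    for key in keys:
        val = mydict.get(key)
        if val:                  # misses stored values such as 0 or ""
            truthy_hits += 1
        if val is not None:      # only misses keys that are absent
            not_none_hits += 1
    return truthy_hits, not_none_hits


charges = {"Na": 1, "Cl": -1, "Ar": 0}
result = count_hits(charges, ["Na", "Cl", "Ar", "Xx"])
print(result)
```

Here "Ar" is present with the value 0, so the plain truthiness test reports two hits while the "is not None" test reports three.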

Enter the walrus operator, :=, which arrived with Python 3.8 (the name comes from its resemblance to the eyes and tusks of a walrus). This allows you to assign a value to a variable in the middle of an expression; for example, in the expression used in an 'if' statement. It was/is quite controversial, and the arguments around the proposal (PEP 572) led Guido to step down from the BDFL position he had held since the start of Python.

Here is how it could be used in this case:

if val := mydict.get(key, None):
    myfunction(val)
else:
    notfound += 1
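
The same care with False-like values applies here. A minimal sketch of the walrus combined with an explicit None check follows; the wrapper function process and the sample data are made up for illustration, and it assumes None is never stored as a value.

```python
def process(mydict, keys, myfunction):
    """Call myfunction on each value found; return the number of misses."""
    notfound = 0
    for key in keys:
        # The parentheses matter: without them, val would be bound to the
        # result of the comparison (True or False), not the looked-up value.
        if (val := mydict.get(key)) is not None:
            myfunction(val)
        else:
            notfound += 1
    return notfound


seen = []
charges = {"Na": 1, "Ar": 0}
notfound = process(charges, ["Na", "Ar", "Xx"], seen.append)
print(seen, notfound)
```

Note that 'val' is not local to the 'if' statement: it remains bound in the enclosing function afterwards, so the walrus tidies up the appearance of the code rather than the actual scope of the variable.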

Is this more readable, or is it confusing? I'm familiar with this idiom from C++, but its overuse and/or abuse can really confuse things. See this Stack Overflow question for example; imagine seeing that in someone's code and trying to figure out what it's doing. The docs I link to above end with the following request: "Try to limit use of the walrus operator to clean cases that reduce complexity and improve readability." We'll see how that works out.

My feeling is that the walrus operator is best avoided, or used very sparingly; what do you think?
