Cleaning up long if/elif/else chains

Cleaning up long if/elif/else chains permalink

Posted by Sjoerd Job on June 6, 2016, 3:54 p.m.

Nearly all projects have it. A method that contains a long chain of checks. They come in various variations.

A few weeks ago I was lucky enough to be around on Code Review Stack Exchange when an excellent example was presented, ready for review.

The question for code-review

On May 20th, a user naming himself "m654" posted the interpreter loop of a code-golfing language named Tellurium for review.

The problem that was noted was a long list of global variables (they are never nice, right?). My initial plan was: can I make it class-based instead, because instance variables are a bit less bad than global variables. However, I never got that far, as I saw other things I'd like to clean up.

Before continuing, I'd like to suggest you to read the code the code review post, and think about a few of the suggestions you would make for change.

The focus of this post

In this post I'm going to look specifically at long if/elif/else chains in general, because there are multiple ways of refactoring, depending on the shape of the bodies.

Value slightly influences action

The simplest type of chain to clean up is that where the checks are all the same, of the form var == "something that changes".

For instance, around line 350 of the review, we see the following sequence of statements:

    elif cmd == "a":
        tape[selected] = int(tape[selected]) + int(tape[selected+1])

    elif cmd == "s":
        tape[selected] = int(tape[selected]) - int(tape[selected+1])

    elif cmd == "m":
        tape[selected] = int(tape[selected]) * int(tape[selected+1])

    elif cmd == "d":
        tape[selected] = int(tape[selected]) / int(tape[selected+1])

Notice the repetition? The only things changing are the asmd in the check, and the +-*/ in the action. We can model than using a dictionary.

BINOPS = {
    "a": lambda x, y: x + y,
    "s": lambda x, y: x - y,
    "m": lambda x, y: x * y,
    "d": lambda x, y: x / y,
}
...
    elif cmd in BINOPS:
        tape[selected] = BINOPS(cmd)[int(tape[selected]), int(tape[selected+1]))

Now this does not seem like a very clean win, because the lambda syntax is still quite verbose. In Haskell, we would have greatly benefited from the fact that one can write (+) instead of \x, y -> x + y. In Python, we can use operator.add, and friends, giving

BINOPS = {
    "a": operator.add,
    "s": operator.sub,
    "m": operator.mul,
    "d": operator.truediv,
}
...
    elif cmd in BINOPS:
        tape[selected] = BINOPS(cmd)[int(tape[selected]), int(tape[selected+1]))

I'm not sure if it makes the code more clean, but at least the elif-chain is a tad shorter.

Value has huge impact on action

Another case—which is not really present in Tellurium—is that different values result in completely different actions being taken. However, I'd just like to highlight it anyway. An obvious example is the steps in a wizard-view, or handling separate tokens in an abstract syntax tree differently. Let's take the example of walking a syntax tree.

    def visit(self, token):
        if token.type == 'keyword':
            # do a lot of logic with the keyword here.
            pass
        elif token.type == 'functioncall':
            pass
        elif token.type == 'operator':
            pass
        elif token.type == 'string':
            pass

Imagine the separate pass statements actually representing a lot of code. Now, the initial reflex is of course: I hate long functions, so I'm going to extract whatever duplication I can. We might incidentally notice that somewhere in the block of the keyword, and the block of the operator we see similar code, and extract that.

However, that might not actually be the best move. We can also take another approach: extract the separate actions:

    def visit_keyword(self, token):
        pass

    def visit_functioncall(self, token):
        pass

    def visit_operator(self, token):
        pass

    def visit_string(self, token):
        pass

    def visit(self, token):
        if token.type == 'keyword':
            self.visit_keyword(token)
        elif token.type == 'functioncall':
            self.visit_functioncall(token)
        elif token.type == 'operator':
            self.visit_operator(token)
        elif token.type == 'string':
            self.visit_string(token)

Is this much better? No! I added a lot of methods that are only going to be called from one place (the visit function). Also, every method I extracted adds three lines: one for the method definition (which we did not have before), one for the blank line between methods, and one for actually calling the method.

On the other hand, there's a smart trick you can do (because Python is such a nice dynamic language): You can use getattr to get the handling method, instead of repeating yourself. This gives us the following.

    def visit_keyword(self, token):
        pass

    def visit_functioncall(self, token):
        pass

    def visit_operator(self, token):
        pass

    def visit_string(self, token):
        pass

    def visit(self, token):
        handler = getattr(self, 'visit_' + token.type, lambda token: None)
        handler()

Is this better? Well, at least we got rid of some of the extra lines again. Also, subclasses could now override one of the handlers while leaving others alone. The trick of looking up a handler dynamically (from a dictionary, or based on name), and then calling it is generally called dispatching.

This method is also what Django uses for its class-based-views.

    def dispatch(self, request, *args, **kwargs):
        # Try to dispatch to the right method; if a method doesn't exist,
        # defer to the error handler. Also defer to the error handler if the
        # request method isn't on the approved list.
        if request.method.lower() in self.http_method_names:
            handler = getattr(self, request.method.lower(), self.http_method_not_allowed)
        else:
            handler = self.http_method_not_allowed
        return handler(request, *args, **kwargs)

Pro-tip: when doing dispatching using getattr, I generally prefix the methods with a common string. Like, in the previous example visit_. Sometimes I make sure the method starts with an underscore as well—but not two, as then name-mangling makes the dispatching harder.

You can see that Django does not use the prefixing trick, but pays the price by having to check that the method to be called is actually on an approved list. When dispatching on user input (as Django does), it is a good idea to have a curated list of approved input and/or document clearly the lookup algorithm being used.

Class of value has (huge) impact on action

Initially I wondered if I should include this one, as the solution is already well documented on refactoring.com. The trick here is to Replace Conditional With Polymorphism. Of course there are cases where this breaks down (e.g., when using the visitor pattern on an abstract syntax tree). If you can't use Polymorphism, maybe you could do dispatching as before?

    def handle(self, value):
        for cls in value.__mro__:
            try:
                handler = getattr(self, "_handle_{}".format(cls.__name__)
                break
            except AttributeError:
                pass
        else:
            raise AttributeError(value.__class__.__name__)  # or handler = default_handler

        handler(value)

But this should be a rarity in most codebases.

Action is influenced by state

Another example (again from Tellurium) is that the action is not influenced by the command, but by some state maintained across loops. We can see this quite clearly.

def parse(cmd):
    ... globals
    if readingFileName == True:
        if cmd == "]":
            readingFileName = False
            f = open(''.join(fileName), 'r')
            code = f.read()
            f.close()
            read(code)
            fileName = []

        else:
            fileName.append(cmd)

    ...

    elif cmd == "0":
        readingFileName = True

    ...

Notice how after reading the "0", the system goes into a state where it is reading a filename, and does not handle the input as it did before? What's actually being emulated is a kind of sub-parser. There are many ways to clean this up. One of them is by introducing a parser-stack. You begin by parsing in 'default' mode, and on seeing the "0" token, you exchange the parser for a specialised parser.

def _parse_default(cmd):
    ... globals
    ... same code (except for the first `if`)
    elif cmd == "0":
        _parser_stack.append(_parse_filename)
    ...


def _parse_filename(cmd):
    global fileName
    if cmd == "]":
        assert _parser_stack.pop() is _parse_filename
        f = open(''.join(fileName), 'r')
        code = f.read()
        f.close()
        read(code)
        fileName = []

    else:
        fileName.append(cmd)


_parser_stack = [_parse_default]
def parse(cmd):
    _parser_stack[-1](cmd)

Notice how we're again using dispatching, but now based on state we maintain ourself?

Is that assert on the call to pop really necessary? Maybe not, but I prefer coding a bit more defensively.

Also, I purposefully glanced over the reading of the file using open(); ...; close() (one should ideally use with statements). It's not the point of this post.

And how about other if/elif/else chains?

Of course, the cases above are the simple cases. But they are also the cases I do see in the wild. I have seen other examples, but not as much. If you have an example that needs some cleaning up (and the above tips don't help), post your code on the Code Review Stack Exchange, and either I or someone else will help you improve the code.

Conclusion

I have listed some of the ways in which certain conditional chains can be cleaned up. Now think back to the last chain you wrote, and see if you can apply some of these techniques.

sjoerdjob.com