Speeding up singleton-descriptors

Posted by Sjoerd Job on Jan. 22, 2016, 3:05 p.m.

Here I will cover an optimisation technique I occasionally use. It's a result of weighing several different concerns, leading to a final implementation. As such, it should be treated with care and not be applied willy-nilly.

This concept started when I was working on a codebase which was always setting self._log in the __init__ method. But, the way it was being set was not as consistent as I would hope.

class Foo(object):
    def __init__(self):
        self._log = logging.getLogger(
            "Project.{}".format(self.__class__.__name__)
        )
        ...
    ...

was one of the ways it was being initialized. Another was

class Foo(object):
    def __init__(self):
        self._log = logging.getLogger(
            "Project.{}".format(__name__)
        )
        ...
    ...

Leading to a different format for the log entries with one being named after the module containing the definition, and the other being named after the class that was instantiated.

Another way there were differences was in deciding between self._log and self.log. Let's leave that out-of-scope for now.

First attempt (Init)

My first instinct was that the log construction logic needed to be consolidated into a separate mixin class.

class LoggerMixin(object):
    def __init__(self, *args, **kwargs):
        self._log = logging.getLogger(
            "Project.{}".format(self.__class__.__name__)
        )
        super(LoggerMixin, self).__init__(*args, **kwargs)

Does that work, I ask you? It should, right? I'll give you a couple of moments to ponder about this.

Time's up! No, this won't work. Why? Actually several reasons:

Only the last item on the list is a definite problem. The rest could eventually go away when migrating to Python 3, as well as writing proper code.

Using properties correctly

So, assigning the attribute in LoggerMixin.__init__ is not going to work. Luckily, we have a nice tool for that: properties.

class LoggerMixin(object):
    __log = None

    @property
    def _log(self):
        if self.__log is None:
            self.__log = logging.getLogger(
                "Project.{}".format(self.__class__.__name__)
            )
        return self.__log

This initialisation looks quite like how a singleton is implemented, and thus I'm willing to call it a singleton-ish property.

Does this work? Yes: it does. It's a write-only property though, (which is good). But does it perform? Well, not as good as what it was before, which is unfortunate. Every single lookup will have to look through the __mro__ to find the property, call the method _log, and return it.

Setting on the instance dict

One would very much like to prevent the method call on attribute access. The default property decorator does not allow for this, but there is a nasty trick we can use. For this, remember that property is just a descriptor constructor.

Descriptors are a nice hook into fooling around with attribute access, and an introduction into descriptors is beyond the scope of this article.

class LoggingDescriptor(object):
    def __get__(self, obj, type=None):
        obj._log = logging.getLogger(
            "Property.{}".format(obj.__class__.__name__)
        )
        return obj._log


class AddingToDictLoggerMixin(object):

    _log = LoggingDescriptor()

Benchmarking (results later) gives that this is almost as good as the initial case, and I'm quite happy with that.

However, I should note that this approach is using an aspect of Python that is not really known and understood that well by everybody, and most likely not by programmers that do not have that much Python experience. As such, it should be used with great care.

Generalising

Now, this begs the question: is this technique useful for anything else? I suppose not, but let's suppose so. How would we make this into a generic solution?

class init_on_demand(object):
    def __init__(self, fun):
        self._fun = fun
    def __get__(self, obj, type=None):
        setattr(obj, self._fun.__name__, self._fun(obj))
        return getattr(obj, self._fun.__name__)


class GenericInitOnDemandLoggerMixin(object):
    @init_on_demand
    def _log(self):
        return logging.getLogger(
            "Project.{}".format(self.__class__.__name__)
        )

To me, this is a nice and general solution for a not-so-common problem. I'd like to stress that a solution like this should not be a go-to solution for every problem you encounter, but at the right moment it might just be the cleanest solution to your problem.

One reason I'd prefer this over the previous solution is because it is wrapping the 'complex' descriptors in something with a more descriptive name.

The "obvious" alternative

Stressing the "obvious", because it was not to me.

def log_builder(obj):
    return logging.getLogger(
        "Project.{}".format(obj.__class__.__name__)
    )


class Foo(object):
    def __init__(self, *args, **kwargs):
        self._log = log_builder(self)
        ...
    ...

This is more DRY than the original, performs just as well, and uses no advanced Python features.

Timing results

Of course, an important aspect of my fiddling around is timing measurements. It was hard to get precise results, because of the jitter.

Method Duration (ns)
Init 561.064
Property 3431.545
AddingToDict 572.315
GenericInitOnDemand 569.860
LogBuilder 559.831

Conclusion

In a search of writing DRY code, mixins can provide wonderful benefits. Python itself brings a lot of benefits to the table. By being such a dynamic language, it allows a lot more hacking around than other languages.

Descriptors are a useful tool in writing Python code, but the use thereof should be shunt unless really useful, because they can cause code that is hard to reason about. However, sometimes it is the only tool you can use, and we should be grateful Python has it to offer.

Finally, when improving the DRYness of code, we should also consider solutions that use simple and understandable tools from our language and weigh the added cost (mental, performance, lines of code) of our solution against the added benefits.

Even though in this case, the simple solution dawned to me without outside help, I think in general it will not hurt to discuss the problem and solution with a colleague before finally implementing it.

In the end, my preference is the AddingToDictLoggerMixin, or maybe perhaps the GenericInitOnDemandLoggerMixin. Not only because it ventures a bit more into the advanced Python usage, but also because it dictates the attribute name to be used.