Is Django's URL routing tightly coupled?

Is Django's URL routing tightly coupled? permalink

Posted by Sjoerd Job on Sept. 20, 2016, 10:15 p.m.

In a recent message to the Django mailing list, a remark was placed that the URL routing of Django was tightly coupled to the Django core. From what I understood of the requirements placed on the routing, I found it a weird statement. I decided to investigate.

What does it mean to be tightly coupled?

When somebody says that two systems are tightly coupled, it is generally implied that at least one of the systems knows too much of the other. In case of the URL routing of Django, it would either imply that the routing itself knows too much about Django, or Django knows/assumes too much about the routing.

Looking at the responsibilities of Django URL routing.

Before we can answer if one part knows too much, it is best to first consider what the responsibilities of the URL routing are. Because the more responsibilities you have, the more knowledge you should be able to assume [1].

So what is the responsibility of URL routing in general?

The responsibility of URL routing is to resolve a given URL to a function and arguments.

That is, the minimal interface of a URL routing class would be

class URLRouter:
    def resolve(self, path: str) -> Optional[(callable, tuple, dict)]:
        # magic happens here.
        return (func, args, kwargs)

Now, in Django, there is another (but not lesser!) responsibility: given a function (or its name/description/...) and arguments, form a URL. This is called reversing the route. To allow this, we extend the interface as follows:

class URLRouter:
    def resolve(self, path: str) -> Optional[(callable, tuple, dict)]:
        # magic happens here.
        return (func, args, kwargs)

    def reverse(self, func: callable, args: tuple, kwargs: dict) -> str:
        # magic happens here.
        return path

Here, the basic premise is that the following two properties hold [2].

router = URLRouter()
router.resolve(router.reverse(func, args, kwargs)) == (func, args, kwargs)
router.reverse(router.resolve(path)) == path

Should a URL router have to implement anything else? Well, yes and no. In general, it might be inconvenient to have to pass the function itself to reverse, but one might prefer giving a name to all separate routes instead. Also, with Django somewhat promoting the idea of reusable apps, where each app supplies its own URL configuration, it becomes interesting to allow for namespaces, to prevent clashes in function naming. That does complicate matters a bit, but that does not take too much away from the core concept.

To be fair, the Django URL routing classes do expose these methods, so that is a good sign. The resolve method returns a ResolverMatch instance instead of a tuple (again, a good sign!). But there is a lot of code in there.

So how does Django use the routers?

To answer the question whether or not Django is too tightly coupled to the specifics of the routers, we need to check whether it only uses the interface, or if it uses more information.

contrib.admin

In the Django admin, a site instance holds a urlpattern attributes which is constructed using a lot of calls to url(...) with regular expressions. This does not really abuse any internals, as the url(...) function is exactly what is also supposed to be used by the developers producing the projects.

The same goes for the ModelAdmin class. Again, a fine usage.

core.checks.urls

Recently, Django added a check management command which goes over the project in search of suboptimal configuration. Here is a small excerpt.

def check_resolver(resolver):
    """
    Recursively check the resolver.
    """
    from django.urls import RegexURLPattern, RegexURLResolver
    warnings = []
    for pattern in resolver.url_patterns:
        if isinstance(pattern, RegexURLResolver):
            warnings.extend(check_include_trailing_dollar(pattern))
            # Check resolver recursively
            warnings.extend(check_resolver(pattern))
        elif isinstance(pattern, RegexURLPattern):
            warnings.extend(check_pattern_name(pattern))
        else:
            # This is not a url() instance
            warnings.extend(get_warning_for_invalid_pattern(pattern))

        if not warnings:
            warnings.extend(check_pattern_startswith_slash(pattern))

    return warnings

# ---->8 snip 8<----

def check_pattern_name(pattern):
    """
    Check that the pattern name does not contain a colon.
    """
    if pattern.name is not None and ":" in pattern.name:
        warning = Warning(
            "Your URL pattern {} has a name including a ':'. Remove the colon, to "
            "avoid ambiguous namespace references.".format(describe_pattern(pattern)),
            id="urls.W003",
        )
        return [warning]
    else:
        return []

To me, this piece of code makes sense, but I would just like to say one thing: 'URL router, check thyself!'. It denies the existence of any other kind of URL routers besides the ones sanctioned by Django. It uses type-checking instead of contract checking. And, the responsibility is in the wrong place.

Just look at check_pattern_name: it gets a pattern (which is actually a RegexURLPattern, grabs the regex attribute, and then keeps on digging into the internals by getting the pattern attribute of the regular expression.

Contrast this with the logic for checking the models, which is a big loop with calls to model.check.

Is this a violation of coupling? Yes! But it's easily repairable, I think.

middleware.common

In the common middleware, there is a special case that gets triggered when a certain path can not be resolved. It checks if the current path does not end with a '/', and if so, it checks if the path with a '/' appended can be resolved. For this it uses the is_valid_path function from the core urls module, in a clean way.

No violation of coupling here.

middleware.locale

This uses the is_language_prefix_patterns_used method from django.conf.urls.i18n. This in turn uses the urlpatterns attribute of the main URL router.

This case is somewhat debatable.

template.defaultfilters

Here the reverse functionality is exposed using the {% url ... %} syntax. This refers back to the global reverse function.

No violation of coupling here.

urls.reverse

It seems a bit odd, as this is in a module specifically to do with URLs, but this is another part where Django is too tightly coupled to the URL routing choice.

Here, the library directly inspects parts of the root URL resolver, and nested ones as well. Could this be made better? I believe so, by actually using the reverse method. It also uses the protected method _reverse_with_prefix here.

If you look closely, you will even notice that the reverse method on the RegexURLResolver is not even called.

Is this fixable? I think so. The question is whether it is fixable without too much of a performance cost. The current implementation seems optimized [3].

Yes, coupling is violated. But the affected area is quite small, so it is probably fixable.

views.generic.base

In the RedirectView, the pattern shows up again, but quite innocently, in get_redirect_url. It tries to reverse a pattern using given args and kwargs.

No violation of coupling here.

Does Django depend on the routing implementation?

Conclusion: not really. Besides the checks framework (which I would advice fixing!), nothing is wrong here.

So how do the routers use Django?

To be fair: we should not really care. But still a small summary.

It uses the django.utils extensively.
It subclasses django.http.Http404 under the name Resolver404.
It uses the django.conf variable settings, for the LANGUAGE_CODE and the ROOT_URLCONF attributes.
It uses the ImproperlyConfigured exception.
It uses the ViewDoesNotExist exception.
It checks the __name__ attribute to distinguish between class-based and function-based views (but only for some simple string formatting).

All in all, it does not really depend on the internal of the views given, which is a good thing.

In my opinion: no violation of coupling here.

Have we researched enough for the conclusion?

No [4].

But why would the answer be no? We have seen that Django does not really depend that much on the specifics of the routers. We have also seen that the routers only depend on really generic parts of Django which I would not dare to call coupling. Have we missed something?

Yes, we missed something! We missed the interaction between different routers. Let that sink in for a while. Routers are coupled to routers. Is that really a problem? Yes [5]! Why? Because it prevents you from using URL routing libraries which are not based on regular expressions. It prevents you from using URL routing libraries which do not expose the same private interface.

What change do I suggest?

This is the hard question I'm asking. Mostly, I'm asking it myself. Should Django change? Is this merely bike-shedding, or not? Am I just trying to impose purity at the cost of practicality? Would this result in code churn with little to no benefit?

These are my recommendations:

Move configuration checking responsibility to URL class

Move the specific checks to a .check() method on the RegexURLResolver and RegexURLPattern. I think this is a reasonable move which should not take that much effort.

Make URL routing class solely responsible for reversing

Because only the URL routing class should know about its implementation details. For starters, I'd suggest moving the global reverse to a more localized one: RegexURLResolver.reverse. It already has an implementation, but the complex logic that is in the global reverse should be there instead.

Later on, when that is achieved, experiments can be made with different algorithms for reversing which split responsibilities. For the current RegexURLResolver, I'd suggest something like this:

# Pseudocode
def reverse(self, lookup_view, *args, **kwargs):
    for sub_resolver in self.url_patterns:
        result, args_, kwargs_ = sub_resolver.reverse(lookup_view, *args, **kwargs)
        # args_ and kwargs_ are the pieces of args/kwargs that could not be
        # used by the ``sub_resolver``.
        if result is not None:
            # Interpolate unused args/kwargs back into pattern.
            prefix = regex_unmatch(self.regex.pattern, args_, kwargs_)
            return prefix + result
    return None

Of course some form of caching could (and maybe should) be added, and it will most likely take a performance hit until somebody starts to optimize it again, but that does not mean this approach should not be taken.

[1]	The calculator on my phone should not have to know the contacts or be able to read my messages.

[2]	Of course ignoring the fact that some simple alterations might happen to the arguments, for instance what used to be '00012' might become '12'.

[3]	That is to say, it looks like a lot of code. But I might also be mistaking obfuscated code for optimized code.

[4]	See Betteridge's law of headlines

[5]	I'm not breaking Betteridge's law here, because this is not a headline.

sjoerdjob.com