Today I wrote a little Django to Jinja2 template converter. While it can translate most of the builtin template tags into Jinja constructs, it doesn't fully automate the process: you have to extend it for your own custom tags and it doesn't adapt your templates to the changed semantics. And these differences in semantics (and the underlying architecture) are what I want to discuss a bit here. Whenever someone mentions Jinja in the Django IRC channel you can be pretty sure that someone else will write something like "... if you don't have your logic under control" into the channel and position Jinja in the corner where failed concepts lurk around. Of course Jinja leaves more room for abuse than Django does… but that's not actually what I want to talk about here :) First of all a small disclaimer: this article covers Jinja 2.0 and Django 1.0.

Lexing

If you compare the internals of the Jinja and Django template systems, you'll find a lexer in both of them. The lexer breaks the template into small pieces for easier processing. But that's where the similarities end, because the two lexers operate on very different levels. Take the following template as a simple example:
Hello {{ name|upper }}!
This is one of those templates that look and work exactly the same in Jinja and Django. First, have a look at what tokens the Jinja2 tokenizer yields:
>>> from jinja2 import Environment
>>> for token in Environment().lex("Hello {{ name|upper }}!"):
...  print token
... 
(1, 'data', u'Hello ')
(1, 'variable_begin', u'{{')
(1, 'whitespace', u' ')
(1, 'name', u'name')
(1, 'operator', u'|')
(1, 'name', u'upper')
(1, 'whitespace', u' ')
(1, 'variable_end', u'}}')
(1, 'data', u'!')
And here is what Django outputs:
>>> from django.template import Lexer, StringOrigin
>>> origin = StringOrigin("Hello {{ name|upper }}!")
>>> for token in Lexer(origin.source, origin).tokenize():
...  print token
... 
<Text token: "Hello ...">
<Var token: "name|upper...">
<Text token: "!...">
So as you can see: whereas Jinja breaks the input string into very small pieces, Django only distinguishes between four kinds of tokens: text, variables, blocks and line comments. While this is a lot easier to implement for the developer of the template engine, it doesn't have any advantage over the approach Jinja has taken. It actually has a lot of negative side effects. For example it's impossible to write {{ '{% a block in a variable %}' }} in Django. (I know you can use templatetag openblock and templatetag closeblock, but beautiful is something else.) It also has the huge disadvantage that every tag has to split up the contents of the tag itself, which often leads to different semantics and syntactic specialities from tag to tag, and which means a lot more work for the developer of such a tag. The former is probably the worse part of it. For example the url tag in Django takes arguments separated by commas (that are not even allowed to be followed by whitespace) but cycle expects arguments to be separated by whitespace. The root of the problem is definitely the weak lexer of the Django template engine and I really think it should be replaced by something that yields proper tokens. That would simplify things a lot for tag developers and also lead to a more intuitive experience for template designers, who could expect the same basic syntax rules everywhere.
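Just to illustrate that point, here are made-up invocations of those two bundled tags as they have to be written in Django 1.0 (the view name and arguments are invented for the example):

{% url blog.views.detail object.id,object.slug %}
{% cycle "odd" "even" %}

The first one wants its arguments separated by commas without any whitespace, the second one wants them separated by whitespace. Both of those rules live in the individual tag implementations, not in the lexer.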

Parsing

The next step is converting those tokens into meaningful elements. That's what people usually refer to as "parsing". Jinja2 has very basic grammatical rules that can be parsed with a simple LL(1) parser (I think it's LL(1), but don't ask me, I'm not a compiler guy). The parser goes through the stream of tokens coming from the lexer and converts them into logical nodes that belong together. For example if you have the template {{ 1 + 2 + 3 }} and the "cursor" of the parser is right before the first digit, the parser turns this into Add(Add(Const(1), Const(2)), Const(3)). This is useful because the developer of a custom tag doesn't have to deal with any of that: the parser already knows what an expression looks like. Now you could argue that calculations don't belong in templates and my point is not valid, but even the Django template language has expressions. The only expressions Django knows about are filter expressions. In Jinja2 the parser converts {{ var|escape|upper }} into a proper filter node for you. Django provides a TokenParser for that which can do something very similar. However that parser is not used in every tag and has its limitations too. Furthermore, that parser was introduced long after the initial implementation of the template language, which means that many core tags don't use it. Whereas in Jinja getting an expression parsed is a matter of calling parser.parse_expression(), the same requires a lot more typing and checking in Django. A lot of the tags that lurk around in various pastebins or websites don't even support filters but only plain variables in some places. Even worse, some people evaluate the part between the block braces using eval() against the context object. Again, this simple design of the parser helps nobody but the developers of the template engine. I've seen enough Django projects by now that have to write their own template tags because the core tags just don't do what they need, and in every case the process of developing the tag was more painful than it had to be. With a newly implemented lexer that yields all tokens of a block or variable one after another, a new parser could be implemented based on the design of the Jinja one. And by doing that one has the chance to support some operators. Nobody is harmed if the templating language supports {% if user.karma >= 20 and user.karma < 40 %} and that hardly counts as logic in templates.
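To make the difference concrete, here is a rough sketch of a custom tag that upper-cases a single expression, written both ways. The shout tag and every name in it are made up purely for illustration; first the Django 1.0 flavour, then the same thing as a Jinja2 extension:

# Django: the tag function gets the raw string between "{%" and "%}"
# and has to take it apart itself.
from django import template

register = template.Library()

class ShoutNode(template.Node):
    def __init__(self, expr):
        self.expr = expr

    def render(self, context):
        return unicode(self.expr.resolve(context)).upper()

def shout(parser, token):
    try:
        tag_name, arg = token.split_contents()
    except ValueError:
        raise template.TemplateSyntaxError('shout takes exactly one argument')
    # compile_filter gives us variable lookup plus filters, but only if
    # the tag author knows about it and remembers to call it.
    return ShoutNode(parser.compile_filter(arg))
register.tag('shout', shout)

# Jinja2: the extension asks the parser for a full expression node.
from jinja2 import nodes
from jinja2.ext import Extension

class ShoutExtension(Extension):
    tags = set(['shout'])

    def parse(self, parser):
        lineno = parser.stream.next().lineno
        # one call and we get a complete expression (variables, filters,
        # operators, literals) back as a node
        arg = parser.parse_expression()
        return nodes.Output([self.call_method('_shout', [arg])],
                            lineno=lineno)

    def _shout(self, value):
        return unicode(value).upper()

The Jinja version automatically accepts anything the expression grammar allows ({% shout user.username|default('world') %} for example), while the Django version supports exactly what compile_filter supports and nothing more.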

Compilation

This is the step that Django is missing. After the parser has assembled the tree of blocks, variables, text and everything else (called an abstract syntax tree), Jinja compiles that tree down into Python bytecode. It does that by first generating Python source code and passing it to the builtin compile() function to turn it into bytecode. It does not generate bytecode directly; going through source code and compile() is what keeps it working properly on the AppEngine and on different Python implementations such as Jython. The compilation of the syntax tree into bytecode is not that interesting in general. Jinja does it because it's possible and because it enables optimizations that are otherwise not possible. More about that a bit later.
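If you are curious what that generated source looks like, you can (if I remember the API correctly) ask the environment for the raw source instead of the code object:

>>> from jinja2 import Environment
>>> print Environment().compile('Hello {{ name|upper }}!', raw=True)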

Evaluation

What's more important is what Django does at template evaluation time. Django basically renders the syntax tree directly. That's pretty nice and an often-used pattern for simple languages from what I've seen so far. The problem with Django however is that it's incredibly slow and currently anything but thread-safe. Many tags in the core system modify state on the (shared) nodes during rendering. You can easily see that for yourself by using {% cycle "odd" "even" %} inside a loop that iterates over 5 items. Start up your Django server, go to that page and hit refresh over and over again. You will notice that one time the output starts with "even", another time with "odd". The reason for that is that the node tree is shared. If you start up the application on a multithreaded server and hit it with tons of ab/siege requests you will even notice that you often get lists that look like "even even even odd even odd" or something similar. And that doesn't only affect cycle, it affects block tags too. If you extend from a variable template name, block.super will probably point to a totally different template when the server is under high load. This is unacceptable behaviour and should be fixed. I'm currently putting together a patch for that, as the ticket was changed from "thread unsafety" to "reset cycle tag after iteration", which shows that at least the editor of that ticket doesn't get the problem and has been lurking around in the Django trac for too long. The evaluation of a Jinja template doesn't work on the AST but by executing the previously generated bytecode. And yes, it's thread-safe, but that's not the point.
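You don't even need a multithreaded server to see the shared state. Reusing a single Template instance from a Python shell already shows it; a made-up minimal session would look roughly like this:

>>> from django.conf import settings
>>> settings.configure()
>>> from django.template import Template, Context
>>> t = Template('{% for x in items %}{% cycle "odd" "even" %} {% endfor %}')
>>> print t.render(Context({'items': range(5)}))
odd even odd even odd
>>> print t.render(Context({'items': range(5)}))
even odd even odd even

The cycle node stores its position as instance state, so the second render of the very same (cached) template picks up where the first one left off.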

About Performance

The Django template engine has multiple problems, as said above, and one of them is certainly performance. Many people argue that the Django template engine is fast enough. Actually, it could be. But think about this for a moment: for many CRUD applications you pull stuff from the database without any joins and iterate over the result set. Now guess where (at least in Django) most of the action takes place: in the template. Even the database queries are often sent by the template engine, because querysets are lazy and the initial query is triggered from the template.

What makes this problematic is that Django's template engine is an AST evaluator. For every node you have in the template (and there are a lot of them!) there is one render method that is called. Now imagine you have extended two templates and are four blocks and two ifs deep inside your template. That's already about ten calls deep. Now try to read a profiler output. To show you what I mean I've uploaded two profiler outputs (one for Jinja and one for Django) rendering the very same template, with the difference that the Jinja version of the template uses a macro and the Django version custom template tags. Before you try to understand them, a few notes:

- test_jinja / test_django are the functions that invoke the test rendering process.
- The reason why the Jinja graph is not joined is that the invocation of the bytecode Jinja generates doesn't count as a regular call, so the profiler is unable to connect those. You have to imagine the line between render and root yourself.
- In both cases the template engine had already rendered the templates a few hundred times before the profiler profiled one single call, so the templates were already parsed (and, in Jinja's case, compiled).
- If you are wondering why the template parser seems to be active in the Django graph, I'm wondering that too. You can have a look at the benchmark to see how it works. If you think the template parser invocation in that profiler output comes from djangoext.py, you are wrong; that's what I suspected too. Turns out, even if I don't use the loader there but preload the template, it still happens. So I take it as normal behaviour caused by template inheritance or something like that.

That profiler output shows only the rendering of a pretty normal template situation. Now imagine you have a query somewhere in there because of Django's lazy querysets, and try to figure out what the heck is going on. I ran the profiler against the changeset rendering page in bitbucket and got a call tree so complex that it was impossible for me to figure out what was going on: of the 400ms for that page, 300ms were spent in the template. Granted, the template invoked Mercurial's diffing system, but that's insane. That AST evaluator is seriously killing every possibility of getting useful profiler information out of the system.
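If you want to produce a similar call graph for your own templates, a minimal profiling setup looks roughly like this (the template name and the data are made up, and the settings you actually need depend on your project):

import cProfile
import pstats

from django.conf import settings
settings.configure(TEMPLATE_DIRS=['templates'])

from django.template import Context
from django.template.loader import get_template

# parse the template up front so only rendering ends up in the profile;
# run this as a script so cProfile.run can see the module globals
tmpl = get_template('index.html')
users = [{'username': 'user%d' % i} for i in range(100)]

cProfile.run("tmpl.render(Context({'users': users}))", 'django.prof')
pstats.Stats('django.prof').sort_stats('cumulative').print_stats(25)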

Generating Python-Code Doesn't Make it Faster

Someone on #django asked why I don't contribute "the thing that makes Jinja fast" to Django. That's quite easy to answer: because it's not that simple. Jinja puts some limitations on the engine to achieve its performance. For example in Jinja the template context (the data structure you pass to the template) is a data source, not a data container. In Django a custom template tag is passed a context object it can modify, and that object holds the variables of the template. In Jinja the template context object exists too, but after the initial creation it is not modified by the engine any more. It's only used to load not-yet-known variables into the namespace Jinja actually uses for template evaluation. What this means is that it's impossible for a tag to modify the context unless the tag knows at compile time the name of the variable it wants to assign to. This knowledge gives Jinja a huge advantage over Django. Take this little template code:

{% for user in users %}
  Hello {{ user.username }}
{% endfor %}
This template code executes in both Jinja2 and Django. However the assumptions the template engine makes are vastly different. Jinja2 is able to translate the template to roughly this Python code internally (without the comments, obviously):
# these two variables (users and user) are used in the template
# without being initialized in the template.
l_users = context.resolve('users')
l_user = context.resolve('user')
for l_user in l_users:
    yield u'\n  Hello %s\n' % (
        environment.getattr(l_user, 'username'),
    )
If we wanted to transform the Django AST into Python code without changing the behaviour, we would have to do something like this:
l_users = context.resolve('users')
for user in l_users:
    context['user'] = user
    context['forloop'] = {...}   # counter, first, last and friends
    buffer.append(u'\n  Hello %s\n' % (
        environment.getattr(context.resolve('user'), 'username'),
    ))
As you can see, a 1:1 conversion to Python code of what Django templates currently do produces a lot more code. Now I can hear you arguing that the Django example does more because it puts a forloop object into the context. However it has to do that. Because the variables in Jinja are not guaranteed to show up anywhere, we have a lot of room for optimizations. If a loop doesn't use the special loop variable, Jinja won't create one. It's that simple. If you don't access loop variables that require knowledge about the length, Jinja won't convert the iterable into a list. What's a bit unfair is that the Django example has to use buffering. But because tags must have the chance to render the nodes stored inside them, buffering is necessary unless the custom tag system is changed too.

What's even worse than the list object inside this loop is context.resolve. And that's something Django does for every variable access. Imagine you are three levels deep inside your template (a with, a loop and another loop) and now you access a variable that was passed to the template. Django has to traverse the context four levels up to get to that data. That's very expensive, especially compared to what Jinja does. A local variable in Python, as used by Jinja, does not end up in a dictionary unless locals() is called or frame.f_locals is accessed. And as long as it's not in a dictionary, no hash code is calculated and no dict resizing takes place. Instead the name gets a number and a reserved slot: when the function is called, Python has already made space for that variable. These fast locals (the internal name for them) are blazingly fast compared to normal dict lookups, and even faster compared to what Django does to resolve variables, and you can't get that without generating Python code or bytecode and compiling it.
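For illustration only, here is a stripped-down version of the kind of lookup such a layered context has to do on every variable access. This is a simplification I wrote for this post, not Django's actual code:

class Context(object):
    """Very simplified layered context, one dict per scope."""

    def __init__(self, data=None):
        self.dicts = [data or {}]

    def push(self):
        # every {% for %}, {% with %} or {% block %} adds another layer
        self.dicts.insert(0, {})

    def pop(self):
        del self.dicts[0]

    def __setitem__(self, key, value):
        self.dicts[0][key] = value

    def __getitem__(self, key):
        # every variable access walks the layers from the innermost scope
        # outwards: one dict lookup (and one hash) per layer
        for layer in self.dicts:
            if key in layer:
                return layer[key]
        raise KeyError(key)

Jinja's generated code sidesteps that entire walk by binding the value to a plain local name once (the l_user from above) and letting the interpreter's fast locals do the rest.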

Synopsis

Django templates are currently slow, not thread-safe, hard to profile and unnecessarily painful to extend with custom tags, and none of that is inherent to logic-less templating.

My Pony Request

Django 1.0 is out, but that doesn't mean it's a good time to stop working on making Django better. It doesn't help to justify the template language's implementation details by saying it's fast to parse. All the sub-parsers involved make it rather slow, and once the threading problems are under control the parsed templates stay in memory until the process shuts down anyway. Improving the template engine is possible, not that hard, and will make everybody happier, and you don't have to sacrifice your logic-less templates for it. And if that's too radical, at least fix the threading problems.