So, last week was entirely about optimization. The website I'm working on had serious performance issues, and it needed to be fast to qualify as working.
The problem
The problem is simple - we have a back-end database that supplies data about cloud computing jobs, and we need to display these in a table, allowing the user to view and control the jobs.
The data is refreshed every 30 seconds and each job element has about nine pieces of information. Four of them are shown in a table row, and the rest appear in an expandable row when the main row is clicked.
The initial implementation had the server generate the whole HTML; we simply fetched that via AJAX and assigned it using innerHTML.
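In rough outline, the refresh looked something like this (the URL and element id here are made up for illustration):

    // Fetch the fully rendered table from the server and drop it into the page
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/jobs/table/');            // illustrative endpoint
    xhr.onload = function() {
        document.getElementById('job-table').innerHTML = xhr.responseText;
    };
    xhr.send();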
This was unacceptable - despite gzip encoding, the amount of data per request was huge. For my test case, with ~900 jobs, the compacted HTML is about 900 KB, which gzips to about 40 KB. Each table row has a total of 32 HTML tags, so we have about 30,000 styled HTML elements.
This may not seem like much - you can easily generate and manipulate millions of rows in an HTML table - but when you have nested tags, CSS, attributes, and event handlers, it starts to bog down.
With Chrome, the innerHTML approach was passable, but with IE 11 it was simply too much for the browser to handle - a factor of 6x to 8x slower.
Any action by the user that affected the jobs caused a refresh, which made for a very laggy UI.
Moreover, the Django backend also spent too much time generating the HTML - up to 0.6 seconds just for this. Inconceivable!
The table rows needed to have filtering - only certain rows would be visible, based on the selected tab in a tab control. The user needed to be able to select any number of jobs to perform job actions on them. Both selection and tab switching were too slow.
Something had to be done! After much profiling and reading numerous blogs about JavaScript performance, I managed to speed up various things by a factor of anywhere from 2x to 100x.
Lesson 1 - Templating on client side
The first approach was to scrap the generation of HTML on the server and send the job data as JSON to the client, to be rendered there instead.
Now the JS would render the HTML for each job, using a very basic templating function. Using innerHTML to construct the table was still unavoidable.
The server-side time was reduced, and the simple JavaScript templater I wrote was much faster, with negligible time needed to substitute values into the HTML template. It was a simple search-and-replace function:
    function template(dict, str) {
        for (var key in dict)
            str = str.replaceAll('<<' + key + '>>', dict[key]);
        return str;
    }
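Used roughly like this (the row markup here is just an illustration):

    // Illustrative row template with <<field>> placeholders
    var rowTemplate = '<tr><td class="job-detail"><<job_name>></td>' +
                      '<td class="text-center"><<job_time>></td></tr>';
    var rowHtml = template({ job_name: 'abcd_xyz8765', job_time: '10:24:05' }, rowTemplate);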
The reason for Django's slowness seems to be that its template engine is not so much a glorified printf() as a fairly complicated parser/interpreter (given that you can embed code inside a Django template).
Overkill for a printf()-like requirement.
Still, this was not ideal - every refresh still led to rebuilding the HTML table even if nothing changed from the previous refresh.
Lesson 2 - Use incremental updates
Obviously, some kind of incremental update was needed. Ideally, only a delta would be fetched from the server, but there were practical difficulties implementing this server-side.
The time taken by the server to send off the entire job data was negligible, as was the time taken by the browser to parse the JSON, so there was not much to be gained by trying to d/dt the data on the server side.
Instead, we could do the diff on the client side. This was fairly simple and fast - given the latest job data and the previous one, we can generate the diff as additions, deletions, and changes (pseudocode below):
    for (i in old) if (not i in new) removed.add(i)
    for (i in new) if (not i in old) added.add(i)
    for (i in new) if ((i in old) and old[i] != new[i]) changed.add(i)
The data is in the form of a dictionary keyed by job number, and each value is in turn a dictionary. (Yes, we know JS doesn't really have dictionaries, only objects!)
The data looks like this :
    {
        1234: { job_name: "abcd_xyz8765", job_time: "10:24:05", ... },
        1230: { job_name: "abcd_xyz9645", job_time: "01:04:39", ... },
        ...
    }
The best way to compare two objects in JS seems to be to JSON.stringify() them and do a string compare. Checking each field individually doesn't seem to be any faster, and may even be slower, considering that each member access will cause a dictionary lookup.
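Putting the delta pseudocode and the stringify comparison together, a minimal sketch of the client-side diff might look like this (the names are made up):

    // Diff two job dictionaries (objects keyed by job index) into added/removed/changed index lists
    function diffJobs(oldJobs, newJobs) {
        var added = [], removed = [], changed = [];
        for (var id in oldJobs)
            if (!(id in newJobs)) removed.push(id);
        for (var id in newJobs) {
            if (!(id in oldJobs)) added.push(id);
            else if (JSON.stringify(oldJobs[id]) !== JSON.stringify(newJobs[id])) changed.push(id);
        }
        return { added: added, removed: removed, changed: changed };
    }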
Lesson 3 - Know your data
There are 5 scenarios that happen :
- Initial page load, any number of rows get added - this happens only at login, but makes the most psychological impact in terms of how snappy the site seems.
- Periodic refresh, a few rows change state - but if the UI is busy during the update, it feels sluggish.
- User adds a job, one row gets added.
- User terminates one or more jobs, any number of rows change state.
- User removes one or more jobs.
The add case is the most important even though it happens only once per session.
The table is always sorted based on a job index, in descending order, so it becomes necessary to insert any new rows into the right locations.
The following simple approach acts like an insertion sort, which is extremely slow.
    for (job in newjob) {
        for (row in rows) {
            if (row.jobindex < job.index) {
                insertBefore(row, makerow(job));
                break;
            }
        }
    }
For every new row, its insertion location needs to be found in the table. Sure, we could conceivably turn the above into a binary search - not impossible, but still a mass of code.
Even then, we have to insert elements in the middle of a table, which is slow.
After much head scratching, I realized I don't even need to do this - the job indices on the server are unique and always increasing, and the data is always sorted (because it ends up in a JSON dictionary sorted by job index).
If we just ensure that the job data is in the same sort order as the table, we can avoid all the above slowness/complexity.
So for case 1, just appending each row is fine.
For case 3, a new job will always end up right at the top of the table, since a new job will always have an index greater than any job previously seen.
So we can write something like :
    if (table.isempty()) {      // case 1
        for (job in newjob) {
            row = makerow(job);
            table.append(row);
        }
    }
    else {                      // case 3
        for (job in newjob) {
            row = makerow(job);
            table.prepend(row);
        }
    }
A huge amount of unnecessary work in finding where a job fits in the table is avoided, and appending elements to the DOM, rather than inserting them, is much faster too.
This was good for a speedup of more than 2x for case 1.
Case 3 conceivably never requires more than one row insertion - jobs are started one at a time, so the time is negligible.
Deletion was also initially very slow, because the initial code was something like this:
    for (job in deljobs) {
        row = findRowWithJobIndex(job.index);
        row.remove();
    }
findRowWithJobIndex() used jQuery with the find('[key=value]') selector, which is terribly slow, since it has to do a linear search.

Once again, knowing that the deljobs array has descending job indices, and that the table rows do too, we end up with a much faster delete:
    i = 0;
    for (row in rows) {
        if (i < deljobs.length) {
            if (row.jobindex == deljobs[i]) {
                row.remove();
                i++;
            }
        }
    }
We are essentially doing only one linear pass on the rows and picking off each one that needs to be removed.
This gave more than a 5x speedup when deleting all the rows, and close to 10x on IE 11.
Simple, fast code - the result of using knowledge about the data rather than a catch-all generic algorithm.
Lesson 4 - Profile
You can only get so far with logic optimization; eventually you have to profile. IE 11 has a very nice profiler, as does Chrome.
I believed the JSON parsing, the delta generation code, the HTML template rendering, and so on would be very slow, but all of that took very little time!
After profiling, the major performance bottlenecks were (surprisingly) :
- DOM node creation
- jQuery in loops
- Adding, removing and testing for class names in DOM elements
The first bottleneck was DOM node creation. The alternative to building HTML strings and assigning innerHTML is to create the DOM elements directly, with code like this:
    var tr = document.createElement('tr');
    var td1 = document.createElement('td');
    td1.setAttribute('class', 'job-detail');
    var td1a = document.createElement('a');
    td1a.setAttribute('class', 'job-link');
    td1a.setAttribute('href', '#');
    var td2 = document.createElement('td');
    td2.setAttribute('class', 'text-center');
    var td3 = document.createElement('td');
    td3.setAttribute('class', 'text-center');
    var td4 = document.createElement('td');
    td4.setAttribute('class', 'text-center');
    tr.appendChild(td1);
    tr.appendChild(td2);
    tr.appendChild(td3);
    tr.appendChild(td4);
    td1.appendChild(td1a);
However, this is just as slow as innerHTML, even though a number of synthetic benchmarks on the web show otherwise. The profiler does not lie.
The answer is that we need to create this node just once, and clone copies of it with cloneNode() when we need more.
In fact, doing that means we can keep the template row in the easily tweakable HTML, instead of in the opaque JS code above.
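A minimal sketch of that idea - the hidden template row lives in the page markup, and makerow() just clones and fills it (the #row-template id, the hidden class, and the data-jobindex attribute are assumptions for illustration):

    // Hidden template row kept in the HTML, e.g. <tr id="row-template" class="hidden">...</tr>
    var templateRow = document.getElementById('row-template');

    function makerow(job) {
        var row = templateRow.cloneNode(true);         // deep clone of the whole <tr>
        row.removeAttribute('id');
        row.classList.remove('hidden');
        row.setAttribute('data-jobindex', job.index);  // stamp the job index on the row
        row.cells[0].textContent = job.job_name;       // fill in the visible columns
        row.cells[1].textContent = job.job_time;
        return row;
    }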
Using this approach improves node creation performance dramatically, leaving appendChild() as the main laggard.
There are hordes of JS performance blogs (including one by John Resig) which hint that a great speedup can be had by using createDocumentFragment(), appending many nodes to that, and then appending the fragment to the main DOM.
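For reference, the suggested pattern looks roughly like this (tbody here stands for the table body element; makerow() is from the sketch above):

    // Build all the new rows off-DOM in a fragment, then attach them with a single appendChild()
    var fragment = document.createDocumentFragment();
    for (var id in newJobs) {
        fragment.appendChild(makerow(newJobs[id]));
    }
    tbody.appendChild(fragment);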
However, on both Chrome and IE 11, this approach performed worse. Browsers have evolved a lot since the days of IE 8, and we're not in Kansas anymore.
The profiler is the ultimate authority, despite any number of 3rd party benchmarks showing the contrary.
After a lot of experimentation, I found that there is probably no straightforward way to speed up appendChild().
Now it's all the browser's fault!
The second bottleneck is jQuery.
jQuery's $(elem) function does a huge amount of work. Whenever you need to use it more than once, it makes sense to use the raw DOM methods instead, even if it involves more verbose code. jQuery makes for compact, neat code, but the performance is dismal.

$('.someclass') can be terribly slow - it is much faster to iterate over all the elements and check for the class yourself.
The reasoning is that jQuery (or the corresponding DOM method) is going to iterate over some data structure anyway, and do some work with each element so it can give it back to you.
I am not quite sure, but it seems like this is linear in the number of elements - it could conceivably be a log N operation, but who knows?
Instead, you can iterate over the elements yourself using elem.children[] and do the work in one pass when the class name matches. Even more so if you need to test for things other than the class name, like attribute values.

$(elem).find('[key=value]') should be avoided like the plague if elem has hundreds of child elements.
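A rough sketch of that one-pass loop (tbody is the table body element, the data-jobindex attribute carries over from the earlier makerow() sketch, and jobIndex is the index we are looking for):

    // One linear pass over the child rows, instead of $(tbody).find('[data-jobindex=...]')
    var rows = tbody.children;
    var target = String(jobIndex);
    for (var i = 0; i < rows.length; i++) {
        if (rows[i].getAttribute('data-jobindex') === target) {
            rows[i].classList.add('selected');   // do whatever per-row work is needed here
            break;
        }
    }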
The ideal framework for performance is Vanilla.js, wherever possible.
The third bottleneck is dealing with DOM element class names.

- $(elem).addClass('someclass') is slow
- elem.className += ' someclass' is not much faster
- elem.classList.add('someclass') is the fastest method
- elem.classList[n] != null is the best way to test for a class name, if the position of that class name is fixed
- elem.classList.contains('classname') is the next best
These were used in the table row selection code, and this, along with some other tweaks mentioned below, took the time to select 900 rows on IE 11 from 14 seconds down to 0.12 seconds - a 100x speedup!
On Chrome the gain was not as dramatic, and the times dropped to below what can be measured accurately - roughly from 120 ms to about 5 ms.
Lesson 5 - Old tricks always work
There are a few simple habits that you develop when writing high-performance code in static languages; these spill over to dynamic languages, with even greater effect.
If you use an expression more than once: make it a variable.
For example, when you see:
    if (!row[i].classList.contains('class1')) {
        row[i].classList.add('class1');
    }
You should realize that every [] and every . is a potential bottleneck, since each one is a lookup at runtime. The following:
    var cls = row[i].classList;
    if (!cls.contains('class1')) {
        cls.add('class1');
    }
will be faster
Arrays are fast: when in doubt, use an array.
Cache it if you can: if you have something generated by a large chunk of code, hang on to it for later - it's easier to know whether an answer has changed than to recalculate it. This applies especially to dynamic languages, since they are bound by CPU speed rather than memory access speed.
Pull invariant code out of loops: the compiler does this for you in static languages; in dynamic languages you need to do it yourself (see the sketch at the end of this lesson).
Avoid dynamic allocation: it's hard to tell in a dynamic language when allocation doesn't happen, but be aware of when it can - reuse objects instead of making new ones. Allocation may be very fast, but construction can be very heavy in a language like JS - this is why jQuery's $(elem) is so bad.
Optimize the worst case: if the worst case is frequent enough to be noticed, optimize that at the expense of the average or best case. Mike Abrash said the same.
Make it look fast, even if it is incurably slow: this is psychology - a console tool that shows progress with a fast \|/-\ spinner appears way faster than something that shows a 0 to 100 progress indicator. Microsoft knows this, which is why progress bars animate as well as move forward.
Optimize multiple conditions: when testing for various things with && and || in if() statements, try to order the conditions in a way that maximizes the short-circuiting of the operator.
For example when you have:
    if (leastLikely || lessLikely || likely) { ... }
    if (!likely || !lessLikely || !leastLikely) { ... }
The conditions are evaluated the maximum number of times. Instead, the following:
    if (likely || lessLikely || leastLikely) { ... }
    if (!leastLikely || !lessLikely || !likely) { ... }
will be faster, especially if the three conditions are expressions rather than values, and in any case, due to branch prediction.
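As a concrete illustration of the caching and loop-invariant points above, a minimal sketch (currentTab, refreshTime, and formatDate() are made-up names):

    // Slow: rows.length, the tab check, and the date formatting are re-evaluated on every iteration
    for (var i = 0; i < table.rows.length; i++) {
        if (currentTab === 'running')
            table.rows[i].cells[1].textContent = formatDate(refreshTime);
    }

    // Faster: hoist everything that does not change inside the loop
    var rows = table.rows;
    if (currentTab === 'running') {
        var label = formatDate(refreshTime);      // computed once, reused for every row
        for (var i = 0, n = rows.length; i < n; i++)
            rows[i].cells[1].textContent = label;
    }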
All these little tweaks shave off a few dozen milliseconds here and there and it adds up. The trick is to make this simple stuff a reflex.
Conclusion
It seems like the effects of basic optimization are amplified highly when using dynamic languages. You can get away with a lot of inefficiency in static languages - the compiler works really hard to offset your stupidity, but in dynamic languages, it gets thrown back in your face.