So, last week was entirely about optimization. The website I'm working on had serious performance issues, and it needed to be fast to qualify as working.
The problem
The problem is simple - we have a back-end database that supplies data about cloud computing jobs, and we need to display these in a table, allowing the user to view and control the jobs.
The data is refreshed every 30 seconds and each job element has about nine pieces of information. Four of them are shown in a table row, and the rest appear in an expandable row when the main row is clicked.
The initial implementation had the server generate the whole HTML; we simply fetched that via AJAX and assigned it using innerHTML.
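In rough outline, the refresh looked something like this (the URL and element id here are made up for illustration):

    // Fetch the fully rendered table from the server and drop it into the page
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/jobs/table/');            // illustrative endpoint
    xhr.onload = function() {
        document.getElementById('job-table').innerHTML = xhr.responseText;
    };
    xhr.send();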
This was unacceptable - despite gzip encoding, the amount of data per request was huge. For my test case, with ~900 jobs, the compacted HTML is about 900 KB, which gzips to about 40 KB. Each table row has a total of 32 HTML tags, so we have about 30,000 styled HTML elements.
This may not seem like much - you can easily generate and manipulate millions of rows in an HTML table - but when you have nested tags, CSS, attributes, and event handlers, it starts to bog down.
With Chrome, the innerHTML approach was passable, but with IE 11 it was simply too much for the browser to handle - a factor of 6x to 8x slower.
Any action by the user that affected the jobs caused a refresh, which made for a very laggy UI.
Moreover, the Django backend also spent too much time generating the HTML - up to 0.6 seconds just for this. Inconceivable!
The table rows needed to have filtering - only certain rows would be visible, based on the selected tab in a tab control. The user needed to be able to select any number of jobs to perform job actions on them. Both selection and tab switching were too slow.
Something had to be done! After much profiling and reading numerous blogs about JavaScript performance, I managed to speed up various things by a factor of anywhere from 2x to 100x.
Lesson 1 - Templating on client side
The first approach was to scrap the generation of HTML on the server and send the job data as JSON to the client, to be rendered there instead.
Now the JS would render the HTML for each job, using a very basic templating function. Using innerHTML to construct the table was still unavoidable.
The server-side time was reduced, and the simple JavaScript templater I wrote was much faster, with negligible time needed to substitute values into the HTML template. It was a simple search-and-replace function:
    function template(dict, str) {
        for (var key in dict)
            str = str.replaceAll('<<' + key + '>>', dict[key]);
        return str;
    }
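Used roughly like this (the row markup here is just an illustration):

    // Illustrative row template with <<field>> placeholders
    var rowTemplate = '<tr><td class="job-detail"><<job_name>></td>' +
                      '<td class="text-center"><<job_time>></td></tr>';
    var rowHtml = template({ job_name: 'abcd_xyz8765', job_time: '10:24:05' }, rowTemplate);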
The reason for Django's slowness seems to be that its template engine is not so much a glorified printf() as a fairly complicated parser/interpreter (given that you can embed code inside a Django template).
Overkill for a printf()-like requirement.
Still, this was not ideal - every refresh still led to rebuilding the HTML table even if nothing changed from the previous refresh.
Lesson 2 - Use incremental updates
Obviously, some kind of incremental update was needed. Ideally, only a delta would be fetched from the server, but there were practical difficulties implementing this server-side.
The time taken by the server to send off the entire job data was negligible, as was the time taken by the browser to parse the JSON, so there was not much to be gained by trying to d/dt the data on the server side.
Instead, we could do the diff on the client side. This was fairly simple and fast - given the latest job data and the previous one, we can generate the diff as additions, deletions, and changes (pseudocode below):
    for (i in old) if (not i in new) removed.add(i)
    for (i in new) if (not i in old) added.add(i)
    for (i in new) if ((i in old) and old[i] != new[i]) changed.add(i)
The data is in the form of a dictionary keyed by job number, and each value is in turn a dictionary. (Yes, we know JS doesn't really have dictionaries, only objects!)
The data looks like this :
    {
        1234: { job_name: "abcd_xyz8765", job_time: "10:24:05", ... },
        1230: { job_name: "abcd_xyz9645", job_time: "01:04:39", ... },
        ...
    }
The best way to compare two objects in JS seems to be to JSON.stringify() them and do a string compare. Checking each field individually doesn't seem to be any faster, and may even be slower, considering that each member access will cause a dictionary lookup.
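Putting the delta pseudocode and the stringify comparison together, a minimal sketch of the client-side diff might look like this (the names are made up):

    // Diff two job dictionaries (objects keyed by job index) into added/removed/changed index lists
    function diffJobs(oldJobs, newJobs) {
        var added = [], removed = [], changed = [];
        for (var id in oldJobs)
            if (!(id in newJobs)) removed.push(id);
        for (var id in newJobs) {
            if (!(id in oldJobs)) added.push(id);
            else if (JSON.stringify(oldJobs[id]) !== JSON.stringify(newJobs[id])) changed.push(id);
        }
        return { added: added, removed: removed, changed: changed };
    }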
Lesson 3 - Know your data
There are 5 scenarios that happen :
- Initial page load, any number of rows get added - this happens only at login, but makes the most psychological impact in terms of how snappy the site seems.
- Periodic refresh, a few rows change state - but if the UI is busy during the update, it feels sluggish.
- User adds a job, one row gets added.
- User terminates one or more jobs, any number of rows change state.
- User removes one or more jobs.
The add case is the most important even though it happens only once per session.
The table is always sorted based on a job index, in descending order, so it becomes necessary to insert any new rows into the right locations.
The following simple approach acts like an insertion sort, which is extremely slow.
    for (job in newjob) {
        for (row in rows) {
            if (row.jobindex < job.index) {
                insertBefore(row, makerow(job));
                break;
            }
        }
    }
For every new row, its insertion location needs to be found in the table. Sure, we could conceivably turn the above into a binary search - not impossible, but still a mass of code.
Even then, we have to insert elements in the middle of a table, which is slow.
After much head scratching, I realized I don't even need to do this - the job indices on the server are unique and always increasing, and the data is always sorted (because it ends up in a JSON dictionary sorted by job index).
If we just ensure that the job data is in the same sort order as the table, we can avoid all the above slowness/complexity.
So for case 1, just appending each row is fine.
For case 3, a new job will always end up right at the top of the table, since a new job will always have an index greater than any job previously seen.
So we can write something like :
    if (table.isempty()) {      // case 1
        for (job in newjob) {
            row = makerow(job);
            table.append(row);
        }
    }
    else {                      // case 3
        for (job in newjob) {
            row = makerow(job);
            table.prepend(row);
        }
    }
A huge amount of unnecessary work in finding where a job fits in the table is avoided, and appending elements to the DOM, rather than inserting them, is much faster too.
This was good for a speedup of more than 2x for case 1.
Case 3 conceivably never requires more than one row insertion - jobs are started one at a time, so the time is negligible.
Deletion was also initially very slow, because the initial code was something like this:
    for (job in deljobs) {
        row = findRowWithJobIndex(job.index);
        row.remove();
    }
findRowWithJobIndex() used jQuery with the find('[key=value]') selector, which is terribly slow, since it has to do a linear search.

Once again, knowing that the deljobs array has descending job indices, and that the table rows do too, we end up with a much faster delete:
    i = 0;
    for (row in rows) {
        if (i < deljobs.length) {
            if (row.jobindex == deljobs[i]) {
                row.remove();
                i++;
            }
        }
    }
We are essentially doing only one linear pass on the rows and picking off each one that needs to be removed.
This gave more than a 5x speedup when deleting all the rows, and close to 10x on IE 11.
Simple, fast code - the result of using knowledge about the data rather than a catch-all generic algorithm.
Lesson 4 - Profile
You can only get so far with logic optimization; eventually you have to profile. IE 11 has a very nice profiler, as does Chrome.
I believed the JSON parsing, the delta generation code, the HTML template rendering, and so on would be very slow, but all of that took very little time!
After profiling, the major performance bottlenecks were (surprisingly) :
- DOM node creation
- jQuery in loops
- Adding, removing and testing for class names in DOM elements
The first bottleneck was DOM node creation. The alternative to building HTML strings and assigning innerHTML is to create the DOM elements directly, with code like this:
    var tr = document.createElement('tr');
    var td1 = document.createElement('td');
    td1.setAttribute('class', 'job-detail');
    var td1a = document.createElement('a');
    td1a.setAttribute('class', 'job-link');
    td1a.setAttribute('href', '#');
    var td2 = document.createElement('td');
    td2.setAttribute('class', 'text-center');
    var td3 = document.createElement('td');
    td3.setAttribute('class', 'text-center');
    var td4 = document.createElement('td');
    td4.setAttribute('class', 'text-center');
    tr.appendChild(td1);
    tr.appendChild(td2);
    tr.appendChild(td3);
    tr.appendChild(td4);
    td1.appendChild(td1a);
However, this is just as slow as innerHTML, even though a number of synthetic benchmarks on the web show otherwise. The profiler does not lie.
The answer is that we need to create this node just once, and clone copies of it with cloneNode() when we need more.
In fact, doing that means we can keep the template row in the easily tweakable HTML, instead of in the opaque JS code above.
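A minimal sketch of that idea - the hidden template row lives in the page markup, and makerow() just clones and fills it (the #row-template id, the hidden class, and the data-jobindex attribute are assumptions for illustration):

    // Hidden template row kept in the HTML, e.g. <tr id="row-template" class="hidden">...</tr>
    var templateRow = document.getElementById('row-template');

    function makerow(job) {
        var row = templateRow.cloneNode(true);         // deep clone of the whole <tr>
        row.removeAttribute('id');
        row.classList.remove('hidden');
        row.setAttribute('data-jobindex', job.index);  // stamp the job index on the row
        row.cells[0].textContent = job.job_name;       // fill in the visible columns
        row.cells[1].textContent = job.job_time;
        return row;
    }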
Using this approach improves node creation performance dramatically, leaving appendChild() as the main laggard.
There are hordes of JS performance blogs (including one by John Resig) which hint that a great speedup can be had by using createDocumentFragment(), appending many nodes to that, and then appending the fragment to the main DOM.
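For reference, the suggested pattern looks roughly like this (tbody here stands for the table body element; makerow() is from the sketch above):

    // Build all the new rows off-DOM in a fragment, then attach them with a single appendChild()
    var fragment = document.createDocumentFragment();
    for (var id in newJobs) {
        fragment.appendChild(makerow(newJobs[id]));
    }
    tbody.appendChild(fragment);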
However, on both Chrome and IE 11, this approach performed worse. Browsers have evolved a lot since the days of IE 8, and we're not in Kansas anymore.
The profiler is the ultimate authority, despite any number of 3rd party benchmarks showing the contrary.
After a lot of experimentation, I found that there is probably no straightforward way to speed up appendChild().
Now it's all the browser's fault!
The second bottleneck is jQuery.
jQuery's $(elem) function does a huge amount of work. Whenever you need to use it more than once, it makes sense to use the raw DOM methods instead, even if it involves more verbose code. jQuery makes for compact, neat code, but the performance is dismal.

$('.someclass') can be terribly slow - it is much faster to iterate over all the elements and check for the class yourself.
The reasoning is that jQuery (or the corresponding DOM method) is going to iterate over some data structure anyway, and do some work with each element so it can give it back to you.
I am not quite sure, but it seems like this is linear in the number of elements - it could conceivably be a log N operation, but who knows?
Instead, you can iterate over the elements yourself using elem.children[] and do the work in one pass when the class name matches. Even more so if you need to test for things other than the class name, like attribute values.

$(elem).find('[key=value]') should be avoided like the plague if elem has hundreds of child elements.
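A rough sketch of that one-pass loop (tbody is the table body element, the data-jobindex attribute carries over from the earlier makerow() sketch, and jobIndex is the index we are looking for):

    // One linear pass over the child rows, instead of $(tbody).find('[data-jobindex=...]')
    var rows = tbody.children;
    var target = String(jobIndex);
    for (var i = 0; i < rows.length; i++) {
        if (rows[i].getAttribute('data-jobindex') === target) {
            rows[i].classList.add('selected');   // do whatever per-row work is needed here
            break;
        }
    }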
The ideal framework for performance is Vanilla.js, wherever possible.
The third bottleneck is dealing with DOM element class names.

- $(elem).addClass('someclass') is slow
- elem.className += ' someclass' is not much faster
- elem.classList.add('someclass') is the fastest method
- elem.classList[n] != null is the best way to test for a class name, if the position of that class name is fixed
- elem.classList.contains('classname') is the next best
These were used in the table row selection code, and this, along with some other tweaks mentioned below, took the time to select 900 rows on IE 11 from 14 seconds down to 0.12 seconds - a 100x speedup!
On Chrome the gain was not as dramatic, and the times dropped to below what can be measured accurately - roughly from 120 ms to about 5 ms.
Lesson 5 - Old tricks always work
There are a few simple habits that you develop when writing high-performance code in static languages; these spill over to dynamic languages, with even greater effect.
If you use an expression more than once: make it a variable.
For example, when you see:
    if (!row[i].classList.contains('class1')) {
        row[i].classList.add('class1');
    }
You should realize that every [] and every . is a potential bottleneck, since each one is a lookup at runtime. The following:
    var cls = row[i].classList;
    if (!cls.contains('class1')) {
        cls.add('class1');
    }
will be faster
Arrays are fast: when in doubt, use an array.
Cache it if you can: if you have something generated by a large chunk of code, hang on to it for later - it's easier to know whether an answer has changed than to recalculate it. This applies especially to dynamic languages, since they are bound by CPU speed rather than memory access speed.
Pull invariant code out of loops: the compiler does this for you in static languages; in dynamic languages you need to do it yourself (see the sketch at the end of this lesson).
Avoid dynamic allocation: it's hard to tell in a dynamic language when allocation doesn't happen, but be aware of when it can - reuse objects instead of making new ones. Allocation may be very fast, but construction can be very heavy in a language like JS - this is why jQuery's $(elem) is so bad.
Optimize the worst case: if the worst case is frequent enough to be noticed, optimize that at the expense of the average or best case. Mike Abrash said the same.
Make it look fast, even if it is incurably slow: this is psychology - a console tool that shows progress with a fast \|/-\ spinner appears way faster than something that shows a 0 to 100 progress indicator. Microsoft knows this, which is why progress bars animate as well as move forward.
Optimize multiple conditions: when testing for various things with && and || in if() statements, try to order the conditions in a way that maximizes the short-circuiting of the operator.
For example when you have:
    if (leastLikely || lessLikely || likely) { ... }
    if (!likely || !lessLikely || !leastLikely) { ... }
The conditions are evaluated the maximum number of times. Instead, the following:
    if (likely || lessLikely || leastLikely) { ... }
    if (!leastLikely || !lessLikely || !likely) { ... }
will be faster, especially if the three conditions are expressions rather than values, and in any case, due to branch prediction.
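As a concrete illustration of the caching and loop-invariant points above, a minimal sketch (currentTab, refreshTime, and formatDate() are made-up names):

    // Slow: rows.length, the tab check, and the date formatting are re-evaluated on every iteration
    for (var i = 0; i < table.rows.length; i++) {
        if (currentTab === 'running')
            table.rows[i].cells[1].textContent = formatDate(refreshTime);
    }

    // Faster: hoist everything that does not change inside the loop
    var rows = table.rows;
    if (currentTab === 'running') {
        var label = formatDate(refreshTime);      // computed once, reused for every row
        for (var i = 0, n = rows.length; i < n; i++)
            rows[i].cells[1].textContent = label;
    }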
All these little tweaks shave off a few dozen milliseconds here and there and it adds up. The trick is to make this simple stuff a reflex.
Conclusion
It seems like the effects of basic optimization are amplified highly when using dynamic languages. You can get away with a lot of inefficiency in static languages - the compiler works really hard to offset your stupidity, but in dynamic languages, it gets thrown back in your face.