Monday, May 16, 2016

Topcoder problem analysis - Birds (CC 2002 REG Semi 500 pointer)

Problem statement

https://community.topcoder.com/stat?c=problem_statement&pm=330&rd=62

This is not a very hard problem once you figure out the right algorithm. I spent a lot of time chasing a solution after having misread the problem.
I assumed that a bird is migratory if it spends 90 days or more in each of two regions within any 365-day period, rather than just within the period from the specified start date to December 31st.

Lesson #1 : Read the problem statement carefully!


Subtask - Find the number of days between two dates in a year

The naive way to do this is to sum up the days in the months before the current one and then add the day of the month.
Instead of doing this in a loop every time, we can precalculate this with partial_sum():
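
A minimal sketch, assuming the dates arrive as month and day numbers (the helper names here are just for illustration):

#include <numeric>

// Days in each month of a non-leap year, with a leading 0 entry so that
// daysBefore[m - 1] is the number of days before month m.
int monthDays[13] = { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
int daysBefore[13];

// Compute the running totals once, instead of summing in a loop every time:
// after this call, daysBefore = { 0, 31, 59, 90, 120, ... }.
void initDayTable()
{
    std::partial_sum(monthDays, monthDays + 13, daysBefore);
}

// Day number within the year for a given month/day...
int dayOfYear(int month, int day)
{
    return daysBefore[month - 1] + day;
}

// ...so the number of days between two dates is just a subtraction.
int daysBetween(int m1, int d1, int m2, int d2)
{
    return dayOfYear(m2, d2) - dayOfYear(m1, d1);
}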



The algorithm itself

My solution was as follows:
  • Put the locations in an array and sort by date.
  • Find all pairs of locations at which the bird stays that are at least 1000 km apart.
  • For every such pair of locations, say A and B, walk the array starting from A.
  • Keep walking the array as long as every location is within 1000 km of A and more than 1000 km from B. Sum up the time spent along this sequence.
  • The entries that were walked are all in the same region, and we now have the total time spent in that region.
  • Repeat the same process starting from B.
  • Now we have the total time spent in both regions; if both are at least 90 days, we have a migration!

The third step finds the longest time spent in any one region, and there is a lot of redundant computation. For example, if the locations are ABCDEFG, and G happens to be 1000 km away from the rest while all the others are close to each other, we will walk from A to G, B to G, C to G and so on - but we don't care, since all that matters is that the time spent in some subset of [A to F] is at least 90 days and the stay in G itself is at least 90 days.

There is only one tricky detail - the time spent at location d is given by t[d + 1] - t[d], but obviously the last element of the locations array doesn't have a successor.
It's implicitly assumed that if a bird no longer moves, it stays where it was until December 31st.
So we put a January 1st (next year, i.e. day 366) sentinel at the end of the locations array, so the loop always stays within bounds.

Perversely, the problem statement explicitly says the same is not true for the beginning of the year - in other words, if a bird's first recorded location is on March 1st, you don't assume it lived at that location before then. I assumed the opposite the first time I read the problem and didn't figure it out until one system test case failed!


Final code
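
A condensed sketch of the approach (not the submitted code): it assumes the locations have already been parsed - with stoi() - into day-of-year numbers and coordinates, that the January 1st sentinel has been appended, and that FOR_EVERY_UNIQUE_PAIR() expands to a nested loop over index pairs; the straight-line distance is a simplification.

#include <vector>
#include <cmath>
using namespace std;

// Assumed definition - the real template macro may differ.
#define FOR_EVERY_UNIQUE_PAIR(i, j, n) \
    for (int i = 0; i < (n); ++i)      \
        for (int j = i + 1; j < (n); ++j)

struct Loc { int day; double x, y; };    // arrival day-of-year plus coordinates

double dist(const Loc &a, const Loc &b)  // straight-line distance (simplified)
{
    return hypot(a.x - b.x, a.y - b.y);
}

// Days spent in the region around locs[a]: walk forward from a, accumulating
// stays that are close to A and far from B. The sentinel guarantees that
// locs[k + 1].day - locs[k].day is always valid.
int timeInRegion(const vector<Loc> &locs, int a, int b)
{
    int days = 0;
    for (int k = a; k + 1 < (int)locs.size(); ++k)
    {
        if (dist(locs[k], locs[a]) > 1000 || dist(locs[k], locs[b]) <= 1000)
            break;
        days += locs[k + 1].day - locs[k].day;
    }
    return days;
}

bool isMigratory(const vector<Loc> &locs)    // locs sorted by day, sentinel last
{
    int n = (int)locs.size() - 1;            // don't start a walk at the sentinel
    FOR_EVERY_UNIQUE_PAIR(i, j, n)
        if (dist(locs[i], locs[j]) >= 1000 &&
            timeInRegion(locs, i, j) >= 90 && timeInRegion(locs, j, i) >= 90)
            return true;
    return false;
}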


The code seems quite verbose, but the essence of the algorithm is only about 10 lines.
Note the simplicity of using FOR_EVERY_UNIQUE_PAIR()
Also notice the use of stoi() - from C++11 onwards there are string-to-numeric conversion functions and their reverse. The only alternative before that was istringstream or -- Dog forbid! -- the sscanf()/sprintf() family.
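
For instance (a trivial illustration, not from the solution itself):

#include <string>
using namespace std;

int n = stoi("1234");           // string to number (also stol, stoll, stod, ...)
string s = to_string(n * 2);    // and back again: "2468"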

C++ as it is meant to be used


There are two things that are both the biggest blessing and the biggest curse for C++:

A) It is (almost perfectly) backward compatible with C

Perhaps nothing else did more to ensure its success, but it also means that people tend to lump the two together as "C/C++", and for most people C++ is learned after prior exposure to C. This only leads to folks using C++ as a better C, bringing in all the bad habits:
  • Globals - In C it's impossible to avoid them, but they can and should be avoided in C++ ( Why? )
  • Manual memory management - For every flower there's a bee, for every fruit there's a tree, but unfortunately for every malloc() there's not always a free(). Paranoia about the bugs this causes led the designers of Java to opt for garbage collection instead (of course, they forgot that memory is not the only resource that needs to be managed!). In C++ we have tools for the job - deterministic finalization means we can use RAII classes or simply use smart pointers to ensure matched allocation and deallocation. In fact, most of the time you can simply use a std::string as a buffer. Google does it in their protocol buffer code - if it's good enough for Google, it's good enough for me. In the past few years I can't recall having used new[] or delete[] for buffers (or new and delete for objects, for that matter).
  • Using the C functions - using strcpy(), printf(), scanf(), and their brethren is a surefire way to introduce vulnerabilities and bugs. C++ gives you a proper string class and stream I/O, and if you use Boost, you get type-safe facilities like lexical_cast and the Format library.
    It is this C legacy that leads to insecure and fragile code.


B) It is labeled "Object Oriented"

Destructors, encapsulation, operator overloading and pure abstract classes are quite useful features - especially destructors, without which RAII would be impossible.

However, I'm not much of a fan of the OO bandwagon. OO has its conveniences and good points, but in emphasizing it above all else, there is too much hype. It really does not give you much of a new way of looking at problem solving, unless you opt for a pure message-passing style of OO like Smalltalk's.
Most day-to-day OO usage could be done perfectly well in C - think of the stdio.h functions or most of the Win32 API, which take an opaque FILE* or HANDLE respectively. I tend to agree with this gentleman's opinion on the matter.



What does good C++ look like?

In my opinion, minimum reliance on code that you have written yourself, and maximum use of well-written libraries (I refer to the STL and Boost) - even if it makes the code readable only by someone more experienced. No one said C++ is an easy language.

Some rules of thumb that I tend to follow
  • Use containers instead of C arrays
  • Use string rather than char* buffers
  • Avoid new and malloc()
  • Avoid explicit loops over data when possible; use copy(), transform() etc. (see the short sketch after this list)
  • Avoid pointers as much as possible
  • Avoid functions that take varargs ... arguments
  • Once again - do not reinvent the wheel unless you need a square one; it's been done before, and there's almost always a well-written library out there with the functionality you need.
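
For instance, here is the algorithms-over-loops rule as a minimal sketch (the element type and operation are just placeholders):

#include <algorithm>
#include <iterator>
#include <vector>
using namespace std;

vector<int> squaresOf(const vector<int> &v)
{
    vector<int> out;
    // transform() drives the iteration; no hand-written loop needed.
    transform(v.begin(), v.end(), back_inserter(out), [](int x) { return x * x; });
    return out;
}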

Topcoder problems - Tokenizing

One of the most common tasks in topcoder problems is tokenizing a string into individual elements. Almost always the entries are separated by spaces.


The naive and obvious version

Seen surprisingly often in various avatars, it is a C mode of thinking that goes something like this:
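
A sketch of what such code typically looks like (splitting on single spaces, copying characters by hand):

#include <string>
#include <vector>
using namespace std;

vector<string> split(const string &s)
{
    vector<string> ret;
    string cur;
    for (size_t i = 0; i < s.size(); ++i)
    {
        if (s[i] == ' ')
        {
            if (!cur.empty())
                ret.push_back(cur);   // flush the word built up so far
            cur.clear();
        }
        else
            cur += s[i];
    }
    if (!cur.empty())
        ret.push_back(cur);           // don't forget the last word
    return ret;
}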


Very verbose and tedious!
Of course, on Topcoder people usually write a tersely equivalent version on a single line, #defined as a macro or wrapped in an inline function.


Can we do better?

Here is a more C++ like version
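
Something along these lines, using string::find() and substr() (a sketch):

#include <string>
#include <vector>
using namespace std;

vector<string> split(const string &s)
{
    vector<string> ret;
    size_t start = 0;
    while (start < s.size())
    {
        size_t pos = s.find(' ', start);   // next space, or npos
        if (pos == string::npos)
            pos = s.size();
        if (pos > start)
            ret.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return ret;
}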


Still smells like C


How about

Think in functional terms and get the loop mentality out of the way - the STL is your friend:
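
A sketch of the stream-iterator approach: let operator>> do the whitespace splitting and copy the tokens straight into the vector.

#include <string>
#include <vector>
#include <sstream>
#include <iterator>
#include <algorithm>
using namespace std;

vector<string> split(const string &s)
{
    vector<string> ret;
    istringstream iss(s);
    copy(istream_iterator<string>(iss), istream_iterator<string>(), back_inserter(ret));
    return ret;
}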


Nice and neat!


Can this be a one-liner?

Let's try putting everything together...
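
Something like this (a sketch of the attempt):

// Does NOT work: the temporary istringstream is gone while the vector is
// still being filled, and istream_iterator's constructor wants an istream&
// that a temporary won't bind to properly.
vector<string> words((istream_iterator<string>(istringstream(s))),
                     istream_iterator<string>());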

Unfortunately this is not correct. The temporary istringstream dies after the istream iterator is constructed, so bad things will happen. The constructor of istream_iterator takes an istream& rather than a const istream&, otherwise it would be workable - see this.
Actually, it should have been obvious...
You cannot read stuff out of a const reference to a stream, since reading changes the stream's state.


What about other delimiters?

C++ streams are tied to locales, so they read tokens based on the definition of isspace() for the current locale. There seems to be no straightforward way to manufacture our own locale for splitting strings on things other than whitespace, so we need a different approach:
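
Roughly like this (a sketch; the variable names match the walkthrough below):

#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Split str on any of the characters in seps.
vector<string> split(const string &str, const string &seps)
{
    vector<string> ret;
    for (auto ps = str.c_str(), pe = ps + str.size(), pd = ps; ps < pe; ps = pd + 1)
    {
        // pd is set to the next delimiter (or the end of the string) and the
        // token [ps, pd) is constructed in place inside the vector.
        ret.emplace_back(ps, pd = find_first_of(ps, pe, seps.begin(), seps.end()));
    }
    return ret;
}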


Neat huh?
If that looks like Greek and Latin to you, don't worry, it looks that way to me too, three days after writing it.

Let's analyze what the heck is going on here...

The intent is as follows - we want to maintain two pointers/iterators - one points to the start of a word, and the other points to a separator (or end of string)

The for loop starts by initializing ps (pointer to start) and pe (pointer to end-of-string).
Since we use auto, we have to assign something to pd (pointer to delimiter), so we might as well assign ps to it. We are going to loop until ps reaches pe.

Next we place the next available token into the result vector - emplace_back() constructs the element in place inside the vector, directly from the constructor arguments you pass, so no temporary string is created and copied. Whenever you would otherwise push a freshly constructed temporary onto a vector, prefer emplace_back().
(And if you move an object into a vector instead, remember never to reuse the moved-from object, or else...)

OK, but what the hell is "the next available token" mentioned above?
It is a string constructed from an iterator range - the start of the range is obviously ps. The end of the range is the next delimiter in the string, or the end of the string if no delimiter remains.
Well, that's exactly what find_first_of() does - it returns an iterator to the first element in a range that matches any one of the values you give it (here, the characters in the string seps). If nothing matches, it returns the end of the range, which is exactly what we want.

So all we need to do is set ps to the point just after the delimiter, so we can grab the next token - this is done in the increment part of the for loop. If it goes past the end, we're done. Assigning pd on the fly inside the emplace_back() expression is a neat way to avoid one extra line of code. Remember that almost everything in C and C++ except control statements yields a value (and, BTW, GNU C has a statement-expression extension that lets even a block yield a value).

Not too bad at all! I can't think of a neater version than this. I have some vague ideas about using adjacent_find(), any_of() and so on, but I suspect they won't be as neat.

One thing you might notice is that I use pointers instead of iterators. The point is that pointers make perfectly good iterators, and if you know the underlying storage is contiguous, you might as well use a pointer as an iterator (the compiler will generate almost identical code for an iterator in this case). There is nothing evil about pointers, as long as you use them sparingly and in a type-safe way.

Here's how it looks in the brief form using the macros from my topcoder template:

Try writing a similar split in your favorite language for kicks!

Topcoder problem analysis - A new hope

Intro

Of late, I've started practising Topcoder problems to get better at problem solving. I'm in no state to compete in SRMs right now, but I'll try to practise as often as possible, till I feel up to it.
I'd never ever solved a 1000 pointer before, in any amount of time. But I managed to solve a couple, albeit taking a lot of time.
My intent was not to just get code to work, but write good idiomatic C++, with some amount of terseness.


Setup

The first thing I noticed is that the new HTML-based Topcoder beta arena really sucks. It doesn't let you browse problems by SRM and, as far as I could tell, there is no way to see other people's solutions. The good old Java applet still works, and I guess it will continue to work because it remains in production, so thank goodness for that.

The first thing to do is to configure a good editor plugin - I used to use FileEdit and TZTester before, but Greed seems like the best thing to use right now. It also downloads the problem descriptions and is highly configurable. So far I haven't been able to get it to generate automated test code, but that's probably because the problems I opened are from very early matches, where the test cases are not specified in a standard form which the plugin can parse.

The next thing is what template to use - as usual, a bunch of macros.
This is what I have right now:
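
A representative sketch - the actual template has more, but these are the macros discussed below:

#include <bits/stdc++.h>
using namespace std;

#define all(c)     (c).begin(), (c).end()              // pass a whole container to an algorithm
#define FOR(i, n)  for (int i = 0; i < (int)(n); ++i)  // plain counted loop
#define P(x)       cerr << #x << " = " << (x) << endl  // quick debug print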


Just a bunch of macros borrowed from many Topcoder contestants. all() is probably the most useful, followed by FOR(), and P() is invaluable while debugging. Most of them just reduce verbosity and avoid typos. One has to be careful to use macro names that don't collide with ordinary identifiers, otherwise you can get some horrendous errors.


Modus operandi

My goal is not to try to write code as fast as possible. It is to try to write the best (by my taste) C++ for each problem. This means that I will try my best to refine a solution once I get it to work, or try to write it in a refined STL centric manner from the get go.

In future posts I will try to dissect each problem, algorithmically and code wise, and try to show how best to leverage C++.

Ideally I would like to see myself writing code in the very first iteration, that looks like the last iteration I produce now, after much refinement.

In a contest situation, some of the code I write is not ideal, since in a contest cleanliness of code is irrelevant, as is code duplication. If anything, some contestants write code in a purposely obscure manner to make it harder for challengers to grok it.

For practice purposes, I'd rather write elegant code. Some terseness and one-lining is nice though as code golf.
I do not adhere strictly to my variable naming convention, since it leads to some verbosity. I also break some whitespace rules to make the code look compact.
A supplementary library of code is getting accumulated as I do these problems, since some things are patterns that repeat.
I will discuss those bits of code in future posts.


Saturday, July 5, 2014

A DOM of many hovered class


So the last week has been entirely about optimization. The website I'm working on had serious performance issues, and it needed to be fast to qualify as working.

The problem

The problem is simple - we have a back end database that supplies data about cloud computing jobs, and we need to display these in a table, allowing the user to view and control the jobs.
The data is refreshed every 30 seconds and each job element has about nine pieces of information. Four of them are shown in a table row, and the rest appear in an expandable row when the main row is clicked.

The initial implementation had the server generate the whole HTML, and we simply fetched that via AJAX and assigned it using innerHTML.
This was unacceptable - despite gzip encoding, the amount of data per request is huge. For my test case, with ~900 jobs, the compacted HTML is about 900 KB, which gzips to about 40 KB. Each table row has a total of 32 HTML tags, so we have about 30,000 styled HTML elements.

This may not seem like much - you can easily generate and manipulate millions of rows in an HTML table - however, when you have nested tags, CSS, attributes, and event handlers, it starts to bog down.

With Chrome, the innerHTML approach was passable, but with IE 11 it was simply too much for the browser to handle - a factor of 6x to 8x slower.
Any action by the user that affects the jobs causes a refresh, which made for a very laggy UI.

Moreover, the Django backend also spent too much time generating the HTML - up to 0.6 seconds just for this. Inconceivable!

The table rows needed to have filtering - only certain rows would be visible, based on the selected tab in a tab control. The user needed to be able to select any number of jobs to perform job actions on them. Both selection and tab switching were too slow.

Something had to be done! After much profiling and reading numerous blogs about JavaScript performance, I managed to speed up various things by a factor of anywhere from 2x to 100x.

Lesson 1 - Templating on client side

The first approach was to scrap the generation of HTML serverside, and send the job data as JSON to the client, to be rendered there instead.

Now the JS would render the HTML for each job, using a very basic templating function. Using innerHTML to construct the table was still inevitable.

The server-side time was reduced, and the simple JavaScript templater I wrote seemed so much faster, with negligible time to substitute values into the HTML template. It was a simple search-and-replace function:

function template(dict, str)
{
    // strings are immutable in JS, so reassign and return the result
    for (var key in dict)
        str = str.split('<<' + key + '>>').join(dict[key]);
    return str;
}

The reason for Django's slowness seems to be that its template engine is not a glorified printf() but quite a complicated parser/interpreter (given that you can embed code inside a Django template).

Overkill for a printf() like requirement

Still, this was not ideal - every refresh still led to rebuilding the HTML table even if nothing changed from the previous refresh.

Lesson 2 - Use incremental updates

Obviously, some kind of incremental update was needed. Ideally we would like only a delta to be fetched from the server, but practically there were difficulties implementing this serverside.

The time taken by the server to send off the entire job data was negligible, as was the time taken by the browser to deserialize it into JSON, so there was not much gain to be had from trying to d/dt the data on the server side.

Instead we could do the diff on the client side. It was fairly simple and fast to do this - given the latest job data and the previous one, we can generate the diff as additions, deletions and changes (sketched below):

// 'new' is a reserved word in JS, so call the two snapshots oldJobs and newJobs
var removed = [], added = [], changed = [];

for (var i in oldJobs)
    if (!(i in newJobs))
        removed.push(i);

for (var i in newJobs)
    if (!(i in oldJobs))
        added.push(i);

for (var i in newJobs)
    if ((i in oldJobs) && JSON.stringify(oldJobs[i]) !== JSON.stringify(newJobs[i]))
        changed.push(i);

The data are in the form of a dictionary keyed by a job number, and each value is in turn a dictionary.(Yes, we know JS doesn't really have dictionaries, only objects or whatever!)

The data looks like this :

{
    1234 : { job_name : "abcd_xyz8765", job_time : "10:24:05", ... },
    1230 : { job_name : "abcd_xyz9645", job_time : "01:04:39", ... },
    ...
}

The best way to compare two objects in JS seems to be to JSON.stringify() them and do a string compare. Checking each field individually doesn't seem to be any faster, and may even be slower, considering that each member access is a dictionary lookup.

Lesson 3 - Know your data

There are 5 scenarios that happen :

  • Initial page load, any number of rows get added - this happens only at login, but makes the most psychological impact in terms of how snappy the site seems.
  • Periodic refresh, a few rows change state - but if the UI is busy during the update, it feels sluggish.
  • User adds a job, one row gets changed.
  • User terminates one or more jobs, and any number of rows change state.
  • User removes one or more jobs.

The add case is the most important even though it happens only once per session.

The table is always sorted based on a job index, in descending order, so it becomes necessary to insert any new rows into the right locations.
The following simple approach acts like an insertion sort, which is extremely slow.

for(job in newjob)
{
    for(row in rows)
    {
        if(row.jobindex < job.index)
        {
            insertBefore(row, makerow(job));
            break;
        }
    }
}

For every new row, its insertion location needs to be found in the table. Sure, we could conceivably turn the above into a binary search - not impossible, but still a mass of code.
Even then, we have to insert elements in the middle of a table, which is slow.

After much head scratching, I realized I don't even need to do this - the job indices on the server are unique and always increasing. The data is also always sorted (because it ends up in a JSON dictionary sorted by job index).

If we just ensure that the job data is in the same sort order as the table, we can avoid all the above slowness/complexity.

So for case 1, just appending each row is fine.

For case 3, a new job will always end up right at the top of the table, since a new job will always have an index greater than any job previously seen.

So we can write something like :

if(table.isempty())  // case 1
{
    for(job in newjob)
    {
        row = makerow(job);
        table.append(row);
    }
}
else                // case 3
{
    for(job in newjob)
    {
        row = makerow(job);
        table.prepend(row);
    }
}

A huge amount of unnecessary work in finding where a job fits in the table is avoided, and appending, rather than inserting, elements into the DOM is way faster too.
This was good for a speedup of more than 2x for case 1.
Case 3 conceivably never requires more than one row insertion - jobs are started one at a time, so the time is negligible.

Deletion was also initially very slow, because the initial code was something like this:

for(job in deljobs)
{
    row = findRowWithJobIndex(job.index);
    row.remove();
}

findRowWithJobIndex() used jQuery with the find('[key=value]') selector, which is terribly slow, since it has to be a linear search.

Once again, knowing that the deljobs array has descending job indices and so do the table rows, we end up with a much faster delete:

var i = 0;
for (row in rows)
{
    if (i >= deljobs.length)
        break;                          // nothing left to delete

    if (row.jobindex == deljobs[i].index)
    {
        row.remove();
        i++;
    }
}

We are essentially doing only one linear pass on the rows and picking off each one that needs to be removed.

More than a 5x speedup when deleting all the rows, and close to 10x on IE 11.

Simple fast code, resulting from using knowledge about the data, rather than using a catch-all generic algorithm.

Lesson 4 - Profile

You can only get so far with logic optimization; eventually you have to profile. IE 11 has a very nice profiler, as does Chrome.

I believed the JSON parsing and delta generation code, the HTML template rendering etc. would be very slow, but all that took very little time!

After profiling, the major performance bottlenecks were (surprisingly) :
  • DOM node creation
  • jQuery in loops
  • Adding, removing and testing for class names in DOM elements
Assigning HTML strings to innerHTML is very slow on IE - it stands to reason: the browser has to parse the HTML and turn it into DOM objects, and parsing fully spec'd HTML is going to be super slow (Chrome does it really fast though!).

The alternative is to create the DOM elements using code like this :

var tr = document.createElement('tr');
var td1 = document.createElement('td');
td1.setAttribute('class', 'job-detail');

var td1a = document.createElement('a');
td1a.setAttribute('class', 'job-link');
td1a.setAttribute('href', '#');

var td2 = document.createElement('td');
td2.setAttribute('class', 'text-center');

var td3 = document.createElement('td');
td3.setAttribute('class', 'text-center');

var td4 = document.createElement('td');
td4.setAttribute('class', 'text-center');

tr.appendChild(td1);
tr.appendChild(td2);
tr.appendChild(td3);
tr.appendChild(td4);
td1.appendChild(td1a);

However, this turned out to be just as slow as innerHTML, even though a number of synthetic benchmarks on the web show otherwise. But the profiler does not lie.

The answer is that we need to create this node just once, and clone copies of it with cloneNode() when we need more.

In fact, doing that means we can keep the template inside the easily tweakable HTML, instead of in the opaque JS code above.
Using this approach improves node-creation performance enormously, leaving appendChild() as the main laggard.

There are hordes of JS performance blogs (including one by John Resig) which hint that great speedup can be had by using createDocumentFragment(), appending many nodes to that, and then appending the fragment to the main DOM.

However, on both Chrome and IE 11, this approach had worse performance. Browsers have evolved a lot since the days of IE 8, and we're not in Kansas anymore.
The profiler is the ultimate authority, despite any number of third-party benchmarks showing the contrary.

After a lot of experimentation, I found that there is probably no straightforward way to speed up appendChild().
Now it's all the browser's fault!


The second bottleneck is jQuery.

jQuery's $(elem) function does a huge amount of work. Whenever you need to touch an element more than once, it makes sense to use the raw DOM methods instead, even if it involves more verbose code. jQuery makes for compact, neat code, but its performance is dismal.

$('.someclass') can be terribly slow - it is much faster to iterate over all the elements and check for the class yourself.
The reasoning is that jQuery (or the corresponding DOM method) is going to iterate over some data structure anyway, and do some work with each element before it can hand it back to you.
I am not quite sure, but it seems like this is linear in the number of elements - it could conceivably be a log N operation, but who knows?

Instead you can iterate yourself over the elements using elem.children[] and do the work in one shot if the classname matches. Even more so if you need to test for things other than the classname, like attribute values.

$(elem).find('[key=value]') should be avoided like the plague if elem has hundreds of child elements.

The ideal framework to use for performance is Vanilla.js, where possible.


The third bottleneck is dealing with DOM element classnames.

  • $(elem).addClass('someclass') is slow
  • elem.className += ' someclass' is not much faster
  • elem.classList.add('someclass') is the fastest method
  • elem.classList[n] != null is the best way to test for a class name, if the position of that classname is fixed
  • elem.classList.contains('classname') is the next best

This was used in the table row selection code, and this, along with some other tweaks mentioned below took the speed of selection from 14 seconds for 900 rows on IE 11 to 0.12 seconds - a 100x speedup!

On Chrome it was not as dramatic, and the times dropped below what's accurately measurable - but certainly from about 120 ms to about 5 ms.

Lesson 5 - Old tricks always work

There are a few simple habits that you develop when writing high-performance code in static languages; these spill over even to dynamic languages, with greater effect:

  • If you use an expression more than once : make it a variable

For example when you see:

if (!row[i].classList.contains('class1'))
{
    row[i].classList.add('class1')
}

You should realize that every [] and . is a potential bottleneck.

var cls = row[i].classList
if (!cls.contains('class1'))
{
    cls.add('class1')
}

will be faster

  • Arrays are fast : when in doubt use an array.
  • Cache it if you can : If you have something generated by a large chunk of code, hang on to it for later - it's easier to check whether an answer changed than to recalculate it. This applies very much to dynamic languages, since they are bound by CPU speed rather than memory access speed.
  • Pull invariant code out of loops : The compiler does this for you in static languages. In dynamic languages you need to do it yourself.
  • Avoid dynamic allocation : Hard to tell in a dynamic language when it doesn't happen, but be aware of when it can happen - reuse objects instead of making new ones. Allocation may be very fast, but construction can be very heavy in a language like JS - this is why jQuery's $(elem) is so bad.
  • Optimize the worst case : If the worst case is frequent enough to be noticed, optimize that at the expense of the average or best case. Mike Abrash said the same.
  • Make it look fast, even if it is incurably slow : This is psychology - a console tool that shows progress with a fast \|/-\ spinner appears way faster than something that shows a 0 to 100 progress indicator. Microsoft knows this, which is why progress bars animate as well as move forward.
  • Optimize multiple condition if() statements : When testing for various things with && and ||, try to order the conditions in a way that maximizes the short circuit of the operator.

For example when you have:

if(leastLikely || lessLikely || likely)
{
    ...
}

if(!likely || !lessLikely || !leastLikely)
{
    ...
}

The conditions are evaluated the maximum number of times. Instead, the following:

if(likely ||  lessLikely || leastLikely)
{
    ...
}

if(!leastLikely || !lessLikely || !likely)
{
    ...
}

will be faster, especially if the three conditions are expressions rather than simple values - and in any case, because of branch prediction.

All these little tweaks shave off a few dozen milliseconds here and there and it adds up. The trick is to make this simple stuff a reflex.


Conclusion

It seems like the effects of basic optimization are amplified highly when using dynamic languages. You can get away with a lot of inefficiency in static languages - the compiler works really hard to offset your stupidity, but in dynamic languages, it gets thrown back in your face.



Saturday, June 21, 2014

Embracing the hate: python, JS, node.js

For a few months now, I have been working mostly with web based code, migrating nacc.nimbix.net from node.js/coffeescript to django/python, and extending the website.

The abomination that is node.js and the ugliness that is coffeescript need another post in themselves. And the monstrosity that is Bootstrap I've learned to live with, since I just copied the entire CSS and HTML for the new portal.

The first few weeks I had to add certain features to the existing website, to somehow grok through the coffeescript and understand what does what.

The first stumbling block was the "almost but not quite" javascript syntax of coffeescript. I can understand inventing a new syntax if your programming paradigm is extremely different from existing languages, but a whitespace-significant, functional-looking style is an absolute mismatch for a very imperative, side-effect-filled application like a web portal.

The second one was the fact that it's hard to tell in a node application which part executes on the server and which on the client. In making it seem seamless across client and server, you have no idea what's happening at the HTTP level.

All that is history now, however, since I moved everything to django, but then jade/less came to bite me - all the CSS was in the form of .less files and all the HTML in the form of .jade files. I tried many tools to convert the code, but everything failed. Even on the server on which the actual node application ran, I could not get the jade/less files to convert to HTML/CSS, for various incomprehensible reasons.
Finally I found a Windows-based tool and managed to batch convert everything.

Now I was all set to tango with django, but after programming in statically typed C++ for more than a decade, dealing with JS and python felt like building out of green bamboo after concrete and steel.

The saving grace was that the model is simple and straightforward - you have URLs that map to python functions that render the HTML you need. None of that over-complicated enterprise stuff that you find in high level frameworks.

Database ORM was reasonably reasonable, although getting existing database queries to map to django's model is quite a head scratcher. Thank heavens for stackoverflow!

Intellisense is pathetic for dynamic languages.

I was trained under a regimen of Delphi, Visual Studio and Eclipse, and I expect code to write 40% of itself. Half the time, if intellisense works right, you don't have to look at the documentation for APIs; otherwise it degenerates into a continuous cycle of searching Google for methods you believe "should" exist, and often you may never realize a particular piece of functionality exists.

For example, who would think that the method used to remove elements from an array is called splice() in JS? How do you add something to an array - is it push(), append(), add() or what? Intellisense, when it works, tells you what methods an object has. However, intellisense is theoretically impossible to do perfectly in a dynamic language.

In some ways JS and python are deviously similar, but it's a trap: 
  • Is it ' '.join(['1','2','3']) or is it [1, 2, 3].join(' ') ?
  • Is it arr.length or is it len(arr) ?
  • Is it for(i in items) or is it for i in items 
  • Is whitespace significant or not? Surprisingly, in JS whitespace is not significant; however, newlines are, and a misplaced one can cause syntax errors or even code that doesn't do what it seems to.
It's frustrating and annoying....

Of course some things are cool, like the [:] array slicing syntax in python but then there are numerous annoyances mostly because of the dynamic nature of the language
  • Numbers are floating point in JS, you have to bitwise or it with 0 to get an integer, and this is idiomatic!
  • You misspell a variable being assigned to and it silently creates a new one. The last time I saw that was in BASIC, and it's a wellspring of errors. Silly mistakes and typos should not require runtime debugging!
  • Variable scoping is weird in JS
  • Pass arguments by value but pass object arguments by reference - But, strings are objects, yet they act like values. The only thing worse is passing everything by reference like in FORTRAN. 
  • Performance anxiety - you can't really get a sense for how fast something is going to run. It may be super-optimized due to the VM, or it may be horribly inefficient. There's a continuous worry about slowdowns (and it is a valid worry, when we're talking like 5 to 20 times slower than C++). I had no idea that using map() is faster in python than a loop. There would be no valid reason in a static language for something like that - iteration is iteration
The saving grace is that the latest ECMAScript 6 standard brings a lot of progress to JS and makes it possible to use a much saner subset of the language.

Python 3 promises much, but since it breaks compatibility, things look bleak.

For now I simply follow grumpy cat's advice:



Monday, July 23, 2012

Tales from the crypt - Cross linked partition tables

Back in 1999 I happened to own a DX4 100 ( That's a clock tripled 486 processor) based system. I ran a dual boot with Windows 98 and Redhat Linux 6.1 on a small 2 GB hard drive.

The system had 32 MB of RAM and I used about a 100 MB page file for Windows, on a separate partition. A huge annoyance was the fact that I also needed to allocate precious space for Linux to swap on. I could have made Linux use the same file as swap space, but there was some article about how it's better to use a whole partition. Using that partition as swap for Linux would make Windows no longer recognize it as a file system.


So I decided to do something quite dangerous, at the risk of destroying all my data. I decided to create a cross linked partition table.


Using Peter Norton's Diskedit (wasn't it just amazing how it brought up a graphical mouse pointer while still appearing to be in text mode?), I added a new partition entry and pointed it to a range that lived within the page file of the Windows installation.


After booting into Linux, I used dd to back up a sector from that new partition (just in case), and then overwrote it with 0s. Reading the page file on the Windows swap partition showed the same block of 0s I had just written.


I added mkswap and swapon commands to the init scripts, and things worked perfectly on Linux.
Windows ran perfectly too, oblivious to the fact that the page file was doing double duty.


A dangerous and crazy trick, but it did the job for a long time, until I got a 4.3 gig drive.