On Oct 9 2013, I did a lecture for @headinthebox's functional programming course at TU Delft. Similarly to last year, the topic was how to draw the Utah teapot using only right triangles. However, this year the challenge was not to implement the algorithm, but rather to optimize a shared implementation. I had written an implementation in raw PHP and the students would have to improve it using new language or algorithmic features found in the Facebook implementation of PHP (the HipHop VM). Extra points would be given if the output was in HTML, which was the point of the original experiment done by Brian Beckman and Erik Meijer.

In my original implementation, I tried to follow a functional style. However, this was not entirely possible in PHP: while the language does support passing functions as arguments, its imperative roots cannot be hidden. Several functions modify arguments in place (e.g. usort) while others use function arguments to return more than one values (e.g. preg_match). Moreover, many of the higher order functions have varying calling and return conventions (e.g. array_map and array_reduce take the processed array as last and first arguments, respectively), some of the common ones are missing (e.g. array_flat_map or at least array_flatten), while the dark corners of others have been documented elsewhere. The combination of the above makes writing beautiful code in PHP very hard, if not impossible. As a further testament to this statement, see the code in Scala I based my solution upon and compare it with PHP.

Nevertheless, several students took up the challenge to make the code run faster. On my computer (1.8GHz MacbookAir), the code was already running 12 times faster on HipHopVM when compared to Zend PHP (2.8 vs 32.5 sec, respectively). Then, the purpose was to identify and grab the low hanging optimization fruit, and I believed that my code contained a lot. However, this was not the case. Here are some things that the students did, which only delivered marginal improvements (~5%):

  • Removed the class Point as it is merely a data type with no operations and make the Triangle class constructor accept 6 arguments, where the co-ordinates for its point where passed. In theory, this would help remove excessive object field access.
  • Make the internal functions comp1 and comp2, which are re-declared on every call to select_crossline(), static and declared them only once.
  • Memoized the results of the {max,min}_{x,y} functions and even pre-calculated their results on Triangle construction.

The winners did something more clever. Instead of going through the low hanging fruit, they estimated the number of invocations of certain functions by hand (akin manual profiling) and decided to focus on the function that is called the most: split_triangles(). What they noticed is that array_flatten is called recursively thousands of times, and that its implementation is bad performance-wise in every respect. This was in fact my fault: as array_flatten is not an internal PHP function, I looked up for an implementation on the Internet and copied it verbatim in my code without thinking too much. So they replaced the implementation of split_triangles with one that obliterates the need for array flattening. This brought a massive performance win in the order of x2.

After the exercise, we also had a brief discussion about what we learned from it. Here are some remarks the students made:

  • Measure before optimising: What the winners did was to invest time in understanding the program flow and attack the hottest of hotspots first. This strategy paid out quite well. On real software, one would use a profiler for that job.

  • Massive wins do not stem from grabbing the low hanging fruit, but from breaking changes in the code's logic. Optimizing compilers, HHVM in that case, already do a good job at optimizing away small inefficiencies.

  • Trust language implementations: It is usually faster to write a non-pretty version of an algorithm using only functions in a language's core library rather than copy-pasting nice implementations from the Internet. Core libraries use optimized versions of algorithms for handling common cases, while their performance issues have been ironed out. This is especially true for interpreted languages, where it is common to have internal function implementations written in hand-optimized C.

  • HHVM can certainly be improved (and AFAIK it will). For example, currently, type information is not taken into account when doing optimization rounds. In our case, I suspect that type information would help a lot at optimizing the types used in numerical calculations, which is at the core of this exercise. Coming from the JVM world, I was also curious as to how HHVM handles memory management and garbage collection, but I could not find any information.

Congratulations to the winners, a T-shirt sporting the logo of a big social networking company will be waiting for you next week!

Here are my slides (some of which are actually @headinthebox's):



Published

09 October 2013

Tags


Sharing

Submit to HN