On Efficiency, Scalability, and the Wisdom to Know the Difference 4

Posted by Christopher Smith Wed, 13 Sep 2006 00:11:00 GMT

Joel Spolsky has been on a tear lately. He’s managed to really kick up a lot of dust. I’ve ignored most of the excitement, but I couldn’t ignore his latest post. He seems to have completely confused the differences between efficiency and scalability and has curious notions about the reasons for the importance of either.

Let’s review: efficiency is the ability to get something done while consuming few resources. Efficient code uses less memory, less IO bandwidth, and less CPU time to get the same job done. Scalability is a made up term used in industry to refer to the notion of software being able to handle growing levels of work gracefully, where gracefully generally translates to “costs grow no more than linearly with the size of the work”.

All of Joel’s arguments were about efficiency, not scalability. This is important because particularly for web applications, it is well recognized that non-scalable solutions are really not that useful, so there is hardly any debate about that. Scalability is generally tied to algorithms and the infamous Big O notation, so it’s very hard to point at a programming language or other low-level component and say, “that’s not scalable”. You can find frameworks (sometimes libraries) and identify places where they fail to scale, but Joel declines to do so. His beef is with Ruby, so it is all about efficiency (and actually specifically CPU efficiency), despite all his statements about it being “scalability”.

Efficiency is an interesting point of contention. People tend to make a huge deal about it even when they shouldn’t, and ironically don’t tend to make a huge deal about it for the one reason they should. I’m surprised that Joel fell in to the same trap.

First, I haven’t yet seen an implementation of Ruby that is particularly CPU efficient. I haven’t looked at memory consumption, but I’d be willing to guess it’s not so great there. Like most languages, it’s fine for IO efficiency.

So, right off the bat, if your application’s limiting factor is IO, there isn’t an efficiency disadvantage to using Ruby. Check around, and you’ll find there are a LOT of apps that are essentially IO bound. If your competition is using C and needs 10 servers, you could use even dog-slow Ruby and still only need 10 servers, not 100.

Now, Joel claims you’ll inevitably run in to some place where you are CPU bound. I’ve seen exceptions to this rule, but for the most part he’s right. There is always some performance hot spot that comes up that needs to be optimized. A lot of the time, even that hotspot can be addressed algorithmically, which means that you really don’t care about the language it’s implemented in. In the cases where it ultimately comes down to a language runtime’s CPU efficiency, the faulty logic here is that because this one part of your app can’t be implemented in language X, then you can’t use language X for the rest of your app. That’s just silly. You can always implement that hotspot in some other language, provided your language has some reasonably efficient way to hand off computation to code in another language (which Ruby seems to do reasonably well). If it is the difference between 10 and 100 servers, it is probably worth the development overhead to do it.

Joel also pokes fun at “advocates singing hymns about developer cycles vs. CPU cycles”, which I found surprising as well. Sure, you have small parts of your application where CPU cycles are key, and it’s worth sacrificing developer cycles for that added efficiency, but generally for apps the bulk of your code is much more sensitive to developer efficiency, because developer efficiency translates to “more features that work better”. You can find evidence of this in almost every software paradigm: interpreters in embedded systems, languages like Lua bound to high performance C++ game engines in the gaming industry, web servers written in C calling PHP/Perl/Ruby/VBScript/Python/whatever which in turn invoke functions in highly tuned databases (and it’s worth pointing out that Yahoo and Google use PHP, Python and Java despite scaling their apps to literally thousands of servers), and desktop apps like Word whose core is carefully tuned C/C++ and assembler, but that use languages like Basic to implement a lot of their features. The biggest example is the web browser. Most web apps are implemented in XML and Javascript (neither of which are about to set any efficiency records) that are executed by some very highly tuned browsers. So, there is considerable evidence that while you often need some expertise with a CPU efficient runtime, for almost any problem domain, what Joel calls an “inefficient” runtime is still useful and desirable.

Joel also makes some funny claims about duck typing effecting performance. Sure, it has an effect, but it is hardly the kind of thing that can’t be overcome. Yes, it makes type inferencing harder, but the key word there is harder, not impossible. Lots of folks have demonstrated how you can do really simple things like “hey, if self is of type X when I make this first call, it’s probably of type X when I make subsequent calls, and in fact, I can prove that it is always true until someone loads some more code in to this image”. With a sufficiently clever runtime (which Ruby lacks at this time), you can and should be able to get to the point where you are no worse than half as CPU efficient as C code. Joel’s right on one key point though: Ruby lacks this at this time, and that is a concern, but the concern is one he fails to mention.

Read that last sentence again: after two paragraphs pointing out that efficiency isn’t really that important, now I’m saying it is. Isn’t life full of contradictions?

Efficiency *is* important because it is a fairly reasonable proxy for the maturity of a platform. There’s a funny little factoid about software: there is almost always a way to write code in a way that gives the runtime enough information to execute efficiently. When your runtime doesn’t do this, you have to ask the question: why?

Joel makes the argument that you should be able to get the overhead of a function call down to the level where it’s a single CALL instruction. First, a CALL instruction can be expensive, thanks to the wonder of cache misses. That aside, you can in fact get it down to where the overhead isn’t even a single instruction, thanks the the wonders of inlining. As Herb Sutter pointed out in his article “Inline Redux”, inlining is almost always possible, because there are so many places where you can do it (Java, which Joel suggests has poor performance, has runtimes that inline far more aggressively than most C/C++ runtimes). As I mentioned above you can do tricks with type inferencing that get around the performance costs of late binding, except for when you are actually taking advantage of late binding’s benefits (in which case, as per Greenspun’s 10th law, the late binding runtime probably performs better than most attempts to get equivalent capabilities using C/C++). The same can be said for automatic heap management through all kinds of tricks. You can get message dispatch or generic dispatch to perform like function dispatch for the cases where you only need the simpler functionality of the latter. Zero overhead bounds checking can be done by code analysis or in the worst case using page faults. Really, the list of performance optimizations available tends to trump the best efforts of language designers to make things slow. ;-)

So that brings us back to the key question: if your runtime isn’t that efficient, why? The answer is that nobody has put that kind of effort in to making it that efficient. It just hasn’t been worth the effort yet, and that strongly suggests that the platform just isn’t that mature yet. If it were, that’d be one of the things that would have been addressed along with integration with legacy systems, sophisticated development and debugging tools, dealing with corner cases that could break the runtime, building out a complete set of support libraries, integration with various platforms and technologies, etc., etc. The bottom line is that if efficiency hasn’t been tackled to the point where you are within being about half as efficient as the ideal solution, some of those things haven’t been addressed. While efficiency might not matter to you, at least one of those other things probably will. Lack of CPU efficiency should be treated as a strong indicator that some other shortcoming that really matters to you might exist.

Now there are important exceptions to this to consider, in particular there are languages like Erlang, where the whole point is to deal with one very difficult domain efficiently (distributed computing/parallelism), and they actively encourage you to use another language (C) for more “regular” tasks. Even in those cases, you can expect that Erlang will show some lack of maturity if you try to use it for “regular” tasks, but you’re probably more than happy to use it for what it’s good at, and use something else for the rest.

Now, a runtime can be lacking in CPU efficiency in your application domain and still be useful for you. It could be your problem domain is all about other efficiencies, like IO or memory, and the runtime is great for that stuff (indeed, CPU and memory trade offs in particular create cases where it really only matters if a runtime can be memory efficient). You might be at a startup where the maturity of your platform just isn’t as important as your ability to get something up and running before the company runs out of capital, and some degree of risk due to platform immaturity is acceptible. You might be working on a problem that is so complex, and your resources are so constrained, that just getting something that works is such a victory that nobody cares about how efficient it is, how well it integrates with other technologies, etc. Fine, but most people benefit from the advantages of working with a mature platform, and as such, efficiency is a very good proxy for the far more difficult to quantify property of maturity.

So yes, I’d agree that efficiency is important, but in none of the ways that Joel suggests.

Trackbacks

Use the following link to trackback from your own site:
http://xblog.xman.org/articles/trackback/21

  1. Last night I experienced my fifteen minutes of ‘net fame. I had submitted my article rebutting one of Joel Spolsky’s comments on Ruby’s performance to both reddit and digg. I watched how my submis...
Comments

Leave a response

  1. mike bayer about 16 hours later:

    the most common limiting factor ive seen for language interpreters/VMs (in my experience java and perl) is the memory consumption required to host a request-bearing thread or process. if you’re running mod_perl in a forking environment and not preloading modules in the parent process, a big OO-oriented app can have individual child processes of 100M. that right there will prevent a reasonable number of concurrent requests to succeed, as soon as memory spills over onto disk and you thrash yourself to death (i wouldnt count that as “IO bound” even though the slowness of the disk is involved).

    as ruby has no support for threads yet (or some magical erlang message-passing scheme, since apparently threads are so yesterday…), this is a big concern.

    the most common CPU-intensive place to crash in my experience is in generating DOMs, for the purpose of spitting out some form of XSL-generated content (this is a slowness ive seen in both platforms). by lengthening the time it takes for a request to succeed, more requests pile up, more memory gets used, and then you fall into thrash-land again. ive wrestled with that issue quite a bit.

    for applications that rely upon DOM operations, its a very tall order to insist that all the DOM-related areas of an application be written in C (and DHH’s notion that you’d fork a new process in order to do something fast is just nutso).

    an IO-bound performance problem is what I’ve seen the least of. anytime ive seen a site go down, the running application is feeling it/crashing it.

    ill admit Joel was grabbing at straws with the duck typing thing (that is, if we take him to mean that “duck typing is hopelessly slow”, and not “ruby’s implementation of duck typing is slow”).

  2. DukunSakti about 16 hours later:

    Excellent points, and great read!

  3. Christopher Smith about 18 hours later:

    mike baye wrote:

    the most common limiting factor ive seen for language interpreters/VMs (in my experience java and perl) is the memory consumption required to host a request-bearing thread or process

    First, this wasn’t an interpreter/VM vs. compiled code issue, but let’s look at what you’re saying. I’d warn you to be cautious about how you determine how much memory a process is really using. Java VM’s in particular tend to lead people in the wrong way (indeed, Merlin made some changes that really only effected how the OS accounted for memory, rather than changes in how much memory it used, and it had a huge impact on how much memory it looked like Java was using).

    Forget for a second what you’ve observed, and think about the principles involved. What is the actual memory overhead of an interpreter/VM? Well, the one unavoidable bit is the memory taken up by the interpreter and/or JIT compiler. Statically compiled code needn’t even have its compiler hosted on the machine. So, how much memory is the compiler/interpreter going to take up? Well, a source (as opposed to byte-code) parser will take up a chunk of space particularly for a complex grammar and it’ll be generating parse trees too, the interpreter itself shouldn’t take up much memory at all (indeed, interpreters can often be a good CPU/memory trade off if CPU cycles are free), but a good JIT is going to have quite a bit of code in it. During compilation a JIT will build up all kinds of data structures, and it’ll probably add some instrumentation to code as well. So all that chews up some memory. It’s probably not too hard to see it chewing up some 20MB in fixed overhead and some fraction of overhead for instrumentation.

    All of that shouldn’t increase as the load increases though. In fact, almost all of it can be shared with any other threads/processes running the same code. Now, a lot of runtimes fail to be this clever , but doesn’t mean it always has to be so.

    As for your concerns about Ruby not having threads yet (and yeah, you can do Erlang style message passing in Ruby fairly easily), don’t get too excited. First, I think threads very much do have their advantages and their place, but let’s look critically at what the overhead of a process is as compared to a thread. There are basically three key bits: page tables, shared heap, and IPC overhead. The page table overhead is non-negligible, but still fairly small, and there are tricks a dynamic runtime can do to minimize it (like judiciously using large pages and sharing common bits with other processes). The shared heap means it can be easier to share data structures instead of having to copy them, so that can save you some memory, but threaded apps tend to go to a lot of trouble to minimize sharing anyway, so the win is normally not huge. Finally, threads generally can take short cuts when communicating with other threads in the same process, and that can reduce overhead. On good quality OS’s, this overhead should be very minimal, particularly if the app uses shared memory (which admittedly ads complexity) for IPC.

    So, the performance boost from using threads as opposed to processes should be a fractional multiplier rather than some integer multiplier, unless you are doing something fairly unique. Certainly, compared to the overhead of the current Ruby interpreter, it’s of trivial concern.

    As for your DOM/XSL example… most XSL implementations tend to be interpreters in their own right. If you are doing XSL transformations on a DOM, there just aren’t too many fast ways to do it. While DOM parsers themselves can be fairly compact and efficient, but the DOM’s that come from XML docs are often quite huge. The best strategy for this that I’ve seen is to use an XML-database, and even those don’t seem to perform that well. XML/XSL impose fairly severe overhead, and it doesn’t seem to matter much what runtime you use.

    The problem of requests queuing up is an age old scalability problem. If you solve it right, the extra memory consumption of a pending request should be trivial, and having requests pile up should actually improve your thoroughput (at the expense of latency).

    As for the notion of having DOM operations written in C, I fail to see the problems with that. XML databases do that, on Windows it’s unusual for apps NOT to use the COM or .NET interface to Microsoft’s highly tuned XML parser and XSL engine. PHP folks do this all the time with great success. In general, the only times I’ve seen this not done in a performance sensitive environment is when the language-of-choice had an native support for XSL/XML whose performance was comperable to the alternatives.

    Forking might seem like “nutso”, but that’s really only true for fine-grained operations. If you have got a task that’s going to take a few seconds, the overhead of forking isn’t even going to show up as a blip on your profiler’s radar.

  4. Che Fai 6 months later:

    Great stuff! And HAML support would be awesome…

Comments