Greg Stein gets mugged

Posted by Christopher Smith Tue, 28 Aug 2007 21:06:00 GMT

I’m just trying to spread word about this. Really, from my selfish perspective I am best off if he gets back to work ASAP, but spreading the word seems like the right thing to do.

Hell Freezing Over

Posted by Christopher Smith Mon, 13 Nov 2006 17:15:00 GMT

Now that the election is over and I’m starting to feel healthier again, I thought I should mention two events in the programming world that I think are very significant. They involve two incredibly popular but often misunderstood and vilified programming languages that are forever associated with each other and the birth of the web.

The older news story is the Tamarin project, which is finally going to bring JIT technology to the world of Javascript in a big way (technically Tamarin was already in Acrobat’s Javascript engine, but let’s face it, most Javascript isn’t written for that platform). Word to the ignorant: up until now, Javascript has been painfully slow. Back when I was in the lab someone wrote a spring physics based graph-layout engine in Javascript. They were having problems with the algorithm, and being kind of weak in Javascript, I rewrote the thing in Java. After we got all the kinks out of both versions, the Java version appeared to be an easy 100x faster (possibly 1000x). Seriously.

Since then, I’ve looked a lot more closely at Javascript and discovered it’s not nearly as bad as I’d perceived it to be. It’s kind of a LISP-y language, with some handy little scriptisms in it, and SpiderMonkey makes it fairly easy to embed in C programs (someone needs to make a C++ interface that is like Boost’s Python library so I can say the same thing about C++).

The other big news is that today, Java is getting GPL’d. Specifically, the HotSpot VM is being GPL’d along with Javac, with the standard libraries to follow (the VM is far more interesting as Java’s standard libraries tend to be only marginally better than what GNU Classpath has to offer). It will be interesting to see what comes of this, but the optimist in me hopes this will finally make Java more acceptable to the broader open source community and will lead to Java working better on more platforms. Myself, I want to see just how quickly I can hack a more async I/O style interface for Java’s IO libraries (yes, one could do this with JNI before, but you can do more interesting things with access to the full runtime). It’d also be fun to try to hack in an execution model that allows processes to share more of the Java VM’s resources, although that will undoubtedly take more time. I suspect more than a few people will be looking at ways to add support for explicit disabling of bounds checking. The most awesome hack to the Java VM, though, has to be adding support for efficient unsigned arithmetic. That requires new byte codes, but man would it be wonderful.

The irony with both of these milestones is that in some ways they are both non-news events… except for how long it has taken for them to come around. Given Javascript’s popularity, you had to think that sooner or later JITs or some equivalent would become the norm. Instead, Javascript’s lackadaisical performance has helped relegate the language to “least common denominator” status, although AJAX allows you to offload work onto the only resource likely to be slower than your local Javascript engine: the web server. ;-)

As for Java becoming open sourced… Sun has been playing hot-and-cold on this one seemingly almost since the language was invented (a prize for whoever finds the oldest comment from a Sun executive which suggests that Java might in any way become open sourced). Based on what’s been in the press, this would have been a non-news event five or six years ago.

I have to say, I worked at Sun back when it was still getting sales by being the “dot in .com”, and one of the primary reasons I left the company was that it was clear to me that internally they just didn’t grok what was happening with open source, or why they absolutely needed to address it on the Solaris side of things in some fashion (either by the OpenSolaris route they chose or by porting Solaris tech to the Linux kernel and abandoning Solaris altogether) and simply make Java open source ASAP (while I thought what they did with OpenOffice was great news for the open source world, it didn’t strike me as indicative of a larger understanding). Gosling’s history with the free software movement didn’t exactly give one confidence that this was going to change either.

It’s good to see them coming around on this. I’m increasingly running out of reasons not to have a Solaris box at home. I’m about ready to look at the OpenSolaris hardware compatibility list and see just how cheap a box I can put together (I just don’t have enough RAM on my desktop to do Solaris justice with VMware).

The Dialog for the Matrix's Architect Doing Something Useful

Posted by Christopher Smith Tue, 19 Sep 2006 17:55:00 GMT

I’m really starting to like discipline and punish. In particular I enjoyed reading this article on DSLs and the various language debates being bandied about. Just some very stylized and thought-provoking writing.

On Efficiency, Scalability, and the Wisdom to Know the Difference

Posted by Christopher Smith Wed, 13 Sep 2006 00:11:00 GMT

Joel Spolsky has been on a tear lately. He’s managed to really kick up a lot of dust. I’ve ignored most of the excitement, but I couldn’t ignore his latest post. He seems to have completely confused the differences between efficiency and scalability and has curious notions about the reasons for the importance of either.

Let’s review: efficiency is the ability to get something done while consuming few resources. Efficient code uses less memory, less IO bandwidth, and less CPU time to get the same job done. Scalability is a made up term used in industry to refer to the notion of software being able to handle growing levels of work gracefully, where gracefully generally translates to “costs grow no more than linearly with the size of the work”.
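To make the distinction concrete, here is a minimal Python sketch. The two functions and the `growth_factor` helper are hypothetical names invented for this illustration, not anything from Joel's post. The pairwise version is the more memory-efficient of the two, yet it is the one that fails the "costs grow no more than linearly" test.

```python
def has_duplicate_pairwise(items):
    """Uses no extra memory, but work grows quadratically with input size."""
    steps = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            steps += 1
            if items[i] == items[j]:
                return True, steps
    return False, steps


def has_duplicate_hashed(items):
    """Spends extra memory on a set, but work grows linearly with input size."""
    seen = set()
    steps = 0
    for item in items:
        steps += 1
        if item in seen:
            return True, steps
        seen.add(item)
    return False, steps


def growth_factor(check, n):
    """How much more work does doubling the input from n to 2n cost?"""
    _, small = check(list(range(n)))
    _, large = check(list(range(2 * n)))
    return large / small


pairwise_growth = growth_factor(has_duplicate_pairwise, 500)  # roughly 4x
hashed_growth = growth_factor(has_duplicate_hashed, 500)      # exactly 2x
```

Doubling the workload doubles the cost of the hashed version and roughly quadruples the cost of the pairwise one, and that is a property of the algorithm rather than of the language it happens to be written in.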

All of Joel’s arguments were about efficiency, not scalability. This is important because particularly for web applications, it is well recognized that non-scalable solutions are really not that useful, so there is hardly any debate about that. Scalability is generally tied to algorithms and the infamous Big O notation, so it’s very hard to point at a programming language or other low-level component and say, “that’s not scalable”. You can find frameworks (sometimes libraries) and identify places where they fail to scale, but Joel declines to do so. His beef is with Ruby, so it is all about efficiency (and actually specifically CPU efficiency), despite all his statements about it being “scalability”.

Efficiency is an interesting point of contention. People tend to make a huge deal about it even when they shouldn’t, and ironically don’t tend to make a huge deal about it for the one reason they should. I’m surprised that Joel fell into the same trap.

First, I haven’t yet seen an implementation of Ruby that is particularly CPU efficient. I haven’t looked at memory consumption, but I’d be willing to guess it’s not so great there. Like most languages, it’s fine for IO efficiency.

So, right off the bat, if your application’s limiting factor is IO, there isn’t an efficiency disadvantage to using Ruby. Check around, and you’ll find there are a LOT of apps that are essentially IO bound. If your competition is using C and needs 10 servers, you could use even dog-slow Ruby and still only need 10 servers, not 100.
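The arithmetic can be sketched with a deliberately crude capacity model. Everything here is an assumption made up for illustration: the `servers_needed` function, the idea that each server supplies one unit of IO and one unit of CPU, and the 10x slowdown figure.

```python
import math


def servers_needed(io_load, cpu_load, cpu_slowdown=1):
    """Servers required when the scarcer resource sets the count.

    io_load and cpu_load are total demand for the baseline runtime, measured
    in "one server's worth" of that resource. cpu_slowdown multiplies only
    the CPU demand, modelling a less CPU-efficient runtime.
    """
    return max(math.ceil(io_load), math.ceil(cpu_load * cpu_slowdown))


# IO-bound app: IO demand needs 10 servers, CPU demand barely fills one.
c_io_bound = servers_needed(io_load=10, cpu_load=1)
slow_io_bound = servers_needed(io_load=10, cpu_load=1, cpu_slowdown=10)

# CPU-bound app: the same slowdown now multiplies the server count.
c_cpu_bound = servers_needed(io_load=1, cpu_load=10)
slow_cpu_bound = servers_needed(io_load=1, cpu_load=10, cpu_slowdown=10)
```

Under these made-up numbers the IO-bound app needs 10 servers either way, while the CPU-bound app goes from 10 to 100. Only in the second case does the runtime's CPU efficiency show up on the hardware bill.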

Now, Joel claims you’ll inevitably run into some place where you are CPU bound. I’ve seen exceptions to this rule, but for the most part he’s right. There is always some performance hot spot that comes up that needs to be optimized. A lot of the time, even that hotspot can be addressed algorithmically, which means that you really don’t care about the language it’s implemented in. In the cases where it ultimately comes down to a language runtime’s CPU efficiency, the faulty logic here is that because this one part of your app can’t be implemented in language X, then you can’t use language X for the rest of your app. That’s just silly. You can always implement that hotspot in some other language, provided your language has some reasonably efficient way to hand off computation to code in another language (which Ruby seems to do reasonably well). If it is the difference between 10 and 100 servers, it is probably worth the development overhead to do it.

Joel also pokes fun at “advocates singing hymns about developer cycles vs. CPU cycles”, which I found surprising as well. Sure, you have small parts of your application where CPU cycles are key, and it’s worth sacrificing developer cycles for that added efficiency, but generally for apps the bulk of your code is much more sensitive to developer efficiency, because developer efficiency translates to “more features that work better”. You can find evidence of this in almost every software paradigm: interpreters in embedded systems, languages like Lua bound to high performance C++ game engines in the gaming industry, web servers written in C calling PHP/Perl/Ruby/VBScript/Python/whatever which in turn invoke functions in highly tuned databases (and it’s worth pointing out that Yahoo and Google use PHP, Python and Java despite scaling their apps to literally thousands of servers), and desktop apps like Word whose core is carefully tuned C/C++ and assembler, but that use languages like Basic to implement a lot of their features. The biggest example is the web browser. Most web apps are implemented in XML and Javascript (neither of which are about to set any efficiency records) that are executed by some very highly tuned browsers. So, there is considerable evidence that while you often need some expertise with a CPU efficient runtime, for almost any problem domain, what Joel calls an “inefficient” runtime is still useful and desirable.

Joel also makes some funny claims about duck typing affecting performance. Sure, it has an effect, but it is hardly the kind of thing that can’t be overcome. Yes, it makes type inferencing harder, but the key word there is harder, not impossible. Lots of folks have demonstrated how you can do really simple things like “hey, if self is of type X when I make this first call, it’s probably of type X when I make subsequent calls, and in fact, I can prove that it is always true until someone loads some more code into this image”. With a sufficiently clever runtime (which Ruby lacks at this time), you can and should be able to get to the point where you are no worse than half as CPU efficient as C code. Joel’s right on one key point though: Ruby lacks this at this time, and that is a concern, but the concern is one he fails to mention.
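The “it’s probably of type X on subsequent calls” trick is usually called an inline cache. Below is a toy Python sketch of the idea. The `CachedCallSite` class and the `Duck` and `Robot` examples are hypothetical, and a real runtime would do this in generated machine code rather than in a helper object. It also ignores per-instance method overrides to stay short.

```python
class CachedCallSite:
    """One call site that remembers the last receiver type it saw."""

    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None
        self.cached_function = None
        self.hits = 0
        self.misses = 0

    def call(self, receiver, *args):
        receiver_type = type(receiver)
        if receiver_type is self.cached_type:
            # Fast path: same type as last time, so skip the lookup entirely.
            self.hits += 1
        else:
            # Slow path: do the full dynamic lookup and remember the answer.
            self.misses += 1
            self.cached_function = getattr(receiver_type, self.method_name)
            self.cached_type = receiver_type
        return self.cached_function(receiver, *args)

    def invalidate(self):
        """Forget the cached target, e.g. after new code has been loaded."""
        self.cached_type = None
        self.cached_function = None


class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


site = CachedCallSite("speak")
results = [site.call(Duck()) for _ in range(3)] + [site.call(Robot())]
```

The first `Duck` call misses and fills the cache, the next two hit, and the `Robot` call misses again, so a call site that only ever sees one type pays for the dynamic lookup once.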

Read that last sentence again: after two paragraphs pointing out that efficiency isn’t really that important, now I’m saying it is. Isn’t life full of contradictions?

Efficiency *is* important because it is a fairly reasonable proxy for the maturity of a platform. There’s a funny little factoid about software: there is almost always a way to write code in a way that gives the runtime enough information to execute efficiently. When your runtime doesn’t do this, you have to ask the question: why?

Joel makes the argument that you should be able to get the overhead of a function call down to the level where it’s a single CALL instruction. First, a CALL instruction can be expensive, thanks to the wonder of cache misses. That aside, you can in fact get it down to where the overhead isn’t even a single instruction, thanks to the wonders of inlining. As Herb Sutter pointed out in his article “Inline Redux”, inlining is almost always possible, because there are so many places where you can do it (Java, which Joel suggests has poor performance, has runtimes that inline far more aggressively than most C/C++ runtimes). As I mentioned above you can do tricks with type inferencing that get around the performance costs of late binding, except for when you are actually taking advantage of late binding’s benefits (in which case, as per Greenspun’s 10th law, the late binding runtime probably performs better than most attempts to get equivalent capabilities using C/C++). The same can be said for automatic heap management through all kinds of tricks. You can get message dispatch or generic dispatch to perform like function dispatch for the cases where you only need the simpler functionality of the latter. Zero overhead bounds checking can be done by code analysis or in the worst case using page faults. Really, the list of performance optimizations available tends to trump the best efforts of language designers to make things slow. ;-)

So that brings us back to the key question: if your runtime isn’t that efficient, why? The answer is that nobody has put that kind of effort into making it that efficient. It just hasn’t been worth the effort yet, and that strongly suggests that the platform just isn’t that mature yet. If it were, that’d be one of the things that would have been addressed along with integration with legacy systems, sophisticated development and debugging tools, dealing with corner cases that could break the runtime, building out a complete set of support libraries, integration with various platforms and technologies, etc., etc. The bottom line is that if efficiency hasn’t been tackled to the point where you are at least about half as efficient as the ideal solution, some of those things haven’t been addressed. While efficiency might not matter to you, at least one of those other things probably will. Lack of CPU efficiency should be treated as a strong indicator that some other shortcoming that really matters to you might exist.

Now there are important exceptions to this to consider. In particular, there are languages like Erlang, where the whole point is to deal with one very difficult domain efficiently (distributed computing/parallelism), and they actively encourage you to use another language (C) for more “regular” tasks. Even in those cases, you can expect that Erlang will show some lack of maturity if you try to use it for “regular” tasks, but you’re probably more than happy to use it for what it’s good at, and use something else for the rest.

Now, a runtime can be lacking in CPU efficiency in your application domain and still be useful for you. It could be your problem domain is all about other efficiencies, like IO or memory, and the runtime is great for that stuff (indeed, CPU and memory trade-offs in particular create cases where it really only matters if a runtime can be memory efficient). You might be at a startup where the maturity of your platform just isn’t as important as your ability to get something up and running before the company runs out of capital, and some degree of risk due to platform immaturity is acceptable. You might be working on a problem that is so complex, and your resources are so constrained, that just getting something that works is such a victory that nobody cares about how efficient it is, how well it integrates with other technologies, etc. Fine, but most people benefit from the advantages of working with a mature platform, and as such, efficiency is a very good proxy for the far more difficult to quantify property of maturity.

So yes, I’d agree that efficiency is important, but in none of the ways that Joel suggests.