User Mode Linux performance tuning

Posted by Christopher Smith Sat, 04 Aug 2007 06:16:00 GMT

I haven’t posted for a while, partly because I’ve been on vacation, but particularly because the UML instance I’m using has been behaving… poorly. Every time I post an article it seems to hang for quite some time (minutes, not seconds). I’m starting to get some ideas about what my problems might be.

So, it has taken me a while, but a few things have become apparent to me. First, and most likely the biggest contributor, is that I’m running a fairly new host glibc with a fairly old host kernel, and then on top of that I’ve got a fairly new UML kernel. In particular, I suspect this is running into problems with various bugs UML has had with NPTL.

On top of this, I realized I’ve had my UML kernels defaulting to the Completely Fair Queuing (CFQ) IO scheduler, which is probably counterproductive. It’s probably more efficient to get rid of queuing in the guest altogether and let requests go right to the host; otherwise you get all kinds of ugliness from having two levels of IO scheduling going on.
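As a rough sketch of how to check what a guest is currently using: 2.6-era kernels expose the scheduler choice per block device in sysfs, with the active one shown in square brackets. The helper names below are made up for illustration, and the device name `ubda` in the comment is an assumption about how the UML disk is named.

```python
def active_scheduler(line):
    """Return the scheduler shown in [brackets], or None if none is marked."""
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def needs_switch(line, wanted="noop"):
    """True if `wanted` is available on this device but is not the active one."""
    available = [name.strip("[]") for name in line.split()]
    return wanted in available and active_scheduler(line) != wanted


# Example content of a file such as /sys/block/ubda/queue/scheduler
# (the device name is an assumption; it depends on the guest's configuration).
sample = "noop anticipatory deadline [cfq]"
print(active_scheduler(sample), needs_switch(sample))
```

If the check says a switch is needed, writing the scheduler name into that same sysfs file (as root) changes it at runtime, and the `elevator=` boot parameter sets the default.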

Anyway, I’ve changed my IO scheduling, and hopefully this weekend I’ll be able to put together a new set of host/UML kernels built on 2.6.20 or maybe even newer (if I dare).

[Crossing fingers]

Article on Scalability

Posted by Christopher Smith Fri, 08 Dec 2006 02:34:00 GMT

I ran into this nice little article from an eBay guy on scalability. I like a lot of the points raised, and it takes me back to my days as a consultant, particularly the first part where you ask questions like “4X more scalable in what way?”

In fact, in a lot of ways if someone says “4x more scalable” you generally know they are full of it. What they generally mean is “increased capacity 4x” (which leads to the follow up question about what kind of capacity was actually increased), but since they aren’t using those words, you have to think they are playing buzzword bingo, which is never a good sign.

Occasionally they may actually mean 4x more scalable. Changes in scalability are generally spoken about in big O notation, but it is possible to improve scalability by modifying a constant factor. That is not as impressive, but it can still save you a ton of money.
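To make the distinction concrete, here is a small sketch with made-up cost functions (the names and constants are purely illustrative). A constant-factor fix buys the same 4x at every load, while a change in growth order buys more and more as the load grows.

```python
def cost_before(n):
    """Hypothetical baseline: cost grows with the square of the load."""
    return 4.0 * n * n


def cost_constant_fix(n):
    """Same growth order as the baseline, constant factor cut by 4x."""
    return 1.0 * n * n


def cost_order_fix(n):
    """Different growth order: cost is linear in the load."""
    return 4.0 * n


def savings_ratio(old, new, n):
    """How many times cheaper `new` is than `old` at load n."""
    return old(n) / new(n)


loads = [10, 100, 1000]
constant_gain = [savings_ratio(cost_before, cost_constant_fix, n) for n in loads]
order_gain = [savings_ratio(cost_before, cost_order_fix, n) for n in loads]
print(constant_gain)  # the same ratio at every load
print(order_gain)     # a ratio that keeps growing with the load
```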

My one critique of this article is that it gets its terms wrong. It’s bad enough that the industry made up a term like “scalability”, but when we extend it to mean other things it makes it worse. The article describes “TPS per system” as one form of scalability improvement. I’m sorry, that’s improving efficiency. I’ll agree people often say “scalability” when they mean “efficiency”, but there is no need to compound the problem by saying so oneself.

In general, this article is more focused on capacity than scalability (which is why working on efficiency helps to improve things). Capacity is often the more pressing concern, but to a large degree an architecture design with an eye towards scalability will help you to worry a lot less about capacity issues.

PostgreSQL makes MySQL cry like a baby

Posted by Christopher Smith Mon, 04 Dec 2006 03:51:00 GMT

I stumbled across this blog entry showing just how much better PostgreSQL performs than MySQL.

I know, the benchmarks are undoubtedly flawed in several ways, and this is just one arbitrary kind of benchmark, with MySQL performing better in other contexts. I don’t care. I’m partisan on this one, and anything that shows that PostgreSQL is better has to be divine truth, and points like the above will be reserved for cases where MySQL benchmarks better than PostgreSQL. ;-)

On Efficiency, Scalability, and the Wisdom to Know the Difference

Posted by Christopher Smith Wed, 13 Sep 2006 00:11:00 GMT

Joel Spolsky has been on a tear lately. He’s managed to really kick up a lot of dust. I’ve ignored most of the excitement, but I couldn’t ignore his latest post. He seems to have completely confused the differences between efficiency and scalability and has curious notions about the reasons for the importance of either.

Let’s review: efficiency is the ability to get something done while consuming few resources. Efficient code uses less memory, less IO bandwidth, and less CPU time to get the same job done. Scalability is a made up term used in industry to refer to the notion of software being able to handle growing levels of work gracefully, where gracefully generally translates to “costs grow no more than linearly with the size of the work”.
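A toy illustration of the two definitions, using invented measurements: efficiency is the cost per unit of work at a given load, while scalability is about whether that unit cost holds steady as the load grows. The function names, the sample numbers and the 10% tolerance are all assumptions made for the example.

```python
def unit_costs(samples):
    """samples: (work, cost) pairs. Returns cost per unit of work, smallest load first."""
    return [cost / work for work, cost in sorted(samples)]


def scales_gracefully(samples, tolerance=1.10):
    """True if unit cost never rises more than `tolerance` times its value at the smallest load."""
    per_unit = unit_costs(samples)
    return max(per_unit) <= per_unit[0] * tolerance


# Invented numbers: one system is cheap at small loads but its cost grows quadratically,
# the other is 50x more expensive per unit but stays linear.
lean_but_quadratic = [(100, 100), (1000, 10000), (10000, 1000000)]
heavy_but_linear = [(100, 5000), (1000, 50000), (10000, 500000)]

print(unit_costs(lean_but_quadratic), scales_gracefully(lean_but_quadratic))
print(unit_costs(heavy_but_linear), scales_gracefully(heavy_but_linear))
```

The first system is the more efficient one at the smallest load, and the second is the more scalable one. They are different properties, and at the largest load here the "inefficient" system ends up costing half as much.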

All of Joel’s arguments were about efficiency, not scalability. This is important because, particularly for web applications, it is well recognized that non-scalable solutions are really not that useful, so there is hardly any debate about that. Scalability is generally tied to algorithms and the infamous Big O notation, so it’s very hard to point at a programming language or other low-level component and say, “that’s not scalable”. You can find frameworks (sometimes libraries) and identify places where they fail to scale, but Joel declines to do so. His beef is with Ruby, so it is all about efficiency (and actually specifically CPU efficiency), despite all his statements about it being “scalability”.

Efficiency is an interesting point of contention. People tend to make a huge deal about it even when they shouldn’t, and ironically don’t tend to make a huge deal about it for the one reason they should. I’m surprised that Joel fell into the same trap.

First, I haven’t yet seen an implementation of Ruby that is particularly CPU efficient. I haven’t looked at memory consumption, but I’d be willing to guess it’s not so great there. Like most languages, it’s fine for IO efficiency.

So, right off the bat, if your application’s limiting factor is IO, there isn’t an efficiency disadvantage to using Ruby. Check around, and you’ll find there are a LOT of apps that are essentially IO bound. If your competition is using C and needs 10 servers, you could use even dog-slow Ruby and still only need 10 servers, not 100.
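The arithmetic behind that claim can be sketched as follows. The server count is set by whichever resource runs out first, so a 10x CPU penalty only matters once CPU becomes the bottleneck. All the load and capacity figures are invented for illustration.

```python
import math


def servers_needed(io_load, cpu_load, io_per_server, cpu_per_server):
    """Servers required when each one has a fixed IO and CPU capacity."""
    io_servers = math.ceil(io_load / io_per_server)
    cpu_servers = math.ceil(cpu_load / cpu_per_server)
    return max(io_servers, cpu_servers)


# IO-bound app: the IO load alone calls for 10 servers.
c_version = servers_needed(io_load=1000, cpu_load=5, io_per_server=100, cpu_per_server=10)
slow_version = servers_needed(io_load=1000, cpu_load=50, io_per_server=100, cpu_per_server=10)

# CPU-bound app: the same 10x CPU penalty now shows up in full.
c_cpu_bound = servers_needed(io_load=1000, cpu_load=100, io_per_server=100, cpu_per_server=10)
slow_cpu_bound = servers_needed(io_load=1000, cpu_load=1000, io_per_server=100, cpu_per_server=10)

print(c_version, slow_version, c_cpu_bound, slow_cpu_bound)
```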

Now, Joel claims you’ll inevitably run into some place where you are CPU bound. I’ve seen exceptions to this rule, but for the most part he’s right. There is always some performance hotspot that comes up that needs to be optimized. A lot of the time, even that hotspot can be addressed algorithmically, which means that you really don’t care about the language it’s implemented in. In the cases where it ultimately comes down to a language runtime’s CPU efficiency, the faulty logic here is that because this one part of your app can’t be implemented in language X, then you can’t use language X for the rest of your app. That’s just silly. You can always implement that hotspot in some other language, provided your language has some reasonably efficient way to hand off computation to code in another language (which Ruby seems to do reasonably well). If it is the difference between 10 and 100 servers, it is probably worth the development overhead to do it.
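As a small, hypothetical example of a hotspot that goes away algorithmically rather than by switching languages, compare two ways of checking a list for duplicates. Both function names are made up, and the second version assumes the items are hashable.

```python
def has_duplicates_quadratic(items):
    """Compare every pair: the number of comparisons grows with the square of the input."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Track what has been seen in a set: one pass over the input."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


data = list(range(2000)) + [1999]
print(has_duplicates_quadratic(data), has_duplicates_linear(data))
```

A faster runtime would speed up the first version by a constant factor, but on large inputs the second version wins in any language.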

Joel also pokes fun at “advocates singing hymns about developer cycles vs. CPU cycles”, which I found surprising as well. Sure, you have small parts of your application where CPU cycles are key, and it’s worth sacrificing developer cycles for that added efficiency, but generally for apps the bulk of your code is much more sensitive to developer efficiency, because developer efficiency translates to “more features that work better”. You can find evidence of this in almost every software paradigm: interpreters in embedded systems, languages like Lua bound to high performance C++ game engines in the gaming industry, web servers written in C calling PHP/Perl/Ruby/VBScript/Python/whatever which in turn invoke functions in highly tuned databases (and it’s worth pointing out that Yahoo and Google use PHP, Python and Java despite scaling their apps to literally thousands of servers), and desktop apps like Word whose core is carefully tuned C/C++ and assembler, but that use languages like Basic to implement a lot of their features. The biggest example is the web browser. Most web apps are implemented in XML and JavaScript (neither of which is about to set any efficiency records) that are executed by some very highly tuned browsers. So, there is considerable evidence that while you often need some expertise with a CPU efficient runtime, for almost any problem domain, what Joel calls an “inefficient” runtime is still useful and desirable.

Joel also makes some funny claims about duck typing affecting performance. Sure, it has an effect, but it is hardly the kind of thing that can’t be overcome. Yes, it makes type inferencing harder, but the key word there is harder, not impossible. Lots of folks have demonstrated how you can do really simple things like “hey, if self is of type X when I make this first call, it’s probably of type X when I make subsequent calls, and in fact, I can prove that it is always true until someone loads some more code into this image”. With a sufficiently clever runtime (which Ruby lacks at this time), you can and should be able to get to the point where you are no worse than half as CPU efficient as C code. Joel’s right on one key point though: Ruby lacks this at this time, and that is a concern, but the concern is one he fails to mention.
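The “it’s probably of type X on subsequent calls” trick is usually called an inline cache. Below is a toy sketch of the bookkeeping involved, written in Python purely for illustration. The `CallSite` class and its method names are invented, and a real runtime would do this in generated machine code rather than at the language level, so this shows the idea and not the speedup.

```python
class CallSite:
    """Toy monomorphic inline cache for a single method-call site."""

    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None
        self.cached_func = None
        self.hits = 0
        self.misses = 0

    def call(self, receiver, *args):
        receiver_type = type(receiver)
        if receiver_type is self.cached_type:
            # Same type as last time: reuse the method found earlier.
            self.hits += 1
        else:
            # First call, or a different type: do the full lookup and remember it.
            self.misses += 1
            self.cached_type = receiver_type
            self.cached_func = getattr(receiver_type, self.method_name)
        return self.cached_func(receiver, *args)

    def invalidate(self):
        """What a runtime would do when new code is loaded into the image."""
        self.cached_type = None
        self.cached_func = None


class Duck:
    def speak(self):
        return "quack"


class Robot:
    def speak(self):
        return "beep"


site = CallSite("speak")
results = [site.call(obj) for obj in (Duck(), Duck(), Duck(), Robot())]
print(results, site.hits, site.misses)
```

The `invalidate` method stands in for the “until someone loads some more code” caveat: the cached assumption holds only until the set of loaded classes changes.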

Read that last sentence again: after two paragraphs pointing out that efficiency isn’t really that important, now I’m saying it is. Isn’t life full of contradictions?

Efficiency *is* important because it is a fairly reasonable proxy for the maturity of a platform. There’s a funny little factoid about software: there is almost always a way to write code in a way that gives the runtime enough information to execute efficiently. When your runtime doesn’t do this, you have to ask the question: why?

Joel makes the argument that you should be able to get the overhead of a function call down to the level where it’s a single CALL instruction. First, a CALL instruction can be expensive, thanks to the wonder of cache misses. That aside, you can in fact get it down to where the overhead isn’t even a single instruction, thanks to the wonders of inlining. As Herb Sutter pointed out in his article “Inline Redux”, inlining is almost always possible, because there are so many places where you can do it (Java, which Joel suggests has poor performance, has runtimes that inline far more aggressively than most C/C++ runtimes). As I mentioned above, you can do tricks with type inferencing that get around the performance costs of late binding, except for when you are actually taking advantage of late binding’s benefits (in which case, as per Greenspun’s tenth rule, the late binding runtime probably performs better than most attempts to get equivalent capabilities using C/C++). The same can be said for automatic heap management through all kinds of tricks. You can get message dispatch or generic dispatch to perform like function dispatch for the cases where you only need the simpler functionality of the latter. Zero overhead bounds checking can be done by code analysis or in the worst case using page faults. Really, the list of performance optimizations available tends to trump the best efforts of language designers to make things slow. ;-)

So that brings us back to the key question: if your runtime isn’t that efficient, why? The answer is that nobody has put that kind of effort into making it that efficient. It just hasn’t been worth the effort yet, and that strongly suggests that the platform just isn’t that mature yet. If it were, that’d be one of the things that would have been addressed along with integration with legacy systems, sophisticated development and debugging tools, dealing with corner cases that could break the runtime, building out a complete set of support libraries, integration with various platforms and technologies, etc., etc. The bottom line is that if efficiency hasn’t been tackled to the point where you are within about half the efficiency of the ideal solution, some of those things haven’t been addressed. While efficiency might not matter to you, at least one of those other things probably will. Lack of CPU efficiency should be treated as a strong indicator that some other shortcoming that really matters to you might exist.

Now, there are important exceptions to consider. In particular, there are languages like Erlang, where the whole point is to deal with one very difficult domain efficiently (distributed computing/parallelism), and they actively encourage you to use another language (C) for more “regular” tasks. Even in those cases, you can expect that Erlang will show some lack of maturity if you try to use it for “regular” tasks, but you’re probably more than happy to use it for what it’s good at, and use something else for the rest.

Now, a runtime can be lacking in CPU efficiency in your application domain and still be useful for you. It could be your problem domain is all about other efficiencies, like IO or memory, and the runtime is great for that stuff (indeed, CPU and memory trade-offs in particular create cases where it really only matters if a runtime can be memory efficient). You might be at a startup where the maturity of your platform just isn’t as important as your ability to get something up and running before the company runs out of capital, and some degree of risk due to platform immaturity is acceptable. You might be working on a problem that is so complex, and your resources are so constrained, that just getting something that works is such a victory that nobody cares about how efficient it is, how well it integrates with other technologies, etc. Fine, but most people benefit from the advantages of working with a mature platform, and as such, efficiency is a very good proxy for the far more difficult to quantify property of maturity.

So yes, I’d agree that efficiency is important, but in none of the ways that Joel suggests.