Ruby on... Gemstone?
Really, when you think about it, how can a company called Gemstone NOT get involved with a language called Ruby? So, Gemstone, of Gemstone and GLASS fame, have apparently decided to get the traditionally lackadaisical Ruby runtime running on their VM. From the first time I dabbled with Ruby it seemed like “file-based Smalltalk with some ugly Perl-isms and a crappy VM” (and yes, in fairness, the ugly Perl-isms are also part of its strength), so this makes a lot of sense, and may yet drag Ruby into the real world. Gemstone gets bonus points for providing yet another example of confusing efficiency with scalability.
BTW: Mike came up with a great acronym for Gemstone to use: GLARE: “Gemstone Linux Apache and Ruby Emulation”.
UPDATE: Avi caught me red-handed for not reading the entire interview. Upon further reading of the interview and Avi’s excellent blog posting comparing Gemstone to Rails, it appears the Gemstone folks are very much talking about scalability as opposed to efficiency. In fact, it seems they are expecting the primary advantage of MagLev to come from Gemstone’s persistence architecture (here’s hoping it is also a lot more efficient).
Ruby Going Off The Rails And In To A Deep, Gaping Chasm
Rants can be fun. There is the garden variety rant where someone just blows off some steam. Then there is the next level up where you try to turn indignation (real or otherwise) into an art form. However, the truly great rants are those built upon white-hot anger that’s been quietly simmering, with new spices of hatred periodically added to taste, all unleashed in a vengeful strike of unfiltered fury so over the top it crosses over into self-parody.
Honestly, I can’t say that reading it is particularly enlightening, but it is quite entertaining in a “damn that is the most awesome accident I’ve ever seen” kind of way.
Ruby on Rails Memory Efficiency
Okay, I said I suspected that Ruby wasn’t terribly memory efficient yesterday. Today I have some strong indications that at least Typo isn’t quite so memory efficient.
I had noticed that my blog was getting quite slow at handling updates, and it wasn’t clear to me why. I am particularly sensitive to efficiency issues, because I run this whole thing inside a User Mode Linux instance, which imposes its own inefficiencies on pretty much all aspects of the Linux kernel. So I generally have to be fairly careful to tune everything I run inside it to minimize kernel involvement. I was all ready to blame UML again until I looked at what was going on with the instance: it was swapping like crazy.
Some peeking around in the system showed working set sizes for my Ruby FastCGI processes in the 35MB-50MB range (very bad considering my instance only has 64MB of RAM allocated for it). That’s the kind of footprint that makes you start to think of J2SE/EE! To get an idea of how inefficient that is, Postgres’s working set size for handling this puny blog is about 4-5MB.
So far, I’ve been able to get back to some kind of decent performance by trimming my FastCGI process count down to one (which has some unfortunate side effects, but none as unfortunate as having 20-30MB of memory constantly swapping).
This piques my curiosity as to what’s going on under the covers, and how much of the overhead is Ruby, how much is Rails, and how much is Typo.
On Efficiency, Scalability, and the Wisdom to Know the Difference
Joel Spolsky has been on a tear lately. He’s managed to really kick up a lot of dust. I’ve ignored most of the excitement, but I couldn’t ignore his latest post. He seems to have completely confused the differences between efficiency and scalability and has curious notions about the reasons for the importance of either.
Let’s review: efficiency is the ability to get something done while consuming few resources. Efficient code uses less memory, less IO bandwidth, and less CPU time to get the same job done. Scalability is a made up term used in industry to refer to the notion of software being able to handle growing levels of work gracefully, where gracefully generally translates to “costs grow no more than linearly with the size of the work”.
All of Joel’s arguments were about efficiency, not scalability. This is important because particularly for web applications, it is well recognized that non-scalable solutions are really not that useful, so there is hardly any debate about that. Scalability is generally tied to algorithms and the infamous Big O notation, so it’s very hard to point at a programming language or other low-level component and say, “that’s not scalable”. You can find frameworks (sometimes libraries) and identify places where they fail to scale, but Joel declines to do so. His beef is with Ruby, so it is all about efficiency (and actually specifically CPU efficiency), despite all his statements about it being “scalability”.
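To make the distinction concrete, here is a minimal sketch of the same job done two ways. The method names are made up for illustration and have nothing to do with Joel's post. Both versions answer "does this list contain a duplicate?", and both could be written in any language; what differs is how the cost grows with the size of the list.

```ruby
require 'set'

# Compares every pair of items, so the work grows with the square of the
# list size. A faster runtime only shifts the curve, it does not change it.
def duplicates_quadratic?(items)
  items.each_with_index.any? do |item, i|
    items[(i + 1)..-1].any? { |other| other == item }
  end
end

# One pass with a hash-based set, so the work grows roughly linearly.
def duplicates_linear?(items)
  seen = Set.new
  # Set#add? returns nil when the item was already present.
  items.any? { |item| !seen.add?(item) }
end
```

The first version fails to scale no matter how CPU efficient the language is, and the second scales fine even on a slow runtime, which is why it is hard to pin "not scalable" on a language.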
Efficiency is an interesting point of contention. People tend to make a huge deal about it even when they shouldn’t, and ironically don’t tend to make a huge deal about it for the one reason they should. I’m surprised that Joel fell into the same trap.
First, I haven’t yet seen an implementation of Ruby that is particularly CPU efficient. I haven’t looked at memory consumption, but I’d be willing to guess it’s not so great there. Like most languages, it’s fine for IO efficiency.
So, right off the bat, if your application’s limiting factor is IO, there isn’t an efficiency disadvantage to using Ruby. Check around, and you’ll find there are a LOT of apps that are essentially IO bound. If your competition is using C and needs 10 servers, you could use even dog-slow Ruby and still only need 10 servers, not 100.
Now, Joel claims you’ll inevitably run into some place where you are CPU bound. I’ve seen exceptions to this rule, but for the most part he’s right. There is always some performance hot spot that comes up that needs to be optimized. A lot of the time, even that hot spot can be addressed algorithmically, which means that you really don’t care about the language it’s implemented in. In the cases where it ultimately comes down to a language runtime’s CPU efficiency, the faulty logic here is that because this one part of your app can’t be implemented in language X, then you can’t use language X for the rest of your app. That’s just silly. You can always implement that hot spot in some other language, provided your language has some reasonably efficient way to hand off computation to code in another language (which Ruby seems to do reasonably well). If it is the difference between 10 and 100 servers, it is probably worth the development overhead to do it.
Joel also pokes fun at “advocates singing hymns about developer cycles vs. CPU cycles”, which I found surprising as well. Sure, you have small parts of your application where CPU cycles are key, and it’s worth sacrificing developer cycles for that added efficiency, but generally for apps the bulk of your code is much more sensitive to developer efficiency, because developer efficiency translates to “more features that work better”. You can find evidence of this in almost every software paradigm: interpreters in embedded systems, languages like Lua bound to high performance C++ game engines in the gaming industry, web servers written in C calling PHP/Perl/Ruby/VBScript/Python/whatever which in turn invoke functions in highly tuned databases (and it’s worth pointing out that Yahoo and Google use PHP, Python and Java despite scaling their apps to literally thousands of servers), and desktop apps like Word whose core is carefully tuned C/C++ and assembler, but that use languages like Basic to implement a lot of their features. The biggest example is the web browser. Most web apps are implemented in XML and Javascript (neither of which are about to set any efficiency records) that are executed by some very highly tuned browsers. So, there is considerable evidence that while you often need some expertise with a CPU efficient runtime, for almost any problem domain, what Joel calls an “inefficient” runtime is still useful and desirable.
Joel also makes some funny claims about duck typing affecting performance. Sure, it has an effect, but it is hardly the kind of thing that can’t be overcome. Yes, it makes type inferencing harder, but the key word there is harder, not impossible. Lots of folks have demonstrated how you can do really simple things like “hey, if self is of type X when I make this first call, it’s probably of type X when I make subsequent calls, and in fact, I can prove that it is always true until someone loads some more code into this image”. With a sufficiently clever runtime (which Ruby lacks at this time), you can and should be able to get to the point where you are no worse than half as CPU efficient as C code. Joel’s right on one key point though: Ruby lacks this at this time, and that is a concern, but the concern is one he fails to mention.
Read that last sentence again: after two paragraphs pointing out that efficiency isn’t really that important, now I’m saying it is. Isn’t life full of contradictions?
Efficiency *is* important because it is a fairly reasonable proxy for the maturity of a platform. There’s a funny little factoid about software: there is almost always a way to write code in a way that gives the runtime enough information to execute efficiently. When your runtime doesn’t do this, you have to ask the question: why?
Joel makes the argument that you should be able to get the overhead of a function call down to the level where it’s a single CALL instruction. First, a CALL instruction can be expensive, thanks to the wonder of cache misses. That aside, you can in fact get it down to where the overhead isn’t even a single instruction, thanks to the wonders of inlining. As Herb Sutter pointed out in his article “Inline Redux”, inlining is almost always possible, because there are so many places where you can do it (Java, which Joel suggests has poor performance, has runtimes that inline far more aggressively than most C/C++ runtimes). As I mentioned above you can do tricks with type inferencing that get around the performance costs of late binding, except for when you are actually taking advantage of late binding’s benefits (in which case, as per Greenspun’s 10th law, the late binding runtime probably performs better than most attempts to get equivalent capabilities using C/C++). The same can be said for automatic heap management through all kinds of tricks. You can get message dispatch or generic dispatch to perform like function dispatch for the cases where you only need the simpler functionality of the latter. Zero overhead bounds checking can be done by code analysis or in the worst case using page faults. Really, the list of performance optimizations available tends to trump the best efforts of language designers to make things slow. ;-)
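The “if self is of type X on the first call, it’s probably of type X on later calls” trick is usually called an inline cache. Here is a toy model of one, written in Ruby only so the idea is visible. The CallSite class and its lookups counter are hypothetical names for illustration; a real runtime does this in generated machine code, and this sketch will not make anything faster. It also ignores singleton methods and never invalidates the cache when new code is loaded, which a real implementation has to handle.

```ruby
# A toy call site with a one-entry (monomorphic) inline cache.
class CallSite
  attr_reader :lookups

  def initialize(method_name)
    @method_name = method_name
    @cached_class = nil
    @cached_method = nil
    @lookups = 0 # counts how often the slow path ran
  end

  def call(receiver, *args)
    unless receiver.class.equal?(@cached_class)
      # Cache miss: do the full method lookup and remember the result.
      @cached_class = receiver.class
      @cached_method = receiver.class.instance_method(@method_name)
      @lookups += 1
    end
    # Hit path: reuse the method that was found last time.
    @cached_method.bind(receiver).call(*args)
  end
end

site = CallSite.new(:upcase)
site.call('first')  # slow path, caches String#upcase
site.call('second') # same class, so no new lookup
```

As long as the receiver keeps the same class, the lookup cost is paid once per call site instead of once per call.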
So that brings us back to the key question: if your runtime isn’t that efficient, why? The answer is that nobody has put that kind of effort into making it that efficient. It just hasn’t been worth the effort yet, and that strongly suggests that the platform just isn’t that mature yet. If it were, that’d be one of the things that would have been addressed along with integration with legacy systems, sophisticated development and debugging tools, dealing with corner cases that could break the runtime, building out a complete set of support libraries, integration with various platforms and technologies, etc., etc. The bottom line is that if efficiency hasn’t been tackled to the point where you are within about half the efficiency of the ideal solution, some of those things haven’t been addressed. While efficiency might not matter to you, at least one of those other things probably will. Lack of CPU efficiency should be treated as a strong indicator that some other shortcoming that really matters to you might exist.
Now there are important exceptions to this to consider, in particular there are languages like Erlang, where the whole point is to deal with one very difficult domain efficiently (distributed computing/parallelism), and they actively encourage you to use another language (C) for more “regular” tasks. Even in those cases, you can expect that Erlang will show some lack of maturity if you try to use it for “regular” tasks, but you’re probably more than happy to use it for what it’s good at, and use something else for the rest.
Now, a runtime can be lacking in CPU efficiency in your application domain and still be useful for you. It could be your problem domain is all about other efficiencies, like IO or memory, and the runtime is great for that stuff (indeed, CPU and memory trade-offs in particular create cases where it really only matters if a runtime can be memory efficient). You might be at a startup where the maturity of your platform just isn’t as important as your ability to get something up and running before the company runs out of capital, and some degree of risk due to platform immaturity is acceptable. You might be working on a problem that is so complex, and your resources are so constrained, that just getting something that works is such a victory that nobody cares about how efficient it is, how well it integrates with other technologies, etc. Fine, but most people benefit from the advantages of working with a mature platform, and as such, efficiency is a very good proxy for the far more difficult to quantify property of maturity.
So yes, I’d agree that efficiency is important, but in none of the ways that Joel suggests.
Ruby On Nails Scratching a Chalkboard
So, as I explore how Ruby works, I’m discovering some bits of ugliness. Its syntax is increasingly reminding me more of Perl than Smalltalk. A case in point: blocks.
I’d heard so much about Ruby’s Smalltalkishness that I was a bit taken aback when I saw control statements in the language grammar. In Smalltalk, control flow is managed using methods and blocks, and I knew Ruby had blocks (this is one of the things that you hear so much about in Beyond Java), so why did they need these control statements? In Smalltalk, control flow looks like this:
    1 + 1 = 2
        ifTrue: ['it is true']
        ifFalse: ['it is false']

Now, I can’t claim that this provides any real productivity boost over Ruby’s approach:
    if (1 + 1 == 2) then
      'it is true'
    else
      'it is false'
    end

But I was kind of surprised, given Ruby’s ties to Smalltalk, that someone hadn’t hacked it in. So, I went about hacking it in myself. That’s when I found out why.
It turns out that blocks in Ruby have a very high level of syntactic sugariness. Not only do they have their own special literal form (which is a key advantage over say Java’s Inner Classes, or C++ functors without boost::lambda), but they also have their own special status which really makes them non-objects. (I found it amusing to discover that the most non-object entity in Ruby is a block).
Here’s the magic: blocks aren’t passed as normal parameters to functions. They are passed through an implicit variable (showcasing Ruby’s Perlishness here). So, if, for example, I wanted to add something like Smalltalk’s ifTrue: to Ruby, I’d do the following:
    class TrueClass
      # true runs the block it was handed
      def ifTrue
        yield
      end
    end

    class FalseClass
      # false silently ignores the block
      def ifTrue
      end
    end

    (1 + 1 == 2).ifTrue { puts 'Math works' }
    (1 + 3 == 2).ifTrue { puts 'Math is broken' }

Notice that ifTrue doesn’t appear to take any parameters, and neither does the “yield” method. In reality, the block is an implicit parameter. One Ruby tutorial claimed this is a good thing, because it means that all Ruby methods can take a block as a parameter… even if they don’t use it. Me, I’m a big fan of explicitness, but I can see that in a scripting world, sometimes these kinds of shortcuts are nice to have. What’s bad about this is that not only does it mean that all Ruby methods can take a block as a parameter, it also means all Ruby methods can only take exactly one block as a parameter, and it has to be the last one.
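Here is a small sketch of what “every method can take a block, even if it doesn’t use it” looks like in practice. The two method names are invented for illustration; block_given? and yield are standard Ruby.

```ruby
# This method never mentions a block, yet callers may still pass one.
# The block is accepted and silently dropped.
def ignores_block
  :done
end

# block_given? is the only way this method can tell whether the
# implicit block parameter was supplied.
def reports_block
  block_given? ? yield : :no_block
end

ignores_block { raise 'this block is never run' }
reports_block          # no block was passed
reports_block { 42 }   # the block runs and its value is returned
```

Nothing in either method signature tells you whether a block is expected, which is the explicitness I miss.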
Now, it turns out that Ruby has a wrapper around blocks called Proc, which lets you treat a block like a real object. Of course, it has all the syntactic beauty of Java’s Inner Classes. Here’s how you can do ifTrueIfFalse in Ruby:
    class TrueClass
      def ifTrueIfFalse(trueProc, falseProc)
        trueProc.call
      end
    end

    class FalseClass
      def ifTrueIfFalse(trueProc, falseProc)
        falseProc.call
      end
    end

    (1 + 1 == 2).ifTrueIfFalse(Proc.new { puts 'Math works' }, Proc.new { puts 'Math is broken' })

But wait! There’s more! Since Procs are proper objects, you can query them for meta-information, which is really handy for various dynamic programming tricks. Only… Ruby’s interface is kind of weird. Procs have this method “arity” which tells you how many arguments the block takes… sort of. For reasons passing understanding, if a block takes zero arguments, the function returns “-1” instead of “0”, and if it takes 1 argument, it returns “-2” instead of “1”. So, now we’ve established that it can never return 0 or 1, and that you can’t always use the return value as a collection size for your argument list. Here’s where it gets really crazy though: if your function takes a variable argument list with its last parameter, arity returns “0 - # of args”. So, quick question for you: if arity returns back -2, does that mean its argument list is one argument long, or that it takes one argument followed by a variable list of arguments? I’m not sure how Bruce Tate can claim that Ruby doesn’t have some weird anachronisms that get in the way of doing metaprogramming with a straight face.
In fairness, the case where you want to pass a single block as your last argument seems like the common case, and Ruby is a scripting language after all. I’m mostly annoyed because I’ve heard so many people talk about Ruby’s elegance, comparing it favourably with Smalltalk (which admittedly is not entirely without warts). Upon inspection it seems to have warts just like other languages (well, some languages have a few more warts than others). Still, there is hope. Ruby does seem to have some genuinely nice features, and it is open source, so there is always the possibility that some of these idiosyncrasies will get cleaned up in the future.
UPDATE: So, someone with some real Ruby experience has clarified for me that nobody actually does “Proc.new” in Ruby. Instead they use lambda. So, invoking my ifTrueIfFalse method would normally be done like so:

    (1 + 1 == 2).ifTrueIfFalse(lambda { 'Math Works' }, lambda { "Math Doesn't Work" })

Which I have to admit does seem a lot prettier for some reason.
ANOTHER UPDATE: I’ve gotten some great comments to this article, and I thought I should incorporate their content. First, people have suggested that you can break up ifTrueIfFalse into two calls that are chained together, and then get back some of the elegance. I thought about this when I first looked into it, but you lose the ability to pick up a return object cleanly.
Antti Tarvainen provided some excellent points. In particular he clarified the difference between a Proc that takes no arguments (arity returns 0) and a Proc that doesn’t define any arguments (arity returns -1). Furthermore, arity has been updated for Ruby 1.9 to what seems like a more sensible behavior. I noticed that even in 1.8,

    puts lambda {|a|}.arity

prints 1, which suggests the Ruby documentation is a wee bit out of date.
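For reference, here is a quick sketch of what arity reports under the 1.9-and-later rules: a non-negative number means exactly that many required arguments, and a negative number n means -(n + 1) required arguments plus optional ones. The ARITY_EXAMPLES constant and its labels are just names I made up for the listing.

```ruby
# Arity values as reported by Ruby 1.9 and later.
ARITY_EXAMPLES = {
  'no parameters'           => lambda { }.arity,           # 0
  'one required'            => lambda { |a| }.arity,       # 1
  'two required'            => lambda { |a, b| }.arity,    # 2
  'only a splat'            => lambda { |*rest| }.arity,   # -1
  'one required plus splat' => lambda { |a, *rest| }.arity # -2
}

ARITY_EXAMPLES.each { |label, arity| puts "#{label}: #{arity}" }
```

So under these rules -2 is no longer ambiguous: it always means one required argument followed by optional ones. It is still one integer doing two jobs, though.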
I still think it’d be far more sensible to not overload the arity method and instead have numArgs? which gets you the number of required arguments, hasOptional? which gets you back a boolean as to whether there are optional arguments, and argsDefined? which gets you back a boolean as to whether the Proc has defined arguments at all. Overloading the meaning of the return value just results in more code that needs to check for special cases and cases where you can’t actually know which of two states is correct.
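Something close to that split can be sketched on top of Proc#parameters, which exists in Ruby 1.9.2 and later. The ProcIntrospection module and its method names are hypothetical, not part of Ruby, and the sketch assumes it is handed lambdas: plain procs report every positional parameter as optional, so the counts would come out differently for them. It leaves out argsDefined? entirely.

```ruby
# Hypothetical helpers that separate the questions arity answers with one number.
module ProcIntrospection
  # How many arguments the caller must supply.
  def self.required_count(callable)
    callable.parameters.count { |kind, _name| kind == :req }
  end

  # Whether the callable accepts optional or variable-length arguments.
  def self.optional?(callable)
    callable.parameters.any? { |kind, _name| kind == :opt || kind == :rest }
  end
end

one_plus_rest = lambda { |a, *rest| }
ProcIntrospection.required_count(one_plus_rest) # one required argument
ProcIntrospection.optional?(one_plus_rest)      # and a variable list after it
```

Each question gets its own answer, so calling code does not have to decode a sign.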
Also, there seems to be confusion about my point in comparing it to Smalltalk’s ifTrue:ifFalse:. Of course one should use Ruby idioms when doing Ruby. The ifTrue:ifFalse: example is just a simple and well understood example of having more than one block in your parameter list. I will say that there is a certain kind of semantic elegance that comes from having all your control flow done through methods and objects. Ruby advocates always say that in Ruby “everything is an object”, but it appears that blocks and control flow expressions are not, and in this regard Ruby doesn’t quite live up to expectations set by Smalltalk and LISP.
My First Ruby Program
I realized that earlier today I wrote my first Ruby program, and it’s probably worth documenting this moment for posterity.
It’s a trivial bit of code:
    require 'postgres'

    # Tables whose id sequences need to be brought back in sync
    sequences = [
      'blacklist_patterns', 'blogs', 'categories',
      'contents', 'page_caches', 'pings',
      'redirects', 'resources', 'sessions',
      'sidebars', 'tags', 'text_filters',
      'triggers', 'users'
    ]

    # Point the table's id sequence at the largest id already in use
    def fixSequence(db,tableName)
      results = db.query("select max(id) from #{tableName}")
      max_id = results.first.first
      # max_id is nil for an empty table, in which case there is nothing to fix
      if max_id then
        db.exec("select setval('#{tableName}_id_seq'::text,#{max_id});").clear
      end
    end

    db = PGconn.connect("localhost", 5432)
    sequences.each {|sequence| fixSequence(db,sequence)}
    db.close()

So far, my first impression is that Ruby tries to be like Smalltalk, but is Perlish enough to fall short IMHO. Of course, I hardly know the language yet, so there may be a more elegant way to do things that I have yet to uncover. In particular, I’m wondering why the True and False classes don’t have “ifTrue:ifFalse” type methods that take blocks as arguments. Seems like an obvious “nice to have”. IIRC it’s possible to add methods to existing classes, so maybe I can do this to keep the Smalltalk cravings to a minimum.
Migrating to Postgresql
Yesterday, the blog got an error. Surprise, surprise, my concerns about SQLite seemed to be realized, as tracing through the error it seemed the database had gotten trapped in this locked state. Now, I’m sure if I had half a clue about SQLite or Rails I might have been able to figure this out and prevent it from happening again, but I know zip about the former and am very much in the early stages of learning about the latter. So, on the “least effort” principle, I decided to carry out that migration to Postgresql I had been wanting to do anyway.
So, the first thing I had to do was figure out how to migrate the database over. There is probably a poorly documented way to do this through Typo or Rails, but since I wasn’t familiar with either, I just did it the ol’ database developer way. Fortunately, sqlite has a “.dump” command that seems to dump out the entire database as a series of SQL commands. I figured that the commands for creating a schema were not exactly what one should do for postgres, so I stripped them from the dump.
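For what it’s worth, the stripping step could be scripted along these lines. This is only a sketch of the idea, not what I actually ran: the method name is invented, and it assumes every INSERT statement in the dump sits on a single line, which will not hold if any stored text contains embedded newlines.

```ruby
# Keep only the data-loading statements from the text of a SQLite ".dump",
# dropping CREATE TABLE, transaction markers and anything else.
# Assumption: one INSERT statement per line.
def data_statements(dump_text)
  dump_text.each_line.select { |line| line.start_with?('INSERT INTO') }
end

sample_dump = <<~SQL
  BEGIN TRANSACTION;
  CREATE TABLE users (id INTEGER PRIMARY KEY, login VARCHAR(40));
  INSERT INTO "users" VALUES(1,'admin');
  COMMIT;
SQL

puts data_statements(sample_dump)
```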
Next, I created a database and user in postgres for the blog, and updated the Typo database.yml to point to postgres. I then ran the schema script for postgres. All was looking good… then I ran the dump script to pull the SQLite data into postgres. It turns out that the schema script sets up more than the table schemas: it also injects some data. This meant I had some duplicate records getting injected which violated various primary key constraints in the DB. Fortunately, everything got rolled back and I was easily able to remove the offending records from the database. With that the migration of the data was complete.
That’s when the real fun started. I restarted lighttpd, and everything came up until Rails had a cache miss and had to render a page. I got errors like this in my log: “MissingSourceFile (no such file to load – postgres):”. It became abundantly clear that the typo gem that had insisted that SQLite be installed had totally ignored Postgresql. :-(
My first instinct was to install the Postgresql driver using emerge, so I did that. This appears to have been a bad idea. This installed the driver as “ruby-postgres”, and as you can see from the log message, Typo was looking for it in “postgres”. So I switched to installing through gems. That seemed to put things in the right place, but then I did something stupid: I unmerged the gentoo install. Little did I know it, but Gentoo had successfully installed the C shared library that the Ruby driver uses in the right place. So, when I unmerged, this file was removed and now the postgres gem was without its shared library.
This took me more time than I care to admit to understand, but it was a simple matter to rebuild the library, and then things seemed to be working. Except…
Somehow, the “sessions” table got an error as soon as I tried to log in. Tracing it back through the logs and postgres, the problem seemed to be that the import process had been setting ids explicitly, rather than letting them be set implicitly by the sequences. This is probably a good thing, as otherwise any foreign keys would be messed up, but it meant that I had to fix the sequences so that they weren’t returning ids that had already been assigned to records. A few invocations of setval() later, and I was good.
Now the blog is running 100% on Postgres. I’m still trying to make up my mind as to whether it’s worth trying to migrate over the old content from 360, but at this point I’m ready to start using this thing for real.
Observations in the first 10 minutes
So, my first observation is that Google’s AdSense crawler appears to not be getting the joke. So far it has selected ads with titles like “Get Paid To Create Hubs” and “Blog Advertising”.
I guess the ad selection isn’t entirely stupid, as I can only imagine how many first time bloggers expect to make millions out of the public’s intense interest in their thoughts (because in blog world, unlike reality, everyone is interesting).
The other observations I have are:
- I’m a total novice at this blog thing
- Blogs appear to have evolved into something way more complicated than actually makes sense to me
- I am definitely going to be using Postgres going forward (SQLite is different enough to be annoying without providing any benefits that I can determine)
- Typo is currently set up to do far too much logging
- Ruby gems brings back frightening memories of CPAN
- Given how often I’ve heard Rails folks rail against the many frameworks of Java, it’s amusing to see just how many such frameworks are used by Rails
- Despite all of its other features, it’s not immediately obvious how to add spelling and grammar checking to Typo. You’d think this would be a priority (well, maybe only if your spelling and grammar is as bad as mine).