This is the second post in my pre-RailsConf series on Rails application performance optimization. I'll be presenting on this and many other aspects of performance optimization at RailsConf on May 6th, so you're welcome to join me there. Please also watch this blog for more performance-related articles.
See also the previous article in this series on improving Date class performance.
We have seen lots of synthetic benchmarks proving that the alternative Ruby implementations (namely the only two finished ones, JRuby and Ruby 1.9) are faster than good old 1.8, aka MRI. But so far nobody has ported a production application to prove that they are faster under real-world conditions. We did.
We ported Acunote, our online enterprise project management and Scrum software, to both JRuby and Ruby 1.9 and ran our set of performance benchmarks. We have 30 benchmarks that check the server response time for the most typical user requests. We normally use them to check for performance regressions; this time we used them to compare Acunote's performance on the alternative Rubies.
Acunote has four common request types: those that primarily involve date/time operations, rendering, numerical calculations, and ActiveRecord/database operations.
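A benchmark of this kind can be sketched in a few lines of Ruby. This is only a minimal illustration, not Acunote's actual harness: the helper names `measure_request` and `regression?`, the choice of taking the fastest of several runs, and the 10% tolerance are all assumptions made for the example.

```ruby
require "benchmark"

# Hypothetical helper: call the block a few times to warm up, then time
# several measured runs and return the fastest one, in seconds.
def measure_request(warmups: 1, runs: 5)
  warmups.times { yield }                                  # warmup results are discarded
  timings = Array.new(runs) { Benchmark.realtime { yield } }
  timings.min
end

# Hypothetical regression check: true when the measured time exceeds the
# recorded baseline by more than the tolerance (10% by default, an assumption).
def regression?(measured, baseline, tolerance: 0.10)
  measured > baseline * (1 + tolerance)
end

# Stand-in workload; a real harness would issue a request to the application here.
elapsed = measure_request(warmups: 1, runs: 3) { 1_000.times.map { |i| i * i } }
puts format("fastest run: %.4f sec", elapsed)
puts "regression!" if regression?(elapsed, 1.0)
```

Taking the fastest run is one common way to reduce noise; a mean or median would serve as well if the runs are stable.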
Here are the timings for four requests, one for each type:
| Acunote Benchmark | Ruby 1.8.6 Request Time, sec | JRuby 1.2.0 Request Time, sec | Ruby 1.9.1 Request Time, sec | JRuby Improvement | Ruby 1.9.1 Improvement |
|---|---|---|---|---|---|
| Date/Time Intensive | 1.23 | 0.58 | 0.53 | 2.1x | 2.3x |
| Rendering Intensive | 0.61 | 0.44 | 0.30 | 1.4x | 2.0x |
| Calculations Intensive | 2.57 | 1.79 | 1.33 | 1.4x | 1.9x |
| Database Intensive | 5.58 | 4.63 | 3.29 | 1.2x | 1.7x |
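The improvement columns are simply the 1.8.6 request time divided by the request time on the other interpreter. A short sketch of that arithmetic, using the numbers from the table (the `TIMINGS` hash and `speedup` helper are names introduced here for illustration):

```ruby
# Request times in seconds, copied from the table above.
TIMINGS = {
  "Date/Time Intensive"    => { mri: 1.23, jruby: 0.58, ruby19: 0.53 },
  "Rendering Intensive"    => { mri: 0.61, jruby: 0.44, ruby19: 0.30 },
  "Calculations Intensive" => { mri: 2.57, jruby: 1.79, ruby19: 1.33 },
  "Database Intensive"     => { mri: 5.58, jruby: 4.63, ruby19: 3.29 }
}

# Improvement factor: how many times faster than the 1.8.6 baseline.
def speedup(baseline, time)
  (baseline / time).round(1)
end

TIMINGS.each do |name, t|
  puts format("%-24s JRuby %.1fx, Ruby 1.9.1 %.1fx",
              name, speedup(t[:mri], t[:jruby]), speedup(t[:mri], t[:ruby19]))
end
```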
Here are those timings expanded with rendering and database time:
| Acunote Benchmark | Ruby 1.8.6: Request / Rendering / DB Time, msec | JRuby 1.2.0: Request / Rendering / DB Time, msec | Ruby 1.9.1: Request / Rendering / DB Time, msec |
|---|---|---|---|
| Date/Time Intensive | 1230 / 0 / 65 | 580 / 1 / 45 | 530 / 1 / 37 |
| Rendering Intensive | 610 / 328 / 33 | 440 / 273 / 45 | 300 / 197 / 37 |
| Calculations Intensive | 2570 / 380 / 38 | 1790 / 215 / 54 | 1330 / 204 / 39 |
| Database Intensive | 5580 / 529 / 700 | 4630 / 209 / 1321 | 3290 / 221 / 685 |
As you can see from the tables, Ruby 1.9 is roughly 2x faster than 1.8 (1.7x to 2.3x across these four requests). But the real-world improvements are even better. Ruby 1.9 eliminates performance bottlenecks that were present in 1.8, e.g. the Date class, numerical and string computations, and template rendering. It lets us write more complex systems in Ruby than were ever possible before. In the immortal words of Avi Bryant, it lets us add a whole 'nother level of turtles.
The performance improvement from JRuby is not as substantial, but it is still about 1.5x faster than 1.8 (1.2x to 2.1x across these requests). Among the reasons JRuby is slower than 1.9 are the PostgreSQL JDBC driver and the "fast" mode, which did not work for us.
Want to know why 1.9 and JRuby are so much faster? Are there any other improvements and regressions with the alternative Rubies? What happened to garbage collection performance? To learn the answers, join me at my RailsConf session or watch for the more detailed analysis in this blog.
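To see where the database driver shows up, one can subtract rendering and DB time from the total request time in the second table. The sketch below assumes the three reported numbers do not overlap (that is, rendering time excludes DB time), which the post does not state; `BREAKDOWN` and `remainder` are names made up for this example.

```ruby
# [request, rendering, db] times in msec, copied from the second table.
BREAKDOWN = {
  "Date/Time Intensive"    => { mri: [1230, 0, 65],    jruby: [580, 1, 45],     ruby19: [530, 1, 37] },
  "Rendering Intensive"    => { mri: [610, 328, 33],   jruby: [440, 273, 45],   ruby19: [300, 197, 37] },
  "Calculations Intensive" => { mri: [2570, 380, 38],  jruby: [1790, 215, 54],  ruby19: [1330, 204, 39] },
  "Database Intensive"     => { mri: [5580, 529, 700], jruby: [4630, 209, 1321], ruby19: [3290, 221, 685] }
}

# Time left over once rendering and DB time are subtracted from the request
# time (assumes the three measurements do not overlap).
def remainder(request, rendering, db)
  request - rendering - db
end

BREAKDOWN.each do |name, runs|
  parts = runs.map { |impl, times| "#{impl}=#{remainder(*times)}" }
  puts "#{name}: #{parts.join(', ')} msec outside rendering and DB"
end
```

For the database-intensive request, the DB time itself is 1321 msec on JRuby against 700 msec on 1.8.6 and 685 msec on 1.9.1, which is consistent with the driver being one of the bottlenecks.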
Spelling error: We saw lots of syntetic[sic] benchmarks proving...
should be: We saw lots of synthetic benchmarks proving...
Posted by: Steven | May 01, 2009 at 02:37 PM
Thanks, fixed
Posted by: Alexander Dymo | May 01, 2009 at 02:48 PM
Are your JRuby benchmarks using the default or the 1.9 mode? I'm curious because JRuby is much faster in 1.9 mode when it doesn't have to perform 1.8 backwards compatibility checks.
Also, JRuby 1.3RC1 was released today, which includes yet another round of optimizations.
Posted by: Ivar | May 01, 2009 at 03:05 PM
Great! It's really interesting and somewhat refreshing to see performance numbers for a real application.
What was your test preparation process (app and db restarts/warmups)? And what was the testbed setup (separate app/db machines?, etc...)?
Posted by: Delano Mandelbaum | May 01, 2009 at 03:11 PM
Did you run JRuby with --server? It's a lot faster.
These benchmarks are interesting too:
http://letsgetdugg.com/2009/04/28/ruby-scaling-up-to-multiple-cpus/
Posted by: david | May 01, 2009 at 03:32 PM
Hi Alexander,
Have you had a chance to try Acunote on our CrossTwine Linker interpreters? No porting should be required (they are drop-in replacements for the Ruby versions they are based on), and they should provide further performance enhancements:
http://crosstwine.com/linker/ruby.html
I, for one, would be very happy to hear about your experience!
Posted by: Damien Diederen | May 01, 2009 at 04:18 PM
I'll echo others who wanted to make sure you ran with --server, and I'd like to make doubly sure you're running Java 6. In general, we should be faster than 1.9. Rails is a tricky thing to run fast, but if 1.9 does well, we should at least be able to match it. If we don't, something's broken.
If you'd like to see it run even faster on JRuby, stop by #jruby on FreeNode and we'll fix you up.
Posted by: Charles Oliver Nutter | May 01, 2009 at 04:19 PM
Ivar: thanks for the 1.9 mode hint, will try. Will also try 1.3RC1.
Posted by: Alexander Dymo | May 01, 2009 at 05:13 PM
Delano: Our test process is this:
The app runs in the mode that closely resembles production.
For Ruby 1.8 and 1.9 we execute 1 warmup request (more isn't necessary, because we only need to warm up the database and load all the Ruby code).
JRuby runs with JIT enabled and with JIT threshold 0. Experiments revealed that we need up to 40 warmup requests for JRuby.
Posted by: Alexander Dymo | May 01, 2009 at 05:14 PM
Charles: Yes, I used server VM and Java 6. The command I used to run was:
jruby -J-Xmn512m -J-Xms2048m -J-Xmx2048m -J-server -J-Djruby.compile.mode=JIT -J-Djruby.jit.threshold=0
Unfortunately, Acunote didn't run at all with jruby --fast. Enabling some other compiler optimizations also had a negative effect on stability: sometimes, after the N'th repetition of the same request, the test asserting that the request succeeded would fail. N varied from 20 to 40.
It would be cool if we could sit together sometime at RailsConf and try to make it faster. I'm really interested to see how fast JRuby should be.
Posted by: Alexander Dymo | May 01, 2009 at 05:28 PM
A JIT threshold of zero can often have a negative impact on performance too, but you probably tried a few options experimentally. We can try a few other things to see if we can improve performance, and perhaps see if there are any specific areas that are slower than they ought to be. As you mention, the database stuff still needs more perf work, but I'm surprised we weren't at least as fast as 1.9 for the other areas you tested. That doesn't match our experience.
We'd love to sit down and figure out how to improve perf for you.
Posted by: Charles Oliver Nutter | May 01, 2009 at 06:05 PM
Damien: thanks for the hint, I've tried xtruby.
On date/time intensive requests, it was 10% faster. On rendering intensive operations it was 1-2% slower than MRI. But it couldn't finish our database intensive benchmarks: there was no error from xtruby, but our checks reported an incorrect response from the server.
Posted by: Alexander Dymo | May 02, 2009 at 04:32 PM
Any chance you could post the ported Acunote somewhere?
Posted by: Yehuda Katz | May 03, 2009 at 10:44 AM
Now I would like to see you port your app to use mysql and compare mysql performance to postgres performance.
Posted by: James | May 03, 2009 at 09:49 PM
Do you also have information on comparative Groovy performance figures?
This is off topic I agree, but curiosity got the better of me. :-)
Posted by: Shantanu Kumar | May 04, 2009 at 07:45 AM
If you re-run the tests, I'd suggest running them against both 1.8 and 1.9 patched to have a friendlier GC:
1.8: http://blog.evanweaver.com/articles/2009/04/09/ruby-gc-tuning/
1.9: http://groups.google.com/group/ruby-benchmark-suite/browse_thread/thread/f56b4335cfd3ec57/c7babfb676d71450?lnk=gst&q=patch+gc#c7babfb676d71450
Also, any chance you could add your benchmarks to the Ruby benchmark suite? :)
-=r
Posted by: Roger Pack | May 06, 2009 at 10:38 AM
Could you also try out Ruby Enterprise Edition from the mod_passenger guys? (http://www.rubyenterpriseedition.com/)
It should be a fully 1.8.6-compatible Ruby, but with huge memory improvements and some performance improvements.
And since it works nicely with their mod_passenger - it would be a nice deployment combination.
Sincerely,
Rene
Posted by: Rene A. | May 22, 2009 at 06:36 AM