The News: Ruby on Rails performance is dominated by garbage collection. We present a set of patches to greatly improve Rails performance and show how to profile memory usage to get further performance gains.
What's at Stake: Rails is slow for many uses and did not lend itself well to optimization. Significant performance gains could only be achieved at application level at large development cost.
The Upside:
- 5x potential performance gains;
- easy way to identify whether GC is a bottleneck;
- deterministic process to fix memory bottlenecks;
- set of canned patches to solve the biggest problems;
- you can help
I don't have time, make my app 5 times faster!
- Apply the monkey patches below. Enjoy.
- Time your app with garbage collection disabled to figure out how much more performance you are leaving on the table. Enjoy.
- Come back for more patches. Enjoy.
What's going on here?
Ruby runtime uses garbage collection. Your code allocates memory as it runs. From time to time garbage collector kicks in, halts the whole system, and cleans up no-longer-referenced memory using mark-and-sweep algorithm.
GC gets triggered by multiple conditions. The one that matters the most is 8M trigger. Every time you allocate 8M of memory GC runs. Complex Rails request can allocate hundreds or even thousands of megabytes of memory, making GC runs dozens of times. Each GC pass takes 50-150ms. You do the math.
It's possible to patch ruby interpreter, increase the triggers and reduce frequency of garbage collection at the cost of additional memory use. This helps, but does not resolve the problem. If there is more garbage, each GC run takes longer. We'll devote a separate post to this later.
But, I am running ruby-prof already, you ignorant fool!?
And so are we. It does not really help:
- ruby-prof attributes full cost of garbage collection to the method where GC gets triggered, not the methods that allocate the memory. That's like blaming the last straw for breaking the camel's back.
- ruby-prof does not show that garbage collection took place and how much time it took. All you can tell is that some method took a long time.
If you go by the these result, and remove the straw that's breaking the camel's back his load does not get much lighter.
Statistics being on your side, things are not entirely hopeless. GC gets triggered within malloc, so methods doing a lot allocations are also likely to trigger GC. Further, simply allocating memory is relatively expensive, and that will also show up in profiler output.
You might get lucky, optimize ruby-prof hotspot and get better performance, but more often than not you will fail. The correlation is just too weak. We believe this is exactly why Rails resisted performance optimization. Up till now.
Profiling memory allocations
Ruby-prof is almost the right tool for memory profiling. "Almost" is because the free cheese is only found in the mousetrap. You'll have to do some work first.
You need to build patched ruby (patch
here)
and latest ruby-prof (patch
here). With
both tools patched, set RubyProf.measure_mode = RubyProf::MEMORY
before running the profiler.
Make sure to turn off GC when profiling for pure performance. You'll get get much more representative results.
The detailed HOWTO and description on the process is in our next blog post.
Will it help me?
In general your code will benefit most if you:
- Create or update a lot of rows using ActiveRecord
- Run large number of SQL statements of any kind using ActiveRecord, e.g. load many AR object through separate #find calls
- Use ActiveView helpers, especially in a loop, to generate complex HTML
- Use has_many AR association
It's easy to tell for sure if you have performance tests. All that "performance test" means is that you can run a single request and time how long it takes.
Just run the same test as usual, and then with GC disabled and compare the results. Most code will run much faster with GC disabled. E.g. for Acunote we test how long it takes to copy 120 tasks from one sprint from another.
Before optimizations with GC: 14 sec
Before optimizations w/out GC: 6 sec
You can't run ruby without GC in production. It will rapidly run out of memory and crash. You can, however, apply optimization, produce less garbage and get your code be even faster than original benchmark with GC turned off!
After optimizations with GC: 3 sec
The code that allocates little memory, but is algorithmically complex and is slow because of that, will see little improvement. For example Acunote's burndown and sprint completion date calculations fall squarely into this category and did not benefit. These aside, Acunote performance was dominated by the cost of memory allocation.
Give me the patches!
| Problem | Patch |
|---|---|
ActiveRecord::Base#create calls attributes_with_quotes twice, ActiveRecord::Base#update once. #attributes_with_quotes calls #attributes, which clones all attribute values. None of these clones are necessary.
Impact: When copying 120 tasks in Acunote this costs 650M. Patch improves performance from 14s to 6s. Notes: Helps when you do a lot of creates and updates through ActiveRecord. |
Rails 1.2
Rails 2.0
|
Rails surrounds every SQL call and ActionView::Base#render with Benchmark#realtime. Benchmark#realtime allocates unnecessary 45k per call.
Impact: When copying 120 tasks in Acunote this costs 120M. Patch improves performance from 6s to 5s. Notes: Helps when you do read, create, update lots of AR records, or execute any other SQL, or call #render(:partial) in a loop. |
|
|
Rails 1.2 tries to pretty-print and log every SQL statement even when logger is in Logger::INFO mode (production environment) and prints nothing.
Impact: When copying 120 tasks in Acunote this costs 17M. Patch improves performance by 1s. Notes: Helps when you do read, create, update lots of AR records, or execute any other SQL in a loop. Patch only applicable to Rails 1.2. Rails 2.0 has the correct log_info implementation. |
Rails 1.2
Rails 2.0No patch necessary. |
|
Comparison of BigDecimal's (used by Rails for Numeric data types) to booleans is slow because it requires unnecessary method_missing call and exception catch.
Impact: When rendering 120 tasks in Acunote this costs 4M. Patch improves performance by 100-120ms. Notes: Helps when you have the imprudence to accidentally compare BigDecimal's with true or false. |
|
ActionView::Helpers::JavaScriptHelper allocates lots of memory when run. It runs every time view is rendered.
Impact: When rendering 120 tasks in Acunote this costs 5M. This improves performance by 100-110ms. Notes: Helps when you have rich AJAX page with lots of dynamic elements. It's worth elimitinating helpers in hostposts identified by memory profiler. Cumulatively this has huge impact on render-bound requests. |
Hand-replace link_to_function with <a href="#" onclick="..."/>.
|
ActionView::Helpers::FormTagHelper allocates lots of memory when run. It runs every time view is rendered.
Impact: When rendering 120 tasks in Acunote this costs 2M. This improves performance by 40-50ms. Notes: Helps when you have page with lots of input boxes. It's worth elimitinating helpers in hostposts identified by memory profiler. Cumulatively this has huge impact on render-bound requests. |
Hand-replace text_field_tag with <input type="text"/>.
|
ActionView::Helpers::AssetTagHelper allocates lots of memory when run. It runs every time view is rendered.
Impact: When rendering 120 tasks in Acunote this costs 8M. This improves performance by 110-150ms. Notes: Helps when you have page with lots of images. It's worth elimitinating helpers in hostposts identified by memory profiler. Cumulatively this has huge impact on render-bound requests. image_tag itself can be optimized. Help wanted.
|
If you don't use asset hosts, hand-replace image_tag with <img src="..."/>.If you use asset hosts, there's no patch. |
Passing a block to a method on instance of ActiveRecord::Associations chews up memory.An example (from activerecord/lib association.select { |record| record.new_record? } line inside #add_multiple_associated method.
Impact: This is used all over ActiveRecord code, and often by application code. Notes: We have been unable to figure out the root cause. Help wanted. Separate article to come. |
None known. Sometimes it helps to convert association to array before using, for example association.to_a.select { |record| record.new_record? }
|
How can I help?
- Profile memory allocations for your Rails application. Figure out and fix hotspots. Post or link to patches and notes in the comments section. We'll be updating this post, as you and we figure out more.
- If you are writing a large commercial Rails application, have one of you engineers do some memory profiling. Fast is a feature, and it's a nice way to contribute to the community.
- Help us figure out the some of the issues we found, but haven't been able to figure out yet.
Submitted Patches
Benchmark#realtimepatch
Patch accepted. Matz committed it to Ruby 1.8 branch. Also, included in Rails trunk.ActiveRecord::Base#attributes_with_quotespatch
Patch accepted. Michael Koziarski committed it to Rails trunk.
Thanks To
- Alexander Dymo for actually doing all the work
- Alexander Goldstein for directing us to look at GC in the first place
- DHH and rails-core for giving us something to optimize, and for making Rails code so easy to understand and modify
- Charlie Savage and Shugo Maeda for ruby-prof
- Sylvain Joyeux's for original memory allocations number measurement patch
- Stefan Kaes for all his performance work, railsbench and especially GC patch
Sightings
- Charlie Savage: Must Read Rails Performance Article
- Charlie Savage: ruby-prof 0.6.0 and Memory Profiling
- Ruby Inside: Fixing Ruby & Rails' slow performance with patches!
- Michael Klishin: Making Ruby on Rails applications 5 times faster: patches attached
- Wes Maldonado: Acunote/Nimble Method Patches Speeding up Ruby on Rails
- Accelerate HR: More notes on super-fast data insertion in Rails
- Chu Yeow Living on the edge (of Rails) #6 - better performance, Git support, and more
- Antares Trader: Collecting the Garbage in Ruby
- edhickey.com: Mongrel memory usage
- Mike Perham: Tuning ActiveRecord
Updates
- Update 1: noted that patches are against 1.2.4 Requested patches against 2.0
- Update 2: added patches against Rails 2.0
- Update 3: added "submitted patches" section with upstream patches
- Update 4: #realtime and #attributes_with_quotes patches accepted upstream
- Update 5: added Sightings section
What version of Rails are you using?
I added the monkey patches in environment/initializers/optimizations.rb running Rails 2.0.2 and receive this error:
ArgumentError (wrong number of arguments (2 for 1)):
/vendor/rails/activerecord/lib/active_record/base.rb:2245:in `attributes_with_quotes'
/vendor/rails/activerecord/lib/active_record/base.rb:2245:in `update_without_lock'
/vendor/rails/activerecord/lib/active_record/locking/optimistic.rb:70:in `update_without_callbacks'
/vendor/rails/activerecord/lib/active_record/callbacks.rb:240:in `update_without_timestamps'
/vendor/rails/activerecord/lib/active_record/timestamp.rb:38:in `update'
/vendor/rails/activerecord/lib/active_record/base.rb:2238:in `create_or_update_without_callbacks'
/vendor/rails/activerecord/lib/active_record/callbacks.rb:213:in `create_or_update'
Posted by: Erik Peterson | January 31, 2008 at 08:24 PM
All the patches are against 1.2.4 That's what we use in production. For 2.0 try this ... Guess we should post 2.0 patches as well.
Updated: see 2.0 patches above.
Posted by: Gleb Arshinov | January 31, 2008 at 08:46 PM
I actually took a look at the source after posting my comment and arrived at changes identical to yours.
I've never poked around in the Rails source, but I can see this getting addictive. Especially since I'm deployed on a Solaris server running Ruby with dtrace probes.
Thanks for the posts, killer info.
Now I just need to get my app launched and stop optimizing prematurely. ;)
Posted by: Erik Peterson | January 31, 2008 at 09:04 PM
Eric: you could probably get all kinds of additional cool stats through dtrace. We've talked about it, but haven't had a chance to try it yet.
You are welcome. But, yuh, don't worry about optimizations just yet, launch it first :-)
Posted by: Gleb Arshinov | January 31, 2008 at 09:10 PM
Are you working to get any of these patches into Rails itself?
Posted by: Peter Harkins | February 01, 2008 at 11:22 AM
Peter: yes, we've started submitting patches to Rails and Ruby. E.g. see Rails ticket #10978. We should probably create a separate section with links on this page to track these submissions. End goal is to get all these or similar changes integrated upstream.
Posted by: Gleb Arshinov | February 01, 2008 at 11:38 AM
Pardon my ignorance... but how do you apply these patches?
Posted by: James | February 01, 2008 at 02:50 PM
This is great work. Thank you for posting this!
Posted by: Shanti Braford | February 01, 2008 at 03:53 PM
Thanks again for the patches guys, keep them coming :)
Posted by: Michael Koziarski | February 01, 2008 at 05:50 PM
Thanks for helping me understand why. And it'd be good to see the patches accepted into Core.
For simple (i.e. no call-backs) but large data imports, I find the ar-extensions gem/plugin does the trick. By creating a single sql insert statement rather than looping through ActiveRecord::Base#create or #update thousands of times, the issues you've described are avoided. For me, it's created astonishing speed increases.
Posted by: Alan Miles | February 02, 2008 at 04:00 PM
This looks like great stuff! Thanks for sharing.
Posted by: Dav Yaginuma | February 02, 2008 at 05:39 PM
Awesome awesome work Alexander (and all the other guys)!
Some of these patches have gone into core: the Benchmark#realtime one ( http://dev.rubyonrails.org/changeset/8771 ), and the attributes_with_quotes one ( http://dev.rubyonrails.org/changeset/8770 ) in particular.
Posted by: Chu Yeow | February 03, 2008 at 07:05 AM
A patch to run the GC 'after each request' would be nice, too. Thanks for doing this stuff!
Posted by: Roger Pack | February 05, 2008 at 06:23 PM
To apply these you copy the code fragments into the bottom of 'config/environment.rb', or possibly other places, but that seems to work.
Posted by: Roger Pack | February 06, 2008 at 10:47 AM
is this going to be included in official release ?
Posted by: shabo | February 07, 2008 at 11:53 PM
thanks for sharing this work. It will go along way towards closing ror's performance gap.
Posted by: Al Brown | February 08, 2008 at 10:15 AM
It would be nice mentioning some quality asset manager plugins that can dramatically reduce total load time--like http://code.google.com/p/bundle-fu/ or http://synthesis.sbecker.net/pages/asset_packager
Posted by: Roger Pack | February 08, 2008 at 05:57 PM
Shanti, Michael, Alan, Dav, Chu, Roger, Al -- you are welcome!
Alan, thanks for mentioning ar-extensions. Internally we've talked
about doing something like that, and it's great that it already
exists.
Posted by: Gleb Arshinov | February 08, 2008 at 06:03 PM
Hi!
Just patched gc.c in ruby and freezed rails but rails run even slower at least on tests and specs. Any tricks here? Should I use ruby_prof to get performance boost? BTW I am on the Gentoo Linux box.
--Yury
Posted by: Yury Kotlyarov | February 12, 2008 at 06:59 AM
After increasing GC_MALLOC_LIMIT to 400M I can see better performance. So without any patching my tests completes in 20 seconds. With your patches in 21 seconds. After increasing GC_MALLOC_LIMIT - 18 seconds.
here is the patch:
Index: gc.c
===================================================================
--- gc.c (revision 15447)
+++ gc.c (working copy)
@@ -64,7 +64,7 @@
#if defined(MSDOS) || defined(__human68k__)
#define GC_MALLOC_LIMIT 200000
#else
-#define GC_MALLOC_LIMIT 8000000
+#define GC_MALLOC_LIMIT 400000000
#endif
#endif
Posted by: Yury Kotlyarov | February 12, 2008 at 02:05 PM
This post http://cfis.savagexi.com/articles/2007/07/18/making-rails-go-vroom by Charlie Savage shows some results that he got (and might be useful).
Posted by: Roger Pack | March 11, 2008 at 12:42 PM
It would be interesting to see if there is any difference running your tests with a patched ruby:
http://lloydforge.org/projects/ruby/
There's an attempt there at adaptively performing GC rather than fixed at 8MB allocations. The goal being when you're in a period of intense allocation, to do less work trying to clean up.
Would be interesting to see how those low level techniques play out in a high level application...
fwiw,
lloyd
Posted by: Lloyd Hilaiel | March 24, 2008 at 08:50 AM
Lloyd -- looks interesting, we'll check it out.
Posted by: Gleb Arshinov | March 25, 2008 at 11:00 AM
wonder if this would help.
http://kurtstephens.com/node/34
Posted by: Roger Pack | April 29, 2008 at 06:58 PM
This looks like it has some potentially memory saving stuff, too.
http://pennysmalls.com [I'm serious].
-R
Posted by: Roger Pack | April 29, 2008 at 07:33 PM