« Safari 3.0 / Webkit file upload problem with Rails 1.2 | Main | Guerrilla's Guide to Optimizing Rails Applications »

January 31, 2008

Garbage Collection is Why Ruby on Rails is Slow: Patches to Improve Performance 5x; Memory Profiling

  • The News: Ruby on Rails performance is dominated by garbage collection. We present a set of patches to greatly improve Rails performance and show how to profile memory usage to get further performance gains.

  • What's at Stake: Rails is slow for many uses and did not lend itself well to optimization. Significant performance gains could only be achieved at application level at large development cost.

  • The Upside:

    • 5x potential performance gains;
    • easy way to identify whether GC is a bottleneck;
    • deterministic process to fix memory bottlenecks;
    • set of canned patches to solve the biggest problems;
    • you can help

I don't have time, make my app 5 times faster!

  • Apply the monkey patches below. Enjoy.
  • Time your app with garbage collection disabled to figure out how much more performance you are leaving on the table. Enjoy.
  • Come back for more patches. Enjoy.

What's going on here?

Ruby runtime uses garbage collection. Your code allocates memory as it runs. From time to time garbage collector kicks in, halts the whole system, and cleans up no-longer-referenced memory using mark-and-sweep algorithm.

GC gets triggered by multiple conditions. The one that matters the most is 8M trigger. Every time you allocate 8M of memory GC runs. Complex Rails request can allocate hundreds or even thousands of megabytes of memory, making GC runs dozens of times. Each GC pass takes 50-150ms. You do the math.

It's possible to patch ruby interpreter, increase the triggers and reduce frequency of garbage collection at the cost of additional memory use. This helps, but does not resolve the problem. If there is more garbage, each GC run takes longer. We'll devote a separate post to this later.

But, I am running ruby-prof already, you ignorant fool!?

And so are we. It does not really help:

  • ruby-prof attributes full cost of garbage collection to the method where GC gets triggered, not the methods that allocate the memory. That's like blaming the last straw for breaking the camel's back.
  • ruby-prof does not show that garbage collection took place and how much time it took. All you can tell is that some method took a long time.

If you go by the these result, and remove the straw that's breaking the camel's back his load does not get much lighter.

Statistics being on your side, things are not entirely hopeless. GC gets triggered within malloc, so methods doing a lot allocations are also likely to trigger GC. Further, simply allocating memory is relatively expensive, and that will also show up in profiler output.

You might get lucky, optimize ruby-prof hotspot and get better performance, but more often than not you will fail. The correlation is just too weak. We believe this is exactly why Rails resisted performance optimization. Up till now.

Profiling memory allocations

Ruby-prof is almost the right tool for memory profiling. "Almost" is because the free cheese is only found in the mousetrap. You'll have to do some work first.

You need to build patched ruby (patch here) and latest ruby-prof (patch here). With both tools patched, set RubyProf.measure_mode = RubyProf::MEMORY before running the profiler.

Make sure to turn off GC when profiling for pure performance. You'll get get much more representative results.

The detailed HOWTO and description on the process is in our next blog post.

Will it help me?

In general your code will benefit most if you:

  • Create or update a lot of rows using ActiveRecord
  • Run large number of SQL statements of any kind using ActiveRecord, e.g. load many AR object through separate #find calls
  • Use ActiveView helpers, especially in a loop, to generate complex HTML
  • Use has_many AR association

It's easy to tell for sure if you have performance tests. All that "performance test" means is that you can run a single request and time how long it takes.

Just run the same test as usual, and then with GC disabled and compare the results. Most code will run much faster with GC disabled. E.g. for Acunote we test how long it takes to copy 120 tasks from one sprint from another.

Before optimizations with GC:   14 sec
Before optimizations w/out GC:   6 sec

You can't run ruby without GC in production. It will rapidly run out of memory and crash. You can, however, apply optimization, produce less garbage and get your code be even faster than original benchmark with GC turned off!

After optimizations with GC:     3 sec

The code that allocates little memory, but is algorithmically complex and is slow because of that, will see little improvement. For example Acunote's burndown and sprint completion date calculations fall squarely into this category and did not benefit. These aside, Acunote performance was dominated by the cost of memory allocation.

Give me the patches!

Problem Patch
ActiveRecord::Base#create calls attributes_with_quotes twice, ActiveRecord::Base#update once. #attributes_with_quotes calls #attributes, which clones all attribute values. None of these clones are necessary.

Impact:
When copying 120 tasks in Acunote this costs 650M. Patch improves performance from 14s to 6s.
Notes:
Helps when you do a lot of creates and updates through ActiveRecord.
Rails 1.2

module ActiveRecord
    class Base
    private
        def attributes_with_quotes(include_primary_key = true)
            result = {}
            @attributes.each_key do |name|
                if column = column_for_attribute(name)
                    result[name] = quote_value(read_attribute(name), column) unless !include_primary_key && column.primary
                end
            end
            result
        end
    end
end
Rails 2.0

module ActiveRecord
    class Base
    private
        def attributes_with_quotes(include_primary_key = true, include_readonly_attributes = true)
            quoted = {}
            @attributes.each_pair do |name, value|
                if column = column_for_attribute(name)
                    quoted[name] = quote_value(read_attribute(name), column) unless !include_primary_key && column.primary
                end
            end
            include_readonly_attributes ? quoted : remove_readonly_attributes(quoted)
        end
    end
end
Rails surrounds every SQL call and ActionView::Base#render with Benchmark#realtime. Benchmark#realtime allocates unnecessary 45k per call.

Impact:
When copying 120 tasks in Acunote this costs 120M. Patch improves performance from 6s to 5s.
Notes:
Helps when you do read, create, update lots of AR records, or execute any other SQL, or call #render(:partial) in a loop.

module Benchmark
    def realtime
        r0 = Time.now
        yield
        r1 = Time.now
        r1.to_f - r0.to_f
    end
    module_function :realtime
end
Rails 1.2 tries to pretty-print and log every SQL statement even when logger is in Logger::INFO mode (production environment) and prints nothing.

Impact:
When copying 120 tasks in Acunote this costs 17M. Patch improves performance by 1s.
Notes:
Helps when you do read, create, update lots of AR records, or execute any other SQL in a loop.
Patch only applicable to Rails 1.2. Rails 2.0 has the correct log_info implementation.
Rails 1.2

module ActiveRecord
    module ConnectionAdapters
        class AbstractAdapter
        protected
            def log_info_with_level_check(sql, name, runtime)
                return unless @logger and @logger.level == Logger::DEBUG
                log_info_without_level_check(sql, name, runtime)
            end
            alias_method_chain :log_info, :level_check
        end
    end
end
Rails 2.0
No patch necessary.
Comparison of BigDecimal's (used by Rails for Numeric data types) to booleans is slow because it requires unnecessary method_missing call and exception catch.

Impact:
When rendering 120 tasks in Acunote this costs 4M. Patch improves performance by 100-120ms.
Notes:
Helps when you have the imprudence to accidentally compare BigDecimal's with true or false.

class BigDecimal
    alias_method :eq_without_boolean_comparison, :==
    def eq_with_boolean_comparison(other)
        return false if [FalseClass, TrueClass].include? other.class
        eq_without_boolean_comparison(other)
    end
    alias_method :==, :eq_with_boolean_comparison
end
ActionView::Helpers::JavaScriptHelper#link_to_function allocates lots of memory when run. It runs every time view is rendered.

Impact:
When rendering 120 tasks in Acunote this costs 5M. This improves performance by 100-110ms.
Notes:
Helps when you have rich AJAX page with lots of dynamic elements. It's worth elimitinating helpers in hostposts identified by memory profiler. Cumulatively this has huge impact on render-bound requests.
Hand-replace link_to_function with <a href="#" onclick="..."/>.
ActionView::Helpers::FormTagHelper#text_field_tag allocates lots of memory when run. It runs every time view is rendered.

Impact:
When rendering 120 tasks in Acunote this costs 2M. This improves performance by 40-50ms.
Notes:
Helps when you have page with lots of input boxes. It's worth elimitinating helpers in hostposts identified by memory profiler. Cumulatively this has huge impact on render-bound requests.
Hand-replace text_field_tag with <input type="text"/>.
ActionView::Helpers::AssetTagHelper#image_tag allocates lots of memory when run. It runs every time view is rendered.

Impact:
When rendering 120 tasks in Acunote this costs 8M. This improves performance by 110-150ms.
Notes:
Helps when you have page with lots of images. It's worth elimitinating helpers in hostposts identified by memory profiler. Cumulatively this has huge impact on render-bound requests. image_tag itself can be optimized. Help wanted.
If you don't use asset hosts, hand-replace image_tag with <img src="..."/>.
If you use asset hosts, there's no patch.
Passing a block to a method on instance of ActiveRecord::Associations::{HasAndBelongsToManyAssociation /etc.} chews up memory.
An example (from activerecord/lib/activerecord/associations.rb) is association.select { |record| record.new_record? } line inside #add_multiple_associated_save_callbacks method.

Impact:
This is used all over ActiveRecord code, and often by application code.
Notes:
We have been unable to figure out the root cause. Help wanted. Separate article to come.
None known.
Sometimes it helps to convert association to array before using, for example association.to_a.select { |record| record.new_record? }

How can I help?

  • Profile memory allocations for your Rails application. Figure out and fix hotspots. Post or link to patches and notes in the comments section. We'll be updating this post, as you and we figure out more.
  • If you are writing a large commercial Rails application, have one of you engineers do some memory profiling. Fast is a feature, and it's a nice way to contribute to the community.
  • Help us figure out the some of the issues we found, but haven't been able to figure out yet.

Submitted Patches

Thanks To

  • Alexander Dymo for actually doing all the work
  • Alexander Goldstein for directing us to look at GC in the first place
  • DHH and rails-core for giving us something to optimize, and for making Rails code so easy to understand and modify
  • Charlie Savage and Shugo Maeda for ruby-prof
  • Sylvain Joyeux's for original memory allocations number measurement patch
  • Stefan Kaes for all his performance work, railsbench and especially GC patch

Sightings

Updates

  • Update 1: noted that patches are against 1.2.4 Requested patches against 2.0
  • Update 2: added patches against Rails 2.0
  • Update 3: added "submitted patches" section with upstream patches
  • Update 4: #realtime and #attributes_with_quotes patches accepted upstream
  • Update 5: added Sightings section

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/t/trackback/1040956/25682628

Listed below are links to weblogs that reference Garbage Collection is Why Ruby on Rails is Slow: Patches to Improve Performance 5x; Memory Profiling:

Comments

What version of Rails are you using?

I added the monkey patches in environment/initializers/optimizations.rb running Rails 2.0.2 and receive this error:

ArgumentError (wrong number of arguments (2 for 1)):
/vendor/rails/activerecord/lib/active_record/base.rb:2245:in `attributes_with_quotes'
/vendor/rails/activerecord/lib/active_record/base.rb:2245:in `update_without_lock'
/vendor/rails/activerecord/lib/active_record/locking/optimistic.rb:70:in `update_without_callbacks'
/vendor/rails/activerecord/lib/active_record/callbacks.rb:240:in `update_without_timestamps'
/vendor/rails/activerecord/lib/active_record/timestamp.rb:38:in `update'
/vendor/rails/activerecord/lib/active_record/base.rb:2238:in `create_or_update_without_callbacks'
/vendor/rails/activerecord/lib/active_record/callbacks.rb:213:in `create_or_update'

All the patches are against 1.2.4 That's what we use in production. For 2.0 try this ... Guess we should post 2.0 patches as well.
Updated: see 2.0 patches above.

I actually took a look at the source after posting my comment and arrived at changes identical to yours.

I've never poked around in the Rails source, but I can see this getting addictive. Especially since I'm deployed on a Solaris server running Ruby with dtrace probes.

Thanks for the posts, killer info.

Now I just need to get my app launched and stop optimizing prematurely. ;)

Eric: you could probably get all kinds of additional cool stats through dtrace. We've talked about it, but haven't had a chance to try it yet.

You are welcome. But, yuh, don't worry about optimizations just yet, launch it first :-)

Are you working to get any of these patches into Rails itself?

Peter: yes, we've started submitting patches to Rails and Ruby. E.g. see Rails ticket #10978. We should probably create a separate section with links on this page to track these submissions. End goal is to get all these or similar changes integrated upstream.

Pardon my ignorance... but how do you apply these patches?

This is great work. Thank you for posting this!

Thanks again for the patches guys, keep them coming :)

Thanks for helping me understand why. And it'd be good to see the patches accepted into Core.

For simple (i.e. no call-backs) but large data imports, I find the ar-extensions gem/plugin does the trick. By creating a single sql insert statement rather than looping through ActiveRecord::Base#create or #update thousands of times, the issues you've described are avoided. For me, it's created astonishing speed increases.

This looks like great stuff! Thanks for sharing.

Awesome awesome work Alexander (and all the other guys)!

Some of these patches have gone into core: the Benchmark#realtime one ( http://dev.rubyonrails.org/changeset/8771 ), and the attributes_with_quotes one ( http://dev.rubyonrails.org/changeset/8770 ) in particular.

A patch to run the GC 'after each request' would be nice, too. Thanks for doing this stuff!

To apply these you copy the code fragments into the bottom of 'config/environment.rb', or possibly other places, but that seems to work.

is this going to be included in official release ?

thanks for sharing this work. It will go along way towards closing ror's performance gap.

It would be nice mentioning some quality asset manager plugins that can dramatically reduce total load time--like http://code.google.com/p/bundle-fu/ or http://synthesis.sbecker.net/pages/asset_packager

Shanti, Michael, Alan, Dav, Chu, Roger, Al -- you are welcome!

Alan, thanks for mentioning ar-extensions. Internally we've talked
about doing something like that, and it's great that it already
exists.

Hi!

Just patched gc.c in ruby and freezed rails but rails run even slower at least on tests and specs. Any tricks here? Should I use ruby_prof to get performance boost? BTW I am on the Gentoo Linux box.

--Yury

After increasing GC_MALLOC_LIMIT to 400M I can see better performance. So without any patching my tests completes in 20 seconds. With your patches in 21 seconds. After increasing GC_MALLOC_LIMIT - 18 seconds.

here is the patch:

Index: gc.c
===================================================================
--- gc.c (revision 15447)
+++ gc.c (working copy)
@@ -64,7 +64,7 @@
#if defined(MSDOS) || defined(__human68k__)
#define GC_MALLOC_LIMIT 200000
#else
-#define GC_MALLOC_LIMIT 8000000
+#define GC_MALLOC_LIMIT 400000000
#endif
#endif

This post http://cfis.savagexi.com/articles/2007/07/18/making-rails-go-vroom by Charlie Savage shows some results that he got (and might be useful).

It would be interesting to see if there is any difference running your tests with a patched ruby:

http://lloydforge.org/projects/ruby/

There's an attempt there at adaptively performing GC rather than fixed at 8MB allocations. The goal being when you're in a period of intense allocation, to do less work trying to clean up.

Would be interesting to see how those low level techniques play out in a high level application...

fwiw,
lloyd

Lloyd -- looks interesting, we'll check it out.

This looks like it has some potentially memory saving stuff, too.
http://pennysmalls.com [I'm serious].
-R

Post a comment

If you have a TypeKey or TypePad account, please Sign In

Buzzletter

  • Hear about our new Ruby on Rails performance improvements, hacks, recipes, plugins & more. Enter your email below