« Garbage Collection is Why Ruby on Rails is Slow: Patches to Improve Performance 5x; Memory Profiling | Main | Make Your Ruby/Rails App Fast: Performance And Memory Profiling Using ruby-prof and KCachegrind »

January 31, 2008

Guerrilla's Guide to Optimizing Rails Applications

The Battle For Performance

Rails is slow. Your rails application which does fairy simple database operations will be born slow and will eventually become slower and slower while still not doing anything fancy. That's like getting a new powerful locomotive (on Rails) together with a free bonus of overladen freight cars.

Having said that, I can virtually hear the herd of voices screaming "dude, optimize your own application and don't blame Ruby and Rails for everything". Sure thing, our application isn't perfect and I do believe it can be optimized but you can't win the performance battle just by that.

The winning tactics for this battle is guerrilla tactics. It turns out you can guerrilla-patch (aka monkey-patch) Ruby and Rails and easily get up to 3x speedup in some cases. We started doing that to our own application - Acunote and got amazing results so far. Read on to find out how.

Ruby-Prof: The Guerrilla's Weapon

Acunote, our pet rails application, often need to copy a bunch of ActiveRecord (AR) objects. As any other project management tools, our application works with tasks organized into iterations (sprints). Sometimes you want to copy tasks from current iteration/sprint to the next iteration/sprint. And boy, that copy can become really slow. As an example, let's look at the performance and profiling data of one request to copy 120 tasks.

Let's measure 3 things:

  • the real time request takes in production environment;
  • processing time profile information using the excellent ruby-prof profiler by Charlie Savage and Shugo Maeda;
  • memory profile information gathered with patched ruby-prof (I took Sylvain Joyeux's memory allocations measurement mode patches and hacked together my own ruby-prof memory size measurement mode).

So, for "copy 120" request we get:

Request time (without profiler): 13.6 sec.

Time:
 Total: 16.78
 %self     total     self     wait    child    calls  name
 18.36      3.34     3.08     0.00     0.26    20766  Kernel#clone
  ...

Allocated Memory Size (in KB):
 Total: 1 001 730.00
 %self     total       self     wait    child    calls  name
 61.87  646795.00 619745.00     0.00 27049.00    20766  Kernel#clone
  ...

"WTF" politely describes my initial reaction when I first saw that. How on earth we can spend 17 seconds and consume 1 Gigabyte of memory by just copying 120 ActiveRecord objects? Ok, ok, we not only copy them but store some more information to the database like task tags, copy events and so on. Therefore we end up saving actually 360 AR objects instead of 120. Also the memory profiler is inaccurate and usually shows a bit more memory than the process actually takes.

Still, what is going on during those 17 seconds? Why do we need a gig of RAM? Gee, on the same machine I can compile the whole ruby interpreter in 57 seconds and the compiler will never take more than 130M...

Heh, let's calm down and see what profiler says. The obvious suspect is Kernel#clone method. The (stripped) call tree for it as reported by ruby-prof is

Task#save
...
ActiveRecord::Base#create_without_callbacks
ActiveRecord::Base#attributes_with_quotes
ActiveRecord::Base#attributes
ActiveRecord::Base#clone_attributes
ActiveRecord::Base#clone_attribute_value
Kernel#clone

From Charlie's blog we know that AR::Base::attributes method is evil. It clones attributes before returning them and Charlie gave a wise advice to not use it. And indeed in your application is a good idea to call, for example, task['foo'] instead of task.foo or task.attributes['foo']. But here AR::Base::create itself does the evil thing.

Let me cite Rails code (from rails/activerecord/lib/activerecord/base.rb):

def update
    connection.update(
        "UPDATE #{self.class.table_name} " +
        "SET #{quoted_comma_pair_list(connection, attributes_with_quotes(false))} " +
        "WHERE #{self.class.primary_key} = #{quote_value(id)}",
        "#{self.class.name} Update"
    )
end

def create
    if self.id.nil? && connection.prefetch_primary_key?(self.class.table_name)
        self.id = connection.next_sequence_value(self.class.sequence_name)
    end
    self.id = connection.insert(
        "INSERT INTO #{self.class.table_name} " +
        "(#{quoted_column_names.join(', ')}) " +
        "VALUES(#{attributes_with_quotes.values.join(', ')})",
        "#{self.class.name} Create",
        self.class.primary_key, self.id, self.class.sequence_name
    )
    @new_record = false
    id
end

def quoted_column_names(attributes = attributes_with_quotes)
    attributes.keys.collect do |column_name|
        self.class.connection.quote_column_name(column_name)
    end
end

def attributes_with_quotes(include_primary_key = true)
    attributes.inject({}) do |quoted, (name, value)|
        if column = column_for_attribute(name)
            quoted[name] = quote_value(value, column) unless !include_primary_key && column.primary
        end
        quoted
    end
end

Great! AR::create clones all attributes twice (first with quoted_column_names and then with attributes_with_quotes). AR::update is a nicer guy, it clones attributes only once. For each of those 120 copied tasks we use AR::create twice and AR::update once. Therefore we call clone_attributes not less than 120*2*2+120 = 600 times.

Profiler says we lose 3 seconds and 650M memory because of that:

ActiveRecord::Base#clone_attributes:
Time:          3.21 sec
Memory:  650 301.00 K

Ok. Time to employ our promised guerrilla tactics, but first let's talk about memory.

Lyrical Digression: Why Do We Care About Memory?

While guerrilla warriors take a nap before the battle, let's digress and look at why we measure the size of allocated memory. Sure thing, allocating literally tons of memory is bad. But we're using Ruby and Ruby has its own guerrilla warrior which strikes our application when it innocently eats its tasty memory.

The name of our enemy is "Ruby Garbage Collector" and the cunning enemy it is. As ruby's gc.c source says and Why the Lucky Stiff explains, ruby GC kicks in when you allocate more memory than is available on the heap and heap boundary is 8M. When we sequentially allocate 650 megabytes there's a chance that GC will be called 650/8=81 times. Each GC call accounts for approximately 70-80ms which in our case should add roundabout 5 seconds to the running time.

Let's do some math... Kernel#clone itself takes 3 seconds and we expect garbage collection to take 5 more seconds. In total, 8 seconds are wasted because of cloning. See the trend? The more memory we allocate the worse our performance is.

Here's the morale: "watch your memory". Large memory consumption costs you more than you'd otherwise think because of garbage collection.

Guerrilla patching AR::create and AR::update

Now it's time to employ our guerrilla tactics and fix ActiveRecord. It's easy to rewrite attributes_with_quotes and avoid cloning the attributes:

module ActiveRecord
    class Base
    private
        def attributes_with_quotes(include_primary_key = true)
            result = {}
            @attributes.each_key do |name|
                if column = column_for_attribute(name)
                    result[name] = quote_value(read_attribute(name), column) unless !include_primary_key && column.primary
                end
            end
            result
        end
    end
end

Now let's put that to environment.rb and see what we get:

Request time (without profiler): 6.7 sec.

Time:
 Total: 9.5
 %self     total     self     wait    child    calls  name
  4.84      0.69     0.46     0.00     0.23    24023  ActiveRecord::ConnectionAdapters::Quoting#quote
  ...
  2.22      0.52     0.18     0.00     0.34     2779  <Module::Benchmark>#realtime
  ...
  0.00      0.00     0.00     0.00     0.00        6  Kernel#clone
  ...

Allocated Memory Size (in KB):
 Total: 350 966.00
 %self     total       self     wait    child    calls  name
 34.31  125655.00 120420.00     0.00  5235.00     2779  <Module::Benchmark>#realtime
  ...
  0.00       0.00      0.00     0.00     0.00        6  Kernel#clone
  ...

Ok, much better: 650 megabytes of memory saved with that little monkey patch! We estimated that will save 8 seconds. Let's look:

Original request time:          13.6 sec
Request time after AR patch:     6.7 sec
Time saved:                      6.9 sec

It turns out our estimation was almost correct. We indeed saved not only the time taken by Kernel#clone but also the time spent by garbage collector and gained 2x speedup for free. Nice! Our guerrilla tactics clearly works and we can for sure do better than 2x.

Guerrilla-patching Benchmark.realtime

Next suspect, as seen from memory profile above, is Benchmark#realtime. From time profiler the method doesn't look bad. We spend only 180 milliseconds in it but look at memory profiler! It 180ms our Benchmark#realtime friend allocates 120 megabytes of memory. Guess what? Ruby garbage collector will be happy to kick our ass again at least 15 times and that estimates to about 1 second loss.

Let's see what's wrong in benchmark.rb:

def measure(label = "")
    t0, r0 = Benchmark.times, Time.now
    yield
    t1, r1 = Benchmark.times, Time.now
    Benchmark::Tms.new(t1.utime  - t0.utime, t1.stime  - t0.stime,
        t1.cutime - t0.cutime, t1.cstime - t0.cstime, r1.to_f - r0.to_f, label)
end

def realtime(&blk)
    Benchmark::measure(&blk).real
end

Great, we calculate lots of useless information (like cpu time, user time, etc) in Benchmark#measure and then just throw it away. Let's do better, simpler and more memory efficient:

module Benchmark
    def realtime
        r0 = Time.now
        yield
        r1 = Time.now
        r1.to_f - r0.to_f
    end
    module_function :realtime
end

That's it. No extra object creation. And here's our income:

Request time (without profiler): 5.8 sec.

Time:
 Total: 8.63
 %self     total     self     wait    child    calls  name
  4.29      0.69     0.46     0.00     0.23    24023  ActiveRecord::ConnectionAdapters::Quoting#quote
  ...
  0.35      0.19     0.03     0.00     0.16     2779  <Module::Benchmark>#realtime
  ...

Allocated Memory Size (in KB):
 Total: 225 668.00
 %self     total       self     wait    child    calls  name
 43.78   98805.00  98805.00     0.00     0.00     5280  Kernel#binding
  0.00    1918.00      2.00     0.00  1916.00     2779  <Module::Benchmark>#realtime
  ...

Whoa! 120 megabytes and 1 second were saved with 9-liner guerrilla patch. Nice, isn't it? Instead of original 13.6 sec our request now runs in 5.8 sec. That's more than 2x improvement already and we can do even better, but I'll leave that for the next blog.

When Guerrilla Tactics Help

Nothing with Ruby and Rails is wrong until you do things in the loop. For example, Benchmark#realtime doesn't have any impact if all you do is save one AR object. You just allocate 25 extra objects that take 45 extra kilobytes. But you're doomed once you do that in a loop. Do the benchmark 200 times and you'll guaranteed the garbage collector will happily shoot you in the back.

Also bear in mind that garbage collector should not be running while you profile your code. With GC you'll never get repeatable results because it will kick in at different times randomly increasing the time and memory consumption of your methods.

Make Your Application Fast: Optimize MEMORY

Let me reiterate, by optimizing memory usage instead of (or in parallel with) processing time you can get significant performance improvement, and guerrilla patching memory-inefficient bits is the right way to make Ruby and Rails faster I think we're on the right track here. Memory is what we need to optimize. Memory is what wasn't optimized before. And you've just seen what we can get by optimizing memory.

We (Acunote devs) will continue the memory profiling and will blog as soon as we get new results. Our previous blog post briefly summarize our current results, be sure to read it if you haven't done that yet. Meanwhile, don't wait for us, grab ruby-prof, my memory profiler patches and dig by yourself ;) I bet you'll find more big memory eaters and when you do so, please drop us a note. Let's collect our guerrilla patches and then file them upstream.

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/t/trackback/1040956/25685628

Listed below are links to weblogs that reference Guerrilla's Guide to Optimizing Rails Applications:

Comments

Have these patches been submitted to the Rails core?

Never mind. Looked at the previous post and got my answer.

I really love seeing people optimising the Rails core like this. Fantastic.

i have been trying the new ruby-prof ver 0.6.0
but i could't c the memroy profiling in the log
i followed ur instructions in patching ruby gc and using the line RubyProf.measure_mode = RubyProf::MEMORY

but still unable to run memory profiling in my rails app :(
can u explain more how to do so?

I'm guessing ruby can be patched to raise the memory limit at which point its GC gets called?

Great stuff though -- looking forward to seeing some of this in Rails core.

Greetings,
It's great stuff, and certainly valuable, but GC pauses should never impact the user, since I believe you should be handling all page requests with the GC disabled, and only enable (and start) GC after you've disconnected from the user.

I believe FastCGI, for instance, does it this way.

It's definitely nice to get rid of unnecessary allocations, but if you're running Rails with GC disabled during the page request, I believe these suggestions probably won't net you much user-visible speedup.

I'd *love* to hear differently mind you, as I do, in fact, work on a large consumer Rails app. :)

-- Morgan

Usual deployment scheme for rails app is running several mongrels on one or more machines. In this case you can't disable GC. One of the lengthy requests processed in one mongrel can take too much memory and other mongrels will be swapping all the time. So the overall performance will go down. And like I described in the blog post, it's not that uncommon for rails process running in mongrel to take 1 Gig of RAM ;)

Things only get worse if you run on memory-restricted hosting, say VPS with 512M of memory. You basically need GC there.

What you can certainly do is to increase the memory threshold for ruby GC from 8M to 16M or 32M (read http://whytheluckystiff.net/articles/theFullyUpturnedBin.html for more insight). Thus you'll have fewer GC collections but each of them will take a bit longer time. Still, you'll need to manually fine-tune the setting to get the good performance.

Hi, thanks for this great write up on ruby-prof. I'm following along and have rebuilt ruby with the patch included above. After changing my path and verifying that I'm using the patched version of ruby, I've found that this patched version of ruby can't find my installation of ruby gems:


irb(main):001:0> require 'rubygems'
LoadError: no such file to load -- rubygems
from (irb):1:in `require'
from (irb):1
irb(main):002:0>


Did you run into this issue?

Please check that you've installed the new ruby into the same location as the old one. In case you installed it into the new location, you'd need to install rubygems package and all your gems again.

Thanks! That worked! I also found that if I symlinked gems from my new location's bin, it would also work with the new location. Thanks again.

Correction: My little symlink trick didn't actually work, but the solution you described above did.

Post a comment

If you have a TypeKey or TypePad account, please Sign In

Buzzletter

  • Hear about our new Ruby on Rails performance improvements, hacks, recipes, plugins & more. Enter your email below