« Performance Freak's View on RailsConf 2008 | Main | Why You Should Use 'Cache-Control: public' for Caching to Work with SSL »

July 02, 2008

The Road to Hell Is Paved With Caching

In Acunote we are heavily using fragment caching. We cache whenever we can and whatever we can. Sounds like a great idea to do? Well, almost... Sometimes caching hurts and here's why.

Prehistory

Acunote is a project management tool and therefore the most prominent page in Acunote is the task list in the current sprint. The page itself shows some information specific to the user who logged in and some information that is shared among the users within the user's organization.

We can cache the shared part of the page in the fragment cache taking date and page filters into account. The code roughly looks like this:

<%- cache(:part => "#{filters}_#{DateUtils::today}") do -%>
    <%= render :partial => '...' -%>
<%- end -%>

Using filesystem caching store we end up having this (simplified) cache directory layout:

 - organization1.acunote.com
   - sprints
     - show.part=filter1.date.cache
     - show.part=filter2.date.cache
     - show.part=filter3.date.cache
 - organization2.acunote.com
   - sprints
     - show.part=filter1.date.cache
     - show.part=filter2.date.cache
     - show.part=filter3.date.cache

Problem

Problems start when we try to expire caches. We can't expire by url or by hash because we want to expire all cache parts (for all possible filters and for all dates). Therefore we use regexp:

expire_fragment(%r{show.part=.*})

That turned into a problem in production. The reason is that expire_fragment in Rails searches the whole cache directory recursively and removes files with names matching the regular expression. On machines with local disks that's usually fast, but on network filesystems or on shared disk cluster filesystems (like GFS in our case) stat'ing lots of files can be really slow. To make things worse, the performance of stat call on GFS really depends on the phase of the moon or something like that ;)

Our logs showed that operations that would usually take less than a second took on occasion took 70 seconds! They were so slow so I could actually log in to our production servers, find the process taking too much time to complete and attach strace to it. Strace blessed me, immediately showing multiple stat's for files in the cache directory.

Solution

Longer term we'll switch to a more scalable fragment caching strategy. There is a good reason why high-performance caches do not support fancy invalidation, and there are known way to code to their more restricted interface. Meanwhile, immediate solution was really simple. We only had to expire fragments in specific directories and to make that easier, I wrote a little extension to Rails' fragment caching code:

module ActionController
  module Caching
    module Fragments

    #dir is the cache path relative to the cache root
    def expire_matched_fragment_in_dir(dir, regexp, options = nil)
      return unless perform_caching
      self.class.benchmark("Expired fragments in dir matching: #{regexp.source}") do
        fragment_cache_store.delete_matched_in_dir(dir, regexp, options)
      end
    end

    class UnthreadedFileStore

      def delete_matched_in_dir(dir, matcher, options = nil)
        path = @cache_path + dir
        return unless File.exist?(path) #it's ok to not have the cache dir
        search_dir(path) do |f|
          if f =~ matcher
          begin
            File.delete(f)
          rescue SystemCallError => e
            # If there's no cache, then there's nothing to complain about
          end
          end
         end
        end
      end

    end

  end
end

#include only if you use cache test plugin
module Cosinux
  module FragmentCacheTest
    class TestStore
      def delete_matched_in_dir(dir, matcher, options = nil)
        @deleted_matchers.push(matcher)
      end
    end
  end
end

Conclusions

Monitoring the performance of production application is sometimes even more important than profiling it on the development machines. We already had monitoring system in place so we could detect when our application got slower, figure out the root cause and and fix the problem.

Another thing I highly recommend is of course strace. You may indeed see that your application is doing something so weird you'd never think of just looking at the code or at the profiler. So, I now declare strace to be officially added into my own arsenal of invaluable profiling tools.

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a00d83451c8d369e200e5539acc6c8834

Listed below are links to weblogs that reference The Road to Hell Is Paved With Caching:

Comments

Honestly, you're really best off using memcached with Rails' sweepers. It's more work, yes, but it's well worth it. memcached scales across multiple servers, so you can cache for an entire cluster as opposed to per-machine, and it's in-memory access, which avoids the ever-so-slow disk seek time.

If you're clever about your cache key naming, you can have an even easier time of it. For example, if you have a page of blog posts, and you cache by blog post, then you need a SQL query to fetch the timestamp of the most recent post (which should be blazing fast with a properly indexed column). Once you have that, you can build a cache key to check for ("blog_index_7.2.2008.14.45.14"). If a new post has been added, then the cache key changes and a new page fragment is cached. memcached automatically expires old fragments once it runs out of room, so your caching expiry happens transparently without any need for you to do anything.

As Phil Karlton said, "there are only two hard things in Computer Science: cache invalidation and naming things". It's worth doing it right.

Verify your Comment

Previewing your Comment

This is only a preview. Your comment has not yet been posted.

Working...
Your comment could not be posted. Error type:
Your comment has been posted. Post another comment

The letters and numbers you entered did not match the image. Please try again.

As a final step before posting your comment, enter the letters and numbers you see in the image below. This prevents automated programs from posting comments.

Having trouble reading this image? View an alternate.

Working...

Post a comment

Buzzletter

  • Hear about our new Ruby on Rails performance improvements, hacks, recipes, plugins & more. Enter your email below