Archive for December 2010

Netflix has a Chaos Monkey: A process to randomly kill things so as to engineer for failure

One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.

On the face of it, it seems insane. Why would you intentionally kill parts of your site? Yet in practice, being able to handle failure consistently means you are never ever surprised.
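To make the idea concrete, here is a toy sketch in Ruby. It is not Netflix's implementation: `InstancePool` and `ChaosMonkey` are hypothetical names, and the "instances" are just strings in memory. It only shows the shape of the technique: pick a random live instance, kill it, then check that the service still counts as healthy.

```ruby
# Hypothetical in-memory stand-in for a pool of service instances.
class InstancePool
  attr_reader :instances

  def initialize(names)
    @instances = names.dup
  end

  # The service counts as healthy while at least one instance is alive.
  def healthy?
    !@instances.empty?
  end

  # Remove one instance to simulate its termination.
  def kill(name)
    @instances.delete(name)
  end
end

# Toy "chaos monkey": picks a random live instance and kills it.
class ChaosMonkey
  def initialize(pool, rng = Random.new)
    @pool = pool
    @rng = rng
  end

  # Returns the name of the killed instance, or nil if none are left.
  def strike
    victim = @pool.instances.sample(random: @rng)
    @pool.kill(victim) if victim
    victim
  end
end

pool = InstancePool.new(%w[web-1 web-2 web-3])
monkey = ChaosMonkey.new(pool)
killed = monkey.strike
puts "killed #{killed}; service healthy: #{pool.healthy?}"
```

In a real system the health check would be an actual request against the service, and the interesting result is the run where it fails.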

Edit: My friend Vinny Magno writes:

Automobile assembly lines started doing the same thing (I think Ford was the first, though I might be wrong).

A single line can produce multiple models back to back. So for example, a Focus may be followed by an Explorer and then an Escape. At one point, the chassis and body are aligned and welded together. Obviously welding a Focus body to an Explorer chassis would be a problem, but the automation was so good that the defect rate was incredibly low. Since such a defect happened so infrequently, operators became complacent and failed to catch it on the occasions when it did occur. To solve the problem, Ford purposefully upped the error rate to keep the operators sharp.

Turns out there's precedent!


ActiveRecord Table Transform (or, how to write to the db 27,000 times in 24 seconds instead of 9 minutes)

I implemented my new scheme and running time went from 9 minutes to 24 SECONDS. I liked this approach so much I decided to generalize it as ActiveRecord::Base.transform. Sample usage:

# if users don't have names, give them a random one
NAMES = ['Adam', 'Ethan', 'Patrick']
User.transform(:name, :conditions => 'name is null').each do |i|
  i.name = NAMES[rand(NAMES.length)]
end

Really interesting use of temp tables here.
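The excerpt does not show how the temp tables are used, so the following is only a guess at the general shape of such a scheme, not the post's actual code. The helper names `sql_quote` and `bulk_update_sql`, the `tmp_...` table name, and the `TEXT` column type are all assumptions. The point is that the number of statements stays fixed no matter how many rows change, instead of one UPDATE per row.

```ruby
# Quote a value as a SQL string literal by doubling single quotes.
# Sketch only: real code should use the database adapter's own quoting.
def sql_quote(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Build the statements for a temp-table bulk update (hypothetical helper).
#   table, column - target table and column names (assumed trusted identifiers)
#   changes       - hash of row id => new value
def bulk_update_sql(table, column, changes)
  tmp = "tmp_#{table}_#{column}"
  rows = changes.map { |id, value| "(#{Integer(id)}, #{sql_quote(value)})" }
  [
    # 1. Scratch table holding only the ids and their new values.
    "CREATE TEMPORARY TABLE #{tmp} (id INTEGER PRIMARY KEY, #{column} TEXT)",
    # 2. One multi-row INSERT instead of one write per record.
    "INSERT INTO #{tmp} (id, #{column}) VALUES #{rows.join(', ')}",
    # 3. One UPDATE that copies the new values across by id.
    "UPDATE #{table} SET #{column} = " \
      "(SELECT #{column} FROM #{tmp} WHERE #{tmp}.id = #{table}.id) " \
      "WHERE id IN (SELECT id FROM #{tmp})",
    # 4. Clean up.
    "DROP TABLE #{tmp}"
  ]
end

bulk_update_sql("users", "name", { 1 => "Adam", 2 => "Ethan" }).each do |sql|
  puts sql
end
```

For very large batches the single INSERT would likely need to be split into chunks, since databases cap statement size.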

Filed under // Ruby on Rails, activerecord