iostat -x

My favorite Linux tool in DB work is ‘iostat -x’ (I want it in front of me whenever I’m doing any kind of performance analysis), yet I had to learn its limitations and properties. For example, here is a 1-second snapshot from a slightly overloaded 16-disk database box:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.12    0.00    2.57   21.65    0.00   67.66

Device:  rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda     7684.00    19.00 2420.00  498.00 81848.00  5287.00    29.86    32.99   11.17   0.34 100.00
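As a sanity check on a snapshot like this, the columns relate to each other via Little's law: the average queue size is roughly IOPS times the average wait, and utilization is roughly IOPS times the service time. A minimal sketch in Ruby, plugging in the numbers from the sample above:

```ruby
# Values taken from the iostat -x sample above.
r_s, w_s = 2420.00, 498.00   # reads/sec and writes/sec
await_ms = 11.17             # avg time a request spends queued + being serviced
svctm_ms = 0.34              # avg service time per request

iops     = r_s + w_s
avgqu_sz = iops * (await_ms / 1000.0)        # Little's law: L = lambda * W
util     = iops * (svctm_ms / 1000.0) * 100  # fraction of time device was busy

puts format("avgqu-sz ~ %.2f", avgqu_sz)  # ~32.59, vs. 32.99 reported
puts format("util     ~ %.1f%%", util)    # ~99.2%, vs. 100.00 reported
```

The small discrepancies are expected: iostat's columns are computed from kernel counters over the sampling interval, not derived from each other.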

Thanks again to Mike Montano from Backtype. A great article explaining one of the simplest Linux tools for diagnostics and troubleshooting.

ProgressBar gem for long migrations is super useful so you know how far along you are

Use the progressbar gem for long running data migrations

And one other thing that is not included with Rails that you should probably be using anyway: the progressbar gem.  If you have any long-running data migrations, this is a must.  And just because a migration isn’t long-running against your developer DB doesn’t mean it won’t be long-running during deployment to production.  It’s trivially easy to use, and your deployers won’t be stuck wondering whether their connection has dropped or the migration has locked up.  The ETA will even let them know if they have time to get a cup of coffee.  The other developers and deployers will thank you.

Simple Example

(albeit, also a poorly contrived example)

require 'progressbar'
 
class CreateWidgetAuxiliaryFrobs < ActiveRecord::Migration
 
  def self.up
    create_table :widget_auxiliary_frobs do |t|
      t.integer "widget_id"
      t.string  "frob_type"
      t.integer "frobitude"
      # etc...
    end
 
    say_with_time("migrating frobs from widgets") do
      widgets = Widget.find(:all)
      pbar = ProgressBar.new("Generating Widget Frobs", widgets.size)
      widgets.each do |w|
        # this code changes the data irreversibly
        # this code can't be (easily) rewritten with a SQL UPDATE or INSERT
        # etc  etc  etc
        pbar.inc
      end
      pbar.finish
    end
 
    say_with_time("delete obsolete widget/wadget data") do
      Wadget.delete_all("value = 'kerfluffle'")
      remove_column :widgets, :foo
      remove_column :widgets, :bar_id
      # etc...
    end
  end
 
  def self.down
    raise ActiveRecord::IrreversibleMigration
  end
end

I love rails. Every so often you see a blog post about some little bit of code that makes you go -- cool, I always wanted that.

Percona Performance Conference 2009 in Santa Clara looks badass. And it's free.

Leave it to the folks at Percona to put together a slate of tech talks that put expensive tech conferences to shame. And all for the low low price of free. Awesome.

The schedule looks like it's full of quite useful and practical talks on scaling MySQL, search technologies, storage engines, and just ways to do things faster, cheaper and better.

I'll be there April 22-23. Let me know if you will be too!

Scalability Strategies Primer: Database Sharding

Why are we partitioning our dataset and how does it help us to achieve scalability of our application?

It is difficult, if not nearly impossible, to massively scale your data layer when the data is limited to residing on a single server. Whether the limiting factor is a hardware cost issue, or you’ve simply equipped your server with the highest-performing hardware possible, we ultimately find ourselves up against a wall: there are inherent limits to what is currently possible by vertically scaling our hardware. If we instead take our dataset schema, duplicate it onto multiple servers (shards), and split (or partition) the data on the original single server into equal portions distributed amongst our new set of shards, we can parallelize our query load across them. Adding more servers to our existing set of shards results in near-limitless scalability potential. The theory behind how this works is simple, but the execution is a fair bit more complex, thanks to a series of scale-specific issues one encounters along the way.
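The core routing idea in the excerpt above can be sketched in a few lines of Ruby. This is a minimal, hypothetical illustration, not the article's implementation: the shard names are made up, and a real system would also need to handle rebalancing when shards are added (plain modulo hashing remaps most keys, which is why consistent hashing exists).

```ruby
require 'zlib'

# Hypothetical pool of shard databases holding identical schemas.
SHARDS = %w[shard_0 shard_1 shard_2 shard_3].freeze

# Hash the partition key so the same entity always routes to the same shard.
def shard_for(user_id)
  SHARDS[Zlib.crc32(user_id.to_s) % SHARDS.size]
end

# Every query for a given user goes to exactly one shard, so query load
# for the whole user population is spread across all of them.
puts shard_for(42)
```

The partition key choice matters as much as the hash: picking one that keeps related rows together (here, everything belonging to one user) is what lets most queries touch a single shard.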

Great article, hat tip to Mike Montano from Backtype. We are both going through massive scale growing pains. Articles like this are like a balm for burn victims. ;-)