MongoDB Gotcha #1 - Watch your indexes and your order by's

So we're experimenting with dropping MongoDB into production here with some simple stuff. We ran into a problem where simple selects (that have indexes!) were taking 300msec to return. After digging into the database profiler (which is built-in and quite well done), we noticed something odd in the query -- it was full table scanning 600K rows. Why? Brett figured that it was this term in the query: orderby: { $natural: 1 }

But where had that come from? We didn't add it. Turns out MongoMapper uses it by default when you call find(:first, ...). Unfortunately this order by can overrule any indexes you might have intended on using.

We fixed this by adding intentional ordering to our find query that matched our indexes: find(:first, :conditions => {:model_id => model.id, :model_class => model.class.to_s}, :order => 'model_id asc, model_class asc')

300msec queries suddenly jumped down to 4 milliseconds. Problem solved. But very non-intuitive.

Handy: Regexes for smart quotes

//Quotes: Replace smart double quotes with straight double quotes.
//ANSI version for use with 8-bit regex engines and the Windows code page 1252.
preg_replace('[\x84\x93\x94]', '"', $text);

//Quotes: Replace smart double quotes with straight double quotes.
//Unicode version for use with Unicode regex engines.
gsub('[\u201C\u201D\u201E\u201F\u2033\u2036]', '"', $text);

//Quotes: Replace smart single quotes and apostrophes with straight single quotes.
//Unicode version for use with Unicode regex engines.
gsub("[\u2018\u2019\u201A\u201B\u2032\u2035]", "'", $text);

//Quotes: Replace smart single quotes and apostrophes with straight single quotes.
//ANSI version for use with 8-bit regex engines and the Windows code page 1252.
gsub("[\x82\x91\x92]", "'", $text);

//Quotes: Replace straight apostrophes with smart apostrophes
gsub("\b'\b", "?", $text);

//Quotes: Replace straight double quotes with smart double quotes.
//ANSI version for use with 8-bit regex engines and the Windows code page 1252.
gsub('\B"\b([^"\x84\x93\x94\r\n]+)\b"\B', '?\1?', $text);

//Quotes: Replace straight double quotes with smart double quotes.
//Unicode version for use with Unicode regex engines.
gsub('\B"\b([^"\u201C\u201D\u201E\u201F\u2033\u2036\r\n]+)\b"\B', '?\1?', $text);

//Quotes: Replace straight single quotes with smart single quotes.
//Unicode version for use with Unicode regex engines.
gsub("\B'\b([^'\u2018\u2019\u201A\u201B\u2032\u2035\r\n]+)\b'\B", "?\1?", $text);

//Quotes: Replace straight single quotes with smart single quotes.
//ANSI version for use with 8-bit regex engines and the Windows code page 1252.
gsub("\B'\b([^'\x82\x91\x92\r\n]+)\b'\B", "?\1?", $text);

Amazing bash scriptaculousness: Find out how long a MySQL schema change will take

#!/bin/bash
(while(true); do  \
(mysql -e 'show innodb status \G' | grep undo\ log\ entries ; sleep 1 ;  \
mysql -e 'show innodb status \G' | grep undo\ log\ entries ) |    \
egrep '[0-9][0-9][0-9][0-9]‘ |awk ‘{print $10;}’ ; done ) | \
perl -e ‘$t = ROWS_IN_TABLE; while(1) { \
$n ++; $nn; $a = <>; $b = <>; $nn += ($b-$a); \
printf “Rate: %d, avg: %d, %0.3f%% complete, done in %d sec\n”, \
$b-$a, $nn/$n, ($b/$t)*100, ($t-$b)/($nn/$n); }’;

God, altering tables is the biggest craptastic part of MySQL. Luckily this takes some of the mystery out of an otherwise harrowing experience.

+10,000 usefulness points if you use MySQL with any scale

Hat tip @michaelmontano

WTF is a SuperColumn? An Intro to the Cassandra Data Model (Arin Sarkissian explains)

For the last month or two the Digg engineering team has spent quite a bit of time looking into, playing with and finally deploying Cassandra in production. It’s been a super fun project to take on – but even before the fun began we had to spend quite a bit of time figuring out Cassandra’s data model… the phrase “WTF is a ’super column’” was uttered quite a few times. :)
via arin.me

Digg has gone long on Cassandra. Up until this point I was pretty interested in MongoDB. But suddenly maybe Cassandra is the go-to non-sql data store. If it's good enough for Twitter, Facebook and Digg...

Great, well written, detailed article on understanding Cassandra by Arin Sarkissian here.