Axon Flux // A Ruby on Rails Blog

the rate of signal flow across long nerve fiber between neurons 

MongoDB wasn't created in a lab

MongoDB wasn’t designed in a lab. We built MongoDB from our own experiences building large-scale, high-availability, robust systems. We didn’t start from scratch; we really tried to figure out what was broken and tackle that. So the way I think about MongoDB is that if you take MySQL and change the data model from relational to document-based, you get a lot of great features: embedded docs for speed, manageability, agile development with schema-less databases, and easier horizontal scalability because joins aren’t as important. There are lots of things that work great in relational databases, such as indexes, dynamic queries and updates, and we haven’t changed much there. For example, the way you design your indexes in MongoDB should be exactly the way you do it in MySQL or Oracle; you just have the additional option of indexing an embedded field.

The MongoDB creator Eliot and his team are really on to something. After years of fighting MySQL, it's pretty clear that what they're creating is what we want.

Lessons from scaling Scribd.com: How to Break a Large Website (and how not to)

Oh, we have learned these lessons with Posterous too. Good times.

How to make multipart split tarballs for backup to S3 to fit under the 5GB limit

If you want to burn the archive to discs, or transfer it to a filesystem with a limited maximum file size (say FAT32, with a limit of 4GB per file), then you will have to split the file either during or after archive creation. A simple way to do that is the split command. Below are examples of both scenarios. More detail is available in the split man page (run man split in a terminal). Keep all the pieces together in a clearly labeled directory so they can be extracted later. Once the archive has been split to a suitable size, the pieces can be burned to disc one at a time.

To Split During Creation

tar -cvpz <put options here> / | split -d -b 3900m - /name/of/backup.tar.gz. 
  • The first half, up to the pipe (|), is identical to our earlier example, except for the omission of the f option. Without it, tar writes the archive to standard output, which is then piped to the split command. (Standard output is the default on many builds of tar but not all; adding f - makes it explicit.)
  • -d - This option makes the suffix of each piece numeric instead of alphabetic. Pieces are numbered sequentially, starting with 00 and increasing with each new split file.

  • -b - This option sets the size to split at. In this example it is 3900m (3900 MiB), which fits under the 4GB per-file limit of a FAT32 partition.

  • - - The hyphen takes the place of the input file (normally an actual file already created) and directs split to read from standard input.

  • /name/of/backup.tar.gz. is the prefix that will be applied to all generated split files. It should point to the folder where you want the pieces to end up. In our example, the first piece will be in the directory /name/of/ and be named backup.tar.gz.00 .

To Split After Creation

split -d -b 3900m /path/to/backup.tar.gz /name/of/backup.tar.gz. 
  • Here instead of using standard input, we are simply splitting an existing file designated by /path/to/backup.tar.gz .

To Reconstitute the Archive
Reconstructing the complete archive is easy. First cd into the directory holding the split archives, then use cat to join all the pieces and send the result over standard output to tar, which extracts it to the specified directory.

cat *tar.gz* | tar -xvpzf - -C /  
  • The * wildcard before and after tar.gz matches every piece. The shell expands it in sorted order, so cat starts with the first piece and appends each of the others in sequence, a process known as catenation, which is how the command got its name.
  • cat then passes the combined stream through standard output to tar, which in this example extracts it into the root directory.
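
Before trusting a split backup, it can be worth rehearsing the whole round trip on a small throwaway archive. The sketch below is only an illustration: the temporary directory, the file names and the 100k piece size are made up for the test, and it assumes a split that supports -d, as in the commands above. It splits a small tarball, rejoins the pieces with cat, and uses cmp to confirm the result is byte-for-byte identical to the original.

```shell
# Rehearse split + cat on a small archive inside a throwaway directory.
# Every path and size here is illustrative, not part of the original how-to.
set -eu

work=$(mktemp -d)
mkdir "$work/data" "$work/parts"

# Sample payload (about 300 KB of random bytes) so the split yields more than one piece.
head -c 300000 /dev/urandom > "$work/data/sample.bin"

# Create the archive, then split it into 100k pieces with numeric suffixes.
tar -czf "$work/backup.tar.gz" -C "$work" data
split -d -b 100k "$work/backup.tar.gz" "$work/parts/backup.tar.gz."
pieces=$(ls "$work/parts" | wc -l | tr -d ' ')

# The shell expands the glob in sorted order, so cat rejoins the pieces in sequence.
cat "$work"/parts/backup.tar.gz.* > "$work/rejoined.tar.gz"

# cmp -s is silent and succeeds only if the two files are identical.
if cmp -s "$work/backup.tar.gz" "$work/rejoined.tar.gz"; then
  result="identical"
else
  result="different"
fi
echo "pieces: $pieces, round trip: $result"

rm -rf "$work"
```

The same cmp check works on a real backup as long as the unsplit archive is still around; once it is gone, listing the rejoined stream with tar -tzf is a reasonable substitute.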

S3 has a 5GB limit. Here's how to tarball your backups so that you can still throw them over to Amazon (or Cloudfiles, or whoever).

Using resolver for domains

OS X has a very cool feature built into its resolver: /etc/resolver. It allows you to specify different DNS servers for different domains. After creating the /etc/resolver directory, I can create a /etc/resolver/erdelynet.com file with “nameserver 192.168.25.10” in it. Now my Mac will use 192.168.25.10 for resolving erdelynet.com and whatever my ISP assigned me for everything else.
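
As a rough sketch of those steps, the snippet below builds the resolver file in a scratch directory first, so the contents can be checked before anything under /etc is touched. The domain and address are the ones from the quote; the staging directory is an assumption added for illustration, and the install commands are left as comments because they need administrator rights on OS X.

```shell
# Build the per-domain resolver file in a scratch directory first.
domain="erdelynet.com"
nameserver="192.168.25.10"

staging=$(mktemp -d)
printf 'nameserver %s\n' "$nameserver" > "$staging/$domain"

# Show the file that would be installed.
cat "$staging/$domain"

# Install step (OS X, needs administrator rights), left commented out on purpose:
#   sudo mkdir -p /etc/resolver
#   sudo cp "$staging/$domain" /etc/resolver/
# Afterwards, scutil --dns lists the resolvers in use, including per-domain ones.
```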

Bookmarklet hackers: You should know about this false-positive XSS in Chrome and how to work around it.

Recently I ran into a problem working with the Vodpod bookmarklet on Google Chrome. My popup window was throwing a JavaScript error, and it turned out to be this security error:

Refused to execute a JavaScript script. Source code of script found within request.

I googled around and couldn’t figure out what was going on. Finally I figured out the problem, and it’s right there in the error. The issue was that my bookmarklet was passing an embed code for a video in the POST to my popup window. Then, server side, I was spitting that embed code out into the JavaScript included on my page. This is a no-no for Chrome: it checks all the JavaScript in your loaded page for any code that matches data in the POST. If it finds a match, it assumes you’ve got an XSS attack and prevents that code from being inserted into the new page.

The workaround was to have the bookmarklet encode the embed codes, and then decode those values on the server side before rendering my page. This way the POST data doesn’t match the new page source (POST data is encoded, page source is not). Simple.
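
The idea behind the workaround can be shown with a couple of shell commands. This is only a sketch: the post does not say which encoding was used, so Base64 is an assumption here, and the embed code is a made-up example. In the real setup the bookmarklet would encode in JavaScript and the server would decode in its own language; the shell version just demonstrates the property that matters, namely that the value in the POST no longer appears verbatim in the rendered page.

```shell
# Hypothetical embed code, standing in for whatever the bookmarklet collected.
embed='<embed src="http://example.com/player.swf" width="400" height="300"></embed>'

# Bookmarklet side: encode before POSTing (Base64 is an assumption, not the post's stated choice).
encoded=$(printf '%s' "$embed" | base64 | tr -d '\n')

# Server side: decode before rendering the value into the page.
decoded=$(printf '%s' "$encoded" | base64 --decode)

# The POST body carries $encoded while the page source contains $decoded,
# so a literal comparison between request and response finds no match.
case "$decoded" in
  *"$encoded"*) echo "page source still contains the POSTed value" ;;
  *)            echo "page source no longer contains the POSTed value" ;;
esac
```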

Fix for Posterous bookmarklet coming right up. This is undoubtedly a Chrome bug, but we all have to live with it.

Cool TinyMCE plugin: autoresizing HTML text areas like a champ.

This will be handy in the new post editor for sure...

5 quotes by the creator of PHP, Rasmus Lerdorf: I don't like programming, and I'm not a real programmer.

  • I really don't like programming. I built this tool to program less so that I could just reuse code.
  • PHP is about as exciting as your toothbrush. You use it every day, it does the job, it is a simple tool, so what? Who would want to read about toothbrushes?
  • I was really, really bad at writing parsers. I still am really bad at writing parsers. We have things like protected properties. We have abstract methods. We have all this stuff that your computer science teacher told you you should be using. I don't care about this crap at all.
  • There are people who actually like programming. I don't understand why they like programming.
  • I'm not a real programmer. I throw together things until it works then I move on. The real programmers will say "yeah it works but you're leaking memory everywhere. Perhaps we should fix that." I'll just restart apache every 10 requests.

Hat tip @igrigorik

Go Ruby!

Uninstalling and reinstalling all ruby gems

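# Save the name of every installed gem (the first column of gem list)
sudo gem list | cut -d" " -f1 > gem_list.txt
# Uninstall all versions (-a), ignoring dependencies (-I), along with their executables (-x)
cat gem_list.txt | xargs sudo gem uninstall -aIx
# Reinstall the latest version of each gem from the saved list
cat gem_list.txt | xargs sudo gem install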

This is useful if you ever want to just reset your gems and clean them up. Or if you upgrade to Snow Leopard and want to make sure all your native gems are in decent shape (since Snow Leopard is 64-bit).

How Superfeedr built Analytics using MongoDB

We weren’t quite sure how to build these analytics. We slowly established a set of requirements and constraints:

  • Zero performance impact
  • Fully decoupled from the current infrastructure
  • Results at most hourly
  • Data is more important than graphs
  • Easily-extensible, in case we want to measure more things

This is a really interesting tech read by our friend Julien from Superfeedr.

We've been experimenting with MongoDB in-house as well. It'll be interesting to see how things shake out over the next year w.r.t. NoSQL implementations... We're also using Redis and have found it to be much faster and more reliable, though simpler in some respects.

Inspect a live running Ruby process

Are you still adding printf/puts calls and restarting your app to figure out what went wrong? Sometimes the problem is hard to reproduce, or you only discover it in production. You've got a process that exhibits the bug, but you didn't run it under ruby-debug, so there's no choice but to kill it and reproduce the bug after adding some code to inspect your program, right?

Not at all. Jamis Buck blogged about how to use GDB to inspect a live Ruby process, and showed how to get a stack trace using Ruby's C API and some GDB scripting:

(gdb) set $ary = (int)backtrace(-1)
(gdb) set $count = *($ary+8)
(gdb) set $index = 0
(gdb) while $index < $count
>  x/1s *((int)rb_ary_entry($ary, $index)+12)
>  set $index = $index + 1
>end

But it gets much easier than that. How about this:

(gdb) eval "caller"

or

(gdb) eval "local_variables"

Once you've opened that door, you get a full-powered Ruby interpreter inside GDB. Ruby's introspection capabilities do the rest. Local variables, instance variables, classes, methods, Ruby threads, object counts... evil eval can bring us pretty far. You can find the scripts to turn GDB into a sort of IRB that can attach to running processes below.

This turned out to be massively useful today while debugging some errors with the Facebook API on our live site, which were causing zombie Passenger workers.

There's something about using GDB to find problems with production instances that just makes you feel cool.
