How to make multipart split tarballs for backup to S3 to fit under the 5GB limit

If you want to burn the archive to discs, or transfer it to a filesystem with a limited maximum file size (say FAT32, with a limit of 4GB per file), then you will have to split the file either during or after archive creation. A simple way to do this is the split command. Below are examples of both scenarios. More information than is conveyed here can be found in the split man page; run man split in a terminal to read it. Keep these split archives together in a directory you label for extraction at a later date. Once the archives are split to a desirable size, they can be burned one at a time to disc.

To Split During Creation

tar -cvpz <put options here> / | split -d -b 3900m - /name/of/backup.tar.gz.
  • The first half, up to the pipe (|), is identical to our earlier example, except for the omission of the f option. Without it, tar writes the archive to standard output, which is then piped to the split command.
  • -d - This option means that the archive suffixes will be numerical instead of alphabetical; the pieces are numbered sequentially, starting at 00 and increasing with each new split file.

  • -b - This option designates the size to split at; in this example I've made it 3900MB so each piece fits on a FAT32 partition.

  • - - The lone hyphen is a placeholder for the input file (normally an actual file already created) and directs split to read from standard input.

  • /name/of/backup.tar.gz. - This is the prefix that will be applied to all generated split files. It should point to the folder you want the archives to end up in. In our example, the first split archive will be in the directory /name/of/ and be named backup.tar.gz.00 .
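Here's a miniature end-to-end run of the pipe above that's safe to try, using scratch paths instead of / and a 1k piece size instead of 3900m (all names here are illustrative); -f - just makes the write-to-stdout explicit:

```shell
# Scratch demo of tar | split; paths and sizes are illustrative.
mkdir -p demo-src pieces
printf 'some data\n' > demo-src/file.txt
tar -cpzf - demo-src | split -d -b 1k - pieces/backup.tar.gz.
ls pieces    # backup.tar.gz.00 (larger archives produce .01, .02, ...)
```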

To Split After Creation

split -d -b 3900m /path/to/backup.tar.gz /name/of/backup.tar.gz.
  • Here instead of using standard input, we are simply splitting an existing file designated by /path/to/backup.tar.gz .

To Reconstitute the Archive
Reconstructing the complete archive is easy: first cd into the directory holding the split archives, then simply use cat to join all the archives into one and send the result over standard output to tar, which extracts to the specified directory.

cat *tar.gz* | tar -xvpzf - -C /
  • The * wildcards before and after tar.gz tell the shell to match every split file, in order; cat starts with the first matching file and adds every other that matches, a process known as concatenation, which is how the command got its name.
  • cat then passes the joined stream over standard output to tar, which extracts it, in this example into the root directory (-C /).
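To convince yourself the pieces really reassemble byte-for-byte, here's a self-contained round-trip check using a stand-in file (substitute your real archive paths):

```shell
# Split/cat round-trip sanity check with a stand-in file.
printf 'hello from the archive\n' > demo.tar.gz        # stand-in for a real tarball
split -d -b 8 demo.tar.gz demo.tar.gz.                 # 8-byte pieces: .00, .01, ...
cat demo.tar.gz.* > rejoined.tar.gz
cmp demo.tar.gz rejoined.tar.gz && echo "round trip OK"
```

If cmp reports a difference, don't delete the original archive.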

S3 has a 5GB per-object limit. Here's how to tar up your backups so that you can still throw them over to Amazon (or Cloudfiles, or whomever).

Using resolver for domains

OS X has a very cool feature built into its resolver: /etc/resolver. It allows you to specify different DNS servers for different domains. After creating the /etc/resolver directory, I can create a file in it named after a domain, containing a "nameserver" line with the IP of the DNS server to use for that domain. Now, my Mac will use that server for resolving the domain, and whatever my ISP assigned me for everything else.
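A sketch of the setup, run here against a scratch directory so it's safe to try anywhere; on a real Mac the directory is /etc/resolver, the commands need sudo, and example.com / 10.0.0.53 are placeholders for your own domain and DNS server:

```shell
# Scratch stand-in for /etc/resolver (real use: sudo mkdir -p /etc/resolver).
mkdir -p resolver-demo
# The file is named after the domain; each line names one DNS server for it.
printf 'nameserver 10.0.0.53\n' > resolver-demo/example.com
cat resolver-demo/example.com
```

On macOS you can check that the override took effect with scutil --dns, which lists the per-domain resolvers.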

Bookmarklet hackers: You should know about this XSS false positive in Chrome and how to work around it.

Recently I ran into a problem working with the Vodpod bookmarklet on Google Chrome. My popup window was throwing a JavaScript error, and it turned out to be this security error:

"Refused to execute a JavaScript script. Source code of script found within request."

I googled around and couldn’t figure out what was going on. Finally I figured out the problem, and it’s right there in the error. The issue was that my bookmarklet was passing an embed code for a video in the POST to my popup window. Then, server-side, I was spitting that embed code out into the JavaScript included on my page. This is a no-no for Chrome – it checks all the JavaScript in your loaded page for any code that matches data in the POST. If it finds a match, it assumes you’ve got an XSS attack and it prevents that code from being inserted into the new page.

The workaround was to have the bookmarklet encode the embed codes, and then decode those values on the server side before rendering my page. This way the POST data doesn’t match the new page source (POST data is encoded, page source is not). Simple.

Fix for Posterous bookmarklet coming right up. This is undoubtedly a Chrome bug, but we all have to live with it.