Login

Oh, that's why you should use Varnish and not Squid.

The really short answer is that computers do not have two kinds of storage any more.

It used to be that you had the primary store, and it was anything from acoustic delaylines filled with mercury via small magnetic dougnuts via transistor flip-flops to dynamic RAM.

And then there were the secondary store, paper tape, magnetic tape, disk drives the size of houses, then the size of washing machines and these days so small that girls get disappointed if think they got hold of something else than the MP3 player you had in your pocket.

And people program this way.

They have variables in "memory" and move data to and from "disk".

Take Squid for instance, a 1975 program if I ever saw one: You tell it how much RAM it can use and how much disk it can use. It will then spend inordinate amounts of time keeping track of what HTTP objects are in RAM and which are on disk and it will move them forth and back depending on traffic patterns.

Well, today computers really only have one kind of storage, and it is usually some sort of disk, the operating system and the virtual memory management hardware has converted the RAM to a cache for the disk storage.

So what happens with squids elaborate memory management is that it gets into fights with the kernels elaborate memory management, and like any civil war, that never gets anything done.

What happens is this: Squid creates a HTTP object in "RAM" and it gets used some times rapidly after creation. Then after some time it get no more hits and the kernel notices this. Then somebody tries to get memory from the kernel for something and the kernel decides to push those unused pages of memory out to swap space and use the (cache-RAM) more sensibly for some data which is actually used by a program. This however, is done without squid knowing about it. Squid still thinks that these http objects are in RAM, and they will be, the very second it tries to access them, but until then, the RAM is used for something productive.

This is what Virtual Memory is all about.

If squid did nothing else, things would be fine, but this is where the 1975 programming kicks in.

After some time, squid will also notice that these objects are unused, and it decides to move them to disk so the RAM can be used for more busy data. So squid goes out, creates a file and then it writes the http objects to the file.

Here we switch to the high-speed camera: Squid calls write(2), the address i gives is a "virtual address" and the kernel has it marked as "not at home".

So the CPU hardwares paging unit will raise a trap, a sort of interrupt to the operating system telling it "fix the memory please".

The kernel tries to find a free page, if there are none, it will take a little used page from somewhere, likely another little used squid object, write it to the paging poll space on the disk (the "swap area") when that write completes, it will read from another place in the paging pool the data it "paged out" into the now unused RAM page, fix up the paging tables, and retry the instruction which failed.

Squid knows nothing about this, for squid it was just a single normal memory acces.

So now squid has the object in a page in RAM and written to the disk two places: one copy in the operating systems paging space and one copy in the filesystem.

Squid now uses this RAM for something else but after some time, the HTTP object gets a hit, so squid needs it back.

First squid needs some RAM, so it may decide to push another HTTP object out to disk (repeat above), then it reads the filesystem file back into RAM, and then it sends the data on the network connections socket.

Did any of that sound like wasted work to you ?

Hm, instructive.

Handy Ruby one liners by David Thomas

HANDY ONE-LINERS FOR RUBY                             November 16, 2005
compiled by David P Thomas <davidpthomas@gmail.com>         version 1.0

Latest version of this file can be found at:
    http://www.fepus.net/ruby1line.txt

Last Updated: Wed Nov 16 08:35:02 CST 2005

FILE SPACING:

# double space a file
    $  ruby -pe 'puts' < file.txt
# triple space a file
    $  ruby -pe '2.times {puts}' < file.txt
# undo double-spacing (w/ and w/o whitespace in lines)
    $  ruby -lne 'BEGIN{$/="\n\n"}; puts $_' < file.txt
    $  ruby -ne 'BEGIN{$/="\n\n"}; puts $_.chomp' < file.txt
    $  ruby -e 'puts STDIN.readlines.to_s.gsub(/\n\n/, "\n")' < file.txt

NUMBERING:

# number each line of a file (left justified).
    $  ruby -ne 'printf("%-6s%s", $., $_)' < file.txt
# number each line of a file (right justified).
    $  ruby -ne 'printf("%6s%s", $., $_)' < file.txt
# number each line of a file, only print non-blank lines
    $  ruby -e 'while gets; end; puts $.' < file.txt
# count lines (emulates 'wc -l')
    $  ruby -ne 'END {puts $.}' < file.txt
    $  ruby -e 'while gets; end; puts $.' < file.txt

TEXT CONVERSION AND SUBSTITUTION:

# convert DOS newlines (CR/LF) to Unix format (LF)
# - strip newline regardless; re-print with unix EOL
    $  ruby -ne 'BEGIN{$\="\n"}; print $_.chomp' < file.txt

# convert Unix newlines (LF) to DOS format (CR/LF)
# - strip newline regardless; re-print with dos EOL
    $  ruby -ne 'BEGIN{$\="\r\n"}; print $_.chomp' < file.txt

# delete leading whitespace (spaces/tabs/etc) from beginning of each line
    $  ruby -pe 'gsub(/^\s+/, "")' < file.txt

# delete trailing whitespace (spaces/tabs/etc) from end of each line
# - strip newline regardless; replace with default platform record separator
    $  ruby -pe 'gsub(/\s+$/, $/)' < file.txt

# delete BOTH leading and trailing whitespace from each line
    $  ruby -pe 'gsub(/^\s+/, "").gsub(/\s+$/, $/)' < file.txt

# insert 5 blank spaces at the beginning of each line (ie. page offset)
    $  ruby -pe 'gsub(/%/, "   ")' < file.txt
    FAILS! $  ruby -pe 'gsub(/%/, 5.times{putc " "})' < file.txt

# align all text flush right on a 79-column width
    $  ruby -ne 'printf("%79s", $_)' < file.txt

# center all text in middle of 79-column width
    $  ruby -ne 'puts $_.chomp.center(79)' < file.txt
    $  ruby -lne 'puts $_.center(79)' < file.txt

# substitute (find and replace) "foo" with "bar" on each line
    $  ruby -pe 'gsub(/foo/, "bar")' < file.txt

# substitute "foo" with "bar" ONLY for lines which contain "baz"
    $  ruby -pe 'gsub(/foo/, "bar") if $_ =~ /baz/' < file.txt

# substitute "foo" with "bar" EXCEPT for lines which contain "baz"
    $  ruby -pe 'gsub(/foo/, "bar") unless $_ =~ /baz/' < file.txt

# substitute "foo" or "bar" or "baz".... with "baq"
    $  ruby -pe 'gsub(/(foo|bar|baz)/, "baq")' < file.txt

# reverse order of lines (emulates 'tac') IMPROVE
    $  ruby -ne 'BEGIN{@arr=Array.new}; @arr.push $_; END{puts @arr.reverse}' < file.txt

# reverse each character on the line (emulates 'rev')
    $  ruby -ne 'puts $_.chomp.reverse' < file.txt
    $  ruby -lne 'puts $_.reverse' < file.txt

# join pairs of lines side-by-side (like 'paste')
    $  ruby -pe '$_ = $_.chomp + " " + gets if $. % 2' < file.txt

# if a line ends with a backslash, append the next line to it
    $  ruby -pe 'while $_.match(/\\$/); $_ = $_.chomp.chop + gets; end' < file.txt
    $  ruby -e 'puts STDIN.readlines.to_s.gsub(/\\\n/, "")' < file.txt

# if a line begins with an equal sign, append it to the previous line (Unix)
    $  ruby -e 'puts STDIN.readlines.to_s.gsub(/\n=/, "")' < file.txt

# add a blank line every 5 lines (after lines 5, 10, 15, etc)
    $  ruby -pe 'puts if $. % 6 == 0' < file.txt

SELECTIVE PRINTING OF CERTAIN LINES

# print first 10 lines of a file (emulate 'head')
    $  ruby -pe 'exit if $. > 10' < file.txt

# print first line of a file (emulate 'head -1')
    $  ruby -pe 'puts $_; exit' < file.txt

# print the last 10 lines of a file (emulate 'tail'); NOTE reads entire file!
    $  ruby -e 'puts STDIN.readlines.reverse!.slice(0,10).reverse!' < file.txt

# print the last 2 lines of a file (emulate 'tail -2'); NOTE reads entire file!
    $  ruby -e 'puts STDIN.readlines.reverse!.slice(0,2).reverse!' < file.txt

# print the last line of a file (emulates 'tail -1')
    $  ruby -ne 'line = $_; END {puts line}' < file.txt

# print only lines that match a regular expression (emulates 'grep')
    $  ruby -pe 'next unless $_ =~ /regexp/' < file.txt

# print only lines that DO NOT match a regular expression (emulates 'grep')
    $  ruby -pe 'next if $_ =~ /regexp/' < file.txt

# print the line immediately before a regexp, but not the regex matching line
    $  ruby -ne 'puts @prev if $_ =~ /regex/; @prev = $_;' < file.txt

# print the line immediately after a regexp, but not the regex matching line
    $  ruby -ne 'puts $_ if @prev =~ /regex/; @prev = $_;' < file.txt

# grep for foo AND bar AND baz (in any order)
    $  ruby -pe 'next unless $_ =~ /foo/ && $_ =~ /bar/ && $_ =~ /baz/' < file.txt

# grep for foo AND bar AND baz (in order)
    $  ruby -pe 'next unless $_ =~ /foo.*bar.*baz/' < file.txt

# grep for foo OR bar OR baz
    $  ruby -pe 'next unless $_ =~ /(foo|bar|baz)/' < file.txt

# print paragraph if it contains regexp; blank lines separate paragraphs
    $  ruby -ne 'BEGIN{$/="\n\n"}; print $_ if $_ =~ /regexp/' < file.txt

# print paragraph if it contains foo AND bar AND baz (in any order); blank lines separate paragraphs
    $  ruby -ne 'BEGIN{$/="\n\n"}; print $_ if $_ =~ /foo/ && $_ =~ /bar/ && $_ =~ /baz/' < file.txt

# print paragraph if it contains foo AND bar AND baz (in order); blank lines separate paragraphs
    $  ruby -ne 'BEGIN{$/="\n\n"}; print $_ if $_ =~ /(foo.*bar.*baz)/' < file.txt

# print paragraph if it contains foo OR bar OR baz; blank lines separate paragraphs
    $  ruby -ne 'BEGIN{$/="\n\n"}; print $_ if $_ =~ /(foo|bar|baz)/' < file.txt

# print only lines of 65 characters or greater
    $  ruby -pe 'next unless $_.chomp.length >= 65' < file.txt
    $  ruby -lpe 'next unless $_.length >= 65' < file.txt

# print only lines of 65 characters or less
    $  ruby -pe 'next unless $_.chomp.length < 65' < file.txt
    $  ruby -lpe 'next unless $_.length < 65' < file.txt

# print section of file from regex to end of file
    $  ruby -pe '@found=true if $_ =~ /regex/; next unless @found' < file.txt

# print section of file based on line numbers (eg. lines 2-7 inclusive)
    $  ruby -pe 'next unless $. >= 2 && $. <= 7' < file.txt

# print line number 52
    $  ruby -pe 'next unless $. == 52' < file.txt

# print every 3rd line starting at line 4
    $  ruby -pe 'next unless $. >= 4 && $. % 3 == 0' < file.txt

# print section of file between two regular expressions, /foo/ and /bar/
    $  ruby -ne '@found=true if $_ =~ /foo/; next unless @found; puts $_; exit if $_ =~ /bar/' < file.txt

SELECTIVE DELETION OF CERTAIN LINES

# print all of file except between two regular expressions, /foo/ and /bar/
    $  ruby -ne '@found = true if $_ =~ /foo/; puts $_ unless @found; @found = false if $_ =~ /bar/' < file.txt

# print file and remove duplicate, consecutive lines from a file (emulates 'uniq')
    $  ruby -ne 'puts $_ unless $_ == @prev; @prev = $_' < file.txt

# print file and remove duplicate, non-consecutive lines from a file (careful of memory!)
    $  ruby -e 'puts STDIN.readlines.sort.uniq!.to_s' < file.txt

# print file except for first 10 lines
    $  ruby -pe 'next if $. <= 10' < file.txt

# print file except for last line
    $  ruby -e 'lines=STDIN.readlines; puts lines[0,lines.size-1]' < file.txt

# print file except for last 2 lines
    $  ruby -e 'lines=STDIN.readlines; puts lines[0,lines.size-2]' < file.txt

# print file except for last 10 lines
    $  ruby -e 'lines=STDIN.readlines; puts lines[0,lines.size-10]' < file.txt

# print file except for every 8th line
    $  ruby -pe 'next if $. % 8 == 0' < file.txt

# print file except for blank lines
    $  ruby -pe 'next if $_ =~ /^\s*$/' < file.txt

# delete all consecutive blank lines from a file except the first
    $  ruby -e 'BEGIN{$/=nil}; puts STDIN.readlines.to_s.gsub(/\n(\n)+/, "\n\n")' < file.txt

# delete all consecutive blank lines from a file except for the first 2
    $  ruby -e 'BEGIN{$/=nil}; puts STDIN.readlines.to_s.gsub(/\n(\n)+/, "\n\n")' < file.txt

# delete all leading blank lines at top of file
    $  ruby -pe '@lineFound = true if $_ !~ /^\s*$/; next if !@lineFound' < file.txt


If you have any additional scripts to contribute or if you find errors
in this document, please send an e-mail to the compiler.  Indicate the
version of ruby you used, the operating system it was compiled for, and
the nature of the problem.  Various scripts in this file were written or
contributed by:

 David P Thomas <davidpthomas@gmail.com>    # author of this document


Tue Jun 26 18:17:36 CDT 2007
 * Thanks to Taylor Carpenter <taylor@codecafe.com> for feedback on improving redirection format.

 

memoization is your easy caching friend

def zipcode_and_name
  @zipcode_and_name ||= "#{zipcode} #{name}"
end

The ||= operator is a so-called conditional assignment that only assigns the value on its right to the variable on its left if the variable is not true. Since pretty much everything in Ruby evaluates to true when coerced into a boolean (except nil and, of course, false, as well as a couple of other things) this is exactly what we want: The first call returns the actual string and from the second time on, no matter how often the method is called on the same object, the “cached” result will be used.

Facebook spam detection: We need a link spam detector for all of the Internet.

New warning explaining why content has been blocked.

With billions of pieces of content being shared on Facebook every month and bad actors constantly targeting the people who use Facebook, preventing spam isn't easy. Just as a community relies on its citizens to report crime, we rely on you to let us know when you encounter spam, which can be anything from a friend request sent by someone you don't know to a message that includes a link to a malicious website.

via blog.facebook.com

At Posterous we deal with our fair share of linkspam and blackhat SEO link-building scams.

We need a revolution here. Perhaps with Facebook's lead. They have probably one of the best ideas of what links and link farms are evil. I would love it if they opened up an Akismet-like service to identify spammy content or spammy links.

Or maybe we'll just build it ourselves. We'll see. But the user generated content world is threatened by a sea of bad actors trying to get their illicit pagerank or get that one eyeball to buy one more penis enlargement pill.

Style and Structure: A CSS/XHTML primer for frontend engineers by @soleio

Brilliance.

Diving into HTML5, an illustrated guide


This is really cool. Read it here:
http://diveintohtml5.org/

Full of helpful advice, tips, explanations and retro illustrations like

I really dig this. Thanks Adam!

Great overview of network theory and machine learning by @bradfordcross

Information / data analysis expert and a friend of mine Bradford Cross (with Drew Conway) explains the basics of network theory.

Also, more on machine learning too.

How Facebook uses Scribe, Hadoop, and Hive for Analytics, Ad hoc analysis, Spam detection and Ad Optimization

Scribe for bringing together the logs.

Hadoop for map/reduce. Hive for querying.

Ultimately SQL for storing final reporting info.

Extra reading: Cloudera's top 5 most frequently asked Hadoop questions

Handy! RGB to HSL and RGB to HSV color model conversion algorithms in JavaScript

Reposted from mjijackson.com -- this is so cool that we should make sure there are more copies of this on the net, just in case. =D

Here is a set of additive color model conversion algorithms that I found published on Wikipedia and have implemented in JavaScript. It was surprisingly difficult to find these actually implemented anywhere in compact, efficient, and bug-free code, so I wrote my own. These should be easily portable to other programming languages if desired.

/**
 * Converts an RGB color value to HSL. Conversion formula
 * adapted from http://en.wikipedia.org/wiki/HSL_color_space.
 * Assumes r, g, and b are contained in the set [0, 255] and
 * returns h, s, and l in the set [0, 1].
 *
 * @param   Number  r       The red color value
 * @param   Number  g       The green color value
 * @param   Number  b       The blue color value
 * @return  Array           The HSL representation
 */
function rgbToHsl(r, g, b){
    r /= 255, g /= 255, b /= 255;
    var max = Math.max(r, g, b), min = Math.min(r, g, b);
    var h, s, l = (max + min) / 2;

    if(max == min){
        h = s = 0; // achromatic
    }else{
        var d = max - min;
        s = l > 0.5 ? d / (2 - max - min) : d / (max + min);
        switch(max){
            case r: h = (g - b) / d + (g < b ? 6 : 0); break;
            case g: h = (b - r) / d + 2; break;
            case b: h = (r - g) / d + 4; break;
        }
        h /= 6;
    }

    return [h, s, l];
}

/**
 * Converts an HSL color value to RGB. Conversion formula
 * adapted from http://en.wikipedia.org/wiki/HSL_color_space.
 * Assumes h, s, and l are contained in the set [0, 1] and
 * returns r, g, and b in the set [0, 255].
 *
 * @param   Number  h       The hue
 * @param   Number  s       The saturation
 * @param   Number  l       The lightness
 * @return  Array           The RGB representation
 */
function hslToRgb(h, s, l){
    var r, g, b;

    if(s == 0){
        r = g = b = l; // achromatic
    }else{
        function hue2rgb(p, q, t){
            if(t < 0) t += 1;
            if(t > 1) t -= 1;
            if(t < 1/6) return p + (q - p) * 6 * t;
            if(t < 1/2) return q;
            if(t < 2/3) return p + (q - p) * (2/3 - t) * 6;
            return p;
        }

        var q = l < 0.5 ? l * (1 + s) : l + s - l * s;
        var p = 2 * l - q;
        r = hue2rgb(p, q, h + 1/3);
        g = hue2rgb(p, q, h);
        b = hue2rgb(p, q, h - 1/3);
    }

    return [r * 255, g * 255, b * 255];
}

/**
 * Converts an RGB color value to HSV. Conversion formula
 * adapted from http://en.wikipedia.org/wiki/HSV_color_space.
 * Assumes r, g, and b are contained in the set [0, 255] and
 * returns h, s, and v in the set [0, 1].
 *
 * @param   Number  r       The red color value
 * @param   Number  g       The green color value
 * @param   Number  b       The blue color value
 * @return  Array           The HSV representation
 */
function rgbToHsv(r, g, b){
    r = r/255, g = g/255, b = b/255;
    var max = Math.max(r, g, b), min = Math.min(r, g, b);
    var h, s, v = max;

    var d = max - min;
    s = max == 0 ? 0 : d / max;

    if(max == min){
        h = 0; // achromatic
    }else{
        switch(max){
            case r: h = (g - b) / d + (g < b ? 6 : 0); break;
            case g: h = (b - r) / d + 2; break;
            case b: h = (r - g) / d + 4; break;
        }
        h /= 6;
    }

    return [h, s, v];
}

/**
 * Converts an HSV color value to RGB. Conversion formula
 * adapted from http://en.wikipedia.org/wiki/HSV_color_space.
 * Assumes h, s, and v are contained in the set [0, 1] and
 * returns r, g, and b in the set [0, 255].
 *
 * @param   Number  h       The hue
 * @param   Number  s       The saturation
 * @param   Number  v       The value
 * @return  Array           The RGB representation
 */
function hsvToRgb(h, s, v){
    var r, g, b;

    var i = Math.floor(h * 6);
    var f = h * 6 - i;
    var p = v * (1 - s);
    var q = v * (1 - f * s);
    var t = v * (1 - (1 - f) * s);

    switch(i % 6){
        case 0: r = v, g = t, b = p; break;
        case 1: r = q, g = v, b = p; break;
        case 2: r = p, g = v, b = t; break;
        case 3: r = p, g = q, b = v; break;
        case 4: r = t, g = p, b = v; break;
        case 5: r = v, g = p, b = q; break;
    }

    return [r * 255, g * 255, b * 255];
}

 

Facebook Scribe: Open source logging platform that is used a ton

Imagine hundreds of thousands of machines across many geographical dispersed datacenters just aching to send their precious log payload to the central repository off all knowledge. Because really, when you combine all the meta data with all the events you pretty much have a complete picture of your operations. Once in the central repository logs can be scanned, indexed, summarized, aggregated, refactored, diced, data cubed, and mined for every scrap of potentially useful information.

Just imagine the log stream from all of Facebook's Apache servers alone. Brutal. My guess is these are not real-time feeds so there are no streaming query issues, but the task is still daunting. Let's say they log 10 billion messages a day. That's over 1 million messages per second!

When no off the shelf products worked for them they built their own. Scribe can be downloaded from Sourceforge. But the real action is on their wiki.

I was fascinated by this account of launching an election counter in 12 hours on Facebook's main interface, all enabled through Scribe. It was done by one engineer and just a couple of designers.

How funny is it to think that this election counter is really just script that greps of a tail of logs from Scribe?