Gruber's URL Regular Expression Explained

While America threw on its eating pants and combed the Thursday circulars for deals, John Gruber spent Thanksgiving preparing to unveil his regular expression for finding URLs in arbitrary text.

\b(([\w-]+://?|www[.])[^\s()]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

Pretty dense. Let’s be that guy and break it out, /x style (ignoring white space, with comments)

\b                          #start with a word boundary
(                           #
    (                           #
        [\w-]+://?                  #protocol://, second slash optional
        |                           #OR
        www[.]                      #a literal www. 
    )                           #
    [^\s()]+                  #non-whitespace, parens or angle brackets
                                #repeated any number of times
    (?:                         # (end game)
        \([\w\d]+\)                 #handles weird parenthesis in URLs (http://example.com/blah_blah_(wikipedia))
                                    #won't handle this twice foo_(bar)_(and)_again
        |                           #OR
        (                           #
            [^[:punct:]\s]              #NOT a single punctuation or white space
            |                           #OR
            /                           #trailing slash
        )                           #
    )                           #
)                           #

Amazing writeup by Alan Storm on Gruber's autolinking regex.

Update: Gruber updated this with an even more awesome regex, with breakout explanation

How #newtwitter uses shbangs (#!) to let Google crawl AJAX content

If you're a Twitter user, logged-in, and have Javascript, you'll be able to see my profile here:

However, Googlebot will recognize that as a URL in the new format, and will instead request this URL:

Sensibly, Twitter want to maintain backward compatibility (and not have their indexed URLs look like junk) so they 301 redirect that URL to:

(And if you're a logged-in Twitter user, that last URL will actually redirect you back to the first one.)

seomoz.org, dependable as usual.

God bless @scribd, they've released open source to solve that godawful Flash wmode z-indexing bug.

The dreaded problem that most web developers come across is the z-index issue with flash elements. When the wmode param is not set, or is set to window, flash elements will always be on top of your DOM content. No matter what kind of z-index voodoo you attempt, your content will never break through the flash. This is because flash, when in window mode, is actually rendered on a layer above all web content.

There is a lot of chatter about this issue, and the simple solution is to specify the wmode parameter to opaque or transparent. This works when you control and deliver the flash content yourself. However, this is not the case for flash ads.

...

So, to solve all this, I wrote some javascript that will dynamically add the correct wmode parameter. I call it FlashHeed. You can get it now on the GitHub repo.

It works reliably in all major browsers, and has no dependencies, so feel free to drop it into your Prototype or jQuery dependent website.

 

Impressive tech talk on Scaling MySQL at Facebook

Facebook gave a MySQL Tech Talk where they talked about many things MySQL, but one of the more subtle and interesting points was their focus on controlling the variance of request response times and not just worrying about maximizing queries per second.

But first the scalability porn. Facebook's OLTP performance numbers were as usual, quite dramatic:

  • Query response times: 4ms reads, 5ms writes. 
  • Rows read per second: 450M peak
  • Network bytes per second: 38GB peak
  • Queries per second: 13M peak
  • Rows changed per second: 3.5M peak
  • InnoDB disk ops per second: 5.2M peak

This looks to be a must-see tech talk for anyone on the scaling / infrastructure side of the web.

 

Watch live streaming video from facebookevents at livestream.com

What if Foursquare wasn't using MongoDB and was using something else?

People have chimed in and talked about the Foursquare outage. The nice part about these discussions is that they’re focusing on the technical problems with the current set up and Foursquare. They’re picking it apart and looking at what is right, what went wrong, and what needs to be done differently in MongoDB to prevent problems like this in the future.

Let’s play a “what if” game. What if Foursquare wasn’t using MongoDB? What if they were using something else?

Riak? Cassandra? HBase? MySQL?

Jeremiah Peschka fills us in.