Gruber's URL Regular Expression Explained

While America threw on its eating pants and combed the Thursday circulars for deals, John Gruber spent Thanksgiving preparing to unveil his regular expression for finding URLs in arbitrary text.

\b(([\w-]+://?|www[.])[^\s()]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

Pretty dense. Let’s be that guy and break it out, /x style (ignoring white space, with comments)

\b                          #start with a word boundary
(                           #
    (                           #
        [\w-]+://?                  #protocol://, second slash optional
        |                           #OR
        www[.]                      #a literal www. 
    )                           #
    [^\s()]+                  #non-whitespace, parens or angle brackets
                                #repeated any number of times
    (?:                         # (end game)
        \([\w\d]+\)                 #handles weird parenthesis in URLs (http://example.com/blah_blah_(wikipedia))
                                    #won't handle this twice foo_(bar)_(and)_again
        |                           #OR
        (                           #
            [^[:punct:]\s]              #NOT a single punctuation or white space
            |                           #OR
            /                           #trailing slash
        )                           #
    )                           #
)                           #

via alanstorm.com

Amazing writeup by Alan Storm on Gruber's autolinking regex.

Update: Gruber updated this with an even more awesome regex, with breakout explanation.

Axon Flux // A Ruby on Rails Blog

the rate of signal flow across long nerve fiber between neurons

Gruber's URL Regular Expression Explained

Tags