Gruber's URL Regular Expression Explained

While America threw on its eating pants and combed the Thursday circulars for deals, John Gruber spent Thanksgiving preparing to unveil his regular expression for finding URLs in arbitrary text.

\b(([\w-]+://?|www[.])[^\s()]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

Pretty dense. Let’s be that guy and break it out, /x style (ignoring white space, with comments)

\b                          #start with a word boundary
(                           #
    (                           #
        [\w-]+://?                  #protocol://, second slash optional
        |                           #OR
        www[.]                      #a literal www. 
    )                           #
    [^\s()]+                  #non-whitespace, parens or angle brackets
                                #repeated any number of times
    (?:                         # (end game)
        \([\w\d]+\)                 #handles weird parenthesis in URLs (http://example.com/blah_blah_(wikipedia))
                                    #won't handle this twice foo_(bar)_(and)_again
        |                           #OR
        (                           #
            [^[:punct:]\s]              #NOT a single punctuation or white space
            |                           #OR
            /                           #trailing slash
        )                           #
    )                           #
)                           #

Amazing writeup by Alan Storm on Gruber's autolinking regex.

Update: Gruber updated this with an even more awesome regex, with breakout explanation

views

Tags