Oops, running Windows Server costs the London Stock Exchange a day in downtime. Time to switch to Linux.

TradElect runs on HP ProLiant servers, which in turn run Windows Server 2003. The TradElect software itself is a custom blend of C# and .NET programs created by Microsoft and Accenture, the global consulting firm. On the back end, it relies on Microsoft SQL Server 2000. Its goal was to maintain sub-ten-millisecond response times, that is, real-time system speeds, for stock trades.

It never, ever came close to achieving these performance goals. Worse still, the LSE's competition, such as its main rival Chi-X with its MarketPrizm trading platform software, was able to deliver that level of performance and was, in general, running rings around TradElect. Three guesses what MarketPrizm runs on, and the first two don't count. The answer is Linux.

Lessons on a Thursday night: Using Hpricot with the tag vomit that is Microsoft Word HTML

Yes, Microsoft Word is a disaster. Its markup is easily the ugliest ever created, and it is so widely used that it makes me weep.
 
Here's a small snippet of terribleness:

<p style='mso-margin-top-alt:12.0pt;margin-right:0cm;margin-bottom:12.0pt; margin-left:0cm;line-height:14.25pt'><span style='font-size:10.0pt;font-family:"Lucida Grande"'>
Microsoft word ruins my life hardcore.</span></p>

Unfortunately, the biggest problem with this snippet of code is that it uses double quotes inside single quotes, e.g. 'font-family:"Lucida Grande", "sans-serif"'. This may or may not be to spec -- I doubt it, though. Who the heck would use quotes without escaping? Well, Microsoft Word does.
 
Hpricot actually chokes on this. Here's what I learned -- before ever passing Microsoft Word/Outlook-generated HTML to Hpricot, be sure to run this regex on the markup so that we kill these double quotes and avoid big-time disasters.
 

   output = input.gsub(/style='[^']*(font-family:)[^']*'/mi) { |sub| sub.gsub(/"/, '') }

Hopefully through the magic of Google, this will save somebody some time. I know I searched far and wide and couldn't really find even one article on cleaning up Microsoft Word HTML, what with its disastrous mso tags and MsoNormal classes strewn all over the place.
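
For anyone who wants to see the regex do its thing before trusting it, here's a minimal sketch that wraps it in a method and runs it on a trimmed-down version of the snippet above. The method name strip_font_quotes and the sample string are just made up for illustration, and it only handles single-quoted style attributes that contain font-family, same as the original one-liner.

```ruby
# Hypothetical helper name; the regex is the same one used in the post.
# It finds single-quoted style attributes that mention font-family and
# removes every double quote inside the matched attribute.
def strip_font_quotes(html)
  html.gsub(/style='[^']*(font-family:)[^']*'/mi) { |attr| attr.delete('"') }
end

# Shortened sample of Word-style markup (an illustrative string, not real Word output).
word_html = %q(<p style='margin-left:0cm'><span style='font-size:10.0pt;font-family:"Lucida Grande"'>Microsoft word ruins my life hardcore.</span></p>)

cleaned = strip_font_quotes(word_html)
puts cleaned
```

The style attribute without a font-family is left alone, and the one with it loses its inner double quotes, so the cleaned string should be safe to hand to the parser.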
 
It's still tag vomit, but at least it's tag vomit that won't end the life of your Hpricot parser.

Google's undocumented API - Convert favicons to PNGs automatically. Great for links.

Google's undocumented favicon-to-PNG converter: Showing the favicon of a domain next to a link is a really nice trick, but it’s slightly tricky to achieve as IE won’t display a .ico file if you link to it from an img element, so you need to convert the images server-side. This undocumented Google API does that for you, meaning it’s much easier to add favicons as a feature to your site.

This is a handy little tool for building links that are more interesting to click on than just the link text. May use this soon.
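
Here's a rough sketch of how I'd probably wire it up. Big caveat: the quoted text above never spells out the endpoint, so the URL below (google.com/s2/favicons with a domain parameter) is my assumption about which service is meant, and since it's undocumented it could change or vanish at any time. The helper names favicon_url and favicon_img are made up for this example.

```ruby
require 'uri'

# Assumed endpoint: not stated in the post, and undocumented, so treat it as unstable.
FAVICON_ENDPOINT = 'https://www.google.com/s2/favicons'

# Build the favicon image URL for the domain of a given link.
def favicon_url(link)
  host = URI.parse(link).host
  "#{FAVICON_ENDPOINT}?#{URI.encode_www_form(domain: host)}"
end

# Build an img tag that can be placed next to the link text.
def favicon_img(link)
  %(<img src="#{favicon_url(link)}" width="16" height="16" alt="">)
end

puts favicon_url('http://example.com/some/page')
# prints https://www.google.com/s2/favicons?domain=example.com
puts favicon_img('http://example.com/some/page')
```

Nothing here touches the network; it only builds the URL and the tag, so the browser does the actual fetching when the page renders.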