Ruby is 16 years in the making. It takes a long time for great things to get recognition.

Matz started working on Ruby in 1993, but it took him until 1995 to make a public release. Even then, it took at least another five years to catch the eye of mainstream developers. DHH likewise worked on Rails for more than a year and a half behind closed doors. During that period, neither Matz nor DHH could have imagined how their work would be received by the public or where it would end up.

A quick and dirty way to extract undeliverable email addresses from a maildir-style mail server

Anyone who writes any app that does anything with email inevitably has to do something to stem the tide of undeliverables. Sending mail continually to dead mailboxes is a sure way to reduce the deliverability of the legit email you do want to send.

Just funnel the undeliverables into their own folder using Postbox or Mozilla Thunderbird. The screenshot below shows a relatively complete set of subject lines to go by, since there doesn't seem to be one particular string everyone uses. Create this rule on the mailbox that is your FROM address. At Posterous, we use help@posterous.com as a catch-all -- yes, it's more work to look at it, but it makes it that much easier to talk to your customers and fix any problems that arise. High-touch service goes a long way for a web startup.

Then run this script and you'll get back a clean list of newline-delimited email addresses that you can feed to your application to invalidate them. Change mail_dir to the correct directory on your mail server, assuming you use maildir-formatted mail storage. It should run fine on basically any Linux box. =)




#!/usr/bin/ruby

# set this to your mail directory on linux
mail_dir = '/home/vmail/help@posterous.com/.Undelivered/cur'

# grab the line containing the failed recipient address, plus one line of context
output = `egrep -i -h -C 1 '<[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}>:' #{mail_dir}/*`

emails = []
puts output  # you can double check that everything's kosher here...

# pull the address itself out of each matching line
output.split(/\n/).each do |l|
  emails.push($1) if l.match(/\b([A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i)
end

puts "------------------------------"
puts emails.compact.uniq.join("\n")

puts "------------------------------"
puts "#{emails.compact.uniq.size} emails undeliverable"


Kodak Gallery is holding our photos hostage. But me and Open Source are here to bust them out! (Ruby source code included)

I used kodakgallery.com in 2004 for some photos. I was younger then. I didn't know that I shouldn't bother storing photos on one of the worst photo hosts on the Internet. I thought Kodak was a brand I could trust.

"WE DON'T WANT TO DELETE YOUR PHOTOS" says the email I got today. Fine, delete them. But only after I download them. And I'm not going to pay you $20 to get an Archive CD either.


They're well within their rights to do so (heck, everyone needs to make some money) -- but they also make it quite difficult to download these photos. If you have 200 photos, you have to click the "Download Full Resolution" button 200 times, one by one. If these photos are mine, shouldn't it be a little easier? Sorry, a product manager in some stovepiped organization said no.

A quick Google search turned up no super-easy way to download and archive these things in one fell swoop.

I thought I'd whip up this simple Ruby script to download them to your local directory. It requires the 'mechanize' gem and Ruby 1.8.6+. Save the code to a file, e.g. "kodak_easyshare.rb", edit your username and password in the script below, then run 'ruby kodak_easyshare.rb' -- and watch the magic happen.

You own those photos. Get them. Download them. Don't let Kodak hold you hostage.

This is hereby released under MIT license.

 
require 'rubygems'
require 'mechanize'
require 'fileutils'

# log in to Kodak Gallery
agent = WWW::Mechanize.new
signin_page = agent.get('http://www.kodakgallery.com/Signin.jsp')
signin_page.forms[0].email = 'yourlogin_at_gmail.com'
signin_page.forms[0].password = 'your_password_here'
signin_page.forms[0].submit

# collect every album link off the album menu page
album_page = agent.get('http://www.kodakgallery.com/AlbumMenu.jsp?Upost_signin=Welcome.jsp')
albums = album_page.links.map { |l| (l.href.match(/BrowsePhotos.jsp/) && l.text && l.text.match(/[A-Za-z0-9]/)) ? { :href => l.href, :name => l.text } : nil }.compact

albums.each do |album_hash|
  puts "\n\n\n"
  puts "-----------------------------------------"
  puts "'#{album_hash[:name]}'"
  puts "#{album_hash[:href]}"

  # find every photo page in this album
  gallery_page = agent.get("http://www.kodakgallery.com/#{album_hash[:href]}")
  photos = gallery_page.links.map { |l| (l.href.match(/PhotoView.jsp/) && !l.href.match(/javascript/)) ? l.href : nil }.compact.uniq

  # one directory per album, with filesystem-unfriendly characters replaced
  album_dirname = album_hash[:name].gsub(/[\n\t\?\:\>\<\\\/|]/, '_')

  unless File.exists?(album_dirname)
    puts "creating album #{album_dirname}"
    FileUtils.mkdir(album_dirname)
  end

  FileUtils.cd(album_dirname, :verbose => true) do
    # grab the full-resolution download link from each photo page and save it
    photos.each do |p|
      photo_page = agent.get("http://www.kodakgallery.com/#{p}")
      fullres = photo_page.links.map { |l| (l.href.match(/FullResDownload/) && !l.href.match(/javascript/)) ? l.href : nil }.compact.uniq.first

      file_to_dl = "http://www.kodakgallery.com#{fullres}"
      result = agent.get(file_to_dl)
      if result.class == WWW::Mechanize::File
        result.save
        puts "Saved #{file_to_dl}"
      else
        puts "FAIL on #{file_to_dl}"
      end
    end
  end

  puts "-----------------------------------------"
  puts "-----------------------------------------"
  puts "-----------------------------------------"
end
 

External service dependencies can really screw over your site, but as an engineer, your job is to design around them.

Watch for dependencies on external services like ad serving networks or RSS feeds. If a service isn’t responding or can’t handle your growing request load, make sure that you have a fallback strategy.

This is from a list of tips on Rails scaling from Engine Yard.

This is so true. If you ever make an external request, you should expect it to fail at some point in the future. You should test what happens when it's down. Make sure there's a recovery strategy so that your site doesn't fall over or block users.

Timeouts are a good way to handle this. It's better to slightly degrade service temporarily than to wait for minutes or hours for a service to return. Be able to clean up / recover once the service you were depending on returns.
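
For example, in Ruby you can put a hard cap on how long you'll wait and fall back to the last good response when the call blows past it. This is just a rough sketch -- the three-second budget, the cache hash, and the feed URL are illustrative, not from any particular codebase:

require 'net/http'
require 'uri'
require 'timeout'

# Fetch an external feed, but never let it hang your request.
def fetch_feed(url, cache)
  body = Timeout.timeout(3) do            # hard cap on how long we'll wait
    Net::HTTP.get_response(URI.parse(url)).body
  end
  cache[url] = body                       # remember the last good response
  body
rescue Timeout::Error, SocketError, SystemCallError
  # degrade gracefully: serve the last good copy (possibly stale, possibly nil)
  cache[url]
end

cache = {}
feed = fetch_feed('http://example.com/feed.xml', cache)

The same shape works for any external call: bound the wait, swallow the failure, and hand back something your page can still render.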

This goes for everything from S3 (which is rarely down) to the Twitter API (which is... down all the time).