Flash security policy attacks hit Facebook/MySpace... a techie look at what happened

Cross-domain policy files (crossdomain.xml) are forgivingly parsed by Flash. If an attacker can construct an HTTP request that results in the server sending back a policy file, then Flash will accept the policy file. For instance, imagine a university website that responds to a course listing request:

http://www.example.com/CourseListing?format=js&callback=<cross-domain-policy><allow-access-from%20domain="*"/></cross-domain-policy>

...with the following output:

<cross-domain-policy><allow-access-from%20domain="*"/></cross-domain-policy>() {  return {name:"English101", desc:"Read Books"}, {name:"Computers101", desc:"play on computers"}};

Then one could load this policy via the following ActionScript code:

System.security.loadPolicyFile("http://www.example.com/CourseListing?format=js&callback=<cross-domain-policy>" + "<allow-access-from%20domain=\"*\"/></cross-domain-policy>");

This results in the Flash application having complete cross-domain access to www.example.com.
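The root problem is that the server echoes the callback parameter verbatim at the very start of the response. Here is a minimal Ruby sketch of that behavior, plus one possible way to close the hole by only accepting identifier-like callback names. The method names, the COURSES constant, and the regex are all assumptions made for illustration, not code from any real course-listing site.

```ruby
require 'json'

# Hypothetical data standing in for the course listing.
COURSES = [
  { name: 'English101', desc: 'Read Books' },
  { name: 'Computers101', desc: 'play on computers' }
].freeze

# Naive JSONP rendering: the callback is echoed back untouched, so whatever
# the caller sends becomes the first bytes of the response.
def naive_jsonp(callback, data)
  "#{callback}(#{JSON.generate(data)});"
end

# Assumed mitigation: accept only dotted JavaScript identifiers as callbacks.
CALLBACK_PATTERN = /\A[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*\z/

def strict_jsonp(callback, data)
  raise ArgumentError, 'invalid callback name' unless CALLBACK_PATTERN.match?(callback)

  "#{callback}(#{JSON.generate(data)});"
end

policy = '<cross-domain-policy><allow-access-from domain="*"/></cross-domain-policy>'
puts naive_jsonp(policy, COURSES)          # response begins with the injected policy
puts strict_jsonp('showCourses', COURSES)  # ordinary callback names still work
```

With the strict version, the policy payload raises ArgumentError instead of being reflected into the response.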

GitHub releases Resque. Wow, this is exactly what we wanted in Rails queueing. Much respect.

Resque is our Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later.

Background jobs can be any Ruby class or module that responds to perform. Your existing classes can easily be converted to background jobs or you can create new classes specifically to do work. Or, you can do both.
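To make "responds to perform" concrete, here is a minimal sketch of a job class and a job module. ThumbnailJob, CacheWarmer, the queue names, and the arguments are all made up for illustration. The sketch calls perform directly so it runs without Redis; with Resque loaded you would hand the class to Resque.enqueue instead.

```ruby
# Hypothetical job class: @queue names the queue, and the class-level
# perform method is what a worker calls with the enqueued arguments.
class ThumbnailJob
  @queue = :images

  def self.perform(photo_id, size)
    "resized photo #{photo_id} to #{size}"
  end
end

# A module works the same way, as long as it responds to perform.
module CacheWarmer
  @queue = :maintenance

  def self.perform(path)
    "warmed #{path}"
  end
end

# With Resque loaded, enqueueing would look like:
#   Resque.enqueue(ThumbnailJob, 42, '100x100')
# Arguments travel through Redis as JSON, so keep them to simple values.
# A worker eventually makes the equivalent of these calls:
puts ThumbnailJob.perform(42, '100x100')
puts CacheWarmer.perform('/index')
```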

The post goes into great detail on the tradeoffs and problems with existing delayed job queueing systems. As Brett was saying, it reads like a laundry list of stuff we've dealt with!

Resque is a Delayed-Job-like queue that is built on Redis instead of MySQL. Brilliant.

XML is like violence

XML is like violence - if it doesn’t solve your problems, you are not using enough of it.

Heh, I guess a lot of things are like violence.

JSON / ActiveSupport Rails gotcha: Avoid XSS exploits when passing HTML in JSON

Ran into a tricky situation today -- we're working on the ability to support JavaScript on Posterous blogs. One problem we saw with the Theme Editor was that </script> tags were causing problems with JSON. Browsers would see a </script> tag and interpret it as the end of the entire script block, as opposed to merely text within the JSON string. When dealing with user-generated content, that also opens your site up to a pretty serious JS XSS attack.

ActiveSupport has actually been modified to translate < and > into their Unicode escape sequences (\u003c and \u003e) to avoid this problem. However, if you, like many people, require the JSON gem, its to_json replaces the ActiveSupport implementation entirely and the escaping is lost.

The simple fix -- make your own String subclass that defines the necessary to_json method. Be sure to use the new string class in place of the standard String class wherever you want angle brackets escaped.

Here's the code:
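A minimal sketch of the idea, assuming the json gem is loaded. The class name HtmlSafeJsonString is hypothetical, and the escaping is done with a plain gsub over the JSON that the gem generates.

```ruby
require 'json'

# Hypothetical String subclass: to_json defers to the json gem, then rewrites
# angle brackets as \u003c and \u003e so a browser never sees a literal
# </script> inside an inline script block.
class HtmlSafeJsonString < String
  def to_json(*args)
    # The block form of gsub inserts its result literally, with no
    # backslash processing of the replacement text.
    super(*args).gsub(/[<>]/) { |ch| format('\u%04x', ch.ord) }
  end
end

puts HtmlSafeJsonString.new('</script>').to_json
```

Any JSON parser turns the escapes back into the original characters, so the data is unchanged. Only the raw text embedded in the page differs.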

IE still runs 65% of the web, and that sucks. But luckily, there are some references that can help.

As of this writing, Internet Explorer holds about a 65% market share combined across all their currently used browsers.

Finally a pretty good resource on all CSS bugs on IE.

If I ruled the world, nobody would use IE. Since that is not the case, we need to commit these rules to memory.

Scaling Facebook vs Scaling Digg: It's a question of disk vs RAM

Facebook takes a Pull on Demand approach. To recreate a page or a display fragment they run the complete query. To find out if one of your friends has added a new favorite band Facebook actually queries all your friends to find what's new. They can get away with this because of their awesome infrastructure.

But if you've ever wondered why Facebook has a 5,000 user limit on the number of friends, this is why. At a certain point it's hard to make Pull on Demand scale.

Another approach to find out what's new is the Push on Change model. In this model when a user makes a change it is pushed out to all the relevant users and the changes (in some form) are stored with each user. So when a user wants to view their updates all they need to access is their own account data. There's no need to poll all their friends for changes.

Really interesting article at High Scalability on ways to approach scaling your data store.

We use push on change as well, particularly for your reading list subscriptions. To be honest, it's cheaper. You use disk space to pre-compute things that would be expensive to ask the database for repeatedly. It lets you scale by just adding disk -- even though disk is orders of magnitude slower than RAM.
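The two models can be sketched in a few lines of Ruby. This is a toy in-memory version under obvious assumptions: PullFeed and PushFeed are made-up names, and hashes stand in for the real data store. It is not how Facebook, Digg, or Posterous actually implement it.

```ruby
# Pull on Demand: writes are cheap, reads walk every friend.
class PullFeed
  def initialize
    @friends = Hash.new { |h, k| h[k] = [] }
    @updates = Hash.new { |h, k| h[k] = [] }
  end

  def befriend(user, friend)
    @friends[user] << friend
  end

  def post(user, text)
    @updates[user] << text
  end

  # Read cost grows with the number of friends.
  def feed(user)
    @friends[user].flat_map { |friend| @updates[friend] }
  end
end

# Push on Change: writes fan out to every follower, reads touch one record.
class PushFeed
  def initialize
    @followers = Hash.new { |h, k| h[k] = [] }
    @inbox = Hash.new { |h, k| h[k] = [] }
  end

  def befriend(user, friend)
    @followers[friend] << user
  end

  # Write cost grows with the number of followers, and every follower
  # stores its own copy of the update.
  def post(user, text)
    @followers[user].each { |follower| @inbox[follower] << text }
  end

  def feed(user)
    @inbox[user].dup
  end
end

[PullFeed.new, PushFeed.new].each do |feeds|
  feeds.befriend('alice', 'bob')
  feeds.post('bob', 'new favorite band')
  p feeds.feed('alice')
end
```

Both versions return the same feed here. The difference is where the work and the storage go: the pull version pays on every read, while the push version pays on every write and keeps a precomputed copy per reader.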

The Facebook approach is *really hard* to get right. It's costly because so much info just has to live in RAM, and could be one reason why it's much harder for Facebook to reach profitability than most other sites. If you have to add RAM to keep all user data in cache, that's a lot of hardware to keep going.

But when it comes to realtime, Facebook is as realtime as they come. They are some real engineering badasses.

It makes no sense at all that John Resig's JS book underperforms that other turd of a JS book everyone buys

Tracking my ranking over the past year [Pro JavaScript Techniques has] been consistently in the 10-20,000 range, with occasional dips into the < 10,000 range. JavaScript: The Definitive Guide is always < 5,000 (for comparison).
--John Resig via ejohn.org

Really? That is some really really weak sauce. Pro JavaScript Techniques rocks. It's a great book, and totally useful.

JavaScript: The Definitive Guide, on the other hand, is a piece of turd. It routinely assumes you *already know* JavaScript, and its prose is almost unreadable.

I wonder why Resig's awesome book is completely dominated by sales of the abysmal O'Reilly book. Probably for two reasons: a) the title (one is way more universal and likely to be bought by aspirational JS developers) and b) O'Reilly doesn't usually put out crap, so it's very surprising when it does.

Google Wave Robots in Ruby

Rave: A Google Wave robot client framework for Ruby
There are several parallel efforts to create a Ruby implementation of the robot API. The rest of this post is going to focus on Rave, but I just wanted to link to the other implementations that I'm aware of:

Happiness is MySQL replication lag going to 0

Feels so good to be caught up!

MongoDB Gotcha #1 - Watch your indexes and your order by's

So we're experimenting with dropping MongoDB into production here with some simple stuff. We ran into a problem where simple selects (that have indexes!) were taking 300 ms to return. After digging into the database profiler (which is built in and quite well done), we noticed something odd in the query -- it was doing a full table scan of 600K rows. Why? Brett figured that it was this term in the query: orderby: { $natural: 1 }

But where had that come from? We didn't add it. Turns out MongoMapper uses it by default when you call find(:first, ...). Unfortunately this order by can overrule any indexes you might have intended to use.

We fixed this by adding intentional ordering to our find query that matched our indexes: find(:first, :conditions => {:model_id => model.id, :model_class => model.class.to_s}, :order => 'model_id asc, model_class asc')

300 ms queries suddenly dropped to 4 ms. Problem solved. But very non-intuitive.
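To get a feel for why the scan hurts, here is a toy model in plain Ruby. It is not MongoDB or MongoMapper code: an array stands in for the collection in natural (insertion) order, a Hash stands in for a compound index on model_id and model_class, and all names are made up for illustration.

```ruby
# Toy document with the two indexed fields from the query above.
Row = Struct.new(:model_id, :model_class, :payload)

# 600K rows in insertion order, which is what natural order means here.
ROWS = (1..600_000).map { |i| Row.new(i, 'Post', "row #{i}") }

# Stand-in for a compound index: [model_id, model_class] => position.
INDEX = ROWS.each_with_index.to_h { |row, pos| [[row.model_id, row.model_class], pos] }

# Natural-order scan: examine rows one by one until the first match.
# Returns the row and how many rows were examined.
def find_by_scan(rows, id, klass)
  examined = 0
  hit = rows.find do |row|
    examined += 1
    row.model_id == id && row.model_class == klass
  end
  [hit, examined]
end

# Index lookup: jump straight to the matching position.
def find_by_index(rows, index, id, klass)
  pos = index[[id, klass]]
  pos && rows[pos]
end

row, examined = find_by_scan(ROWS, 600_000, 'Post')
puts "scan examined #{examined} rows to find #{row.payload}"
puts "index lookup found #{find_by_index(ROWS, INDEX, 600_000, 'Post').payload}"
```

In the worst case the scan touches every row before it finds a match, while the index lookup goes straight to the answer. That is the same shape as the jump from 300 ms to 4 ms.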