hpricot text_transform, updated for hpricot 0.8.1

This is an updated version of Henrik Nyh's text_transform! library, which lets you transform the text nodes of an Hpricot document with ease. The original broke with Hpricot 0.8.1 because some internal variables changed names; this version fixes that.


# By Henrik Nyh <http://henrik.nyh.se> 2007-03-28.
# Based on http://vemod.net/code/hpricot_goodies/hpricot_text_gsub.rb.
# MODIFIED BY GARRY TAN ON 4/21 to support Hpricot 0.8.1
# Licensed under the same terms as Ruby.

require "rubygems"
require "hpricot"

module Posterous
  module Extensions
    module HpricotTextTransform
      module NodeWithChildrenExtension
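        # Skip excluded elements (and their subtrees); otherwise recurse into children.
        # The defined?/nil guard lets nodes without a usable name fall through to recursion.
        def text_transform!(options={}, &block)
          return if defined?(name) and name and Array(options[:except]).include?(name.to_sym)
          children.each { |c| c.text_transform!(options, &block) }
        end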
      end
 
      module TextNodeExtension
        def text_transform!(options={}, &block)
          content.replace yield(content)
        end
      end

      module EmptyTransform
        def text_transform!(options={}, &block)
        end
      end
    end
  end
end
Hpricot::Doc.send(:include,  Posterous::Extensions::HpricotTextTransform::NodeWithChildrenExtension)
Hpricot::Elem.send(:include, Posterous::Extensions::HpricotTextTransform::NodeWithChildrenExtension)
Hpricot::Text.send(:include, Posterous::Extensions::HpricotTextTransform::TextNodeExtension)


Hpricot::Comment.send(:include, Posterous::Extensions::HpricotTextTransform::EmptyTransform)
Hpricot::BogusETag.send(:include, Posterous::Extensions::HpricotTextTransform::EmptyTransform)
Hpricot::XMLDecl.send(:include, Posterous::Extensions::HpricotTextTransform::EmptyTransform)
Hpricot::ETag.send(:include, Posterous::Extensions::HpricotTextTransform::EmptyTransform)
Hpricot::ProcIns.send(:include, Posterous::Extensions::HpricotTextTransform::EmptyTransform)
Hpricot::DocType.send(:include, Posterous::Extensions::HpricotTextTransform::EmptyTransform)

if __FILE__ == $0
  require "test/unit"
  
  class HpricotTextTransformTest < Test::Unit::TestCase
    def assert_hpricot_transform(expected, input, options={}, &block)
      doc = Hpricot(input)
      doc.text_transform!(options, &block)
      assert_equal(expected, doc.to_s)
    end
    
    def test_with_gsub
      input    = 'xxx'
      expected = 'yyy'
      assert_hpricot_transform(expected, input, {}) { |text| text.gsub("x", "y") }
    end

    def test_with_reverse
      input    = '<a>hello</a> world from <code>ruby</code>'
      expected = '<a>olleh</a> morf dlrow <code>ybur</code>'
      assert_hpricot_transform(expected, input, {}) { |text| text.reverse }
    end

    def test_with_reverse_exclude_one_tag
      input    = '<a>hello</a> world from <code>ruby</code>'
      expected = '<a>olleh</a> morf dlrow <code>ruby</code>'
      assert_hpricot_transform(expected, input, {:except => :code}) { |text| text.reverse }
    end
    
    def test_with_reverse_exclude_multiple_tags
      input    = '<a>hello</a> world from <code>ruby</code>'
      expected = '<a>hello</a> morf dlrow <code>ruby</code>'
      assert_hpricot_transform(expected, input, {:except => [:a, :code]}) { |text| text.reverse }
    end
    
    def test_with_reverse_exclude_nested_tag
      input    = '<a>hello</a> world from <pre><code>ruby</code></pre>'
      expected = '<a>olleh</a> morf dlrow <pre><code>ruby</code></pre>'
      assert_hpricot_transform(expected, input, {:except => :code}) { |text| text.reverse }
    end

  end
end
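
To see the recursion at work without installing Hpricot, here is a minimal stand-in sketch. SketchText, SketchElem and sketch_tree are hypothetical names invented for this illustration; they are not Hpricot classes, and the sketch only mimics the idea of the extension: elements pass the call down to their children unless excluded, and text nodes replace their content with the block's result.

```ruby
# Hypothetical stand-in for a text node (NOT Hpricot::Text).
class SketchText
  attr_reader :content

  def initialize(content)
    @content = content.dup # keep our own mutable copy
  end

  # Text nodes rewrite their content in place with the block's result.
  def text_transform!(options = {}, &block)
    content.replace(block.call(content))
  end

  def to_s
    content
  end
end

# Hypothetical stand-in for an element (NOT Hpricot::Elem).
class SketchElem
  attr_reader :name, :children

  def initialize(name, *children)
    @name = name
    @children = children
  end

  # Excluded elements return early, so their whole subtree is left alone;
  # everything else hands the call to each child.
  def text_transform!(options = {}, &block)
    return if Array(options[:except]).include?(name.to_sym)
    children.each { |c| c.text_transform!(options, &block) }
  end

  def to_s
    "<#{name}>#{children.join}</#{name}>"
  end
end

# Builds <p><a>hello</a> world from <code>ruby</code></p> by hand.
def sketch_tree
  SketchElem.new("p",
    SketchElem.new("a", SketchText.new("hello")),
    SketchText.new(" world from "),
    SketchElem.new("code", SketchText.new("ruby")))
end

doc = sketch_tree
doc.text_transform!(:except => :code) { |text| text.reverse }
puts doc.to_s
```

Passing an array such as :except => [:a, :code] works the same way, because Array() wraps a single symbol and leaves an array untouched.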

The Book of Ruby by Huw Collingbourne was just released as a free PDF eBook. This looks handy.

I've largely been unsatisfied with most Ruby books (especially the oft-referenced Pickaxe book) because they don't quite go to the depth that you need day-to-day. Often a Ruby hacker needs to resort to google-fu to understand certain particularly gnarly parts of the language.

I flipped through it, and it looks like a well-written, well-structured overview of the Ruby language. Thanks to Huw for a great book and for providing it online here.

Anecdote: Don't reinvent the wheel

Example: I once encountered a group of 6 people who called themselves "engineers." To solve what they thought was a new problem, they were going to build their own little database management system with their own query language that was SQL-like without being SQL. I pointed them to some published research by a gang of PhD computer scientists from IBM Almaden, the same lab that developed the RDBMS and SQL to begin with in the 1970s. The research had been done over a five-year period and yet they hadn't become aware of it during several months of planning. I pointed them to the SQL-99 standard wherein this IBM research approach of augmenting a standard RDBMS to solve the problem they were attacking was becoming an ISO standard. They ignored it and spent another few months trying to build their enormously complex architecture. Exasperated, I got a kid fresh out of school to code up some Java stored procedures to run inside Oracle. After a week he had his system working and ready for open-source release, something that the team of 6 "engineers" hadn't been able to accomplish in 6 months of full-time work. Yet they never accepted that they were going about things in the wrong way, though eventually they did give up on the project.

Using MMM to ALTER huge tables

A few months ago, I wrote about a faster way to do certain table modifications online. It works well when all you want is to remove auto_increment or change ENUM values. When it comes to changes that really require the table to be rebuilt - adding/dropping columns or indexes, changing a data type, converting data to a different character set - MySQL master-master replication, especially when accompanied by MMM, can be a very handy way to make the changes with virtually no downtime.

A couple of days ago I worked with one of our MySQL support customers as they were upgrading their application and MySQL schema. We had deployed and used MySQL Master-Master replication manager (MMM) ever since we started working with them, so doing all the schema changes synchronously and with only a couple of seconds of “downtime” was really trivial. I’d like to share my experience.


DataFabric - The guys at FiveRuns brew some great stuff yet again for Rails sharding / multi DB support

One of the lingering issues we’ve had to deal with over the last year in the Manage service is ActiveRecord’s reluctance to talk to more than one database. It’s really quite stubborn in that regard and while there are a few solutions out there, none of them worked quite like we wanted. Some required intrusive application-level code changes, some didn’t offer the features we needed, so we bit the bullet and wrote what we needed.