The Accidental Rubyist

invalid byte sequence in UTF-8

Ecto script: URLIZE


Apart from the confusingly similar name, the only difference between URLIZE and URLIFY is that URLIZE does not require any tags in your post. It searches the post for each key in the URL hash and replaces it with the corresponding value. It does this only for the first match, so that not every occurrence of the word is URLIZED.

One issue with this is that if you run the script twice on the same input, it will attempt to URLIZE again, and this mangles the output.

URLIZE is almost like the ABBR script (next post) except that ABBR replaces all occurrences.

By now you could be eager to see the URLIZE script:

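#!/usr/bin/env ruby
require 'yaml'

file  = ARGV[0]                        # post to process, edited in place
text  = File.read(file)
mymap = YAML.load_file("urls.yaml")    # word => replacement HTML
mymap.each_pair { |key, value|
  # sub! (not gsub!) replaces only the first whitespace-preceded match
  text.sub!(/(\s)#{key}/, '\1%s' % value)
}
File.open(file, "w") { |f| f.puts text }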
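
The double-run problem mentioned above could be softened with a guard like the one below. This is only a rough sketch, not part of the original script: the helper name urlize_once and the example.org URL are made up for illustration, and it assumes that finding the replacement HTML already in the text means an earlier run has handled that key.

```ruby
# Hypothetical helper: apply each replacement at most once, even when
# the same text is fed through it repeatedly.
def urlize_once(text, mymap)
  result = text.dup
  mymap.each_pair do |key, value|
    # Assumption: if the replacement HTML is already present, an earlier
    # run has linked this key, so leave the text alone.
    next if result.include?(value)
    # Same first-match-only rule as the script; Regexp.escape keeps
    # punctuation in a key from being read as regex syntax.
    result.sub!(/(\s)#{Regexp.escape(key)}/) { "#{$1}#{value}" }
  end
  result
end

# The URL here is a placeholder, not taken from the real YAML file.
map   = { "ruby" => '<a href="http://example.org/ruby">ruby</a>' }
once  = urlize_once("I like ruby, and ruby likes me.", map)
twice = urlize_once(once, map)
puts once
puts twice == once
```

The guard is crude: it would also skip a key whose link you had typed by hand, which may or may not be what you want.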

Note that the script checks for whitespace before the key, so that a key in the middle of a longer word is not replaced. You may want to check for a space, comma or period after the key too. This wonderful program uses the urlify.yaml, which goes as follows (warning: this is being processed by the blog system, so see the actual file):

ruby: <a href="">ruby</a>
perl: <a href="">perl</a>
python: <a href="">python</a>
ecto: <a href="">ecto</a>
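
The "check after the key too" idea could look something like the following. Again a sketch under assumptions: urlize_word is a hypothetical helper, the example.org URL is a placeholder, and a negative lookahead is just one way to do it (it accepts a space, comma, period or end of text after the key, but rejects letters, digits and underscores).

```ruby
# Hypothetical helper: link the first stand-alone occurrence of key.
def urlize_word(text, key, value)
  # (\s)   - whitespace before the key, as in the original script
  # (?!\w) - negative lookahead: the next character must not be a
  #          letter, digit or underscore (end of text is fine too)
  text.sub(/(\s)#{Regexp.escape(key)}(?!\w)/) { "#{$1}#{value}" }
end

# Placeholder URL, not taken from the real YAML file.
link = '<a href="http://example.org/perl">perl</a>'
puts urlize_word("Old perlish code, new perl code.", "perl", link)
```

With the original pattern the "perl" inside "perlish" would have been linked; with the lookahead the match moves on to the stand-alone word.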

Written by totalrecall

March 1, 2007 at 4:11 pm

Posted in ecto, ruby
