The Accidental Rubyist

invalid byte sequence in UTF-8

Archive for August 2008

ruby not using daylight saving (dst) ?



irb(main):023:0> t=Time.parse("12:20 EST")
=> Sun Aug 31 22:50:00 +0530 2008

Since New York is on daylight saving time right now, I expected 21:50, not 22:50. The dst? method returns false, since daylight saving is not in effect in my timezone.

The Mac widget as well as http://www.timezoneconverter.com/cgi-bin/tzc.tzc give me the correct result.

(To run Time.parse in irb, you may need to start it with: irb -rtime.)
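
One possible explanation, sketched with the standard time library: the abbreviation EST names the fixed UTC-5 standard-time offset, while New York's summer offset is EDT (UTC-4). A string that says EST is therefore parsed one hour away from New York wall-clock time in August. The 12:20 value comes from the session above; the variable names are only for illustration.

```ruby
require 'time'

# Offsets (in seconds) that the time library associates with each abbreviation.
est_offset = Time.zone_offset("EST")   # standard time, UTC-5
edt_offset = Time.zone_offset("EDT")   # daylight time, UTC-4

# Parse the same wall-clock time under both abbreviations and compare in UTC.
est_time = Time.parse("12:20 EST").utc
edt_time = Time.parse("12:20 EDT").utc

puts "EST offset: #{est_offset / 3600}h, parsed as #{est_time.strftime('%H:%M')} UTC"
puts "EDT offset: #{edt_offset / 3600}h, parsed as #{edt_time.strftime('%H:%M')} UTC"
```

Converted to +0530, the EDT result corresponds to 21:50, the value expected above.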

Written by totalrecall

August 31, 2008 at 11:59 pm

Posted in ruby

Vim and Textile


Tim the Enchanter has some great things to say about the vim editor, my childhood friend. He talks of Textile for Vim. Sounds great. I wanted to leave a note telling him about HTML.vim, but Tim doesn’t take comments.

I used to like Textile but much prefer Maruku now.


Neverblock: asynch IO for ruby.
Neverblock mysql.

Written by totalrecall

August 31, 2008 at 6:07 pm

Posted in vim

OS X Leopard: So it is /etc/paths.d or /etc/paths ?


So, as always, my shell script runs fine on the terminal, but bombs as a cron job! The PATH variable that cron sees is different from the one in my terminal.
Leopard now requires us to create a file in /etc/paths.d (see this). It did not work for me! In any case, that does not let me specify the order of inclusion.

There’s also a file named /etc/paths which has some paths listed in it. I googled /etc/paths and found a page asking me to update that. Let’s see if that works.

btw, the cronjob spewed this out for growlnotify:

2008-08-31 17:05:03.477 growlnotify[79056:10b] could not find local GrowlApplicationBridgePathway, falling back to NSDNC
Someone on a cocoa forum said:


Cron doesn’t have access to systemuiserver, so it can’t send notifications to Growl. Search the forums.

Is anyone else finding that growlnotify’s wait and sticky options don’t work?

Empty the elements in this list, by removing their insides.
— From Hpricot::Elements documentation of empty()

Written by totalrecall

August 31, 2008 at 5:57 pm

Posted in mac, unix

Extracting data with Hpricot – Night 2

with 4 comments

To the newbie following examples, who has not pored over the docs and is not a ruby expert (me), Hpricot does give some surprises.
I spent a lot of time figuring this one out.

check= trrow.search("//td[@width='30']//img[@alt='Winner']")

I need to see whether the html row contains this image or not.
On some rows check is blank; on others it has the entire html, as expected.

However, if I test it as follows: if check != "" then the condition always evaluates to true.

I looked everywhere else before I found this out. There seemed to be no way for me to differentiate between the check which was blank and the check which contained the td.
In the case of the blank check, print " #{check}" always printed nothing.

Finally I had to do this, which I don’t like: if "#{check}" != "" then. Reminds me of unix shell scripting.
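
A likely reason for the surprise, as a minimal sketch: as far as I can tell, search returns an Hpricot::Elements object, which subclasses Array, and an Array never compares equal to a String. The FakeElements class below is a hypothetical stand-in so the example runs without Hpricot installed; with the real library, check.empty? should be the cleaner test.

```ruby
# Hypothetical stand-in for Hpricot::Elements; like the real class, it subclasses Array.
class FakeElements < Array
  # Render the matched elements as one string of html.
  def to_s
    map { |element| element.to_s }.join
  end
end

no_match = FakeElements.new
match    = FakeElements.new(["<img alt='Winner' />"])

puts no_match != ""        # an Array is never equal to a String, so this is always true
puts "#{no_match}" == ""   # the string-interpolation workaround used above
puts no_match.empty?       # asks the collection directly whether anything matched
puts match.empty?
```

With this approach the condition becomes: unless check.empty? then.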

I had problems cleanly separating text inside nested html such as (see source, search Rafael):

<td width="268" align="left" valign="middle">&nbsp;&nbsp;
<a href="/en_US/bios/overview/atpn409.html" class="alt2"><b>Rafael Nadal</b></a>
&nbsp;ESP&nbsp;(1)</td>


inner_text on the entire element gives me both Rafael Nadal and ESP, with “?” characters in between.
inner_text on the a element gives me the name, but I found no way to extract just ESP.

Lots of “??” sequences turn up in the text. So in some cases, I just had to parse the inner_text and split on the “?” characters.
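
A minimal sketch of that splitting step, assuming the “?” characters are the page's &nbsp; entities decoded to the non-breaking space (U+00A0), which many terminals cannot display. The sample string imitates the cell shown above; it is not real Hpricot output, and the "\u00A0" escape needs a newer ruby than the one used in this post.

```ruby
NBSP = "\u00A0" # non-breaking space, assumed to be what shows up as "?"

# Imitation of the cell's inner text: two nbsp's, the name, then country and seed.
text = "#{NBSP}#{NBSP}\nRafael Nadal\n#{NBSP}ESP#{NBSP}(1)"

# Split on the non-breaking spaces, trim ordinary whitespace, drop empty pieces.
fields = text.split(NBSP).map { |piece| piece.strip }.reject { |piece| piece.empty? }
name, country, seed = fields

puts "name=#{name} country=#{country} seed=#{seed}"
```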

Finally, I did get my program running. It is extremely dependent on the html; the slightest change to the page will make this program inoperable. However, I was able to transform a format that is difficult to process visually into an easy one.
My output comes out like this:

Rafael Nadal           ESP  (1) def. Ryler DeHeart          USA      6-1 6-2 6-4
James Blake            USA  (9) def. Steve Darcis           BEL      4-6 6-3 1-0 (Retired)
Mardy Fish             USA      def. Paul-Henri Mathieu     FRA (24) 6-2 3-6 6-3 6-4
Gael Monfils           FRA (32) def. Evgeny Korolev         RUS      6-2 6-3 3-6 6-4
Stanislas Wawrinka     SUI (10) def. Wayne Odesnik          USA      6-4 7-6 (8-6) 6-2

The original page is here; see how different it is. I have always put the winner on the left side. The program tennsc.rb lies here. Sample usage:

./tennsc.rb http://www.usopen.org/en_US/scores/cmatch/10ms.html


Tennis scores in an easy to read format:
http://sports.yahoo.com/ten/matches

Written by totalrecall

August 30, 2008 at 6:58 pm

Posted in ruby

Extracting data with Hpricot – Night 1


I saw the tutorial on scraping gmail with Hpricot and Mechanize, and thought I’d try it. Strangely, it works from irb, but gives errors when run as a ruby program. Mainly, page does not have a method named uri, and page.class gives String.
Also, the url of the actual email has two “?” in it. I had to remove one.

The example did not state that the “email” variable it extracts only gives the links present in the emails. It took me hours of poring over the resultant (huge) html to get this. (Dumb me!) So I should have used open-uri to read the email. By then I remembered that I already have fetchmail reading my new mail using IMAP and I really don’t need this program.

So I moved my attention to a larger problem at hand. I read more tutorials such as this, which referred me to the priceless Firebug addon, which really helps navigate complex html. My problem is: each morning, fetch the latest tennis scores in a simple format and email them to me. I always forget to open that page and miss out on news. The page is awfully complex. Each line has umpteen tables and td’s, and most have no class name.

After much struggle, I went to plan B. I found another page on yahoo that had all the results in a simple, concise form (a single line for each match). I used links to dump the output, then sed to filter it, and mailed myself the result.
Aah, I had a script running in minutes, and I could even raise an alarm if my fav players had played, using good ole grep.
The script (source) boils down to these 3 lines:
links -dump "http://sports.yahoo.com/ten/matches" > match.txt
sed -n '/^Matches:/,/Sports Home/p' match.txt > trimmed.txt
cat trimmed.txt | mail -s " tennis results for: $yest" [username]

Substitute [username] with your local user name or your email id.
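
For anyone who would rather stay in ruby, the sed range filter can be sketched as below. The two patterns are the ones from the script; the method name trim_between and the sample lines are made up for illustration, and the real dump from links may look different.

```ruby
# Keep the lines from the first one matching start_re through the next one
# matching end_re, like sed -n '/start/,/end/p'. The end pattern is only
# checked on lines after the starting line, and a later match can reopen the range.
def trim_between(lines, start_re, end_re)
  inside = false
  kept = []
  lines.each do |line|
    if inside
      kept << line
      inside = false if line =~ end_re
    elsif line =~ start_re
      kept << line
      inside = true
    end
  end
  kept
end

# Made-up sample standing in for the contents of match.txt.
dump = [
  "Yahoo! Sports header",
  "Matches: Saturday",
  "Player A def. Player B 6-1 6-2",
  "Sports Home",
  "footer links"
]

puts trim_between(dump, /^Matches:/, /Sports Home/)
```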

Then I kept struggling with the complex html and fell asleep. To be continued.

there is no place like ~
— seen on http://tty1.net

Written by totalrecall

August 30, 2008 at 6:36 pm

Posted in ruby

Javascript: the good and bad


Just wrote my first program in javascript (see previous post). Things I liked about javascript:
1. C-like syntax – thankfully!
2. Good regexp classes.
3. Decent handling of arrays, strings, etc. (Although, NO PRINTF !!!)

Problems I faced:

1. While executing, Firefox does NOTHING if there is an error in the file. No error message! It takes me ages to track down each missing “;” or typo.
2. I tried looking for verifiers, and found several. None comes with build instructions alongside the code, and I could not get any of them to install for use on my command line. JSLint comes closest, offering an online textarea which will validate pasted code.
But JSLint expects me to remove all HTML first. Then it complains about each “==” and insists on my using “===”. It also complains, incorrectly in my view, about my using the Array(a,b,c) constructor. Surprisingly, I have not been able to find any utility that helps me find errors.

3. Vim does NOT indent the program due to the “.html” extension — I have to rename the file to “.js”, indent and then rename back to “.html”.

If anyone has solutions to the above, I would be grateful to hear from you.

Mozilla’s current and prior versions use a format called MORK to structure the history data. It has been charmingly referred to by former Netscape engineer Jamie Zawinski as “…the single most braindamaged file format that I have ever seen in my nineteen year career.”

Written by totalrecall

August 26, 2008 at 11:00 pm

Posted in Uncategorized

Runner’s Time calculator


Some days back (while watching the Olympics), I wrote (in ruby) a command line program that, given a distance and a time, calculates how much time it would take to complete some other distance.

So if Haile runs 1000 meters in 2:32.1 (2 minutes, 32.1 seconds), how much time will he take for the 10000 meters, or the 5000 meters, or the marathon?

Or if Bekele ran the 5000 meters in 12:56.33, how much time did he take per 400 meters, and what were his splits for each 1000m? This is something that runners keep calculating, and we need to keep converting to seconds, doing arithmetic and converting back. This little program does the job.
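
The arithmetic can be sketched roughly as follows, assuming an even pace over every distance (a simplification made here for illustration). This is not the code of trackcalc.rb; the method names to_seconds, to_clock and project are made up.

```ruby
# Convert "m:ss.s" or "h:mm:ss.s" into seconds.
def to_seconds(clock)
  clock.split(":").inject(0.0) { |total, part| total * 60 + part.to_f }
end

# Convert seconds back into a clock string, rounded to hundredths.
def to_clock(seconds)
  centis = (seconds * 100).round
  whole, hundredths = centis.divmod(100)
  minutes, secs = whole.divmod(60)
  hours, minutes = minutes.divmod(60)
  if hours > 0
    format("%d:%02d:%02d.%02d", hours, minutes, secs, hundredths)
  else
    format("%d:%02d.%02d", minutes, secs, hundredths)
  end
end

# Time for target_m meters at the even pace implied by known_m meters in seconds.
def project(known_m, seconds, target_m)
  seconds * target_m / known_m
end

haile = to_seconds("2:32.1")
puts "10000m at Haile's 1000m pace: #{to_clock(project(1000, haile, 10_000))}"

bekele = to_seconds("12:56.33")
puts "Bekele per 400m:  #{to_clock(project(5000, bekele, 400))}"
puts "Bekele per 1000m: #{to_clock(project(5000, bekele, 1000))}"
```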

I wanted to put this program up on a site for people to use, but ruby is typically not available on shared apache hosting. Most servers have php. Yesterday, I started doing this in php (my first php!), but then decided that javascript would be better: users can download the file and run it on their own computers. So now I have my first javascript program, too.

The program sits here. If you know any runners, do point them to the program (free and open source, of course).

Javascript online version: http://www.benegal.org/files/trackcalc.html
Ruby command-line version: http://www.benegal.org/files/trackcalc.rb

Let me make it clear that McCusker is a complete barking lunatic.
This is just about the stupidest file format I’ve ever seen.
– Jamie Zawinski on Mozilla’s History format (mork)

Written by totalrecall

August 26, 2008 at 9:37 pm

Posted in ruby