The Accidental Rubyist

invalid byte sequence in UTF-8

Another ruby config file reader


YAML is a great format for storing data. However, if that data is complex, YAML’s parser or loader can refuse to work. YAML expects multiline data to be indented and to contain nothing that looks like YAML. I was once reading large docs into a hash and using YAML to dump them. However, YAML could not read them back.
Someone wrote a Ruby website system using YAML as the template format (Hobix, I think), and I suspect the data would have to be simple text. The text I use is often programming code: Ruby, even YAML itself.
I needed a heredoc kind of facility for multiline data.

Until now I’ve written a bunch of code built around an outer loop of: while file.readline. However, for reading blocks, that leads to keeping switches, and the code gets bugly.

I also did the same thing using regexps, which means going through the whole file many times. Again, not clean.

Amazingly, in just five minutes of coding, I was able to get this working perfectly the very first time. It’s the kind of thing Cobol coders would recognize easily: the control-break kind of flow.

The format of multiline data is as follows:

BODY: <<!!
My data comes here. Anything goes …
!!

Whatever comes after the << specifies the delimiter. A line containing only that text marks the end of the data block. In this example it is “!!”.
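As a small illustration, the opening line of a block can be picked apart with a single regular expression. This is only a sketch: the variable names keyword, delimiter and end_of_block are made up for the example, and escaping the delimiter with Regexp.escape is my assumption about how to keep characters like "." or "*" from being treated as regexp syntax.

```ruby
# A line that opens a multiline block: keyword, colon, "<<", then the delimiter.
line = "BODY: <<!!\n"

if line =~ /^(\w+):\s*<<(.*)$/
  keyword   = $1   # the word before the colon
  delimiter = $2   # everything after "<<" up to the end of the line
end

# The block ends at the first line made up of the delimiter and nothing else.
# Regexp.escape keeps delimiter characters from acting as regexp syntax.
end_of_block = /^#{Regexp.escape(delimiter)}$/

puts keyword
puts delimiter
puts end_of_block.match?("!!\n")              # a line holding only the delimiter
puts end_of_block.match?("!! not the end\n")  # delimiter followed by other text
```

A line that merely starts with the delimiter does not close the block, which is what lets the body hold almost anything.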

 
# assumption: the path of the config file is given as the first argument
file = File.open(ARGV[0])
myhash = {}
line = file.gets
while line
  # check for heredoc start
  if line =~ /^(\w+):\s*<<(.*)$/
    kword = $1
    # escape the delimiter so characters like "." or "*" are matched literally
    delim = Regexp.escape($2.strip)
    buffer = ""
    line = file.gets
    # keep reading until you find the end of heredoc (or run out of file)
    while line && line !~ /^#{delim}$/
      buffer = buffer + line
      line = file.gets
    end
    myhash[kword] = buffer
  elsif line =~ /^(\w+):\s*(.*)$/
    # this is a simple, single line assignment
    myhash[$1] = $2
  else
    # did not know what to do with this
    print "ERROR: #{line.chomp}\n"
  end
  line = file.gets
end
file.close
print myhash["INTRO"], "\n"
print myhash["BODY"], "\n"

The program does one file.gets outside the overall loop. The rest of the gets happen at the end of the loop. If a block start is encountered, the program keeps reading within another loop until it reaches the block end.
The logic is cleaner this way, with no flags to be set and checked.
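To try the same flow without a file on disk, it can be wrapped in a method and fed a StringIO. This is a rough sketch rather than the program above: the method name read_config and the inline sample are hypothetical, and it silently skips lines it does not recognise instead of printing an error.

```ruby
require "stringio"

# Hypothetical helper: reads "KEY: value" lines and "KEY: <<DELIM" blocks
# from any IO-like object and returns them as a hash.
def read_config(io)
  result = {}
  line = io.gets                      # the one read outside the loop
  while line
    if line =~ /^(\w+):\s*<<(.*)$/
      key = $1
      stop = /^#{Regexp.escape($2.strip)}$/
      buffer = String.new
      line = io.gets
      # inner loop: consume the block until its delimiter (or end of input)
      while line && line !~ stop
        buffer << line
        line = io.gets
      end
      result[key] = buffer
    elsif line =~ /^(\w+):\s*(.*)$/
      result[$1] = $2
    end
    line = io.gets                    # the read at the end of the loop
  end
  result
end

sample = StringIO.new(<<~INPUT)
  TITLE: Introduction to ruby
  INTRO: <<!!
  Welcome to ruby 1.9.
  Second line.
  !!
  SUBTITLE: This is a subtitle
INPUT

config = read_config(sample)
puts config["TITLE"]
puts config["INTRO"]
puts config["SUBTITLE"]
```

The only state carried between iterations is the current line, so there is no "am I inside a block?" switch to get wrong.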

Here’s a tiny sample input file:


TITLE: Introduction to ruby
SUBTITLE: This is a subtitle
INTRO: <<!!
This page is to introduce newcomers to ruby. Welcome to ruby 1.9.
Please note that ruby is an improvement on Cobol and Fortran.

!!
BODY: <<!!

Ruby 1.9
========
Changes from Ruby 1.8.x to Ruby 1.9
-----------------------------------
Here goes:
- Introduction
  - Memory leaks introduced for first time
  - Garbage creator improvements
- Crashes
  - Choice of ways to crash the program
    - segmentation violation
    - intentionally dangling pointers
    - use `rash -f` command

!!


# grep for foo OR bar OR baz
$ cat | ruby -pe 'next unless $_ =~ /(foo|bar|baz)/'


Written by totalrecall

September 3, 2008 at 5:44 pm

Posted in ruby
