Inconsistencies in Unix programs
I use the command line all the time, as well as Ruby and Vim. I also use regular expressions all the time.
With all the brains behind the various unices/unixes, one would expect consistent handling of regexes across the various Unix programs on one system.
Vim does take “\d” in place of [0-9]; however, it requires escaping of the “+” but not of the “*”.
So I can say “\d*” but I must say “\d\+”.
Most other programs (I am using OS X; perhaps the GNU programs are improved) do not recognize “\d” and other such shorter forms.
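For contrast, here is a small Ruby sketch of the same patterns. The sample string is made up for illustration. In Ruby neither quantifier needs a backslash, and “\d” behaves like [0-9] on plain ASCII input.

```ruby
# Made-up sample text, only for illustration.
sample = "order 12345 shipped"

short = sample[/\d+/]     # shorthand class, unescaped +
long  = sample[/[0-9]+/]  # explicit class, same result here
star  = sample[/\d*/]     # * allows zero digits, so it matches "" at position 0

puts short         # 12345
puts long          # 12345
puts star.inspect  # ""
```

The last line is a reminder of why “+” matters: “\d*” is happy with an empty match at the very start of the string.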
The other day I found that expr does not understand the “+” at all, even with escaping!
None of the standard Unix programs such as expr understand minimal matching, which from my perspective should have been the default.
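To make the difference concrete, here is a minimal Ruby sketch of greedy versus minimal matching. The sample line is invented for the example.

```ruby
# Made-up sample line with two tagged words.
line = "<b>one</b> and <b>two</b>"

greedy = line[/<b>(.*)<\/b>/, 1]   # .* runs on to the LAST </b>
lazy   = line[/<b>(.*?)<\/b>/, 1]  # .*? stops at the FIRST </b>

puts greedy  # one</b> and <b>two
puts lazy    # one
```

The greedy form swallows everything up to the final closing tag, which is rarely what one wants when pulling a single field out of markup.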
The escaping of round brackets differs too: Vim and the Unix programs (in their basic regular expressions) want “\(” and “\)” for grouping, while Perl and Ruby use plain “(” and “)”.
For those needing a quick way of doing minimal regexp matching, here’s something in ruby:
ruby -ne 'if /<title.*?>(.*?)<\/title>/ then puts $1;end'
The first “.*?” after title is there because the string contains single quotes, and I cannot put a single quote within the single-quoted command being sent to ruby. If I use double quotes around the command, the “$1” is interpreted by the shell.
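The same match can be written in a plain Ruby file, where shell quoting no longer gets in the way. This is only a sketch: the method name extract_title and the sample lines are made up for illustration.

```ruby
# Hypothetical helper: returns the text between <title ...> and </title>,
# or nil when the line has no title element.
def extract_title(line)
  m = line.match(/<title.*?>(.*?)<\/title>/)
  m && m[1]
end

# Invented sample lines; the first has a single-quoted attribute.
puts extract_title("<title id='main'>Hello</title>")  # Hello
p    extract_title("<p>no title here</p>")            # nil
```

Inside a file the single quotes in the input are just data, so the pattern could also match them literally if needed.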
So then I tried putting this in a program to which I could pass a regexp and filenames. Since the regexp passed in would have to be substituted (if /$regexp/), the command would have to be in double quotes, but then the “$1” also gets substituted by the shell! A little delving into the pickaxe got me an answer …
regexp=$1; shift
ruby -ne "if /$regexp/ then puts Regexp.last_match(1);end" "$@"
Save the above as rugrep.sh, and call as follows:
./rugrep.sh '<title.*?>(.*?)<\/title>' *.html
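An alternative that avoids shell substitution altogether is to do the whole thing in Ruby and build the regexp from the first argument. This is a rough sketch, not a replacement for the script above: the file name rugrep.rb and the method name rugrep are made up, and falling back to the whole match when the pattern has no group is my own choice.

```ruby
# rugrep.rb (hypothetical name): print the first capture group of each
# matching line, or the whole match if the pattern has no group.
def rugrep(pattern, lines)
  regexp = Regexp.new(pattern)   # pattern arrives as a plain string
  results = []
  lines.each do |line|
    m = regexp.match(line)
    results << (m[1] || m[0]) if m
  end
  results
end

# Only run as a command when arguments were actually given.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  pattern = ARGV.shift           # remaining ARGV entries are file names
  rugrep(pattern, ARGF.each_line).each { |text| puts text }
end
```

It would be called much like the shell version, for example: ruby rugrep.rb '<title.*?>(.*?)<\/title>' *.html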