(from a discussion on the OpenOffice.org questions list)

Sun 27 November 2005

Regexps are fairly complex beasts and probably feel quite unnatural unless you have some sort of programming training. In that sense it is questionable how useful regexps are in a generic word processor for the general public, but if you happen to have regexp experience from using tools like perl, awk, grep, lex and the like, then you can express quite complex searches efficiently.

OK, first of all there is a famous quote from Jamie Zawinski: ‘Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.’ There is something to that :-). Nevertheless, I use regexps quite often, and when kept to a sensible level of complexity, they can be quite useful. But they are difficult to use and the learning curve is quite steep. Perl (probably the best and fastest implementation of RE currently available) has four manpages for RE (perlrequick, perlretut, perlre, and perlreref).

Sideshow for serious geeks: first read this, its continuation, and conclusion. The explanation of this mystery is simple but thought-provoking: apparently Perl supports REs so complex that all other RE implementations break down on them, but this complexity comes at the cost of slightly lower speed. And BTW, I do not use Perl if I don’t have to (I much prefer Python, but apparently here Perl is better than anybody else).

Back to our main presentation tonight: there seem to be two ways to deal with REs in OpenOffice.org (and elsewhere). Either you ignore them, or you bite the bullet and learn them. Actually, the first way is not as ridiculous as it seems. As vi people have repeated many times (vi-family editors have nothing but REs for searching): “a plain string is a valid RE and will be evaluated as such” (let’s ignore the case sensitivity of REs for a moment); i.e., when you are searching for “moron”, you can just put “moron” into your RE field and everything will work as expected. In this position you are no worse off than if there were no REs at all.
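To make the “plain string is a valid RE” point concrete, here is a minimal sketch in Python. Take it as an illustration only: Python’s re module is an assumption on my part and its syntax is not identical to what OOo’s Find & Replace accepts, and the sample sentence is made up.

```python
import re

# Made-up sample text, for illustration only.
text = "Only a moron would ignore regular expressions."

# Ordinary substring search, no REs involved.
plain_pos = text.find("moron")

# The same word used as a RE: it has no special characters,
# so it simply matches itself.
regex_pos = re.search("moron", text).start()

print(plain_pos, regex_pos)  # both searches land on the same position

# The one caveat: a few characters (such as '.') are special in a RE.
# Here '.' matches any character, so "3.14" also matches "3x14".
loose = re.search("3.14", "3x14")

# re.escape() turns the string back into a purely literal pattern.
strict = re.search(re.escape("3.14"), "3x14")

print(loose is not None, strict is None)
```

So as long as your search word contains only letters and digits, you can leave the RE box ticked and never notice it.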

However, learning REs is not as difficult as it seems from looking at some really advanced examples (yeah, sure you want some examples; this RE in Python syntax r"(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$" parses US phone numbers and returns their parts in separate fields; courtesy of Mark Pilgrim). For starters, you can begin with something as simple as “colou?r”, and even that will be incredibly helpful. Just throw “regular expression tutorial” into your friendly Google and you will find a lot of stuff which can help. You only have to be aware of a couple of things. First of all, there are at least two incompatible lines of REs living well “in the wild” (for more info on that, read the article on Wikipedia). The best way to deal with this is to learn just the type of RE used in the application you’re going to use (for OOo I just randomly stumbled upon some tutorial on RE in OOo). BTW, you could just go to the Help page “List of Regular Expressions”, but it is really just reference material, which is not enough for somebody who doesn’t know what’s going on.
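Both REs mentioned above can be tried out in a few lines of Python. This is only a sketch: the sample strings are invented, and OOo’s own RE dialect differs in details from Python’s re module.

```python
import re

# "u?" means "an optional u", so one pattern covers both spellings.
spelling = re.compile(r"colou?r")
found = spelling.findall("The color of the colour chart")
print(found)

# The phone number RE quoted above (courtesy of Mark Pilgrim):
# three groups of digits separated by any non-digits (\D*),
# then an optional extension, anchored at the end of the string ($).
phone = re.compile(r"(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$")

# Invented sample numbers in two different notations.
with_ext = phone.search("(800) 555-1212 ext. 1234").groups()
without_ext = phone.search("800-555-1212").groups()
print(with_ext)
print(without_ext)
```

The parentheses in the pattern are what “returns their parts in separate fields”: each pair captures one piece of the number, and a missing extension simply comes back as an empty string.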

One last thing: thank you, OOo developers, for including full-size REs in OOo and not something crippled like the “wildcards” in M$ Word (which are just a small subset of REs packaged for non-geeks). This and other things (XSLT filters and scripting, albeit the latter is severely underdocumented) have made OOo much more than just another free office suite (there are others): a serious platform for doing things in the proper geek-like way. Thanks!

Category: computer Tagged: OpenOffice vim regexp