The use of a number of repetitors in one pattern can make the search rather slow. The pattern
//.*.*.*abc/needs for each position in a line that doesn't contain the string abc a search time that is proportional to the third power of the number of characters in the line. Such a situation could be improved by making the pattern matcher smarter. In the above example
.*.*.*
could be
replaced by .*
. That is functionally the same. One could also
start looking for the string `abc' first (this doesn't work always
because patterns can contain linefeeds, unlike the patterns in most
regular expression matchers). All this intelligence would add much code,
and even then users will invent patterns that will take much time. So it
is left to the user to keep his patterns simple. A general rule is that
if the first character in a pattern is a fixed character the search will
be much quicker.
When a search takes much more time than expected the search can be
interrupted by pressing the `interrupt' key combination. On a UNIX system
this is the <Control>-C combination.
The above effect will occur mainly when repetitors are used that
interfere with each other. Two repetitors of the type .*.*
leave
an ambiguity of some type during the match. So when the pattern
//E.*.*2/is confronted with the line
E = m * c^3the first
.*
will be made maximal at first (10 characters). The
second .*
contains then 0 characters and then there is no more
character for the 2. So now the first repetitor goes down to 9. The
second takes 1 and there is no room. Then the second becomes 0 again and
the 2 is compared with the 3. No match! The the sizes
(8,2),(8,1),(8,0),(7,3) etc. are tried. In total 66 combinations are
tried!
On the other hand the pattern
//ab*cd*ef*g/has hardly any problems with the line
abbbbcddddddeffgeven though there are three repetitors. Here the repetitors don't interfere and the match is found in one attempt.
Patterns that start with a large degree of freedom will be rather slow. The pattern
//.zzzz/will be significantly slower than
//zzzz/The first pattern will always score a partial match at the position of the period. Then the whole pattern matching engine is started to find that there is no full match. In the second case the first character is a fixed character and looking for it can be done fast. In most files there should be very few hits and the pattern matchine engine is rarely needed.