Groups are special objects that can match a class of characters. A group
is indicated by a pair of square brackets [ and ], with the characters
that belong to the class between the brackets. So the pattern
//[13579]/
will match any of the single characters 1, 3, 5, 7, or 9. This isn't
very practical when many characters are involved, so there is a way to
indicate a range of characters:
//[3-7f-p]/
The above group contains the characters 3 to 7 and f to p. If the first
character in the group is the character ^, the group is inverted, so that
//[^3-7f-p]/
matches anything except for the digits 3 to 7 and the characters f to
p. The above leaves one problem: how are the characters [,
] or - included in a group? This can be done by putting
them in a position in which they 'cannot' occur, or which would make the
whole group meaningless, as in:
//[]-[]/
//[^]^-[]/
//[-z]/
The first group contains exactly the three special characters ], - and [.
The second group contains all characters except ], -, ^ and [. The third
group contains the two characters - and z.

To facilitate the use of the special non-ASCII characters that occur in the
native character fonts on some computers, there are some special ranges of
characters. Whereas a-z means all regular lower case characters, a-ä means
all lower case characters, including the accented ones. The ä may be
replaced by any other accented character in the extended character set.
Similarly, A-Ä means all upper case characters, including the ones in the
extended set. The range ä-ö (or any other two lower case extended
characters) gives all extended lower case characters, Ä-Ö gives all
extended upper case characters, and ä-Ä gives all characters in the
extended character set.
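The rules for plain groups described above (listed characters, ranges, a leading ^, and the placement of ], - and ^) can be sketched as a small parser. This is only an illustration in Python, not the program's actual implementation. The names parse_group and group_matches are hypothetical. The sketch assumes the surrounding // and / delimiters have already been stripped, and it assumes that a range written in descending order is read as literal characters, which is one reading of "would make the whole group meaningless". The extended ranges such as a-ä are not modelled; ranges here simply follow code point order.

```python
def parse_group(pattern):
    """Return (negated, members) for a group such as "[3-7f-p]".

    Hypothetical helper: it follows the rules in the text, not any
    real program's source code.
    """
    if len(pattern) < 2 or pattern[0] != "[":
        raise ValueError("a group must start with '['")
    i = 1
    # A '^' directly after '[' inverts the group.
    negated = pattern[i] == "^"
    if negated:
        i += 1
    members = set()
    # A ']' directly after '[' or '[^' cannot close the group,
    # so it is taken as a literal character.
    if i < len(pattern) and pattern[i] == "]":
        members.add("]")
        i += 1
    while i < len(pattern) and pattern[i] != "]":
        lo = pattern[i]
        # "x-y" is a range only if y is not the closing ']' and x <= y
        # (assumption: a descending range is read as literal characters).
        is_range = (
            i + 2 < len(pattern)
            and pattern[i + 1] == "-"
            and pattern[i + 2] != "]"
            and lo <= pattern[i + 2]
        )
        if is_range:
            hi = pattern[i + 2]
            members.update(chr(c) for c in range(ord(lo), ord(hi) + 1))
            i += 3
        else:
            members.add(lo)
            i += 1
    if i >= len(pattern):
        raise ValueError("missing closing ']'")
    return negated, members


def group_matches(pattern, ch):
    """True if the single character ch is matched by the group."""
    negated, members = parse_group(pattern)
    return (ch in members) != negated
```

With these assumptions, parse_group("[]-[]") yields the three characters ], - and [, and group_matches("[^3-7f-p]", "5") is false, in line with the descriptions above.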
Finally there are some shortcuts for groups that are used frequently. These are:
| character | group |
| # | [0-9] |
| & | [a-äA-Ä] (all alphabetic characters) |
| | [0-9a-äA-Ä] (all alphanumerics) |
| ! | any 'word' character |
| ! | any character not in words |
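As a rough illustration of how two of these shortcuts behave, the sketch below maps # and & to simple Python predicates. The names SHORTCUTS and shortcut_matches are hypothetical, and using str.isalpha for & is an assumption: it only approximates "all alphabetic characters including the extended set" through Unicode, which need not coincide with the extended character set of a particular computer. The remaining shortcuts are left out.

```python
# Stand-ins for two shortcuts from the table; the predicates are assumptions.
SHORTCUTS = {
    "#": lambda ch: ch in "0123456789",  # corresponds to [0-9]
    "&": lambda ch: ch.isalpha(),        # approximates [a-äA-Ä] via Unicode
}


def shortcut_matches(shortcut, ch):
    """True if the single character ch belongs to the shortcut's group."""
    return SHORTCUTS[shortcut](ch)
```

For instance, shortcut_matches("#", "7") and shortcut_matches("&", "ä") both hold, while shortcut_matches("&", "3") does not.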