A pattern can match a fixed number of characters. For example, a pattern for a zero-padded decimal number that is 4 digits long would be:
/\d\d\d\d/
Or it could be expressed like this:
/[0-9][0-9][0-9][0-9]/
Or even like this:
/[0123456789][0123456789][0123456789][0123456789]/
These are fine for fixed-length numbers, but the formatting of text is often variable, and unless numbers are zero-padded we don't know how many digits they contain. A pattern can be extended with a repetition count so that an item matches a set number of times, or an unspecified and variable number of times.
Curly braces indicate a repetition count. A single value means the item to the left must be repeated exactly that many times. So our first example could be expressed like this:
/\d{4}/
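As a quick check, the sketch below runs the four spellings shown above against the same inputs and records whether they agree. The sample strings are invented for this illustration.

```javascript
// Four equivalent ways to write "four decimal digits in a row".
const patterns = [
  /\d\d\d\d/,
  /[0-9][0-9][0-9][0-9]/,
  /[0123456789][0123456789][0123456789][0123456789]/,
  /\d{4}/
];

// Hypothetical sample inputs, chosen only for this illustration.
const samples = ["0042", "Year 2001", "123", "12a4"];

// One row per sample, one true/false entry per pattern.
const results = samples.map((text) => patterns.map((re) => re.test(text)));

// Each row should contain four identical values.
console.log(results);
```

The first two samples contain a run of four digits and the last two do not, so every pattern reports true, true, false, false in that order.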
If the curly braces contain a pair of comma-separated values, the first is the minimum count and the second is the maximum. So if we wanted to match a value between 10 and 9999 (two to four digits) that was not zero-padded, we could do this:
/\d{2,4}/
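A pattern without anchors matches anywhere inside a longer string, so the range pattern above also finds a match inside a five-digit number. The sketch below compares it with an anchored variant. The ^ and $ anchors are not covered in this section and are added here only for illustration.

```javascript
const unanchored = /\d{2,4}/;
const anchored = /^\d{2,4}$/;   // assumption: the whole string must be the number

const tooShort = unanchored.test("7");             // false: only one digit
const insideLonger = unanchored.test("12345");     // true: finds digits inside
const matchedText = "12345".match(unanchored)[0];  // "1234": takes up to four digits
const wholeTooLong = anchored.test("12345");       // false: five digits overall
const wholeInRange = anchored.test("999");         // true: three digits overall

console.log(tooShort, insideLonger, matchedText, wholeTooLong, wholeInRange);
```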
UK-style postcodes generally follow a standard layout. Most of them conform to the pattern:
AANN NNAA
That is two letters, one or two digits, a space, and then one or two digits and two letters.
You could match a UK-style postcode with something like this (note that \w matches any letter, digit or underscore, so it is looser than "letters only"):
/\w{2}\d{1,2}\s\d{1,2}\w{2}/
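Because \w accepts digits and underscores as well as letters, the pattern above also accepts strings that contain no letters at all. The sketch below compares it with a stricter variant that uses [A-Z] for the letter positions. The stricter pattern and the sample strings are assumptions made for this illustration, not part of the original text.

```javascript
const loose = /\w{2}\d{1,2}\s\d{1,2}\w{2}/;            // pattern from the text
const strict = /[A-Z]{2}\d{1,2}\s\d{1,2}[A-Z]{2}/;     // assumed upper-case-only variant

// Hypothetical inputs: a postcode-shaped string, all digits, and lower case.
const samples = ["SW19 5AE", "1234 5678", "ab12 3cd"];

const looseResults = samples.map((s) => loose.test(s));
const strictResults = samples.map((s) => strict.test(s));

console.log(looseResults);   // the loose pattern accepts all three
console.log(strictResults);  // the strict pattern accepts only the first
```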
A United States address ending is simpler, being made up of a two-letter state abbreviation, a space and a 5-digit Zip code. So it could be matched with:
/\w\w\s\d{5}/
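As a small sketch, the pattern above can be used with match() to pull the matching text out of a longer line. The address string is a made-up example.

```javascript
const stateAndZip = /\w\w\s\d{5}/;

// Hypothetical address line used only for this illustration.
const line = "Beverly Hills, CA 90210";

const found = line.match(stateAndZip);       // array on success, null on failure
const extracted = found ? found[0] : null;   // "CA 90210"
const tooFewDigits = stateAndZip.test("CA 9021");  // false: only four digits

console.log(extracted, tooFewDigits);
```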
Characters can be marked as optional with the question mark character (?).
The UK postcode example could then be relaxed so that the second digit, the inner space and each of the characters after it are optional. Like this:
/\w\w\d\d?\s?\d?\d?\w?\w?/
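To see what the optional items allow, the sketch below applies the pattern above to three invented inputs and prints the text that was matched in each case.

```javascript
const relaxed = /\w\w\d\d?\s?\d?\d?\w?\w?/;

// Hypothetical inputs: full postcode, no inner space, first part only.
const inputs = ["SW19 5AE", "SW195AE", "SW1"];

// match() returns an array whose first element is the matched text.
const matched = inputs.map((s) => {
  const m = s.match(relaxed);
  return m ? m[0] : null;
});

console.log(matched);  // each input is matched in full
```

Only the first three characters (two word characters and one digit) are mandatory, so a pattern this permissive is better suited to loose extraction than to validation.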
The plus sign matches one or more instances of the item to its left, and the asterisk matches zero or more occurrences. The repetition operators are summarized here:
Pattern | Description |
---|---|
{a,b} | Match the item to the left between a and b times. |
{a,} | Match the item to the left at least a times. |
{a} | Match the item to the left exactly a times, no more, no less. |
? | Match the item to the left zero or one times. |
+ | Match the item to the left one or more times. |
+? | Match the item to the left one or more times using a minimal matching technique. (JavaScript 1.3) |
* | Match the item to the left zero or more times. |
*? | Match the item to the left zero or more times using a minimal matching technique. (JavaScript 1.3) |
{0,1} | Match the item to the left zero or one times (alternative form of ?). |
{0,} | Match the item to the left zero or more times (alternative form of *). |
{1,} | Match the item to the left one or more times (alternative form of +). |
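The quantifiers in the table can be tried out with a few short tests. This is a sketch only; the sample words are invented, and the ^ and $ anchors are added so that each test applies to the whole string.

```javascript
// + needs at least one "a"; * is satisfied by none at all.
const plusMatch = "caaat".match(/a+/)[0];   // "aaa"
const starMatch = "ct".match(/ca*t/)[0];    // "ct": zero "a" characters is acceptable
const plusFails = /ca+t/.test("ct");        // false: + requires at least one "a"

// ? makes the "u" optional, so both spellings match.
const optional = ["color", "colour"].map((s) => /^colou?r$/.test(s));

// {2,} means two or more digits, with no upper limit.
const atLeastTwo = ["7", "42", "12345"].map((s) => /^\d{2,}$/.test(s));

// {1,} is the alternative form of +, so the two agree.
const sameAsPlus = "caaat".match(/a{1,}/)[0] === plusMatch;

console.log(plusMatch, starMatch, plusFails, optional, atLeastTwo, sameAsPlus);
```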
The minimal matching technique is implemented in JavaScript version 1.3 and is based on the facilities of Perl version 5 interpreters. Minimal (non-greedy) matching consumes the fewest characters that still allow the overall pattern to match. The normal (greedy) technique, by contrast, matches as many characters as possible.
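The difference shows up most clearly when a pattern could stop in more than one place. In this sketch, the markup string is an invented example.

```javascript
// Hypothetical text with two tag-like items on one line.
const text = "<b>bold</b> and <i>italic</i>";

// Normal matching: .+ runs to the last ">" it can reach.
const greedy = text.match(/<.+>/)[0];

// Minimal matching: .+? stops at the first ">" that completes the match.
const minimal = text.match(/<.+?>/)[0];

// On their own, +? takes one character and *? takes none.
const onePlus = "aaa".match(/a+?/)[0];   // "a"
const zeroStar = "aaa".match(/a*?/)[0];  // "" (empty string)

console.log(greedy);   // the whole string
console.log(minimal);  // "<b>"
console.log(JSON.stringify(onePlus), JSON.stringify(zeroStar));
```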
See also: RegExp pattern, RegExp pattern - alternation
JavaScript Programmer's Reference, Cliff Wootton, Wrox Press (www.wrox.com). © 2001 Wrox Press. All Rights Reserved.