Understanding Regular Expression patterns

This page is meant to introduce beginners to regular expression patterns.  A lot of work still remains.  In the meantime, you are urged to search google.com for examples of regular expressions and also to read up on Microsoft's implementation at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/reconintroductiontoregularexpressions.asp

At first, a regular expression pattern can be confusing, even intimidating.  But, with a little patience, it will all fall into place and the power of regular expressions will be at your disposal.  We first look at the basic building blocks, and then follow up with a few patterns that result from combining those building blocks.  This is meant to be an introductory tutorial and does not attempt to cover advanced and sophisticated pattern matching.

A string literal

A string literal matches exactly the characters it contains, in the same order.  This is something straight out of the normal search routines that everyone is familiar with.  An example would be Excel's FIND function.

The pattern "hello" matches the hello in "hello world" and in "Charlie says hello".

It does not match anything in "Charlie says hel-lo".
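
The three cases above can be tried out with a short script.  This is only a sketch: it uses Python's re module as a stand-in for the VBScript RegExp object (an assumption, since this page itself shows no code), and the variable names are made up for illustration.

```python
import re

# Assumption: Python's re module stands in for the VBScript RegExp object.
pattern = re.compile("hello")

samples = ["hello world", "Charlie says hello", "Charlie says hel-lo"]

# Map each sample string to the matched text, or None when nothing matches.
results = {}
for text in samples:
    found = pattern.search(text)
    results[text] = found.group() if found else None

for text, matched in results.items():
    print(repr(text), "->", matched)
```

The first two strings report hello as the match; the hyphenated one reports None because the literal must appear unbroken.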

Escape character

As shown below, a number of characters have a specific meaning in a regular expression.  To use any of those characters as itself, it must be 'escaped' by putting the backslash character \ in front of it.

For example, ^ is a special character indicating the start of the string.  \^ means the caret character itself.  Similarly, the dot . in a pattern stands for any single character at all.  So, the correct way to specify a dot as itself in a pattern is \.
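
To see the difference escaping makes, here is a small sketch using Python's re module (assumed here as a stand-in for the VBScript RegExp object).  The sample text is invented for the illustration.

```python
import re

# Hypothetical sample text containing a caret and a dot.
text = "a^b costs 3.50 or 3x50"

# \^ means the caret character itself, not the start of the string.
carets = re.findall(r"\^", text)

# An unescaped dot stands for any single character, so both prices match.
loose = re.findall(r"3.50", text)

# An escaped dot matches only a real dot.
strict = re.findall(r"3\.50", text)

print(carets, loose, strict)
```

The raw-string prefix r"..." is a Python convenience so that the backslash reaches the regular expression engine unchanged; it is not part of the pattern.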

At the start of the string

^ indicates the start of the string.

The pattern "^hello" matches the hello in "hello world" but not in "Charlie says hello".
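
A minimal check of the start-of-string anchor, sketched with Python's re module on the assumption that it treats ^ the same way as the VBScript RegExp object for single-line strings:

```python
import re

# ^ ties the match to the very beginning of the string.
at_start = re.search("^hello", "hello world")
not_at_start = re.search("^hello", "Charlie says hello")

print(at_start.group() if at_start else None)
print(not_at_start.group() if not_at_start else None)
```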

At the end of the string

$ indicates the end of the string.

The pattern "hello$" matches the hello in "Charlie says hello" but not in "hello world".
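
The end-of-string anchor can be tested the same way.  Again this is a sketch in Python's re module, used as an assumed stand-in for the VBScript RegExp object:

```python
import re

# $ ties the match to the very end of the string.
at_end = re.search("hello$", "Charlie says hello")
not_at_end = re.search("hello$", "hello world")

print(at_end.group() if at_end else None)
print(not_at_end.group() if not_at_end else None)
```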

Any single character

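. is the pattern for any one character other than a newline.  In the wildcard searches of other Microsoft software, the question mark character ? plays this role; in a regular expression ? means something else, as described under "Repeat a pattern".

The pattern "he." matches hel in "hello world" and in "Charlie says hello" but finds no match in "he", because the dot requires a character to be present after the e.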
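
The following sketch runs the pattern "he." against the three strings mentioned above.  It relies on Python's re module, assumed to behave like the VBScript RegExp object for this simple pattern.

```python
import re

pattern = re.compile("he.")

# The dot must consume exactly one character after "he".
in_world = pattern.search("hello world")
in_charlie = pattern.search("Charlie says hello")
in_short = pattern.search("he")

for found in (in_world, in_charlie, in_short):
    print(found.group() if found else None)
```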

A set of characters

Specify a set of characters by enclosing them inside square brackets [ and ].  The set matches any one character that is a member of it.

Specify a range of characters by writing the first and last characters of the range separated by a hyphen -.

The pattern [hlo] specifies any one of the characters h, l, or o.

The pattern [a-z] specifies any one of the lower case letters from a to z, both inclusive.  Similarly, [a-zA-Z0-9] specifies any one letter from a to z, in either case, or any one digit from 0 to 9.
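
A quick way to get a feel for character sets is to list every character in a string that a set matches.  The sketch below does that with Python's re.findall (Python's re module is an assumption here, standing in for the VBScript RegExp object); the test strings are invented.

```python
import re

# Each set matches one character at a time; findall collects every hit.
simple_set = re.findall("[hlo]", "hello")

# A range: only the lower case letters qualify.
lower_only = re.findall("[a-z]", "Ab1c")

# Several ranges combined: letters of either case, plus digits.
alphanumeric = re.findall("[a-zA-Z0-9]", "A-b_1!")

print(simple_set, lower_only, alphanumeric)
```

Note that the e in hello is skipped by [hlo] because it is not a member of the set.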

   
   
   
Special characters

Repeat a pattern

There are three variants of this capability.  Each one applies to the token immediately before it.

Repeat a character zero or one times

? is the pattern to repeat the previous token zero or one time.  In other words, it makes the previous token optional.

The pattern 9,? will match a nine by itself or a nine followed by a comma.  Similarly, 9\.? will match a nine by itself or a nine followed by a dot.  Note that the dot has to be 'escaped' because by itself it has special significance in a regular expression.  For example, 9.? would match 9x in the string 9xyz.
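
The optional-token behaviour can be checked with a few one-liners.  This sketch assumes Python's re module as a stand-in for the VBScript RegExp object, and the sample strings are made up for the purpose.

```python
import re

# 9,? takes the comma when there is one, but at most one.
optional_comma = re.findall("9,?", "9 and 9, and 9,,")

# 9\.? treats the dot literally: "9." when a dot follows, else just "9".
with_dot = re.search(r"9\.?", "9.5").group()
without_dot = re.search(r"9\.?", "9xyz").group()

# 9.? leaves the dot unescaped, so it accepts any character after the 9.
any_char = re.search("9.?", "9xyz").group()

print(optional_comma, with_dot, without_dot, any_char)
```

In the last sample of the first line, 9,, yields only 9, because ? allows the comma to appear once at most.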

Repeat a character zero or more times

* is the pattern to repeat the previous token zero or more times.

The pattern 9,* will match a nine followed by any number of commas, including none at all.  Effectively, it would match 9 or 9, or 9,, or 9,,, or...you get the idea.
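
Here is a small sketch of the zero-or-more case, written with Python's re module (assumed to agree with the VBScript RegExp object on this pattern).  The sample strings are hypothetical.

```python
import re

pattern = re.compile("9,*")

# The match takes in as many commas as are available, possibly none.
no_commas = pattern.search("9 alone").group()
one_comma = pattern.search("9, next").group()
three_commas = pattern.search("9,,, next").group()

print(repr(no_commas), repr(one_comma), repr(three_commas))
```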

Repeat a character one or more times

+ is the pattern to repeat the previous token one or more times.

The pattern 9,+ will match a nine followed by one or more commas.   Effectively, it would match 9, or 9,, or 9,,, or...you get the idea.  However, it would not match 9 by itself.
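
The one-or-more case differs from the zero-or-more case only when no comma follows the nine, as this sketch shows.  It uses Python's re module as an assumed stand-in for the VBScript RegExp object, with invented sample strings.

```python
import re

pattern = re.compile("9,+")

# At least one comma is required after the nine.
one_comma = pattern.search("9, next")
two_commas = pattern.search("9,, next")
bare_nine = pattern.search("9 alone")

for found in (one_comma, two_commas, bare_nine):
    print(found.group() if found else None)
```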