tmehta.com Home
regexp/index.htm
  VB(A) functions
  Using functions
  Examples
  Patterns
 
This page lists some real-life examples of where a regular expression played an important role in an Excel worksheet.  You will find many other examples through a google search of the subject.  Just keep in mind that different implementations of regular expressions may have different interpretations of patterns, especially some of the more advanced capabilities.
Remove all text other than alphanumeric characters and underscore characters

The pattern that recognizes all characters other than letters, numbers, and the underscore character is \W.

  • This is a special metacharacter recognized by RegExp as a 'non word character.'

Suppose column A contains fields imported from a database with A1 being the column header. Each cell contains unacceptable characters that need to be removed. The only text to be retained includes the letters a through z (both upper and lower case), the numbers 0 through 9, and the underscore character.  In B2, enter the formula =RegExpSubstitute(A2,"[^\w+]","") and copy B2 as far down column B as required.

For a complete list of meta-characters see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/jsgrpRegExpSyntax.asp

Remove all text other than letters and numbers

As a further refinement, we wish to remove underscores also. Consequently, we cannot use the \w shortcut and must use the complete description of the pattern [^a-zA-Z0-9]+.

An example of an Excel formula is =RegExpSubstitute(A2,"[^a-zA-Z0-9]+","")
 

Find one string near another

Suppose we want to find a personís name exists in a string. Letís say we are looking for my name, Tushar Mehta. However, we want to find the name irrespective of the case of the letters and we want to find the name irrespective of the presence (or absence) of either a middle initial or the complete middle name. Effectively, we want to a TRUE indication for all but the last of the possible combinations:

tushar mehta TRUE
Tushar is a Mehta TRUE
tushar d. mehta TRUE
TuShArMeHtA TRUE
tushar-mehta TRUE
tushar_mehta TRUE
tushar 1234567890 mehta TRUE
tushar 12345678901 mehta FALSE

This is not an easy task with normal Excel functions, even if it is at all possible. However, the simple regular expression pattern TUSHAR.{0,12}MEHTA recognizes any instance of the tokens TUSHAR and MEHTA that are separated by no more than 12 other characters. An Excel formula with this pattern would be: =NOT(ISERROR(RegEexpFind((A26,"TUSHAR.{0,12}MEHTA",TRUE)))
where A26 is one of the cells containing the text being searched. The dot between the R and the opening curly bracket stands for any character at all (except the newline character).
 

Replacing accounting currency values with values that Excel recognizes

Suppose an Excel spreadsheet imports data from a database where the monetary figures are represented either by a plain number of by a number followed by the letters CR (for Credit). Any such number needs to be converted into a negative number, while a plain number needs to be retained as is.

45.00CR -45
7919.82CR -7919.82
29062.47CR -29062.47
3222.08 3222.08
18046.94 18046.94
110.40CR -110.4

The pattern that recognizes a number followed by a CR is (\d*\.)?\d+(CR)

The sub-expression (\d*\.)?\d+ isolates the number, which may contain decimal values, and the next sub-expression (CR) isolates the CR.

The resulting Excel formula is =RegExpSubstitute(G1,((\d*\.)?\d+)(CR),"-$1")+0

The -$1 indicates that the first sub-expression, if any, should have a hyphen inserted in front of it. The absence of $2 indicates that the CR should be ignored. In case the cell has no CR, the function returns the original value itself. The +0 at the end forces Excel to convert the returned string to a number.

My thanks to Harlan Grove for pointing out errors and for suggesting appropriate corrections in the examples.