This page lists some real-life examples of where a regular
expression played an important role in an Excel worksheet. You will
find many other examples through a google search of the subject. Just
keep in mind that different implementations of regular expressions may have
different interpretations of patterns, especially some of the more advanced
capabilities.
Remove all text other than alphanumeric
characters and underscore characters The pattern that recognizes all characters other than
letters, numbers, and the underscore character is \W.
- This is a special metacharacter recognized by RegExp as a 'non
word character.'
Suppose column A contains fields imported from a database with A1 being
the column header. Each cell contains unacceptable characters that need
to be removed. The only text to be retained includes the letters a
through z (both upper and lower case), the numbers 0 through 9, and the
underscore character.
In B2, enter the formula =RegExpSubstitute(A2,"[^\w+]","")
and copy B2 as far down column B as required. For a complete list of
meta-characters see
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/jsgrpRegExpSyntax.asp |
Remove all text other than letters and
numbers As a further refinement, we wish to remove
underscores also. Consequently, we cannot use the \w shortcut and must
use the complete description of the pattern
[^a-zA-Z0-9]+.
An example of an Excel formula is
=RegExpSubstitute(A2,"[^a-zA-Z0-9]+","")
|
Find one string near another
Suppose we want to find a person?me exists in a string. Let?y we
are looking for my name, Tushar Mehta. However, we want to find the name
irrespective of the case of the letters and we want to find the name
irrespective of the presence (or absence) of either a middle initial or
the complete middle name. Effectively, we want to a TRUE indication for
all but the last of the possible combinations:
tushar mehta |
TRUE |
Tushar is a Mehta |
TRUE |
tushar d. mehta |
TRUE |
TuShArMeHtA |
TRUE |
tushar-mehta |
TRUE |
tushar_mehta |
TRUE |
tushar 1234567890 mehta |
TRUE |
tushar 12345678901 mehta |
FALSE |
This is not an easy task with normal Excel functions, even if it is
at all possible. However, the simple regular expression pattern
TUSHAR.{0,12}MEHTA recognizes any instance of
the tokens TUSHAR and MEHTA that are separated by no more than 12 other
characters. An Excel formula with this pattern would be:
=NOT(ISERROR(RegEexpFind((A26,"TUSHAR.{0,12}MEHTA",TRUE)))
where A26 is one of the cells containing the text being searched.
The dot between the R and the opening curly bracket stands for any
character at all (except the newline character).
|
Replacing accounting currency values with
values that Excel recognizes Suppose an Excel spreadsheet
imports data from a database where the monetary figures are represented
either by a plain number of by a number followed by the letters CR (for
Credit). Any such number needs to be converted into a negative number,
while a plain number needs to be retained as is.
45.00CR |
-45 |
7919.82CR |
-7919.82 |
29062.47CR |
-29062.47 |
3222.08 |
3222.08 |
18046.94 |
18046.94 |
110.40CR |
-110.4 |
The pattern that recognizes a number followed by a CR is
(\d*\.)?\d+(CR)
The sub-expression
(\d*\.)?\d+ isolates the
number, which may contain decimal values, and the next sub-expression
(CR) isolates the CR.
The resulting Excel formula is
=RegExpSubstitute(G1,((\d*\.)?\d+)(CR),"-$1")+0
The -$1 indicates that the first
sub-expression, if any, should have a hyphen inserted in front of it.
The absence of $2 indicates that the CR should be ignored. In case the
cell has no CR, the function returns the original value itself. The +0
at the end forces Excel to convert the returned string to a number.
My thanks to Harlan Grove for pointing out errors and for suggesting
appropriate corrections in the examples.
|
|