Perl Pattern Matching

A pattern is a sequence of characters to be searched for in a character string. Perl patterns are normally enclosed in slash characters, for example /def/.

There are two pattern-matching operators, =~ and !~.

Pattern Matching Operators
Pattern Match (=~)

Pattern matched - nonzero (true) returned
No pattern matched - false returned

No Pattern Match (!~)

No pattern matched - nonzero (true) returned
Pattern matched - false returned

Pattern Matching Operators Examples
Pattern Match

$result = $var =~ /abc/;                     # true if $var has abc in the string

if ( $question =~ /right/ ) { .... }         # true if $question contains right, false if it does not

No Pattern Match

$result = $var !~ /abc/;                     # true if $var does not have abc in the string

if ( $question !~ /right/ ) { .... }         # true if $question does not contain right, false if right does appear in $question

The pattern-matching operators have an order of precedence, which is listed in the Perl Cheat Sheet.
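
As a small runnable sketch of the two operators described above, the script below tests the same strings with =~ and !~. The sample values assigned to $var and $question are made up for illustration, and the ?: operator is used only to turn the true/false result into 1 or 0 for printing.

```perl
use strict;
use warnings;

# Sample data (hypothetical values, chosen only for this illustration).
my $var      = "xyzabcdef";
my $question = "turn right at the corner";

# =~ is true when the pattern is found somewhere in the string.
my $found = ( $var =~ /abc/ ) ? 1 : 0;

# !~ is the logical negation of =~.
my $missing = ( $var !~ /abc/ ) ? 1 : 0;

print "found=$found missing=$missing\n";

if ( $question =~ /right/ ) {
    print "question mentions 'right'\n";
}
if ( $question !~ /left/ ) {
    print "question does not mention 'left'\n";
}
```

A statement such as $result = $var =~ /abc/; needs no parentheses because =~ binds more tightly than the assignment operator.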

Special Characters

A number of special characters can be used inside patterns. They let a single pattern match any of a number of character strings, and they are what make patterns useful.

Special Characters
. character: matches any single character except the newline character. The special combination .* tries to match as much as possible.
+ character: matches one or more occurrences of the preceding character.
[ ] characters: define a pattern that matches any one of a group of alternatives. You can also use ranges such as [0-9] or [a-zA-Z].
* character: matches zero or more occurrences of the preceding character.
? character: matches zero or one occurrence of the preceding character.
Pattern anchors: match at the beginning of a string (^ or \A), at the end of a string ($ or \Z), on a word boundary (\b), or inside a word (\B, the opposite of \b).
Escape sequence: to include a character that is normally treated as a special character, precede it with a backslash. You can use \Q to tell Perl to treat everything that follows as normal characters until it sees \E.
Excluding: you can exclude characters by placing ^ as the first character inside square brackets, as in [^eE].
Character-range escape sequences: there are special escape sequences for character ranges, such as any digit (\d) and anything other than a digit (\D). For the full list see the Perl Cheat Sheet.
Specified number of occurrences: you can define how many occurrences you want to match using {<minimum>,<maximum>}.
Specify choice: the special character | (pipe) lets you specify two or more alternatives to choose from when matching a pattern.
Portion reuse: sometimes you want to store what has been matched. You can do this with ( ). The first set is stored in \1 (used inside the pattern) or $1 (used when assigning to variables), the second set in \2 or $2, and so on.
Different delimiter: you can specify a different delimiter by putting m in front of the pattern, as in m!/usr/sbin!.
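
The behaviour of these special characters can be checked with a short table-driven script. This is only a sketch: the test strings and the @cases layout are chosen for illustration, and each entry records whether the pattern is expected to match.

```perl
use strict;
use warnings;

# Each entry: [ pattern, string, expected result (1 = match, 0 = no match) ]
my @cases = (
    [ qr/d.f/,      "def",         1 ],   # . matches any one character
    [ qr/de+f/,     "df",          0 ],   # + needs at least one e
    [ qr/de*f/,     "df",          1 ],   # * allows zero e's
    [ qr/de?f/,     "deef",        0 ],   # ? allows at most one e
    [ qr/d[eE]f/,   "dEf",         1 ],   # character class
    [ qr/d[^eE]f/,  "def",         0 ],   # negated character class
    [ qr/^hello/,   "hello world", 1 ],   # anchored at the start
    [ qr/hello$/,   "hello world", 0 ],   # anchored at the end
    [ qr/\bdef/,    "abcdef",      0 ],   # def is not at a word boundary
    [ qr/\Bdef/,    "abcdef",      1 ],   # def is inside a word
    [ qr/^\d+$/,    "2024",        1 ],   # digits only
    [ qr/de{1,3}f/, "deeeef",      0 ],   # four e's is more than the maximum
    [ qr/def|ghi/,  "xghix",       1 ],   # alternation
);

my $failures = 0;
for my $case (@cases) {
    my ( $re, $string, $expected ) = @$case;
    my $got = ( $string =~ $re ) ? 1 : 0;
    $failures++ if $got != $expected;
    printf "%-20s %-15s %s\n", $re, "'$string'", $got ? "match" : "no match";
}
print "unexpected results: $failures\n";
```

The qr// operator is used here so that patterns can be stored in a list and applied later with =~.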
Special Characters Examples
. character

/d.f/          # could match words like def, dif, duf
/d.*f/         # could match words like deaf, deef, def, dzzf, etc

+ character

/de+f/         # could match words like def, deef, deeef, deeeef, etc
/ +/           # match one or more consecutive spaces

[ ] character

/d[eE]f/       # match words def or dEf
/a[456]c/      # match a followed by 4, 5 or 6, then c: a4c, a5c or a6c
/d[eE]+f/      # match words like def, dEf, deef, dEef, dEEeeEef, etc

/d[a-z]f/      # match words like daf, def, dzf, dsf, etc
/1[0-9]0/      # match numbers like 100, 110, 120, 150, 170, 190, etc

* character

/de*f/         # match words like df, def, deef, deeef, etc

? character

/de?f/         # match only the words df and def (not deef, because ? allows at most one occurrence)

Pattern anchors

/^hello/       # match only if line starts with hello
/hello$/       # match only if hello is at end of line

/\bdef/        # only matches when def is at the beginning of a word define, defghi
/def\b/        # only matches when def is at the end of a word abcdef

/\Bdef/        # matches abcdef (opposite of \b)
/def\B/        # matches defghi (opposite of \b)

Escape sequence

/\+salary/     # will match the text +salary, the + (plus) is treated as a normal character because of the \
/\Q**++\E/     # will match the text **++

Excluding

/d[^eE]f/      # 1st character is d, 2nd character is anything other than e or E, last character is f

Character-range escape sequences

/\d/           # match any single digit
/\d+/          # match one or more digits

Specified number of occurrences

/de{3}f/       # match only deeef, the {3} means exactly three of the preceding e
/de{1,3}f/     # match only def, deef and deeef (minimum = 1, maximum = 3 occurrences)

Specify choice

/def|ghi/      # match either def or ghi

Portion reuse

/(def)(ghi)/   # the first matched group will be stored in \1 or $1, the second in \2 or $2
$result = $1;  # assign the first matched group above to $result
$result2 = $2; # assign the second matched group above to $result2

Different delimiter

m!/usr/sbin!   # match /usr/sbin, here we are using the ! (bang) character as a delimiter; the m prefix is required when the delimiter is not /
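
The capture, escape and delimiter examples above can be tried together in one short script. This is a minimal sketch with made-up test strings; the variable names ($first, $doubled, $quoted_ok and so on) are hypothetical and exist only to hold the results.

```perl
use strict;
use warnings;

# Capture groups: $1 and $2 hold what each pair of parentheses matched.
my ( $first, $second ) = ( '', '' );
if ( "abcdefghi" =~ /(def)(ghi)/ ) {
    ( $first, $second ) = ( $1, $2 );
}

# Backreference: \1 inside the pattern means "the same text group 1 matched",
# so /(.)\1/ finds a character that is immediately repeated.
my $doubled = ( "hello" =~ /(.)\1/ ) ? $1 : '';

# \Q...\E quotes special characters, which helps when the text
# to search for comes from a variable.
my $literal   = "**++";
my $quoted_ok = ( "a**++b" =~ /\Q$literal\E/ ) ? 1 : 0;

# A different delimiter (with the m prefix) avoids escaping every slash.
my $path_ok = ( "/usr/sbin/cron" =~ m!/usr/sbin! ) ? 1 : 0;

print "first=$first second=$second doubled=$doubled\n";
print "quoted_ok=$quoted_ok path_ok=$path_ok\n";
```

Copying $1 and $2 into named variables straight after the match is a common habit, because the next successful match overwrites them.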

Pattern-Matching Options

When you specify a pattern, you can also supply options that control how the pattern is to be matched, to see the full list see the Perl Cheat Sheet

Pattern-Matching Options Examples
Match all possible patterns

@matches = "balata" =~ /.a/g;    # list context: returns all matches, ba, la and ta
$matches = "balata" =~ /.a/g;    # scalar context: matches the first occurrence (ba) and returns true

Ignore case

/de/i                            # matches de, De, dE or DE

Treat string as multiple lines

/^The/m                          # ^ matches at the start of any line in the string, not only at the start of the string

Only evaluate once

/def/o                           # interpolate and compile the pattern only once (matters when the pattern contains variables)

Treat string as single line

/a.*bc/s                         # . also matches the newline character, so the match can span multiple lines

Ignore white space in pattern

/\d{2} \d{2} /x                  # ignore the spaces in the pattern so the code is
                                 # interpreted as /\d{2}\d{2}/, spaces make the code easier
                                 # to read
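
The effect of each option is easiest to see by running the same pattern with and without it. The following is a sketch under that assumption; the test strings and variable names are invented for illustration.

```perl
use strict;
use warnings;

# /g in list context returns every match.
my @all = "balata" =~ /.a/g;

# /g in scalar context returns true or false and remembers the position,
# so a while loop can walk through the matches one at a time.
my @walked;
my $text = "balata";
while ( $text =~ /(.a)/g ) {
    push @walked, $1;
}

# /i ignores case.
my $icase = ( "DE" =~ /de/i ) ? 1 : 0;

# /m lets ^ and $ match at the start and end of every line.
my $lines     = "First line\nThe second line\n";
my $without_m = ( $lines =~ /^The/ )  ? 1 : 0;
my $with_m    = ( $lines =~ /^The/m ) ? 1 : 0;

# /s lets . match a newline as well.
my $without_s = ( "a\nbc" =~ /a.*bc/ )  ? 1 : 0;
my $with_s    = ( "a\nbc" =~ /a.*bc/s ) ? 1 : 0;

# /x ignores white space in the pattern and allows comments.
my $with_x = ( "1234" =~ / \d{2}   # first pair of digits
                           \d{2}   # second pair of digits
                         /x ) ? 1 : 0;

print "all=@all walked=@walked icase=$icase\n";
print "m: $without_m/$with_m s: $without_s/$with_s x: $with_x\n";
```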

s/abc/def/                       # substitute the first occurrence of abc with def
s/abc//                          # delete first occurrence abc
s/abc//g                         # delete all occurrences of abc (global substitution)

Note: the other pattern-matching options (such as i, m, s and x) can also be used with substitution
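
A short sketch of the substitution forms above, using made-up strings. The s/// operator changes the variable in place and returns the number of substitutions it made, which the $n_once and $n_all variables (hypothetical names) capture here.

```perl
use strict;
use warnings;

# Replace the first occurrence only.
my $once   = "abc-abc-abc";
my $n_once = ( $once =~ s/abc/def/ );

# Replace every occurrence; the return value is the number of replacements.
my $all   = "abc-abc-abc";
my $n_all = ( $all =~ s/abc/def/g );

# An empty replacement deletes what was matched.
my $deleted = "xabcyabcz";
$deleted =~ s/abc//g;

# Options combine: global and case-insensitive.
my $mixed = "Abc aBC abc";
$mixed =~ s/abc/def/gi;

print "$once ($n_once)\n";
print "$all ($n_all)\n";
print "$deleted\n";
print "$mixed\n";
```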


tr/abc/def/                      # translate all a into d, all b into e, all c into f
y/a-z/A-Z/                       # translate all lowercase letters into upper case, a=A, b=B, etc

Note: you can use y/ instead of tr/. Several options are available:
/c  -  Complement the search list (translate all characters not specified)
/d  -  Delete specified characters that have no corresponding replacement
/s  -  Squeeze runs of identical output characters into a single character
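
The translation operator and its options can be sketched as follows. The strings are invented for illustration. Like s///, tr/// works on the variable in place, and it returns the number of characters it matched, which makes it handy for counting.

```perl
use strict;
use warnings;

# Plain translation: lowercase letters to upper case.
my $upper = "hello world";
$upper =~ tr/a-z/A-Z/;

# With an empty replacement list and no options the string is left
# unchanged, and the return value is the number of matching characters.
my $phrase = "programming language";
my $vowels = ( $phrase =~ tr/aeiou// );

# /c complements the search list and /d deletes: remove everything but digits.
my $digits_only = "tel: 555-1234";
$digits_only =~ tr/0-9//cd;

# /s squeezes runs of the same translated character into one.
my $squeezed = "aaabbbccc";
$squeezed =~ tr/a-c//s;

# /d on its own deletes the listed characters.
my $no_vowels = "translate";
$no_vowels =~ tr/aeiou//d;

print "$upper\n$vowels\n$digits_only\n$squeezed\n$no_vowels\n";
```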