A string is a sequence of characters. Suppose we want to find a particular sequence of characters within a string; in other words, we want to find a pattern. That pattern can be anything from a run of digits to a run of letters, or any mix of the two.
Solution
To understand the solution to this problem, we need to first go over regular expressions.
Regular Expressions
In computing, a regular expression is defined as “a sequence of symbols and characters expressing a string or pattern to be searched for within a longer string”.
To tell a program which pattern we want it to find, we first need to learn how to write a regular expression.
Writing Regular Expressions
Writing regular expressions is all about syntax. Once you know the syntax, it’s quite easy.
The * symbol represents repetition of the character preceding it. In effect it says, “the character before me can occur 0 or more times”.
The regular expression ab*c represents the patterns ac, abc, abbc, abbbc, etc. The character ‘b’ can occur any number of times, including zero.
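As a rough sketch, the behaviour of * can be checked in Python with the standard re module. The choice of Python, the pattern ab*c, and the helper name matches_whole are assumptions made for illustration only.

```python
import re

# Assumed pattern: 'a', then zero or more 'b', then 'c'.
STAR = re.compile(r"ab*c")

def matches_whole(text):
    """Return True when the entire text fits the pattern (hypothetical helper)."""
    return STAR.fullmatch(text) is not None

for candidate in ["ac", "abc", "abbbc", "adc"]:
    print(candidate, matches_whole(candidate))

# findall returns every non-overlapping occurrence inside a longer string.
found = STAR.findall("xx ac yy abbc zz")
print(found)  # ['ac', 'abbc']
```

Here fullmatch tests a whole string against the pattern, while findall searches for the pattern inside a longer string, which is the use case described at the start of this section.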
The + symbol is also used to represent the repetition of the character preceding it. However, unlike *, the character must be present at least once.
The regular expression ab+c represents the patterns abc, abbc, abbbc, etc. The character ‘b’ must occur at least once and can occur any number of times after that.
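A minimal sketch of the difference between + and *, again assuming Python's re module and the illustrative pattern ab+c:

```python
import re

# Assumed pattern: 'a', then one or more 'b', then 'c'.
PLUS = re.compile(r"ab+c")

# Map each candidate string to whether the whole string matches.
results = {text: PLUS.fullmatch(text) is not None for text in ["ac", "abc", "abbc"]}
print(results)  # {'ac': False, 'abc': True, 'abbc': True}
```

Unlike the * case, the string ac is rejected because it contains no ‘b’ at all.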
To specify exactly how many times a character is repeated, we can use curly brackets {} containing the number of repetitions we want in the pattern.
The regular expression ab{2}c represents the pattern abbc.
We can also specify the minimum and the maximum number of times a character can be repeated.
The regular expression ab{2,5}c represents abbc, abbbc, abbbbc, and abbbbbc: the character ‘b’ is repeated at least 2 and at most 5 times.
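The two forms of curly-bracket repetition can be compared with a short Python sketch. The patterns ab{2}c and ab{2,5}c and the helper accepted_b_counts are assumptions chosen to mirror the examples in the text.

```python
import re

EXACT = re.compile(r"ab{2}c")      # exactly two 'b'
BOUNDED = re.compile(r"ab{2,5}c")  # between two and five 'b', inclusive

def accepted_b_counts(pattern, max_b=7):
    """Return each n in 0..max_b for which 'a' + 'b'*n + 'c' fully matches."""
    return [n for n in range(max_b + 1)
            if pattern.fullmatch("a" + "b" * n + "c")]

print(accepted_b_counts(EXACT))    # [2]
print(accepted_b_counts(BOUNDED))  # [2, 3, 4, 5]
```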
We can also specify a set of characters using square brackets []. A set matches exactly one character in the string, and that character may be any member of the set.
The regular expression [abc] represents the patterns a, b, c.
By placing a + after the set, we can match any sequence of one or more characters drawn from the set.
The regular expression [abc]+ represents the patterns a, b, c, ab, ac, ba, bc, ca, cb, abc, abb, aaa, bbb, ccc, acc, bac, etc.
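As an illustration, here is how a set behaves with and without a trailing +, sketched in Python. The patterns [abc] and [abc]+ and the sample strings are assumptions.

```python
import re

ONE_OF = re.compile(r"[abc]")        # exactly one character from the set
ONE_OR_MORE = re.compile(r"[abc]+")  # one or more characters from the set

# Only single characters that belong to the set match the bare set.
single = [s for s in ["a", "b", "c", "d", "ab"] if ONE_OF.fullmatch(s)]
print(single)  # ['a', 'b', 'c']

# With +, whole runs of set members are found inside a longer string.
runs = ONE_OR_MORE.findall("cab fed abba")
print(runs)  # ['cab', 'abba']
```

Note that "ab" fails against the bare set because the set consumes only one character, and "fed" yields no match because none of its letters are in the set.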
To combine several ranges in one set, we simply write them one after the other inside the brackets.
The regular expression [a-zA-Z] represents any single letter of the alphabet, whether upper or lower case, by combining the ranges a-z and A-Z.
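Sometimes we are looking for a character from a range of characters. For this, we can use - between the first and last character of the range, inside square brackets.
The regular expression [1-9] represents the patterns 1, 2, 3, 4, 5, 6, 7, 8, 9.
As with the earlier set example, placing a + after the range, as in [1-9]+, matches any combination of characters from the range, such as 7, 42, or 1995.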
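The range examples can be tried out with a brief Python sketch. The patterns [a-zA-Z] and [1-9]+ and the sample strings are assumptions for illustration.

```python
import re

LETTER = re.compile(r"[a-zA-Z]")  # one letter, either case
DIGITS = re.compile(r"[1-9]+")    # one or more digits from 1 to 9 (0 is excluded)

letters = LETTER.findall("R2-d2!")
print(letters)  # ['R', 'd']

numbers = DIGITS.findall("room 42, floor 7, code 1050")
print(numbers)  # ['42', '7', '1', '5']
```

The last result shows that 1050 is split into 1 and 5: the digit 0 lies outside the range 1-9, so it ends each run.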
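When placed as the first character inside square brackets, the ^ symbol negates the set: the set then matches any character that is not listed.
The regular expression [^abc] represents any single character except a, b, and c. This includes the other letters of the alphabet, and also digits, spaces, and punctuation.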
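A negated set can be sketched the same way in Python. The pattern [^abc] and the sample string are assumptions.

```python
import re

# Assumed pattern: any single character other than 'a', 'b', or 'c'.
NOT_ABC = re.compile(r"[^abc]")

kept = NOT_ABC.findall("cab 42!")
print(kept)  # [' ', '4', '2', '!']
```

The output contains a space, digits, and punctuation, which shows that a negated set is not limited to letters.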