The regular expressions that grep supports let you build some very powerful patterns, enough to cover most of your needs.
A regular expression describes a pattern to match against strings. Any alphabetic character simply matches that same character in the string.
“A” matches “A”, “B” matches “B”; no surprise there.
But regular expressions define other special characters that can be used by themselves or in combination with other characters to make more complex patterns.
We already said that any character without a special meaning simply matches itself: “A” matches “A” and so on.
The next important rule is that characters combine by position, so “AB” matches an “A” immediately followed by a “B”.
This, too, seems obvious.
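To see literal characters and concatenation at the command line, here is a minimal sketch that pipes a few made-up words into grep. The sample words are illustrative assumptions only. Remember that grep prints every input line that contains a match somewhere in it.

```shell
# Four made-up sample lines, one per line, piped into grep.
# 'AB' is two literal characters in a row: an "A" immediately followed by a "B".
printf '%s\n' ABBA BAD CAB ACE | grep 'AB'
# prints:
# ABBA
# CAB
```

“BAD” and “ACE” both contain an “A”, but in neither is it directly followed by a “B”.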
The first special character is the period (.), which matches any single character.
Therefore .... matches any four characters; A. matches an “A” followed by any character; and .A. matches any character, then an “A”, then any character (not necessarily the same character as the first).
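A short sketch of the period, using made-up input. The first command shows that, because grep reports any line that contains a match, .... selects every line at least four characters long. The second command adds -x, a standard grep option that requires the pattern to match the entire line, so .A. then means exactly three characters with an “A” in the middle.

```shell
# Any line with at least four characters (spaces count as characters).
printf '%s\n' abc abcd 'a b c' | grep '....'
# prints:
# abcd
# a b c

# Whole line must be: one character, an "A", one character.
# XAY shows the outer characters need not be the same; BAB shows they may be.
printf '%s\n' XAY AA BAB ABAB | grep -x '.A.'
# prints:
# XAY
# BAB
```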
An asterisk (*) matches zero or more occurrences of the preceding character.
So A* means zero or more “A” characters, and .* means zero or more characters of any sort (such as “abcdefg”, “aaaabc”, “sdfgf ;lkjhj”, or even an empty line).
So what does ..* mean? Any single character followed by zero or more of any character (i.e., one or more characters), so it matches any line except an empty one.
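The difference between .* and ..* is easiest to see by counting matching lines. This sketch uses -c, a standard grep option that prints the number of matching lines instead of the lines themselves, on three made-up lines where the middle one is empty. The last command shows a consequence of “zero or more”: A* matches every line, even lines with no “A” at all.

```shell
# Three lines of made-up input; the middle one is empty.
printf 'abc\n\nxyz\n' | grep -c '.*'    # 3: zero characters still counts as a match
printf 'abc\n\nxyz\n' | grep -c '..*'   # 2: the empty line has no character for the first "." to match
printf 'abc\n\nxyz\n' | grep -c 'A*'    # 3: zero "A"s matches, even though no line contains an "A"
```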
Speaking of lines, the caret ^ matches the beginning of a line of text and the dollar sign $ matches the end of a line; hence ^$ matches an empty line (the beginning followed by the end, with nothing in between).
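A minimal sketch of the two anchors, again on made-up input. The -c option is standard grep and prints a count of matching lines rather than the lines themselves.

```shell
# Count the empty lines: two of these four lines are empty.
printf 'first\n\nthird\n\n' | grep -c '^$'             # 2

# ^ ties the pattern to the start of the line...
printf '%s\n' 'Answer' 'The Answer' | grep '^Answer'   # prints: Answer

# ...and $ ties it to the end.
printf '%s\n' 'all done' 'done yet?' | grep 'done$'    # prints: all done
```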
What if you want to match an actual period, caret, dollar sign, or any other special character? Precede it by a backslash (\).
So ion. matches the letters “ion” followed by any character, but ion\. matches only “ion” followed by a literal period (e.g., at the end of a sentence or wherever else it appears with a trailing dot).
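Here is a sketch contrasting the two patterns on made-up input. Note the single quotes around each pattern: they keep the shell from interpreting the backslash, so grep receives ion\. exactly as written.

```shell
# Unescaped: the period matches any character, including a space.
printf '%s\n' 'a union.' 'onion rings' 'lion' | grep 'ion.'
# prints:
# a union.
# onion rings
# ("lion" has nothing after "ion", so there is no character for the period to match.)

# Escaped: the period matches only a literal dot.
printf '%s\n' 'a union.' 'onion rings' 'lion' | grep 'ion\.'
# prints:
# a union.
```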
A set of characters enclosed in square brackets (e.g., [abc]) matches any one of those characters (e.g., “a” or “b” or “c”).
If the first character inside the square brackets is a caret, the expression instead matches any single character that is not in that set.
For example, [AaEeIiOoUu] matches any of the vowels, and [^AaEeIiOoUu] matches any character that is not a vowel.
This last case is not the same as saying that it matches consonants because [^AaEeIiOoUu] also matches punctuation and other special characters that are neither vowels nor consonants.
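A sketch of both bracket forms on made-up input. The second command combines the negated set with * and uses -x, a standard grep option that requires the pattern to match the whole line, so it selects lines containing no vowels at all. Note that “b-52” qualifies: the hyphen and the digits are matched by [^AaEeIiOoUu] even though they are not consonants.

```shell
# Lines containing at least one vowel.
printf '%s\n' sky b-52 EAT oui | grep '[AaEeIiOoUu]'
# prints:
# EAT
# oui

# Lines made up entirely of non-vowels (letters, digits, punctuation alike).
printf '%s\n' sky b-52 EAT oui | grep -x '[^AaEeIiOoUu]*'
# prints:
# sky
# b-52
```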
Another mechanism we want to introduce is repetition of the preceding character, written as \{n,m\}, where n is the minimum number of repetitions and m is the maximum.
Written as \{n\}, it means “exactly n times,” and written as \{n,\}, it means “at least n times.”
For example, the regular expression A\{5\} means five capital A letters in a row, whereas A\{5,\} means five or more capital A letters.
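A sketch of the interval forms on made-up input: runs of four, five, and seven A's, plus a line whose longest run is three. One subtlety is worth seeing. Because grep reports any line that contains a match, A\{5\} on its own also selects the seven-A line, which contains five A's in a row. Adding -x, a standard grep option that requires the pattern to match the whole line, makes “exactly” really mean exactly.

```shell
# Unanchored: any line containing five A's in a row.
printf '%s\n' AAAA AAAAA AAAAAAA AAxAAA | grep 'A\{5\}'
# prints: AAAAA and AAAAAAA

# Whole line is exactly five A's.
printf '%s\n' AAAA AAAAA AAAAAAA AAxAAA | grep -x 'A\{5\}'
# prints: AAAAA

# Whole line is five or more A's.
printf '%s\n' AAAA AAAAA AAAAAAA AAxAAA | grep -x 'A\{5,\}'
# prints: AAAAA and AAAAAAA

# Whole line is between two and four A's.
printf '%s\n' AAAA AAAAA AAAAAAA AAxAAA | grep -x 'A\{2,4\}'
# prints: AAAA
```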