You are searching for a phrase in a document, and want to show the paragraph after the found phrase.
We’re assuming a simple text file, where paragraph means all the text between blank lines, so the occurrence of a blank line implies a paragraph break.
Given that, it’s a pretty short awk program:
$ cat para.awk
/keyphrase/ { flag=1 }
{ if (flag == 1) { print $0 } }
/^$/ { flag=0 }
$
$ awk -f para.awk < searchthis.txt
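The sketch below shows the program at work. It is self-contained: it writes the recipe's para.awk and a small sample searchthis.txt into a temporary directory and then runs the program. The sample text, and the use of mktemp and trap for the scratch directory, are assumptions made for illustration.

```bash
# Work in a scratch directory that is removed when the script exits.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Hypothetical sample input: three paragraphs separated by empty lines.
cat > "$tmpdir/searchthis.txt" <<'EOF'
First paragraph, nothing to see.

This paragraph mentions the keyphrase
and continues on a second line.

Last paragraph, also ignored.
EOF

# The same three rules as the recipe's para.awk.
cat > "$tmpdir/para.awk" <<'EOF'
/keyphrase/ { flag=1 }
{ if (flag == 1) { print $0 } }
/^$/ { flag=0 }
EOF

awk -f "$tmpdir/para.awk" < "$tmpdir/searchthis.txt"
```

This run prints the two lines of the middle paragraph, followed by the empty line that ends it. That empty line appears because the printing block runs before the block that resets the flag.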
There are just three simple code blocks.
The first is invoked when a line of input matches the regular expression (here just the word “keyphrase”). If “keyphrase” occurs anywhere within a line of input, that is a match and this block of code will be executed.
All that happens in this block is that the flag is set.
The second code block is invoked for every line of input, since there is no regular expression preceding its open brace.
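Even the line that matches “keyphrase” is passed to this second block, so it gets printed too. If we didn’t want that effect, we could end the first block with a next statement, which skips the remaining rules for the current line. (awk’s continue statement applies only inside loops, so it would not do this job.)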
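Here is a sketch of the next variant. The helper name para_after and the five-line input are hypothetical. Because the first block ends with next, awk moves on to the following input line without running the printing block, so the matching line itself is left out of the output.

```bash
# para_after: hypothetical helper; prints the lines *after* the matching line,
# up to and including the next empty line.
para_after() {
  awk '
    /keyphrase/ { flag=1; next }      # set the flag, then skip the other rules for this line
    { if (flag == 1) { print $0 } }
    /^$/ { flag=0 }
  '
}

printf '%s\n' 'intro' 'the keyphrase is here' 'line after' '' 'other text' | para_after
```

Only "line after" and the empty line that follows it are printed.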
All this second block does is print the entire input line, but only if the flag is set.
The third block has a regular expression that, if satisfied, will simply reset (turn off) the flag.
That regular expression uses two characters with special meaning. The caret (^), when used as the first character of a regular expression, matches the beginning of the line; the dollar sign ($), when used as the last character, matches the end of the line.
So the regular expression ^$ means “an empty line” because it has no characters between the beginning and end of the line.
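One quick way to confirm that ^$ accepts only truly empty lines is to print the line number (NR) of each line it matches. The sample input is made up: line 2 is empty, line 3 holds two spaces, and line 4 holds a single tab.

```bash
# sample: hypothetical input mixing an empty line with whitespace-only lines.
sample() { printf 'text\n\n  \n\t\nmore\n'; }

# Print the number of every line that /^$/ treats as empty.
sample | awk '/^$/ { print NR }'
```

Only line 2 is reported. The lines holding a space or a tab do not match, which is the limitation the next paragraph addresses.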
We could have used a slightly more complicated regular expression for an empty line to let it handle any line with just whitespace rather than a completely blank line.
That would make the third line look like this:
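/^[[:blank:]]*$/ { flag=0 }
Note the doubled brackets. [:blank:] (a space or a tab) is a character class that is valid only inside a bracket expression. Written as /^[:blank:]*$/, the pattern would instead match only empty lines and lines made up entirely of colons and the letters b, l, a, n, and k.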
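The change matters for the whole recipe when a paragraph break is a line containing only spaces or tabs. In this sketch, the helper name para_ws and the input lines are hypothetical, and your awk is assumed to support POSIX character classes (some older implementations do not). The break after the keyphrase paragraph is a line holding two spaces. With the whitespace-tolerant rule, output stops at that line. With the original /^$/ rule, the flag would never reset, and the program would keep printing into the following paragraph.

```bash
# para_ws: hypothetical helper; the recipe program with the third rule
# widened to treat whitespace-only lines as paragraph breaks.
para_ws() {
  awk '
    /keyphrase/ { flag=1 }
    { if (flag == 1) { print $0 } }
    /^[[:blank:]]*$/ { flag=0 }
  '
}

# The paragraph break here is a line holding two spaces.
printf '%s\n' 'the keyphrase is here' 'line after' '  ' 'unrelated text' | para_ws
```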
Perl programmers love this sort of problem and solution, but we’ve implemented it with awk because Perl is (mostly) beyond the scope of this book.
If you know Perl, by all means use it. If not, awk might be all you need.
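If awk is all you need, note that it can also express this on/off flag in a single rule by using a range pattern. A range pattern is two patterns separated by a comma, and it selects every line from a match of the first pattern through the next match of the second, inclusive. The sketch below swaps in that technique instead of reproducing the recipe's program. The helper name para_range and the input are hypothetical. For input like this, it should print the same lines as para.awk.

```bash
# para_range: hypothetical helper using an awk range pattern instead of a flag.
# With no action given, awk prints every selected line: from a line containing
# "keyphrase" through the next empty line, inclusive.
para_range() {
  awk '/keyphrase/, /^$/'
}

printf '%s\n' 'intro' 'the keyphrase is here' 'line after' '' 'other text' | para_range
```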