You need to trim leading and/or trailing whitespace from lines or fields of data.
These solutions rely on a bash-specific treatment of read and $REPLY.
See the end of the discussion for an alternate solution.
First, we’ll show a file with some leading and trailing whitespace. We wrap each line in ~~ to make the whitespace visible, and ➝ denotes a literal tab character in the output:
    # Show the whitespace in our sample file
    $ while read; do echo ~~"$REPLY"~~; done < whitespace
    ~~ This line has leading spaces.~~
    ~~This line has trailing spaces. ~~
    ~~ This line has both leading and trailing spaces. ~~
    ~~ ➝ Leading tab.~~
    ~~Trailing tab. ➝ ~~
    ~~ ➝ Leading and trailing tab. ➝ ~~
    ~~ ➝ Leading mixed whitespace.~~
    ~~Trailing mixed whitespace. ➝ ~~
    ~~ ➝ Leading and trailing mixed whitespace. ➝ ~~
To trim both leading and trailing whitespace, use $IFS and the built-in REPLY variable (see the discussion for why this works):
    $ while read REPLY; do echo ~~"$REPLY"~~; done < whitespace
    ~~This line has leading spaces.~~
    ~~This line has trailing spaces.~~
    ~~This line has both leading and trailing spaces.~~
    ~~Leading tab.~~
    ~~Trailing tab.~~
    ~~Leading and trailing tab.~~
    ~~Leading mixed whitespace.~~
    ~~Trailing mixed whitespace.~~
    ~~Leading and trailing mixed whitespace.~~
To trim only leading or only trailing spaces, use a simple pattern match:
    # Leading spaces only
    $ while read; do echo "~~${REPLY## }~~"; done < whitespace
    ~~This line has leading spaces.~~
    ~~This line has trailing spaces. ~~
    ~~This line has both leading and trailing spaces. ~~
    ~~ ➝ Leading tab.~~
    ~~Trailing tab. ~~
    ~~ ➝ Leading and trailing tab. ➝ ~~
    ~~ ➝ Leading mixed whitespace.~~
    ~~Trailing mixed whitespace. ➝ ~~
    ~~ ➝ Leading and trailing mixed whitespace. ➝ ~~

    # Trailing spaces only
    $ while read; do echo "~~${REPLY%% }~~"; done < whitespace
    ~~ This line has leading spaces.~~
    ~~This line has trailing spaces.~~
    ~~ This line has both leading and trailing spaces.~~
    ~~ ➝ Leading tab.~~
    ~~Trailing tab. ~~
    ~~ ➝ Leading and trailing tab. ➝ ~~
    ~~ ➝ Leading mixed whitespace.~~
    ~~Trailing mixed whitespace. ➝ ~~
    ~~ ➝ Leading and trailing mixed whitespace. ➝ ~~
Trimming only leading or only trailing whitespace (including tab) is a bit more complicated:
    # You need this either way
    $ shopt -s extglob

    # Leading whitespaces only
    $ while read; do echo "~~${REPLY##+([[:space:]])}~~"; done < whitespace
    ~~This line has leading spaces.~~
    ~~This line has trailing spaces. ~~
    ~~This line has both leading and trailing spaces. ~~
    ~~Leading tab.~~
    ~~Trailing tab. ~~
    ~~Leading and trailing tab. ➝ ~~
    ~~Leading mixed whitespace.~~
    ~~Trailing mixed whitespace. ➝ ~~
    ~~Leading and trailing mixed whitespace. ➝ ~~

    # Trailing whitespaces only
    $ while read; do echo "~~${REPLY%%+([[:space:]])}~~"; done < whitespace
    ~~ This line has leading spaces.~~
    ~~This line has trailing spaces.~~
    ~~ This line has both leading and trailing spaces.~~
    ~~ ➝ Leading tab.~~
    ~~Trailing tab.~~
    ~~ ➝ Leading and trailing tab.~~
    ~~ ➝ Leading mixed whitespace.~~
    ~~Trailing mixed whitespace.~~
    ~~ ➝ Leading and trailing mixed whitespace.~~
OK, at this point you are probably looking at these lines and wondering how we’re going to make this comprehensible.
It turns out there’s a simple, if subtle, explanation. Here we go. The first example used the default $REPLY variable that read uses when you do not supply your own variable name(s).
Chet Ramey (maintainer of bash) made a design decision that, “[if] there are no variables, save the text of the line read to the variable $REPLY [unchanged, else parse using $IFS].”
    $ while read; do echo ~~"$REPLY"~~; done < whitespace
But when we supply one or more variable names to read, it does parse the input, using the values in $IFS (which are space, tab, and newline by default).
One step of that parsing process is to trim leading and trailing whitespace—just what we want:
    $ while read REPLY; do echo ~~"$REPLY"~~; done < whitespace
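The difference between the two forms of read can be checked on a single string instead of a file. The following is a minimal sketch; the variable names sample, raw, and trimmed and the sample text are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample line: two leading spaces, a tab, and two trailing spaces.
sample=$'  \tpadded text  '

# No variable name: bash stores the line in $REPLY without trimming it.
read <<< "$sample"
raw=$REPLY

# With a variable name: the line is parsed using $IFS, which trims both ends.
read trimmed <<< "$sample"

printf '~~%s~~\n' "$raw" "$trimmed"
```

The first line printed keeps its padding inside the ~~ markers, while the second is ~~padded text~~. The space between the two words survives because a single variable receives the rest of the line.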
Trimming leading or trailing (but not both) spaces is easy using the ${##} or ${%%} operators:
    $ while read; do echo "~~${REPLY## }~~"; done < whitespace
    $ while read; do echo "~~${REPLY%% }~~"; done < whitespace
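One detail worth checking against your own data: the pattern in these expansions is a single space, so each expansion removes at most one space. A small sketch, assuming a made-up value with three spaces on each side:

```shell
#!/usr/bin/env bash
# Hypothetical value with three leading and three trailing spaces.
padded='   text   '

one_leading="${padded## }"    # removes one leading space, leaving two
one_trailing="${padded%% }"   # removes one trailing space, leaving two

printf '~~%s~~\n' "$one_leading" "$one_trailing"
```

If lines may carry runs of several spaces, the extglob pattern covered later in this discussion is the safer choice.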
But covering tabs is a little harder. If we had only tabs, we could use the ${##} or ${%%} operators and insert literal tabs using the Ctrl-V Ctrl-I key sequence.
But that’s risky since it’s probable there’s a mix of spaces and tabs, and some text editors or unwary users may strip out the tabs.
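As an aside, bash's ANSI-C quoting, $'\t', spells a tab without typing a literal one, which at least keeps editors from silently removing it. A minimal sketch, with made-up variable names and sample text:

```shell
#!/usr/bin/env bash
# Hypothetical value with one leading and one trailing tab.
tabbed=$'\tLeading and trailing tab.\t'

# $'\t' expands to a tab character, so the pattern contains no literal tab.
no_leading_tab=${tabbed##$'\t'}
no_trailing_tab=${tabbed%%$'\t'}

printf '~~%s~~\n' "$no_leading_tab" "$no_trailing_tab"
```

This sidesteps the editor problem, but a single $'\t' pattern still removes only one tab and does nothing for mixed runs of spaces and tabs.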
So we turn on extended globbing and use a character class to make our intent clear.
The [:space:] character class would work without extglob, but we need to say “one or more occurrences” using +( ) or else it will trim a single space or tab, but not multiples or both on the same line.
    # This works, need extglob for +( ) part
    $ shopt -s extglob
    $ while read; do echo "~~${REPLY##+([[:space:]])}~~"; done < whitespace
    $ while read; do echo "~~${REPLY%%+([[:space:]])}~~"; done < whitespace

    # This doesn't
    $ while read; do echo "~~${REPLY##[[:space:]]}~~"; done < whitespace
    ~~This line has leading spaces.~~
    ~~This line has trailing spaces. ~~
    ~~This line has both leading and trailing spaces. ~~
    ~~Leading tab.~~
    ~~Trailing tab. ~~
    ~~Leading and trailing tab. ~~
    ~~ ➝ Leading mixed whitespace.~~
    ~~Trailing mixed whitespace. ➝ ~~
    ~~ ➝ Leading and trailing mixed whitespace. ➝ ~~
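To see the two patterns side by side on one value, here is a small sketch. The variable names and the sample string (space, tab, space, then text) are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enables the +( ) pattern operator

# Hypothetical value: a space, a tab, and a space in front of the text.
mixed=$' \t Leading mixed whitespace.'

one_char="${mixed##[[:space:]]}"       # strips only the first space
all_chars="${mixed##+([[:space:]])}"   # strips the whole leading run

printf '~~%s~~\n' "$one_char" "$all_chars"
```

The first result still begins with the tab and the second space; the second result begins directly with the text.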
Here’s a different take, exploiting the same $IFS parsing, but to parse out fields (or words) instead of records (or lines):
    $ for i in $(cat white_space); do echo ~~$i~~; done
    ~~This~~
    ~~line~~
    ~~has~~
    ~~leading~~
    ~~white~~
    ~~space.~~
    ~~This~~
    ~~line~~
    ~~has~~
    ~~trailing~~
    ~~white~~
    ~~space.~~
    ~~This~~
    ~~line~~
    ~~has~~
    ~~both~~
    ~~leading~~
    ~~and~~
    ~~trailing~~
    ~~white~~
    ~~space.~~
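The unquoted $(cat ...) above is also subject to pathname expansion, so a field such as * would be replaced by filenames. One possible alternative, sketched here under the assumption that you want the fields of one record at a time, is read -a, which splits on $IFS into an array without globbing. The names record, fields, and field are made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical record with padding around and between its fields.
record=$'  alpha \t beta   gamma  '

# -a splits the line on $IFS into the array "fields"; -r keeps backslashes as-is.
read -r -a fields <<< "$record"

for field in "${fields[@]}"; do
    printf '~~%s~~\n' "$field"
done
```

Each of the three fields is printed on its own line with no surrounding whitespace.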
Finally, although the original solutions rely on Chet’s design decision about read and $REPLY, this solution does not:
    shopt -s extglob

    while IFS= read -r line; do
        echo "None: ~~$line~~"                      # preserve all whitespaces
        echo "Ld:   ~~${line##+([[:space:]])}~~"    # trim leading whitespace
        echo "Tr:   ~~${line%%+([[:space:]])}~~"    # trim trailing whitespace
        line="${line##+([[:space:]])}"              # trim leading and...
        line="${line%%+([[:space:]])}"              # ...trailing whitespace
        echo "All:  ~~$line~~"                      # Show all trimmed
    done < whitespace
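If you need the same trimming in several places, the two expansions can be wrapped in a function. This is only a sketch: trim is a hypothetical helper name, not something bash provides.

```shell
#!/usr/bin/env bash
shopt -s extglob

# trim is a hypothetical helper, not a bash built-in.
# It prints its first argument with leading and trailing whitespace removed.
trim() {
    local text=$1
    text="${text##+([[:space:]])}"   # drop the leading run
    text="${text%%+([[:space:]])}"   # drop the trailing run
    printf '%s\n' "$text"
}

trimmed=$(trim $' \t Leading and trailing mixed whitespace. \t ')
echo "~~$trimmed~~"
```

Because the function takes a string rather than reading a file, it works equally well on a whole line or on one field of a line.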