How to exclude patterns, files and directories with grep
The Linux grep command has been helping people find strings in files since 1974. But sometimes grep is just too thorough. Here are several ways to tell grep to ignore different things.
The grep command
The grep command searches text files for strings that match the search patterns you provide on the command line. The power of grep lies in its use of regular expressions. These let you describe what you are looking for, rather than having to define it explicitly.
The birth of grep predates Linux. It was developed in the early 1970s on Unix. It takes its name from the g/re/p command sequence in the ed line editor (incidentally, pronounced “ee-dee”). This stood for global, regular expression search, print matching lines.
grep is famously (perhaps notoriously) thorough and determined. Sometimes it will search files or directories you’d rather it didn’t waste time on, and the results can leave you unable to see the wood for the trees.
Of course, there are ways to rein grep in. You can tell it to ignore patterns, files, and directories so that grep completes its searches faster and you aren’t swamped with meaningless false positives.
Excluding patterns
To search with grep, you can pipe input to it from another process such as cat, or you can provide a filename as the last command line parameter.
We’re using a short file that contains the text of the poem Jabberwocky, by Lewis Carroll. In both of these examples, we’re searching for lines that match the search term “Jabberwock”.
cat jabberwocky.txt | grep "Jabberwock"
grep "Jabberwock" jabberwocky.txt
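A minimal sketch to confirm that both forms find the same lines. It assumes Bash and GNU grep, and the jabberwocky.txt it creates holds only a seven-line excerpt of the poem, a hypothetical stand-in for the full file used in the article.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory that is removed when the script exits.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Hypothetical sample: a seven-line excerpt, not the article's full file.
cat > jabberwocky.txt <<'EOF'
'Twas brillig, and the slithy toves
"Beware the Jabberwock, my son!
He took his vorpal sword in hand;
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
EOF

# Form 1: another process (cat) pipes the text into grep.
piped=$(cat jabberwocky.txt | grep "Jabberwock")

# Form 2: the file name is given as the last command line parameter.
direct=$(grep "Jabberwock" jabberwocky.txt)

printf '%s\n' "$direct"
```

Both variables hold the same three lines; naming the file directly simply saves starting an extra cat process.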
The lines that contain matches for the search term are listed for us, with the matching text in each line highlighted in red. It’s a straightforward search. But what if we want to exclude the lines that contain the word “Jabberwock” and print the rest?
We can accomplish this with the -v (invert match) option. This lists the lines that don’t match the search term.
grep -v "Jabberwock" jabberwocky.txt
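Using a seven-line excerpt of the poem as a stand-in for the full file (and assuming Bash with GNU grep), this sketch shows what -v leaves behind.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Hypothetical sample: a seven-line excerpt, not the article's full file.
cat > jabberwocky.txt <<'EOF'
'Twas brillig, and the slithy toves
"Beware the Jabberwock, my son!
He took his vorpal sword in hand;
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
EOF

# -v inverts the match: print only the lines that do NOT contain "Jabberwock".
kept=$(grep -v "Jabberwock" jabberwocky.txt)
printf '%s\n' "$kept"
```

Four of the seven lines survive, and none of them mentions the Jabberwock.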
Lines that do not contain “Jabberwock” are listed in the terminal window.
We can exclude as many terms as we like. Let’s filter out all lines that contain “Jabberwock” and all lines that contain “and”. To achieve this, we’ll use the -e (expression) option. We need to use it once for each search pattern.
grep -v -e "Jabberwock" -e "and" jabberwocky.txt
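Here is a small sketch of the two-pattern exclusion, assuming Bash and GNU grep and again using a hypothetical seven-line excerpt of the poem rather than the full file.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Hypothetical sample: a seven-line excerpt, not the article's full file.
cat > jabberwocky.txt <<'EOF'
'Twas brillig, and the slithy toves
"Beware the Jabberwock, my son!
He took his vorpal sword in hand;
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
EOF

# Each -e adds one pattern. With -v, a line is dropped if it matches ANY of them.
kept=$(grep -v -e "Jabberwock" -e "and" jabberwocky.txt)
printf '%s\n' "$kept"
```

Only two lines remain. Note that “He took his vorpal sword in hand;” is dropped as well, because grep finds “and” inside “hand”. GNU grep’s -w option limits matches to whole words if that isn’t what you want.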
There is a corresponding drop in the number of lines in the output.
If we use the -E (extended regular expressions) option, we can combine search patterns with “|”, which in this context doesn’t indicate a pipe; it’s the logical OR operator.
grep -Ev "Jabberwock|and" jabberwocky.txt
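To check the claim that the shorter command is equivalent, this sketch (assuming Bash, GNU grep, and a hypothetical seven-line excerpt of the poem) runs both forms and compares their output.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Hypothetical sample: a seven-line excerpt, not the article's full file.
cat > jabberwocky.txt <<'EOF'
'Twas brillig, and the slithy toves
"Beware the Jabberwock, my son!
He took his vorpal sword in hand;
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
EOF

# One extended regular expression, with | acting as logical OR...
extended=$(grep -Ev "Jabberwock|and" jabberwocky.txt)

# ...versus two separate -e patterns.
separate=$(grep -v -e "Jabberwock" -e "and" jabberwocky.txt)

if [[ "$extended" == "$separate" ]]; then
  echo "Both commands print the same lines:"
fi
printf '%s\n' "$extended"
```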
We get exactly the same result as with the previous, longer command.
The command format is the same if you want to use a regex pattern instead of an explicit search term. This command excludes all lines that start with any letter in the set “ACHT”. The square brackets form a bracket expression, which matches any one of the characters inside them.
grep -Ev "^[ACHT]" jabberwocky.txt
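The brackets matter. This sketch, assuming Bash, GNU grep, and a hypothetical seven-line excerpt of the poem, contrasts the bracket expression with the bare string ^ACHT.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Hypothetical sample: a seven-line excerpt, not the article's full file.
cat > jabberwocky.txt <<'EOF'
'Twas brillig, and the slithy toves
"Beware the Jabberwock, my son!
He took his vorpal sword in hand;
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
EOF

# [ACHT] matches any ONE of A, C, H or T, so lines starting with them are dropped.
set_form=$(grep -Ev "^[ACHT]" jabberwocky.txt)

# ^ACHT means the literal text "ACHT" at the start of a line.
# No line starts that way, so nothing is excluded.
literal_form=$(grep -Ev "^ACHT" jabberwocky.txt)

printf '%s\n' "$set_form"
```

Three lines survive the bracket form. Two of them open with a quotation mark, so their first character isn’t a letter from the set.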
To see lines that contain one pattern but don’t also contain another, we can pipe grep into grep. We’ll search for all lines that contain the word “Jabberwock” and then filter out any lines that also contain the word “slain”.
grep "Jabberwock" jabberwocky.txt | grep -v "slain"
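A minimal sketch of the two-stage pipeline, assuming Bash and GNU grep, with a hypothetical seven-line excerpt of the poem standing in for the full file.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Hypothetical sample: a seven-line excerpt, not the article's full file.
cat > jabberwocky.txt <<'EOF'
'Twas brillig, and the slithy toves
"Beware the Jabberwock, my son!
He took his vorpal sword in hand;
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!
EOF

# Stage 1 keeps lines that mention the Jabberwock;
# stage 2 (-v) discards any of those that also contain "slain".
result=$(grep "Jabberwock" jabberwocky.txt | grep -v "slain")
printf '%s\n' "$result"
```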
Excluding files
We can ask grep to search for a string or pattern in a collection of files. You can list each file on the command line, but with many files this approach doesn’t scale.
grep "vorpal" verse-1.txt verse-2.txt verse-3.txt verse-4.txt verse-5.txt verse-6.txt
Note that the name of the file containing each matching line is displayed at the start of each line of output.
To reduce typing, we can use wildcards, but that can be misleading. At first, this appears to work:
grep "vorpal" *.txt
However, in this directory there are other TXT files, which have nothing to do with the poem. If we search for the word “sword” with the same command structure, we get a lot of false positives.
grep "sword" *.txt
The results we want are buried under a deluge of false matches from other files that have the TXT extension.
The word “vorpal” didn’t match anything in those files, but “sword” is contained in the word “password”, so it was found several times in some mock log files.
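This sketch reproduces the false positive, assuming Bash and GNU grep. The verse files and the log file are invented stand-ins for the article’s test directory, and the log line is made up.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Two verse files and one unrelated (hypothetical) log file, all ending in .txt.
printf '%s\n' 'He took his vorpal sword in hand;' > verse-1.txt
printf '%s\n' 'The vorpal blade went snicker-snack!' > verse-2.txt
printf '%s\n' 'user dave: password accepted' > vol-log-1.txt

# The shell expands *.txt to all three files. The log file matches too,
# because "sword" occurs inside "password".
out=$(grep "sword" *.txt)
printf '%s\n' "$out"
```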
We need to exclude these files. To do this, we’ll use the --exclude option. To exclude a single file called “vol-log-1.txt”, we would use this command:
grep --exclude=vol-log-1.txt "sword" *.txt
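A minimal sketch, assuming Bash and GNU grep (whose --exclude also applies to files named on the command line), with hypothetical log files, showing that only the named file is skipped.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Hypothetical files: one verse file and two made-up logs.
printf '%s\n' 'He took his vorpal sword in hand;' > verse-1.txt
printf '%s\n' 'user dave: password accepted' > vol-log-1.txt
printf '%s\n' 'user root: password rejected' > vol-log-2.txt

# Only vol-log-1.txt is skipped; vol-log-2.txt is still searched.
out=$(grep --exclude=vol-log-1.txt "sword" *.txt)
printf '%s\n' "$out"
```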
In this case, we want to exclude several log files whose names begin with “vol”. The syntax we need is:
grep --exclude=vol*.txt "sword" *.txt
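The same setup with a wildcard pattern, as a sketch assuming Bash and GNU grep and using hypothetical files. Here the glob is quoted so the shell hands vol*.txt to grep unchanged and grep does the matching.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Hypothetical files: one verse file and two made-up logs.
printf '%s\n' 'He took his vorpal sword in hand;' > verse-1.txt
printf '%s\n' 'user dave: password accepted' > vol-log-1.txt
printf '%s\n' 'user root: password rejected' > vol-log-2.txt

# Every file whose name matches vol*.txt is skipped.
out=$(grep --exclude='vol*.txt' "sword" *.txt)
printf '%s\n' "$out"
```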
When we use the -R (dereference-recursive) option, grep will search entire directory trees for us. By default, it searches every file in those locations, and there may well be several file types that we want to exclude.
Under the current directory on this test machine, there are nested directories containing log files, CSV files, and MD files. These are all types of text file we want to exclude. We could use an --exclude option for each file type, but we can achieve what we want more efficiently by grouping the file types together.
This command excludes all files that have CSV or MD extensions, and all TXT files whose names begin with “vol” or “log”.
grep -R --exclude=*.{csv,md} --exclude={vol*,log*}.txt "sword" /home/dave/data/
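The grouping works because Bash brace expansion rewrites each braced option into separate options before grep runs. This sketch, assuming Bash and GNU grep with an invented data tree, prints the expanded arguments and then runs the command above against that tree.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# 1. Show what grep actually receives after brace expansion.
expanded=$(printf '%s\n' --exclude=*.{csv,md} --exclude={vol*,log*}.txt)
printf '%s\n' "$expanded"

# 2. A small hypothetical tree with poem, log, CSV and MD files.
mkdir -p data/poem data/logs
printf '%s\n' 'He took his vorpal sword in hand;' > data/poem/verse-1.txt
printf '%s\n' 'password reset requested' > data/logs/log-1.txt
printf '%s\n' 'password accepted' > data/logs/vol-log-1.txt
printf '%s\n' 'sword,1' > data/table.csv
printf '%s\n' '# sword notes' > data/notes.md

# Recursive search that skips CSV/MD files and TXT files starting with vol or log.
out=$(grep -R --exclude=*.{csv,md} --exclude={vol*,log*}.txt "sword" data)
printf '%s\n' "$out"
```

Because the braces are expanded by the shell, they must stay unquoted; quoting them would pass the literal braces to grep.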
Excluding directories
If the files we want to ignore are contained in directories and there are no files in those directories that we want to search, we can exclude those entire directories.
The concept is very similar to excluding files, except we use the --exclude-dir option and name the directories to ignore.
grep -R --exclude-dir=backup "vorpal" /home/dave/data
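A minimal sketch, assuming Bash and GNU grep with a hypothetical data tree, showing that --exclude-dir=backup skips only the directory with exactly that name.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Hypothetical tree: the same verse in data, data/backup and data/backup2.
mkdir -p data/backup data/backup2
for dir in data data/backup data/backup2; do
  printf '%s\n' 'He took his vorpal sword in hand;' > "$dir/verse.txt"
done

# "backup" is skipped; "backup2" is a different name, so it is still searched.
out=$(grep -R --exclude-dir=backup "vorpal" data)
printf '%s\n' "$out"
```

The order of the two output lines depends on the directory listing, so it may vary between systems.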
We’ve excluded the “backup” directory, but we’re still looking in another directory called “backup2”.
It won’t come as a surprise that we can use the --exclude-dir option multiple times in a single command. Note that GNU grep compares each --exclude-dir pattern with the name of each directory it descends into, not with its full path, so don’t use an absolute path from the root of the filesystem.
grep -R --exclude-dir=backup --exclude-dir=backup2 "vorpal" /home/dave/data
We can also use groupings. We can achieve the same thing more succinctly with:
grep -R --exclude-dir={backup,backup2} "vorpal" /home/dave/data
You can combine file and directory exclusions in the same command. If you want to skip every file in a directory, and skip certain file types in the directories that are searched, use this syntax:
grep -R --exclude=*.{csv,md} --exclude-dir=archive "frumious" /home/dave/data
Because the match is made on the directory name alone, this skips every directory called “archive”, wherever it appears in the tree.
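This sketch exercises both the grouped and the combined forms against a hypothetical tree. It assumes Bash and GNU grep, and relies on GNU grep matching --exclude-dir against directory names during recursion.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Hypothetical tree, including an "archive" directory nested inside "backup".
mkdir -p data/backup/archive data/backup2
for dir in data data/backup data/backup2 data/backup/archive; do
  printf '%s\n' 'The frumious Bandersnatch!' > "$dir/verse.txt"
done
printf '%s\n' 'frumious: see the poem' > data/notes.md

# Grouped form: brace expansion yields --exclude-dir=backup --exclude-dir=backup2.
grouped=$(grep -R --exclude-dir={backup,backup2} "frumious" data)

# Combined form: skip CSV and MD files, plus every directory named "archive",
# including data/backup/archive.
combined=$(grep -R --exclude=*.{csv,md} --exclude-dir=archive "frumious" data)

printf '%s\n' "$grouped" "---" "$combined"
```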
Sometimes it’s what you leave out
Sometimes, searching with grep can feel like looking for a needle in a haystack. Removing most of the haystack makes a big difference.