I Lessen Data: August 2013

I have a nested file structure, with 200 subdirectories, and I want to get rid of everything that ends with "Deriv.csv" (but keep all other files). Usually I would do this:
find . -name "*Deriv.csv"
to find all the files (find everything in the directory structure starting here (.) that has the name "anything that ends with Deriv.csv"), and then just rm them all, with the help of xargs. It would look like this:
find . -name "*Deriv.csv" | xargs rm
(this is a good pattern to know, though a little hard to wrap your head around at first; piping to xargs turns the output of the first command into the arguments of the second. So if it finds file1Deriv.csv, file2Deriv.csv, file3Deriv.csv, then the | xargs rm makes it run "rm file1Deriv.csv file2Deriv.csv file3Deriv.csv".)
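
A quick way to see this without deleting anything is to put echo in front of rm, so xargs prints the command it would have built. This is only a sketch: the printf stands in for find's output, and the variable name built is just for illustration.

```shell
# Stand-in for find's output: three names, one per line.
# "echo rm" prints the command xargs would build instead of running it.
built=$(printf '%s\n' file1Deriv.csv file2Deriv.csv file3Deriv.csv | xargs echo rm)
echo "$built"
# prints: rm file1Deriv.csv file2Deriv.csv file3Deriv.csv
```

The same trick (find ... | xargs echo rm) is a cheap dry run before doing the real thing.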

The added difficulty comes in because some of the filenames have spaces, so the find returns something like:
file 1 Deriv.csv
file 2 Deriv.csv
and then just passing that to xargs rm makes it run:
rm file 1 Deriv.csv file 2 Deriv.csv
which of course makes it complain that "file" is not found, "1" is not found, "Deriv.csv" is not found, etc. (I'm lucky that I didn't have any stray things called "file" or "Deriv.csv" sitting around that I wanted to keep, or they would have been removed by this mistake!)
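
To see the splitting for yourself, here is a small sketch that feeds two made-up names to xargs and prints each argument on its own line. The printf again stands in for find, and pieces is a hypothetical variable name.

```shell
# Two filenames containing spaces, one per line, as find would print them.
# xargs -n 1 passes one argument per invocation, so each line of output
# is one argument as xargs sees it.
pieces=$(printf '%s\n' 'file 1 Deriv.csv' 'file 2 Deriv.csv' | xargs -n 1 echo)
echo "$pieces"
```

Two filenames go in and six separate arguments come out (file, 1, Deriv.csv, twice over), which is exactly what rm ends up receiving.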

One thing I found here (thanks!) is:
find . -name "*Deriv.csv" | xargs -I{} rm {}
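-I does two things:
1. run the command once per input line, instead of splitting the input on whitespace (great!). One caveat: xargs still treats quotes and backslashes in the input specially, so filenames containing those can still trip it up.
2. let you control where the argument goes: every {} in the command that follows -I{} is replaced by the input line. In this case, we do something simple: take the argument (the {}) and put it after an rm. But you could also do, for example: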
mkdir deriv_files
find . -name "*Deriv.csv" | xargs -I{} mv {} deriv_files/

which is pretty neat. (reference)
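
Another option, which is not the approach from the post above but a commonly used alternative, is to have find separate names with a NUL byte and tell xargs to split only on that. This assumes your find and xargs support -print0 and -0 (the GNU and BSD versions do). The sketch below runs in a throwaway directory made with mktemp, and the file names are invented for the demonstration.

```shell
# Work in a throwaway directory so nothing real is touched.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/sub dir"
touch "$sandbox/file 1 Deriv.csv" "$sandbox/sub dir/file 2 Deriv.csv" "$sandbox/keep me.csv"

# -print0 ends each path with a NUL byte; xargs -0 splits only on NUL,
# so spaces, quotes, and even newlines in names are passed through intact.
find "$sandbox" -name "*Deriv.csv" -print0 | xargs -0 rm

# List what is left: only "keep me.csv" should remain.
remaining=$(find "$sandbox" -type f)
echo "$remaining"

# Clean up the sandbox.
rm -r "$sandbox"
```

Unlike -I, this still lets xargs pass many files to a single rm, so it is also faster when there are a lot of matches.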

Friday, August 16, 2013

find | xargs rm, if your filenames have spaces