
I just use find. It's a little longer, but it gives me the full paths and is more consistent. It also works well if you need to recurse. Something like `find . -type f | while read -r filepath; do whatever "${filepath}"; done`


I love this example, because it highlights how absolutely cursed shell is if you ever want to do anything correctly or robustly.

In your example, newlines and spaces in your filenames will ruin things. Better is

    find … -print0 | while read -r -d $'\0'; do …; done
This works in most cases, but it can still run into problems. Let's say you want to modify a variable inside the loop (this is a toy example, please don't nit that there are easier ways of doing this specific task).

    declare -a list=()

    find … -print0 | while read -r -d $'\0' filename; do
        list+=("${filename}")
    done
The variable `list` isn't updated at the end of the loop, because the loop is done in a subshell and the subshell doesn't propagate its environment changes back into the outer shell. So we have to avoid the subshell by reading in from process substitution instead.

    declare -a list=()

    while read -r -d $'\0' filename; do
        list+=("${filename}")
    done < <(find … -print0)
Even this isn't perfect. If the command inside the process substitution exits with an error, that error will be swallowed and your script won't exit even with `set -o errexit` or `shopt -s inherit_errexit` (both of which you should always use). The script will continue on as if the command inside the subshell succeeded, just with no output. What you have to do is read it into a variable first, and then use that variable as standard input.

    files="$(find … -print0)"
    declare -a list=()

    while read -r -d $'\0' filename; do
        list+=("${filename}")
    done <<< "${files}"
I think there's an alternative to this that lets you keep the original pipe version when `shopt -s lastpipe` is set, but I couldn't get it to work with a little experimentation.
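
For what it's worth, here is a minimal sketch of the `lastpipe` variant. bash only honors `lastpipe` when job control is inactive, so it takes effect in a script but not at a normal interactive prompt, which may be why experimenting by hand didn't work. `search_dir` and the file names are made up for the demo, and `pipefail` is my addition so that a failing `find` still trips `errexit` (only once the loop has already run).

```bash
#!/usr/bin/env bash
# Sketch only: assumes bash >= 4.2 running as a script (job control off).
set -o errexit -o pipefail
shopt -s lastpipe

# Hypothetical scratch directory with two sample files.
search_dir="$(mktemp -d)"
touch "${search_dir}/plain" "${search_dir}/with space"

declare -a list=()

# With lastpipe, the while loop (the last pipeline element) runs in the
# current shell, so appends to list survive after the pipeline ends.
find "${search_dir}" -type f -print0 | while IFS= read -r -d '' filename; do
    list+=("${filename}")
done

printf 'found %d files\n' "${#list[@]}"
rm -rf "${search_dir}"
```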

Also be aware that in all of these, standard input inside the loop is redirected. So if you want to prompt a user for input, you need to explicitly read from `/dev/tty`.

My point with all this isn't that you should use the above example every single time, but that all of the (mis)features of shell compose extremely badly. Even piping to a loop causes weird changes in the environment that you now have to work around with other approaches. I wouldn't be surprised if there's something still terribly broken about that last example.
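
One concrete problem with that last example: bash variables cannot hold NUL bytes, so `"$(find … -print0)"` silently drops every delimiter (newer bash versions print a warning) and the loop never sees separate records. A rough workaround, assuming a temporary file is acceptable, is to let `find` write to a file as a plain command so `errexit` sees its exit status, then read the file back. `search_dir` and `tmpfile` are hypothetical names for the demo.

```bash
#!/usr/bin/env bash
# Sketch only: trades the variable for a temp file to keep the NUL bytes.
set -o errexit

# Hypothetical scratch directory with two sample files.
search_dir="$(mktemp -d)"
touch "${search_dir}/one" "${search_dir}/two words"

tmpfile="$(mktemp)"

# find runs as a simple command here, so a non-zero exit aborts the script.
find "${search_dir}" -type f -print0 > "${tmpfile}"

declare -a list=()

# Reading from a regular file keeps the loop in the current shell.
while IFS= read -r -d '' filename; do
    list+=("${filename}")
done < "${tmpfile}"

printf 'found %d files\n' "${#list[@]}"
rm -f "${tmpfile}"
rm -rf "${search_dir}"
```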


You have really proven your point even more than you meant to. Unfortunately none of these examples are robust.

The "-r" flag allows backslash escaping record terminators. The "find" command doesn't do such escaping itself, so that flag will cause files with backslashes at the end to concatenate themselves with the next file.

Furthermore, if IFS='' is not placed before each instance of read, or set somewhere earlier in the program, then leading and trailing whitespace will be stripped from each filename that is read into a named variable.

EDIT: I proved your point even more. The "-r" flag does the opposite of what I thought it did, and disables record continuation. So the correct way to use read would be with IFS='' and the -r flag.
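
A small sketch of that corrected idiom, using `printf` to fake NUL-delimited records instead of a real `find` run. The sample strings are invented for the demo; the second `read` deliberately leaves out `IFS=` to show what goes wrong.

```bash
#!/usr/bin/env bash
# Sketch only: sample records stand in for find ... -print0 output.
declare -a list=()

# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal;
# -d '' makes read stop at NUL instead of newline.
while IFS= read -r -d '' filename; do
    list+=("${filename}")
done < <(printf '%s\0' '  padded  ' 'back\slash' $'two\nlines')

# Same record read with the default IFS: the padding is stripped.
read -r -d '' trimmed < <(printf '%s\0' '  padded  ')

printf '%q\n' "${list[@]}" "${trimmed}"
```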


Love it. And I wouldn’t be surprised in the least if even this fell apart in some scenarios too.


Wow, you people are really young.

http://www.etalabs.net/sh_tricks.html


Is there a reason to prefer `while read; ...;done` over find's -exec or piping into xargs?


Both `find -exec` and xargs expect an executable command whereas `while read; ...; done` executes inline shell code.

Of course you can pass `sh -c '...'` (or Bash or $SHELL) to `find -exec` or xargs but then you easily get into quoting hell for anything non-trivial, especially if you need to share state from the parent process to the (grand) child process.

You can actually get `find -exec` and xargs to execute a function defined in the parent shell script (the one that's running the `find -exec` or xargs child process) using `export -f` but to me this feels like a somewhat obscure use case versus just using an inline while loop.
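
Roughly, the `export -f` approach looks like the sketch below. `process_file` and `search_dir` are hypothetical names; the child has to be bash for the exported function to be visible, and `xargs -0` is a GNU/BSD extension rather than POSIX.

```bash
#!/usr/bin/env bash
# Sketch only: process_file is a made-up function for illustration.
set -o errexit

process_file() {
    printf 'processing: %s\n' "$1"
}
# Export the function so child bash processes can call it.
export -f process_file

# Hypothetical scratch directory with two sample files.
search_dir="$(mktemp -d)"
touch "${search_dir}/a" "${search_dir}/b c"

# find -exec: the path is passed as $1 rather than spliced into the code.
exec_output="$(find "${search_dir}" -type f \
    -exec bash -c 'process_file "$1"' _ {} \;)"

# xargs: same idea, one NUL-delimited path per bash invocation.
xargs_output="$(find "${search_dir}" -type f -print0 \
    | xargs -0 -n 1 bash -c 'process_file "$1"' _)"

printf '%s\n' "${exec_output}" "${xargs_output}"
rm -rf "${search_dir}"
```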


I will sometimes use the "| while read" syntax with find. One reason for doing so is that the "-exec" option to find uses {} to represent the found path, and it can only be relied on ONCE (POSIX leaves the behavior unspecified when {} appears more than once, although GNU find substitutes every occurrence). Sometimes I need to use the found path more than once in what I'm executing, and capturing it via a read into a reusable variable is the easiest option for that. I'd say I use "-exec" and "| while read" about equally, actually. And I admittedly almost NEVER use xargs.
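
For comparison, a sketch of reusing the path with "-exec" by handing {} to a small `sh -c` script as `$1`, which can then be referenced as often as needed. The directory, file names, and the `.bak` copy are invented for the demo.

```bash
#!/usr/bin/env bash
# Sketch only: copies each matching file to a .bak sibling.
set -o errexit

# Hypothetical scratch directory with two sample files (left in place
# afterwards so the results can be inspected).
search_dir="$(mktemp -d)"
touch "${search_dir}/notes.txt" "${search_dir}/my file.txt"

# {} appears once; inside the inline script the path is "$1", used twice.
find "${search_dir}" -type f -name '*.txt' \
    -exec sh -c 'cp -- "$1" "$1.bak"' _ {} \;

ls "${search_dir}"
```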


This will fail for files with newlines.


How common are they?


This whole post is about uncommon things that can break naive file parsing.



