DaemonForums  

#1
15th January 2015
Carpetsmoker (Martin)
Making find -exec faster

Here’s a little find trick that not many people seem to know:

Code:
# 13 seconds...
$ time find . -type f -exec stat {} \; > /dev/null
        13.20s real             3.94s user              9.22s sys

# 1.5 seconds! That's roughly 9 times faster!
$ time find . -type f -exec stat {} + > /dev/null
        1.48s real              0.68s user              0.79s sys

# Run the first command again, to make sure we’re not being biased by fs
# cache or got some fluke
$ time find . -type f -exec stat {} \; > /dev/null
        13.40s real             3.67s user              9.51s sys

# FYI...
$ find . -type f | wc -l
    2641
That’s quite a large difference! All we did was swap the ; for a +.

Let’s see what POSIX has to say about it:

Quote:
If the primary expression is punctuated by a <semicolon>, the utility utility_name shall be invoked once for each pathname

[.. snip ..]

If the primary expression is punctuated by a <plus-sign>, the primary shall always evaluate as true, and the pathnames for which the primary is evaluated shall be aggregated into sets. The utility utility_name shall be invoked once for each set of aggregated pathnames.
Or in slightly more normal English: if you use ;, find will execute the utility once for every path; if you use +, it will cram as many paths as it can into each invocation.
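
To see the difference without timing anything, here is a minimal sketch that counts how often the utility gets started. The scratch directory and the five file names are made up for the example; each sh prints one line when it runs, so the number of lines equals the number of invocations.

```shell
#!/bin/sh
# Hypothetical scratch directory holding five empty files.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$dir/file$i"; done

# Every sh that find starts prints exactly one line (its argument count),
# so counting lines counts invocations.
# The extra "sh" fills $0, so that all paths land in "$@".
semi_calls=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} \; | wc -l | tr -d ' ')
plus_calls=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} + | wc -l | tr -d ' ')

echo "with ;: $semi_calls invocations"
echo "with +: $plus_calls invocations"

rm -rf "$dir"
```

With only five short paths everything fits in one set, so the + form should report a single invocation and the ; form five.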

How many? Well, as many as ARG_MAX allows. Quoting from POSIX again:

Quote:
{ARG_MAX}
Maximum length of argument to the exec functions including environment data.
Minimum Acceptable Value: {_POSIX_ARG_MAX}

{_POSIX_ARG_MAX}
Maximum length of argument to the exec functions including environment data.
Value: 4096
Most contemporary systems have it set much higher though; Linux (3.16, x86_64) defines ARG_MAX as 131072 (128k), while FreeBSD (10, i386) gives it as 262144 (256k).
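
You can check the limit on your own machine with getconf. This is only a sketch: the number reported at runtime may differ from the constant in the system headers, and how close find gets to it is up to the implementation. The variable names are just for illustration.

```shell
#!/bin/sh
# Limit in bytes, as reported by the running system.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max bytes"

# The environment counts against the same limit, so a large
# environment leaves less room for paths.
env_bytes=$(env | wc -c | tr -d ' ')
echo "environment: $env_bytes bytes"
```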

Let’s verify this with truss:

Code:
# Amount of files we have
$ find . -type f | wc -l
    2641

$ truss find . -type f -exec stat {} \; >& truss-slow
$ truss find . -type f -exec stat {} + >& truss-fast

# Less than ARG_MAX, so we expect one fork()
$ find . -type f | xargs | wc -c
    119528

# Yup!
$ grep fork truss-fast | wc -l
    1

# And we fork() once for every file
$ grep fork truss-slow | wc -l
    2641
Caveat

There is one small caveat. This won’t work:

Code:
# FreeBSD find
$ find . -type f -exec cp {} /tmp +
find: -exec: no terminating ";" or "+"

# GNU find is even more cryptic:
$ find . -type f -exec cp {} /tmp +
find: missing argument to `-exec'
Going back to POSIX:

Quote:
Only a <plus-sign> that immediately follows an argument containing only the two characters "{}" shall punctuate the end of the primary expression. Other uses of the <plus-sign> shall not be treated as special.
In other words, the command needs to end with {} +. cp {} /tmp + doesn’t, and thus gives an error.

We can work around this by spawning a sh one-liner:

Code:
# The extra "sh" fills $0, so every path ends up in "$@"
$ find . -type f -exec sh -c 'cp "$@" /tmp' sh {} +
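
One detail of this kind of sh -c wrapper is easy to get wrong: sh assigns the first operand after the command string to $0, not to "$@". The sketch below counts the arguments with and without a filler word (the operands a, b and c are placeholders standing in for paths). It suggests that without a filler, the first path find hands over would be silently skipped.

```shell
#!/bin/sh
# With "sh -c", the first operand after the command string becomes $0;
# only the remaining operands make up "$@".
without=$(sh -c 'echo "$#"' a b c)
with=$(sh -c 'echo "$#"' sh a b c)

echo "without filler: $without arguments"
echo "with filler: $with arguments"
```

The filler can be any word. Using sh is a common choice because $0 also shows up in the shell's error messages.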
#2
16th January 2015
IdOp

Cool, thanks! Hopefully I'll remember this next time it can be used.
#3
22nd January 2015
Mike-Sanders

Ahh! Great, many thanks for the tip Carpetsmoker.