Old 27th April 2010
IdOp
Too dumb for a smartphone
 
Join Date: May 2008
Location: twisting on the daemon's fork(2)
Posts: 1,027

Quote:
Originally Posted by TerryP View Post
... brought a smile to this programmer's heart.
heheh, thanks, glad you enjoyed it TerryP.

Quote:
It only adds two dilemmas: First, although implementing that algorithm is not likely to be hard, it is more naturally done using hashes (as in ksh) than with the usual portable sh trick of treating a scalar $variable as a list of words, which can be manipulated using filters and variable=`assignments` (or $() if a modern sh is guaranteed; older sh's only supported ``). Lisp is quite a bit better at list processing than generic shell scripting.
I'm not too familiar with that aspect of ksh, nor am I awake enough to absorb all of this comment at the moment; also I didn't think much about how to do it. That said, one vague thought was to put the output of the size step into a file, suitably formatted for easy use by the second md5 step. The file is probably cached by the OS anyway, and this is probably a case where the algorithm is more important than the hardware. I guess you might also do something recursively, which maybe is included in your view?
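
For what it's worth, here is a rough sketch of the size-first idea. It is in Python rather than sh, purely to keep it short, and it keeps the size groups in an in-memory hash instead of the intermediate file I mentioned. The function names (files_by_size, md5_of, find_duplicates) are made up for illustration, and it only reports duplicates; it deletes nothing.

```python
import hashlib
import os
from collections import defaultdict


def files_by_size(root):
    """Step 1: map each file size to the regular files under root with that size."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks and anything that is not a regular file.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                groups[os.path.getsize(path)].append(path)
            except OSError:
                # The file vanished or became unreadable while we walked: skip it.
                continue
    return groups


def md5_of(path, chunk_size=1 << 16):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Step 2: checksum only files that share a size; return lists of identical files."""
    duplicates = []
    for _size, paths in files_by_size(root).items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, so no md5 is needed
        by_digest = defaultdict(list)
        for path in paths:
            try:
                by_digest[md5_of(path)].append(path)
            except OSError:
                continue  # file disappeared between the two steps
        duplicates.extend(
            sorted(group) for group in by_digest.values() if len(group) > 1
        )
    return duplicates
```

Something like `for group in find_duplicates("."): print(group)` would then list each set of identical files. The saving comes from the `len(paths) < 2` skip: any file whose size is unique never gets read at all.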

Quote:
Second is an obvious race condition that can cause files not to be deleted. I.e. if all files of size X have been enqueued for checksumming, and something/someone creates another file of size X at the right point in time, it can be done in such a way that it won't be checksummed along with its older peers.
Wouldn't that kind of problem be there anyway? Someone could create or delete a file while find was looking over the tree, say? Just a thought, I don't know nearly enough about such things to be sure. Of course, adding a second step might make the problem worse, yet if the whole thing is faster than a lengthy md5sum on the tree ... but, yes, if such problems exist then it's caveat emptor for the script user.

Quote:
For Vermdens purposes, I reckon such concerns are likely of esoteric value only: but the file size driven skip list idea is a great idea.
Thanks again, I'm glad if it seems like a good idea.