Quote:
Originally Posted by TerryP
... brought a smile to this programmer's heart.
|
heheh, thanks, glad you enjoyed it TerryP.
Quote:
It only adds two dilemmas: [*]First, that although doing that algorithm is not likely to be hard, it is more naturally done using hashes (as in ksh) than the usual portable sh trick of treating a scalar $variable as a list of words: which can be manipulated using filters and variable=`assignments` (or $() if a modern sh is guaranteed: older sh's only supported ``). Lisp is quite a bit better at list processing than generic shell scripting.
|
I'm not too familiar with that aspect of ksh, nor am I awake enough to absorb all of this comment at the moment; I also haven't thought much about how to do it. That said, one vague thought was to put the output of the size step into a file, suitably formatted for easy use by the second md5 step. The file is probably cached by the OS anyway, and this is probably a case where the algorithm matters more than the hardware. I guess you might also do something recursively, which may be part of what you have in mind?
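To make the vague thought above concrete, here is a rough sketch in portable sh (no ksh hashes): pass one writes each file's size to a temporary file, and pass two checksums only files whose size collides with another's. The demo directory and file names are made up for illustration, and the tool names (wc, md5sum) are assumptions; a BSD box might want md5 instead of md5sum.

```shell
# Demo tree: two files with identical contents, one with a unique size.
demo=$(mktemp -d)
printf 'hello\n'   > "$demo/a"
printf 'hello\n'   > "$demo/b"   # duplicate of a: same size, same sum
printf 'goodbye\n' > "$demo/c"   # unique size: skipped, never checksummed

# Pass one: record "size filename" for every file, sorted by size.
sizes=$(mktemp)
find "$demo" -type f -exec wc -c {} \; | sort -n > "$sizes"

# Only sizes seen more than once are duplicate candidates.
candidates=$(awk '{print $1}' "$sizes" | uniq -d | tr '\n' ' ')

# Pass two: checksum just the candidate files.
result=$(while read size name; do
    case " $candidates" in
        *" $size "*) md5sum "$name" ;;
    esac
done < "$sizes")

echo "$result"

rm -rf "$demo" "$sizes"
```

The skip list pays off because the unique-size file never reaches md5sum at all; only the size-colliding pair gets hashed, and matching checksums then confirm the actual duplicates.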
Quote:
[*]Second is an obvious race condition that can cause files not to be deleted. I.e. if all files of size X have been enqueued for checksumming, and something/someone creates another file of size X at the right point in time, it can be done in such a way that it won't be checksummed along with its older peers.
|
Wouldn't that kind of problem be there anyway? Someone could create or delete a file while find was walking the tree, say? Just a thought; I don't know nearly enough about such things to be sure. Of course, adding a second step might widen the window, yet if the whole thing is faster than a lengthy md5sum over the whole tree ... but, yes, if such problems exist then it's caveat emptor for the script user.
Quote:
For Vermden's purposes, I reckon such concerns are likely of esoteric value only: but the file size driven skip list idea is a great idea.
|
Thanks again, I'm glad if it seems like a good idea.