27th April 2010
TerryP

Quote:
Originally Posted by IdOp
I'm not too familiar with that aspect of ksh, nor am I awake enough to absorb all of this comment at the moment; also I didn't think much about how to do it. That said, one vague thought was to put the output of the size step into a file, suitably formatted for easy use by the second md5 step. The file is probably cached by the OS anyway, and this is probably a case where the algorithm is more important than the hardware. I guess you might also do something recursively, which maybe is included in your view?

Generally I avoid bash/ksh features like that when possible, because needing them is usually a warning sign that shell script isn't the ideal tool. But in ksh and bash, it's not hard. I can't remember what NetBSD's /bin/sh is, but OpenBSD at least provides a nice Korn shell.

There are several ways of implementing the algorithm, but associative-array-style data structures that can map things, like sizes to filenames, are how most people would likely first approach the problem (e.g. awk/perl thinking). One could actually get away with a simple list, and that is easily done in portable sh (if you actually know what you're doing), but it reads less naturally than what most scripters are accustomed to.
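
To make the simple-list idea concrete, here is a rough sketch in portable sh. It is only one way to do it, not necessarily what anyone in the thread had in mind. The function name size_candidates is made up for illustration, and it assumes file names contain no newlines. It builds a list of "size<TAB>path" lines, sorts it numerically, and prints every file whose size matches a neighbour, so only those files would need the expensive md5 pass.

```sh
#!/bin/sh
# Sketch: list files under a directory that share their size with
# at least one other file. size_candidates is a hypothetical name.
# Assumption: file names contain no newline characters.
size_candidates() {
    tab=$(printf '\t')
    find "$1" -type f | while IFS= read -r f; do
        # wc -c is the portable way to get a byte count; stat(1)
        # options differ between systems. tr strips BSD wc padding.
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -n | {
        prev_size= prev_name= shown=no
        # After sorting, equal sizes are adjacent, so comparing each
        # line with the previous one is enough; no map is required.
        while IFS=$tab read -r size name; do
            if [ "$size" = "$prev_size" ]; then
                if [ "$shown" = no ]; then
                    printf '%s\n' "$prev_name"
                fi
                printf '%s\n' "$name"
                shown=yes
            else
                shown=no
            fi
            prev_size=$size prev_name=$name
        done
    }
}
```

Calling size_candidates /some/dir prints one path per line; anything it does not print has a unique size and cannot be a duplicate.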

Using an external file could solve it, but unless the data set is large enough to consume several megs of precious server memory, it's probably not worth the extra effort to process it that way (nor the corresponding increase in security consciousness that using temp files demands). One upside, though, is that logging actions becomes easier that way. Even if the memory used without resorting to temp files were a real issue, it would probably be better to tune it in other ways (e.g. moving from sh to C, or from Apache to Nginx if it's run on a webserver).
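
If one did go the external-file route, a minimal sketch might look like the following. It assumes mktemp(1) is available (it is not in POSIX, but the BSDs and Linux ship it), and write_size_index is a made-up name. mktemp creates the file with a hard-to-guess name and owner-only permissions, which covers most of the temp file worry.

```sh
#!/bin/sh
# Sketch: write sorted "size<TAB>path" records to a private temp file
# for a later md5 pass. write_size_index is a hypothetical name.
# Assumptions: mktemp(1) exists; file names contain no newlines.
write_size_index() {
    # mktemp creates the file atomically, readable only by its owner.
    index=$(mktemp "${TMPDIR:-/tmp}/sizes.XXXXXXXXXX") || return 1
    find "$1" -type f | while IFS= read -r f; do
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done | sort -n > "$index"
    # Print the index path so the caller can read it and remove it.
    printf '%s\n' "$index"
}
```

The caller would capture the path, e.g. idx=$(write_size_index /some/dir), and is responsible for removing the file when done. Because the index stays on disk until then, it also doubles as a crude log of what was examined.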


Quote:
Originally Posted by IdOp
Wouldn't that kind of problem be there anyway? Someone could create or delete a file while find was looking over the tree, say? Just a thought, I don't know nearly enough about such things to be sure. Of course, adding a second step might make the problem worse, yet if the whole thing is faster than a lengthy md5sum on the tree ... but, yes, if such problems exist then it's caveat emptor to the script user.

Yes, there's no complete way around it: most operations can't be guaranteed to be atomic. At best, you can only minimise the probability that external users/daemons will step on your toes. If the directory being cleaned isn't, for example, a cache of files downloaded by a web spider, then it isn't too big a problem. If it were such a cache, it might be considered a feature rather than a bug.
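
One way to shrink the window, sketched here with a made-up function name, is to re-check the pair immediately before deleting instead of trusting results gathered earlier in the run. This uses cmp(1) rather than a checksum, which is my substitution for the sake of a short example. It narrows the race but does not remove it, since something can still change between the cmp and the rm.

```sh
#!/bin/sh
# Sketch: delete $2 only if, right now, it is still a regular file
# with the same content as $1. remove_if_still_duplicate is a
# hypothetical name; the check narrows the race, it cannot close it.
remove_if_still_duplicate() {
    keep=$1 dup=$2
    # Never compare a path against itself, or the only copy is lost.
    # (Hard links to the same file are not detected by this check.)
    [ "$keep" != "$dup" ] || return 1
    [ -f "$keep" ] && [ -f "$dup" ] || return 1
    # cmp -s is silent and exits non-zero when the contents differ.
    cmp -s "$keep" "$dup" || return 1
    rm -f -- "$dup"
}
```

A non-zero return means the file was left alone, either because the files no longer match or because one of them has gone away since the earlier scan.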

The race issue is more something whose intellectual implications are fun to work through than a serious impact on the expected problem domain. I'm also paranoid.
__________________
My Journal

Thou shalt check the array bounds of all strings (indeed, all arrays), for surely where thou typest ``foo'' someone someday shall type ``supercalifragilisticexpialidocious''.