January 8, 2005

du discrepancy

Will someone please explain to this poor, confused person what's going on here?

[Screenshot: Du]

OS X's lovely GUI suggests that I only have 26GB to copy, but the Unix 'du' utility claims there are 30GB. Which is right? Why is there a difference?
I've just spent a long time waiting for a copy to complete, when it was doomed to fail due to lack of disk space at the destination.
Thankfully I used rsync so all is not lost, and I can prune and start again, but I'm still bemused. WTF?

Posted by savs at January 8, 2005 9:15 PM
Comments

All sorts of culprits for this one, usually.

First off would be unit confusion. I don't know if the IT world in general ever managed to reach agreement over whether to use K = 2^10 or K = 10^3, etc., when measuring disk space.
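Some quick arithmetic shows how far unit confusion alone can stretch a figure. This is a back-of-envelope sketch that assumes each tool sticks to one convention. The 26 and 30 come from the post; the rest is just the two definitions of "giga".

```python
# How much can the K = 2^10 vs K = 10^3 choice change a "GB" figure?
GIB = 2 ** 30   # binary gigabyte (1,073,741,824 bytes)
GB = 10 ** 9    # decimal gigabyte (1,000,000,000 bytes)

unit_ratio = GIB / GB        # about 1.0737, i.e. roughly a 7% swing
observed_ratio = 30 / 26     # du's figure over the GUI's, about 1.1538

print(f"unit confusion alone: {unit_ratio:.4f}")
print(f"observed discrepancy: {observed_ratio:.4f}")
if observed_ratio <= unit_ratio:
    print("units could explain it all")
else:
    print("something else is going on too")
```

Since 30/26 is larger than the biggest possible unit swing, units can be at most part of the story here.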

Second would be filesystem overhead. It depends on the fs, but on ext2, for example, there are partially used blocks, indirect blocks, and other fs metadata to consider. The GUI may just be reporting the total of the data in all the files, rather than the space used by them, IYSWIM. If the fs you're copying to is heavily fragmented, the metadata may take up more space than it did on the source volume, particularly with something like NTFS.
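To illustrate the "partially used blocks" point, here is a minimal sketch. It assumes the only overhead is rounding each file up to whole blocks. Real filesystems add metadata on top, and some store very small files more compactly, so treat the result as illustrative rather than exact. The file sizes and block sizes are made up.

```python
def allocated_bytes(file_sizes, block_size):
    """Space a set of files occupies if every file is rounded up to whole blocks.

    Ignores indirect blocks, directory entries and other metadata.
    """
    total = 0
    for size in file_sizes:
        blocks = -(-size // block_size)  # ceiling division
        total += blocks * block_size
    return total


# Hypothetical tree: 10,000 one-byte files plus one 1 MiB file.
sizes = [1] * 10_000 + [1024 * 1024]
data = sum(sizes)
for bs in (512, 4096, 65536):
    used = allocated_bytes(sizes, bs)
    print(f"block size {bs:>6}: {data:,} bytes of data, {used:,} bytes allocated")
```

The same 1,058,576 bytes of data need 6,168,576 bytes with 512-byte blocks but 656,408,576 bytes with 64 KiB blocks. That is why a "data" total and a "space used" total can disagree, and why the same files can need different amounts of room on different volumes.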

Third would be sparse file confusion (a sparse file takes up *less* space and holds less data than its nominal extent), but I would expect du to count the size used on disk, not just the maximum extent of the file.
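One way to see the nominal-size vs on-disk-size distinction for a single file is to compare `st_size` with `st_blocks` from `stat()`. This is a Unix-only sketch. It assumes `st_blocks` is in 512-byte units, which is the case on Linux and macOS; POSIX leaves the unit unspecified. Whether the test file actually ends up sparse depends on the filesystem. HFS+ does not support sparse files, so there the allocated figure will be the full size.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (nominal size in bytes, bytes actually allocated) for one file."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Linux and macOS.
    return st.st_size, st.st_blocks * 512


# Make a 64 MiB file whose only real data is one byte at the very end.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.seek(64 * 1024 * 1024 - 1)
    f.write(b"x")
    path = f.name

size, allocated = apparent_and_allocated(path)
print(f"nominal size {size:,} bytes, allocated {allocated:,} bytes")
os.remove(path)
```

On a filesystem with sparse-file support, the allocated figure is typically a few KiB. Without that support, it is the full 64 MiB.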

Other weirdnesses would be namespace issues - volumes mounted beneath the directory which one utility isn't traversing, hardlinked files being counted twice, symlinks being followed (or not followed), etc.
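The sketch below shows how a du-like tally can sidestep three of these namespace traps. It uses `lstat` so that symlinks are never followed. It skips other mounted volumes, similar in spirit to `du -x`. It also counts each (device, inode) pair once, so hard links are not double-counted. The name `du_bytes` is made up for illustration. This is not how any real `du` is implemented, and it assumes 512-byte `st_blocks` units (Linux/macOS).

```python
import os


def du_bytes(root):
    """Rough du-style allocated total for a directory tree (Unix only)."""
    root_dev = os.lstat(root).st_dev
    seen = set()  # (st_dev, st_ino) pairs already counted
    total = 0

    def count(st):
        nonlocal total
        key = (st.st_dev, st.st_ino)
        if key not in seen:  # second and later hard links add nothing
            seen.add(key)
            total += st.st_blocks * 512

    count(os.lstat(root))
    # followlinks=False (the default): symlinked directories are not entered.
    for dirpath, dirnames, filenames in os.walk(root):
        keep = []
        for d in dirnames:
            st = os.lstat(os.path.join(dirpath, d))
            if st.st_dev == root_dev:  # skip volumes mounted beneath root
                count(st)
                keep.append(d)
        dirnames[:] = keep  # prune in place so os.walk skips the rest
        for f in filenames:
            # lstat: a symlink is counted as itself, not as its target.
            count(os.lstat(os.path.join(dirpath, f)))
    return total
```

Toggling any one of these choices (following symlinks, crossing mount points, counting every link) is enough to make two tools disagree about the same directory.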

I don't know enough about OS X to know which, if any, of these are likely to apply. Is it a whole filesystem you're backing up? What does df say?
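A cheap guard against a copy that is doomed from the start is to check free space on the destination first, much as `df` would. The sketch below uses Python's `shutil.disk_usage`, whose `free` field is the space available to an unprivileged user. The helper name `fits` and the "." destination are placeholders. Ideally `needed_bytes` should be the source's allocated size plus some slack for destination overhead, not the GUI's data total.

```python
import shutil


def fits(needed_bytes, dest):
    """True if the filesystem holding dest currently has room for needed_bytes."""
    free = shutil.disk_usage(dest).free
    return needed_bytes <= free


if __name__ == "__main__":
    # 30 GiB, roughly du's figure from the post; "." stands in for the destination.
    print(fits(30 * 2**30, "."))
```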

Posted by: Steve D at January 8, 2005 9:52 PM

It might be worth examining all of du's options and seeing whether playing with them gets you the figure the GUI gave.

Anyway, that'll teach you to use the icky-sticky point-and-drool interface instead of the command line ;-)

Posted by: Steve D at January 8, 2005 9:55 PM

Depending on the version of du, you'll want -h, which divides by 1024, or the --si option (just -H in some older GNU versions), which divides by 1000 instead.

Which I think is what Steve said ;)
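To see how much the divisor matters, here is a minimal human-readable formatter with a switchable base, in the spirit of `-h` versus `--si`. It is a sketch, not du's own code. It rounds up, which is the cautious choice when sizing a copy. It shows one decimal place below 10 and whole numbers above, and it ignores edge cases such as a value rounding up into the next unit. The function name `human` is made up.

```python
def human(nbytes: int, base: int = 1024) -> str:
    """Format a byte count with K/M/G/... suffixes, rounding up."""
    units = ["", "K", "M", "G", "T", "P", "E"]
    unit = 0
    while unit < len(units) - 1 and nbytes >= base ** (unit + 1):
        unit += 1
    if unit == 0:
        return str(nbytes)
    divisor = base ** unit
    tenths = -(-nbytes * 10 // divisor)  # ceiling, in tenths of a unit
    if tenths < 100:  # below 10 units: one decimal place
        return f"{tenths // 10}.{tenths % 10}{units[unit]}"
    return f"{-(-nbytes // divisor)}{units[unit]}"
```

With 6,714,236,928 bytes (the 6,556,872 KiB that `du -sk` reports later in this thread), `human(6714236928)` gives `6.3G` and `human(6714236928, 1000)` gives `6.8G`.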

Posted by: Brett Parker at January 9, 2005 5:56 PM

Actually, though, looking at the screenshots more closely, the GUI gives the number in bytes too, and it doesn't seem to be using K = 1024 (though it doesn't quite seem to be using K = 1000 either, or surely the total would be 26.42 GB?).

Posted by: Steve D at January 10, 2005 8:51 AM

Hey Steve,

This is exactly what bothered me. Surely the GUI is giving "human" numbers - which is exactly what du is supposed to be doing. So the fact that they disagree is annoying.

I just tried it on a smaller directory, and the GUI reports 6.25GB used (6,565,509,635 bytes) whilst du -sh reports 6.3GB used. du -s reports 13113744, du -sk reports 6556872. A mystery!
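The du figures here are consistent with du measuring space allocated on disk while the GUI's byte count measures data in the files. The sketch below assumes `du -s` counts 512-byte blocks, which is the usual BSD default. That assumption fits the observation that `-sk` gives exactly half the number. It does not explain the "6.25GB" label: 6,565,509,635 bytes is about 6.11 in units of 2^30 and about 6.57 in units of 10^9.

```python
# Figures quoted in the comment.
du_s_blocks = 13_113_744    # du -s
du_sk_kib = 6_556_872       # du -sk
gui_bytes = 6_565_509_635   # GUI's byte count

# Assumption: du -s reports 512-byte blocks; -sk being exactly half fits this.
allocated = du_s_blocks * 512
assert allocated == du_sk_kib * 1024

overhead = allocated - gui_bytes
print(f"allocated on disk: {allocated:,} bytes")
print(f"data in files:     {gui_bytes:,} bytes")
print(f"difference:        {overhead:,} bytes ({overhead / gui_bytes:.1%})")
```

That comes to 6,714,236,928 bytes allocated against 6,565,509,635 bytes of data: 148,727,293 bytes, or about 2.3%, of allocation overhead.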

Posted by: Andrew Savory at January 10, 2005 9:12 AM

I don't have access to a Mac box (or anything like it - it's BSDish, isn't it?), but a quick perusal of the man page for GNU du suggests that by default it counts the space used by each file, not the amount of data in it. There's an --apparent-size option to change this behaviour. There's also a -l option to make it count a hardlinked file each time it appears in the directory tree, rather than just once.

Even then, just because a given set of files takes up a certain amount of space on one volume doesn't mean it will on another, unfortunately.

What fs does your system use?

Posted by: Steve D at January 12, 2005 12:10 PM

It's built on BSD, yep.

The fs on the disk in question is HFS+ ... there's no --apparent-size option.

Posted by: Andrew Savory at January 12, 2005 12:20 PM