Adeptus-Mechanicus

"Inveniam viam aut faciam" : I will either find a way, or I shall make one


MY SERVER IS LYING ABOUT MY DISKS

Here is an interesting problem I came up against again recently. On a Linux server, when you want to see your disk usage, you can do this..
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              99M   12M   82M  13% /boot
tmpfs                1014M     0 1014M   0% /dev/shm
/dev/sdb1             493G  319G  149G  69% /remotelog


..and we see that "/remotelog" is 69% used. End of Story. But, what if you decided you wanted to just check where the usage occurs, so you do..
# du -csh /remotelog/
155G    /remotelog/
155G    total

Suddenly you get two very different answers. Engage HCM (Headless Chicken Mode) and run around the room screaming either "My server is haxored" or "I have gone mad", or a combination of the two. Once you have got that out of your system, disengage HCM and let's take a look at this. You see, neither of those two explanations is right. The answer is actually fairly simple. df and du both report on disk usage, but they (and I am simplifying here) get their data from different places. du walks the directory tree and adds up the files it can find by name (like adding up the file sizes in an ls), while df asks the filesystem how many blocks it has actually allocated. While the two should be the same, they can differ. On this system, for example, let's do an lsof (a glorious tool)..
# lsof | grep remote
rsyslogd   6756      root    1w      REG       8,17        12106   12861530 /remotelog/2010/09/30/10.172.80.17.log (deleted)
rsyslogd   6756      root    5w      REG       8,17        15386   12861506 /remotelog/2010/09/30/choprly1.log (deleted)
rsyslogd   6756      root    8w      REG       8,17  94709719760   12861535 /remotelog/2010/10/01/10.172.3.172.log
rsyslogd   6756      root    9w      REG       8,17         1568   12861537 /remotelog/2010/10/01/10.172.2.24.log
rsyslogd   6756      root   10w      REG       8,17         1918   12861538 /remotelog/2010/10/01/192.168.50.127.log
rsyslogd   6756      root   11w      REG       8,17         1922   12861516 /remotelog/2010/10/01/10.172.103.27.log
rsyslogd   6756      root   12w      REG       8,17         1680   12861536 /remotelog/2010/10/01/10.172.2.25.log
rsyslogd   6756      root   13w      REG       8,17         5552   12861539 /remotelog/2010/10/01/10.172.80.17.log
rsyslogd   6756      root   14w      REG       8,17        11603   12861517 /remotelog/2010/10/01/choprly1.log
rsyslogd   6756      root   15w      REG       8,17 176674191060   12861526 /remotelog/2010/09/30/10.172.3.172.log (deleted)
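
In a listing like this, the lines worth a second look are the ones ending in "(deleted)". As a rough sketch (this is not part of the original troubleshooting, and the function name deleted_open_files is made up for illustration), the same information can be read straight out of /proc on Linux. It assumes GNU stat and readlink, and you will need to be root to see other users' processes. Where your lsof supports it, lsof +L1 asks for unlinked open files directly.

```shell
# Hypothetical helper: list files under a path prefix that have been
# deleted but are still held open by some process.
# Output columns (tab separated): PID, size in bytes, path as the kernel shows it.
deleted_open_files() {
    prefix=$1
    for fd in /proc/[0-9]*/fd/*; do
        # Each entry in /proc/PID/fd is a symlink to the open file.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$prefix"*" (deleted)")
                pid=${fd#/proc/}
                pid=${pid%%/*}
                # -L follows the link to the still-open file, so the size
                # is available even though the name is gone.
                size=$(stat -L -c %s "$fd" 2>/dev/null) || continue
                printf '%s\t%s\t%s\n' "$pid" "$size" "$target"
                ;;
        esac
    done
}

deleted_open_files /remotelog
```

The same descriptor can show up more than once if a process has forked, so treat the list as a pointer to the culprit rather than an exact byte count.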

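And the penny (or locale-specific minimum denomination currency unit) drops. The system is still busy with some files on that disk. The files marked "(deleted)" have had their names removed from the directory, so du can no longer find them, but rsyslogd still has them open, and the filesystem does not free a file's blocks until the last process using it lets go, so df still counts them. The biggest of them, the 2010/09/30 copy of 10.172.3.172.log at 176674191060 bytes (roughly 164G), accounts for almost exactly the gap between df's 319G and du's 155G. This is why du and df are giving different answers. So to fix this, a simple..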
# /etc/init.d/rsyslogd restart

..and wait a bit for the old process to wrap up. Then when we do a df we get..
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              99M   12M   82M  13% /boot
tmpfs                1014M     0 1014M   0% /dev/shm
/dev/sdb1             493G  156G  312G  34% /remotelog

..and there we go, our server utilities are back to agreeing with one another.
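
If you would rather check the agreement with numbers than by eyeballing two outputs, something along these lines may help. It is only a sketch: usage_gap is a made-up name, it assumes GNU coreutils (df --output and du -B1), and it assumes the path you hand it is a mount point, as /remotelog is here.

```shell
# Hypothetical helper: print how many bytes df counts as used beyond
# what du can find by walking the directory tree.
usage_gap() {
    mount_point=$1
    # df: blocks the filesystem has allocated, in bytes (skip the header line).
    df_used=$(df -B1 --output=used "$mount_point" | tail -n 1 | tr -d ' ')
    # du: disk usage of everything reachable by name, staying on one filesystem.
    du_used=$(du -sx -B1 "$mount_point" 2>/dev/null | cut -f1)
    echo $((df_used - du_used))
}

# Example call (run as root so du can read everything):
# usage_gap /remotelog
```

A small gap is normal, since the filesystem keeps some space for its own bookkeeping. A gap of many gigabytes is the hint to go looking for deleted files that something still has open.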

Final Words
This little hiccup is kinda interesting in its own way, but to me what it highlights more is that (1) faulty understanding can lead to faulty conclusions and (2) it is important to know not only how the tools work but why they work. Go play, have fun and learn.