Filesystem changes in Linux
I would like to have a program that monitors any changes to the filesystem, across all users. How can it be done?
I want to build a metadata server that does full-text search and stores links between files, ideally across users. How can this be done? Especially: how can the server be notified of all the important changes in the filesystem in real time?
Results
Across the different technical approaches, one of the most important factors turned out to be the startup time. E.g. inotify is a really nice system for getting informed of changes. But every time I reboot the machine (or my laptop), it takes minutes until the monitoring is up and running again.
In the end it seems that right now different solutions are suitable for different scenarios:
- fanotify: sounds like the best approach, but we have to wait until it is available.
- inotify: if the machine is very rarely restarted and neither startup time nor memory consumption matters
- samba vfs audit: when startup time is relevant, and it can be ensured that all file access goes through Samba only
- python-fuse: when startup time is crucial and only one user accesses the directories (maybe this can be fixed?)
inotify
This seems to be the standard in modern kernels. One adds watches for the files and directories of interest and then gets notified of changes. The problem seems to be the number of watches: it is at least one watch per directory. If the system restarts, the watches need to be set again, and a stat is done on each of the directories, which takes a rather long time.
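To get a feeling for the watch count, here is a minimal Python sketch that counts the directories below a root (one watch per directory, as described above) and compares the number with the per-user limit of the kernel. The function names are made up for illustration. The limit file /proc/sys/fs/inotify/max_user_watches only exists on Linux kernels with inotify, so the sketch treats it as optional.

```python
import os

# Per-user inotify watch limit on Linux kernels with inotify.
MAX_WATCHES_FILE = "/proc/sys/fs/inotify/max_user_watches"


def count_directories(root):
    """Count root and every directory below it (one inotify watch each)."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for _dirpath, _dirnames, _filenames in os.walk(root):
        total += 1
    return total


def read_watch_limit(limit_file=MAX_WATCHES_FILE):
    """Return the watch limit as an int, or None if it cannot be read."""
    try:
        with open(limit_file) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def report(root):
    """Print how many watches a recursive setup under root would need."""
    needed = count_directories(root)
    limit = read_watch_limit()
    print("directories (watches needed):", needed)
    print("max_user_watches:", limit if limit is not None else "unknown")
    if limit is not None and needed > limit:
        print("the limit would have to be raised first")
    return needed, limit
```

Calling report("/home") walks the whole tree once, which is roughly the same work a recursive inotify setup has to redo after every restart.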
dnotify
The predecessor of inotify. It seemed to have the issue of blocking filesystems: a watched directory has to be kept open, which prevents the filesystem from being unmounted.
fam
http://oss.sgi.com/projects/fam/
http://oss.sgi.com/projects/fam/news.html
File alteration monitor - doesn't seem to be in use any more
Fanotify
http://lwn.net/Articles/339253/
"fanotify, built on top of fsnotify, is supposed to replace inotify which replaced dnotify".
"fanotify has two basic 'modes' directed and global. fanotify directed works much like inotify in that userspace marks inodes it is interested in and gets events from those inodes. fanotify global instead indicates that it wants everything on the system and then individually marks inodes that it doesn't care about."
This is pretty much exactly what I would want to use, only it's not there yet.
tripwire
Used for security audits to see if files have changed. This is more a part of intrusion detection than a filesystem change monitor. It needs to be run regularly to scan the filesystem.
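The scan-and-compare idea can be sketched in a few lines of Python. This is not how tripwire itself works (as far as I know it checks files against a database of checksums and attributes); the sketch only compares file size and modification time between two scans, and the function names are made up for illustration.

```python
import os


def snapshot(root):
    """Map each file path under root to its (size, mtime in ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (info.st_size, info.st_mtime_ns)
    return state


def diff_snapshots(old, new):
    """Return sorted lists of (added, removed, changed) paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

Run from cron, one would store the previous snapshot somewhere and diff it against a fresh one. Changes are only seen at the next scan, so this is far from real time.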
samba vfs audit
http://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/VFS.html#id2650781
A module for the Samba server. It can be configured (it seems) to monitor the use and change of files. Requires that files are accessed through Samba, of course.
Systemtap
http://sourceware.org/systemtap/
Could be a nice approach, but I haven't managed to write a script that handles the case where files are changed without a full path, e.g. 'touch foobar' instead of 'touch /home/joerg/tmp/foobar'. So far I only got notified that a 'foobar' was being accessed, but which one?
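One conceivable way around the relative path problem, assuming the script can also report the pid of the process doing the access, is to resolve the name against the working directory of that process, which Linux exposes as the symlink /proc/<pid>/cwd. The sketch below is plain Python, not SystemTap. The function name and the proc_root parameter are made up for illustration, and the approach is racy: the process may have exited or changed directory by the time the link is read.

```python
import os


def resolve_reported_name(pid, name, proc_root="/proc"):
    """Turn a possibly relative file name reported for pid into a full path.

    Returns None if the working directory of the process cannot be read
    (process already gone, or insufficient permissions).
    """
    if os.path.isabs(name):
        return os.path.normpath(name)
    try:
        # /proc/<pid>/cwd is a symlink to the working directory of the process.
        cwd = os.readlink(os.path.join(proc_root, str(pid), "cwd"))
    except OSError:
        return None
    return os.path.normpath(os.path.join(cwd, name))
```

For a 'touch foobar' started in /home/joerg/tmp, resolve_reported_name(pid, "foobar") would give '/home/joerg/tmp/foobar', provided the touch process still exists when the lookup happens. For such short-lived processes that is far from guaranteed.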
Python-fuse
https://sourceforge.net/apps/mediawiki/fuse/index.php?title=Main_Page
The idea is to put a small layer on top of the real filesystem. Something along the lines of: http://esteve.tizos.net/archives/searchable-filesystem-with-fuse-python/. I modified his script a bit, so that it does not do any indexing but logs to a file: my proof of concept
llfuse (python)
(update April 8th, 2011)
http://code.google.com/p/python-llfuse/
This seems to be a better FUSE binding that actually supports proper release calls, which means we could act once a file has been written.
There is an Ubuntu .deb at:
http://ppa.launchpad.net/nikratio/s3ql/ubuntu/pool/main/p/python-llfuse/
Research links
http://www.little-idiot.de/linuxsolutionguide/notify.htm
An older German page that points to changedfiles
http://www.bangstate.com/changedfiles/
Exactly what I would need, but needs a 2.4 kernel
http://www.linux.com/archive/feature/150200
http://projects.l3ib.org/trac/fsniper
Fsniper allows watching directories / files
http://www.pubbs.net/kernel/200905/109416/
Links to fsnotify/fanotify. From what I see, that would be exactly what's needed, but it does not seem to be there (yet).
http://esteve.tizos.net/archives/searchable-filesystem-with-fuse-python/
python-fuse driven filesystem with hook to indexing. Maybe a good starting point?
