Georg's Log

Wed 08 February 2017

Btrfs requires noatime

Posted by Georg Sauthoff in linux   

Traditionally, UNIX filesystems also maintain the access time (atime) of a file. This is very much an anti-feature because it yields a write operation for each read operation, which is obviously bad for performance. Since Btrfs is a copy-on-write (COW) filesystem, maintaining atimes is even more painful there. Thus, a sensible recommendation is to make sure that every filesystem is mounted with the noatime option, especially if it is a Btrfs filesystem.

Btrfs Case Study

The following is a typical example of how writing the atime can significantly affect performance.

The symptoms: an rsync job from a Btrfs filesystem (located on an SSD) to an external USB 3 disk drive with XFS runs at about 17 MiB/s, while iotop and dstat show significantly higher I/O rates for the source device, e.g. about 60 MiB/s.

Reasons: This Btrfs filesystem is mounted with default options, which means that relatime is active instead of noatime, resulting in atime updates for effectively every file.

In addition to that, the filesystem contains some snapshots, i.e. for each subvolume there are one or more snapshots. Creating a filesystem snapshot is very cheap on Btrfs because of its COW design. An atime update of a snapshotted file only updates the atime of one copy. Thus, in the likely case that the filesystem sector with the atime is still shared, it has to be copied on that atime write. That means two or more write operations as a result of one read.

Resolution: After canceling the rsync command, remounting the Btrfs filesystem with noatime and restarting the rsync job it sure enough performs much better, i.e. the numbers reported by rsync match the dstat ones, as expected.

Note that the number of files included in a snapshot is most relevant for this issue, not necessarily the number of snapshots.

If you are really unlucky, running a backup job (or even just a grep) on a Btrfs filesystem mounted without noatime might even yield out-of-space errors in case all the copy-on-write atime updates exceed the available free space.

Relatime

Since 2009 or so, Linux kernels mount filesystems with the relatime option by default. With this option, the atime is updated somewhat less frequently. In the presence of reads, however, it is still updated at least once a day:

relatime
   Update  inode access times relative to modify or change
   time.  Access time is only updated if the previous
   access time was earlier than the current modify or
   change time.  (Similar to noatime, but it doesn't break
   mutt or other applications that need to know  if  a
   file has been read since the last time it was modified.)

   Since  Linux  2.6.30,  the  kernel defaults to the
   behavior provided by this option (unless noatime was
   specified), and the strictatime option is required to
   obtain traditional semantics.  In addition, since Linux
   2.6.30, the file's last access time is always updated
   if it is more than 1 day old.

(mount(8))

Thus, relatime doesn't help much with the principal cause of the described performance issue. You still likely get massive atime-update-induced writes during bulk read-only activity like backup or search jobs. All it takes is a timespan of more than 24 hours in which most files aren't read, which is a pretty standard scenario.
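The relatime rule quoted above can be written down as a small decision function. This is only a sketch modelled on the wording of the man page excerpt, not on the kernel source; the function name and its parameters are made up for illustration, and the exact boundary comparisons in the kernel may differ.

```python
from datetime import datetime, timedelta

ONE_DAY = timedelta(days=1)


def relatime_would_update(atime, mtime, ctime, now):
    """Return True if a read at `now` would update the atime under relatime.

    Hypothetical helper that follows the mount(8) wording quoted above.
    """
    # Rule 1: the previous atime is earlier than the last modify or change.
    if atime < mtime or atime < ctime:
        return True
    # Rule 2 (since Linux 2.6.30): the previous atime is more than a day old.
    return now - atime > ONE_DAY


now = datetime(2017, 2, 8, 12, 0, 0)
month_ago = now - timedelta(days=30)

# Last read two days ago, unmodified since: the next read writes the atime.
print(relatime_would_update(now - timedelta(days=2), month_ago, month_ago, now))

# Last read an hour ago, unmodified since: the next read leaves the atime alone.
print(relatime_would_update(now - timedelta(hours=1), month_ago, month_ago, now))
```

The first case is the one that matters for a nightly backup: every file that wasn't read during the last day gets its atime written on the next read.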

Unfortunately, switching the system default from relatime to noatime isn't possible (as of 2017-02). Thus, one has to remember to always specify noatime. Think mount -o noatime ... and add it to each /etc/fstab entry.
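Since it is easy to forget the option on one of the mounts, a small check can help. The following is a minimal sketch that assumes the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options); the function name and the sample text are made up for illustration.

```python
def mounts_without_noatime(mounts_text, fstypes=("btrfs",)):
    """Return mount points of the given filesystem types that lack noatime.

    `mounts_text` is expected to look like the contents of /proc/mounts.
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        if fstype in fstypes and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing


# Made-up sample in /proc/mounts format.
sample = (
    "/dev/sda1 / btrfs rw,relatime,ssd,subvolid=5,subvol=/ 0 0\n"
    "/dev/sda2 /home btrfs rw,noatime,ssd 0 0\n"
    "/dev/sdb1 /mnt xfs rw,relatime 0 0\n"
)
print(mounts_without_noatime(sample))
```

On a real system, the function can be fed the contents of /proc/mounts. Passing more filesystem types via fstypes extends the check beyond Btrfs.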

Testing

Testing the effect of atime updates in the presence of snapshots is easier when the Btrfs filesystem is mounted with strictatime, which overrides the relatime default. The time attributes of a file or directory can be displayed with ls, but the GNU utility stat is more convenient for printing all timestamps.
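For scripted tests, the same timestamps are also available through Python's os.stat. The following is a minimal sketch; the two helper names are made up for illustration. Whether the atime actually changes depends on the mount options of the filesystem holding the file, and possibly on its timestamp granularity, so no particular result is implied.

```python
import os


def timestamps(path):
    """Return (atime, mtime, ctime) of `path` in nanoseconds since the epoch."""
    st = os.stat(path)
    return st.st_atime_ns, st.st_mtime_ns, st.st_ctime_ns


def atime_changed_by_read(path):
    """Read the file once and report whether its atime differs afterwards."""
    before = timestamps(path)
    with open(path, "rb") as f:
        f.read()
    after = timestamps(path)
    return before[0] != after[0]
```

Calling atime_changed_by_read() on a file located on a strictatime mount and then on a noatime mount should make the difference visible.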

P.S.

There are use cases for an actively maintained file atime. Examples are measuring the usage of installed binaries (cf. Debian's popularity-contest) or detecting unread mails. But there are better alternatives for implementing such tasks, and thus those aren't very convincing arguments for enabling atime writing by default (be it relative or even strict). In conclusion, writing the access time means much pain with little gain.

See also