Be careful with your server! A bug in OpenZFS 2.2.0 causes data corruption


ZFS is one of the most advanced file systems in existence, with multiple safeguards to maintain data integrity. It is widely used on servers because it offers good performance and strong assurance that our data will be safe. FreeBSD and its derivatives use this file system, as does Linux, and so do NAS-oriented operating systems such as TrueNAS SCALE and TrueNAS CORE, and even commercial ones such as QNAP’s QuTS hero, because it has proven to be one of the best. However, a bug has been discovered following the release of OpenZFS 2.2.0 that can cause data corruption, which is very serious given how widely ZFS is used at a professional level.

Ed Maste of the FreeBSD Foundation has published a statement for all users of that operating system, which we reproduce below: «A possible data corruption problem has been discovered that affects several versions of OpenZFS. Initially it was reported that the affected version was 2.2.0, but it has been verified that it also affects earlier and later versions. Although this error can be reproduced by performing a few very specific steps, no problems have been observed in real-life scenarios. The issue is not related to block cloning, although enabling this feature may increase the likelihood of encountering the data corruption issue.»

What versions are affected?

From what is known so far, OpenZFS 2.2.0 is affected, as are some earlier and some later versions. It was initially thought that the new block cloning feature caused this data corruption. However, it now appears that block cloning merely exposed a previously unknown underlying flaw. Ed Maste has also said that it is not at all clear whether the problem is reproducible in the version of ZFS in FreeBSD 12.4, and that a variation of the buggy code may exist in FreeBSD 12.4 and Illumos, with the problem masked there for other reasons.


Any operating system that uses the latest versions of OpenZFS is potentially vulnerable to this bug, which can cause data corruption. This applies not only to FreeBSD but also to any Linux-based operating system, which is why the developers are giving top priority to finding and fixing it. There is a method to mitigate the problem. It does not deterministically prevent the corruption, but it drastically reduces the probability of encountering it.

There was initially a lot of confusion about whether the bug was caused by the new block cloning functionality. The update to OpenZFS 2.2.1 disables this feature, which may reduce the chances of encountering the bug. However, block cloning is not to blame: it only brought to light a bug that was already present in the file system source code.

Process to generate data corruption

The process that triggers the data corruption is very specific, so it is unlikely to occur in a real scenario. The probability is not zero, however, which is why the OpenZFS development team is working to fix the problem at its root. To “exploit” this bug, the following must happen:

  • While a file is being written (normally asynchronously, meaning the write has not actually completed when the writing process thinks it has), the modified part of the file must be read at the same time, while ZFS is still writing the data. “At the same time” means within a very specific window measured in microseconds.
  • If the data is read in that window, the reader sees all zeros instead of the data that was actually written.
  • If whoever reads the data stores these incorrectly read zeros in another location (on a local computer, or somewhere else on the server), that copy is where the data ends up corrupted.
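
As a rough illustration of the last point, a copy made while a file was still being written can be checked against its source once the write has settled. The following is a minimal sketch, not part of the OpenZFS tooling. The function name check_copy is hypothetical, and it assumes the source file is still intact and has not been modified since the copy was made.

```shell
# Hypothetical helper: compare a copy against its source.
# Assumption: the source is unchanged since the copy was taken, so any
# difference (for example a run of zeros in the copy) points to a bad copy.
check_copy() {
    src=$1
    copy=$2
    if cmp -s "$src" "$copy"; then
        echo "match: $copy"
    else
        echo "MISMATCH: $copy differs from $src" >&2
        return 1
    fi
}
```

A call such as check_copy /tank/data/file /backup/file (both paths are placeholders) returns a non-zero status when the copy differs. This only detects a bad copy after the fact, it does not prevent one.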

Although ZFS has a very powerful mechanism for detecting data integrity problems, and a tool to “scrub” the storage pool, both are useless in this case because they cannot detect this problem. It is quite likely that the next version, OpenZFS 2.2.2, will fix it. In the meantime, you can reduce the risk of this silent data corruption by running the following command early in the boot process:

echo 0 > /sys/module/zfs/parameters/zfs_dmu_offset_next_sync

The value reverts to 1 after a restart, so you will have to configure the system to run the command at every startup. We recommend visiting the TrueNAS forum thread where this issue, which affects that NAS-oriented operating system, is being addressed.
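
Since the setting does not survive a reboot, it can help to wrap the command in a small function that a startup script calls. This is only a sketch under stated assumptions: the function name apply_mitigation is hypothetical, the parameter path is the one given above, and how the script gets run at boot (rc.local, a systemd unit, a NAS “init script” setting) depends on the system.

```shell
# Hypothetical helper: write 0 to the given ZFS module parameter file.
# The path is passed as an argument so the function can be tried on a
# scratch file before pointing it at the real sysfs entry.
apply_mitigation() {
    param=$1
    if [ ! -w "$param" ]; then
        echo "cannot write $param (is the zfs module loaded, are you root?)" >&2
        return 1
    fi
    echo 0 > "$param" || return 1
    # Read the value back so the boot log shows what is in effect.
    echo "zfs_dmu_offset_next_sync is now $(cat "$param")"
}
```

Run as root, the call would be apply_mitigation /sys/module/zfs/parameters/zfs_dmu_offset_next_sync. On Linux, module parameters can usually also be set persistently with a line such as "options zfs zfs_dmu_offset_next_sync=0" in a file under /etc/modprobe.d/, although whether that takes effect early enough depends on how the distribution loads the zfs module.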
