First draft of ZFS docs

This commit is contained in:
Scot W. Stevenson 2019-04-12 22:14:05 +02:00
parent 2165b93f72
commit 4820965e1d
3 changed files with 441 additions and 5 deletions

@ -8,13 +8,17 @@ You can run Ansible-NAS from the computer you plan to use for your NAS, or from
1. Copy `group_vars/all.yml.dist` to `group_vars/all.yml`.
1. Open up `group_vars/all.yml` and follow the instructions there for
   configuring your Ansible NAS.
1. If you plan to use Transmission with OpenVPN, also copy
   `group_vars/vpn_credentials.yml.dist` to `group_vars/vpn_credentials.yml` and
   fill in your settings.
1. Copy `inventory.dist` to `inventory` and update it.
1. Install the dependent roles: `ansible-galaxy install -r requirements.yml`
   (you might need sudo to install Ansible roles)
1. Run the playbook - something like `ansible-playbook -i inventory nas.yml -b
   -K` should do you nicely.

docs/zfs_configuration.md (new file)

This text deals with specific ZFS configuration questions for Ansible-NAS. If
you are new to ZFS and are looking for the big picture, please read the [ZFS
overview](zfs_overview.md) introduction first.
## Just so there is no misunderstanding
Unlike other NAS variants, Ansible-NAS does not install, configure or manage the
disks or file systems for you. It doesn't care which file system you use -- ZFS,
Btrfs, XFS or EXT4, take your pick. It also provides no mechanism for external
backups, snapshots or disk monitoring. As Tony Stark said to Loki in _Avengers_:
It's all on you.
However, Ansible-NAS has traditionally been used with the powerful ZFS
filesystem ([OpenZFS](http://www.open-zfs.org/wiki/Main_Page), to be exact).
Since [ZFS on Linux](https://zfsonlinux.org/) is comparatively new, this text
provides a very basic example of setting up a simple storage configuration with
scrubs and snapshots. To paraphrase Nick Fury from _Winter Soldier_: We do
share. We're nice like that.
> Using ZFS for Docker containers is currently not covered by this document. See
> [the Docker ZFS
> documentation](https://docs.docker.com/storage/storagedriver/zfs-driver/) for
> details.
## The obligatory warning
We take no responsibility for any bad thing that might happen if you follow this
guide. We strongly suggest you test these procedures in a virtual machine.
Always, always, always backup your data.
## The basic setup
For this example, we're assuming two identical spinning rust hard drives for all
Ansible-NAS storage. These two drives will be **mirrored** to provide
redundancy. The actual Ubuntu system will be on a different drive and is not our
concern here.
> [Root on ZFS](https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS)
> is currently still a hassle for Ubuntu. If that changes, this document might
> be updated accordingly. Until then, don't ask us about it.
The Ubuntu kernel already ships with ZFS support. We only need the userland
utilities, which we install with `sudo apt install zfsutils-linux`.
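
Before touching any drives, a quick sanity check that the module and tools are
in place might look like this (package and module names as found on stock
Ubuntu):
```
# confirm the ZFS kernel module is available and see its version
modinfo zfs | grep -iw version

# confirm the userland tools were installed
dpkg -l | grep zfsutils
```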
### Creating the pool
We assume you don't mind totally destroying whatever data might be on your
storage drives, have used a tool such as `gparted` to remove any existing
partitions, and have installed a GPT partition table. To create our ZFS pool, we
will use a command of the form
```
sudo zpool create -o ashift=<ASHIFT> <NAME> mirror <DRIVE1> <DRIVE2>
```
The options from simple to complex are:
1. **`<NAME>`**: ZFS pools traditionally take their names from characters in
[The Matrix](https://www.imdb.com/title/tt0133093/fullcredits). The two most
common are `tank` and `dozer`. Whatever you use, it should be short.
1. **`<DRIVE1>`, `<DRIVE2>`**: The Linux command `lsblk` will give you a quick
overview of the hard drives in the system. However, we don't want to pass a
drive specification in the format `/dev/sde` because this is not persistent.
Instead, [we
use](https://github.com/zfsonlinux/zfs/wiki/FAQ#selecting-dev-names-when-creating-a-pool)
the output of `ls /dev/disk/by-id/` to find the drives' IDs.
1. **`<ASHIFT>`**: This is required to pass the [sector
size](https://github.com/zfsonlinux/zfs/wiki/FAQ#advanced-format-disks) of
the drive to ZFS for optimal performance. You might have to do this by hand
because some drives lie: whereas modern drives have 4k sector sizes (or 8k in
the case of many SSDs), they will report 512 bytes for backward compatibility.
ZFS tries to [catch the
liars](https://github.com/zfsonlinux/zfs/blob/master/cmd/zpool/zpool_vdev.c)
and use the correct value. However, that sometimes fails, and you have to add
it by hand. The `ashift` value is a power of two, so we have **9** for 512
bytes, **12** for 4k, and **13** for 8k. You can create a pool without this
parameter and then use `zdb -C | grep ashift` to see what ZFS generated
automatically. If it isn't what you think, you can destroy the pool (see
below) and add it manually when creating it again.
In our pretend case, we use 3 TB WD Red drives. Listing all drives by ID gives
us something like this, but with real serial numbers:
```
ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN01
ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN02
```
The actual command to create the pool would be:
```
sudo zpool create -o ashift=12 tank mirror ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN01 ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN02
```
Our new pool is named `tank` and is mirrored. To see information about it, use
`zpool status tank` (no `sudo` necessary). If you screwed up (usually with
`ashift`), use `sudo zpool destroy tank` and start over _now_, before it's too
late.
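
Before going any further, it is worth double-checking that the pool picked up
the sector size you expected. A quick look, assuming the pool is named `tank`:
```
# physical vs. logical sector sizes as reported by the drives
lsblk -o NAME,SIZE,PHY-SEC,LOG-SEC

# the ashift value ZFS actually used for the pool
sudo zdb -C tank | grep ashift
```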
### Pool default parameters
Setting pool-wide default parameters makes life easier when we create our
datasets. To see them all, you can use the command `zfs get all tank`. Most are
perfectly sensible. Some you'll [want to
change](https://jrs-s.net/2018/08/17/zfs-tuning-cheat-sheet/) are:
```
sudo zfs set atime=off tank
sudo zfs set compression=lz4 tank
sudo zpool set autoexpand=on tank
```
Turning off `atime` means the system no longer updates a file attribute every
time a file is merely accessed, which saves resources; usually, you don't care
about access times. Compression is a no-brainer on modern CPUs and should be on
by default (we will discuss exceptions for compressed media files later).
`autoexpand` is a pool property (hence `zpool` rather than `zfs`) and lets the
pool grow when you add larger hard drives.
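
You can confirm the settings took effect with the matching `get` commands:
```
zfs get atime,compression tank
zpool get autoexpand tank
```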
## Creating the filesystems
To actually store the data, we need filesystems (also known as "datasets"). For
our very simple default Ansible-NAS setup, we will create two examples: One
filesystem for movies (`movies_root` in `all.yml`) and one for downloads
(`downloads_root`).
### Movies (and other large, pre-compressed files)
We first create the basic file system for movies:
```
sudo zfs create tank/movies
```
Movie files are usually rather large, already in a compressed format, and the
files stored there shouldn't be executed for security reasons. We change the
properties of the filesystem accordingly:
```
sudo zfs set recordsize=1M tank/movies
sudo zfs set compression=off tank/movies
sudo zfs set exec=off tank/movies
```
The **recordsize** here is set to the currently largest possible value [to
increase performance](https://jrs-s.net/2019/04/03/on-zfs-recordsize/) and save
storage. Recall that we used `ashift` during the creation of the pool to match
the ZFS block size with the drives' sector size. Records are created out of
these blocks. Having larger records reduces the amount of metadata that is
required, and various aspects of ZFS such as caching and checksums work on this
level.
**Compression** is unnecessary for movie files because they are usually in a
compressed format anyway. ZFS is good about recognizing this, and so if you
happen to leave compression on as the default for the pool, it won't make much
of a difference.
[By default](https://zfsonlinux.org/manpages/0.7.13/man8/zfs.8.html#lbAI), ZFS
mounts pools directly under the root directory, and they do not have to be
listed in `/etc/fstab` to be mounted. This means that our filesystem will appear
as `/tank/movies`. We need to change the line in `all.yml` accordingly:
```
movies_root: "/tank/movies"
```
You can also set a traditional mount point if you wish with the `mountpoint`
property. Setting this to `none` prevents the file system from being
automatically mounted at all.
The filesystems for TV shows, music files and podcasts - all large,
pre-compressed files - should take the exact same parameters as the one for
movies.
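
As a sketch of what that could look like, assuming datasets named `tv`, `music`
and `podcasts` (adjust the names to whatever your `all.yml` expects):
```
for fs in tv music podcasts; do
    sudo zfs create "tank/$fs"
    sudo zfs set recordsize=1M "tank/$fs"
    sudo zfs set compression=off "tank/$fs"
    sudo zfs set exec=off "tank/$fs"
done
```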
### Downloads
For downloads, we can leave most of the default parameters the way they are.
```
sudo zfs create tank/downloads
sudo zfs set exec=off tank/downloads
```
The recordsize stays at the 128k default. In `all.yml`, the new line is
```
downloads_root: "/tank/downloads"
```
### Other data
Depending on the use case, you might want to tune your filesystems. For example,
[Bit Torrent](http://open-zfs.org/wiki/Performance_tuning#Bit_Torrent),
[MySQL](http://open-zfs.org/wiki/Performance_tuning#MySQL) and [Virtual
Machines](http://open-zfs.org/wiki/Performance_tuning#Virtual_machines) all have
known best configurations.
## Setting up scrubs
On Ubuntu, scrubs are configured out of the box to run on the second Sunday of
every month. See `/etc/cron.d/zfsutils-linux` to change this.
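
To see when the last scrub ran and whether it found any problems, check the
pool status:
```
zpool status tank    # the "scan:" line shows the last scrub and its result
```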
## Email notifications
To have the [ZFS
daemon](http://manpages.ubuntu.com/manpages/bionic/man8/zed.8.html) `zed` send
you emails when there is trouble, you first have to [install an email
agent](https://www.reddit.com/r/zfs/comments/90prt4/zed_config_on_ubuntu_1804/)
such as postfix. Then, in the file `/etc/zfs/zed.d/zed.rc`, change these three
entries:
```
ZED_EMAIL_ADDR=<YOUR_EMAIL_ADDRESS_HERE>
ZED_NOTIFY_INTERVAL_SECS=3600
ZED_NOTIFY_VERBOSE=1
```
If `zed` is not enabled, you might have to run `systemctl enable zed`. You can
test the setup by manually starting a scrub with `sudo zpool scrub tank`.
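
In concrete terms, that might look like this (service name `zed` as shipped
with the Ubuntu packages):
```
sudo systemctl enable zed     # start zed at boot
sudo systemctl restart zed    # pick up the changes to zed.rc
sudo zpool scrub tank         # generate some events to test the emails
```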
## Setting up automatic snapshots
See [sanoid](https://github.com/jimsalterjrs/sanoid/) as a tool for snapshot
management.
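
As a rough idea of what that looks like, a minimal `/etc/sanoid/sanoid.conf`
might contain something like the following -- the dataset names match the
examples above, and the retention numbers are only placeholders, so check the
sanoid documentation for the details:
```
[tank/movies]
        use_template = production

[tank/downloads]
        use_template = production

[template_production]
        hourly = 36
        daily = 30
        monthly = 3
        yearly = 0
        autosnap = yes
        autoprune = yes
```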

docs/zfs_overview.md (new file)

This is a general overview of the ZFS file system for people who are new to it.
If you have some experience and are looking for specific information about how
to configure ZFS for Ansible-NAS, check out the [ZFS example
configuration](zfs_configuration.md) instead.
## What is ZFS and why would I want it?
[ZFS](https://en.wikipedia.org/wiki/ZFS) is an advanced filesystem and volume
manager originally created by Sun Microsystems from 2001 onwards. First released
in 2005 for OpenSolaris, Oracle later bought Sun and started developing ZFS as
closed source software. An open source fork took the name
[OpenZFS](http://www.open-zfs.org/wiki/Main_Page), but is still called "ZFS" for
short. It runs on Linux, FreeBSD, illumos and other platforms.
ZFS aims to be the ["last word in
filesystems"](https://blogs.oracle.com/bonwick/zfs:-the-last-word-in-filesystems)
- a system so future-proof that Michael W. Lucas and Allan Jude famously stated
that the _Enterprise's_ computer on _Star Trek_ probably runs it. The design
was based on [four principles](https://www.youtube.com/watch?v=MsY-BafQgj4):
1. "Pooled" storage to completely eliminate the notion of volumes. You can
add more storage the same way you just add a RAM stick to memory.
1. Make sure data is always consistant on the disks. There is no `fsck` command
for ZFS.
1. Detect and correct data corruption ("bitrot"). ZFS is one of the few storage
systems that checksums everything and is "self-healing".
1. Make it easy to use. Try to "end the suffering" for the admins involved in
managing storage.
ZFS includes a host of other features such as snapshots, transparent
compression, and encryption. During the early years of ZFS, this all came with
hardware requirements which only enterprise users could afford. By now, however,
computers have become so powerful that ZFS can run (with some effort) on a
[Raspberry
Pi](https://gist.github.com/mohakshah/b203d33a235307c40065bdc43e287547). FreeBSD
and FreeNAS make extensive use of ZFS. What is holding ZFS back on Linux are
[licensing conflicts](https://en.wikipedia.org/wiki/OpenZFS#History) beyond the
scope of this document.
Ansible-NAS doesn't actually specify a filesystem - you can use EXT4, XFS, Btrfs
or pretty much anything you like. However, ZFS not only provides the benefits
listed above, but also lets you use your hard drives with different operating
systems. Some people now using Ansible-NAS originally came from FreeNAS, and
were able to `export` their ZFS pools there and `import` them to Ubuntu. On the
other hand, if you ever decide to switch back to FreeNAS or maybe try FreeBSD
instead of Linux, you should be able to do so using the same ZFS pools.
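
The commands involved are pleasantly boring -- roughly, on the old system and
then on the new one (pool name `tank` is just an example):
```
sudo zpool export tank     # on the old system, before moving the drives
sudo zpool import tank     # on the new system, after connecting them
```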
## A small taste of ZFS
Storage in ZFS is organized in **pools**. Inside these pools, you create
**filesystems** (also known as "datasets") which are like partitions on
steroids. For instance, you can keep each user's `/home/` files in a separate
filesystem. ZFS systems tend to use lots and lots of specialized filesystems.
They share the available storage in their pool.
Pools do not directly consist of hard disks or SSDs. Instead, drives are
organized as **virtual devices** (VDEV). This is where the physical redundancy
in ZFS is located. Drives in a VDEV can be "mirrored" or combined as "RaidZ",
roughly the equivalent of RAID5. These VDEVs are then combined into a pool by the
administrator.
To give you some idea of how this works, this is how to create a pool:
```
sudo zpool create tank mirror /dev/sda /dev/sdb
```
This combines `/dev/sda` and `/dev/sdb` into a mirrored VDEV, and then defines a
new pool named `tank` consisting of this single VDEV. We can now create a
filesystem in this pool to hold our books:
```
sudo zfs create tank/books
```
You can then enable automatic and transparent compression on this filesystem
with `sudo zfs set compression=lz4 tank/books`. To take a **snapshot**, use
```
sudo zfs snapshot tank/books@monday
```
Now, if evil people were somehow to encrypt your book files with ransomware on
Wednesday, you can laugh and revert to the old version:
```
sudo zfs rollback tank/books@monday
```
Of course, you did lose any work from Tuesday unless you created a snapshot then
as well. Usually, you'll have some form of **automatic snapshot
administration**.
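
To see which snapshots currently exist for a dataset, you can ask `zfs list`:
```
zfs list -r -t snapshot tank/books
```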
To detect bitrot and other defects, ZFS periodically runs **scrubs**: The system
compares the available copies of each data record with their checksums. If there
is a mismatch, the data is repaired.
## Known issues
Constructing the pools out of virtual devices creates some problems. You can't
just detach a drive (or a VDEV) and have the pool reconfigure itself. To
reorganize the pool, you'd have to create a new, temporary pool out of separate
hard drives, move the data over, destroy and reconfigure the original pool, and
then move the data back. Increasing the size of a pool involves either adding
more VDEVs (_not_ just additional disks) or replacing each disk in a VDEV by a
larger version with the `autoexpand` parameter set.
> At the time of writing (April 2019), ZFS on Linux does not offer native encryption,
> trim support, or device removal, which are all scheduled to be included in the
> [0.8 release](https://www.phoronix.com/scan.php?page=news_item&px=ZFS-On-Linux-0.8-RC1-Released)
> in the near future.
## Myths and misunderstandings
There is a lot of false or simply outdated information floating around about
ZFS. To clear up the worst of it:
### No, ZFS does not need at least 8 GB of RAM
This myth is especially common [in FreeNAS
circles](https://www.ixsystems.com/community/threads/does-freenas-really-need-8gb-of-ram.38685/).
Note that FreeBSD, the basis of FreeNAS, will run with as little [as 1
GB](https://wiki.freebsd.org/ZFSTuningGuide). The [ZFS on Linux
FAQ](https://github.com/zfsonlinux/zfs/wiki/FAQ#hardware-requirements), which is
more relevant here, states under "suggested hardware":
> 8GB+ of memory for the best performance. It's perfectly possible to run with
> 2GB or less (and people do), but you'll need more if using deduplication.
(Deduplication is only useful in [very special
cases](http://open-zfs.org/wiki/Performance_tuning#Deduplication). If you are
reading this, you probably don't need it.)
What everybody agrees on is that ZFS _loves_ RAM, and you should have as much of
it as you possibly can. So 8 GB is in fact a sensible lower limit you shouldn't
go below except for testing. When in doubt, add more RAM, and even more, and
then some, until your motherboard's capacity is reached.
### No, ECC RAM is not required for ZFS
This again is a case where a recommendation has been taken as a requirement. To
quote the [ZFS on Linux
FAQ](https://github.com/zfsonlinux/zfs/wiki/FAQ#do-i-have-to-use-ecc-memory-for-zfs)
again:
> Using ECC memory for OpenZFS is strongly recommended for enterprise
> environments where the strongest data integrity guarantees are required.
> Without ECC memory rare random bit flips caused by cosmic rays or by faulty
> memory can go undetected. If this were to occur OpenZFS (or any other
> filesystem) will write the damaged data to disk and be unable to automatically
> detect the corruption.
It is _always_ better to have ECC RAM on all computers if you can afford it, and
ZFS is no exception. However, there is absolutely no requirement for ZFS to have
ECC RAM.
### No, the SLOG is not really a write cache
You'll hear the suggestion that you add a fast SSD or NVMe as a "SLOG"
(mistakenly also called "ZIL") drive for write caching. This isn't what would
happen, because ZFS already includes [a write
cache](https://linuxhint.com/configuring-zfs-cache/). It is located in RAM.
Since RAM is always faster than any drive, adding a disk as a write cache
doesn't make sense.
What the ZFS Intent Log (ZIL) does, with or without a dedicated drive, is handle
synchronous writes. These occur when the system refuses to signal a successful
write until the data is actually on a physical disk somewhere. This keeps the
data safe. By default, the ZIL initially shoves a copy of the data onto a normal
VDEV somewhere and then gives the thumbs up. The actual write to the pool is
performed later from the normal write cache, _not_ from this temporary copy. The
data there is only ever read if the power fails before the last step.
A Separate Intent Log (SLOG) is a fast drive for the ZIL's temporary synchronous
writes. It allows the ZIL to give the thumbs up more quickly. This means that the
SLOG is never read unless the power has failed before the final write to the pool.
Asynchronous writes just go through the normal write cache. If the power fails,
the data is gone.
In summary, the ZIL is concerned with preventing data loss for synchronous
writes, not with speed. You always have a ZIL. A SLOG will make the ZIL faster.
You'll need to [do some
research](https://www.ixsystems.com/blog/o-slog-not-slog-best-configure-zfs-intent-log/)
to figure out if your system would benefit from a SLOG. NFS, for instance, uses
synchronous writes, while SMB usually doesn't. If in doubt, add more RAM instead.
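
If your research does point towards a SLOG, adding one to an existing pool is a
single command; the device ID below is purely a placeholder:
```
sudo zpool add tank log ata-SOME_FAST_SSD_SERIAL    # placeholder device ID
```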
## Further reading and viewing
- One of the best books around is _FreeBSD Mastery: ZFS_ by Michael W.
Lucas and Allan Jude. Though it is written for FreeBSD, the general guidelines
apply for all variants. There is a second book for advanced users.
- Jeff Bonwick, one of the original creators of ZFS, tells the story of how ZFS
came to be [on YouTube](https://www.youtube.com/watch?v=dcV2PaMTAJ4).