# ZFS Configuration

This text deals with specific ZFS configuration questions for Ansible-NAS. If
you are new to ZFS and are looking for the big picture, please read the [ZFS
overview](zfs_overview.md) introduction first.
## Just so there is no misunderstanding

Unlike other NAS variants, Ansible-NAS does not install, configure or manage the
disks or file systems for you. It doesn't care which file system you use - ZFS,
Btrfs, XFS or EXT4, take your pick. Nor does it provide a mechanism for
snapshots or disk monitoring. As Tony Stark said to Loki in _Avengers_: It's all
on you.

However, Ansible-NAS has traditionally been used with the powerful ZFS
filesystem. Since out-of-the-box support for [ZFS on
Linux](https://zfsonlinux.org/) with Ubuntu is comparatively new, this text
shows how to set up a simple storage configuration. To paraphrase Nick Fury from
_Winter Soldier_: We do share. We're nice like that.

> Using ZFS for Docker containers is currently not covered by this document. See
> [the official Docker ZFS
> documentation](https://docs.docker.com/storage/storagedriver/zfs-driver/)
> instead.

## The obligatory warning

We take no responsibility for any bad thing that might happen if you follow this
guide. We strongly suggest you test these procedures in a virtual machine first.
Always, always, always back up your data.

## The basic setup

For this example, we're assuming two identical spinning rust hard drives for
Ansible-NAS storage. These two drives will be **mirrored** to provide
redundancy. The actual Ubuntu system will be on a different drive and is not our
concern.

> [Root on ZFS](https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS)
> is still a hassle for Ubuntu. If that changes, this document might be updated
> accordingly. Until then, don't ask us about it.

The Ubuntu kernel already ships with ZFS support. We only need the userland
utilities, which we install with `sudo apt install zfsutils-linux`.

### Creating a pool

We assume you don't mind totally destroying whatever data might be on your two
storage drives, have used a tool such as `gparted` to remove any existing
partitions, and have installed a new GPT partition table on each drive. To
create our ZFS pool, we will use a command in this form:

```
sudo zpool create -o ashift=<ASHIFT> <NAME> mirror <DRIVE1> <DRIVE2>
```

The options from simple to complex are:

**NAME**: ZFS pools traditionally take their names from characters in [The
Matrix](https://www.imdb.com/title/tt0133093/fullcredits). The two most common
are `tank` and `dozer`. Whatever you use, it should be short - think `ash`, not
`xenomorph`.

**DRIVES**: The Linux command `lsblk` will give you a quick overview of the
hard drives in the system. However, we don't pass the drive specification in the
format `/dev/sde` because this is not persistent. Instead,
[always use](https://github.com/zfsonlinux/zfs/wiki/FAQ#selecting-dev-names-when-creating-a-pool)
the output of `ls /dev/disk/by-id/` to find the drives' IDs.

**ASHIFT**: This is required to pass the [sector
size](https://github.com/zfsonlinux/zfs/wiki/FAQ#advanced-format-disks) of the
drive to ZFS for optimal performance. You might have to do this by hand because
some drives lie: although modern drives have 4k sector sizes (or 8k for many
SSDs), they will report 512 bytes because Windows XP [can't handle 4k
sectors](https://support.microsoft.com/en-us/help/2510009/microsoft-support-policy-for-4k-sector-hard-drives-in-windows).
ZFS tries to [catch the
liars](https://github.com/zfsonlinux/zfs/blob/master/cmd/zpool/zpool_vdev.c) and
use the correct value. However, this sometimes fails, and you have to add it by
hand.

The `ashift` value is the base-2 logarithm of the sector size, so we have **9**
for 512 bytes, **12** for 4k, and **13** for 8k. You can create a pool without
this parameter and then use `zdb -C | grep ashift` to see what ZFS chose
automatically. If it isn't what you expect, destroy the pool and start over
with `ashift` set by hand.
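
Checking the autodetected value is quick - assuming a pool named `tank` as in
this guide:

```shell
# Print the ashift ZFS recorded for each vdev in the cached pool config.
# "ashift: 12" means ZFS assumed 4k sectors; "ashift: 9" means 512 bytes.
sudo zdb -C tank | grep ashift
```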

In our pretend case, we use two 3 TB WD Red drives. Listing all drives by ID
gives us something like this, but with real serial numbers:

```
ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN01
ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN02
```

WD Reds have a 4k sector size. The actual command to create the pool would then be:

```
sudo zpool create -o ashift=12 tank mirror ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN01 ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN02
```
Our new pool is named `tank` and is mirrored. To see information about it, use
`zpool status tank` (no `sudo` necessary). If you screwed up (usually with
`ashift`), use `sudo zpool destroy tank` and start over _now_ before it's too
late.
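
For a healthy mirror, `zpool status tank` reports something like this (a mock
example using the fake serial numbers from above):

```
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME                                         STATE     READ WRITE CKSUM
        tank                                         ONLINE       0     0     0
          mirror-0                                   ONLINE       0     0     0
            ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN01  ONLINE       0     0     0
            ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN02  ONLINE       0     0     0

errors: No known data errors
```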
### Pool and filesystem properties
Pools have properties that apply either to the pool itself or to filesystems
created in the pool. You can use the command `zpool get all tank` to see the
pool properties and `zfs get all tank` to see the filesystem properties. Most
default values are perfectly sensible, but some you'll [want to
change](https://jrs-s.net/2018/08/17/zfs-tuning-cheat-sheet/). Setting
defaults makes life easier when we create our filesystems:

```
sudo zpool set autoexpand=on tank
sudo zfs set atime=off tank
sudo zfs set compression=lz4 tank
```

`autoexpand=on` lets the pool grow when you add larger hard drives. `atime=off`
means that your system won't update a time stamp every time a file is accessed,
which would otherwise cause a steady stream of needless writes. Usually, you
don't care about access times.

Compression is a no-brainer on modern CPUs and should be on by default (we will
discuss exceptions for compressed media files later).

## Creating filesystems

To actually store the data, we need filesystems (also known as "datasets"). For
our very simple default Ansible-NAS setup, we will create two: One filesystem
for movies (`movies_root` in `all.yml`) and one for downloads
(`downloads_root`).

### Movies (and other large, pre-compressed files)

We first create the basic filesystem:

```
sudo zfs create tank/movies
```

Movie files are usually rather large and already in a compressed format, and
for security reasons the files stored there shouldn't be executable. We change
the properties of the filesystem accordingly:
```
sudo zfs set recordsize=1M tank/movies
sudo zfs set compression=off tank/movies
sudo zfs set exec=off tank/movies
```
The **recordsize** here is set to the currently largest possible value [to
increase performance](https://jrs-s.net/2019/04/03/on-zfs-recordsize/) and save
storage. Recall that we used `ashift` during the creation of the pool to match
the ZFS block size with the drives' sector size. Records are created out of
these blocks. Having larger records reduces the amount of metadata that is
required, because various parts of ZFS such as caching and checksums work on
this level.

**Compression** is unnecessary for movie files because they are usually in a
compressed format anyway. ZFS is good about recognizing this, and so if you
happen to leave compression on as the default for the pool, it won't make much
of a difference.
[By default](https://zfsonlinux.org/manpages/0.7.13/man8/zfs.8.html#lbAI), ZFS
stores pools directly under the root directory. Also, the filesystems don't have
to be listed in `/etc/fstab` to be mounted. This means that our filesystem will
appear as `/tank/movies` if you don't change anything. We need to change the
line in `all.yml` accordingly:

```
movies_root: "/tank/movies"
```
You can also set a traditional mount point if you wish with the `mountpoint`
property. Setting this to `none` prevents the file system from being
automatically mounted at all.
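
For example - the target path here is purely illustrative:

```shell
# Mount the filesystem at a traditional location instead of /tank/movies
sudo zfs set mountpoint=/mnt/movies tank/movies

# ...or keep it from being mounted automatically at all
sudo zfs set mountpoint=none tank/movies
```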
The filesystems for TV shows, music files and podcasts - all large,
pre-compressed files - should probably take the exact same parameters.

### Downloads

### Downloads
For downloads, we can leave most of the default parameters the way they are.

```
sudo zfs create tank/downloads
sudo zfs set exec=off tank/downloads
```

The recordsize stays at the 128 KB default. In `all.yml`, the new line is:

```
downloads_root: "/tank/downloads"
```
### Other data

Depending on the use case, you might want to create and tune more filesystems.
For example, [Bit
Torrent](http://open-zfs.org/wiki/Performance_tuning#Bit_Torrent),
[MySQL](http://open-zfs.org/wiki/Performance_tuning#MySQL) and [Virtual
Machines](http://open-zfs.org/wiki/Performance_tuning#Virtual_machines) all have
known best configurations.
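
As one illustration (the dataset name and value here are examples only -
consult the links above for your specific workload), a MySQL dataset is
commonly tuned to match InnoDB's 16k page size:

```shell
# Hypothetical dataset for a MySQL data directory; InnoDB writes 16k pages,
# so a matching recordsize avoids read-modify-write amplification.
sudo zfs create tank/mysql
sudo zfs set recordsize=16K tank/mysql
```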
## Setting up scrubs

On Ubuntu, scrubs are configured out of the box to run on the second Sunday of
every month. See `/etc/cron.d/zfsutils-linux` to change this.
## Email notifications
To have the [ZFS
daemon](http://manpages.ubuntu.com/manpages/bionic/man8/zed.8.html) `zed` send
you emails when there is trouble, you first have to [install an email
agent](https://www.reddit.com/r/zfs/comments/90prt4/zed_config_on_ubuntu_1804/)
such as postfix. In the file `/etc/zfs/zed.d/zed.rc`, change these three entries:
```
ZED_EMAIL_ADDR=<YOUR_EMAIL_ADDRESS_HERE>
ZED_NOTIFY_INTERVAL_SECS=3600
ZED_NOTIFY_VERBOSE=1
```
If `zed` is not enabled, you might have to run `sudo systemctl enable zed`. You
can test the setup by manually starting a scrub with `sudo zpool scrub tank`.

## Snapshots

Snapshots create a "frozen" version of a filesystem, providing a safe copy of
the contents. Correctly configured, they provide good protection against
accidental deletion and certain types of attacks such as ransomware. On
copy-on-write (COW) filesystems such as ZFS, they are cheap and fast to create.
It is very rare that you _won't_ want snapshots.

> Snapshots do not replace the need for backups. Nothing replaces the need for
> backups except more backups.
### Managing snapshots by hand
If you have data in a filesystem that never or very rarely changes, it might be
easiest to just take a snapshot by hand after every major change. Use the `zfs
snapshot` command with the name of the filesystem, followed by an `@` sign and
an identifier. Traditionally, the identifier includes the date of the snapshot,
usually in some variant of the [ISO
8601](https://en.wikipedia.org/wiki/ISO_8601) format.

```
sudo zfs snapshot tank/movies@2019-04-24
```

To see the list of snapshots in the system, run
```
zfs list -t snapshot
```
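
With many datasets, this list gets long; you can limit it to one filesystem and
its descendants:

```shell
# Show only snapshots of tank/movies (and any child datasets)
zfs list -t snapshot -r tank/movies
```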
To revert ("roll back") to the previous snapshot, use the `zfs rollback`
command.

```
sudo zfs rollback tank/movies@2019-04-24
```

By default, you can only roll back to the most recent snapshot. Rolling back
further requires trickery outside the scope of this document. Finally, to get
rid of a snapshot, use the `zfs destroy` command.

```
sudo zfs destroy tank/movies@2019-04-24
```

> Be **very** careful with `destroy`. If you leave out the snapshot identifier
> and only list the filesystem - in our example, `tank/movies` - the filesystem
> itself will immediately be destroyed. There will be no confirmation prompt,
> because ZFS doesn't believe in that sort of thing.
### Managing snapshots with Sanoid

Usually, you'll want the process of creating new and deleting old snapshots to
be automatic, especially on filesystems that change frequently. One tool for
this is [sanoid](https://github.com/jimsalterjrs/sanoid/). There are various
instructions for setting it up; the following is based on notes from
[SvennD](https://www.svennd.be/zfs-snapshots-of-proxmox-using-sanoid/). For this
example, we'll assume we have a single dataset `tank/movies` that holds, ah,
movies.

First, we install sanoid to the `/opt` directory. This assumes that Perl itself
is already installed.

```
sudo apt install libconfig-inifiles-perl libcapture-tiny-perl
cd /opt
sudo git clone https://github.com/jimsalterjrs/sanoid
```
It is probably easiest to link sanoid to `/usr/sbin`:
```
sudo ln /opt/sanoid/sanoid /usr/sbin/
```
Then we need to set up the configuration files.
```
sudo mkdir /etc/sanoid
sudo cp /opt/sanoid/sanoid.conf /etc/sanoid/sanoid.conf
sudo cp /opt/sanoid/sanoid.defaults.conf /etc/sanoid/sanoid.defaults.conf
```

We don't change the defaults file, but it has to be copied to the folder anyway.
Next, we edit the `/etc/sanoid/sanoid.conf` configuration file in two steps: We
design a "template" and then tell sanoid which filesystems to apply it to.

The configuration file included with sanoid contains a "production" template for
filesystems that change frequently. For media files, we assume that there is not
going to be that much change from day to day, and in particular there will be
very few deletions. We use snapshots because they provide protection against
cryptolocker attacks and accidental deletions.

> Again, snapshots, even lots of snapshots, do not replace backups.

For our example, we configure two hourly snapshots (against "oh crap"
deletions), 31 daily, one monthly and one yearly snapshot.

```
[template_media]
frequently = 0
hourly = 2
daily = 31
monthly = 1
yearly = 1
autosnap = yes
autoprune = yes
```

That might seem like a bunch of daily snapshots, but remember, if nothing has
changed, a ZFS snapshot is basically free.

Once we have an entry for the template, we assign it to the filesystem.
```
[tank/movies]
use_template = media
```
Finally, we edit `/etc/crontab` to run sanoid every five minutes:
```
*/5 * * * * root /usr/sbin/sanoid --cron
```
After five minutes, you should see the first snapshots (use `zfs list -t
snapshot` again). The list will look something like this mock example:

```
NAME USED AVAIL REFER MOUNTPOINT
tank/movies@autosnap_2019-05-17_13:55:01_yearly      0B      -  1.53G  -
tank/movies@autosnap_2019-05-17_13:55:01_monthly     0B      -  1.53G  -
tank/movies@autosnap_2019-05-17_13:55:01_daily       0B      -  1.53G  -
```

Note that the snapshots use no storage, because we haven't changed anything.

This is a very simple use of sanoid. Other functions include running scripts
before and after snapshots, and setups to help with backups. See the included
configuration files for examples.