mirror of
https://github.com/davestephens/ansible-nas
synced 2024-12-25 11:03:07 +00:00
First draft of ZFS docs
This commit is contained in:
parent
2165b93f72
commit
4820965e1d
3 changed files with 441 additions and 5 deletions

@@ -8,13 +8,17 @@ You can run Ansible-NAS from the computer you plan to use for your NAS, or from

1. Copy `group_vars/all.yml.dist` to `group_vars/all.yml`.

1. Open up `group_vars/all.yml` and follow the instructions there for
   configuring your Ansible NAS.

1. If you plan to use Transmission with OpenVPN, also copy
   `group_vars/vpn_credentials.yml.dist` to `group_vars/vpn_credentials.yml` and
   fill in your settings.

1. Copy `inventory.dist` to `inventory` and update it.

1. Install the dependent roles: `ansible-galaxy install -r requirements.yml`
   (you might need sudo to install Ansible roles)

1. Run the playbook - something like `ansible-playbook -i inventory nas.yml -b
   -K` should do you nicely.

232 docs/zfs_configuration.md Normal file

@@ -0,0 +1,232 @@

This text deals with specific ZFS configuration questions for Ansible-NAS. If
you are new to ZFS and are looking for the big picture, please read the [ZFS
overview](zfs_overview.md) introduction first.

## Just so there is no misunderstanding

Unlike other NAS variants, Ansible-NAS does not install, configure or manage the
disks or file systems for you. It doesn't care which file system you use -- ZFS,
Btrfs, XFS or EXT4, take your pick. It also provides no mechanism for external
backups, snapshots or disk monitoring. As Tony Stark said to Loki in _Avengers_:
It's all on you.

However, Ansible-NAS has traditionally been used with the powerful ZFS
filesystem ([OpenZFS](http://www.open-zfs.org/wiki/Main_Page), to be exact).
Since [ZFS on Linux](https://zfsonlinux.org/) is comparatively new, this text
provides a very basic example of setting up a simple storage configuration with
scrubs and snapshots. To paraphrase Nick Fury from _Winter Soldier_: We do
share. We're nice like that.

> Using ZFS for Docker containers is currently not covered by this document. See
> [the Docker ZFS
> documentation](https://docs.docker.com/storage/storagedriver/zfs-driver/) for
> details.

## The obligatory warning

We take no responsibility for any bad thing that might happen if you follow this
guide. We strongly suggest you test these procedures in a virtual machine.
Always, always, always back up your data.

## The basic setup

For this example, we're assuming two identical spinning rust hard drives for all
Ansible-NAS storage. These two drives will be **mirrored** to provide
redundancy. The actual Ubuntu system will be on a different drive and is not our
concern here.

> [Root on ZFS](https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS)
> is currently still a hassle for Ubuntu. If that changes, this document might
> be updated accordingly. Until then, don't ask us about it.

The Ubuntu kernel is already prepared for ZFS. We only need the utility package,
which we install with `sudo apt install zfsutils-linux`.
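
If you want to check that the tools arrived safely before continuing, a quick
sanity check might look like this (the exact output varies by release):

```
# The kernel module should be known to the system:
modinfo zfs | head -n 3

# The userland tools should answer, even with no pools created yet:
zpool status
```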

### Creating the pool

We assume you don't mind totally destroying whatever data might be on your
storage drives, have used a tool such as `gparted` to remove any existing
partitions, and have installed a GPT partition table. To create our ZFS pool, we
will use a command of the form

```
sudo zpool create -o ashift=<ASHIFT> <NAME> mirror <DRIVE1> <DRIVE2>
```

The options, from simple to complex, are:

1. **<NAME>**: ZFS pools traditionally take their names from characters in
   [The Matrix](https://www.imdb.com/title/tt0133093/fullcredits). The two most
   common are `tank` and `dozer`. Whatever you use, it should be short.

1. **<DRIVES>**: The Linux command `lsblk` will give you a quick overview of the
   hard drives in the system. However, we don't want to pass a drive
   specification in the format `/dev/sde` because this is not persistent.
   Instead, [we
   use](https://github.com/zfsonlinux/zfs/wiki/FAQ#selecting-dev-names-when-creating-a-pool)
   the output of `ls /dev/disk/by-id/` to find the drives' IDs.

1. **<ASHIFT>**: This is required to pass the [sector
   size](https://github.com/zfsonlinux/zfs/wiki/FAQ#advanced-format-disks) of
   the drive to ZFS for optimal performance. You might have to do this by hand
   because some drives lie: Whereas modern drives have 4k sector sizes (or 8k in
   the case of many SSDs), they will report 512 bytes for backward compatibility.
   ZFS tries to [catch the
   liars](https://github.com/zfsonlinux/zfs/blob/master/cmd/zpool/zpool_vdev.c)
   and use the correct value. However, that sometimes fails, and you have to add
   it by hand. The `ashift` value is a power of two, so we have **9** for 512
   bytes, **12** for 4k, and **13** for 8k. You can create a pool without this
   parameter and then use `zdb -C | grep ashift` to see what ZFS generated
   automatically (see the sketch after this list). If it isn't what you think,
   you can destroy the pool (see below) and add it manually when creating it
   again.
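
One way to see what the drives themselves claim as their sector size (keeping
in mind that they may lie) is to ask the kernel, for example:

```
# Logical vs. physical sector size; 512/4096 means a 4k drive, so ashift=12:
lsblk -o NAME,SIZE,PHY-SEC,LOG-SEC
```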

In our pretend case, we use 3 TB WD Red drives. Listing all drives by ID gives
us something like this, but with real serial numbers:

```
ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN01
ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN02
```

The actual command to create the pool would be:

```
sudo zpool create -o ashift=12 tank mirror ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN01 ata-WDC_WD30EFRX-68EUZN0_WD-WCCFAKESN02
```

Our new pool is named `tank` and is mirrored. To see information about it, use
`zpool status tank` (no `sudo` necessary). If you screwed up (usually with
`ashift`), use `sudo zpool destroy tank` and start over _now_, before it's too
late.

### Pool default parameters

Setting pool-wide default parameters makes life easier when we create our
datasets. To see them all, you can use the command `zfs get all tank`. Most are
perfectly sensible. Some you'll [want to
change](https://jrs-s.net/2018/08/17/zfs-tuning-cheat-sheet/) are:

```
sudo zfs set atime=off tank
sudo zfs set compression=lz4 tank
sudo zpool set autoexpand=on tank
```
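
You can read the values back at any time to confirm:

```
# Dataset properties are read with `zfs get`:
zfs get atime,compression tank

# `autoexpand` is a pool property, so it is read with `zpool get`:
zpool get autoexpand tank
```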

The `atime` parameter means that your system updates an attribute of a file
every time the file is accessed, which uses a lot of resources. Usually, you
don't care. Compression is a no-brainer on modern CPUs and should be on by
default (we will discuss exceptions for compressed media files later).
`autoexpand` lets the pool grow when you add larger hard drives; note that it is
a property of the pool, not of a dataset, which is why it is set with `zpool`
rather than `zfs`.

## Creating the filesystems

To actually store the data, we need filesystems (also known as "datasets"). For
our very simple default Ansible-NAS setup, we will create two examples: One
filesystem for movies (`movies_root` in `all.yml`) and one for downloads
(`downloads_root`).

### Movies (and other large, pre-compressed files)

We first create the basic filesystem for movies:

```
sudo zfs create tank/movies
```

Movie files are usually rather large, already in a compressed format, and the
files stored there shouldn't be executed for security reasons. We change the
properties of the filesystem accordingly:

```
sudo zfs set recordsize=1M tank/movies
sudo zfs set compression=off tank/movies
sudo zfs set exec=off tank/movies
```
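
To verify the result (and see which values are inherited from the pool), you
can query the properties back:

```
zfs get recordsize,compression,exec tank/movies
```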

The **recordsize** here is set to the currently largest possible value [to
increase performance](https://jrs-s.net/2019/04/03/on-zfs-recordsize/) and save
storage. Recall that we used `ashift` during the creation of the pool to match
the ZFS block size with the drives' sector size. Records are created out of
these blocks. Having larger records reduces the amount of metadata that is
required, and various aspects of ZFS such as caching and checksums work on this
level.

**Compression** is unnecessary for movie files because they are usually in a
compressed format anyway. ZFS is good about recognizing this, and so if you
happen to leave compression on as the default for the pool, it won't make much
of a difference.

[By default](https://zfsonlinux.org/manpages/0.7.13/man8/zfs.8.html#lbAI), ZFS
mounts pools directly under the root directory, and datasets do not have to be
listed in `/etc/fstab` to be mounted. This means that our filesystem will appear
as `/tank/movies`. We need to change the line in `all.yml` accordingly:

```
movies_root: "/tank/movies"
```

You can also set a traditional mount point if you wish with the `mountpoint`
property. Setting this to `none` prevents the filesystem from being
automatically mounted at all.
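
For example, to mount the movies filesystem at a hypothetical `/export/movies`
instead of the default `/tank/movies` (purely an illustration, the default
setup doesn't need this):

```
sudo zfs set mountpoint=/export/movies tank/movies
```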

The filesystems for TV shows, music files and podcasts - all large,
pre-compressed files - should take the exact same parameters as the one for
movies.

### Downloads

For downloads, we can leave most of the default parameters the way they are.

```
sudo zfs create tank/downloads
sudo zfs set exec=off tank/downloads
```

The recordsize stays at the 128k default. In `all.yml`, the new line is

```
downloads_root: "/tank/downloads"
```

### Other data

Depending on the use case, you might want to tune your filesystems. For example,
[Bit Torrent](http://open-zfs.org/wiki/Performance_tuning#Bit_Torrent),
[MySQL](http://open-zfs.org/wiki/Performance_tuning#MySQL) and [Virtual
Machines](http://open-zfs.org/wiki/Performance_tuning#Virtual_machines) all have
known best configurations.

## Setting up scrubs

On Ubuntu, scrubs are configured out of the box to run on the second Sunday of
every month. See `/etc/cron.d/zfsutils-linux` to change this.
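
For orientation, the relevant entry in that file looks something like this on
Ubuntu 18.04 (the exact paths may differ between releases):

```
# Scrub the second Sunday of every month.
24 0 8-14 * * root [ $(date +\%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/scrub ] && /usr/lib/zfs-linux/scrub
```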

## Email notifications

To have the [ZFS
daemon](http://manpages.ubuntu.com/manpages/bionic/man8/zed.8.html) `zed` send
you emails when there is trouble, you first have to [install an email
agent](https://www.reddit.com/r/zfs/comments/90prt4/zed_config_on_ubuntu_1804/)
such as postfix. In the file `/etc/zfs/zed.d/zed.rc`, change the three entries:

```
ZED_EMAIL_ADDR=<YOUR_EMAIL_ADDRESS_HERE>
ZED_NOTIFY_INTERVAL_SECS=3600
ZED_NOTIFY_VERBOSE=1
```

If `zed` is not enabled, you might have to run `sudo systemctl enable zed`. You
can test the setup by manually starting a scrub with `sudo zpool scrub tank`.

## Setting up automatic snapshots

See [sanoid](https://github.com/jimsalterjrs/sanoid/) as a tool for snapshot
management.
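
As a taste of what that involves, a minimal `/etc/sanoid/sanoid.conf` for our
two filesystems might look like the following; the retention numbers are just
an illustration, so check the sanoid documentation before copying anything:

```
[tank/movies]
        use_template = production

[tank/downloads]
        use_template = production

[template_production]
        frequently = 0
        hourly = 36
        daily = 30
        monthly = 3
        yearly = 0
        autosnap = yes
        autoprune = yes
```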

200 docs/zfs_overview.md Normal file

@@ -0,0 +1,200 @@

This is a general overview of the ZFS filesystem for people who are new to it.
If you have some experience and are looking for specific information about how
to configure ZFS for Ansible-NAS, check out the [ZFS example
configuration](zfs_configuration.md) instead.

## What is ZFS and why would I want it?

[ZFS](https://en.wikipedia.org/wiki/ZFS) is an advanced filesystem and volume
manager originally created by Sun Microsystems from 2001 onwards. It was first
released in 2005 for OpenSolaris; Oracle later bought Sun and continued
developing ZFS as closed source software. An open source fork took the name
[OpenZFS](http://www.open-zfs.org/wiki/Main_Page), but is still called "ZFS" for
short. It runs on Linux, FreeBSD, illumos and other platforms.

ZFS aims to be the ["last word in
filesystems"](https://blogs.oracle.com/bonwick/zfs:-the-last-word-in-filesystems)
- a system so future-proof that Michael W. Lucas and Allan Jude famously stated
that the _Enterprise's_ computer on _Star Trek_ probably runs it. The design
was based on [four principles](https://www.youtube.com/watch?v=MsY-BafQgj4):

1. "Pooled" storage to completely eliminate the notion of volumes. You can
   add more storage the same way you just add a RAM stick to memory.

1. Make sure data is always consistent on the disks. There is no `fsck` command
   for ZFS.

1. Detect and correct data corruption ("bitrot"). ZFS is one of the few storage
   systems that checksums everything and is "self-healing".

1. Make it easy to use. Try to "end the suffering" for the admins involved in
   managing storage.

ZFS includes a host of other features such as snapshots, transparent
compression, and encryption. During the early years of ZFS, this all came with
hardware requirements which only enterprise users could afford. By now, however,
computers have become so powerful that ZFS can run (with some effort) on a
[Raspberry
Pi](https://gist.github.com/mohakshah/b203d33a235307c40065bdc43e287547). FreeBSD
and FreeNAS make extensive use of ZFS. What is holding ZFS back on Linux are
[licensing conflicts](https://en.wikipedia.org/wiki/OpenZFS#History) beyond the
scope of this document.

Ansible-NAS doesn't actually specify a filesystem - you can use EXT4, XFS, Btrfs
or pretty much anything you like. However, ZFS not only provides the benefits
listed above, but also lets you use your hard drives with different operating
systems. Some people now using Ansible-NAS originally came from FreeNAS, and
were able to `export` their ZFS pools there and `import` them to Ubuntu. On the
other hand, if you ever decide to switch back to FreeNAS or maybe try FreeBSD
instead of Linux, you should be able to do so using the same ZFS pools.

## A small taste of ZFS

Storage in ZFS is organized in **pools**. Inside these pools, you create
**filesystems** (also known as "datasets") which are like partitions on
steroids. For instance, you can keep each user's `/home/` files in a separate
filesystem. ZFS systems tend to use lots and lots of specialized filesystems.
They share the available storage in their pool.

Pools do not directly consist of hard disks or SSDs. Instead, drives are
organized as **virtual devices** (VDEVs). This is where the physical redundancy
in ZFS is located. Drives in a VDEV can be "mirrored" or combined as "RaidZ",
roughly the equivalent of RAID5. These VDEVs are then combined into a pool by
the administrator.

To give you some idea of how this works, this is how to create a pool:

```
sudo zpool create tank mirror /dev/sda /dev/sdb
```
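
Running `zpool status tank` afterwards would show the mirror VDEV inside the
pool, roughly like this:

```
  pool: tank
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     0
```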

This combines `/dev/sda` and `/dev/sdb` into a mirrored VDEV, and then defines a
new pool named `tank` consisting of this single VDEV. We can now create a
filesystem in this pool to hold our books:

```
sudo zfs create tank/books
```

You can then enable automatic and transparent compression on this filesystem
with `sudo zfs set compression=lz4 tank/books`. To take a **snapshot**, use

```
sudo zfs snapshot tank/books@monday
```

Now, if evil people were somehow to encrypt your book files with ransomware on
Wednesday, you can laugh and revert to the old version:

```
sudo zfs rollback tank/books@monday
```

Of course, you did lose any work from Tuesday unless you created a snapshot then
as well. Usually, you'll have some form of **automatic snapshot
administration**.
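
To see which snapshots are available to roll back to, list them like so:

```
zfs list -t snapshot
```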

To detect bitrot and other defects, ZFS periodically runs **scrubs**: The system
compares the available copies of each data record with their checksums. If there
is a mismatch, the data is repaired.

## Known issues

Constructing the pools out of virtual devices creates some problems. You can't
just detach a drive (or a VDEV) and have the pool reconfigure itself. To
reorganize the pool, you'd have to create a new, temporary pool out of separate
hard drives, move the data over, destroy and reconfigure the original pool, and
then move the data back. Increasing the size of a pool involves either adding
more VDEVs (_not_ just additional disks) or replacing each disk in a VDEV with a
larger version while the `autoexpand` parameter is set.

> At the time of writing (April 2019), ZFS on Linux does not offer native
> encryption, trim support, or device removal, which are all scheduled to be
> included in the [0.8
> release](https://www.phoronix.com/scan.php?page=news_item&px=ZFS-On-Linux-0.8-RC1-Released)
> in the near future.

## Myths and misunderstandings

There is a lot of false or simply outdated information about ZFS out there. To
clear up the worst of it:

### No, ZFS does not need at least 8 GB of RAM

This myth is especially common [in FreeNAS
circles](https://www.ixsystems.com/community/threads/does-freenas-really-need-8gb-of-ram.38685/).
Note that FreeBSD, the basis of FreeNAS, will run with as little [as 1
GB](https://wiki.freebsd.org/ZFSTuningGuide). The [ZFS on Linux
FAQ](https://github.com/zfsonlinux/zfs/wiki/FAQ#hardware-requirements), which is
more relevant here, states under "suggested hardware":

> 8GB+ of memory for the best performance. It's perfectly possible to run with
> 2GB or less (and people do), but you'll need more if using deduplication.

(Deduplication is only useful in [very special
cases](http://open-zfs.org/wiki/Performance_tuning#Deduplication). If you are
reading this, you probably don't need it.)

What everybody agrees on is that ZFS _loves_ RAM, and you should have as much of
it as you possibly can. So 8 GB is in fact a sensible lower limit that you
shouldn't go below except for testing. When in doubt, add more RAM, and even
more, and then some, until your motherboard's capacity is reached.

### No, ECC RAM is not required for ZFS

This again is a case where a recommendation has been taken as a requirement. To
quote the [ZFS on Linux
FAQ](https://github.com/zfsonlinux/zfs/wiki/FAQ#do-i-have-to-use-ecc-memory-for-zfs)
again:

> Using ECC memory for OpenZFS is strongly recommended for enterprise
> environments where the strongest data integrity guarantees are required.
> Without ECC memory rare random bit flips caused by cosmic rays or by faulty
> memory can go undetected. If this were to occur OpenZFS (or any other
> filesystem) will write the damaged data to disk and be unable to automatically
> detect the corruption.

It is _always_ better to have ECC RAM on all computers if you can afford it, and
ZFS is no exception. However, there is absolutely no requirement for ZFS to have
ECC RAM.

### No, the SLOG is not really a write cache

You'll hear the suggestion that you add a fast SSD or NVMe drive as a "SLOG"
(mistakenly also called "ZIL") for write caching. This isn't what would
happen, because ZFS already includes [a write
cache](https://linuxhint.com/configuring-zfs-cache/). It is located in RAM.
Since RAM is always faster than any drive, adding a disk as a write cache
doesn't make sense.

What the ZFS Intent Log (ZIL) does, with or without a dedicated drive, is handle
synchronous writes. These occur when the system refuses to signal a successful
write until the data is actually on a physical disk somewhere. This keeps the
data safe. By default, the ZIL initially shoves a copy of the data onto a normal
VDEV somewhere and then gives the thumbs up. The actual write to the pool is
performed later from the normal write cache in RAM, _not_ from the temporary
copy. The data there is only ever read if the power fails before the last step.

A Separate Intent Log (SLOG) is a fast drive for the ZIL's temporary synchronous
writes. It allows the ZIL to give the thumbs up more quickly. This means that a
SLOG is never read unless the power has failed before the final write to the
pool. Asynchronous writes just go through the normal write cache: if the power
fails, the data is gone.

In summary, the ZIL is concerned with preventing data loss for synchronous
writes, not with speed. You always have a ZIL. A SLOG will make the ZIL faster.
You'll need to [do some
research](https://www.ixsystems.com/blog/o-slog-not-slog-best-configure-zfs-intent-log/)
to figure out if your system would benefit from a SLOG. NFS, for instance, uses
synchronous writes, while SMB usually doesn't. If in doubt, add more RAM instead.
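
If your research does point to a SLOG, adding one to an existing pool is a
single command; the device name here is made up:

```
sudo zpool add tank log ata-SOME_FAST_SSD_SERIALNUMBER
```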

## Further reading and viewing

- One of the best books around is _FreeBSD Mastery: ZFS_ by Michael W.
  Lucas and Allan Jude. Though it is written for FreeBSD, the general guidelines
  apply for all variants. There is a second book for advanced users.

- Jeff Bonwick, one of the original creators of ZFS, tells the story of how ZFS
  came to be [on YouTube](https://www.youtube.com/watch?v=dcV2PaMTAJ4).