Second rewrite

Scot W. Stevenson 2019-04-13 16:27:03 +02:00
parent 336b288615
commit fd3872db48
2 changed files with 76 additions and 68 deletions

@@ -54,31 +54,31 @@ create our ZFS pool, we will use a command in this form:
The options from simple to complex are: The options from simple to complex are:
1. **<NAME>**: ZFS pools traditionally take their names from characters in the **<NAME>**: ZFS pools traditionally take their names from characters in the [The
[The Matrix](https://www.imdb.com/title/tt0133093/fullcredits). The two most Matrix](https://www.imdb.com/title/tt0133093/fullcredits). The two most common
common are `tank` and `dozer`. Whatever you use, it should be short. are `tank` and `dozer`. Whatever you use, it should be short.
1. **<DRIVES>**: The Linux command `lsblk` will give you a quick overview of the **<DRIVES>**: The Linux command `lsblk` will give you a quick overview of the
hard drives in the system. However, we don't pass the drive specification in hard drives in the system. However, we don't pass the drive specification in the
the format `/dev/sde` because this is not persistant. Instead, format `/dev/sde` because this is not persistent. Instead,
[use](https://github.com/zfsonlinux/zfs/wiki/FAQ#selecting-dev-names-when-creating-a-pool) [use](https://github.com/zfsonlinux/zfs/wiki/FAQ#selecting-dev-names-when-creating-a-pool)
the output of `ls /dev/disk/by-id/` to find the drives' IDs. the output of `ls /dev/disk/by-id/` to find the drives' IDs.
1. **<ASHIFT>**: This is required to pass the [sector **<ASHIFT>**: This is required to pass the [sector
size](https://github.com/zfsonlinux/zfs/wiki/FAQ#advanced-format-disks) of size](https://github.com/zfsonlinux/zfs/wiki/FAQ#advanced-format-disks) of the
the drive to ZFS for optimal performance. You might have to do this by hand drive to ZFS for optimal performance. You might have to do this by hand because
because some drives lie: Whereas modern drives have 4k sector sizes (or 8k in some drives lie: Whereas modern drives have 4k sector sizes (or 8k in case of
case of many SSDs), they will report 512 bytes because Windows XP [can't many SSDs), they will report 512 bytes because Windows XP [can't handle 4k
handle 4k
sectors](https://support.microsoft.com/en-us/help/2510009/microsoft-support-policy-for-4k-sector-hard-drives-in-windows). sectors](https://support.microsoft.com/en-us/help/2510009/microsoft-support-policy-for-4k-sector-hard-drives-in-windows).
ZFS tries to [catch the ZFS tries to [catch the
liars](https://github.com/zfsonlinux/zfs/blob/master/cmd/zpool/zpool_vdev.c) liars](https://github.com/zfsonlinux/zfs/blob/master/cmd/zpool/zpool_vdev.c) and
and use the correct value. However, this sometimes fails, and you have to add use the correct value. However, this sometimes fails, and you have to add it by
it by hand. The `ashift` value is a power of two, so we have **9** for 512 hand.
bytes, **12** for 4k, and **13** for 8k. You can create a pool without this
parameter and then use `zdb -C | grep ashift` to see what ZFS generated The `ashift` value is a power of two, so we have **9** for 512 bytes, **12** for
automatically. If it isn't what you think, destroy the pool again and add it 4k, and **13** for 8k. You can create a pool without this parameter and then use
manually. `zdb -C | grep ashift` to see what ZFS generated automatically. If it isn't what
you think, destroy the pool again and add it manually.
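The mapping from sector size to `ashift` described above is simply a base-2 logarithm. As a minimal sketch, the following shell function works it out for a given sector size in bytes. The name `ashift_for` is a hypothetical helper invented for this illustration and is not part of ZFS.

```shell
# ashift_for is a hypothetical helper, not a ZFS command.
# It prints the base-2 logarithm of a sector size given in bytes.
ashift_for() {
    size=$1
    shift_val=0
    # Reject anything that is not a plain positive integer.
    case $size in
        ''|*[!0-9]*)
            echo "sector size must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "$size" -le 0 ]; then
        echo "sector size must be a positive integer" >&2
        return 1
    fi
    # Halve the size until it reaches 1, counting the steps.
    while [ "$size" -gt 1 ]; do
        # A power of two stays even all the way down to 1.
        if [ $((size % 2)) -ne 0 ]; then
            echo "sector size must be a power of two" >&2
            return 1
        fi
        size=$((size / 2))
        shift_val=$((shift_val + 1))
    done
    echo "$shift_val"
}

ashift_for 512    # prints 9
ashift_for 4096   # prints 12
ashift_for 8192   # prints 13
```

The result is the value you would pass by hand when creating the pool, for example with `-o ashift=12`, if the automatically detected value turns out to be wrong.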
In our pretend case, we use two 3 TB WD Red drives. Listing all drives by ID In our pretend case, we use two 3 TB WD Red drives. Listing all drives by ID
gives us something like this, but with real serial numbers: gives us something like this, but with real serial numbers:
@@ -200,7 +200,7 @@ known best configurations.
## Setting up scrubs ## Setting up scrubs
On Ubuntu, scrubs are configurated out of the box to run on the second Sunday of On Ubuntu, scrubs are configured out of the box to run on the second Sunday of
every month. See `/etc/cron.d/zfsutils-linux` to change this. every month. See `/etc/cron.d/zfsutils-linux` to change this.

@@ -13,8 +13,8 @@ as closed source software. An open source fork took the name
short. It runs on Linux, FreeBSD, illumos and other platforms. short. It runs on Linux, FreeBSD, illumos and other platforms.
ZFS aims to be the ["last word in ZFS aims to be the ["last word in
filesystems"](https://blogs.oracle.com/bonwick/zfs:-the-last-word-in-filesystems) filesystems"](https://blogs.oracle.com/bonwick/zfs:-the-last-word-in-filesystems),
- a technology so future-proof that Michael W. Lucas and Allan Jude famously a technology so future-proof that Michael W. Lucas and Allan Jude famously
stated that the _Enterprise's_ computer on _Star Trek_ probably runs it. The stated that the _Enterprise's_ computer on _Star Trek_ probably runs it. The
design was based on [four design was based on [four
principles](https://www.youtube.com/watch?v=MsY-BafQgj4): principles](https://www.youtube.com/watch?v=MsY-BafQgj4):
@@ -22,7 +22,7 @@ design was based on [four
1. "Pooled" storage to eliminate the notion of volumes. You can add more storage 1. "Pooled" storage to eliminate the notion of volumes. You can add more storage
the same way you just add a RAM stick to memory. the same way you just add a RAM stick to memory.
1. Make sure data is always consistant on the disks. There is no `fsck` command 1. Make sure data is always consistent on the disks. There is no `fsck` command
for ZFS and none is needed. for ZFS and none is needed.
1. Detect and correct data corruption ("bitrot"). ZFS is one of the few storage 1. Detect and correct data corruption ("bitrot"). ZFS is one of the few storage
@@ -36,9 +36,10 @@ ZFS includes a host of other features such as snapshots, transparent compression
and encryption. During the early years of ZFS, this all came with hardware and encryption. During the early years of ZFS, this all came with hardware
requirements only enterprise users could afford. By now, however, computers have requirements only enterprise users could afford. By now, however, computers have
become so powerful that ZFS can run (with some effort) on a [Raspberry become so powerful that ZFS can run (with some effort) on a [Raspberry
Pi](https://gist.github.com/mohakshah/b203d33a235307c40065bdc43e287547). FreeBSD Pi](https://gist.github.com/mohakshah/b203d33a235307c40065bdc43e287547).
and FreeNAS make extensive use of ZFS. What is holding ZFS back on Linux are
[licensing issues](https://en.wikipedia.org/wiki/OpenZFS#History) beyond the FreeBSD and FreeNAS make extensive use of ZFS. What is holding ZFS back on Linux
are [licensing issues](https://en.wikipedia.org/wiki/OpenZFS#History) beyond the
scope of this document. scope of this document.
Ansible-NAS doesn't actually specify a filesystem - you can use EXT4, XFS or Ansible-NAS doesn't actually specify a filesystem - you can use EXT4, XFS or
@@ -59,7 +60,7 @@ with tailored parameters such as record size and compression. All filesystems
share the available storage in their pool. share the available storage in their pool.
Pools do not directly consist of hard disks or SSDs. Instead, drives are Pools do not directly consist of hard disks or SSDs. Instead, drives are
organized as **virtual devices** (VDEV). This is where the physical redundancy organized as **virtual devices** (VDEVs). This is where the physical redundancy
in ZFS is located. Drives in a VDEV can be "mirrored" or combined as "RaidZ", in ZFS is located. Drives in a VDEV can be "mirrored" or combined as "RaidZ",
roughly the equivalent of RAID5. These VDEVs are then combined into a pool by the roughly the equivalent of RAID5. These VDEVs are then combined into a pool by the
administrator. The command might look something like this: administrator. The command might look something like this:
@@ -69,7 +70,8 @@ administrator. The command might look something like this:
``` ```
This combines `/dev/sba` and `/dev/sdb` to a mirrored VDEV, and then defines a This combines `/dev/sba` and `/dev/sdb` to a mirrored VDEV, and then defines a
new pool named `tank` consisting of this single VDEV. You can now create a new pool named `tank` consisting of this single VDEV. (Actually, you'd want to
use a different ID for the drives, but you get the idea.) You can now create a
filesystem in this pool for, say, all of your _Mass Effect_ fan fiction: filesystem in this pool for, say, all of your _Mass Effect_ fan fiction:
``` ```
@@ -84,7 +86,7 @@ compression=lz4 tank/mefanfic`. To take a **snapshot**, use
``` ```
Now, if evil people were somehow able to encrypt your precious fan fiction files Now, if evil people were somehow able to encrypt your precious fan fiction files
with ransomware, you can laugh maniacally and revert to the old version: with ransomware, you can simply laugh maniacally and revert to the old version:
``` ```
sudo zfs rollback tank/mefanfic@21540411 sudo zfs rollback tank/mefanfic@21540411
@@ -101,30 +103,33 @@ If there is a mismatch, the data is repaired.
## Known issues ## Known issues
> At time of writing (April 2019), ZFS on Linux does not yet offer native > At time of writing (April 2019), ZFS on Linux does not offer native
> encryption, TRIM support, or device removal, which are all scheduled to be > encryption, TRIM support or device removal, which are all scheduled to be
> included in the upcoming [0.8 > included in the upcoming [0.8
> release](https://www.phoronix.com/scan.php?page=news_item&px=ZFS-On-Linux-0.8-RC1-Released). > release](https://www.phoronix.com/scan.php?page=news_item&px=ZFS-On-Linux-0.8-RC1-Released)
> any day now.
ZFS' original design for enterprise systems and redundancy requirements can make ZFS' original design for enterprise systems and redundancy requirements can make
some things more difficult. You can't just add individual drives to a pool and some things difficult. You can't just add individual drives to a pool and tell
tell the system to reconfigure automatically. Instead, you have to either add a the system to reconfigure automatically. Instead, you have to either add a new
new VDEV, or replace each of the existing drives with one of higher capacity. In VDEV, or replace each of the existing drives with one of higher capacity. In an
an enterprise environment, of course, you would just _buy_ a bunch of new drives enterprise environment, of course, you would just _buy_ a bunch of new drives
and move the data from the old pool to the new pool. Shrinking a pool is even and move the data from the old pool to the new pool. Shrinking a pool is even
harder - put simply, ZFS is not built for this. harder - put simply, ZFS is not built for this, though it is [being worked
on](https://www.delphix.com/blog/delphix-engineering/openzfs-device-removal).
If you need to be able to add or remove single drives, ZFS might not be the If you absolutely must be able to add or remove single drives, ZFS might not be
filesystem for you. the filesystem for you.
## Myths and misunderstandings ## Myths and misunderstandings
Information on the internet about about ZFS can be outdated, conflicting, or Information on the internet about ZFS can be outdated, conflicting or flat-out
simply wrong. Partially this is because it has been in use for almost 15 years wrong. Partially this is because it has been in use for almost 15 years now and
now and things change, partially it is the result of being used on different things change, partially it is the result of being used on different operating
operating systems which have minor differences under the hood. Also, Google systems which have minor differences under the hood. Also, Google searches tend
searches tend to return the Sun/Oracle documentation for their closed source ZFS to first return the Oracle documentation for their closed source ZFS variant,
variant, which is increasingly diverging from the open source OpenZFS standard. which is increasingly diverging from the open source OpenZFS standard.
To clear up some of the most common misunderstandings: To clear up some of the most common misunderstandings:
### No, ZFS does not need at least 8 GB of RAM ### No, ZFS does not need at least 8 GB of RAM
@@ -143,11 +148,11 @@ more relevant for Ansible-NAS, states under "suggested hardware":
cases](http://open-zfs.org/wiki/Performance_tuning#Deduplication). If you are cases](http://open-zfs.org/wiki/Performance_tuning#Deduplication). If you are
reading this, you probably don't need it.) reading this, you probably don't need it.)
What everybody agrees on is that ZFS _loves_ RAM and works better the more it Experience shows that 8 GB of RAM is in fact a sensible minimal amount for
has, so you should have as much of it as you possibly can. When in doubt, add continuous use. But it's not a requirement. What everybody agrees on is that ZFS
more RAM, and even more, and them some, until your motherboard's capacity is _loves_ RAM and works better the more it has, so you should have as much of it
reached. Experience shows that 8 GB of RAM is in fact a sensible minimal amount as you possibly can. When in doubt, add more RAM, and even more, and them some,
for continious use. But it's not a requirement. until your motherboard's capacity is reached.
### No, ECC RAM is not required for ZFS ### No, ECC RAM is not required for ZFS
@@ -170,13 +175,13 @@ have ECC RAM.
### No, the SLOG is not really a write cache ### No, the SLOG is not really a write cache
You'll read the suggestion to add a fast SSD or NVMe as a "SLOG" (mistakingly You'll read the suggestion to add a fast SSD or NVMe as a "SLOG drive"
also called "ZIL") drive for write caching. This isn't what happens, because ZFS (mistakenly also called "ZIL") for write caching. This isn't what happens,
already includes [a write cache](https://linuxhint.com/configuring-zfs-cache/) because ZFS already includes [a write
in RAM. Since RAM is always faster, adding a disk as a write cache doesn't make cache](https://linuxhint.com/configuring-zfs-cache/) in RAM. Since RAM is always
sense. faster, adding a disk as a write cache doesn't even make sense.
What the ZFS Intent Log (ZIL) does, with or without a dedicated drive, is handle What the **ZFS Intent Log (ZIL)** does, with or without a dedicated drive, is handle
synchronous writes. These occur when the system refuses to signal a successful synchronous writes. These occur when the system refuses to signal a successful
write until the data is actually stored on a physical disk somewhere. This keeps write until the data is actually stored on a physical disk somewhere. This keeps
the data safe, but is slower. the data safe, but is slower.
@@ -187,18 +192,21 @@ performed later from the write cache in RAM, _not_ the temporary copy. The data
there is only ever read if the power fails before the last step. The ZIL is all there is only ever read if the power fails before the last step. The ZIL is all
about protecting data, not making transfers faster. about protecting data, not making transfers faster.
A Separate Intent Log (SLOG) is an additional fast drive for these temporary A **Separate Intent Log (SLOG)** is an additional fast drive for these temporary
synchronous writes. It simply allows the ZIL give the thumbs up quicker. This synchronous writes. It simply allows the ZIL give the thumbs up quicker. This
means that a SLOG is never read unless the power has failed before the final means that a SLOG is never read unless the power has failed before the final
write to the pool. Asynchronous writes just go through the normal write cache, write to the pool.
by the way. If the power fails, the data is gone.
In summary, the ZIL prevents data loss during synchronous writes. You always Asynchronous writes just go through the normal write cache, by the way. If the
have a ZIL. A SLOG will make the ZIL faster. You'll probably need to [do some power fails, the data is gone.
In summary, the ZIL prevents data loss during synchronous writes, or at least
ensures that the data in storage is consistent. You always have a ZIL. A SLOG
will make the ZIL faster. You'll probably need to [do some
research](https://www.ixsystems.com/blog/o-slog-not-slog-best-configure-zfs-intent-log/) research](https://www.ixsystems.com/blog/o-slog-not-slog-best-configure-zfs-intent-log/)
and some testing to figure out if your system would benefit from a SLOG. NFS for and some testing to figure out if your system would benefit from a SLOG. NFS for
instance uses synchonous writes, SMB usually doesn't. If in doubt, add more RAM instance uses synchronous writes, SMB usually doesn't. When in doubt, add more
instead. RAM instead.
## Further reading and viewing ## Further reading and viewing