Second rewrite

parent 336b288615
commit fd3872db48

2 changed files with 76 additions and 68 deletions

@@ -54,31 +54,31 @@ create our ZFS pool, we will use a command in this form:

The options from simple to complex are:

1. **<NAME>**: ZFS pools traditionally take their names from characters in
   [The Matrix](https://www.imdb.com/title/tt0133093/fullcredits). The two most
   common are `tank` and `dozer`. Whatever you use, it should be short.

1. **<DRIVES>**: The Linux command `lsblk` will give you a quick overview of the
   hard drives in the system. However, we don't pass the drive specification in
   the format `/dev/sde` because this is not persistent. Instead,
   [use](https://github.com/zfsonlinux/zfs/wiki/FAQ#selecting-dev-names-when-creating-a-pool)
   the output of `ls /dev/disk/by-id/` to find the drives' IDs.

1. **<ASHIFT>**: This is required to pass the [sector
   size](https://github.com/zfsonlinux/zfs/wiki/FAQ#advanced-format-disks) of the
   drive to ZFS for optimal performance. You might have to do this by hand
   because some drives lie: Whereas modern drives have 4k sector sizes (or 8k in
   case of many SSDs), they will report 512 bytes because Windows XP [can't
   handle 4k
   sectors](https://support.microsoft.com/en-us/help/2510009/microsoft-support-policy-for-4k-sector-hard-drives-in-windows).
   ZFS tries to [catch the
   liars](https://github.com/zfsonlinux/zfs/blob/master/cmd/zpool/zpool_vdev.c)
   and use the correct value. However, this sometimes fails, and you have to add
   it by hand. The `ashift` value is a power of two, so we have **9** for 512
   bytes, **12** for 4k, and **13** for 8k. You can create a pool without this
   parameter and then use `zdb -C | grep ashift` to see what ZFS generated
   automatically. If it isn't what you think, destroy the pool again and add it
   manually. (A filled-in sketch of the whole command follows this list.)
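
Putting the three placeholders together, a filled-in command might look like
this sketch. The pool name, the `ashift` value and the disk IDs are only
stand-ins for whatever applies to your system:

```
# hypothetical example: a mirrored pool of two drives with 4k sectors
sudo zpool create -o ashift=12 tank mirror \
  /dev/disk/by-id/<ID-OF-FIRST-DRIVE> \
  /dev/disk/by-id/<ID-OF-SECOND-DRIVE>
```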

In our pretend case, we use two 3 TB WD Red drives. Listing all drives by ID
gives us something like this, but with real serial numbers:

@@ -200,7 +200,7 @@ known best configurations.

## Setting up scrubs

On Ubuntu, scrubs are configured out of the box to run on the second Sunday of
every month. See `/etc/cron.d/zfsutils-linux` to change this.
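
If you don't want to wait for the cron job, you can also start a scrub by hand
and check on its progress; `tank` below stands in for your pool name:

```
sudo zpool scrub tank    # start a scrub of the pool "tank"
sudo zpool status tank   # shows scrub progress and any errors found or repaired
```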

@@ -13,16 +13,16 @@ as closed source software. An open source fork took the name

short. It runs on Linux, FreeBSD, illumos and other platforms.

ZFS aims to be the ["last word in
filesystems"](https://blogs.oracle.com/bonwick/zfs:-the-last-word-in-filesystems),
a technology so future-proof that Michael W. Lucas and Allan Jude famously
stated that the _Enterprise's_ computer on _Star Trek_ probably runs it. The
design was based on [four
principles](https://www.youtube.com/watch?v=MsY-BafQgj4):
1. "Pooled" storage to eliminate the notion of volumes. You can add more storage
|
1. "Pooled" storage to eliminate the notion of volumes. You can add more storage
|
||||||
the same way you just add a RAM stick to memory.
|
the same way you just add a RAM stick to memory.
|
||||||
|
|

1. Make sure data is always consistent on the disks. There is no `fsck` command
   for ZFS and none is needed.

1. Detect and correct data corruption ("bitrot"). ZFS is one of the few storage

@@ -36,9 +36,10 @@ ZFS includes a host of other features such as snapshots, transparent compression

and encryption. During the early years of ZFS, this all came with hardware
requirements only enterprise users could afford. By now, however, computers have
become so powerful that ZFS can run (with some effort) on a [Raspberry
Pi](https://gist.github.com/mohakshah/b203d33a235307c40065bdc43e287547).
FreeBSD and FreeNAS make extensive use of ZFS. What is holding ZFS back on Linux
are [licensing issues](https://en.wikipedia.org/wiki/OpenZFS#History) beyond the
scope of this document.

Ansible-NAS doesn't actually specify a filesystem - you can use EXT4, XFS or

@@ -59,7 +60,7 @@ with tailored parameters such as record size and compression. All filesystems

share the available storage in their pool.

Pools do not directly consist of hard disks or SSDs. Instead, drives are
organized as **virtual devices** (VDEVs). This is where the physical redundancy
in ZFS is located. Drives in a VDEV can be "mirrored" or combined as "RaidZ",
roughly the equivalent of RAID5. These VDEVs are then combined into a pool by
the administrator. The command might look something like this:

@@ -69,7 +70,8 @@ administrator. The command might look something like this:

```

This combines `/dev/sda` and `/dev/sdb` to a mirrored VDEV, and then defines a
new pool named `tank` consisting of this single VDEV. (In real life you'd use
the `/dev/disk/by-id/` names for the drives, but you get the idea.) You can now
create a filesystem in this pool for, say, all of your _Mass Effect_ fan
fiction:

```

@@ -84,7 +86,7 @@ compression=lz4 tank/mefanfic`. To take a **snapshot**, use

```

Now, if evil people were somehow able to encrypt your precious fan fiction files
with ransomware, you can simply laugh maniacally and revert to the old version:

```
sudo zfs rollback tank/mefanfic@21540411
```
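
If you have lost track of which snapshots exist, you can list them first; on
reasonably recent versions, `zfs diff` will also show what has changed since a
snapshot was taken:

```
sudo zfs list -t snapshot                # list all snapshots on the system
sudo zfs diff tank/mefanfic@21540411     # what changed since this snapshot
```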

@@ -101,30 +103,33 @@ If there is a mismatch, the data is repaired.

## Known issues

> At time of writing (April 2019), ZFS on Linux does not offer native
> encryption, TRIM support or device removal, which are all scheduled to be
> included in the upcoming [0.8
> release](https://www.phoronix.com/scan.php?page=news_item&px=ZFS-On-Linux-0.8-RC1-Released)
> any day now.

ZFS' original design for enterprise systems and redundancy requirements can make
some things difficult. You can't just add individual drives to a pool and tell
the system to reconfigure automatically. Instead, you have to either add a new
VDEV, or replace each of the existing drives with one of higher capacity. In an
enterprise environment, of course, you would just _buy_ a bunch of new drives
and move the data from the old pool to the new pool. Shrinking a pool is even
harder - put simply, ZFS is not built for this, though it is [being worked
on](https://www.delphix.com/blog/delphix-engineering/openzfs-device-removal).
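
For illustration, the two ways of growing a pool described above might look
roughly like this; the pool name and disk IDs are placeholders:

```
# option 1: add a whole new mirrored VDEV to the pool "tank"
sudo zpool add tank mirror /dev/disk/by-id/<ID-DRIVE-3> /dev/disk/by-id/<ID-DRIVE-4>

# option 2: swap each existing drive for a bigger one, one at a time, and let
# the pool grow once every drive has been replaced and resilvered
sudo zpool set autoexpand=on tank
sudo zpool replace tank /dev/disk/by-id/<OLD-ID> /dev/disk/by-id/<NEW-ID>
```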

If you absolutely must be able to add or remove single drives, ZFS might not be
the filesystem for you.

## Myths and misunderstandings

Information on the internet about ZFS can be outdated, conflicting or flat-out
wrong. Partly this is because it has been in use for almost 15 years now and
things change, partly it is the result of being used on different operating
systems which have minor differences under the hood. Also, Google searches tend
to first return the Oracle documentation for their closed source ZFS variant,
which is increasingly diverging from the open source OpenZFS standard.

To clear up some of the most common misunderstandings:

### No, ZFS does not need at least 8 GB of RAM

@@ -143,11 +148,11 @@ more relevant for Ansible-NAS, states under "suggested hardware":

cases](http://open-zfs.org/wiki/Performance_tuning#Deduplication). If you are
reading this, you probably don't need it.)

Experience shows that 8 GB of RAM is in fact a sensible minimal amount for
continuous use. But it's not a requirement. What everybody agrees on is that ZFS
_loves_ RAM and works better the more it has, so you should have as much of it
as you possibly can. When in doubt, add more RAM, and even more, and then some,
until your motherboard's capacity is reached.
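
If you are curious how much of that RAM the ZFS read cache (the ARC) is
currently using, ZFS on Linux exposes the numbers under `/proc`. One way to peek
at them (field names may differ slightly between versions):

```
# current ARC size and its configured maximum, in bytes
awk '$1 == "size" || $1 == "c_max" { print $1, $3 }' /proc/spl/kstat/zfs/arcstats
```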

### No, ECC RAM is not required for ZFS

@@ -170,13 +175,13 @@ have ECC RAM.

### No, the SLOG is not really a write cache

You'll read the suggestion to add a fast SSD or NVMe as a "SLOG drive"
(mistakenly also called "ZIL") for write caching. This isn't what happens,
because ZFS already includes [a write
cache](https://linuxhint.com/configuring-zfs-cache/) in RAM. Since RAM is always
faster, adding a disk as a write cache doesn't even make sense.

What the **ZFS Intent Log (ZIL)** does, with or without a dedicated drive, is
handle synchronous writes. These occur when the system refuses to signal a
successful write until the data is actually stored on a physical disk somewhere.
This keeps the data safe, but is slower.
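
Whether a dataset asks for synchronous writes at all is governed by its `sync`
property. A quick way to check, using the example dataset from earlier:

```
sudo zfs get sync tank/mefanfic
# "standard" honours sync requests from applications, "always" forces every
# write through the ZIL, and "disabled" ignores sync requests entirely (fast,
# but recent writes can be lost on power failure)
```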

@@ -187,18 +192,21 @@ performed later from the write cache in RAM, _not_ the temporary copy. The data

there is only ever read if the power fails before the last step. The ZIL is all
about protecting data, not making transfers faster.

A **Separate Intent Log (SLOG)** is an additional fast drive for these temporary
synchronous writes. It simply allows the ZIL to give the thumbs up quicker. This
means that a SLOG is never read unless the power has failed before the final
write to the pool.

Asynchronous writes just go through the normal write cache, by the way. If the
power fails, the data is gone.

In summary, the ZIL prevents data loss during synchronous writes, or at least
ensures that the data in storage is consistent. You always have a ZIL. A SLOG
will make the ZIL faster. You'll probably need to [do some
research](https://www.ixsystems.com/blog/o-slog-not-slog-best-configure-zfs-intent-log/)
and some testing to figure out if your system would benefit from a SLOG. NFS for
instance uses synchronous writes, SMB usually doesn't. When in doubt, add more
RAM instead.
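
If your research does point towards a SLOG, adding one to an existing pool is a
single command; the device ID below is a placeholder for your fast SSD or NVMe
drive:

```
sudo zpool add tank log /dev/disk/by-id/<FAST-SSD-ID>

# a mirrored SLOG protects against losing the log device itself:
# sudo zpool add tank log mirror /dev/disk/by-id/<SSD-A> /dev/disk/by-id/<SSD-B>
```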

## Further reading and viewing