Update tar-compress docs

- Call `finish` to finalize the archive, as [recommended by tar](https://docs.rs/tar/0.4.41/tar/struct.Builder.html#method.append_dir_all).
- Show an example of adding a directory's contents to an archive **without** renaming them.
- Call out differences between `tar::Builder` defaults and `tar(1)` defaults.

## Context

On seeing these docs, I was unsure whether there was another, better way to add all the contents of one directory to an archive. After researching, I found that people do this by calling `append_dir_all` with an empty string as the first argument. I feel this is a common operation and would like to see it spelled out in the documentation.

In addition, I hit an edge case where `tar(1)` produced significantly smaller files than the `tar` crate (reproduction: bfd420a012/README.md). The cause turned out to be that the directory contained symlinks to other files in the same directory. `follow_symlinks(true)` is the default in the `tar` crate, which is the opposite of the `tar(1)` default. It caused the program to archive the same file multiple times, which significantly increased the archive size. I believe someone looking for "How to Compress a directory into tarball" is likely trying to replicate `tar czf` behavior and would like to know the main differences.
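
The size difference described above can be approximated with the standard library alone. The following is a rough sketch and not part of this commit: `payload_sizes` is a hypothetical helper, it looks only at the top level of one directory, and it ignores tar headers, padding, and compression. It compares the file bytes an archiver would write when symlinks are stored as links with the bytes it would write when each symlink is replaced by a copy of its target.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: returns `(links_kept, links_followed)` for the
/// entries directly inside `dir` (non-recursive, to keep the sketch short).
///
/// - `links_kept`: bytes of regular files only; symlinks add no file data.
/// - `links_followed`: same, plus the size of every file a symlink points to.
fn payload_sizes(dir: &Path) -> io::Result<(u64, u64)> {
    let mut links_kept = 0u64;
    let mut links_followed = 0u64;

    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // `symlink_metadata` describes the entry itself and does not follow links.
        let meta = fs::symlink_metadata(&path)?;

        if meta.file_type().is_symlink() {
            // `metadata` follows the link; a dangling link is simply skipped.
            if let Ok(target) = fs::metadata(&path) {
                if target.is_file() {
                    links_followed += target.len();
                }
            }
        } else if meta.is_file() {
            links_kept += meta.len();
            links_followed += meta.len();
        }
    }

    Ok((links_kept, links_followed))
}

fn main() -> io::Result<()> {
    // Directory to inspect; defaults to the current directory.
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let (links_kept, links_followed) = payload_sizes(Path::new(&dir))?;
    println!("file bytes with symlinks stored as links: {}", links_kept);
    println!("file bytes with symlinks followed:        {}", links_followed);
    Ok(())
}
```

A large gap between the two numbers suggests the directory is affected by this edge case. With the `tar` crate, calling `follow_symlinks(false)` on the builder before `append_dir_all` stores the links themselves, which is closer to what `tar czf` does by default.
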
Schneems 2024-06-06 17:04:16 -05:00
parent 752b035c1a
commit c42ea012aa

@@ -6,7 +6,7 @@ Compress `/var/log` directory into `archive.tar.gz`.
 Creates a [`File`] wrapped in [`GzEncoder`]
 and [`tar::Builder`]. </br>Adds contents of `/var/log` directory recursively into the archive
-under `backup/logs`path with [`Builder::append_dir_all`].
+under `backup/logs` path with [`Builder::append_dir_all`].
 [`GzEncoder`] is responsible for transparently compressing the
 data prior to writing it into `archive.tar.gz`.
@@ -21,11 +21,35 @@ fn main() -> Result<(), std::io::Error> {
     let enc = GzEncoder::new(tar_gz, Compression::default());
     let mut tar = tar::Builder::new(enc);
     tar.append_dir_all("backup/logs", "/var/log")?;
+    tar.finish()?;
     Ok(())
 }
 ```
+To add the contents without renaming them, an empty string can be used as the first argument of [`Builder::append_dir_all`]:
+```rust,edition2018,no_run
+use std::fs::File;
+use flate2::Compression;
+use flate2::write::GzEncoder;
+fn main() -> Result<(), std::io::Error> {
+    let tar_gz = File::create("archive.tar.gz")?;
+    let enc = GzEncoder::new(tar_gz, Compression::default());
+    let mut tar = tar::Builder::new(enc);
+    tar.append_dir_all("", "/var/log")?;
+    tar.finish()?;
+    Ok(())
+}
+```
+The default behavior of [`tar::Builder`] differs from the GNU `tar` utility's defaults [tar(1)],
+notably [`tar::Builder::follow_symlinks(true)`] is the equivalent of `tar --dereference`.
+[tar(1)]: https://man7.org/linux/man-pages/man1/tar.1.html
 [`Builder::append_dir_all`]: https://docs.rs/tar/*/tar/struct.Builder.html#method.append_dir_all
 [`File`]: https://doc.rust-lang.org/std/fs/struct.File.html
 [`GzEncoder`]: https://docs.rs/flate2/*/flate2/write/struct.GzEncoder.html
 [`tar::Builder`]: https://docs.rs/tar/*/tar/struct.Builder.html
+[`tar::Builder::follow_symlinks(true)`]: https://docs.rs/tar/latest/tar/struct.Builder.html#method.follow_symlinks