coreutils/CONTRIBUTING.md
2023-11-02 18:41:16 +01:00

12 KiB

Contributing to coreutils

Hi! Welcome to uutils/coreutils!

Thanks for wanting to contribute to this project! This document explains everything you need to know to contribute. Before you start make sure to also check out these documents:

Now follows a very important warning:

Warning

uutils is original code and cannot contain any code from GNU or other implementations. This means that we cannot accept any changes based on the GNU source code. To make sure that cannot happen, you cannot link to the GNU source code either.

Finally, feel free to join our Discord!

Getting Oriented

uutils is a big project consisting of many parts. Here are the most important parts for getting started:

Each utility is defined as a separate crate. The structure of each of these crates is as follows:

  • Cargo.toml
  • src/main.rs: contains only a single macro call
  • src/<util name>.rs: the actual code for the utility
  • <util name>.md: the documentation for the utility

We have separated repositories for crates that we maintain but also publish for use by others:

Design Goals

todo

How to Help

There are several ways to help and writing code is just one them. Reporting issues and writing documentation are just as important as writing code.

Reporting Issues

We can't fix bugs we don't know about, so good issues are super helpful! Here are some tips for writing good issues:

  • If you find a bug, make sure it's still a problem on the main branch.
  • Search through the existing issues to see whether it has already been reported.
  • Make sure to include all relevant information, such as:
    • Which version of uutils did you check?
    • Which version of GNU coreutils are you comparing with?
    • What platform are you on?
  • Provide a way to reliably reproduce the issue.
  • Be as specific as possible!

Writing Documentation

There's never enough documentation. If you come across any documentation that could be improved, feel free to submit a PR for it!

Writing Code

If you want to submit a PR, make sure that you've discussed the solution with the maintainers beforehand. We want to avoid situations where you put a lot of work into a fix that we can't merge! If there's no issue for what you're trying to fix yet, make one before you start working on the PR.

Generally, we try to follow what GNU is doing in terms of options and behavior. It is recommended to look at the GNU Coreutils manual (on the web, or locally using info <utility>). It is more in depth than the man pages and provides a good description of available features and their implementation details. But remember, you cannot look at the GNU source code!

Also remember that we can only merge PRs which pass our test suite, follow rustfmt, and do not have any warnings from clippy. See DEVELOPMENT.md for more information. Be sure to also read about our Rust style.

Our Rust Style

We want uutils to be written in idiomatic Rust, so here are some guidelines to follow. Some of these are aspirational, meaning that we don't do them correctly everywhere in the code. If you find violations of the advice below, feel free to submit a patch!

Don't panic!

The coreutils should be very reliable. This means that we should never panic!. Therefore, you should avoid using .unwrap() and panic!. Sometimes the use of unreachable! can be justified with a comment explaining why that code is unreachable.

Don't exit

We want uutils to be embeddable in other programs. This means that no function in uutils should exit the program. Doing so would also lead to code with more confusing control flow. Avoid therefore std::process::exit and similar functions which exit the program early.

unsafe

uutils cannot be entirely safe, because we have to call out to libc and do syscalls. However, we still want to limit our use of unsafe. We generally only accept unsafe for FFI, with very few exceptions. Note that performance is very rarely a valid argument for using unsafe.

If you still need to write code with unsafe, make sure to read the Rustonomicon and annotate the calls with // SAFETY: comments explaining why the use of unsafe is sound.

Macros

Macros can be a great tool, but they are also usually hard to understand. They should be used sparingly. Make sure to explore simpler options before you reach for a solution involving macros.

str, OsStr & Path

Rust has many string-like types, and sometimes it's hard to choose the right one. It's tempting to use str (and String) for everything, but that is not always the right choice for uutils, because we need to support invalid UTF-8, just like the GNU coreutils. For example, paths on Linux might not be valid UTF-8! Whenever we are dealing with paths, we should therefore stick with OsStr and Path. Make sure that you only convert to str/String if you know that something is always valid UTF-8. If you need more operations on OsStr, you can use the bstr crate.

Doc-comments

We use rustdoc for our documentation, so it's best to follow rustdoc's guidelines. Make sure that your documentation is not just repeating the name of the function, but actually giving more useful information. Rustdoc recommends the following structure:

[short sentence explaining what it is]

[more detailed explanation]

[at least one code example that users can copy/paste to try it]

[even more advanced explanations if necessary]

Other comments

Comments should be written to explain the code, not to describe the code. Try to focus on explaining why the code is the way it is. If you feel like you have to describe the code, that's usually a sign that you could improve the naming of variables and functions.

If you edit a piece of code, make sure to update any comments that need to change as a result. The only thing worse than having no comments is having outdated comments!

Git Etiquette

todo

Platforms

We take pride in supporting many operating systems and architectures. Any code you contribute must at least compile without warnings for all platforms in the CI. However, you can use #[cfg(...)] attributes to create platform dependent features.

Tip: For Windows, Microsoft provides some images (VMWare, Hyper-V, VirtualBox and Parallels) for development: https://developer.microsoft.com/windows/downloads/virtual-machines/

Improving the GNU compatibility

Please make sure you have installed GNU utils and prerequisites and can execute commands described in Comparing with GNU section of DEVELOPMENT.md

The Python script ./util/remaining-gnu-error.py shows the list of failing tests in the CI.

To improve the GNU compatibility, the following process is recommended:

  1. Identify a test (the smaller, the better) on a program that you understand or is easy to understand. You can use the ./util/remaining-gnu-error.py script to help with this decision.
  2. Build both the GNU and Rust coreutils using: bash util/build-gnu.sh
  3. Run the test with bash util/run-gnu-test.sh <your test>
  4. Start to modify <your test> to understand what is wrong. Examples:
    1. Add set -v to have the bash verbose mode
    2. Add echo $? where needed
    3. When the variable fail is used in the test, echo $fail to see when the test started to fail
    4. Bump the content of the output (ex: cat err)
    5. ...
  5. Or, if the test is simple, extract the relevant information to create a new test case running both GNU & Rust implementation
  6. Start to modify the Rust implementation to match the expected behavior
  7. Add a test to make sure that we don't regress (our test suite is super quick)

Commit messages

To help the project maintainers review pull requests from contributors across numerous utilities, the team has settled on conventions for commit messages.

From https://git-scm.com/book/ch5-2.html:

Capitalized, short (50 chars or less) summary

More detailed explanatory text, if necessary.  Wrap it to about 72
characters or so.  In some contexts, the first line is treated as the
subject of an email and the rest of the text as the body.  The blank
line separating the summary from the body is critical (unless you omit
the body entirely); tools like rebase will confuse you if you run the
two together.

Write your commit message in the imperative: "Fix bug" and not "Fixed bug"
or "Fixes bug."  This convention matches up with commit messages generated
by commands like git merge and git revert.

Further paragraphs come after blank lines.

  - Bullet points are okay, too

  - Typically a hyphen or asterisk is used for the bullet, followed by a
    single space, with blank lines in between, but conventions vary here

  - Use a hanging indent

Furthermore, here are a few examples for a summary line:

  • commit for a single utility
nohup: cleanup and refactor
  • commit for a utility's tests
tests/rm: test new feature

Beyond changes to an individual utility or its tests, other summary lines for non-utility modules include:

README: add help
uucore: add new modules
uutils: add new utility
gitignore: add temporary files

Code coverage

To generate code coverage report locally please follow Code coverage report section of DEVELOPMENT.md

Other implementations

The Coreutils have different implementations, with different levels of completions:

However, when reimplementing the tools/options in Rust, don't read their source codes when they are using reciprocal licenses (ex: GNU GPL, GNU LGPL, etc).

Licensing

uutils is distributed under the terms of the MIT License; see the LICENSE file for details. This is a permissive license, which allows the software to be used with few restrictions.

Copyrights in the uutils project are retained by their contributors, and no copyright assignment is required to contribute.

If you wish to add or change dependencies as part of a contribution to the project, a tool like cargo-license can be used to show their license details. The following types of license are acceptable:

  • MIT License
  • Dual- or tri-license with an MIT License option ("Apache-2.0 or MIT" is a popular combination)
  • "MIT equivalent" license (2-clause BSD, 3-clause BSD, ISC)
  • License less restrictive than the MIT License (CC0 1.0 Universal)
  • Apache License version 2.0

Licenses we will not use:

  • An ambiguous license, or no license
  • Strongly reciprocal licenses (GNU GPL, GNU LGPL)

If you wish to add a reference but it doesn't meet these requirements, please raise an issue to describe the dependency.