Improve Cargo.toml discovery algorithm

Our current Cargo.toml discovery algorithm is based on reading the main
Cargo.toml and looking at its `workspace.members` - unfortunately, as it
turns out there's actually no requirement for all workspace crates to be
listed there, and some applications do fancy things like:

```toml
[workspace]
members = [ "crates/foo" ]

[dependencies]
foo = { path = "crates/foo" }
bar = { path = "crates/bar" } # whoopsie
```

... which doesn't play nice with Naersk's incremental builds (that is,
the compilation fails unless `singleStep = true;` is turned on).

This commit changes the logic to use a different approach: now we simply
recursively scan the root directory, noting down all the Cargo.tomls we
find; Crane seems to use the same algorithm.

I think the only case where this approach falls short are spurious
Cargo.tomls that are present somewhere in the source directory but are
actually unused - for instance, imagine I'm writing a Cargo clone where
I've got files like `src/tests/broken-manifest/Cargo.toml` - in cases
like these, the current approach will cause the build to fail, because
we will try to fixup a broken Cargo.toml, thinking it's used somewhere.

We could try to handle cases like these by using an even different
approach: instead of traversing the source tree, we could load the main
Cargo.toml and traverse its `workspace.members` + all path-based
dependencies recursively, noting down all the *explicitly present*
Cargo.tomls.

That is, given a top-level manifest of:

```
[workspace]
members = [ "crates/foo" ]

[dependencies]
foo = { path = "crates/foo" }
bar = { path = "crates/bar" }
```

... we'd note down and load `crates/foo/Cargo.toml` +
`crates/bar/Cargo.toml`, and then load _their_ path-based dependencies,
to find all transitive Cargo.tomls that belong to this workspace.

(that is, we have to scan the crates recursively, because `bar` might
depend on `crates/zar` - and that's not an imaginary edge case, that's
actually what happens inside Nushell, for instance.)

I don't have enough time to implement this more extensive approach now,
and IMO the improvement presented here is still an improvement, so I
think it's safe to go with the new-but-still-subpar approach here and
consider the better algorithm in the future :-)

# Testing

All tests pass, I've also made sure that incremental builds are still
incremental.

Closes https://github.com/nix-community/naersk/issues/274
This commit is contained in:
Patryk Wychowaniec 2023-02-24 21:00:26 +01:00
parent d998160d6a
commit 2b41e666c1
6 changed files with 91 additions and 153 deletions

View file

@ -213,7 +213,6 @@ process, rest is passed-through into `mkDerivation`.
| `override` | An override for all derivations involved in the build. Default: `(x: x)` |
| `overrideMain` | An override for the top-level (last, main) derivation. If both `override` and `overrideMain` are specified, _both_ will be applied to the top-level derivation. Default: `(x: x)` |
| `singleStep` | When true, no intermediary (dependency-only) build is run. Enabling `singleStep` greatly reduces the incrementality of the builds. Default: `false` |
| `targets` | The targets to build if the `Cargo.toml` is a virtual manifest. |
| `copyBins` | When true, the resulting binaries are copied to `$out/bin`. <br/> Note: this relies on cargo's `--message-format` argument, set in the default `cargoBuildOptions`. Default: `true` |
| `copyLibs` | When true, the resulting binaries are copied to `$out/lib`. <br/> Note: this relies on cargo's `--message-format` argument, set in the default `cargoBuildOptions`. Default: `false` |
| `copyBinsFilter` | A [`jq`](https://stedolan.github.io/jq) filter for selecting which build artifacts to release. This is run on cargo's [`--message-format`](https://doc.rust-lang.org/cargo/reference/external-tools.html#json-messages) JSON output. <br/> The value is written to the `cargo_bins_jq_filter` variable. Default: `''select(.reason == "compiler-artifact" and .executable != null and .profile.test == false)''` |

View file

@ -116,9 +116,6 @@ let
# `singleStep` greatly reduces the incrementality of the builds.
singleStep = attrs0.singleStep or false;
# The targets to build if the `Cargo.toml` is a virtual manifest.
targets = attrs0.targets or null;
# When true, the resulting binaries are copied to `$out/bin`. <br/>
# Note: this relies on cargo's `--message-format` argument, set in the
# default `cargoBuildOptions`.
@ -277,77 +274,48 @@ let
buildPlanConfig = rec {
inherit userAttrs;
inherit (sr) src root;
# Whether we skip pre-building the deps
isSingleStep = attrs.singleStep;
inherit (attrs) overrideMain gitAllRefs gitSubmodules;
# The members we want to build
# (list of directory names)
wantedMembers =
lib.mapAttrsToList (member: _cargotoml: member) wantedMemberCargotomls;
isSingleStep = attrs.singleStep;
# Member path to cargotoml
# (attrset from directory name to Nix object)
wantedMemberCargotomls =
let
pred =
if ! isWorkspace
then (_member: _cargotoml: true)
else
if ! isNull attrs.targets
then (_member: cargotoml: lib.elem cargotoml.package.name attrs.targets)
else (member: _cargotoml: member != ".");
in
lib.filterAttrs pred cargotomls;
# All cargotomls, from path to nix object
# (attrset from directory name to Nix object)
# List of all the Cargo.tomls in the workspace.
#
# Note that the simplest thing here would be to read `workspace.members`,
# but somewhat unfortunately there's no requirement that all workspace
# crates should be listed there - for instance, some projects¹ do:
#
# ```
# [workspace]
# members = [ "crates/foo", "crates/bar" ]
#
# [dependencies]
# foo = { path = "crates/foo" }
# bar = { path = "crates/bar" }
# zar = { path = "crates/zar" }
# ```
#
# ... which Cargo allows and so should we.
#
# ¹ such as Nushell
cargotomls =
let
readTOML = builtinz.readTOML usePureFromTOML;
in
{ "." = toplevelCargotoml; } // lib.optionalAttrs isWorkspace
(
lib.listToAttrs
(
map
(
member:
{
name = member;
value = readTOML (root + "/${member}/Cargo.toml");
}
)
members
)
);
findCargoTomls = dir:
lib.mapAttrsToList
(name: type:
let
path = "${root}/${dir}/${name}";
# The cargo members
members =
let
# the members, as listed in the virtual manifest
listedMembers = toplevelCargotoml.workspace.members or [];
# this turns members like "foo/*" into [ "foo/bar" "foo/baz" ]
# as in https://github.com/rust-analyzer/rust-analyzer/blob/b2ed130ffd9c79de26249a1dfb2a8312d6af12b3/Cargo.toml#L2
expandMember = member:
if (lib.hasSuffix "/*" member) || (lib.hasSuffix "/*/" member)
then
let
rootDir = lib.replaceStrings [ "/*/" "/*" ] [ "" "" ] member;
subdirs = (
builtins.attrNames (
lib.filterAttrs
(name: type: type == "directory")
(builtins.readDir (root + "/${rootDir}"))
)
);
in map (subdir: "${rootDir}/${subdir}") subdirs
else [ member ];
in
if type == "regular" && name == "Cargo.toml" then
[{ name = dir; toml = readTOML path; }]
else if type == "directory" then
findCargoTomls "${dir}/${name}"
else
[])
(builtins.readDir "${root}/${dir}");
in
lib.unique (lib.concatMap expandMember listedMembers);
lib.flatten (findCargoTomls ".");
# If `copySourcesFrom` is set, then it looks like the benefits brought by
# two-step caching break, for unclear reasons as of now. As such, do not set
@ -379,13 +347,13 @@ let
toplevelCargotoml = readTOML (root + "/Cargo.toml");
# The cargo lock
cargolock =
cargolock =
let
cargolock-file = root + "/Cargo.lock";
in
if builtins.pathExists cargolock-file then
readTOML (cargolock-file)
else
else
throw "Naersk requires Cargo.lock to be available in root. Check that it is not in .gitignore and stage it when using git to filter sources (which flakes does)";
packageName =

17
lib.nix
View file

@ -121,15 +121,16 @@ rec
# A very minimal 'src' which makes cargo happy nonetheless
dummySrc =
{ cargoconfig # string
, cargotomls # attrset
, cargolock # attrset
{ cargoconfig # string
, cargotomls # list
, cargolock # attrset
, copySources # list of paths that should be copied to the output
, copySourcesFrom # path from which to copy ${copySources}
}:
let
config = writeText "config" cargoconfig;
cargolock' = builtinz.writeTOML "Cargo.lock" cargolock;
fixupCargoToml = cargotoml:
let
attrs =
@ -143,10 +144,9 @@ rec
package = removeAttrs attrs.package [ "build" ];
};
# a list of tuples from member to cargo toml:
# "foo-member:/path/to/toml bar:/path/to/other-toml"
cargotomlss = lib.mapAttrsToList
(k: v: "${k}:${builtinz.writeTOML "Cargo.toml" (fixupCargoToml v)}")
cargotomlss = map
({ name, toml }:
"${name}:${builtinz.writeTOML "Cargo.toml" (fixupCargoToml toml)}")
cargotomls;
in
@ -166,11 +166,12 @@ rec
final_path="$final_dir/Cargo.toml"
cp $cargotoml "$final_path"
# make sure cargo is happy
pushd $out/$member > /dev/null
mkdir -p src
# Avoid accidentally pulling `std` for no-std crates.
echo '#![no_std]' >src/lib.rs
# pretend there's a `build.rs`, otherwise cargo doesn't build
# the `[build-dependencies]`. Custom locations of build scripts
# aren't an issue because we strip the `build` field in

View file

@ -11,18 +11,6 @@
"url": "https://github.com/ninegua/agent-rs/archive/4a22e590516bc79ec3c75a320f7941e7762ea098.tar.gz",
"url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
},
"cargo": {
"branch": "nm/canonicalize-path-args",
"description": "The Rust package manager",
"homepage": "https://doc.rust-lang.org/cargo",
"owner": "nmattia",
"repo": "cargo",
"rev": "979904e092b39bb6aafe62a3ff3d994d3d8b6247",
"sha256": "078hhwii2qhdsm179xl249p0c2445yiyl5dv48kvx2372mc0g2gd",
"type": "tarball",
"url": "https://github.com/nmattia/cargo/archive/979904e092b39bb6aafe62a3ff3d994d3d8b6247.tar.gz",
"url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
},
"lorri": {
"branch": "master",
"description": "foo",
@ -35,18 +23,6 @@
"url": "https://github.com/target/lorri/archive/ba299dde70af5cb8bffbce3eb770ec6cfb2fd1aa.tar.gz",
"url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
},
"lucet": {
"branch": "master",
"description": "Lucet, the Sandboxing WebAssembly Compiler.",
"homepage": "",
"owner": "fastly",
"repo": "lucet",
"rev": "8b20e9527aabd27e8360bd2aac763cedf4a96a8b",
"sha256": "0wpfj1qyqbkym83yvlg3xmj8vlf1g7r0azmif51m58k3qqm86fca",
"type": "tarball",
"url": "https://github.com/fastly/lucet/archive/8b20e9527aabd27e8360bd2aac763cedf4a96a8b.tar.gz",
"url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
},
"niv": {
"branch": "master",
"description": "Easy dependency management for Nix projects",
@ -107,28 +83,28 @@
"url": "https://github.com/NixOS/nixpkgs/archive/432864f33c84af53192d4b23ee5871697e57f094.tar.gz",
"url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
},
"noria": {
"nixpkgs-mozilla": {
"branch": "master",
"description": "Dynamically changing, partially-stateful data-flow for web application backends.",
"description": "Mozilla overlay for Nixpkgs.",
"homepage": "",
"owner": "mit-pdos",
"repo": "noria",
"rev": "e1dd3d4bdeda53ff72f8ee0b8deaa923afe4f67a",
"sha256": "1x3lx58qhgy8iw3ykd3cwia5bnfflgxpf88ssi2p631kcail5m6z",
"owner": "mozilla",
"repo": "nixpkgs-mozilla",
"rev": "85eb0ba7d8e5d6d4b79e5b0180aadbdd25d76404",
"sha256": "15a7zd7nrnfgjzs8gq2cpkxg7l3c38jradkxxyaf136kkqhlc0k4",
"type": "tarball",
"url": "https://github.com/mit-pdos/noria/archive/e1dd3d4bdeda53ff72f8ee0b8deaa923afe4f67a.tar.gz",
"url": "https://github.com/mozilla/nixpkgs-mozilla/archive/85eb0ba7d8e5d6d4b79e5b0180aadbdd25d76404.tar.gz",
"url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
},
"ripgrep": {
"branch": "master",
"description": "ripgrep recursively searches directories for a regex pattern",
"homepage": "",
"owner": "BurntSushi",
"repo": "ripgrep",
"rev": "d1389db2e39802d5e04dc7b902fd2b1f9f615b01",
"sha256": "11cflb47pjyb5kx8n2a4w34y47z8dw935i7v11g3ynzhq4apsavi",
"nushell": {
"branch": "main",
"description": "A new type of shell",
"homepage": "https://www.nushell.sh/",
"owner": "nushell",
"repo": "nushell",
"rev": "85bfdca578157072e51e6972d370cfe63b0fda77",
"sha256": "0virlm3wpzsavdlci725iw0f8amr6ijxg9p2xp2q41zngjgb3w98",
"type": "tarball",
"url": "https://github.com/BurntSushi/ripgrep/archive/d1389db2e39802d5e04dc7b902fd2b1f9f615b01.tar.gz",
"url": "https://github.com/nushell/nushell/archive/85bfdca578157072e51e6972d370cfe63b0fda77.tar.gz",
"url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
},
"ripgrep-all": {
@ -143,30 +119,6 @@
"url": "https://github.com/phiresky/ripgrep-all/archive/v0.9.6.tar.gz",
"url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
},
"rust": {
"branch": "master",
"description": "Empowering everyone to build reliable and efficient software.",
"homepage": "https://www.rust-lang.org",
"owner": "rust-lang",
"repo": "rust",
"rev": "8ebd67e4ee394cad9441a801f2022724ae7e07db",
"sha256": "10r5q5vgpmqkgq942qzajfgz5fwm3m9jbg8h22ykb62nsi35wv02",
"type": "tarball",
"url": "https://github.com/rust-lang/rust/archive/8ebd67e4ee394cad9441a801f2022724ae7e07db.tar.gz",
"url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
},
"rustfmt": {
"branch": "master",
"description": "Format Rust code",
"homepage": "",
"owner": "rust-lang",
"repo": "rustfmt",
"rev": "1d19a08ed4743e3c95176fb639ebcd50f68a3313",
"sha256": "1cschys4igyraqqw1gq3279bavvsjy6hx93is5myr70kx998x48k",
"type": "tarball",
"url": "https://github.com/rust-lang/rustfmt/archive/1d19a08ed4743e3c95176fb639ebcd50f68a3313.tar.gz",
"url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
},
"rustlings": {
"branch": "master",
"description": "Small exercises to get you used to reading and writing Rust code!",
@ -179,18 +131,6 @@
"url": "https://github.com/rust-lang/rustlings/archive/b8d59d699bed76b6be95d1ed41881d00d4b0d533.tar.gz",
"url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
},
"servo": {
"branch": "master",
"description": "The Servo Browser Engine",
"homepage": "https://servo.org/",
"owner": "servo",
"repo": "servo",
"rev": "a2b76b0169bbc7691fe2dc173ce7948ee1aca876",
"sha256": "0a9p5hd8aq07ygdgjmhlmy7xajaz6ahnj38c2xipbhzyp6wfmd4y",
"type": "tarball",
"url": "https://github.com/servo/servo/archive/a2b76b0169bbc7691fe2dc173ce7948ee1aca876.tar.gz",
"url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
},
"talent-plan": {
"branch": "master",
"description": "PingCAP training courses",

View file

@ -1,6 +1,7 @@
args: {
agent-rs = import ./agent-rs args;
lorri = import ./lorri args;
nushell = import ./nushell args;
ripgrep-all = import ./ripgrep-all args;
rustlings = import ./rustlings args;
talent-plan = import ./talent-plan args;

View file

@ -0,0 +1,29 @@
{ sources, ... }:
let
pkgs = import sources.nixpkgs {
overlays = [
(import sources.nixpkgs-mozilla)
];
};
toolchain = (pkgs.rustChannelOf {
rustToolchain = "${sources.nushell}/rust-toolchain.toml";
sha256 = "sha256-Zk2rxv6vwKFkTTidgjPm6gDsseVmmljVt201H7zuDkk=";
}).rust;
naersk = pkgs.callPackage ../../../default.nix {
cargo = toolchain;
rustc = toolchain;
};
app = naersk.buildPackage {
src = sources.nushell;
nativeBuildInputs = with pkgs; [ pkg-config ];
buildInputs = with pkgs; [ openssl ];
};
in
pkgs.runCommand "nushell-test"
{
buildInputs = [ app ];
} "nu -c 'echo yes!' && touch $out"