Start developer documentation (#746)

* draft outline for developing docs Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* update outline Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* list testing dependencies Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* fix header indention Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
* fix title Signed-off-by: Alex Goodman <alex.goodman@anchore.com>
2024-11-10 06:14:16 +00:00 · 2022-01-14 06:17:25 -05:00 · 2022-01-14 06:17:25 -05:00 · 86c3c1c531
commit 86c3c1c531
parent f59af255e3
1 changed files with 179 additions and 0 deletions
--- a/DEVELOPING.md
+++ b/DEVELOPING.md
@@ -0,0 +1,179 @@
+# Developing
+
+## Getting started
+
+In order to test and develop in this repo you will need the following dependencies installed:
+- docker
+- make
+
+After cloning do the following:
+1. run `make bootstrap` to download go mod dependencies, create the `/.tmp` dir, and download helper utilities.
+2. run `make` to run linting, tests, and other verifications to make certain everything is working alright.
+
+Check out `make help` to see what other actions you can take.
+
+The main make tasks for common static analysis and testing are `lint`, `lint-fix`, `unit`, `integration`, and `cli`.
+
+## Levels of testing
+
+- `unit`: The default level of testing, distributed throughout the repo. Any `_test.go` file that
+  does not reside somewhere within the `/test` directory is a unit test. Other forms of testing should be organized in
+  the `/test` directory. These tests should focus on correctness of functionality in depth. Test coverage percentage
+  metrics only consider unit tests and no other forms of testing.
+
+- `integration`: located within `test/integration`, these tests focus on the behavior surfaced by the common library
+  entrypoints from the `syft` package and make light assertions about the results surfaced. Additionally, these tests
+  tend to make diversity assertions for enum-like objects, ensuring that when an enum value is added to a definition,
+  integration tests automatically fail if no test attempts to use that enum value. For more details see
+  the "Data diversity and freshness assertions" section below.
+
+- `cli`: located within `test/cli`, these tests verify the correctness of application behavior from a
+  snapshot build. This should be used in cases where a unit or integration test will not do or if you are looking
+  for in-depth testing of code in the `cmd/` package (such as testing the proper behavior of application configuration,
+  CLI switches, and glue code before syft library calls).
+
+- `acceptance`: located within `test/acceptance`, these are smoke-like tests that ensure that application packaging
+  and installation works as expected. For example, during release we provide RPM packages as a download artifact. We 
+  also have an accompanying RPM acceptance test that installs the RPM from a snapshot build and ensures the output
+  of a syft invocation matches canned expected output. New acceptance tests should be added for each release artifact
+  and architecture supported (when possible).
+
+### Data diversity and freshness assertions
+
+It is important that tests against the codebase are flexible enough to begin failing when they do not cover "enough"
+of the objects under test. "Cover" in this case does not mean that some percentage of the code has been executed 
+during testing, but instead that there is enough diversity of data input reflected in testing relative to the
+definitions available.
+
+For instance, consider an enum-like type such as:
+```go
+type Language string
+
+const (
+  Java            Language = "java"
+  JavaScript      Language = "javascript"
+  Python          Language = "python"
+  Ruby            Language = "ruby"
+  Go              Language = "go"
+)
+```
+
+Say we have a test that exercises all the languages defined today:
+
+```go
+func TestCatalogPackages(t *testing.T) {
+  cases := []struct {
+    name             string
+    inputFixturePath string
+  }{
+    // ... the set of test cases that test all languages
+  }
+  for _, test := range cases {
+    t.Run(test.name, func(t *testing.T) {
+      // use inputFixturePath and assert that syft.CatalogPackages() returns the set of expected Package objects
+      // ...
+    })
+  }
+}
+
+Here each test case has an `inputFixturePath` that yields packages from each language. This test is
+brittle since it does not directly assert that all languages were exercised, and future modifications (such as
+adding a new language) won't be covered by any test cases.
+
+To address this, the enum-like object should be accompanied by a list of all of its values that tests can reference:
+
+```go
+type Language string
+
+// const( Java Language = ..., ... )
+
+var AllLanguages = []Language{
+	Java,
+	JavaScript,
+	Python,
+	Ruby,
+	Go,
+	Rust, // a newly added language; assumes a matching Rust constant has been defined
+}
+```
+
+This allows tests to fail automatically when a new language is added:
+
+```go
+func TestCatalogPackages(t *testing.T) {
+  cases := []struct {
+    name             string
+    inputFixturePath string
+  }{
+    // ... the set of test cases that (hopefully) covers all languages
+  }
+
+  // new stuff...
+  // (assumes strset is github.com/scylladb/go-set/strset, where the membership check is Has)
+  observedLanguages := strset.New()
+
+  for _, test := range cases {
+    t.Run(test.name, func(t *testing.T) {
+      // use inputFixturePath and assert that syft.CatalogPackages() returns the set of expected Package objects
+      // ...
+
+      // new stuff... ("actual" is the set of packages returned by the call above)
+      for _, actualPkg := range actual {
+        observedLanguages.Add(string(actualPkg.Language))
+      }
+    })
+  }
+
+  // new stuff...
+  for _, expectedLanguage := range pkg.AllLanguages {
+    if !observedLanguages.Has(string(expectedLanguage)) {
+      t.Errorf("failed to test language=%q", expectedLanguage)
+    }
+  }
+}
+```
+
+This is a better test since it will fail when someone adds a new language but fails to write a test case that should
+exercise that new language. This method is ideal for integration-level testing, where testing correctness in depth 
+is not needed (that is what unit tests are for) but instead testing in breadth to ensure that units are well integrated.
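
The final check in that test can also be factored into a small helper. The following is a minimal, self-contained sketch that uses a plain map instead of `strset`. The `unobserved` helper and the trimmed-down `AllLanguages` list are illustrative assumptions, not code from this repo.

```go
package main

import "fmt"

// Language mirrors the enum-like type from the example above.
type Language string

const (
	Java   Language = "java"
	Python Language = "python"
	Go     Language = "go"
)

// AllLanguages is the authoritative list that tests compare against
// (shortened here for illustration).
var AllLanguages = []Language{Java, Python, Go}

// unobserved is a hypothetical helper: it returns every value in all
// that was never recorded in observed, preserving the order of all.
func unobserved(all []Language, observed map[Language]bool) []Language {
	var missing []Language
	for _, l := range all {
		if !observed[l] {
			missing = append(missing, l)
		}
	}
	return missing
}

func main() {
	// Pretend the test cases only produced Java and Python packages.
	observed := map[Language]bool{Java: true, Python: true}
	for _, l := range unobserved(AllLanguages, observed) {
		fmt.Printf("failed to test language=%q\n", l)
	}
}
```

In a real test the loop in `main` would call `t.Errorf` instead of printing, so that a language missing from the test cases fails the run.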
+
+A similar case can be made for data freshness: if the quality of the results diminishes when the input data
+is not kept up to date, then a test should be written (when possible) to assert that the input data is not stale.
+
+An example of this is the static list of licenses that is stored in `internal/spdxlicense` for use by the SPDX 
+presenters. This list is updated and published periodically by an external group and syft can grab and update this
+list by running `go generate ./...` from the root of the repo.
+
+An integration test has been written that grabs the latest license list version externally and compares that version
+with the version generated in the codebase. If they differ, the test fails, signaling that the list needs to be
+updated.
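
A freshness assertion of this kind might be sketched as follows. This is not the actual syft test: the `checkFreshness` function, the `latestVersionFunc` type, and the version strings are all hypothetical, and the network fetch is replaced with a stub so the example runs offline.

```go
package main

import "fmt"

// embeddedVersion stands in for the version recorded in the generated
// code (hypothetical value, for illustration only).
const embeddedVersion = "3.15"

// latestVersionFunc returns the most recently published version. A real
// test would perform a network request here (hypothetical signature).
type latestVersionFunc func() (string, error)

// checkFreshness returns an error when the embedded version differs from
// the latest published version, or when the latest version cannot be fetched.
func checkFreshness(embedded string, latest latestVersionFunc) error {
	v, err := latest()
	if err != nil {
		return fmt.Errorf("unable to fetch latest version: %w", err)
	}
	if v != embedded {
		return fmt.Errorf("stale data: embedded=%q latest=%q", embedded, v)
	}
	return nil
}

func main() {
	// Stubs that stand in for the external lookup.
	same := func() (string, error) { return "3.15", nil }
	newer := func() (string, error) { return "3.16", nil }

	fmt.Println("up to date:", checkFreshness(embeddedVersion, same) == nil)
	fmt.Println("stale:", checkFreshness(embeddedVersion, newer) != nil)
}
```

Keeping the fetch behind a function value lets the comparison logic be exercised without network access, while the integration test supplies the real lookup.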
+
+**_The key takeaway is to try and write tests that fail when data assumptions change and not just when code changes._**
+
+### Snapshot tests
+
+The format objects make a lot of use of "snapshot" testing, where you save the expected output bytes from a call into the
+git repository and during testing make a comparison of the actual bytes from the subject under test with the golden
+copy saved in the repo. The "golden" files are stored in the `test-fixtures/snapshot` directory relative to the go 
+package under test and should always be updated by invoking `go test` on the specific test file with a specific CLI 
+update flag provided.
+
+Many of the `Format` tests make use of this approach, where the raw SBOM report is saved in the repo and the test 
+compares that SBOM with what is generated from the latest presenter code. For instance, at the time of this writing 
+the CycloneDX presenter snapshots can be updated by running:
+
+```bash
+go test ./internal/formats -update-cyclonedx
+```
+
+These flags are defined at the top of the test files that use the snapshot files.
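
The compare-or-update logic behind such a flag can be sketched roughly like this. It is a generic illustration of golden-file testing, not the helper used in this repo: `checkSnapshot`, the file names, and the `update` parameter are assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// checkSnapshot compares actual against the golden file at path. When
// update is true the golden file is (re)written instead of compared.
// (Hypothetical helper, for illustration only.)
func checkSnapshot(path string, actual []byte, update bool) error {
	if update {
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return err
		}
		return os.WriteFile(path, actual, 0o644)
	}
	expected, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("unable to read golden file: %w", err)
	}
	if !bytes.Equal(expected, actual) {
		return fmt.Errorf("output differs from golden file %q", path)
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "snapshot-example")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	golden := filepath.Join(dir, "test-fixtures", "snapshot", "report.golden")

	// Write the snapshot, as if an update flag had been passed to go test.
	fmt.Println("update error:", checkSnapshot(golden, []byte("v1"), true))
	// Unchanged output matches the saved snapshot.
	fmt.Println("match error:", checkSnapshot(golden, []byte("v1"), false))
	// Changed output is reported as a difference.
	fmt.Println("mismatch detected:", checkSnapshot(golden, []byte("v2"), false) != nil)
}
```

In a `_test.go` file, the `update` argument would typically come from a package-level flag such as `var update = flag.Bool("update", false, "update golden files")`. The flag name here is illustrative.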
+
+Snapshot testing is only as good as the manual verification of the golden snapshot file saved to the repo! Be careful 
+and diligent when updating these files.
+
+## Architecture
+
+TODO: outline:
+- analysis creates a static SBOM which can be encoded and decoded.
+- format objects should strive to not add or enrich data in encoding that could otherwise be done during analysis
+- pkg.Catalogers
+- file catalogers
+- source.Source
+- file.Resolvers
+- logger abstraction 
+- events / bus abstraction