* Add flag to write job reports to disk
* Fix nil pointer / non-nil interface bug
* Synchronize job report writer goroutine
* Log when the report has been written
* Implement SourceUnitEnumChunker for GitLab
* Add GitLab engine integration test
* Use a SliceReporter instead of checking for nil reporters
* Use more generic VisitorReporter
* Merge logic from getReposFromGitlab into getAllProjectRepos
* Update integration test to have a lower bound
Unfortunately, the GitLab integration test does not appear to be
deterministic. Sometimes 36390 chunks are found, sometimes 36312, or
even lower.
* Refactor UnitHook to block the scan if finished metrics aren't handled
* Log once when back-pressure is detected
* Add hook channel size metric
* Use plural "metrics" for consistency
* Replace LRU cache with map
* use diff chan
* correctly use the buffered file writer
* use value from source
* reorder fields
* add tests and update
* Fix issue with buffer slices growing
* fix test
* correctly use the buffered file writer
* use value from source
* reorder fields
* fix
* add singleton
* use shared pool
* optimize
* rename and cleanup
* add metrics
* add print
* rebase
* remove extra inc
* add metrics for checkout time
* add comment
* use microseconds
* add metrics
* add metrics pkg
* add more metrics
* revert test
* remove fields
* fix
* resize and return
* update metric name
* remove comment
* address comments
* add comment
This is a follow-up to #1912, which used the headers from the response to determine rate-limiting information instead of the values from RateLimitError.Rate. Although that logic seemed solid, I discovered that it did not work in some circumstances. This led to the "unexpected" path being taken more often than intended, and to periodic instances where requests were made before the rate limit had refreshed.
* correctly use the buffered file writer
* use value from source
* reorder fields
* use only the DetectorKey as a map field
* correctly use the buffered file writer
* use value from source
* reorder fields
* add tests and update
* Fix issue with buffer slices growing
* fix test
* fix
* add singleton
* use shared pool
* optimize
* rename and cleanup
* use correct calculation to grow buffer
* only grow if needed
* address comments
* remove unused
* remove
* rip out Grow
* address comment
* use 2k default buffer
* update comment; allow large buffers to be garbage collected
Waiting for the sub-command will block until all of `stdout` has been
read. In some cases, we return early due to failed chunking without
reading all of the data, and thus, get stuck waiting for the command to
finish. Closing the pipe will ensure `Wait` does not block on that I/O.
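
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: `readFirstLine` is a hypothetical helper, and the use of the `yes` command assumes a POSIX-like system where it is available. The helper reads only part of the sub-command's output, then closes the read end of the pipe before calling `Wait`.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
)

// readFirstLine (hypothetical helper) starts a command, reads only the first
// line of its stdout, and closes the pipe before waiting for the command.
func readFirstLine(name string, args ...string) (string, error) {
	cmd := exec.Command(name, args...)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return "", err
	}
	if err := cmd.Start(); err != nil {
		return "", err
	}

	var line string
	scanner := bufio.NewScanner(stdout)
	if scanner.Scan() {
		line = scanner.Text()
	}

	// We are returning early without draining stdout. Closing the read end
	// means the child's next write fails, so it cannot stay blocked on a
	// full pipe while we wait for it.
	_ = stdout.Close()

	// The child is expected to exit with an error (for example a broken
	// pipe), so the result of Wait is deliberately ignored in this sketch.
	_ = cmd.Wait()
	return line, nil
}

func main() {
	// `yes` writes output forever, so Wait would never return if the pipe
	// were left open and unread.
	line, err := readFirstLine("yes")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(line)
}
```

In real code the error from `Wait` would usually be inspected or logged rather than discarded; it is ignored here only to keep the early-return path short.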
* correctly use the buffered file writer
* use value from source
* reorder fields
* use only the DetectorKey as a map field
* address comments and use factory function
* fix optional params
* remove commented out code
* draft reverify chunks
* remove
* remove
* reduce dupe map cap
* do not verify chunk
* cli arg and use val for dupe lut
* remove counter
* skip empty results
* working on test and normalizing val for comparison
* forgot to save file
* optimize normalize
* reuse map
* remove print
* use levenshtein distance to check dupes
* forgot to leave in emptying map
* use slice
* small tweak
* comment
* use bytes
* praise
* use ctx logger
* add len check
* add comments
* use 8x concurrency for reverifier workers
* revert worker count
* use more workers
* process result directly for any collisions
* continue after decoder match for reverifying
* use map
* use map
* optimization and fix the bug
* revert worker count
* better option naming
* handle identical secrets in chunks
* update comment
* update comment
* fix test
* use DetectorKey
* rm out of scope tests and testdata
* rename all reverification elements
* don't re-write map entry
* use correct key
* rename worker, remove log val
* test likelydupe, add eq detector check in loop
* add test
* add comment
* add test
* Set verification error
* Update tests
---------
Co-authored-by: Zachary Rice <zachary.rice@trufflesec.com>
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
* Write large diffs to tmp files
* address comments
* Move bufferedfilewriter to own pkg
* update test
* swallow write err
* use buffer pool
* use size vs len
* use interface
* fix test
* update comments
* fix test
* remove unused
* remove
* remove unused
* move parser and commit struct closer to where they are used
* linter change
* add more kvp pairs to error
* fix test
* update
* address comments
* remove bufferedfile writer
* address comments
* adjust interface
* fix finalize
* address comments
* lint
* remove guard
* fix
* add TODO
* updating alibaba
* updating agora
* updating aeroworkflow
* updating aha
* updating artifactory
* updating abbysale
* updating abstract
* updating abuseipdb
* updating accuweather
* updating adafruitio
* updating adzuna
* cleanup on abuseipdb
* cleanup on aha
* cleanup on abuseipdb
* cleanup on aeroworkflow
* cleanup on adzuna
* cleanup on accuweather
* cleanup/refactor
* update token pattern to be explicitly 73char (old) or 64char (new)
* comment to clarify 403 on Aha
* mocking out verified case for aha + adding inactive account test
* using contact response instead of gock
* update 403 to be determinate
* added azurefunctionkey detector
* update raw field to include url
* clean up and added prefix on key pattern
* update bench script
* update imports, snifftest, and gen proto
---------
Co-authored-by: Dustin Decker <dustin@trufflesec.com>