This PR modifies the GitLab source:
* emits a new "groups enumerated" metric
* logs more information about group enumeration
* emits the repo enumeration metric inside getAllProjectRepos, which means it will work when units are flipped on
* emits the repo enumeration metric more granularly
This is a follow-up to #2379.
It fixes the following issues:
* GitHub API calls missing rate-limit handling
* The fix for "Refactor GitHub source" #2379 (comment) inadvertently resulting in duplicate API calls
* MaxMind detector uses the right endpoint
The endpoint that the detector currently uses fails to validate some license keys, because not every license key has permission to use the GeoIP API. This commit makes the detector use the license key validation endpoint instead: https://dev.maxmind.com/license-key-validation-api
* Remove RawV2
* Remove trimspace and extra if branch
* Add the proper tests
* Use SetVerificationError
* Add SetVerificationError
* update tests
---------
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
* pull out verification logic from github detectors
* deduplicate verify github logic
* pull out nil check
* return nil instead of empty struct
* skip gh old test bc we can't make new tokens
This is a follow-up to #2107 and #2335. It adds a new (hidden) --results flag that allows a user to show any combination of verified, unverified, and indeterminate secrets.
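A minimal sketch of how such a flag value could be parsed, assuming a comma-separated format. The `resultFilter` type, the `parseResults` helper, and the accepted names (`verified`, `unverified`, `unknown`) are hypothetical and are not taken from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// resultFilter records which result categories the user asked to see.
// Hypothetical type for illustration only.
type resultFilter struct {
	Verified, Unverified, Unknown bool
}

// parseResults turns a comma-separated value such as "verified,unknown"
// into a resultFilter. The accepted names are assumptions of this sketch.
// An empty string is rejected because it names no category.
func parseResults(s string) (resultFilter, error) {
	var f resultFilter
	for _, part := range strings.Split(s, ",") {
		switch strings.ToLower(strings.TrimSpace(part)) {
		case "verified":
			f.Verified = true
		case "unverified":
			f.Unverified = true
		case "unknown":
			f.Unknown = true
		default:
			return resultFilter{}, fmt.Errorf("invalid result type %q", part)
		}
	}
	return f, nil
}
```

With this sketch, `parseResults("verified,unknown")` selects verified and indeterminate results and leaves unverified ones out.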
* verify canaries against SNS; get ARN
* clean comments
* Update tests and logic
* added test for invalid canary secret
* added verify logic for canaries
* go mod tidy
---------
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
When we fail to clone a git repository we log the command output to help with diagnosis. However, this output can include credentials in certain cases (such as certain errors associated with redirects). We don't want to log credentials when this happens.
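One way to scrub such output before logging it is sketched below. It only handles credentials embedded in the userinfo part of an http(s) URL; the `redactCredentials` name and the regular expression are assumptions of this sketch, not the code in this change.

```go
package main

import "regexp"

// userinfoRe matches the scheme plus the "user:password@" (or "token@")
// part of an http(s) URL. Hypothetical pattern for illustration.
var userinfoRe = regexp.MustCompile(`(https?://)[^/\s@]+@`)

// redactCredentials replaces any URL userinfo found in command output
// with a fixed placeholder so that it is safe to log.
func redactCredentials(output string) string {
	return userinfoRe.ReplaceAllString(output, "${1}REDACTED@")
}
```

URLs without userinfo are left unchanged, so the rest of the diagnostic output stays readable.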
This PR adds the ability to exclude buckets from S3 scans. The capability is pretty rudimentary right now, and does not support globbing. If both lists are specified, the source fails to initialize.
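The behaviour described above can be sketched roughly as follows, assuming that the two lists are a list of buckets to scan and a list of buckets to exclude. The `bucketsToScan` helper and its parameters are hypothetical.

```go
package main

import "errors"

// bucketsToScan decides which buckets to scan. Hypothetical helper:
// "all" is every bucket visible to the credentials, "include" and
// "exclude" are the two user-supplied lists. Matching is exact (no globbing).
func bucketsToScan(all, include, exclude []string) ([]string, error) {
	if len(include) > 0 && len(exclude) > 0 {
		return nil, errors.New("only one of include and exclude may be specified")
	}
	if len(include) > 0 {
		return include, nil
	}
	skip := make(map[string]struct{}, len(exclude))
	for _, b := range exclude {
		skip[b] = struct{}{}
	}
	var out []string
	for _, b := range all {
		if _, excluded := skip[b]; !excluded {
			out = append(out, b)
		}
	}
	return out, nil
}
```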
* JDBC test and parsing improvements
- Uses net/url for more robust URI parsing
- Supports common JDBC formats for MySQL
- Supports URI format for MSSQL
- Uses allowlist for params across all drivers
- Uses testcontainers-go for integration testing - much faster, more robust, no port collisions
- Uses gofakeit for random data (db, user, password) generation in integration tests
- Adds connection timeouts
- Use Microsoft's driver for MSSQL
* go mod tidy
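As a rough illustration of the net/url parsing and parameter allowlist mentioned in the list above, the sketch below strips the `jdbc:` prefix, parses the remainder as a URI, and drops query parameters that are not allowlisted. The `parseJDBC` helper and the allowlist contents are hypothetical; the real per-driver logic is more involved.

```go
package main

import (
	"errors"
	"net/url"
	"strings"
)

// parseJDBC parses a URI-style JDBC connection string and removes every
// query parameter that is not in the allowlist. Hypothetical helper.
func parseJDBC(conn string, allowed map[string]bool) (*url.URL, error) {
	if !strings.HasPrefix(conn, "jdbc:") {
		return nil, errors.New("missing jdbc: prefix")
	}
	u, err := url.Parse(strings.TrimPrefix(conn, "jdbc:"))
	if err != nil {
		// Do not include conn in the error: it may contain a password.
		return nil, errors.New("invalid JDBC URI")
	}
	q := u.Query()
	for name := range q {
		if !allowed[name] {
			q.Del(name)
		}
	}
	u.RawQuery = q.Encode()
	return u, nil
}
```

For `jdbc:mysql://root:secret@localhost:3306/app?tls=true&evil=1` with only `tls` allowlisted, the returned URL keeps the host, credentials, and database path, and its query is reduced to `tls=true`.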
* Add Display method to SourceUnit and Kind member to the CommonSourceUnit
* Make SourceUnitID return the ID and a kind
These two values together uniquely represent a unit.
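A sketch of what this could look like, to show why the pair is needed. The type and method names follow the commit titles above, but the exact signatures and the `unitKey` helper are assumptions.

```go
package main

import "fmt"

// SourceUnitKind names the type of a unit, for example "repo" or "dir".
type SourceUnitKind string

// SourceUnit is a sketch of the interface described above.
type SourceUnit interface {
	// SourceUnitID returns the ID together with the kind.
	SourceUnitID() (string, SourceUnitKind)
	// Display returns a human-readable name for the unit.
	Display() string
}

// CommonSourceUnit is a simple unit carrying an ID and a Kind.
type CommonSourceUnit struct {
	Kind SourceUnitKind
	ID   string
}

func (u CommonSourceUnit) SourceUnitID() (string, SourceUnitKind) { return u.ID, u.Kind }
func (u CommonSourceUnit) Display() string                        { return u.ID }

// unitKey is a hypothetical helper that combines both values into one key,
// so two units with the same ID but different kinds do not collide.
func unitKey(u SourceUnit) string {
	id, kind := u.SourceUnitID()
	return fmt.Sprintf("%s:%s", kind, id)
}
```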
* Add flag to write job reports to disk
* Fix nil pointer / non-nil interface bug
* Synchronize job report writer goroutine
* Log when the report has been written
* Implement SourceUnitEnumChunker for GitLab
* Add GitLab engine integration test
* Use a SliceReporter instead of checking for nil reporters
* Use more generic VisitorReporter
* Merge logic from getReposFromGitlab into getAllProjectRepos
* Update integration test to have a lower bound
Unfortunately, the GitLab integration test does not appear to be
deterministic. Sometimes 36390 chunks are found, sometimes 36312, or
even lower.
* Refactor UnitHook to block the scan if finished metrics aren't handled
* Log once when back-pressure is detected
* Add hook channel size metric
* Use plural "metrics" for consistency
* Replace LRU cache with map
* use diff chan
* correctly use the buffered file writer
* use value from source
* reorder fields
* add tests and update
* Fix issue with buffer slices growing
* fix test
* correctly use the buffered file writer
* use value from source
* reorder fields
* fix
* add singleton
* use shared pool
* optimize
* rename and cleanup
* add metrics
* add print
* rebase
* remove extra inc
* add metrics for checkout time
* add comment
* use microseconds
* add metrics
* add metrics pkg
* add more metrics
* revert test
* remove fields
* fix
* resize and return
* update metric name
* remove comment
* address comments
* add comment
This is a follow-up to #1912, which used the headers from the response to determine rate-limiting information, instead of using the values from RateLimitError.Rate. Although that logic seemed solid, I discovered that it did not work in some circumstances. This led to the "unexpected" path being taken more often than intended, and to periodic instances where requests were made before the rate limit had been refreshed.
* correctly use the buffered file writer
* use value from source
* reorder fields
* use only the DetectorKey as a map field
* correctly use the buffered file writer
* use value from source
* reorder fields
* add tests and update
* Fix issue with buffer slices growing
* fix test
* fix
* add singleton
* use shared pool
* optimize
* rename and cleanup
* use correct calculation to grow buffer
* only grow if needed
* address comments
* remove unused
* remove
* rip out Grow
* address comment
* use 2k default buffer
* update comment; allow large buffers to be garbage collected
Waiting for the sub-command will block until all of `stdout` has been
read. In some cases, we return early due to failed chunking without
reading all of the data, and thus, get stuck waiting for the command to
finish. Closing the pipe will ensure `Wait` does not block on that I/O.
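A minimal sketch of the pattern, assuming a Unix-like system. The `readPrefix` helper is hypothetical: it reads only the first `n` bytes of a command's output, then closes the pipe before calling `Wait`.

```go
package main

import (
	"io"
	"os/exec"
)

// readPrefix starts the command, reads only the first n bytes of its stdout,
// and then closes the pipe before waiting for the command to exit.
func readPrefix(n int, name string, args ...string) ([]byte, error) {
	cmd := exec.Command(name, args...)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}

	buf := make([]byte, n)
	_, readErr := io.ReadFull(stdout, buf)

	// Close the read end so a child that is still writing is not left
	// blocked on a full pipe; otherwise Wait could block forever.
	_ = stdout.Close()

	// The child may exit with a broken-pipe error here; this sketch
	// deliberately ignores that, since the early exit was intended.
	_ = cmd.Wait()

	if readErr != nil {
		return nil, readErr
	}
	return buf, nil
}
```

For example, `readPrefix(4, "yes")` returns after four bytes even though `yes` would otherwise write forever.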
* correctly use the buffered file writer
* use value from source
* reorder fields
* use only the DetectorKey as a map field
* address comments and use factory function
* fix optional params
* remove commented out code
* draft reverify chunks
* remove
* remove
* reduce dupe map cap
* do not verify chunk
* cli arg and use val for dupe lut
* remove counter
* skip empty results
* working on test and normalizing val for comparison
* forgot to save file
* optimize normalize
* reuse map
* remove print
* use levenshtein distance to check dupes
* forgot to leave in emptying map
* use slice
* small tweak
* comment
* use bytes
* praise
* use ctx logger
* add len check
* add comments
* use 8x concurrency for reverifier workers
* revert worker count
* use more workers
* process result directly for any collisions
* continue after decoder match for reverifying
* use map
* use map
* optimization and fix the bug
* revert worker count
* better option naming
* handle identical secrets in chunks
* update comment
* update comment
* fix test
* use DetectorKey
* rm out of scope tests and testdata
* rename all reverification elements
* don't re-write map entry
* use correct key
* rename worker, remove log val
* test likelydupe, add eq detector check in loop
* add test
* add comment
* add test
* Set verification error
* Update tests
---------
Co-authored-by: Zachary Rice <zachary.rice@trufflesec.com>
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
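For readers unfamiliar with the duplicate check mentioned in the list above ("use levenshtein distance to check dupes"), here is a textbook sketch of the edit-distance calculation on byte slices. The `likelyDuplicate` helper and its threshold parameter are hypothetical and are not the values used in this change.

```go
package main

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the number of single-byte insertions, deletions,
// and substitutions needed to turn a into b. It keeps only two rows of
// the dynamic-programming table.
func levenshtein(a, b []byte) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// likelyDuplicate treats two values as duplicates when their edit
// distance is at most maxDist (a hypothetical threshold).
func likelyDuplicate(a, b []byte, maxDist int) bool {
	return levenshtein(a, b) <= maxDist
}
```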
* Write large diffs to tmp files
* address comments
* Move bufferedfilewriter to own pkg
* update test
* swallow write err
* use buffer pool
* use size vs len
* use interface
* fix test
* update comments
* fix test
* remove unused
* remove
* remove unused
* move parser and commit struct closer to where they are used
* linter change
* add more key-value pairs to error
* fix test
* update
* address comments
* remove bufferedfile writer
* address comments
* adjust interface
* fix finalize
* address comments
* lint
* remove guard
* fix
* add TODO
* updating alibaba
* updating agora
* updating aeroworkflow
* updating aha
* updating artifactory
* updating abbysale
* updating abstract
* updating abuseipdb
* updating accuweather
* updating adafruitio
* updating adzuna
* cleanup on abuseipdb
* cleanup on aha
* cleanup on abuseipdb
* cleanup on aeroworkflow
* cleanup on adzuna
* cleanup on accuweather
* cleanup/refactor
* update token pattern to be explicitly 73char (old) or 64char (new)
* comment to clarify 403 on Aha
* mocking out verified case for aha + adding inactive account test
* using contact response instead of gock
* update 403 to be determinate