* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* Handle non-archive data within the DefaultHandler
* make structs and methods private
* Remove non-archive data handling within sources
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Handle non-archive data within the DefaultHandler
* rebase
* Remove non-archive data handling within sources
* Adjust check for rpm/deb archive type
* add additional deb mime type
* add gzip
* move DiskBufferReader setup into handler pkg
* remove DiskBufferReader creation logic within sources
* update comment
* move rewind closer
* reduce log verbosity
* add metrics for file handling
* add metrics for errors
* make defaultBufferSize a const
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* Allow git cat-file blob to complete before trying to handle the file
* wrap compReader with DiskbufferReader
* Allow git cat-file blob to complete before trying to handle the file
* updates
* use buffer writer
* update
* refactor
* update context pkg
* revert stuff
* update test
* fix test
* remove
* use correct reader
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* Update write method in contentWriter interface
* Add bufferReadSeekCloser
* update name
* update comment
* fix lint
* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Handle non-archive data within the DefaultHandler
* rebase
* Remove non-archive data handling within sources
* Handle non-archive data within the DefaultHandler
* add gzip
* move DiskBufferReader setup into handler pkg
* remove DiskBufferReader creation logic within sources
* update comment
* move rewind closer
* reduce log verbosity
* make defaultBufferSize a const
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* wrap compReader with DiskbufferReader
* Allow git cat-file blob to complete before trying to handle the file
* updates
* use buffer writer
* update
* refactor
* update context pkg
* revert stuff
* update test
* remove
* rebase
* go mod tidy
* lint check
* update metric to ms
* update metric
* update comments
* dont use ptr
* update
* fix
* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Adjust check for rpm/deb archive type
* add additional deb mime type
* update comment
* go mod tidy
* update go mod
* Add a buffered file reader
* update comments
* use Buffered File Reader
* return buffer
* update
* fix
* return
* go mod tidy
* merge
* use a shared pool
* use sync.Once
* reorganize
* remove unused code
* fix double init
* fix stuff
* nil check
* reduce allocations
* updates
* update metrics
* updates
* reset buffer instead of putting it back
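The pool-related items above (use a shared pool, use sync.Once, reset the buffer instead of putting it back) describe a common Go pattern. A minimal sketch of that pattern, with hypothetical names and not the project's actual code:

```go
package bufferpool

import (
	"bytes"
	"sync"
)

var (
	once sync.Once
	pool *sync.Pool
)

// sharedPool lazily initializes a single process-wide buffer pool.
func sharedPool() *sync.Pool {
	once.Do(func() {
		pool = &sync.Pool{
			New: func() any { return new(bytes.Buffer) },
		}
	})
	return pool
}

// Get returns a reusable buffer from the shared pool.
func Get() *bytes.Buffer { return sharedPool().Get().(*bytes.Buffer) }

// Put resets the buffer so its memory can be reused, then returns it to the pool.
func Put(b *bytes.Buffer) {
	b.Reset()
	sharedPool().Put(b)
}
```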
* skip binaries
* skip
* concurrently process diffs
* close chan
* concurrently enumerate orgs
* increase workers
* ignore pbix and vsdx files
* add metrics for gitparse's Diffchan
* fix metric
* update metrics
* update
* fix checks
* fix
* inc
* update
* reduce
* Create workers to handle binary files
* modify workers
* updates
* add check
* delete code
* use custom reader
* rename struct
* add nonarchive handler
* fix break
* add comments
* add tests
* refactor
* remove log
* do not scan rpm links
* simplify
* rename var
* rename
* fix benchmark
* add buffer
* buffer
* buffer
* handle panic
* merge main
* merge main
* add recover
* revert stuff
* revert
* revert to using reader
* fixes
* remove
* update
* fixes
* linter
* fix test
* fix comment
* update field name
* fix
This automated test used to run with the real GitLab detectors because they were versioned. However, the test doesn't need real detectors to validate the functionality in question, and relying on real detectors makes us susceptible to token expiration, which we recently discovered when it happened. The test has been updated to use fake detectors, which also means it can now run correctly in the community suite.
* Response structure added for the service API of Twilio.
Added two response fields in extra data:
1) friendly_name
2) account_sid
* mark credentials verified for non-fatal errors.
Also check for at least one service in the response before extracting metadata.
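A minimal sketch of the extra-data handling described above; the type and helper names here are assumptions for illustration, not the detector's actual code:

```go
package twilio

// twilioService mirrors the two response fields copied into extra data.
type twilioService struct {
	FriendlyName string `json:"friendly_name"`
	AccountSID   string `json:"account_sid"`
}

// populateExtraData copies the fields from a Twilio service response into the
// result's extra data and reports whether at least one service was returned,
// which is the precondition for treating the credential as verified.
func populateExtraData(services []twilioService, extra map[string]string) bool {
	if len(services) == 0 {
		return false // nothing to extract; leave verification unchanged
	}
	extra["friendly_name"] = services[0].FriendlyName
	extra["account_sid"] = services[0].AccountSID
	return true
}
```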
* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Adjust check for rpm/deb archive type
* add additional deb mime type
* update comment
* go mod tidy
* update go mod
* go mod tidy
* add comment
* update max depth check to >
* go mod tidy
* rename
* [refactor] - Refactor Archive Handling Logic - Part 4: Non-Archive Data Handling and Cleanup (#2704)
* Handle non-archive data within the DefaultHandler
* make structs and methods private
* Remove non-archive data handling within sources
* Handle non-archive data within the DefaultHandler
* rebase
* Remove non-archive data handling within sources
* add gzip
* move DiskBufferReader setup into handler pkg
* remove DiskBufferReader creation logic within sources
* move rewind closer
* reduce log verbosity
* make defaultBufferSize a const
* use correct reader
* address comments
* update test
* [feat] - Add Prometheus Metrics for File Handlers (#2705)
* add metrics for file handling
* add metrics for errors
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* add metrics for archive depth
* [bug] - Enhanced Archive Handling to Address Interface Constraints (#2710)
* add metrics for file handling
* add metrics for errors
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* wrap compReader with DiskbufferReader
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* replace diskbufferreader with bufferedfilereader
* updates
* add metric back
* [bug] - Fix bug and simplify git cat-file command execution and output handling (#2719)
* add metrics for file handling
* add metrics for errors
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* Allow git cat-file blob to complete before trying to handle the file
* wrap compReader with DiskbufferReader
* Allow git cat-file blob to complete before trying to handle the file
* updates
* revert stuff
* update test
* remove
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* inline
This test has a race condition. This change makes it less likely to cause a test failure, and is a stopgap measure to de-flake the test while we investigate the underlying issue.
* Update rabbitmq.go regex detect amqps protocol
The old regex couldn't detect amqps:// connection strings, only amqp://.
* [Revised] Update rabbitmq.go regex detect amqps protocol
Co-authored-by: Richard Gomez <32133502+rgmz@users.noreply.github.com>
---------
Co-authored-by: Richard Gomez <32133502+rgmz@users.noreply.github.com>
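A hedged sketch of a pattern that accepts both schemes; the actual regex in rabbitmq.go may differ in the details of how it matches credentials and hosts:

```go
package main

import (
	"fmt"
	"regexp"
)

// amqpPat matches both amqp:// and amqps:// connection strings with inline credentials.
var amqpPat = regexp.MustCompile(`\bamqps?://[\S]{3,50}:([\S]{3,50})@[-.%\w/:]+\b`)

func main() {
	fmt.Println(amqpPat.MatchString("amqps://user:secret@broker.example.com:5671/vhost")) // true
	fmt.Println(amqpPat.MatchString("amqp://user:secret@localhost:5672"))                 // true
}
```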
This PR:
* Creates an optional interface that detectors can use to customize their false positive detection
* Implements this interface on detectors that have custom logic (in most cases this "custom logic" is simply a no-op because the detector does not participate in false positive detection)
* Eliminates inline (old-style) false positive exclusion in a few detectors that #2643 missed
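A sketch of what such an optional interface could look like; the interface name, method signature, and helper are assumptions for illustration, not the PR's actual definitions:

```go
package detectors

// Result is a stand-in for the detector result type.
type Result struct {
	Raw []byte
}

// CustomFalsePositiveChecker is an optional interface a detector can implement
// to override the default false positive heuristics.
type CustomFalsePositiveChecker interface {
	IsFalsePositive(result Result) bool
}

// isFalsePositive prefers the detector's custom logic when it is implemented,
// and falls back to the shared default check otherwise.
func isFalsePositive(d any, result Result) bool {
	if c, ok := d.(CustomFalsePositiveChecker); ok {
		return c.IsFalsePositive(result)
	}
	return defaultIsFalsePositive(result)
}

func defaultIsFalsePositive(result Result) bool {
	// Shared heuristics (dictionary words, known test strings, etc.) would live here.
	return false
}
```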
This is a follow-up to #2713 that fixes the strange test error.
As suspected, the failure was caused by additional diffs not being included in the test's expected data.
This fixes #2683. It scans the commit author, the committer (which for GitHub is typically GitHub <noreply@github.com>, but can differ), and the commit message.
It also scans Git notes.
This PR adds false positive information to the Result protobuf message in anticipation of us tracking it as first-class secret metadata. We're not doing that yet (it's blocked behind #2643) but setting up the messages now means we'll be able to do it later with less of a code delta.
This PR modifies the GitLab source:
* emits a new "groups enumerated" metric
* logs more information about group enumeration
* emits the repo enumeration metric inside getAllProjectRepos, which means it will work when units are flipped on
* emits the repo enumeration metric more granularly
This is a follow-up to #2379.
It fixes the following issues:
* GitHub API calls missing rate-limit handling
* The fix for Refactor GitHub source #2379 (comment) inadvertently resulting in duplicate API calls
* MaxMind detector uses the right endpoint
The endpoint that the current detector uses fails to validate the license key, because some license keys do not have permission to use the GeoIP API. This commit makes the detector use the right endpoint: https://dev.maxmind.com/license-key-validation-api
* Remove RawV2
* Remove trimspace and extra if branch
* Add the proper tests
* Use SetVerificationError
* Add SetVerificationError
* update tests
---------
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
* pull out verification logic from github detectors
* deduplicate verify github logic
* pull out nil check
* return nil instead of empty struct
* skip gh old test bc we can't make new tokens
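A hedged sketch of what pulling the shared verification out might look like: each GitHub detector calls a single helper that checks the token against the GitHub /user endpoint. The helper name and exact status handling are assumptions, not the project's actual code:

```go
package github

import (
	"context"
	"fmt"
	"net/http"
)

// verifyGitHubToken checks a candidate token against the GitHub API.
// A 200 means the token is live; 401/403 means it is not; anything else is
// reported as an error so the caller can mark the result indeterminate.
func verifyGitHubToken(ctx context.Context, client *http.Client, token string) (bool, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://api.github.com/user", nil)
	if err != nil {
		return false, err
	}
	req.Header.Set("Authorization", "token "+token)

	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusOK:
		return true, nil
	case http.StatusUnauthorized, http.StatusForbidden:
		return false, nil
	default:
		return false, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
}
```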
This is a follow-up to #2107 and #2335. It adds a new (hidden) --results flag that allows a user to show any combination of verified, unverified, and indeterminate secrets.
* verify canaries against SNS; get ARN
* clean comments
* Update tests and logic
* added test for invalid canary secret
* added verify logic for canaries
* go mod tidy
---------
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
When we fail to clone a git repository we log the command output to help with diagnosis. However, this output can include credentials in certain cases (such as certain errors associated with redirects). We don't want to log credentials when this happens.
This PR adds the ability to exclude buckets from S3 scans. The capability is pretty rudimentary right now and does not support globbing. If both lists are specified, the source will fail to initialize.
* JDBC test and parsing improvements
- Uses net/url for more robust URI parsing
- Supports common JDBC formats for MySQL
- Supports URI format for MSSQL
- Uses allowlist for params across all drivers
- Uses testcontainers-go for integration testing - much faster, more robust, no port collisions
- Uses gofakeit for random data (db, user, password) generation in integration tests
- Adds connection timeouts
- Use Microsoft's driver for MSSQL
* go mod tidy
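One of the listed changes is switching to net/url for more robust URI parsing. A minimal illustration of that approach (not the detector's actual parser): strip the jdbc: prefix, then let net/url pull out the credentials, host, and query parameters.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

func main() {
	jdbc := "jdbc:mysql://app_user:s3cr3t@db.internal:3306/orders?connectTimeout=5000"

	// net/url can parse the connection string once the "jdbc:" prefix is removed.
	u, err := url.Parse(strings.TrimPrefix(jdbc, "jdbc:"))
	if err != nil {
		panic(err)
	}

	pass, _ := u.User.Password()
	fmt.Println(u.Scheme)                        // mysql
	fmt.Println(u.User.Username(), pass)         // app_user s3cr3t
	fmt.Println(u.Hostname(), u.Port())          // db.internal 3306
	fmt.Println(u.Query().Get("connectTimeout")) // 5000
}
```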
* Add Display method to SourceUnit and Kind member to the CommonSourceUnit
* Make SourceUnitID return the ID and a kind
These two values together uniquely represent a unit.
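A simplified sketch of the shape these changes suggest; the real types live in the sources package and may carry more detail:

```go
package sources

// SourceUnitKind discriminates between kinds of units (repo, bucket, directory, ...).
type SourceUnitKind string

// SourceUnit is a single enumerable item produced by a source.
type SourceUnit interface {
	// SourceUnitID returns the unit's ID and its kind; together they uniquely
	// identify the unit.
	SourceUnitID() (string, SourceUnitKind)
	// Display returns a human-friendly representation of the unit.
	Display() string
}

// CommonSourceUnit is a minimal implementation shared by sources that do not
// need anything fancier.
type CommonSourceUnit struct {
	Kind SourceUnitKind `json:"kind,omitempty"`
	ID   string         `json:"id"`
}

func (c CommonSourceUnit) SourceUnitID() (string, SourceUnitKind) { return c.ID, c.Kind }
func (c CommonSourceUnit) Display() string                        { return c.ID }
```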
* Add flag to write job reports to disk
* Fix nil pointer / non-nil interface bug
* Synchronize job report writer goroutine
* Log when the report has been written
* Implement SourceUnitEnumChunker for GitLab
* Add GitLab engine integration test
* Use a SliceReporter instead of checking for nil reporters
* Use more generic VisitorReporter
* Merge logic from getReposFromGitlab into getAllProjectRepos
* Update integration test to have a lower bound
Unfortunately, the GitLab integration test does not appear to be
deterministic. Sometimes 36390 chunks are found, sometimes 36312, or
even lower.
* Refactor UnitHook to block the scan if finished metrics aren't handled
* Log once when back-pressure is detected
* Add hook channel size metric
* Use plural "metrics" for consistency
* Replace LRU cache with map
* use diff chan
* correctly use the buffered file writer
* use value from source
* reorder fields
* add tests and update
* Fix issue with buffer slices growing
* fix test
* correctly use the buffered file writer
* use value from source
* reorder fields
* fix
* add singleton
* use shared pool
* optimize
* rename and cleanup
* add metrics
* add print
* rebase
* remove extra inc
* add metrics for checkout time
* add comment
* use microseconds
* add metrics
* add metrics pkg
* add more metrics
* revert test
* remove fields
* fix
* resize and return
* update metric name
* remove comment
* address comments
* add comment
This is a follow-up to #1912, which used the headers from the response to determine rate-limiting information instead of the values from RateLimitError.Rate. Although that logic seemed solid, I discovered that it did not work in some circumstances. This led to the "unexpected" path being taken more often than intended, and to periodic instances where requests were made before the rate limit was refreshed.
* correctly use the buffered file writer
* use value from source
* reorder fields
* use only the DetectorKey as a map field
* correctly use the buffered file writer
* use value from source
* reorder fields
* add tests and update
* Fix issue with buffer slices growing
* fix test
* fix
* add singleton
* use shared pool
* optimize
* rename and cleanup
* use correct calculation to grow buffer
* only grow if needed
* address comments
* remove unused
* remove
* rip out Grow
* address comment
* use 2k default buffer
* update comment; allow large buffers to be garbage collected
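The last few items (grow only when needed, a 2KB default, letting large buffers be garbage collected) describe a familiar pool discipline. A rough sketch, with the sizes chosen arbitrarily here rather than taken from the project:

```go
package buffer

import (
	"bytes"
	"sync"
)

const (
	defaultBufferSize = 2 << 10  // 2KB starting capacity
	maxPooledSize     = 64 << 10 // buffers larger than this are left for the GC
)

var pool = sync.Pool{
	New: func() any { return bytes.NewBuffer(make([]byte, 0, defaultBufferSize)) },
}

// Get returns a buffer with at least the default capacity.
func Get() *bytes.Buffer { return pool.Get().(*bytes.Buffer) }

// Put returns small buffers to the pool; oversized ones are dropped so their
// memory can be reclaimed instead of pinning large allocations forever.
func Put(b *bytes.Buffer) {
	if b.Cap() > maxPooledSize {
		return
	}
	b.Reset()
	pool.Put(b)
}
```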
Waiting for the sub-command will block until all of `stdout` has been
read. In some cases, we return early due to failed chunking without
reading all of the data, and thus, get stuck waiting for the command to
finish. Closing the pipe will ensure `Wait` does not block on that I/O.
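The fix described above can be sketched as follows: grab the pipe, and close it on any early-return path so cmd.Wait() is never left blocking on unread output. The helper name, command, and error handling here are illustrative only:

```go
package gitexec

import (
	"io"
	"os/exec"
)

// handleBlob streams `git cat-file blob` output through process. On any early
// return it closes the stdout pipe first so Wait does not block on unread I/O.
func handleBlob(objectID string, process func([]byte) error) error {
	cmd := exec.Command("git", "cat-file", "blob", objectID)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}

	buf := make([]byte, 64*1024)
	for {
		n, readErr := stdout.Read(buf)
		if n > 0 {
			if err := process(buf[:n]); err != nil {
				// Early return: close the pipe so Wait doesn't block on
				// output we will never read.
				stdout.Close()
				cmd.Wait()
				return err
			}
		}
		if readErr == io.EOF {
			break
		}
		if readErr != nil {
			stdout.Close()
			cmd.Wait()
			return readErr
		}
	}
	// Normal path: all output consumed, Wait can complete.
	return cmd.Wait()
}
```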
* correctly use the buffered file writer
* use value from source
* reorder fields
* use only the DetectorKey as a map field
* address comments and use factory function
* fix optional params
* remove commented out code
* draft reverify chunks
* remove
* remove
* reduce dupe map cap
* do not verify chunk
* cli arg and use val for dupe lut
* remove counter
* skip empty results
* working on test and normalizing val for comparison
* forgot to save file
* optimize normalize
* reuse map
* remove print
* use levenshtein distance to check dupes
* forgot to leave in emptying map
* use slice
* small tweak
* comment
* use bytes
* praise
* use ctx logger
* add len check
* add comments
* use 8x concurrency for reverifier workers
* revert worker count
* use more workers
* process result directly for any collisions
* continue after decoder match for reverifying
* use map
* use map
* optimization and bug fix
* revert worker count
* better option naming
* handle identical secrets in chunks
* update comment
* update comment
* fix test
* use DetectorKey
* rm out of scope tests and testdata
* rename all reverification elements
* don't re-write map entry
* use correct key
* rename worker, remove log val
* test likelydupe, add eq detector check in loop
* add test
* add comment
* add test
* Set verification error
* Update tests
---------
Co-authored-by: Zachary Rice <zachary.rice@trufflesec.com>
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
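One item above is using Levenshtein distance to decide whether two findings are likely duplicates before re-verifying. A self-contained sketch of that idea; the threshold, normalization, and function names are assumptions:

```go
package dedupe

import "strings"

// levenshtein returns the edit distance between two strings using the classic
// two-row dynamic programming formulation.
func levenshtein(a, b string) int {
	if a == b {
		return 0
	}
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = minInt(prev[j]+1, minInt(curr[j-1]+1, prev[j-1]+cost))
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

func minInt(x, y int) int {
	if x < y {
		return x
	}
	return y
}

// likelyDuplicate reports whether a candidate secret is close enough to an
// already-seen secret to skip re-verification.
func likelyDuplicate(candidate string, seen map[string]struct{}, maxDist int) bool {
	candidate = strings.ToLower(candidate)
	for s := range seen {
		if levenshtein(candidate, s) <= maxDist {
			return true
		}
	}
	return false
}
```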
* Write large diffs to tmp files
* address comments
* Move bufferedfilewriter to own pkg
* update test
* swallow write err
* use buffer pool
* use size vs len
* use interface
* fix test
* update comments
* fix test
* remove unused
* remove
* remove unused
* move parser and commit struct closer to where they are used
* linter change
* add more kvp pairs to error
* fix test
* update
* address comments
* remove bufferedfile writer
* address comments
* adjust interface
* fix finalize
* address comments
* lint
* remove guard
* fix
* add TODO
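A rough sketch of the spill-to-disk idea behind the buffered file writer above: keep small diffs in memory, and switch to a temporary file once a size threshold is crossed. The threshold, names, and error handling are illustrative, not the package's actual API:

```go
package bufferedfilewriter

import (
	"bytes"
	"os"
)

// Writer buffers writes in memory until threshold is exceeded, then spills
// everything to a temporary file and keeps writing there.
type Writer struct {
	threshold int
	buf       bytes.Buffer
	file      *os.File
}

func New(threshold int) *Writer { return &Writer{threshold: threshold} }

func (w *Writer) Write(p []byte) (int, error) {
	if w.file != nil {
		return w.file.Write(p)
	}
	if w.buf.Len()+len(p) <= w.threshold {
		return w.buf.Write(p)
	}
	// Crossed the threshold: move the in-memory contents to a temp file.
	f, err := os.CreateTemp("", "large-diff-*")
	if err != nil {
		return 0, err
	}
	if _, err := f.Write(w.buf.Bytes()); err != nil {
		f.Close()
		return 0, err
	}
	w.file = f
	w.buf.Reset()
	return w.file.Write(p)
}

// Close removes the temporary file, if one was created.
func (w *Writer) Close() error {
	if w.file == nil {
		return nil
	}
	name := w.file.Name()
	w.file.Close()
	return os.Remove(name)
}
```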
* updating alibaba
* updating agora
* updating aeroworkflow
* updating aha
* updating artifactory
* updating abbysale
* updating abstract
* updating abuseipdb
* updating accuweather
* updating adafruitio
* updating adzuna
* cleanup on abuseipdb
* cleanup on aha
* cleanup on abuseipdb
* cleanup on aeroworkflow
* cleanup on adzuna
* cleanup on accuweather
* cleanup/refactor
* update token pattern to be explicitly 73char (old) or 64char (new)
* comment to clarify 403 on Aha
* mocking out verified case for aha + adding inactive account test
* using contact response instead of gock
* update 403 to be determinate
* added azurefunctionkey detector
* update raw field to include url
* clean up and added prefix on key pattern
* update bench script
* update imports, snifftest, and gen proto
---------
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
* added azuredevopspersonalaccesstoken detector
* fix comment
* update raw field to include all parts of the credential
---------
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
* Walk directories in filesystem source enumeration
* Ignore all directories instead of just the root
* Fix bug with multiple directories
* Skip filesystem TestEnumerate
* Update filesystem enumeration test to create files and folders
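The enumeration change above amounts to walking directories and emitting only regular files, ignoring the directories themselves. A minimal sketch using the standard library:

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
)

// enumerate walks root and reports every regular file; directories are
// traversed but not emitted as units.
func enumerate(root string, report func(path string)) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		report(path)
		return nil
	})
}

func main() {
	_ = enumerate(".", func(p string) { fmt.Println(p) })
}
```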
* Extend memory cache to allow for configuring custom expiration and purge interval
* use any for value type
* fix test
* fix test
* address comments
* address
* make new construct more clear
* reduce duplication
* fix test
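A sketch of a memory cache with configurable expiration and purge interval, assuming a go-cache-backed store; the wrapper names and whether the project actually uses that library are assumptions:

```go
package memory

import (
	"time"

	"github.com/patrickmn/go-cache"
)

// Cache is a thin wrapper that exposes configurable expiration and purge
// (cleanup) intervals instead of hard-coded defaults.
type Cache struct {
	c *cache.Cache
}

// New builds a cache whose entries expire after expiration and whose expired
// entries are purged every purgeInterval.
func New(expiration, purgeInterval time.Duration) *Cache {
	return &Cache{c: cache.New(expiration, purgeInterval)}
}

// Set stores a value of any type under key with the cache's default expiration.
func (m *Cache) Set(key string, value any) { m.c.Set(key, value, cache.DefaultExpiration) }

// Get returns the value for key and whether it was present and unexpired.
func (m *Cache) Get(key string) (any, bool) { return m.c.Get(key) }
```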
The source manager attaches some context keys, but in certain circumstances, they're already present, resulting in duplicate keys. This PR changes the attachment to be conditional. It also adds some new log messages to track source startup progress.
* Update metabase verification to check for a valid JSON response
* added test tokens + cleanup
---------
Co-authored-by: ahmed <ahmed.zahran@trufflesec.com>
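A hedged sketch of checking for a valid JSON body when verifying, rather than trusting the status code alone; the function name and size limit are illustrative:

```go
package metabase

import (
	"encoding/json"
	"io"
	"net/http"
)

// looksVerified treats a credential as verified only when the response is both
// a 200 and a well-formed JSON document, guarding against proxies or login
// pages that return HTML with a 200 status.
func looksVerified(resp *http.Response) bool {
	if resp.StatusCode != http.StatusOK {
		return false
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return false
	}
	return json.Valid(body)
}
```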
* add tempfile creation
- break PID retrieval into sep. function
* add tmpfile cleanup func
* add file cleanup to main cleanup func
* refactor file logic to only return name string
* add temp buffer naming to gcs
* add temp buffer naming to s3
* add temp buffer naming to filesystem
* add temp buffer naming to git
* consolidate cleanup functions
- have single function handle both files and dirs
- remove interface(not needed with a single func implementation)
- change calls to `New(...)` to reflect config implementation
- simplify automation in main.go
- update disk-buffer-reader dependency
* integrate changes from pr #2133
* merge main
* checkout from main to revert conflict issues
* re-add buffer logic to git
* interface no longer needed
* move string format to global const
---------
Co-authored-by: Ahrav Dutta <ahrav.dutta@trufflesec.com>
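A sketch of the temp-file naming and consolidated cleanup described above: a shared name pattern carrying the PID, plus a single cleanup function that handles both files and directories. The constant, helper names, and pattern are assumptions for illustration:

```go
package cleantemp

import (
	"fmt"
	"os"
	"path/filepath"
)

// trufflehogPattern is the shared naming scheme; embedding the PID lets a
// cleanup pass distinguish this process's artifacts from other runs'.
const trufflehogPattern = "trufflehog-%d-*"

// MkTemp creates a temp file whose name carries the current PID and returns
// only the file name, matching the "return name string" refactor above.
func MkTemp() (string, error) {
	f, err := os.CreateTemp(os.TempDir(), fmt.Sprintf(trufflehogPattern, os.Getpid()))
	if err != nil {
		return "", err
	}
	defer f.Close()
	return f.Name(), nil
}

// CleanTempArtifacts removes both files and directories created with the
// shared pattern, so a single function covers all sources.
func CleanTempArtifacts(pid int) error {
	matches, err := filepath.Glob(filepath.Join(os.TempDir(), fmt.Sprintf(trufflehogPattern, pid)))
	if err != nil {
		return err
	}
	for _, m := range matches {
		if err := os.RemoveAll(m); err != nil {
			return err
		}
	}
	return nil
}
```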
* add rotation guides to SlackWebhook tests
* begin cleaning up tests
* have slack webhook detector use malformed json
* update test secrets
---------
Co-authored-by: Ahrav Dutta <ahrav.dutta@trufflesec.com>