* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing funcitonality of not handling non-archive files in HandleFile
* Adjust check for rpm/deb archive type
* add additional deb mime type
* update comment
* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing funcitonality of not handling non-archive files in HandleFile
* Adjust check for rpm/deb archive type
* add additional deb mime type
* update comment
* go mod tidy
* update go mod
* go mod tidy
* add comment
* update max depth check to >
* go mod tidy
* rename
* [refactor] - Refactor Archive Handling Logic - Part 4: Non-Archive Data Handling and Cleanup (#2704)
* Handle non-archive data within the DefaultHandler
* make structs and methods private
* Remove non-archive data handling within sources
* Handle non-archive data within the DefaultHandler
* rebase
* Remove non-archive data handling within sources
* add gzip
* move diskbuffered rereader setup into handler pkg
* remove DiskBuffereReader creation logic within sources
* move rewind closer
* reduce log verbosity
* make defaultBufferSize a const
* use correct reader
* address comments
* update test
* [feat] - Add Prometheus Metrics for File Handlers (#2705)
* add metrics for file handling
* add metrics for errors
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* add metrics for archive depth
* [bug] - Enhanced Archive Handling to Address Interface Constraints (#2710)
* add metrics for file handling
* add metrics for errors
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* wrap compReader with DiskbufferReader
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* replace diskbuffereader with bufferedfilereader
* updtes
* add metric back
* [bug] - Fix bug and simplify git cat-file command execution and output handling (#2719)
* add metrics for file handling
* add metrics for errors
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* Allow git cat-file blob to complete before trying to handle the file
* wrap compReader with DiskbufferReader
* Allow git cat-file blob to complete before trying to handle the file
* updates
* revert stuff
* update test
* remove
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* inline
This is a follow-up to #2713 that fixes the strange test error.
As suspected, the failure was caused by additional diffs not being included in the test's expected data.
This fixes#2683. It scans the commit author, committer (which is typically GitHub <noreply@github.com> for GitHub, but can be different), and message.
It also scans Git notes.
When we fail to clone a git repository we log the command output to help with diagnosis. However, this output can include credentials in certain cases (such as certain errors associated with redirects). We don't want to log credentials when this happens.
* Add Display method to SourceUnit and Kind member to the CommonSourceUnit
* Make SourceUnitID return the ID and a kind
These two values together uniquely represent a unit.
This is a follow-up to #1912, which used the headers from the response to determine rate-limiting information, instead of using the values from RateLimitError.Rate. Although that logic seemed solid, I discovered that it did not work in some circumstances. This lead to the "unexpected" path more often than intended, and periodic instances where requests would be made before the ratelimit was refreshed.
Waiting for the sub-command will block until all of `stdout` has been
read. In some cases, we return early due to failed chunking without
reading all of the data, and thus, get stuck waiting for the command to
finish. Closing the pipe will ensure `Wait` does not block on that I/O.
* correctly use the buffered file writer
* use value from source
* reorder fields
* use only the DetectorKey as a map field
* address comments and use factory function
* fix optional params
* remove commented out code
* Write large diffs to tmp files
* address comments
* Move bufferedfilewriter to own pkg
* update test
* swallow write err
* use buffer pool
* use size vs len
* use interface
* fix test
* update comments
* fix test
* remove unused
* remove
* remove unused
* move parser and commit struct closer to where they are used
* linter change
* add more kvp pairs to error
* fix test
* update
* address comments
* remove bufferedfile writer
* address comments
* adjust interface
* fix finalize
* address comments
* lint
* remove guard
* fix
* add TODO
* add tempfile creation
- break PID retrieval into sep. function
* add tmpfile cleanup func
* add file cleanup to main cleanup func
* refactor file logic to only return name string
* add temp buffer naming to gcs
* add temp buffer naming to s3
* add temp buffer naming to filesystem
* add temp buffer naming to git
* consolidate cleanup functions
- have single function handle both files and dirs
- remove interface(not needed with a single func implementation)
- change calls to `New(...)` to reflect config implementation
- simplify automation in main.go
- update disk-buffer-reader dependency
* integrate changes from pr #2133
* merge main
* checkout from main to revert conflict issues
* re-add buffer logic to git
* interface no longer needed
* move string format to global const
---------
Co-authored-by: Ahrav Dutta <ahrav.dutta@trufflesec.com>
ChunkReporter is more flexible and will allow code reuse for unit
chunking. ChanReporter was added as a way to maintain the original
channel functionality, so this PR should not alter existing behavior.
* adds func to get scannerPIDs
* add cleanup and call to get pids
* move pid handling to git module
* remove PID logic from main
* refactor testing code to handle different exec name
* cleanup linting errors
* add better logging, fix dir if clause
* some PR fixups
* mod fixup
* add interfaces for helper funcs
* refactor cleanup into main, getPID into git
* lint and test fixups, remove fail on n<2 pids
* simplify pid sorting
* use filepath.Join
* use Args[0] for exec name, fix logger
* formatting fixup
* move functionality into cleantemp pkg
* go mod fixup
* remove redundant testing comment
* fix go.sum issues
* add 15m ticker loop for cleanup
* enclose ticker in function for goroutine defer
fix cleantemp interface
* make time more readable
* add check for non-local Trufflehog PIDs
* allow deletion even if no non-local pids found
* bundle intial cleanup into runCleanup func
* add explicit regex check for tempdir format
The previous implementation used int64 for both, which can be mixed up
easily. Using distinct types adds a layer of type safety checked by the
compiler.
This PR implements validation of Gitlab source configuration.
I was hoping to be able to unify more of the implementation of Validate and Chunks, but there was more divergence than I expected. Specifically, Chunks handles a fair number of Gitlab errors that aren't configuration errors (e.g. "Gitlab returned a repo with an unparseable URL"). Accommodating these in the Validate code path felt wrong, and I wasn't able to create a common code path that could accommodate both Validate and Chunks without looking awful.
* Add common chunker.
* add comment.
* use better config name.
* Add common chunk reader to s3.
* Add common chunk reader to git, gcs, circleci.
* revert gcs.
* revert gcs.
* fix chunker.
* revert gcs.
* update cancellablewrite.
* revert impl.
* update to remove totalsize.
* Fix my goof.
* Use unified struct in chunkreader.
* return err instead of logging and returning.
* rename error to err.
* only send single ChunkResult even if there is an error and chunkBytes.
* fix logic.
* feat: initial support for bare repositories
* feat: use concatenation instead of formatting and os.Getenv instead of os.Environ
Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
* fix: go-git update with pre-receive hooks fix
Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
* fix: remove info about pre-receive hook from README.md for now
Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
* fix: don't scan staged while using --bare option, fixes to make it work with the latest master
Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
* fix: small refactor according to #1518
Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
---------
Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
* Refactor git source to allow ScanOptions and use source in engine
Refactor the Chunks method of the git Source to call out to two helper
methods: scanRepos and scanDirs which scans s.conn.Repositories and
s.conn.Directories respectively. The only notable change in behavior is
that a credential is no longer necessary if there are no
s.conn.Repositories to scan.
* Preserve ScanGit functionality of not cleaning up temporary files