* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* Handle non-archive data within the DefaultHandler
* make structs and methods private
* Remove non-archive data handling within sources
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Handle non-archive data within the DefaultHandler
* rebase
* Remove non-archive data handling within sources
* Adjust check for rpm/deb archive type
* add additional deb mime type
* add gzip
* move diskbuffered rereader setup into handler pkg
* remove DiskBufferReader creation logic within sources
* update comment
* move rewind closer
* reduce log verbosity
* add metrics for file handling
* add metrics for errors
* make defaultBufferSize a const
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* Allow git cat-file blob to complete before trying to handle the file
* wrap compReader with DiskbufferReader
* Allow git cat-file blob to complete before trying to handle the file
* updates
* use buffer writer
* update
* refactor
* update context pkg
* revert stuff
* update test
* fix test
* remove
* use correct reader
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* Update write method in contentWriter interface
* Add bufferReadSeekCloser
* update name
* update comment
* fix lint
* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Handle non-archive data within the DefaultHandler
* rebase
* Remove non-archive data handling within sources
* Handle non-archive data within the DefaultHandler
* add gzip
* move diskbuffered rereader setup into handler pkg
* remove DiskBufferReader creation logic within sources
* update comment
* move rewind closer
* reduce log verbosity
* make defaultBufferSize a const
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* wrap compReader with DiskbufferReader
* Allow git cat-file blob to complete before trying to handle the file
* updates
* use buffer writer
* update
* refactor
* update context pkg
* revert stuff
* update test
* remove
* rebase
* go mod tidy
* lint check
* update metric to ms
* update metric
* update comments
* dont use ptr
* update
* fix
* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Adjust check for rpm/deb archive type
* add additional deb mime type
* update comment
* go mod tidy
* update go mod
* Add a buffered file reader
* update comments
* use Buffered File Reader
* return buffer
* update
* fix
* return
* go mod tidy
* merge
* use a shared pool
* use sync.Once
* reorganize
* remove unused code
* fix double init
* fix stuff
* nil check
* reduce allocations
* updates
* update metrics
* updates
* reset buffer instead of putting it back
* skip binaries
* skip
* concurrently process diffs
* close chan
* concurrently enumerate orgs
* increase workers
* ignore pbix and vsdx files
* add metrics for gitparse's Diffchan
* fix metric
* update metrics
* update
* fix checks
* fix
* inc
* update
* reduce
* Create workers to handle binary files
* modify workers
* updates
* add check
* delete code
* use custom reader
* rename struct
* add nonarchive handler
* fix break
* add comments
* add tests
* refactor
* remove log
* do not scan rpm links
* simplify
* rename var
* rename
* fix benchmark
* add buffer
* buffer
* buffer
* handle panic
* merge main
* merge main
* add recover
* revert stuff
* revert
* revert to using reader
* fixes
* remove
* update
* fixes
* linter
* fix test
* fix comment
* update field name
* fix
* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Adjust check for rpm/deb archive type
* add additional deb mime type
* update comment
* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Adjust check for rpm/deb archive type
* add additional deb mime type
* update comment
* go mod tidy
* update go mod
* go mod tidy
* add comment
* update max depth check to >
* go mod tidy
* rename
* [refactor] - Refactor Archive Handling Logic - Part 4: Non-Archive Data Handling and Cleanup (#2704)
* Handle non-archive data within the DefaultHandler
* make structs and methods private
* Remove non-archive data handling within sources
* Handle non-archive data within the DefaultHandler
* rebase
* Remove non-archive data handling within sources
* add gzip
* move diskbuffered rereader setup into handler pkg
* remove DiskBufferReader creation logic within sources
* move rewind closer
* reduce log verbosity
* make defaultBufferSize a const
* use correct reader
* address comments
* update test
* [feat] - Add Prometheus Metrics for File Handlers (#2705)
* add metrics for file handling
* add metrics for errors
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* add metrics for archive depth
* [bug] - Enhanced Archive Handling to Address Interface Constraints (#2710)
* add metrics for file handling
* add metrics for errors
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* wrap compReader with DiskbufferReader
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* replace diskbuffereader with bufferedfilereader
* updates
* add metric back
* [bug] - Fix bug and simplify git cat-file command execution and output handling (#2719)
* add metrics for file handling
* add metrics for errors
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* Allow git cat-file blob to complete before trying to handle the file
* wrap compReader with DiskbufferReader
* Allow git cat-file blob to complete before trying to handle the file
* updates
* revert stuff
* update test
* remove
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* inline
This PR adds the ability to exclude buckets from S3 scans. The capability is rudimentary for now: bucket names are matched exactly, and globbing is not supported. If both the include list and the exclude list are specified, the source fails to initialize.
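The exclusion rule described above can be sketched roughly as follows. This is a minimal illustration, not the source's actual code: `bucketFilter`, `newBucketFilter`, and `shouldScan` are hypothetical names, and the sketch assumes the "both lists" are an include list and an exclude list of exact bucket names.

```go
package main

import (
	"errors"
	"fmt"
)

// bucketFilter is a hypothetical helper; the real S3 source may be organized differently.
type bucketFilter struct {
	include map[string]struct{}
	exclude map[string]struct{}
}

// errBothLists is returned when both lists are set, mirroring the
// "source fails to initialize" behavior described in the PR text.
var errBothLists = errors.New("specify either buckets to include or buckets to exclude, not both")

func toSet(names []string) map[string]struct{} {
	set := make(map[string]struct{}, len(names))
	for _, n := range names {
		set[n] = struct{}{}
	}
	return set
}

func newBucketFilter(include, exclude []string) (*bucketFilter, error) {
	if len(include) > 0 && len(exclude) > 0 {
		return nil, errBothLists
	}
	return &bucketFilter{include: toSet(include), exclude: toSet(exclude)}, nil
}

// shouldScan uses exact string matching only; there is no glob support.
func (f *bucketFilter) shouldScan(bucket string) bool {
	if len(f.include) > 0 {
		_, ok := f.include[bucket]
		return ok
	}
	_, excluded := f.exclude[bucket]
	return !excluded
}

func main() {
	f, err := newBucketFilter(nil, []string{"logs"})
	if err != nil {
		fmt.Println("init failed:", err)
		return
	}
	fmt.Println(f.shouldScan("logs"), f.shouldScan("data"))
}
```

Because matching is exact, an exclude entry such as `log*` would only skip a bucket literally named `log*`.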
* add tempfile creation
- break PID retrieval into sep. function
* add tmpfile cleanup func
* add file cleanup to main cleanup func
* refactor file logic to only return name string
* add temp buffer naming to gcs
* add temp buffer naming to s3
* add temp buffer naming to filesystem
* add temp buffer naming to git
* consolidate cleanup functions
- have single function handle both files and dirs
- remove interface(not needed with a single func implementation)
- change calls to `New(...)` to reflect config implementation
- simplify automation in main.go
- update disk-buffer-reader dependency
* integrate changes from pr #2133
* merge main
* checkout from main to revert conflict issues
* re-add buffer logic to git
* interface no longer needed
* move string format to global const
---------
Co-authored-by: Ahrav Dutta <ahrav.dutta@trufflesec.com>
ChunkReporter is more flexible and will allow code reuse for unit
chunking. ChanReporter was added as a way to maintain the original
channel functionality, so this PR should not alter existing behavior.
This PR updates the S3 source to use explicitly configured credentials if they're available and follow the normal AWS credentials waterfall if they're not. This is irrespective of whether role assumption is configured. This changes the previous behavior, which was to use waterfall credentials only if role assumption was configured and explicitly configured credentials only when it was not.
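The decision described above reduces to a check that ignores role configuration entirely. The sketch below models only that decision; `s3Config`, `credentialMode`, and `chooseCredentials` are hypothetical names, and no AWS SDK calls are shown.

```go
package main

import "fmt"

// credentialMode names the two ways the source can obtain credentials.
type credentialMode int

const (
	credentialsExplicit     credentialMode = iota // key and secret supplied in config
	credentialsDefaultChain                       // normal AWS credentials waterfall
)

func (m credentialMode) String() string {
	if m == credentialsExplicit {
		return "explicit"
	}
	return "default-chain"
}

// s3Config is a hypothetical, trimmed-down view of the source configuration.
type s3Config struct {
	Key    string
	Secret string
	Roles  []string // role ARNs to assume; deliberately not consulted below
}

// chooseCredentials prefers explicitly configured credentials and otherwise
// falls back to the default chain, whether or not roles are configured.
func chooseCredentials(cfg s3Config) credentialMode {
	if cfg.Key != "" && cfg.Secret != "" {
		return credentialsExplicit
	}
	return credentialsDefaultChain
}

func main() {
	withRole := s3Config{Key: "AKIA...", Secret: "...", Roles: []string{"arn:aws:iam::123456789012:role/example"}}
	fmt.Println(chooseCredentials(withRole))
	fmt.Println(chooseCredentials(s3Config{Roles: withRole.Roles}))
}
```

Under the previous behavior, the presence of `Roles` would have decided the outcome; here it has no influence on which credentials are used.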
The previous implementation used int64 for both, which can be mixed up
easily. Using distinct types adds a layer of type safety checked by the
compiler.
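As a small illustration of the idea, the sketch below defines two distinct integer types. The names `SourceID` and `JobID` are assumptions chosen for the example, since the text above does not say which two values were involved.

```go
package main

import "fmt"

// SourceID and JobID share an underlying type but are not interchangeable.
// Both names are hypothetical.
type SourceID int64
type JobID int64

// describe takes one of each; swapping typed arguments is a compile error.
func describe(source SourceID, job JobID) string {
	return fmt.Sprintf("source=%d job=%d", source, job)
}

func main() {
	var source SourceID = 7
	var job JobID = 42
	fmt.Println(describe(source, job)) // prints "source=7 job=42"

	// describe(job, source) // would not compile: JobID is not a SourceID
}
```

Untyped constants such as `describe(42, 7)` still compile, so the protection applies to typed variables and struct fields, which is where mix-ups tend to happen.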
This PR unifies some code paths within the S3 source. This is being done to better support a future implementation of S3 source validation; less code that runs means less code to validate. The logical change is to move the handling of "role-less" operation down the call tree, which allows for a single code path for more of the S3 code.
This PR also fixes a bug that would occur in the (rare) case that the source couldn't create a regional S3 client. Before, an error would be logged, but it would be followed by a panic. Now the bucket in question is skipped.
* add role assumption for s3 source
* refactor role assumption to repeatable string
user can pass array of roles to assume
* refactor s3 chunks to handle passed roleARNs
* add role-session name
use timestamp to make dynamic
* add docstring for rolearn strings()
* make sure role ARNs are passed into source
* refactor role assumption functionality
break s3 bucket scanning into sep. function
* add log check on assume role
* fix role iteration
- Make sure s3 struct is populated with roles
- add separate new client instantiation for role-based access
- iterates through each role
* add comment
* protobuf revert for merge
* re-run make proto
* lint cleanup
* cleanup TODOs
* drop redundant switch case in assumerole client
* use less verbose 'ctx' designator
* breakout functionality from Chunks
- separate functions for:
- enumerating buckets to scan
- scanning objects within the buckets
* remake protobuf defs
* allow scan to continue on single bucket err
* add readme docs
* minor fixups
* Add common chunker.
* add comment.
* use better config name.
* Add common chunk reader to s3.
* Add common chunk reader to git, gcs, circleci.
* revert gcs.
* revert gcs.
* fix chunker.
* revert gcs.
* update cancellablewrite.
* revert impl.
* update to remove totalsize.
* Fix my goof.
* Use unified struct in chunkreader.
* return err instead of logging and returning.
* rename error to err.
* only send single ChunkResult even if there is an error and chunkBytes.
* fix logic.
* Exit with non-zero exit code on chunk source error
* Exit with a non-zero exit code whenever we hit an error getting
chunks. Previously the error would be logged but trufflehog would exit
with a 0 (success) status code.
* fix gcs test
---------
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
Co-authored-by: ahrav <ahravdutta02@gmail.com>
* Implement CommonSourceUnitUnmarshaller
* Add SourceUnitUnmarshaller to all sources using
All sources, with the exception of git, will use the CommonSourceUnit as
they only contain a single type of unit to scan.
* Fix method comments to adhere to Go's style guide
* Resolve #1167 by adding support for the AWS_SESSION_TOKEN environment variable and adding a --session-token CLI arg
* fix error message
---------
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
* Handle errors w/ github source.
* Fix loop var captured by func literal.
* Fix loop var captured by func literal.
* Set completed progress if the scan completes with no errors.
* Set progress to 100% if the scope and iteration are both 0.
* Fix commentary.
* Fix test.
* Return after the defer to os.RemoveAll.
* Fix unauth scan.
* Inline range loop.
* update tests for partial scan completion with errors. Ensure correct progress is set.
* Update progress for all sources.
* Update github test.
* Address comments.
* Use a config struct when scanning an engine source.
* fix tests.
* Move test_helpers to the sources pkg.
* Handle ScanGit error in tests.
* address comments.
* Use functional options.
* Remove temp var.
* Add better var names for the setup functions for each config.
* Remove unused var.
* fix error logs.
* fix error logs.
* single line.
* remove blank lines.
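For readers unfamiliar with the "functional options" pattern mentioned in the list above, here is a generic sketch of it in Go. The `scanConfig` fields and the `WithBaseRef` and `WithMaxDepth` option names are invented for illustration and are not taken from the engine code.

```go
package main

import "fmt"

// scanConfig holds settings for a scan; the fields are hypothetical.
type scanConfig struct {
	baseRef  string
	maxDepth int
}

// ScanOption mutates a scanConfig; callers pass only the options they need.
type ScanOption func(*scanConfig)

// WithBaseRef sets the reference to start scanning from.
func WithBaseRef(ref string) ScanOption {
	return func(c *scanConfig) { c.baseRef = ref }
}

// WithMaxDepth limits how far back the scan goes.
func WithMaxDepth(depth int) ScanOption {
	return func(c *scanConfig) { c.maxDepth = depth }
}

// newScanConfig starts from defaults and applies the options in order.
func newScanConfig(opts ...ScanOption) scanConfig {
	cfg := scanConfig{baseRef: "", maxDepth: 0} // 0 means "no limit" in this sketch
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := newScanConfig(WithBaseRef("main"), WithMaxDepth(50))
	fmt.Printf("%+v\n", cfg)
}
```

Compared with a growing positional parameter list, new settings can be added as new option functions without changing existing call sites.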