* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Adjust check for rpm/deb archive type
* add additional deb mime type
* update comment
* go mod tidy
* update go mod
* go mod tidy
* add comment
* update max depth check to >
* go mod tidy
* rename
* [refactor] - Refactor Archive Handling Logic - Part 4: Non-Archive Data Handling and Cleanup (#2704)
* Handle non-archive data within the DefaultHandler
* make structs and methods private
* Remove non-archive data handling within sources
* Handle non-archive data within the DefaultHandler
* rebase
* Remove non-archive data handling within sources
* add gzip
* move diskbuffered rereader setup into handler pkg
* remove DiskBufferReader creation logic within sources
* move rewind closer
* reduce log verbosity
* make defaultBufferSize a const
* use correct reader
* address comments
* update test
* [feat] - Add Prometheus Metrics for File Handlers (#2705)
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* add metrics for archive depth
* [bug] - Enhanced Archive Handling to Address Interface Constraints (#2710)
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* wrap compReader with DiskbufferReader
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* replace diskbuffereader with bufferedfilereader
* updates
* add metric back
* [bug] - Fix bug and simplify git cat-file command execution and output handling (#2719)
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* Allow git cat-file blob to complete before trying to handle the file
* wrap compReader with DiskbufferReader
* Allow git cat-file blob to complete before trying to handle the file
* updates
* revert stuff
* update test
* remove
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* inline
* Add Display method to SourceUnit and Kind member to the CommonSourceUnit
* Make SourceUnitID return the ID and a kind
These two values together uniquely represent a unit.
* Walk directories in filesystem source enumeration
* Ignore all directories instead of just the root
* Fix bug with multiple directories
* Skip filesystem TestEnumerate
* Update filesystem enumeration test to create files and folders
* add tempfile creation
- break PID retrieval into a separate function
* add tmpfile cleanup func
* add file cleanup to main cleanup func
* refactor file logic to only return name string
* add temp buffer naming to gcs
* add temp buffer naming to s3
* add temp buffer naming to filesystem
* add temp buffer naming to git
* consolidate cleanup functions
- have single function handle both files and dirs
- remove interface (not needed with a single func implementation)
- change calls to `New(...)` to reflect config implementation
- simplify automation in main.go
- update disk-buffer-reader dependency
* integrate changes from pr #2133
* merge main
* checkout from main to revert conflict issues
* re-add buffer logic to git
* interface no longer needed
* move string format to global const
---------
Co-authored-by: Ahrav Dutta <ahrav.dutta@trufflesec.com>
ChunkReporter is more flexible and will allow code reuse for unit
chunking. ChanReporter was added as a way to maintain the original
channel functionality, so this PR should not alter existing behavior.
The previous implementation used int64 for both, which can be mixed up
easily. Using distinct types adds a layer of type safety checked by the
compiler.
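
The idea can be sketched as follows. The message does not name the two values, so `SourceID` and `JobID` are hypothetical names chosen for illustration, as is the `describe` helper.

```go
package main

import "fmt"

// SourceID and JobID are hypothetical names for the two identifiers.
// Both share the underlying type int64 but are distinct types to the compiler.
type SourceID int64
type JobID int64

// describe accepts one value of each type; passing variables in the wrong
// order is rejected at compile time.
func describe(source SourceID, job JobID) string {
	return fmt.Sprintf("source=%d job=%d", source, job)
}

func main() {
	source := SourceID(7)
	job := JobID(42)
	fmt.Println(describe(source, job))
	// describe(job, source) would not compile: a JobID value cannot be
	// used as a SourceID without an explicit conversion.
}
```

Untyped constants such as `describe(7, 42)` still compile, so the protection applies to typed variables and struct fields, which is where mix-ups usually happen.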
* Add CancellableWrite helper function
* Create SourceUnitEnumerator interface and EnumerationResult struct
* Implement SourceUnitEnumerator for the filesystem Source
* Omit explicit zero values
* Implement CommonSourceUnitUnmarshaller
* Add SourceUnitUnmarshaller to all sources using
All sources, with the exception of git, will use the CommonSourceUnit as
they only contain a single type of unit to scan.
* Fix method comments to adhere to Go's style guide
* Rename directories to paths
* Generate protos
* Add file scanning support to filesystem source
* Add directories back to filesystem proto
* Generate protos
* Combine paths and directories in source
* Add filesystem filter
* Address comments
* Use a config struct when scanning an engine source.
* fix tests.
* Move test_helpers to the sources pkg.
* Handle ScanGit error in tests.
* address comments.
* Use functional options.
* Remove temp var.
* Add better var names for the setup functions for each config.
* Remove unused var.
* fix error logs.
* single line.
* remove blank lines.