* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* Handle non-archive data within the DefaultHandler
* make structs and methods private
* Remove non-archive data handling within sources
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Handle non-archive data within the DefaultHandler
* rebase
* Remove non-archive data handling within sources
* Adjust check for rpm/deb archive type
* add additional deb mime type
* add gzip
* move diskbuffered rereader setup into handler pkg
* remove DiskBufferReader creation logic within sources
* update comment
* move rewind closer
* reduce log verbosity
* add metrics for file handling
* add metrics for errors
* make defaultBufferSize a const
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* Allow git cat-file blob to complete before trying to handle the file
* wrap compReader with DiskbufferReader
* Allow git cat-file blob to complete before trying to handle the file
* updates
* use buffer writer
* update
* refactor
* update context pkg
* revert stuff
* update test
* fix test
* remove
* use correct reader
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* Update write method in contentWriter interface
* Add bufferReadSeekCloser
* update name
* update comment
* fix lint
* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Handle non-archive data within the DefaultHandler
* rebase
* Remove non-archive data handling within sources
* Handle non-archive data within the DefaultHandler
* add gzip
* move diskbuffered rereader setup into handler pkg
* remove DiskBufferReader creation logic within sources
* update comment
* move rewind closer
* reduce log verbosity
* make defaultBufferSize a const
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* wrap compReader with DiskbufferReader
* Allow git cat-file blob to complete before trying to handle the file
* updates
* use buffer writer
* update
* refactor
* update context pkg
* revert stuff
* update test
* remove
* rebase
* go mod tidy
* lint check
* update metric to ms
* update metric
* update comments
* dont use ptr
* update
* fix
* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Adjust check for rpm/deb archive type
* add additional deb mime type
* update comment
* go mod tidy
* update go mod
* Add a buffered file reader
* update comments
* use Buffered File Reader
* return buffer
* update
* fix
* return
* go mod tidy
* merge
* use a shared pool
* use sync.Once
* reorganize
* remove unused code
* fix double init
* fix stuff
* nil check
* reduce allocations
* updates
* update metrics
* updates
* reset buffer instead of putting it back
* skip binaries
* skip
* concurrently process diffs
* close chan
* concurrently enumerate orgs
* increase workers
* ignore pbix and vsdx files
* add metrics for gitparse's Diffchan
* fix metric
* update metrics
* update
* fix checks
* fix
* inc
* update
* reduce
* Create workers to handle binary files
* modify workers
* updates
* add check
* delete code
* use custom reader
* rename struct
* add nonarchive handler
* fix break
* add comments
* add tests
* refactor
* remove log
* do not scan rpm links
* simplify
* rename var
* rename
* fix benchmark
* add buffer
* buffer
* buffer
* handle panic
* merge main
* merge main
* add recover
* revert stuff
* revert
* revert to using reader
* fixes
* remove
* update
* fixes
* linter
* fix test
* fix comment
* update field name
* fix
* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Adjust check for rpm/deb archive type
* add additional deb mime type
* update comment
* Remove specialized handler and archive struct and restructure handlers pkg.
* Refactor RPM archive handlers to use a library instead of shelling out
* make rpm handling context aware
* update test
* Refactor AR/deb archive handler to use an existing library instead of shelling out
* Update tests
* add max size check
* add filename and size to context kvp
* move skip file check and is binary check before opening file
* fix test
* preserve existing functionality of not handling non-archive files in HandleFile
* Adjust check for rpm/deb archive type
* add additional deb mime type
* update comment
* go mod tidy
* update go mod
* go mod tidy
* add comment
* update max depth check to >
* go mod tidy
* rename
* [refactor] - Refactor Archive Handling Logic - Part 4: Non-Archive Data Handling and Cleanup (#2704)
* Handle non-archive data within the DefaultHandler
* make structs and methods private
* Remove non-archive data handling within sources
* Handle non-archive data within the DefaultHandler
* rebase
* Remove non-archive data handling within sources
* add gzip
* move diskbuffered rereader setup into handler pkg
* remove DiskBufferReader creation logic within sources
* move rewind closer
* reduce log verbosity
* make defaultBufferSize a const
* use correct reader
* address comments
* update test
* [feat] - Add Prometheus Metrics for File Handlers (#2705)
* add metrics for file handling
* add metrics for errors
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* add metrics for archive depth
* [bug] - Enhanced Archive Handling to Address Interface Constraints (#2710)
* add metrics for file handling
* add metrics for errors
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* wrap compReader with DiskbufferReader
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* replace diskbuffereader with bufferedfilereader
* updates
* add metric back
* [bug] - Fix bug and simplify git cat-file command execution and output handling (#2719)
* add metrics for file handling
* add metrics for errors
* add metrics for file handling
* add metrics for errors
* fix tests
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* Address incompatible reader to openArchive
* remove nil check
* fix err assignment
* Allow git cat-file blob to complete before trying to handle the file
* wrap compReader with DiskbufferReader
* Allow git cat-file blob to complete before trying to handle the file
* updates
* revert stuff
* update test
* remove
* add metrics for file handling
* add metrics for errors
* fix tests
* rebase
* add metrics for errors
* add metrics for max archive depth and skipped files
* update error
* skip symlinks and dirs
* update err
* fix err assignment
* rebase
* remove
* update metric to ms
* update comments
* address comments
* reduce indentations
* inline
ChunkReporter is more flexible and will allow code reuse for unit
chunking. ChanReporter was added as a way to maintain the original
channel functionality, so this PR should not alter existing behavior.
Fixes #1769
The existing error check `errors.Is(err, archiver.ErrNoMatch) && depth > 0`
handled only one specific error case.
Any other error was not short-circuited and caused a nil-pointer
dereference further down the method when `format.Name()` was invoked.
* Use custom context for archive handler of specialized archives.
* fix arg.
* fix test.
* use re-reader.
* use re-reader.
* Update error and comments.
* Add better error handling.
* update.
* Add handler for .deb file formats.
* Add handler for .rpm file formats.
* update.
* move logic to general archive handler.
* update const.
* Add compile time guard.
* Remove redundant parens.
* Add checks to make sure we have the tools installed to extract archives.
* Limit size of temp file for archive reading.
* handle nested archives.
* add comment.
* use consistent name for tempEnv -> env
* fix handler function signature.
* Fix error where some files were not properly scanned because of the order
of the extraction and decompression steps. Decompressing first ensures
that a compressed archive (e.g., a gzipped zip file) is handled
correctly.
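
The ordering described in the last item can be sketched with the standard library alone. This is not the handler's actual code: `maybeGunzip`, `listEntries`, and `gzippedZip` are hypothetical helpers, and the sketch reads everything into memory without the size limits a real handler would need.

```go
package main

import (
	"archive/zip"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// maybeGunzip returns the decompressed payload when data starts with the
// gzip magic bytes, and returns data unchanged otherwise.
func maybeGunzip(data []byte) ([]byte, error) {
	if len(data) < 2 || data[0] != 0x1f || data[1] != 0x8b {
		return data, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// listEntries decompresses first and only then extracts, so a gzipped
// zip file is opened as a zip rather than rejected as an unknown format.
func listEntries(data []byte) ([]string, error) {
	plain, err := maybeGunzip(data)
	if err != nil {
		return nil, fmt.Errorf("decompress: %w", err)
	}
	zr, err := zip.NewReader(bytes.NewReader(plain), int64(len(plain)))
	if err != nil {
		return nil, fmt.Errorf("extract: %w", err)
	}
	names := make([]string, 0, len(zr.File))
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

// gzippedZip builds sample input: a zip with one entry, wrapped in gzip.
func gzippedZip(name, body string) ([]byte, error) {
	var zbuf bytes.Buffer
	zw := zip.NewWriter(&zbuf)
	w, err := zw.Create(name)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write([]byte(body)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	var gbuf bytes.Buffer
	gw := gzip.NewWriter(&gbuf)
	if _, err := gw.Write(zbuf.Bytes()); err != nil {
		return nil, err
	}
	if err := gw.Close(); err != nil {
		return nil, err
	}
	return gbuf.Bytes(), nil
}

func main() {
	data, err := gzippedZip("secret.txt", "token")
	if err != nil {
		fmt.Println("building sample failed:", err)
		return
	}
	names, err := listEntries(data)
	fmt.Println(names, err)
}
```

Running extraction before decompression would hand the still-compressed bytes to the zip reader, which is the kind of ordering problem the commit describes.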