Commit graph

26 commits

Author SHA1 Message Date
Richard Gomez
5216142960
refactor(cache): use generics (#2930) 2024-06-06 13:08:00 -04:00
ahrav
ead9dd5748
[refactor] - Create separate handler for non-archive data (#2825)
* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* Handle non-archive data within the DefaultHandler

* make structs and methods private

* Remove non-archive data handling within sources

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* Adjust check for rpm/deb archive type

* add additional deb mime type

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* update comment

* move rewind closer

* reduce log verbosity

* add metrics for file handling

* add metrics for errors

* make defaultBufferSize a const

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* Allow git cat-file blob to complete before trying to handle the file

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* use buffer writer

* update

* refactor

* update context pkg

* revert stuff

* update test

* fix test

* remove

* use correct reader

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* Update write method in contentWriter interface

* Add bufferReadSeekCloser

* update name

* update comment

* fix lint

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* Handle non-archive data within the DefaultHandler

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* update comment

* move rewind closer

* reduce log verbosity

* make defaultBufferSize a const

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* use buffer writer

* update

* refactor

* update context pkg

* revert stuff

* update test

* remove

* rebase

* go mod tidy

* lint check

* update metric to ms

* update metric

* update comments

* dont use ptr

* update

* fix

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* go mod tidy

* update go mod

* Add a buffered file reader

* update comments

* use Buffered File Readder

* return buffer

* update

* fix

* return

* go mod tidy

* merge

* use a shared pool

* use sync.Once

* reorganzie

* remove unused code

* fix double init

* fix stuff

* nil check

* reduce allocations

* updates

* update metrics

* updates

* reset buffer instead of putting it back

* skip binaries

* skip

* concurrently process diffs

* close chan

* concurrently enumerate orgs

* increase workers

* ignore pbix and vsdx files

* add metrics for gitparse's Diffchan

* fix metric

* update metrics

* update

* fix checks

* fix

* inc

* update

* reduce

* Create workers to handle binary files

* modify workers

* updates

* add check

* delete code

* use custom reader

* rename struct

* add nonarchive handler

* fix break

* add comments

* add tests

* refactor

* remove log

* do not scan rpm links

* simplify

* rename var

* rename

* fix benchmark

* add buffer

* buffer

* buffer

* handle panic

* merge main

* merge main

* add recover

* revert stuff

* revert

* revert to using reader

* fixes

* remove

* update

* fixes

* linter

* fix test

* fix comment

* update field name

* fix
2024-05-15 13:40:16 -07:00
ahrav
570cec7565
[refactor] - Refactor Archive Handling Logic (#2703)
* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* go mod tidy

* update go mod

* go mod tidy

* add comment

* update max depth check to >

* go mod tidy

* rename

* [refactor] - Refactor Archive Handling Logic - Part 4: Non-Archive Data Handling and Cleanup (#2704)

* Handle non-archive data within the DefaultHandler

* make structs and methods private

* Remove non-archive data handling within sources

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* move rewind closer

* reduce log verbosity

* make defaultBufferSize a const

* use correct reader

* address comments

* update test

* [feat] - Add Prometheus Metrics for File Handlers (#2705)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* add metrics for archive depth

* [bug] - Enhanced Archive Handling to Address Interface Constraints (#2710)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* wrap compReader with DiskbufferReader

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* replace diskbuffereader with bufferedfilereader

* updtes

* add metric back

* [bug] -  Fix bug and simplify git cat-file command execution and output handling (#2719)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* Allow git cat-file blob to complete before trying to handle the file

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* revert stuff

* update test

* remove

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* inline
2024-05-10 11:36:06 -07:00
ahrav
677238c96c
Extend memory cache (#2275)
* Extend memory cache to allow for configuring custom expiration and purge interval

* use any for value type

* fix test

* fix test

* address comments

* address

* make new construct more clear

* reduce duplication

* fix test
2024-01-11 08:20:37 -08:00
Mike Vanbuskirk
53f060a08e
Add disk buffer tempfile cleanup (#2130)
* add tempfile creation

- break PID retrieval into sep. function

* add tmpfile cleanup func

* add file cleanup to main cleanup func

* refactor file logic to only return name string

* add temp buffer naming to gcs

* add temp buffer naming to s3

* add temp buffer naming to filesystem

* add temp buffer naming to git

* consolidate cleanup functions

- have single function handle both files and dirs
- remove interface(not needed with a single func implementation)
- change calls to `New(...)` to reflect config implementation
- simplify automation in main.go
- update disk-buffer-reader dependency

* integrate changes from pr #2133

* merge main

* checkout from main to revert conflict issues

* re-add buffer logic to git

* interface no longer needed

* move string format to global const

---------

Co-authored-by: Ahrav Dutta <ahrav.dutta@trufflesec.com>
2023-12-11 18:31:50 -05:00
Miccah
52600a897a
[chore] Replace chunks channel with ChunkReporter in git based sources (#2082)
ChunkReporter is more flexible and will allow code reuse for unit
chunking. ChanReporter was added as a way to maintain the original
channel functionality, so this PR should not alter existing behavior.
2023-11-01 09:22:44 -07:00
Bill Rich
c5efa870ff
Use latest dbr (#1955) 2023-10-24 07:52:49 -07:00
Miccah
dbcb888063
Update Source interface to use SourceID and JobID types (#1774)
The previous implementation used int64 for both, which can be mixed up
easily. Using distinct types adds a layer of type safety checked by the
compiler.
2023-09-14 11:28:24 -07:00
Miccah
72b6a9ec6b
Add a SourceType constant to all source packages (#1768) 2023-09-12 17:23:25 -07:00
ahrav
2a9f34962d
Add optional param to Chunks (#1747)
* Add interface for targeted chunking.

* use optional args.

* update Chunks method signature.

* update tests.

* fix test.

* update QueryCriteria type.
2023-09-07 09:03:37 -07:00
ahrav
2b1b1b5ad0
Add jobID to chunk. (#1721) 2023-08-29 12:02:30 -07:00
Miccah
f3152b6885
Implement SourceUnitUnmarshaller for all sources (#1416)
* Implement CommonSourceUnitUnmarshaller

* Add SourceUnitUnmarshaller to all sources using

All sources, with the exception of git, will use the CommonSourceUnit as
they only contain a single type of unit to scan.

* Fix method comments to adhere to Go's style guide
2023-06-23 11:15:51 -05:00
ahrav
6db770fbe5
use md5 hash for checking if key exists. (#1257) 2023-05-15 10:04:14 -07:00
ahrav
948828ba8c
[chore] - move objectManager interface (#1332)
* Relocate the objectManager interface to the consumer package as per Go
best practices.

* address comment.
2023-05-15 09:30:26 -07:00
Brendan Shaklovitz
10902f802a
Add max object size flag for s3 bucket scanning (#1294)
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
2023-04-26 15:39:43 -07:00
ahrav
461f1a631e
[chore] - use hex encode vs base64 (#1256)
* use hex encode vs base64.

* fix tests.
2023-04-13 19:16:06 -07:00
ahrav
2fbf86a6ab
Use md5 hash for resuming key (#1203)
* Add in-memory caching lib, used by the GCS source.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* Fix static check.

* Add test for NewWithData.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* update comment.

* update comments.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* Include md5 hash to the object struct.

* remove unused dep.

* address comments.

* Add exists method.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* rebase.

* fix test.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* rebase.

* rebase.

* split encode resume by comma.

* update comment.

add comment for shouldCache.

remove redundant return.

* use md5 instead of name.

* update tests.

* Include md5 hash to the object struct.

* use md5 instead of name.

* update tests.

* Use a persistable cache.

* fix merge.

* fix merge.

* Include md5 hash to the object struct.

* use md5 instead of name.

* update tests.

* use md5 instead of name.

* update progress tests.

* use name for log message.

* remove slice operation.
2023-04-13 18:26:45 -07:00
ahrav
c451f9daf8
Use persistable cache for GCS progress tracking (#1204)
* Add in-memory caching lib, used by the GCS source.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* Fix static check.

* Add test for NewWithData.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* update comment.

* update comments.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* remove unused dep.

* address comments.

* Add exists method.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* rebase.

* fix test.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* rebase.

* rebase.

* split encode resume by comma.

* Use a persistable cache.

* fix merge.

* fix merge.

* Add progress as part of the cache given it will be the persistence layer.

* Add test for making sure the cache doesn't persist when the increment value is not met.

* fix tests.
2023-04-10 07:55:00 -07:00
ahrav
2cf6f831d4
Use OAuth2 http client with GCS (#1220)
* Use OAuth2 http client with GCS.

* rename variable.
2023-03-29 19:40:27 -07:00
ahrav
ac19de75bf
Delete progress tracking from GCS source (#1190)
* Add in-memory caching lib, used by the GCS source.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* Fix static check.

* Add test for NewWithData.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* update comment.

* update comments.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* remove unused dep.

* address comments.

* Add exists method.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* rebase.

* fix test.

* Use cache for tracking progress for the GCS source.

* fix merge issue.

* fix merge issue.

* fix test.

* rebase.

* rebase.

* split encode resume by comma.

* update comment.

add comment for shouldCache.

remove redundant return.

* delete old code.

* delete more code.

* update comment.
2023-03-27 10:39:16 -07:00
ahrav
03a534d59f
Use correct date format for Date posted. (#1211) 2023-03-27 10:27:28 -07:00
ahrav
ffbd9c1ead
[chore] - log enumeration duration (#1187)
* log enumeration duration.

* use defer to print enumeration duration stat.

* remove temp var.
2023-03-21 09:14:58 -07:00
ahrav
c617bd7a4e
Add resuming capability to GCS source (#1161)
* Add resuming capability to GCS source.

* Handle no auth scans.

* complete resume logic

* Use custom function type.

* remove functions.

* linter.

* fix test.

* fix test.

* Handle concurrent map writes.

* use string as CLI flag for include/exclude.

* handle emtpy buckets.

* Handle enumeration on initial job run.

* Rename stats to attributes.

* remove redundant return.

* If test fails due to 400, that is fine, it's expected.

* Add unauth GCS source type.

* comments.

* update proto.

* Use short flag.

* address comments.
2023-03-16 17:53:42 -07:00
ahrav
6193509098
add support for json service account and service account file. (#1185) 2023-03-16 13:04:36 -07:00
Dustin Decker
585bd82d47
update integration test excludes (#1169) 2023-03-10 14:41:29 -08:00
ahrav
cbf299aa77
Add gcs scanning integration (#1153)
* Setup for GCS scanning.

* Update GCS engine w/ projectID req.

* Add concurrency field to gcsManager.

* add errgroup to gcsManager.

* Update gcs manager.

* Use defautl ADC.

* use ADC.'

* Add TOOD.

* add log to iterator completion.

* use a BinaryReader instead of concrete object for channel type.

* initial test for Chunks.

* Add tests for chunking objects.

* Add concurrency.

* update metadata to include content type and acls.

* Add object reading code.

* Add integration test.

* Add entrypoint.

* Add removed wg.Wait().

* remove dead code.

* remove build.

* Remove period from file extension.

* remove used.

* Add comment.

* Setup for GCS scanning.

* Update GCS engine w/ projectID req.

* Add concurrency field to gcsManager.

* add errgroup to gcsManager.

* Update gcs manager.

* Use defautl ADC.

* use ADC.'

* Add TOOD.

* add log to iterator completion.

* use a BinaryReader instead of concrete object for channel type.

* initial test for Chunks.

* Add tests for chunking objects.

* Add concurrency.

* update metadata to include content type and acls.

* Add object reading code.

* Add integration test.

* Add entrypoint.

* Add removed wg.Wait().

* remove dead code.

* remove build.

* remove used.

* Add file type for objects.

* Add check for file type and size.

* Add default file size.

* Add additinoal auth options and remaining CLI flags.

* Handle errors in go routines.

* Handle resuming for buckets.

* Remove redundant words in comment.

* remove ok check on bool check.

* remove extra blank line.

* Add return if handler handles chunk.

* Add comment.

* remove extra blank line.

* cleanup comment.

* Add comment.

* move up fxn.

* go mod tidy.

* Add exclusion to perf testing buckets.

* Handle blocking the channel.

* remove unused const.

* fix tests.

* fix tests.

* Handle gcs manger options better.

* update fxn name.

* Remove arg name.

* ignore buckets in gcsManager test.

* fix test.

* propulate gsManagerOpts.

* inline err check.

* Add readme.

* update readme spelling.

* fix test.
2023-03-07 17:32:04 -08:00