Commit graph

46 commits

Author SHA1 Message Date
ahrav
ead9dd5748
[refactor] - Create separate handler for non-archive data (#2825)
* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* Handle non-archive data within the DefaultHandler

* make structs and methods private

* Remove non-archive data handling within sources

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* Adjust check for rpm/deb archive type

* add additional deb mime type

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* update comment

* move rewind closer

* reduce log verbosity

* add metrics for file handling

* add metrics for errors

* make defaultBufferSize a const

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* Allow git cat-file blob to complete before trying to handle the file

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* use buffer writer

* update

* refactor

* update context pkg

* revert stuff

* update test

* fix test

* remove

* use correct reader

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* Update write method in contentWriter interface

* Add bufferReadSeekCloser

* update name

* update comment

* fix lint

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* Handle non-archive data within the DefaultHandler

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* update comment

* move rewind closer

* reduce log verbosity

* make defaultBufferSize a const

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* use buffer writer

* update

* refactor

* update context pkg

* revert stuff

* update test

* remove

* rebase

* go mod tidy

* lint check

* update metric to ms

* update metric

* update comments

* dont use ptr

* update

* fix

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* go mod tidy

* update go mod

* Add a buffered file reader

* update comments

* use Buffered File Readder

* return buffer

* update

* fix

* return

* go mod tidy

* merge

* use a shared pool

* use sync.Once

* reorganzie

* remove unused code

* fix double init

* fix stuff

* nil check

* reduce allocations

* updates

* update metrics

* updates

* reset buffer instead of putting it back

* skip binaries

* skip

* concurrently process diffs

* close chan

* concurrently enumerate orgs

* increase workers

* ignore pbix and vsdx files

* add metrics for gitparse's Diffchan

* fix metric

* update metrics

* update

* fix checks

* fix

* inc

* update

* reduce

* Create workers to handle binary files

* modify workers

* updates

* add check

* delete code

* use custom reader

* rename struct

* add nonarchive handler

* fix break

* add comments

* add tests

* refactor

* remove log

* do not scan rpm links

* simplify

* rename var

* rename

* fix benchmark

* add buffer

* buffer

* buffer

* handle panic

* merge main

* merge main

* add recover

* revert stuff

* revert

* revert to using reader

* fixes

* remove

* update

* fixes

* linter

* fix test

* fix comment

* update field name

* fix
2024-05-15 13:40:16 -07:00
cuiyourong
ead4e8fa2d
chore: fix some typos in comments (#2851)
Signed-off-by: cuiyourong <cuiyourong@gmail.com>
2024-05-15 07:36:21 -07:00
ahrav
570cec7565
[refactor] - Refactor Archive Handling Logic (#2703)
* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* go mod tidy

* update go mod

* go mod tidy

* add comment

* update max depth check to >

* go mod tidy

* rename

* [refactor] - Refactor Archive Handling Logic - Part 4: Non-Archive Data Handling and Cleanup (#2704)

* Handle non-archive data within the DefaultHandler

* make structs and methods private

* Remove non-archive data handling within sources

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* move rewind closer

* reduce log verbosity

* make defaultBufferSize a const

* use correct reader

* address comments

* update test

* [feat] - Add Prometheus Metrics for File Handlers (#2705)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* add metrics for archive depth

* [bug] - Enhanced Archive Handling to Address Interface Constraints (#2710)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* wrap compReader with DiskbufferReader

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* replace diskbuffereader with bufferedfilereader

* updtes

* add metric back

* [bug] -  Fix bug and simplify git cat-file command execution and output handling (#2719)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* Allow git cat-file blob to complete before trying to handle the file

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* revert stuff

* update test

* remove

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* inline
2024-05-10 11:36:06 -07:00
Cody Rose
28ed81f0a2
Add naive S3 ignorelist (#2536)
This PR adds the ability to exclude buckets from S3 scans. The capability is pretty rudimentary right now, and does not support globbing. If both lists are specified the source to fail to initialize.
2024-03-05 08:01:20 -05:00
Mike Vanbuskirk
53f060a08e
Add disk buffer tempfile cleanup (#2130)
* add tempfile creation

- break PID retrieval into sep. function

* add tmpfile cleanup func

* add file cleanup to main cleanup func

* refactor file logic to only return name string

* add temp buffer naming to gcs

* add temp buffer naming to s3

* add temp buffer naming to filesystem

* add temp buffer naming to git

* consolidate cleanup functions

- have single function handle both files and dirs
- remove interface(not needed with a single func implementation)
- change calls to `New(...)` to reflect config implementation
- simplify automation in main.go
- update disk-buffer-reader dependency

* integrate changes from pr #2133

* merge main

* checkout from main to revert conflict issues

* re-add buffer logic to git

* interface no longer needed

* move string format to global const

---------

Co-authored-by: Ahrav Dutta <ahrav.dutta@trufflesec.com>
2023-12-11 18:31:50 -05:00
ahrav
52ffab1034
[chore] - fix import name clashes (#2143)
* fix import name clashes

* fix missing var
2023-12-01 06:53:15 -08:00
Miccah
52600a897a
[chore] Replace chunks channel with ChunkReporter in git based sources (#2082)
ChunkReporter is more flexible and will allow code reuse for unit
chunking. ChanReporter was added as a way to maintain the original
channel functionality, so this PR should not alter existing behavior.
2023-11-01 09:22:44 -07:00
Bill Rich
c5efa870ff
Use latest dbr (#1955) 2023-10-24 07:52:49 -07:00
Cody Rose
e9efed85c2
Use S3 credentials waterfall (#1823)
This PR updates the S3 source to use explicitly configured credentials if they're available and follow the normal AWS credentials waterfall if they're not. This is irrespective of whether role assumption is configured. This changes the previous behavior, which was to use waterfall credentials only if role assumption was configured and explicitly configured credentials only when it was not.
2023-09-27 16:57:47 -04:00
Miccah
dbcb888063
Update Source interface to use SourceID and JobID types (#1774)
The previous implementation used int64 for both, which can be mixed up
easily. Using distinct types adds a layer of type safety checked by the
compiler.
2023-09-14 11:28:24 -07:00
Miccah
72b6a9ec6b
Add a SourceType constant to all source packages (#1768) 2023-09-12 17:23:25 -07:00
Mike Vanbuskirk
de540652cb
verbosity updates to s3 source (#1750) 2023-09-11 14:53:43 -05:00
ahrav
2a9f34962d
Add optional param to Chunks (#1747)
* Add interface for targeted chunking.

* use optional args.

* update Chunks method signature.

* update tests.

* fix test.

* update QueryCriteria type.
2023-09-07 09:03:37 -07:00
Cody Rose
afe708519b
Validate S3 source (#1715)
This PR adds S3 source validation. This is accomplished by factoring out common "bucket visiting" logic to be used by both scanning and validation.
2023-09-05 10:18:58 -04:00
Cody Rose
a2c0abbfd6
Unify S3 client creation logic (#1657)
This PR unifies some code paths within the S3 source. This is being done to better support a future implementation of S3 source validation; less code that runs means less code to validate. The logical change is to move the handling of "role-less" operation down the call tree, which allows for a single code path for more of the S3 code.

This PR also fixes a bug that would occur in the (rare) case that the source couldn't create a regional S3 client. Before, an error would be logged, but it would be followed by a panic. Now the bucket in question is skipped.
2023-08-30 17:49:37 -04:00
ahrav
2b1b1b5ad0
Add jobID to chunk. (#1721) 2023-08-29 12:02:30 -07:00
Cody Rose
33eed42e17
Test S3 role assumption (#1655)
This PR adds a test of the S3 role assumption functionality. It currently only tests role assumption within a single account.
2023-08-25 11:30:08 -04:00
Cody Rose
059ea23a72
update s3 test bucket (#1649)
We're switching our S3 source test account over to a different one, which means we have to change the bucket name.
2023-08-22 12:43:38 -04:00
Cody Rose
dbb2c2e319
wait before finishing s3 test (#1647)
The S3 source test verifies that chunking has completed, but it didn't actually wait for completion first, leading to non-deterministic test failures.
2023-08-21 12:36:36 -04:00
Mike Vanbuskirk
64dd49f9ce
add role assumption for s3 source (#1477)
* add role assumption for s3 source

* refactor role assumption to repeatable string

user can pass array of roles to assume

* refactor s3 chunks to handle passed roleARNs

* add role-session name

use timestamp to make dynamic

* add docstring for rolearn strings()

* make sure role ars are passed into source

* refactor role assumption functionality

break s3 bucket scanning into sep. function

* add log check on assume role

* fix role iteration

- Make sure s3 struct is populated with roles
- add separate new client instantiation for role-based access
- iterates through each role

* add comment

* protobuf revert for merge

* re-run make proto

* lint cleanup

* cleanup TODOs

* drop redundant switch case in assumerole client

* use less verbose 'ctx' designator

* breakout functionality from Chunks

- separate functions for:
- enumerating buckets to scan
- scanning objects within the buckets

* remake protobuf defs

* allow scan to continue on single bucket err

* add readme docs

* minor fixups
2023-08-17 20:30:20 -04:00
ahrav
b8bb94f2b1
[bug] - copy chunk before sending on chunksChan (#1633)
* Redclare chunk before sending on chunksChan.

* add integration test.

* update test.
2023-08-16 16:36:38 -07:00
ahrav
13999227b9
Use common chunk reader (#1596)
* Add common chunker.

* add comment.

* use better config name.

* Add common chunk reader to s3.

* Add common chunk reader to git, gcs, circleci.

* revert gcs.

* revert gcs.

* fix chunker.

* revert gcs.

* update cancellablewrite.

* revert impl.

* update to remove totalsize.

* Fix my goof.

* Use unified struct in chunkreader.

* return err instead of logging and returning.

* rename error to err.

* only send single ChunkResult even if there is an error and chunkBytes.

* fix logic.
2023-08-07 12:55:28 -07:00
ahrav
78d06658ca
Dont return in loop. (#1589) 2023-08-01 10:29:01 -07:00
Brendan Shaklovitz
da5301ea1e
Exit with non-zero exit code on chunk source error (#1286)
* Exit with non-zero exit code on chunk source error

* Exit with a non-zero exit code whenever we hit an error getting
  chunks. Previously the error would be logged but trufflehog would exit
  with a 0 (success) status code.

* fix gcs test

---------

Co-authored-by: Dustin Decker <dustin@trufflesec.com>
Co-authored-by: ahrav <ahravdutta02@gmail.com>
2023-06-26 11:39:57 -05:00
Miccah
f3152b6885
Implement SourceUnitUnmarshaller for all sources (#1416)
* Implement CommonSourceUnitUnmarshaller

* Add SourceUnitUnmarshaller to all sources using

All sources, with the exception of git, will use the CommonSourceUnit as
they only contain a single type of unit to scan.

* Fix method comments to adhere to Go's style guide
2023-06-23 11:15:51 -05:00
Brendan Shaklovitz
10902f802a
Add max object size flag for s3 bucket scanning (#1294)
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
2023-04-26 15:39:43 -07:00
iamjpotts
b3d917f9c7
Resolve #1167 by adding support for the AWS_SESSION_TOKEN (#1170)
* Resolve #1167 by adding support for the AWS_SESSION_TOKEN environment variable and adding a --session-token cli arg

* fix error message

---------

Co-authored-by: Dustin Decker <dustin@trufflesec.com>
2023-04-03 14:56:43 -07:00
Miccah
d317ddb51a
[chore] Remove logrus from circleci, filesystem, gitlab, and s3 sources (#1089)
* [chore] Remove logrus from circleci, filesystem, gitlab, and s3 sources

* Address comments
2023-02-10 11:02:55 -06:00
ahrav
8be89a593b
Handle errors in a thread safe manner (#1052)
* Handle errors in a thread safe manner.

* fix test.

* fix linter.

* address comments.
2023-02-02 11:05:33 -08:00
ahrav
009756dce6
add proto that was missing. (#986) 2022-12-23 13:27:07 -08:00
Bill Rich
36ca2601e0
Add s3 object count to trace logs (#975)
* Add s3 object count to trace logs

* fix debug level
2022-12-13 16:46:09 -08:00
ahrav
26befdd1ec
[bug] - Handle error when scanning s3 bucket. (#969)
* Handle error when scanning s# bucket.

* move wait outside loop.

* Add logging.

* revert changes.

* remove.

* revert.
2022-12-12 10:10:06 -08:00
Bill Rich
f1ec9e74eb
Close files to clean up tmp files (#940) 2022-11-22 13:13:34 -08:00
Dustin Decker
28dd25beeb
S3 scanner improvements (#938) 2022-11-21 19:15:26 -08:00
Bill Rich
ab71b93f7d
Add context to handler (#877)
* Add context to handler

* Return rather than break out of select
2022-10-28 08:57:55 -07:00
Bill Rich
958266ea84
Run chunker in pipeline (#859)
* Run chunker in pipeline

* Move ChunkSize and PeekSize to source package.

* Use new Chunk and Peek size location
2022-10-24 13:57:27 -07:00
ahrav
92f40c2031
[THOG-709] - Recover from detector panics (#810) 2022-09-22 07:01:10 -07:00
ahrav
7ba583ca40
[THOG-681] - Handle errors sources (#783)
* Handle errors w/ github source.

* Fix loop var captured by func literal.

* Fix loop var captured by func literal.

* Set completed progress if the scan completes with no errors.

* Set progress to 100% if the scope and iteration are both 0.

* Fix commentary.

* Fix test.

* Return after the defer to os.RemoveAll.

* Fix unauth scan.

* Inline range loop.

* update tests for partial scan completion with errors. Ensure correct progress is set.

* Update progress for all sources.

* Update github test.

* Address comments.
2022-09-07 19:40:37 -07:00
Dustin Decker
fa9479100e
Add common sentry recover library and add into goroutines (#738)
* Add common sentry recover library and add into goroutines

* fix nits
2022-08-29 11:45:37 -07:00
Bill Rich
0ddd49a1b8
Use file handler and common chunker (#707) 2022-08-23 16:35:52 -07:00
ahrav
dcc102a81c
[Thog-371] Utilize config struct for engine scans (#700)
* Use a config struct when scanning and engine source.

* fix tests.

* Move test_helpers to the sources pkg.

* Handle ScanGit error in tests.

* adderss comments.

* Use functional options.

* Remove temp var.

* Add better var names for the setup functions for each config.

* Remove unused var.

* fix error logs.

* fix error logs.

* single line.

* remove blank lines.
2022-08-10 10:11:13 -07:00
Dustin Decker
2178f1f42e reword and fix error logging 2022-06-13 16:14:22 -07:00
ahrav
d2605354fe
[THOG-332 ]Remove TokenSource interface from the init method of Source. (#539)
* Remove TokenSource interface from the init method of Source.

* Remove proto message.

* Remove proto message.

* Fix tests.

* Fix filesystem test.
2022-05-13 14:35:06 -07:00
ahrav
b0d79180f6
[THOG-314] Add new parameter to the Init method for the source interface. (#529)
* Add new parameter to the Init method for the source interface.

* Add Oauth Token service.

* remove .test file.

* remove .test file.

* Fix param spelling.

* fix tests with new param in init

* Add missing gock lib.
2022-05-10 11:11:43 -07:00
steeeve
a770f643df Add placeholder for encoded resume info in SetProgressComplete 2022-03-24 12:43:36 -04:00
Bill Rich
6486c18565
Add s3 support to CLI (#76)
* Add s3 support to CLI

* Clean up comments

Co-authored-by: Dustin Decker <dustin@trufflesec.com>
2022-03-14 17:07:07 -07:00