The source manager initialization function was defined as `sourceID`
followed by `jobID`, while the source initialization function is the
reverse. This is confusing and easy to mix up since the parameters are
the same type.
This commit adds a test to make sure the source manager initializes in
the correct order, but it doesn't prevent the library user to make the
same mistake. We may want to consider using different types.
* add exportable validate function for github
* update validator
* use the context
* gate to prevent panic
* wrap error with context
* wrap error with context for basic auth and unauth
* add role assumption for s3 source
* refactor role assumption to repeatable string
user can pass array of roles to assume
* refactor s3 chunks to handle passed roleARNs
* add role-session name
use timestamp to make dynamic
* add docstring for rolearn strings()
* make sure role ars are passed into source
* refactor role assumption functionality
break s3 bucket scanning into sep. function
* add log check on assume role
* fix role iteration
- Make sure s3 struct is populated with roles
- add separate new client instantiation for role-based access
- iterates through each role
* add comment
* protobuf revert for merge
* re-run make proto
* lint cleanup
* cleanup TODOs
* drop redundant switch case in assumerole client
* use less verbose 'ctx' designator
* breakout functionality from Chunks
- separate functions for:
- enumerating buckets to scan
- scanning objects within the buckets
* remake protobuf defs
* allow scan to continue on single bucket err
* add readme docs
* minor fixups
With the introduction of the SourceManager, the chunks channel became
private and read-only. This provides a method to write chunks into the
channel as we transition away from needing to do that.
* added functionality to scan docker images with digests instead of tags
* cleaned import statement
* added unit test for baseAndTag parsing + remote digest scan
* Add common chunker.
* add comment.
* use better config name.
* Add common chunk reader to s3.
* Add common chunk reader to git, gcs, circleci.
* revert gcs.
* revert gcs.
* fix chunker.
* revert gcs.
* update cancellablewrite.
* revert impl.
* update to remove totalsize.
* Fix my goof.
* Use unified struct in chunkreader.
* return err instead of logging and returning.
* rename error to err.
* only send single ChunkResult even if there is an error and chunkBytes.
* fix logic.
* Add SourceManager to Engine struct
* Update Engine methods to use the SourceManager
* Fix GCS test
The original was testing that `Init()` errors weren't surfaced in
`Finish()`, but the `SourceManager` changed that behavior.
* JobProgress race fixes
* Add contextual values
* Remove unused code
* Add debug logs
* Rename WithConcurrency to WithConcurrentSources
* Always forward chunks to the output chunks channel
* feat: initial support for bare repositories
* feat: use concatenation instead of formatting and os.Getenv instead of os.Environ
Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
* fix: go-git update with pre-receive hooks fix
Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
* fix: remove info about pre-receive hook from README.md for now
Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
* fix: don't scan staged while using --bare option, fixes to make it work with the latest master
Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
* fix: small refactor according to #1518
Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
---------
Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
* Refactor git source to allow ScanOptions and use source in engine
Refactor the Chunks method of the git Source to call out to two helper
methods: scanRepos and scanDirs which scans s.conn.Repositories and
s.conn.Directories respectively. The only notable change in behavior is
that a credential is no longer necessary if there are no
s.conn.Repositories to scan.
* Preserve ScanGit functionality of not cleaning up temporary files
* Support fatal errors in job reports
* WIP: JobReporter and JobInspector
* WIP: JobReportHook and JobReportRef
* Add ChunkError type and asyncRun helper method
* Rename JobReport to JobProgress
* Return a closed channel from Done when the JobProgress is nil
* Comment catchFirstFatal function
#1454 modified one of the Github enumeration code paths in a way that broke an integration test by causing one client's transport to be used for the construction of a different client, causing authentication failures. This saves the original transport for use, fixing the test.
* Miscellaneous SourceManager updates
* Own the chunks channel instead of accepting it as an input
* Add Chunks and Wait methods
* Fix bug in Enroll so it actually returns the handle
* Add context.Context parameter to the SourceInitFunc type
* Add SourceManager tests for Run and Wait methods
* Rename man variables to mgr
* Implement SourceManager basics
* Rename identifiers and add a default headlessAPI implementation
* Rewrite to use SourceInitFunc
* Update variable name to accurately reflect its value
* issue comment scanning
* save progress
* test
* test for pr comment and issue comment
* add pagination support
* linter stuff
* make linter happy
* remove debug log
* readd logging
* github issue resolved
* var const block and handle rate limit
* remove magic number
* make gitURLParse a public function to use more generally
* fix test bug
* make comment scanning OPT-IN
* Add CancellableWrite helper function
* Create SourceUnitEnumerator interface and EnumerationResult struct
* Implement SourceUnitEnumerator for the filesystem Source
* Omit explicit zero values
* Exit with non-zero exit code on chunk source error
* Exit with a non-zero exit code whenever we hit an error getting
chunks. Previously the error would be logged but trufflehog would exit
with a 0 (success) status code.
* fix gcs test
---------
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
Co-authored-by: ahrav <ahravdutta02@gmail.com>
* Implement CommonSourceUnitUnmarshaller
* Add SourceUnitUnmarshaller to all sources using
All sources, with the exception of git, will use the CommonSourceUnit as
they only contain a single type of unit to scan.
* Fix method comments to adhere to Go's style guide
* Add Validator interface and example
* Close sockets and improve error messages
* Remove duplicate error
* Use var declaration so err slice can be nil
* Fix worktree scan by setting EnableDotGitCommonDir
* Change `PlainOpenOptions` to set `EnableDotGitCommonDir` to true.
In every current usage of this function, it is on an already-cloned
repository, so it should always be valid to have this set. By doing
so, it should fix some issues with worktrees.
* Remove unused go.mod replace directives
* Remove replace directives for libraries that are not in use.
* Add in-memory caching lib, used by the GCS source.
* Use cache for tracking progress for the GCS source.
* fix merge issue.
* fix merge issue.
* fix test.
* Fix static check.
* Add test for NewWithData.
* Use cache for tracking progress for the GCS source.
* fix merge issue.
* fix merge issue.
* fix test.
* update comment.
* update comments.
* Use cache for tracking progress for the GCS source.
* fix merge issue.
* fix merge issue.
* fix test.
* remove unused dep.
* address comments.
* Add exists method.
* Use cache for tracking progress for the GCS source.
* fix merge issue.
* fix merge issue.
* fix test.
* rebase.
* fix test.
* Use cache for tracking progress for the GCS source.
* fix merge issue.
* fix merge issue.
* fix test.
* rebase.
* rebase.
* split encode resume by comma.
* Use a persistable cache.
* fix merge.
* fix merge.
* Add progress as part of the cache given it will be the persistence layer.
* Add test for making sure the cache doesn't persist when the increment value is not met.
* fix tests.
* Resolve#1167 by adding support for the AWS_SESSION_TOKEN environment variable and adding a --session-token cli arg
* fix error message
---------
Co-authored-by: Dustin Decker <dustin@trufflesec.com>
This will concatenate all errors together into a single string. When
possible, it would be better to log the actual errors slice to take
advantage of structured logging.
* Rename directories to paths
* Generate protos
* Add file scanning support to filesystem source
* Add directories back to filesystem proto
* Generate protos
* Combine paths and directories from in source
* Add filesystem filter
* Address comments
* Adding missing flags to Readme
* Use retryableHttpClient by default for GitHub
* Adding repoUrl for scanning time log
* Use WithField instead of WithFields
* Updating README with lasted --help output
* Allow using a glob for include list.
* Update command flag.
* Make comment more clear.
* update comment.
* Allow scanning repo and org at the same time.
Also, don't fetch a github user or their token when both are known. This
currently only affects the Github Token auth type. Github App
installations will continually fetch tokens every time we clone a repo.
In the future we should check the `ExpiresAt` field of the Github App
token and determine if we need to fetch a new one at that point.
* Add SSH config option for the git source
The auth message is empty since we use the git binary underneath to
handle the SSH authentication.
* Import digitaloceanv2
I'm not sure I fully understand why this issue exists. But I think the
short version is this: When we attempted to paginate users, we would set
a variable's Page value. But that variable appears to not actually be a
pointer, despite being added as one. It probably has to do with how
struct embedding works. Either way, if we make the overall options
variable the whole thing, and update its embedded struct with our page
variable, everything works out.
* Handle errors w/ github source.
* Fix loop var captured by func literal.
* Fix loop var captured by func literal.
* Set completed progress if the scan completes with no errors.
* Set progress to 100% if the scope and iteration are both 0.
* Fix commentary.
* Fix test.
* Return after the defer to os.RemoveAll.
* Fix unauth scan.
* Inline range loop.
* update tests for partial scan completion with errors. Ensure correct progress is set.
* Update progress for all sources.
* Update github test.
* Address comments.
* Scan binary files for git sources
* Create data chunks in for loop
* Linter feedback and newline commit result
* Use disk buffered reader and chunker function
* Use a config struct when scanning and engine source.
* fix tests.
* Move test_helpers to the sources pkg.
* Handle ScanGit error in tests.
* adderss comments.
* Use functional options.
* Remove temp var.
* Add better var names for the setup functions for each config.
* Remove unused var.
* fix error logs.
* fix error logs.
* single line.
* remove blank lines.
The fragment trace was a bit too verbose even at the trace level. We may
want to trace the file being chunked or something like that, but not the
entire diff.
* Shallow clone if --since-commit is provided
* Set the user before constructing args
* Fix vbout detector
* Address PR comments
* Use a better name for timestamp
* Use net.URL.String method for the remote path
* Add unit tests and refactor some logic
* Move integration tests to a separate file behind a build flag
* Fix bugs in normalizeRepos
* Address lint errors
* Sort slices before comparing because order doesn't matter
* Reorganize GitHub source
This breaks up the Chunks method into smaller sub-method calls to help
organize and better understand the logic flow. No logic has been
modified (except one obvious bug), just shuffling code around.
* Check errors and revert bug fix