ChunkReporter is more flexible and will allow code reuse for unit
chunking. ChanReporter was added as a way to maintain the original
channel functionality, so this PR should not alter existing behavior.
* [chore] Fix SourceManager flaky test
Sorting by EndTime is not deterministic, however sorting by StartTime
should be. StartTime is set in a goroutine that's limited by
WithConcurrentUnits, so it should happen in order that the units are
received.
* Sort by unit ID
* Add TravisCI source
* update test to use sourcestest
* Remove jobPage loop
ListByBuild does not support pagination, so this was infinitely
repeating. https://developer.travis-ci.com/resource/jobs#find
* Continue chunking on error
* review updates
* update readme
---------
Co-authored-by: Miccah Castorina <m.castorina93@gmail.com>
* adds func to get scannerPIDs
* add cleanup and call to get pids
* move pid handling to git module
* remove PID logic from main
* refactor testing code to handle different exec name
* cleanup linting errors
* add better logging, fix dir if clause
* some PR fixups
* mod fixup
* add interfaces for helper funcs
* refactor cleanup into main, getPID into git
* lint and test fixups, remove fail on n<2 pids
* simplify pid sorting
* use filepath.Join
* use Args[0] for exec name, fix logger
* formatting fixup
* move functionality into cleantemp pkg
* go mod fixup
* remove redundant testing comment
* fix go.sum issues
* add 15m ticker loop for cleanup
* enclose ticker in function for goroutine defer
fix cleantemp interface
* make time more readable
* add check for non-local Trufflehog PIDs
* allow deletion even if no non-local pids found
* bundle intial cleanup into runCleanup func
* add explicit regex check for tempdir format
* Add UnitHook and NoopHook implementations
The UnitHook tracks metrics per unit of a job, and emits them on a
channel once finished. It should work even if the Source does not
support source units.
* Refactor channel to use an LRU cache instead
An LRU cache has a more favorable failure mode than the channel. With
the channel, if the consumer stopped consuming metrics, scanning would
block. With the LRU cache, metrics will be dropped when space runs out
and a log message emitted.
* Fix bug in chunker that surfaces with a flaky passed in io.Reader
The chunker was previously expecting the passed in io.Reader to always
successfully read a full buffer of data, however it's valid for a Reader
to return less data than requested. When this happens, the chunker would
peek the same data that it then reads in the next iteration of the loop,
causing the same data to be scanned twice.
Co-authored-by: ahrav <ahravdutta02@gmail.com>
* Fix EOF error check
* Use io.ReadFull in Chunker
---------
Co-authored-by: ahrav <ahravdutta02@gmail.com>
This PR updates the S3 source to use explicitly configured credentials if they're available and follow the normal AWS credentials waterfall if they're not. This is irrespective of whether role assumption is configured. This changes the previous behavior, which was to use waterfall credentials only if role assumption was configured and explicitly configured credentials only when it was not.
* added PR and Issue body scanning; adjusted CLI args to fit
* removed print statement from debugging
* removed exclude-commits; adjusted CLI flags
* minor changes to match main branch
* fixing logic
* updating README for --issues and --prs
* Add ability to dynamically scale concurrently running sources
Refactor SourceManager to use a counting semaphore to allow for
dymanically changing limits. This complicated `Wait() error` which needs
to return the first error encountered. We previously got that for free
using `errgroup.Group`, however now we need to handle that ourselves.
`Wait()` needs to return an error for use in the engine to set the
correct exit code.
* Group third party imports together
The previous implementation used int64 for both, which can be mixed up
easily. Using distinct types adds a layer of type safety checked by the
compiler.
This PR implements validation of Gitlab source configuration.
I was hoping to be able to unify more of the implementation of Validate and Chunks, but there was more divergence than I expected. Specifically, Chunks handles a fair number of Gitlab errors that aren't configuration errors (e.g. "Gitlab returned a repo with an unparseable URL"). Accommodating these in the Validate code path felt wrong, and I wasn't able to create a common code path that could accommodate both Validate and Chunks without looking awful.
* Refactor SourceManager to remove Enrollment
Initializing the Source will be the responsibility of the caller. The
SourceManager exposes a GetIDs method for getting a source and job ID.
* Update tests
* Update engine usage
* Update apiClient interface to have one GetIDs method
* Update SourceManager usage in engine
This PR unifies some code paths within the S3 source. This is being done to better support a future implementation of S3 source validation; less code that runs means less code to validate. The logical change is to move the handling of "role-less" operation down the call tree, which allows for a single code path for more of the S3 code.
This PR also fixes a bug that would occur in the (rare) case that the source couldn't create a regional S3 client. Before, an error would be logged, but it would be followed by a panic. Now the bucket in question is skipped.