Commit graph

325 commits

Author SHA1 Message Date
Miccah
9f6a47da3f
[chore] Remove omitempty tags on JobProgressMetrics and UnitMetrics (#2204) 2023-12-12 10:02:56 -08:00
Mike Vanbuskirk
53f060a08e
Add disk buffer tempfile cleanup (#2130)
* add tempfile creation

- break PID retrieval into sep. function

* add tmpfile cleanup func

* add file cleanup to main cleanup func

* refactor file logic to only return name string

* add temp buffer naming to gcs

* add temp buffer naming to s3

* add temp buffer naming to filesystem

* add temp buffer naming to git

* consolidate cleanup functions

- have single function handle both files and dirs
- remove interface(not needed with a single func implementation)
- change calls to `New(...)` to reflect config implementation
- simplify automation in main.go
- update disk-buffer-reader dependency

* integrate changes from pr #2133

* merge main

* checkout from main to revert conflict issues

* re-add buffer logic to git

* interface no longer needed

* move string format to global const

---------

Co-authored-by: Ahrav Dutta <ahrav.dutta@trufflesec.com>
2023-12-11 18:31:50 -05:00
Richard Gomez
d1a2d9e832
chore: propagate log context to handlers (#2191) 2023-12-10 10:30:11 -08:00
ahrav
2728e514d2
move logic to main Chunks method (#2194) 2023-12-08 14:51:24 -08:00
ahrav
2a7813929b
add metrics for gitlab (#2190) 2023-12-08 09:50:09 -08:00
ahrav
4b31b39d6b
[chore] - Refactor common code into a separate function (#2179)
* Refactor common code into a separate function

* rename vars

* make sure to set the scanOptions fields

* address comments
2023-12-08 08:44:35 -08:00
ahrav
b75991850a
[chore] - Compile regex once (#2176)
* move regex compilation out of the fxn

* missed a spot

* merge main
2023-12-07 07:26:27 -08:00
ahrav
0595a3baac
allow targets for the source manager (#2182)
* allow targets to the source manager

* use targets
2023-12-06 16:38:35 -08:00
ahrav
e6bc7f4451
remove unnecessary Git cmd check (#2175) 2023-12-06 13:38:34 -08:00
ahrav
cb81f7d11a
[feat] - Remove go-git dependency (#2174)
* remove use of go-git for binary files

* fix it

* use limit reader

* fix comment

* fix test

* address comments

* address comments

* address comments
2023-12-06 13:38:01 -08:00
ahrav
13da76d357
skip files we can't scan (#2170) 2023-12-04 13:37:11 -08:00
ahrav
996a11dcc0
[chore] - remove deprecated types (#2168)
* remove deprecated types

* missed one
2023-12-04 13:23:58 -08:00
ahrav
37d9e5eedf
[chore] - Increase pagination limit (#2154)
* increae pagination limit

* rename
2023-12-04 10:14:46 -08:00
ahrav
279f915799
[chore] - fix error comparisons (#2142)
* fix error comparisons

* fix imports
2023-12-01 08:32:41 -08:00
ahrav
52ffab1034
[chore] - fix import name clashes (#2143)
* fix import name clashes

* fix missing var
2023-12-01 06:53:15 -08:00
Miccah
e498c80b3d
Fix nil pointer dereference when checking if a unit IsFinished (#2135) 2023-11-29 14:19:31 -08:00
Miccah
7ecd43ab1e
[chore] Minor cleanup of source_manager.go (#2134) 2023-11-29 11:08:25 -08:00
Miccah
78219a27b3
Call Finish in SourceManager after the semaphore is released (#2121) 2023-11-24 13:22:08 -08:00
Richard Gomez
024aa056b9
chore(github): add a newline between titles and bodies (#2124) 2023-11-23 16:14:28 -08:00
Richard Gomez
1f502fd42c
feat(github): scan issue & pr titles (#1899) 2023-11-22 19:15:27 -08:00
Dustin Decker
75e869faff
Fix forks and repos counter, add metric for orgs enumerated (#2118) 2023-11-21 08:52:33 -08:00
Miccah
39a603d2dc
[chore] Add JSON tags to job metrics (#2114) 2023-11-16 17:08:33 -08:00
ahrav
d334b3075e
move all Git setup into Init method (#2105)
* add proto fields for git

* add uri to proto

* move all git setup into Init method

* fix logic for when to use repoPath
2023-11-16 13:59:53 -08:00
Miccah
9d6bc8c504
Refactor git source to support scanning units (#2083) 2023-11-01 09:52:58 -07:00
Miccah
52600a897a
[chore] Replace chunks channel with ChunkReporter in git based sources (#2082)
ChunkReporter is more flexible and will allow code reuse for unit
chunking. ChanReporter was added as a way to maintain the original
channel functionality, so this PR should not alter existing behavior.
2023-11-01 09:22:44 -07:00
ahrav
95e0090bc2
[chore] - correctly handle input shorter than 512 bytes (#2077)
* correctly handle input shorter than 512 bytes

* add tests

* reorder tests

* add another test case

* update test

* address comment
2023-10-31 16:42:42 -07:00
Miccah
57203a56cd
[chore] Fix SourceManager flaky test (#2059)
* [chore] Fix SourceManager flaky test

Sorting by EndTime is not deterministic, however sorting by StartTime
should be. StartTime is set in a goroutine that's limited by
WithConcurrentUnits, so it should happen in order that the units are
received.

* Sort by unit ID
2023-10-30 19:16:55 -07:00
Dustin Decker
05fae156e1
Add TravisCI source (#1877)
* Add TravisCI source

* update test to use sourcestest

* Remove jobPage loop

ListByBuild does not support pagination, so this was infinitely
repeating. https://developer.travis-ci.com/resource/jobs#find

* Continue chunking on error

* review updates

* update readme

---------

Co-authored-by: Miccah Castorina <m.castorina93@gmail.com>
2023-10-30 07:28:25 -07:00
Mike Vanbuskirk
4636dc08f6
Add temp directory management (#1878)
* adds func to get scannerPIDs

* add cleanup and call to get pids

* move pid handling to git module

* remove PID logic from main

* refactor testing code to handle different exec name

* cleanup linting errors

* add better logging, fix dir if clause

* some PR fixups

* mod fixup

* add interfaces for helper funcs

* refactor cleanup into main, getPID into git

* lint and test fixups, remove fail on n<2 pids

* simplify pid sorting

* use filepath.Join

* use Args[0] for exec name, fix logger

* formatting fixup

* move functionality into cleantemp pkg

* go mod fixup

* remove redundant testing comment

* fix go.sum issues

* add 15m ticker loop for cleanup

* enclose ticker in function for goroutine defer

fix cleantemp interface

* make time more readable

* add check for non-local Trufflehog PIDs

* allow deletion even if no non-local pids found

* bundle intial cleanup into runCleanup func

* add explicit regex check for tempdir format
2023-10-26 12:28:56 -04:00
Bill Rich
c5efa870ff
Use latest dbr (#1955) 2023-10-24 07:52:49 -07:00
Miccah
0b16142d4f
Add UnitHook and NoopHook implementations (#1930)
* Add UnitHook and NoopHook implementations

The UnitHook tracks metrics per unit of a job, and emits them on a
channel once finished. It should work even if the Source does not
support source units.

* Refactor channel to use an LRU cache instead

An LRU cache has a more favorable failure mode than the channel. With
the channel, if the consumer stopped consuming metrics, scanning would
block. With the LRU cache, metrics will be dropped when space runs out
and a log message emitted.
2023-10-23 14:27:01 -07:00
Miccah
b8724e87e6
Use the configured include repositories in the GitHub filter (#1926) 2023-10-20 19:03:28 -07:00
Richard Gomez
3acc65b2fb
chore(github): reduce comment log verbosity (#1922) 2023-10-20 16:16:38 -07:00
Cody Rose
7ac7fa8728
Move Github comments check to fix a test #1927 2023-10-19 19:23:55 -04:00
Richard Gomez
4b821e9732
Handle secondary GitHub ratelimits (#1912)
* fix(github): reduce visibility-related api calls

* fix(github): handle secondary ratelimits
2023-10-19 14:54:45 -04:00
Miccah
758344711a
Export ChunkError fields and add ErrorsFor convenience method (#1920) 2023-10-19 08:46:49 -07:00
Richard Gomez
6ea3a7da4a
fix(github): normalize repo cache (#1897) 2023-10-17 15:07:47 -07:00
Miccah
03dc7cb68d
[chore] Add SourceUnitEnumChunker filesystem tests (#1873)
* [chore] Add SourceUnitEnumChunker filesystem tests

* Ensure reported units are exactly what is expected
2023-10-16 10:42:18 -07:00
Miccah
f09bce3f75
[chore] Fix flaky TestJobProgressElapsedTime (#1872) 2023-10-06 17:05:05 -07:00
ahrav
3d2490ca80
use Repositories field from conn. (#1860) 2023-10-04 13:56:02 -07:00
Miccah
0d451aa806
Fix bug in chunker that surfaces with a flaky passed in io.Reader (#1838)
* Fix bug in chunker that surfaces with a flaky passed in io.Reader

The chunker was previously expecting the passed in io.Reader to always
successfully read a full buffer of data, however it's valid for a Reader
to return less data than requested. When this happens, the chunker would
peek the same data that it then reads in the next iteration of the loop,
causing the same data to be scanned twice.

Co-authored-by: ahrav <ahravdutta02@gmail.com>

* Fix EOF error check

* Use io.ReadFull in Chunker

---------

Co-authored-by: ahrav <ahravdutta02@gmail.com>
2023-10-02 09:38:23 -07:00
ahrav
c4bc8fc7fa
[bug] - correctly check err (#1824)
* correctly check err.

* address comments.

* update.

* add comment.

* update comment.
2023-09-27 15:52:07 -07:00
Cody Rose
e9efed85c2
Use S3 credentials waterfall (#1823)
This PR updates the S3 source to use explicitly configured credentials if they're available and follow the normal AWS credentials waterfall if they're not. This is irrespective of whether role assumption is configured. This changes the previous behavior, which was to use waterfall credentials only if role assumption was configured and explicitly configured credentials only when it was not.
2023-09-27 16:57:47 -04:00
joeleonjr
699547b7d3
consolidated pr and issue descr/comment flags (#1827) 2023-09-27 15:54:02 -04:00
ahrav
bf47fd69bb
Github partial scan (#1804)
* Add ability for targetted partial scans of Github.

* update comment.

* add more tests.

* add additiional test.

* address comments.
2023-09-26 12:38:33 -07:00
joeleonjr
1e42dae734
added PR and Issue body scanning (#1816)
* added PR and Issue body scanning; adjusted CLI args to fit

* removed print statement from debugging

* removed exclude-commits; adjusted CLI flags

* minor changes to match main branch

* fixing logic

* updating README for --issues and --prs
2023-09-26 12:25:48 -04:00
āh̳̕mͭͭͨͩ̐e̘ͬ́͋ͬ̊̓͂d
62b2195502
Adding new function SetProgressOngoing to be used when the source does not yet know how many items it is scanning and does not want to display a percentage complete. (#1802)
Co-Authored-By: @mcastorina
2023-09-21 13:26:10 -04:00
Miccah
efa404942a
Add ability to dynamically scale concurrently running sources (#1790)
* Add ability to dynamically scale concurrently running sources

Refactor SourceManager to use a counting semaphore to allow for
dymanically changing limits. This complicated `Wait() error` which needs
to return the first error encountered. We previously got that for free
using `errgroup.Group`, however now we need to handle that ourselves.
`Wait()` needs to return an error for use in the engine to set the
correct exit code.

* Group third party imports together
2023-09-20 16:49:56 -07:00
ahrav
22876f8381
replace interface{} with any. (#1771) 2023-09-15 04:35:15 -07:00
Miccah
dbcb888063
Update Source interface to use SourceID and JobID types (#1774)
The previous implementation used int64 for both, which can be mixed up
easily. Using distinct types adds a layer of type safety checked by the
compiler.
2023-09-14 11:28:24 -07:00