Commit graph

436 commits

Author SHA1 Message Date
Dustin Decker
8999eab89d
Add central feature flags (#3264)
* Add central feature flags

* use atomic

* tidy
2024-09-03 15:54:41 -07:00
Cody Rose
dbc1464c63
Download files when reverifying (#3252)
The previous implementation of targeted file scanning pulled patches out of commit data, which didn't work for binary files (because GitHub doesn't return patches for them). This PR changes the system to always just download the requested file and scan it, which means we get binary file support.
2024-08-29 16:10:11 -04:00
Cody Rose
3b0b2909ca
Strip leading +/- from github target diffs (#3244)
The GitHub source generates chunks for targeted scans differently than it does for "normal" scans. One difference was the presence of leading + and - characters, which can interfere with detection in some cases.
2024-08-23 15:21:58 -04:00
Miccah
3db9ed7c74
[chore] Fix lint errors (#3218)
* [chore] Fix lint errors under analyzer package

* Fix lint error in source manager test

* Use Sprint instead of Sprintf where appropriate
2024-08-14 13:49:24 -07:00
Cody Rose
f26b502c2e
Auth GitHub in Init (#3131)
The GitHub source currently applies its authentication configuration as the first step of enumeration. This is incompatible with both targeted scans and scan job reports, and also means that authentication logic has to be duplicated into the validation flow. This PR moves it into Init so that it's available to targeted scans and, eventually, unit-specific scans. This also allows us to remove the copy of the old logic that was in Validate.

As part of the work I've also cleaned up the integration test suite. (Several of them were apparently disabled back when they ran on every push, but now that we're not doing that, we can re-enable them.)
2024-08-05 15:13:29 -04:00
ahrav
ddb7211ded
[chore] - set custom transport for the Docker client (#3156)
* set custom transport for docker

* fix lint
2024-08-02 08:51:59 -07:00
joeleonjr
f927076483
quick patch for cfor enumeration (#3155)
Co-authored-by: Joe Leon <joe.leon@trufflesec.com>
2024-08-02 11:12:43 -04:00
Dustin Decker
05e4635824
Add progress bar to CFOR (#3151)
* Add progress bar to CFOR

* unused vars

* explicitly ignore progress errors

* removed print statements

* use stderr

---------

Co-authored-by: joeleonjr <20135619+joeleonjr@users.noreply.github.com>
Co-authored-by: Joe Leon <joe.leon@trufflesec.com>
2024-08-02 07:43:59 -07:00
ahrav
fba1a8b410
[perf] - Leverage pgzip for Parallel decompression (#3149) 2024-08-02 04:11:10 -07:00
joeleonjr
7d606e2480
CFOR Commit Scanner (#3145)
* alpha feature for scanning hidden commits on github

* improvements re: git operations

* lint updates

* updating with exec block due to no gh token

* reworked logic into new source

* fixed collisions threshold flag input

* fixed IOutil issues

* removed additions from GH config

---------

Co-authored-by: Joe Leon <joe.leon@trufflesec.com>
2024-08-01 23:04:20 -04:00
ahrav
048ec26c92
move concurrency (#3135) 2024-07-31 18:58:18 -07:00
Cody Rose
3ab975edb3
Update GitHub integration tests (#3124)
#1816 and #2995 both updated the GitHub source without updating its integration tests. This PR updates those tests, bringing them back into success.
2024-07-31 09:28:10 -04:00
ahrav
55fe05d0b4
fix dep versions (#3106) 2024-07-26 17:44:23 -07:00
shangchenglumetro
c4aab3fb51
chore: fix some comments (#3098) 2024-07-25 10:37:13 -07:00
ahrav
f865482025
[feat] - Streamlined File Handling with BufferedReaderSeeker (#3041)
* Streaming file handling.

* cleanup

* update tests

* lint

* defer close on input io.ReadCloser's

* fix seek bug

* fix hanging

* clarify errors

* update

* address comments

* revert

* update

* address

* add check to prevent seek without buffering

* revet

* revert

* update comment to make buffer usage more clear
2024-07-17 13:52:18 -07:00
Cody Rose
296379d5a0
Log more GitLab stuff (#3040)
Our GitLab happy path logging could use some love.

go.sum also needed a little love, for some reason.
2024-07-09 10:53:40 -04:00
Cody Rose
1a73442088
Order GitLab repos by ID (#3047) 2024-07-09 10:39:47 -04:00
ahrav
7d349ac7f3
remove dead code (#3044) 2024-07-07 08:59:40 -07:00
joeleonjr
01a1499600
New Source: HuggingFace (#3000)
* initial spike on hf

* added in user and org enum

* adding huggingface source

* updated with lint suggestions

* updated readme

* addressing resources that require org approval to access

* removing unneeded code

* updating with new error msg for 403

* deleted unused code + added resource check in main
2024-06-27 13:22:06 -04:00
Richard Gomez
3c20b000e1
fix(git): set GIT_DIR based on ScanOptions.Bare (#3004) 2024-06-24 07:37:45 -07:00
Cody Rose
de19a39f2c
Return targeted scan errors (#2995)
Targeted scans should return their errors so that consumers can process them. By creating a type that combines an error with a targeted secret ID, we can return these errors without having to modify the Source interface.
2024-06-21 13:50:56 -04:00
Zachary Rice
d5b9157d2b
clone more refs (#2988) 2024-06-20 09:40:03 -05:00
Richard Gomez
4addd81e29
test: fix compile errors (#2964) 2024-06-13 08:22:25 -07:00
Richard Gomez
ca67a8aa83
refactor(filesystem): change symlink err handling (#2941) 2024-06-10 13:05:42 -07:00
Richard Gomez
5216142960
refactor(cache): use generics (#2930) 2024-06-06 13:08:00 -04:00
Richard Gomez
40fa304a3a
feat(git): improve scan logging (#2923) 2024-06-06 05:12:59 -04:00
Richard Gomez
4d2c8c6e11
refactor(github): improve wiki err handling (#2917) 2024-06-05 08:06:01 -04:00
Dustin Decker
ef410873f2
Add Jenkins scanning (#2892)
* add jenkins

* whoops

* adding unauthenticated jenkins scanning

* update docs

---------

Co-authored-by: Joe Leon <joe.leon@trufflesec.com>
2024-06-04 07:13:14 -04:00
Miccah
c86b423c61
[chore] Always log git repositories being scanned (#2909) 2024-06-03 18:02:34 -07:00
Richard Gomez
9053d8f4de
refactor(github): enumerateWithToken flow & tests (#2880) 2024-05-31 15:53:44 -05:00
James Telfer
0024b6ce77
feat: support docker image history scanning (#2882)
* feat: support docker image history scanning

* refactor: collapse error handling into return

Style suggestion from review feedback.

* fix: associate layers with history entries

Where possible, add the associated layer to the history entry record. This may help tracing any issues discovered.

This also changes the entry reference format to `image-metadata:history:%d:created-by` which _may_ be more self-explanatory.
2024-05-28 14:07:43 -07:00
Richard Gomez
5102e3ae11
test(github): fix some errors (#2774) 2024-05-24 13:03:41 -07:00
Richard Gomez
e53f5bd5c5
Improve handling of Gist URLs (#2653)
* feat(github): handle ghes gists

* fix(github): handle all gist URLs

* refactor(github): helper func to check gist urls
2024-05-24 08:36:30 -07:00
Charlie Gunyon
311494e86e
Elastic adapter (#2727)
* Add stub source and elastic API funcs

* Spawn workers and ship chunks

* Now successfully detects a credential

- Added tests
- Added some documentation comments
- Threaded the passed context through to all the API requests

* Linting fixes

* Add integration tests and resolve some bugs they uncovered

* Logstash -> Elasticsearch

* Add support for --index-pattern

* Add support for --query-json

* Use structs instead of string building to construct a search body

* Support --since-timestamp

* Implement additional authentication methods

* Fix some small bugs

* Refactoring to support --best-effort-scan

* Finish implementation of --best-effort-scan

* Implement scan catch-up

* Finish connecting support for nodes CLI arg

* Add some integration tests around the catchup mechanism

* go mod tidy

* Fix some linting issues

* Remove some debugging Prints

* Move off of _doc

* Remove informational Printf and add informational logging

* Remove debugging logging

* Copy the index from the outer loop as well

* Don't burn up the ES API with rapid requests if there's no work to do in subsequent scans

* No need to export UnitOfWork.AddSearch

* Use a better name for the range query variable when building the timestamp range clause in searches

* Replace some unlocking defers with explicit unlocks to make the synchronized part of the code clearer

* found -> ok

* Remove superfluous buildElasticClient method

---------

Co-authored-by: Charlie Gunyon <charlie@spectral.energy>
2024-05-24 09:38:20 -05:00
Richard Gomez
1441289d41
fix(github): scan user repos (#2814) 2024-05-23 09:40:40 -05:00
Cody Rose
f7214cfee3
Log reasons for GitLab repo exclusion (#2875)
We have some evidence that some GitLab repos are getting incorrectly ignored, but it's not clear why this is happening, so this PR adds some more logging to the relevant code.
2024-05-23 09:08:36 -04:00
ahrav
896e6e7c66
upgrade github dep (#2858) 2024-05-16 14:35:08 -07:00
Zachary Rice
e0351c215a
add tolower to all keywords, and remove return on error for global vars (#2852) 2024-05-16 14:03:03 -05:00
ahrav
ead9dd5748
[refactor] - Create separate handler for non-archive data (#2825)
* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* Handle non-archive data within the DefaultHandler

* make structs and methods private

* Remove non-archive data handling within sources

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* Adjust check for rpm/deb archive type

* add additional deb mime type

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* update comment

* move rewind closer

* reduce log verbosity

* add metrics for file handling

* add metrics for errors

* make defaultBufferSize a const

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* Allow git cat-file blob to complete before trying to handle the file

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* use buffer writer

* update

* refactor

* update context pkg

* revert stuff

* update test

* fix test

* remove

* use correct reader

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* Update write method in contentWriter interface

* Add bufferReadSeekCloser

* update name

* update comment

* fix lint

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* Handle non-archive data within the DefaultHandler

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* update comment

* move rewind closer

* reduce log verbosity

* make defaultBufferSize a const

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* use buffer writer

* update

* refactor

* update context pkg

* revert stuff

* update test

* remove

* rebase

* go mod tidy

* lint check

* update metric to ms

* update metric

* update comments

* dont use ptr

* update

* fix

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* go mod tidy

* update go mod

* Add a buffered file reader

* update comments

* use Buffered File Readder

* return buffer

* update

* fix

* return

* go mod tidy

* merge

* use a shared pool

* use sync.Once

* reorganzie

* remove unused code

* fix double init

* fix stuff

* nil check

* reduce allocations

* updates

* update metrics

* updates

* reset buffer instead of putting it back

* skip binaries

* skip

* concurrently process diffs

* close chan

* concurrently enumerate orgs

* increase workers

* ignore pbix and vsdx files

* add metrics for gitparse's Diffchan

* fix metric

* update metrics

* update

* fix checks

* fix

* inc

* update

* reduce

* Create workers to handle binary files

* modify workers

* updates

* add check

* delete code

* use custom reader

* rename struct

* add nonarchive handler

* fix break

* add comments

* add tests

* refactor

* remove log

* do not scan rpm links

* simplify

* rename var

* rename

* fix benchmark

* add buffer

* buffer

* buffer

* handle panic

* merge main

* merge main

* add recover

* revert stuff

* revert

* revert to using reader

* fixes

* remove

* update

* fixes

* linter

* fix test

* fix comment

* update field name

* fix
2024-05-15 13:40:16 -07:00
cuiyourong
ead4e8fa2d
chore: fix some typos in comments (#2851)
Signed-off-by: cuiyourong <cuiyourong@gmail.com>
2024-05-15 07:36:21 -07:00
ahrav
6df147de58
[feat] - Support bearer auth for docker scans (#2848)
* Support bearer auth for docker scans

* updates

* use no auth by default if no other auth method is provided
2024-05-14 11:30:11 -07:00
ahrav
570cec7565
[refactor] - Refactor Archive Handling Logic (#2703)
* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* go mod tidy

* update go mod

* go mod tidy

* add comment

* update max depth check to >

* go mod tidy

* rename

* [refactor] - Refactor Archive Handling Logic - Part 4: Non-Archive Data Handling and Cleanup (#2704)

* Handle non-archive data within the DefaultHandler

* make structs and methods private

* Remove non-archive data handling within sources

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* move rewind closer

* reduce log verbosity

* make defaultBufferSize a const

* use correct reader

* address comments

* update test

* [feat] - Add Prometheus Metrics for File Handlers (#2705)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* add metrics for archive depth

* [bug] - Enhanced Archive Handling to Address Interface Constraints (#2710)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* wrap compReader with DiskbufferReader

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* replace diskbuffereader with bufferedfilereader

* updtes

* add metric back

* [bug] -  Fix bug and simplify git cat-file command execution and output handling (#2719)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* Allow git cat-file blob to complete before trying to handle the file

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* revert stuff

* update test

* remove

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* inline
2024-05-10 11:36:06 -07:00
Cody Rose
a317897d66
increase test chan size (#2797)
This test has a race condition. This change makes it less likely to cause a test failure, and is a stopgap measure to de-flake the test while we investigate the underlying issue.
2024-05-07 11:11:11 -04:00
ahrav
3c659a2144
set default buffer size to 64 (#2778) 2024-05-03 08:42:18 -07:00
Zachary Rice
4ea3a1376b
fix for infinite recursion in Postman var sub (#2780)
* fix for infinite recursion

* oneliner
2024-05-02 13:03:03 -05:00
Richard Gomez
13bd783d2d
test(git): change length of chunks (#2767)
This fixes one missed test in #2754 (comment).

The number of chunks doubled because each commit now has metadata + data.
2024-04-30 08:34:12 -04:00
Miccah
6cf3a25a04
[chore] Add some happy path logs to GitLab (#2765) 2024-04-29 16:42:35 -07:00
ahrav
591871977c
Correclty set metrics for enumerated orgs (#2757) 2024-04-29 14:26:46 -07:00
Richard Gomez
11e5febeee
feat(git): scan commit metadata (#2754)
This is a follow-up to #2713 that fixes the strange test error.

As suspected, the failure was caused by additional diffs not being included in the test's expected data.
2024-04-29 16:58:45 -04:00
mountcount
1d92655d97
pkg: fix function names in comment (#2761)
Signed-off-by: mountcount <cuimoman@outlook.com>
2024-04-29 11:21:26 -05:00