Commit graph

407 commits

Author SHA1 Message Date
Richard Gomez
9053d8f4de
refactor(github): enumerateWithToken flow & tests (#2880) 2024-05-31 15:53:44 -05:00
James Telfer
0024b6ce77
feat: support docker image history scanning (#2882)
* feat: support docker image history scanning

* refactor: collapse error handling into return

Style suggestion from review feedback.

* fix: associate layers with history entries

Where possible, add the associated layer to the history entry record. This may help tracing any issues discovered.

This also changes the entry reference format to `image-metadata:history:%d:created-by` which _may_ be more self-explanatory.
2024-05-28 14:07:43 -07:00
Richard Gomez
5102e3ae11
test(github): fix some errors (#2774) 2024-05-24 13:03:41 -07:00
Richard Gomez
e53f5bd5c5
Improve handling of Gist URLs (#2653)
* feat(github): handle ghes gists

* fix(github): handle all gist URLs

* refactor(github): helper func to check gist urls
2024-05-24 08:36:30 -07:00
Charlie Gunyon
311494e86e
Elastic adapter (#2727)
* Add stub source and elastic API funcs

* Spawn workers and ship chunks

* Now successfully detects a credential

- Added tests
- Added some documentation comments
- Threaded the passed context through to all the API requests

* Linting fixes

* Add integration tests and resolve some bugs they uncovered

* Logstash -> Elasticsearch

* Add support for --index-pattern

* Add support for --query-json

* Use structs instead of string building to construct a search body

* Support --since-timestamp

* Implement additional authentication methods

* Fix some small bugs

* Refactoring to support --best-effort-scan

* Finish implementation of --best-effort-scan

* Implement scan catch-up

* Finish connecting support for nodes CLI arg

* Add some integration tests around the catchup mechanism

* go mod tidy

* Fix some linting issues

* Remove some debugging Prints

* Move off of _doc

* Remove informational Printf and add informational logging

* Remove debugging logging

* Copy the index from the outer loop as well

* Don't burn up the ES API with rapid requests if there's no work to do in subsequent scans

* No need to export UnitOfWork.AddSearch

* Use a better name for the range query variable when building the timestamp range clause in searches

* Replace some unlocking defers with explicit unlocks to make the synchronized part of the code clearer

* found -> ok

* Remove superfluous buildElasticClient method

---------

Co-authored-by: Charlie Gunyon <charlie@spectral.energy>
2024-05-24 09:38:20 -05:00
Richard Gomez
1441289d41
fix(github): scan user repos (#2814) 2024-05-23 09:40:40 -05:00
Cody Rose
f7214cfee3
Log reasons for GitLab repo exclusion (#2875)
We have some evidence that some GitLab repos are getting incorrectly ignored, but it's not clear why this is happening, so this PR adds some more logging to the relevant code.
2024-05-23 09:08:36 -04:00
ahrav
896e6e7c66
upgrade github dep (#2858) 2024-05-16 14:35:08 -07:00
Zachary Rice
e0351c215a
add tolower to all keywords, and remove return on error for global vars (#2852) 2024-05-16 14:03:03 -05:00
ahrav
ead9dd5748
[refactor] - Create separate handler for non-archive data (#2825)
* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* Handle non-archive data within the DefaultHandler

* make structs and methods private

* Remove non-archive data handling within sources

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* Adjust check for rpm/deb archive type

* add additional deb mime type

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* update comment

* move rewind closer

* reduce log verbosity

* add metrics for file handling

* add metrics for errors

* make defaultBufferSize a const

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* Allow git cat-file blob to complete before trying to handle the file

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* use buffer writer

* update

* refactor

* update context pkg

* revert stuff

* update test

* fix test

* remove

* use correct reader

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* Update write method in contentWriter interface

* Add bufferReadSeekCloser

* update name

* update comment

* fix lint

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* Handle non-archive data within the DefaultHandler

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* update comment

* move rewind closer

* reduce log verbosity

* make defaultBufferSize a const

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* use buffer writer

* update

* refactor

* update context pkg

* revert stuff

* update test

* remove

* rebase

* go mod tidy

* lint check

* update metric to ms

* update metric

* update comments

* dont use ptr

* update

* fix

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* go mod tidy

* update go mod

* Add a buffered file reader

* update comments

* use Buffered File Readder

* return buffer

* update

* fix

* return

* go mod tidy

* merge

* use a shared pool

* use sync.Once

* reorganzie

* remove unused code

* fix double init

* fix stuff

* nil check

* reduce allocations

* updates

* update metrics

* updates

* reset buffer instead of putting it back

* skip binaries

* skip

* concurrently process diffs

* close chan

* concurrently enumerate orgs

* increase workers

* ignore pbix and vsdx files

* add metrics for gitparse's Diffchan

* fix metric

* update metrics

* update

* fix checks

* fix

* inc

* update

* reduce

* Create workers to handle binary files

* modify workers

* updates

* add check

* delete code

* use custom reader

* rename struct

* add nonarchive handler

* fix break

* add comments

* add tests

* refactor

* remove log

* do not scan rpm links

* simplify

* rename var

* rename

* fix benchmark

* add buffer

* buffer

* buffer

* handle panic

* merge main

* merge main

* add recover

* revert stuff

* revert

* revert to using reader

* fixes

* remove

* update

* fixes

* linter

* fix test

* fix comment

* update field name

* fix
2024-05-15 13:40:16 -07:00
cuiyourong
ead4e8fa2d
chore: fix some typos in comments (#2851)
Signed-off-by: cuiyourong <cuiyourong@gmail.com>
2024-05-15 07:36:21 -07:00
ahrav
6df147de58
[feat] - Support bearer auth for docker scans (#2848)
* Support bearer auth for docker scans

* updates

* use no auth by default if no other auth method is provided
2024-05-14 11:30:11 -07:00
ahrav
570cec7565
[refactor] - Refactor Archive Handling Logic (#2703)
* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing funcitonality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* go mod tidy

* update go mod

* go mod tidy

* add comment

* update max depth check to >

* go mod tidy

* rename

* [refactor] - Refactor Archive Handling Logic - Part 4: Non-Archive Data Handling and Cleanup (#2704)

* Handle non-archive data within the DefaultHandler

* make structs and methods private

* Remove non-archive data handling within sources

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* move rewind closer

* reduce log verbosity

* make defaultBufferSize a const

* use correct reader

* address comments

* update test

* [feat] - Add Prometheus Metrics for File Handlers (#2705)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* add metrics for archive depth

* [bug] - Enhanced Archive Handling to Address Interface Constraints (#2710)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* wrap compReader with DiskbufferReader

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* replace diskbuffereader with bufferedfilereader

* updtes

* add metric back

* [bug] -  Fix bug and simplify git cat-file command execution and output handling (#2719)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* Allow git cat-file blob to complete before trying to handle the file

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* revert stuff

* update test

* remove

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* inline
2024-05-10 11:36:06 -07:00
Cody Rose
a317897d66
increase test chan size (#2797)
This test has a race condition. This change makes it less likely to cause a test failure, and is a stopgap measure to de-flake the test while we investigate the underlying issue.
2024-05-07 11:11:11 -04:00
ahrav
3c659a2144
set default buffer size to 64 (#2778) 2024-05-03 08:42:18 -07:00
Zachary Rice
4ea3a1376b
fix for infinite recursion in Postman var sub (#2780)
* fix for infinite recursion

* oneliner
2024-05-02 13:03:03 -05:00
Richard Gomez
13bd783d2d
test(git): change length of chunks (#2767)
This fixes one missed test in #2754 (comment).

The number of chunks doubled because each commit now has metadata + data.
2024-04-30 08:34:12 -04:00
Miccah
6cf3a25a04
[chore] Add some happy path logs to GitLab (#2765) 2024-04-29 16:42:35 -07:00
ahrav
591871977c
Correclty set metrics for enumerated orgs (#2757) 2024-04-29 14:26:46 -07:00
Richard Gomez
11e5febeee
feat(git): scan commit metadata (#2754)
This is a follow-up to #2713 that fixes the strange test error.

As suspected, the failure was caused by additional diffs not being included in the test's expected data.
2024-04-29 16:58:45 -04:00
mountcount
1d92655d97
pkg: fix function names in comment (#2761)
Signed-off-by: mountcount <cuimoman@outlook.com>
2024-04-29 11:21:26 -05:00
Cody Rose
11452e8a57
Revert "feat(git): scan commit metadata (#2713)" (#2747)
This reverts commit 81a9c813a1.
2024-04-25 10:56:48 -04:00
Richard Gomez
81a9c813a1
feat(git): scan commit metadata (#2713)
This fixes #2683. It scans the commit author, committer (which is typically GitHub <noreply@github.com> for GitHub, but can be different), and message.

It also scans Git notes.
2024-04-25 10:13:09 -04:00
Cody Rose
b745cfd495
Enrich Gitlab enumeration logging (#2678)
This PR modifies the GitLab source:

* emits a new "groups enumerated" metric
* logs more information about group enumeration
* emits the repo enumeration metric inside getAllProjectRepos, which means it will work when units are flipped on
* emits the repo enumeration metric more granularly
2024-04-08 10:47:05 -04:00
ahrav
a8132839f8
[chore] - update go-github dep manually (#2664)
* update go-github dep

* remove commented out line
2024-04-03 19:19:14 -07:00
Richard Gomez
3b58a15a84
Fix GitHub enumeration & rate-limiting logic (#2625)
This is a follow-up to #2379.

It fixes the following issues:

GitHub API calls missing rate-limit handling
The fix for Refactor GitHub source #2379 (comment) inadvertently resulting in duplicate API calls
2024-03-29 10:29:46 -04:00
Dustin Decker
612ff1a0f1
Use Lstat to identify non-regular files in filesystem source (#2628)
* Use Lstat to identify non-regular files in filesystem source

* fix test
2024-03-26 15:22:42 -07:00
Richard Gomez
95dc8d6e16
Fix additional GitHub test errors #2614 2024-03-26 09:34:12 -04:00
Richard Gomez
9d4cf87c02
fix(github): resolve panic & test failures (#2608) 2024-03-22 09:49:01 -07:00
Richard Gomez
80e8a67c2d
Refactor GitHub source (#2379)
* refactor(github): cleanup logic

* fix(github): lookup wikis per-repo

* refactor(github): change scanErrs.String output

---------

Co-authored-by: Bill Rich <bill.rich@gmail.com>
2024-03-21 14:07:39 -07:00
Miccah
3a7266e540
[chore] Fix potential resource leak in postman source (#2606)
This moves workspace unpacking to a helper function to leverage a defer,
which ensures the file is always closed.
2024-03-21 10:21:13 -05:00
Zachary Rice
1216fa23c9
strings contain keyword check, add collection name to keywords (#2602) 2024-03-21 09:35:38 -05:00
Zachary Rice
b11ce72338
Postman Source (#2579)
postman source

Co-authored-by: Miccah <m.castorina93@gmail.com>

---------

Co-authored-by: Joe Leon <joe.leon@trufflesec.com>
Co-authored-by: Miccah Castorina <m.castorina93@gmail.com>
2024-03-20 11:36:20 -05:00
Cody Rose
b7f08db1ef
Redact secret in git command output (#2539)
When we fail to clone a git repository we log the command output to help with diagnosis. However, this output can include credentials in certain cases (such as certain errors associated with redirects). We don't want to log credentials when this happens.
2024-03-06 11:51:35 -05:00
Cody Rose
28ed81f0a2
Add naive S3 ignorelist (#2536)
This PR adds the ability to exclude buckets from S3 scans. The capability is pretty rudimentary right now, and does not support globbing. If both lists are specified the source to fail to initialize.
2024-03-05 08:01:20 -05:00
ahrav
3da0c5e125
[feat] - Make the client configurable (#2528)
* Make the client configurable

* add comment

* add backoff option
2024-03-01 13:29:25 -08:00
trufflesteeeve
12ff21f245
Improve Gitlab default URL handling (#2491)
Co-authored-by: Miccah <m.castorina93@gmail.com>
2024-02-28 14:15:11 -05:00
ahrav
9ef5151200
Gitlab scan targets (#2470)
* add method to scan targets

* Add logic to handle targetted scan

* address comments

* remove pagination opts

* add kvp with scan type
2024-02-23 07:40:52 -08:00
Miccah
c60443891b
Add Display method to SourceUnit and Kind member to the CommonSourceUnit (#2450)
* Add Display method to SourceUnit and Kind member to the CommonSourceUnit

* Make SourceUnitID return the ID and a kind

These two values together uniquely represent a unit.
2024-02-20 11:24:13 -08:00
ahrav
5290023c2d
use read full (#2474) 2024-02-20 07:21:16 -08:00
Miccah
216a29d7cf
[chore] Add some doc comments to source manager (#2434) 2024-02-13 07:54:48 -08:00
ahrav
e8006f1bee
2396 since commit stopped working (#2402)
* Ensure we handle commits with no diffs correctly.

* cleanup

* add nil check

* address comments

* move comment

* revert

* add comment
2024-02-13 07:21:22 -08:00
Miccah
74f1553e06
[fix] Add unit information to error returned by ChunkUnit (#2410) 2024-02-12 08:24:31 -08:00
Miccah
8f01326468
[chore] Rename file to legacy_reporters.go (#2406) 2024-02-09 18:17:46 -08:00
Miccah
aace92b64d
Implement SourceUnitEnumChunker for GitLab (#2367)
* Implement SourceUnitEnumChunker for GitLab

* Add GitLab engine integration test

* Use a SliceReporter instead of checking for nil reporters

* Use more generic VisitorReporter

* Merge logic from getReposFromGitlab into getAllProjectRepos

* Update integration test to have a lower bound

Unfortunately, the GitLab integration test does not appear to be
deterministic. Sometimes 36390 chunks are found, sometimes 36312, or
even lower.
2024-02-09 11:06:31 -08:00
Miccah
dd4d4a8a96
Refactor UnitHook to block the scan if finished metrics aren't handled (#2309)
* Refactor UnitHook to block the scan if finished metrics aren't handled

* Log once when back-pressure is detected

* Add hook channel size metric

* Use plural "metrics" for consistency

* Replace LRU cache with map
2024-02-08 14:50:58 -08:00
Richard Gomez
b3ff12d1e9
Fix handling of GitHub ratelimit information (#2041)
This is a follow-up to #1912, which used the headers from the response to determine rate-limiting information, instead of using the values from RateLimitError.Rate. Although that logic seemed solid, I discovered that it did not work in some circumstances. This lead to the "unexpected" path more often than intended, and periodic instances where requests would be made before the ratelimit was refreshed.
2024-02-07 09:11:12 -05:00
ahrav
7b492a690a
[feat] - use diff chan (#2387)
* use diff chan

* address comments

* add comment

* address comments

* use old ordering

* add correct author line

* Add required *Commit arg to newDiff

* address comments
2024-02-06 10:06:10 -08:00
Miccah
01c9ac7b59
Fix binary file hanging bug in git sources (#2388)
Waiting for the sub-command will block until all of `stdout` has been
read. In some cases, we return early due to failed chunking without
reading all of the data, and thus, get stuck waiting for the command to
finish. Closing the pipe will ensure `Wait` does not block on that I/O.
2024-02-05 15:28:49 -08:00
ahrav
135cc3eb69
[fixup] - correctly use the buffered file writer (#2373)
* correctly use the buffered file writer

* use value from source

* reorder fields

* use only the DetectorKey as a map field

* address comments and use factory function

* fix optional params

* remove commented out code
2024-02-05 10:43:55 -08:00