Commit graph

395 commits

Author SHA1 Message Date
ahrav
570cec7565
[refactor] - Refactor Archive Handling Logic (#2703)
* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing functionality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing functionality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* go mod tidy

* update go mod

* go mod tidy

* add comment

* update max depth check to >

* go mod tidy

* rename

* [refactor] - Refactor Archive Handling Logic - Part 4: Non-Archive Data Handling and Cleanup (#2704)

* Handle non-archive data within the DefaultHandler

* make structs and methods private

* Remove non-archive data handling within sources

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBufferReader creation logic within sources

* move rewind closer

* reduce log verbosity

* make defaultBufferSize a const

* use correct reader

* address comments

* update test

* [feat] - Add Prometheus Metrics for File Handlers (#2705)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* add metrics for archive depth

* [bug] - Enhanced Archive Handling to Address Interface Constraints (#2710)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* wrap compReader with DiskbufferReader

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* replace diskbuffereader with bufferedfilereader

* updates

* add metric back

* [bug] - Fix bug and simplify git cat-file command execution and output handling (#2719)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* Allow git cat-file blob to complete before trying to handle the file

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* revert stuff

* update test

* remove

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* inline
2024-05-10 11:36:06 -07:00
Cody Rose
a317897d66
increase test chan size (#2797)
This test has a race condition. This change makes it less likely to cause a test failure, and is a stopgap measure to de-flake the test while we investigate the underlying issue.
2024-05-07 11:11:11 -04:00
ahrav
3c659a2144
set default buffer size to 64 (#2778) 2024-05-03 08:42:18 -07:00
Zachary Rice
4ea3a1376b
fix for infinite recursion in Postman var sub (#2780)
* fix for infinite recursion

* oneliner
2024-05-02 13:03:03 -05:00
Richard Gomez
13bd783d2d
test(git): change length of chunks (#2767)
This fixes one missed test in #2754 (comment).

The number of chunks doubled because each commit now has metadata + data.
2024-04-30 08:34:12 -04:00
Miccah
6cf3a25a04
[chore] Add some happy path logs to GitLab (#2765) 2024-04-29 16:42:35 -07:00
ahrav
591871977c
Correctly set metrics for enumerated orgs (#2757) 2024-04-29 14:26:46 -07:00
Richard Gomez
11e5febeee
feat(git): scan commit metadata (#2754)
This is a follow-up to #2713 that fixes the strange test error.

As suspected, the failure was caused by additional diffs not being included in the test's expected data.
2024-04-29 16:58:45 -04:00
mountcount
1d92655d97
pkg: fix function names in comment (#2761)
Signed-off-by: mountcount <cuimoman@outlook.com>
2024-04-29 11:21:26 -05:00
Cody Rose
11452e8a57
Revert "feat(git): scan commit metadata (#2713)" (#2747)
This reverts commit 81a9c813a1.
2024-04-25 10:56:48 -04:00
Richard Gomez
81a9c813a1
feat(git): scan commit metadata (#2713)
This fixes #2683. It scans the commit author, committer (which is typically GitHub <noreply@github.com> for GitHub, but can be different), and message.

It also scans Git notes.
2024-04-25 10:13:09 -04:00
Cody Rose
b745cfd495
Enrich Gitlab enumeration logging (#2678)
This PR modifies the GitLab source:

* emits a new "groups enumerated" metric
* logs more information about group enumeration
* emits the repo enumeration metric inside getAllProjectRepos, which means it will work when units are flipped on
* emits the repo enumeration metric more granularly
2024-04-08 10:47:05 -04:00
ahrav
a8132839f8
[chore] - update go-github dep manually (#2664)
* update go-github dep

* remove commented out line
2024-04-03 19:19:14 -07:00
Richard Gomez
3b58a15a84
Fix GitHub enumeration & rate-limiting logic (#2625)
This is a follow-up to #2379.

It fixes the following issues:

* GitHub API calls were missing rate-limit handling
* The fix for Refactor GitHub source #2379 (comment) inadvertently resulted in duplicate API calls
2024-03-29 10:29:46 -04:00
Dustin Decker
612ff1a0f1
Use Lstat to identify non-regular files in filesystem source (#2628)
* Use Lstat to identify non-regular files in filesystem source

* fix test
2024-03-26 15:22:42 -07:00
Richard Gomez
95dc8d6e16
Fix additional GitHub test errors #2614 2024-03-26 09:34:12 -04:00
Richard Gomez
9d4cf87c02
fix(github): resolve panic & test failures (#2608) 2024-03-22 09:49:01 -07:00
Richard Gomez
80e8a67c2d
Refactor GitHub source (#2379)
* refactor(github): cleanup logic

* fix(github): lookup wikis per-repo

* refactor(github): change scanErrs.String output

---------

Co-authored-by: Bill Rich <bill.rich@gmail.com>
2024-03-21 14:07:39 -07:00
Miccah
3a7266e540
[chore] Fix potential resource leak in postman source (#2606)
This moves workspace unpacking to a helper function to leverage a defer,
which ensures the file is always closed.
2024-03-21 10:21:13 -05:00
Zachary Rice
1216fa23c9
strings contain keyword check, add collection name to keywords (#2602) 2024-03-21 09:35:38 -05:00
Zachary Rice
b11ce72338
Postman Source (#2579)
postman source

Co-authored-by: Miccah <m.castorina93@gmail.com>

---------

Co-authored-by: Joe Leon <joe.leon@trufflesec.com>
Co-authored-by: Miccah Castorina <m.castorina93@gmail.com>
2024-03-20 11:36:20 -05:00
Cody Rose
b7f08db1ef
Redact secret in git command output (#2539)
When we fail to clone a git repository we log the command output to help with diagnosis. However, this output can include credentials in certain cases (such as certain errors associated with redirects). We don't want to log credentials when this happens.
2024-03-06 11:51:35 -05:00
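One way to approach the redaction described in the commit above is sketched here. It assumes the credentials appear as the userinfo part of a URL (`scheme://user:password@host`); the regular expression and the `redact` helper are illustrative assumptions, not the project's implementation, and other credential formats would need their own handling.

```go
package main

import (
	"fmt"
	"regexp"
)

// userinfoInURL matches the "user:password@" (or "token@") part of a URL.
// Assumption: credentials only show up in this form in the command output.
var userinfoInURL = regexp.MustCompile(`(\w+://)[^\s/@]+@`)

// redact (hypothetical helper) replaces URL userinfo with a placeholder
// before the output is logged.
func redact(output string) string {
	return userinfoInURL.ReplaceAllString(output, "${1}***@")
}

func main() {
	out := "fatal: unable to access 'https://user:s3cret@example.com/repo.git/': redirect failed"
	fmt.Println(redact(out))
}
```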
Cody Rose
28ed81f0a2
Add naive S3 ignorelist (#2536)
This PR adds the ability to exclude buckets from S3 scans. The capability is pretty rudimentary right now, and does not support globbing. If both lists are specified, the source will fail to initialize.
2024-03-05 08:01:20 -05:00
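The behavior described in the commit above can be sketched as follows. This assumes that "both lists" means an explicit include list and the new ignore list; `selectBuckets` and its parameters are hypothetical names, and matching is by exact bucket name since globbing is not supported.

```go
package main

import (
	"errors"
	"fmt"
)

// selectBuckets (hypothetical helper) decides which buckets to scan.
// Assumption: include and ignore are mutually exclusive, and names are
// compared exactly (no globbing).
func selectBuckets(all, include, ignore []string) ([]string, error) {
	if len(include) > 0 && len(ignore) > 0 {
		return nil, errors.New("include and ignore lists cannot both be specified")
	}
	if len(include) > 0 {
		return include, nil
	}
	skip := make(map[string]struct{}, len(ignore))
	for _, b := range ignore {
		skip[b] = struct{}{}
	}
	var selected []string
	for _, b := range all {
		if _, ignored := skip[b]; !ignored {
			selected = append(selected, b)
		}
	}
	return selected, nil
}

func main() {
	buckets, err := selectBuckets([]string{"logs", "backups", "assets"}, nil, []string{"backups"})
	fmt.Println(buckets, err)

	_, err = selectBuckets(nil, []string{"logs"}, []string{"backups"})
	fmt.Println(err)
}
```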
ahrav
3da0c5e125
[feat] - Make the client configurable (#2528)
* Make the client configurable

* add comment

* add backoff option
2024-03-01 13:29:25 -08:00
trufflesteeeve
12ff21f245
Improve Gitlab default URL handling (#2491)
Co-authored-by: Miccah <m.castorina93@gmail.com>
2024-02-28 14:15:11 -05:00
ahrav
9ef5151200
Gitlab scan targets (#2470)
* add method to scan targets

* Add logic to handle targeted scan

* address comments

* remove pagination opts

* add kvp with scan type
2024-02-23 07:40:52 -08:00
Miccah
c60443891b
Add Display method to SourceUnit and Kind member to the CommonSourceUnit (#2450)
* Add Display method to SourceUnit and Kind member to the CommonSourceUnit

* Make SourceUnitID return the ID and a kind

These two values together uniquely represent a unit.
2024-02-20 11:24:13 -08:00
ahrav
5290023c2d
use read full (#2474) 2024-02-20 07:21:16 -08:00
Miccah
216a29d7cf
[chore] Add some doc comments to source manager (#2434) 2024-02-13 07:54:48 -08:00
ahrav
e8006f1bee
2396 since commit stopped working (#2402)
* Ensure we handle commits with no diffs correctly.

* cleanup

* add nil check

* address comments

* move comment

* revert

* add comment
2024-02-13 07:21:22 -08:00
Miccah
74f1553e06
[fix] Add unit information to error returned by ChunkUnit (#2410) 2024-02-12 08:24:31 -08:00
Miccah
8f01326468
[chore] Rename file to legacy_reporters.go (#2406) 2024-02-09 18:17:46 -08:00
Miccah
aace92b64d
Implement SourceUnitEnumChunker for GitLab (#2367)
* Implement SourceUnitEnumChunker for GitLab

* Add GitLab engine integration test

* Use a SliceReporter instead of checking for nil reporters

* Use more generic VisitorReporter

* Merge logic from getReposFromGitlab into getAllProjectRepos

* Update integration test to have a lower bound

Unfortunately, the GitLab integration test does not appear to be
deterministic. Sometimes 36390 chunks are found, sometimes 36312, or
even lower.
2024-02-09 11:06:31 -08:00
Miccah
dd4d4a8a96
Refactor UnitHook to block the scan if finished metrics aren't handled (#2309)
* Refactor UnitHook to block the scan if finished metrics aren't handled

* Log once when back-pressure is detected

* Add hook channel size metric

* Use plural "metrics" for consistency

* Replace LRU cache with map
2024-02-08 14:50:58 -08:00
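The blocking and log-once behavior listed in the commit above could be sketched like this. The `unitHook` type, its fields, and the `report` method are hypothetical and only illustrate the idea: try a non-blocking send first, log a single time when the channel is full, then block until the consumer catches up.

```go
package main

import (
	"fmt"
	"sync"
)

// unitHook (hypothetical type) forwards finished-unit metrics to a consumer.
type unitHook struct {
	finished chan string
	logOnce  sync.Once
	logf     func(msg string)
}

func newUnitHook(size int, logf func(msg string)) *unitHook {
	return &unitHook{finished: make(chan string, size), logf: logf}
}

// report sends a metric. If the channel is full it logs once, then blocks,
// which applies back-pressure to the caller instead of dropping the metric.
func (h *unitHook) report(metric string) {
	select {
	case h.finished <- metric:
		return
	default:
		h.logOnce.Do(func() {
			h.logf(fmt.Sprintf("back-pressure detected: channel full (size %d)", cap(h.finished)))
		})
	}
	h.finished <- metric
}

func main() {
	h := newUnitHook(1, func(msg string) { fmt.Println(msg) })
	go func() {
		for _, u := range []string{"unit-1", "unit-2", "unit-3"} {
			h.report(u)
		}
		close(h.finished)
	}()
	for u := range h.finished {
		fmt.Println("handled", u)
	}
}
```

Whether the log line appears in this demo depends on goroutine scheduling; it is emitted at most once either way.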
Richard Gomez
b3ff12d1e9
Fix handling of GitHub ratelimit information (#2041)
This is a follow-up to #1912, which used the headers from the response to determine rate-limiting information, instead of using the values from RateLimitError.Rate. Although that logic seemed solid, I discovered that it did not work in some circumstances. This led to the "unexpected" path more often than intended, and periodic instances where requests would be made before the ratelimit was refreshed.
2024-02-07 09:11:12 -05:00
ahrav
7b492a690a
[feat] - use diff chan (#2387)
* use diff chan

* address comments

* add comment

* address comments

* use old ordering

* add correct author line

* Add required *Commit arg to newDiff

* address comments
2024-02-06 10:06:10 -08:00
Miccah
01c9ac7b59
Fix binary file hanging bug in git sources (#2388)
Waiting for the sub-command will block until all of `stdout` has been
read. In some cases, we return early due to failed chunking without
reading all of the data, and thus, get stuck waiting for the command to
finish. Closing the pipe will ensure `Wait` does not block on that I/O.
2024-02-05 15:28:49 -08:00
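The hang described in the commit above, and the fix of closing the pipe before `Wait`, can be reproduced with a small sketch. It assumes a Unix-like system where the `yes` command is available; `readPrefix` is a hypothetical helper, not the project's code.

```go
package main

import (
	"fmt"
	"io"
	"os/exec"
)

// readPrefix (hypothetical helper) starts a command, reads only the first n
// bytes of its stdout, and then waits for it to exit.
func readPrefix(n int, name string, args ...string) ([]byte, error) {
	cmd := exec.Command(name, args...)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}

	buf := make([]byte, n)
	_, readErr := io.ReadFull(stdout, buf)

	// Close the read end before waiting. Without this, a child that still
	// has output to write stays blocked on the full pipe and Wait never
	// returns.
	_ = stdout.Close()

	// Wait often reports an error here (for example a broken pipe signal),
	// which is expected when the reader stops early.
	_ = cmd.Wait()

	if readErr != nil {
		return nil, readErr
	}
	return buf, nil
}

func main() {
	// "yes" writes output forever, so reading it fully is not an option.
	out, err := readPrefix(4, "yes")
	fmt.Printf("%q %v\n", out, err)
}
```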
ahrav
135cc3eb69
[fixup] - correctly use the buffered file writer (#2373)
* correctly use the buffered file writer

* use value from source

* reorder fields

* use only the DetectorKey as a map field

* address comments and use factory function

* fix optional params

* remove commented out code
2024-02-05 10:43:55 -08:00
ahrav
a22874f9f0
[feat] - concurrently scan the filesystem source (#2364)
* concurrently scan the filesystem source

Co-authored-by: Miccah Castorina <m.castorina93@gmail.com>

* fix test

* update test

* remove return

* use error not info

* address comment

---------

Co-authored-by: Miccah Castorina <m.castorina93@gmail.com>
2024-02-03 10:49:14 -08:00
Miccah
27b30e65ed
[chore] Cleanup GitLab source errors (#2345)
* [chore] Cleanup GitLab source errors

* Ungroup compile time interface checks and revert error message
2024-02-02 20:00:34 -08:00
Richard Gomez
8e90c4e669
Scan GitHub wikis #2233 2024-01-31 10:52:24 -05:00
ahrav
9867ce8eb8
Allow for configuring the buffered file writer (#2319)
* Write large diffs to tmp files

* address comments

* Move bufferedfilewriter to own pkg

* update test

* swallow write err

* use buffer pool

* use size vs len

* use interface

* fix test

* update comments

* fix test

* Allow for configuring the buffered file writer

* remove unused

* add missing method

* remove

* remove unused

* move parser and commit struct closer to where they are used

* linter change

* fix snifftest

* address comments

* add more kvp pairs to error

* fix test

* update

* add back missing metadata fields

* address comments

* remove bufferedfile writer

* fix

* address comments

* use uint8

* update interface

* adjust interface

* fix tests

* make linter happy

* fix finalize

* address comments

* update test

* address comments

* lint

* remove guard

* fix test

* fix

* add TODO

* fix tests
2024-01-30 12:51:58 -08:00
ahrav
7c59ff95d5
[feat] - tmp file diffs (#2306)
* Write large diffs to tmp files

* address comments

* Move bufferedfilewriter to own pkg

* update test

* swallow write err

* use buffer pool

* use size vs len

* use interface

* fix test

* update comments

* fix test

* remove unused

* remove

* remove unused

* move parser and commit struct closer to where they are used

* linter change

* add more kvp pairs to error

* fix test

* update

* address comments

* remove bufferedfile writer

* address comments

* adjust interface

* fix finalize

* address comments

* lint

* remove guard

* fix

* add TODO
2024-01-30 12:30:51 -08:00
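The "write large diffs to tmp files" idea from the commit above might be sketched as a writer that buffers in memory and spills to a temporary file once a size threshold is exceeded. The `spillWriter` type, its threshold, and its method names are assumptions made for illustration; the project's buffered file writer may differ in design and API.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// spillWriter (hypothetical type) keeps small payloads in memory and moves
// to a temp file once the total size would exceed threshold.
type spillWriter struct {
	threshold int
	buf       bytes.Buffer
	file      *os.File
}

func (w *spillWriter) Write(p []byte) (int, error) {
	if w.file == nil && w.buf.Len()+len(p) > w.threshold {
		f, err := os.CreateTemp("", "diff-*")
		if err != nil {
			return 0, err
		}
		// Move what was buffered so far into the file.
		if _, err := f.Write(w.buf.Bytes()); err != nil {
			f.Close()
			os.Remove(f.Name())
			return 0, err
		}
		w.buf.Reset()
		w.file = f
	}
	if w.file != nil {
		return w.file.Write(p)
	}
	return w.buf.Write(p)
}

// onDisk reports whether the data has spilled to a temp file.
func (w *spillWriter) onDisk() bool { return w.file != nil }

// readAllAndClose returns everything written and removes any temp file.
func (w *spillWriter) readAllAndClose() ([]byte, error) {
	if w.file == nil {
		return w.buf.Bytes(), nil
	}
	defer os.Remove(w.file.Name())
	defer w.file.Close()
	if _, err := w.file.Seek(0, io.SeekStart); err != nil {
		return nil, err
	}
	return io.ReadAll(w.file)
}

func main() {
	w := &spillWriter{threshold: 8}
	fmt.Fprint(w, "abc")
	fmt.Println("on disk:", w.onDisk())
	fmt.Fprint(w, "defghijk")
	fmt.Println("on disk:", w.onDisk())
	data, err := w.readAllAndClose()
	fmt.Println(string(data), err)
}
```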
Miccah
6824eb41ea
Fix filesystem enumeration ignore paths bug (#2355) 2024-01-30 12:21:37 -08:00
Richard Gomez
38eb5d08e7
Improve GitHub scan logging (#2220)
* feat(github): improve scan logging

* Move metric

---------

Co-authored-by: Dustin Decker <dustin@trufflesec.com>
2024-01-25 22:11:01 -08:00
ahrav
f209b04d5d
add priority semaphore (#2336) 2024-01-24 16:43:56 -08:00
Miccah
4c698fc1e8
Walk directories in filesystem source enumeration (#2313)
* Walk directories in filesystem source enumeration

* Ignore all directories instead of just the root

* Fix bug with multiple directories

* Skip filesystem TestEnumerate

* Update filesystem enumeration test to create files and folders
2024-01-23 14:57:38 -08:00
Cody Rose
80f2696ae0
Update Gitlab repo count in tests #2333 2024-01-23 15:04:11 -05:00
Miccah
2d96b89554
Add prometheus metrics to measure hook execution time (#2312)
* Add prometheus metrics to measure hook execution time

* Move metrics to separate file and reduce buckets
2024-01-22 11:47:45 -08:00
ahrav
d3d551d24e
[chore] - Update Chunk struct comment (#2317)
* update comment to include information on the importance of struct ordering

* more cute tricks

* remove cute tricks
2024-01-20 13:31:27 -08:00