Commit graph

366 commits

Author SHA1 Message Date
ahrav
e8006f1bee
2396 since commit stopped working (#2402)
* Ensure we handle commits with no diffs correctly.

* cleanup

* add nil check

* address comments

* move comment

* revert

* add comment
2024-02-13 07:21:22 -08:00
Miccah
74f1553e06
[fix] Add unit information to error returned by ChunkUnit (#2410) 2024-02-12 08:24:31 -08:00
Miccah
8f01326468
[chore] Rename file to legacy_reporters.go (#2406) 2024-02-09 18:17:46 -08:00
Miccah
aace92b64d
Implement SourceUnitEnumChunker for GitLab (#2367)
* Implement SourceUnitEnumChunker for GitLab

* Add GitLab engine integration test

* Use a SliceReporter instead of checking for nil reporters

* Use more generic VisitorReporter

* Merge logic from getReposFromGitlab into getAllProjectRepos

* Update integration test to have a lower bound

Unfortunately, the GitLab integration test does not appear to be
deterministic. Sometimes 36390 chunks are found, sometimes 36312, or
even lower.
2024-02-09 11:06:31 -08:00
Miccah
dd4d4a8a96
Refactor UnitHook to block the scan if finished metrics aren't handled (#2309)
* Refactor UnitHook to block the scan if finished metrics aren't handled

* Log once when back-pressure is detected

* Add hook channel size metric

* Use plural "metrics" for consistency

* Replace LRU cache with map
2024-02-08 14:50:58 -08:00
Richard Gomez
b3ff12d1e9
Fix handling of GitHub ratelimit information (#2041)
This is a follow-up to #1912, which used the headers from the response to determine rate-limiting information, instead of using the values from RateLimitError.Rate. Although that logic seemed solid, I discovered that it did not work in some circumstances. This lead to the "unexpected" path more often than intended, and periodic instances where requests would be made before the ratelimit was refreshed.
2024-02-07 09:11:12 -05:00
ahrav
7b492a690a
[feat] - use diff chan (#2387)
* use diff chan

* address comments

* add comment

* address comments

* use old ordering

* add correct author line

* Add required *Commit arg to newDiff

* address comments
2024-02-06 10:06:10 -08:00
Miccah
01c9ac7b59
Fix binary file hanging bug in git sources (#2388)
Waiting for the sub-command will block until all of `stdout` has been
read. In some cases, we return early due to failed chunking without
reading all of the data, and thus, get stuck waiting for the command to
finish. Closing the pipe will ensure `Wait` does not block on that I/O.
2024-02-05 15:28:49 -08:00
ahrav
135cc3eb69
[fixup] - correctly use the buffered file writer (#2373)
* correctly use the buffered file writer

* use value from source

* reorder fields

* use only the DetectorKey as a map field

* address comments and use factory function

* fix optional params

* remove commented out code
2024-02-05 10:43:55 -08:00
ahrav
a22874f9f0
[feat] - concurently scan the filesystem source (#2364)
* concurently scan the filesystem source

Co-authored-by: Miccah Castorina <m.castorina93@gmail.com>

* fix test

* update test

* remove return

* use error not info

* address comment

---------

Co-authored-by: Miccah Castorina <m.castorina93@gmail.com>
2024-02-03 10:49:14 -08:00
Miccah
27b30e65ed
[chore] Cleanup GitLab source errors (#2345)
* [chore] Cleanup GitLab source errors

* Ungroup compile time interface checks and revert error message
2024-02-02 20:00:34 -08:00
Richard Gomez
8e90c4e669
Scan GitHub wikis #2233 2024-01-31 10:52:24 -05:00
ahrav
9867ce8eb8
Allow for configuring the buffered file writer (#2319)
* Write large diffs to tmp files

* address comments

* Move bufferedfilewriter to own pkg

* update test

* swallow write err

* use buffer pool

* use size vs len

* use interface

* fix test

* update comments

* fix test

* Allow for configuring the buffered file writer

* remove unused

* add missing method

* remove

* remove unused

* move parser and commit struct closer to where they are used

* linter change

* fix snifftest

* address comments

* add more kvp pairs to error

* fix test

* update

* add back missing metadata fields

* address comments

* remove bufferedfile writer

* fix

* address comments

* use unint8

* update interface

* adjust interface

* fix tests

* make linter happy

* fix finalize

* address comments

* update test

* address comments

* lint

* remove guard

* fix test

* fix

* add TODO

* fix tests
2024-01-30 12:51:58 -08:00
ahrav
7c59ff95d5
[feat] - tmp file diffs (#2306)
* Write large diffs to tmp files

* address comments

* Move bufferedfilewriter to own pkg

* update test

* swallow write err

* use buffer pool

* use size vs len

* use interface

* fix test

* update comments

* fix test

* remove unused

* remove

* remove unused

* move parser and commit struct closer to where they are used

* linter change

* add more kvp pairs to error

* fix test

* update

* address comments

* remove bufferedfile writer

* address comments

* adjust interface

* fix finalize

* address comments

* lint

* remove guard

* fix

* add TODO
2024-01-30 12:30:51 -08:00
Miccah
6824eb41ea
Fix filesystem enumeration ignore paths bug (#2355) 2024-01-30 12:21:37 -08:00
Richard Gomez
38eb5d08e7
Improve GitHub scan logging (#2220)
* feat(github): improve scan logging

* Move metric

---------

Co-authored-by: Dustin Decker <dustin@trufflesec.com>
2024-01-25 22:11:01 -08:00
ahrav
f209b04d5d
add priority semaphore (#2336) 2024-01-24 16:43:56 -08:00
Miccah
4c698fc1e8
Walk directories in filesystem source enumeration (#2313)
* Walk directories in filesystem source enumeration

* Ignore all directories instead of just the root

* Fix bug with multiple directories

* Skip filesystem TestEnumerate

* Update filesystem enumeration test to create files and folders
2024-01-23 14:57:38 -08:00
Cody Rose
80f2696ae0
Update Gitlab repo count in tests #2333 2024-01-23 15:04:11 -05:00
Miccah
2d96b89554
Add prometheus metrics to measure hook execution time (#2312)
* Add prometheus metrics to measure hook execution time

* Move metrics to separate file and reduce buckets
2024-01-22 11:47:45 -08:00
ahrav
d3d551d24e
[chore] - Update Chunk struct comment (#2317)
* update comment to include information on the importance of struct ordering

* more cute tricks

* remove cute tricks
2024-01-20 13:31:27 -08:00
ahrav
8380e1713e
save 8 bytes per chunk (#2310) 2024-01-18 13:20:06 -08:00
Miccah
c5af979aee
Assume unauthenticated github scans have public visibility (#2308) 2024-01-16 14:57:06 -08:00
ahrav
a1dc660f41
[fixup ] - Allow ssh cloning with AWS Code Commit (#2307) 2024-01-16 11:55:17 -08:00
ahrav
651beff492
[feat] - Allow for the use of include/exclude path files for filesystem scans (#2297)
* Allow for the use of include/exclude path files for filesystem scans

* remove oopsie
2024-01-11 15:41:50 -08:00
ahrav
9408425cc6
[chore] - small updates (#2288)
* small updates

* fix logic

* simplify fxn

* remove errors

* use strings.EqualFold
2024-01-11 14:27:10 -08:00
ahrav
677238c96c
Extend memory cache (#2275)
* Extend memory cache to allow for configuring custom expiration and purge interval

* use any for value type

* fix test

* fix test

* address comments

* address

* make new construct more clear

* reduce duplication

* fix test
2024-01-11 08:20:37 -08:00
ahrav
fb927e011b
update test (#2283) 2024-01-10 09:56:21 -08:00
David
24a09bc37d
1833 Fix syslog udp (#1835)
* # 1183 - Update syslog UDP listener deadline

* #1833 - Update syslog UDP listener deadline v2

* #1833 - Update syslog UDP listener deadline v3
2024-01-08 09:59:48 -08:00
Richard Gomez
241e153dfb
fix(gitparse): handle fromFileLine edge case (#2206) 2024-01-04 14:53:08 -08:00
Bill Rich
78d8dd3abf
Add handlerOpts back (#2258) 2023-12-22 12:11:59 -08:00
Bill Rich
ceff786db4
Skip all binaries (#2256)
* Skip all binaries

* Remove noop

* Drop handlerOpts
2023-12-22 12:01:07 -08:00
Dustin Decker
7d93adc1d0
Add skip archive support (#2257) 2023-12-22 11:55:23 -08:00
ahrav
39f0310f1f
[fixup] - Refactor to Pass Reader for Binary Diffs and Archived Data; Optimize /tmp Directory Cleanup (#2253) 2023-12-22 07:41:54 -08:00
Cody Rose
9c8674777c
Dedupe some source log keys (#2250)
The source manager attaches some context keys, but in certain circumstances, they're already present, resulting in duplicate keys. This PR changes the attachment to be conditional. It also adds some new log messages to track source startup progress.
2023-12-21 10:11:52 -08:00
ahrav
07ae9ec870
Fix goroutine leak (#2251) 2023-12-20 21:09:05 -08:00
ahrav
64c7365364
add secretID to chunk (#2242) 2023-12-18 15:27:49 -08:00
ahrav
5c6ce693c1
[feat] - Make skipping binaries configurable (#2226)
* Make skipping binaries configurable

* remove ioutil

* fix

* address comments

* address comments

* use multi-reader

* remove print

* use const

* fix test

* fix my stupidness
2023-12-15 11:46:27 -08:00
Miccah
78b5a95342
[chore] Prevent panic when ChunkError has a nil Unit (#2227) 2023-12-15 11:11:28 -08:00
Richard Gomez
b3040b1227
fix(github): remove unused 'members' var (#2202) 2023-12-14 11:53:24 -08:00
Miccah
f6bbc59bf6
Check for SourceUnit support dynamically in the SourceManager (#2205)
* Check for SourceUnit support dynamically in the SourceManager

* Only call the function if we can use source units
2023-12-14 11:48:15 -08:00
Miccah
9f6a47da3f
[chore] Remove omitempty tags on JobProgressMetrics and UnitMetrics (#2204) 2023-12-12 10:02:56 -08:00
Mike Vanbuskirk
53f060a08e
Add disk buffer tempfile cleanup (#2130)
* add tempfile creation

- break PID retrieval into sep. function

* add tmpfile cleanup func

* add file cleanup to main cleanup func

* refactor file logic to only return name string

* add temp buffer naming to gcs

* add temp buffer naming to s3

* add temp buffer naming to filesystem

* add temp buffer naming to git

* consolidate cleanup functions

- have single function handle both files and dirs
- remove interface(not needed with a single func implementation)
- change calls to `New(...)` to reflect config implementation
- simplify automation in main.go
- update disk-buffer-reader dependency

* integrate changes from pr #2133

* merge main

* checkout from main to revert conflict issues

* re-add buffer logic to git

* interface no longer needed

* move string format to global const

---------

Co-authored-by: Ahrav Dutta <ahrav.dutta@trufflesec.com>
2023-12-11 18:31:50 -05:00
Richard Gomez
d1a2d9e832
chore: propagate log context to handlers (#2191) 2023-12-10 10:30:11 -08:00
ahrav
2728e514d2
move logic to main Chunks method (#2194) 2023-12-08 14:51:24 -08:00
ahrav
2a7813929b
add metrics for gitlab (#2190) 2023-12-08 09:50:09 -08:00
ahrav
4b31b39d6b
[chore] - Refactor common code into a separate function (#2179)
* Refactor common code into a separate function

* rename vars

* make sure to set the scanOptions fields

* address comments
2023-12-08 08:44:35 -08:00
ahrav
b75991850a
[chore] - Compile regex once (#2176)
* move regex compilation out of the fxn

* missed a spot

* merge main
2023-12-07 07:26:27 -08:00
ahrav
0595a3baac
allow targets for the source manager (#2182)
* allow targets to the source manager

* use targets
2023-12-06 16:38:35 -08:00
ahrav
e6bc7f4451
remove unnecessary Git cmd check (#2175) 2023-12-06 13:38:34 -08:00