Commit graph

436 commits

Author SHA1 Message Date
Cody Rose
11452e8a57
Revert "feat(git): scan commit metadata (#2713)" (#2747)
This reverts commit 81a9c813a1.
2024-04-25 10:56:48 -04:00
Richard Gomez
81a9c813a1
feat(git): scan commit metadata (#2713)
This fixes #2683. It scans the commit author, committer (which is typically GitHub <noreply@github.com> for GitHub, but can be different), and message.

It also scans Git notes.
2024-04-25 10:13:09 -04:00
Cody Rose
b745cfd495
Enrich Gitlab enumeration logging (#2678)
This PR modifies the GitLab source:

* emits a new "groups enumerated" metric
* logs more information about group enumeration
* emits the repo enumeration metric inside getAllProjectRepos, which means it will work when units are flipped on
* emits the repo enumeration metric more granularly
2024-04-08 10:47:05 -04:00
ahrav
a8132839f8
[chore] - update go-github dep manually (#2664)
* update go-github dep

* remove commented out line
2024-04-03 19:19:14 -07:00
Richard Gomez
3b58a15a84
Fix GitHub enumeration & rate-limiting logic (#2625)
This is a follow-up to #2379.

It fixes the following issues:

* GitHub API calls were missing rate-limit handling
* The fix for #2379 (comment) inadvertently resulted in duplicate API calls
2024-03-29 10:29:46 -04:00
Dustin Decker
612ff1a0f1
Use Lstat to identify non-regular files in filesystem source (#2628)
* Use Lstat to identify non-regular files in filesystem source

* fix test
2024-03-26 15:22:42 -07:00
Richard Gomez
95dc8d6e16
Fix additional GitHub test errors #2614 2024-03-26 09:34:12 -04:00
Richard Gomez
9d4cf87c02
fix(github): resolve panic & test failures (#2608) 2024-03-22 09:49:01 -07:00
Richard Gomez
80e8a67c2d
Refactor GitHub source (#2379)
* refactor(github): cleanup logic

* fix(github): lookup wikis per-repo

* refactor(github): change scanErrs.String output

---------

Co-authored-by: Bill Rich <bill.rich@gmail.com>
2024-03-21 14:07:39 -07:00
Miccah
3a7266e540
[chore] Fix potential resource leak in postman source (#2606)
This moves workspace unpacking to a helper function to leverage a defer,
which ensures the file is always closed.
2024-03-21 10:21:13 -05:00
Zachary Rice
1216fa23c9
strings contain keyword check, add collection name to keywords (#2602) 2024-03-21 09:35:38 -05:00
Zachary Rice
b11ce72338
Postman Source (#2579)
postman source

Co-authored-by: Miccah <m.castorina93@gmail.com>

---------

Co-authored-by: Joe Leon <joe.leon@trufflesec.com>
Co-authored-by: Miccah Castorina <m.castorina93@gmail.com>
2024-03-20 11:36:20 -05:00
Cody Rose
b7f08db1ef
Redact secret in git command output (#2539)
When we fail to clone a git repository we log the command output to help with diagnosis. However, this output can include credentials in certain cases (such as certain errors associated with redirects). We don't want to log credentials when this happens.
2024-03-06 11:51:35 -05:00
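Note on #2539: one way to keep credentials out of logged command output is to mask the userinfo part of any URL before logging. The sketch below assumes credentials appear as user:password@ inside http(s) URLs; the regular expression and the redactCredentials name are illustrative assumptions, not the implementation used in the repository.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlCredentials matches the "user:password@" part of an http(s) URL.
// This pattern is an assumption made for the example.
var urlCredentials = regexp.MustCompile(`(https?://)[^/\s@]+@`)

// redactCredentials replaces URL userinfo with a fixed placeholder so the
// rest of the command output can still be logged for diagnosis.
func redactCredentials(output string) string {
	return urlCredentials.ReplaceAllString(output, "${1}REDACTED@")
}

func main() {
	out := "fatal: unable to access 'https://user:tok3n@example.com/repo.git/': redirect failed"
	fmt.Println(redactCredentials(out))
}
```

URLs without userinfo pass through unchanged, so the helper can be applied to all output unconditionally.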
Cody Rose
28ed81f0a2
Add naive S3 ignorelist (#2536)
This PR adds the ability to exclude buckets from S3 scans. The capability is pretty rudimentary right now, and does not support globbing. If both lists are specified, the source fails to initialize.
2024-03-05 08:01:20 -05:00
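Note on #2536: the behavior described (exact-match exclusion, no globbing, and an initialization failure when both lists are set) could look roughly like the sketch below. It assumes the two lists are a bucket include list and the new ignore list; bucketFilter, newBucketFilter, and shouldScan are hypothetical names, not the S3 source's API.

```go
package main

import (
	"errors"
	"fmt"
)

// bucketFilter holds exact bucket names only; no glob matching is attempted.
type bucketFilter struct {
	include map[string]struct{}
	ignore  map[string]struct{}
}

func toSet(names []string) map[string]struct{} {
	set := make(map[string]struct{}, len(names))
	for _, n := range names {
		set[n] = struct{}{}
	}
	return set
}

// newBucketFilter rejects configurations that set both lists, mirroring
// the "fails to initialize" behavior described in the commit message.
func newBucketFilter(include, ignore []string) (*bucketFilter, error) {
	if len(include) > 0 && len(ignore) > 0 {
		return nil, errors.New("include and ignore lists are mutually exclusive")
	}
	return &bucketFilter{include: toSet(include), ignore: toSet(ignore)}, nil
}

// shouldScan reports whether a bucket passes the filter.
func (f *bucketFilter) shouldScan(bucket string) bool {
	if len(f.include) > 0 {
		_, ok := f.include[bucket]
		return ok
	}
	_, ignored := f.ignore[bucket]
	return !ignored
}

func main() {
	f, err := newBucketFilter(nil, []string{"logs-archive"})
	if err != nil {
		panic(err)
	}
	fmt.Println(f.shouldScan("logs-archive"), f.shouldScan("app-data"))
}
```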
ahrav
3da0c5e125
[feat] - Make the client configurable (#2528)
* Make the client configurable

* add comment

* add backoff option
2024-03-01 13:29:25 -08:00
trufflesteeeve
12ff21f245
Improve Gitlab default URL handling (#2491)
Co-authored-by: Miccah <m.castorina93@gmail.com>
2024-02-28 14:15:11 -05:00
ahrav
9ef5151200
Gitlab scan targets (#2470)
* add method to scan targets

* Add logic to handle targeted scan

* address comments

* remove pagination opts

* add kvp with scan type
2024-02-23 07:40:52 -08:00
Miccah
c60443891b
Add Display method to SourceUnit and Kind member to the CommonSourceUnit (#2450)
* Add Display method to SourceUnit and Kind member to the CommonSourceUnit

* Make SourceUnitID return the ID and a kind

These two values together uniquely represent a unit.
2024-02-20 11:24:13 -08:00
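Note on #2450: if an ID and a kind together identify a unit, a comparable struct of the two can serve directly as a map key. This is a small sketch under that assumption; unitKey and countDistinct are illustrative names and do not reflect the actual SourceUnit interface.

```go
package main

import "fmt"

// unitKey pairs an ID with a kind. Two units that share an ID but differ
// in kind are treated as different units.
type unitKey struct {
	ID   string
	Kind string
}

// countDistinct counts unique (ID, kind) pairs.
func countDistinct(keys []unitKey) int {
	seen := make(map[unitKey]struct{}, len(keys))
	for _, k := range keys {
		seen[k] = struct{}{}
	}
	return len(seen)
}

func main() {
	keys := []unitKey{
		{ID: "/tmp/project", Kind: "dir"},
		{ID: "/tmp/project", Kind: "repo"}, // same ID, different kind
		{ID: "/tmp/project", Kind: "dir"},  // exact duplicate
	}
	fmt.Println(countDistinct(keys))
}
```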
ahrav
5290023c2d
use read full (#2474) 2024-02-20 07:21:16 -08:00
Miccah
216a29d7cf
[chore] Add some doc comments to source manager (#2434) 2024-02-13 07:54:48 -08:00
ahrav
e8006f1bee
2396 since commit stopped working (#2402)
* Ensure we handle commits with no diffs correctly.

* cleanup

* add nil check

* address comments

* move comment

* revert

* add comment
2024-02-13 07:21:22 -08:00
Miccah
74f1553e06
[fix] Add unit information to error returned by ChunkUnit (#2410) 2024-02-12 08:24:31 -08:00
Miccah
8f01326468
[chore] Rename file to legacy_reporters.go (#2406) 2024-02-09 18:17:46 -08:00
Miccah
aace92b64d
Implement SourceUnitEnumChunker for GitLab (#2367)
* Implement SourceUnitEnumChunker for GitLab

* Add GitLab engine integration test

* Use a SliceReporter instead of checking for nil reporters

* Use more generic VisitorReporter

* Merge logic from getReposFromGitlab into getAllProjectRepos

* Update integration test to have a lower bound

Unfortunately, the GitLab integration test does not appear to be
deterministic. Sometimes 36390 chunks are found, sometimes 36312, or
even lower.
2024-02-09 11:06:31 -08:00
Miccah
dd4d4a8a96
Refactor UnitHook to block the scan if finished metrics aren't handled (#2309)
* Refactor UnitHook to block the scan if finished metrics aren't handled

* Log once when back-pressure is detected

* Add hook channel size metric

* Use plural "metrics" for consistency

* Replace LRU cache with map
2024-02-08 14:50:58 -08:00
Richard Gomez
b3ff12d1e9
Fix handling of GitHub ratelimit information (#2041)
This is a follow-up to #1912, which used the headers from the response to determine rate-limiting information, instead of using the values from RateLimitError.Rate. Although that logic seemed solid, I discovered that it did not work in some circumstances. This led to the "unexpected" path more often than intended, and periodic instances where requests would be made before the ratelimit was refreshed.
2024-02-07 09:11:12 -05:00
ahrav
7b492a690a
[feat] - use diff chan (#2387)
* use diff chan

* address comments

* add comment

* address comments

* use old ordering

* add correct author line

* Add required *Commit arg to newDiff

* address comments
2024-02-06 10:06:10 -08:00
Miccah
01c9ac7b59
Fix binary file hanging bug in git sources (#2388)
Waiting for the sub-command will block until all of `stdout` has been
read. In some cases, we return early due to failed chunking without
reading all of the data, and thus, get stuck waiting for the command to
finish. Closing the pipe will ensure `Wait` does not block on that I/O.
2024-02-05 15:28:49 -08:00
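Note on #2388: the hang and its fix can be reproduced in miniature with os/exec. The sketch below assumes a Unix-like system where the yes command is available; readPrefix is a hypothetical helper. It reads only a few bytes from a command that never stops writing, then closes the pipe so that Wait can return.

```go
package main

import (
	"fmt"
	"io"
	"os/exec"
)

// readPrefix starts a command, reads only the first n bytes of its stdout,
// and closes the pipe before waiting for the command.
func readPrefix(name string, n int) ([]byte, error) {
	cmd := exec.Command(name)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}

	buf := make([]byte, n)
	_, readErr := io.ReadFull(stdout, buf)

	// Close the read end early. The child's next write fails, so it exits
	// and Wait returns instead of blocking on unread output.
	stdout.Close()

	// Wait is expected to report an error here because the child was cut
	// off; this sketch only cares that Wait returns.
	_ = cmd.Wait()

	if readErr != nil {
		return nil, readErr
	}
	return buf, nil
}

func main() {
	data, err := readPrefix("yes", 8)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", data)
}
```

Without the Close call, Wait would never return in this example, because yes keeps writing for as long as something holds the pipe open.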
ahrav
135cc3eb69
[fixup] - correctly use the buffered file writer (#2373)
* correctly use the buffered file writer

* use value from source

* reorder fields

* use only the DetectorKey as a map field

* address comments and use factory function

* fix optional params

* remove commented out code
2024-02-05 10:43:55 -08:00
ahrav
a22874f9f0
[feat] - concurrently scan the filesystem source (#2364)
* concurrently scan the filesystem source

Co-authored-by: Miccah Castorina <m.castorina93@gmail.com>

* fix test

* update test

* remove return

* use error not info

* address comment

---------

Co-authored-by: Miccah Castorina <m.castorina93@gmail.com>
2024-02-03 10:49:14 -08:00
Miccah
27b30e65ed
[chore] Cleanup GitLab source errors (#2345)
* [chore] Cleanup GitLab source errors

* Ungroup compile time interface checks and revert error message
2024-02-02 20:00:34 -08:00
Richard Gomez
8e90c4e669
Scan GitHub wikis #2233 2024-01-31 10:52:24 -05:00
ahrav
9867ce8eb8
Allow for configuring the buffered file writer (#2319)
* Write large diffs to tmp files

* address comments

* Move bufferedfilewriter to own pkg

* update test

* swallow write err

* use buffer pool

* use size vs len

* use interface

* fix test

* update comments

* fix test

* Allow for configuring the buffered file writer

* remove unused

* add missing method

* remove

* remove unused

* move parser and commit struct closer to where they are used

* linter change

* fix snifftest

* address comments

* add more kvp pairs to error

* fix test

* update

* add back missing metadata fields

* address comments

* remove bufferedfile writer

* fix

* address comments

* use uint8

* update interface

* adjust interface

* fix tests

* make linter happy

* fix finalize

* address comments

* update test

* address comments

* lint

* remove guard

* fix test

* fix

* add TODO

* fix tests
2024-01-30 12:51:58 -08:00
ahrav
7c59ff95d5
[feat] - tmp file diffs (#2306)
* Write large diffs to tmp files

* address comments

* Move bufferedfilewriter to own pkg

* update test

* swallow write err

* use buffer pool

* use size vs len

* use interface

* fix test

* update comments

* fix test

* remove unused

* remove

* remove unused

* move parser and commit struct closer to where they are used

* linter change

* add more kvp pairs to error

* fix test

* update

* address comments

* remove bufferedfile writer

* address comments

* adjust interface

* fix finalize

* address comments

* lint

* remove guard

* fix

* add TODO
2024-01-30 12:30:51 -08:00
Miccah
6824eb41ea
Fix filesystem enumeration ignore paths bug (#2355) 2024-01-30 12:21:37 -08:00
Richard Gomez
38eb5d08e7
Improve GitHub scan logging (#2220)
* feat(github): improve scan logging

* Move metric

---------

Co-authored-by: Dustin Decker <dustin@trufflesec.com>
2024-01-25 22:11:01 -08:00
ahrav
f209b04d5d
add priority semaphore (#2336) 2024-01-24 16:43:56 -08:00
Miccah
4c698fc1e8
Walk directories in filesystem source enumeration (#2313)
* Walk directories in filesystem source enumeration

* Ignore all directories instead of just the root

* Fix bug with multiple directories

* Skip filesystem TestEnumerate

* Update filesystem enumeration test to create files and folders
2024-01-23 14:57:38 -08:00
Cody Rose
80f2696ae0
Update Gitlab repo count in tests #2333 2024-01-23 15:04:11 -05:00
Miccah
2d96b89554
Add prometheus metrics to measure hook execution time (#2312)
* Add prometheus metrics to measure hook execution time

* Move metrics to separate file and reduce buckets
2024-01-22 11:47:45 -08:00
ahrav
d3d551d24e
[chore] - Update Chunk struct comment (#2317)
* update comment to include information on the importance of struct ordering

* more cute tricks

* remove cute tricks
2024-01-20 13:31:27 -08:00
ahrav
8380e1713e
save 8 bytes per chunk (#2310) 2024-01-18 13:20:06 -08:00
Miccah
c5af979aee
Assume unauthenticated github scans have public visibility (#2308) 2024-01-16 14:57:06 -08:00
ahrav
a1dc660f41
[fixup ] - Allow ssh cloning with AWS Code Commit (#2307) 2024-01-16 11:55:17 -08:00
ahrav
651beff492
[feat] - Allow for the use of include/exclude path files for filesystem scans (#2297)
* Allow for the use of include/exclude path files for filesystem scans

* remove oopsie
2024-01-11 15:41:50 -08:00
ahrav
9408425cc6
[chore] - small updates (#2288)
* small updates

* fix logic

* simplify fxn

* remove errors

* use strings.EqualFold
2024-01-11 14:27:10 -08:00
ahrav
677238c96c
Extend memory cache (#2275)
* Extend memory cache to allow for configuring custom expiration and purge interval

* use any for value type

* fix test

* fix test

* address comments

* address

* make new construct more clear

* reduce duplication

* fix test
2024-01-11 08:20:37 -08:00
ahrav
fb927e011b
update test (#2283) 2024-01-10 09:56:21 -08:00
David
24a09bc37d
1833 Fix syslog udp (#1835)
* # 1183 - Update syslog UDP listener deadline

* #1833 - Update syslog UDP listener deadline v2

* #1833 - Update syslog UDP listener deadline v3
2024-01-08 09:59:48 -08:00
Richard Gomez
241e153dfb
fix(gitparse): handle fromFileLine edge case (#2206) 2024-01-04 14:53:08 -08:00
Bill Rich
78d8dd3abf
Add handlerOpts back (#2258) 2023-12-22 12:11:59 -08:00
Bill Rich
ceff786db4
Skip all binaries (#2256)
* Skip all binaries

* Remove noop

* Drop handlerOpts
2023-12-22 12:01:07 -08:00
Dustin Decker
7d93adc1d0
Add skip archive support (#2257) 2023-12-22 11:55:23 -08:00
ahrav
39f0310f1f
[fixup] - Refactor to Pass Reader for Binary Diffs and Archived Data; Optimize /tmp Directory Cleanup (#2253) 2023-12-22 07:41:54 -08:00
Cody Rose
9c8674777c
Dedupe some source log keys (#2250)
The source manager attaches some context keys, but in certain circumstances, they're already present, resulting in duplicate keys. This PR changes the attachment to be conditional. It also adds some new log messages to track source startup progress.
2023-12-21 10:11:52 -08:00
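Note on #2250: making the attachment conditional amounts to adding a key only when it is not already present. The sketch below models log context as a plain map for simplicity; the real logging context type is different, and withKeyIfAbsent is a hypothetical name.

```go
package main

import "fmt"

// withKeyIfAbsent returns a copy of fields with key set to value, unless
// the key is already present, in which case the existing value wins.
func withKeyIfAbsent(fields map[string]string, key, value string) map[string]string {
	out := make(map[string]string, len(fields)+1)
	for k, v := range fields {
		out[k] = v
	}
	if _, exists := out[key]; !exists {
		out[key] = value
	}
	return out
}

func main() {
	fields := map[string]string{"source_name": "github"}
	fields = withKeyIfAbsent(fields, "source_name", "manager-default") // kept as "github"
	fields = withKeyIfAbsent(fields, "job_id", "42")                   // newly attached
	fmt.Println(fields)
}
```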
ahrav
07ae9ec870
Fix goroutine leak (#2251) 2023-12-20 21:09:05 -08:00
ahrav
64c7365364
add secretID to chunk (#2242) 2023-12-18 15:27:49 -08:00
ahrav
5c6ce693c1
[feat] - Make skipping binaries configurable (#2226)
* Make skipping binaries configurable

* remove ioutil

* fix

* address comments

* address comments

* use multi-reader

* remove print

* use const

* fix test

* fix my stupidness
2023-12-15 11:46:27 -08:00
Miccah
78b5a95342
[chore] Prevent panic when ChunkError has a nil Unit (#2227) 2023-12-15 11:11:28 -08:00
Richard Gomez
b3040b1227
fix(github): remove unused 'members' var (#2202) 2023-12-14 11:53:24 -08:00
Miccah
f6bbc59bf6
Check for SourceUnit support dynamically in the SourceManager (#2205)
* Check for SourceUnit support dynamically in the SourceManager

* Only call the function if we can use source units
2023-12-14 11:48:15 -08:00
Miccah
9f6a47da3f
[chore] Remove omitempty tags on JobProgressMetrics and UnitMetrics (#2204) 2023-12-12 10:02:56 -08:00
Mike Vanbuskirk
53f060a08e
Add disk buffer tempfile cleanup (#2130)
* add tempfile creation

- break PID retrieval into sep. function

* add tmpfile cleanup func

* add file cleanup to main cleanup func

* refactor file logic to only return name string

* add temp buffer naming to gcs

* add temp buffer naming to s3

* add temp buffer naming to filesystem

* add temp buffer naming to git

* consolidate cleanup functions

- have single function handle both files and dirs
- remove interface(not needed with a single func implementation)
- change calls to `New(...)` to reflect config implementation
- simplify automation in main.go
- update disk-buffer-reader dependency

* integrate changes from pr #2133

* merge main

* checkout from main to revert conflict issues

* re-add buffer logic to git

* interface no longer needed

* move string format to global const

---------

Co-authored-by: Ahrav Dutta <ahrav.dutta@trufflesec.com>
2023-12-11 18:31:50 -05:00
Richard Gomez
d1a2d9e832
chore: propagate log context to handlers (#2191) 2023-12-10 10:30:11 -08:00
ahrav
2728e514d2
move logic to main Chunks method (#2194) 2023-12-08 14:51:24 -08:00
ahrav
2a7813929b
add metrics for gitlab (#2190) 2023-12-08 09:50:09 -08:00
ahrav
4b31b39d6b
[chore] - Refactor common code into a separate function (#2179)
* Refactor common code into a separate function

* rename vars

* make sure to set the scanOptions fields

* address comments
2023-12-08 08:44:35 -08:00
ahrav
b75991850a
[chore] - Compile regex once (#2176)
* move regex compilation out of the fxn

* missed a spot

* merge main
2023-12-07 07:26:27 -08:00
ahrav
0595a3baac
allow targets for the source manager (#2182)
* allow targets to the source manager

* use targets
2023-12-06 16:38:35 -08:00
ahrav
e6bc7f4451
remove unnecessary Git cmd check (#2175) 2023-12-06 13:38:34 -08:00
ahrav
cb81f7d11a
[feat] - Remove go-git dependency (#2174)
* remove use of go-git for binary files

* fix it

* use limit reader

* fix comment

* fix test

* address comments

* address comments

* address comments
2023-12-06 13:38:01 -08:00
ahrav
13da76d357
skip files we can't scan (#2170) 2023-12-04 13:37:11 -08:00
ahrav
996a11dcc0
[chore] - remove deprecated types (#2168)
* remove deprecated types

* missed one
2023-12-04 13:23:58 -08:00
ahrav
37d9e5eedf
[chore] - Increase pagination limit (#2154)
* increase pagination limit

* rename
2023-12-04 10:14:46 -08:00
ahrav
279f915799
[chore] - fix error comparisons (#2142)
* fix error comparisons

* fix imports
2023-12-01 08:32:41 -08:00
ahrav
52ffab1034
[chore] - fix import name clashes (#2143)
* fix import name clashes

* fix missing var
2023-12-01 06:53:15 -08:00
Miccah
e498c80b3d
Fix nil pointer dereference when checking if a unit IsFinished (#2135) 2023-11-29 14:19:31 -08:00
Miccah
7ecd43ab1e
[chore] Minor cleanup of source_manager.go (#2134) 2023-11-29 11:08:25 -08:00
Miccah
78219a27b3
Call Finish in SourceManager after the semaphore is released (#2121) 2023-11-24 13:22:08 -08:00
Richard Gomez
024aa056b9
chore(github): add a newline between titles and bodies (#2124) 2023-11-23 16:14:28 -08:00
Richard Gomez
1f502fd42c
feat(github): scan issue & pr titles (#1899) 2023-11-22 19:15:27 -08:00
Dustin Decker
75e869faff
Fix forks and repos counter, add metric for orgs enumerated (#2118) 2023-11-21 08:52:33 -08:00
Miccah
39a603d2dc
[chore] Add JSON tags to job metrics (#2114) 2023-11-16 17:08:33 -08:00
ahrav
d334b3075e
move all Git setup into Init method (#2105)
* add proto fields for git

* add uri to proto

* move all git setup into Init method

* fix logic for when to use repoPath
2023-11-16 13:59:53 -08:00
Miccah
9d6bc8c504
Refactor git source to support scanning units (#2083) 2023-11-01 09:52:58 -07:00
Miccah
52600a897a
[chore] Replace chunks channel with ChunkReporter in git based sources (#2082)
ChunkReporter is more flexible and will allow code reuse for unit
chunking. ChanReporter was added as a way to maintain the original
channel functionality, so this PR should not alter existing behavior.
2023-11-01 09:22:44 -07:00
ahrav
95e0090bc2
[chore] - correctly handle input shorter than 512 bytes (#2077)
* correctly handle input shorter than 512 bytes

* add tests

* reorder tests

* add another test case

* update test

* address comment
2023-10-31 16:42:42 -07:00
Miccah
57203a56cd
[chore] Fix SourceManager flaky test (#2059)
* [chore] Fix SourceManager flaky test

Sorting by EndTime is not deterministic, however sorting by StartTime
should be. StartTime is set in a goroutine that's limited by
WithConcurrentUnits, so it should happen in order that the units are
received.

* Sort by unit ID
2023-10-30 19:16:55 -07:00
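Note on #2059: sorting test results by a stable identifier avoids depending on timing. A minimal sketch, assuming unit IDs are unique strings; unitMetrics here is a simplified stand-in, not the real metrics struct.

```go
package main

import (
	"fmt"
	"sort"
)

// unitMetrics is a simplified stand-in with only the fields the example needs.
type unitMetrics struct {
	UnitID string
	Chunks int
}

// sortByUnitID orders metrics by ID, which does not vary between runs the
// way start or end timestamps can.
func sortByUnitID(ms []unitMetrics) {
	sort.Slice(ms, func(i, j int) bool { return ms[i].UnitID < ms[j].UnitID })
}

func main() {
	// Arrival order may differ from run to run when units finish concurrently.
	ms := []unitMetrics{{"unit-c", 3}, {"unit-a", 1}, {"unit-b", 2}}
	sortByUnitID(ms)
	fmt.Println(ms)
}
```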
Dustin Decker
05fae156e1
Add TravisCI source (#1877)
* Add TravisCI source

* update test to use sourcestest

* Remove jobPage loop

ListByBuild does not support pagination, so this was infinitely
repeating. https://developer.travis-ci.com/resource/jobs#find

* Continue chunking on error

* review updates

* update readme

---------

Co-authored-by: Miccah Castorina <m.castorina93@gmail.com>
2023-10-30 07:28:25 -07:00
Mike Vanbuskirk
4636dc08f6
Add temp directory management (#1878)
* adds func to get scannerPIDs

* add cleanup and call to get pids

* move pid handling to git module

* remove PID logic from main

* refactor testing code to handle different exec name

* cleanup linting errors

* add better logging, fix dir if clause

* some PR fixups

* mod fixup

* add interfaces for helper funcs

* refactor cleanup into main, getPID into git

* lint and test fixups, remove fail on n<2 pids

* simplify pid sorting

* use filepath.Join

* use Args[0] for exec name, fix logger

* formatting fixup

* move functionality into cleantemp pkg

* go mod fixup

* remove redundant testing comment

* fix go.sum issues

* add 15m ticker loop for cleanup

* enclose ticker in function for goroutine defer

fix cleantemp interface

* make time more readable

* add check for non-local Trufflehog PIDs

* allow deletion even if no non-local pids found

* bundle initial cleanup into runCleanup func

* add explicit regex check for tempdir format
2023-10-26 12:28:56 -04:00
Bill Rich
c5efa870ff
Use latest dbr (#1955) 2023-10-24 07:52:49 -07:00
Miccah
0b16142d4f
Add UnitHook and NoopHook implementations (#1930)
* Add UnitHook and NoopHook implementations

The UnitHook tracks metrics per unit of a job, and emits them on a
channel once finished. It should work even if the Source does not
support source units.

* Refactor channel to use an LRU cache instead

An LRU cache has a more favorable failure mode than the channel. With
the channel, if the consumer stopped consuming metrics, scanning would
block. With the LRU cache, metrics will be dropped when space runs out
and a log message emitted.
2023-10-23 14:27:01 -07:00
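Note on #1930: the trade-off described (drop data when space runs out instead of blocking the producer) can be illustrated with a small bounded store. The sketch below evicts the oldest inserted key, which is simpler than a true LRU policy and is not the cache library used in the repository; boundedStore and its methods are hypothetical.

```go
package main

import "fmt"

// boundedStore keeps at most capacity entries. When full, Add evicts the
// oldest inserted key instead of blocking the caller.
type boundedStore struct {
	capacity int
	order    []string // insertion order, oldest first
	items    map[string]int
}

func newBoundedStore(capacity int) *boundedStore {
	if capacity < 1 {
		capacity = 1
	}
	return &boundedStore{capacity: capacity, items: make(map[string]int)}
}

// Add stores value under key. If an entry had to be dropped to make room,
// it returns that entry's key and true so the caller can log the loss.
func (s *boundedStore) Add(key string, value int) (evicted string, dropped bool) {
	if _, exists := s.items[key]; exists {
		s.items[key] = value
		return "", false
	}
	if len(s.order) >= s.capacity {
		evicted, dropped = s.order[0], true
		s.order = s.order[1:]
		delete(s.items, evicted)
	}
	s.order = append(s.order, key)
	s.items[key] = value
	return evicted, dropped
}

func main() {
	store := newBoundedStore(2)
	for i, unit := range []string{"unit-a", "unit-b", "unit-c"} {
		if evicted, dropped := store.Add(unit, i); dropped {
			fmt.Println("dropped metrics for", evicted)
		}
	}
}
```

A producer using this store never waits on a slow consumer; the cost is that the oldest unread metrics are lost, which is the failure mode the commit message prefers over a blocked scan.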
Miccah
b8724e87e6
Use the configured include repositories in the GitHub filter (#1926) 2023-10-20 19:03:28 -07:00
Richard Gomez
3acc65b2fb
chore(github): reduce comment log verbosity (#1922) 2023-10-20 16:16:38 -07:00
Cody Rose
7ac7fa8728
Move Github comments check to fix a test #1927 2023-10-19 19:23:55 -04:00
Richard Gomez
4b821e9732
Handle secondary GitHub ratelimits (#1912)
* fix(github): reduce visibility-related api calls

* fix(github): handle secondary ratelimits
2023-10-19 14:54:45 -04:00
Miccah
758344711a
Export ChunkError fields and add ErrorsFor convenience method (#1920) 2023-10-19 08:46:49 -07:00
Richard Gomez
6ea3a7da4a
fix(github): normalize repo cache (#1897) 2023-10-17 15:07:47 -07:00
Miccah
03dc7cb68d
[chore] Add SourceUnitEnumChunker filesystem tests (#1873)
* [chore] Add SourceUnitEnumChunker filesystem tests

* Ensure reported units are exactly what is expected
2023-10-16 10:42:18 -07:00
Miccah
f09bce3f75
[chore] Fix flaky TestJobProgressElapsedTime (#1872) 2023-10-06 17:05:05 -07:00