416 commits

Author SHA1 Message Date
Miccah
7ba880f47a
Add AvailableCapacity method to SourceManager (#1665) 2023-08-29 12:36:44 -07:00
ahrav
2b1b1b5ad0
Add jobID to chunk. (#1721) 2023-08-29 12:02:30 -07:00
ahrav
c51e8f8af5
buffer channel. (#1718) 2023-08-28 18:08:31 -07:00
ahrav
0932ea224b
[chore] - Prevent nil deref panic (#1709) 2023-08-26 20:39:50 -07:00
Miccah
5eb776cd61
Support cancelling a run from a JobProgressRef (#1663) 2023-08-25 10:43:33 -07:00
Cody Rose
33eed42e17
Test S3 role assumption (#1655)
This PR adds a test of the S3 role assumption functionality. It currently only tests role assumption within a single account.
2023-08-25 11:30:08 -04:00
Miccah
61977412df
Add SourceName to JobProgressRef (#1664) 2023-08-25 07:48:25 -07:00
ahrav
4f4a79f62b
Support azure git links (#1662)
* Support azure git links.

* update comment.

* update test names.
2023-08-24 14:36:52 -07:00
Miccah
f2bfcc7ac6
Capture source-reported progress in JobProgress snapshot (#1661) 2023-08-24 11:28:50 -07:00
Miccah
a4401939a8
Add ElapsedTime method to JobProgressMetrics (#1660) 2023-08-24 11:28:33 -07:00
ahrav
a2a7a2087e
[chore] - update comments and logs. (#1654)
* update comments and logs.

* Update github.go
2023-08-23 13:18:07 -07:00
ahrav
9ae72308be
Include the job ID in a chunk (#1652)
* Include the job ID in a source's chunk.

* address comments.

* address comments.
2023-08-22 14:00:27 -07:00
Zubair Khan
fd00d2b30b
add rate limit and consumption metrics for GitHub (#1651)
* add rate limit and consumption metrics

* increment after each repo scanned

* update repo scanned label name
2023-08-22 15:01:59 -04:00
Cody Rose
059ea23a72
update s3 test bucket (#1649)
We're switching our S3 source test account over to a different one, which means we have to change the bucket name.
2023-08-22 12:43:38 -04:00
Miccah
5cfbde783f
Fix reversed ordering of arguments (#1648)
The source manager initialization function was defined as `sourceID`
followed by `jobID`, while the source initialization function is the
reverse. This is confusing and easy to mix up since the parameters are
the same type.

This commit adds a test to make sure the source manager initializes in
the correct order, but it doesn't prevent the library user from making the
same mistake. We may want to consider using different types.
2023-08-22 07:55:56 -07:00
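The "different types" idea mentioned in the commit message above can be sketched as follows. This is only an illustration under stated assumptions: `SourceID`, `JobID`, and `initSource` are hypothetical names, not the project's actual API, and the integer width is a guess.

```go
package main

import "fmt"

// SourceID and JobID are hypothetical distinct types. Because they are
// separate named types, the compiler rejects a call that swaps them,
// even though both are backed by int64.
type SourceID int64
type JobID int64

// initSource is a hypothetical stand-in for a source initialization
// function that takes the job ID first and the source ID second.
func initSource(jobID JobID, sourceID SourceID) string {
	return fmt.Sprintf("job=%d source=%d", jobID, sourceID)
}

func main() {
	var sid SourceID = 7
	var jid JobID = 42
	fmt.Println(initSource(jid, sid))
	// initSource(sid, jid) // would not compile: mismatched argument types
}
```

With plain `int64` parameters the swapped call compiles silently; with named types the mistake becomes a compile-time error rather than something a test has to catch.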
Zubair Khan
9a13c74a35
add thog CLI support for GitHub config validate (#1626)
* add exportable validate function for github

* update validator

* use the context

* gate to prevent panic

* wrap error with context

* wrap error with context for basic auth and unauth
2023-08-22 10:22:39 -04:00
Cody Rose
dbb2c2e319
wait before finishing s3 test (#1647)
The S3 source test verifies that chunking has completed, but it didn't actually wait for completion first, leading to non-deterministic test failures.
2023-08-21 12:36:36 -04:00
ahrav
d51e3b6d83
Only scan gist comments or repo comments. (#1646) 2023-08-20 11:38:28 -07:00
Mike Vanbuskirk
64dd49f9ce
add role assumption for s3 source (#1477)
* add role assumption for s3 source

* refactor role assumption to repeatable string

user can pass array of roles to assume

* refactor s3 chunks to handle passed roleARNs

* add role-session name

use timestamp to make dynamic

* add docstring for rolearn strings()

* make sure role ARNs are passed into source

* refactor role assumption functionality

break s3 bucket scanning into sep. function

* add log check on assume role

* fix role iteration

- Make sure s3 struct is populated with roles
- add separate new client instantiation for role-based access
- iterates through each role

* add comment

* protobuf revert for merge

* re-run make proto

* lint cleanup

* cleanup TODOs

* drop redundant switch case in assumerole client

* use less verbose 'ctx' designator

* breakout functionality from Chunks

- separate functions for:
- enumerating buckets to scan
- scanning objects within the buckets

* remake protobuf defs

* allow scan to continue on single bucket err

* add readme docs

* minor fixups
2023-08-17 20:30:20 -04:00
ahrav
0ae8cf5d35
[bug] - handle IOOR panic (#1639)
* handle IOOR panic.

* use a better fxn name.

* increase timeout for test to complete.

* simplify code and add test.

* do it for miccah.
2023-08-17 15:47:11 -07:00
ahrav
b8bb94f2b1
[bug] - copy chunk before sending on chunksChan (#1633)
* Redeclare chunk before sending on chunksChan.

* add integration test.

* update test.
2023-08-16 16:36:38 -07:00
Miccah
fae54c7ffa
Add ScanChunk to allow injecting Chunks into the SourceManager's channel (#1634)
With the introduction of the SourceManager, the chunks channel became
private and read-only. This provides a method to write chunks into the
channel as we transition away from needing to do that.
2023-08-16 16:09:23 -07:00
Zubair Khan
db89e345d7
correct logging output for github comments and add oss flags (#1632)
* correct logging output

* add flags

* respect oss cli flags for github comment scanning

* improve copy
2023-08-16 18:23:59 -04:00
joeleonjr
fa9469cfc7
Docker scanning by digest (#1615)
* added functionality to scan docker images with digests instead of tags

* cleaned import statement

* added unit test for baseAndTag parsing + remote digest scan
2023-08-11 16:53:12 -05:00
ahrav
e894540632
Use the common chunker for scanning the filesystem source (#1619)
* Use the common chunker for scanning the filesystem source.

* remove unused consts.

* add test.
2023-08-11 13:40:10 -07:00
Bill Rich
2d2595a2e3
Move commits_scanned to ScanRepo (#1610) 2023-08-07 14:28:57 -07:00
ahrav
13999227b9
Use common chunk reader (#1596)
* Add common chunker.

* add comment.

* use better config name.

* Add common chunk reader to s3.

* Add common chunk reader to git, gcs, circleci.

* revert gcs.

* revert gcs.

* fix chunker.

* revert gcs.

* update cancellablewrite.

* revert impl.

* update to remove totalsize.

* Fix my goof.

* Use unified struct in chunkreader.

* return err instead of logging and returning.

* rename error to err.

* only send single ChunkResult even if there is an error and chunkBytes.

* fix logic.
2023-08-07 12:55:28 -07:00
Miccah
1cd600f70f
Use SourceManager in engine (#1586)
* Add SourceManager to Engine struct

* Update Engine methods to use the SourceManager

* Fix GCS test

The original was testing that `Init()` errors weren't surfaced in
`Finish()`, but the `SourceManager` changed that behavior.

* JobProgress race fixes

* Add contextual values

* Remove unused code

* Add debug logs

* Rename WithConcurrency to WithConcurrentSources

* Always forward chunks to the output chunks channel
2023-08-03 13:36:30 -05:00
Miccah
e322c4b29d
Fix nil pointer dereference to git ScanOptions (#1603) 2023-08-03 12:07:24 -05:00
Savely Krasovsky
d062834997
initial support for bare repositories (#1499)
* feat: initial support for bare repositories

* feat: use concatenation instead of formatting and os.Getenv instead of os.Environ

Signed-off-by: Savely Krasovsky <savely@krasovs.ky>

* fix: go-git update with pre-receive hooks fix

Signed-off-by: Savely Krasovsky <savely@krasovs.ky>

* fix: remove info about pre-receive hook from README.md for now

Signed-off-by: Savely Krasovsky <savely@krasovs.ky>

* fix: don't scan staged while using --bare option, fixes to make it work with the latest master

Signed-off-by: Savely Krasovsky <savely@krasovs.ky>

* fix: small refactor according to #1518

Signed-off-by: Savely Krasovsky <savely@krasovs.ky>

---------

Signed-off-by: Savely Krasovsky <savely@krasovs.ky>
2023-08-03 11:23:41 -05:00
ahrav
5a5e8a607e
Common chunk reader (#1594)
* Add common chunker.

* add comment.

* use better config name.

* Add common chunk reader to s3.

* Add common chunk reader to git, gcs, circleci.

* fix chunker.

* revert gcs.

* update cancellablewrite.

* revert impl.

* update to remove totalsize.
2023-08-03 06:27:33 -07:00
Bill Rich
c995e93dcc
Add commits scanned to log (#1600)
* Add commits scanned to log

* Use atomic
2023-08-02 14:10:54 -07:00
ahrav
78d06658ca
Don't return in loop. (#1589) 2023-08-01 10:29:01 -07:00
Miccah
69021f59c5
Refactor git source to allow ScanOptions and use source in engine (#1518)
* Refactor git source to allow ScanOptions and use source in engine

Refactor the Chunks method of the git Source to call out to two helper
methods: scanRepos and scanDirs, which scan s.conn.Repositories and
s.conn.Directories respectively. The only notable change in behavior is
that a credential is no longer necessary if there are no
s.conn.Repositories to scan.

* Preserve ScanGit functionality of not cleaning up temporary files
2023-08-01 09:52:02 -05:00
ahrav
5043fc8756
[bug] - Fix unlocking an unlocked mutex (#1583)
* use correct mutex.

* remove unused fxn.
2023-07-31 14:06:41 -07:00
ahrav
eb00d0d4e1
[bug] - fix data races (#1577)
* fix data race.

* Add test and fix additional data race.

* address comments.
2023-07-31 11:12:38 -07:00
Miccah
a07b6664f8
Support fatal errors in job reports (#1562)
* Support fatal errors in job reports

* WIP: JobReporter and JobInspector

* WIP: JobReportHook and JobReportRef

* Add ChunkError type and asyncRun helper method

* Rename JobReport to JobProgress

* Return a closed channel from Done when the JobProgress is nil

* Comment catchFirstFatal function
2023-07-31 11:28:30 -05:00
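The "return a closed channel from Done when the JobProgress is nil" item above describes a nil-safe method. A minimal sketch of that idea follows; the struct layout and field name are assumptions, not the project's actual definition.

```go
package main

import "fmt"

// JobProgress is a hypothetical, stripped-down stand-in with a single
// channel that is closed when the job finishes.
type JobProgress struct {
	done chan struct{}
}

// Done is safe to call on a nil *JobProgress. A nil progress has nothing
// to wait for, so it returns an already-closed channel and receivers
// unblock immediately instead of hanging or panicking.
func (jp *JobProgress) Done() <-chan struct{} {
	if jp == nil {
		ch := make(chan struct{})
		close(ch)
		return ch
	}
	return jp.done
}

func main() {
	var jp *JobProgress // nil on purpose
	<-jp.Done()         // returns immediately
	fmt.Println("nil JobProgress is treated as already done")
}
```

Receiving from a closed channel never blocks, which is why callers can `select` on `Done()` without first checking for nil.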
Cody Rose
ad57de50cd
Do not nest transports for Github installation client (#1564)
#1454 modified one of the Github enumeration code paths in a way that broke an integration test by causing one client's transport to be used for the construction of a different client, causing authentication failures. This saves the original transport for use, fixing the test.
2023-07-31 11:31:16 -04:00
Richard Gomez
e0faac8d1c
Fix runtime error when scanning Gist comments (#1552)
* fix(github): fix runtime error from gist comments

* fix(github): add flag to scan Gist comments
2023-07-31 08:57:42 -05:00
Miccah
e391e89f3e
Initial implementation of JobReport with SourceManager usage (#1557)
* Initial implementation of JobReport with SourceManager usage

* Limit concurrent units

* Only save the last JobReport per handle
2023-07-27 10:49:56 -05:00
Richard Gomez
46823f77c9
feat(github): clarify comment log statement (#1553) 2023-07-26 09:40:30 -05:00
Miccah
10f0963bc9
Add SourceManager tests for Run and Wait methods (#1530)
* Miscellaneous SourceManager updates

* Own the chunks channel instead of accepting it as an input
* Add Chunks and Wait methods
* Fix bug in Enroll so it actually returns the handle
* Add context.Context parameter to the SourceInitFunc type

* Add SourceManager tests for Run and Wait methods

* Rename man variables to mgr
2023-07-26 00:48:28 -05:00
Richard Gomez
2290954b02
fix(github): use apiEndpoint for basic or no auth (#1454) 2023-07-25 20:03:08 -07:00
Bill Rich
f39303495a
Add commitsScanned metrics (#1533)
* Add commitsScanned metrics

* Just keep commit count
2023-07-25 11:31:01 -07:00
ahrav
b5b01d3eba
[chore] - optimize chunker (#1535)
* Use chunkbytes that includes the size of peek.

* linter.

* continue.

* add TotalChunkSize const.
2023-07-24 19:30:29 -07:00
ahrav
9e0a2e9ddd
[chore] - Remove password info from log (#1528)
* Remove password info from log.

* update.

* one more.
2023-07-22 20:25:45 -07:00
Miccah
91c5472876
Implement SourceManager basics (#1515)
* Implement SourceManager basics

* Rename identifiers and add a default headlessAPI implementation

* Rewrite to use SourceInitFunc

* Update variable name to accurately reflect its value
2023-07-21 15:20:25 -05:00
Miccah
4e774d1f01
Define SourceUnit chunking interface (#1484)
* Define SourceUnit chunking interface

* Refactor to use a ChunkReporter interface

* Rename shadowed err to scanErr
2023-07-13 14:11:43 -05:00
Miccah
4b7f94dea1
Rewrite SourceUnitEnumerator to use UnitReporter instead of a channel (#1485) 2023-07-13 13:48:33 -05:00
Richard Gomez
1594fddf05
feat(git): include line in github & gitlab links (#1466) 2023-07-11 20:02:27 -07:00