Commit graph

41 commits

Author SHA1 Message Date
ahrav
42b3a9d999
[perf] - Optimize MIME Type Detection to Reduce Allocations (#3048)
* Streaming file handling.

* cleanup

* update tests

* lint

* defer close on input io.ReadCloser's

* remove redundant mime type detection

* Reduce allocations

* fix test

* update comment

* fix seek bug

* address comment

* undo
2024-07-17 14:04:29 -07:00
ahrav
f865482025
[feat] - Streamlined File Handling with BufferedReaderSeeker (#3041)
* Streaming file handling.

* cleanup

* update tests

* lint

* defer close on input io.ReadCloser's

* fix seek bug

* fix hanging

* clarify errors

* update

* address comments

* revert

* update

* address

* add check to prevent seek without buffering

* revert

* revert

* update comment to make buffer usage more clear
2024-07-17 13:52:18 -07:00
Richard Gomez
235b27964b
fix(handlers): workaround for max archive depth (#2965) 2024-06-14 08:18:05 -07:00
ahrav
2db06f0576
[bug] - Handle empty reader case in newFileReader (#2854)
* Correctly handle empty files

* fix

* fix test
2024-05-15 18:25:36 -07:00
ahrav
ead9dd5748
[refactor] - Create separate handler for non-archive data (#2825)
* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* Handle non-archive data within the DefaultHandler

* make structs and methods private

* Remove non-archive data handling within sources

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing functionality of not handling non-archive files in HandleFile

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* Adjust check for rpm/deb archive type

* add additional deb mime type

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* update comment

* move rewind closer

* reduce log verbosity

* add metrics for file handling

* add metrics for errors

* make defaultBufferSize a const

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* Allow git cat-file blob to complete before trying to handle the file

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* use buffer writer

* update

* refactor

* update context pkg

* revert stuff

* update test

* fix test

* remove

* use correct reader

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* Update write method in contentWriter interface

* Add bufferReadSeekCloser

* update name

* update comment

* fix lint

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing functionality of not handling non-archive files in HandleFile

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* Handle non-archive data within the DefaultHandler

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* update comment

* move rewind closer

* reduce log verbosity

* make defaultBufferSize a const

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* use buffer writer

* update

* refactor

* update context pkg

* revert stuff

* update test

* remove

* rebase

* go mod tidy

* lint check

* update metric to ms

* update metric

* update comments

* dont use ptr

* update

* fix

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing functionality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* go mod tidy

* update go mod

* Add a buffered file reader

* update comments

* use Buffered File Reader

* return buffer

* update

* fix

* return

* go mod tidy

* merge

* use a shared pool

* use sync.Once

* reorganize

* remove unused code

* fix double init

* fix stuff

* nil check

* reduce allocations

* updates

* update metrics

* updates

* reset buffer instead of putting it back

* skip binaries

* skip

* concurrently process diffs

* close chan

* concurrently enumerate orgs

* increase workers

* ignore pbix and vsdx files

* add metrics for gitparse's Diffchan

* fix metric

* update metrics

* update

* fix checks

* fix

* inc

* update

* reduce

* Create workers to handle binary files

* modify workers

* updates

* add check

* delete code

* use custom reader

* rename struct

* add nonarchive handler

* fix break

* add comments

* add tests

* refactor

* remove log

* do not scan rpm links

* simplify

* rename var

* rename

* fix benchmark

* add buffer

* buffer

* buffer

* handle panic

* merge main

* merge main

* add recover

* revert stuff

* revert

* revert to using reader

* fixes

* remove

* update

* fixes

* linter

* fix test

* fix comment

* update field name

* fix
2024-05-15 13:40:16 -07:00
cuiyourong
ead4e8fa2d
chore: fix some typos in comments (#2851)
Signed-off-by: cuiyourong <cuiyourong@gmail.com>
2024-05-15 07:36:21 -07:00
ahrav
570cec7565
[refactor] - Refactor Archive Handling Logic (#2703)
* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing functionality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* Remove specialized handler and archive struct and restructure handlers pkg.

* Refactor RPM archive handlers to use a library instead of shelling out

* make rpm handling context aware

* update test

* Refactor AR/deb archive handler to use an existing library instead of shelling out

* Update tests

* add max size check

* add filename and size to context kvp

* move skip file check and is binary check before opening file

* fix test

* preserve existing functionality of not handling non-archive files in HandleFile

* Adjust check for rpm/deb archive type

* add additional deb mime type

* update comment

* go mod tidy

* update go mod

* go mod tidy

* add comment

* update max depth check to >

* go mod tidy

* rename

* [refactor] - Refactor Archive Handling Logic - Part 4: Non-Archive Data Handling and Cleanup (#2704)

* Handle non-archive data within the DefaultHandler

* make structs and methods private

* Remove non-archive data handling within sources

* Handle non-archive data within the DefaultHandler

* rebase

* Remove non-archive data handling within sources

* add gzip

* move diskbuffered rereader setup into handler pkg

* remove DiskBuffereReader creation logic within sources

* move rewind closer

* reduce log verbosity

* make defaultBufferSize a const

* use correct reader

* address comments

* update test

* [feat] - Add Prometheus Metrics for File Handlers (#2705)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* add metrics for archive depth

* [bug] - Enhanced Archive Handling to Address Interface Constraints (#2710)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* wrap compReader with DiskbufferReader

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* replace diskbuffereader with bufferedfilereader

* updates

* add metric back

* [bug] - Fix bug and simplify git cat-file command execution and output handling (#2719)

* add metrics for file handling

* add metrics for errors

* add metrics for file handling

* add metrics for errors

* fix tests

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* Address incompatible reader to openArchive

* remove nil check

* fix err assignment

* Allow git cat-file blob to complete before trying to handle the file

* wrap compReader with DiskbufferReader

* Allow git cat-file blob to complete before trying to handle the file

* updates

* revert stuff

* update test

* remove

* add metrics for file handling

* add metrics for errors

* fix tests

* rebase

* add metrics for errors

* add metrics for max archive depth and skipped files

* update error

* skip symlinks and dirs

* update err

* fix err assignment

* rebase

* remove

* update metric to ms

* update comments

* address comments

* reduce indentations

* inline
2024-05-10 11:36:06 -07:00
Cody Rose
b03cc30263
Individuate archive tests #2293 2024-01-12 09:39:48 -05:00
Dustin Decker
7d93adc1d0
Add skip archive support (#2257) 2023-12-22 11:55:23 -08:00
ahrav
39f0310f1f
[fixup] - Refactor to Pass Reader for Binary Diffs and Archived Data; Optimize /tmp Directory Cleanup (#2253) 2023-12-22 07:41:54 -08:00
ahrav
07ae9ec870
Fix goroutine leak (#2251) 2023-12-20 21:09:05 -08:00
ahrav
5848f5b8d6
[bug] - Bug archive handler memory leak (#2247) 2023-12-20 06:16:58 -08:00
ahrav
5c6ce693c1
[feat] - Make skipping binaries configurable (#2226)
* Make skipping binaries configurable

* remove ioutil

* fix

* address comments

* address comments

* use multi-reader

* remove print

* use const

* fix test

* fix my stupidness
2023-12-15 11:46:27 -08:00
ahrav
d8cb65833c
Avoid reading decompressed data into memory (#2196) 2023-12-14 11:00:11 -08:00
ahrav
61c7d52a43
[bug] - close file after reading (#2203)
* close file after reading

* inline return
2023-12-11 15:04:30 -08:00
Richard Gomez
d1a2d9e832
chore: propagate log context to handlers (#2191) 2023-12-10 10:30:11 -08:00
ahrav
331336dc0a
[fixup] - skip files in the archive handler (#2195) 2023-12-08 20:23:32 -08:00
ahrav
990274b596
Skip trying to determine MIME type for directories (#2178) 2023-12-06 12:00:18 -08:00
Miccah
52600a897a
[chore] Replace chunks channel with ChunkReporter in git based sources (#2082)
ChunkReporter is more flexible and will allow code reuse for unit
chunking. ChanReporter was added as a way to maintain the original
channel functionality, so this PR should not alter existing behavior.
2023-11-01 09:22:44 -07:00
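The relationship between ChunkReporter and ChanReporter described in #2082 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the Chunk type, the ChunkOk method name and signature, and the emit and collect helpers are all assumptions made for the example. The idea shown is only that a source writes to an interface, and a channel-backed adapter keeps the original channel behavior.

```go
package main

import (
	"context"
	"fmt"
)

// Chunk is a hypothetical, pared-down unit of scanned data.
type Chunk struct{ Data []byte }

// ChunkReporter is an assumed interface shape, not the project's real definition.
type ChunkReporter interface {
	ChunkOk(ctx context.Context, chunk Chunk) error
}

// ChanReporter adapts a plain channel to ChunkReporter, so callers that
// still consume a channel see the same behavior as before.
type ChanReporter struct{ Ch chan<- *Chunk }

// ChunkOk sends the chunk on the channel unless the context is cancelled first.
func (c ChanReporter) ChunkOk(ctx context.Context, chunk Chunk) error {
	select {
	case c.Ch <- &chunk:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// emit stands in for a source (hypothetical): it only knows the interface.
func emit(ctx context.Context, r ChunkReporter, parts []string) error {
	for _, p := range parts {
		if err := r.ChunkOk(ctx, Chunk{Data: []byte(p)}); err != nil {
			return err
		}
	}
	return nil
}

// collect wires emit to a buffered channel and drains it into a slice.
func collect(parts []string) ([]string, error) {
	ch := make(chan *Chunk, len(parts))
	if err := emit(context.Background(), ChanReporter{Ch: ch}, parts); err != nil {
		return nil, err
	}
	close(ch)
	var out []string
	for c := range ch {
		out = append(out, string(c.Data))
	}
	return out, nil
}

func main() {
	got, err := collect([]string{"chunk-1", "chunk-2"})
	fmt.Println(got, err)
}
```

Because emit depends only on the interface, a different reporter (for example one that groups chunks per unit) could be swapped in without touching the source code path.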
ahrav
95e0090bc2
[chore] - correctly handle input shorter than 512 bytes (#2077)
* correctly handle input shorter than 512 bytes

* add tests

* reorder tests

* add another test case

* update test

* address comment
2023-10-31 16:42:42 -07:00
ahrav
a9b056de0a
Centralize logic for checking archive extraction tools (#2063)
* Centralize logic for checking archive extraction tools

* simplify
2023-10-30 20:14:51 -07:00
Bill Rich
c5efa870ff
Use latest dbr (#1955) 2023-10-24 07:52:49 -07:00
ahrav
d2676618c0
[bug] - correctly handle nested archived directories (#1778) 2023-09-15 04:37:15 -07:00
martinohmann
31d17c4f93
fix: add missing error check in archive handler (#1770)
Fixes #1769

The existing error check `errors.Is(err, archiver.ErrNoMatch) && depth >
0` only conditionally handled a specific error.

Any other error case was not short circuited and ended up causing a
nil-pointer dereference further down the method when `format.Name()` was
invoked.
2023-09-13 07:07:40 -07:00
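The failure mode described in #1770 can be illustrated with a small sketch. It uses only the standard library; errNoMatch, the format interface, identify, and formatName are hypothetical stand-ins for archiver.ErrNoMatch and the handler's real code, which is not shown in this log. The point is the second check: any error other than the special case must return before Name() is called on a nil value.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoMatch is a hypothetical stand-in for archiver.ErrNoMatch.
var errNoMatch = errors.New("no formats matched")

// format is a hypothetical stand-in for the identified archive format.
type format interface{ Name() string }

type zipFormat struct{}

func (zipFormat) Name() string { return ".zip" }

// identify is a hypothetical detector. On failure it returns a nil format
// together with an error, which is what makes a missing check dangerous.
func identify(data string) (format, error) {
	switch data {
	case "zip":
		return zipFormat{}, nil
	case "plain":
		return nil, errNoMatch
	default:
		return nil, errors.New("read failure")
	}
}

// formatName guards every error path before touching the format value.
func formatName(data string, depth int) (string, error) {
	f, err := identify(data)
	if errors.Is(err, errNoMatch) && depth > 0 {
		// Nested, non-archive data: nothing further to identify.
		return "", nil
	}
	if err != nil {
		// The previously missing check. Without it, f is nil here and
		// f.Name() below would panic with a nil-pointer dereference.
		return "", fmt.Errorf("identifying format: %w", err)
	}
	return f.Name(), nil
}

func main() {
	for _, in := range []string{"zip", "plain", "broken"} {
		name, err := formatName(in, 1)
		fmt.Printf("%s: name=%q err=%v\n", in, name, err)
	}
}
```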
ahrav
f6512ac4ca
Use common chunker for archive handler (#1717)
* optimize the ReadToMax.

* add comment.

* remove dumb comment.

* update comment.

* fix test.

* lint.

* Expired invite link fix (#1713)

* Use common chunker for archive handler.

---------

Co-authored-by: Zachary Rice <zachary.rice@trufflesec.com>
2023-09-06 09:26:33 -07:00
ahrav
4dc5eb7912
Optimize read to max (#1714)
* optimize the ReadToMax.

* add comment.

* remove dumb comment.

* update comment.

* fix test.

* lint.

* address comments.

* use limit reader.

* update equality check.

* update test.

* use custom limit reader.

* address comments.

* revert fun.
2023-08-29 17:31:40 -07:00
ahrav
55b9d48e0d
update test file. (#1637) 2023-08-17 10:16:25 -07:00
ahrav
f3c2d5e6c7
[bug] - Correctly reset reader before handling archive chunk data (#1636)
* Correctly reset reader before handling archive chunk.

* stop the re-reader.
2023-08-17 10:04:43 -07:00
ahrav
e0db575d4a
[chore] - Use custom context for archive handler of specialized archives (#1629)
* Use custom context for archive handler of specialized archives.

* fix arg.

* fix test.

* use re-reader.

* use re-reader.

* Update error and comments.

* Add better error handling.

* update.
2023-08-16 13:52:55 -07:00
ahrav
6ad5659334
Integration of SpecializedHandler for Enhanced Archive Processing (#1625)
* Add handler for .deb file formats.

* Add handler for .rpm file formats.

* update.

* move logic to general archive handler.

* update const.

* Add compile time guard.

* Remove redundant parens.

* Add checks to make sure we have the tools installed to extract archives.

* Limit size of temp file for archive reading.

* handle nested archives.

* add comment.

* use consistent name for tempEnv -> env

* fix handler fxn signature.
2023-08-15 16:08:55 -07:00
ahrav
1da7720912
Replace context.TODO. (#1349) 2023-05-19 11:09:51 -07:00
Brendan Shaklovitz
e3213fbdeb
Do extraction after decompression (#1320)
* Fix an error where some files were not properly scanned due to the order of
  the extraction and decompression steps. Doing decompression first ensures
  that a compressed archive (e.g., a gzipped zip file) is handled
  correctly.
2023-05-09 07:56:08 -07:00
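The ordering described in #1320 can be sketched with the Go standard library alone. This is an illustration under stated assumptions, not the handler's implementation: gzippedZip and readAll are hypothetical helpers, and the example buffers everything in memory for brevity. It shows why a gzipped zip file must be decompressed before the zip reader can list its entries.

```go
package main

import (
	"archive/zip"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzippedZip builds a zip archive holding one file, then gzips the whole archive.
func gzippedZip(name, content string) ([]byte, error) {
	var zipBuf bytes.Buffer
	zw := zip.NewWriter(&zipBuf)
	w, err := zw.Create(name)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write([]byte(content)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}

	var gzBuf bytes.Buffer
	gw := gzip.NewWriter(&gzBuf)
	if _, err := gw.Write(zipBuf.Bytes()); err != nil {
		return nil, err
	}
	if err := gw.Close(); err != nil {
		return nil, err
	}
	return gzBuf.Bytes(), nil
}

// readAll decompresses first and only then extracts the archive entries.
func readAll(data []byte) (map[string]string, error) {
	// Step 1: decompression. The outer layer is gzip, not zip.
	gr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer gr.Close()
	raw, err := io.ReadAll(gr)
	if err != nil {
		return nil, err
	}

	// Step 2: extraction. The decompressed bytes are a valid zip archive.
	zr, err := zip.NewReader(bytes.NewReader(raw), int64(len(raw)))
	if err != nil {
		return nil, err
	}
	files := make(map[string]string)
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		b, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		files[f.Name] = string(b)
	}
	return files, nil
}

func main() {
	data, err := gzippedZip("secret.txt", "token=abc123")
	if err != nil {
		panic(err)
	}
	files, err := readAll(data)
	fmt.Println(files, err)
}
```

Handing the gzip bytes straight to zip.NewReader would fail, because the zip directory is only visible after the outer compression layer is removed.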
Miccah
161e499142
[chore] Remove logrus from trufflehog (#1095)
* [chore] Remove logrus from trufflehog

* Minor fixes

* Fix logFatal call

* Fix logrus call
2023-02-14 17:00:07 -06:00
Bill Rich
7dd2b74f1f
Make archive handler configurable (#1077)
* Make archive handler configurable.

* Use common.IsDone()
2023-02-07 15:25:14 -08:00
Miccah
b3d3f531a4
Return an error from ReadToMax when it panics (#925) 2022-11-16 14:24:05 -06:00
ahrav
28983036a0
only write if the filechunk has len > 0. (#903) 2022-11-05 18:19:41 -07:00
Miccah
ab54ec4072
Check for closed channel in HandleFile (#895)
* Check for closed channel in HandleFile

* Refactor to be more readable

* Fix handler search
2022-11-02 16:35:19 -05:00
Bill Rich
ab71b93f7d
Add context to handler (#877)
* Add context to handler

* Return rather than break out of select
2022-10-28 08:57:55 -07:00
Bill Rich
a30b52f9b0
Use recover to catch panic in dep for old rars (#801) 2022-09-15 18:51:00 -07:00
Bill Rich
248cff8201
Use disk-buffer-reader that implements Seeker and ReaderAt (#787)
* Use disk-buffer-reader that implements Seeker and ReaderAt

* Include test
2022-09-09 09:05:28 -07:00
Bill Rich
7273dc9058
Archive decoder (#683)
* Archive decoder

* Fix reader handling

* Seek error handling

* Add tests

* Fix extra empty chunk

* Sync chunk size
2022-08-02 20:36:21 -07:00