Commit graph

17 commits

Author SHA1 Message Date
Richard Gomez
b7411d2922
Clarify "no decoder found for chunk" log message (#3001)
* chore(engine): clarify trace log message

* chore(engine): fix merge conflicts
2024-09-10 13:58:40 -05:00
Cody Rose
9718ec6a51
Capture decoding time metric (#3209)
We're trying to track down some slowness.
2024-08-09 15:19:16 -04:00
ahrav
3cb7aedf4a
[bug] - Add ASCII validation check for base64 decoding (#2671)
* Correctly handle invalid base64 with ascii check

* remove parallel
2024-04-04 16:59:13 -07:00
Richard Gomez
cbc0f0f48e
Create basic escaped unicode decoder (#2456)
* feat(decoders): basic escaped unicode

* wip: handle unicode notation
Experimenting with this.. might remove
2024-03-02 11:27:44 -08:00
ahrav
68f28a0e34
Filter unique detectors by keywords in chunk (#1711)
* pre filter detectors that include the keywords in the chunk.

* Optimize the engine to prevent iterating over all detectors.

* use sync.Map for concurrent access.

* lint.

* use correct verify.

* allow versioned detectors.

* Break apart Start.

* cleanup.

* Update benchmark.

* add comment.

* remove Engine prefix.

* update comments.

* use regular map.

* delete the pool.

* remove old code.

* refactor ahocorasickcore into own file.

* update comments

* move structs to ahocorasickcore

* update comments

* fix

* address comments

* exported some methods and constructor since they will need to be used by the enterprise pipeline as well

* remove extra log
2023-10-23 08:02:01 -07:00
Zachary Rice
b48ac24c46
Dedupe results (#1479)
* init 4 dedupin

* use raw rather than rawv2

* rm comment

* comments

* nits

* clean up and use rawv2 too

* add decoder order test
2023-07-11 15:48:00 -05:00
Miccah
fb76eaf17b
Use heuristic to choose the most likely UTF-16 decoded string (#1381)
* Use heuristic to choose the most likely UTF-16 decoded string

* Assume ASCII and include valid BE and LE bytes

* Remove unused code

* Assume ASCII and return nil when not utf16

---------

Co-authored-by: bill-rich <bill.rich@gmail.com>
2023-06-13 17:00:40 -07:00
Brendan Shaklovitz
195f9f0798
Add Base64URLSafe decoder (#1292)
* Add Base64URLSafe decoder

* Add decoder that can decode base64 strings with '_' and '-' instead
  of '+' and '/'.

* Combine url-safe b64 decoder into b64 decoder
2023-05-18 08:30:47 -07:00
ahrav
34f5db64ae
Small optimizations for the base64 decoder (#1278)
* Small optimizations.

* remove unnecessary timer reset.

* remove blank lines.

* remove test file.

* Move b64 character mapping creation to init.
2023-04-24 11:27:07 -07:00
ahrav
800ac30ea0
optimize base64 decoder. (#1277)
2023-04-20 20:36:46 -07:00
ahrav
abdff53d5d
optimize utf-8 decoder (#1275)
* optimize utf-8 decoder.

* remove string conversion.
2023-04-20 16:52:34 -07:00
ahrav
4116a24b1c
Add utf16 decoder (#1274)
* Add utf16 decoder.

* Add test for utf-8.

* Remove else if.

* optimize to use a single loop.
2023-04-20 15:07:49 -07:00
Bill Rich
d3b24fa592
Replace plain decoder with utf8 (#922)
2022-11-15 09:36:01 -08:00
ahrav
3b1cd65447
Add base64 detectors. (#414)
2022-04-15 12:09:01 -07:00
ahrav
cedb3393d1
[THOG-128] Code cleanup/ OSS onboarding (#117)
* Small amount of code clean up.

* Rename sem to concurrency for better readability and to remove an extra comment.

* fix stashing issue.

Co-authored-by: Ahrav Dutta <ahrav.dutta@trufflesec.com>
2022-04-01 16:47:27 -07:00
Dustin Decker
77418fb3f8
module v3
2022-02-15 18:54:47 -08:00
Dustin Decker
4218c39d99
Initial CLI w/ partially implemented Git source and demo detector (#1)
2022-01-13 12:02:24 -08:00