Commit graph

38 commits

Author SHA1 Message Date
kegsay
6de29c1cd2
bugfix: E2EE device keys could sometimes not be sent to remote servers (#2466)
* Fix flakey sytest 'Local device key changes get to remote servers'

* Debug logs

* Remove internal/test and use /test only

Remove a lot of ancient code too.

* Use FederationRoomserverAPI in more places

* Use more interfaces in federationapi; begin adding regression test

* Linting

* Add regression test

* Unbreak tests

* ALL THE LOGS

* Fix a race condition which could cause events to not be sent to servers

If a new room event which rewrites state arrives, we remove all joined hosts
then re-calculate them. This wasn't done in a transaction so for a brief period
we would have no joined hosts. During this interim, key change events which arrive
would not be sent to destination servers. This would sporadically fail on sytest.

* Unbreak new tests

* Linting
2022-05-17 13:23:35 +01:00
kegsay
9957752a9d
Define component interfaces based on consumers (2/2) (#2425)
* convert remaining interfaces

* Tidy up the userapi interfaces
2022-05-05 19:30:38 +01:00
S7evinK
49dc49b232
Remove eduserver (#2306)
* Move receipt sending to own JetStream producer

* Move SendToDevice to producer

* Remove most parts of the EDU server

* Fix SendToDevice & copyrights

* Move structs, cleanup EDU Server traces

* Use HeadersOnly subscription

* Missing file

* Fix linter issues

* Move consumers to own files

* Rename durable consumer; Consumer cleanup

* Docs/config cleanup
2022-03-29 14:14:35 +02:00
Neil Alexander
5106cc807c
Ensure only one transaction is used for RS input per room (#2178)
* Ensure the input API only uses a single transaction

* Remove more of the dead query API call

* Tidy up

* Fix tests hopefully

* Don't do unnecessary work for rooms that don't exist

* Improve error, fix another case where transaction wasn't used properly

* Add a unit test for checking single transaction on RS input API

* Fix logic oops when deciding whether to use a transaction in storeEvent
2022-02-11 17:40:14 +00:00
Neil Alexander
a763cbb0e1
Roomserver/federation input refactor (#2104)
* Put federation client functions into their own file

* Look for missing auth events in RS input

* Remove retrieveMissingAuthEvents from federation API

* Logging

* Sorta transplanted the code over

* Use event origin failing all else

* Don't get stuck on mutexes:

* Add verifier

* Don't mark state events with zero snapshot NID as not existing

* Check missing state if not an outlier before storing the event

* Reject instead of soft-fail, don't copy roominfo so much

* Use synchronous contexts, limit time to fetch missing events

* Clean up some commented out bits

* Simplify `/send` endpoint significantly

* Submit async

* Report errors on sending to RS input

* Set max payload in NATS to 16MB

* Tweak metrics

* Add `workerForRoom` for tidiness

* Try skipping unmarshalling errors for RespMissingEvents

* Track missing prev events separately to avoid calculating state when not possible

* Tweak logic around checking missing state

* Care about state when checking missing prev events

* Don't check missing state for create events

* Try that again

* Handle create events better

* Send create room events as new

* Use given event kind when sending auth/state events

* Revert "Use given event kind when sending auth/state events"

This reverts commit 089d64d271.

* Only search for missing prev events or state for new events

* Tweaks

* We only have missing prev if we don't supply state

* Room version tweaks

* Allow async inputs again

* Apply backpressure to consumers/synchronous requests to hopefully stop things being overwhelmed

* Set timeouts on roomserver input tasks (need to decide what timeout makes sense)

* Use work queue policy, deliver all on restart

* Reduce chance of duplicates being sent by NATS

* Limit the number of servers we attempt to reduce backpressure

* Some review comment fixes

* Tidy up a couple things

* Don't limit servers, randomise order using map

* Some context refactoring

* Update gmsl

* Don't resend create events

* Set stateIDs length correctly or else the roomserver thinks there are missing events when there aren't

* Exclude our own servername

* Try backing off servers

* Make excluding self behaviour optional

* Exclude self from g_m_e

* Update sytest-whitelist

* Update consumers for the roomserver output stream

* Remember to send outliers for state returned from /gme

* Make full HTTP tests less upsetti

* Remove 'If a device list update goes missing, the server resyncs on the next one' from the sytest blacklist

* Remove debugging test

* Fix blacklist again, remove unnecessary duplicate context

* Clearer contexts, don't use background in case there's something happening there

* Don't queue up events more than once in memory

* Correctly identify create events when checking for state

* Fill in gaps again in /gme code

* Remove `AuthEventIDs` from `InputRoomEvent`

* Remove stray field

Co-authored-by: Kegan Dougal <kegan@matrix.org>
2022-01-27 14:29:14 +00:00
Neil Alexander
ff21675c5b
Cross-signing fixes, notifications via sync, federation (#1974)
* Initial work on signing key update EDUs

* Fix build

* Produce/consume EDUs

* Producer logging

* Only produce key change notifications for local users

* Better naming

* Try to notify sync

* Enable feature

* Use key change topic

* Don't bother verifying signatures, validate key lengths if we can, notifier fixes

* Copyright notices

* Remove tests from whitelist until matrix-org/sytest#1117

* Some review comment fixes

* Update to matrix-org/gomatrixserverlib@f9416ac

* Remove unneeded parameter
2021-08-17 13:44:30 +01:00
kegsay
e3df612953
Add tracing to user API (#1948)
Use the trace version in tests so we can just implement the required API functions.
2021-08-03 11:23:25 +01:00
Neil Alexander
c8408a6387
Add more optimised code path for checking if we're in a room (#1909)
* Add more optimised code path for checking if we're in a room

* Fix database queries

* Fix federation API test

* Fix logging

* Review comments

* Make separate API call for room membership
2021-07-09 16:36:45 +01:00
Neil Alexander
3afb161352
Reduce memory usage in federation /send endpoint (#1890)
* More aggressive event caching

* Deduplicate /state results

* Deduplicate more

* Ensure we use the correct list of events when excluding repeated state

* Fixes

* Ensure we track all events we already knew about properly
2021-06-30 10:01:56 +01:00
Kegsay
f8d3a762c4
Add a per-room mutex to federationapi when processing transactions (#1810)
* Add a per-room mutex to federationapi when processing transactions

This has numerous benefits:
 - Prevents us doing lots of state resolutions in busy rooms. Previously, room forks would always result
   in a state resolution being performed immediately, without checking if we were already doing this in
   a different transaction. Now they will queue up, resulting in fewer calls to `/state_ids`, `/g_m_e`, etc.
 - Prevents memory usage from growing too large as a result and potentially OOMing.

And costs:
 - High traffic rooms will be slightly slower due to head-of-line blocking from other servers,
   though this has always been an issue as roomserver has a per-room mutex already.

* Fix unit tests

* Correct mutex lock ordering
2021-03-30 10:01:32 +01:00
Kegsay
b507312d4c
MSC2836 threading: part 2 (#1596)
* Update GMSL

* Add MSC2836EventRelationships to fedsender

* Call MSC2836EventRelationships in reqCtx

* auth remote servers

* Extract room ID and servers from previous events; refactor a bit

* initial cut of federated threading

* Use the right client/fed struct in the response

* Add QueryAuthChain for use with MSC2836

* Add auth chain to federated response

* Fix pointers

* under CI: more logging and enable mscs, nil fix

* Handle direction: up

* Actually send message events to the roomserver..

* Add children and children_hash to unsigned, with tests

* Add logic for exploring threads and tracking children; missing storage functions

* Implement storage functions for children

* Add fetchUnknownEvent

* Do federated hits for include_children if we have unexplored children

* Use /ev_rel rather than /event as the former includes child metadata

* Remove cross-room threading impl

* Enable MSC2836 in the p2p demo

* Namespace mscs db

* Enable msc2836 for ygg

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-12-04 14:11:01 +00:00
Neil Alexander
be7d8595be
Peeking updates (#1607)
* Add unpeek

* Don't allow peeks into encrypted rooms

* Fix send tests

* Update consumers
2020-12-03 11:11:46 +00:00
Neil Alexander
20a01bceb2
Pass pointers to events — reloaded (#1583)
* Pass events as pointers

* Fix lint errors

* Update gomatrixserverlib

* Update gomatrixserverlib

* Update to matrix-org/gomatrixserverlib#240
2020-11-16 15:44:53 +00:00
S7evinK
bcb89ada5e
Implement read receipts (#1528)
* fix conversion from int to string yields a string of one rune, not a string of digits

* Add receipts table to syncapi

* Use StreamingToken as the since value

* Add required method to testEDUProducer

* Make receipt json creation "easier" to read

* Add receipts api to the eduserver

* Add receipts endpoint

* Add eduserver kafka consumer

* Add missing kafka config

* Add passing tests to whitelist

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>

* Fix copy & paste error

* Fix column count error

* Make outbound federation receipts pass

* Make "Inbound federation rejects receipts from wrong remote" pass

* Don't use errors package

* - Add TODO for batching requests
- Rename variable

* Return a better error message

* - Use OutputReceiptEvent instead of InputReceiptEvent as result
- Don't use the errors package for errors
- Defer CloseAndLogIfError to close rows
- Fix Copyright

* Better creation/usage of JoinResponse

* Query all joined rooms instead of just one

* Update gomatrixserverlib

* Add sqlite3 migration

* Add postgres migration

* Ensure required sequence exists before running migrations

* Clarification on comment

* - Fix a bug when creating client receipts
- Use concrete types instead of interface{}

* Remove dead code
Use key for timestamp

* Fix postgres query...

* Remove single purpose struct

* Use key/value directly

* Only apply receipts on initial sync or if edu positions differ,
otherwise we'll be sending the same receipts over and over again.

* Actually update the id, so it is correctly send in syncs

* Set receipt on request to /read_markers

* Fix issue with receipts getting overwritten

* Use fmt.Errorf instead of pkg/errors

* Revert "Add postgres migration"

This reverts commit 722fe5a046.

* Revert "Add sqlite3 migration"

This reverts commit d113b03f64.

* Fix selectRoomReceipts query

* Make golangci-lint happy

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-11-09 18:46:11 +00:00
S7evinK
eccd0d2c1b
Implement forgetting about rooms (#1572)
* Add basic storage methods

* Add internal api handler

* Add check for forgotten room

* Add /rooms/{roomID}/forget endpoint

* Add missing rsAPI method

* Remove unused parameters

* Add passing tests

Signed-off-by: Till Faelligen <tfaelligen@gmail.com>

* Add missing file

* Add postgres migration

* Add sqlite migration

* Use Forgetter to forget room

* Remove empty line

* Update HTTP status codes

It looks like the spec calls for these to be 400, rather than 403: https://matrix.org/docs/spec/client_server/r0.6.1#post-matrix-client-r0-rooms-roomid-forget

Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
2020-11-05 10:19:23 +00:00
Neil Alexander
9d6b77c58a
Try to retrieve missing auth events from multiple servers (#1516)
* Recursively fetch auth events if needed

* Fix processEvent call

* Ask more servers in lookupEvent

* Don't panic!

* Panic at the Disco

* Find servers more aggressively

* Add getServers

* Fix number of servers to 5, don't bail making RespState if auth events missing

* Fix panic

* Ignore missing state events too

* Report number of servers correctly

* Don't reuse request context for /send_join

* Update federation API tests

* Don't recurse processEvents

* Implement getEvents differently
2020-10-13 11:53:20 +01:00
Neil Alexander
8001627cfc
Get missing event tweaks (#1514)
* Adjust backfill to send backward extremity with state before other backfilled events, include prev_events with no state amongst missing events

* Not finished refactor

* Fix test

* Remove isInboundTxn

* Remove debug logging
2020-10-12 15:56:15 +01:00
Neil Alexander
738b829a23
Fetch missing auth events, implement QueryMissingAuthPrevEvents, try other servers in room for /event and /get_missing_events (#1450)
* Try to ask other servers in the room for missing events if the origin won't provide them

* Logging

* More logging

* Implement QueryMissingAuthPrevEvents

* Try to get missing auth events badly

* Use processEvent

* Logging

* Update QueryMissingAuthPrevEvents

* Try to find missing auth events

* Patchy fix for test

* Logging tweaks

* Send auth events as outliers

* Update check in QueryMissingAuthPrevEvents

* Error responses

* More return codes

* Don't return error on reject/soft-fail since it was ultimately handled

* More tweaks

* More error tweaks
2020-09-29 13:40:29 +01:00
Neil Alexander
3013ade84f
Reject make_join for empty rooms (#1439)
* Sanity-check room version on RS event input

* Update gomatrixserverlib

* Reject make_join when no room members are left

* Revert some changes from wrong branch

* Distinguish between room not existing and room being abandoned on this server

* nolint
2020-09-24 16:18:13 +01:00
Kegsay
18231f25b4
Implement rejected events (#1426)
* WIP Event rejection

* Still send back errors for rejected events

Instead, discard them at the federationapi /send layer rather than
re-implementing checks at the clientapi/PerformJoin layer.

* Implement rejected events

Critically, rejected events CAN cause state resolution to happen
as it can merge forks in the DAG. This is fine, _provided_ we
do not add the rejected event when performing state resolution,
which is what this PR does. It also fixes the error handling
when NotAllowed happens, as we were checking too early and needlessly
handling NotAllowed in more than one place.

* Update test to match reality

* Modify InputRoomEvents to no longer return an error

Errors do not serialise across HTTP boundaries in polylith mode,
so instead set fields on the InputRoomEventsResponse. Add `Err()`
function to make the API shape basically the same.

* Remove redundant returns; linting

* Update blacklist
2020-09-16 13:00:52 +01:00
Matthew Hodgson
39507bacc3
Peeking via MSC2753 (#1370)
Initial implementation of MSC2753, as tested by https://github.com/matrix-org/sytest/pull/944.
Doesn't yet handle unpeeks, peeked EDUs, or history viz changing during a peek - these will follow.
https://github.com/matrix-org/dendrite/pull/1370 has full details.
2020-09-10 14:39:18 +01:00
Neil Alexander
895ead8048
Use background context when processing event with missing state (#1403)
* Use background context when processing event with missing state

* Five minute timeout

* Remove context from txnreq, thread through instead

* Fix unit tests
2020-09-07 12:32:40 +01:00
Kegsay
2570418f42
Remove ServerACLs from the current state server (#1390)
* Remove ServerACLs from the current state server

Functionality moved to roomserver

* Nothing to see here, move along
2020-09-04 10:40:58 +01:00
Kegsay
b20386123e
Move currentstateserver API to roomserver (#1387)
* Move currentstateserver API to roomserver

Stub out DB functions for now, nothing uses the roomserver version yet.

* Allow it to startup

* Implement some current-state-server storage interface functions

* Add missing package
2020-09-03 17:20:54 +01:00
Neil Alexander
6cb1a65809
Synchronous invites (#1273)
* Refactor invites to be synchronous

* Fix synchronous invites

* Fix client API return type for send invite error

* Linter

* Restore PerformError on rsAPI.PerformInvite

* Update sytest-whitelist

* Don't override PerformError with normal errors

* Fix error passing

* Un-whitelist a couple of tests

* Update sytest-whitelist

* Try to handle multiple invite rejections better

* nolint

* Update gomatrixserverlib

* Fix /v1/invite test

* Remove replace from go.mod
2020-08-17 11:40:49 +01:00
Neil Alexander
bcdf9577a3
Support for server ACLs (#1261)
* First pass at server ACLs (not efficient)

* Use transaction origin, update whitelist

* Fix federation API test

It's sufficient for us to return nothing in response to current state, so that the server ACL check returns no ACLs.

* More efficient server ACLs - hopefully

* Fix queries

* Fix queries

* Avoid panics by nil pointers

* Bug fixes

* Fix state event type

* Fix mutex

* Update logging

* Ignore port when matching servername

* Use read mutex

* Fix bugs

* Fix sync API test

* Comments

* Add tests, tweaks to behaviour

* Fix test output
2020-08-11 18:19:11 +01:00
Kegsay
4c1e6597c0
Replace publicroomsapi with a combination of clientapi/roomserver/currentstateserver (#1174)
* Use content_value instead of membership

* Fix build

* Replace publicroomsapi with a combination of clientapi/roomserver/currentstateserver

- All public rooms paths are now handled by clientapi
- Requests to (un)publish rooms are sent to the roomserver via `PerformPublish`
  which are stored in a new `published_table.go`
- Requests for public rooms are handled in clientapi by:
    * Fetch all room IDs which are published using `QueryPublishedRooms` on the roomserver.
    * Apply pagination parameters to the slice.
    * Do a `QueryBulkStateContent` request to the currentstateserver to pull out
      required state event *content* (not entire events).
    * Aggregate and return the chunk.

Mostly but not fully implemented (DB queries on currentstateserver are missing)

* Fix pq query

* Make postgres work

* Make sqlite work

* Fix tests

* Unbreak pagination tests

* Linting
2020-07-02 15:41:18 +01:00
Kegsay
002fe05a20
Add PerformInvite and refactor how errors get handled (#1158)
* Add PerformInvite and refactor how errors get handled

- Rename `JoinError` to `PerformError`
- Remove `error` from the API function signature entirely. This forces
  errors to be bundled into `PerformError` which makes it easier for callers
  to detect and handle errors. On network errors, HTTP clients will make a
  `PerformError`.

* Unbreak everything; thanks Go!

* Send back JSONResponse according to the PerformError

* Update federation invite code too
2020-06-24 15:06:14 +01:00
Kegsay
914f6cadce
Add /send restrictions and return correct error codes (#1156)
* Add /send restrictions and return correct error codes

- Max 50 PDUs / 100 EDUs
- Fail the transaction when PDUs contain bad JSON

* Update whitelist

* Unbreak test

* Linting
2020-06-23 13:15:15 +01:00
Kegsay
7c36fb78a7
Fix rooms v3 url paths for good - with tests (#1130)
* Fix rooms v3 url paths for good - with tests

- Add a test rig around `federationapi` to test routing.
- Use `JSONVerifier` over `KeyRing` so we can stub things out more easily.
- Add `test.NopJSONVerifier` which verifies nothing.
- Add `base.BaseMux` which is the original `mux.Router` used to spawn public/internal routers.
- Listen on `base.BaseMux` and not the default serve mux as it cleans paths which we don't want.
- Factor out `ListenAndServe` to `test.ListenAndServe` and add flag for listening on TLS.

* Fix comments

* Linting
2020-06-15 16:57:59 +01:00
Kegsay
0dc4ceaa2d
Minor perf/debugging improvements (#1121)
* Minor perf/debugging improvements

- publicroomsapi: Don't call QueryEventsByID with no event IDs
- appservice: Consume only if there are 1 or more ASes
- roomserver: don't keep a copy of the request "for debugging" - we trace now

* fedsender: return early if we have no destinations

* Unbreak tests
2020-06-12 15:11:33 +01:00
Kegsay
ec7718e7f8
Roomserver API changes (#1118)
* s/QueryBackfill/PerformBackfill/g

* OutputEvent now includes AddStateEvents which contain the full event of extra state events

* Only include adds not the current event

* Get adding state right
2020-06-11 19:50:40 +01:00
Kegsay
25cd2dd1c9
Remove unused internal APIs (#1117) 2020-06-11 15:07:16 +01:00
Kegsay
b7187a9a35
Remove clientapi producers which aren't actually producers (#1111)
* Remove clientapi producers which aren't actually producers

They are actually just convenience wrappers around the internal APIs
for roomserver/eduserver. Move their logic to their respective `api`
packages and call them directly.

* Remove TODO

* unbreak ygg
2020-06-10 12:17:54 +01:00
Neil Alexander
a5d822004d
Send-to-device support (#1072)
* Groundwork for send-to-device messaging

* Update sample config

* Add unstable routing for now

* Send to device consumer in sync API

* Start the send-to-device consumer

* fix indentation in dendrite-config.yaml

* Create send-to-device database tables, other tweaks

* Add some logic for send-to-device messages, add them into sync stream

* Handle incoming send-to-device messages, count them with EDU stream pos

* Undo changes to test

* pq.Array

* Fix sync

* Logging

* Fix a couple of transaction things, fix client API

* Add send-to-device test, hopefully fix bugs

* Comments

* Refactor a bit

* Fix schema

* Fix queries

* Debug logging

* Fix storing and retrieving of send-to-device messages

* Try to avoid database locks

* Update sync position

* Use latest sync position

* Jiggle about sync a bit

* Fix tests

* Break out the retrieval from the update/delete behaviour

* Comments

* nolint on getResponseWithPDUsForCompleteSync

* Try to line up sync tokens again

* Implement wildcard

* Add all send-to-device tests to whitelist, what could possibly go wrong?

* Only care about wildcard when targeted locally

* Deduplicate transactions

* Handle tokens properly, return immediately if waiting send-to-device messages

* Fix sync

* Update sytest-whitelist

* Fix copyright notice (need to do more of this)

* Comments, copyrights

* Return errors from Do, fix dendritejs

* Review comments

* Comments

* Constructor for TransactionWriter

* defletions

* Update gomatrixserverlib, sytest-blacklist
2020-06-01 17:50:19 +01:00
Kegsay
ce5dfbebf9
Implement /get_missing_events (#1022)
* WIP get_missing_events work

* More WIP get_missing_events work

* First working /get_missing_events implementation

Flakey currently due to racing between /sync and /send

* Final tweaks

* Remove log lines

* Linting

* go mod tidy

* Clamp min depth to 0

* sort events by depth because sytest makes me sad

Specifically I think it's
4172585c25/lib/SyTest/Federation/Client.pm (L265)
to blame here.
2020-05-12 16:24:28 +01:00
Kegsay
3b98535dc5
only send new events to RS; add tests for /state_ids and /event (#1011)
* only send new events to RS; add tests for /state_ids and /event

* Review comments: send in auth event order

* Ignore order of state events for this test as RespState.Events is non-deterministic
2020-05-06 18:03:25 +01:00
Kegsay
1294852270
Add tests around federationapi's txnReq (#1010)
* Add necessary stubs for testing txnReq

* Add basic tests
2020-05-06 14:27:02 +01:00