As presented in the issue, we might end up in a situation where processing and accepting two neutral+ trade offers in parallel results in an unwanted inventory state: each of them is neutral+ and therefore OK to accept on its own, but the combination of both degrades active badge progress.
Considering the requirements we have (still processing trades in parallel, being performant, being low on resources and keeping overhead on Steam servers limited), the solution I came up with for this issue is quite simple:
- After we determine the trade to be neutral+, but before we tell the parse trade routine to accept it, we check whether the set of handled sets, which is shared with the other parallel processes, contains any of the sets touched by the trade we're currently processing.
- If not, we update that shared set to include everything we're dealing with, and tell the caller to accept this trade.
- If yes, we tell the caller to retry this trade after the (other) accepted trades are confirmed and handled as usual.
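The check described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the actual ASF code: `HandledSets`, `Verdict`, `decide` and `release` are hypothetical names, and sets are represented by plain strings. The one property the sketch does rely on is that the overlap check and the update happen atomically, under a single lock.

```python
import threading
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    RETRY = "retry"


class HandledSets:
    """Hypothetical shared registry of sets touched by accepted, not yet handled trades."""

    def __init__(self):
        self._lock = threading.Lock()
        self._handled = set()

    def try_reserve(self, sets):
        """Reserve all given sets atomically, or reserve nothing if any overlaps."""
        with self._lock:
            if not self._handled.isdisjoint(sets):
                return False
            self._handled.update(sets)
            return True

    def release(self, sets):
        """Assumed to be called once the accepted trade is confirmed and handled."""
        with self._lock:
            self._handled.difference_update(sets)


def decide(is_neutral_or_better, touched_sets, handled):
    # Non-neutral+ trades are rejected before the shared state is ever consulted,
    # so they cannot block anybody else.
    if not is_neutral_or_better:
        return Verdict.REJECT
    # Neutral+ trade: accept only if none of its sets is currently being handled.
    if handled.try_reserve(touched_sets):
        return Verdict.ACCEPT
    return Verdict.RETRY
```

In this sketch a caller that receives `Verdict.RETRY` would evaluate the trade again from scratch after `release()` has been called for the conflicting trades, since the inventory may have changed in the meantime.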
This solves some issues and relies on some optimistic assumptions:
- First of all, it solves the original issue, since if trades A and B both touch set S, only one of them will be accepted at first. Which one is not deterministic (it's the one that gets to the check first), and not important anyway.
- We do not "lock" the sets before we determine that a trade is neutral+, because otherwise unrelated users could spam us with non-neutral+ trades in order to lock the bot in an infinite retry loop. This way they can't: a trade that is determined not to be neutral+ never reaches the check for concurrent processing.
- We are optimistic about resource usage. This routine could be made much more complicated and more synchronous in order to avoid unnecessary calls to inventory and matching. However, that would slow down the whole process only because the next call MAY turn out to be unneeded. Instead, ASF assumes that trades will (usually) be unrelated and can be processed in parallel. If a conflict does happen, we simply end up having done some extra work for no reason, which is better than holding the work back until all previous trades are processed.
- As soon as the conditions are met, the conflicting trades are retried to check whether they can still be accepted. If yes, they'll be accepted almost immediately after the previous ones; if not, they'll be rejected as no longer neutral+.
This way the additional code does not hurt performance, parallel processing or anything else in the usually expected optimistic scenarios, while adding some overhead in the pessimistic ones, which is justified considering we don't want to degrade badge progress.
After the changes to callbacks handling, we accidentally broke the reconnection logic. In particular, a forced connection implicitly disconnected first, which fired the disconnect callback, but that callback killed our callbacks handling loop for the future connection, since it was instructed not to reconnect... Pretty convoluted logic.
Let's attempt to fix and simplify it. There is no forced connection concept anymore, but rather a new reconnect function which either triggers reconnection through the usual disconnection logic, or connects directly in the edge case where we attempt to reconnect with an already disconnected client.
This way the status transition is more predictable, as we Connect() only in 3 cases:
- Initial start, including !start command, when we actually spawn the callbacks handling loop
- Upon disconnection, if we're configured to reconnect
- Reconnection, in case we're already disconnected and can't use the above
And we use reconnect when:
- Failure in heartbeats, to detect disconnections sooner
- Failure in refreshing access tokens, since if we lose our refresh token then the only way to get a new one is to reconnect
And finally disconnect is triggered when:
- Stopping the bot, especially !stop
- Bulletproofing against trying to connect when !KeepRunning, and the like
- Usual Steam maintenance and other network issues (which usually trigger reconnection)
The codebase is too large to analyze every possible edge case, but with this logic I can no longer reproduce the previous issue.
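The connect/reconnect/disconnect transitions listed above can be modeled with a small state machine. This is only an illustrative sketch in Python, not the real implementation: `Connection` and its members (`keep_running`, `reconnect_on_disconnect`, `connect_calls`, `on_disconnected`) are hypothetical stand-ins, and the network is reduced to a boolean flag.

```python
class Connection:
    """Toy model of the three places where a connect may happen."""

    def __init__(self, reconnect_on_disconnect=True):
        self.connected = False
        self.keep_running = False
        self.reconnect_on_disconnect = reconnect_on_disconnect
        self.connect_calls = 0  # only for observing the model

    def _connect(self):
        # Bulletproofing: never connect when we're not supposed to be running.
        if not self.keep_running:
            return
        self.connect_calls += 1
        self.connected = True

    def start(self):
        # Case 1: initial start (the callbacks handling loop would be spawned here).
        self.keep_running = True
        self._connect()

    def stop(self):
        self.keep_running = False
        self.disconnect()

    def disconnect(self):
        if self.connected:
            self.connected = False
            self.on_disconnected()

    def on_disconnected(self):
        # Case 2: upon disconnection, if we're configured to reconnect.
        if self.keep_running and self.reconnect_on_disconnect:
            self._connect()

    def reconnect(self):
        if self.connected:
            # Usual path: go through the disconnection logic, which reconnects.
            self.disconnect()
        else:
            # Case 3: already disconnected, so no disconnect callback would fire.
            self._connect()
```

In this model, `reconnect()` on a live connection never connects by itself; it relies on `on_disconnected()` to do so, which keeps a single code path responsible for the usual case.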