Hi all,

Back in June we wrote about our plans to tighten up data privacy in Matrix after some areas for improvement were brought to our attention. To quickly recap: the primary concern was that the default config for Riot specifies identity servers and integration managers run by New Vector (the company which the original Matrix team set up to build Riot and fund Matrix dev) - and so folks using a standalone homeserver may end up using external services without realising it. There were some other legitimate issues raised too (e.g. contact information should be obfuscated when checking if your contacts are on Matrix; Riot defaulted to using Google for STUN (firewall detection) if no TURN server had been set up on their server; Synapse defaults to using matrix.org as a key notary server).

We’ve been working away at this fairly solidly over the last few months. Some of the simpler items shipped quickly (e.g. Riot/Web had a stupid bug where it kept incorrectly loading the integration manager; Riot/Android wasn’t clear enough about when contact discovery was happening; Riot/Web wasn’t clear enough about the fact device names are publicly visible; etc) - but other bits have turned out to be incredibly time-consuming to get right.

However, we’re in the process today of releasing Synapse 1.4.0 and Riot/Web 1.4.0 (it’s coincidence the version numbers have lined up!) which resolve the majority of the remaining issues. The main changes are as follows:

  1. Riot no longer automatically uses identity servers by default. Identity servers are only useful when inviting users by email address, or when discovering whether your contacts are on Matrix. Therefore, we now wait until the user tries to perform one of these operations before explaining that they need an identity server to do so, and we prompt them to select one if they want to proceed. This makes it abundantly clear that the user is connecting to an independent service, and why.

  2. Integration Managers and identity servers now have the ability to force users to accept terms of use before using them. This means they can explicitly spell out the data privacy & usage policy of the server as required by GDPR, and it should now be impossible for a user to use these services without realising it. This was particularly fun in the case of identity servers, which previously had no concept of users and so couldn’t track whether users had agreed to their terms & conditions or not… and because homeservers sometimes talk to the identity server on behalf of users rather than the user talking direct, the privacy policy flow gets even hairier. But it’s solved now, and a nice side-effect of this is that users can now explicitly select their Integration Manager in Riot, in case they want to use Dimension or similar rather than the default provided by Modular.

  3. Synapse no longer uses identity servers for verifying registrations or verifying password reset. Originally, Synapse made use of the fact that the Identity Service contains email/msisdn verification logic to handle identity verification in general on behalf of the homeserver. However, in retrospect this was a mistake: why should the entity running your identity server have the right to verify password resets or registration details on your homeserver? So, we have moved this logic into Synapse. This means Synapse 1.4.0 requires new configuration for email/msisdn verification to work - please see the upgrade notes for full details.

  4. Sydent now supports discovering contacts based on hashed identifiers. MSC2134 specifies entirely new IS APIs for discovering contacts using a hash of their identifier rather than directly exposing the raw identifiers being searched for. This is implemented in Riot/iOS and Riot/Android and should be in the next major release; Riot/Web 1.4.0 has it already.

  5. Synapse now warns in its logs if you are using matrix.org as a default trusted key server, in case you wish to use a different server to help discover other servers’ keys.

  6. Synapse now garbage collects redacted messages after N days (7 days by default). (It doesn’t yet garbage collect attachments referenced from redacted messages; we’re still working on that).

  7. Synapse now deletes account access data (IP addresses and User Agent) after N days (28 days by default) of a device being deleted.

  8. Riot warns before falling back to using STUN (and defaults to turn.matrix.org rather than stun.google.com) for firewall discovery (STUN) when placing VoIP calls, and makes it clear that this is an emergency fallback for misconfigured servers which are missing TURN support. (We originally deleted the fallback entirely, but this broke things for too many people, so we’ve kept it but warn instead).

All of this is implemented in Riot/Web 1.4.0 and Synapse 1.4.0. Riot/Web 1.4.0 shipped today (Fri Sept 27th) and we have a release candidate for Synapse 1.4 (1.4.0rc1) today which fully ship on Monday.

For full details please go check out the Riot 1.4.0 and Synapse 1.4.0 blog posts.

Riot/Mobile is following fast behind - most of the above has been implemented and everything should land in the next release. RiotX/Android doesn’t really have any changes to make given it hadn’t yet implemented Identity Service or Integration Manager APIs.

This has involved a surprisingly large amount of spec work; no fewer than 9 new Matrix Spec Changes (MSC) have been required as part of the project. In particular, this results in a massive update to the Identity Service API, which will be released very shortly with the new MSCs. You can see the upcoming changes on the unstable branch and compare with the previous 0.2.1 stable release, as well as checking the detailed MSCs as follows:

This said, there is still some work remaining for us to do here. The main things which haven’t made it into this release are:

  • Preferring to get server keys from the source server rather than the notary server by default (https://github.com/matrix-org/synapse/pull/6110). This almost made it in, but we need to test it more first - until then, your specified notary server will see roughly what servers your servers are trying to talk to. In future this will be mitigated properly by MSC1228 (removing mxids from events).
  • Configurable data retention periods for rooms. We are tantalisingly close with this - https://github.com/matrix-org/synapse/pull/5815 is an implementation that the French Govt deployment is using; we need to port it into mainline Synapse.
  • Authenticating access to the media repository - for now, we still rely on media IDs being almost impossible to guess to protect the data rather than authenticating the user.
  • Deleting items from the media repository - we still need to hook up deletion APIs.
  • Garbage collecting forgotten rooms. If everyone leaves & forgets a room, we should delete it from the DB.
  • Communicating erasure requests over federation

We’ll continue to work on these as part of our ongoing maintenance backlog.

Separately to the data privacy concerns, we’ve had a separate wave of feedback regarding how we handle GDPR Data Subject Access Requests (DSARs). Particularly: whether DSAR responses should contain solely the info your have directly keyed by the requesting Matrix ID - or if we should provide all the data “visible” to that ID (i.e. the history of the conversations they’ve been part of). We went and got professional legal advice on this one, and the conclusion is that we should keep our responses to DSARs as tightly scoped as possible. We updated Matrix.org’s privacy policy and DSAR tools to reflect the new legal input.

Finally, it’s really worth calling out the amount of effort that went into this project. Huge huge thanks to everyone involved (given it’s cut across pretty much every project & subteam we have working on the core of Matrix) who have soldiered through the backlog. We’ve been tracking progress using our feature-dashboard tool which summarises Github issues based on labels & issue lifecycle, and for better or worse it’s ended up being the biggest project board we’ve ever had. You can see the live data here (warning, it takes tens of seconds to spider Github to gather the data) - or, for posterity and ease of reference, I’ve included the current issue list below. The issues which are completed have “done” after them; the ones still in progress say “in progress”, and ones which haven’t started yet have nothing. We split the project into 3 phases - phases 1 and 2 represent the items needed to fully solve the privacy concerns, phase 3 is right now a mix of "nice to have" polish and some more speculative items. At this point we’ve effectively finished phase 1 on Synapse & Riot/Web, and Riot/Mobile is following close behind. We're continuing to work on phase 2, and we’ll work through phase 3 (where appropriate) as part of our general maintenance backlog.

I hope this gives suitable visibility on how we’re considering privacy; after all, Matrix is useless as an open communication protocol if the openness comes at the expense of user privacy. We’ll give another update once the remaining straggling issues are closed out; and meanwhile, now the bulk of the privacy work is out of the way on Riot/Web, we can finally get back to implementing the UI E2E cross-signing verification and improving first time user experience.

Thanks for your patience and understanding while we’ve sorted this stuff out; and thanks once again for flying Matrix :)

In the absence of comments on the current blog, please feel free to discuss over at HN, or alternatively come ask stuff in our AMA over at /r/privacy (starting ~5pm GMT+1 (UK) on Friday Sept 27th).

The Privacy Project Dashboard Of Doom