Privacy

8 posts tagged with "Privacy" (See all Category)

Atom Feed

Avoiding unwelcome visitors on private Matrix servers

09.11.2019 00:00 — Privacy Matthew Hodgson

Hi all,

Over the course of today we've been made aware of folks port-scanning the general internet to discover private Matrix servers, looking for publicly visible room directories, and then trying to join rooms listed in them.

If you are running a Matrix server that is intended to be private, you must correctly configure your server to not expose its public room list to the general public - and also ensure that any sensitive rooms are invite-only (especially if the server is federated with the public Matrix network).

In Synapse, this means ensuring that the following options are set correctly in your homeserver.yaml:

# If set to 'false', requires authentication to access the server's public rooms
# directory through the client API. Defaults to 'true'.
#
#allow_public_rooms_without_auth: false

# If set to 'false', forbids any other homeserver to fetch the server's public
# rooms directory via federation. Defaults to 'true'.
#
#allow_public_rooms_over_federation: false

For private servers, you will almost certainly want to explicitly set these to false, meaning that the server's "public" room directory is hidden from the general internet and wider Matrix network.

You can test whether your room directory is visible to arbitrary Matrix clients on the general internet by viewing a URL like https://sandbox.modular.im/_matrix/client/r0/publicRooms (but for your server). If it gives a "Missing access token" error, you are okay.

You can test whether your room directory is visible to arbitrary Matrix servers on the general internet by loading Riot (or similar) on another server, and entering the target server's domain name into the room directory's server selection box. If you can't see any rooms, then are okay.

Relatedly, please ensure that any sensitive rooms are set to be "invite only" and room history is not world visible - particularly if your server is federated, or if it has public registration enabled. This stops random members of the public peeking into them (let alone joining them).

Relying on security-by-obscurity is a very bad idea: all it takes is for someone to scan the whole internet for Matrix servers, and then trying to join (say) #finance on each discovered domain (either by signing up on that server or by trying to join over federation) to cause problems.

Finally, if you don't want the general public reading your room directory, please also remember to turn off public registration on your homeserver. Otherwise even with the changes above, if randoms can sign up on your server to view & join rooms then all bets are off.

We'll be rethinking the security model of room directories in future (e.g. whether to default them to being only visible to registered users on the local server, or whether to replace per-server directories with per-community directories with finer grained access control, etc) - but until this is sorted, please heed this advice.

If you have concerns about randoms having managed to discover or join rooms which should have been private, please contact [email protected].

Privacy improvements in Synapse 1.4 and Riot 1.4

27.09.2019 00:00 — Privacy Matthew Hodgson

Hi all,

Back in June we wrote about our plans to tighten up data privacy in Matrix after some areas for improvement were brought to our attention. To quickly recap: the primary concern was that the default config for Riot specifies identity servers and integration managers run by New Vector (the company which the original Matrix team set up to build Riot and fund Matrix dev) - and so folks using a standalone homeserver may end up using external services without realising it. There were some other legitimate issues raised too (e.g. contact information should be obfuscated when checking if your contacts are on Matrix; Riot defaulted to using Google for STUN (firewall detection) if no TURN server had been set up on their server; Synapse defaults to using matrix.org as a key notary server).

We’ve been working away at this fairly solidly over the last few months. Some of the simpler items shipped quickly (e.g. Riot/Web had a stupid bug where it kept incorrectly loading the integration manager; Riot/Android wasn’t clear enough about when contact discovery was happening; Riot/Web wasn’t clear enough about the fact device names are publicly visible; etc) - but other bits have turned out to be incredibly time-consuming to get right.

However, we’re in the process today of releasing Synapse 1.4.0 and Riot/Web 1.4.0 (it’s coincidence the version numbers have lined up!) which resolve the majority of the remaining issues. The main changes are as follows:

  1. Riot no longer automatically uses identity servers by default. Identity servers are only useful when inviting users by email address, or when discovering whether your contacts are on Matrix. Therefore, we now wait until the user tries to perform one of these operations before explaining that they need an identity server to do so, and we prompt them to select one if they want to proceed. This makes it abundantly clear that the user is connecting to an independent service, and why.

  2. Integration Managers and identity servers now have the ability to force users to accept terms of use before using them. This means they can explicitly spell out the data privacy & usage policy of the server as required by GDPR, and it should now be impossible for a user to use these services without realising it. This was particularly fun in the case of identity servers, which previously had no concept of users and so couldn’t track whether users had agreed to their terms & conditions or not… and because homeservers sometimes talk to the identity server on behalf of users rather than the user talking direct, the privacy policy flow gets even hairier. But it’s solved now, and a nice side-effect of this is that users can now explicitly select their Integration Manager in Riot, in case they want to use Dimension or similar rather than the default provided by Modular.

  3. Synapse no longer uses identity servers for verifying registrations or verifying password reset. Originally, Synapse made use of the fact that the Identity Service contains email/msisdn verification logic to handle identity verification in general on behalf of the homeserver. However, in retrospect this was a mistake: why should the entity running your identity server have the right to verify password resets or registration details on your homeserver? So, we have moved this logic into Synapse. This means Synapse 1.4.0 requires new configuration for email/msisdn verification to work - please see the upgrade notes for full details.

  4. Sydent now supports discovering contacts based on hashed identifiers. MSC2134 specifies entirely new IS APIs for discovering contacts using a hash of their identifier rather than directly exposing the raw identifiers being searched for. This is implemented in Riot/iOS and Riot/Android and should be in the next major release; Riot/Web 1.4.0 has it already.

  5. Synapse now warns in its logs if you are using matrix.org as a default trusted key server, in case you wish to use a different server to help discover other servers’ keys.

  6. Synapse now garbage collects redacted messages after N days (7 days by default). (It doesn’t yet garbage collect attachments referenced from redacted messages; we’re still working on that).

  7. Synapse now deletes account access data (IP addresses and User Agent) after N days (28 days by default) of a device being deleted.

  8. Riot warns before falling back to using STUN (and defaults to turn.matrix.org rather than stun.google.com) for firewall discovery (STUN) when placing VoIP calls, and makes it clear that this is an emergency fallback for misconfigured servers which are missing TURN support. (We originally deleted the fallback entirely, but this broke things for too many people, so we’ve kept it but warn instead).

All of this is implemented in Riot/Web 1.4.0 and Synapse 1.4.0. Riot/Web 1.4.0 shipped today (Fri Sept 27th) and we have a release candidate for Synapse 1.4 (1.4.0rc1) today which fully ship on Monday.

For full details please go check out the Riot 1.4.0 and Synapse 1.4.0 blog posts.

Riot/Mobile is following fast behind - most of the above has been implemented and everything should land in the next release. RiotX/Android doesn’t really have any changes to make given it hadn’t yet implemented Identity Service or Integration Manager APIs.

This has involved a surprisingly large amount of spec work; no fewer than 9 new Matrix Spec Changes (MSC) have been required as part of the project. In particular, this results in a massive update to the Identity Service API, which will be released very shortly with the new MSCs. You can see the upcoming changes on the unstable branch and compare with the previous 0.2.1 stable release, as well as checking the detailed MSCs as follows:

This said, there is still some work remaining for us to do here. The main things which haven’t made it into this release are:

  • Preferring to get server keys from the source server rather than the notary server by default (https://github.com/matrix-org/synapse/pull/6110). This almost made it in, but we need to test it more first - until then, your specified notary server will see roughly what servers your servers are trying to talk to. In future this will be mitigated properly by MSC1228 (removing mxids from events).
  • Configurable data retention periods for rooms. We are tantalisingly close with this - https://github.com/matrix-org/synapse/pull/5815 is an implementation that the French Govt deployment is using; we need to port it into mainline Synapse.
  • Authenticating access to the media repository - for now, we still rely on media IDs being almost impossible to guess to protect the data rather than authenticating the user.
  • Deleting items from the media repository - we still need to hook up deletion APIs.
  • Garbage collecting forgotten rooms. If everyone leaves & forgets a room, we should delete it from the DB.
  • Communicating erasure requests over federation

We’ll continue to work on these as part of our ongoing maintenance backlog.

Separately to the data privacy concerns, we’ve had a separate wave of feedback regarding how we handle GDPR Data Subject Access Requests (DSARs). Particularly: whether DSAR responses should contain solely the info your have directly keyed by the requesting Matrix ID - or if we should provide all the data “visible” to that ID (i.e. the history of the conversations they’ve been part of). We went and got professional legal advice on this one, and the conclusion is that we should keep our responses to DSARs as tightly scoped as possible. We updated Matrix.org’s privacy policy and DSAR tools to reflect the new legal input.

Finally, it’s really worth calling out the amount of effort that went into this project. Huge huge thanks to everyone involved (given it’s cut across pretty much every project & subteam we have working on the core of Matrix) who have soldiered through the backlog. We’ve been tracking progress using our feature-dashboard tool which summarises Github issues based on labels & issue lifecycle, and for better or worse it’s ended up being the biggest project board we’ve ever had. You can see the live data here (warning, it takes tens of seconds to spider Github to gather the data) - or, for posterity and ease of reference, I’ve included the current issue list below. The issues which are completed have “done” after them; the ones still in progress say “in progress”, and ones which haven’t started yet have nothing. We split the project into 3 phases - phases 1 and 2 represent the items needed to fully solve the privacy concerns, phase 3 is right now a mix of "nice to have" polish and some more speculative items. At this point we’ve effectively finished phase 1 on Synapse & Riot/Web, and Riot/Mobile is following close behind. We're continuing to work on phase 2, and we’ll work through phase 3 (where appropriate) as part of our general maintenance backlog.

I hope this gives suitable visibility on how we’re considering privacy; after all, Matrix is useless as an open communication protocol if the openness comes at the expense of user privacy. We’ll give another update once the remaining straggling issues are closed out; and meanwhile, now the bulk of the privacy work is out of the way on Riot/Web, we can finally get back to implementing the UI E2E cross-signing verification and improving first time user experience.

Thanks for your patience and understanding while we’ve sorted this stuff out; and thanks once again for flying Matrix :)

In the absence of comments on the current blog, please feel free to discuss over at HN, or alternatively come ask stuff in our AMA over at /r/privacy (starting ~5pm GMT+1 (UK) on Friday Sept 27th).

The Privacy Project Dashboard Of Doom

Data Portability Tooling Bug

24.07.2019 00:00 — Privacy Thomas Lant

It was drawn to our attention this afternoon that there is a bug in our GDPR data portability tooling that resulted in the data dump including some events that should not have been included.

This tooling has recently been updated (here is the new code), and the bug only affects reports generated with the updated tool. So far we have generated one report using the updated tooling.

The bug affects events which:

  • were sent in rooms in which, at the point at which the message was sent, the message visibility was set to 'shared' or 'world readable', and
  • were pulled in over federation from another server after the data subject left the room

As a reminder, 'shared' message visibility means anyone in the room can view the message, from the point in time at which visibility was set to 'shared' and 'world readable' means anyone can read the messages without joining the room, from the point in time at which visibility was set to 'world readable'.

Events are pulled onto a homeserver over federation when a user on that homeserver tries to access events which, for whatever reason, their homeserver does not already have a local copy. This most often happens when their homeserver is offline for any period of time, but can also happen when a user is the first user from their homeserver to join a room with active participants on other homeservers.

We're still analysing the data but so far it looks like the bug resulted in only a small number of events that were not publicly-accessible being shared (there were also publicly-accessible events mistakenly included). At this stage we have identified 19 events from 4 users across 2 rooms (the dump contained ~3.5 million events). This is not to diminish the severity of the bug - just to reassure that the scale of its impact appears to be extremely limited.

It is also worth noting that any encrypted events erroneously included in the dump will not have been decryptable (since the data subject would not have had access to the keys).

Update (2019-08-06)

In our original analysis we stated that 19 events were shared erroneously. On closer analysis we missed 5 other timeline events - the correct figure is 24 timeline events originating from 4 users over 2 rooms. However, this figure focused on timeline data and does not take into account all state events (such as user joins, parts, topic changes etc). When considering these too, a further 56 state events were erroneously shared, referencing 64 users across these 2 rooms (mainly detailing when users had joined/left the room after the requesting user themselves had left). These membership events contained avatar & display name details which may not have been public (but in practice, the vast majority appear to be public data).

Aside from the events referenced above, the full dump contained ~20,000 events that also ought not to have been included; however these events were already publicly accessible due to being part of publicly accessible rooms (eg Matrix HQ) and so we do not consider them a breach of data.

What caused the bug?

Events that are pulled in over federation are assigned a negative 'stream ordering' ID. This is designed to avoid their being sent down the sync (where they would likely be out of sequence). In normal operation (accessing your homeserver via a Matrix client) these events would be appropriately filtered, but a bug in the data dump tooling caused them to be included.

The bug was introduced as a result of two factors:

  • The event filtering code assumes that the user is currently in the room - this was not intuitive, and was not called out in the documentation
  • When we fetched the events from the database, we tried to limit to events sent before the user left the room. On reflection, we used the wrong ordering mechanism (stream ordering instead of topological ordering), resulting in the inclusion of events that were fetched from a remote server after the data subject had left

We are working to fix the bug, and we'll update here when it is resolved. As a reminder, please do report security bugs responsibly as per the Security Disclosure Policy so we can validate the issue and mitigate abuse.

As is standard practice for any data breach, we have notified the ICO.

Privacy Changes to New Vector Identity Servers

19.07.2019 16:35 — Privacy Thomas Lant

As a step towards implementing Terms of Service for Sydent Identity Servers (MSC2140), we're rolling out a couple of changes to the two Identity Servers run by New Vector (running at vector.im and matrix.org):

  1. We have erased all of the data where there is any chance that the data subject didn't understand how, why or with whom their data was being shared.
  2. We've made a change to Sydent so that it no longer persists new associations relating to users on homeservers not run by New Vector.

The impact of these changes is that users on homeservers not run by New Vector will no longer be discoverable by their email or telephone number via the Identity Servers running at vector.im and matrix.org. As we roll out the rest of the changes for Terms of Service for Identity Servers, this functionality will again be made available for users who make an informed choice to opt in.

Registration with Email and Password Reset

In the short term, the New Vector Identity Servers will continue to support registration with email (signing up with an email address as well as a matrix username) and password reset. However, as we continue to improve Identity Server data hygiene practices, we will phase out their use in registration with email and password reset entirely. We have already made the change to Synapse to support password reset without relying on an Identity Server (though this can optionally be re-enabled).

Once Synapse can support registration with email without relying on an Identity Server we will announce a schedule for disabling registration with email and password reset in our Identity Servers entirely. After this point, homeserver administrators will have to make sure their homeservers are configured to send email to keep registration with email and password reset working. More details on this to follow - please watch this space.

Tightening up privacy in Matrix

30.06.2019 00:00 — General Matthew Hodgson

Hi all,

A few weeks ago there was some discussion around the privacy of typical Matrix configurations, particularly how Riot's default config uses vector.im as an Identity Server (for discovering users on Matrix by their email address or phone number) and scalar.vector.im as an Integration Manager (i.e. the mechanism for adding hosted bots/bridges/widgets into rooms). This means that Riot, even if using a custom homeserver and running from a custom Riot deployment, will try to talk to *.vector.im (run by New Vector; the company formed by the core Matrix team in 2017) for some operations unless an alternative IS or IM has been specified in the config.

We haven't done as good a job at explaining this as we could have, and this blog post is a progress update on how we're fixing that and improving other privacy considerations in general.

Firstly, the reason Riot is configured like this is for the user's convenience: in general, we believe most users just want to discover other people on Matrix as easily as possible, and a logically-centralised server for looking up user matrix IDs by email/phone number (called third party IDs, or 3PIDs) is the only comprehensive way of doing so. Decentralising this data while protecting the privacy of the 3PIDs and their matrix IDs is a Hard Problem which we're unaware of anyone having solved yet. Alternatively, you could run a local identity server, but it will end up having to delegate to a centralised identity server anyway for IDs it has no other way to know about. Similarly, providing a default integration server that just works out of the box (rather than mandating the user configures their own) is a matter of trying to keep Riot's UX simple, especially when onboarding users, and especially given Riot's reputation for complexity at the best of times.

That said, the discussion highlighted some areas for improvement. Specifically:

  1. When doing work on making Matrix GDPR compliant back in May 2018, we set up a single privacy policy for the services we run, and got users to agree to it by locking them out of the matrix.org homeserver until they did. However, we missed that users not on the matrix.org homeserver might still be using our Identity Service (IS) & Integration Manager (IM) without accepting the privacy policy. Over the last few weeks we've been working on addressing this - it turns out that it's a pain to fix, given the Identity Service doesn't have the concept of users, so tracking which users have agreed to the policy policy or not means some fairly major changes. The current proposal is to change the Identity Service to use a form of OpenID to authenticate users against their homeserver in order to check they've accepted the IS's terms of use - see MSC2140 for the gory details.

Meanwhile, Riot is being updated to prompt the user to accept the IS & IM terms of use (if different to the HS's), and thus make it crystal clear to the user that they are using an IS & IM and that they have the option not to if desired - see https://github.com/vector-im/riot-web/issues/10167 and associated issues. This includes also explicitly prompting the user as to whether they want 3PIDs they provide at registration to be discoverable, as per https://github.com/vector-im/riot-web/issues/10091.

  1. Riot on iOS & Android gives the option of scanning your local addressbook to discover which of your contacts are on Matrix. The wording explaining this wasn't clear enough on Android - which we promptly fixed. Separately, the contact details sent to the server are currently not obfuscated. This is partially because we hadn't got to it, and partially because obfuscating them doesn't actually help much with privacy, given an attacker can just scan through possible obfuscated phone numbers and email addresses to deobfuscate them. However, we've been working through obfuscating the contact details anyway by hashing as per MSC2134, which has all the details. We're also adding an explicit lookup warning in Riot/Web, as per https://github.com/vector-im/riot-web/issues/10093.

  2. There was a bug where Riot/Web was querying the Integration Manager every time you opened a room, even if that room had no integrations (actually, it did it 3 times in a row). This got fixed and released in Riot/Web 1.2.2 back on June 19th.

  3. Matrix needs to authenticate whether events were actually sent by the server that claimed to send them. We do this by having servers sign their events when they create them, and publishing the public half of their signing keys for anyone to query. However, this poses problems if you receive an event which is signed by a server which isn't currently online. To solve this, we have the concept of trusted_key_servers (aka notary servers), which your server can query to see if they know about the missing server's keys. By default, matrix.org is configured as Synapse's trusted notary, but you can of course change this. If you choose an unreliable server as the notary (e.g. by not setting one at all) then there's a risk that you won't be able to look up signing keys, and a splitbrain will result where your server can't receive certain events, but other servers in the room can. This can then result in your server being unable to participate in the room entirely, if it's missing key events in the room's lifetime.

    Our plan here is to get rid of notaries entirely by changing how event signing works as per MSC1228, but this is going to take a while. Meanwhile we're going to check Synapse's code to ensure it doesn't talk to the notary server unnecessarily. (E.g. it should be caching the signing keys locally, and it should only use the notary server if the remote server is down.)

  4. When doing VoIP in Matrix, clients need to use a TURN server to discover their network conditions and perform firewall traversal. The TURN server should be specified by your homeserver (and each homeserver deployment should ideally include a TURN server). However, for users who have not configured a TURN server, Riot (on all 3 platforms) defaulted to use Google's public STUN service (stun.l.google.com). STUN is a subset of TURN which provides firewall discovery, but not traffic relaying. This slightly increased the chances of calls working for users without a proper TURN server, but not by much - and rather than fall back to Google, we've decided to simply remove it from Riot (e.g. https://github.com/matrix-org/matrix-ios-sdk/commit/24832a2b14fb72ae6f051d5aba40262d11eef65d). This means that VoIP might get less reliable for users who were relying on this fallback, but you really should be running your own TURN server anyway if you want VoIP to work reliably on your homeserver.

  5. We should make it clearer in Riot that device names are world-readable, and not just for the user's own personal reference. This is https://github.com/vector-im/riot-web/issues/10216

As you can see, much of the work on improving these issues is still in full swing, although some has already shipped. As should also be obvious, these issues are categorically not malicious: Matrix (and Riot) literally exists to give users full control and autonomy over their communication, and privacy is a key part of that. These are avoidable issues which can and will be solved. It's worth noting that we have to prioritise privacy issues alongside all the other development in Matrix however: there's no point in having excellent privacy if there are other bugs stopping the platform from being usable.

We'll do another blog post to confirm once most of the fixes here have landed - meanwhile, hopefully this post provides some useful visibility on how we're going about improving things.

Matrix.org homeserver privacy policy and terms of use being enforced today

29.05.2018 00:00 — Privacy Tom Lant

Hi all,

As mentioned in our last blog post on GDPR, to make sure that everyone has read and understood the important details about how their personal data is processed by the matrix.org homeserver, users who haven't yet agreed to the privacy notice and terms and conditions will be blocked from sending new messages until they have.

Users will continue to be able to receive messages, so they won't miss out on any messages sent to them before they've agreed to the terms.

The System Alerts room has already sent every user their unique link to review and agree, and if anyone missed that message, the latest Riot.im web and mobile will display a helpful error message guiding users who are yet to agree through the agreement process.

If you have any questions or difficulties, please let us know at [email protected].

Thanks!

Tom

GDPR on matrix.org

25.05.2018 00:00 — Privacy Tom Lant

If you've connected to the matrix.org homeserver today, you'll have noticed some activity in support of GDPR compliance. The most obvious of these is an invite from System Alerts (aka @server:matrix.org):

We've rolled out the System Alerts feature to communicate important platform information to all of a homeserver's users. Today, we're using it to communicate the arrival of our new (and much-improved) Privacy Notice and Terms and Conditions to users on matrix.org.

The System Alerts service takes the form of an (unrejectable) invite to a room. We took this approach to support maximum compatibility with the myriad Matrix clients (since all Matrix clients can support conversations in a room ?).

When we first rolled out System Alerts, we didn't allow users leave the System Alerts room. Sorry! We got a bit overexcited - we've fixed that now (though please do provide your agreement before you leave).

What do I need to do?

At some point today the System Alerts service will provide you with unique link, directing you to review the new terms and provide your agreement.

For us to process your personal data lawfully, it's really important that we know you understand and agree to our Privacy Notice and Terms and Conditions. For that reason, we will shortly be blocking any users who haven't indicated their acceptance, so please act quickly when you receive your link.

Once the block is enabled, users who haven't accepted the terms will see an error when they try and send a message, join a room, or send an invite. This message will also include the unique link to review and accept the terms, so users who haven't seen the message from System Alerts will know what to do.

Don't worry if you're reading this some time after May 25 - accepting the terms at any time will unblock message sending on your account, and you won't have missed any messages sent to you.

If you have any thoughts or suggestions on the legal documentation, you can provide comment via github.

GDPR Compliance in Matrix

08.05.2018 00:00 — Privacy Matthew Hodgson

Hi all,

As the May 25th deadline looms, we've had lots and lots of questions about how GDPR (the EU's new General Data Protection Regulation legislation) applies to Matrix and to folks running Matrix servers - and so we've written this blog post to try to spell out what we're doing as part of maintaining the Matrix.org server (and bridges and hosted integrations etc), in case it helps folks running their own servers.

The main controversial point is how to handle Article 17 of the GDPR: 'Right to Erasure' (aka Right to be Forgotten). The question is particularly interesting for Matrix, because as a relatively new protocol with somewhat distinctive semantics it's not always clear how the rules apply - and there's no case law to seek inspiration from.

The key question boils down to whether Matrix should be considered more like email (where people would be horrified if senders could erase their messages from your mail spool), or should it be considered more like Facebook (where people would be horrified if their posts were visible anywhere after they avail themselves of their right to erasure).

Solving this requires making a judgement call, which we've approached from two directions: firstly, considering what the spirit of the GDPR is actually trying to achieve (in terms of empowering users to control their data and have the right to be forgotten if they regret saying something in a public setting) - and secondly, considering the concrete legal obligations which exist.  

The conclusion we've ended up with is to (obviously) prioritise that Matrix can support all the core concrete legal obligations that GDPR imposes on it - whilst also having a detailed plan for the full 'spirit of the GDPR' where the legal obligations are ambiguous.  The idea is to get as much of the longer term plan into place as soon as possible, but ensure that the core stuff is in place for May 25th.

Please note that we are still talking to GDPR lawyers, and we'd also very much appreciate feedback from the wider Matrix community - i.e. this plan is very much subject to change.  We're sharing it now to ensure everyone sees where our understanding stands today.

The current todo list breaks down into the following categories. Most of these issues have matching github IDs, which we'll track in a progress dashboard.

Right to Erasure

We're opting to follow the email model, where the act of sending an event (i.e. message) into a room shares a copy of that message to everyone who is currently in that room.  This means that in the privacy policy (see Consent below) users will have to consent to agreeing that a copy of their messages will be transferred to whoever they are addressing.  This is also the model followed by IM systems such as WhatsApp, Twitter DMs or (almost) Facebook Messenger.

This means that if a user invokes their right to erasure, we will need to ensure that their events will only ever be visible to users who already have a copy - and must never be served to new users or the general public. Meanwhile, data which is no longer accessible by any user must of course be deleted entirely.

In the email analogy: this is like saying that you cannot erase emails that you have sent other people; you cannot try to rewrite history as witnessed by others... but you can erase your emails from a public mail archive or search engine and stop them from being visible to anyone else.

It is important to note that GDPR Erasure is completely separate from the existing Matrix functionality of "redactions" which let users remove events from the room. A "redaction" today represents a request for the human-facing details of an event (message, join/leave, avatar change etc) to be removed.  Technically, there is no way to enforce a redaction over federation, but there is a "gentlemen's agreement" that this request will be honoured.

The alternative to the 'email-analogue' approach would have been to facilitate users' automatically applying the existing redact function to all of the events they have ever submitted to a public room. The problem here is that defining a 'public room' is subtle, especially to uninformed users: for instance, if a message was sent in a private room (and so didn't get erased), what happens if that room is later made public? Conversely, if right-to-erasure removed messages from all rooms, it will end up destroying the history integrity of 1:1 conversations, which pretty much everyone agrees is abhorrent. Hence our conclusion to protect erased users from being visible to the general public (or anyone who comes snooping around after the fact) - but preserving their history from the perspective of the people they were talking to at the time.

In practice, our core to-do list for Right to Erasure is:

  • As a first cut, provide Article 17 right-to-erasure at a per-account granularity. The simplest UX for this will be an option when calling the account deactivation API to request erasure as well as deactivation. There will be a 30 day grace period, and (ideally) a 2FA confirmation (if available) to avoid the feature being abused.
  • Homeservers must delete events that nobody has access to any more (i.e. if all the users in a room have GDPR-erased themselves). If users have deactivated their accounts without GDPR-erasure, then the data will persist in case they reactivate in future.
  • Homeservers must delete media that nobody has access to any more. This is hard, as media is referenced by mxc:// URLs which may be shared across multiple events (e.g. stickers or forwarded events, including E2E encrypted events), and moreover mxc:// URLs aren't currently authorized.  As a first cut, we track which user uploaded the mxc:// content, and if they erase themselves then the content will also be erased.
  • Homeservers must not serve up unredacted events over federation to users who were not in the room at the time. This poses some interesting problems in terms of the privacy implications of sharing MXIDs of erased users over federation - see "GDPR erasure of MXIDs" below.
  • Matrix must specify a way of informing both servers and clients (especially bots and bridges) of GDPR erasures (as distinct from redactions), so that they can apply the appropriate erasure semantics.

GDPR erasure of Matrix IDs

One interesting edge case that comes out of GDPR erasure is that we need a way to stop GDPR-erased events from leaking out over federation - when in practice they are cryptographically signed into the event Directed Acyclic Graph (DAG) of a given room. Today, we can remove the message contents (and preserve the integrity of the room's DAG) via redaction - but this still leaves personally identifying information in the form of the Matrix IDs (MXIDs) of the sender of these events.

In practice, this could be quite serious: imagine that you join a public chatroom for some sensitive subject (e.g. #hiv:example.com) and then later on decide that you want to erase yourself from the room. It would be very undesirable if any new homeserver joining that room received a copy of the DAG showing that your MXID had sent thousands of events into the room - especially if your MXID was clearly identifying (i.e. your real name).

Mitigating this is a hard problem, as MXIDs are baked into the DAG for a room in many places - not least to identify which servers are participating in a room. The problem is made even worse by the fact that in Matrix, server hostnames themselves are often personally identifying (for one-person homeservers sitting on a personal domain).

We've spent quite a lot time reasoning through how to fix this situation, and a full technical spec proposal for removing MXIDs from events can be found at https://docs.google.com/document/d/1ni4LnC_vafX4h4K4sYNpmccS7QeHEFpAcYcbLS-J21Q. The high level proposal is to switch to giving each user a different ID in the form of a cryptographic public key for every room it participates in, and maintaining a mapping of today's MXIDs to these per-user-per-room keys.  In the event of a GDPR erasure, these mappings can be discarded, pseudonymising the user and avoiding correlation across different rooms. We'd also switch to using cryptographic public keys as the identifiers for Rooms, Events and Users (for cross-room APIs like presence).

This is obviously a significant protocol change, and we're not going to do it lightly - we're still waiting for legal confirmation on whether we need it for May 25th (it may be covered as an intrinsic technical limitation of the system).  However, the good news is that it paves the way towards many other desirable features: the ability to migrate accounts between homeservers; the ability to solve the problem of how to handle domain names being reused (or hijacked); the ability to decouple homeservers from DNS so that they can run clientside (for p2p matrix); etc.  The chances are high that this proposal will land in the relatively near future (especially if mandated by GDPR), so input is very appreciated at this point!

GDPR describes six lawful bases for processing personal data. For those running Matrix servers, it seems the best route to compliance is the most explicit and active one: consent.

Consent requires that our users are fully informed as to exactly how their data will be used, where it will be stored, and (in our case) the specific caveats associated with a decentralised, federated communication system. They are then asked to provide their explicit approval before using (or continuing to use) the service.

In order to gather consent in a way that doesn't break all of the assorted Matrix clients connecting to matrix.org today, we have identified both an immediate- and a long-term approach.

The (immediate-term) todo list for gathering consent is:

  • Modify Synapse to serve up a simple 'consent tool' static webapp to display the privacy notice/terms and conditions and gather consent to this API.
    • Add a 'consent API' to the CS API which lets a server track whether a given user has consented to the server's privacy policy or not.
  • Send emails and push notifications to advise users of the upcoming change (and link through to the consent tool)
  • Develop a bot that automatically connects to all users (new and existing), posting a link to the consent tool. This bot can also be used in the future as a general 'server notice channel' for letting server admins inform users of privacy policy changes; planned downtime; security notices etc.
  • Modify synapse to reject message send requests for all users who have not yet provided consent
    • return a useful error message which contains a link to the consent tool
  • Making our anonymised user analytics for Riot.im 'opt in' rather than 'opt out' - this isn't a requirement of GDPR (since our analytics are fully anonymised) but reflects our commitment to user data sovereignty

Long-term:

  • Add a User Interactive Auth flow for the /register API to gather consent at register
  • As an alternative to the bot:
    • Fix user authentication in general to distinguish between 'need to reauthorize without destroying user data' and 'destroy user data and login again', so we can use the re-authorize API to gather consent via /login without destroying user data on the client.
    • port the /login API to use User Interactive Auth and also use it to gather consent for existing users when logging in

Deactivation

Account deactivation (the ability to terminate your account on your homeserver) intersects with GDPR in a number of places.

Todo list for account deactivation:

  • Remove deactivated users from all rooms - this finally solves the problem where deactivated users leave zombie users around on bridged networks.
  • Remove deactivated users from the homeserver's user directory
  • Remove all 3PID bindings associated with a deactivated user from the identity servers
  • Improve the account deactivation UX to make sure users understand the full consequences of account deactivation

Portability

GDPR states that users have a right to extract their data in a structured, commonly used and machine-readable format.

In the medium term we would like to develop this as a core feature of Matrix (i.e. an API for exporting your logs and other data, or for that matter account portability between Matrix servers), but in the immediate term we'll be meeting our obligations by providing a manual service.

The immediate todo list for data portability is:

  • Expose a simple interface for people to request their data
  • Implement the necessary tooling to provide full message logs (as a csv) upon request. As a first cut this would be the result of manually running something like select * from events where user=?.

Other

GDPR mandates rules for all the personal data stored by a business, so there are some broader areas to bear in mind which aren't really Matrix specific, including:

  • Making a clear statement as to how data is processed if you apply for a job
  • Ensuring you are seeking appropriate consent for cookies
  • Making sure all the appropriate documentation, processes and training materials are in place to meet GDPR obligations.

Conclusion

So, there you have it. We'll be tracking progress in github issues and an associated dashboard over the coming weeks; for now https://github.com/matrix-org/synapse/issues/1941 (for Right to Erasure) or https://github.com/vector-im/riot-meta/issues/149 (GDPR in general) is as good as place as any to gather feedback. Alternatively, feel free to comment on the original text of this blog post: https://docs.google.com/document/d/1JTEI6RENnOlnCwcU2hwpg3P6LmTWuNS9S-ZYDdjqgzA.

It's worth noting that we feel that GDPR is an excellent piece of legislation from the perspective of forcing us to think more seriously about our privacy - it has forced us to re-prioritise all sorts of long-term deficiencies in Matrix (e.g. dependence on DNS; improving User Interactive authentication; improving logout semantics etc). There's obviously a lot of work to be done here, but hopefully it should all be worth it!