Matthew Hodgson


The Matrix Holiday Special 2020

25.12.2020 00:00 — General Matthew Hodgson

Hi all,

Over the years it’s become a tradition to write an end-of-year wrap-up on Christmas Eve, reviewing all the things the core Matrix team has been up to over the year, and looking forwards to the next (e.g. here’s last year’s edition). These days there’s so much going on in Matrix it’s impossible to cover it all (and besides, we now have This Week In Matrix and better blogging in general to cover events as they happen). So here’s a quick overview of the highlights:

Looking back at our plans for 2020 in last year’s wrap-up, amazingly it seems we pretty much achieved what we set out to do. Going through the bulletpoints in order:

  • We turned on End-to-end Encryption by default.
  • We have a dedicated team making major improvements to First-Time User Experience in Element (as of the last few months; hopefully you’ve been noticing the improvements!)
  • RiotX became Element Android and shipped.
  • Communities have been completely reinvented as Spaces (MSC1772) and while in alpha currently, they should ship in Jan.
  • Synapse scalability is fixed: we now shard horizontally by event - and Synapse is now pretty much entirely async/await!
  • Dendrite Beta shipped, as did the initial P2P Matrix experiments, which have subsequently continued to evolve significantly (although we haven’t implemented MSC1228 or MSC2787 portable accounts yet). Check out the Dendrite end-of-year update for more.
  • MLS experiments are in full swing - we got the first MLS messages passing over Matrix a few days ago, and Decentralised MLS work is back on the menu after an initial sprint in May.
  • There’s been a valiant mission to improve Bridge UX in the form of MSC2346 and its implementations in Element Web, although this has ended up failing to get to the top of the todo list (sorry Half-Shot! :/)
  • Spec progress has improved somewhat, and we are very excited to have welcomed Will Bamberg (formerly MDN) to support the spec from a professional tech writer perspective, with the all-new engine landing any day now! We’re still experimenting with ways to ensure the spec gets enough time allocated to keep up with the backlog, however - particularly community contributions.
  • ...and in terms of Abuse/Reputation - we properly kicked off our anti-abuse work and launched a first PoC implementation in the depths of Cerulean last week.

Perhaps more interesting is the stuff we didn’t predict (or at least didn’t want to pre-announce ;) for 2020:

  • Riot, Modular and New Vector got unified at last behind a single name: Element; hopefully the shock has worn off by now :)
  • Mozilla joined Matrix in force, turning off Moznet IRC in favour of going full Matrix.
  • We welcomed Gitter into the heart of the Matrix ecosystem (with Element acquiring Gitter from Gitlab in order to ensure Gitter’s Matrix integration acts as a reference for integrating future chat silos into Matrix) - with native Matrix support in Gitter going live shortly afterwards.
  • Automattic launched itself into the Matrix ecosystem with an investment in Element, and since then we’ve been working on getting Matrix better integrated and available to them (although all of Element’s Matrix-for-governments activity has ended up delaying this a bit). If you want to work for Automattic on integrating Matrix, they’re hiring!
  • We previewed Cerulean as a super-exciting proof-of-concept client, demonstrating how social media could work on Matrix, with native threading, profiles-as-rooms, decentralised reputation, and (shortly) peeking-over-federation.
  • We completely rewrote matrix.to and relaunched it as a much more capable and friendly permalink redirection service; a precursor to finally getting matrix:// URLs everywhere!
  • We certainly didn’t predict that the “how to install Synapse” video tutorial published at the beginning of the COVID-19 pandemic would end up with 25.5K views (and counting…)

Then, there are whole new waves of exciting stuff going on. The most obvious has to be the amount of Government uptake we’ve seen with Matrix this year, following on from France embracing Matrix across the public sector last year. Firstly the German armed forces announced their transition to Matrix, and then the German states of Schleswig-Holstein and Hamburg announced a mammoth 500K user Matrix deployment for education and public administration. Meanwhile, North Rhine Westphalia (the biggest state in Germany) launched their own Matrix-powered messenger for education; loads of different universities have rolled out Matrix for collaboration - and we hear Famedly is making good progress with Matrix-powered healthcare messaging solutions. Finally, outside of Germany, we’re seeing the first official deployments in the UK government and US federal government - we’ll share details where possible (but sometimes big deployments of encrypted communication systems want to remain discreet). It’s incredibly exciting to see Matrix spreading across the public sector and education, and we’re hoping this will follow a similar pattern to how the Internet, email or indeed the Web first developed: a mix of high profile public sector deployments, complemented by a passionate grass-roots technical community, eventually spreading to span the rest of society :).

Another exciting thing which emerged this year is the amazing academic work that Karlsruhe Institute of Technology’s Decentralized Systems and Network Services Research Group has been conducting on Matrix. This really came on the radar back in June when their Matrix Decomposition: Analysis of an Access Control Approach on Transaction-based DAGs without Finality paper was published - a truly fascinating analysis of how state resolution works in Matrix, and how we manage to preserve access control within rooms without using blockchain-style ‘sealed blocks’ (and has helped fix a few nasty bugs!). I’m not sure any of us realised that Matrix’s state resolution counts as a new field of research, but it’s been great to follow along with their independent work. Most recently, and even more excitingly, they’re circulating a preview of their Analysis of the Matrix Event Graph Replicated Data Type paper - a deep analysis of the properties of Matrix DAGs themselves. We highly recommend reading the papers (what better way to spend the holiday break!). To give a taste, the final paragraph of the paper concludes:

MEG summary

2020 has also seen the arrival and maturation of a whole new generation of Matrix clients - Hydrogen is really impressive as an experimental next-generation Web (and Mobile Web) client; an account with 3000 rooms that uses 1.4GB of RAM on Element Web uses 14MB of RAM on Hydrogen and launches instantly, complete with an excellent E2EE implementation. It even works on MSIE! The whole app, including dependencies, is about 70KB of code (200KB including Olm). Meanwhile, matrix-rust-sdk is coming along well, providing a general purpose native library for writing excellent native Matrix clients. Fractal merged initial matrix-rust-sdk support a few weeks ago, and we’ll be experimenting with switching to it in Element iOS and Element Android (for its e2ee) in the coming year. It’s not inconceivable to think of a world where matrix-rust-sdk ends up being the no-brainer official SDK for native apps, and Hydrogen’s SDK becomes the no-brainer official SDK for JS apps.

Meanwhile, in the community, there’s been so much activity it’s untrue. But on the subject of maturing apps, it’s been incredibly exciting to see NeoChat emerge as an official KDE Matrix client (built on libQuotient and Kirigami, forked from Spectral), FluffyChat going from strength to strength; Nheko continuing to mature impressively; Mirage appearing out of nowhere as a fully featured desktop client; Fractal merging matrix-rust-sdk etc. On the serverside, Conduit was the big community story of the year - with an incredibly fast Rust + Sled server appearing out of the blue, with viable federation coming up on the horizon. The best bet for an overview of all things community is to check out the TWIM backlogs however - there’s simply way too much stuff to mention it all here.

Obviously, no 2020 wrap-up post would be complete without acknowledging the COVID-19 pandemic - which increased focus on Matrix and other remote collaboration technology more than anyone could have predicted (especially given all the privacy missteps from Zoom, Teams and others). One of the highlights of the year was seeing the HOPE (Hackers On Planet Earth) conference shift their entire proceedings over to Matrix - turning the conference into a 10 day television station of premium hacking content, with Matrix successfully providing the social glue to preserve a sense of community despite going virtual. Similarly, we’re incredibly excited that FOSDEM 2021 is highly likely to run primarily via Matrix (with bridges to IRC and XMPP, of course) - our work is going to be cut out for us in January to ensure the amazing atmosphere of FOSDEM is preserved online for the >8,500 participants and ~800 talks. And if any other event organisers are reading this - please do reach out if you’re interested in going online via Matrix: we want Matrix to be the best possible ecosystem for online communities, including virtual events, and we’ll be happy to help :)

Talking of FOSDEM, a really fun bit of work which landed in Element this year was to (finally!) polish Widgets: the ability to embed arbitrary webapps into Matrix chatrooms. This includes being able to embed widgets in the RightPanel on Element Web, the LeftPanel too, add as many as you like to a room, resize them(!), and generally build much more sophisticated dashboards of additional content. Modal and fullscreen widgets are coming too, as are ways to simplify and unify access control. It turns out that these have arrived in the nick of time for events like FOSDEM, where we’re expecting to very heavily use widgets to embed video streams, video conferences, schedules, and generally automate the workflow of the conference via adding in web UIs as widgets wherever necessary. The work for this has been driven by the various German education deployments, where the same tricks are invaluable for automating online learning experiences. We originally wrote Widgets back in 2017 as a proof-of-concept to try to illustrate how chatrooms could be used to host proper custom UIs, and it's fantastic to see that dream finally come of age.

Finally, it’s been really exciting to see major progress in recent months on what’s essentially a whole new evolution of Matrix. Two years ago, a quiet patch during the Christmas holidays gave birth to a whole bunch of wild science fiction Matrix Spec Changes: MSC1772: Spaces (groups as rooms), MSC1769: Profiles as rooms, MSC1767: Extensible events, MSC1776: Peeking over /sync, MSC1777: Peeking over federation, etc. This was in part trying to ensure that we had something to look forward to when we emerged from the tunnel of launching Matrix 1.0, and in part trying to draw a coherent high-level sketch of what the next big wave of Matrix features could look like. Inevitably the MSCs got stuck in limbo for ages while we exited beta, launched Matrix 1.0, turned on E2EE by default etc - but in the latter half of this year they’ve hit the top of the todo list and it’s been incredibly exciting to see entirely new features landing once again. Implementation for Spaces is in full swing and looking great; Profiles-as-rooms are effectively being trialled in Cerulean; Peeking over /sync has landed in Dendrite and peeking over federation is in PR (and unlocks all sorts of desirable features for using rooms more generically than we have today, including Spaces). Only Extensible events remains in limbo for now (we have enough to handle getting the others landed!)

Of these, Spaces has turned out to be exciting in wholly unexpected ways. While prototyping the UX for how best to navigate hierarchies of spaces, we had a genuine epiphany: the ability for anyone to define and share arbitrary hierarchies of rooms makes Matrix effectively a global decentralised hierarchical file system (where the ‘files’ are streams of realtime data, but can obviously store static content too). The decentralised access controls that KIT DSN wrote about could literally be file-system style access controls; enforcing access on a global decentralised hierarchy. We obviously have shared hierarchical filesystems today thanks to Dropbox and Google Drive, but these of course are centralised and effectively only store files - whereas Spaces could potentially scale to the whole web. In fact, you could even think of Spaces as flipping Matrix entirely on its head: the most defining building block going forwards could be the Spaces themselves rather than the rooms and events - just as directories are intrinsic to how you navigate a conventional filesystem. How has Matrix got this far without the concept of folders/directories?!

Right now these thoughts are just overexcited science fiction, but the potential really is mindblowing. It could give us a global read/write web for organising any arbitrary realtime data - with the social controls via ACLs to delegate and crowdsource curation of hierarchies however folks choose. The Matrix.org Foundation could seed a ‘root’ hierarchy, go curate all the rooms we know about into some Linnean-style taxonomy, delegate curation of the various subspaces to moderators from the community, and hey presto we’ve reinvented USENET… but with modern semantics, and without the rigid governance models. Hell, we could just mount (i.e. bridge) USENET straight into it. And any other hierarchical namespace of conversations you can think of - Google Groups, Stackoverflow, Discourse, IMAP trees…

Of course, the initial Spaces implementation is going to be focused on letting communities publish their existing rooms, and users organise their own rooms, rather than managing an infinite ever-expanding global space hierarchy - but given we’ve been designing Spaces to support government (and inter-government) scales of Spaces, it’s not inconceivable to think we could use it to navigate gigantic public shared Spaces in the longer term.

Anyway, enough Space scifi - what’s coming up in 2021?

2021

Our current hit list is:

  • Spaces - see above :)
  • Social Login - we’re going to be making Single Sign On (SSO) a proper first-class citizen in Matrix (and Synapse and Element) in the coming weeks, and enabling it on the matrix.org homeserver, so users can do single-click logins via Github/Gitlab/Google and other SSO providers. Obviously this means your Matrix identity will be beholden to your identity provider (IdP), but this may well be preferable for many users who just want a single-click way to enter Matrix and don’t care about being tied to a given IdP.
  • VoIP - we have a lot of work in flight at the moment to make 1:1 VoIP super robust. Some of it has already landed in Element, but the rest will land in the coming weeks - and then we’re hoping to revisit Matrix-native group voice/video.
  • Voice messaging - we’re hoping to finally add voice messaging to Element (and Matrix)
  • Location sharing - ...and this too.
  • P2P - Lots of P2P work on the horizon, now Dendrite is increasingly stable. First of all we need to iterate more on Pinecone, our pre-alpha next-generation P2P overlay network - and then sort out account portability, and privacy-preserving store-and-forward. We’re hoping to see the live P2P Matrix network turn on this year, however, and ideally see homeservers (probably Dendrite) multihoming by default on both today’s Matrix as well as the P2P network, acting as gateways between the two.
  • Threads - Cerulean is excellent proof for how threading could work in Matrix; we just need to get it implemented in Element!
  • Peeking - Peeking is going to become so much more important for participating in non-chat rooms, such as Spaces, Profiles, Reputation feeds, etc. We’ll finish it in Dendrite, and then implement it in Synapse too.
  • Decentralised Reputation - Cerulean has the first implementation of decentralised reputation for experimentation purposes, and we’ll be working solidly on it over the coming year to empower users to counter abuse by applying their own subjective reputation feeds to their content.
  • Incremental Signup - Once upon a time, Element (Riot) had the ability to gradually sign-up without the user even really realising they’d signed up. We want to bring it back - perhaps this will be the year?
  • DMLS - with the first MLS messages flowing over Matrix, we want to at least provide MLS as an option alongside Megolm for encryption. It should be radically more performant in larger rooms (logarithmic rather than linear complexity), but lacks deniability (the assurance that you cannot prove a user said something in retrospect, in order to blackmail them or similar), and is still unproven technology. We’ll aim to prove it in 2021.
  • E2EE improvements - We improved E2EE immeasurably in 2020; turning it on by default, adding cross-signing, QR code verification etc. But usability and reliability can still be improved. We’ll be looking at further simplifying the UX, and potentially combining together your login password and recovery/security passphrase so you only have one password to remember going forwards.
  • Hydrogen - We’ll keep polishing Hydrogen, bringing it towards feature parity with Element, ensure its SDK is available for other clients, and start seeing how we can use it in Element itself. For instance, the Spaces-aware RoomList in Element may well end up stealing alien technology from Hydrogen.
  • matrix-rust-sdk - Similarly, we’ll keep polishing matrix-rust-sdk; stealing inspiration from Hydrogen’s state model, and start migrating bits of the native mobile Element apps to use it.
  • The Spec - get Will’s new spec website live, and get improving all the surrounding material too.

I’m sure I’m missing lots here, but these are the ones which pop immediately to mind. You can also check Element's public roadmap, which covers all the core Matrix work donated by Element (as well as everything else Element is getting up to).

As always, huge huge thanks goes to the whole Matrix community for flying Matrix and keeping the dream alive and growing faster than ever. It’s been a rough year, and we hope that you’ve survived it intact (and you have our sincere sympathies if you haven’t). Let’s hope that 2021 will be a massive improvement, and that the whole Matrix ecosystem will continue to prosper in the new year.

-- Matthew, Amandine, and the whole Matrix team.

Introducing Cerulean

18.12.2020 00:00 — General Matthew Hodgson

Hi all,

We have a bit of an unexpected early Christmas present for you today…

Alongside all the normal business-as-usual Matrix stuff, we’ve found some time to do a mad science experiment over the last few weeks - to test the question: “Is it possible to build a serious Twitter-style decentralised microblogging app using Matrix?”

It turns out the answer is a firm “yes” - and as a result we’d like to present a very early sneak preview of Cerulean: a highly experimental new microblogging app for Matrix, complete with first-class support for arbitrarily nested threading, with both Twitter-style (“vertical”) and HN/Reddit-style (“horizontal”) layout… and mobile web support!

Cerulean screenie

Cerulean is unusual in many ways:

  • It’s (currently) a very minimal JavaScript app - only 2,500 lines of code.
  • It has zero dependencies (other than React).
    • This is to show just how simple a fairly sophisticated Matrix client can be...
    • ...and so the code can be easily understood by folks unfamiliar with Matrix...
    • ...and so we can iterate fast while figuring out threading...
    • ...and because none of the SDKs support threading yet :D
  • It relies on MSC2836: Threading - our highly experimental Matrix Spec Change to extend relationships (as used by reaction & edit aggregations) to support free-form arbitrary depth threading.
  • As such, it only works on Dendrite, as that’s where we’ve been experimenting with implementing MSC2836. (We’re now running an official public Dendrite server instance at dendrite.matrix.org though, which makes it easy to test - and our test Cerulean instance https://cerulean.matrix.org points at it by default).

This is very much a proof of concept. We’re releasing it today as a sneak preview so that intrepid Matrix experimenters can play with it, and to open up the project for contributions! (PRs welcome - it should be dead easy to hack on!). Also, we give no guarantees about data durability: both Cerulean and dendrite.matrix.org are highly experimental; do not trust them yet with important data; we reserve the right to delete it all while we iterate on the design.

What can it do?

So for the first cut, we’ve implemented the minimal features to make this something you can just about use and play with for real :)

  • Home view (showing recent posts from folks you follow)
  • Timeline view (showing the recent posts or replies from a given user)
  • Thread view (showing a post and its replies as a thread)
  • Live updating (It’s Matrix, after all! We’ve disabled it for guests though.)
  • Posting plain text and images
  • Fully decentralised thanks to Matrix (assuming you’re on Dendrite)
  • Twitter-style “Vertical” threading (replies form a column; you indent when someone forks the conversation)
  • HN/Reddit/Email-style “Horizontal” threading (each reply is indented; forks have the same indentation)
  • Basic Registration & Login
  • Guest support (slightly faked with non-guest users, as Dendrite’s guest support isn’t finished yet)
  • Super-experimental proof-of-concept support for decentralised reputation filtering(!)
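
The two threading layouts in the list above differ only in how indentation is assigned. Here is a minimal sketch of that difference, assuming a hypothetical `ThreadNode` tree (not Cerulean's actual data model) and assuming that in the vertical layout the first reply continues the parent's column while any further replies count as forks:

```typescript
// Hypothetical tree shape for illustration; not Cerulean's actual data model.
interface ThreadNode {
  id: string;
  children: ThreadNode[];
}

interface Row {
  id: string;
  indent: number;
}

// Horizontal (HN/Reddit/Email-style): every reply is indented one level below
// its parent, so forks of the same message share the same indentation.
function horizontalLayout(node: ThreadNode, depth: number = 0): Row[] {
  const rows: Row[] = [{ id: node.id, indent: depth }];
  for (const child of node.children) {
    rows.push(...horizontalLayout(child, depth + 1));
  }
  return rows;
}

// Vertical (Twitter-style): replies form a column and only forks are indented.
// Assumption: the first reply continues the column; later siblings are forks.
function verticalLayout(node: ThreadNode, indent: number = 0): Row[] {
  const rows: Row[] = [{ id: node.id, indent }];
  node.children.forEach((child, index) => {
    rows.push(...verticalLayout(child, index === 0 ? indent : indent + 1));
  });
  return rows;
}

// A replies: B (which has reply C) and D (a fork of the conversation at A).
const exampleThread: ThreadNode = {
  id: "A",
  children: [
    { id: "B", children: [{ id: "C", children: [] }] },
    { id: "D", children: [] },
  ],
};

const horizontalRows = horizontalLayout(exampleThread);
const verticalRows = verticalLayout(exampleThread);
```

In this example the horizontal layout indents B and D by one level and C by two, while the vertical layout keeps A, B and C in one column and indents only the fork D.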

Obviously, there’s a huge amount of stuff needed for parity with a proper Twitter-style system:

  • Configurable follows. Currently the act of viewing someone’s timeline automatically follows them. This is because Dendrite doesn’t peek over federation yet (but it’s close), so you have to join a room to view its contents - and the act of viewing someone’s timeline room is how you follow them in Cerulean.
  • Likes (i.e. plain old Matrix reactions, although we might need to finally sort out federating them as aggregations rather than individually, if people use them like they use them on Twitter!)
  • Retweets (dead easy)
  • Pagination / infinite scrolling (just need to hook it up)
  • Protect your posts (dead easy; you just switch your timeline room to invite-only!)
  • Show (some) replies to messages in the Home view
  • Show parent and sibling context as well as child context in the Thread view
  • Mentions (we need to decide how to notify folks when they’re mentioned - perhaps Matrix’s push notifications should be extended to let you subscribe to keywords for public rooms you’re not actually in?)
  • Notifications (although this is just because Dendrite doesn’t do notifs yet)
  • Search (again, just needs to be implemented in Dendrite - although how do you search beyond the data in your current homeserver? Folks are used to global search)
  • Hashtags (it’s just search, basically)
  • Symlinks (see below)
  • Figure out how to handle lost unthreaded messages (see below)
  • Offline support? (if we were using a proper Matrix SDK, we’d hopefully get this for free, but currently Cerulean doesn’t store any state locally at all).
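
For the "Protect your posts" item above, switching a room to invite-only comes down to a single state event. The sketch below only builds a description of the request rather than sending it. `m.room.join_rules` and its `join_rule` values come from the Matrix Client-Server API, while `buildJoinRuleRequest` and the `StateRequest` shape are hypothetical names used for illustration:

```typescript
// Hypothetical request description; no network I/O happens here.
interface StateRequest {
  method: "PUT";
  path: string;
  body: { join_rule: "public" | "invite" };
}

// Builds the state update that makes a timeline room invite-only (protected)
// or public again. The room ID is percent-encoded for use in the URL path.
function buildJoinRuleRequest(roomId: string, isProtected: boolean): StateRequest {
  return {
    method: "PUT",
    path: `/_matrix/client/r0/rooms/${encodeURIComponent(roomId)}/state/m.room.join_rules`,
    body: { join_rule: isProtected ? "invite" : "public" },
  };
}

// Example with a made-up room ID.
const protectRequest = buildJoinRuleRequest("!timeline:example.org", true);
```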

How does it work?

Every message you send using Cerulean goes into two Matrix rooms, dubbed the "timeline" room and the "thread" room. The "timeline" room (with an alias of #@matthew:dendrite.matrix.org or whatever your matrix id is) is a room with all of your posts and no one else's. The "thread" room is a normal Matrix room which represents the message thread itself. Creating a new "Post" will create a new "thread" room. Replying to a post will join the existing "thread" room and send a message into that room. MSC2836 is used to handle threading of messages in the "thread" room - the replies refer to their parent via an m.relationship field in the event.
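
As a rough illustration of the paragraph above, the sketch below builds the content of a post or reply and describes the two sends, one per room. The `m.relationship` field and `m.reference` type are as described in this post. The helper names, the transaction ID scheme, and the choice to send identical content to both rooms are assumptions made for the example, not necessarily what Cerulean does:

```typescript
// Replies carry an m.relationship field pointing at their parent event.
interface MessageContent {
  msgtype: "m.text";
  body: string;
  "m.relationship"?: { rel_type: "m.reference"; event_id: string };
}

// A new post has no parent; a reply references the parent's event ID.
function buildPostContent(body: string, parentEventId?: string): MessageContent {
  const content: MessageContent = { msgtype: "m.text", body };
  if (parentEventId !== undefined) {
    content["m.relationship"] = { rel_type: "m.reference", event_id: parentEventId };
  }
  return content;
}

// Hypothetical description of a send request; no network I/O happens here.
interface SendRequest {
  method: "PUT";
  path: string;
  body: MessageContent;
}

// Describes posting the same message into the timeline room and the thread
// room. Assumption: both copies use identical content and distinct txn IDs.
function buildDoublePost(
  timelineRoomId: string,
  threadRoomId: string,
  content: MessageContent,
  txnPrefix: string,
): SendRequest[] {
  return [timelineRoomId, threadRoomId].map((roomId, index): SendRequest => ({
    method: "PUT",
    path:
      `/_matrix/client/r0/rooms/${encodeURIComponent(roomId)}` +
      `/send/m.room.message/${encodeURIComponent(`${txnPrefix}-${index}`)}`,
    body: content,
  }));
}

// Example with made-up room and event IDs.
const replyContent = buildPostContent("Nice post!", "$parent_event");
const doublePost = buildDoublePost("!timeline:example.org", "!thread:example.org", replyContent, "txn1");
```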

These semantics play nicely with existing Matrix clients, which will see one room per thread and a flattened chronological view of the thread itself (unless the client natively supports MSC2836, but none do yet apart from Cerulean). However, as Cerulean only navigates threaded messages with an m.reference relationship (e.g. it only ever uses the new /event_relationships API rather than /messages to pull in history), normal messages sent by other Matrix clients into a thread or timeline room will not yet show up in Cerulean.
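
To make the difference between the flattened view and the threaded view concrete, here is a small sketch that rebuilds a reply tree from a flat list of events by following `m.relationship` references. The event shape is trimmed down and `buildThread` is a hypothetical helper, not Cerulean's code. Events with no `m.reference` parent end up in a separate `unthreaded` list, which illustrates why plain messages are invisible to a client that only walks relationships:

```typescript
// Trimmed-down event shape for illustration only.
interface ThreadEvent {
  event_id: string;
  content: {
    body: string;
    "m.relationship"?: { rel_type: string; event_id: string };
  };
}

interface ThreadTree {
  event: ThreadEvent;
  replies: ThreadTree[];
}

// Attaches each event to the parent it references. Events without a known
// m.reference parent (other than the root) are reported as unthreaded.
function buildThread(
  rootId: string,
  events: ThreadEvent[],
): { root: ThreadTree | undefined; unthreaded: ThreadEvent[] } {
  const nodes = new Map<string, ThreadTree>();
  for (const event of events) {
    nodes.set(event.event_id, { event, replies: [] });
  }
  const unthreaded: ThreadEvent[] = [];
  for (const event of events) {
    if (event.event_id === rootId) continue;
    const rel = event.content["m.relationship"];
    const parent =
      rel !== undefined && rel.rel_type === "m.reference" ? nodes.get(rel.event_id) : undefined;
    const node = nodes.get(event.event_id);
    if (parent !== undefined && node !== undefined) {
      parent.replies.push(node);
    } else {
      unthreaded.push(event);
    }
  }
  return { root: nodes.get(rootId), unthreaded };
}

// Example: $1 is the post, $2 replies to $1, $3 replies to $2, $4 is a plain message.
const flatEvents: ThreadEvent[] = [
  { event_id: "$1", content: { body: "post" } },
  { event_id: "$2", content: { body: "reply", "m.relationship": { rel_type: "m.reference", event_id: "$1" } } },
  { event_id: "$3", content: { body: "nested", "m.relationship": { rel_type: "m.reference", event_id: "$2" } } },
  { event_id: "$4", content: { body: "plain message from another client" } },
];
const rebuiltThread = buildThread("$1", flatEvents);
```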

In this initial version, Cerulean literally posts the message twice into both rooms - but we’re also experimenting with the idea of adding “symlinks” to Matrix, letting the canonical version of the event be in the timeline room, and then the instance of the event in the thread room be a ‘symlink’ to the one in the timeline. This means that the threading metadata could be structured in the thread room, and let the user do things like turn their timeline private (or vice versa) without impacting the threading metadata. We could also add an API to both post to timeline and symlink into a thread in one fell swoop, rather than manually sending two events. It’d look something like this:

Cerulean diagram

We also experimented with cross-room threading (letting Bob’s timeline messages directly respond to Alice’s timeline messages and vice versa), but it posed some nasty problems - for instance, to find out what cross-room replies a message has, you’d need to store forward references somehow which the replier would need permission to create. Also, if you didn’t have access to view the remote room, the thread would break. So we’ve punted cross-room threading to a later MSC for now.

Needless to say, once we’re happy with how threading works at the protocol level, we’ll be looking at getting it into the UX of Element and mainstream Matrix chat clients too!

What’s with the decentralised reputation button?

Cerulean is very much a test jig for new ideas (e.g. threading, timeline rooms, peeking), and we’re taking the opportunity to also use it as an experiment for our first forays into publishing and subscribing to reputation greylists; giving users the option to filter out content by default they might not want to see… but doing so on their own terms by subscribing to whatever reputation feed they prefer, while clearly visualising the filtering being applied. In other words, this is the first concrete experimental implementation of the work proposed in the second half of Combating Abuse in Matrix without Backdoors. This is super early days, and we haven’t even published a proto-MSC for the event format being used, but if you’re particularly interested in this domain it’s easy enough to figure out - just head over to #nsfw:dendrite.matrix.org (warning: not actually NSFW, yet) and look in /devtools to see what’s going on.

So, there you have it - further evidence that Matrix is not just for Chat, and a hopefully intriguing taste of the shape of things to come! Please check out the demo at https://cerulean.matrix.org or try playing with your own from https://github.com/matrix-org/cerulean, and then head over to #cerulean:matrix.org and let us know what you think! :)

Gitter now speaks Matrix!

07.12.2020 00:00 — General Matthew Hodgson

Hi all,

It’s been just over 2 months since we revealed that Gitter was going to join Matrix - and we are incredibly proud to announce that Gitter has officially turned on true native Matrix connectivity: all public Gitter rooms are now available natively via Matrix, and all Gitter users now natively exist on Matrix. So, if you wanted to join the official Node.js language support room at https://gitter.im/nodejs/node from Matrix, just head over to #nodejs_node:gitter.im and *boom*, you’re in!
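
The mapping from a Gitter room to its Matrix alias can be sketched as below. This assumes the only transformation is replacing "/" with "_", as in the nodejs/node example above. The real bridge may treat other characters differently, and `gitterRoomToMatrixAlias` is a hypothetical helper name:

```typescript
// Converts "nodejs/node" or "https://gitter.im/nodejs/node" into a Matrix
// room alias on gitter.im. Assumption: "/" maps to "_" and nothing else changes.
function gitterRoomToMatrixAlias(roomUri: string): string {
  const path = roomUri
    .replace(/^https?:\/\/gitter\.im\//, "") // drop the site prefix if present
    .replace(/^\/+|\/+$/g, ""); // trim leading and trailing slashes
  if (path.length === 0) {
    throw new Error("empty Gitter room URI");
  }
  return `#${path.split("/").join("_")}:gitter.im`;
}

const nodeAlias = gitterRoomToMatrixAlias("https://gitter.im/nodejs/node");
```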

This means Gitter is now running a Matrix homeserver at gitter.im which exposes all the active public rooms - so if you go to the room directory in Element (for instance) and select gitter.im as a homeserver, you can jump straight in:

Gitter room directory

Once you’re in, you can chat back and forth transparently between users on the Gitter side and the Matrix side, and you no longer have the ugly “Matrixbot” user faking the messages back and forth - these are ‘real’ users talking directly to one another, and every public msg in every public room is now automatically exposed into Matrix.

Gitter and Matrix going native!

So, suddenly all the developer communities previously living only in Gitter (Scala, Node, Webpack, Angular, Rails and thousands of others) are now available to anyone anywhere on Matrix - alongside communities bridged from Freenode and Slack, and the native Matrix communities for Mozilla, KDE, GNOME etc. We’re hopeful that glueing everything together via Matrix will usher in a new age of open and defragmented dev collaboration, a bit like we used to have on IRC, back in the day.

This is also great news for mobile Gitter users - as the original mobile Gitter clients have been in a holding pattern for over a year, and native Matrix support for Gitter means they are now officially deprecated in favour of Element (or indeed any other mobile Matrix client).

What features are ready?

Now, this is the first cut of native Matrix support in Gitter: much of the time since Gitter joined Element has been spent migrating stuff over from Gitlab to Element, and it’s only really been a month of work so far in hooking up Matrix. As a result: all the important features work, but there’s also stuff that’s yet to land:

Features ready today:

  • Ability to join rooms from Matrix via #org_repo:gitter.im
  • Bridging Edits, Replies (mapped to Threads on Gitter), Deletes, File transfer
  • Bridging Markdown & Emoji

What remains:

  • Ability to send/receive Direct Messages
  • Ability to plumb existing Matrix rooms into Gitter natively
  • Ability to view past Gitter history from Matrix. This is planned thanks to https://github.com/matrix-org/matrix-doc/pull/2716
  • Synchronising the full Gitter membership list to Matrix. Currently the membership syncs incrementally as people speak
  • Turning off the old Gitter bridge
  • Bridging emotes (/me support) (almost landed!)
  • Bridging read receipts
  • Synchronising room avatars
  • Bridge LaTeX

Stuff we’re not planning to support:

  • Ability to join arbitrary rooms on Matrix from Gitter. This could consume huge resources on Gitter, and we’re not in a rush to mirror all of Matrix into Gitter. This will get addressed when Gitter merges with Element into a pure Matrix client.
  • Bridging Reactions. Gitter doesn’t have these natively today, and rather than adding them to Gitter, we’d rather work on merging Gitter & Element together.

For more details, we strongly recommend checking out the native Matrix epic on Gitlab for the unvarnished truth straight from the coal-face!

How do you make an existing chat system talk Matrix?

In terms of the work which has gone into this - Gitter has been an excellent case study of how you can easily plug an existing large established chat system into Matrix.

At high level, the core work needed was as simple as:

  • Add ‘virtual users’, so remote Matrix users can be modelled and represented in Gitter correctly: https://gitlab.com/gitterHQ/webapp/-/merge_requests/2027/diffs.
    • This can be accomplished by simply adding a virtualUser property to your chat message/post/tweet schema which holds the mxid, displayName, and avatar as an alternative to your author field. Then display the virtualUser whenever available over the author.
  • Add an application service to Gitter to bridge traffic in & out of Matrix: https://gitlab.com/gitterHQ/webapp/-/merge_requests/2041/diffs
    • This "application service" comes pre-packaged for you in many cases, so for example you can simply drop in a library like matrix-appservice-bridge in a Node.js application, and all of the Matrix talking complexity is handled for you.
  • Polish it!
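
As a rough illustration of the virtual-user idea above, here is a minimal Python sketch. Gitter itself is a Node.js app backed by MongoDB, so this is not its actual schema: the `VirtualUser` fields mirror the ones named above, while `Author`, `Message` and `display_identity` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Author:
    """A native user of the chat service."""
    username: str
    avatar_url: Optional[str] = None


@dataclass
class VirtualUser:
    """A remote Matrix user, stored inline on the message."""
    mxid: str
    display_name: str
    avatar_url: Optional[str] = None


@dataclass
class Message:
    text: str
    author: Author                              # e.g. the bridge's own account
    virtual_user: Optional[VirtualUser] = None  # set only for bridged messages


def display_identity(msg: Message) -> Tuple[str, Optional[str]]:
    """Return (name, avatar) to render, preferring the virtual user if present."""
    if msg.virtual_user is not None:
        return msg.virtual_user.display_name, msg.virtual_user.avatar_url
    return msg.author.username, msg.author.avatar_url
```

Native messages simply leave `virtual_user` unset, so existing rendering code keeps working unchanged.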

In practice, Eric (lead Gitter dev) laid out the waypoints of the full journey:

  1. First big step was to add the concept of virtual users to Gitter. We could also have created a new Gitter user for every new matrix ID that appears, but tagging them as virtual users is a bit cleaner.
  2. Figuring out how to balance the Matrix traffic coming into/out of Gitter.
    1. Spreading the inbound load comes for free via our existing load-balancer setup (ELB) where we already have 8 webapp servers running the various services of gitter.im. We just run the Matrix bridge on those servers alongside each web and api process, and then the load-balancer’s matrix.gitter.im spreads out to the servers.
    2. Events from Matrix then hit the load balancer and reach one of the servers (no duplication when processing events).
    3. If something on Gitter happens, the action occurs on one server and we just propagate it over to Matrix (no duplication or locking needed).
  3. We have realtime websockets and Faye subscriptions already in the app which are backed by Mongoose database hooks whenever something changes. We just tapped into the same thing to be able to bridge across new information to Matrix as we receive it on Gitter.
  4. Hooking up the official Matrix bridging matrix-appservice-bridge library to use Gitter’s existing MongoDB for storage instead of nedb.
  5. Figuring out how to namespace the mxids of the gitter users:
    1. It’s nice to have the mxid as human readable as possible instead of just the numerical userId in your service.
    2. But people can change their username in your service, whereas an mxid on Matrix can’t be changed. In the future, we’ll have portable accounts in Matrix to support this (MSC2787) but sadly these are still vapourware at this point.
    3. If you naively just switch the user’s mxid when they rename their username, then you could end up leaking conversation history between mxids(!)
    4. So we went with @username-userid:gitter.im for the Matrix ID to make it a bit more human readable but also unique so any renames can happen without affecting anything.
  6. For room aliases, we decided to change our community/room URI syntax to underscores for the room aliases, #community_room:gitter.im
  7. Figuring out how to bridge features correctly:
    1. Emoji - mapping between :shortcode: and unicode emojis
    2. Mapping between Gitter threaded conversations <-> Matrix replies
    3. Mapping between Matrix mentions and Gitter mentions
  8. Keeping users and room data in sync
    1. We haven’t gotten there yet, but the data comes through the same Mongoose hook and we can update the bridged data as they change on the Gitter end.
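
The naming scheme from steps 5 and 6 can be sketched in a few lines of Python. This is only an illustration of the `@username-userid:gitter.im` and `#community_room:gitter.im` patterns described above, not Gitter's actual code; the function names are hypothetical, and lowercasing the username is an assumption made because Matrix user ID localparts are restricted to lowercase characters.

```python
def gitter_mxid(username: str, user_id: str, domain: str = "gitter.im") -> str:
    """Build @username-userid:domain.

    The username keeps the ID human readable; the immutable user ID keeps it
    unique, so two people who hold the same username at different times can
    never end up sharing an mxid.
    """
    return f"@{username.lower()}-{user_id}:{domain}"


def gitter_room_alias(uri: str, domain: str = "gitter.im") -> str:
    """Turn a community/room URI into a #community_room:domain alias."""
    return f"#{uri.replace('/', '_')}:{domain}"
```

For example, `gitter_room_alias("community/room")` gives `#community_room:gitter.im`.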

Meanwhile, the Matrix side of gitter.im is hosted by Element Matrix Services and is a plain old Synapse, talking through to Gitter via the Application Service API. An alternative architecture would be to have got Gitter directly federating with Matrix by embedding a “homeserver library” into it (e.g. embedding Dendrite). However, given Dendrite is still beta and assumes it is storing its data itself (rather than persisting in an existing backend such as Gitter’s mongodb), we went for the simpler option to start with.

It’s been really interesting to see how this has played out week by week in the Gitter updates in This Week in Matrix: you can literally track the progress and see how the integration came to life between Oct 9, Oct 23, Nov 6, Nov 27 and finally Dec 4.

Huge thanks go to Eric Eastwood, the lead dev of Gitter and mastermind behind the project - and also to Half-Shot and Christian who’ve been providing all the support and review from the Matrix bridging team.

What’s next?

First and foremost we’re going to be working through the “What remains” section of the list above: killing off the old bridge, sorting out plumbed rooms, hooking up DMs, importing old Gitter history into Matrix, etc. This should then give us an exceptionally low impedance link between Gitter & Matrix.

Then, as per our original announcement, the plan is:

In the medium/long term, it’s simply not going to be efficient for the combined Element/Gitter team to split our efforts maintaining two high-profile Matrix clients. Our plan is instead to merge Gitter’s features into Element (or next generations of Element) itself and then - if and only if Element has achieved parity with Gitter - we expect to upgrade the deployment on gitter.im to a Gitter-customised version of Element. The inevitable side-effect is that we’ll be adding new features to Element rather than Gitter going forwards.

Now, that means implementing some features in Matrix/Element to match...

  • Instant live room peeking (less than a second to load the webapp into a live-view of a massive room with 20K users!!)
  • Seamless onboarding thanks to using GitLab & GitHub for accounts
  • Curated hierarchical room directory
  • Magical creation of rooms on demand for every GitLab and GitHub project ever
  • GitLab/GitHub activity as a first-class citizen in a room’s side-panel
  • Excellent search-engine-friendly static content and archives
  • KaTeX support for Maths communities
  • Threads!

...and this work is in full swing:

The only bits which aren’t progressing yet are tighter GL/GH integration, and better search engine optimised static archives.

So, the plan is to get cracking on the rest of the feature parity, then merge Gitter & Element together - and meanwhile continue getting the rest of the world into Matrix :)

We live in exciting times: open standards-based interoperable communication is on the rise again, and we hope Gitter’s new life in Matrix is the beginning of a new age of cross-project developer collaboration, at last escaping the fragmentation we’ve suffered over the last few years.

Finally, please do give feedback via Gitter or Matrix (or mail!) on the integration and where you’d like to see it go next!

Thanks for flying Matrix and Gitter,

-- The Matrix Team

Dendrite 0.3.0 released

16.11.2020 17:44 — Releases Matthew Hodgson

Hi all,

Heads up that we just cut another beta release of Dendrite - now at 0.3.0!

This is a really fun release given almost all the changes are contributed from the wider community - so huge thanks to S7evinK, MayeulC and felix!

The main new feature is full Read Receipt support thanks to S7evinK, which makes an enormous perceptual improvement when using Dendrite - so especial thanks are due there :)

So, if you're interested in helping us test, please spin up a copy from https://github.com/matrix-org/dendrite and let us know how it goes - and if you're already running one, now is an excellent time to upgrade!

Full changelog (including 0.2.1, which we forgot to blog about) follows:

Dendrite 0.3.0 (2020-11-16)

Features

  • Read receipts (both inbound and outbound) are now supported (contributed by S7evinK)
  • Forgetting rooms is now supported (contributed by S7evinK)
  • The -version command line flag has been added (contributed by S7evinK)

Fixes

  • User accounts that contain the = character can now be registered
  • Backfilling should now work properly on rooms with world-readable history visibility (contributed by MayeulC)
  • The gjson dependency has been updated for correct JSON integer ranges
  • Some more client event fields have been marked as omit-when-empty (contributed by S7evinK)
  • The build.sh script has been updated to work properly on all POSIX platforms (contributed by felix)

Dendrite 0.2.1 (2020-10-22)

Fixes

  • Forward extremities are now calculated using only references from other extremities, rather than including outliers, which should fix cases where state can become corrupted (#1556)
  • Old state events will no longer be processed by the sync API as new, which should fix some cases where clients incorrectly believe they have joined or left rooms (#1548)
  • More SQLite database locking issues have been resolved in the latest events updater (#1554)
  • Internal HTTP API calls are now made using H2C (HTTP/2) in polylith mode, mitigating some potential head-of-line blocking issues (#1541)
  • Roomserver output events no longer incorrectly flag state rewrites (#1557)
  • Notification levels are now parsed correctly in power level events (gomatrixserverlib#228, contributed by Pestdoktor)
  • Invalid UTF-8 is now correctly rejected when making federation requests (gomatrixserverlib#229, contributed by Pestdoktor)

How we fixed Synapse's scalability!

03.11.2020 00:00 — Releases Matthew Hodgson

Hi all,

We had a major break-through in Synapse 1.22 which we want to talk about in more detail: Synapse now scales horizontally across multiple python processes.

Horizontal scaling means that you can support more users and traffic by adding in more python processes (spread over more machines, if necessary) without there being a single bottleneck which all the traffic is passing through - as opposed to vertical scaling where you make things go faster overall by making the bottleneck go faster.

After many years of having to vertically scale Synapse (by trying to make the main process go faster) we’re now finally at the point where you can configure Synapse so that messages no longer flow through the main process - eliminating the bottleneck entirely. What’s more, the Matrix.org homeserver has now been successfully running in this config and enjoying the massive scalability improvements for the last 2 weeks! Huge kudos goes to Erik and the wider Synapse team for pulling this off.

Some readers might wonder how this ties in with Dendrite entering beta, given one of Dendrite’s design goals is full horizontal scalability. The answer is that we’re very much using Dendrite for experimentation and next-gen stuff at the moment (currently focused more on scaling downwards for P2P rather than scaling upwards for megaservers) - while Synapse is the stable and long-term supported option.

So, that’s the context - now over to Erik with more than you could possibly ever want to know about how we actually did it...

Background

Synapse started life off back in 2014 as a single monolithic python process, and for quite a while we made it scale by adding more and more in-memory caches to speed things up by avoiding hitting the database (at the expense of RAM). It looked like this:

Eventually the caches stopped helping and we needed more than one thread of execution in order to spread CPU across multiple cores. Python’s Global Interpreter Lock (GIL) means that a Python process can effectively only use one CPU core at a time, so starting more threads doesn’t help with scalability - you have to run multiple processes.

Now, the vast majority of the work that Synapse does is related to “streams”. These are append only sequences of rows, such as the events stream, typing stream, receipts stream, etc. When a new event arrives (for example) we write it to the events stream, and then notify anything waiting that there has been an update. The /sync endpoint, for instance, will wait for updates to streams and send them down to long-polling Matrix clients.

Streams support being added to concurrently, so have a concept of the “persisted up-to position”. This is the point where all rows before that point have finished persisting. Readers only read up to the current “persisted up-to position”, so that they don’t skip updates that haven’t finished persisting at that point. (E.g. if two events A and B get assigned positions 5 and 6, but B finishes persisting first, then the persisted up-to position will remain at 4 until A finishes persisting, and then it jumps to 6).
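
To make the A/B example concrete, here is a minimal single-writer sketch in Python. It is not Synapse's implementation; the class and method names are hypothetical, and it only models the bookkeeping described above.

```python
class StreamPositionTracker:
    """Tracks the persisted up-to position for one stream with one writer."""

    def __init__(self, current: int = 0) -> None:
        self._last_allocated = current
        self._in_flight = set()  # positions allocated but not yet persisted

    def allocate(self) -> int:
        """Assign the next stream position to a row that is about to be written."""
        self._last_allocated += 1
        self._in_flight.add(self._last_allocated)
        return self._last_allocated

    def finish(self, position: int) -> None:
        """Mark a row as fully persisted."""
        self._in_flight.remove(position)

    @property
    def persisted_up_to(self) -> int:
        """Highest position such that it and everything before it is persisted."""
        if self._in_flight:
            return min(self._in_flight) - 1
        return self._last_allocated
```

Starting from position 4, allocating 5 (A) and 6 (B) and finishing B first leaves `persisted_up_to` at 4; finishing A then moves it straight to 6.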

To split any meaningful amount of work into separate processes, we need to add a mechanism where processes can be told that updates to streams have happened (otherwise they’d have to repeatedly poll the DB, which would be deeply inefficient). The architecture ended up being one where we had the “main” process that streams updates via a custom replication protocol (initially long-polling HTTP; later custom TCP) to any number of “worker” processes. This meant that we could move sync stream handling (and other read apis) off the main process and onto workers, but also that all database writes had to go through the single main process (as it was a star topology, the main process could talk to all workers but workers could only talk to the main process and not each other).

2020-11-03-synapse2.png

As an aside: cache invalidations also had to be streamed down the replication connections, which has the side effect that we could only cache things that would only be invalidated on the main process.

We continued to move more and more read APIs out onto separate workers. We also added workers in front of the main process that would e.g. handle the creation of the new events, authenticating, etc, and then call out to the main process with the event for it to persist the event.

Moving writes off the main process

Eventually we ran out of stuff to move out of the main process that didn’t involve writing to the DB. To write stuff from other processes we needed a way for the workers to stream updates to each other. The easiest and most obvious way was to just use Redis and its pub/sub support.

2020-11-03-synapse3.png

This almost allowed us to move writing of a particular stream to a different worker, except writing to streams generally also meant invalidating caches which in itself requires writing to a stream. We needed a way of writing to the cache invalidation stream from multiple workers at once.

Sharding the cache invalidation thankfully turned out to be easy, as workers would simply call the cache invalidation function whenever they get an invalidation notice over replication. In particular, the ordering of invalidations from different workers doesn’t matter and so there isn’t a need to calculate a single “persisted up-to position”. Sharding then just becomes a case of adding the name of the worker that is writing the update to the replication stream, and then workers reading from it can basically treat the cache stream the same as if they were multiple streams, one per worker.

This then unlocks the ability to move writing of streams off the main process and onto different workers - and so we added the “event persister” worker for offloading the main event stream off the main process:

2020-11-03-synapse4.png

Sharding the events stream

Eventually the worker responsible for doing nothing but persisting events started maxing out CPU. This meant that we had to look at sharding the events stream, i.e. writing to it from multiple workers.

This is more complicated than sharding the cache invalidation stream as the ordering of the events does matter; we send them down sync streams, in order, with a token that indicates where the sync stream is up to in the events stream. This means that workers need to be able to calculate a “persisted up-to position” when getting updates from different workers.

The easiest way of doing that is to simply set the persisted up-to position as the minimum position received over replication from all active writers. This works, except events would only be processed after all other writers have subsequently written events (to advance the persisted position past the point at which the event was written), which can add a lot of latency depending on how often events are written.

A refinement is to note that if you have a persisted up-to position of 10, then receive updates at sequential positions 11, 12, 13 and 14, you know that everything between 10 and 14 has finished persisting (as you received updates about them), and so can set the persisted up-to position to 14. Annoyingly, it’s not required that positions are sequential without gaps (due to various technical considerations), and so in the worst case this still has the same problems as the naïve solution.
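
The two approaches just described can be sketched as follows. This is an illustrative Python model rather than Synapse code, and both function names are hypothetical.

```python
from typing import Dict, Iterable


def naive_persisted_up_to(last_seen: Dict[str, int]) -> int:
    """Naive approach: the minimum position seen from any active writer."""
    return min(last_seen.values())


def advance_if_contiguous(persisted: int, received: Iterable[int]) -> int:
    """Refinement: advance through positions received without gaps."""
    pending = set(received)
    while persisted + 1 in pending:
        persisted += 1
    return persisted
```

With writers at `{"w1": 14, "w2": 10}` the naive position is stuck at 10 until `w2` writes again. The refinement moves from 10 to 14 after receiving 11, 12, 13 and 14, but only reaches 11 if position 12 is a legitimate gap, which is the worst case mentioned above.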

To avoid these problems we change the persisted up-to position to be a vector clock of positions; tracking a vector of positions - one per writer. This still allows answering the query of “get all events after token X” (as events are written with the position and the name of the writer). The persisted up-to position is then calculated by just tracking the last position seen to arrive over replication from each writer.
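
A minimal sketch of the vector clock idea, assuming events are stored with the writer's name and position as described. The `Event` tuple and both helper functions are hypothetical names for illustration, not Synapse's actual data model.

```python
from typing import Dict, List, NamedTuple


class Event(NamedTuple):
    writer: str     # name of the worker that persisted the event
    position: int   # stream position assigned to the event
    event_id: str


def advance(clock: Dict[str, int], writer: str, position: int) -> None:
    """Record the latest position seen over replication from one writer."""
    clock[writer] = max(position, clock.get(writer, 0))


def events_after(events: List[Event], since: Dict[str, int],
                 upto: Dict[str, int]) -> List[str]:
    """IDs of events newer than `since` that are persisted according to `upto`."""
    return [
        e.event_id for e in events
        if since.get(e.writer, 0) < e.position <= upto.get(e.writer, 0)
    ]
```

Each writer's entry advances independently, so a quiet writer no longer holds back events persisted by a busy one.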

This allows writing events from multiple workers, while ensuring that other workers can correctly keep track of a “persisted up-to position”. Then it's just a matter of inspecting the code to ensure that it does not assume that it is the only writer to the stream. In the case of writing to the events stream, we note that the function persisting events assumes it's the only writer for a given room, so when sharding we have to ensure that there are no concurrent writes to the same room. This is most easily done by sharding based on room ID, and ensuring that the mapping of room ID to worker does not change (without coordination).
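
One simple way to get a stable room-to-worker mapping is to hash the room ID, sketched below in Python. The choice of SHA-256 and the function name are assumptions for illustration; the text above only requires that the mapping is deterministic and does not change without coordination.

```python
import hashlib
from typing import Sequence


def persister_for_room(room_id: str, persisters: Sequence[str]) -> str:
    """Pick the event persister for a room from a fixed, ordered worker list."""
    # Use a cryptographic digest rather than Python's built-in hash(), which is
    # salted per process for strings and so would differ between workers.
    digest = hashlib.sha256(room_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(persisters)
    return persisters[index]
```

Every process must be configured with the same worker list in the same order; changing the list remaps rooms, which is why it cannot safely be done while the system is running.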

The only thing left is to then encode the vector clock position into the sync tokens. We want to ensure that these tokens are not too long, as they get included as query string parameters (e.g. the since= parameter of /sync). By assigning persistent unique integer IDs to workers the vector clock can be persisted as a sequence of pairs of integers, which is relatively few bytes so long as we don’t have too many workers writing to the events stream. We can further reduce the size of the tokens by calculating an integer “persisted up-to position” as we did before, encoding that and only including positions for workers that are larger than the integer persisted up-to position. (The idea here is that most of the time only a small number of workers will be ahead of the calculated persisted up-to position, and so we only need to encode those).
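
The compression trick can be sketched like this. The textual format (`m<base>~<worker>.<position>`) is invented for the example and is not claimed to match Synapse's real sync tokens; the integer position is taken to be the minimum across writers, as in the naive approach earlier.

```python
from typing import Dict, Iterable


def encode_token(clock: Dict[int, int]) -> str:
    """Encode a vector clock keyed by integer worker ID into a short string."""
    base = min(clock.values())  # the integer persisted up-to position
    parts = [f"m{base}"]
    for worker_id in sorted(clock):
        if clock[worker_id] > base:  # only include workers ahead of the base
            parts.append(f"{worker_id}.{clock[worker_id]}")
    return "~".join(parts)


def decode_token(token: str, worker_ids: Iterable[int]) -> Dict[int, int]:
    """Rebuild the vector clock; workers not listed sit at the base position."""
    head, *rest = token.split("~")
    base = int(head[1:])
    clock = {worker_id: base for worker_id in worker_ids}
    for part in rest:
        worker_id, position = part.split(".")
        clock[int(worker_id)] = int(position)
    return clock
```

A clock of `{1: 100, 2: 100, 3: 103}` encodes to `m100~3.103`, and when every writer is at the same position the token collapses to a single integer such as `m5`.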

And this is what we have today:

2020-11-03-synapse5.png

The major limitation of the current situation is that you can’t dynamically add/remove workers which persist events, as the sharding by room ID is calculated at startup, and so changing it requires restarting the whole system. This could be replaced by any system that allowed coordination over which persister is allowed to write to a room at any given point. This is likely tricky to get right in practice, but it would allow dynamic auto-scaling of deployments, or automatic recovery from a worker that gets wedged or dies.

Finally, it’s worth noting that sharding event persisters isn’t the only performance work that’s been going on - switching everything over to python 3 and async twisted has helped, along with lots of smaller optimisations on the hot paths, and further rebalancing workers (e.g. moving background jobs off the master process to dedicated workers). We’ve also benefited a lot from the maintainability of rolling out mypy typing throughout the codebase. And next up, we’ll be going back to speeding up the codebase as a whole - starting with algorithmic state resolution improvements! 🎉

Performance

So, how does it stack up?

Here’s the send time heatmap on Matrix.org showing the step change on Oct 16th when we rolled out the second event persister (full disclosure: this also coincides with moving background processes off the main Synapse process to a background worker). As you can see, we go from messages being spread over a huge range of durations (up to several seconds) to the sweet spot being 50ms or less - a spectacular improvement!

2020-11-03-synapse-heatmap.png

Meanwhile, here’s the actual CPU utilisation as we split the traffic from a single event persister (yellow) to two persisters (one yellow, one blue), showing the sharding beautifully horizontally balancing CPU between the two active/active worker processes:

2020-11-03-synapse-cpu.png

We’ve yet to loadtest to see just how fast we can go now (before we start hitting bottlenecks on the postgres cluster), but it sure feels good to have all our CPU headroom back on Matrix.org again, ready for the next wave of users to arrive.

Conclusion

So there you have it: folks running massive homeservers (50K+ concurrent users) like Matrix.org (and cough various high profile public sector deployments) are no longer held hostage by the bottleneck of the main synapse process and should feel free to experiment with setting up event persister workers to handle high traffic loads. Otherwise, if you can spread your users over smaller servers, that’s also a good bet (assuming they don’t have massively overlapping room membership, like we see on Matrix.org.)

The current worker documentation is up-to-date, although does assume you are already very familiar with how to administer Synapse. It’s also very much subject to change, as we keep adding new workers and improving the architecture. However, now is a pretty good time to get involved if you’re interested in large-scale Matrix deployments.

-- The Synapse Team

Dendrite 0.2.0 released

20.10.2020 19:35 — Releases Matthew Hodgson

Hi all,

It's been over a week since our next-generation homeserver Dendrite entered beta, and it's been a wild rollercoaster ride as the team has been frantically zapping all the initial teething issues that came up - mostly around room federation getting 'stuck' due to needing to fix bugs in how room state is managed. Huge huge thanks to everyone who has spun up a Dendrite to experiment and report bugs!

We're now in an impressively better place, and it's feeling way more stable now (but please don't trust it with your data yet). So we've skipped 0.1.x and jumped straight to 0.2.0.

Now would be a great time for more intrepid explorers to try spinning up a server from https://github.com/matrix-org/dendrite and see how it feels - the more feedback the better. And if you got scared off by weird bugs in 0.1.0, now's the right time to try it again!

Full changelog follows:

Dendrite 0.2.0 (2020-10-20)

Important

  • This release makes breaking changes for polylith deployments, since they now use the multi-personality binary rather than separate binary files
    • Users of polylith deployments should revise their setups to use the new binary - see the Features section below
  • This release also makes breaking changes for Docker deployments, as we are now publishing images to Docker Hub in separate repositories for monolith and polylith
    • New repositories are as follows: matrixdotorg/dendrite-monolith and matrixdotorg/dendrite-polylith
    • The new latest tag will be updated with the latest release, and new versioned tags, e.g. v0.2.0, will preserve specific release versions
    • Sample Compose configs have been updated - if you are running a Docker deployment, please review the changes
    • Images for the client API proxy and federation API proxy are no longer provided as they are unsupported - please use nginx (or another reverse proxy) instead

Features

  • Dendrite polylith deployments now use a special multi-personality binary, rather than separate binaries
    • This is cleaner, builds faster and simplifies deployment
    • The first command line argument states the component to run, e.g. ./dendrite-polylith-multi roomserver
  • Database migrations are now run at startup
  • Invalid UTF-8 in requests is now rejected (contributed by Pestdoktor)
  • Fully read markers are now implemented in the client API (contributed by Lesterpig)
  • Missing auth events are now retrieved from other servers in the room, rather than just the event origin
  • m.room.create events are now validated properly when processing a /send_join response
  • The roomserver now implements KindOld for handling historic events without them becoming forward extremity candidates, i.e. for backfilled or missing events

Fixes

  • State resolution v2 performance has been improved dramatically when dealing with large state sets
  • The roomserver no longer processes outlier events if they are already known
  • A SQLite locking issue in the previous events updater has been fixed
  • The client API /state endpoint now correctly returns state after the leave event, if the user has left the room
  • The client API /createRoom endpoint now sends cumulative state to the roomserver for the initial room events
  • The federation API /send endpoint now correctly requests the entire room state from the roomserver when needed
  • Some internal HTTP API paths have been fixed in the user API (contributed by S7evinK)
  • A race condition in the rate limiting code resulting in concurrent map writes has been fixed
  • Each component now correctly starts a consumer/producer connection in monolith mode (when using Kafka)
  • State resolution is no longer run for single trusted state snapshots that have been verified before
  • A crash when rolling back the transaction in the latest events updater has been fixed
  • Typing events are now ignored when the sender domain does not match the origin server
  • Duplicate redaction entries no longer result in database errors
  • Recursion has been removed from the code path for retrieving missing events
  • QueryMissingAuthPrevEvents now returns events that have no associated state as if they are missing
  • Signing key fetchers no longer ignore keys for the local domain, if retrieving a key that is not known in the local config
  • Federation timeouts have been adjusted so we don't give up on remote requests so quickly
  • create-account no longer relies on the device database (contributed by ThatNerdyPikachu)

Known issues

  • Old events can incorrectly appear in /sync as if they are new when retrieving missing events from federated servers, causing them to appear at the bottom of the timeline in clients
  • Memory can explode when catching up after a federation outage.

Combating abuse in Matrix - without backdoors.

19.10.2020 00:00 — General Matthew Hodgson

UPDATE: Nov 9th 2020

Not only are UK/US/AU/NZ/CA/IN/JP considering mandating backdoors, but it turns out that the Council of the European Union is working on it too, having created an advanced Draft Council Resolution on Encryption as of Nov 6th, which could be approved by the Council as early as Nov 25th. This doesn't directly translate into EU legislation, but would set the direction for subsequent EU policy.

Even though the Draft Council Resolution does not explicitly call for backdoors, the language used...

Competent authorities must be able to access data in a lawful and targeted manner

...makes it quite clear that they are seeking the ability to break encryption on demand: i.e. a backdoor.

Please help us spread the word that backdoors are fundamentally flawed - read on for the rationale, and an alternative approach to combatting online abuse.


Hi all,

Last Sunday (Oct 11th 2020), the UK Government published an international statement on end-to-end encryption and public safety, co-signed by representatives from the US, Australia, New Zealand, Canada, India and Japan. The statement is well written and well worth a read in full, but the central point is this:

We call on technology companies to [...] enable law enforcement access to content in a readable and usable format where an authorisation is lawfully issued, is necessary and proportionate, and is subject to strong safeguards and oversight.

In other words, this is an explicit request from seven of the biggest governments in the world to mandate a backdoor in end-to-end encrypted (E2EE) communication services: a backdoor to which the authorities have a secret key, letting them view communication on demand. This is big news, and is of direct relevance to Matrix as an end-to-end encrypted communication protocol whose core team is currently centred in the UK.

Now, we sympathise with the authorities’ predicament here: we utterly abhor child abuse, terrorism, fascism and similar - and we did not build Matrix to enable it. However, trying to mitigate abuse with backdoors is, unfortunately, fundamentally flawed.

  • Backdoors necessarily introduce a fatal weak point into encryption for everyone, which then becomes the ultimate high value target for attackers. Anyone who can determine the secret needed to break the encryption will gain full access, and you can be absolutely sure the backdoor key will leak - whether that’s via intrusion, social engineering, brute-force attacks, or accident. And even if you unilaterally trust your current government to be responsible with the keys to the backdoor, is it wise to unilaterally trust their successors? Computer security is only ever a matter of degree, and the only way to keep a secret like this safe is for it not to exist in the first place.

  • End-to-end encryption is nowadays a completely ubiquitous technology; an attempt to legislate against it is like trying to turn back the tide or declare a branch of mathematics illegal. Even if Matrix did compromise its encryption, users could easily use any number of other approaches to additionally secure their conversations - from PGP, to OTR, to using one-time pads, to sharing content in password-protected ZIP files. Or they could just switch to an E2EE chat system operating from a jurisdiction without backdoors.

  • Governments protect their own data using end-to-end encryption, precisely because they do not want other governments being able to snoop on them. So not only is it hypocritical for governments to argue for backdoors, it immediately puts their own governmental data at risk of being compromised. Moreover, creating infrastructure for backdoors sets an incredibly bad precedent to the rest of the world - where less salubrious governments will inevitably use the same technology to the massive detriment of their citizens’ human rights.

  • Finally, in Matrix’s specific case: Matrix is an encrypted decentralised open network powered by open source software, where anyone can run a server. Even if the Matrix core team were obligated to add a backdoor, this would be visible to the wider world - and there would be no way to make the wider network adopt it. It would just damage the credibility of the core team, push encryption development to other countries, and the wider network would move on irrespectively.

In short, we need to keep E2EE as it is so that it benefits the 99.9% of people who are good actors. If we enforce backdoors and undermine it, then the bad 0.1% will simply switch to non-backdoored systems while the 99.9% are left vulnerable.

We’re not alone in thinking this either: the GDPR (the world-leading regulation towards data protection and privacy) explicitly calls out robust encryption as a necessary information security measure. In fact, the risk of US governmental backdoors explicitly caused the European Court of Justice to invalidate the Privacy Shield for EU->US data. The position of the seven governments here (alongside recent communications by the EU commissioner on the ‘problem’ of encryption) is a significant step back on the protection of the fundamental right of privacy.

So, how do we solve this predicament for Matrix?

Thankfully: there is another way.

This statement from the seven governments aims to protect the general public from bad actors, but it clearly undermines the good ones. What we really need is something that empowers users and administrators to identify and protect themselves from bad actors, without undermining privacy.

What if we had a standard way to let users themselves build up and share their own views of whether other users, messages, rooms, servers etc. are obnoxious or not? What if you could visualise and choose which filters to apply to your view of Matrix?

Just like the Web, Email or the Internet as a whole, there is literally no way to unilaterally censor or block content in Matrix. But what we can do is provide first-class infrastructure to let users (and room/community moderators and server admins) make up their own mind about who to trust, and what content to allow. This would also provide a means for authorities to publish reputation data about illegal content, providing a privacy-respecting mechanism that admins/mods/users can use to keep illegal content away from their servers/clients.

The model we currently have in mind is:

  • Anyone can gather reputation data about Matrix rooms / users / servers / communities / content, and publish it to as wide or narrow an audience as they like - providing their subjective score on whether something in Matrix is positive or negative in a given context.
  • This reputation data is published in a privacy preserving fashion - i.e. you can look up reputation data if you know the ID being queried, but the data is stored pseudonymised (e.g. indexed by a hashed ID).
  • Anyone can subscribe to reputation feeds and blend them together in order to inform how they filter their content. The feeds might be their own data, or from their friends, or from trusted sources (e.g. a fact-checking company). Their blended feed can be republished as their own.
  • To prevent users getting trapped in a factional filter bubble of their own devising, we’ll provide UI to visualise and warn about the extent of their filtering - and make it easy and fun to shift their viewpoint as needed.
  • Admins running servers in particular jurisdictions then have the option to enforce whatever rules they need on their servers (e.g. they might want to subscribe to reputation feeds from a trusted source such as the IWF, identifying child sexual abuse content, and use it to block it from their server).
  • This isn’t just about combating abuse - but the same system can also be used to empower users to filter out spam, propaganda, unwanted NSFW content, etc on their own terms.
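To make the model above a little more concrete, here is a minimal sketch of publishing scores indexed by a hashed ID and blending several feeds together. Everything in it is an assumption for illustration only: the choice of SHA-256, the score range of -1.0 to 1.0, the weighted-average blend and the function names are not part of any Matrix proposal.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Feed = Dict[str, float]  # hashed entity ID -> subjective score (assumed range -1.0..1.0)


def pseudonymise(entity_id: str) -> str:
    """Index key for an entity: a hash of its ID (SHA-256 is an assumption)."""
    return hashlib.sha256(entity_id.encode("utf-8")).hexdigest()


def make_feed(scores: Dict[str, float]) -> Feed:
    """Build a feed that stores scores under hashed IDs rather than raw IDs."""
    return {pseudonymise(entity_id): score for entity_id, score in scores.items()}


def blend(feeds: List[Tuple[Feed, float]]) -> Feed:
    """Blend feeds as a weighted average over whichever feeds score each entity."""
    totals: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for feed, weight in feeds:
        if weight <= 0:
            continue  # ignore feeds the subscriber gives no trust to
        for key, score in feed.items():
            totals[key] = totals.get(key, 0.0) + weight * score
            weights[key] = weights.get(key, 0.0) + weight
    return {key: totals[key] / weights[key] for key in totals}


def lookup(feed: Feed, entity_id: str) -> Optional[float]:
    """You can only look up a score if you already know the ID being queried."""
    return feed.get(pseudonymise(entity_id))


mine = make_feed({"@spammer:example.org": -1.0})
friend = make_feed({"@spammer:example.org": -0.5, "#cats:example.org": 1.0})
blended = blend([(mine, 2.0), (friend, 1.0)])  # trust my own data twice as much
```

The blended feed has the same shape as its inputs, so it could be republished as a feed in its own right. A plain hash only gives weak pseudonymisation (well-known IDs can be guessed and hashed), so a real design would need something stronger.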

This forms a relative reputation system. As uncomfortable as it may be, one man’s terrorist is another man’s freedom fighter, and different jurisdictions have different laws - and it’s not up to the Matrix.org Foundation to play God and adjudicate. Each user/moderator/admin should be free to make up their own mind and decide which reputation feeds to align themselves with. That is not to say that this system would help users locate extreme content - the privacy-preserving nature of the reputation data means that it’s only useful to filter out material which would otherwise already be visible to you - not to locate new content.

In terms of how this interacts with end-to-end-encryption and mitigating abuse: the reality is that the vast majority of abuse in public networks like Matrix, the Web or Email is visible from the public unencrypted domain. Abusive communities generally want to attract/recruit/groom users - and that means providing a public front door, which would be flagged by a reputation system such as the one proposed above. Meanwhile, communities which are entirely private and entirely encrypted typically still have touch-points with the rest of the world - and even then, the chances are extremely high that they will avoid any hypothetical backdoored servers. In short, investigating such communities requires traditional infiltration and surveillance by the authorities rather than an ineffective backdoor.

Now, this approach may sound completely sci-fi and implausibly overambitious (our speciality!) - but we’ve actually started successfully building this already, having been refining the idea over the last few years. MSC2313 is a first cut at the idea of publishing and subscribing to reputation data - starting off with simple binary ban rules. It’s been implemented and in production for over a year now, and is used to maintain shared banlists used by both matrix.org and mozilla.org communities. The next step is to expand this to support a blendable continuum of reputation data (rather than just binary banlists), make it privacy preserving, and get working on the client UX for configuring and visualising them.
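As a rough illustration of what a simple binary ban rule looks like in practice, here is a sketch that checks an entity against a shared banlist. It assumes each rule carries an `entity` glob, a `recommendation` and a `reason`, with `m.ban` as the recommendation, which is our reading of MSC2313; the helper names and the example IDs are made up, and this is not how any particular homeserver or bot implements it.

```python
import re
from typing import Dict, List, Optional


def glob_to_regex(glob: str):
    """Translate a glob using only * and ? wildcards into a compiled regex."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matching_ban(entity: str, rules: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the first ban rule whose glob matches the whole entity, if any."""
    for rule in rules:
        if rule.get("recommendation") != "m.ban":
            continue  # only binary ban rules are understood in this sketch
        if glob_to_regex(rule.get("entity", "")).fullmatch(entity):
            return rule
    return None


# Hypothetical rule contents, as they might appear in a shared banlist room.
banlist = [
    {"entity": "@spam*:example.org", "recommendation": "m.ban", "reason": "spam"},
    {"entity": "evil.example.com", "recommendation": "m.ban", "reason": "abuse"},
]
```

Because the list lives in a room, several communities can subscribe to the same rules, which is what makes the banlist shareable.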

Finally: we are continuing to hire a dedicated Reputation Team to work full time on building this (kindly funded by Element). This is a major investment in the future of Matrix, and frankly is spending money that we don’t really have - but it’s critical to the long-term success of the project, and perhaps the health of the Internet as a whole. There’s nothing about a good relative reputation system which is particularly specific to Matrix, after all, and many other folks (decentralised and otherwise) are clearly in desperate need of one too. We are actively looking for funding to support this work, so if you’re feeling rich and philanthropic (or a government wanting to support a more enlightened approach) we would love to hear from you at [email protected]!

Here’s to a world where users have excellent tools to protect themselves online - and a world where their safety is not compromised by encryption backdoors.

-- The Matrix.org Core Team

Comments at HN, lobste.rs, r/linux and LWN

Dendrite is entering Beta!

08.10.2020 00:00 — Releases Matthew Hodgson

Hi all,

We’re very excited to announce that Dendrite, the next-generation Matrix homeserver from the core Matrix team, is at last exiting alpha development and entering beta testing!

The path we’ve taken to get here has been quite a curious one, and it’s worth recapping to give context on why it’s taken reality a little while to catch up with the dream. :)

The Dendrite project has its roots in 2016 as Dendron: an attempt to write a next-generation homeserver in Golang rather than Python, in order to benefit from Go’s stronger typing, ease of profiling (no twisted stack-shredding via deferredInlineCallbacks), multithreading and faster GC performance. The idea for Dendron was to do a strangler pattern rewrite of Synapse - where we’d insert Dendron in front of Synapse as a load balancer, and incrementally replace Synapse’s API endpoints with ones implemented by Dendron.

However, as the project started to progress, it became clear that this was going to end up with many of Synapse’s architectural choices being baked into the project - particularly the DB schema and data flow architecture, such that the new endpoints could interoperate with the existing Python ones. We got as far as putting Dendron live on matrix.org and moving some of the login/registration APIs over to it… but then work fizzled out due to Synapse demanding more urgent attention as traffic grew on Matrix.org, combined with concerns about whether Dendron was the right approach in general.

So, towards the end of 2016 (after the rush to launch Vector, later renamed Riot and now Element, that summer), we went back to the drawing board to devise Dendrite - “Dendron done right!” - as opposed to Dendron, which in retrospect was Dendrite done wrong. ;) The new vision was:

  • Build a massively horizontally scalable architecture, such that large Matrix deployments like matrix.org and big government deployments could run smoothly without the constant scalability headaches we were seeing at the time with Synapse
  • Do so by splitting the server into well-defined microservice components, each of which could independently horizontally scale, each with its own DB (if desired)
  • Connect the components together with a set of append-only logs via Kafka or similar, easily letting components shard and maintain their databases from the logs, allowing rolling upgrades, possibly schema upgrades, and all sorts of other niceties. The logs effectively become a primary source of truth rather than putting all the onus on a massive monolithic ever-growing database

Rather than Dendron’s top-down approach, instead Dendrite started bottom-up with the very hardest bit: gomatrixserverlib, a standalone Go library implementing the state resolution algorithms and performing federation requests (such that it might also someday be used as a general purpose way to add Matrix federation support to an existing Go codebase).

Then we started building out the various components to implement the various services, starting with the roomserver (the service which models the history and state of one or more rooms in the server), then the syncserver (the service which implements the /sync API to let clients receive messages), etc. We even implemented a simplified in-memory version of Kafka named naffka—useful for glueing together the microservice components when running them all within a single binary.
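The append-only log idea is easier to see with a toy example. The sketch below is not Dendrite or naffka code: it is a hypothetical in-memory log plus one consumer that builds its own small "database" purely by replaying the log from a remembered offset. The class names and event fields are assumptions for illustration.

```python
from typing import Dict, List, Set


class AppendOnlyLog:
    """A minimal in-memory log: entries are only ever appended, never changed."""

    def __init__(self) -> None:
        self._entries: List[dict] = []

    def append(self, entry: dict) -> int:
        """Append an entry and return its offset in the log."""
        self._entries.append(entry)
        return len(self._entries) - 1

    def read_from(self, offset: int) -> List[dict]:
        """Return every entry at or after the given offset."""
        return self._entries[offset:]


class MembershipView:
    """A toy component whose state is derived entirely from the log."""

    def __init__(self, log: AppendOnlyLog) -> None:
        self.log = log
        self.offset = 0  # how far into the log this component has consumed
        self.members: Dict[str, Set[str]] = {}

    def catch_up(self) -> None:
        """Apply any log entries this component has not yet seen."""
        for event in self.log.read_from(self.offset):
            room = self.members.setdefault(event["room_id"], set())
            if event["membership"] == "join":
                room.add(event["user_id"])
            else:
                room.discard(event["user_id"])
            self.offset += 1


log = AppendOnlyLog()
log.append({"room_id": "!a:example.org", "user_id": "@alice:example.org", "membership": "join"})
log.append({"room_id": "!a:example.org", "user_id": "@bob:example.org", "membership": "join"})
log.append({"room_id": "!a:example.org", "user_id": "@alice:example.org", "membership": "leave"})

view = MembershipView(log)
view.catch_up()
```

A second `MembershipView` created later and pointed at the same log would replay it and arrive at the same state, which is what lets a component rebuild or shard its database from the log.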

Things were looking pretty positive by the summer of 2017: we had the server sending/receiving messages, federating with Synapse, and looking tantalisingly close:

We just sent the first ever synapse->dendrite federated traffic, including full dendrite media API (thumbnailing, fed, etc)!!! :D :D :D pic.twitter.com/sBcM2jMAr6

— Matrix (@matrixdotorg) June 8, 2017

However, we then hit three fairly major obstacles:

  • Matrix lost its funding
  • In the ensuing uncertainty, the two lead developers (Mjark & Kegan) went to work elsewhere
  • Meanwhile, Matrix uptake was starting to explode and Synapse was failing to scale to handle the traffic on matrix.org (and elsewhere)

At first, having formed what would become New Vector (now Element) to keep the rest of the core team hired, we pushed to see if we could get Dendrite finished fast enough to replace Synapse, with Erik & richvdh jumping over from Synapse to pick up the remaining work. However, it became clear that we urgently needed a quicker solution to address all the overloaded Synapses out there, and so they swung back to focus on improving Synapse (taking inspiration from some of the design of Dendrite - e.g. offloading endpoints onto worker processes connected via replication streams, and using OpenTracing to debug traffic as it flows over the various services).

At this point, Dendrite maintenance was in effect valiantly taken over by the community, with Brendan and later Anoa keeping the ball going in 2017, joined by APWhitehat in GSoC 2018 and cnly in GSoC 2019. The fact that Dendrite is now here today is thanks in no small part to their work to keep the project alive in its “wilderness years” between Sept 2017 and Dec 2019.

Meanwhile, it became clear that we were overdue getting Matrix itself out of beta - and the last thing we wanted to do was to split and dilute the implementation work of Matrix 1.0 over both Synapse and Dendrite - so we consciously made the decision to focus all our effort on Synapse for solving the remaining bugs and challenges.

Then, in July 2019, Matrix and Synapse exited beta, and we finally started to see light at the end of the tunnel. In October we started dusting off Dendrite again - looking to use it as a relatively simple and flexible codebase for experimenting with Peer-to-Peer Matrix, not least because being Go it can compile to WebAssembly and run clientside, and because even though Dendrite was originally built with massive deployments in mind, it turns out the elastic scaling means it can also scale down pretty small too - as a part of the iOS P2P demo, we’ve even run full Dendrite homeservers on iPhones embedded into Element iOS! :)

In Dec 2019, we finally got to the point where Element could fund full-time dedicated development on Dendrite once again, with Neil Alexander joining the project and focusing fulltime on getting Dendrite out of alpha and getting it working for P2P and embedded usage (adding libp2p as a federation transport, and adding SQLite support) - and in Jan 2020 we got Dendrite successfully running clientside in a WASM service worker (just in time for FOSDEM!). Then, in Feb 2020, Kegan returned to the project to work fulltime on Dendrite - and the race began in earnest to get Dendrite ready for beta!

Here’s a pretty picture courtesy of GitHub to visualise the progress:

[Image: GitHub contributor activity graph for Dendrite]

Throughout 2020 there’s been a huge amount of stabilisation work and polish:

  • Refactoring much of Dendrite’s foundation to make the codebase more maintainable
  • Creating all-new user server, key server and signing key server microservices
  • Moving some work from existing microservices (ultimately superseding the former currentstateserver, publicroomsapi and typingserver microservices altogether)
  • Developing new testing infrastructure:
    • Complement - our brand new Golang Matrix integration test harness
    • Are We Synapse Yet - an aggregator which parses sytest/complement output to compare how close Dendrite is to passing
  • All the Matrix 1.0 work - particularly state res v2 & room version support
  • Making it work with more P2P transports for all the exciting P2P experiments
  • Supporting backfill and fetching missing events
  • Fixing up SQLite support to make it work as a first class citizen (with shared storage code where we can!)
  • Supporting both sending and rejecting invites (even over federation)
  • E2E encryption support (one-time keys, device lists, send-to-device support)
  • Improved federation sender logic (resend retries, backoffs, blacklisting, metrics, resetting backoffs when receiving transactions)
  • Handling both inbound and outbound redactions
  • User interactive authentication (and implemented on various ‘sudo’ endpoints e.g. deleting devices and changing passwords)
  • Respecting server ACLs
  • Rejecting / soft-failing events properly
  • Support for database schema upgrades
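To illustrate the federation sender item in the list above, here is a minimal sketch of per-destination retry state with exponential backoff, blacklisting after repeated failures, and a reset when the remote server sends us a transaction. The base delay, the failure threshold and the class itself are assumptions for illustration, not Dendrite's actual logic or values.

```python
class DestinationBackoff:
    """Tracks retry state for one remote server (hypothetical policy)."""

    def __init__(self, base_seconds: float = 1.0, max_failures: int = 5) -> None:
        self.base_seconds = base_seconds
        self.max_failures = max_failures
        self.failures = 0
        self.blacklisted = False

    def record_failure(self) -> float:
        """Note a failed send and return how long to wait before retrying."""
        self.failures += 1
        if self.failures >= self.max_failures:
            self.blacklisted = True  # stop retrying until something resets us
        return self.base_seconds * 2 ** (self.failures - 1)

    def record_success(self) -> None:
        """A send worked, so clear all backoff state."""
        self._reset()

    def record_inbound_transaction(self) -> None:
        """The remote server contacted us, so it is evidently reachable again."""
        self._reset()

    def _reset(self) -> None:
        self.failures = 0
        self.blacklisted = False
```

Resetting on inbound traffic means a server that comes back online is retried promptly rather than waiting out a long backoff.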

... which brings us at last to the present day (Oct 2020), as we declare Dendrite sufficiently stable that we consider it ready for beta testing!

In practice, this means Dendrite is now ready for experimentation by adventurous Matrix sysadmins. It is NOT ready for production usage yet, but we need folks to test it and help us iron out the remaining bugs! Please do not trust it with sensitive data, and we don’t recommend trying to run it at scale, as we haven’t done any serious optimisation work yet.

That said, we do provide the following guarantees:

  • We’re providing versioned releases from here on in, beginning with 0.1.0
  • We don’t expect any major breaking changes to the config or architecture before 1.0
  • Early adopters should be able to run Dendrite without experiencing ~daily breaking churn
  • The database schema is now stable and will upgrade itself going forwards - your database should now be here to stay! (assuming we don’t hit any nasty data loss bugs during beta)

In terms of comparison with Synapse, the main things you should get excited about are:

  • Dendrite aims to provide an efficient, reliable and scalable alternative to Synapse:
    • Efficient: A small memory footprint with better baseline performance than an out-of-the-box Synapse
    • Reliable: Implements the Matrix specification as written, using the same test suite as Synapse as well as a brand new Go test suite
    • Scalable: can run on multiple machines and eventually scale to massive homeserver deployments
  • This means significantly less memory usage than Synapse (depends on joined rooms, often between 50MB and 400MB resident memory) - although we haven’t tuned this at all yet!
  • All-new database model, where every microservice instance has its own database tables, letting them scale arbitrarily wide
  • The ability to efficiently use all your available CPU cores without needing to split into separate processes, thanks to Go and our extensive use of goroutines. No more Python global interpreter lock! :)
  • Future experimental MSCs are likely to land in Dendrite before Synapse (e.g. MSC2753 Peeking via /sync and MSC2444 Peeking over Federation are already being prototyped in Dendrite rather than Synapse, as #1370 and #1391!)

The provisos you should know about however are:

  • We’re not feature complete yet: sytest reports 56% CS API coverage and 77% Federation coverage. NB: these will always underestimate how much Dendrite actually supports because of how the tests are spread out; in reality it’s likely more like 70% CS, 95% Fed.
  • No read receipts, membership lazy-loading, presence, push notifications, search, event context, key backups, cross-signing. See changelog for full limitations.
  • Not battle-tested in the wild by many people (there are probably only ~10 dendrites on the open network today!) - so there’s likely to be a broad spectrum of bugs at first.
  • Clients that require more exotic features, like lazy loading, may not behave properly yet
  • Please use Postgres rather than SQLite wherever possible—it’s faster and has fewer issues regarding concurrency (some requests on SQLite Dendrites may 500 with ‘database is locked’ - though we’ve worked hard to eliminate most of these)
  • Dendrite can run in either “monolith” or “polylith” mode. In monolith, all the microservices are linked into a single binary - and we recommend running in this configuration wherever possible for now. Monolith mode is extremely capable as it is, has fewer moving parts to go wrong, and will be the right choice for the majority of beta deployments!
  • Whilst Dendrite is nearly 100% federation compatible, there may still be situations where it will split-brain and disagree with the current room state that Synapse has calculated. We expect these issues to resolve as we get more user feedback.

Architecture-wise, this is what Dendrite looks like under the hood today:

[Image: Dendrite architecture diagram]

To get up and running, please install Go and head on over to the Get Started guide at https://github.com/matrix-org/dendrite#get-started to join the fun :)

In terms of where we’re going next:

  • Read receipts. It’s a major missing feature and impacts UX significantly.
  • 100% Federation coverage (according to sytest). It’s crucial that Dendrite instances play nicely with other servers. This will be the best metric we have for asserting that we are just as capable as Synapse at the fed level.
  • Optimisation—Dendrite has not been optimised yet for speed or resource utilisation!
    • We plan to add benchmarks which will stress test different microservices in the presence of many different scaling factors (number of users, number of rooms, size of room, number of devices per user, number of sync requests, etc). This will hopefully allow us to identify bottlenecks and slow algorithms early on
    • Good old fashioned pprof with known slow scenarios to see what’s consuming CPU/memory and fixing issues ad-hoc (which we’ve already done a bit of pre-beta). This may involve adding additional in-memory caches, with a healthy respect for the complexities it may introduce (which Synapse has been bitten by)
  • We plan to add first class feature flag support for experimental MSCs—experimentation is one thing which makes Dendrite notably different from Synapse, and supporting it more thoroughly going forwards will be important. This may mean adding additional hooks; potentially a dedicated microservice to cleanly separate experiments, we don’t know yet
  • P2P work will continue with vigour now we have a working, featureful, and relatively stable HS to embed and play with

Longer term, it’s pretty hard to say right now when we expect to exit beta (it took Synapse 5 years to exit beta, after all ;) - but obviously we’ll need Dendrite to have parity with Synapse and have no known serious bugs.

Finally: you’re probably wondering what this means for Synapse. Synapse is here to stay - with tens of thousands of deployments around the world serving tens of millions of users. The majority of the core team is still focused on improving and optimising Synapse, and we’ll keep improving it for the foreseeable future.

However, we’ll certainly be experimenting with new stuff on Dendrite first - whether that’s P2P, portable accounts, new-style communities, peeking etc. We expect Synapse to be the stable long-term-supported solution, while Dendrite (particularly while in beta) will be the more unstable and experimental platform. In the longer term we’ll provide ways of migrating from Synapse to Dendrite however (probably via portable accounts), and perhaps in future new deployments may choose to use Dendrite - a bit like you might choose to use nginx rather than Apache for a new web server these days. But this will be a long transition—meanwhile we expect to see more and more next-generation homeservers like Conduit, Mascarene or Construct coming of age too.

So, there you have it. If you’re an intrepid sysadmin please spin up a Dendrite and start filing bugs! :)

— Matthew, Neil Alexander, Kegan and the whole Matrix team.

Here’s the official changelog:

Client-Server API Features

Account registration and management

  • Registration: By password only.
  • Login: By password only. No fallback.
  • Logout: Yes.
  • Change password: Yes.
  • Link email/msisdn to account: No.
  • Deactivate account: Yes.
  • Check if username is available: Yes.
  • Account data: Yes.
  • OpenID: No.

Rooms

  • Room creation: Yes, including presets.
  • Joining rooms: Yes, including by alias or ?server_name=.
  • Event sending: Yes, including transaction IDs.
  • Aliases: Yes.
  • Published room directory: Yes.
  • Kicking users: Yes.
  • Banning users: Yes.
  • Inviting users: Yes, but not third-party invites.
  • Forgetting rooms: No.
  • Room versions: All (v1 - v6)
  • Tagging: Yes.

User management

  • User directory: Basic support.
  • Ignoring users: No.
  • Groups/Communities: No.

Device management

  • Creating devices: Yes.
  • Deleting devices: Yes.
  • Send-to-device messaging: Yes.

Sync

  • Filters: Timeline limit only. Rest unimplemented.
  • Deprecated /events and /initialSync: No.

Room events

  • Typing: Yes.
  • Receipts: No.
  • Read Markers: No.
  • Presence: No.
  • Content repository (attachments): Yes.
  • History visibility: No, defaults to joined.
  • Push notifications: No.
  • Event context: No.
  • Reporting content: No.

End-to-End Encryption

  • Uploading device keys: Yes.
  • Downloading device keys: Yes.
  • Claiming one-time keys: Yes.
  • Querying key changes: Yes.
  • Cross-Signing: No.

Misc

  • Server-side search: No.
  • Guest access: Partial.
  • Room previews: No, partial support for Peeking via MSC2753.
  • Third-Party networks: No.
  • Server notices: No.
  • Policy lists: No.

Federation Features

  • Querying keys (incl. notary): Yes.
  • Server ACLs: Yes.
  • Sending transactions: Yes.
  • Joining rooms: Yes.
  • Inviting to rooms: Yes, but not third-party invites.
  • Leaving rooms: Yes.
  • Content repository: Yes.
  • Backfilling / get_missing_events: Yes.
  • Retrieving state of the room (/state and /state_ids): Yes.
  • Public rooms: Yes.
  • Querying profile data: Yes.
  • Device management: Yes.
  • Send-to-Device messaging: Yes.
  • Querying/Claiming E2E Keys: Yes.
  • Typing: Yes.
  • Presence: No.
  • Receipts: No.
  • OpenID: No.

Welcoming Gitter to Matrix!

30.09.2020 16:28 — General Matthew Hodgson
Last update: 30.09.2020 14:58

Gitter ♥️ Matrix

Hi all,

We are ridiculously excited to announce that Gitter is joining the Matrix ecosystem and will become the first major existing chat platform to switch to natively speaking Matrix!

If you’re reading this from the Gitter community and have no idea what Matrix is: we’re an open source project that provides an open protocol for secure, decentralised communication - effectively the missing real-time communication layer of the open Web. The open Matrix network has more than 20M users on it and is growing fast (adding another 1.7M or so with the arrival of Gitter!)

Gitter is easily one of the best developer community chat systems out there, used by the communities of some massive projects (Node, TypeScript, Angular, Scala etc) and is a custodian of huge archives of knowledge via their chat logs. Gitter is unique in specifically focusing on developers: their tagline is literally “Where developers come to talk” (unlike Slack, which has barely any community features - or Discord, with its ban on unofficial clients, where developers are a bit of an afterthought relative to the gamers). With Gitter natively joining Matrix, we’re super excited to see the global developer community converging on the open Matrix network - and Gitter’s community rooms should see a huge new lease of life as they’re properly made natively available to the wider network as first class citizens :)

We’ve always had a bit of a crush on Gitter ever since we ended up opposite each other in the exhibition hall at TechCrunch Disrupt Europe 2014 - particularly when they demoed us not only their sexy webapp but also their official IRC server bridge at irc.gitter.im :D Over the years we’ve been gently nudging them to consider fully embracing Matrix, but perhaps understandably they’ve been busy focusing on their own stuff. However, earlier this year, our friends at GitLab (who acquired Gitter in 2017) reached out to explore the opportunity of Gitter becoming a core part of Matrix rather than a non-core project at GitLab… and we’ve jumped on that opportunity to bring Gitter fully into Matrix.

In practice, the way this is happening is that Element (the company founded by the Matrix core team to fund Matrix development) is acquiring Gitter from GitLab, with a combined Gitter and Element dev team focusing on giving Gitter a new life in Matrix! You can read about it from the Element angle over on the Element blog.

Practically speaking, we have a pretty interesting plan here, which we’d like to be very transparent about given it’s a little unusual:

At first, Gitter will keep running as it always has - needless to say, we will be doing everything we can to delight the Gitter community and keep the service in good shape.

Then we’re going to build out native Matrix connectivity - running a dedicated Matrix homeserver on gitter.im with a new bridge direct into the heart of Gitter; letting all Gitter rooms be available to Matrix directly as (say) #angular_angular:gitter.im, and bridging all the historical conversations into Matrix via MSC2716 or similar. We will of course do this entirely as open source, just as Gitter itself is open source thanks to GitLab releasing it under the MIT license in 2017. The plan is to comprehensively document our progress as the flagship worked example case study of “how do you make an existing chat system talk Matrix.”

This will of course replace the old and creaky matrix-appservice-gitter bridge we’ve been running since 2016. Gitter users will also be able to talk to other users elsewhere in the open Matrix network - e.g. DMing them, and (possibly) joining arbitrary Matrix rooms. Effectively, Gitter will have become a Matrix client.

Now we come to the interesting bit. Gitter has some really nice features which are sorely lacking in Element today:

  • Instant live room peeking (less than a second to load the webapp into a live-view of a massive room with 20K users!!)
  • Seamless onboarding thanks to using GitLab & GitHub for accounts
  • Curated hierarchical room directory
  • Magical creation of rooms on demand for every GitLab and GitHub project ever
  • GitLab/GitHub activity as a first-class citizen in a room’s side-panel
  • Excellent search-engine-friendly static content and archives
  • KaTeX support for Maths communities
  • Threads!

...and we promise to do everything in our power to preserve and honour these features at all costs and continue to give the Gitter community the experience they’ve come to know and love.

However: in the medium/long term, it’s simply not going to be efficient for the combined Element/Gitter team to split our efforts maintaining two high-profile Matrix clients. Our plan is instead to merge Gitter’s features into Element (or next generations of Element) itself and then - if and only if Element has achieved parity with Gitter based on the above list - we expect to upgrade the deployment on gitter.im to a Gitter-customised version of Element. The inevitable side-effect is that we’ll be adding new features to Element rather than Gitter going forwards.

In practice, the main outcome in the end should be Element having benefited massively from levelling up with Gitter - and Gitter benefiting massively from all the goodies which Element and Matrix bring, including:

  • E2E Encryption
  • Reactions
  • Constantly improving native iOS & Android clients (which should be a welcome alternative to Gitter’s native ones, which are already being deprecated)
  • VoIP and conferencing
  • All the alternative clients, bots, bridges and servers in Matrix
  • The full open standard Matrix API
  • Widgets (embedding webapps into rooms!)
  • ...and of course participation in the wider decentralised Matrix network.

So, there you have it. It’s a new era for Gitter - and we look forward to reinvigorating Gitter’s communities over the coming months. We hope Gitter users will be blown away by the features arriving from Matrix… and we hope that Element users will be ecstatic with the performance and polish work that Gitter-parity will drive us towards. Imagine having guest access in Element that can launch and load a massive room in less than a second!

Finally, we would like to explicitly reassure the Gitter community again that we love and understand Gitter (it was one of the very first ever bridges we wrote for Matrix, for instance) - and we will be doing everything we can to not screw up our responsibility in looking after it. Please, please let us know if you have any concerns or if we ever fall short on this.

Any questions, come talk to us on #gitter:matrix.org - which is bridged with https://gitter.im/matrix-org/gitter. Exciting times ahead!

- Matthew, Amandine, and the whole Matrix, Element and Gitter teams.

Matthew & Amandine being dorky
Matthew and Amandine model 2014-vintage Matrix & Gitter swag in celebration :D

Bonus update - The Changelog Interview!

Sid Sijbrandij (CEO at GitLab) and Matthew had a chance to sit down with The Changelog to talk about Gitter's Big Adventure - so tune in to hear the story first hand! Warning: contains non-ironic use of the word "synergy" :D


Matrix Decomposition: an independent academic analysis of Matrix State Resolution

16.06.2020 20:15 — General Matthew Hodgson
Last update: 16.06.2020 19:09

Hi all,

Regular readers of TWIM may be familiar with the Decentralized Systems and Network Services Research Group at Karlsruhe Institute of Technology, who have been busy over the last few years analysing Matrix from an independent academic point of view. The work started in 2018 with Florian Jacob’s DSN Traveler spidering project, resulting in the Glimpse of the Matrix paper analysing Matrix’s scale and room/server distribution (at least as it was back then).

Last week, they released an entirely new paper: Matrix Decomposition: Analysis of an Access Control Approach on Transaction-based DAGs without Finality by Florian Jacob, Luca Becker, Jan Grashöfer and Hannes Hartenstein, presented at ACM SACMAT ‘20.

Now, the new paper is an absolutely fascinating deep dive analysis into State Resolution v2 - the algorithm at the heart of Matrix which defines how servers merge together their potentially conflicting copies of a given room, such that everyone ends up eventually with a consistent view… even in the face of bad actors. This means that Matrix effectively implements a decentralised access control system - ensuring that users stay banned, and only users with permission can ban, etc. You can see the slides below, and read the full paper here. The video of Florian’s talk from SACMAT should be published shortly.



To give some context from the Matrix side: designing and implementing State Resolution v2 back in 2018 was a bit of a mission. Our original v1 implementation had some bugs which meant that the result of the merge could unexpectedly favour historical state over the current state (so called ‘state resets’) - thus giving an attacker a way to maliciously revert the state of the room. In v2 we thought much more carefully about the algorithm, considering state present in one version of the room but not the other as a conflict, separating and applying access control events from regular events, and adding additional ordering of the state in the room by considering events in the context of their authorisation chain (the ‘auth DAG’). The end result is that we feel confident in v2 State Res, and we haven’t seen any problems with it in the wild since we shipped it in July 2018.

However: state resolution is not intuitive at first - for instance, when you merge two versions of a room together, you treat the state events as unordered sets… even though they are ordered in the context of the room DAG. The reason is that state res needs to work even if you don’t have a copy of the whole room DAG (otherwise you’d have to download way too much data to participate in a large room). Another example is the sequence in which orderings are then applied to the state events - and how that interacts with re-authorising those events, to stop malicious ones creeping in. In the core team, we’ve ended up describing it several different ways to try to help folks understand: first Erik’s original MSC1442, then uhoreg’s literate Haskell implementation, then the terse reference version in the Spec itself, and most recently Neil Alexander’s State Resolution v2 for the Hopelessly Unmathematical.

As a result we are very excited and happy that Florian and the DSN team have now published the first ever independent in-depth analysis of the algorithm, particularly in the context of decentralised access control (i.e. enforcing bans, power levels, etc). We’re pleasantly surprised that apparently “To the best of our knowledge, Matrix is the only system that implements access control based on an eventually consistent partial order without finality and without a consensus algorithm”.

Even better, the DSN team found some remaining thinkos in Synapse’s implementation and the Matrix specification, which could have caused resolution results to diverge from other implementations, specifically:

  1. we weren’t enforcing integers in JSON to be within range [-(2^53)+1, (2^53)-1], fixed in https://github.com/matrix-org/synapse/pull/7381 and MSC2540
  2. we forgot to include the notifications field when authing power level events, fixed in https://github.com/matrix-org/synapse/issues/7501 and MSC2209 (thanks to Luca from DSN for the MSC!)
  3. we forgot to spec the limit that one should apply to the number of parents of an event in the DAG (fixed in https://github.com/matrix-org/matrix-doc/pull/2538)
  4. we missed that moderators could set server ACLs which could let them undermine room admins (fixed in https://github.com/matrix-org/synapse/pull/6834).
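
As an illustration of the first item, here is a rough Python sketch of the kind of range check involved: walking a decoded JSON value and rejecting any integer outside [-(2^53)+1, (2^53)-1]. The function name `ints_in_range` is made up for this example and this is not the code from the Synapse PR; it also only covers the range check, not any other JSON rules.

```python
import json
from typing import Any

# Integers outside this range can't be represented exactly as IEEE 754
# doubles, so different implementations may disagree about their value.
MAX_INT = 2**53 - 1
MIN_INT = -(2**53) + 1


def ints_in_range(value: Any) -> bool:
    """Return True if every integer nested inside `value` is within range."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; booleans are not integers here.
        return True
    if isinstance(value, int):
        return MIN_INT <= value <= MAX_INT
    if isinstance(value, dict):
        return all(ints_in_range(v) for v in value.values())
    if isinstance(value, list):
        return all(ints_in_range(v) for v in value)
    # Strings, floats and None are not checked by this sketch.
    return True


event_content = json.loads('{"users_default": 0, "ban": 9007199254740992}')
if not ints_in_range(event_content):
    print("rejecting event: integer out of range")
```

Python is happy to parse arbitrarily large integers, which is exactly why a check like this has to be explicit: without it, one implementation may accept a value that another truncates or rejects, and the two could then resolve state differently.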

All of these have now been fixed in Synapse and the latest versions of the spec (room v6), and we’d like to sincerely thank Florian and Luca for rapidly and responsibly disclosing the issues to us. In other words: this research is directly improving Matrix, and it’s even more exciting that the stated future work for the DSN team is to work on a formal verification for the security of Matrix’s authorisation rules and state resolution. This stuff is tough, as anyone who’s played with TLA+ will know, and we are incredibly glad that the research community is helping out to formalise and hopefully prove that State Res v2 is as good as we think it is.

We should stress that DSN’s work is completely independent of The Matrix.org Foundation or anyone else building on the protocol; we’re just writing about it here because we think it’s incredibly cool and deserves the attention of the whole Matrix ecosystem.

Thanks again to Florian and the team - we look forward to seeing what comes next!