Lazzerex

The Backdoor That Almost Broke the Internet

H. S. N. Bình

  • Linux

In early 2024, someone nearly snuck a backdoor into the authentication layer of almost every Linux server on the internet. It came within weeks of shipping to production. And it was caught almost entirely by accident.

Everything runs on Linux, quietly

Most people know Linux exists but don't think much about how deeply embedded it is. Android runs on a Linux kernel, the majority of web servers run Linux, and every one of the top 500 supercomputers in the world runs Linux. So do hospitals, banks, government systems, and nuclear submarines. It is the invisible infrastructure of the modern internet.

Because it's open source, anyone can read the code, submit changes, and find bugs. The idea has always been that this openness is a security feature: with enough people looking at the code, vulnerabilities get caught quickly. This is known as Linus's Law: "given enough eyeballs, all bugs are shallow."

The problem is that Linux as most people use it is not one project. It's a massive dependency graph of thousands of smaller projects, each doing something specific. Compression, networking, cryptography, whatever. A lot of these got started by one developer who had a problem and built a tool to fix it. Then another project depends on it, then another, and before long you have millions of machines relying on something that's maintained by a single person in their spare time, for free. There's a famous xkcd comic about this that pretty much nails it.

And surprise! That's the actual attack surface here. The risk didn't lie in the code base itself, but in the people maintaining it.

XZ and the person who built it

XZ Utils is a lossless data compression tool that Lasse Collin, a Finnish developer, has been working on since 2005. If you've ever wondered what format Linux packages, kernel images, and firmware updates get shipped in, a lot of it is .xz. The algorithm it uses under the hood is LZMA, developed by Igor Pavlov in the late 90s, which combines a very large sliding window dictionary lookup (so it can reference patterns from much earlier in the file) with a Markov-chain-based probability model for encoding. The result is compression that often beats zip by around 30%. When you're shipping the same files to millions of machines, that adds up.
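To get a feel for why the large sliding window matters, here is a small sketch using Python's standard lzma and zlib modules. The input is deliberately contrived (an assumption for illustration, not a benchmark): a 64 KiB block of pseudo-random bytes repeated four times, so the repeats sit further apart than DEFLATE's 32 KiB window can reach. It does not try to reproduce the ~30% figure, which applies to more typical data.

```python
import lzma
import random
import zlib

# Contrived input: one 64 KiB block of pseudo-random bytes, repeated four times.
# The repeats are 64 KiB apart, beyond DEFLATE's 32 KiB window but well inside
# the multi-megabyte dictionary that xz uses by default.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
data = block * 4

deflate_size = len(zlib.compress(data, 9))  # DEFLATE, the algorithm behind zip and gzip
xz_size = len(lzma.compress(data))          # .xz container with the LZMA2 filter

print(f"original: {len(data)} bytes")
print(f"deflate:  {deflate_size} bytes")
print(f"xz:       {xz_size} bytes")

# Lossless means the round trip returns the exact input.
assert lzma.decompress(lzma.compress(data)) == data
```

On this input DEFLATE cannot see the earlier copies at all, while xz only has to store the block once and then point back at it.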

Lasse maintained XZ alone, unpaid, for nearly twenty years. And over time, the community pressure got heavy. There are mailing list threads where strangers are openly telling him he's "choking the repo," that he's failing the project, that things will go nowhere until there's a new maintainer. He's trying to explain that this is a free hobby project and he's dealing with long-term mental health issues. The community, however, did not really engage with that.

Jia Tan

Right around when Lasse was at his most burnt out, a developer called Jia Tan appeared. Responsive, competent, genuinely helpful. He submitted good patches, fixed real bugs, handled some of the day-to-day maintenance load that Lasse had been drowning in. Over time, Lasse gave him co-maintainer status. It took about two years.

But here's the weird thing. Look back at the mailing list threads where people were pressuring Lasse, and the accounts doing the pressuring have almost no online footprint: free email addresses, and no activity outside XZ discussions. They were almost certainly sockpuppets, fake identities manufactured to create a crisis so that Lasse would welcome outside help. The whole thing was a years-long social engineering campaign targeting one volunteer developer to get control of one specific compression library.

Why that library? Because XZ had ended up in the dependency chain of OpenSSH.

Why OpenSSH is the target

SSH is how you log into a remote Linux machine. Every developer connecting to a server, every automated deployment pipeline, every monitoring system checking on remote hosts goes through SSH. OpenSSH is the implementation that ships on almost every Linux system. It handles authentication using public key cryptography, with RSA as one of the supported key types: your private key never leaves your machine, and the server only needs your public key to verify who you are.
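The core idea of public key authentication can be sketched with textbook RSA in a few lines of Python. This is a toy under loud assumptions: the key is built from two small known primes, there is no padding, and the bare random challenge is a simplification (real SSH signs data derived from the session). The names sign, verify, and digest are made up for this sketch and are not OpenSSH functions.

```python
import hashlib
import secrets

# Toy RSA key built from two known Mersenne primes. Far too small and
# unpadded for real use; this only shows the shape of the protocol.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # private exponent (needs Python 3.8+)

def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes, private_exponent: int) -> int:
    """Client side: only the holder of the private exponent can compute this."""
    return pow(digest(message), private_exponent, N)

def verify(message: bytes, signature: int) -> bool:
    """Server side: needs only the public key (N, E)."""
    return pow(signature, E, N) == digest(message)

challenge = secrets.token_bytes(32)
signature = sign(challenge, D)
print(verify(challenge, signature))             # signature from the real key holder
print(verify(challenge, (signature + 1) % N))   # tampered signature
```

The server never sees D. It only checks that applying the public operation to the signature gives back the expected digest, which is the kind of check an attacker would want to sit in front of.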

OpenSSH is one of the most scrutinized pieces of software in existence. Getting a backdoor into it directly is essentially impossible at this point. But OpenSSH doesn't exist in a vacuum. It links against shared libraries, and on several major distributions one of those libraries (libsystemd, added by a distribution patch) pulls in liblzma, the library at the core of XZ.

So Jia's plan was to compromise XZ in a way that would eventually compromise OpenSSH without ever touching OpenSSH's code.

How the backdoor actually worked

This is where it gets technically interesting.

The first step was to hide the payload. Jia didn't put any malicious code in XZ's actual source. He hid it inside binary test files, the kind of compressed blobs that compression software ships to verify that encode and decode are working correctly. Nobody audits those, they're treated as opaque data. The payload was sitting in there looking like test fixtures.

To activate it during the build, Jia added a small change to the build scripts, buried in the kind of auto-generated boilerplate that nobody reads carefully. When XZ compiled, this script extracted and injected the payload from those test blobs into the final shared library binary. So the malicious code never appears in any human-readable source file on GitHub. It only materializes in the compiled output.
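The general trick of hiding something in a "corrupt" test file can be sketched harmlessly in Python. Everything here is an illustrative assumption: the specific byte swap, the function names, and the placeholder content are made up, and the real build-script changes were far more layered than this.

```python
import lzma

# Hypothetical byte swap: exchanging 0xFD and 0x01 breaks the .xz magic
# number (which starts with 0xFD). Applying the swap twice restores the file.
_table = bytearray(range(256))
_table[0xFD], _table[0x01] = 0x01, 0xFD
SWAP = bytes(_table)

def make_fixture(hidden: bytes) -> bytes:
    """Produce a blob that looks like a deliberately broken test file."""
    return lzma.compress(hidden, format=lzma.FORMAT_XZ).translate(SWAP)

def looks_corrupt(fixture: bytes) -> bool:
    """A reviewer's naive check: the file does not decompress as-is."""
    try:
        lzma.decompress(fixture, format=lzma.FORMAT_XZ)
    except lzma.LZMAError:
        return True
    return False

def build_step(fixture: bytes) -> bytes:
    """What a tampered build script could do: undo the swap, then decompress."""
    return lzma.decompress(fixture.translate(SWAP), format=lzma.FORMAT_XZ)

fixture = make_fixture(b"placeholder bytes standing in for a payload")
print(looks_corrupt(fixture))
print(build_step(fixture))
```

To anyone poking at the file directly it is just a broken archive, which is exactly what you'd expect to find in a test suite for a decompressor. Only the build step knows how to turn it back into something meaningful.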

Once inside the compiled liblzma, it needed to actually do something. Jia's goal was to intercept the RSA authentication step in SSH, specifically to hijack RSA_public_decrypt, the function that verifies the client's key during login.

This function doesn't belong to OpenSSH. It lives in a shared crypto library and gets loaded at runtime. When a program starts, the dynamic linker loads those libraries and fills in a data structure called the Global Offset Table (GOT), which maps function names to their actual memory addresses. When OpenSSH wants to call RSA_public_decrypt, it looks up its GOT entry to find where in memory the function lives.

Jia's plan was to overwrite that GOT entry with a pointer to his own payload. When SSH went to verify a login, it would call his code first, which would check for a secret master key, and if it matched, let the attacker in regardless of their actual credentials.
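A rough mental model of the GOT is a lookup table from names to functions. The Python sketch below uses a dict for that table. The credential strings, the MASTER_KEY value, and the helper names are all hypothetical stand-ins, and a real GOT holds raw memory addresses rather than Python objects.

```python
from typing import Callable, Dict

MASTER_KEY = "attacker-only-secret"   # hypothetical stand-in for the attacker's key

def rsa_public_decrypt(credential: str) -> bool:
    """Stand-in for the genuine check: accept only the legitimate credential."""
    return credential == "legitimate-user-key"

# The "GOT": symbol name -> address (here, a Python function object).
got: Dict[str, Callable[[str], bool]] = {"RSA_public_decrypt": rsa_public_decrypt}

def sshd_login(credential: str) -> bool:
    """The caller never names the function directly; it goes through the table."""
    return got["RSA_public_decrypt"](credential)

def install_hook(table: Dict[str, Callable[[str], bool]]) -> None:
    """Swap the table entry for a wrapper that runs before the real function."""
    original = table["RSA_public_decrypt"]

    def hooked(credential: str) -> bool:
        if credential == MASTER_KEY:
            return True                 # attacker gets in regardless
        return original(credential)     # everyone else sees normal behaviour

    table["RSA_public_decrypt"] = hooked

print(sshd_login(MASTER_KEY))   # before the overwrite
install_hook(got)
print(sshd_login(MASTER_KEY))   # after the overwrite
```

Note that sshd_login itself is untouched. Only the table entry changed, which is why the attack never needed a single modified line in OpenSSH.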

The tricky part is the timing. The GOT gets populated by the dynamic linker at startup, and shortly after, the linker marks it read-only (a hardening feature known as RELRO). Jia needed to get his fake address into the GOT before the table got locked, without the real address being written over it. The window is tiny.

He used two obscure Linux mechanisms to thread that needle. The first is IFUNC resolvers, a feature that lets a library provide multiple implementations of a function and pick the right one at startup based on hardware capabilities (think of choosing a code path optimized for Intel vs AMD). IFUNC resolvers run very early in process startup, before most things are initialized, which makes them useful for setup work. The second is dynamic audit hooks, a debugging feature in the Linux dynamic linker that lets you register callbacks that fire whenever a symbol gets resolved in the GOT. They're normally used for profiling. Jia used an IFUNC resolver to register one of these audit hooks early in startup, and the hook itself contained the code to overwrite the GOT entry for RSA_public_decrypt at exactly the right moment.
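The ordering is easier to see in a toy model. The Python sketch below invents a ToyLoader class with three phases: run IFUNC resolvers, bind symbols (consulting an audit hook for each one), then lock the table. All names are hypothetical, and it simplifies the real mechanism by having the hook return a substitute address instead of writing into memory itself.

```python
from types import MappingProxyType

def real_rsa_public_decrypt(data):
    return "real"

def payload(data):
    return "payload"

class ToyLoader:
    """Three startup phases in order: IFUNC resolvers, symbol binding, lock."""

    def __init__(self, symbols):
        self.symbols = symbols        # symbol name -> real implementation
        self.ifunc_resolvers = []     # run before any binding happens
        self.audit_hook = None        # called once per symbol binding

    def start(self):
        for resolver in self.ifunc_resolvers:
            resolver(self)            # phase 1: very early, can register a hook
        got = {}
        for name, address in self.symbols.items():
            if self.audit_hook is not None:
                address = self.audit_hook(name, address)   # phase 2: hook sees each bind
            got[name] = address
        return MappingProxyType(got)  # phase 3: read-only view, the table is locked

def malicious_resolver(loader):
    """Pretends to pick a CPU-specific function, but registers an audit hook."""
    def hook(name, address):
        return payload if name == "RSA_public_decrypt" else address
    loader.audit_hook = hook

loader = ToyLoader({"RSA_public_decrypt": real_rsa_public_decrypt, "memcpy": bytes})
loader.ifunc_resolvers.append(malicious_resolver)
got = loader.start()

print(got["RSA_public_decrypt"](b""))
try:
    got["RSA_public_decrypt"] = real_rsa_public_decrypt
except TypeError:
    print("too late: table is read-only")
```

The point of the model is the sequencing. Anything that wants to change a binding has to be in place before phase 2 finishes, and the only code a library gets to run that early is its IFUNC resolver.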

There's one more wrinkle. The audit hook variable isn't normally accessible from a shared library because it's hidden from the outside. So within his IFUNC code, Jia wrote a small scanner that walks through a region of memory, decoding raw bytes back into instructions to find where the hook lives. The whole thing is bespoke low-level memory manipulation and deliberately obfuscated to resist analysis.
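To give a flavour of what "decoding raw bytes back into instructions" means, here is a much simpler Python sketch. It searches a byte buffer for one specific x86-64 instruction, lea rax, [rip + disp32], and computes the address it points at. This is a plain pattern match chosen for illustration, not the backdoor's actual scanner, and unlike a real decoder it doesn't track instruction boundaries, so it could match bytes in the middle of another instruction.

```python
import struct

LEA_RAX_RIP = b"\x48\x8d\x05"   # x86-64 encoding of: lea rax, [rip + disp32]

def find_rip_relative_targets(code: bytes, base: int):
    """Yield (instruction_address, target_address) for each matching LEA."""
    offset = code.find(LEA_RAX_RIP)
    while offset != -1 and offset + 7 <= len(code):
        # The 32-bit signed displacement follows the three opcode bytes.
        (disp,) = struct.unpack_from("<i", code, offset + 3)
        next_instruction = base + offset + 7   # 3 opcode bytes + 4 displacement bytes
        yield base + offset, next_instruction + disp
        offset = code.find(LEA_RAX_RIP, offset + 1)

# Hand-assembled sample: two NOPs, one LEA with displacement 0x100, then RET.
sample = b"\x90\x90" + LEA_RAX_RIP + struct.pack("<i", 0x100) + b"\xc3"
for instruction, target in find_rip_relative_targets(sample, base=0x1000):
    print(hex(instruction), "->", hex(target))
```

RIP-relative addressing is relative to the next instruction, which is why the target is computed from the end of the LEA and not its start.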

The last layer was authentication. The backdoor wasn't a simple open door. When SSH received a login attempt, Jia's payload ran its own cryptographic handshake using Curve448 (a solid elliptic curve that you'd actually use if you were building legitimate crypto). Only if the connecting client could prove it held a specific private key would the payload grant access and wipe the session's log entries. Anyone without that key got the normal SSH flow, completely unaware anything unusual had happened. It's essentially a miniature SSH running inside SSH, except the outer one is designed to keep defenders out while the inner one is designed to let the attacker in.

The memory error that almost exposed everything

After Jia got the compromised XZ into a pre-release version of Fedora, developers running Valgrind started seeing invalid write errors. Valgrind is a memory analysis tool that tracks every allocation, free, and memory access at runtime, and the backdoor code was writing outside the stack memory it was entitled to use.

Jia couldn't fix the real bug openly, because the fix meant modifying the binary test blobs, and an unexplained change to them would raise obvious questions. So he came up with a cover story: he claimed the test data had been generated with a non-reproducible random seed and needed to be regenerated anyway, then submitted new test blobs that happened to fix the memory error. He also made cosmetic changes to the IFUNC code nearby to make it look like there was a legitimate technical reason the Valgrind error went away.

Rich Jones, the Fedora developer handling the package, saw the bug, forwarded it to Jia, got a fix back the next day, and moved on. Nothing about the exchange seemed unusual.

How it got caught

Andres Freund is a Microsoft engineer who works on Postgres, not a security researcher. In March 2024, while testing the Debian unstable release to make sure Postgres would run smoothly on it, he noticed that SSH logins were taking around 400 to 500 milliseconds longer than expected. That is a small, easy-to-dismiss observation. Andres didn't dismiss it.

He'd already seen the Valgrind errors weeks earlier. He started tracing CPU usage and found that sshd was consuming significantly more processor time than it should, even on failed login attempts. He traced the slowdown back to an update in XZ. He noticed the binary test files in the XZ repo had no actual test code referencing them. He kept digging.
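The kind of check that surfaces a slowdown like this is not complicated. Here is a minimal Python sketch that times an action a few times and compares medians against a baseline. The function names and the 100 ms threshold are arbitrary assumptions, and the sample numbers are made up to sit in the range described above; they are not Andres's measurements.

```python
import statistics
import time
from typing import Callable, List, Sequence

def sample_ms(action: Callable[[], None], runs: int = 5) -> List[float]:
    """Time an action several times and return the durations in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def slowdown_ms(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Difference between median timings; medians shrug off a single outlier."""
    return statistics.median(candidate) - statistics.median(baseline)

def is_suspicious(baseline: Sequence[float], candidate: Sequence[float],
                  threshold_ms: float = 100.0) -> bool:
    """Flag the candidate if its median is slower than the baseline by the threshold."""
    return slowdown_ms(baseline, candidate) > threshold_ms

# Made-up login timings in milliseconds, before and after a package update.
before = [300.0, 310.0, 295.0]
after = [800.0, 790.0, 820.0]
print(slowdown_ms(before, after), is_suspicious(before, after))
```

In practice sample_ms would wrap whatever is being measured, such as a scripted login attempt. The hard part was never the arithmetic. It was caring enough about half a second to go looking.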

Eventually he posted a detailed writeup to the oss-security mailing list identifying the backdoor. Red Hat pulled the affected XZ version from Fedora. Debian, Ubuntu, and others followed. The open source community started reverse engineering the payload.

Jia Tan's accounts went dark, and nothing has been heard from them since.

Who did it

Nobody knows. The operation has the fingerprints of a nation-state: two and a half years of work, a coordinated network of fake accounts, and a technically sophisticated attack that nearly succeeded. Criminal organizations don't usually have that kind of patience for a payoff that's years away.

The obvious breadcrumbs point at China: the alias sounds Chinese, most commit timestamps are in UTC+8. But the operation is otherwise so meticulous that most researchers think the obvious breadcrumbs are deliberate misdirection. Nine commits fall in UTC+2, which covers Israel and parts of western Russia. Jia's team worked through Chinese New Year but not Christmas. A lot of security researchers suspect APT29, the Russian state-backed group known as Cozy Bear. There's no hard evidence either way and there probably never will be.

What this actually means

The XZ attack didn't succeed. But the near-miss reveals something uncomfortable. A huge amount of critical internet infrastructure rests on the volunteer labor of individual developers who are often overwhelmed, underfunded, and burning out. Jia didn't find a bug in XZ's code, but he was able to find a “bug” in the system around Lasse Collin.

The open source vs closed source security debate gets complicated by this. Yes, a closed-source equivalent would have no community member stumbling onto a timing anomaly and spending days chasing it. But closed-source software has its own failure modes: no community audit at all, breaches that can be silently patched, and the implicit assumption that everyone with commit access is trustworthy. At least with open source the code is there to be read.

The more uncomfortable question is what we actually owe to the maintainers keeping this stuff running. Lasse built something that became critical internet infrastructure and maintained it for free for twenty years, but the response from the community was mostly impatience. Jia exploited that directly. The attack vector wasn't a memory corruption bug or a cryptographic weakness. It was an exhausted maintainer.

After the backdoor was discovered, Lasse still helped Red Hat debug the issues.


This post is based on the Veritasium video The Internet Was Weeks Away From Disaster and No One Knew, which covers the full story with interviews from the people involved and is well worth your time.
