Jacek Galowicz, Thomas Prescher, Julian Stecklina · 12 min read

ZombieLoad: Cross Privilege-Boundary Data Leakage

ZombieLoad belongs to a novel category of side-channel attacks, which we refer to as **data-sampling attacks**. It demonstrates that faulting load instructions can transiently expose private values of one Hyperthread sibling to the other. This new exploit is the result of a collaboration between Michael Schwarz, Daniel Gruss, and Moritz Lipp from Graz University of Technology, Thomas Prescher and Julian Stecklina from Cyberus Technology, Jo Van Bulck from KU Leuven, and Daniel Moghimi from Worcester Polytechnic Institute. In this article, we summarize the implications, shed light on the different attack scenarios across CPU privilege rings, OS processes, virtual machines, and SGX enclaves, and give advice on possible ways to mitigate such attacks.

Implications

A short summary of what this security vulnerability means:

  • By exploiting the CPU’s so-called bypass logic on return values of loads, it is possible to leak data across processes, privilege boundaries, Hyperthreads, and VMs, as well as values that are loaded inside Intel SGX enclaves.
  • Code utilizing this exploit works on Windows, Linux, etc., as this is not a software- but a hardware issue.
  • It is possible to retrieve content that is currently being used by a Hyperthread sibling.
  • Even without Hyperthreading, it is possible to leak data out of other protection domains. Our experiments showed that ZombieLoad leaks survive serializing instructions. Such leaks, however, succeed with lower probability and are harder to obtain.
  • It is an implementation detail what kind of data is processed after a faulty read.
  • Using Spectre v1 gadgets, potentially any value in memory can be leaked.
  • Affected software:
      • So far, all versions of all operating systems (Microsoft Windows, Linux, macOS, BSDs, …)
      • All hypervisors (VMware, Microsoft Hyper-V, KVM, Xen, VirtualBox, …)
      • All container solutions (Docker, LXC, OpenVZ, …)
      • Code that uses secure SGX enclaves in order to protect critical data.
  • Affected CPUs:
      • Intel Core and Xeon CPUs
      • CPUs with Meltdown/L1TF mitigations are affected by fewer variants of this attack.
      • We were unable to reproduce this behavior on non-Intel CPUs and consider it likely that this is an implementation issue affecting only Intel CPUs.
  • Operating system or hypervisor software patches alone do not suffice for complete mitigation:
      • Similar to the L1TF exploit, effective mitigations require switching off SMT (Simultaneous MultiThreading, aka Hyperthreads) or making sure that trusted and untrusted code do not share physical cores.

If you have any questions about exploits like Meltdown/Spectre/ZombieLoad and their derivatives, their impact, or the involvement of Cyberus Technology GmbH, please contact:

| Werner Haas | Tel +49 152 3429 2889 | Mail werner.haas@cyberus-technology.de

Example Attacks

We present two example attacks that are both mounted on a browser as the victim process. The browser leaking its data runs in one Hyperthread and the adversary application disclosing the values runs as sibling thread on the same physical core.

URL Recovery

In this scenario, we reconstruct URLs that are being visited by the victim browser process.

An unprivileged attacker with the ability to execute code can reconstruct URLs being visited in Firefox.

In its basic form, the attacker has no control over the leaked data, i.e., it is necessary to filter for interesting data. Hence, our adversary app searches for typical URL prefixes.

Note that, e.g., session cookies or credit card numbers also follow predictable patterns in memory and hence represent realistic targets for such attacks.
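
The filtering step can be sketched as follows. This is a minimal illustration on synthetic data, not the adversary application from the demo: the `samples` list stands in for byte strings that a sampling loop has already reconstructed, and the names `URL_PREFIXES` and `filter_urls` as well as the prefix list itself are assumptions made for this example.

```python
import re

# Prefixes the filter looks for; the exact list is an assumption.
URL_PREFIXES = (b"http://", b"https://", b"www.")

# Characters accepted as part of a URL (a deliberately rough approximation).
_URL_BODY = re.compile(rb"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def filter_urls(samples):
    """Return URL-like strings found in already-sampled byte strings."""
    hits = []
    for sample in samples:
        for prefix in URL_PREFIXES:
            start = sample.find(prefix)
            if start < 0:
                continue
            # Extend the hit from the prefix to the first non-URL byte.
            match = _URL_BODY.match(sample, start)
            if match:
                hits.append(match.group().decode("ascii"))
            break  # one hit per sample is enough for this sketch
    return hits


# Synthetic samples: mostly noise, with URLs embedded in two of them.
samples = [
    b"\x00\x13https://example.com/login\x00\xff",
    b"\x8f\x02random\x00",
    b"GET http://intranet.local/a?b=1 ",
]
print(filter_urls(samples))
```

The same structure applies to other predictable patterns: only the prefix list and the set of accepted characters would change.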

Keyword Detection

In this scenario, we constantly sample data using ZombieLoad and match leaked values against a list of predefined keywords.

The adversary application prints keywords whenever the victim browser process handles data that matches the list of adversary keywords.

Note that the video shows a browser that runs inside a VM: ZombieLoad leaks across sibling Hyperthreads regardless of virtual machine boundaries.
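
A minimal sketch of such a matching loop, again on synthetic data. We assume here that leaked values arrive as a stream of small byte chunks, so a keyword may be split across two chunks; the class name `KeywordDetector` and the keyword list are hypothetical.

```python
KEYWORDS = (b"password", b"invoice", b"secret")  # hypothetical keyword list


class KeywordDetector:
    """Match a stream of sampled chunks against a fixed keyword list."""

    def __init__(self, keywords=KEYWORDS):
        self.keywords = tuple(keywords)
        # Keep enough trailing bytes to catch keywords split across chunks.
        self._keep = max(len(k) for k in self.keywords) - 1
        self._tail = b""

    def feed(self, chunk):
        """Consume one chunk and return the keywords completed by it."""
        window = self._tail + chunk
        found = []
        for kw in self.keywords:
            pos = window.find(kw)
            while pos >= 0:
                # Report only matches that end inside the new chunk,
                # so that a keyword is not reported twice.
                if pos + len(kw) > len(self._tail):
                    found.append(kw.decode("ascii"))
                pos = window.find(kw, pos + 1)
        self._tail = window[-self._keep:] if self._keep else b""
        return found


detector = KeywordDetector()
for chunk in (b"xxpass", b"word and inv", b"oice", b"nothing"):
    for keyword in detector.feed(chunk):
        print("victim handled:", keyword)
```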

Technical Background

In a nutshell: ZombieLoad is a transient-execution attack that observes the values of memory loads on the current physical core from a sibling thread. It exploits the fact that the memory subsystem is shared among the logical cores of a physical core.

Simultaneous Multithreading / Hyperthreading

Hyperthreading is Intel’s implementation of simultaneous multithreading; the two are usually abbreviated as HT and SMT, respectively. This section explains the value of HT/SMT for the performance and power efficiency of modern CPUs, and also why it imposes security risks that are exposed by the discovery of ZombieLoad (and similarly by L1TF/Foreshadow).

SMT boosts the CPU’s instruction throughput by increasing the utilization of the independent execution units that exist within the pipeline. Already without SMT, the CPU architecture is capable of decomposing the instruction sequence into operations like loads, stores, and calculations. For different kinds of operations the CPU has different execution units (EUs). EUs that are in high demand are replicated. Operations that do not depend on each other can be processed in parallel by the corresponding EUs. The higher their utilization, the higher the overall performance.

In program sequences with too many dependencies between operations, a lot of EUs might end up idling. SMT further increases CPU utilization by running two threads concurrently on one physical core. Processing two instruction streams increases the probability of finding independent operations to assign to the available EUs that might otherwise sit idle.

Execution unit utilization without vs. with simultaneous multithreading

The diagram shows an example load of two threads, neither of which fully utilizes the CPU’s EUs on its own. Arrows show dependencies between the operations, and blocks with an alphabetic suffix model operations that take more than one CPU cycle. Complex arithmetic, for example, requires more time, and memory values are not always immediately available. With SMT enabled, the CPU’s EUs can be fully utilized.
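
The effect can be reproduced with a toy model. The following sketch is not a model of any real pipeline: it assumes a core with two interchangeable execution units and threads in which every operation depends on the previous one. The function names and the latency values are made up for illustration.

```python
def run(threads, units=2):
    """Simulate one core; return the cycle count needed to finish all threads.

    Each thread is a list of operation latencies in cycles. Every operation
    depends on the previous one of the same thread (a worst-case chain).
    """
    pending = [list(ops) for ops in threads]
    ready_at = [0] * len(threads)   # cycle at which a thread may issue again
    unit_free_at = [0] * units      # cycle at which an execution unit is free
    cycle = 0
    while any(pending):
        for tid, ops in enumerate(pending):
            if not ops or ready_at[tid] > cycle:
                continue
            free = [u for u in range(units) if unit_free_at[u] <= cycle]
            if not free:
                break  # all execution units are busy in this cycle
            latency = ops.pop(0)
            unit_free_at[free[0]] = ready_at[tid] = cycle + latency
        cycle += 1
    return max(ready_at)


def utilization(runs, units=2):
    """Busy unit-cycles divided by available unit-cycles over all runs."""
    busy = sum(sum(ops) for threads in runs for ops in threads)
    total = sum(run(threads, units) for threads in runs)
    return busy / (units * total)


thread_a = [1, 3, 1]
thread_b = [2, 1, 2]
without_smt = utilization([[thread_a], [thread_b]])  # one thread after the other
with_smt = utilization([[thread_a, thread_b]])       # both threads share the core
print(f"without SMT: {without_smt:.0%}, with SMT: {with_smt:.0%}")
```

In this toy setup, each thread alone keeps only one of the two units busy, so utilization is 50% when the threads run one after the other and 100% when they run side by side.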

If SMT is enabled, the operating system sees two independent CPUs where only one physical core exists. Such logical cores each have their private architectural state, but they share most of their execution resources - which is one of the reasons why SMT greatly improves energy efficiency.

State Sharing between logical CPU cores in multithreading mode

The second diagram visualizes how some parts of a physical core’s resources are shared between two logical cores if it is run in multithreading (MT) mode. If one of the cores is currently not operating (e.g., after executing a halt instruction) or Hyperthreading is deactivated, all resources belong to the only running logical core (see modes ST0 and ST1).

When the operating system schedules two threads from completely different applications on the logical cores of the same physical core, data of both applications is processed at the same time in the shared execution resources. ZombieLoad exploits this circumstance.

Transient Execution of Faulting Reads

CPUs maximize execution unit utilization by speculating when it is unclear what has to be done next. Conditional jumps that depend on values that are yet to be calculated are an example, because it cannot be known in advance whether the jump will be taken. If the speculation is wrong, the pipeline rolls back all wrongly performed operations and ensures that none of their results become visible outside the CPU, so that the system stays in a correct state. If the speculation is right (which only needs to be the case “most of the time”), the CPU’s degree of utilization, and hence its performance, has been increased.

Instructions that are executed speculatively or out-of-order but whose results are never committed to architectural state are called transient instructions. Any fault that occurs during transient execution of an instruction is handled when the instruction reaches retirement, the last pipeline stage.

If an instruction stream depends on a value from a read operation that turns out to trigger a fault of some kind, vulnerable CPUs speculatively use a placeholder value during transient execution. Such faults may be of architectural (e.g., exceptions) or of microarchitectural nature (e.g., updates of accessed/dirty bits in the page table). Normally, this does not present a problem, because the results of this calculation are discarded at retirement and never become architecturally visible. However, by using side channels like the CPU cache subsystem (our article about Meltdown explains this in detail), an attacker can still extract this placeholder value.
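
The decoding side of such a cache side channel can be illustrated with a small simulation. The sketch below uses synthetic timings instead of real measurements and assumes the common scheme with one probe slot per possible byte value; the threshold and the hit/miss latencies are hypothetical numbers that would have to be calibrated on a real machine.

```python
CACHE_HIT_THRESHOLD = 100  # cycles; hypothetical, must be calibrated per machine


def simulated_probe_timings(leaked_byte, hit=40, miss=300):
    """Access times for 256 probe slots after a transient access.

    Only the slot indexed by the placeholder value was brought into the
    cache by the transient instructions, so only that slot is fast.
    """
    return [hit if slot == leaked_byte else miss for slot in range(256)]


def recover_byte(timings, threshold=CACHE_HIT_THRESHOLD):
    """Return the index of the single fast slot, or None if unclear."""
    fast = [slot for slot, cycles in enumerate(timings) if cycles < threshold]
    return fast[0] if len(fast) == 1 else None


timings = simulated_probe_timings(ord("A"))
print(recover_byte(timings))
```

Returning `None` when zero or several slots are fast reflects that a single measurement round is often inconclusive and has to be repeated.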

The Attack

ZombieLoad enables four different attack scenarios. All of them have in common that they trigger a faulty read, and extract data used by transiently executed operations via a side-channel.

As already stated in the technical background section, operations that depend on the value of a faulty read operation may be executed transiently with wrong data. It is an implementation detail what kind of data is processed during that time window.

It turns out that on Intel processors this wrong data may be data from outside the current process but still loaded by the physical CPU core for whatever reason, which can be:

  • data from kernel space or other applications
  • data from outside the VM: other VM or hypervisor
  • data from inside a currently executing SGX enclave

An important detail is that the attacker has no direct control over what data is read here. Leaked data could be uninteresting because it comes from an irrelevant other process, VM, etc. If it comes from the right process, it might still be the wrong data, because the address from which the data is returned is beyond the attacker’s control. (In Meltdown or L1TF, on the other hand, the attacker chooses the address.)

Because of this restriction, the class of attacks that ZombieLoad enables is referred to as data-sampling attacks. The attacker simply samples leaking data that is currently being used by the victim process.
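
Since individual samples may stem from unrelated activity, one simple way to separate recurring values from noise is to count how often each value shows up. This is a sketch under our own assumptions (synthetic samples, a made-up `min_count` cut-off), not a description of the filtering used in the demos.

```python
from collections import Counter


def frequent_values(samples, min_count=3):
    """Return sampled values seen at least `min_count` times, most frequent first."""
    counts = Counter(samples)
    return [value for value, n in counts.most_common() if n >= min_count]


# Synthetic samples: 0x41 and 0x42 recur, the rest is one-off noise.
samples = [0x41] * 5 + [0x00, 0x7F] + [0x42] * 3 + [0x13]
print([hex(v) for v in frequent_values(samples)])
```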

Attack Scenarios

The different attack scenarios are described in the following, each together with its attacker model and an example.

These attack scenarios can be enhanced in a way that gives the attacker control over the addresses from which data is leaked. In order to achieve that, they can be combined with the use of Spectre-v1 gadgets which lead the CPU into prefetching interesting data from specific addresses. We are going to mention where this can be useful, but will not go into detail as this goes beyond the scope of this article.

Cross-Process User Space Leakage

In this scenario, the attacker executes unprivileged code on one logical CPU core, while a victim application runs on the other logical (but same physical) core.

The victim application could be a browser or password manager application which contains secrets. While the victim application is dealing with interesting data, the attacker triggers faulty read operations in their own thread and samples leaked data from the victim process.

The attacker has no control over the address from which data is leaked; therefore, it is necessary to know when the victim application handles the interesting data. For example, if the attacker is looking for AES keys, they can exploit the fact that shared libraries like OpenSSL are usually used for encryption and decryption. By flushing the shared AES encryption/decryption routines out of the cache, the attacker thread can then sample the access times for these addresses: they suddenly drop as soon as the victim process starts executing that code again. This is the moment at which the probability of leaking (parts of) the AES keys rises and the data-sampling attack can be started.
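
The timing-based trigger can be sketched as follows. The code only models the decision logic on synthetic numbers: `reload_times` stands in for measured access times to the flushed routine, `take_sample` is a placeholder for the data-sampling step, and the threshold is a hypothetical value.

```python
def sample_while_victim_active(reload_times, take_sample, threshold=100):
    """Take one data sample per round in which the shared code was cached.

    reload_times: access times (in cycles) to the flushed routine,
                  one per monitoring round (synthetic here).
    take_sample:  stand-in for the actual data-sampling step.
    threshold:    hypothetical hit/miss boundary; needs calibration.
    """
    samples = []
    for round_no, cycles in enumerate(reload_times):
        if cycles < threshold:
            # Fast reload: the victim executed the routine since the flush.
            samples.append((round_no, take_sample()))
    return samples


fake_leak = iter([0xAA, 0xBB])
print(sample_while_victim_active([310, 295, 42, 300, 38], lambda: next(fake_leak)))
```

Sampling only in rounds with a fast reload keeps the amount of unrelated data small, which matters because the attacker cannot choose the leaked address.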

Intel SGX Leakage

The victim code that utilizes SGX and the attacker code are assumed to run on the same physical core, but on different logical cores.

SGX’s typical threat model assumes that enclaves are still secure, even if the attacker has full control over the surrounding operating system.

Under such conditions, ZombieLoad allows for leaking data of running secure enclave code with the same strategy as in the cross-process user space leakage attack scenario.

Virtual Machine Leakage

Similar to the cross-process user space leakage scenario, attacker code and victim code run on the same physical core, but on different logical cores. In this scenario, attacker and victim may run in separate virtual machines.

The attacker might for example upload and run a prepared guest image on a cloud hosting service where the VMs of other customers are co-located in order to leak their data.

Kernel Leakage without Hyperthreading

Even with Hyperthreading disabled, ZombieLoad allows leaking data from other protection domains on the same logical core. If a faulty read leaks data during transient execution, this data may also originate from kernel space. Such an attack would be mounted after transitions between kernel space and user space, e.g., on return paths from system calls.

Attacks of this class are much harder to mount because the return paths from such other protection domains to the user space/VM are less likely to leak interesting values. One reason for this is the presence of serializing instructions, which do not prevent leakage in general, but reduce the amount of leaking data.

In order to trigger leaks from interesting memory addresses, the attacker could use Spectre-style gadgets prior to mounting a ZombieLoad attack. Such gadgets may be hard to find on the return paths to user space/VM. They could, however, be deliberately installed in proprietary software in order to provide backdoors, and would be very hard to detect.

Cross-VM Covert Channel

This is not an attack scenario per se, or at least a very different one compared to the previous ones.

Using ZombieLoad as a covert channel, two VMs could communicate with each other even in scenarios where they are configured in a way that forbids direct interaction between them. For example, isolation policies could be in place such that one VM offers unrestricted Internet access (watching YouTube videos) while the other only has access to the corporate network (reading confidential documents).

Mitigation Techniques

The safest workaround to prevent this extremely powerful attack is running trusted and untrusted applications on different physical machines.

If this is not feasible in given contexts, disabling Hyperthreading completely represents the safest mitigation. This does not, however, close the door on attacks on system call return paths that leak data from kernel space to user space.

In case disabling HT is not feasible for performance or other reasons, trusted and untrusted processes should never be scheduled on the same physical core. This, again, does not mitigate all attack scenarios, because adversary processes could still leak data from the underlying kernel or hypervisor.
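
Whether two logical CPUs share a physical core can be checked from user space. The following sketch assumes a Linux system that exposes the CPU topology under `/sys/devices/system/cpu`; the helper names are our own, and the check is only a building block for a placement policy, not a mitigation by itself.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")  # Linux-specific location


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0,4' or '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def siblings_of(cpu, root=SYSFS_CPU):
    """Logical CPUs that share a physical core with `cpu` (including itself)."""
    path = root / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())


def share_core(cpu_a, cpu_b, root=SYSFS_CPU):
    """True if both logical CPUs are Hyperthreads of the same physical core."""
    return cpu_b in siblings_of(cpu_a, root)
```

A scheduler or deployment script could call `share_core(a, b)` before pinning a trusted and an untrusted workload to logical CPUs `a` and `b`, and refuse placements for which it returns `True`.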

For more detailed information about mitigation vectors, please consult the ZombieLoad research paper.
