Lukas Beierlieb · 5 min read
Securing The Past Efficiently: Measuring the Cost of Virtual Machine Introspection
Can security keep up with performance? Explore how Virtual Machine Introspection tackles the challenge of securing legacy systems without sacrificing speed.

Virtual Machine Introspection (VMI) is a promising technology to retrofit security onto legacy systems. Virtualized legacy systems run much faster on today’s hardware than on the hardware they were designed for, which creates the breathing room to use VMI to secure them. Nevertheless, performance is always a concern, and in this blog post we dive into the topic of VMI performance.
If you need a refresher on VMI in legacy systems, check out our blog post “Securing The Past”. It introduces the problem of keeping legacy systems alive for years to come: systems that control railway infrastructure or medical devices, yet have already fallen out of support by conventional security software.
Passive vs. Active Virtual Machine Introspection
libvmi is the best-known library for implementing introspection tools on the Xen and KVM hypervisors. It has its roots in passive VMI, which gives an introspection tool access to a virtual machine’s memory. The tool can poll VM memory periodically to detect threats. A low polling frequency keeps CPU overhead low but risks missing threat indicators; a high polling frequency improves accuracy but wastes precious CPU cycles.
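To make that trade-off concrete, here is a minimal sketch of such a polling loop in C. It assumes libvmi’s current vmi_read_va() signature (which differs between library versions) and a hypothetical scan_for_indicators() detection routine; the vmi_init_complete() setup is omitted.
```c
#include <libvmi/libvmi.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Placeholder for real detection logic; always returns false in this sketch. */
static bool scan_for_indicators(const uint8_t *page, size_t len)
{
    (void) page; (void) len;
    return false;
}

static void poll_loop(vmi_instance_t vmi, addr_t monitored_va, vmi_pid_t pid,
                      unsigned interval_ms)
{
    uint8_t page[4096];
    size_t bytes_read = 0;

    for (;;) {
        /* Read one page of guest memory at the monitored virtual address. */
        if (vmi_read_va(vmi, monitored_va, pid, sizeof(page), page,
                        &bytes_read) == VMI_SUCCESS &&
            scan_for_indicators(page, bytes_read)) {
            /* React: log, pause the VM, alert an operator, ... */
        }
        /* The polling interval is the accuracy-vs-overhead knob. */
        usleep(interval_ms * 1000);
    }
}
```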
Active (also called reactive) VMI solves this dilemma by having the hypervisor notify the introspection tool about interesting events. Such events can be the execution of privileged instructions, interrupts, or accesses to monitored memory pages. This lets an introspection tool reliably observe important actions while remaining efficient.
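The following sketch shows what this event-driven pattern can look like with libvmi’s event API: restrict a monitored page’s permissions in the EPT and receive a callback whenever the guest touches it. The macro and field names follow current libvmi headers and may differ between versions; obtaining the guest frame number and initializing libvmi with events support are omitted.
```c
#include <libvmi/libvmi.h>
#include <libvmi/events.h>
#include <stdio.h>

static event_response_t mem_access_cb(vmi_instance_t vmi, vmi_event_t *event)
{
    /* The event describes which vCPU touched which address and how. */
    printf("vCPU %u accessed GFN 0x%llx (virtual address 0x%llx)\n",
           event->vcpu_id,
           (unsigned long long) event->mem_event.gfn,
           (unsigned long long) event->mem_event.gla);

    /* A real tool must let the faulting access complete (e.g. via
     * single-stepping) before re-protecting the page; omitted here. */
    (void) vmi;
    return VMI_EVENT_RESPONSE_NONE;
}

static void monitor_page(vmi_instance_t vmi, addr_t gfn)
{
    vmi_event_t event = {0};

    /* Restrict the page's permissions in the EPT so that any read or write
     * traps to the hypervisor and triggers our callback. */
    SETUP_MEM_EVENT(&event, gfn, VMI_MEMACCESS_RW, mem_access_cb, 0);
    vmi_register_event(vmi, &event);

    for (;;)
        vmi_events_listen(vmi, 500);   /* wait up to 500 ms for events */
}
```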
Breakpoints are an essential part of active VMI. They are commonly known from debugging applications. Usually, they are realized by inserting special instructions into the VM. When a virtual CPU (vCPU) executes such an instruction, the hypervisor pauses the vCPU and informs the introspection tool, which can then inspect and modify memory and register values. Before the hypervisor resumes the vCPU, the instruction that the breakpoint replaced must still be executed correctly.
Breakpoint Techniques in Virtual Machine Introspection
There are many ways to realize breakpoints. As a trigger, int3 instructions (or other instructions that trap into the hypervisor) can be injected, but it is also possible to utilize permissions in the Extended Page Tables (EPT). To execute the original instruction that the breakpoint overwrote, CPU single-stepping or instruction emulation can be used.
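As an illustration, here is a generic sketch of the classic int3 approach on top of libvmi: save the original byte, overwrite it with the 0xCC opcode, and, when the trap fires, restore the byte so the original instruction can be single-stepped or emulated. Only the arm/disarm steps are shown; the event plumbing that delivers the trap and completes the single step is omitted, and this is not DRAKVUF’s or SmartVMI’s actual implementation.
```c
#include <libvmi/libvmi.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC

struct breakpoint {
    addr_t    va;        /* guest virtual address of the breakpoint */
    vmi_pid_t pid;       /* target address space; 0 = kernel        */
    uint8_t   orig_byte; /* byte that the int3 opcode overwrites    */
};

/* Arm: save the original byte, then overwrite it with int3 (0xCC). */
static status_t bp_arm(vmi_instance_t vmi, struct breakpoint *bp)
{
    uint8_t cc = INT3_OPCODE;
    if (vmi_read_8_va(vmi, bp->va, bp->pid, &bp->orig_byte) == VMI_FAILURE)
        return VMI_FAILURE;
    return vmi_write_8_va(vmi, bp->va, bp->pid, &cc);
}

/* Disarm: restore the original byte so the replaced instruction can run
 * again, either by single-stepping the vCPU or by emulating it. */
static status_t bp_disarm(vmi_instance_t vmi, struct breakpoint *bp)
{
    return vmi_write_8_va(vmi, bp->va, bp->pid, &bp->orig_byte);
}

/* On the int3 trap, a tool typically inspects registers and memory, disarms
 * the breakpoint, steps the vCPU over the original instruction, and re-arms
 * the breakpoint before letting the guest continue. */
```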
libvmi does not implement breakpoint setting and handling itself but provides the building blocks to do so. Introspection tools built on top of libvmi, such as DRAKVUF and SmartVMI, have to implement breakpoints themselves.
Our journey of measuring breakpoint performance started when we tried to resolve a shortcoming in SmartVMI’s breakpoint implementation: its breakpoints are not safe to use with multiple vCPUs. We are working on contributing more advanced breakpoint implementations, but noticed that we were missing suitable means of measuring their performance cost. After obtaining an overview of all viable breakpoint approaches, we identified the relevant performance aspects and designed workloads to measure them.
Aspects of Breakpoint Performance
So far, we have only mentioned a single aspect, which is arguably also the most important: how long does it take to hit and process a breakpoint? To quantify this, we run a process in the VM that takes a timestamp, executes an instruction on which a breakpoint has been placed, takes another timestamp, and calculates the elapsed time. Without a breakpoint, this happens in a matter of nanoseconds. Running KVM on an Intel Core i5-7300U, we determined a median execution time of 81 μs for SmartVMI breakpoints.
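A sketch of such an in-guest microbenchmark is shown below, assuming a Linux guest with clock_gettime() and a hypothetical bp_target() marker function on which the introspection tool places its breakpoint (our actual measurement setup differs in detail).
```c
#include <stdio.h>
#include <time.h>

/* Hypothetical marker: the VMI tool places its breakpoint on this symbol. */
__attribute__((noinline)) static void bp_target(void)
{
    __asm__ volatile("" ::: "memory");  /* keep the call from being optimized away */
}

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    bp_target();                        /* breakpoint hit and handling happen here */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L
            + (t1.tv_nsec - t0.tv_nsec);
    printf("breakpoint round trip: %ld ns (%.1f us)\n", ns, ns / 1000.0);
    return 0;
}
```
Running the same binary once with and once without the breakpoint armed yields the per-hit overhead as the difference between the two timings.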
Another aspect is stealthiness. Placing breakpoints into a VM’s OS kernel can be an effective way to capture security-relevant events. However, Windows uses Kernel Patch Protection (KPP) to scan kernel memory for signs of tampering; if a breakpoint is detected, the system crashes. Luckily, hypervisors can use their higher privilege level and EPT permissions to hide breakpoints: reads to the breakpoint’s page trap to the hypervisor, which presents the original, unmodified bytes to the guest. Because EPT permissions are page-granular, a single breakpoint slows down every read access to its memory page. We measured the median overhead of such a read access to be 21 μs.
Finally, SmartVMI tries to reduce the performance overhead of process-specific breakpoints by deactivating them while their associated process is not running. It does so by getting notified of every page-table switch, i.e., every write to the CR3 register. While avoiding false-positive breakpoint hits certainly reduces overhead, intercepting every CR3 write also takes time. Whether the trade-off pays off depends very much on the scenario (e.g., how many processes share the memory where a breakpoint is located and how often each process hits it) and is beyond our scope here. We measured the performance cost of notifying SmartVMI of every CR3 write during a CPU-intensive benchmark (no breakpoints involved): with CPU-Z’s benchmark, we observed a 22.4% drop in benchmark score when CR3 writes were monitored.
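For illustration, this is roughly what intercepting CR3 writes looks like with libvmi’s register-event API. Macro and field names can vary between libvmi versions, and the breakpoint bookkeeping is a hypothetical placeholder, not SmartVMI’s actual code.
```c
#include <libvmi/libvmi.h>
#include <libvmi/events.h>

static event_response_t cr3_write_cb(vmi_instance_t vmi, vmi_event_t *event)
{
    /* The written value identifies the address space being switched to. */
    addr_t new_cr3 = event->reg_event.value;

    /* Hypothetical bookkeeping: arm the breakpoints that belong to this
     * address space and disarm all others. */
    /* toggle_breakpoints_for(new_cr3); */

    (void) vmi; (void) new_cr3;
    return VMI_EVENT_RESPONSE_NONE;
}

/* Registration, with the same event loop as in the earlier sketch:
 *
 *   vmi_event_t cr3_event = {0};
 *   SETUP_REG_EVENT(&cr3_event, CR3, VMI_REGACCESS_W, 0, cr3_write_cb);
 *   vmi_register_event(vmi, &cr3_event);
 */
```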
Our Research on Breakpoint Performance: Open-Access Publication
Together with our research partners at IFIS and Prof. Dr. Lukas Iffländer, we published an open-access paper in the MDPI journal Electronics. While we summarized the context and key findings in this post, many more details are available in the publication, and we encourage interested readers to explore it here. This work was supported by the German Federal Ministry of Education and Research (BMBF) as part of the HypErSIS project (grant IDs 16KIS1745K and 16KIS1746).