Use-after-free under MemoryError: opcode error path leaves stale stackref that _PyFrame_ClearLocals over-decrefs #152147

@devdanzin

Crash report

What happened?

(OOM-0005 in #151763)

When an allocation fails (MemoryError) partway through a bytecode instruction, the eval loop unwinds the frame through exit_unwind and clears the operand stack via _PyFrame_ClearLocals() (Python/frame.c:101), which calls PyStackRef_XCLOSE on every live stack slot. If an opcode's allocation-failure error path has left a stale or over-counted _PyStackRef on the value stack (a value it already consumed/stole, or a borrowed reference it doesn't own), that close drops the object's refcount below what it should be. When the object is also referenced elsewhere, it is freed while still live, which is a use-after-free. On a plain debug build the same over-decref is caught earlier as a _Py_NegativeRefcount abort.

This is reachable through ordinary stdlib code, with no ctypes/_testcapi-constructed objects — only _testcapi.set_nomemory to drive the allocation failure.

Reproducer

Deterministic (at least 5 of 5 runs) on debug + ASan builds (both free-threaded and GIL). Requires a debug build that exposes _testcapi.set_nomemory:

import pkgutil
import faulthandler
faulthandler.enable()
from _testcapi import set_nomemory

d = {"s": str}
d["arg"] = d["s"](0)            # a heap str ("0"), also kept alive by this dict

def sweep(thunk):
    for start in range(60):
        set_nomemory(start)     # fail every allocation from #start onward
        try:
            thunk()
        except BaseException:
            pass

def call_get_importer():
    pkgutil.get_importer(d["arg"])   # -> os.fsdecode -> os.fspath; the arg is freed mid-unwind

sweep(call_get_importer)

The nested-frame structure (the sweep(thunk) wrapper) matters — a flat module-level loop does not reproduce.
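For anyone adapting the reproducer into a longer fuzzing run, here is a minimal sketch of a sweep helper that disarms the allocation hook after each attempt while keeping the nested-frame call structure. The names `sweep_with_reset`, `arm`, and `disarm` are hypothetical and not part of the original report. On a debug build one would presumably pass `_testcapi.set_nomemory` as `arm` and `_testcapi.remove_mem_hooks` as `disarm`. Whether this variant still triggers the crash has not been checked, so treat it as a harness pattern rather than a replacement reproducer.

```python
from typing import Callable


def sweep_with_reset(
    thunk: Callable[[], object],
    arm: Callable[[int], None],
    disarm: Callable[[], None],
    max_start: int = 60,
) -> int:
    """Run thunk once per injection point; return how many runs raised."""
    raised = 0
    for start in range(max_start):
        arm(start)                # begin failing allocations from #start onward
        try:
            thunk()               # called from inside this function (nested frame)
        except BaseException:     # the injected failure may surface as any exception
            raised += 1
        finally:
            disarm()              # restore the normal allocator before the next run
    return raised


# Stand-in hooks so the sketch runs on any build (assumptions, not real APIs).
calls = []


def fake_arm(n: int) -> None:
    calls.append(("arm", n))


def fake_disarm() -> None:
    calls.append(("disarm",))


def always_fails() -> None:
    raise MemoryError


count = sweep_with_reset(always_fails, fake_arm, fake_disarm, max_start=3)
```

Passing the hooks in as parameters keeps the helper importable on release builds, where `_testcapi` may be missing.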

What happens

On a debug GIL + ASan build, ASan reports a clean heap-use-after-free; the freed str argument is read after it was freed during the unwind:

ERROR: AddressSanitizer: heap-use-after-free  Include/refcount.h:286 in Py_INCREF

FREED by (the over-decref — the bug):
  #4 PyStackRef_XCLOSE        Include/internal/pycore_stackref.h:726
  #5 _PyFrame_ClearLocals     Python/frame.c:101            <- closes a stale operand-stack slot
  #6 _PyFrame_ClearExceptCode Python/frame.c:126
  #7 clear_thread_frame       Python/ceval.c:1954
  #8 _PyEval_EvalFrameDefault Python/generated_cases.c.h    (exit_unwind)

READ of freed memory (later use of the same str):
  #0 Py_INCREF                Include/refcount.h:286
  #3 _Py_dict_lookup_threadsafe Objects/dictobject.c:1729   (the dict still holds the string)

PREVIOUSLY ALLOCATED (victim = str(0), "0"):
  #4 PyUnicode_New            Objects/unicodeobject.c:1326
  #7 PyObject_Str             Objects/object.c:826

On a debug free-threaded + ASan build the same freed local is instead read by PyOS_FSPath (Modules/posixmodule.c:17168) → PyType_HasFeature on Py_TYPE(path) (ob_type == 0xdd) → SIGSEGV. Which downstream use faults depends on build/timing; the defect is the single over-decref.

Analysis

_PyFrame_ClearLocals is correct only if frame->stackpointer accurately reflects the slots the frame still owns:

while (sp > locals) {
    sp--;
    PyStackRef_XCLOSE(*sp);   /* frame.c:101 */
}

The defect is upstream. An opcode that can fail under allocation pressure took an error exit after consuming/stealing a stack value (or after pushing a borrowed reference) without removing that slot from the range covered by the stack pointer; the dead or over-counted _PyStackRef is then closed during the unwind. Reverse execution (rr) of the reproducer shows that the freed str is referenced by the holding dict, several call frames, the raised OSError's fields, and an args tuple, and that the OOM-unwind dealloc cascade decrefs it one time too many. I was not able to isolate the single offending opcode to one source line.
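The mechanism can be illustrated with a toy pure-Python model. This is only a sketch: `Obj`, `Frame`, `failing_opcode`, and `run` are hypothetical names, the refcounts are simulated integers, and nothing here touches the real eval loop. It shows how an error exit that steals an operand without adjusting the stack pointer leads the unwind loop to release that reference a second time.

```python
class Obj:
    """Toy object with an explicit reference count (stands in for a PyObject)."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 0
        self.freed = False

    def incref(self):
        self.refcnt += 1

    def decref(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True     # simulated deallocation


class Frame:
    """Toy frame: every slot below `sp` is assumed to be owned by the frame."""

    def __init__(self):
        self.stack = []
        self.sp = 0

    def push(self, obj):
        obj.incref()
        self.stack[self.sp:] = [obj]
        self.sp += 1

    def clear_locals(self):
        # Same shape as the loop quoted above: close every slot below sp.
        while self.sp > 0:
            self.sp -= 1
            self.stack[self.sp].decref()


def failing_opcode(frame, sync_sp):
    """Consume the top-of-stack reference, then hit a simulated allocation failure."""
    operand = frame.stack[frame.sp - 1]
    operand.decref()              # the opcode has consumed its operand
    if sync_sp:
        frame.sp -= 1             # correct: the frame no longer owns this slot
    raise MemoryError             # error exit; the caller unwinds the frame


def run(sync_sp):
    s = Obj("str '0'")
    s.incref()                    # reference held by the dict
    frame = Frame()
    frame.push(s)                 # operand pushed for the call
    try:
        failing_opcode(frame, sync_sp)
    except MemoryError:
        frame.clear_locals()      # unwind
    return s


good = run(sync_sp=True)          # dict's reference survives
bad = run(sync_sp=False)          # stale slot closed again: freed while the dict holds it
```

In the model, `good.refcnt` stays at 1 while `bad` reaches 0 and is marked freed even though the simulated dict still holds it, which corresponds to the use-after-free described above.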

This appears to be a distinct instance of the general "OOM error path leaves a stale stackref on the value stack" class. It is not the specialized _CALL_LIST_APPEND list.append double-free (gh-151818) — the reproducer here involves no list.append (rr-confirmed) — though it is the same family as the LIST_APPEND stack-pointer-sync issue (gh-151119 / PR gh-151538), which fixes one specific opcode.

Suggested fix

Audit bytecode-handler error/cleanup paths (and the generated pop_N_error: / error: stubs in Python/bytecodes.c / generated_cases.c.h) so that on any allocation-failure exit, frame->stackpointer exactly matches the set of still-owned stackrefs. The fix belongs in the opcode that leaks the stale reference, not in _PyFrame_ClearLocals (which must trust the stack pointer). As a debugging aid, the stack-effect invariants could be asserted on the error path before _PyEval_FrameClearAndPop.
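As a rough illustration of the suggested debugging aid, the sketch below models a stack-depth check at an opcode error exit. It is a simplified Python analogy, not CPython code: `ErrorExit`, `check_error_exit`, and the opcode name `EXAMPLE_OP` are hypothetical. A real assertion would have to be written in C against the generated stack-effect metadata.

```python
from dataclasses import dataclass


@dataclass
class ErrorExit:
    """Hypothetical record of one opcode error exit."""
    opname: str
    entry_depth: int   # stack depth when the instruction started
    consumed: int      # operands already released or stolen before the failure
    exit_depth: int    # stack depth stored in the frame at the error exit


def check_error_exit(e: ErrorExit) -> None:
    """Raise if the recorded depth still covers slots the frame no longer owns."""
    expected = e.entry_depth - e.consumed
    if e.exit_depth != expected:
        raise AssertionError(
            f"{e.opname}: stack depth {e.exit_depth} at error exit, "
            f"expected {expected}"
        )


# Consistent exit: two operands consumed, depth reduced by two.
check_error_exit(ErrorExit("EXAMPLE_OP", entry_depth=3, consumed=2, exit_depth=1))

# Inconsistent exit: one operand consumed, but the depth was never adjusted.
try:
    check_error_exit(ErrorExit("EXAMPLE_OP", entry_depth=3, consumed=1, exit_depth=3))
    stale_detected = False
except AssertionError:
    stale_detected = True
```

The check flags the opcode at the point of failure, rather than leaving the mismatch to surface later inside _PyFrame_ClearLocals.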

Versions

Reproduces on debug builds (free-threaded and GIL, with or without ASan); on release builds the negative-refcount assert is compiled out and the use-after-free is latent. Likely long-standing in the stackref eval loop.
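Because the observed behaviour depends on the build configuration, it may help to record that configuration alongside any reproduction attempt. The following is a small sketch using only standard-library introspection. `describe_build` is a hypothetical helper, and the ASan check is a heuristic that assumes the sanitizer was enabled through the `--with-address-sanitizer` configure flag rather than through compiler flags alone.

```python
import importlib.util
import sys
import sysconfig


def describe_build() -> dict:
    """Summarize the interpreter build features relevant to this report."""
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return {
        # sys.gettotalrefcount is only present on debug builds.
        "debug": hasattr(sys, "gettotalrefcount"),
        # Py_GIL_DISABLED is set to 1 on free-threaded builds.
        "free_threaded": bool(sysconfig.get_config_var("Py_GIL_DISABLED")),
        # Heuristic: only detects ASan enabled via the configure flag.
        "asan": "--with-address-sanitizer" in config_args,
        # The reproducer needs _testcapi for set_nomemory.
        "has_testcapi": importlib.util.find_spec("_testcapi") is not None,
    }


build_info = describe_build()
print(build_info)
```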

Notes

Found via OOM-injection fuzzing (_testcapi.set_nomemory). Part of #151763. Full write-up, reproducer, and backtraces: https://gist.github.com/devdanzin/22b71f61343c81df5bea9b7fca798e87

(Investigation and draft assisted by Claude Code.)

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.16.0a0 (heads/main:1b9fe5c7226, Jun 20 2026, 23:31:55) [Clang 21.1.8 (6ubuntu1)]

Labels:

interpreter-core (Objects, Python, Grammar, and Parser dirs), type-crash (a hard crash of the interpreter, possibly with a core dump)