gh-151763: Fix debug assertion in unicode subtype cleanup under OOM#152164

Open
zainnadeem786 wants to merge 1 commit into python:main from zainnadeem786:fix/oom-0030-unicode-subtype-new-clean

Summary

This PR addresses OOM-0030 from gh-151763.

It fixes a debug-build assertion that can occur while cleaning up a partially constructed Unicode subtype after an allocation failure.

Issue

unicode_subtype_new() allocates a non-compact Unicode subtype object and initializes its internal state before allocating the backing character buffer.

If allocating that buffer fails, the constructor cleans up using:

Py_DECREF(self);

At that point the object is initialized far enough to be reference-counted and deallocated, but its internal data pointer is still NULL.

In debug builds, deallocation calls unicode_is_singleton(). For length-1 strings, that helper reads the first character using:

PyUnicode_READ_CHAR(unicode, 0);

For a partially initialized non-compact Unicode object, this read reaches the internal assertion:

data != NULL

causing the interpreter to abort instead of propagating the pending MemoryError.
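The ordering problem described above can be illustrated with a small standalone model. This is only a sketch: `ToyStr`, `toy_init_state`, `toy_alloc_buffer`, and `toy_cleanup_would_assert` are hypothetical names invented for illustration and do not mirror CPython's real object layout or internal API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a non-compact Unicode object.
   This is NOT CPython's real layout. */
typedef struct {
    int compact;          /* 0 models a non-compact (subtype) object */
    size_t length;        /* set before the buffer exists */
    unsigned char *data;  /* NULL until the buffer is allocated */
} ToyStr;

/* Step 1: initialize internal state, as the constructor does first. */
static void toy_init_state(ToyStr *s, size_t length) {
    s->compact = 0;
    s->length = length;
    s->data = NULL;
}

/* Step 2: allocate the buffer. `simulate_oom` forces the failure path.
   Returns 0 on success, -1 on failure (data stays NULL). */
static int toy_alloc_buffer(ToyStr *s, int simulate_oom) {
    unsigned char *buf = simulate_oom ? NULL : calloc(s->length + 1, 1);
    if (buf == NULL) {
        return -1;
    }
    s->data = buf;
    return 0;
}

/* Models the condition under which a debug-only read of index 0
   would trip `data != NULL`: a length-1 object with no buffer. */
static int toy_cleanup_would_assert(const ToyStr *s) {
    return s->length == 1 && s->data == NULL;
}
```

In this model, calling `toy_init_state(&s, 1)` followed by a failed `toy_alloc_buffer(&s, 1)` leaves `s` in exactly the state that cleanup then has to handle: a recorded length of 1 and no buffer to read from.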

Fix

Before reading the first character, unicode_is_singleton() now detects the temporary construction state:

if (!PyUnicode_IS_COMPACT(unicode) &&
    _PyUnicode_DATA_ANY(unicode) == NULL) {
    return 0;
}

A non-compact Unicode object with a NULL data pointer cannot represent a singleton, so returning 0 is correct and avoids the invalid debug-only access.
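To show why the early return is safe, here is a simplified standalone model of the guarded check. It is a sketch under assumptions, not the CPython implementation: `ToyStr`, `toy_singletons`, and `toy_is_singleton` are hypothetical, the empty-string case is omitted, and compact objects are assumed to always have a buffer.

```c
#include <stddef.h>

/* Hypothetical stand-in for a Unicode object; not CPython's layout. */
typedef struct {
    int compact;                /* nonzero models a compact object */
    size_t length;
    const unsigned char *data;  /* NULL while a non-compact object is
                                   still under construction */
} ToyStr;

/* Hypothetical cache of one-character strings, indexed by code point. */
static const ToyStr *toy_singletons[256];

static int toy_is_singleton(const ToyStr *s) {
    /* Guard modeled on the PR: a non-compact object without a buffer
       is still being constructed and cannot be a cached singleton. */
    if (!s->compact && s->data == NULL) {
        return 0;
    }
    if (s->length != 1) {
        return 0;
    }
    /* Non-compact objects that reach this point have a buffer,
       so reading index 0 is safe for them. */
    unsigned char ch = s->data[0];
    return toy_singletons[ch] == s;
}
```

The guard sits before any read of the buffer, so the partially constructed case never reaches the character lookup, while fully constructed objects follow the same path as before.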

The constructor, ownership model, and cleanup logic remain unchanged.

Validation

Built locally:

PCbuild\build.bat -p x64 -c Debug

Reproduced the original OOM path using _testcapi.set_nomemory().

Before

The reproducer aborted with:

Assertion failed: data != NULL
Include/cpython/unicodeobject.h

After

The same allocation-failure sweep completes without triggering the assertion and correctly propagates MemoryError.

Tests

Executed focused Unicode test suites:

PCbuild\amd64\python_d.exe -m test test_str test_capi.test_unicode
PCbuild\amd64\python_d.exe -m test test_codecs

All passed.

I also verified:

git diff --check

No whitespace issues were reported.

Notes

No regression test is included.

The reproducer relies on _testcapi.set_nomemory() and allocation-failure indices that are build-sensitive, making it difficult to provide a stable CI regression test for this specific OOM path.
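The sensitivity to allocation indices can be sketched with a toy fault injector. This is a simplified illustration, not how `_testcapi.set_nomemory()` is implemented: `toy_malloc`, `toy_build`, and `toy_sweep` are hypothetical names, and only a single allocation is failed per run.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fault injector: the allocation whose 1-based ordinal
   equals `fail_at` returns NULL; 0 disables injection. */
static size_t alloc_count = 0;
static size_t fail_at = 0;

static void *toy_malloc(size_t n) {
    alloc_count++;
    if (fail_at != 0 && alloc_count == fail_at) {
        return NULL;
    }
    return malloc(n);
}

/* Operation under test: two allocations, header first, then buffer.
   Returns 0 on success, -1 on a clean failure with nothing leaked. */
static int toy_build(void) {
    void *header = toy_malloc(16);
    if (header == NULL) {
        return -1;
    }
    void *buffer = toy_malloc(8);
    if (buffer == NULL) {
        free(header);  /* clean up the partially built object */
        return -1;
    }
    free(buffer);
    free(header);
    return 0;
}

/* Sweep: fail each allocation index in turn and count how many
   indices made toy_build() report a clean failure. */
static size_t toy_sweep(size_t max_index) {
    size_t failures = 0;
    for (size_t i = 1; i <= max_index; i++) {
        alloc_count = 0;
        fail_at = i;
        if (toy_build() != 0) {
            failures++;
        }
    }
    fail_at = 0;
    return failures;
}
```

In this model the buffer allocation is always index 2. If a different build performed one extra allocation before it, the interesting index would shift to 3, which is the kind of drift that makes a fixed-index regression test fragile.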

Addresses OOM-0030 from gh-151763.
