Skip to content

gh-151722: Do not track the dict in the GC in _PyDict_FromKeys()#152067

Open
vstinner wants to merge 8 commits into
python:mainfrom
vstinner:dict_fromkeys
Open

gh-151722: Do not track the dict in the GC in _PyDict_FromKeys()#152067
vstinner wants to merge 8 commits into
python:mainfrom
vstinner:dict_fromkeys

Conversation

@vstinner

@vstinner vstinner commented Jun 24, 2026

Copy link
Copy Markdown
Member

dict_merge() no longer requires the dictionary to be tracked by the GC.

dict_merge() no longer requires the dictionary to be tracked by the
GC.

Co-authored-by: Donghee Na <donghee.na@python.org>
@vstinner

vstinner commented Jun 24, 2026

Copy link
Copy Markdown
Member Author

This fix is similar to PR gh-152021 but it doesn't call _PyObject_GC_UNTRACK(d). It only cares about the dictionary being tracked by the GC if it's a frozendict. If cls is a frozendict, call frozendict_new_untracked() to get directly a frozendict not tracked by the GC. Otherwise, if cls constructor creates a frozendict tracked by the GC, create a copy which is not tracked by the GC.

We don't have to track again the dictionary by the GC on error. For a dict, it's already tracked. For a frozendict, it's ok that it's not tracked by the GC on error.

@vstinner

Copy link
Copy Markdown
Member Author

cc @corona10 @methane

@corona10

Copy link
Copy Markdown
Member

@vstinner
Could you check with following tests passed with TSAN build?

TSAN_OPTIONS=halt_on_error=0 ./python tsan_frozendict.py

import gc, threading

N = 4000
stop = threading.Event()
box = []

def reader():
    while not stop.is_set():
        if box:
            o = box[0]
            try:
                len(o); repr(o); hash(o)
            except Exception:
                pass

def leak():
    if not box:
        for o in gc.get_objects():
            if type(o) is frozendict and "k0" in o and len(o) < N:
                box.append(o)
                break

class LeakyMap:                   
    def keys(self): return [f"k{i}" for i in range(N)]
    def __getitem__(self, key):
        if key == "k5":
            leak()
        return key

def gen_keys():
    for i in range(N):
        if i == 5:
            leak()
        yield f"k{i}"

def builder():
    try:
        for _ in range(30):
            box.clear(); frozendict(LeakyMap())
            box.clear(); frozendict.fromkeys(gen_keys())
    finally:
        stop.set()

ts = [threading.Thread(target=reader) for _ in range(2)]
b = threading.Thread(target=builder)
for t in ts: t.start()
b.start(); b.join()
for t in ts: t.join(timeout=2)
print("observed half-built:", bool(box), "(False = fixed)")
print("done")

@vstinner

Copy link
Copy Markdown
Member Author

@corona10: Could you check with following tests passed with TSAN build?

Sure. The test says that issue is fixed:

$ TSAN_OPTIONS=halt_on_error=0 ./python tsan_frozendict.py
observed half-built: False (False = fixed)
done

I expect the issue to be fixed since the frozendict is no longer tracked by the GC, at least the one which is modified in-place by .fromkeys().

I built Python with ./configure --with-thread-sanitizer.

@vstinner

Copy link
Copy Markdown
Member Author

I pushed a .fromkeys() fix for dict subclass. The following code now displays <class 'frozendict'> instead of <class '__main__.DictSubclass'>. IMO it's a bad idea to create a DictSubclass instance in this case. It's less surprising to get a frozendict if DictSubclass.__new__() returns a frozendict.

created = frozendict(x=1)

class DictSubclass(dict):
    def __new__(self):
        return created

print(type(DictSubclass.fromkeys("abc")))

Comment thread Lib/test/test_dict.py Outdated
fd = FrozenDictSubclass.fromkeys("abc")
self.assertEqual(fd, frozendict(x=1, a=None, b=None, c=None))
self.assertEqual(type(fd), FrozenDictSubclass)
self.assertEqual(type(fd), frozendict)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not behavior change?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change on purpose. IMO the old behavior is wrong. If FrozenDictSubclass constructor returns a frozendict, .fromkeys() must return a frozendict instead of creating a FrozenDictSubclass.

@methane

methane commented Jun 25, 2026

Copy link
Copy Markdown
Member

We shouldn't change the behavior of dict and frozendict without a reason, but in this case, the relationship between .fromkeys() and __new__() should be different from dict because of the immutable property of frozendict.

How about passing a frozendict to the __new__() method of a subclass of frozendict like this?

class MyFrozen(frozendict):
    def __new__(cls, *args, **kwargs):
        return super().__new__(cls, *args, **kwargs)

MyFrozen.fromkeys(["a", "b", "c"])
# => MyFrozen(frozendict({"a": None, "b": None, "c": None}))

frozendict is a new type, so we don't need to worry about breaking existing code.

@vstinner

Copy link
Copy Markdown
Member Author

How about passing a frozendict to the new() method of a subclass of frozendict like this?

Ok. I modified frozendict.fromkeys() to call the type constructor again with a frozendict if the type is a frozendict subclass, or if the constructor returned a frozendict.

@methane @corona10: Does it look better to you?

Sorry, fromkeys() is quite complex because frozendict is immutable :-) At least, the behavior is now well tested by test_dict.

Example:

class FrozenDictSubclass(frozendict):
    def __new__(cls, *args, **kwargs):
        print(f"Call constructor with {args} and {kwargs}")
        return super().__new__(cls, *args, **kwargs)

fd = FrozenDictSubclass.fromkeys("abc")
print(fd)
print(type(fd))

For a frozendict subclass, the constructor is now called twice:

Call constructor with () and {}
Call constructor with (frozendict({'a': None, 'b': None, 'c': None}),) and {}
FrozenDictSubclass({'a': None, 'b': None, 'c': None})
<class '__main__.FrozenDictSubclass'>

The first call creates an empty frozendict: fromkeys() copies this empty frozendict and fills the copy.

The second call is to create a FrozenDictSubclass from the copy.

@read-the-docs-community

Copy link
Copy Markdown

Documentation build overview

📚 cpython-previews | 🛠️ Build #33309159 | 📁 Comparing 897d768 against main (56ae0b8)

  🔍 Preview build  

2 files changed
± library/stdtypes.html
± whatsnew/changelog.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting core review needs backport to 3.15 pre-release feature fixes, bugs and security fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants