json_to_pickle drops newargs for NEWOBJ objects with both __getnewargs__ and state #12

Description

@jensens

Summary

json_to_pickle (Rust _rust decoder) does not faithfully round-trip a
protocol-2 NEWOBJ reduction when the object has both non-empty
__getnewargs__ and instance state. The decoder drops the newargs, emits
cls.__new__(cls) with no arguments, and relocates args + state into a
{"@args": (...), "@state": {...}} dict. For any class whose __new__ requires
the newargs (immutable subclasses, ExtensionClass / Persistence.Persistent
subclasses), the decoded pickle raises at load time.

Found while debugging an AAF production incident. That incident turned out to be
caused by unrelated stale data (it used the clean @reduce form), but this is still
a genuine round-trip fidelity bug worth fixing.

Version: zodb-json-codec 1.6.1 (also reproduces on the cp314 build).

Minimal reproduction

import pickle, pickletools
import zodb_json_codec as z

class Weird:                      # __new__ needs an arg; __init__ must NOT run
    def __new__(cls, token):
        self = object.__new__(cls)
        self.token = token
        return self
    def __getnewargs__(self):
        return (self.token,)
    def __init__(self, *a):
        raise RuntimeError("__init__ must not be called during unpickle")

p = pickle.dumps(Weird.__new__(Weird, "abc"), protocol=2)
assert pickle.loads(p).token == "abc"            # original is fine

p2 = z.json_to_pickle(z.pickle_to_json(p))       # round-trip through codec
pickle.loads(p2)                                 # -> BOOM
# TypeError: Weird.__new__() missing 1 required positional argument: 'token'

For a Persistence.Persistent / ExtensionClass subclass the same defect surfaces as:

TypeError: object.__new__(X) is not safe, use Persistence.Persistent.__new__()

Evidence (pickle disassembly)

Original (protocol=2):

'abc' TUPLE1 NEWOBJ ... BUILD      # -> Weird.__new__(Weird, 'abc'); state applied

After pickle_to_json -> json_to_pickle:

)    EMPTY_TUPLE
\x81 NEWOBJ                        # -> Weird.__new__(Weird)   <-- newargs DROPPED
}    EMPTY_DICT
(    MARK
    '@args'  ('abc',)             # newargs relocated into state
    '@state' {'token': 'abc'}
u    SETITEMS
b    BUILD
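
The broken shape above can be spotted without loading the pickle. The following is only a heuristic sketch, not part of zodb-json-codec: the helper names (looks_affected, binunicode) and the two hand-assembled byte strings are hypothetical and simply mirror the opcode sequences shown in the disassembly. It uses pickletools.genops, which parses opcodes without importing any class.

```python
import pickletools

# Memo opcodes may sit between the args tuple and NEWOBJ; skip them.
MEMO_OPS = {"PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"}


def looks_affected(data: bytes) -> bool:
    """Heuristic: NEWOBJ called with an empty tuple, later followed by
    an '@args' string (the relocated newargs)."""
    prev = None
    empty_newobj = False
    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        if name in MEMO_OPS:
            continue
        if name == "NEWOBJ" and prev == "EMPTY_TUPLE":
            empty_newobj = True
        elif empty_newobj and arg == "@args":
            return True
        prev = name
    return False


def binunicode(text: str) -> bytes:
    """Encode one BINUNICODE opcode ('X' + 4-byte little-endian length)."""
    raw = text.encode("utf-8")
    return b"X" + len(raw).to_bytes(4, "little") + raw


# PROTO 2 + GLOBAL '__main__ Weird' (never imported by genops)
HEADER = b"\x80\x02" + b"c__main__\nWeird\n"

# 'abc' TUPLE1 NEWOBJ EMPTY_DICT 'token' 'abc' SETITEM BUILD STOP
GOOD = (HEADER + binunicode("abc") + b"\x85\x81" + b"}"
        + binunicode("token") + binunicode("abc") + b"s" + b"b.")

# EMPTY_TUPLE NEWOBJ EMPTY_DICT MARK '@args' ('abc',) '@state' {...} SETITEMS BUILD STOP
BAD = (HEADER + b")\x81" + b"}(" + binunicode("@args")
       + binunicode("abc") + b"\x85" + binunicode("@state")
       + b"}" + binunicode("token") + binunicode("abc") + b"s"
       + b"u" + b"b.")

print(looks_affected(GOOD), looks_affected(BAD))  # False True
```

A check like this could be used to count affected records before and after a fix, assuming the decoder output keeps the shape shown above.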

A plain str subclass without state happens to survive (it is re-emitted as
a REDUCE with args), which is why the corruption is intermittent and easy to
miss.

Expected

json_to_pickle must reproduce NEWOBJ-with-newargs faithfully:
cls.__new__(cls, *newargs) followed by normal state application via
__setstate__ / __dict__.update — never cls.__new__(cls) with the newargs
moved into a synthetic @args/@state wrapper.
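
As a rough model of what the expected opcode sequence should do at load time, the sketch below mimics NEWOBJ followed by BUILD in plain Python. It is a simplification of the unpickler's behaviour (slot state and other details are ignored), and rebuild is a hypothetical helper name, not a codec or pickle API.

```python
def rebuild(cls, newargs, state):
    """Simplified model of NEWOBJ + BUILD: construct via __new__ with the
    newargs, then apply state without ever calling __init__."""
    obj = cls.__new__(cls, *newargs)          # NEWOBJ
    setstate = getattr(obj, "__setstate__", None)
    if setstate is not None:                  # BUILD, custom hook
        setstate(state)
    else:                                     # BUILD, default path
        obj.__dict__.update(state)
    return obj


class Weird:
    def __new__(cls, token):
        self = object.__new__(cls)
        self.token = token
        return self

    def __getnewargs__(self):
        return (self.token,)

    def __init__(self, *a):
        raise RuntimeError("__init__ must not be called during unpickle")


# Expected path: newargs reach __new__, state is applied afterwards.
obj = rebuild(Weird, ("abc",), {"token": "abc"})
assert obj.token == "abc"

# What the current decoder output amounts to: __new__ without newargs.
try:
    rebuild(Weird, (), {"@args": ("abc",), "@state": {"token": "abc"}})
except TypeError as exc:
    print("fails as reported:", exc)
```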

Notes for a fix / data recovery

  • The encoder is fine: pickle_to_json retains @args/@state, so the JSON
    is not lossy. A corrected decoder can regenerate valid pickles in place from
    already-stored records (e.g. in a zodb-pgjsonb object_state.state column),
    with no re-import from source.
