Summary
json_to_pickle (the Rust _rust decoder) does not faithfully round-trip a
protocol-2 NEWOBJ reduction when the object has both non-empty
__getnewargs__ and instance state. The decoder drops the newargs, emits
cls.__new__(cls) with no arguments, and relocates the newargs and the state into a
{"@args": (...), "@state": {...}} dict, which BUILD then applies as the instance
state. For any class whose __new__ requires the newargs (immutable subclasses,
ExtensionClass / Persistence.Persistent subclasses), the decoded pickle raises at load time.
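For reference, the reduction the decoder has to reproduce can be inspected with object.__reduce_ex__(2). For a class that defines __getnewargs__, CPython describes the instance as copyreg.__newobj__(cls, *newargs) plus a separate state dict. The class below is a trimmed copy of the one in the reproduction (its raising __init__ is omitted because nothing here calls it); this is an illustration of standard pickle behaviour, not of the codec.

```python
import copyreg

class Weird:
    # Trimmed copy of the reproduction's class: __new__ requires `token`.
    def __new__(cls, token):
        self = object.__new__(cls)
        self.token = token
        return self

    def __getnewargs__(self):
        return (self.token,)

obj = Weird.__new__(Weird, "abc")
func, args, state, listitems, dictitems = obj.__reduce_ex__(2)

# Protocol 2 reduces the object to copyreg.__newobj__(cls, *newargs) ...
assert func is copyreg.__newobj__
assert args == (Weird, "abc")  # cls followed by the __getnewargs__ result
# ... plus the instance state that BUILD applies afterwards.
assert state == {"token": "abc"}
```

Both parts must survive the JSON round-trip: the newargs feed NEWOBJ, and only the state feeds BUILD.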
Found while debugging an AAF prod incident. Note: that incident turned out to be
unrelated stale data (it used the clean @reduce form), but this is a genuine
round-trip fidelity bug worth fixing.
Version: zodb-json-codec 1.6.1 (also reproduces on the cp314 build).
Minimal reproduction
```python
import pickle, pickletools
import zodb_json_codec as z

class Weird:  # __new__ needs an arg; __init__ must NOT run
    def __new__(cls, token):
        self = object.__new__(cls)
        self.token = token
        return self

    def __getnewargs__(self):
        return (self.token,)

    def __init__(self, *a):
        raise RuntimeError("__init__ must not be called during unpickle")

p = pickle.dumps(Weird.__new__(Weird, "abc"), protocol=2)
assert pickle.loads(p).token == "abc"       # original is fine

p2 = z.json_to_pickle(z.pickle_to_json(p))  # round-trip through codec
pickle.loads(p2)                            # -> BOOM
# TypeError: Weird.__new__() missing 1 required positional argument: 'token'
```
For a Persistence.Persistent / ExtensionClass subclass the same defect surfaces as:
```text
TypeError: object.__new__(X) is not safe, use Persistence.Persistent.__new__()
```
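A regression test along these lines could guard a fix. It is a sketch: check_newobj_roundtrip is a hypothetical helper name, and the round-trip is passed in as a callable so the harness assumes nothing about the codec beyond the two functions the reproduction already uses. The class has to live at module level so pickle can look it up by name.

```python
import pickle
from typing import Callable

class Weird:  # module level, so pickle can resolve it by qualified name
    def __new__(cls, token):
        self = object.__new__(cls)
        self.token = token
        return self

    def __getnewargs__(self):
        return (self.token,)

    def __init__(self, *a):
        raise RuntimeError("__init__ must not be called during unpickle")

def check_newobj_roundtrip(roundtrip: Callable[[bytes], bytes]) -> None:
    """Hypothetical regression check: `roundtrip` must preserve NEWOBJ newargs."""
    original = pickle.dumps(Weird.__new__(Weird, "abc"), protocol=2)
    restored = pickle.loads(roundtrip(original))  # TypeError if newargs dropped
    assert type(restored) is Weird
    assert restored.token == "abc"

# Sanity check of the harness itself: an identity "round-trip" must pass.
check_newobj_roundtrip(lambda p: p)
```

Against the codec the call would be check_newobj_roundtrip(lambda p: z.json_to_pickle(z.pickle_to_json(p))); on 1.6.1 it should raise the TypeError shown above, and a corrected decoder should pass.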
Evidence (pickle disassembly)
Original (protocol=2):
```text
'abc' TUPLE1 NEWOBJ ... BUILD   # -> Weird.__new__(Weird, 'abc'); state applied
```
After pickle_to_json -> json_to_pickle:
```text
)     EMPTY_TUPLE
\x81  NEWOBJ          # -> Weird.__new__(Weird)   <-- newargs DROPPED
}     EMPTY_DICT
(     MARK
      '@args'  ('abc',)            # newargs relocated into state
      '@state' {'token': 'abc'}
u     SETITEMS
b     BUILD
```
A plain str subclass without state happens to survive (it is re-emitted as
a REDUCE with args), which is why the corruption shows up only for some
classes and is easy to miss.
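To audit decoder output for the corrupted shape in the disassembly above, a heuristic scan with pickletools.genops may help. has_relocated_newargs is a hypothetical helper: it only checks for a NEWOBJ opcode together with "@args" and "@state" string literals, so an object whose real state uses those keys would be a false positive. Concatenated pickles (a ZODB record holds a class pickle followed by a state pickle) are scanned one after another.

```python
import io
import pickletools

# String opcodes that can carry a dict key such as "@args".
_STRING_OPS = {"UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}

def has_relocated_newargs(data: bytes) -> bool:
    """Heuristic: NEWOBJ present plus both "@args" and "@state" literals."""
    stream = io.BytesIO(data)
    saw_newobj = False
    keys = set()
    while stream.tell() < len(data):
        # genops stops right after each STOP opcode and leaves the stream there,
        # so the next iteration starts on the following concatenated pickle.
        for opcode, arg, _pos in pickletools.genops(stream):
            if opcode.name in ("NEWOBJ", "NEWOBJ_EX"):
                saw_newobj = True
            elif opcode.name in _STRING_OPS and arg in ("@args", "@state"):
                keys.add(arg)
    return saw_newobj and keys == {"@args", "@state"}
```

Calling it on p2 from the reproduction should return True on 1.6.1, and False on the original p.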
Expected
json_to_pickle must reproduce NEWOBJ-with-newargs faithfully:
cls.__new__(cls, *newargs) followed by normal state application via
__setstate__ / __dict__.update. It must never emit cls.__new__(cls) with the
newargs moved into a synthetic @args/@state wrapper.
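The expected behaviour can be modelled in a few lines of Python. newobj_then_build is a hypothetical name, and the function is a simplified sketch of the standard pickle NEWOBJ + BUILD semantics (it skips details such as key interning), not the codec's code. Passing the real newargs succeeds; passing an empty tuple with the wrapped state, which is effectively what the current decoder emits, reproduces the TypeError.

```python
class Weird:
    # Same constraints as in the reproduction.
    def __new__(cls, token):
        self = object.__new__(cls)
        self.token = token
        return self

    def __init__(self, *a):
        raise RuntimeError("__init__ must not be called during unpickle")

def newobj_then_build(cls, newargs, state):
    """Simplified model of unpickling NEWOBJ followed by BUILD."""
    obj = cls.__new__(cls, *newargs)   # NEWOBJ: the newargs must reach __new__
    if state is None:                  # the pickler emits no BUILD in this case
        return obj
    setstate = getattr(obj, "__setstate__", None)
    if setstate is not None:           # e.g. Persistent classes define __setstate__
        setstate(state)
        return obj
    slot_state = None
    if isinstance(state, tuple) and len(state) == 2:
        state, slot_state = state      # (dict_state, slot_state) form
    if state:
        obj.__dict__.update(state)
    if slot_state:
        for name, value in slot_state.items():
            setattr(obj, name, value)
    return obj

ok = newobj_then_build(Weird, ("abc",), {"token": "abc"})
print(ok.token)
```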
Notes for a fix / data recovery
- The encoder is fine: pickle_to_json retains @args/@state, so the JSON is not
  lossy. A corrected decoder can regenerate valid pickles in place from
  already-stored records (e.g. in a zodb-pgjsonb object_state.state column),
  with no re-import from source.
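To find which stored records contain the affected shape, a small JSON walker may help. This is a sketch under an assumption: only the "@args"/"@state" key names come from this report, and the rest of the codec's JSON layout is not assumed, so any JSON object carrying both keys is reported (an over-approximation). find_newargs_nodes is a hypothetical helper, and fetching rows from object_state.state is left out.

```python
import json
from typing import Any, Iterator

def find_newargs_nodes(doc: Any, path: str = "$") -> Iterator[str]:
    """Yield JSON paths of objects that carry both "@args" and "@state" keys."""
    if isinstance(doc, dict):
        if "@args" in doc and "@state" in doc:
            yield path
        for key, value in doc.items():
            yield from find_newargs_nodes(value, f"{path}.{key}")
    elif isinstance(doc, list):
        for index, item in enumerate(doc):
            yield from find_newargs_nodes(item, f"{path}[{index}]")

# Illustrative record only; not the codec's exact JSON layout.
record = json.loads('{"items": [{"@args": ["abc"], "@state": {"token": "abc"}}]}')
print(list(find_newargs_nodes(record)))
```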