Otel: emit plugin operation hooks during replay #496
Open
zhongkechen wants to merge 2 commits into
b44d55a to
30eb842
Compare
yaythomas
reviewed
Jun 30, 2026
def on_operation_replay(self, operation: Operation) -> None:
    """Execute plugins for a checkpointed operation observed during replay."""
    if operation.status is OperationStatus.READY:
        return
Contributor
this check could possibly move to _emit_operation_replay_hooks, then it's all in one place?
def _emit_operation_replay_hooks(self, operation: Operation) -> None:
    """Emit operation hooks once for each checkpointed operation during replay."""
    if operation.operation_type is OperationType.EXECUTION:
        return
    if operation.status is OperationStatus.READY:  # moved here, before the set add
        return
    with self._replay_status_lock:
        if self._replay_status is not ReplayStatus.REPLAY:
            return
        if operation.operation_id in self._replayed_operation_hooks:
            return
        self._replayed_operation_hooks.add(operation.operation_id)
    self._plugin_executor.on_operation_replay(operation)
that way it all happens together, and the (highly theoretical) future possibility where the op is recorded as handled even though no hook fired won't be there.
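To illustrate the ordering concern in the comment above, here is a minimal, self-contained sketch. It is not the SDK's code: `ReplayTracker`, `Status`, and the `check_before_add` switch are hypothetical stand-ins, and the scenario where the same operation ID is seen first as READY and later in another status is assumed purely for illustration.

```python
import threading
from enum import Enum, auto


class Status(Enum):
    """Hypothetical subset of operation statuses."""
    READY = auto()
    SUCCEEDED = auto()


class ReplayTracker:
    """Hypothetical stand-in for the replay de-duplication bookkeeping."""

    def __init__(self, check_before_add: bool) -> None:
        self._lock = threading.Lock()
        self._seen: set[str] = set()
        self._check_before_add = check_before_add
        self.fired: list[str] = []

    def _dispatch(self, op_id: str, status: Status) -> None:
        # Late variant: the READY guard lives in the dispatcher,
        # after the ID has already been recorded as handled.
        if not self._check_before_add and status is Status.READY:
            return
        self.fired.append(op_id)

    def emit(self, op_id: str, status: Status) -> None:
        # Early variant: READY is rejected before the ID is recorded.
        if self._check_before_add and status is Status.READY:
            return
        with self._lock:
            if op_id in self._seen:
                return
            self._seen.add(op_id)
        # Dispatch outside the lock so plugin code never runs while holding it.
        self._dispatch(op_id, status)


late = ReplayTracker(check_before_add=False)
late.emit("op-1", Status.READY)       # recorded as handled, no hook fires
late.emit("op-1", Status.SUCCEEDED)   # suppressed by the de-duplication set

early = ReplayTracker(check_before_add=True)
early.emit("op-1", Status.READY)      # rejected before being recorded
early.emit("op-1", Status.SUCCEEDED)  # hook fires once
```

In the late variant no hook ever fires for `op-1`, while the early variant fires exactly once. Whether that sequence can occur in the real replay path is, as the comment says, highly theoretical.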
yaythomas
previously approved these changes
Jun 30, 2026
yaythomas
left a comment
Contributor
very nice work in a tricky area! 🥇
I left a comment but it's just a suggestion, I didn't fully trace it through so I could easily be missing something. Have an approve regardless
Summary
This PR makes plugin operation lifecycle hooks observable during replay, not just when an operation update is checkpointed for the first time.
Previously, on_operation_start and on_operation_end were only dispatched from the checkpoint processing path:
- on_operation_start fired when a START operation update was successfully checkpointed.
- on_operation_end fired when a terminal operation state came back from checkpointing.
During replay, already-checkpointed operations were resolved through ExecutionState.get_checkpoint_result() and skipped execution, but no operation lifecycle plugin hooks were emitted.
That meant plugins could observe the original operation lifecycle but could not observe that the same logical operation was replayed.
What Changed
- Added is_replayed: bool to OperationInfo.
- Kept is_replayed=False for normal, first-time operation hook dispatch.
- Added replay hook emission from ExecutionState.get_checkpoint_result() while the state is in replay mode.
- Added PluginExecutor.on_operation_replay() to emit:
  - OperationStartInfo(..., is_replayed=True) for replayed checkpointed operations.
  - OperationEndInfo(..., is_replayed=True) when the replayed operation is terminal.
- De-duplicated replay hook emission per operation ID with _replayed_operation_hooks.
- Skipped replay hook emission for EXECUTION operations and READY operations.
Behavior Notes
For a replayed terminal operation, plugins now receive a pair of lifecycle callbacks:
- on_operation_start(info) with info.is_replayed is True
- on_operation_end(info) with info.is_replayed is True
For a replayed non-terminal operation such as STARTED, plugins receive only on_operation_start(info) with info.is_replayed is True.
Fresh checkpointed operations continue to emit the same lifecycle hooks as before, with is_replayed=False.
Test Fix Included
While running the full local unit suite, test_event_timeout_handling failed because its upper timing bound was too tight under local scheduler load. This PR makes that test less flaky by:
- using time.monotonic() for elapsed time measurement,
- relaxing the upper timing bound from 0.1s to 0.25s,
- ensuring scheduler.stop() runs in a finally block even if the assertion fails.
Validation
Local checks run:
- hatch fmt --check
- hatch run types:check
- hatch run test:all -q
The final full unit run passed: 2676 passed, 2 skipped.
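For plugin authors, the behavior notes in this description could be consumed roughly as follows. This is only a sketch under stated assumptions: the `OperationInfo` dataclass here is a hypothetical stand-in carrying just `operation_id` and the new `is_replayed` flag, and `CountingPlugin` is an invented example plugin, not part of the SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationInfo:
    """Hypothetical stand-in; only the is_replayed field is taken from this PR."""
    operation_id: str
    is_replayed: bool = False


class CountingPlugin:
    """Invented example plugin that separates fresh and replayed hooks."""

    def __init__(self) -> None:
        self.fresh_starts = 0
        self.fresh_ends = 0
        self.replayed_starts = 0
        self.replayed_ends = 0

    def on_operation_start(self, info: OperationInfo) -> None:
        if info.is_replayed:
            self.replayed_starts += 1
        else:
            self.fresh_starts += 1

    def on_operation_end(self, info: OperationInfo) -> None:
        if info.is_replayed:
            self.replayed_ends += 1
        else:
            self.fresh_ends += 1


plugin = CountingPlugin()

# Fresh checkpointed operation: both hooks, is_replayed=False.
plugin.on_operation_start(OperationInfo("op-a"))
plugin.on_operation_end(OperationInfo("op-a"))

# Replayed terminal operation: both hooks, is_replayed=True.
plugin.on_operation_start(OperationInfo("op-a", is_replayed=True))
plugin.on_operation_end(OperationInfo("op-a", is_replayed=True))

# Replayed non-terminal operation (for example STARTED): start hook only.
plugin.on_operation_start(OperationInfo("op-b", is_replayed=True))
```

A tracing plugin could use the same branch to tag replayed spans differently instead of counting them; how it should treat them is a design decision left to the plugin.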