Retry user-initiated splices across restarts and disconnects#930
Retry user-initiated splices across restarts and disconnects#930jkczyz wants to merge 10 commits into
Conversation
|
👋 Hi! I see this is a draft PR. |
When a splice is already pending, the user needs a way to replace its funding transaction at a higher feerate. This adds rbf_channel() to handle that case and guards splice_in/splice_out against being called while a pending splice exists, directing users to rbf_channel instead. Also fixes signing for RBF replacements, which requires accessing outputs spent by unconfirmed transactions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
492bd32 to
3ae2507
Compare
bump_channel_funding_fee only bumps a pending splice's fee; it cannot change the splice amount, so phrasing it as an alternative to splicing was misleading. Co-Authored-By: Claude <noreply@anthropic.com>
Channel-opening and splice transactions transition to Succeeded when ChannelReady fires, not after ANTI_REORG_DELAY confirmations. This matches the point at which the Lightning layer considers the channel usable: a zero-conf channel graduates as soon as its counterparty signals, and a high-conf channel waits however many confirmations the peer requires, rather than always stopping at six. For splice RBF, the payment records whichever candidate actually confirmed, with that candidate's amount and this node's share of the fee — not the fee-estimate used for weight at coin-selection time, and not the whole-tx fee for a multi-contributor splice. A channel closure whose funding or splice never confirmed discards its payment record instead of leaving it pending forever. Generated with assistance from Claude Code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The TxReplaced wallet event rebuilt the payment record from scratch, dropping its funding details. When a wallet sync fell between a splice broadcast and its RBF, the replacement of the original candidate cleared those details, so the payment no longer graduated to Succeeded on ChannelReady. Funding records are managed by the classify path and the Lightning lifecycle handlers, so leave them untouched on replacement. Co-Authored-By: Claude <noreply@anthropic.com>
bump_fee_rbf accepted channel-funding and splice payments because they are recorded as outbound, unconfirmed on-chain payments. Replacing such a transaction via wallet RBF would broadcast one LDK isn't tracking, and for splices the shared input can't be wallet-signed. Reject these and leave splice fee-bumping to rbf_channel. Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Once ChannelReady graduates a funding payment to Succeeded (removing its pending record), a concurrent wallet-sync event for one of its candidates could re-stamp the old Pending status and revert the graduation. Treat a Succeeded on-chain payment as terminal during sync, refining only the confirmation status of the candidate that locked. Co-Authored-By: Claude <noreply@anthropic.com>
Previously the BroadcasterInterface implementation wrote the payment record synchronously when LDK invoked it. With a remote KV store this could block LDK's message handling for hundreds of milliseconds per call, noticeably during force-close bursts or splice broadcasts. Persistence now happens asynchronously and must complete before the transaction is sent to the chain client. If persistence fails, the broadcast is dropped: a payment record must exist for every on-chain tx we emit, otherwise a crash could leave the tx confirmed with no matching record. Generated with assistance from Claude Code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LDK does not durably record a splice until its negotiation reaches the signature exchange, and it abandons an in-progress negotiation whenever the peer disconnects -- which includes stopping the node. A restart or an ill-timed disconnect after splice_in, splice_out, or bump_channel_funding_fee returned Ok would therefore silently drop the splice. Persist the splice intent in a new UserChannelId-keyed channel record store before handing the contribution to LDK, and resubmit it until the splice locks. A startup reconciler probes LDK's live splice state to detect dropped intents -- including those lost to a crash before LDK persisted anything -- and the SpliceNegotiationFailed handler retries recoverable failures, rebuilding the contribution with fresh parameters when the stored one has gone stale. Resubmission does not require the peer to be connected, as LDK holds the contribution and initiates quiescence once the peer reconnects. Event::SpliceNegotiationFailed is now emitted only once a splice is given up on (a non-transient failure or retries exhausted) rather than for every failed negotiation round. Generated with assistance from Claude Code. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
3ae2507 to
6098d0e
Compare
joostjager
left a comment
There was a problem hiding this comment.
The PR description explains that LDK currently does not durably record a splice until signature exchange, but I don’t think it explains why ldk-node should therefore become the owner of that durability?
This seems similar to the functionality added in #882: common enough, and close enough to channel/protocol state, that it feels like it should live in LDK if we want the behavior to be durable. Persisting it in ldk-node means we now have another store tracking protocol state alongside ChannelManager/ChannelMonitor, plus reconciliation logic to infer whether LDK still has or no longer has the splice. That creates desync risk between persistence layers.
I can see ldk-node owning product policy around retries or how/when to surface failures, but the durable record of an accepted splice contribution or in-flight splice intent feels like it should be owned by LDK.
LDK does not durably record a splice until its negotiation reaches the signature exchange, and it abandons an in-progress negotiation whenever the peer disconnects -- which includes stopping the node. A restart or an ill-timed disconnect after
splice_in,splice_out, orbump_channel_funding_feereturnedOkwould therefore silently drop the splice.Persist the splice intent in a new
UserChannelId-keyed channel record store before handing the contribution to LDK, and resubmit it until the splice locks. A startup reconciler probes LDK's live splice state to detect dropped intents -- including those lost to a crash before LDK persisted anything -- and theSpliceNegotiationFailedhandler retries recoverable failures, rebuilding the contribution with fresh parameters when the stored one has gone stale. Resubmission does not require the peer to be connected, as LDK holds the contribution and initiates quiescence once the peer reconnects.Event::SpliceNegotiationFailedis now emitted only once a splice is given up on (a non-transient failure or retries exhausted) rather than for every failed negotiation round.Based on #888.