fix(cilium): pass IPv6 to kernel stack when IPv6 datapath is disabled#2871
Conversation
📝 WalkthroughWalkthroughThis PR implements IPv6 pass-through handling in Cilium when the IPv6 datapath is disabled. It patches the BPF ChangesIPv6 Pass-Through Patching Infrastructure
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue where Cilium's host firewall incorrectly drops IPv6 traffic when the IPv6 datapath is disabled. By injecting a minimal BPF patch into the Cilium image, the change ensures that necessary node-level IPv6 traffic, such as ICMPv6 Neighbor Discovery, is correctly routed to the kernel stack. The implementation includes robust build-time checks to prevent regressions during future upstream version bumps. Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Ignored Files
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on Gemini (@gemini-code-assist) comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to apply and verify an IPv6-passthrough BPF patch for Cilium. It adds a verification step in the Makefile during updates, implements a multi-stage Dockerfile build to apply the patch, and documents the rationale in values.yaml. The review feedback correctly points out that both the Makefile verification and the Dockerfile build will fail because patch -p1 expects the target file to be located under a bpf/ subdirectory to match the paths in the patch file.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| tmpdir=$$(mktemp -d) && \ | ||
| { curl --fail --silent --show-error --location \ | ||
| "https://raw.githubusercontent.com/cilium/cilium/v$${version}/bpf/bpf_host.c" \ | ||
| -o "$$tmpdir/bpf_host.c" || { rm -rf "$$tmpdir"; exit 1; }; } && \ | ||
| { patch --dry-run --batch --forward --fuzz=0 -p1 -d "$$tmpdir" \ | ||
| < images/cilium/patches/pass-ipv6-to-stack-when-ipv6-datapath-disabled.diff || \ | ||
| { echo "ERROR: images/cilium/patches/pass-ipv6-to-stack-when-ipv6-datapath-disabled.diff no longer applies to upstream bpf/bpf_host.c v$${version}; regenerate it for the new cilium version" >&2; \ | ||
| rm -rf "$$tmpdir"; exit 1; }; } && \ | ||
| rm -rf "$$tmpdir" |
There was a problem hiding this comment.
If the patch file pass-ipv6-to-stack-when-ipv6-datapath-disabled.diff is a standard git patch generated against the upstream repository, it will contain file paths like a/bpf/bpf_host.c. Running patch -p1 strips the first component (a/) and expects the file to be located at bpf/bpf_host.c relative to the working directory. Since curl downloads the file directly to $$tmpdir/bpf_host.c, patch will fail to find the file.
Creating the bpf directory structure inside $$tmpdir ensures the patch applies correctly.
tmpdir=$$(mktemp -d) && \
mkdir -p "$$tmpdir/bpf" && \
{ curl --fail --silent --show-error --location \
"https://raw.githubusercontent.com/cilium/cilium/v$${version}/bpf/bpf_host.c" \
-o "$$tmpdir/bpf/bpf_host.c" || { rm -rf "$$tmpdir"; exit 1; }; } && \
{ patch --dry-run --batch --forward --fuzz=0 -p1 -d "$$tmpdir" \
< images/cilium/patches/pass-ipv6-to-stack-when-ipv6-datapath-disabled.diff || \
{ echo "ERROR: images/cilium/patches/pass-ipv6-to-stack-when-ipv6-datapath-disabled.diff no longer applies to upstream bpf/bpf_host.c v$${version}; regenerate it for the new cilium version" >&2; \
rm -rf "$$tmpdir"; exit 1; }; } && \
rm -rf "$$tmpdir"
There was a problem hiding this comment.
The patch is not a git-format diff rooted at the repo — it is generated with explicit --- a/bpf_host.c / +++ b/bpf_host.c labels (see the hunk headers in patches/pass-ipv6-to-stack-when-ipv6-datapath-disabled.diff), so -p1 resolves to bpf_host.c inside the patch directory, which is exactly where curl writes the file. This dry-run passes as-is, and the same patch is applied successfully by the image build in CI on every run.
| COPY --from=upstream /var/lib/cilium/bpf/bpf_host.c ./bpf_host.c | ||
| COPY patches/pass-ipv6-to-stack-when-ipv6-datapath-disabled.diff /patches/ | ||
| # --fuzz=0: any upstream drift in the patched context must fail the build, loudly. | ||
| RUN patch --batch --forward --fuzz=0 -p1 < /patches/pass-ipv6-to-stack-when-ipv6-datapath-disabled.diff || { \ | ||
| echo 'ERROR: the IPv6 passthrough patch no longer applies to bpf/bpf_host.c.' >&2; \ | ||
| echo 'Upstream cilium changed the host-firewall protocol switches (version bump?).' >&2; \ | ||
| echo 'Regenerate packages/system/cilium/images/cilium/patches/*.diff against the new tag.' >&2; \ | ||
| exit 1; } | ||
| RUN test "$(grep -c 'cozystack-ipv6-passthrough' bpf_host.c)" = 3 || { \ | ||
| echo 'ERROR: expected exactly 3 cozystack-ipv6-passthrough markers in bpf_host.c' >&2; \ | ||
| exit 1; } | ||
|
|
||
| FROM upstream | ||
| COPY --from=patcher /work/bpf_host.c /var/lib/cilium/bpf/bpf_host.c |
There was a problem hiding this comment.
Similarly to the Makefile, if the patch file is a standard git patch, patch -p1 expects the file to be located at bpf/bpf_host.c relative to the working directory. Copying the file to ./bpf/bpf_host.c instead of ./bpf_host.c ensures the patch applies successfully during the Docker build.
COPY --from=upstream /var/lib/cilium/bpf/bpf_host.c ./bpf/bpf_host.c
COPY patches/pass-ipv6-to-stack-when-ipv6-datapath-disabled.diff /patches/
# --fuzz=0: any upstream drift in the patched context must fail the build, loudly.
RUN patch --batch --forward --fuzz=0 -p1 < /patches/pass-ipv6-to-stack-when-ipv6-datapath-disabled.diff || { \
echo 'ERROR: the IPv6 passthrough patch no longer applies to bpf/bpf_host.c.' >&2; \
echo 'Upstream cilium changed the host-firewall protocol switches (version bump?).' >&2; \
echo 'Regenerate packages/system/cilium/images/cilium/patches/*.diff against the new tag.' >&2; \
exit 1; }
RUN test "$(grep -c 'cozystack-ipv6-passthrough' bpf/bpf_host.c)" = 3 || { \
echo 'ERROR: expected exactly 3 cozystack-ipv6-passthrough markers in bpf_host.c' >&2; \
exit 1; }
FROM upstream
COPY --from=patcher /work/bpf/bpf_host.c /var/lib/cilium/bpf/bpf_host.c
There was a problem hiding this comment.
Same as on the Makefile thread: the diff carries a/bpf_host.c / b/bpf_host.c labels, not a/bpf/bpf_host.c, so with -p1 the expected path is ./bpf_host.c — matching the COPY destination. This stage runs on every image build; the CI Build job applies the patch successfully, and the marker-count check right below would fail the build if the patch had not been applied.
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
Cozystack ships Cilium with the IPv6 datapath disabled
(ipv6.enabled=false) and the host firewall enabled. In this combination
upstream bpf_host.c drops all IPv6 on managed devices as unknown L3
("Unsupported L3 protocol") before any policy evaluation, in both
directions. This kills ICMPv6 Neighbor Discovery and BGP unnumbered over
link-local addresses on IPv6/L3 fabrics, and no
CiliumClusterwideNetworkPolicy can allow the traffic back because the
drop happens before policy enforcement.
Carry a BPF patch in the cozystack cilium image that adds an ETH_P_IPV6
passthrough case to the host-firewall protocol switches, mirroring the
behaviour without the host firewall: IPv6 is handed to the kernel stack
instead of dropped. IPv4 host policies remain fully enforced; node IPv6
is not filtered by Cilium.
The patch is applied with patch --fuzz=0 in a multi-stage build, so any
upstream drift in the patched context fails the image build loudly.
Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
The image build already fails on patch drift, but only when the image is built. Check the patch against the freshly vendored upstream tag inside make update as well, so the author of a cilium version bump sees the drift immediately instead of at the next image build. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
State next to the hostFirewall default why the cozystack cilium image deviates from upstream: with the IPv6 datapath disabled, upstream drops all node IPv6 pre-policy, while the patched image hands IPv6 to the kernel stack. Cilium host policies therefore apply to IPv4 only. Assisted-By: Claude <noreply@anthropic.com> Signed-off-by: Aleksei Sviridkin <f@lex.la>
ba3b0f5 to
ec0b266
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/system/cilium/Makefile (1)
30-32: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winAdd retry/timeout guards to upstream fetch
Line 30 performs a network call that can fail transiently and cause flaky
make updatefailures even when the patch is valid. Add bounded retries and timeouts to make this check more reliable.Suggested change
- { curl --fail --silent --show-error --location \ + { curl --fail --silent --show-error --location \ + --retry 3 --retry-all-errors --retry-delay 1 \ + --connect-timeout 10 --max-time 60 \ "https://raw.githubusercontent.com/cilium/cilium/v$${version}/bpf/bpf_host.c" \ -o "$$tmpdir/bpf_host.c" || { rm -rf "$$tmpdir"; exit 1; }; } && \🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/system/cilium/Makefile` around lines 30 - 32, The curl command that fetches bpf_host.c from the cilium repository can fail due to transient network issues, causing flaky make update failures. Add retry logic and timeout guards to the curl command by using curl's --retry flag to limit retry attempts and --connect-timeout and --max-time flags to set connection and total request timeouts, making the network call more resilient to temporary failures while still failing fast on persistent issues.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/system/cilium/Makefile`:
- Around line 30-32: The curl command that fetches bpf_host.c from the cilium
repository can fail due to transient network issues, causing flaky make update
failures. Add retry logic and timeout guards to the curl command by using curl's
--retry flag to limit retry attempts and --connect-timeout and --max-time flags
to set connection and total request timeouts, making the network call more
resilient to temporary failures while still failing fast on persistent issues.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f19386eb-54ab-4993-a396-7a1699f3ccf8
📒 Files selected for processing (4)
packages/system/cilium/Makefilepackages/system/cilium/images/cilium/Dockerfilepackages/system/cilium/images/cilium/patches/pass-ipv6-to-stack-when-ipv6-datapath-disabled.diffpackages/system/cilium/values.yaml
✅ Files skipped from review due to trivial changes (1)
- packages/system/cilium/values.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/system/cilium/images/cilium/Dockerfile
- packages/system/cilium/images/cilium/patches/pass-ipv6-to-stack-when-ipv6-datapath-disabled.diff
myasnikovdaniil
left a comment
There was a problem hiding this comment.
APPROVE
Static review plus a controlled live datapath test on a dev cluster running cozystack's default posture (enable-host-firewall=true, enable-ipv6=false, enable-ipv4=true).
What the patch does (verified in source): three #ifndef ENABLE_IPV6-guarded case ETH_P_IPV6 blocks in bpf/bpf_host.c at do_netdev, cil_to_netdev, and host_ingress_policy, passing node IPv6 to the kernel stack (CTX_ACT_OK) instead of dropping it as DROP_UNKNOWN_L3. The 4th site (from_host_to_lxc) is correctly left uncovered and documented, since bpf_lxc drops IPv6 anyway with the v6 datapath off. Build plumbing is solid: the Dockerfile applies the patch with --fuzz=0 and asserts exactly 3 markers, and make update dry-runs the patch against the upstream tag so a cilium bump fails loudly on drift. The values.yaml note documents the resulting behavior (host policies apply to IPv4 only; node IPv6 is not Cilium-filtered).
Runtime evidence (built the image from this branch and rolled it onto a dev cluster, host firewall enabled):
- Before (stock image): host IPv6 (ICMPv6 to
ff02::1from the node link-local on the node-IP interface) is dropped at the host datapath —cilium monitorshowsdrop (Unsupported L3 protocol) ... file bpf_host.c ... fe80::… -> ff02::1 icmp EchoRequest, andcilium_drop_count_total reason="Unsupported L3 protocol"increments on both directions. - After (patched image, host firewall still enabled): the identical trigger produces zero
Unsupported L3drops atbpf_host.cand the drop counter is flat. The only remaining drops are unrelated L2 LLC frames present before and after. - No regression: IPv4 unaffected — a freshly created pod got an IP, resolved cluster + external DNS, reached the kubernetes Service over TCP 443, and pod/node IPv4 connectivity was intact. Rollout was clean (one agent at a time, each returning
cilium statusOK), and the environment was restored to the pinned image afterward.
Scoping confirmed: the diff is purely additive — every block is inside #ifndef ENABLE_IPV6, so an IPv6-enabled datapath strips all three at preprocess time (no behavior change, no duplicate-case build error), and the IPv4 path is untouched. The two gemini-code-assist threads about patch -p1 needing a bpf/ subdirectory are false positives — the diff carries a/bpf_host.c labels, so -p1 resolves to bpf_host.c, matching the COPY destination; the green Build job and a local build both confirm it.
Non-blocking: no in-repo automated test (the package has no helm-unittest target and BPF can't be compiled in this CI), but the build-time marker/--fuzz=0 checks plus the upstream sibling PR's BPF unit tests are an adequate substitute for a vendored patch; and the CodeRabbit nit to add curl --retry/--max-time to the make update fetch is a reasonable optional robustness tweak.
The fix does exactly what it claims with no observed IPv4/connectivity regression. LGTM.
What this PR does
Cozystack enables Cilium's host firewall by default while the Cilium IPv6 datapath stays disabled (
ipv6.enabled=false). In upstream Cilium this combination drops all node IPv6 on managed devices as "Unsupported L3 protocol" (DROP_UNKNOWN_L3inbpf_host.c) before any policy evaluation, in both directions. This breaks ICMPv6 Neighbor Discovery — and with it all node-level IPv6, e.g. BGP unnumbered over link-local addresses on L3 fabrics. NoCiliumClusterwideNetworkPolicycan allow this traffic, because the drop happens before policy enforcement (upstream report: cilium/cilium#33155, closed stale).The cozystack cilium image now carries a minimal BPF patch that passes IPv6 to the kernel stack instead, mirroring the behavior with the host firewall disabled:
packages/system/cilium/images/cilium/patches/— threeETH_P_IPV6passthrough cases under#ifndef ENABLE_IPV6inbpf_host.c. The fourthDROP_UNKNOWN_L3site (from_host_to_lxc) is intentionally not covered: bpf_lxc drops IPv6 without the v6 datapath regardless, so a passthrough there restores nothing.patch --fuzz=0plus a marker-count check, so upstream drift fails the image build loudly.make updatedry-runs the patch against the freshly vendored upstream tag, so a cilium version bump surfaces drift immediately.values.yamldocuments the consequence: Cilium host policies apply to IPv4 only; node IPv6 is not filtered by Cilium.The same fix was submitted upstream: cilium/cilium#46473. The local patch is carried until it lands in a release Cozystack ships.
Verification: the image builds and contains the patched source; a corrupted patch context fails the build with a clear error; the upstream variant of this change passes new BPF unit tests that fail without the fix with exactly the "Unsupported L3 protocol" drop from the field report; existing host-firewall and IPv6 BPF tests still compile. PR e2e exercises runtime BPF compilation of the patched source, since the agent compiles
bpf_host.cat startup.Docs: cozystack/website#574
Closes #2806
Screenshots
Not a UI change.
Release note
Summary by CodeRabbit