Design proposal: Coroot eBPF observability option for Cozystack #22

Open
George Gaál (gecube) wants to merge 1 commit into main from proposal/coroot-ebpf-observability
@gecube

Migrates discussion cozystack/cozystack#3028 into the design-proposal process.

Adds design-proposals/coroot-ebpf-observability/README.md: add Coroot (Apache-2.0, eBPF) as a zero-instrumentation observability option — service maps, tracing, profiling — reusing Cozystack's existing VictoriaMetrics and ClickHouse backends. Includes a low-commitment agents-only entry point.

Source discussion: cozystack/cozystack#3028

Sibling proposal (migrated together): #21

DCO: commit is signed off.

Migrated from discussion cozystack/cozystack#3028 to the design-proposal
process for review.

Signed-off-by: Gaál György <gb12335@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Warning

Review limit reached

@gecube, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 59 minutes and 47 seconds. Learn how PR review limits work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 88f8c8fc-8e66-4bdb-887c-5e409f631247

📥 Commits

Reviewing files that changed from the base of the PR and between fbfc6ba and 29692a9.

📒 Files selected for processing (1)
  • design-proposals/coroot-ebpf-observability/README.md

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a design proposal to integrate Coroot as a zero-instrumentation (eBPF) observability option for Cozystack, utilizing existing VictoriaMetrics and ClickHouse backends. The review feedback suggests improving the proposal by recommending minimum Linux capabilities (such as CAP_BPF and CAP_PERFMON) instead of full privileges for the daemonset to mitigate security risks, and defining how the agent handles backpressure and resource limits when backend storage is unavailable.


## Security

eBPF agents run privileged on each node — a new trust boundary that must be evaluated, especially for the tenant-app mode where namespace-scoped tenants would gain access to kernel-level telemetry. The per-tenant eBPF privilege model is an explicit open question; tenant-app mode is deferred to a later phase partly for this reason. SSO and ingress hardening for the UI land in Phase 2.

Severity: medium (security)

Running eBPF agents with full privileges (privileged: true) poses a significant security risk. The proposal should state explicitly that the daemonset is to be configured with the minimum necessary Linux capabilities (such as CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_RESOURCE) rather than full privileges. Which capabilities are usable depends on the minimum supported kernel version.

Suggested change (replaces the Security paragraph quoted above):
eBPF agents run privileged on each node — a new trust boundary that must be evaluated, especially for the tenant-app mode where namespace-scoped tenants would gain access to kernel-level telemetry. To mitigate host compromise risks, the daemonset should be configured with the minimum necessary Linux capabilities (e.g., CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN) rather than full container privileges where supported by the kernel. The per-tenant eBPF privilege model is an explicit open question; tenant-app mode is deferred to a later phase partly for this reason. SSO and ingress hardening for the UI land in Phase 2.
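
To make the least-privilege suggestion concrete, here is a minimal Python sketch of a check that a review or CI step could run against a parsed container spec. It is an illustration under stated assumptions, not the proposal's implementation: `ALLOWED_CAPS`, `normalize` and `security_findings` are hypothetical names, and whether the Coroot node agent runs correctly with this reduced capability set is not confirmed by the proposal. Note that CAP_BPF and CAP_PERFMON only exist on Linux 5.8 and later, which is why the suggestion is qualified by the kernel version.

```python
# Hypothetical allow-list taken from the capabilities named in the review
# comment. The set the agent really needs is NOT confirmed by the proposal.
ALLOWED_CAPS = {"BPF", "PERFMON", "NET_ADMIN", "SYS_RESOURCE"}


def normalize(cap):
    """Accept both 'CAP_BPF' and the Kubernetes-style short form 'BPF'."""
    cap = cap.upper()
    return cap[4:] if cap.startswith("CAP_") else cap


def security_findings(container):
    """Return a list of findings for one container spec given as a dict."""
    ctx = container.get("securityContext") or {}
    caps = ctx.get("capabilities") or {}
    findings = []

    if ctx.get("privileged") is True:
        findings.append("privileged: true grants every capability")

    added = {normalize(c) for c in caps.get("add") or []}
    unexpected = sorted(added - ALLOWED_CAPS)
    if unexpected:
        findings.append("unexpected capabilities: " + ", ".join(unexpected))

    dropped = {normalize(c) for c in caps.get("drop") or []}
    if "ALL" not in dropped:
        findings.append("capabilities.drop should include ALL")

    return findings


# Illustrative specs; the container name is made up for this example.
hardened = {
    "name": "node-agent",
    "securityContext": {
        "privileged": False,
        "capabilities": {"drop": ["ALL"], "add": ["BPF", "PERFMON", "NET_ADMIN"]},
    },
}
privileged = {"name": "node-agent", "securityContext": {"privileged": True}}

print(security_findings(hardened))    # []
print(security_findings(privileged))  # two findings
```

The check deliberately treats a missing `drop: ["ALL"]` as a finding, so that an added capability list cannot silently sit on top of the container runtime's default set.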

## Failure and edge cases

- **Coroot UI / control plane down** → agents keep shipping to VictoriaMetrics/ClickHouse; dashboards in Grafana remain usable (agents-only mode is unaffected by UI availability).
- **eBPF unsupported on a node kernel** → agent should degrade/skip rather than crash-loop; document the minimum kernel.
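
The degrade-or-skip behaviour in the second bullet could look roughly like the Python sketch below: parse the node's kernel release and return a mode instead of raising, so an unsupported node is skipped rather than crash-looping. `MIN_KERNEL` is a placeholder, since the proposal leaves the minimum kernel to be documented, and `parse_release` and `ebpf_mode` are hypothetical helpers, not part of Coroot.

```python
import platform
import re

# Placeholder floor for illustration only. The proposal says the minimum
# kernel still has to be documented; this is NOT Coroot's stated requirement.
MIN_KERNEL = (5, 8)


def parse_release(release):
    """Extract (major, minor) from a string such as '5.15.0-105-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def ebpf_mode(release, minimum=MIN_KERNEL):
    """Return 'enabled' or 'skipped'; never raise, so the pod does not crash-loop."""
    try:
        version = parse_release(release)
    except ValueError:
        # Unknown release format: degrade instead of failing the agent.
        return "skipped"
    return "enabled" if version >= minimum else "skipped"


if __name__ == "__main__":
    # Reports the decision for the kernel this script happens to run on.
    print(ebpf_mode(platform.release()))
```

A version comparison is only a first filter. A real agent would more likely probe for the specific eBPF features it needs, because distribution kernels often backport or disable features independently of the version number.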

Severity: medium

When the backend storage (VictoriaMetrics or ClickHouse) is unavailable or experiencing high latency, eBPF agents can consume significant memory or CPU while buffering events. The proposal should define how the agent handles backpressure and which resource limits apply under these conditions.

Suggested change (keeps the quoted bullet and adds one after it):
- **eBPF unsupported on a node kernel** → agent should degrade/skip rather than crash-loop; document the minimum kernel.
- **Backend backpressure / unavailability** → if VictoriaMetrics or ClickHouse is down or slow, the eBPF agent must safely drop or buffer telemetry without causing node-level memory exhaustion or high CPU overhead.
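
One way to read "safely drop or buffer" is a fixed-size buffer that evicts the oldest events once full and counts what it dropped, so memory stays bounded however long the backend is down. The Python sketch below illustrates that policy only. `BoundedTelemetryBuffer` and its methods are hypothetical names, and nothing here describes how the Coroot agent actually buffers telemetry.

```python
from collections import deque


class BoundedTelemetryBuffer:
    """Fixed-capacity FIFO that evicts the oldest event when full."""

    def __init__(self, max_events):
        self._events = deque(maxlen=max_events)
        self.dropped = 0  # exposed so the drop rate can itself be monitored

    def push(self, event):
        # A full deque with maxlen discards from the left on append,
        # so count the eviction before it happens.
        if len(self._events) == self._events.maxlen:
            self.dropped += 1
        self._events.append(event)

    def flush(self, send):
        """Send events in order; stop at the first failure and keep the rest."""
        sent = 0
        while self._events:
            try:
                send(self._events[0])
            except ConnectionError:
                break  # backend still unavailable; retry on the next flush
            self._events.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._events)


def backend_down(event):
    """Stand-in for a write to a backend that is currently unreachable."""
    raise ConnectionError("backend unavailable")


buf = BoundedTelemetryBuffer(max_events=3)
for i in range(5):
    buf.push(i)

print(len(buf), buf.dropped)    # 3 2
print(buf.flush(backend_down))  # 0
received = []
print(buf.flush(received.append), received)  # 3 [2, 3, 4]
```

Dropping the oldest events favours fresh data after an outage. Dropping the newest instead would preserve the start of an incident, so the choice is a policy decision the proposal would need to state.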
