Merged
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -25,6 +25,8 @@ All notable user-visible changes should be recorded here.

### Docs

- Added a rule-by-rule false-positive taxonomy for NAT, bastion, internal scanner,
lab replay, scheduled admin task, and shared-account contexts.
- Expanded the parser conformance matrix with explicit Ubuntu / Debian
`auth.log`, RHEL-family `secure`, `journalctl --output=short-full`, `sshd`,
`sudo`, `pam_unix`, `pam_faillock`, and `pam_sss` style coverage.
4 changes: 2 additions & 2 deletions README.md
@@ -29,7 +29,7 @@ A compact finding summary is a bounded triage signal, not attribution:

LogLens is an MVP / early release. The repository is stable enough for public review, local experimentation, and extension, but the parser and detection coverage are intentionally narrow.

Reviewing the project quickly? Start with [`docs/reviewer-path.md`](./docs/reviewer-path.md), [`docs/reviewer-brief.md`](./docs/reviewer-brief.md), and the [`quality gates map`](./docs/quality-gates.md). For detection reasoning, read the forensic-style [`Linux auth brute-force case study`](./docs/case-study-linux-auth-bruteforce.md) and the [`rule catalog`](./docs/rule-catalog.md). For local scale expectations, see the [`performance envelope`](./docs/performance-envelope.md).
Reviewing the project quickly? Start with [`docs/reviewer-path.md`](./docs/reviewer-path.md), [`docs/reviewer-brief.md`](./docs/reviewer-brief.md), and the [`quality gates map`](./docs/quality-gates.md). For detection reasoning, read the forensic-style [`Linux auth brute-force case study`](./docs/case-study-linux-auth-bruteforce.md), the [`rule catalog`](./docs/rule-catalog.md), and the [`false-positive taxonomy`](./docs/false-positive-taxonomy.md). For local scale expectations, see the [`performance envelope`](./docs/performance-envelope.md).

## Why This Project Exists

@@ -113,7 +113,7 @@ classes: `unknown_timestamp`, `unknown_program`,
`known_program_unknown_message`, `malformed_source_ip`, and
`unsupported_pam_variant`.

For rule-by-rule semantics and signal boundaries, see [`docs/rule-catalog.md`](./docs/rule-catalog.md). For a forensic-style evidence walkthrough, see [`docs/case-study-linux-auth-bruteforce.md`](./docs/case-study-linux-auth-bruteforce.md). For the parser behavior contract, supported modes, and fixture map, see [`docs/parser-contract.md`](./docs/parser-contract.md). For the deliberately noisy parser-coverage sample, see [`docs/parser-coverage-notes.md`](./docs/parser-coverage-notes.md).
For rule-by-rule semantics and signal boundaries, see [`docs/rule-catalog.md`](./docs/rule-catalog.md). For benign-context hypotheses and the evidence needed to support them, see [`docs/false-positive-taxonomy.md`](./docs/false-positive-taxonomy.md). For a forensic-style evidence walkthrough, see [`docs/case-study-linux-auth-bruteforce.md`](./docs/case-study-linux-auth-bruteforce.md). For the parser behavior contract, supported modes, and fixture map, see [`docs/parser-contract.md`](./docs/parser-contract.md). For the deliberately noisy parser-coverage sample, see [`docs/parser-coverage-notes.md`](./docs/parser-coverage-notes.md).

LogLens does not currently detect:

2 changes: 1 addition & 1 deletion docs/case-study-linux-auth-bruteforce.md
@@ -119,7 +119,7 @@ These warnings are useful because they prevent silent overconfidence. A reviewer

## False-positive boundary

The findings should be read as triage statements and checked against the rule-by-rule taxonomy in [`rule-catalog.md`](./rule-catalog.md):
The findings should be read as triage statements and checked against the rule semantics in [`rule-catalog.md`](./rule-catalog.md) and the evidence-review matrices in [`false-positive-taxonomy.md`](./false-positive-taxonomy.md):

- `203.0.113.10` is a documentation-range placeholder; in a real case, the same pattern could be an external scanner, shared gateway, internal test, or replayed lab traffic.
- Username spread supports a probing interpretation, but intent is not observable from these lines alone.
76 changes: 76 additions & 0 deletions docs/false-positive-taxonomy.md
@@ -0,0 +1,76 @@
# False-Positive Taxonomy

This document records benign or ambiguous contexts that can satisfy a LogLens rule threshold. A matching context does not erase the finding: it changes how a reviewer should interpret the normalized evidence and what external records are needed before disposition.

The taxonomy is not an allow-list, suppression policy, or incident verdict. LogLens reports the rule match, its evidence window, and its `verdict_boundary`; authorization, intent, compromise, and attribution remain outside the tool.

## Taxonomy Sources

| Source | Meaning in this catalog |
| --- | --- |
| NAT | Several clients are represented by one network address, weakening source-IP identity assumptions. |
| bastion | Administrative traffic is concentrated through an approved jump host or access gateway. |
| internal scanner | Authorized assessment, compliance, or account-audit tooling deliberately generates authentication activity. |
| lab replay | Training, test, demonstration, or pipeline validation data reproduces a finding-shaped sequence. |
| scheduled admin task | A recurring operational job produces repeated authentication failures or privileged commands. |
| shared account | Several operators or services use one account, weakening individual attribution and concentrating activity. |

`bastion` and `shared account` are separate hypotheses. A bastion explains host or network concentration; a shared account explains identity concentration. Either can exist without the other.

## Brute Force

Rule evidence: at least 5 terminal SSH failure signals grouped by `source_ip` within 10 minutes by default.

Verdict boundary: `triage_signal_not_compromise_or_attribution`.

| Source | Why the threshold can match | Evidence that supports the explanation | Residual uncertainty |
| --- | --- | --- | --- |
| NAT | Independent users or services behind one egress address contribute failures to the same `source_ip` group. | VPN, proxy, firewall, or DHCP records map the source address and window to multiple internal clients. | Aggregation can explain volume but does not establish that every attempt was authorized. |
| bastion | Multiple administrators or automation jobs originate from one approved jump host. | Bastion inventory, session audit records, and operator mappings cover the finding window and evidence event IDs. | An approved bastion can still carry stale credentials, misuse, or a compromised session. |
| internal scanner | An authorized scanner tests SSH exposure or credential controls and produces terminal failures by design. | Scanner ownership, target scope, source-address inventory, and a matching scan schedule or change record. | Scanner identity supports authorization but does not validate target scope or configuration. |
| lab replay | A fixture, demonstration, or validation job replays a concentrated failure sequence. | Ingestion provenance, replay job logs, fixture hashes, or known synthetic timestamps match the evidence. | Replayed data in a production evidence path is still a provenance or pipeline-quality issue. |
| scheduled admin task | A recurring job repeatedly uses an expired, rotated, or mistyped credential. | Scheduler logs, service ownership, credential-rotation history, and matching execution timestamps. | A job explanation does not prove the credential failures are harmless or properly contained. |
| shared account | Several operators or services retry the same shared credential from one source. | Account ownership records, approved-use policy, bastion or session logs, and change-window context. | The shared identity prevents reliable attribution to an individual operator. |
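
The brute-force rule evidence above (at least 5 terminal failures per `source_ip` within 10 minutes) can be illustrated with a sliding-window count. This is a minimal sketch, not the LogLens implementation: the function name, the `(timestamp, source_ip)` tuple shape, and the inclusive window boundary are assumptions made for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def brute_force_windows(events, threshold=5, window=timedelta(minutes=10)):
    """Return {source_ip: (window_start, window_end, count)} for the first
    window per source that reaches the threshold.

    `events` is an iterable of (timestamp, source_ip) tuples, one per
    terminal SSH failure signal (hypothetical shape).
    """
    recent = defaultdict(deque)  # source_ip -> timestamps still inside the window
    matches = {}
    for ts, source_ip in sorted(events):
        q = recent[source_ip]
        q.append(ts)
        # Drop failures older than the window (boundary treated as inclusive).
        while ts - q[0] > window:
            q.popleft()
        if len(q) >= threshold and source_ip not in matches:
            matches[source_ip] = (q[0], ts, len(q))
    return matches


base = datetime(2024, 1, 1, 12, 0)
sample = [(base + timedelta(minutes=m), "203.0.113.10") for m in (0, 2, 4, 6, 8)]
print(brute_force_windows(sample))
```

The sketch groups only by address, which is why every row in the matrix above can satisfy it: the count says nothing about how many clients, operators, or jobs sit behind that address.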

## Multi-User Probing

Rule evidence: at least 3 distinct usernames in attempt-evidence signals grouped by `source_ip` within 15 minutes by default.

Verdict boundary: `triage_signal_not_intent_or_attribution`.

| Source | Why the threshold can match | Evidence that supports the explanation | Residual uncertainty |
| --- | --- | --- | --- |
| NAT | Separate legitimate users behind one egress address attempt their own usernames during the same window. | Network translation, VPN, proxy, or DHCP records map the grouped address to distinct clients and expected users. | NAT explains source aggregation but not whether every attempted username was expected. |
| bastion | An access gateway handles sessions for several named administrators or service accounts. | Bastion session records map each attempted username and timestamp to approved operators or workflows. | Missing session attribution leaves the username spread unexplained. |
| internal scanner | Account-audit or exposure tooling tries a configured username set to validate controls. | Scanner configuration, approved account list, target scope, and execution schedule match the finding. | A broad or outdated username list may still represent a control or scope problem. |
| lab replay | Synthetic data preserves username diversity to exercise parser or detector behavior. | Fixture provenance, replay logs, and expected username lists match the evidence event IDs. | Synthetic data must still be separated from operational evidence before conclusions are drawn. |
| scheduled admin task | Migration, monitoring, or account-validation automation cycles through several service identities. | Job definition, account inventory, owner confirmation, and scheduler timestamps match the rule window. | Unexpected usernames or executions outside the approved window remain unexplained. |
| shared account | Operators or tooling fall back across several shared or service accounts from one source. | Account-use policy, workflow configuration, and session logs explain the full observed username set. | One shared account alone does not create distinct-username spread; the explanation requires evidence of multiple accounts being tried. |
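
The distinct-username condition above differs from a plain count: repeated attempts against one username never satisfy it. The following is a minimal sketch under assumed names and an assumed `(timestamp, source_ip, username)` tuple shape; it is not taken from the LogLens source.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def multi_user_windows(attempts, min_users=3, window=timedelta(minutes=15)):
    """Return {source_ip: sorted usernames} for the first window per source
    that contains at least `min_users` distinct usernames.

    `attempts` is an iterable of (timestamp, source_ip, username) tuples
    (hypothetical shape for attempt-evidence signals).
    """
    recent = defaultdict(deque)  # source_ip -> (timestamp, username) inside the window
    matches = {}
    for ts, source_ip, username in sorted(attempts):
        q = recent[source_ip]
        q.append((ts, username))
        # Expire attempts that fall outside the window (inclusive boundary assumed).
        while ts - q[0][0] > window:
            q.popleft()
        users = {name for _, name in q}
        if len(users) >= min_users and source_ip not in matches:
            matches[source_ip] = sorted(users)
    return matches


base = datetime(2024, 1, 1, 9, 0)
sample = [
    (base, "203.0.113.10", "root"),
    (base + timedelta(minutes=5), "203.0.113.10", "admin"),
    (base + timedelta(minutes=14), "203.0.113.10", "deploy"),
]
print(multi_user_windows(sample))
```

Because the set of usernames is what crosses the threshold, a single shared account retried many times does not match, which is consistent with the shared-account row above.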

## Sudo Burst

Rule evidence: at least 3 `sudo_command` signals grouped by `username` within 5 minutes by default.

Verdict boundary: `triage_signal_not_maliciousness_or_authorization`.

| Source | Why the threshold can match | Evidence that supports the explanation | Residual uncertainty |
| --- | --- | --- | --- |
| NAT | NAT does not directly increase this username-grouped rule, but it can confuse attempts to correlate the finding with nearby source-IP findings. | Session records and host-local audit context link the sudo commands to a specific login independently of the network address. | Without session linkage, network proximity is not evidence that SSH and sudo findings share an actor. |
| bastion | An approved administrator reaches the host through a jump path and executes several maintenance commands quickly. | Bastion session records, target-host login records, and a change ticket align with the sudo evidence window. | A valid access path does not establish that each command was authorized. |
| internal scanner | Compliance, inventory, or endpoint assessment tooling executes a short privileged command sequence. | Agent identity, scanner policy, command allow-list, and execution logs match the reported commands and timestamps. | Unexpected commands or host scope remain reviewable even when the tool is authorized. |
| lab replay | Demonstration or test evidence contains a compact sudo sequence. | Dataset provenance, replay job records, and known synthetic account or host values match the finding. | Replayed privileged activity mixed into operational logs still weakens evidence provenance. |
| scheduled admin task | Package updates, service repair, backup, or maintenance automation runs several sudo commands in one window. | Scheduler records, automation definitions, change windows, and command text match the evidence event IDs. | Execution outside schedule or divergence from the expected command set remains unexplained. |
| shared account | Several administrators use one account, or automation and humans share the same identity, concentrating commands under one `username`. | Session attribution, privileged access management records, operator rosters, and command ownership cover the complete window. | The account model prevents reliable individual attribution and may itself be a control weakness. |

## Cross-Rule Interpretation

- A `brute_force` finding and a `multi_user_probing` finding over the same source and window are two views of overlapping evidence, not automatically two independent actors or incidents.
- A nearby `sudo_burst` finding is not causally linked to an SSH finding unless external session evidence establishes that relationship.
- `evidence_event_ids`, `window_start`, and `window_end` define exactly what LogLens counted. Review those records before applying contextual explanations.
- Parser warnings and unsupported lines describe evidence completeness. They do not count toward findings, but a high unsupported-line rate weakens claims that an activity is absent.
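
A reviewer can check the first point above mechanically by comparing what two findings actually counted. The sketch below assumes each finding is available as a dictionary carrying `evidence_event_ids`, `window_start`, and `window_end`; the dictionary shape, the integer event IDs, and the function name are illustrative assumptions, and the real report schema may differ.

```python
from datetime import datetime


def evidence_overlap(finding_a, finding_b):
    """Report which event IDs two findings share and whether their windows intersect."""
    shared_ids = set(finding_a["evidence_event_ids"]) & set(finding_b["evidence_event_ids"])
    # Two closed intervals intersect when each one starts before the other ends.
    windows_overlap = (
        finding_a["window_start"] <= finding_b["window_end"]
        and finding_b["window_start"] <= finding_a["window_end"]
    )
    return {"shared_event_ids": sorted(shared_ids), "windows_overlap": windows_overlap}


brute_force = {
    "evidence_event_ids": [1, 2, 3, 4, 5],
    "window_start": datetime(2024, 1, 1, 12, 0),
    "window_end": datetime(2024, 1, 1, 12, 8),
}
multi_user_probing = {
    "evidence_event_ids": [2, 4, 5],
    "window_start": datetime(2024, 1, 1, 12, 2),
    "window_end": datetime(2024, 1, 1, 12, 8),
}
print(evidence_overlap(brute_force, multi_user_probing))
```

Shared event IDs show that two findings are drawn from the same records. They do not show that a nearby `sudo_burst` finding belongs to the same session; that link still needs external session evidence.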

## Evidence Integrity Boundary

Duplicate recognized lines, replayed collections, or merged log exports can inflate a rule count even when every line parses successfully. That is an evidence-provenance question, distinct from unsupported parser warnings. Review ingestion history and source hashes when replay or duplication is plausible.
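
One way to start that review is to fingerprint an export and list lines that occur more than once. This is a minimal sketch with hypothetical helper names and an illustrative `sshd` line; it is not a LogLens feature, and identical lines are only a prompt for review, since some sources can legitimately emit the same line twice.

```python
import hashlib
from collections import Counter


def export_digest(lines):
    """SHA-256 over newline-normalized lines, for comparing two collections of the same export."""
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.rstrip("\n").encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def repeated_lines(lines):
    """Return {line: count} for every line that appears more than once."""
    counts = Counter(line.rstrip("\n") for line in lines)
    return {line: n for line, n in counts.items() if n > 1}


failure = "Mar 10 08:11:22 host sshd[1234]: Failed password for root from 203.0.113.10 port 51022 ssh2"
accepted = "Mar 10 08:11:30 host sshd[1240]: Accepted password for alice from 198.51.100.7 port 40112 ssh2"
export = [failure, accepted, failure]

print(repeated_lines(export))
print(export_digest(export))
```

Matching digests across two ingestion runs suggest the same material was collected twice; repeated lines inside one export suggest a merge or replay worth tracing before the rule count is trusted.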

The appropriate conclusion is therefore bounded: a taxonomy source may explain why a threshold was met, but only corroborating records can support a benign disposition. LogLens does not make that disposition automatically.
3 changes: 3 additions & 0 deletions docs/reviewer-path.md
@@ -10,6 +10,7 @@ This path is for reviewers who want to understand LogLens quickly without readin
| What log formats are supported? | [`docs/parser-contract.md`](./parser-contract.md) | Can name `syslog_legacy` and `journalctl_short_full` behavior |
| What artifacts does it produce? | [`docs/report-artifacts.md`](./report-artifacts.md) and report-contract fixtures | Can inspect Markdown, JSON, and optional CSV outputs |
| How do rules use evidence? | [`docs/rule-catalog.md`](./rule-catalog.md) | Can explain grouping keys, windows, thresholds, and unsupported-evidence boundaries |
| What benign context can match a rule? | [`docs/false-positive-taxonomy.md`](./false-positive-taxonomy.md) | Can distinguish rule-true evidence from compromise, intent, attribution, or authorization claims |
| Can the parser behavior be trusted? | Parser contract, fixture matrix, and [`assets/mixed_auth_parser_coverage.json`](../assets/mixed_auth_parser_coverage.json) | Can see known, unknown, and malformed line handling |
| What proves the main claims? | [`docs/quality-gates.md`](./quality-gates.md) | Can map claims to tests, fixtures, docs, and repeatable commands |
| How should a finding be interpreted? | [`docs/case-study-linux-auth-bruteforce.md`](./case-study-linux-auth-bruteforce.md) | Can trace raw evidence to normalized events, findings, warnings, and non-goals |
@@ -46,6 +47,7 @@ Inspect:
- [`assets/mixed_auth_parser_coverage.json`](../assets/mixed_auth_parser_coverage.json)
- [`docs/quality-gates.md`](./quality-gates.md)
- [`docs/rule-catalog.md`](./rule-catalog.md)
- [`docs/false-positive-taxonomy.md`](./false-positive-taxonomy.md)
- [`docs/case-study-linux-auth-bruteforce.md`](./case-study-linux-auth-bruteforce.md)

Look for the evidence route:
@@ -54,6 +56,7 @@ Look for the evidence route:
- normalized event
- signal mapping boundary
- rule grouping, window, and threshold
- false-positive hypotheses and required corroborating context
- report finding or parser warning

Look for parser coverage fields:
24 changes: 15 additions & 9 deletions docs/rule-catalog.md
@@ -50,16 +50,16 @@ Current `verdict_boundary` values are:

## False-Positive Taxonomy

The taxonomy names benign or ambiguous explanations a reviewer should consider before interpreting a finding. It is not an allow-list, suppression policy, or automatic disposition.
The taxonomy names benign or ambiguous explanations a reviewer should consider before interpreting a finding. It is not an allow-list, suppression policy, or automatic disposition. The detailed evidence-review matrices are in [`false-positive-taxonomy.md`](./false-positive-taxonomy.md).

Each rule uses the same review buckets:

- NAT
- bastion
- internal scanner
- lab replay
- shared bastion
- scheduled admin task
- malformed log replay
- shared account

## Brute Force

@@ -112,11 +112,13 @@ The finding is a triage signal. It is not a compromise verdict, attribution clai
| Bucket | Review interpretation |
| --- | --- |
| NAT | Multiple legitimate clients behind one egress address can collapse into one `source_ip`. |
| bastion | An approved jump host can concentrate many operators or jobs under one source address. |
| internal scanner | Authorized credential auditing or exposure scanning can intentionally generate repeated failures. |
| lab replay | Sanitized sample data, training fixtures, or repeated demos can preserve concentrated failure patterns. |
| shared bastion | A managed jump host or administrative relay can make many failed attempts appear to come from one source. |
| scheduled admin task | A recurring job with stale credentials can fail repeatedly inside the rule window. |
| malformed log replay | Duplicated or replayed log material can inflate apparent volume; unsupported malformed lines remain warnings and are not counted. |
| shared account | Several operators or services can retry one shared credential from the grouped source. |

See the [brute-force review matrix](./false-positive-taxonomy.md#brute-force) for corroborating evidence and residual uncertainty.

### Why unsupported evidence is not counted

@@ -180,11 +182,13 @@ The rule does not infer intent. It only states that one source IP produced attem
| Bucket | Review interpretation |
| --- | --- |
| NAT | Different users behind one egress address can look like one source probing multiple accounts. |
| bastion | A shared administrative entry point can originate expected attempts for several accounts. |
| internal scanner | Authorized username-enumeration tests or account-audit tooling can touch many usernames by design. |
| lab replay | Replayed lab logs can preserve synthetic username spread without representing live probing. |
| shared bastion | Shared administrative entry points can produce attempts for several accounts from one source IP. |
| scheduled admin task | Account validation, migration, or monitoring jobs can try multiple service or user accounts in one window. |
| malformed log replay | Replayed or partially malformed evidence can duplicate username variety; unsupported records remain parser warnings and do not add usernames. |
| shared account | Shared-account workflows can include fallback attempts across several shared or service identities. |

See the [multi-user probing review matrix](./false-positive-taxonomy.md#multi-user-probing) for corroborating evidence and residual uncertainty.

### Why unsupported evidence is not counted

@@ -243,11 +247,13 @@ The finding is strongest when reviewed with session context, change windows, hos
| Bucket | Review interpretation |
| --- | --- |
| NAT | Usually not a primary explanation because this rule groups by `username`, but it may matter when reviewed alongside source-IP findings. |
| bastion | Approved jump-host workflows can precede a compact sequence of privileged maintenance commands. |
| internal scanner | Endpoint assessment, compliance checks, or privileged inventory tooling can run several sudo commands quickly. |
| lab replay | Demo or training logs can replay a compact privileged-command sequence. |
| shared bastion | Shared administrative accounts or jump-host workflows can concentrate privileged commands under one username. |
| scheduled admin task | Maintenance windows, package updates, service repair, or scripted operations can produce bursty sudo activity. |
| malformed log replay | Duplicated sudo lines or replayed command logs can inflate the command count; unsupported malformed sudo-like lines stay out of rule input. |
| shared account | Several administrators or services can concentrate commands under one username. |

See the [sudo-burst review matrix](./false-positive-taxonomy.md#sudo-burst) for corroborating evidence and residual uncertainty.

### Why unsupported evidence is not counted
