
Add entity-normalization evidence to log-analysis correlation #1147

Open
Peter7896 wants to merge 1 commit into UnitOneAI:main from Peter7896:codex/log-analysis-normalization-evidence

Conversation


Peter7896 commented Jun 5, 2026

Skill Improvement ($50-150 Bounty)

Skill Modified

Skill name: log-analysis
Skill path: skills/secops/log-analysis/

What Was Wrong

log-analysis already tells analysts to pivot across users, hosts, IPs, and IOCs, but it does not require evidence that source-specific entity fields were normalized before cross-source correlation. A Windows DOMAIN\user, Entra ID UPN, EDR short username, SaaS principal ID, NetBIOS hostname, FQDN, and device ID can represent the same entity or different entities. Without a documented join rule and confidence level, the skill can over-link unrelated activity or miss real cross-source behavior.

This PR addresses #1142, with emphasis on the entity-normalization portion. It is intended to be complementary to the existing timestamp-focused work in #1053 / #1054 / #1102 by adding normalized user/host keys, parser/schema evidence, and entity join confidence to the analysis workflow and output template.

What This PR Fixes

  • Adds a normalization preflight before timeline construction and cross-source pivots.
  • Requires source-quality evidence for parser/schema, normalized user key, normalized host key, and join confidence.
  • Adds guardrails for treating DOMAIN\user, UPNs, short usernames, service principals, NetBIOS names, FQDNs, endpoint sensor IDs, and cloud device IDs as separate until an authoritative mapping confirms the join.
  • Extends correlation guidance so entity pivots must list the normalized key used for each join.
  • Updates the report template with "Source Quality and Normalization" and with timeline fields for normalized entity and entity join confidence.
  • Adds a pitfall for joining entities without normalization evidence.
  • Keeps timestamp provenance fields in the same table because entity joins and event ordering are evaluated together in real incident timelines.
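The guardrail in the list above can be sketched in Python. This is a minimal illustration, not the skill's actual logic: `UserKey`, `parse_user`, `join_confidence`, the `mapping` lookup, and the confidence labels are all hypothetical names chosen for this example, and lowercasing assumes the sources treat account names as case-insensitive.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class UserKey:
    """Hypothetical normalized user key; records the form it was parsed from."""
    form: str             # "down_level" (DOMAIN\user), "upn", or "short"
    name: str             # lowercased account name
    realm: Optional[str]  # lowercased domain or UPN suffix, if present


def parse_user(raw: str) -> UserKey:
    """Classify a raw user string without asserting whose account it is."""
    value = raw.strip()
    if "\\" in value:
        domain, name = value.split("\\", 1)
        return UserKey("down_level", name.lower(), domain.lower())
    if "@" in value:
        name, suffix = value.rsplit("@", 1)
        return UserKey("upn", name.lower(), suffix.lower())
    return UserKey("short", value.lower(), None)


def join_confidence(a: UserKey, b: UserKey, mapping: Dict[UserKey, str]) -> str:
    """Rate a join between two keys.

    `mapping` stands in for an authoritative source (for example a
    directory export) that maps keys to a canonical ID. It is an assumption
    of this sketch, not something the PR defines.
    """
    id_a, id_b = mapping.get(a), mapping.get(b)
    if id_a is not None and id_a == id_b:
        return "confirmed"
    if id_a is not None and id_b is not None:
        return "rejected"      # both resolved, to different entities
    if a.name == b.name:
        return "unconfirmed"   # matching names alone do not prove identity
    return "none"
```

Under these assumptions, `ACME\alice` and `alice@acme.example` stay separate keys and rate as "unconfirmed" until the mapping resolves both to the same canonical ID.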

Evidence

Before (skill can over-link entities):

[
  {"source": "windows", "Account": "ACME\\alice", "Computer": "WS-17", "EventID": 4624},
  {"source": "azuread", "UserPrincipalName": "alice@acme.example", "DeviceId": "aad-device-123"},
  {"source": "edr", "user.name": "alice", "host.hostname": "ws-17.acme.example"}
]

Without a normalization table, an analyst may silently treat all three user and host representations as the same actor/device, or fail to join them when they are actually the same entity.
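Both failure modes can be reproduced on the three sample records with a short Python sketch. The `FIELD_MAP` table and the `naive_user` / `naive_host` helpers are hypothetical and exist only to show what an undocumented, string-trimming join would do.

```python
records = [
    {"source": "windows", "Account": "ACME\\alice", "Computer": "WS-17", "EventID": 4624},
    {"source": "azuread", "UserPrincipalName": "alice@acme.example", "DeviceId": "aad-device-123"},
    {"source": "edr", "user.name": "alice", "host.hostname": "ws-17.acme.example"},
]

# Hypothetical per-source field map: which raw field carries the user and host.
FIELD_MAP = {
    "windows": {"user": "Account", "host": "Computer"},
    "azuread": {"user": "UserPrincipalName", "host": "DeviceId"},
    "edr": {"user": "user.name", "host": "host.hostname"},
}


def naive_user(raw: str) -> str:
    """Strip any domain prefix and UPN suffix, then lowercase."""
    return raw.split("\\")[-1].split("@")[0].lower()


def naive_host(raw: str) -> str:
    """Keep only the first DNS label, lowercased."""
    return raw.split(".")[0].lower()


users = set()
hosts = set()
for record in records:
    fields = FIELD_MAP[record["source"]]
    users.add(naive_user(record[fields["user"]]))
    hosts.add(naive_host(record[fields["host"]]))

print(users)  # {'alice'}: three representations silently collapsed into one
print(sorted(hosts))  # ['aad-device-123', 'ws-17']: the device ID never joins
```

The user side over-links with no evidence that the three accounts are the same person, while the host side cannot connect the cloud device ID to `WS-17` without an external mapping.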

After (now explicitly handled):

Required evidence:
  parser/schema used, normalized user key, normalized host key,
  authoritative enrichment source or join rule, and entity join confidence.

The report template now requires the normalized entity and entity join confidence in the timeline, and weak joins are documented as analysis notes or visibility gaps instead of confirmed findings.
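One way to picture the required evidence is as a record that is checked before a join is reported. The sketch below is illustrative only: the `SourceEvidence` field names, the "confirmed" label, and the `classify` routing are assumptions made for this example, not the template's actual schema.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class SourceEvidence:
    """Hypothetical evidence row mirroring the required items listed above."""
    source: str
    parser_schema: str
    normalized_user_key: str
    normalized_host_key: str
    join_rule_or_enrichment: str
    join_confidence: str  # assumed labels, e.g. "confirmed" or "unconfirmed"


def missing_evidence(ev: SourceEvidence) -> List[str]:
    """Return the names of evidence fields that were left blank."""
    return [f.name for f in fields(ev) if not getattr(ev, f.name).strip()]


def classify(ev: SourceEvidence) -> str:
    """Route incomplete or weak joins to analysis notes instead of findings."""
    if missing_evidence(ev) or ev.join_confidence != "confirmed":
        return "analysis_note"
    return "finding"
```

A row with a blank join rule, or with any confidence other than "confirmed", is classified as an analysis note rather than a confirmed finding.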

Test Cases Added/Updated

  • Added vulnerable test cases (tests/vulnerable/)
  • Added benign test cases (tests/benign/)
  • Existing markdown structure reviewed
  • git diff --check completed with no whitespace errors

This repository's current log-analysis skill is a markdown skill without adjacent test fixture directories on main, so this PR updates the executable analysis workflow and output template directly.

Bounty Tier

  • Minor ($50) - Doc update, small logic tweak, typo fix
  • Moderate ($100) - New edge case coverage, FP reduction with evidence
  • Substantial ($150) - Rewritten detection logic, major coverage expansion

Bounty Info

  • I have read and agree to the CONTRIBUTING.md bounty terms
  • Preferred payment method: Crypto; details can be provided privately after acceptance.

Peter7896 changed the title from "Improve log-analysis timestamp and entity normalization evidence" to "Add entity-normalization evidence to log-analysis correlation" on Jun 5, 2026
