fix: OSD Global-tenant import + dropped report files with glob metacharacters; validate dev stack on OpenSearch 3.x with PostgreSQL#781
Merged
Conversation
dashboard-dev-bootstrap.sh sent `securitytenant: global_tenant`. The OpenSearch security plugin reads that header as a tenant *name*, and `global_tenant` is a sample custom tenant from the security demo config -- not the shared Global tenant, whose token is the literal `global`. The import therefore landed in a separate `global_tenant` tenant (its own `.kibana_<hash>_globaltenant_1` index) and the dashboards were invisible to anyone viewing the Global tenant in OpenSearch Dashboards. Verified against the live dev cluster: `_find` under `securitytenant: global` returned 26 objects and `.kibana_1` (the Global tenant index the UI reads) went from 2 to 67 docs after re-importing with the fix. An empty/omitted header read 0 from Global -- it falls back to the user's configured default tenant -- so `global` is the only reliable token. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #781 +/- ##
==========================================
+ Coverage 85.54% 85.58% +0.03%
==========================================
Files 17 17
Lines 4628 4633 +5
==========================================
+ Hits 3959 3965 +6
+ Misses 669 668 -1 ☔ View full report in Codecov by Sentry. |
The CLI expanded every file argument with glob(), which treats [, ], *, and ? as pattern syntax. A literal path like "[Netease DMARC Failure Report] Rent Reminder.eml" -- the bracketed shape many providers use for emailed failure reports -- was read as a character class, matched nothing, and was dropped before reaching the parser, with no error. File arguments that exist on disk are now taken literally; only non-existent paths are globbed, so shell-style wildcards still expand. Also adds "postgresql" to _KNOWN_SECTIONS so PARSEDMARC_POSTGRESQL_* env vars (and their _FILE Docker-secret variants) resolve like every other backend -- the PostgreSQL backend is new in 10.0.0, so this completes the unreleased feature rather than fixing a released regression, and is documented under the PostgreSQL enhancement, not Bug fixes. Regression tests added for both. Verified end-to-end: all four samples/failure/*.eml now index (the bracketed Netease report included). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dev stack The dev stack ran OpenSearch Dashboards 3.x against OpenSearch 2.x, an unsupported cross-major pairing. Bump opensearch to :3 (validated on 3.6.0: OSD import into the Global tenant and all dashboards work). Add a postgresql service plus bootstrap wiring so the new PostgreSQL backend is exercised alongside the others: wait for PG, seed it via PARSEDMARC_POSTGRESQL_* env vars on the same parsedmarc run, wipe it on RESEED, create a Grafana grafana-postgresql-datasource (uid dmarc-pg), and import dashboards/grafana/Grafana-DMARC_Reports-PostgreSQL.json. PG seeding is gated on psycopg being importable: parsedmarc aborts the whole run (exit 1, nothing written to any backend) when a configured output backend can't initialize, so wiring in PG without the optional extra would silently zero ES/OS/Splunk too. When psycopg is absent the script warns and skips PG, leaving the other backends seeded. Also fix the Grafana admin password env: the container was given GRAFANA_PASSWORD, which Grafana ignores -- it reads GF_SECURITY_ADMIN_PASSWORD. Defaults to admin to match the script. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PostgreSQL ships a premade Grafana dashboard (dashboards/grafana/Grafana-DMARC_Reports-PostgreSQL.json), so it belongs on the "for use with premade dashboards" bullet alongside Elasticsearch, OpenSearch, and Splunk rather than on the plain-output-destinations line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The aggregate index pattern in dashboards/opensearch/opensearch_dashboards.ndjson
shipped a cached field-list snapshot where org_email was a text/object
conflict, plus leftover org_email.#text and org_email.#text.keyword
subfields. Those came from a cluster that had indexed a langAttrString
email dict ({"#text": ..., "@lang": ...}) before the parser unwrapped it.
org_email is mapped as Text() and parse_aggregate_report_xml now unwraps a
dict email to a plain string, so current data is consistently text -- a
clean cluster's _field_caps reports no conflict. Cleared the frozen
conflict and the two artifact subfields, leaving org_email (text) and
org_email.keyword, matching the live mapping.
Verified: re-importing the corrected ndjson yields an index pattern with
org_email as a plain text field and zero conflicts; only the aggregate
index-pattern line changed, all other saved objects byte-identical.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
samples/aggregate/rfc9990-sample.xml and rfc9990-example.net!...xml were not in the bootstrap's SAMPLE_FILES, so the dev stack only ever indexed RFC 7489 reports and the new DMARCbis fields (np, testing, discovery_method, generator, xml_namespace) never appeared in the OpenSearch/Kibana indices or were available to the dashboards. Added both samples (one declares the urn:ietf:params:xml:ns:dmarc-2.0 namespace, the other is namespaceless RFC 9990-shaped, covering both detection paths). Verified the seeded data now carries np/testing/ discovery_method/generator and xml_namespace=urn:ietf:params:xml:ns:dmarc-2.0; OpenSearch Dashboards surfaces them on an index-pattern field-list refresh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The seed previously required parsedmarc to be pre-installed and only warned-and-skipped PostgreSQL when psycopg was missing. Resolve the seed environment by precedence instead: 1. explicit PARSEDMARC_BIN -> used as-is, nothing installed 2. active $VIRTUAL_ENV 3. existing repo venv/ or .venv/ 4. otherwise create $REPO_ROOT/venv For cases 2-4, run `pip install -e .[postgresql]` only when the CLI or psycopg is missing, so the dev stack can populate Postgres out of the box without a manual install step. The explicit-PARSEDMARC_BIN path is left untouched (and the psycopg seed guard still warns/skips if that env lacks the extra). Verified: a RESEED run resolves the active venv, seeds ES/OS/Splunk/PG including the RFC 9990 fields, with no output-client errors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Started as the OSD Global-tenant import bug; auditing the rest of the dev stack surfaced real library bugs, a shipped-dashboard mapping conflict, and several dev-stack gaps. Everything was validated end-to-end against a freshly rebuilt stack (OpenSearch 3.6.0 + OSD 3.5).
Bug fixes (released behavior)
OSD saved objects imported into the wrong tenant.
dashboard-dev-bootstrap.shsentsecuritytenant: global_tenant. The security plugin reads that as a tenant name, andglobal_tenantis a sample custom tenant in the demo config — not the Global tenant (tokenglobal). The import landed in a phantom tenant; Global looked empty. Fixed toglobal. (Shipped 9.10.3–9.11.2.)Report files with glob metacharacters in their names were silently dropped. The CLI ran every file argument through
glob(), which treats[,],*,?as patterns. A literal[Netease DMARC Failure Report] Rent Reminder.emlmatched nothing and never reached the parser. Files that exist on disk are now used literally; only non-existent paths are globbed. (This is why "4 failure samples → only 3 in the dashboard".)OSD mapping conflict on the aggregate
org_emailfield. The shippedopensearch_dashboards.ndjsonfroze a cached field-list whereorg_emailwas atext/objectconflict (plus staleorg_email.#text*subfields) — from a cluster that indexed alangAttrStringemail dict before the parser unwrapped it.org_emailisText()and the parser now unwraps dict emails, so live data is clean; cleared the frozen conflict + artifacts. (Shipped in 9.11.2.)Bug fixes 1–2 have regression tests in
tests/test_cli.py.Completing the (unreleased) PostgreSQL feature
postgresqlto_KNOWN_SECTIONSsoPARSEDMARC_POSTGRESQL_*env vars and their_FILEDocker-secret variants resolve like every other backend (was silently ignored). New in 10.0.0, so documented under the PostgreSQL enhancement, not Bug fixes. Has a test.Dev-stack audit follow-ups (bootstrap + compose)
PARSEDMARC_POSTGRESQL_*, RESEED wipe, Grafanagrafana-postgresql-datasource, PG dashboard import). Seeding is gated onpsycopgbeing importable — parsedmarcexit(1)s during client init if a configured backend can't load, which would otherwise zero every backend.GRAFANA_PASSWORD, which Grafana ignores; it readsGF_SECURITY_ADMIN_PASSWORD.Docs
docs/source/index.md).Validation (rebuilt stack)
globaltenant (OS 3.6.0)org_emaildmarc_f*)🤖 Generated with Claude Code