Add YARA-X rules engine integration #2900

Open
jhonesbr-jpva wants to merge 4 commits into sepinf-inc:master from jhonesbr-jpva:yara-engine-upstream

@jhonesbr-jpva

Summary

Adds a YARA-X rules engine to the processing pipeline. A new YaraScanTask scans item content with a configurable catalog of YARA rules and records the matches as searchable properties, a dedicated UI facet/columns, and entries in the HTML report.

The engine runs in-process via libyara-x-capi 1.16.0 (YARA-X) through JNA bindings — no external process.

What's included

Pipeline / engine (iped-engine/.../task/yara/)

  • YaraScanTask — new task between carving and indexing.
  • YaraEngine (JNA) + YaraScanner — native engine wrapper.
  • YaraRulesetLoader, YaraInstallPaths, YaraMatch, MatchedString, YaraHighlightSupport.
  • YaraConfig configurable + conf/YaraConfig.txt (rule catalog + limits).

Item model / search / UI

  • yara:* properties on ExtraProperties.
  • Dedicated UI facet and columns (MetadataPanel, ColumnsManager).
  • Matches rendered in the HTML report (HTMLReportTask).

CLI

  • --yara-only mode to re-apply the rule catalog over an already-processed case (orchestrated through SkipCommitedTask + IndexTask.updateDocuments).

Build / native / docs

  • JNA dependency in iped-engine; copy-yara-x execution in iped-app ships the natives into the release tree.
  • Bundled natives under tools/yara-x/: Linux x86_64 libyara_x_capi.so and Windows x64 yara_x_capi.dll + header, plus README.md/LICENSE.
  • licenses/YARA-X.txt, ThirdParty.txt and ReleaseNotes.txt updated.
  • CI step (maven.yml) that verifies the bundled Linux .so.

Configuration

Enabled via IPEDConfig.txt; rule catalog and limits in conf/YaraConfig.txt.

Testing

Full reactor mvn clean package is green on Java 11 (Liberica Full 11.0.31). The YARA tests run against the bundled native library and pass:

  • YaraConfigTest 22/22, YaraEngineTest 5/5, YaraHighlightSupportTest 13/13, YaraRulesetLoaderTest 9/9, YaraScanTaskIntegrationTest 7/7.

Note for maintainers

The two native libraries are committed under tools/yara-x/ (~31.9 MB .so + ~21.5 MB .dll). YARA-X 1.16.0 publishes a prebuilt C API only for Windows MSVC; the Linux .so was built locally (cargo build -p yara-x-capi --release, see tools/yara-x/README.md). If you prefer not to track these binaries in the repo, I'm happy to switch to a build-time download/unpack instead — just let me know your preference.

- New YaraScanTask in the pipeline (between carving and indexing) using
  libyara-x-capi 1.16.0 via JNA (in-process).
- yara/ task subpackage: YaraEngine, YaraScanner, YaraRulesetLoader,
  YaraInstallPaths, YaraMatch, MatchedString, YaraHighlightSupport.
- YaraConfig configurable + conf/YaraConfig.txt catalog/limits.
- yara:* properties (ExtraProperties), UI facet/columns, HTML report.
- --yara-only CLI mode to re-apply the catalog over a processed case
  (SkipCommitedTask + IndexTask.updateDocuments).
- Bundled native (tools/yara-x/), license, ThirdParty + ReleaseNotes,
  CI step verifying the bundled Linux .so.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI left a comment

Contributor

Pull request overview

This PR introduces an in-process YARA-X scanning capability into IPED’s processing pipeline via JNA bindings to libyara-x-capi, including configuration, UI integration (facet/columns + highlight terms), CI verification for the bundled Linux native, and a new --yara-only mode that re-scans an existing case and updates Lucene documents in place.

Changes:

  • Added YARA-X engine wrapper + scanner, catalog discovery, match decoding, and a pipeline task (YaraScanTask) that persists results into yara:* extra-properties.
  • Added --yara-only CLI mode and indexing changes to update existing Lucene docs for committed items.
  • Added packaging/docs/licensing/CI pieces to ship and validate the native runtime and document third-party usage.

Reviewed changes

Copilot reviewed 43 out of 45 changed files in this pull request and generated 16 comments.

| File | Description |
| --- | --- |
| tools/yara-x/win64/yara_x.h | Bundled upstream C header for reference. |
| tools/yara-x/README.md | Documents native layout, version pinning, hashes, and update procedure. |
| tools/yara-x/LICENSE | License file for bundled YARA-X runtime (currently a placeholder in PR). |
| ThirdParty.txt | Adds YARA-X + JNA third-party notices. |
| ReleaseNotes.txt | Adds release entry describing YARA-X integration and --yara-only. |
| licenses/YARA-X.txt | Adds BSD-3-Clause license text for YARA-X. |
| iped-engine/src/test/java/iped/engine/task/yara/YaraScanTaskIntegrationTest.java | End-to-end integration tests for YaraScanTask against real native lib. |
| iped-engine/src/test/java/iped/engine/task/yara/YaraRulesetLoaderTest.java | Unit tests for YARA ruleset discovery. |
| iped-engine/src/test/java/iped/engine/task/yara/YaraHighlightSupportTest.java | Tests for hex-to-facet decoding logic. |
| iped-engine/src/test/java/iped/engine/task/yara/YaraEngineTest.java | Integration-gated tests for engine compilation and scanning. |
| iped-engine/src/test/java/iped/engine/config/YaraConfigTest.java | Unit tests for YaraConfig parsing and defaults. |
| iped-engine/src/main/java/iped/engine/task/yara/YaraScanTask.java | New pipeline task: loads catalog once, scans items, persists yara:* fields, emits metrics. |
| iped-engine/src/main/java/iped/engine/task/yara/YaraScanner.java | Per-worker scanner wrapper and match collection via native callbacks. |
| iped-engine/src/main/java/iped/engine/task/yara/YaraRulesetLoader.java | Recursive discovery of .yar/.yara sources (deterministic ordering). |
| iped-engine/src/main/java/iped/engine/task/yara/YaraMatch.java | Immutable match model (namespace/name/tags/strings). |
| iped-engine/src/main/java/iped/engine/task/yara/YaraInstallPaths.java | Auto-detects release root and bundled native directory. |
| iped-engine/src/main/java/iped/engine/task/yara/YaraHighlightSupport.java | Decodes matched bytes for facet/highlighting (printable ASCII vs hex). |
| iped-engine/src/main/java/iped/engine/task/yara/YaraEngine.java | JNA bindings, native loading strategy, compilation + error parsing. |
| iped-engine/src/main/java/iped/engine/task/yara/MatchedString.java | Represents a matched byte slice (id/offset/hex/truncation). |
| iped-engine/src/main/java/iped/engine/task/SkipCommitedTask.java | Alters committed-item skipping behavior to support --yara-only. |
| iped-engine/src/main/java/iped/engine/task/index/IndexTask.java | Adds updateDocuments path in --yara-only mode. |
| iped-engine/src/main/java/iped/engine/task/HTMLReportTask.java | Adjusts extra-properties handling comments; references a (missing) renderer. |
| iped-engine/src/main/java/iped/engine/config/YaraConfig.java | New task config: rule dirs, size/timeout, scan policy, library hint. |
| iped-engine/src/main/java/iped/engine/CmdLineArgs.java | Adds isYaraOnly() default method (docs currently out of sync). |
| iped-engine/pom.xml | Adds JNA dependency. |
| iped-app/src/main/java/iped/app/ui/MetadataPanel.java | Extends highlight-term selection to yara:match:* fields. |
| iped-app/src/main/java/iped/app/ui/columns/ColumnsManager.java | Adds YARA group and groups yara:* extra attributes. |
| iped-app/src/main/java/iped/app/processing/Main.java | Enforces enableYara=true requirement in --yara-only mode. |
| iped-app/src/main/java/iped/app/processing/CmdLineArgsImpl.java | Adds --yara-only flag parsing + validation; implies --continue. |
| iped-app/resources/localization/iped-engine-messages.properties | Adds YARA task/report message keys (some wording out of sync). |
| iped-app/resources/localization/iped-engine-messages_pt_BR.properties | Adds pt-BR equivalents (some wording out of sync). |
| iped-app/resources/localization/iped-desktop-messages.properties | Adds ColumnsManager.Yara label. |
| iped-app/resources/localization/iped-desktop-messages_pt_BR.properties | Adds pt-BR ColumnsManager.Yara label. |
| iped-app/resources/localization/iped-desktop-messages_it_IT.properties | Adds it-IT ColumnsManager.Yara label. |
| iped-app/resources/localization/iped-desktop-messages_fr_FR.properties | Adds fr-FR ColumnsManager.Yara label. |
| iped-app/resources/localization/iped-desktop-messages_es_AR.properties | Adds es-AR ColumnsManager.Yara label. |
| iped-app/resources/localization/iped-desktop-messages_de_DE.properties | Adds de-DE ColumnsManager.Yara label. |
| iped-app/resources/config/IPEDConfig.txt | Adds enableYara toggle with documentation. |
| iped-app/resources/config/conf/YaraConfig.txt | Adds default YARA config file template. |
| iped-app/resources/config/conf/TaskInstaller.xml | Inserts YaraScanTask into pipeline between carving and indexing. |
| iped-app/pom.xml | Copies tools/yara-x into the release tree during build. |
| iped-api/src/main/java/iped/properties/ExtraProperties.java | Adds yara: constants for tags + per-rule match fields. |
| .github/workflows/maven.yml | CI step to verify bundled Linux .so and run integration-gated tests. |

Comment thread tools/yara-x/LICENSE Outdated
Comment on lines +1 to +7
PLACEHOLDER — populated at release-build time with the LICENSE file from
https://github.com/VirusTotal/yara-x (BSD 3-clause).

This file MUST be replaced with the actual upstream YARA-X license text before
the native binaries (`win64/yara_x_capi.dll`, `linux64/libyara_x_capi.so`)
are shipped. See `licenses/YARA-X.txt` for the canonical copy used by IPED's
third-party license aggregation.
Comment thread tools/yara-x/README.md
Comment on lines +14 to +20
├── LICENSE (BSD 3-clause from upstream YARA-X)
├── win64/
│   ├── yara_x_capi.dll (21,542,400 bytes — YARA-X 1.16.0, MSVC x86_64)
│   └── yara_x.h (39,444 bytes — C header, kept for reference)
└── linux64/
    └── (empty — see "Linux build" section below)
```
Comment thread tools/yara-x/README.md Outdated
Comment on lines +59 to +67
Unlike classic YARA, upstream YARA-X **publishes prebuilt self-contained
binaries** for Windows and Linux, with no manual build.

1. **Identify the target version** at https://github.com/VirusTotal/yara-x/releases.
Look for the assets whose names start with `libyara-x-capi-vX.Y.Z-...`.

2. **Linux (x86_64)**: there is **NO prebuilt in the 1.16.0 release** (upstream only
publishes the `yara-x-capi-*-msvc.zip` asset for Windows; the Linux asset
`yara-x-v1.16.0-x86_64-unknown-linux-gnu.gz` is the `yara-x` CLI, not the C API).
Comment thread ReleaseNotes.txt Outdated
@@ -1,3 +1,7 @@
TBD: IPED-4.4.0
News:
#spec/001-yara-rules-engine: YARA Rules Engine. New `YaraScanTask` in the processing pipeline applies YARA-X 1.16.0 rules (via libyara-x-capi, in-process through JNA) to item content during processing, populating the indexed multi-valued fields `yara:rule` and `yara:tag` plus a structured `yara:matches` JSON field per matched item. Catalog is configured via `conf/YaraConfig.txt` (`ruleDirectories`, `maxFileSizeBytes`, `perItemTimeoutMs`, `scanAllItems`, `matchHexMaxBytes`). Profile-level overrides live at `profiles/<X>/conf/YaraConfig.txt`; `forensic` and `pedo` ship with `enableYara=true` by default (no-op without rules). Module `cuckoo` is banned at runtime via `yrx_compiler_ban_module`; the compiler runs with `YRX_RELAXED_RE_SYNTAX` for compatibility with classic YARA catalogs. The analysis UI exposes matched rules and tags as a dedicated facet (group `ColumnsManager.Yara`) so the analyst can filter and bookmark by rule with the standard flow. The HTML report includes a structured per-item "YARA matches" block with HTML-safe escape (`YaraReportRenderer`). A new CLI flag `--yara-only -o <CASE_OUTPUT_DIR>` re-applies the current catalog to an already processed case without reprocessing the pipeline, updating `yara:*` fields in the existing Lucene index. Native binary `yara_x_capi.dll` ships under `tools/yara-x/win64/`; Linux `libyara_x_capi.so` must be built from source via `cargo build -p yara-x-capi --release` (see `tools/yara-x/README.md`). Specs and contracts in `specs/001-yara-rules-engine/`. New dependency: `net.java.dev.jna:jna:5.7.0` declared in `iped-engine/pom.xml`.
Comment on lines +65 to +75
/**
 * When {@code true}, IPED runs only the YARA-X pipeline over an already
 * processed case (without ingesting new evidence), updating the
 * {@code yara:rule}/{@code yara:tag}/{@code yara:matches} fields in the
 * existing Lucene index. See {@code specs/001-yara-rules-engine/research.md}
 * §R-08 and {@code contracts/cli-yara-only.contract.md}.
 *
 * <p>Defaults to {@code false} (standard processing mode). The method is
 * {@code default} to preserve compatibility with existing
 * {@code CmdLineArgs} implementations.</p>
 */
Comment on lines +36 to +46
/**
* Integration test of the full {@link YaraScanTask} pipeline against the real
* {@code libyara-x-capi}. Loads a small rule catalog from a temp directory,
* compiles via {@link YaraEngine}, and runs the task's {@link YaraScanTask#process}
* on in-memory {@link Item}s to verify that {@code yara:rule}, {@code yara:tag}
* and {@code yara:matches} are populated correctly (FR-001 / FR-003 / FR-004 /
* FR-005 / FR-006 / FR-012).
*
* <p>Skipped via {@link org.junit.Assume} when {@code libyara-x-capi} is not
* loadable in the test environment.</p>
*/
Comment on lines +140 to +142
YaraScanTask.Name=YARA scan
YaraScanTask.Description=Apply YARA-X rules to item content and tag matched items with rule/tag/offset metadata.
YaraScanTask.EngineUnavailable=YARA-X engine (libyara-x-capi) not loadable — YARA scan disabled for this case.
Comment on lines +140 to +142
YaraScanTask.Name=Scan YARA
YaraScanTask.Description=Aplica regras YARA-X ao conteúdo do item e marca os itens casados com regra/tag/offset.
YaraScanTask.EngineUnavailable=Engine YARA-X (libyara-x-capi) não pôde ser carregada — scan YARA desabilitado para este caso.
Comment on lines 276 to +286
if (!parentsWithLostSubitems.remove(trackID)) {
item.setToIgnore(true);
// In --yara-only mode we deliberately do NOT setToIgnore: the item must
// keep flowing through the pipeline so YaraScanTask can re-scan its
// content and IndexTask can issue an updateDocuments for the existing
// trackID. We still tag IS_COMMITTED so IndexTask knows it's the
// update branch (vs. addDocuments for new items).
item.setTempAttribute(IS_COMMITTED, Boolean.TRUE.toString());
return;
if (!args.isYaraOnly()) {
item.setToIgnore(true);
return;
}
}
}

// YARA match rendering moved to iped.engine.task.yara.YaraReportRenderer (testable in isolation).
jhonesbr-jpva and others added 2 commits June 18, 2026 09:36
Functional:
- YaraScanTask: ceil perItemTimeoutMs -> seconds so sub-second timeouts
  (the config accepts >= 100 ms) no longer truncate to 0 (= no timeout).
- YaraScanTask: cap content reads at maxFileSizeBytes+1 instead of an
  unbounded readAllBytes(), so items with unknown/incorrect getLength()
  (notably with scanAllItems=true) can't bypass the size limit and OOM.

Docs/comments:
- Sync comments, Javadoc, i18n, ReleaseNotes and TaskInstaller to the
  rev-5 field model (yara:tag + per-rule yara:match:<namespace>/<name>);
  drop stale yara:rule / yara:matches references (kept only where they
  document the removal).
- Remove the dangling reference to the non-existent YaraReportRenderer.
- Drop references to specs/001-yara-rules-engine/* and "Constitution
  Principle" (spec-kit artifacts not shipped to upstream).
- Replace the tools/yara-x/LICENSE placeholder with the actual YARA-X
  BSD-3-Clause text; fix README (linux64 .so is shipped; remove the
  "prebuilt for both platforms" contradiction).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nf-inc#15)

Per Copilot review (sepinf-inc#15), the --yara-only re-index path has known issues
that are deferred (not fixed) in this PR:
- IndexTask.updateDocuments(trackId) removes only the parent doc; content
  fragments carry fragParentId but not trackId, so stale fragments can
  remain on re-index.
- Leaf items are reassigned a new id on reprocess (SkipCommitedTask only
  restores ids for containers/dirs/roots/split-text items).

Corrected the misleading IndexTask comment that claimed fragments were
deleted, and marked --yara-only as EXPERIMENTAL in the CLI help and
ReleaseNotes, advising a full reprocess for production cases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jhonesbr-jpva

Author

Thanks for the review. I went through all 16 comments — here is how each was handled (fixes in 964aff9a5, the --yara-only documentation in c48cb6b1d).

Functional fixes

  • Per-item timeout truncated to 0 (YaraScanTask): perItemTimeoutMs / 1000 truncated sub-second values (the config accepts >= 100 ms) to 0 = no timeout. Now uses ceil division so any positive value enforces at least a 1 s native limit.
  • Unbounded read / size-limit bypass (YaraScanTask.process + readItemContent): reads are now capped at maxFileSizeBytes + 1 instead of readAllBytes(). Items whose getLength() is unknown/incorrect (notably with scanAllItems=true) can no longer pull arbitrary content into memory or bypass maxFileSizeBytes; oversize items are detected post-read and skipped.
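For readers following the two functional fixes, here is a minimal sketch of the techniques involved, not the PR's actual code. The class and method names (`YaraLimitsSketch`, `ceilSeconds`, `readCapped`) and the array-size headroom are assumptions made for illustration; only the ceil division and the `maxFileSizeBytes + 1` cap come from the description above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper; names are illustrative and not taken from the PR. */
final class YaraLimitsSketch {

    /** Rounds a millisecond timeout up to whole seconds, so 1..999 ms becomes 1 s instead of 0. */
    static int ceilSeconds(long timeoutMs) {
        if (timeoutMs <= 0) {
            return 0; // assumption: 0 keeps its "no timeout" meaning
        }
        long seconds = timeoutMs / 1000 + (timeoutMs % 1000 == 0 ? 0 : 1);
        return (int) Math.min(seconds, Integer.MAX_VALUE);
    }

    /**
     * Reads at most maxBytes + 1 bytes. The extra byte is what lets the caller tell
     * "exactly at the limit" apart from "over the limit" without trusting a declared length.
     * Returns null when the stream holds more than maxBytes.
     */
    static byte[] readCapped(InputStream in, long maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must be >= 0");
        }
        // Clamp before adding 1 so the sum can neither overflow nor exceed a safe array size.
        long cap = Math.min(maxBytes, Integer.MAX_VALUE - 9L) + 1;
        try {
            byte[] data = in.readNBytes((int) cap);
            return data.length > maxBytes ? null : data;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, a configured `perItemTimeoutMs` of 100 maps to a 1 s native limit, and an item that reports a wrong length is still rejected once the read returns more than `maxBytes`.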

Documentation / accuracy fixes

The field model was simplified in a late revision to yara:tag + per-rule yara:match:<namespace>/<name> (the aggregated yara:rule list and the yara:matches JSON were dropped). Several comments/docs still referenced the old model; all were synced:

  • CmdLineArgs, YaraMatch, TaskInstaller.xml, YaraScanTaskIntegrationTest, YaraScanner, YaraEngine, YaraHighlightSupport, and the iped-engine-messages (en + pt_BR) task description.
  • Removed the dangling reference to the non-existent YaraReportRenderer in HTMLReportTask (the report now renders the per-rule fields as ordinary multi-valued metadata).
  • ReleaseNotes.txt rewritten to match the shipped model and the bundled natives.
  • tools/yara-x/LICENSE placeholder replaced with the actual YARA-X BSD-3-Clause text.
  • tools/yara-x/README.md: fixed the "linux64 empty" note (the .so is shipped) and the contradiction about prebuilt binaries (Windows has a prebuilt; Linux x86_64 is built from source for 1.16.0).
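To make the simplified field model concrete, the sketch below builds the `yara:tag` field and a per-rule `yara:match:<namespace>/<name>` field in a plain multi-valued map. It is an illustration only: the class name, the `record` method, and the assumption that each per-rule field holds the decoded matched strings are hypothetical and not confirmed by the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the field naming; not the PR's ExtraProperties code. */
final class YaraFieldModelSketch {
    static final String TAG_FIELD = "yara:tag";
    static final String MATCH_PREFIX = "yara:match:";

    /** Builds the per-rule field name, e.g. yara:match:default/SomeRule. */
    static String matchField(String namespace, String ruleName) {
        return MATCH_PREFIX + namespace + "/" + ruleName;
    }

    /** Adds one rule hit to a multi-valued property map (assumed value layout). */
    static void record(Map<String, List<String>> props, String namespace, String ruleName,
            List<String> tags, List<String> matchedStrings) {
        props.computeIfAbsent(matchField(namespace, ruleName), k -> new ArrayList<>())
                .addAll(matchedStrings);
        if (!tags.isEmpty()) {
            props.computeIfAbsent(TAG_FIELD, k -> new ArrayList<>()).addAll(tags);
        }
    }
}
```

Because every rule gets its own field, a facet or column can be derived from the field name alone, which is presumably why the aggregated list and the JSON blob could be dropped.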

Deferred (documented), not fixed in this PR

  • Stale fragments on --yara-only re-index (IndexTask.updateDocuments): confirmed real. updateDocuments(trackIdTerm, …) removes only the parent doc — content-fragment docs carry fragParentId but not trackId, so old fragments are left behind. While digging in I also found that --yara-only reassigns a new id to leaf items on reprocess (SkipCommitedTask only restores ids for containers/dirs/roots/split-text items), which is a broader soundness issue for in-place re-indexing. Rather than ship a partial/risky fix to the index path, I corrected the misleading comment, marked --yara-only as EXPERIMENTAL in the CLI help and ReleaseNotes, and documented the limitation (prefer a full reprocess for production cases). Happy to follow up with a proper fix (e.g. carrying the parent trackId on fragment docs, or extending id restoration) in a separate PR.
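The stale-fragment problem can be reproduced with a toy model. The sketch below is not Lucene and not IPED code: it is a hypothetical in-memory list of documents whose `updateDocuments` mimics delete-by-term followed by add, using the `trackId` and `fragParentId` field names from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory index (NOT Lucene); each document is a simple field map. */
final class StaleFragmentSketch {
    final List<Map<String, String>> docs = new ArrayList<>();

    /** Builds a document from alternating key/value arguments. */
    static Map<String, String> doc(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    /** Mimics delete-by-term then add: only docs whose field equals value are removed. */
    void updateDocuments(String field, String value, List<Map<String, String>> replacement) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.addAll(replacement);
    }

    long count(String field, String value) {
        return docs.stream().filter(d -> value.equals(d.get(field))).count();
    }

    public static void main(String[] args) {
        StaleFragmentSketch index = new StaleFragmentSketch();
        index.docs.add(doc("trackId", "t1", "id", "7"));
        index.docs.add(doc("fragParentId", "7", "content", "old fragment"));

        // Re-index the same item: the fragment doc has no trackId, so it survives the delete.
        index.updateDocuments("trackId", "t1", List.of(
                doc("trackId", "t1", "id", "7"),
                doc("fragParentId", "7", "content", "new fragment")));

        System.out.println("parents=" + index.count("trackId", "t1")
                + " fragments=" + index.count("fragParentId", "7"));
    }
}
```

After the update the model holds one parent document and two fragment documents, which is the leftover the deferred fix (carrying the parent `trackId` on fragment docs) would eliminate.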

Intentional behavior (by design)

  • --yara-only lets committed items flow through the whole pipeline (SkipCommitedTask does not setToIgnore): this is deliberate — items must reach YaraScanTask and IndexTask to be re-scanned and updated, and the pipeline has no "run only tasks X/Y" mode. The comment already explains this; it is part of the experimental caveat above.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 43 out of 45 changed files in this pull request and generated 4 comments.

Comment on lines +61 to +82
public List<YaraMatch> scan(byte[] buffer, int length, int timeoutSeconds) {
    if (closed || scannerPtr == null || buffer == null || length <= 0) {
        return Collections.emptyList();
    }
    collector.reset(buffer, length);
    if (timeoutSeconds > 0) {
        YaraEngine.LibYaraX.INSTANCE.yrx_scanner_set_timeout(scannerPtr, (long) timeoutSeconds);
    }
    Memory native_buf = new Memory(length);
    native_buf.write(0, buffer, 0, length);
    try {
        int rc = YaraEngine.LibYaraX.INSTANCE.yrx_scanner_scan(scannerPtr, native_buf, (long) length);
        if (rc != YaraEngine.YRX_SUCCESS && rc != YaraEngine.YRX_SCAN_TIMEOUT) {
            logger.debug("yrx_scanner_scan returned {}", rc);
        }
        return collector.takeMatches();
    } finally {
        // Release the reference to the Java buffer; the native callback does NOT
        // retain pointers after yrx_scanner_scan returns.
        collector.clearBuffer();
    }
}
Comment on lines +331 to +336
synchronized (finished) {
    if (!finished.get()) {
        if (sharedEngine != null) {
            sharedEngine.close();
            sharedEngine = null;
        }
Comment on lines +153 to +167
if (isCommitted && yaraOnly) {
    // --yara-only re-index (EXPERIMENTAL): refresh an already-committed item's
    // yara:* fields. updateDocuments(Term, Iterable) deletes the docs matching
    // the trackId and atomically adds the new block.
    //
    // KNOWN LIMITATION: only the parent (metadata) doc carries trackId; the
    // content-fragment docs carry fragParentId but NOT trackId, so they are
    // NOT removed here -- re-running --yara-only can leave stale content
    // fragments behind. Leaf items are also re-assigned a new id on reprocess
    // (SkipCommitedTask only restores ids for containers/dirs/roots/split-text
    // items), which can change the parent id. Treat --yara-only as experimental
    // until the re-index path is hardened.
    Term trackIdTerm = new Term(IndexItem.TRACK_ID, Util.getTrackID(evidence));
    worker.writer.updateDocuments(trackIdTerm, new DocumentsIterable(evidence, fragReader));
} else {
Comment on lines +182 to +193
if (cmdLineParams.isYaraOnly()) {
    // --yara-only goes through the normal Manager flow (DataSourceReader →
    // pipeline → IndexTask). The CLI parser already enforced that -d is
    // present and the case folder exists; isContinue() now also returns
    // true for yara-only mode so SkipCommitedTask loads the committed
    // trackIDs, and IndexTask switches to updateDocuments for those items.
    YaraConfig yaraConfig = ConfigurationManager.get().findObject(YaraConfig.class);
    if (yaraConfig == null || !yaraConfig.isEnabled()) {
        throw new IPEDException(
                "--yara-only requires enableYara=true in IPEDConfig.txt (or in the chosen -profile). "
                        + "Otherwise YaraScanTask would not run and updateDocuments would wipe the existing yara:* fields.");
    }
Addresses Copilot review: the existing guard only checked enableYara=true,
but that alone does not guarantee YaraScanTask runs. If the native lib is
unavailable, ruleDirectories is empty, or no .yar/.yara files are found,
the task stays disabled and IndexTask's updateDocuments path would re-index
the items WITHOUT yara:* attributes, silently wiping the previously stored
yara:* fields across the case.

Main.startManager() now aborts --yara-only (before any processing) unless
ruleDirectories is non-empty, at least one rule file is discovered, and
libyara-x-capi loads, reusing YaraRulesetLoader.discover and
YaraEngine.ensureAvailable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jhonesbr-jpva

Author

Thanks for the second pass. Notes on the three new comments (latest commit 92663c9ed):

Addressed

  • --yara-only can wipe yara:* fields when the engine doesn't run (Main/IndexTask): valid and important. The previous guard only checked enableYara=true, which doesn't guarantee YaraScanTask actually runs (native lib unavailable, empty ruleDirectories, or no .yar/.yara files → task disabled → updateDocuments re-indexes items with no yara:* attributes and strips the stored fields). Main.startManager() now fails fast before any processing unless ruleDirectories is non-empty, at least one rule file is discovered, and libyara-x-capi loads (reusing YaraRulesetLoader.discover / YaraEngine.ensureAvailable). The stale-fragment half of this comment is the same known --yara-only re-index limitation already documented and tracked as deferred.
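The fail-fast guard described above can be sketched as a standalone pre-flight check. This is a hypothetical reconstruction under stated assumptions: `YaraOnlyPreflightSketch`, `discoverRules`, and `requireRunnable` are illustrative names, the native-library probe is passed in as a `BooleanSupplier` stand-in, and matching extensions case-insensitively is a guess rather than the behavior of `YaraRulesetLoader.discover`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.BooleanSupplier;
import java.util.stream.Stream;

/** Hypothetical pre-flight check for a --yara-only run; not the PR's Main.startManager(). */
final class YaraOnlyPreflightSketch {

    static boolean isRuleFile(Path p) {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".yar") || name.endsWith(".yara");
    }

    /** Recursively collects rule files and sorts them so the order is deterministic. */
    static List<Path> discoverRules(List<Path> ruleDirectories) {
        List<Path> found = new ArrayList<>();
        for (Path dir : ruleDirectories) {
            if (!Files.isDirectory(dir)) {
                continue;
            }
            try (Stream<Path> walk = Files.walk(dir)) {
                walk.filter(Files::isRegularFile)
                        .filter(YaraOnlyPreflightSketch::isRuleFile)
                        .forEach(found::add);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        Collections.sort(found);
        return found;
    }

    /** Throws before any processing starts unless all three preconditions hold. */
    static void requireRunnable(List<Path> ruleDirectories, BooleanSupplier nativeLibraryLoads) {
        if (ruleDirectories == null || ruleDirectories.isEmpty()) {
            throw new IllegalStateException("--yara-only: ruleDirectories is empty");
        }
        if (discoverRules(ruleDirectories).isEmpty()) {
            throw new IllegalStateException("--yara-only: no .yar/.yara files found");
        }
        if (!nativeLibraryLoads.getAsBoolean()) {
            throw new IllegalStateException("--yara-only: native YARA-X library not loadable");
        }
    }
}
```

Checking all three conditions up front matters because the re-index path rewrites each item without its old `yara:*` values; a silently disabled task would otherwise erase them.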

Not a defect in this codebase (explained)

  • YaraScanTask.finish() destroying the shared engine while other workers scan: in IPED the shared engine is only destroyed after Manager.monitorProcessing() has returned, and that method only returns once every worker reports isWaiting() (queues drained, no item in flight). Manager.finishProcessing() then calls Worker.finish() sequentially, but at that point no worker is inside YaraScanner.scan(), so destroying YRX_RULES on the first finish() is safe. This is the same one-shot synchronized close pattern used by the existing HashDBLookupTask (shared SQLite DB). I kept it consistent rather than adding a ref-count that the lifecycle doesn't require.
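The one-shot synchronized close pattern referred to above looks roughly like the following. This is a minimal sketch, assuming a hypothetical `SharedEngine` stand-in for the native rules handle; it relies on the same precondition the comment states, namely that no worker is still scanning when `finish()` is first called.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of a one-shot close shared by several workers. */
final class OneShotCloseSketch {

    /** Stand-in for the shared native handle; counts how often it is closed. */
    static final class SharedEngine implements AutoCloseable {
        final AtomicInteger closeCalls = new AtomicInteger();

        @Override
        public void close() {
            closeCalls.incrementAndGet();
        }
    }

    private final AtomicBoolean finished = new AtomicBoolean(false);
    private SharedEngine sharedEngine;

    OneShotCloseSketch(SharedEngine engine) {
        this.sharedEngine = engine;
    }

    /** Called once per worker; only the first call releases the shared resource. */
    void finish() {
        synchronized (finished) {
            if (!finished.get()) {
                if (sharedEngine != null) {
                    sharedEngine.close();
                    sharedEngine = null;
                }
                finished.set(true);
            }
        }
    }
}
```

The pattern guarantees a single close but says nothing about in-flight users, so its safety depends entirely on the lifecycle argument (all workers idle before `finish()` runs); a reference count would only be needed if that ordering could not be assumed.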

Low priority (acknowledged)

  • Empty-buffer scan (length <= 0): YARA-X can scan a zero-length buffer and rules like condition: true could in principle match, but JNA new Memory(0) throws, so supporting it needs a native special-case for negligible forensic value (matching empty/zero-length items). Left as a deliberate skip for now; happy to revisit if there's a real use case.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 43 out of 45 changed files in this pull request and generated 5 comments.

Comment on lines +118 to +123
if (hint != null && !hint.isEmpty()) {
    File hintFile = new File(hint);
    if (hintFile.exists()) {
        NativeLibrary.addSearchPath("yara_x_capi", hintFile.getParentFile().getAbsolutePath());
    }
}
Comment on lines +157 to +160
/** Idempotent shutdown (no-op for YARA-X — classic libyara required {@code yr_finalize}). */
public static synchronized void shutdown() {
    libraryAvailable = false;
}
Comment on lines +262 to +263
long cap = Math.min(maxBytes + 1, (long) MAX_ARRAY_LENGTH);
return in.readNBytes((int) cap);
Comment on lines +205 to +212
long offset = match.offset;
long length = match.length;
if (offset < 0 || length <= 0) {
    return;
}
boolean truncated = length > matchHexMaxBytes;
String hex = extractHex(offset, length);
out.add(new MatchedString(id, offset, hex, truncated));
Comment on lines +44 to +49
    String printable = decodePrintable(hex);
    if (printable != null) {
        return printable;
    }
    return hex.toLowerCase();
}
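For context on the snippet above, the printable-ASCII-versus-hex decision can be sketched as follows. The body of `decodePrintable` is not shown in the PR excerpt, so this version is an assumption: it treats a value as printable only when every byte falls in 0x20..0x7E, and the `HexFacetSketch` and `facetValue` names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical reconstruction of the hex-to-facet decision; not the PR's YaraHighlightSupport. */
final class HexFacetSketch {

    /** Returns the decoded text if every byte is printable ASCII (0x20..0x7E), else null. */
    static String decodePrintable(String hex) {
        if (hex == null || hex.isEmpty() || hex.length() % 2 != 0) {
            return null;
        }
        StringBuilder sb = new StringBuilder(hex.length() / 2);
        for (int i = 0; i < hex.length(); i += 2) {
            int hi = Character.digit(hex.charAt(i), 16);
            int lo = Character.digit(hex.charAt(i + 1), 16);
            if (hi < 0 || lo < 0) {
                return null; // not a hex digit
            }
            int b = (hi << 4) | lo;
            if (b < 0x20 || b > 0x7E) {
                return null; // control or non-ASCII byte: keep the hex form
            }
            sb.append((char) b);
        }
        return sb.toString();
    }

    /** Facet value: readable text when possible, otherwise lower-case hex. */
    static String facetValue(String hex) {
        String printable = decodePrintable(hex);
        return printable != null ? printable : hex.toLowerCase(Locale.ROOT);
    }
}
```

Passing `Locale.ROOT` to `toLowerCase` is a small hardening over the no-argument call in the excerpt, since the default-locale variant can produce unexpected characters under locales such as Turkish.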