Queries with OR/Contains fall back to BSON scan and return cross-collection phantom objects in single-file mode #65

### Package version

4.3.1

### Affected package

BLite (client SDK)

### .NET version

10.0

### Description

Hello @mrdevrobot! While using BLite with multiple collections, I ran into a critical edge case where queries return duplicate, wrong-typed entities from other collections.

When performing a query using `||` (OR) or `.Contains()` on an indexed field in the default embedded mode (single-file), the query engine bypasses the B-Tree index, falls back to a physical page scan, and mistakenly deserializes documents from other collections into "phantom objects".

<img width="1278" height="951" alt="Image" src="https://github.com/user-attachments/assets/27b4a1f3-f48b-4431-9266-54d1ef093036" />

### Minimal reproduction

Here is a minimal reproducible example using BLite v4.3.1.
*Note: Both `PhotoPo` and `PhotoMetadataPo` share the `Id` and `SourceId` properties.*

[MRE](http://github.com/LeoYang06/BLiteTestCases/tree/master/QueriesWithORContainsReturnPhantomObjectsIssue)

### Expected behavior

`Where(x => idList.Contains(x.SourceId)).Count()` should return `710`.

### Actual behavior

It returns `1420`. Half of the results are `PhotoMetadataPo` documents silently deserialized as `PhotoPo` (with `RelativePath` defaulting to `null`/empty string).

### Additional context

**Root Cause Analysis (Based on source code):**

Tracing the execution path suggests this issue is caused by a combination of AST fallback and shared physical pages:

1. **AST Fallback (Missing `OrElse`/`Contains`)**:
   In `IndexOptimizer.OptimizeExpression`, there is no handling for `ExpressionType.OrElse` or `MethodCallExpression` (for `Contains`). This forces the optimizer to return `null`, bypassing the isolated B-Tree index scan (Strategy 1 in `FetchAsync`).

2. **Shared Pages across Collections**:
   In embedded mode (`_collectionFiles == null`), `_storage.GetCollectionPageIds(_collectionName)` yields **all pages** from the single DB file, meaning physical pages contain mixed slots from both `Photos` and `PhotoMetadata`.

3. **BSON Predicate matching foreign collections**:
   The engine falls back to `ScanAsync(BsonReaderPredicate predicate, ...)`. The compiled BSON predicate matches fields purely by name. Since `PhotoMetadataPo` also has a `SourceId` field, the predicate returns `true` for foreign documents.

4. **Silent Deserialization Bypass**:
   The `catch` block in `ScanAsync` assumes that cross-collection deserialization will throw an exception (`// foreign-collection document — skip silently`). However, because `PhotoMetadataPo` shares the `Id` and `SourceId` fields with `PhotoPo`, the deserializer succeeds without throwing, and the missing `RelativePath` property is simply left at its default value. The bare `catch` therefore never runs, and the phantom object is yielded.
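
Steps 3 and 4 can be reproduced without BLite at all. The sketch below is only an analogy: it uses `System.Text.Json` instead of BLite's BSON reader, models the shared pages as a plain list of raw documents, and assumes `int` keys and a hypothetical `Width` property on `PhotoMetadataPo`. It shows that a name-only predicate combined with a tolerant deserializer yields a phantom object without ever reaching the `catch`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// One "file" holding raw documents from two collections (stand-in for shared pages).
var sharedPages = new List<string>
{
    JsonSerializer.Serialize(new PhotoPo { Id = 1, SourceId = 7, RelativePath = "a/b.jpg" }),
    JsonSerializer.Serialize(new PhotoMetadataPo { Id = 1, SourceId = 7, Width = 800 }),
};

var idList = new HashSet<int> { 7 };

// Name-only predicate: inspects "SourceId" without knowing the owning collection.
bool MatchesByFieldName(string raw)
{
    using var doc = JsonDocument.Parse(raw);
    return doc.RootElement.TryGetProperty("SourceId", out var value)
        && idList.Contains(value.GetInt32());
}

var results = new List<PhotoPo>();
foreach (var raw in sharedPages.Where(MatchesByFieldName))
{
    try
    {
        // Unknown properties are ignored and missing ones keep their defaults,
        // so the foreign document deserializes "successfully".
        results.Add(JsonSerializer.Deserialize<PhotoPo>(raw)!);
    }
    catch (JsonException)
    {
        // Intended to skip foreign documents, but never reached here.
    }
}

Console.WriteLine(results.Count);                               // prints 2
Console.WriteLine(results.Count(p => p.RelativePath is null));  // prints 1

// Assumed shapes; only Id, SourceId and RelativePath are named in this report.
public sealed class PhotoPo
{
    public int Id { get; set; }
    public int SourceId { get; set; }
    public string? RelativePath { get; set; }
}

public sealed class PhotoMetadataPo
{
    public int Id { get; set; }
    public int SourceId { get; set; }
    public int Width { get; set; } // hypothetical property
}
```

One real photo produces two results, which matches the 710 vs. 1420 ratio observed in the reproduction.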

**Proposed Suggestions:**

* **Immediate fix via AST**: I noticed that the execution path for `IndexQueryPlan.PlanKind.IndexIn` is already implemented in `DocumentCollection.ScanAsync`. Adding support for `OrElse` / `Contains` in `IndexOptimizer` to map to `IndexIn` would immediately fix this specific query, as the B-Tree index is inherently collection-isolated.
* **Safer Fallback Scans**: To prevent cross-collection data leaks during `ScanAsync` or `FindAllAsync` fallbacks, perhaps the `SlottedPageHeader`/`SlotEntry` could embed a collection identifier, or the BSON payload could strictly tag the `CollectionName` for validation before deserialization. Relying on the deserializer to throw exceptions might be too permissive when types share a subset of properties.
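
To make the first suggestion more concrete, here is a rough sketch of how an optimizer could recognize `Contains` and `||` chains over a single member and reduce them to a value list suitable for an IN-style index plan. It uses only `System.Linq.Expressions`; `InPlanSketch` and `TryExtract` are hypothetical names, and I have not checked how `IndexOptimizer` is actually structured, so treat this as an illustration rather than a patch. It handles only `list.Contains(x.Member)` (instance or `Enumerable.Contains`) and `x.Member == value` joined by `||`.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq.Expressions;

var idList = new List<int> { 1, 2, 3 };
Expression<Func<PhotoPo, bool>> byContains = x => idList.Contains(x.SourceId);
Expression<Func<PhotoPo, bool>> byOr = x => x.SourceId == 5 || x.SourceId == 9;
Expression<Func<PhotoPo, bool>> mixed = x => x.SourceId == 5 || x.Id == 9;

Console.WriteLine(Describe(InPlanSketch.TryExtract(byContains))); // IN plan on SourceId: [1, 2, 3]
Console.WriteLine(Describe(InPlanSketch.TryExtract(byOr)));       // IN plan on SourceId: [5, 9]
Console.WriteLine(Describe(InPlanSketch.TryExtract(mixed)));      // no single-index plan

static string Describe((string Member, object?[] Values)? plan) =>
    plan is { } p ? $"IN plan on {p.Member}: [{string.Join(", ", p.Values)}]" : "no single-index plan";

public sealed class PhotoPo // assumed shape
{
    public int Id { get; set; }
    public int SourceId { get; set; }
}

public static class InPlanSketch // hypothetical helper, not part of BLite
{
    public static (string Member, object?[] Values)? TryExtract<T>(Expression<Func<T, bool>> predicate)
    {
        var values = new List<object?>();
        string? member = null;
        if (Collect(predicate.Body, predicate.Parameters[0], ref member, values) && member is not null)
            return (member, values.ToArray());
        return null; // caller would fall back to another strategy
    }

    private static bool Collect(Expression node, ParameterExpression row, ref string? member, List<object?> values)
    {
        // a || b: both sides must reduce to lookups on the same member.
        if (node is BinaryExpression { NodeType: ExpressionType.OrElse } orElse)
            return Collect(orElse.Left, row, ref member, values)
                && Collect(orElse.Right, row, ref member, values);

        // x.Member == value (either operand order).
        if (node is BinaryExpression { NodeType: ExpressionType.Equal } eq)
        {
            var (m, v) = IsRowMember(eq.Left, row) ? (eq.Left, eq.Right) : (eq.Right, eq.Left);
            if (!IsRowMember(m, row) || !TryEvaluate(v, out var value)) return false;
            values.Add(value);
            return SameMember(((MemberExpression)m).Member.Name, ref member);
        }

        // list.Contains(x.Member) or Enumerable.Contains(list, x.Member).
        if (node is MethodCallExpression { Method.Name: "Contains" } call && call.Arguments.Count > 0)
        {
            var source = call.Object ?? (call.Arguments.Count == 2 ? call.Arguments[0] : null);
            var item = call.Arguments[call.Arguments.Count - 1];
            if (source is null || !IsRowMember(item, row)) return false;
            if (!TryEvaluate(source, out var set) || set is string || set is not IEnumerable items) return false;
            foreach (var o in items) values.Add(o);
            return SameMember(((MemberExpression)item).Member.Name, ref member);
        }

        return false;
    }

    private static bool IsRowMember(Expression node, ParameterExpression row) =>
        node is MemberExpression m && m.Expression == row;

    private static bool SameMember(string name, ref string? member)
    {
        member ??= name;
        return member == name;
    }

    // Evaluates constants and captured variables; refuses anything that depends on the row.
    private static bool TryEvaluate(Expression node, out object? value)
    {
        value = null;
        var finder = new ParameterFinder();
        finder.Visit(node);
        if (finder.Found) return false;
        value = Expression.Lambda(node).Compile().DynamicInvoke();
        return true;
    }

    private sealed class ParameterFinder : ExpressionVisitor
    {
        public bool Found;
        protected override Expression VisitParameter(ParameterExpression node) { Found = true; return node; }
    }
}
```

Returning `null` for the mixed-member case keeps the sketch conservative: only predicates that resolve to one indexed field would be routed to the `IndexIn` path, and everything else would still need the safer fallback scan from the second suggestion.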
