Question about some excluded LongMemEval questions

Thanks for releasing the code and evaluation configs for OBLIVION. I have a question about the number of LongMemEval questions used in the evaluation.

The [README.md](https://github.com/nec-research/oblivion/blob/main/experiments/longmemeval_benchmark/config/README.md) describes `full_eval_20260202/` as "Full-scale evaluation configs (488 samples, various strategies)", but the official LongMemEval repo describes the cleaned benchmark files as containing 500 evaluation instances.

I also noticed that [this code](https://github.com/nec-research/oblivion/blob/main/experiments/longmemeval_benchmark/runner/query_pipeline.py#L507) excludes questions that appear in a blacklist, but the list itself does not seem to be provided in this repo. Could you clarify whether the LongMemEval results in the paper were computed on the official 500-instance cleaned split or on a filtered 488-instance subset? And if a filtered 488-instance subset was used, is that common practice?
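
To make the question concrete, here is a minimal sketch of what I assume the filtering amounts to. It is not taken from the repo: the function name `filter_blacklisted`, the `question_id` field, and the placeholder IDs are all my assumptions, and the 12-entry blacklist is only what the 500 vs. 488 gap would imply if the blacklist accounts for the whole difference.

```python
def filter_blacklisted(instances, blacklist):
    """Drop every instance whose question_id appears in the blacklist."""
    blocked = set(blacklist)
    return [inst for inst in instances if inst["question_id"] not in blocked]


# Synthetic stand-in for the 500-instance cleaned split (placeholder IDs only).
instances = [{"question_id": f"q{i:03d}"} for i in range(500)]

# Hypothetical blacklist; the real list does not seem to be in the repo.
blacklist = [f"q{i:03d}" for i in range(12)]

kept = filter_blacklisted(instances, blacklist)
print(len(instances), len(kept))  # 500 488
```

If the real pipeline works roughly like this, publishing the blacklisted IDs would be enough to reproduce the 488-instance subset.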

I may have misunderstood the setup, so please correct me if I missed something.
