[WIP] fix benchmark pipeline | add questions to test various capabilities#94

Open
sureshkumarsrinath wants to merge 1 commit into main from benchmark/fix-pipeline-questions

Conversation

@sureshkumarsrinath
Contributor

@sureshkumarsrinath sureshkumarsrinath commented Mar 6, 2026

📊 Proposed Questions

The following table outlines the IDs, categories, and pedagogical rationales for each question.

| ID | Type | Question | Rationale |
| --- | --- | --- | --- |
| buffer_recovery_steal | Multi-hop | Explain how the 'steal' policy in buffer management necessitates the 'undo' phase in a recovery algorithm. | Connects Buffer Management and Transaction Recovery. |
| norm_performance_tradeoff | Multi-hop | Evaluate the trade-offs between BCNF and 3NF decomposition regarding both update consistency and query performance. | Connects Normalization Theory with Implementation Performance. |
| snapshot_isolation_skew | Complex Reasoning | Describe the 'write-skew' anomaly that can occur under Snapshot Isolation but is prevented by Serializability. | Tests understanding of weak isolation models. |
| indexing_range_scan | Multi-hop | Analyze why a B+ tree index is generally preferred over a hash index for range queries (e.g., salary between 50k and 100k). | Connects Data Structures with Physical Storage access costs. |
| slotted_page_locking | Connection | How does the slotted page structure for record storage enable fine-grained row-level locking? | Connects Storage Structures with Concurrency Control. |
| aries_analysis_phase | Multi-hop | How does the ARIES 'Analysis' pass use the Dirty Page Table to determine the starting point for the 'Redo' pass? | Tests deep understanding of the ARIES recovery algorithm. |
| mars_capital_idk | "I don't know" | What is the capital of the planet Mars according to the database textbook? | Tests handling of out-of-scope/hallucinated facts. |
| future_db_idk | "I don't know" | Which database engine was released in the year 2029 according to the history section? | Tests temporal boundaries and scope. |

@shahmeer99
Contributor

@sureshkumarsrinath I am having some issues verifying these changes manually. At first glance, the questions in your PR comment look good. However, I have added some specific comments about the changes you made in benchmarks.yaml, and I am wondering whether the mismatch is because we are using different books, different extractors, or different configs.

For example:

  1. The PDF of the book I am using has 1,373 pages, and I am using the config that is currently the default in main.
  2. For question ID "acid_properties" you added chunk 1231 to ideal_chunks; however, for me that chunk is the one shown below, and it seems completely irrelevant to the question. Can you confirm what that chunk is for you? I got the chunk using the code below:
import pandas as pd
chunks = pd.read_pickle("./index/sections/full_book/textbook_index_chunks.pkl")
chunks[1231]

Chunk 1231 (using the above code):

"'. Hibernate and other object-relational mapping systems therefore perform the version number checks transparently as part of commit processing. (Transactions that involve user interaction are called conversations in Hibernate to differentiate them from regular transactions; validation using version numbers is particularly useful for such transactions.) Application developers must, however, be aware of the potential for nonserializable execution, and they must restrict their usage of the scheme to applications where non-serializability does not cause serious problems.'"
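A quick way to spot-check every entry in ideal_chunks, rather than one chunk at a time, might look like the sketch below. It assumes the pickle loads as a list of strings indexed by chunk number; the function name `flag_irrelevant_chunks` and the keyword list are hypothetical and not part of the repo. A keyword match is only a rough heuristic, so a flagged chunk still needs a manual read.

```python
def flag_irrelevant_chunks(chunks, ideal_ids, keywords):
    """Return the ideal chunk ids that look suspicious.

    A chunk id is flagged when it is out of range for this index, or when
    its text contains none of the given keywords (case-insensitive).
    Assumes `chunks` is a list-like of strings (an assumption about the pickle).
    """
    lowered = [k.lower() for k in keywords]
    flagged = []
    for chunk_id in ideal_ids:
        if not 0 <= chunk_id < len(chunks):
            flagged.append(chunk_id)  # index does not exist in this build
            continue
        text = str(chunks[chunk_id]).lower()
        if not any(k in text for k in lowered):
            flagged.append(chunk_id)  # no keyword overlap with the question
    return flagged
```

For "acid_properties", calling this with the chunk list loaded above, the ids from ideal_chunks, and a few hand-picked terms such as "atomicity" or "durability" would list the ids worth checking by hand.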

How many chunks do you have in total in the textbook_index_chunks.pkl file? For me it's 1,825.
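To compare the two indexes directly, something like the following sketch could be run on both machines and the outputs pasted here. It again assumes the pickle loads as a list-like of strings; `fingerprint_chunks` and its `probe_ids` parameter are hypothetical names introduced for illustration.

```python
import hashlib


def fingerprint_chunks(chunks, probe_ids=(0, 1231)):
    """Summarize a chunk index so two builds can be compared.

    Returns the chunk count, a SHA-256 digest over all chunk texts in order,
    and the first 80 characters of each probe chunk that exists.
    """
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(str(chunk).encode("utf-8"))
        digest.update(b"\x00")  # separator so chunk boundaries affect the hash
    probes = {
        i: str(chunks[i])[:80]
        for i in probe_ids
        if 0 <= i < len(chunks)
    }
    return {"count": len(chunks), "sha256": digest.hexdigest(), "probes": probes}
```

Passing the result of `pd.read_pickle("./index/sections/full_book/textbook_index_chunks.pkl")` to `fingerprint_chunks` on each side should show whether the counts match (1,825 for me) and whether chunk 1231 starts with the same text. If the counts differ, the book or extractor config is the likely cause; if only the digest differs, the chunking boundaries probably changed.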
