[WIP] fix benchmark pipeline | add questions to test various capabilities by sureshkumarsrinath · Pull Request #94 · georgia-tech-db/TokenSmith

sureshkumarsrinath · 2026-03-06T09:10:03Z

📊 Proposed Questions

The following table outlines the IDs, categories, and pedagogical rationales for each question.

| ID | Type | Question | Rationale |
| --- | --- | --- | --- |
| `buffer_recovery_steal` | Multi-hop | Explain how the 'steal' policy in buffer management necessitates the 'undo' phase in a recovery algorithm. | Connects Buffer Management and Transaction Recovery. |
| `norm_performance_tradeoff` | Multi-hop | Evaluate the trade-offs between BCNF and 3NF decomposition regarding both update consistency and query performance. | Connects Normalization Theory with Implementation Performance. |
| `snapshot_isolation_skew` | Complex Reasoning | Describe the 'write-skew' anomaly that can occur under Snapshot Isolation but is prevented by Serializability. | Tests understanding of weak isolation models. |
| `indexing_range_scan` | Multi-hop | Analyze why a B+ tree index is generally preferred over a hash index for range queries (e.g., salary between 50k and 100k). | Connects Data Structures with Physical Storage access costs. |
| `slotted_page_locking` | Connection | How does the slotted page structure for record storage enable fine-grained row-level locking? | Connects Storage Structures with Concurrency Control. |
| `aries_analysis_phase` | Multi-hop | How does the ARIES 'Analysis' pass use the Dirty Page Table to determine the starting point for the 'Redo' pass? | Tests deep understanding of the ARIES recovery algorithm. |
| `mars_capital_idk` | "I don't know" | What is the capital of the planet Mars according to the database textbook? | Tests handling of out-of-scope/hallucinated facts. |
| `future_db_idk` | "I don't know" | Which database engine was released in the year 2029 according to the history section? | Tests temporal boundaries and scope. |
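
To make the intent of the two "I don't know" questions concrete, here is a minimal sketch of how an answer could be checked for abstention. It is not taken from the TokenSmith benchmark code: the function name `is_abstention` and the phrase list `ABSTAIN_PATTERNS` are hypothetical, and a real scorer would likely need a broader or model-based check.

```python
import re

# Hypothetical refusal phrases (an assumption, not the project's list).
ABSTAIN_PATTERNS = [
    r"\bi don'?t know\b",
    r"\bnot (?:mentioned|covered|found|stated) in\b",
    r"\bno (?:information|mention)\b",
    r"\bcannot (?:find|answer)\b",
]


def is_abstention(answer: str) -> bool:
    """Return True if the answer contains one of the refusal phrases."""
    # Normalize case and curly apostrophes before matching.
    text = answer.lower().replace("\u2019", "'")
    return any(re.search(pattern, text) for pattern in ABSTAIN_PATTERNS)


if __name__ == "__main__":
    print(is_abstention("I don't know; the textbook does not discuss Mars."))
    print(is_abstention("The capital of Mars is Olympus City."))
```

Under this sketch, `mars_capital_idk` and `future_db_idk` would pass only when the answer abstains rather than inventing a capital or an engine.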

shahmeer99 · 2026-03-13T09:21:00Z

@sureshkumarsrinath I am having some issues verifying these changes manually. Off the bat, the questions in your PR comment look good. But I have added some specific comments about the changes you made in benchmarks.yaml, and I am wondering if it's because we are using different books, different extractors, different configs, or something else.

For example:

The PDF of the book I am using has 1,373 pages, and I am using the default config currently in main.
For question id "acid_properties" you added chunk 1231 to ideal_chunks; however, for me that chunk is the one shown below, and it seems to be completely irrelevant to the question. Can you confirm what that chunk is for you? I got the chunk using the code below:

```python
import pandas as pd
# Load the pickled chunks and inspect the one at index 1231.
f = pd.read_pickle("./index/sections/full_book/textbook_index_chunks.pkl")
f[1231]
```

Chunk 1231 (using the above code):

"'. Hibernate and other object-relational mapping systems therefore perform the version number checks transparently as part of commit processing. (Transactions that involve user interaction are called conversations in Hibernate to differentiate them from regular transactions; validation using version numbers is particularly useful for such transactions.) Application developers must, however, be aware of the potential for nonserializable execution, and they must restrict their usage of the scheme to applications where non-serializability does not cause serious problems.'"

How many chunks do you have in total in the textbook_index_chunks.pkl file? For me it's 1,825.
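
One way to compare two local indexes without pasting whole chunks is to exchange the total count plus a short hash of the chunks in question. The sketch below assumes the pickle holds a list-like sequence of chunk strings (consistent with `f[1231]` returning text above, but not confirmed); the helper names `chunk_fingerprint`, `summarize_index`, and `find_chunks` are illustrative and not part of TokenSmith.

```python
import hashlib
import pickle
from pathlib import Path


def chunk_fingerprint(text) -> str:
    """Short SHA-256 digest of a chunk, ignoring whitespace differences."""
    normalized = " ".join(str(text).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def summarize_index(chunks, ids):
    """Return the total chunk count and a fingerprint per requested index.

    Indices outside the sequence map to None instead of raising.
    """
    fingerprints = {
        i: chunk_fingerprint(chunks[i]) if 0 <= i < len(chunks) else None
        for i in ids
    }
    return {"total": len(chunks), "fingerprints": fingerprints}


def find_chunks(chunks, phrase):
    """Indices of chunks that contain `phrase` (case-insensitive)."""
    needle = phrase.lower()
    return [i for i, c in enumerate(chunks) if needle in str(c).lower()]


if __name__ == "__main__":
    # Path taken from the comment above; adjust if your index lives elsewhere.
    path = Path("./index/sections/full_book/textbook_index_chunks.pkl")
    if path.exists():
        with path.open("rb") as fh:
            chunks = pickle.load(fh)
        print(summarize_index(chunks, [1231]))
        # Locate candidate chunks for a question by keyword.
        print(find_chunks(chunks, "atomicity")[:10])
```

If both sides report the same total and the same fingerprint for chunk 1231, the indexes most likely match and the problem lies in the `ideal_chunks` entry; if they differ, the book, extractor, or config is the more likely cause.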

fix benchmark pipeline | add questions to test various capabilities

4244831

sureshkumarsrinath wants to merge 1 commit into main from benchmark/fix-pipeline-questions
