A sophisticated search system for PDF documents featuring intelligent ranking, autocomplete, and boolean operations.
- Smart Search - TF-IDF ranking with contextual scoring
- Boolean Operators - AND, OR, NOT support with nested queries
- Phrase Search - Exact phrase matching with "quoted terms"
- Autocomplete - Wildcard * completion with popularity ranking
- Page Linking - Cross-reference detection ("see page X")
- PDF Export - Results automatically saved to rezultati.pdf
- Did You Mean? - Suggestions for typos and low-result queries
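
To give a feel for the TF-IDF ranking mentioned in the feature list, here is a minimal sketch. It is not the project's actual scoring code: the function name `tf_idf_scores`, the page layout (one token list per page), and the smoothed IDF formula are all assumptions made for illustration, and the contextual scoring and cross-reference boosting are left out.

```python
import math
from collections import Counter


def tf_idf_scores(pages, query_terms):
    """Score each page against the query with a plain TF-IDF sum.

    pages: list of token lists, one per PDF page (assumed layout).
    Returns (page_index, score) pairs sorted best-first.
    """
    n_pages = len(pages)
    counts = [Counter(tokens) for tokens in pages]
    scores = []
    for idx, page_counts in enumerate(counts):
        total = sum(page_counts.values()) or 1  # avoid division by zero on empty pages
        score = 0.0
        for term in query_terms:
            # Document frequency: number of pages that contain the term.
            df = sum(1 for c in counts if term in c)
            if df == 0:
                continue
            tf = page_counts[term] / total
            idf = math.log((1 + n_pages) / (1 + df)) + 1  # smoothed IDF (assumed variant)
            score += tf * idf
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


pages = [["trie", "search", "trie"], ["search", "pdf"], ["pdf", "export"]]
ranked = tf_idf_scores(pages, ["trie"])
```

In this toy input only the first page contains "trie", so it is ranked first and the other pages score zero.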
# Index your PDF
python searching_util.py # Creates data.pkl
# Start searching
python search.py
word # Simple search
"exact phrase" # Phrase search
word1 AND word2 # Both terms
word1 OR word2 # Either term
word1 NOT word2 # Exclude term
auto* # Autocomplete
(word1 OR word2) AND word3 # Complex boolean (infix notation)
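
One common way to evaluate infix queries like the ones above is to convert them to postfix with the shunting-yard algorithm and then combine sets of page numbers. The sketch below is an illustration only, not the parser in `search.py`. It assumes a hypothetical `index` dict mapping each term to the set of pages it appears on, treats `NOT` as a binary "exclude" operator as in `word1 NOT word2`, and picks the precedence `NOT` > `AND` > `OR`.

```python
import re

# Assumed precedence; all three operators are binary and left-associative.
PRECEDENCE = {"OR": 1, "AND": 2, "NOT": 3}


def to_postfix(query):
    """Convert an infix query string to a postfix token list (shunting-yard)."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of equal or higher precedence before pushing this one.
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok.lower())  # search terms are matched case-insensitively
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


def evaluate(query, index):
    """Evaluate a boolean query against index: dict of term -> set of page numbers."""
    stack = []
    for tok in to_postfix(query):
        if tok in PRECEDENCE:
            if len(stack) < 2:
                raise ValueError("operator is missing an operand")
            right, left = stack.pop(), stack.pop()
            if tok == "AND":
                stack.append(left & right)
            elif tok == "OR":
                stack.append(left | right)
            else:  # NOT: pages matching left but not right
                stack.append(left - right)
        else:
            stack.append(set(index.get(tok, set())))
    if len(stack) != 1:
        raise ValueError("malformed query")
    return stack[0]


index = {"word1": {1, 2, 3}, "word2": {3, 4}, "word3": {2, 3, 9}}
pages = evaluate("(word1 OR word2) AND word3", index)
```

With this toy index, the complex query matches pages 2 and 3, and `word1 NOT word2` matches pages 1 and 2.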
- Trie - Efficient word storage and prefix matching
- TF-IDF - Document relevance scoring
- PageRank-style - Cross-reference boosting
- Context Window - Surrounding word analysis for better ranking
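
As a rough illustration of how a trie can back the `auto*` autocomplete, the sketch below stores words character by character and returns completions for a prefix. The class and method names are hypothetical, and "popularity" is assumed here to mean how often a word was inserted. The real project may define it differently.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # character -> TrieNode
        self.count = 0      # how many times a word ending here was inserted


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Add one occurrence of word to the trie."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1

    def complete(self, prefix, limit=5):
        """Return up to `limit` stored words starting with prefix, most frequent first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Depth-first walk of the subtree below the prefix node.
        matches = []
        stack = [(node, prefix)]
        while stack:
            current, word = stack.pop()
            if current.count:
                matches.append((word, current.count))
            for ch, child in current.children.items():
                stack.append((child, word + ch))
        # Sort by descending count, then alphabetically for a stable order.
        matches.sort(key=lambda m: (-m[1], m[0]))
        return [word for word, _ in matches[:limit]]


trie = Trie()
for w in ["auto", "automatic", "automatic", "author", "banana"]:
    trie.insert(w)
suggestions = trie.complete("auto")
```

Lookup cost depends on the prefix length and the size of the matching subtree rather than on the total vocabulary, which is why a trie suits prefix queries.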
PyPDF2, reportlab, sty
Serbian UI · PDF exports · Cross-reference aware