Data Science MSc student at FER (Zagreb). I build ML/NLP systems that are reproducible, measurable, and useful — from data and training to evaluation and clean, maintainable code.
- Spaniverse: benchmark & tooling for evaluating LLMs on span-level NLP tasks (e.g., NER, entity linking). I’m currently designing the knowledge base workflow: Wikidata/Wikipedia builders → validated intermediate format → disk-backed KB, plus evaluation utilities around it.
- Carefree: student–psychologist matching platform with an LLM-based assistant. I’m exploring how LLMs should be used in socially sensitive domains, with safety, privacy, and human-in-the-loop oversight in mind.
- LLM evaluation & reproducibility (data, metrics, baselines, ablations).
- Interpretability & robustness in high-stakes settings.
- Practical ML systems engineering: data → models → deployment-ready structure.
Also used when needed: TypeScript/JavaScript, D3.js, BigQuery, dbt, Great Expectations, CI/CD, Kubernetes (basics).

