R0mb0/Morph-it_Lexicon_Processor_Exporter

📚 Morph-it Lexicon Processor 🧠

A lightning-fast, client-side web application designed to parse, filter, and export the massive Morph-it Italian lexicon dataset. Upload the raw .txt file, explore complex linguistic categories through an intuitive collapsible tree interface, and instantly download your custom dictionary in JSON, CSV, or TXT format. Everything runs locally in your browser for maximum privacy and performance! 🚀


🚀 Features

  • 100% Local Processing: Parses the lexicon's 500,000+ lines of text directly in your browser. Nothing is uploaded to a server, so there is no network latency and your data never leaves your machine.
  • Recursive Tree UI: Intelligently groups 600+ complex morphological tags (like VER:ind+pres+1+p) into a clean, collapsible directory-style tree using native HTML <details> tags.
  • Smart Selection: Prevent misclicks! Select or deselect entire branches (e.g., all Adjectives or Verbs) with a single click using the dedicated branch controls.
  • Multi-Format Export: Tailor your output to your needs. Export a key-value JSON for software development, a CSV for data science and databases, or a pure TXT wordlist for Scrabble and anagram games.
  • Completely Offline Ready: Styling is provided by a local Tailwind CSS script, so the entire application runs without an active internet connection.
  • Bilingual & Dark Mode: The interface automatically switches between English and Italian and follows your system's Light or Dark theme preference.
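
To make the three export formats concrete, here is a minimal sketch of how a list of selected entries might be serialized. The entry shape (`{ word, tag }`), the function names, and the choice of word-to-tag pairs for the JSON output are assumptions made for illustration, not the app's actual code.

```javascript
// Hypothetical entry shape: { word: "casa", tag: "NOUN-F:s" }.

// Key-value JSON: each word maps to its morphological tag (assumed layout).
function toJson(entries) {
  const dict = {};
  for (const { word, tag } of entries) dict[word] = tag;
  return JSON.stringify(dict, null, 2);
}

// Quote a CSV field only if it contains a comma, a double quote, or a newline.
function csvField(value) {
  return /[",\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// CSV: a header row followed by one "word,tag" row per entry.
function toCsv(entries) {
  const rows = entries.map(({ word, tag }) => `${csvField(word)},${csvField(tag)}`);
  return ["word,tag", ...rows].join("\n");
}

// TXT: one word per line, nothing else.
function toTxt(entries) {
  return entries.map(({ word }) => word).join("\n");
}

const sample = [
  { word: "casa", tag: "NOUN-F:s" },
  { word: "bello", tag: "ADJ:pos+m+s" },
];
console.log(toCsv(sample));
```

Note that a plain word-to-tag object keeps only the last tag seen for a word that appears under several tags; a real export would need to decide how to handle that case.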

🛠️ How it works

  1. File Parsing: The HTML5 FileReader API loads the tab-separated Morph-it document, which is then processed line by line so that only unique words are kept.
  2. Tree Construction: The algorithm splits the category tags using : and + delimiters to dynamically generate a nested JavaScript object, mapping the macroscopic categories down to the most granular leaf nodes.
  3. Asynchronous UI: The app uses requestAnimationFrame and JavaScript Promises to render loading overlays and to keep the browser's main thread from freezing during heavy array processing.
  4. Blob Generation: On export, the app filters the in-memory array and generates a downloadable Blob URL on the fly.
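
The parsing and tree-construction steps above can be sketched as follows. This is an illustrative approximation, not the app's source: it assumes each line holds a word form, a lemma, and a tag separated by tabs, it treats a word/tag pair as the unit of uniqueness, and the names `parseLexicon` and `buildTree` are hypothetical.

```javascript
// Assumed line layout: form <TAB> lemma <TAB> tag,
// e.g. "amiamo\tamare\tVER:ind+pres+1+p".
function parseLexicon(text) {
  const seen = new Set();
  const entries = [];
  for (const line of text.split(/\r?\n/)) {
    const [word, lemma, tag] = line.split("\t");
    if (!word || !tag) continue; // skip blank or malformed lines
    const key = `${word}\t${tag}`;
    if (seen.has(key)) continue; // drop exact duplicates
    seen.add(key);
    entries.push({ word, lemma, tag });
  }
  return entries;
}

// Split "VER:ind+pres+1+p" into ["VER", "ind", "pres", "1", "p"]
// and create one nesting level per part.
function buildTree(tags) {
  const root = Object.create(null); // no inherited keys to collide with
  for (const tag of tags) {
    let node = root;
    for (const part of tag.split(/[:+]/)) {
      if (!(part in node)) node[part] = Object.create(null);
      node = node[part];
    }
  }
  return root;
}

const raw =
  "amiamo\tamare\tVER:ind+pres+1+p\n" +
  "casa\tcasa\tNOUN-F:s\n" +
  "casa\tcasa\tNOUN-F:s\n";
const entries = parseLexicon(raw);
const tree = buildTree(entries.map((entry) => entry.tag));
console.log(JSON.stringify(tree, null, 2));
```

In a browser, the `text` argument would typically come from a `FileReader` `load` event or from `file.text()`. Each nested object in the resulting tree maps naturally onto one `<details>` element in the UI.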

🏆 What makes it special?

  • Highly Optimized: Rendering half a million DOM nodes at once can crash a browser. Collapsible sections and asynchronous yielding keep the app responsive while the data is processed.
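
One common way to get this kind of asynchronous yielding is to process a large array in chunks and wait for the next animation frame between chunks. The sketch below shows that general pattern under stated assumptions; the helper names `nextFrame` and `mapInChunks` and the chunk size are hypothetical, and the app's real implementation may differ.

```javascript
// Resolve on the next animation frame in a browser;
// fall back to a zero-delay timer where requestAnimationFrame is unavailable.
function nextFrame() {
  return new Promise((resolve) => {
    if (typeof requestAnimationFrame === "function") {
      requestAnimationFrame(() => resolve());
    } else {
      setTimeout(resolve, 0);
    }
  });
}

// Apply `fn` to every item, pausing after each chunk so the
// browser gets a chance to repaint (e.g. a loading overlay).
async function mapInChunks(items, chunkSize, fn) {
  const out = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) {
      out.push(fn(item));
    }
    await nextFrame();
  }
  return out;
}

// Usage: uppercase a small list two items at a time.
mapInChunks(["casa", "cane", "gatto"], 2, (word) => word.toUpperCase())
  .then((result) => console.log(result));
```

Larger chunks finish sooner but block the main thread longer per step, so the chunk size is a trade-off between throughput and responsiveness.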

💡 Why use this project?

  • Game Development: Easily extract a clean TXT list of valid Italian words (excluding proper nouns or specific verb tenses) to power your next crossword, Scrabble clone, or Wordle-like game.
  • NLP & Data Science: Quickly generate tailored CSV datasets for machine learning models or linguistic research without touching a command line.
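
As an illustration of the game-development use case, the sketch below builds a plain word list while leaving out whole categories. It assumes entries shaped like `{ word, tag }` and assumes that proper nouns carry the top-level tag `NPR`; both the function name `buildWordList` and that tag value are assumptions to verify against your copy of the dataset.

```javascript
// Keep entries whose top-level category (the text before ":") is not excluded,
// then return the unique lowercase words in default sort order.
function buildWordList(entries, excludedCategories) {
  const excluded = new Set(excludedCategories);
  const words = new Set();
  for (const { word, tag } of entries) {
    const category = tag.split(":")[0];
    if (!excluded.has(category)) words.add(word.toLowerCase());
  }
  return [...words].sort();
}

const lexicon = [
  { word: "Roma", tag: "NPR" }, // assumed proper-noun tag
  { word: "casa", tag: "NOUN-F:s" },
  { word: "case", tag: "NOUN-F:p" },
  { word: "amiamo", tag: "VER:ind+pres+1+p" },
];
console.log(buildWordList(lexicon, ["NPR"]).join("\n"));
```

In the app itself the same selection is made by ticking branches in the tree rather than by writing code.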

⚡ Getting Started

Online

Simply open the Live Demo link in any browser, drop in your .txt file, and start filtering!

Local Installation

Want to run it locally or use it entirely offline?

  1. Clone this repository or download the source code ZIP.
  2. Make sure the tailwind.js file is located in your src folder as referenced in the index.html.
  3. Double-click index.html to open it in your browser. No local server required!

📄 Get a sample dataset

https://docs.sslmit.unibo.it/doku.php?id=resources:morph-it


Crafted with AI
