This repository was archived by the owner on Mar 30, 2026. It is now read-only.
Releases: JarbasAl/ModelZoo
0.2.0a2
0.2a1
Models
Training scripts can be found in the `train` folder.
NER
| model_id | language | dataset | accuracy |
|---|---|---|---|
| nltk_clftagger_conll2003_NER | en | CONLL2003 | 0.874% |
| nltk_clftagger_gmb_NER | en | GMB 2.2.0 | 0% |
| nltk_clftagger_slsmovies_NER | en | MIT Movie Corpus | 0% |
| nltk_clftagger_slstrivia10k13_NER | en | MIT Movie Corpus - Trivia | 0.806% |
| nltk_clftagger_slsrestaurants_NER | en | MIT Restaurant Corpus | 0% |
| nltk_clftagger_onto5_NER | en | OntoNotes-5.0-NER-BIO | 0.910% |
| nltk_clftagger_paramopama_NER | pt | Paramopama | 0% |
| nltk_clftagger_paramopama+harem_NER | pt | Paramopama + HAREM (v2) | 0% |
| nltk_clftagger_WNUT17_NER | en | WNUT17 | 0% |
| nltk_clftagger_leNERbr_NER | pt-br | leNER-Br | 0% |
CHUNKING
| model_id | language | dataset | tagset | accuracy |
|---|---|---|---|---|
| nltk_conll2000_postag_ngram_chunk_tagger | en | CONLL2000 | | 0% |
| nltk_conll2000_clf_chunk_tagger | en | CONLL2000 | | 0% |
POSTAG
| model_id | language | dataset | tagset | accuracy |
|---|---|---|---|---|
| nltk_floresta_macmorpho_brill_tagger | pt | floresta + macmorpho | universal | 0% |
| nltk_brown_brill_tagger | en | brown | brown | 0.941% |
| nltk_brown_maxent_tagger | en | brown | brown | 0% |
| nltk_brown_ngram_tagger | en | brown | brown | 0.930% |
| nltk_floresta_brill_tagger | pt | floresta | VISL (Portuguese) | 0.938% |
| nltk_floresta_ngram_tagger | pt | floresta | VISL (Portuguese) | 0.925% |
| nltk_cess_cat_udep_brill_tagger | ca | cess_cat_udep | Universal Dependencies | 0.974% |
| nltk_cess_esp_udep_brill_tagger | es | cess_esp_udep | Universal Dependencies | 0.975% |
| nltk_macmorpho_unvtagset_brill_tagger | pt | macmorpho | Universal Dependencies | 0.966% |
| nltk_onto5_brill_tagger | en | OntoNotes-5.0-NER-BIO | Penn Treebank | 0% |
| nltk_treebank_clftagger | en | treebank | Penn Treebank | 0% |
| nltk_treebank_brill_tagger | en | treebank | Penn Treebank | 0% |
| nltk_treebank_ngram_tagger | en | treebank | Penn Treebank | 0% |
| nltk_treebank_maxent_tagger | en | treebank | Penn Treebank | 0% |
| nltk_treebank_tnt_tagger | en | treebank | Penn Treebank | 0% |
| nltk_nilc_brill_tagger | pt-br | NILC_taggers | NILC | 0.881% |
| nltk_nilc_ngram_tagger | pt-br | NILC_taggers | NILC | 0.869% |
| nltk_cess_cat_brill_tagger | ca | cess_cat | EAGLES | 0.939% |
| nltk_cess_esp_brill_tagger | es | cess_esp | EAGLES | 0.926% |
| nltk_macmorpho_brill_tagger | pt | macmorpho | | 0% |
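
These notes do not say how the accuracy column was measured. As a minimal sketch, assuming it is token-level accuracy reported as a fraction (so `0.941` would mean 94.1% of tokens tagged correctly), the computation could look like the following. `token_accuracy` is a hypothetical helper written for illustration and is not part of this repository; the sample data is made up.

```python
from typing import List, Tuple

TaggedSentence = List[Tuple[str, str]]  # [(token, tag), ...]


def token_accuracy(gold: List[TaggedSentence],
                   predicted: List[TaggedSentence]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentences must be token-aligned")
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            correct += gold_tag == pred_tag
            total += 1
    # An empty evaluation set yields 0.0 rather than a division error
    return correct / total if total else 0.0


# Illustrative data only: one sentence, one of four tags is wrong
gold = [[("Olá", "NOUN"), (",", "."), ("o", "DET"), ("meu", "PRON")]]
predicted = [[("Olá", "NOUN"), (",", "."), ("o", "DET"), ("meu", "DET")]]

score = token_accuracy(gold, predicted)
print(f"{score:.3f} ({score:.1%})")  # 0.750 (75.0%)
```
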
0.1 - Iberian Brill
NLTK
Postag
NLTK does not ship pre-trained POS taggers for most languages.
Portuguese
Brill tagger trained on the floresta and mac_morpho corpora
```python
import pickle
from nltk import word_tokenize

# The .pkl file must be in the working directory
with open("brill_tagger_floresta_mcmorpho_pt.pkl", "rb") as f:
    tagger = pickle.load(f)

tokens = word_tokenize("Olá, o meu nome é Joaquim")
postagged = tagger.tag(tokens)
# [('Olá', 'NOUN'), (',', '.'), ('o', 'DET'), ('meu', 'PRON'), ('nome', 'NOUN'), ('é', 'VERB'), ('Joaquim', 'NOUN')]
```
Spanish
Brill tagger trained on the cess_esp corpus
```python
import pickle
from nltk import word_tokenize

# The .pkl file must be in the working directory
with open("brill_tagger_cess_es.pkl", "rb") as f:
    tagger = pickle.load(f)

tokens = word_tokenize("Hola, mi nombre es Daniel")
postagged = tagger.tag(tokens)
# [('Hola', 'NOUN'), (',', 'fc'), ('mi', 'DET'), ('nombre', 'NOUN'), ('es', 'VERB'), ('Daniel', 'NOUN')]
```
Catalan
Brill tagger trained on the cess_cat corpus
```python
import pickle
from nltk import word_tokenize

# The .pkl file must be in the working directory
with open("brill_tagger_cess_ca.pkl", "rb") as f:
    tagger = pickle.load(f)

tokens = word_tokenize("Quién es el presidente de Cataluña?")
postagged = tagger.tag(tokens)
# [('Quién', 'NOUN'), ('es', 'PRON'), ('el', 'DET'), ('presidente', 'NOUN'), ('de', 'ADP'), ('Cataluña', 'NOUN'), ('?', 'fit')]
```
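
The three snippets above differ only in the file name and the input sentence, so they can be folded into two small helpers. This is a sketch, not an API provided by this repository: `load_tagger` and `tag_text` are hypothetical names, and the default tokenizer is plain whitespace splitting (an assumption made to keep the block free of NLTK). Pass `nltk.word_tokenize` as `tokenize` to reproduce the examples above. Since `pickle.load` can execute arbitrary code, only load tagger files from a source you trust.

```python
import pickle
from typing import Callable, List, Tuple


def load_tagger(path: str):
    """Unpickle a tagger object from `path` (trusted files only)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def tag_text(tagger,
             text: str,
             tokenize: Callable[[str], List[str]] = str.split
             ) -> List[Tuple[str, str]]:
    """Tokenize `text` and return the tagger's (token, tag) pairs.

    `tagger` is any object exposing a `tag(tokens)` method, as the
    pickled taggers in the snippets above do.
    """
    return tagger.tag(tokenize(text))
```

For example, `tag_text(load_tagger("brill_tagger_cess_es.pkl"), "Hola, mi nombre es Daniel", tokenize=word_tokenize)` mirrors the Spanish snippet, assuming the file is present and `word_tokenize` has been imported from `nltk`.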