This repository was archived by the owner on Mar 30, 2026. It is now read-only.

Releases: JarbasAl/ModelZoo

0.2.0a2 (Pre-release)

@JarbasAl released this 01 Sep 14:45

New model batch.

0.2a1 (Pre-release)

@JarbasAl released this 09 Jun 07:29

Models

Training scripts can be found in the train folder.

NER

| model_id | language | dataset | accuracy |
|---|---|---|---|
| nltk_clftagger_conll2003_NER | en | CONLL2003 | 0.874% |
| nltk_clftagger_gmb_NER | en | GMB 2.2.0 | 0% |
| nltk_clftagger_slsmovies_NER | en | MIT Movie Corpus | 0% |
| nltk_clftagger_slstrivia10k13_NER | en | MIT Movie Corpus - Trivia | 0.806% |
| nltk_clftagger_slsrestaurants_NER | en | MIT Restaurant Corpus | 0% |
| nltk_clftagger_onto5_NER | en | OntoNotes-5.0-NER-BIO | 0.910% |
| nltk_clftagger_paramopama_NER | pt | Paramopama | 0% |
| nltk_clftagger_paramopama+harem_NER | pt | Paramopama + HAREM (v2) | 0% |
| nltk_clftagger_WNUT17_NER | en | WNUT17 | 0% |
| nltk_clftagger_leNERbr_NER | pt-br | leNER-Br | 0% |

CHUNKING

| model_id | language | dataset | tagset | accuracy |
|---|---|---|---|---|
| nltk_conll2000_postag_ngram_chunk_tagger | en | CONLL2000 | | 0% |
| nltk_conll2000_clf_chunk_tagger | en | CONLL2000 | | 0% |

POSTAG

| model_id | language | dataset | tagset | accuracy |
|---|---|---|---|---|
| nltk_floresta_macmorpho_brill_tagger | pt | floresta + macmorpho | universal | 0% |
| nltk_brown_brill_tagger | en | brown | brown | 0.941% |
| nltk_brown_maxent_tagger | en | brown | brown | 0% |
| nltk_brown_ngram_tagger | en | brown | brown | 0.930% |
| nltk_floresta_brill_tagger | pt | floresta | VISL (Portuguese) | 0.938% |
| nltk_floresta_ngram_tagger | pt | floresta | VISL (Portuguese) | 0.925% |
| nltk_cess_cat_udep_brill_tagger | ca | cess_cat_udep | Universal Dependencies | 0.974% |
| nltk_cess_esp_udep_brill_tagger | es | cess_esp_udep | Universal Dependencies | 0.975% |
| nltk_macmorpho_unvtagset_brill_tagger | pt | macmorpho | Universal Dependencies | 0.966% |
| nltk_onto5_brill_tagger | en | OntoNotes-5.0-NER-BIO | Penn Treebank | 0% |
| nltk_treebank_clftagger | en | treebank | Penn Treebank | 0% |
| nltk_treebank_brill_tagger | en | treebank | Penn Treebank | 0% |
| nltk_treebank_ngram_tagger | en | treebank | Penn Treebank | 0% |
| nltk_treebank_maxent_tagger | en | treebank | Penn Treebank | 0% |
| nltk_treebank_tnt_tagger | en | treebank | Penn Treebank | 0% |
| nltk_nilc_brill_tagger | pt-br | NILC_taggers | NILC | 0.881% |
| nltk_nilc_ngram_tagger | pt-br | NILC_taggers | NILC | 0.869% |
| nltk_cess_cat_brill_tagger | ca | cess_cat | EAGLES | 0.939% |
| nltk_cess_esp_brill_tagger | es | cess_esp | EAGLES | 0.926% |
| nltk_macmorpho_brill_tagger | pt | macmorpho | | 0% |
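
These notes do not define the accuracy columns. The following is a minimal sketch, assuming the figures are token-level accuracy (correctly tagged tokens divided by all tokens) measured on held-out sentences. The function name `token_accuracy` and the toy sentences are illustrative only and do not come from the repository.

```python
def token_accuracy(gold_sents, predicted_sents):
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for gold, pred in zip(gold_sents, predicted_sents):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sentences must align token by token")
        for (_, gold_tag), (_, pred_tag) in zip(gold, pred):
            correct += gold_tag == pred_tag  # True counts as 1
            total += 1
    # Assumption: an empty evaluation set scores 0.0 rather than raising
    return correct / total if total else 0.0


# Toy data (hypothetical): one of five tokens is tagged incorrectly
gold = [
    [("o", "DET"), ("meu", "PRON"), ("nome", "NOUN")],
    [("é", "VERB"), ("Joaquim", "NOUN")],
]
pred = [
    [("o", "DET"), ("meu", "DET"), ("nome", "NOUN")],
    [("é", "VERB"), ("Joaquim", "NOUN")],
]

score = token_accuracy(gold, pred)
print(f"{score:.3f}")  # 4 correct out of 5 tokens -> 0.800
```

Under this reading, a value such as 0.941 would mean that about 94% of tokens received the gold tag.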

0.1 - Iberian Brill

@JarbasAl released this 31 May 20:14

NLTK

Postag

NLTK does not ship pre-trained POS taggers for most languages.

Portuguese

Brill tagger trained on the floresta and mac_morpho corpora.

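import pickle
from nltk import word_tokenize

# Load the pickled Brill tagger (only unpickle files from a source you trust)
with open("brill_tagger_floresta_mcmorpho_pt.pkl", "rb") as f:
    tagger = pickle.load(f)

# word_tokenize needs NLTK's Punkt tokenizer data and defaults to English rules
tokens = word_tokenize("Olá, o meu nome é Joaquim")
postagged = tagger.tag(tokens)  # list of (token, tag) pairs
# [('Olá', 'NOUN'), (',', '.'), ('o', 'DET'), ('meu', 'PRON'), ('nome', 'NOUN'), ('é', 'VERB'), ('Joaquim', 'NOUN')]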

Spanish

Brill tagger trained on the cess_esp corpus.

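import pickle
from nltk import word_tokenize

# Load the pickled Brill tagger (only unpickle files from a source you trust)
with open("brill_tagger_cess_es.pkl", "rb") as f:
    tagger = pickle.load(f)

# word_tokenize needs NLTK's Punkt tokenizer data and defaults to English rules
tokens = word_tokenize("Hola, mi nombre es Daniel")
postagged = tagger.tag(tokens)  # list of (token, tag) pairs
# [('Hola', 'NOUN'), (',', 'fc'), ('mi', 'DET'), ('nombre', 'NOUN'), ('es', 'VERB'), ('Daniel', 'NOUN')]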

Catalan

Brill tagger trained on the cess_cat corpus.

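import pickle
from nltk import word_tokenize

# Load the pickled Brill tagger (only unpickle files from a source you trust)
with open("brill_tagger_cess_ca.pkl", "rb") as f:
    tagger = pickle.load(f)

# word_tokenize needs NLTK's Punkt tokenizer data and defaults to English rules
tokens = word_tokenize("Quién es el presidente de Cataluña?")
postagged = tagger.tag(tokens)  # list of (token, tag) pairs
# [('Quién', 'NOUN'), ('es', 'PRON'), ('el', 'DET'), ('presidente', 'NOUN'), ('de', 'ADP'), ('Cataluña', 'NOUN'), ('?', 'fit')]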
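
The Spanish and Catalan outputs above mix universal-style tags (NOUN, DET) with corpus punctuation tags (fc, fit), while the Portuguese output uses '.' for punctuation. If downstream code expects a single punctuation tag, a post-processing step along these lines might help. This is a sketch, assuming that every tag starting with a lowercase "f" marks punctuation. That holds for the two tags shown here but is not confirmed by these notes for the full tagset. The helper `normalize_punct_tags` is hypothetical and not part of the repository.

```python
def normalize_punct_tags(tagged, punct_tag="."):
    """Map corpus punctuation tags (assumed to start with 'f') to one tag."""
    return [
        (token, punct_tag if tag.startswith("f") else tag)
        for token, tag in tagged
    ]


# Tagged output copied from the Spanish and Catalan examples
spanish = [('Hola', 'NOUN'), (',', 'fc'), ('mi', 'DET'),
           ('nombre', 'NOUN'), ('es', 'VERB'), ('Daniel', 'NOUN')]
catalan = [('Quién', 'NOUN'), ('es', 'PRON'), ('el', 'DET'),
           ('presidente', 'NOUN'), ('de', 'ADP'), ('Cataluña', 'NOUN'),
           ('?', 'fit')]

spanish_norm = normalize_punct_tags(spanish)
catalan_norm = normalize_punct_tags(catalan)
print(spanish_norm[1])   # (',', '.')
print(catalan_norm[-1])  # ('?', '.')
```

Because the universal-style tags in these outputs are uppercase, the lowercase prefix check leaves them untouched.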