Models

Training scripts can be found in the train folder.

NER

model_id	language	dataset	accuracy
nltk_clftagger_conll2003_NER	en	CONLL2003	0.874
nltk_clftagger_gmb_NER	en	GMB 2.2.0	n/a
nltk_clftagger_slsmovies_NER	en	MIT Movie Corpus	n/a
nltk_clftagger_slstrivia10k13_NER	en	MIT Movie Corpus - Trivia	0.806
nltk_clftagger_slsrestaurants_NER	en	MIT Restaurant Corpus	n/a
nltk_clftagger_onto5_NER	en	OntoNotes-5.0-NER-BIO	0.910
nltk_clftagger_paramopama_NER	pt	Paramopama	n/a
nltk_clftagger_paramopama+harem_NER	pt	Paramopama + HAREM (v2)	n/a
nltk_clftagger_WNUT17_NER	en	WNUT17	n/a
nltk_clftagger_leNERbr_NER	pt-br	leNER-Br	n/a
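NER taggers of this kind typically label each token with an IOB tag (B-PER, I-PER, O, ...), as in the CoNLL-2003 and OntoNotes BIO data listed above, so a common post-processing step is grouping tokens back into entity spans. The sketch below assumes the models output (token, IOB tag) pairs in that scheme; the exact tag set of each model is not documented here, and `iob_to_entities` is a hypothetical helper, not part of the released models.

```python
from typing import List, Optional, Tuple


def iob_to_entities(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (token, IOB tag) pairs into (entity text, entity type) tuples."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    etype: Optional[str] = None

    def flush() -> None:
        # close the entity currently being built, if any
        if tokens and etype is not None:
            entities.append((" ".join(tokens), etype))

    for token, tag in tagged:
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            # B- always starts an entity; an I- whose type differs is treated as B-
            flush()
            tokens, etype = [token], tag[2:]
        elif tag.startswith("I-"):
            tokens.append(token)  # continue the current entity
        else:
            # "O" (or any tag outside the IOB scheme) ends the current entity
            flush()
            tokens, etype = [], None
    flush()
    return entities


tagged = [("John", "B-PER"), ("Smith", "I-PER"), ("lives", "O"), ("in", "O"),
          ("New", "B-LOC"), ("York", "I-LOC"), (".", "O")]
print(iob_to_entities(tagged))
# [('John Smith', 'PER'), ('New York', 'LOC')]
```

Treating a stray I- tag as the start of a new entity is one lenient convention; a stricter decoder could drop such tokens instead.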

CHUNKING

model_id	language	dataset	tagset	accuracy
nltk_conll2000_postag_ngram_chunk_tagger	en	CONLL2000		n/a
nltk_conll2000_clf_chunk_tagger	en	CONLL2000		n/a
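The name `nltk_conll2000_postag_ngram_chunk_tagger` suggests chunk tags predicted from POS tags with n-gram lookups, similar in spirit to the UnigramChunker/BigramChunker examples in chapter 7 of the NLTK book. The sketch below is a pure-Python stand-in under that assumption: it predicts each CoNLL-2000-style chunk tag from the current POS tag and the previous chunk tag, backing off to the POS tag alone and then to "O". `BigramChunkTagger` and the toy training data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# one training sentence = list of (POS tag, IOB chunk tag) pairs, as in CoNLL-2000
Sentence = List[Tuple[str, str]]


class BigramChunkTagger:
    """Hypothetical sketch: predict a chunk tag from the current POS tag and the
    previous chunk tag, backing off to the POS tag alone, then to a default."""

    def __init__(self, train: List[Sentence], default: str = "O") -> None:
        bigram: Dict[Tuple[Optional[str], str], Counter] = defaultdict(Counter)
        unigram: Dict[str, Counter] = defaultdict(Counter)
        for sent in train:
            prev: Optional[str] = None  # None marks the sentence start
            for pos, chunk in sent:
                bigram[(prev, pos)][chunk] += 1
                unigram[pos][chunk] += 1
                prev = chunk
        # keep the most frequent chunk tag seen in each context
        self.bigram = {ctx: c.most_common(1)[0][0] for ctx, c in bigram.items()}
        self.unigram = {pos: c.most_common(1)[0][0] for pos, c in unigram.items()}
        self.default = default

    def tag(self, pos_tags: List[str]) -> List[str]:
        chunks: List[str] = []
        prev: Optional[str] = None
        for pos in pos_tags:
            chunk = self.bigram.get((prev, pos))
            if chunk is None:  # unseen context: back off
                chunk = self.unigram.get(pos, self.default)
            chunks.append(chunk)
            prev = chunk  # condition on our own previous prediction
        return chunks


train = [
    [("DT", "B-NP"), ("NN", "I-NP"), ("VBZ", "B-VP"), ("DT", "B-NP"), ("JJ", "I-NP"), ("NN", "I-NP")],
    [("PRP", "B-NP"), ("VBD", "B-VP"), ("DT", "B-NP"), ("NN", "I-NP"), (".", "O")],
]
chunker = BigramChunkTagger(train)
print(chunker.tag(["DT", "NN", "VBD", "PRP", "."]))
# ['B-NP', 'I-NP', 'B-VP', 'B-NP', 'O']
```

Conditioning on the previously predicted chunk tag mirrors how NLTK's n-gram taggers use tag history as context; the classifier-based variant (`nltk_conll2000_clf_chunk_tagger`) would instead feed such context into a trained classifier.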

POSTAG

model_id	language	dataset	tagset	accuracy
nltk_floresta_macmorpho_brill_tagger	pt	floresta + macmorpho	universal	n/a
nltk_brown_brill_tagger	en	brown	brown	0.941
nltk_brown_maxent_tagger	en	brown	brown	n/a
nltk_brown_ngram_tagger	en	brown	brown	0.930
nltk_floresta_brill_tagger	pt	floresta	VISL (Portuguese)	0.938
nltk_floresta_ngram_tagger	pt	floresta	VISL (Portuguese)	0.925
nltk_cess_cat_udep_brill_tagger	ca	cess_cat_udep	Universal Dependencies	0.974
nltk_cess_esp_udep_brill_tagger	es	cess_esp_udep	Universal Dependencies	0.975
nltk_macmorpho_unvtagset_brill_tagger	pt	macmorpho	Universal Dependencies	0.966
nltk_onto5_brill_tagger	en	OntoNotes-5.0-NER-BIO	Penn Treebank	n/a
nltk_treebank_clftagger	en	treebank	Penn Treebank	n/a
nltk_treebank_brill_tagger	en	treebank	Penn Treebank	n/a
nltk_treebank_ngram_tagger	en	treebank	Penn Treebank	n/a
nltk_treebank_maxent_tagger	en	treebank	Penn Treebank	n/a
nltk_treebank_tnt_tagger	en	treebank	Penn Treebank	n/a
nltk_nilc_brill_tagger	pt-br	NILC_taggers	NILC	0.881
nltk_nilc_ngram_tagger	pt-br	NILC_taggers	NILC	0.869
nltk_cess_cat_brill_tagger	ca	cess_cat	EAGLES	0.939
nltk_cess_esp_brill_tagger	es	cess_esp	EAGLES	0.926
nltk_macmorpho_brill_tagger	pt	macmorpho		n/a
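The accuracy values above are best read as fractions (0.941 meaning roughly 94.1% of tokens tagged correctly); the evaluation split is not stated, so treating them as token-level accuracy on held-out data is an assumption. The sketch below shows how such a figure is computed, using a minimal unigram tagger with a default-tag backoff as a stand-in. The `*_ngram_tagger` models presumably chain n-gram taggers with backoff in a similar way, but that is also an assumption, and `train_unigram`, `tag_words` and `accuracy` are hypothetical helpers. For real NLTK taggers, recent NLTK versions expose `tagger.accuracy(gold_sentences)` (older versions call it `evaluate`).

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

TaggedSent = List[Tuple[str, str]]  # list of (word, tag) pairs


def train_unigram(train: List[TaggedSent]) -> Dict[str, str]:
    """Map each training word to the tag it most often carries."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in train:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_words(model: Dict[str, str], words: List[str], default: str = "NOUN") -> TaggedSent:
    # words never seen in training back off to a fixed default tag
    return [(w, model.get(w, default)) for w in words]


def accuracy(model: Dict[str, str], gold: List[TaggedSent]) -> float:
    """Fraction of gold tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in gold:
        predicted = tag_words(model, [w for w, _ in sent])
        for (_, p), (_, g) in zip(predicted, sent):
            correct += p == g
            total += 1
    return correct / total if total else 0.0


train = [[("o", "DET"), ("gato", "NOUN"), ("dorme", "VERB")],
         [("o", "DET"), ("cão", "NOUN"), ("corre", "VERB")]]
gold = [[("o", "DET"), ("gato", "NOUN"), ("corre", "VERB")],
        [("um", "DET"), ("rato", "NOUN")]]
print(accuracy(train_unigram(train), gold))
# 0.8  ("um" was unseen and got the default NOUN instead of DET)
```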

NLTK

Postag

NLTK does not ship pre-trained POS taggers for most languages.

Portuguese

Brill tagger trained on the Floresta and Mac-Morpho corpora.

import pickle
from nltk import word_tokenize

# load the pickled tagger (unpickling can run arbitrary code: only load trusted files)
with open("brill_tagger_floresta_mcmorpho_pt.pkl", "rb") as f:
    tagger = pickle.load(f)

# word_tokenize needs NLTK's Punkt tokenizer data to be downloaded
tokens = word_tokenize("Olá, o meu nome é Joaquim")
postagged = tagger.tag(tokens)
# [('Olá', 'NOUN'), (',', '.'), ('o', 'DET'), ('meu', 'PRON'), ('nome', 'NOUN'), ('é', 'VERB'), ('Joaquim', 'NOUN')]
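A Brill (transformation-based) tagger first tags text with a simpler initial tagger and then applies an ordered list of learned correction rules. The sketch below shows only that rule-application step, with a single hypothetical rule template ("change tag A to B when the previous tag is C"); NLTK's `BrillTagger` supports many templates, and the rules stored inside the pickled models are not listed here. `Rule`, `apply_rules` and the initial tags are invented for illustration.

```python
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    """Hypothetical rule template: retag from_tag as to_tag when the previous tag is prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rules(tagged: List[Tuple[str, str]], rules: List[Rule]) -> List[Tuple[str, str]]:
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]
    for rule in rules:  # rules are applied in the order they were learned
        # collect every matching position first, then rewrite them together
        hits = [i for i in range(1, len(tags))
                if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag]
        for i in hits:
            tags[i] = rule.to_tag
    return list(zip(words, tags))


# initial tags from a weaker baseline tagger (invented for illustration)
initial = [("o", "DET"), ("meu", "PRON"), ("nome", "VERB"), ("é", "VERB")]
print(apply_rules(initial, [Rule(from_tag="VERB", to_tag="NOUN", prev_tag="PRON")]))
# [('o', 'DET'), ('meu', 'PRON'), ('nome', 'NOUN'), ('é', 'VERB')]
```

Matching all positions before rewriting keeps a rule from triggering on tags it has just changed ("é" stays VERB because its left neighbour was still VERB when the rule was matched).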

Spanish

Brill tagger trained on the cess_esp corpus.

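import pickle
from nltk import word_tokenize

# load the pickled tagger (unpickling can run arbitrary code: only load trusted files)
with open("brill_tagger_cess_es.pkl", "rb") as f:
    tagger = pickle.load(f)

# word_tokenize needs NLTK's Punkt tokenizer data to be downloaded
tokens = word_tokenize("Hola, mi nombre es Daniel")
postagged = tagger.tag(tokens)
# [('Hola', 'NOUN'), (',', 'fc'), ('mi', 'DET'), ('nombre', 'NOUN'), ('es', 'VERB'), ('Daniel', 'NOUN')]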
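The Spanish output above mixes coarse universal tags with a raw EAGLES tag ('fc', a comma). If a uniform tagset is needed downstream, one option is a coarse mapping keyed on the first letter of the EAGLES tag, which encodes the main category (n = noun, v = verb, f = punctuation, and so on). The mapping table below is an assumption based on those usual category letters, not something shipped with these models; `to_universal` and `normalize` are hypothetical helpers.

```python
from typing import List, Tuple

# Universal tags (as in NLTK's "universal" tagset) are passed through unchanged.
UNIVERSAL = {"VERB", "NOUN", "PRON", "ADJ", "ADV", "ADP", "CONJ", "DET", "NUM", "PRT", "X", "."}

# Assumed coarse mapping keyed on the first letter of an EAGLES-style tag.
EAGLES_CATEGORY = {
    "a": "ADJ", "c": "CONJ", "d": "DET", "f": ".", "n": "NOUN",
    "p": "PRON", "r": "ADV", "s": "ADP", "v": "VERB", "z": "NUM",
}


def to_universal(tag: str) -> str:
    if tag in UNIVERSAL:
        return tag
    # unknown categories fall back to "X"
    return EAGLES_CATEGORY.get(tag[:1].lower(), "X")


def normalize(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    return [(word, to_universal(tag)) for word, tag in tagged]


spanish = [("Hola", "NOUN"), (",", "fc"), ("mi", "DET"), ("nombre", "NOUN"), ("es", "VERB"), ("Daniel", "NOUN")]
print(normalize(spanish))
# [('Hola', 'NOUN'), (',', '.'), ('mi', 'DET'), ('nombre', 'NOUN'), ('es', 'VERB'), ('Daniel', 'NOUN')]
```

The pass-through check matters: without it, a universal tag such as "ADP" would be misread as an EAGLES adjective tag because it also starts with "a".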

Catalan

Brill tagger trained on the cess_cat corpus.

import pickle
from nltk import word_tokenize

# load the pickled tagger (unpickling can run arbitrary code: only load trusted files)
with open("brill_tagger_cess_ca.pkl", "rb") as f:
    tagger = pickle.load(f)

# word_tokenize needs NLTK's Punkt tokenizer data to be downloaded
tokens = word_tokenize("Quién es el presidente de Cataluña?")
postagged = tagger.tag(tokens)
# [('Quién', 'NOUN'), ('es', 'PRON'), ('el', 'DET'), ('presidente', 'NOUN'), ('de', 'ADP'), ('Cataluña', 'NOUN'), ('?', 'fit')]

Releases: JarbasAl/ModelZoo

0.2.0a2

0.2a1: Models (NER, CHUNKING, POSTAG)

0.1 - Iberian Brill: NLTK Postag (Portuguese, Spanish, Catalan)