This file provides descriptions and access details for each external resource not included in ModelBlocks. ModelBlocks needs to know where to access external resources, so each such resource has an associated config/user-*.txt file, which you will need to edit so that it contains the absolute path of that resource on your system.
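As a rough illustration of the pointer-file setup described above, the following Python sketch writes a pointer file and then checks every config/user-*.txt file for obvious problems. It assumes that each pointer file holds a single line containing the path, and that the file is named user-<resource>-directory.txt as in the entries below. The helper names write_pointer and check_pointers are hypothetical and are not part of ModelBlocks.

```python
from pathlib import Path


def write_pointer(config_dir, resource, resource_path):
    """Write the absolute path of a resource to user-<resource>-directory.txt.

    Hypothetical helper; assumes a pointer file is one line holding the path.
    """
    pointer = Path(config_dir) / f"user-{resource}-directory.txt"
    pointer.parent.mkdir(parents=True, exist_ok=True)
    # resolve() turns a relative path into an absolute one
    pointer.write_text(str(Path(resource_path).resolve()) + "\n")
    return pointer


def check_pointers(config_dir):
    """Map each problematic user-*.txt file name to a short description.

    Hypothetical helper; pointer files with no detected problem are omitted.
    """
    problems = {}
    for pointer in sorted(Path(config_dir).glob("user-*.txt")):
        lines = pointer.read_text().splitlines()
        target = lines[0].strip() if lines else ""
        if not target:
            problems[pointer.name] = "empty"
        elif not Path(target).is_absolute():
            problems[pointer.name] = "not an absolute path"
        elif not Path(target).exists():
            problems[pointer.name] = "path does not exist"
    return problems
```

For example, write_pointer("config", "wordnet", "/path/to/wordnet") would create config/user-wordnet-directory.txt, and check_pointers("config") would then list any pointer files that are empty, relative, or pointing at a missing location.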
NAME: The Alice in Wonderland corpus
MB POINTER FILE: config/user-alice-directory.txt
AVAILABILITY: Unreleased
DESCRIPTION: fMRI data from 28 subjects listening to the first chapter of Alice in Wonderland.
Collected by Brennan et al (2016).
NAME: The BMMM Unsupervised PoS tagger (Christodoulopoulos et al 2011)
MB POINTER FILE: config/user-bmmm-directory.txt
AVAILABILITY: Free
URL: https://github.com/christos-c/bmmm
DESCRIPTION: A Bayesian multinomial mixture model (BMMM) for unsupervised
part of speech tagging (Christodoulopoulos et al 2011).
NAME: The Berkeley Parser (jarfile)
MB POINTER FILE: config/user-berkeleyparserjar-directory.txt
AVAILABILITY: Free
URL: http://nlp.cs.berkeley.edu/software.shtml
DESCRIPTION: The directory containing the jarfile for the Berkeley Parser.
NAME: The British National Corpus (BNC)
MB POINTER FILE: config/user-bnc-directory.txt
AVAILABILITY: Free
URL: http://www.natcorp.ox.ac.uk/
DESCRIPTION: The British National Corpus (BNC) is a 100 million word collection
of samples of written and spoken language from a wide range of sources, designed
to represent a wide cross-section of British English, both spoken and written,
from the late twentieth century.
NAME: CCL unsupervised parser (Seginer, 2007)
MB POINTER FILE: config/user-ccl-directory.txt
AVAILABILITY: Free
URL: http://www.seggu.net/ccl/
DESCRIPTION: The CCL unsupervised constituency parser (Seginer, 2007).
NAME: The CHILDES Corpus
MB POINTER FILE: config/user-childes-directory.txt
AVAILABILITY: Free
URL: http://childes.talkbank.org/
DESCRIPTION: CHILDES is the child language component of the TalkBank system.
TalkBank is a system for sharing and studying conversational interactions.
NAME: The Dependency Model with Valence (DMV)
MB POINTER FILE: config/user-dmv-directory.txt
AVAILABILITY: Free
URL: https://code.google.com/archive/p/pr-toolkit/
DESCRIPTION: An implementation (Gillenwater et al 2010) of the Dependency
Model with Valence parser (Klein & Manning 2004) for unsupervised
dependency parsing.
NAME: The Dundee eye-tracking corpus
MB POINTER FILE: config/user-dundee-directory.txt
AVAILABILITY: Unreleased
DESCRIPTION: A corpus of eye-tracking measures from 10 subjects who read
newspaper articles (Kennedy et al, 2003).
NAME: Echo-state network (ESN) directory
MB POINTER FILE: config/user-esn-directory.txt
DESCRIPTION: A directory in which to store ESN output.
NAME: English Gigaword
MB POINTER FILE: config/user-gigaword4-directory.txt
AVAILABILITY: Paid
URL: https://catalog.ldc.upenn.edu/ldc2003t05
DESCRIPTION: A comprehensive archive of newswire text data in English that
has been acquired over several years by the Linguistic Data Consortium.
NAME: Extended Penn Tokenizer
MB POINTER FILE: config/user-tokenizer-directory.txt
AVAILABILITY: Free
URL: https://github.com/vansky/extended_penn_tokenizer
DESCRIPTION: Extended version of Robert McIntyre's (1995) Penn tokenizer.
NAME: GENIA Tagger
MB POINTER FILE: config/user-geniatagger-directory.txt
AVAILABILITY: Free
URL: http://www.nactem.ac.uk/GENIA/tagger/
DESCRIPTION: Part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text.
NAME: KenLM Language Model Toolkit
MB POINTER FILE: config/user-kenlm-directory.txt
AVAILABILITY: Free
URL: https://kheafield.com/code/kenlm/
DESCRIPTION: KenLM estimates, filters, and queries language models. Estimation
is fast and scalable due to streaming algorithms.
NAME: KenLM Language Model Toolkit (model binaries directory)
MB POINTER FILE: config/user-kenlm-model-directory.txt
AVAILABILITY: Free
URL: https://kheafield.com/code/kenlm/
DESCRIPTION: KenLM estimates, filters, and queries language models. Estimation
is fast and scalable due to streaming algorithms.
This resource is just a directory in which to store compiled binaries.
You can specify a binaries directory using the pointer file above.
NAME: The MIT Sentence Passages corpus
MB POINTER FILE: config/user-passages-directory.txt
AVAILABILITY: Unreleased
DESCRIPTION: A corpus of fMRI BOLD responses by subjects to audio presentation
of short passages (3-4 sentences each) in isolation.
NAME: The Natural Stories Corpus
MB POINTER FILE: config/user-naturalstories-directory.txt
AVAILABILITY: Unreleased
DESCRIPTION: A corpus of naturalistic stories meant to contain varied,
low-frequency syntactic constructions. There are a variety of annotations
and psycholinguistic measures available for the stories.
NAME: OntoNotes
MB POINTER FILE: config/user-ontonotes-directory.txt
AVAILABILITY: Paid
URL: https://catalog.ldc.upenn.edu/ldc2013t19
DESCRIPTION: Syntactic and semantic annotations of a large corpus comprising
various genres of text.
NAME: The Penn Treebank (PTB)
MB POINTER FILE: config/user-treebank-directory.txt
AVAILABILITY: Paid
URL: https://catalog.ldc.upenn.edu/ldc99t42
DESCRIPTION: One million words of 1989 Wall Street Journal material annotated in Treebank II style.
A small sample of ATIS-3 material annotated in Treebank II style.
Switchboard tagged, dysfluency-annotated, and parsed text.
A fully tagged version of the Brown Corpus.
Brown parsed text.
NAME: R-Hacks
MB POINTER FILE: config/user-rhacks-directory.txt
AVAILABILITY: Free
URL: https://github.com/aufrank/R-hacks
DESCRIPTION: Useful bits of code for programming and analysis in R.
NAME: The Roark Parser
MB POINTER FILE: config/user-roark-directory.txt
AVAILABILITY: Free
URL: https://github.com/roarkbr/incremental-top-down-parser
DESCRIPTION: A standard parser from Roark (2001, 2004) that computes psycholinguistic
complexity measures.
NAME: SRILM Language Model Toolkit
MB POINTER FILE: config/user-srilm-directory.txt
AVAILABILITY: Free for non-commercial use
URL: http://www.speech.sri.com/projects/srilm/download.html
DESCRIPTION: SRILM is a toolkit for building and applying statistical language models (LMs),
primarily for use in speech recognition, statistical tagging and segmentation,
and machine translation.
NAME: The UCL corpus (Frank et al, 2013)
MB POINTER FILE: config/user-ucl-directory.txt
AVAILABILITY: Free
URL: http://www.stefanfrank.info/readingdata/Data.zip
DESCRIPTION: Eye-tracking and self-paced-reading data
from subjects reading isolated sentences from a corpus
of novels written by amateur authors.
NAME: UPPARSE (Unsupervised parser, Ponvert et al, 2011)
MB POINTER FILE: config/user-upparse-directory.txt
AVAILABILITY: Free
URL: https://github.com/eponvert/upparse
DESCRIPTION: Efficient implementations of hidden Markov
models (HMMs) and probabilistic right linear grammars (PRLGs) for
unsupervised partial parsing (also known as: unsupervised chunking,
unsupervised NP identification, unsupervised phrasal segmentation).
NAME: WordNet
MB POINTER FILE: config/user-wordnet-directory.txt
AVAILABILITY: Free
URL: https://wordnet.princeton.edu/wordnet/download/
DESCRIPTION: WordNet is a large lexical database of English.
NAME: xlsx2csv
MB POINTER FILE: config/user-xlsx2csv-directory.txt
AVAILABILITY: Free
URL: https://github.com/dilshod/xlsx2csv
DESCRIPTION: XLSX to CSV converter.