7 changes: 3 additions & 4 deletions README.md
@@ -64,7 +64,7 @@ For usage examples see the documentation pages [walkthrough](http://takelab.fer.
Use some of our pre-defined datasets:

```python
>>> from podium.datasets import SST
>>> from podium import SST
>>> sst_train, sst_test, sst_dev = SST.get_dataset_splits()
>>> print(sst_train)
SST({
@@ -93,7 +93,7 @@ Load datasets from [🤗/datasets](https://github.com/huggingface/datasets):

```python

>>> from podium.datasets.hf import HFDatasetConverter
>>> from podium import HFDatasetConverter
>>> import datasets
>>> # Load the huggingface dataset
>>> imdb = datasets.load_dataset('imdb')
@@ -124,8 +124,7 @@ Load datasets from [🤗/datasets](https://github.com/huggingface/datasets):
Load your own dataset from a standardized tabular format (e.g. `csv`, `tsv`, `jsonl`):

```python
>>> from podium.datasets import TabularDataset
>>> from podium import Vocab, Field, LabelField
>>> from podium import Vocab, Field, LabelField, TabularDataset
>>> fields = {'premise': Field('premise', numericalizer=Vocab()),
... 'hypothesis': Field('hypothesis', numericalizer=Vocab()),
... 'label': LabelField('label')}
22 changes: 8 additions & 14 deletions docs/source/advanced.rst
@@ -1,8 +1,6 @@
.. testsetup:: *

from podium import Field, LabelField, Vocab, Iterator, TabularDataset
from podium.datasets import SST
from podium.vectorizers import GloVe, TfIdfVectorizer
from podium import Field, LabelField, Vocab, Iterator, TabularDataset, SST, GloVe, TfIdfVectorizer

The Podium data flow
====================
@@ -14,7 +12,7 @@ The data is processed immediately when the instance is loaded from disk and then

.. doctest:: sst_field

>>> from podium.datasets import SST
>>> from podium import SST
>>> sst_train, sst_test, sst_dev = SST.get_dataset_splits()
>>> print(sst_train[222])
Example({'text': (None, ['A', 'slick', ',', 'engrossing', 'melodrama', '.']), 'label': (None, 'positive')})
@@ -159,7 +157,7 @@ To better understand how specials work, we will walk through the implementation

.. doctest:: specials

>>> from podium.vocab import Special
>>> from podium import Special
>>> class BOS(Special):
... default_value = "<BOS>"
...
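The subclassing pattern above can be sketched in self-contained Python (a simplified stand-in for illustration only; the class below and its ``apply`` hook are hypothetical, not Podium's actual implementation):

```python
# Simplified sketch of a BOS-style special token (not Podium's actual class).
class Special(str):
    default_value = None

    def __new__(cls, value=None):
        # Fall back to the subclass-level default when no value is given.
        return super().__new__(cls, value if value is not None else cls.default_value)

    def apply(self, sequence):
        # Default behavior in this sketch: prepend the special to a tokenized sequence.
        return [str(self)] + list(sequence)

class BOS(Special):
    default_value = "<BOS>"

bos = BOS()
print(bos)                    # <BOS>
print(bos.apply(["a", "b"]))  # ['<BOS>', 'a', 'b']
```

Since the special is itself a string subclass, it can live directly inside a vocabulary's token list while still carrying behavior.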
@@ -187,8 +185,7 @@ To see the effect of the ``apply`` method, we will once again take a look at the

.. doctest:: specials

>>> from podium import Vocab, Field, LabelField
>>> from podium.datasets import SST
>>> from podium import Vocab, Field, LabelField, SST
>>>
>>> vocab = Vocab(specials=(bos,))
>>> text = Field(name='text', numericalizer=vocab)
@@ -236,8 +233,7 @@ We have so far covered the case where you have a single input column, tokenize a

.. doctest:: multioutput

>>> from podium import Vocab, Field, LabelField
>>> from podium.datasets import SST
>>> from podium import Vocab, Field, LabelField, SST
>>> char = Field(name='char', numericalizer=Vocab(), tokenizer=list)
>>> text = Field(name='word', numericalizer=Vocab())
>>> label = LabelField(name='label')
@@ -303,8 +299,7 @@ For this reason, usage of :class:`podium.datasets.BucketIterator` is recommended

.. code-block:: python

>>> from podium import Vocab, Field, LabelField
>>> from podium.datasets import SST, IMDB
>>> from podium import Vocab, Field, LabelField, SST, IMDB
>>> vocab = Vocab()
>>> text = Field(name='text', numericalizer=vocab)
>>> label = LabelField(name='label')
@@ -343,7 +338,7 @@ The ``bucket_sort_key`` function defines how the instances in the dataset should
For Iterator, padding = 148141 out of 281696 = 52.588961149608096%
For BucketIterator, padding = 2125 out of 135680 = 1.5661851415094339%
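The shape of those numbers can be reproduced in spirit with a standalone sketch (toy sentence lengths, no Podium dependency): batching in the original order versus batching after sorting by length, which is essentially what a bucketing iterator does.

```python
# Toy illustration of why bucketing by length reduces padding.
def padding_stats(lengths, batch_size):
    total, padding = 0, 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        width = max(batch)  # every instance is padded to the batch maximum
        total += width * len(batch)
        padding += sum(width - n for n in batch)
    return padding, total

lengths = [3, 50, 4, 47, 5, 52, 6, 49]  # alternating short/long sentences
pad_plain, tot_plain = padding_stats(lengths, batch_size=2)
pad_sorted, tot_sorted = padding_stats(sorted(lengths), batch_size=2)
print(f"plain:  padding = {pad_plain} out of {tot_plain}")    # 180 out of 396
print(f"sorted: padding = {pad_sorted} out of {tot_sorted}")  # 6 out of 222
```

Sorting groups similarly-sized instances together, so each batch pads far less, and the total number of processed tokens shrinks as well.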

As we can see, the difference between using a regular Iterator and a BucketIterator is massive. Not only do we reduce the amount of padding, we have reduced the total amount of tokens processed by about 50%. The SST dataset, however, is a relatively small dataset so this experiment might be a bit biased. Let's take a look at the same statistics for the :class:`podium.datasets.IMDB` dataset. After changing the highligted data loading line in the first snippet to:
As we can see, the difference between using a regular ``Iterator`` and a ``BucketIterator`` is massive. Not only do we reduce the amount of padding, we have reduced the total number of tokens processed by about 50%. The SST dataset, however, is a relatively small dataset so this experiment might be a bit biased. Let's take a look at the same statistics for the :class:`podium.datasets.IMDB` dataset. After changing the highlighted data loading line in the first snippet to:

.. code-block:: python

@@ -374,8 +369,7 @@ As an example, we will again turn to the SST dataset and some of our previously
.. doctest:: saveload
:options: +NORMALIZE_WHITESPACE

>>> from podium import Vocab, Field, LabelField
>>> from podium.datasets import SST
>>> from podium import Vocab, Field, LabelField, SST
>>>
>>> vocab = Vocab(max_size=5000, min_freq=2)
>>> text = Field(name='text', numericalizer=vocab)
5 changes: 2 additions & 3 deletions docs/source/faq.rst
@@ -9,7 +9,7 @@ FAQ

.. code-block:: python

>>> from podium.datasets import SST
>>> from podium import SST
>>> sst_train, sst_test, sst_dev = SST.get_dataset_splits()
>>> x, y = sst_train.batch()
>>> print(x.text.shape, y.label.shape, sep='\n')
@@ -20,8 +20,7 @@ Be aware that you will get a dataset as a matrix by default -- meaning that all

.. code-block:: python

>>> from podium.datasets import SST
>>> from podium import Vocab, Field, LabelField
>>> from podium import Vocab, Field, LabelField, SST
>>> text = Field(name='text', numericalizer=Vocab(), disable_batch_matrix=True)
>>> label = LabelField(name='label')
>>> fields = {'text':text, 'label':label}
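What the ``disable_batch_matrix`` flag changes can be illustrated with plain Python padding logic (a toy contrast, not Podium's implementation): a matrix batch pads every row to the longest instance, while a list batch keeps ragged rows.

```python
# Toy contrast: matrix-style batch (padded) vs list-style batch (ragged rows).
def batch_as_matrix(rows, pad=0):
    width = max(len(r) for r in rows)
    return [r + [pad] * (width - len(r)) for r in rows]

def batch_as_list(rows):
    return [list(r) for r in rows]  # lengths preserved, no padding added

rows = [[1, 2, 3], [4], [5, 6]]
print(batch_as_matrix(rows))  # [[1, 2, 3], [4, 0, 0], [5, 6, 0]]
print(batch_as_list(rows))    # [[1, 2, 3], [4], [5, 6]]
```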
5 changes: 2 additions & 3 deletions docs/source/preprocessing.rst
@@ -43,8 +43,7 @@ Regex Replace

.. code-block:: python

>>> from podium import Field, LabelField, Vocab
>>> from podium.datasets import SST
>>> from podium import Field, LabelField, Vocab, SST
>>>
>>> text = Field('text', numericalizer=Vocab())
>>> label = LabelField('label')
@@ -123,7 +122,7 @@ Truecase

.. code-block:: python

>>> from podium.preproc import truecase
>>> from podium import truecase
>>> apply_truecase = truecase(oov='as-is')
>>> print(apply_truecase('hey, what is the weather in new york?'))
Hey, what is the weather in New York?
17 changes: 7 additions & 10 deletions docs/source/walkthrough.rst
@@ -1,9 +1,7 @@

.. testsetup:: *

from podium import Field, LabelField, Vocab, Iterator, TabularDataset
from podium.datasets import SST
from podium.vectorizers import GloVe, TfIdfVectorizer
from podium import Field, LabelField, Vocab, Iterator, TabularDataset, SST, GloVe, TfIdfVectorizer


Walkthrough
@@ -29,7 +27,7 @@ One built-in dataset available in Podium is the `Stanford Sentiment Treebank <ht
.. doctest:: sst
:options: +NORMALIZE_WHITESPACE

>>> from podium.datasets import SST
>>> from podium import SST
>>> sst_train, sst_test, sst_valid = SST.get_dataset_splits() # doctest:+ELLIPSIS
>>> print(sst_train)
SST({
@@ -100,7 +98,7 @@ This way, we can define a static dictionary which we might have obtained on anot

.. doctest:: custom_vocab

>>> from podium.vocab import UNK
>>> from podium import UNK
>>> custom_itos = [UNK(), 'this', 'is', 'a', 'sample']
>>> vocab = Vocab.from_itos(custom_itos)
>>> print(vocab)
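The idea behind building a vocabulary from a fixed itos list can be sketched without Podium (a minimal, hypothetical vocab where index 0 stands in for the unknown token):

```python
# Minimal vocab built from a fixed itos list; unknown words map to the UNK index.
UNK_TOKEN = "<UNK>"

def make_vocab(itos):
    stoi = {token: index for index, token in enumerate(itos)}
    unk_index = stoi[UNK_TOKEN]
    # Return a numericalizer closure over the frozen mapping.
    return lambda tokens: [stoi.get(t, unk_index) for t in tokens]

numericalize = make_vocab([UNK_TOKEN, "this", "is", "a", "sample"])
print(numericalize(["this", "is", "unseen"]))  # [1, 2, 0]
```

Because the itos list is fixed up front, the mapping is deterministic and reusable across datasets, which is the point of supplying a predefined dictionary.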
@@ -285,7 +283,7 @@ The output of the function call is a numpy matrix of word embeddings which you c

.. code-block:: python

>>> from podium.vectorizers import GloVe
>>> from podium import GloVe
>>> vocab = fields['text'].vocab
>>> glove = GloVe()
>>> embeddings = glove.load_vocab(vocab)
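The essential contract here, that row *i* of the returned matrix is the vector for vocabulary entry *i*, can be sketched with a toy pretrained table (made-up two-dimensional vectors, no download; not GloVe's real data):

```python
# Toy stand-in for aligning pretrained vectors with a vocabulary.
pretrained = {                 # pretend this was parsed from an embeddings file
    "cat": [0.1, 0.2],
    "dog": [0.3, 0.4],
}

def load_vocab(itos, dim=2):
    # Row i holds the vector for itos[i]; out-of-table words get zero vectors.
    return [pretrained.get(token, [0.0] * dim) for token in itos]

embeddings = load_vocab(["cat", "unseen", "dog"])
print(embeddings)  # [[0.1, 0.2], [0.0, 0.0], [0.3, 0.4]]
```

This row alignment is what lets you feed numericalized indices straight into an embedding lookup layer.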
@@ -308,8 +306,7 @@ As we intend to use the whole dataset at once, we will also set ``disable_batch_

.. doctest:: vectorizer

>>> from podium.datasets import SST
>>> from podium import Vocab, Field, LabelField
>>> from podium import Vocab, Field, LabelField, SST
>>> vocab = Vocab(max_size=5000)
>>> text = Field(name='text', numericalizer=vocab, disable_batch_matrix=True)
>>> label = LabelField(name='label')
@@ -320,7 +317,7 @@ Since the Tf-Idf vectorizer needs information from the dataset to compute the in

.. doctest:: vectorizer

>>> from podium.vectorizers.tfidf import TfIdfVectorizer
>>> from podium import TfIdfVectorizer
>>> tfidf_vectorizer = TfIdfVectorizer()
>>> tfidf_vectorizer.fit(dataset=sst_train, field=text)
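The fit-then-transform flow can be sketched in plain Python (a standard smoothed tf-idf; Podium's exact weighting scheme may differ): ``fit`` learns document frequencies from the corpus, and ``transform`` weights a document's term counts by the learned inverse document frequencies.

```python
import math

# Minimal tf-idf: fit learns document frequencies, transform weights counts.
def fit(docs):
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    # Smoothed idf, as in common formulations.
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def transform(doc, idf):
    return {term: doc.count(term) * idf.get(term, 0.0) for term in set(doc)}

docs = [["good", "movie"], ["bad", "movie"], ["good", "plot"]]
idf = fit(docs)
weights = transform(["good", "good", "movie"], idf)
print(weights)
```

This split is why ``fit`` must see the training dataset before any document can be transformed.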

@@ -433,7 +430,7 @@ You can load a dataset in 🤗/datasets and then convert it to a Podium dataset

.. code-block:: python

>>> from podium.datasets.hf import HFDatasetConverter
>>> from podium import HFDatasetConverter
>>> import datasets
>>> # Loading a huggingface dataset returns an instance of DatasetDict
>>> # which contains the dataset splits (usually: train, valid, test,