
DRAFT: Add class to package together multiple datasets#49

Open
amanchoudhri wants to merge 7 commits into main from add-fusion-dataset

@amanchoudhri
Collaborator

Closes #48.

This commit allows the recently created `GerbilConcatDataset` to be
instantiated elsewhere (for example, in `main.py`) and passed into
`Trainer`. This is also a cleaner separation of concerns, since the
`Trainer` class itself doesn't need to know how the data is stored.

The `data` arg now consumes multiple values and packages them into a
list. The new `proportions` and `data_random_seed` args optionally
specify the size of the random subset drawn from each dataset and the
seed used to select those subsets.
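For reviewers, here is a rough sketch of what the argument handling described above might look like with `argparse`. The flag spellings (`--data`, `--proportions`, `--data_random_seed`), the helper names `build_parser` and `parse_args`, and the length check are assumptions for illustration, not necessarily what this PR implements.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; flag names are assumed, not taken from the PR diff."""
    parser = argparse.ArgumentParser(description="Train on one or more datasets.")
    # nargs="+" collects one or more values into a list.
    parser.add_argument("--data", nargs="+", required=True,
                        help="One or more dataset directories.")
    # Optional: one proportion per dataset.
    parser.add_argument("--proportions", nargs="+", type=float, default=None,
                        help="Fraction of each dataset to use.")
    # Optional: seed for selecting the random subsets.
    parser.add_argument("--data_random_seed", type=int, default=None,
                        help="Seed for selecting the random subsets.")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed sanity check: proportions, if given, must match the datasets.
    if args.proportions is not None and len(args.proportions) != len(args.data):
        parser.error("--proportions must have one value per --data entry")
    return args
```

With this shape, a single dataset is just the list-of-length-one case, so an invocation with one `--data` value still parses.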

This significantly simplifies my first-pass implementation, since the
function `build_multi_source_datasets` still works when only one source
is provided (i.e., the arg `data_dirs` is a list of length one). I've
therefore removed my awkward first attempt, `build_datasets`. The main
function can still be run exactly as before, but it is now more
flexible.
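To illustrate why the single-source case needs no special handling, here is a minimal sketch of seeded per-source subsetting followed by concatenation. It is not the actual `build_multi_source_datasets`: the function name `subset_and_concat` is hypothetical, plain Python sequences stand in for dataset objects, and the rounding rule for subset sizes is an assumption.

```python
import random
from typing import Optional, Sequence


def subset_and_concat(sources: Sequence[Sequence],
                      proportions: Optional[Sequence[float]] = None,
                      seed: Optional[int] = None) -> list:
    """Hypothetical stand-in: take a random fraction of each source, then concatenate."""
    if proportions is None:
        # Default assumption: use every source in full.
        proportions = [1.0] * len(sources)
    if len(proportions) != len(sources):
        raise ValueError("need exactly one proportion per source")

    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    combined = []
    for source, proportion in zip(sources, proportions):
        if not 0.0 <= proportion <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
        n_keep = int(round(proportion * len(source)))
        # Sample indices without replacement; sort to preserve original order.
        indices = sorted(rng.sample(range(len(source)), n_keep))
        combined.extend(source[i] for i in indices)
    return combined
```

Because the loop runs over whatever list it is given, passing a single source returns that source unchanged when no proportions are supplied, which is the property that makes a separate single-dataset code path unnecessary.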

Development

Successfully merging this pull request may close these issues.

Add ability to train on multiple GerbilVocalizationDataset objects at once