Automatic Selection of Task-Optimal Embedding Models

Word embedding is a key step when incorporating unstructured text into a model. However, it is often non-trivial to select the right embedding method to transform text data into a numerical representation that can be fed to a modeling framework for the task at hand, and with new embedding techniques being actively published, choosing the optimal embedding model only becomes more complicated. Would a simple bag of words or TF-IDF be good enough? Should a classical embedding (e.g., Word2Vec, fastText, GloVe), a contextual embedding (e.g., ELMo, BERT), or even a graph representation be applied? Often, the answer depends on their performance on the downstream NLP task.

For this project, we would like to review candidate embedding techniques ranging in complexity and create a framework that can automatically select the best embedding strategy for a given task. The resulting embedding can be any existing/pre-trained model, an ensemble of models, or even a transfer-learned model trained jointly with the task model; the solution is open-ended and depends on what works best. Potentially, the embedding model can be thought of as a hyperparameter to be tuned for the final task model. Nevertheless, it is critical to define metrics for selecting such an optimal embedding strategy and to evaluate the framework on both standard benchmark datasets and real-world industrial datasets for a variety of tasks.
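
As a minimal sketch of treating the embedding model as a hyperparameter (assuming scikit-learn; the candidate set and the logistic-regression task model below are illustrative placeholders, not part of this project's implementation), one could select among simple embedding strategies by their cross-validated downstream score:

```python
# Sketch only: pick the embedding strategy with the best cross-validated
# score on the downstream task. Candidates and classifier are illustrative;
# pre-trained embeddings (Word2Vec, GloVe, BERT, ...) would be wrapped as
# additional transformers in the same dictionary.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

def select_embedding(texts, labels):
    candidates = {
        "bag_of_words": CountVectorizer(),
        "tfidf": TfidfVectorizer(),
    }
    best_name, best_score = None, float("-inf")
    for name, vectorizer in candidates.items():
        # Embedding step + downstream task model in one pipeline, so the
        # embedding is scored by the task it feeds, not in isolation.
        pipeline = Pipeline([("embed", vectorizer),
                             ("clf", LogisticRegression(max_iter=1000))])
        score = cross_val_score(pipeline, texts, labels, cv=5).mean()
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

The same search could equally be expressed with scikit-learn's GridSearchCV over the pipeline's `embed` step, which is the more idiomatic way to tune a pipeline component.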

Some literature already exists on this topic (see the reference section); however, those studies usually report only the accuracy of an algorithm benchmarked on a specific task (e.g., text classification, NER).
