This repository was archived by the owner on Jun 26, 2025. It is now read-only.

Try out different word embeddings for BERT intrinsic evaluation #38

@siemdejong


Research question
Is the last hidden layer of BERT the best source of contextualized text embeddings?

Hypothesis
The last hidden layer is where the structure is best defined, because it builds on the relations captured in the other 11 layers.

Method

  1. Instantiate pretrained ClinicalBERT
  2. Gather a dataset of medical terms that fall into different classes, e.g. all brain locations, grouped by the occurrence of tumours in those regions.
  3. Generate embeddings from layers as proposed in https://jalammar.github.io/illustrated-bert/
  4. Run an intrinsic evaluation per embedding strategy. Evaluation measure TBD.
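
Step 3 could be sketched roughly as below. This is only an illustration, not a settled implementation: the strategy names follow my reading of the linked post, `embedding_strategies` and `load_hidden_states` are hypothetical helpers, and the checkpoint passed as `model_name` is left open. It assumes the Hugging Face `transformers` convention that `hidden_states` holds the embedding-layer output first, followed by the 12 encoder layers.

```python
import numpy as np


def embedding_strategies(hidden_states):
    """Build one token-level embedding matrix per strategy.

    hidden_states: sequence of 13 arrays of shape (seq_len, hidden_size).
    Index 0 is assumed to be the embedding-layer output and indices 1..12
    the encoder layers (the Hugging Face convention).
    """
    layers = np.stack(hidden_states)  # (13, seq_len, hidden_size)
    return {
        "first_layer": layers[0],                # embedding-layer output
        "last_hidden": layers[-1],               # layer 12
        "second_to_last": layers[-2],            # layer 11
        "sum_all_12": layers[1:].sum(axis=0),    # encoder layers only
        "sum_last_4": layers[-4:].sum(axis=0),   # layers 9..12
        # Layers 9..12 side by side: (seq_len, 4 * hidden_size)
        "concat_last_4": np.concatenate(list(layers[-4:]), axis=-1),
    }


def load_hidden_states(text, model_name):
    """Hypothetical helper: per-layer hidden states for one input text.

    Requires torch and transformers; model_name is whichever pretrained
    checkpoint is chosen for the experiment.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # Drop the batch dimension: each entry becomes (seq_len, hidden_size).
    return [h[0].numpy() for h in outputs.hidden_states]
```

Keeping the strategies in a plain dictionary means step 4 can loop over the same set of names once an evaluation measure is chosen. Note that `concat_last_4` is four times as wide as the other embeddings, which may matter for distance-based measures.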

Why is this experiment worthwhile?
Papers report different accuracies depending on which embedding strategy is used to extract features from pretrained models (ref!).
