Partially trainable embeddings

November 10, 2019

Understanding the meaning of natural language requires a neural network to store and organize a huge amount of information, and the largest part of this information is usually held in word embeddings.

Typically, the labeled data for a particular task is not enough to train so many parameters. Thus, word embeddings are trained separately on large general-purpose corpora.

But there are some cases when we want to be able to train word embeddings in our custom task, for example:

We have a specific domain with non-standard terminology or sentence structure
We want to use additional markup like <tags> in our task

In these cases, we need to update only a small number of weights, those responsible for the new words and meanings. At the same time, we can't update the pre-trained embeddings, because doing so quickly leads to overfitting.

To deal with this problem, this project uses partially trainable embeddings. The idea is to concatenate fixed pre-trained embeddings with an additional small trainable embedding. It is also useful to add a linear layer right after the concatenation so that the two embeddings can interact during training. Changing the size of the additional embedding controls the number of trainable parameters and, as a result, helps prevent overfitting.
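To make the idea more concrete, here is a minimal sketch in plain Python. It is not the project's implementation, which relies on AllenNLP. The class name PartiallyTrainableEmbedding, the vocabulary size, the 300-dimensional "pre-trained" vectors (random numbers here), and the output size of 50 are all assumptions chosen for illustration. Only the trainable embedding size of 20 comes from the configuration in this post.

```python
import random


class PartiallyTrainableEmbedding:
    """Hypothetical sketch: frozen vectors concatenated with a small trainable table."""

    def __init__(self, pretrained, extra_dim, output_dim, seed=0):
        rng = random.Random(seed)
        self.pretrained = pretrained  # frozen part: never updated during training
        vocab_size = len(pretrained)
        concat_dim = len(pretrained[0]) + extra_dim
        # Trainable parts: the small extra embedding table and the linear layer.
        self.extra = [[rng.gauss(0.0, 0.1) for _ in range(extra_dim)]
                      for _ in range(vocab_size)]
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(concat_dim)]
                       for _ in range(output_dim)]
        self.bias = [0.0] * output_dim

    def forward(self, token_id):
        # Concatenate the fixed and the trainable vector of the token.
        concat = self.pretrained[token_id] + self.extra[token_id]
        # The linear layer lets the two parts interact.
        return [sum(w * x for w, x in zip(row, concat)) + b
                for row, b in zip(self.weight, self.bias)]

    def trainable_parameters(self):
        extra = len(self.extra) * len(self.extra[0])
        linear = len(self.weight) * len(self.weight[0]) + len(self.bias)
        return extra + linear


# Toy setup (assumed sizes): 1,000 words, 300-dimensional frozen vectors.
rng = random.Random(1)
pretrained = [[rng.gauss(0.0, 1.0) for _ in range(300)] for _ in range(1000)]
embedder = PartiallyTrainableEmbedding(pretrained, extra_dim=20, output_dim=50)

vector = embedder.forward(token_id=42)            # one 50-dimensional vector
frozen = len(pretrained) * len(pretrained[0])     # 1000 * 300 = 300000
trainable = embedder.trainable_parameters()       # 1000 * 20 + (320 * 50 + 50) = 36050
print(len(vector), frozen, trainable)
```

In this toy setup the trainable part is roughly an eighth of the size of the frozen table, and shrinking extra_dim reduces it further. This is the knob the paragraph above refers to.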

Another good thing is that AllenNLP lets us implement this technique without writing a single line of code, using just a simple configuration:

{
  "token_embedders": {
    "tokens-ngram": {
      "type": "fasttext-embedder",
      "model_path": "./data/fasttext_embedding.model",
      "trainable": false
    },
    "tokens": {
      "type": "embedding",
      "embedding_dim": 20,
      "trainable": true
    }
  }
}