machine_learning_site

Response for Tuesday, July 28

Word Embeddings

One-hot encoding is inefficient because one-hot vectors are sparse: almost every entry is 0. This may not seem like much of an issue with a small dataset, but once the vocabulary reaches hundreds or thousands of words, each vector holds a single 1 in a sea of hundreds or thousands of 0s. Word embeddings are different. They are dense, and they represent similar words with similar vectors instead of treating every word as completely distinct. The encoding also does not need to be specified by hand, since the model learns it during training. Embedding vectors can still have many dimensions, but every entry carries information and helps capture relationships between words.
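To make the contrast concrete, here is a minimal sketch in plain Python. The five-word vocabulary, the embedding size of 4, and the random values are all assumptions made up for illustration; in a real model the embedding table would be learned during training rather than drawn at random.

```python
import random

# Hypothetical toy vocabulary (not taken from the dataset discussed here).
vocab = ["good", "great", "bad", "awful", "movie"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Return a sparse vector: all zeros except a single 1."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector


# An embedding is a lookup table with one dense row per word.
# Random values stand in for weights a model would learn in training.
embedding_dim = 4  # assumed size, chosen only for this example
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
    for _ in vocab
]


def embed(word):
    """Return the dense vector stored for this word."""
    return embedding_table[word_to_index[word]]


print(one_hot("great"))  # length grows with the vocabulary size
print(embed("great"))    # length stays at embedding_dim
```

With a vocabulary of thousands of words, the one-hot vector would grow to thousands of entries, while the embedding would keep whatever fixed size was chosen.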
The plots for training/validation loss and accuracy show the training loss decreasing consistently with each epoch and the training accuracy increasing just as steadily. The model quickly overfits, however: after about the third epoch, the validation loss and accuracy stagnate, and the validation loss then starts increasing with each epoch. This can be a big problem when working with text, because the model can memorize rarer words that appear in the training set instead of learning patterns that generalize. Despite this, the model still performs decently well, with accuracy between 0.8 and 0.9.
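One way to act on a curve like this is to stop training once the validation loss stops improving. The sketch below is only an illustration of that idea: the function names, the `patience` value, and the loss numbers are all made up here and are not the results from the plots.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss."""
    return val_loss.index(min(val_loss)) + 1


def stopping_epoch(val_loss, patience=2):
    """Return the 1-based epoch at which early stopping would halt.

    Training halts once the validation loss has failed to improve
    for `patience` epochs in a row. If that never happens, the
    last epoch is returned.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_loss, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_loss)


# Invented validation losses, shaped like the curve described above:
# improvement for a few epochs, then a steady rise.
val_loss = [0.45, 0.36, 0.33, 0.35, 0.38, 0.42]

print(best_epoch(val_loss))
print(stopping_epoch(val_loss, patience=2))
```

Keras ships an `EarlyStopping` callback built around the same idea, so in practice this logic would not need to be written by hand.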
The embedding visualization shows what is almost a three-dimensional mapping of the words and partial words in the dataset. They are organized by the positive or negative sentiment attached to them, so words that are similar or carry the same sentiment sit near each other. This is how the model is able to classify a review as positive or negative: it relies on the positivity of the words used in the review.
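"Near each other" can be made precise with a similarity measure between vectors. The sketch below uses cosine similarity, which is one common choice and an assumption here, on four hand-written 3-dimensional vectors. The words and numbers are invented for illustration and are not the learned embeddings from this model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: positive words point one way,
# negative words point the opposite way.
embeddings = {
    "great": [0.9, 0.8, 0.1],
    "wonderful": [0.8, 0.9, 0.2],
    "awful": [-0.9, -0.7, 0.1],
    "terrible": [-0.8, -0.9, 0.2],
}


def nearest(word):
    """Return the other word whose vector is most similar to `word`."""
    others = [w for w in embeddings if w != word]
    return max(
        others,
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )


print(nearest("great"))
print(nearest("awful"))
```

In this toy setup the positive words come out closest to each other, and likewise the negative ones, which mirrors the clustering by sentiment described above.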

Text Classification with an RNN

These plots show patterns similar to those in the word embeddings section. Both models, with and without LSTM layers, rapidly overfit. In both accuracy plots, the validation accuracy reaches 0.86 after one epoch and then stays more or less flat. With LSTM layers, however, the second and third epochs do increase the accuracy slightly, unlike in the other plot. In both loss plots, the validation loss improves a good amount after the first epoch and a small amount after the second epoch, then rapidly increases. In all cases, the model is overfit by around the second epoch. The cause is the same issue mentioned above: overfitting happens easily when rarer words in the training set are memorized rather than generalized from.

Without LSTM layers

[Figures: accuracy and loss plots, without LSTM layers]

With LSTM layers

[Figures: accuracy and loss plots, with LSTM layers]