machine_learning_site

Responses for Wednesday, July 15

A.

  1. The ImageDataGenerator() constructor and its rescale argument allow the images to be normalized for training. We can then flow images from a specified directory into the generator by providing the directory, the target size, the batch size, and the class mode. Specifying the target size ensures that all of the images end up the same size for training, since source images are often different sizes and we don’t want that to disrupt the results. When specifying the class mode, it is important to consider how many classes there are: if there are only two, ‘binary’ should be specified; if there are more than two, ‘categorical’. The testing generator follows the exact same process but flows images from a different directory (the validation directory instead of the training directory). The target size should be the same for both, but the batch size can be smaller for the testing set. A sketch of this setup appears after this list.
  2. The model has three Conv2D layers and three MaxPooling2D layers. As before, the output is then flattened and two dense layers perform the classification. The number of filters increases from 16 to 32 to 64: the image size decreases with each block, so later layers can afford to learn more filters. The image size decreases because each Conv2D layer (using a 3x3 kernel with no padding) removes a one-pixel-deep border from around the image, and each MaxPooling2D layer halves the image’s dimensions, reducing its area to a quarter. With three such blocks, the image is greatly reduced, from 300x300 down to 35x35. For the output layer, the ‘sigmoid’ activation is used, in contrast with the ‘relu’ of the other layers. This is significant, as sigmoid suits the case of exactly two classes: it assigns the value 0 to one class and 1 to the other, so any output above 0.5 belongs to class 1 and anything below belongs to class 0. The model is compiled with ‘binary_crossentropy’ as the loss function, RMSprop with a learning rate of 0.001 as the optimizer, and ‘accuracy’ as the metric. A sketch of this model appears after this list, below the generator sketch.
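
Here is a minimal sketch of the generator setup from item 1, assuming hypothetical directory names `train/` and `validation/` and a two-class problem; the batch sizes are illustrative.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescaling by 1/255 normalizes pixel values into [0, 1] for training.
train_datagen = ImageDataGenerator(rescale=1.0 / 255)
validation_datagen = ImageDataGenerator(rescale=1.0 / 255)

# flow_from_directory resizes every image to target_size and infers the
# labels from the subdirectory names; class_mode='binary' because there
# are only two classes.
train_generator = train_datagen.flow_from_directory(
    'train/',                  # hypothetical training directory
    target_size=(300, 300),
    batch_size=128,
    class_mode='binary')

# The validation generator flows from a different directory but uses the
# same target size; its batch size can be smaller.
validation_generator = validation_datagen.flow_from_directory(
    'validation/',             # hypothetical validation directory
    target_size=(300, 300),
    batch_size=32,
    class_mode='binary')
```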
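
And a sketch of the model from item 2, with the filter counts (16, 32, 64), the 300x300 input, the sigmoid output, and the compile arguments taken from the text; the Dense(512) hidden layer is an assumed width.

```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    # Each 3x3 Conv2D (no padding) trims a one-pixel border, and each
    # 2x2 MaxPooling2D halves both spatial dimensions:
    # 300 -> 298 -> 149 -> 147 -> 73 -> 71 -> 35.
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu',
                           input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),   # feature maps are now 35x35
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),  # assumed width
    # One sigmoid unit outputs a value in (0, 1); above 0.5 means
    # class 1, below 0.5 means class 0.
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
              metrics=['accuracy'])
```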

B.
[Figure: pair plot of MPG, cylinders, displacement, and weight, with kernel density estimates on the diagonal]
1. In this plot, each variable is plotted against each of the other variables. This provides a useful tool for investigating the pairwise relationships among the variables, as we can see how any two of them interact independently of all the others. For example, weight and displacement appear to have a positive linear relationship when viewed independently of MPG and cylinders. The diagonal is where each variable would be plotted against itself, so it instead shows the univariate distribution of the data; here, kernel density estimates are used for the univariate plots. In essence, this plot describes how useful each variable may be for prediction: variables with more and stronger relationships are more likely to have a greater effect when determining the target. (A sketch of producing this plot appears after the figures below.)
2. Viewing the last 5 observations, it seems the model has been overfit by this point, as the values appear stagnant or even increasing; there is no real improvement in any of the statistics anymore. This is backed up by the error plots for each statistic. In the MAE plot below, for example, the validation error has been stagnant, and even increasing slightly, since around the 100th epoch. The training error has kept improving slowly, but as it nears the 1000th epoch it looks more and more horizontal. The same pattern can be seen in other plots, like the MSE plot. (A sketch of plotting these curves appears after the figures.)
[Figures: MAE and MSE versus epoch for the training and validation sets]
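
A minimal sketch of the pair plot from item 1, assuming the data is already loaded into a pandas DataFrame named `dataset` with these column names:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# diag_kind='kde' draws kernel density estimates on the diagonal, where
# each variable would otherwise be plotted against itself.
sns.pairplot(dataset[['MPG', 'Cylinders', 'Displacement', 'Weight']],
             diag_kind='kde')
plt.show()
```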
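
And a sketch of the error curves from item 2, assuming `history` is the object returned by model.fit(...) with 'mae' and 'mse' among its metrics and a validation set supplied:

```python
import matplotlib.pyplot as plt

def plot_metric(history, metric):
    # Training error keeps creeping down while validation error
    # flattens out and then rises: the signature of overfitting.
    plt.plot(history.history[metric], label='Train')
    plt.plot(history.history['val_' + metric], label='Validation')
    plt.xlabel('Epoch')
    plt.ylabel(metric.upper())
    plt.legend()
    plt.show()

plot_metric(history, 'mae')
plot_metric(history, 'mse')
```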

C.

  1. Comparing the 4 different-sized models was important, as increasing the size led to easier overfitting. As seen in the plot below, the tiny model is able to avoid overfitting entirely, but as the size increases, overfitting happens earlier: the small model is overfit after several hundred epochs, the medium model after fifty or so epochs, and the large model after only around twenty epochs. (A sketch of building such models appears after the figure.)
    [Figure: training and validation loss versus epoch for the tiny, small, medium, and large models]
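
A rough sketch of how four models of increasing capacity can be defined; the layer widths and depths here are assumptions (loosely following the sizes in the TensorFlow overfit-and-underfit tutorial), as is the binary task with 28 input features.

```python
import tensorflow as tf

N_FEATURES = 28  # assumed input width

def build_model(units, n_layers):
    # Only the width and depth of the hidden layers change between the
    # four models, so capacity is the only variable being compared.
    hidden = [tf.keras.layers.Dense(units, activation='relu')
              for _ in range(n_layers)]
    model = tf.keras.Sequential(
        [tf.keras.layers.Input(shape=(N_FEATURES,))]
        + hidden
        + [tf.keras.layers.Dense(1)])  # logits output for a binary task
    model.compile(
        optimizer='adam',
        loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
    return model

tiny   = build_model(units=16,  n_layers=1)
small  = build_model(units=16,  n_layers=2)
medium = build_model(units=64,  n_layers=3)
large  = build_model(units=512, n_layers=4)
```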