For this project, I started out with a sample size of 20 images to make sure everything went smoothly. I was then able to scale up to 2000 images, and after that run also went well, I tried the full 10000 images overnight. The model was saved, and when I came back in the morning the training appeared to have been a success.
In terms of training and testing, I split the 10000 images into 9000 training and 1000 testing images. To prepare for the training itself, I set the train labels and test labels equal to the first 9000 and last 1000 labels, respectively, and then defined the model. Through my testing, I found that I got the lowest error from using three Conv2D and three MaxPooling2D layers along with two Dense layers. The last Dense layer had only one neuron with a linear activation, as I felt this was best suited to predicting a population value. I used the RMSprop optimizer with a learning rate of 0.001 and mean squared error (mse) as the loss function. For the metrics, I used mse and mae, although that caused loss and mse to report the same value, meaning I had duplicate numbers in some of my analysis. Although there was some variation in the mse and mae across epochs, I decided that 10 epochs was the best trade-off between accuracy and training time.
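To make the setup concrete, here is a minimal sketch of the pipeline described above. The image size, filter counts, and width of the first Dense layer are placeholder assumptions rather than the exact values from the project, and the random arrays simply stand in for the real images and population labels.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder data standing in for the real images and population labels
# (the actual project used 10000 images split 9000/1000).
images = np.random.rand(100, 128, 128, 3).astype("float32")   # assumed 128x128 RGB
labels = (np.random.rand(100) * 100).astype("float32")

split = int(0.9 * len(images))
train_images, test_images = images[:split], images[split:]
train_labels, test_labels = labels[:split], labels[split:]

# Three Conv2D + MaxPooling2D blocks followed by two Dense layers;
# the last Dense layer has one neuron with a linear activation.
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="linear"),
])

model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=0.001),
    loss="mse",
    metrics=["mse", "mae"],   # mse here duplicates the loss value, as noted below
)

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
```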
Although definitely not the best, my model performed decently overall. The first graph below shows all 1000 actual vs. predicted population points. The line of best fit has a positive slope, which is a good sign, but a well-performing model would have a slope near one; here it is only about 0.52. The images with zero population, as well as those with very high populations, seem to have been difficult for the model to handle, which pulled the slope of the best-fit line down.
Here is a plot of only the 20 sample images to make viewing slightly easier. This way we can see that the majority of the points are actually predicted decently well, with only a few outliers toward the low and high ends. In the previous plot, having so many data points made this harder to see.
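The actual vs. predicted plots and the best-fit slope can be produced along these lines, reusing the model and test split from the sketch above; np.polyfit gives the slope and intercept of the best-fit line.

```python
import numpy as np
import matplotlib.pyplot as plt

predictions = model.predict(test_images).flatten()

# First-degree fit returns (slope, intercept); on the real test set the slope was about 0.52.
slope, intercept = np.polyfit(test_labels, predictions, 1)

x = np.linspace(test_labels.min(), test_labels.max(), 100)
plt.scatter(test_labels, predictions, s=5, alpha=0.5)
plt.plot(x, slope * x + intercept, color="red", label=f"best fit (slope={slope:.2f})")
plt.plot(x, x, color="gray", linestyle="--", label="ideal (slope = 1)")
plt.xlabel("Actual population")
plt.ylabel("Predicted population")
plt.legend()
plt.show()
```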
Below is a table of the training results. We can see that the loss and mse columns are identical, since mse was the loss function. In future testing, I would either keep only mae as a metric or use a different loss function. Beyond that, we can see that both training and validation mse and mae mostly decrease with each epoch, with several exceptions. Originally, I used only three epochs, as the validation mse and mae appeared to stagnate afterwards. With further testing, however, I found that going past about seven epochs led to further improvement. Although the validation mse is still very high at above 1000, I could not find a way to improve the results in a short amount of time.
epoch | loss | mse | mae | val_loss | val_mse | val_mae |
---|---|---|---|---|---|---|
1 | 523.99 | 523.99 | 17.34 | 3533.89 | 3533.89 | 45.40 |
2 | 498.25 | 498.25 | 16.85 | 2530.93 | 2530.93 | 38.46 |
3 | 482.29 | 482.29 | 16.42 | 1541.27 | 1541.27 | 31.40 |
4 | 425.49 | 425.49 | 15.43 | 1992.75 | 1992.75 | 36.25 |
5 | 388.65 | 388.65 | 14.71 | 1996.59 | 1996.59 | 34.83 |
6 | 373.51 | 373.51 | 14.23 | 1932.46 | 1932.46 | 35.72 |
7 | 392.79 | 392.79 | 13.69 | 2016.43 | 2016.43 | 33.23 |
8 | 296.29 | 296.29 | 12.64 | 1674.00 | 1674.00 | 30.68 |
9 | 268.04 | 268.04 | 11.95 | 1616.09 | 1616.09 | 29.92 |
10 | 243.97 | 243.97 | 11.28 | 1236.11 | 1236.11 | 26.67 |
Below is a histogram of the error for each data point. It looks somewhat like a bell curve centered near zero, which is a good sign, but it is skewed to the left. There also seems to be a slight dip in the counts close to zero and a large spike in errors between 10 and 15. Although most of the errors fall between -50 and 25, some predictions were off by more than 100 in either direction.
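A sketch of how this histogram can be built from the predictions in the earlier snippet; the error is taken here as predicted minus actual, which is an assumption about the sign convention.

```python
import matplotlib.pyplot as plt

# Per-image error; sign convention (predicted - actual) is assumed.
errors = predictions - test_labels

plt.hist(errors, bins=50)
plt.xlabel("Prediction error (predicted - actual)")
plt.ylabel("Number of images")
plt.show()
```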
Here are graphs of training and validation loss and mae for each epoch. We can see why I initially thought 3 epochs would be sufficient: there is a huge decrease in both mae and loss from epoch 1 to 3. Afterwards the curves appear to stagnate, but they then decrease further through epoch 10. Because training took about seven hours, I decided not to test anything beyond 10 epochs. We can also see that, although validation loss and mae are decreasing, the model is badly overfitting.
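These per-epoch curves come straight out of the History object returned by model.fit in the first sketch, roughly as follows.

```python
import matplotlib.pyplot as plt

hist = history.history   # keys: loss, mse, mae, val_loss, val_mse, val_mae

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.plot(hist["loss"], label="training loss")
ax1.plot(hist["val_loss"], label="validation loss")
ax1.set_xlabel("Epoch")
ax1.set_ylabel("Loss (mse)")
ax1.legend()

ax2.plot(hist["mae"], label="training mae")
ax2.plot(hist["val_mae"], label="validation mae")
ax2.set_xlabel("Epoch")
ax2.set_ylabel("mae")
ax2.legend()

plt.show()
```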
If I were to do this project again, or if this course were longer than five weeks so that we had more time for this project, the first thing I would add is augmentation on the images to address the major overfitting problem. This could make the training loss and mae worse, but the testing loss and mae should improve. It would not solve everything, however, since even the training values are still very high, so I could also try different loss or optimizer functions, although the alternatives I did try were not as good as mse and RMSprop. Another option for improving accuracy would be to drop some values that appear to be hurting it, such as the images with no population. These seem to throw off the model, which expects a population in every image because of random noise. Dropping them would keep the model from underestimating other values and hopefully improve accuracy. Lastly, although it might take much longer, I could increase the number of epochs, as testing loss and mae still appeared to be trending downward when I cut training off at 10 epochs.
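As a rough sketch of that first idea, augmentation could be added with Keras preprocessing layers (available in recent TensorFlow versions); the specific transforms and parameters here are illustrative choices, not ones that were tested in the project.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Random flips, rotations, and zooms, applied only while training.
augmentation = keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

# Applying the pipeline to a batch; prepending `augmentation` to the existing
# model (before the first Conv2D layer) would augment every batch during fit.
augmented_batch = augmentation(train_images[:8], training=True)
```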