A. Boosted Trees
These two histograms show how often the model assigns each predicted probability of survival to passengers. In the first histogram, the frequencies are slightly more spread out, whereas the second histogram shows two prominent clusters and two smaller ones. This could be because the clusters actually correspond to groups of passengers, such as children, first class men, first class women, second class men, and second class women. I say this because certain probabilities occur much more often than others, and because I know the groups listed have different chances of surviving. Overall, however, the predictions are broadly similar: both show many passengers with low predicted chances of surviving, alongside a group with higher chances (0.5-1.0).
In the two probability density plots, we see results very similar to the predicted probability histograms. Once again, both models have high densities at low probabilities; the logistic regression is more uniform between 0.5 and 1.0, while the boosted tree model shows more distinct clusters. Four groups appear again here, just as in the second histogram.
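The clustering described above can also be checked numerically by counting how many predictions fall into each probability bin. The following is only a minimal sketch: the probabilities are made up for illustration, and `bin_probabilities` is a hypothetical helper, not part of the original analysis.

```python
def bin_probabilities(probabilities, n_bins=5):
    """Count how many predicted probabilities fall into each equal-width bin on [0, 1]."""
    counts = [0] * n_bins
    for p in probabilities:
        # A probability of exactly 1.0 is placed in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up predicted survival probabilities (assumption, not the report's data).
predicted = [0.05, 0.08, 0.12, 0.15, 0.11, 0.33, 0.55, 0.62, 0.91, 0.95]

counts = bin_probabilities(predicted)
for i, count in enumerate(counts):
    low, high = i / 5, (i + 1) / 5
    print(f"{low:.1f}-{high:.1f}: {'#' * count}")
```

Bins that stand well above their neighbors are the kind of cluster visible in the histograms; a flatter set of counts corresponds to the more spread-out shape.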
In these ROC curves, we can see that the boosted tree model achieves a higher true positive rate at a given false positive rate. This is visible near a false positive rate of 0.0, where the first curve has a true positive rate of about 0.4 and the second curve has a true positive rate of about 0.6. The second curve also bends more sharply than the first and comes closer to the top left corner. This means that the area under the second curve is greater, making boosted trees the better model in this case. This agrees with the accuracy of the models, as boosted trees reach an accuracy of around 0.82 versus around 0.75 for logistic regression.
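To make the comparison of areas concrete, here is a minimal sketch of how an ROC curve and its area can be computed by sweeping a threshold over predicted probabilities. The labels and scores are made up for illustration and are not the Titanic predictions; `roc_curve_points` and `auc` are hypothetical helper names.

```python
def roc_curve_points(labels, scores):
    """Return (false positive rate, true positive rate) points for a threshold sweep."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Sort by descending score so that each step lowers the threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve, computed with the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up data (assumption): 1 = survived, 0 = did not survive.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
model_a = [0.9, 0.4, 0.5, 0.6, 0.3, 0.7, 0.8, 0.2]
model_b = [0.9, 0.7, 0.4, 0.8, 0.3, 0.65, 0.6, 0.2]

auc_a = auc(roc_curve_points(labels, model_a))
auc_b = auc(roc_curve_points(labels, model_b))
print(f"model A AUC: {auc_a:.4f}")
print(f"model B AUC: {auc_b:.4f}")
```

In this toy data, model B ranks more survivors above non-survivors, so its curve rises faster near a false positive rate of 0.0 and encloses a larger area, which is the same pattern described for the boosted tree curve.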
B. Boosted Trees model understanding
Here we can see how each feature contributes to the predicted survival of this particular passenger. Using this information, we can also infer what the plots would look like for passengers with different attributes. We can see that sex, age, class, and deck are the most important, with embark town and fare also having some significance. It is evident that being male is detrimental to the passenger's survival, and that being middle-aged, second class, and on an unknown deck does not help. Although a higher fare seems to help somewhat, it is not nearly enough to offset these other features. Embark town may not seem like it would contribute to survival, but it could be somewhat significant, as different locations may have different demographics, which could in turn be related to other features such as class, fare, and even age. Being alone or having spouses and siblings aboard does not seem very important in determining survival, but having others with you does appear to help slightly.
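One way to read a plot like this is to rank the features by the size of their contribution, regardless of sign. The sketch below assumes the contributions are additive, with negative values lowering the predicted chance of survival and positive values raising it. The numbers are hypothetical and are not read from the plot.

```python
# Hypothetical per-feature contributions for one passenger (assumption, not real output).
contributions = {
    "sex": -0.20,
    "age": -0.08,
    "class": -0.06,
    "deck": -0.05,
    "fare": 0.03,
    "embark_town": -0.02,
    "n_siblings_spouses": 0.01,
    "alone": 0.00,
}

# Rank by absolute size so strong negative and strong positive effects both come first.
ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)

for name, value in ranked:
    if value > 0:
        direction = "raises"
    elif value < 0:
        direction = "lowers"
    else:
        direction = "does not change"
    print(f"{name:>20}: {value:+.2f} ({direction} predicted survival)")

# Under the additivity assumption, the net effect is the sum of all contributions.
net = sum(contributions.values())
print(f"net contribution: {net:+.2f}")
```

With these made-up values, the small positive fare contribution is outweighed by the negative contributions of sex, age, class, and deck, which mirrors the reasoning above.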
In this plot, we can see the feature contributions for the particular individual compared with the feature contributions of all the passengers. As stated before, sex is the most important feature for determining a passenger’s survival, and that is evident in the two hotspots in the plot. One spot is located on the negative side (male) and the other is on the positive side (female). For age, although there isn’t as large a distinction between two groups, we can see many passengers on the negative side, representing the adults, and a much smaller number on the positive side, representing children. The same seems to be true for having a known (positive) versus unknown (negative) deck. For class and fare, the majority of individuals had negative contributions, likely those who paid low fares and were second class. There are small clusters with very positive contributions, likely representing those who paid high fares and were first class. For embark town, being alone, and having spouses and/or siblings, there is very little contribution and very little variation in contribution.
As seen in this first plot, as the price of the fare goes up, so does its contribution to the outcome. This could be because those with especially high fares were given special treatment and had a higher chance of living. Those with more standard fares did not have this luxury.
In this second plot, it seems that younger passengers, especially babies and young children, had a higher chance of surviving. The age range between 20 and 60 is also interesting, as two groups seem to form there, likely because the sex of the passenger is so significant. There are also several outliers at the upper end of the age range, where elderly passengers can have a very high chance of living, comparable to that of babies.