Breast Cancer Prediction

Machine Learning Project

Random Forest Model

Random Forest is a highly configurable model. We started our tuning exercise with 200 trees and were able to reduce that to 10 while still maintaining good accuracy. Next, we set max_features to 'sqrt' to limit the number of features considered at each split. Given the wide variety of parameters, we then ran a grid search. It is resource-intensive, taking about 1 minute to run on an M1 chip, but it did not improve accuracy drastically, so it was not considered as a candidate for our predictor app.
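The tuning steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn and its built-in 30-feature breast cancer dataset, and the train/test split, random seeds, and grid values are hypothetical choices made only for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: scikit-learn's built-in 30-feature breast cancer dataset
# stands in for the project's data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Compare the two forest sizes mentioned in the text (200 and 10 trees),
# with max_features="sqrt" limiting the features considered at each split.
accuracy_by_trees = {}
for n_trees in (200, 10):
    model = RandomForestClassifier(
        n_estimators=n_trees, max_features="sqrt", random_state=42
    )
    model.fit(X_train, y_train)
    accuracy_by_trees[n_trees] = model.score(X_test, y_test)
    print(f"{n_trees} trees: test accuracy = {accuracy_by_trees[n_trees]:.3f}")

# A deliberately small grid (hypothetical values) to keep the search quick;
# a wider grid increases runtime multiplicatively.
param_grid = {
    "n_estimators": [10, 50],
    "max_features": ["sqrt", "log2"],
    "max_depth": [None, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)
grid_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Grid search test accuracy = {grid_accuracy:.3f}")
```

Comparing grid_accuracy against the entries in accuracy_by_trees is one way to judge whether the extra search time buys a meaningful improvement.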

Random Forest Model with Full 30 Features

Random Forest Model with 7 BestFeature Selected Features

Random Forest Model with 7 Correlation Selected Features

Observations:

1. Model accuracy did not change after reducing the number of features from 30 to 7.

2. Both 7-feature models had the same performance.
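A comparison like the one behind these observations could be set up along the following lines. This is only a sketch under stated assumptions: the source does not say how the correlation-based selection was done, so ranking features by absolute Pearson correlation with the target is a guess, and the dataset (scikit-learn's built-in breast cancer data), the split, the seeds, and the helper name fit_and_score are all hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's built-in 30-feature breast cancer dataset.
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Rank features by absolute Pearson correlation with the target,
# computed on the training split only to avoid leaking test data.
n_features = X_train.shape[1]
corr = np.array(
    [abs(np.corrcoef(X_train[:, j], y_train)[0, 1]) for j in range(n_features)]
)
top7 = np.argsort(corr)[::-1][:7]
print("Selected features:", [data.feature_names[j] for j in top7])


def fit_and_score(columns):
    """Hypothetical helper: train a 10-tree forest on the given columns."""
    model = RandomForestClassifier(
        n_estimators=10, max_features="sqrt", random_state=42
    )
    model.fit(X_train[:, columns], y_train)
    return model.score(X_test[:, columns], y_test)


acc_full = fit_and_score(np.arange(n_features))
acc_top7 = fit_and_score(top7)
print(f"30 features: test accuracy = {acc_full:.3f}")
print(f" 7 features: test accuracy = {acc_top7:.3f}")
```

The exact accuracies depend on the split and seed, so a single run like this is indicative only; repeating it with cross-validation would give a steadier comparison.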

Northwestern Data Visualization Final Project