The goal of ClassificationEnsembles is to automatically conduct a thorough analysis of classification data. The user only needs to provide the data and answer a few questions (such as which column to analyze). ClassificationEnsembles fits 25 models (15 individual models and 10 ensembles of models). The package also returns 12 plots, five tables and a summary report sorted by accuracy (highest to lowest).
Installation
You can install the development version of ClassificationEnsembles like so:
devtools::install_github("InfiniteCuriosity/ClassificationEnsembles")
Example
ClassificationEnsembles will model the quality of the shelving location for car seats (ShelveLoc: Good, Medium or Bad) based on the other features in the Carseats data set.
library(ClassificationEnsembles)
Classification(data = Carseats,
  colnum = 7,
  numresamples = 2,
  do_you_have_new_data = "N",
  how_to_handle_strings = 1,
  save_all_trained_models = "N",
  use_parallel = "N",
  train_amount = 0.60,
  test_amount = 0.20,
  validation_amount = 0.20)
The 25 models that are built automatically are:
- ADABag
- Bagged Random Forest
- Bagging
- C50
- Ensemble ADABag
- Ensemble BaggedCart
- Ensemble Bagged Random Forest
- Ensemble C50
- Ensemble NaiveBayes
- Ensemble Random Forest
- Ensemble Ranger
- Ensemble Regularized Discriminant Analysis
- Ensemble Support Vector Machines
- Ensemble Trees
- Linear
- Naive Bayes
- Partial Least Squares
- Penalized Discriminant Analysis
- Random Forest
- Ranger
- Regularized Discriminant Analysis
- RPart
- Support Vector Machines
- Trees
- XGBoost
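Each of these models is fit on a training partition and evaluated on held-out data, with the proportions set by train_amount, test_amount and validation_amount in the example call. As a rough illustration of what a 60/20/20 split means, here is a minimal base R sketch. It is not taken from the package source: the use of the built-in iris data, the fixed seed, and the variable names are all assumptions made for the example.

```r
# Hypothetical illustration of a 60/20/20 split; NOT the package's internal code.
set.seed(42)                 # fixed seed so the shuffle is reproducible
df <- iris                   # built-in data set with a 3-class target (Species)

train_amount <- 0.60
test_amount <- 0.20
validation_amount <- 0.20

n <- nrow(df)
shuffled <- sample(n)        # random permutation of the row indices

# Partition sizes; validation takes whatever rows remain
n_train <- round(train_amount * n)
n_test <- round(test_amount * n)

train <- df[shuffled[seq_len(n_train)], ]
test <- df[shuffled[n_train + seq_len(n_test)], ]
validation <- df[shuffled[(n_train + n_test + 1):n], ]

# Each row lands in exactly one partition
c(train = nrow(train), test = nrow(test), validation = nrow(validation))
```

Shuffling the row indices once and slicing them keeps the three partitions disjoint, which is the property that matters when comparing train and holdout accuracy.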
The 12 plots it returns automatically are:
1. Overfitting by model and resample
2. Accuracy by model, resample and train/holdout values
3. Accuracy by model and resample
4. Histogram of numeric data
5. Boxplots of numeric data
6. Duration barchart
7. Over or underfitting barchart
8. Model accuracy barchart
9. Target (ShelveLoc in the demo) vs each feature in the data
10. Pairwise scatterplots
11. Correlation of the numeric data as circles and colors
12. Correlation of the numeric data as numbers and colors
The 5 tables the package returns automatically are:
1. Head of the ensemble
2. Head of the data frame
3. Correlation of the data
4. Data summary
5. Summary report, including accuracy, duration, overfitting, sum of diagonals
The package also returns 25 summary tables, one for each of the models. These can be found in the Console.
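The summary report lists accuracy and the sum of diagonals for each model. For readers unfamiliar with those terms, the sketch below shows how they relate in a confusion matrix, using base R only. The labels are made up for illustration, and the variable names are hypothetical; this is not the package's own code, and its tables may be laid out differently.

```r
# Hypothetical holdout labels for a three-class target; invented for illustration.
classes <- c("Bad", "Medium", "Good")
actual <- factor(
  c("Good", "Good", "Medium", "Medium", "Medium", "Bad", "Bad", "Good", "Medium", "Bad"),
  levels = classes
)
predicted <- factor(
  c("Good", "Medium", "Medium", "Medium", "Bad", "Bad", "Bad", "Good", "Medium", "Medium"),
  levels = classes
)

# Confusion matrix: rows are predicted classes, columns are actual classes
confusion <- table(Predicted = predicted, Actual = actual)

# Correct predictions sit on the diagonal
sum_of_diagonals <- sum(diag(confusion))

# Accuracy is the share of all predictions that fall on the diagonal
accuracy <- sum_of_diagonals / sum(confusion)

print(confusion)
print(accuracy)
```

Giving both factors the same levels keeps the table square even if a class is never predicted, so diag() always lines up predicted and actual classes.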