ClassificationEnsembles: Automatically Builds 25 Classification Models (15 Individual and 10 Ensembles of Models) From Classification Data

The goal of ClassificationEnsembles is to automatically conduct a thorough analysis of classification data. The user only needs to provide the data and answer a few questions (such as which column to analyze). ClassificationEnsembles fits 25 models (15 individual models and 10 ensembles of models). The package also returns 12 plots and five tables, including a summary report sorted by accuracy (highest to lowest).

Installation

You can install the development version of ClassificationEnsembles like so:

devtools::install_github("InfiniteCuriosity/ClassificationEnsembles")

Example

In this example, ClassificationEnsembles models ShelveLoc, the quality of the shelving location for the car seats (Good, Medium or Bad), from the other features in the Carseats data set.

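library(ClassificationEnsembles)
Classification(data = Carseats,
  colnum = 7,                      # column 7 is ShelveLoc, the target
  numresamples = 2,                # number of times the data are resampled
  do_you_have_new_data = "N",      # no new data to predict on
  how_to_handle_strings = 1,
  save_all_trained_models = "N",   # do not save the trained models
  use_parallel = "N",              # run without parallel processing
  train_amount = 0.60,             # 60% of rows for training
  test_amount = 0.20,              # 20% for testing
  validation_amount = 0.20)        # 20% for validation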
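The train_amount, test_amount and validation_amount arguments describe a 60/20/20 split of the rows. The package's own splitting code is not shown in this README, so the following is only a hedged illustration: a minimal base-R sketch of such a split, using a hypothetical helper split_three_ways() and the built-in iris data.

```r
# Hypothetical helper, not part of ClassificationEnsembles:
# randomly split a data frame into train/test/validation sets by proportion.
split_three_ways <- function(df, train_amount = 0.60, test_amount = 0.20,
                             validation_amount = 0.20) {
  # The three proportions must add up to 1
  stopifnot(isTRUE(all.equal(train_amount + test_amount + validation_amount, 1)))
  n <- nrow(df)
  shuffled <- sample(n)                 # random permutation of row numbers
  n_train <- floor(train_amount * n)
  n_test <- floor(test_amount * n)
  train_rows <- shuffled[seq_len(n_train)]
  test_rows <- shuffled[n_train + seq_len(n_test)]
  validation_rows <- tail(shuffled, n - n_train - n_test)  # whatever remains
  list(train = df[train_rows, , drop = FALSE],
       test = df[test_rows, , drop = FALSE],
       validation = df[validation_rows, , drop = FALSE])
}

set.seed(42)                            # reproducible shuffle
parts <- split_three_ways(iris)         # iris ships with base R
sapply(parts, nrow)
# train: 90, test: 30, validation: 30
```

Shuffling once and slicing the permutation guarantees that the three sets do not overlap and together cover every row.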

The 25 models that are built automatically are:

ADABag
Bagged Random Forest
Bagging
C50
Ensemble ADABag
Ensemble BaggedCart
Ensemble Bagged Random Forest
Ensemble C50
Ensemble NaiveBayes
Ensemble Random Forest
Ensemble Ranger
Ensemble Regularized Discriminant Analysis
Ensemble Support Vector Machines
Ensemble Trees
Linear
Naive Bayes
Partial Least Squares
Penalized Discriminant Analysis
Random Forest
Ranger
Regularized Discriminant Analysis
RPart
Support Vector Machines
Trees
XGBoost
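This README does not say exactly how the 10 "Ensemble" models are assembled. One common technique, assumed here purely for illustration, is stacking: the individual models' predictions are collected into a new data frame, and a second model is trained on that frame. The sketch below is not the package's code; it uses iris plus rpart() and MASS::lda(), both from packages that ship with standard R installations, and the names make_ensemble, tree_pred and lda_pred are hypothetical.

```r
library(rpart)  # decision trees; recommended package bundled with R
library(MASS)   # lda(); recommended package bundled with R

set.seed(1)
train_rows <- sample(nrow(iris), 100)
train <- iris[train_rows, ]
test <- iris[-train_rows, ]

# Level 1: fit two individual models on the training data
tree_fit <- rpart(Species ~ ., data = train)
lda_fit <- lda(Species ~ ., data = train)

# Collect each model's predicted class into an "ensemble" data frame
make_ensemble <- function(newdata) {
  data.frame(
    tree_pred = predict(tree_fit, newdata, type = "class"),
    lda_pred = predict(lda_fit, newdata)$class,
    Species = newdata$Species
  )
}
ensemble_train <- make_ensemble(train)
ensemble_test <- make_ensemble(test)
head(ensemble_train)

# Level 2: fit a model on the individual models' predictions
stack_fit <- rpart(Species ~ tree_pred + lda_pred, data = ensemble_train)
stack_pred <- predict(stack_fit, ensemble_test, type = "class")
mean(stack_pred == ensemble_test$Species)  # holdout accuracy of the stack
```

For brevity the level-2 model here is trained on in-sample predictions; a more careful stack would use out-of-fold predictions to avoid leaking training information.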

The 12 plots it returns automatically are:
1. Overfitting by model and resample
2. Accuracy by model, resample and train/holdout values
3. Accuracy by model and resample
4. Histogram of numeric data
5. Boxplots of numeric data
6. Duration barchart
7. Over or underfitting barchart
8. Model accuracy barchart
9. Target (ShelveLoc in the demo) vs each feature in the data
10. Pairwise scatterplots
11. Correlation of the numeric data as circles and colors
12. Correlation of the numeric data as numbers and colors

The 5 tables the package returns automatically are:
1. Head of the ensemble
2. Head of the data frame
3. Correlation of the data
4. Data summary
5. Summary report, including accuracy, duration, overfitting, sum of diagonals
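Assuming "sum of diagonals" refers to the diagonal of a confusion matrix (the count of correct predictions), accuracy is that sum divided by the total number of predictions. The following minimal base-R sketch uses made-up predictions and hypothetical helper names; it is not the package's code. The overfitting ratio at the end (train accuracy divided by holdout accuracy) is also an assumed definition, since the README does not say how overfitting is measured.

```r
# Hypothetical helper: confusion matrix, sum of diagonals and accuracy
confusion_summary <- function(actual, predicted) {
  # Use the same levels for both factors so the table is square
  levels_all <- union(levels(factor(actual)), levels(factor(predicted)))
  cm <- table(actual = factor(actual, levels = levels_all),
              predicted = factor(predicted, levels = levels_all))
  sum_diag <- sum(diag(cm))  # number of correct predictions
  list(confusion = cm,
       sum_of_diagonals = sum_diag,
       accuracy = sum_diag / sum(cm))
}

# Made-up example labels using the ShelveLoc classes
actual <- c("Good", "Medium", "Bad", "Medium", "Good", "Bad")
predicted <- c("Good", "Medium", "Medium", "Medium", "Bad", "Bad")
res <- confusion_summary(actual, predicted)
res$sum_of_diagonals  # 4
res$accuracy          # 0.6666667

# Assumed overfitting measure: values well above 1 suggest overfitting
overfitting_ratio <- function(train_accuracy, holdout_accuracy) {
  train_accuracy / holdout_accuracy
}
overfitting_ratio(0.95, 0.80)
```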

The package also returns 25 summary tables, one for each of the models. These are printed to the console.