explore mtcars

Roland Krasser

2019-10-08

How to explore the mtcars dataset using the explore package.

The explore package simplifies Exploratory Data Analysis (EDA). Get faster insights with less code!

The mtcars dataset ships with base R (in the datasets package). We use the packages explore and dplyr (for select, mutate and the %>% operator).

library(dplyr)
library(explore)

Explore dataset

mtcars %>% explore_tbl()

mtcars %>% describe()
#>    variable type na na_pct unique   min   mean    max
#> 1       mpg  dbl  0      0     25 10.40  20.09  33.90
#> 2       cyl  dbl  0      0      3  4.00   6.19   8.00
#> 3      disp  dbl  0      0     27 71.10 230.72 472.00
#> 4        hp  dbl  0      0     22 52.00 146.69 335.00
#> 5      drat  dbl  0      0     22  2.76   3.60   4.93
#> 6        wt  dbl  0      0     29  1.51   3.22   5.42
#> 7      qsec  dbl  0      0     30 14.50  17.85  22.90
#> 8        vs  dbl  0      0      2  0.00   0.44   1.00
#> 9        am  dbl  0      0      2  0.00   0.41   1.00
#> 10     gear  dbl  0      0      3  3.00   3.69   5.00
#> 11     carb  dbl  0      0      6  1.00   2.81   8.00
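
As a rough cross-check of the na and unique columns above, the same counts can be recomputed in base R. This is only a sketch: it is not how describe() is implemented, and the name overview is chosen here for illustration.

```r
# Recompute two columns of the describe() output with base R only
overview <- data.frame(
  variable  = names(mtcars),
  na        = sapply(mtcars, function(x) sum(is.na(x))),      # missing values per column
  unique    = sapply(mtcars, function(x) length(unique(x))),  # distinct values per column
  row.names = NULL
)
overview
```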

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

variable description
mpg Miles/(US) gallon
cyl Number of cylinders
disp Displacement (cu.in.)
hp Gross horsepower
drat Rear axle ratio
wt Weight (lb/1000)
qsec 1/4 mile time
vs Engine (0 = V-shaped, 1 = straight)
am Transmission (0 = automatic, 1 = manual)
gear Number of forward gears
carb Number of carburetors

Explore variables

mtcars %>% 
  explore_all()

Number of gears?

Is there a difference between cars with 3, 4 and 5 gears?

mtcars %>% 
  explore(gear)

Most of the cars in the dataset have 3 or 4 gears. 15.6% have 5 gears.
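
The shares read from the plot can also be computed directly. The following is a minimal base R sketch; the names gear_counts and gear_pct are introduced here for illustration only.

```r
# Count cars per number of gears and convert the counts to percentages
gear_counts <- table(mtcars$gear)
gear_pct <- round(100 * prop.table(gear_counts), 1)

gear_counts  # absolute number of cars per gear count
gear_pct     # percentage of cars per gear count
```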

Now check the relation between some of the variables and gear:

mtcars %>% 
  select(gear, mpg, hp, cyl, am) %>% 
  explore_all(target = gear)

We see that 100% of cars with 3 gears have am = 0 (automatic). All cars with 5 gears have am = 1 (manual).
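
To read the exact counts behind the plot, a cross-tabulation of am and gear is one option. This is a base R sketch; am_gear is a name chosen here for illustration.

```r
# Cross-tabulate transmission type (rows) against number of gears (columns)
am_gear <- table(am = mtcars$am, gear = mtcars$gear)
am_gear

# Share of automatic (0) and manual (1) cars within each gear count
round(prop.table(am_gear, margin = 2), 2)
```

Using margin = 2 normalises each column, so the shares are read per gear count. With margin = 1 they would be read per transmission type instead.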

High miles per gallon?

Let’s define an interesting target: cars that have mpg (miles per gallon) > 25.

We copy the data and create a new target variable:

# Flag cars with mpg > 25 as 1; all others (and missing values) as 0
data <- mtcars %>% 
  mutate(highmpg = if_else(mpg > 25, 1, 0, missing = 0)) %>% 
  select(-mpg)

data %>% explore(highmpg)

So, about 19% of all cars have mpg > 25. What else is special about them?
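
The share can be verified without the new column. A minimal sketch in base R (the name high is illustrative):

```r
# mpg > 25 yields a logical vector; its mean is the share of TRUE values
high <- mtcars$mpg > 25

sum(high)   # number of cars above the threshold
mean(high)  # share of cars above the threshold
```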

data %>% 
  select(highmpg, cyl, disp, hp) %>% 
  explore_all(target = highmpg)

data %>% 
  select(highmpg, drat, wt, qsec, vs) %>% 
  explore_all(target = highmpg)

data %>% 
  select(highmpg, am, gear, carb) %>% 
  explore_all(target = highmpg)

There are some strong differences between cars with and without “high mpg”.

Now let’s grow a decision tree:

data %>% 
  explain_tree(target = highmpg)

Growing a decision tree shows that there seems to be a very strong relationship between wt (weight) and “high mpg”. Cars with a low weight are much more likely to have “high mpg”.
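
For readers who want to inspect the splits as text, a comparable tree can be fitted with the rpart package directly. This is a sketch under assumptions: rpart must be installed, the settings (minsplit, maxdepth) are illustrative and not necessarily those used by explain_tree(), and the names tree_data and fit are introduced here.

```r
library(rpart)

# Rebuild the binary target as a factor so rpart fits a classification tree
tree_data <- mtcars
tree_data$highmpg <- factor(ifelse(tree_data$mpg > 25, 1, 0))
tree_data$mpg <- NULL  # drop mpg, otherwise it would trivially predict the target

# Illustrative settings (assumption), not necessarily those of explain_tree()
fit <- rpart(
  highmpg ~ .,
  data    = tree_data,
  method  = "class",
  control = rpart.control(minsplit = 10, maxdepth = 3)
)

print(fit)               # text view of the splits
fit$variable.importance  # variables the tree relied on
```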

Let’s take a closer look at wt:

data %>% explore(wt, target = highmpg)

data %>% explore(wt, target = highmpg, split = FALSE)

wt (weight) is a good predictor for high mpg.

mtcars %>% explore(wt, mpg)

There is a strong negative correlation between wt and mpg: heavier cars get fewer miles per gallon.
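
The relation seen in the scatter plot can be quantified with base R. The sketch below uses the Pearson correlation and a simple linear fit as two possible measures; the names r and fit_wt are illustrative.

```r
# Pearson correlation between weight and miles per gallon
r <- cor(mtcars$wt, mtcars$mpg)
r

# Simple linear fit: the wt coefficient is the change in mpg per additional 1000 lb
fit_wt <- lm(mpg ~ wt, data = mtcars)
coef(fit_wt)
```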

If you want to have high miles per gallon (mpg), buy a car with low weight (wt)!

Horsepower?

Is there a relation between horsepower and other variables, such as the number of cylinders?

Let’s build a decision tree with horsepower as target:

mtcars %>% 
  explain_tree(target = hp, minsplit = 15)

mtcars %>% 
  select(hp, cyl, mpg) %>% 
  explore_all(target = hp)

Cars with 8 cylinders have higher horsepower.

Cars with low miles per gallon (mpg) have higher horsepower!
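
Both observations can be checked numerically. A minimal base R sketch (hp_by_cyl and r_hp_mpg are names chosen here for illustration):

```r
# Mean horsepower per number of cylinders
hp_by_cyl <- aggregate(hp ~ cyl, data = mtcars, FUN = mean)
hp_by_cyl

# Correlation between horsepower and miles per gallon
r_hp_mpg <- cor(mtcars$hp, mtcars$mpg)
r_hp_mpg
```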