AirlineArrival {fastR2} | R Documentation
Flights categorized by destination city, airline, and whether or not the flight was on time.
A data frame with 11000 observations on the following 3 variables.
airport: a factor with levels LosAngeles, Phoenix, SanDiego, SanFrancisco, Seattle
result: a factor with levels Delayed, OnTime
airline: a factor with levels Alaska, AmericaWest
Barnett, Arnold. 1994. “How numbers can trick you.” Technology Review, vol. 97, no. 7, pp. 38–45.
These and similar data appear in many textbooks as an illustration of Simpson's paradox.
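The paradox arises when one group has the lower rate within every subgroup yet the higher rate once the subgroups are pooled, because the groups are spread very differently across the subgroups. The base R sketch below shows the mechanism with hypothetical counts: the carriers `A` and `B`, the airports `X` and `Y`, and every number in `toy` are invented for illustration and are not taken from AirlineArrival.

```r
# Hypothetical counts, invented for illustration only (NOT the AirlineArrival data)
toy <- data.frame(
  carrier = c("A", "A", "B", "B"),
  airport = c("X", "Y", "X", "Y"),
  flights = c(100, 900, 900, 100),
  delayed = c(10, 270, 135, 40)
)

# Percent delayed for each carrier within each airport
toy$percent <- 100 * toy$delayed / toy$flights

# Percent delayed for each carrier after pooling the two airports
overall <- aggregate(cbind(flights, delayed) ~ carrier, data = toy, FUN = sum)
overall$percent <- 100 * overall$delayed / overall$flights

toy      # carrier A has the lower percent delayed at both X and Y
overall  # yet carrier A has the higher percent delayed overall
```

In this toy setup carrier A flies mostly into airport Y, where delays are common for both carriers, so its pooled rate is pulled upward even though it does better at each airport.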
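# tally() is from mosaic, the pipeline verbs from dplyr, gf_*() from ggformula
library(fastR2)
library(mosaic)
library(dplyr)
library(ggformula)
# Percent of flights by airline within each result (Delayed, OnTime)
tally(
  airline ~ result, data = AirlineArrival,
  format = "perc", margins = TRUE)
# Percent delayed and on time within each airline and airport combination
tally(
  result ~ airline + airport,
  data = AirlineArrival, format = "perc", margins = TRUE)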
# Percent delayed for each airline at each airport
AirlineArrival2 <-
  AirlineArrival %>%
  group_by(airport, airline, result) %>%
  summarise(count = n()) %>%
  group_by(airport, airline) %>%
  mutate(total = sum(count), percent = count / total * 100) %>%
  filter(result == "Delayed")
# Percent delayed for each airline, pooled over all airports
AirlineArrival3 <-
  AirlineArrival %>%
  group_by(airline, result) %>%
  summarise(count = n()) %>%
  group_by(airline) %>%
  mutate(total = sum(count), percent = count / total * 100) %>%
  filter(result == "Delayed")
# Per-airport delay rates (solid lines, points sized by number of flights)
# with each airline's overall delay rate as a dashed horizontal line
gf_line(percent ~ airport, color = ~ airline, group = ~ airline,
        data = AirlineArrival2) %>%
  gf_point(percent ~ airport, color = ~ airline, size = ~ total,
           data = AirlineArrival2) %>%
  gf_hline(yintercept = ~ percent, color = ~ airline,
           data = AirlineArrival3, linetype = "dashed") %>%
  gf_labs(y = "percent delayed")