library(gapminder)
library(tidyverse)
Summary geoms (drawn via `stat_summary()`) take each x value, compute summary statistics of the corresponding y-values and plot them: a central value and some measure of dispersion, or just one of these.
You have two options for telling the geom which values to render: `fun`, `fun.min` and `fun.max` (one function each for the central value, the lower end and the upper end), or `fun.data` (a single function that returns all three). You can write these functions yourself, or use a few well-matching ones from the Hmisc package or the ggplot2 wrappers around them.
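As a minimal sketch of the `fun.data` option, the helper `median_iqr()` below is hypothetical (it is not part of ggplot2 or Hmisc). It receives the y-values of one x position and returns a one-row data frame with the columns `y`, `ymin` and `ymax` that `stat_summary()` expects:

```r
library(gapminder)
library(tidyverse)

# Hypothetical fun.data helper (not part of ggplot2 or Hmisc):
# median and interquartile range of the y-values at one x position.
# A fun.data function must return a data frame with columns y, ymin and ymax.
median_iqr <- function(x) {
  data.frame(
    y    = median(x),
    ymin = unname(quantile(x, probs = 0.25)),
    ymax = unname(quantile(x, probs = 0.75))
  )
}

median_iqr(c(1, 2, 3, 4, 5))
#>   y ymin ymax
#> 1 3    2    4

# stat_summary() calls median_iqr() once per x position and colour group
p_iqr <- gapminder %>%
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) +
  stat_summary(fun.data = median_iqr, geom = "pointrange",
               position = position_dodge(width = 0.7))
p_iqr
```

The same summary could instead be given as three separate functions through `fun`, `fun.min` and `fun.max`.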
Examples of summary functions
### geom_errorbar and geom_linerange
These show only a range, without the central value.
Compute the standard error of the mean life expectancy (`lifeExp`) for each continent in each year. Render it as error bars (i.e. without the mean).
gapminder %>%
ggplot(aes(x = factor(year), y = lifeExp, color = continent)) +
geom_point(alpha = 0.2,
position = position_jitterdodge(jitter.width = 0.7,
dodge.width = 0.7)) +
stat_summary(fun.data = "mean_se", geom = "errorbar", linewidth = 1,
position = position_dodge(width = 0.7)) # dodged like the points
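To see what `mean_se` hands to the error-bar geom, you can call it on a plain vector and compare it with the standard error computed by hand, SE = s / sqrt(n). A small check that needs only ggplot2:

```r
library(ggplot2)

x <- c(1, 1, 1, 10, 10, 10, 10)

# mean_se() returns a one-row data frame: the mean and mean -/+ mult * SE (mult = 1 by default)
ms <- mean_se(x)
ms

# The same numbers by hand: SE = s / sqrt(n)
se <- sd(x) / sqrt(length(x))
c(y = mean(x), ymin = mean(x) - se, ymax = mean(x) + se)
```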
Compute the standard deviation of life expectancy for each continent in each year and render it with `geom_linerange` (i.e. without the mean).
Use the `ggplot2::mean_sdl` function. NB: it is a wrapper around the `Hmisc::smean.sdl` function, which is documented below. By default, the range is the mean ± two standard deviations (`mult = 2`). To change this, use the `mult` parameter, just as in the original `Hmisc::smean.sdl`. Note how `stat_summary` passes such arguments on: `fun.args = list(mult = 1)`.
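The effect of `mult` is easiest to see on a plain vector. This sketch assumes the Hmisc package is installed, because `mean_sdl()` calls `Hmisc::smean.sdl()` internally; the half-width of the range is `mult` times the sample standard deviation:

```r
library(ggplot2)

x <- c(1, 1, 1, 10, 10, 10, 10)

d2 <- mean_sdl(x)            # default mult = 2: mean -/+ 2 standard deviations
d1 <- mean_sdl(x, mult = 1)  # mean -/+ 1 standard deviation
d2
d1

# Half-widths of the two ranges, compared with the sample standard deviation
c(default = d2$ymax - d2$y, mult_1 = d1$ymax - d1$y, sd = sd(x))
```

Inside `stat_summary()`, `fun.args = list(mult = 1)` supplies exactly this extra `mult = 1` argument to `mean_sdl`.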
For `fun.data`, preferably use these ggplot2 functions. Their help page reads:
Usage
mean_cl_boot(x, ...)
mean_cl_normal(x, ...)
mean_sdl(x, ...)
median_hilow(x, ...)
Arguments
x a numeric vector
... other arguments passed on to the respective Hmisc function.
Value
A data frame with columns y, ymin, and ymax.
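For example, `median_hilow()` passes `conf.int` on to `Hmisc::smedian.hilow()`, which returns the median plus the `(1 - conf.int)/2` and `(1 + conf.int)/2` quantiles. With `conf.int = 0.5` that is the median with the lower and upper quartiles. A quick check against base R's `quantile()`, assuming Hmisc is installed:

```r
library(ggplot2)

x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# conf.int = 0.5 is passed on to Hmisc::smedian.hilow(): median plus the 25% and 75% quantiles
mh <- median_hilow(x, conf.int = 0.5)
mh

# The same three numbers with base R
quantile(x, probs = c(0.5, 0.25, 0.75))
```

Inside `stat_summary()` the same call is written as `fun.data = "median_hilow", fun.args = list(conf.int = 0.5)`.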
These are wrappers around summary functions from the Hmisc package, and they accept the parameters of the Hmisc originals. The Hmisc usage is:
Usage
smean.cl.normal(x, mult=qt((1+conf.int)/2,n-1), conf.int=.95, na.rm=TRUE)
smean.sd(x, na.rm=TRUE)
smean.sdl(x, mult=2, na.rm=TRUE)
smean.cl.boot(x, conf.int=.95, B=1000, na.rm=TRUE, reps=FALSE)
smedian.hilow(x, conf.int=.95, na.rm=TRUE)
The Hmisc functions themselves cannot be used directly as `fun.data`, because they return a named numeric vector rather than a data frame with columns `y`, `ymin` and `ymax`. Example:
ci <- Hmisc::smean.cl.normal(c(1, 1, 1, 10, 10, 10, 10)) # mean and 95% confidence interval
str(ci) # a named numeric vector, not a data frame
ci
Named num [1:3] 6.14 1.69 10.59
- attr(*, "names")= chr [1:3] "Mean" "Lower" "Upper"
Mean Lower Upper
6.142857 1.693700 10.592015
As seen above, the result is a named vector whose names (`Mean`, `Lower`, `Upper`) differ from `y`, `ymin` and `ymax`.
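To use an Hmisc function directly, a thin adapter that renames its output is enough. `cl_normal_df()` below is a hypothetical helper, not part of any package; it is roughly what the built-in `mean_cl_normal()` wrapper already does for you (Hmisc must be installed):

```r
library(ggplot2)

# Hypothetical adapter: rename Hmisc's Mean/Lower/Upper to the y/ymin/ymax columns ggplot2 expects
cl_normal_df <- function(x, ...) {
  res <- Hmisc::smean.cl.normal(x, ...)
  data.frame(y = res[["Mean"]], ymin = res[["Lower"]], ymax = res[["Upper"]])
}

x <- c(1, 1, 1, 10, 10, 10, 10)
cl_normal_df(x)
mean_cl_normal(x)  # the built-in wrapper gives the same numbers
```

Because extra arguments are forwarded with `...`, something like `fun.args = list(conf.int = 0.9)` would reach `Hmisc::smean.cl.normal()` unchanged.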
In `stat_summary()` we can supply either `fun.data` or a separate function for each statistic. These arguments are `fun` (the central value), `fun.min` (the lower end of the range) and `fun.max` (the upper end of the range).
gapminder %>%
ggplot(aes(x = factor(year), y = lifeExp, color = continent)) +
geom_point(alpha = 0.2,
position = position_jitterdodge(jitter.width = 0.7,
dodge.width = 0.7)) +
stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1), # mean ± 1 SD instead of the default 2
geom = "linerange", linewidth = 0.7,
position = position_dodge(width = 0.7)) # same dodge width as the points
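The same mean ± 1 SD ranges can also be written with the three separate arguments. This sketch uses anonymous functions for `fun.min` and `fun.max` and checks, via `layer_data()`, that both layers compute the same `ymin`/`ymax` values (Hmisc is needed for the `mean_sdl` variant):

```r
library(gapminder)
library(tidyverse)

base <- gapminder %>%
  ggplot(aes(x = factor(year), y = lifeExp, color = continent))

# Variant 1: one fun.data function, with mult passed through fun.args
p_data <- base +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1),
               geom = "linerange", position = position_dodge(width = 0.7))

# Variant 2: separate functions for the centre and both ends of the range
p_sep <- base +
  stat_summary(fun = mean,
               fun.min = function(x) mean(x) - sd(x),
               fun.max = function(x) mean(x) + sd(x),
               geom = "linerange", position = position_dodge(width = 0.7))

d1 <- layer_data(p_data)
d2 <- layer_data(p_sep)
all.equal(d1[c("x", "ymin", "ymax")], d2[c("x", "ymin", "ymax")])
```

Variant 2 is more verbose, but it makes explicit that `mean_sdl` with `mult = 1` is simply the mean ± one standard deviation.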
### geom_crossbar and geom_pointrange
These also show the central value.
gapminder %>%
ggplot(aes(x = factor(year), y = lifeExp, color = continent)) +
geom_point(alpha = 0.2,
position = position_jitterdodge(jitter.width = 0.7,
dodge.width = 0.7)) +
stat_summary(fun.data = "mean_se", geom = "crossbar", linewidth = 0.7,
position = position_dodge(width = 0.7)) # dodged like the points
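Crossbars can show any `fun.data` summary, not only `mean_se`. A sketch with `mean_cl_normal`, the 95% t-based confidence interval of the mean computed by `Hmisc::smean.cl.normal()` (so Hmisc must be installed), with one year/continent cell recomputed by hand for comparison:

```r
library(gapminder)
library(tidyverse)

p_ci <- gapminder %>%
  ggplot(aes(x = factor(year), y = lifeExp, color = continent)) +
  stat_summary(fun.data = "mean_cl_normal", geom = "crossbar",
               position = position_dodge(width = 0.7))
p_ci

# One cell by hand (Europe, 2007): mean -/+ t(0.975, n - 1) * s / sqrt(n)
eu07 <- gapminder %>% filter(continent == "Europe", year == 2007) %>% pull(lifeExp)
n <- length(eu07)
half_width <- qt(0.975, df = n - 1) * sd(eu07) / sqrt(n)
c(mean = mean(eu07), lower = mean(eu07) - half_width, upper = mean(eu07) + half_width)
```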
The same with geom = "pointrange", using the mean ± one standard deviation:
gapminder %>%
ggplot(aes(x = factor(year), y = lifeExp, color = continent)) +
geom_point(alpha = 0.2,
position = position_jitterdodge(jitter.width = 0.7,
dodge.width = 0.7)) +
stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1),
geom = "pointrange", position = position_dodge(width = 0.7),
size = 0.5)
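With `fun`, `fun.min` and `fun.max`, any function that returns a single number works (e.g. `median`, `min`, `max`), but for quantiles you have to write your own small functions first :-/ .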
low_f <- function(x) {quantile(x, probs = 0.25)}
hi_f <- function(x) {quantile(x, probs = 0.75)}
gapminder %>%
ggplot(aes(x = factor(year), y = lifeExp, color = continent)) +
geom_point(alpha = 0.2,
position = position_jitterdodge(jitter.width = 0.7,
dodge.width = 0.7)) +
stat_summary(fun = median, fun.min = low_f, # pass the functions themselves
fun.max = hi_f,
geom = "pointrange", position = position_dodge(width = 0.7),
size = 0.5)
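Since ggplot2 3.3.0, `fun`, `fun.min` and `fun.max` also accept rlang-style lambda formulas, so the two helpers can be written inline. A sketch assuming ggplot2 ≥ 3.3.0, with a check that it computes the same ranges as `low_f`/`hi_f`:

```r
library(gapminder)
library(tidyverse)

base <- gapminder %>%
  ggplot(aes(x = factor(year), y = lifeExp, color = continent))

# Named helpers, as in the example above
low_f <- function(x) quantile(x, probs = 0.25)
hi_f  <- function(x) quantile(x, probs = 0.75)
p_named <- base +
  stat_summary(fun = median, fun.min = low_f, fun.max = hi_f,
               geom = "pointrange", position = position_dodge(width = 0.7))

# Inline lambdas: .x stands for the y-values at one x position
p_lambda <- base +
  stat_summary(fun = median,
               fun.min = ~ quantile(.x, probs = 0.25),
               fun.max = ~ quantile(.x, probs = 0.75),
               geom = "pointrange", position = position_dodge(width = 0.7))

all.equal(layer_data(p_named)[c("ymin", "y", "ymax")],
          layer_data(p_lambda)[c("ymin", "y", "ymax")])
```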
geom_smooth, stat_smooth
gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point() +
geom_smooth(method = "lm")
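With `method = "lm"`, ggplot2 fits a separate ordinary least-squares line `lifeExp ~ gdpPercap` for each colour group. A sketch that refits the line for one continent with `lm()` and compares it with the values ggplot2 draws; `layer_data(p, i = 2)` returns the computed data of the second (smoothing) layer, and Africa is group 1 because it is the first level of `continent`:

```r
library(gapminder)
library(tidyverse)

p_lm <- gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# The fitted values ggplot2 computed for the smoothing layer
smooth_df <- layer_data(p_lm, i = 2)

# Refit the Africa line by hand and predict at the same x positions
africa <- gapminder %>% filter(continent == "Africa")
fit <- lm(lifeExp ~ gdpPercap, data = africa)
africa_line <- smooth_df %>% filter(group == 1)
by_hand <- predict(fit, newdata = data.frame(gdpPercap = africa_line$x))
all.equal(unname(by_hand), africa_line$y)
```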
geom_quantile, stat_quantile
gapminder %>% filter(year > 1995) %>% ggplot(aes(x = gdpPercap, y = lifeExp)) +
geom_point(alpha = 0.3) +
geom_quantile(formula = y ~ x, quantiles = c(0.01, 0.25, 0.5, 0.75),
aes(color = factor(after_stat(quantile))), linewidth = 2) + # needs the quantreg package
geom_smooth(formula = y ~ x, method = "lm", color = "black", linetype = 2, se = FALSE)
# optionally add: + facet_wrap(~continent)
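By default, `geom_quantile()` fits linear quantile regressions with `quantreg::rq()`, so the quantreg package must be installed. Fitting two of them directly shows the intercepts and slopes behind the first-quartile and median lines; this is a sketch and does not reproduce every default of `stat_quantile()`:

```r
library(gapminder)
library(tidyverse)
library(quantreg)

recent <- gapminder %>% filter(year > 1995)

# First-quartile (tau = 0.25) and median (tau = 0.5) regressions of lifeExp on gdpPercap
fit_q <- rq(lifeExp ~ gdpPercap, tau = c(0.25, 0.5), data = recent)
coef(fit_q)  # one column of (intercept, slope) per tau
```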
stat_function