Statistical analysis: beta, mixture
Please note that while we will be using brms to conduct a Bayesian analysis, the models would hold for other packages and frequentist analyses.
Researchers are interested in determining if the canopy health of Black Box (Eucalyptus largiflorens) is effected by the connectivity of their wetland and/or the flooding history. In this study a total of 80 known Black Box sites were sampled across 5 connectivity and flooding histories (CAFH). The average crown condition score (ACCS) for Black Box was recorded. The crown condition score for a tree is an ordinal1 score from 0 (no canopy) to 7 (91%-100% canopy cover). That means the score at any site could range from 0 to 7, inclusive. The dataset for this study can be found in df_crown_score.RDS
.
After we load the data we can compare the ACCS by using a combined boxplot and violin plot. Figure 1 shows that the distribution of scores is quite different between some of the groups both in terms of central tendency and spread.
cafh accs
1 Connected frequently flooded 4.8
2 Connected frequently flooded 4.8
3 Connected frequently flooded 3.6
4 Connected frequently flooded 4.6
5 Connected frequently flooded 4.7
6 Connected frequently flooded 3.9
ggplot(df_crown_score, aes(cafh, accs)) + geom_violin() +
geom_boxplot(fill = 'darkgrey', width = 0.1) +
geom_jitter(width = 0.1, height = 0) +
ylab('Average crown condition score') +
xlab('Connectivity and flooding history') +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Figure 1: Combined boxplot and Violin plots of the average crown condition score of Black Box trees at wetlands with different connectivity.
Each site is a sampling event, with a response variable ACCS (\(y_i\)) and a categorical explanatory variable CAFH (\(x_i\)). The response data is continuous but it is bound between two values (0 and 7), but it does not reach those values (Figure 1). Therefore, we could start with a beta distribution.
In words I would describe the model as “The ACCS for any site is a scaled beta distributed, where the mean and precision are functions of the wetlands CAFH category”. Here the ACCS at site \(i\) is \(y_i\), the CAFH is \(x_i\), the mean is \(\mu_i\), and the precision is \(\phi_i\). Please note that we are allowing the precision parameter to vary between CAFH as in the exploration the spread of the different categories seemed quite different.
Observational model:
\[ \begin{align} \frac{y_i}{7} &\sim beta(\mu_i, \phi) \\ \end{align} \] Process model:
The mean (\(\mu_i\)) and precision (\(\phi\)) are linked to the predictors via the logit and log links respectively:
In formal style:
\[ \begin{align} logit(\mu_i) &= \text{cafh}_i \\ log(\phi_i) &= \text{cafh}_i \end{align} \]
Formal style:
\[ \begin{align} logit(\mu_i) &= \alpha + \beta x_i \\ log(\phi_i) &= \gamma + \delta x_i \end{align} \]
From the model output below and Figure ?? the model looks to have converged.
mod_accs
Family: beta
Links: mu = logit; phi = log
Formula: accs/7 ~ cafh
phi ~ cafh
Data: df_crown_score (Number of observations: 80)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI
Intercept 0.20 0.09 0.01 0.38
phi_Intercept 2.67 0.26 2.14 3.14
cafhConnectedrarelyflooded -0.37 0.15 -0.67 -0.09
cafhConnectedunflooded -0.54 0.16 -0.86 -0.24
cafhIsolatedrarelyflooded -0.60 0.12 -0.83 -0.37
cafhLinearrarelyflooded -0.62 0.11 -0.83 -0.41
phi_cafhConnectedrarelyflooded 0.44 0.46 -0.51 1.31
phi_cafhConnectedunflooded 0.24 0.45 -0.66 1.12
phi_cafhIsolatedrarelyflooded 1.96 0.58 0.73 3.00
phi_cafhLinearrarelyflooded 2.77 0.56 1.58 3.79
Rhat Bulk_ESS Tail_ESS
Intercept 1.00 2028 1967
phi_Intercept 1.00 3073 2581
cafhConnectedrarelyflooded 1.00 2764 2680
cafhConnectedunflooded 1.00 2842 2940
cafhIsolatedrarelyflooded 1.00 2288 2402
cafhLinearrarelyflooded 1.00 2098 2152
phi_cafhConnectedrarelyflooded 1.00 3264 2755
phi_cafhConnectedunflooded 1.00 2998 3001
phi_cafhIsolatedrarelyflooded 1.00 2935 2126
phi_cafhLinearrarelyflooded 1.00 3218 2321
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(mod_accs)
Figure 2: The chains from each estimated parameter in the average crown condition score model.
Figure 3: The chains from each estimated parameter in the average crown condition score model.
The QQ plot (Figure 4) does not show any violations of the beta distribution assumption. The homogeneity plot is also fine, which is good, since the model allowed the spread to different between levels.
check_brms(mod_accs)
Figure 4: Residual plots for the average crown condition score model.
The posterior predictive check looks okay (Figure 5).
pp_check(mod_accs, ndraws = 100)
Figure 5: Posterior predictive plot for the average crown condition score model.
We can interpret the model estimates roughly without getting too much into the weeds and let the plot do the heavy lifting for us. If the 95% credible interval excludes zero, then we can infer that that parameter was different to the probability for the default option (connected frequently flooded). Here all of the levels are lower. In Figure 6 we can see that connected frequently flooded is clearly higher scoring than the others, but also that connected rarely flooded is most likely higher than isolated or linear rarely flooded.
prep_accs <- df_crown_score %>% select(cafh) %>% distinct()
pred_accs <- cbind(prep_accs, 7*fitted(mod_accs, newdata = prep_accs)) %>% rename(lcl = Q2.5, ucl = Q97.5)
ggplot() +
geom_pointrange(data = pred_accs, aes(cafh, Estimate, ymin = lcl, ymax = ucl)) +
geom_jitter(data = df_crown_score, aes(cafh, accs), alpha = 0.2, width = 0.1) +
ylab('Average crown condition score') +
xlab(NULL) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Figure 6: A plot of the estimated average crown condition score for each connectivity and flood history category. The black point is the mean, the vertical line is the 95% credible interval.
In the results section you may want to produce Figure 6 as well as a table with the parameter estimates and the 95% credible intervals (see Table ??). Depending on the audience, maybe the parameter estimate go into an appendix.
tb_mod_accs <- cbind(data.frame(Component = c('Mean', 'Precision', rep('Mean', 4),
rep('Precision', 4)),
Parameter = c(rep('Intercept', 2),
rep(c('CAFH: Connected raely flooded',
'CAFH: Connected unflooded',
'CAFH: Isolated rarely flooded',
'CAFH: Linear rarely flooded'), 2))),
summary(mod_accs)$fixed %>%
select(Estimate, Est.Error, Lower = `l-95% CI`, Upper = `u-95% CI`)) %>%
as_tibble()
imp_accs <- (tb_mod_accs$Lower*tb_mod_accs$Upper > 0)
tb_mod_accs %>%
mutate(across(where(is.numeric), ~ round(.x, 3))) %>% arrange(Component) %>% flextable() %>%
set_header_labels(values = c('Component', 'Parameter', 'Estimate', 'S.E.', 'Lower', 'Upper')) %>%
add_header_row(colwidths = c(4, 2), values = c("", "95% Credible bound")) %>%
align( j=4:6, align='right', part='body' ) %>%
bold(i = imp_accs) %>%
ari_theme()
95% Credible bound | |||||
---|---|---|---|---|---|
Component | Parameter | Estimate | Est.Error | Lower | Upper |
Mean | Intercept | 0.197 | 0.095 | 0.014 | 0.380 |
Mean | CAFH: Connected raely flooded | -0.374 | 0.146 | -0.667 | -0.085 |
Mean | CAFH: Connected unflooded | -0.545 | 0.157 | -0.859 | -0.238 |
Mean | CAFH: Isolated rarely flooded | -0.600 | 0.117 | -0.826 | -0.366 |
Mean | CAFH: Linear rarely flooded | -0.620 | 0.105 | -0.829 | -0.411 |
Precision | Intercept | 2.675 | 0.258 | 2.140 | 3.143 |
Precision | CAFH: Connected raely flooded | 0.435 | 0.461 | -0.513 | 1.306 |
Precision | CAFH: Connected unflooded | 0.243 | 0.452 | -0.662 | 1.119 |
Precision | CAFH: Isolated rarely flooded | 1.958 | 0.575 | 0.731 | 2.998 |
Precision | CAFH: Linear rarely flooded | 2.770 | 0.561 | 1.583 | 3.794 |
The model shows that the connectivity and flood history categories have different Black Box canopy health. In particular connected and frequently flooded wetlands had a greater ACCS than all other wetland types (Figure 6). Additionally there is evidence to conclude that connected rarely flooded wetland have greater scores than isolated rarely flooded or linear rarely flooded sites.
A client is interested in determining the impact of excluding livestock from wetlands. To assess the effectiveness, a BACI study was conducted (that we have simplified here). There were 10 locations monitored before and after the intervention was completed. At each location there were 5 transect that were to continue with the current continual grazing regime, and 5 transects that that would have livestock excluded. The dataset for this study can be found in df_td_cm_press.RDS
.
We load the data, then plot the data to see how it is distributed (Figure 7). Hopefully you noticed that a lot of the data is in the first bin, which is at 0, then there is a dip before a mini-peak around 3 cm. If we look at the violin plots (Figure 8) you can see that there is a lot a data (at least 25% given the lower quartile is 0) at 0 cm and then a gap until after 1 cm. It is almost like the data is split into two groups, the places iti s zero and then the rest.
# A tibble: 6 × 7
location grazing exclusion transect year td_cm time
<chr> <chr> <fct> <chr> <int> <dbl> <fct>
1 P Press No exclusion P_1 1 0 Before
2 P Press No exclusion P_1 6 7.54 After
3 P Press Livestock exclusion P_1 1 6.59 Before
4 P Press Livestock exclusion P_1 6 0 After
5 P Press No exclusion P_2 1 0 Before
6 P Press No exclusion P_2 6 5.28 After
ggplot(df_td_cm_press, aes(td_cm)) +
geom_histogram(aes(y = after_stat(density)), alpha = 0.2) +
geom_density() +
xlab('Thatch depth (cm)') +
ylab('Density')
Figure 7: Combined histogram and density plot of the thatch depth.
ggplot(df_td_cm_press, aes(time, td_cm)) + geom_violin() +
geom_boxplot(fill = 'darkgrey', width = 0.1) +
geom_jitter(width = 0.1, height = 0) +
ylab('Thatch depth (cm)') +
xlab('Intervention schedule') +
facet_wrap(.~ exclusion)
Figure 8: Combined boxplot and Violin plots of the thatch depth by time and exclusion.
The response variable is thatch depth (\(\text{td}_i\)), which is a continuous measurement and two categorical explanatory variables, time period (\(t_i\)) and exclusion \(f_i\). The data is close to zero, so maybe a log-normal distribution could be used, except for the fact that we have lots (or any2) zeroes. A mixture model could account for those structural zeroes, in particular a hurdle model as there can be no random zeroes with a log-normal distribution. Therefore we end up with trying a hurdle log-normal model. Given the data was from repeated visits to the same locations and transects, it is appropriate to use a mixed model with transect nested within location.
Observational model:
\[ \begin{align} \text{td}_i &\sim \text{HLN}(\pi, \mu_i, \sigma^2) \\ \end{align} \]
where \(\pi\) is the probability of a non-zero answer, \(\mu\) is the mean-log and \(\sigma\) is the sd-log.
Process model:
The mean-log (\(\mu_i\)) is linked to the predictors via the identity link, while the sd-log (\(\sigma\)) and hurdle (\(\pi\)) are constants on the identity link:
In formal style:
\[ \begin{align} \mu_i &= \text{time}_i \times \text{exclusion}_i \end{align} \]
Formal style:
\[ \begin{align} \mu_i &= \beta_0 + \beta_1 \times t_i + \beta_2 \times f_i + \beta_3 \times t_i \times f_i \end{align} \]
where \(t_i = 0\) when time is before and 1 otherwise and \(f_i = 1\) if the transect is a exclusion transect and 0 otherwise.
From the model output below and Figure ?? the model looks to have converged.
mod_td_press
Family: hurdle_lognormal
Links: mu = identity; sigma = identity; hu = identity
Formula: td_cm ~ time * exclusion + (1 | location/transect)
Data: df_td_cm_press (Number of observations: 200)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~location (Number of levels: 10)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.42 0.14 0.22 0.77 1.00 1148
Tail_ESS
sd(Intercept) 1594
~location:transect (Number of levels: 50)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.22 0.05 0.12 0.32 1.00 1033
Tail_ESS
sd(Intercept) 1645
Regression Coefficients:
Estimate Est.Error l-95% CI
Intercept 1.34 0.15 1.03
timeAfter -0.16 0.07 -0.31
exclusionLivestockexclusion -0.06 0.08 -0.22
timeAfter:exclusionLivestockexclusion 1.26 0.12 1.03
u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.63 1.00 1204 1970
timeAfter -0.02 1.00 2943 2947
exclusionLivestockexclusion 0.11 1.00 2804 2836
timeAfter:exclusionLivestockexclusion 1.49 1.00 2423 2880
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.26 0.03 0.22 0.32 1.00 1788 2707
hu 0.45 0.03 0.38 0.52 1.00 6243 3297
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(mod_td_press)
Figure 11: The chains from each estimated parameter in the thatch depth model.
Figure 12: The chains from each estimated parameter in the thatch depth model.
The QQ plot (Figure 13) does not show any violations of the hurdle log-normal distribution assumption. The homogeneity plot is also fine.
check_brms(mod_td_press)
Figure 13: Residual plots for the thatch depth model.
The posterior predictive check looks okay (Figure 14).
pp_check(mod_td_press, ndraws = 100)
Figure 14: Posterior predictive plot for the thatch depth model.
Important parts of this model are that BACI interaction term provided strong evidence the the intervention is associated with increased thatch cover by a factor of 3.5 (95% CI from 2.8 to 4.4) and that we estimate that only 45% (95% CI from 38 to 52%) of the transects had some thatch.
Figure 15 is the unconditional estimate of thatch depth, and the difference in the interaction term is clear.
prep_td_press <- df_td_cm_press %>% filter(location == 'P', transect == 'P_1') %>%
select(time, exclusion) %>% distinct()
pred_td_press <- cbind(prep_td_press,
fitted(mod_td_press, newdata = prep_td_press, re_formula = NA)) %>%
rename(lcl = Q2.5, ucl = Q97.5) %>% mutate(time = factor(time, labels = c('Before intervention', 'After intervention')))
ggplot(pred_td_press, aes(exclusion, Estimate, group = time, shape = time, color = time)) +
geom_pointrange(aes(ymin = lcl, ymax = ucl), position = position_dodge(width = 0.2)) +
ylab('Thatch depth (cm)') +
xlab('Herbivore exclusion') +
theme(legend.title = element_blank())
Figure 15: A plot of the estimated thatch depth for each option. The black point is the mean, the vertical line is the 95% credible interval.
In the results section you may want to produce Figure 15 as well as a table with the parameter estimates and the 95% credible intervals (see Table ??). Depending on the audience, maybe the parameter estimate go into an appendix.
tb_mod_td_press <- rbind(cbind(data.frame(Component = 'Conditional: Fixed',
Parameter = c('Intercept', 'After intervention', 'Exclusion used',
'After intervention and exclusion used')),
summary(mod_td_press)$fixed %>%
select(Estimate, Est.Error, Lower = `l-95% CI`, Upper = `u-95% CI`)),
cbind(data.frame(Component = c('Conditional: Random', 'Hurdle'),
Parameter = c('Standard deviation: Residual', 'Probability of non-zero')),
summary(mod_td_press)$spec_pars %>%
select(Estimate, Est.Error, Lower = `l-95% CI`, Upper = `u-95% CI`)),
cbind(data.frame(Component = 'Conditional: Random',
Parameter = c('Standard deviation: Location', 'Standard deviation: Transect')),
rbind(summary(mod_td_press)$random$location,
summary(mod_td_press)$random$`location:transect`) %>%
select(Estimate, Est.Error, Lower = `l-95% CI`, Upper = `u-95% CI`))) %>% as_tibble() %>%
mutate(Component = factor(Component, levels =c('Hurdle', 'Conditional: Fixed', 'Conditional: Random'))) %>%
arrange(Component)
imp_td_press <- (tb_mod_td_press$Lower*tb_mod_td_press$Upper > 0) &
(tb_mod_td_press$Component == 'Conditional: Fixed')
tb_mod_td_press %>%
mutate(across(where(is.numeric), ~ round(.x, 3))) %>% flextable() %>%
set_header_labels(values = c('Component', 'Parameter', 'Estimate', 'S.E.', 'Lower', 'Upper')) %>%
add_header_row(colwidths = c(4, 2), values = c("", "95% Credible bound")) %>%
align( j=3:6, align='right', part='body' ) %>%
bold(i = imp_td_press) %>%
ari_theme()
95% Credible bound | |||||
---|---|---|---|---|---|
Component | Parameter | Estimate | Est.Error | Lower | Upper |
Hurdle | Probability of non-zero | 0.450 | 0.034 | 0.384 | 0.517 |
Conditional: Fixed | Intercept | 1.337 | 0.150 | 1.035 | 1.631 |
Conditional: Fixed | After intervention | -0.165 | 0.074 | -0.306 | -0.019 |
Conditional: Fixed | Exclusion used | -0.055 | 0.081 | -0.216 | 0.105 |
Conditional: Fixed | After intervention and exclusion used | 1.261 | 0.116 | 1.034 | 1.487 |
Conditional: Random | Standard deviation: Residual | 0.264 | 0.025 | 0.221 | 0.319 |
Conditional: Random | Standard deviation: Location | 0.416 | 0.144 | 0.221 | 0.774 |
Conditional: Random | Standard deviation: Transect | 0.217 | 0.051 | 0.116 | 0.318 |
The model shows that the exclusion of livestock increases the thatch depth and that over half the transect had no measurable thatch.
Researcher are interested in determining if tree recruitment in semi-arid woodlands is effected by environmental features at the site. In this study a total of 320 sites were sampled. The number of recruits (seedling or juveniles) and well as several environmental factors were recorded. The dataset for this study can be found in df_rec_saw.RDS
.
After we load the data we can explore the data. Figure 16 shows that there are qute a few zeroes and some relatively large values.
crust recruits
1 0.42 0
2 0.14 11
3 0.00 0
4 0.00 1
5 0.59 0
6 0.79 0
ggplot(df_rec_saw, aes(recruits)) +
geom_histogram(aes(y = after_stat(density)), alpha = 0.2) +
geom_density() +
xlab('Tree recruitment') +
ylab('Density')
Figure 16: Combined histogram and density of the number of recruits.
ggplot(df_rec_saw, aes(crust, recruits)) +
geom_jitter() +
ylab('Tree recruitment') +
xlab('Soil crust (%)')
Figure 17: Scatter plot of the number of recruits compared to the percentage of soil crust.
The data is count data and from Figure 16 it would seem that there are some quite large values compared to the majority of the data. There is also a very large peak at zero, which may or may not be explainable by the model. Therefore with those things together we should try an initial model of a zero-inflated negative binomial. The zero-inflation will account for any structural zeroes,while still allowing for random zeroes. The negative binomial as the very large values may be examples of “clumping”.
Observational model:
\[ \begin{align} y_i &\sim \text{ZINB}(\pi, \mu_i, \rho) \\ \end{align} \] Process model:
The mean (\(\mu_i\)) linked to the predictors via the log link, while the shape or aggregation (\(\rho\)) and the zero-inflation (\(\pi\)) are constants and use the identity link.
In formal style:
\[ \begin{align} log(\mu_i) &= \text{crust}_i \end{align} \]
Formal style:
\[ \begin{align} log(\mu_i) &= \alpha + \beta x_i \end{align} \]
where \(x_i\) if the proportion of soil crust at that site.
mod_rec_zinb <- brm(recruits ~ z_crust, family = 'zero_inflated_negbinomial', data = df_rec_saw, cores = 4)
From the model output below and Figure 18 the model looks to have converged.
mod_rec_zinb
Family: zero_inflated_negbinomial
Links: mu = log; shape = identity; zi = identity
Formula: recruits ~ z_crust
Data: df_rec_saw (Number of observations: 320)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 2.37 0.17 2.05 2.70 1.00 1102 1329
z_crust -0.26 0.10 -0.45 -0.07 1.00 2169 2052
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 0.27 0.06 0.18 0.42 1.00 829 841
zi 0.17 0.09 0.01 0.35 1.00 767 1071
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
plot(mod_rec_zinb)
Figure 18: The chains from each estimated parameter in the recruits model.
The QQ plot (Figure 19) does not show any violations of the beta distribution assumption. The homogeneity plot is also fine, which is good, since the model allowed the spread to different between levels.
check_brms(mod_rec_zinb)
Figure 19: Residual plots for the recruits model.
The posterior predictive check looks okay (Figure 20).
pp_check(mod_rec_zinb, ndraws = 100)
Figure 20: Posterior predictive plot for the recruits model.
Important parts of this model are that slope term provided strong evidence as soil crust increases the recruitment rate decreased, but have important is that decrease in the overall system,and that zero-inflation may have been justified, as an estimated 17% (95% CI from 1 to 35%) of site have no recruitment through structural zeroes. In Figure 21 we can see that there is still a high degree of variability that the model has not accounted for.
prep_rec_zinb <- data.frame(crust = seq(0, 0.9, 0.025)) %>% mutate(z_crust = (crust - 0.25)/0.25)
pred_rec_zinb <- cbind(prep_rec_zinb,
fitted(mod_rec_zinb, newdata = prep_rec_zinb)) %>%
rename(lcl = Q2.5, ucl = Q97.5)
ggplot() +
geom_ribbon(data = pred_rec_zinb, aes(crust, ymin = lcl, ymax = ucl), alpha = 0.2) +
geom_line(data = pred_rec_zinb, aes(crust, Estimate)) +
geom_jitter(data=df_rec_saw, aes(crust, recruits)) +
ylab('Tree recruitment') +
xlab('Soil crust (%)')
Figure 21: A plot of the estimated recruits. The black line is the mean, and the shade area is the 95% credible interval.
In the results section you may want to produce Figure 21 as well as a table with the parameter estimates and the 95% credible intervals (see Table ??). Depending on the audience, maybe the parameter estimate go into an appendix.
tb_mod_rec_zinb <- rbind(data.frame(Source = 'Negative binomial', Parameter = c('Intercept', 'Soil crust'), summary(mod_rec_zinb)$fixed[, 1:4]),
data.frame(Source = c('Negative binomial', 'Zero-inflation'), Parameter = c('Aggregation', 'Zero-inflation'),
summary(mod_rec_zinb)$spec_pars[, 1:4])) %>% rename(Lower = `l.95..CI`, Upper = `u.95..CI`) %>% as_tibble()
imp_mod_rec_zinb <- (tb_mod_rec_zinb$Lower*tb_mod_rec_zinb$Upper > 0)
tb_mod_rec_zinb %>%
mutate(across(where(is.numeric), ~ round(.x, 3))) %>% flextable() %>%
set_header_labels(values = c('Source', 'Parameter', 'Estimate', 'S.E.', 'Lower', 'Upper')) %>%
add_header_row(colwidths = c(4, 2), values = c("", "95% Credible bound")) %>%
align( j=3:6, align='right', part='body' ) %>%
bold(i = imp_mod_rec_zinb) %>%
ari_theme()
95% Credible bound | |||||
---|---|---|---|---|---|
Source | Parameter | Estimate | Est.Error | Lower | Upper |
Negative binomial | Intercept | 2.368 | 0.170 | 2.048 | 2.700 |
Negative binomial | Soil crust | -0.264 | 0.100 | -0.452 | -0.073 |
Negative binomial | Aggregation | 0.270 | 0.064 | 0.178 | 0.419 |
Zero-inflation | Zero-inflation | 0.169 | 0.094 | 0.009 | 0.346 |
The model shows that the amount of soil crust may influence the amount of recruits present, but there is something else driving the system (Figure 21). Additionally there is evidence that at 17% of sites have no recruitment over the baseline level expected.
Just one group exercise today. The data involves pointing data in Volcanic Plains Grasslands. Of interest is the effect of different regimes on the proportion of the landscape with themeda present. At 15 locations 5 transects were monitored annually for 5 years. On each transect 50 points were assessed as to the presence of themeda. Additionally, at the commencement of the monitoring, the soil phosphorus was determined at each location. Can you explore any pattern in the data? How would you shape the data to help with your analysis? The data set is ex_them.RDS
.
Ordinal data is ordered categorical data. A common example would be the Likert scales (e.g. strongly disagree, disagree, neutral, agree, strongly agree).↩︎
If we only had a couple of zeroes and thousands of observations, I would be tempted to either exclude these observations or use some other approach to avoid making the model much more complex.↩︎