--- title: "Introduction to taxdiv" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to taxdiv} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 8, fig.height = 5 ) ``` ## Overview The **taxdiv** package provides tools for calculating taxonomic diversity indices that incorporate the hierarchical structure of biological classification. While traditional diversity indices like Shannon and Simpson only consider species abundances, taxonomic diversity measures account for the evolutionary and ecological relationships among species. The package implements three main approaches: 1. **Classical diversity indices**: Shannon and Simpson 2. **Clarke & Warwick taxonomic distinctness**: Delta, Delta*, AvTD, VarTD 3. **Ozkan (2018) Deng entropy-based diversity**: pTO and pTO+ ```{r setup} library(taxdiv) ``` ## Example Data: Mediterranean Forest Community We will use a hypothetical Mediterranean forest community to demonstrate the package functionality. This community has 10 tree and shrub species with known abundances and a 4-level taxonomic hierarchy. ```{r data} # Species abundances (individuals per plot) community <- c( Quercus_coccifera = 25, Quercus_infectoria = 18, Pinus_brutia = 30, Pinus_nigra = 12, Juniperus_excelsa = 8, Juniperus_oxycedrus = 6, Arbutus_andrachne = 15, Styrax_officinalis = 4, Cercis_siliquastrum = 3, Olea_europaea = 10 ) # Taxonomic hierarchy tax_tree <- build_tax_tree( species = names(community), Genus = c("Quercus", "Quercus", "Pinus", "Pinus", "Juniperus", "Juniperus", "Arbutus", "Styrax", "Cercis", "Olea"), Family = c("Fagaceae", "Fagaceae", "Pinaceae", "Pinaceae", "Cupressaceae", "Cupressaceae", "Ericaceae", "Styracaceae", "Fabaceae", "Oleaceae"), Order = c("Fagales", "Fagales", "Pinales", "Pinales", "Pinales", "Pinales", "Ericales", "Ericales", "Fabales", "Lamiales") ) tax_tree ``` ## Classical Diversity Indices Shannon and Simpson indices measure diversity based solely on species abundance distribution: ```{r classical} # Shannon diversity (natural log) H <- shannon(community) cat("Shannon H':", round(H, 4), "\n") # Simpson indices D <- simpson(community, type = "dominance") GS <- simpson(community, type = "gini_simpson") inv_D <- simpson(community, type = "inverse") cat("Simpson dominance (D):", round(D, 4), "\n") cat("Gini-Simpson (1-D):", round(GS, 4), "\n") cat("Inverse Simpson (1/D):", round(inv_D, 4), "\n") ``` These indices tell us about the evenness and richness of the community, but they do not consider that some species are taxonomically more distant from each other than others. ## Taxonomic Distance Before computing taxonomic diversity, we can examine the pairwise taxonomic distances between species: ```{r distance} dist_mat <- tax_distance_matrix(tax_tree) round(dist_mat, 2) ``` Species in the same genus (e.g., *Quercus coccifera* and *Q. infectoria*) have distance 0 at all shared taxonomic levels, while species in different orders have the maximum distance. ## Clarke & Warwick Taxonomic Distinctness The Clarke & Warwick framework provides abundance-weighted and presence/absence-based measures: ```{r cw} # Delta: average taxonomic diversity (abundance-weighted) d <- delta(community, tax_tree) cat("Delta (taxonomic diversity):", round(d, 4), "\n") # Delta*: taxonomic distinctness (abundance-weighted, excludes same-species) ds <- delta_star(community, tax_tree) cat("Delta* (taxonomic distinctness):", round(ds, 4), "\n") # AvTD (Delta+): average taxonomic distinctness (presence/absence) spp <- names(community) avg_td <- avtd(spp, tax_tree) cat("AvTD (Delta+):", round(avg_td, 4), "\n") # VarTD (Lambda+): variation in taxonomic distinctness var_td <- vartd(spp, tax_tree) cat("VarTD (Lambda+):", round(var_td, 4), "\n") ``` ## Deng Entropy and Ozkan pTO The Deng entropy framework (Deng, 2016) generalizes Shannon entropy through Dempster-Shafer evidence theory. At each taxonomic level, the mass function accounts for the number of species within each group (the "focal element size"), giving more weight to groups that contain more species. Ozkan (2018) uses Deng entropy to construct four measures: - **uTO**: Unweighted taxonomic diversity (uses slicing procedure) - **TO**: Weighted taxonomic diversity (taxonomic levels weighted by rank) - **uTO+**: Unweighted taxonomic distance (presence/absence, nk=0 only) - **TO+**: Weighted taxonomic distance ```{r pto} result <- ozkan_pto(community, tax_tree) cat("uTO (unweighted diversity):", round(result$uTO, 4), "\n") cat("TO (weighted diversity):", round(result$TO, 4), "\n") cat("uTO+ (unweighted distance):", round(result$uTO_plus, 4), "\n") cat("TO+ (weighted distance):", round(result$TO_plus, 4), "\n") ``` The Deng entropy values at each taxonomic level reveal the contribution of each hierarchical level to overall diversity: ```{r ed_levels} cat("Deng entropy at each level:\n") for (i in seq_along(result$Ed_levels)) { cat(" ", names(result$Ed_levels)[i], ":", round(result$Ed_levels[i], 4), "\n") } ``` At the species level, Ed equals ln(S) = ln(10) since all present species receive equal weight. At higher levels, the Deng correction term (2^|Fi| - 1) increases entropy for groups containing more species. ### Convenience wrapper The `pto_components()` function returns all four values as a named vector: ```{r components} pto_components(community, tax_tree) ``` ## Comparing Communities Taxonomic diversity measures are most useful when comparing communities. Here we compare our Mediterranean community with a species-poor community: ```{r compare} # Degraded community (fewer species, less taxonomic breadth) degraded <- c( Quercus_coccifera = 40, Pinus_brutia = 35, Juniperus_oxycedrus = 10 ) tax_degraded <- tax_tree[tax_tree$Species %in% names(degraded), ] cat("=== Original community (10 species) ===\n") cat("Shannon:", round(shannon(community), 4), "\n") r1 <- ozkan_pto(community, tax_tree) cat("uTO+:", round(r1$uTO_plus, 4), "\n") cat("TO+:", round(r1$TO_plus, 4), "\n\n") cat("=== Degraded community (3 species) ===\n") cat("Shannon:", round(shannon(degraded), 4), "\n") r2 <- ozkan_pto(degraded, tax_degraded) cat("uTO+:", round(r2$uTO_plus, 4), "\n") cat("TO+:", round(r2$TO_plus, 4), "\n") ``` The degraded community shows lower diversity across all measures, but the taxonomic indices (uTO+, TO+) capture not only the loss of species richness but also the narrowing of taxonomic breadth. ## Stochastic Resampling (Run 2) The deterministic pTO calculation (Run 1) uses all species as they are. But how sensitive is the result to community composition? Run 2 explores this by randomly including or excluding each species with 50% probability across many iterations, then taking the maximum pTO value observed. ```{r run2} set.seed(42) run2 <- ozkan_pto_resample(community, tax_tree, n_iter = 101, seed = 42) cat("=== Run 1 (deterministic) ===\n") cat("uTO+:", round(run2$uTO_plus_det, 4), "\n") cat("TO+: ", round(run2$TO_plus_det, 4), "\n") cat("uTO: ", round(run2$uTO_det, 4), "\n") cat("TO: ", round(run2$TO_det, 4), "\n\n") cat("=== Run 2 (max across", run2$n_iter, "iterations) ===\n") cat("uTO+:", round(run2$uTO_plus_max, 4), "\n") cat("TO+: ", round(run2$TO_plus_max, 4), "\n") cat("uTO: ", round(run2$uTO_max, 4), "\n") cat("TO: ", round(run2$TO_max, 4), "\n") ``` The maximum values from Run 2 are always >= the deterministic values from Run 1, because the deterministic calculation is included as the first iteration. We can examine how the pTO values vary across iterations: ```{r run2_summary} iter_df <- run2$iteration_results cat("uTO+ range:", round(min(iter_df$uTO_plus), 4), "to", round(max(iter_df$uTO_plus), 4), "\n") cat("TO+ range:", round(min(iter_df$TO_plus), 4), "to", round(max(iter_df$TO_plus), 4), "\n") ``` ## Sensitivity Analysis (Run 3) Run 3 refines the exploration by assigning species-specific inclusion probabilities based on the Run 2 results. Species that contributed to higher diversity in Run 2 get different inclusion probabilities than those that did not. ```{r run3} run3 <- ozkan_pto_sensitivity(community, tax_tree, run2, seed = 123) cat("=== Run 3 (sensitivity analysis) ===\n") cat("Run 3 max uTO+:", round(run3$run3_uTO_plus_max, 4), "\n") cat("Run 3 max TO+: ", round(run3$run3_TO_plus_max, 4), "\n\n") cat("=== Overall max (across Run 1 + 2 + 3) ===\n") cat("uTO+:", round(run3$uTO_plus_max, 4), "\n") cat("TO+: ", round(run3$TO_plus_max, 4), "\n") cat("uTO: ", round(run3$uTO_max, 4), "\n") cat("TO: ", round(run3$TO_max, 4), "\n") ``` The overall maximum across all three runs represents the "potential" taxonomic diversity of the community — the highest diversity that can be observed under different species compositions derived from the original community. ### Species Inclusion Probabilities Run 3 assigns each species a probability of being included in each iteration: ```{r species_probs} probs <- run3$species_probs prob_df <- data.frame( Species = names(probs), Probability = round(probs, 4) ) print(prob_df, row.names = FALSE) ``` ## Full Pipeline Summary The complete Ozkan (2018) analysis pipeline runs three stages: ```{r pipeline_summary} cat("Pipeline: Run 1 -> Run 2 -> Run 3\n\n") cat(" uTO+ TO+ uTO TO\n") cat("Run 1:", sprintf("%9.4f %9.4f %9.4f %9.4f", run2$uTO_plus_det, run2$TO_plus_det, run2$uTO_det, run2$TO_det), "\n") cat("Run 2:", sprintf("%9.4f %9.4f %9.4f %9.4f", run2$uTO_plus_max, run2$TO_plus_max, run2$uTO_max, run2$TO_max), "\n") cat("Run 3:", sprintf("%9.4f %9.4f %9.4f %9.4f", run3$uTO_plus_max, run3$TO_plus_max, run3$uTO_max, run3$TO_max), "\n") ``` Each subsequent run finds values >= the previous run, reflecting the increasing exploration of the diversity landscape. ## Rarefaction Rarefaction curves allow you to evaluate whether your sampling effort is sufficient to capture the diversity of the community. The `rarefaction_taxonomic()` function computes bootstrap-based rarefaction for any index supported by taxdiv: ```{r rarefaction} rare <- rarefaction_taxonomic(community, tax_tree, index = "shannon", steps = 10, n_boot = 100, seed = 42) cat("Rarefaction results (Shannon):\n") print(round(rare, 4)) ``` The curve should reach a plateau if sampling is adequate. A steep curve at the maximum sample size indicates that more sampling is needed. ## Visualization The taxdiv package provides seven specialized plot types built on ggplot2. Each plot answers a different analytical question about taxonomic diversity. ### Taxonomic Tree (Dendrogram) Visualizes the taxonomic hierarchy of species as a dendrogram. Species on the same branch share closer taxonomic classification: ```{r dendrogram, fig.width=9, fig.height=5.5, fig.alt="Dendrogram showing species grouped by family with abundance labels"} plot_taxonomic_tree(tax_tree, community = community, color_by = "Family", label_size = 3.5, title = "Mediterranean Forest - Taxonomic Tree") ``` **How to read:** Species emerging from the same branch are taxonomically close. Numbers in parentheses show abundance. Colors indicate family groupings. Longer branches represent greater taxonomic distance. ### Taxonomic Distance Heatmap Displays the pairwise taxonomic distance matrix as a color grid: ```{r heatmap, fig.width=9, fig.height=8, fig.alt="Heatmap showing pairwise taxonomic distances between species"} plot_heatmap(tax_tree, label_size = 2.8, title = "Taxonomic Distance Heatmap") ``` **How to read:** Dark red cells indicate distant species pairs, white cells indicate closely related species. Each cell value is the taxonomic distance (0 = same genus, higher = more distant). The diagonal is always zero. ### Community Comparison (Bar Plot) Comparing diversity indices across communities reveals how different disturbance types affect taxonomic structure. We create a second community where one species dominates: ```{r comparison_data} # Dominant community: same species, uneven abundances dominant_community <- c( Quercus_coccifera = 80, Quercus_infectoria = 5, Pinus_brutia = 3, Pinus_nigra = 2, Juniperus_excelsa = 2, Juniperus_oxycedrus = 1, Arbutus_andrachne = 3, Styrax_officinalis = 1, Cercis_siliquastrum = 2, Olea_europaea = 1 ) communities <- list( Diverse = community, Dominant = dominant_community ) ``` ```{r comparison_plot, fig.width=10, fig.height=6, fig.alt="Bar chart comparing 14 diversity indices between diverse and dominant communities"} comparison <- compare_indices(communities, tax_tree, plot = TRUE) comparison$plot ``` **How to read:** Abundance-weighted indices (Shannon, Simpson, Delta, TO) differ markedly between the two communities because they detect the uneven distribution. Presence/absence indices (AvTD, VarTD, uTO+, TO+) remain identical because both communities have the same species list. This illustrates why both types of measures are needed for a complete assessment. Index values: ```{r comparison_table} comparison$table ``` ### Iteration Plot (Run 2) Shows how pTO values change across stochastic resampling iterations: ```{r iteration, fig.width=9, fig.height=5, fig.alt="Scatter plot showing TO values across 101 stochastic resampling iterations"} plot_iteration(run2, component = "TO", title = "Run 2 - TO Values Across Iterations") ``` **How to read:** - **Grey dots**: TO value for each iteration (random species subset) - **Red line**: Deterministic value from Run 1 (all species included) - **Blue line**: Maximum value found across all iterations Low points correspond to iterations where few species were included (lower diversity). Points above the red line indicate subcommunities with higher diversity than the full community, which occurs when removing certain species increases the ratio of between-group to within-group diversity. ### Bubble Chart Shows each species' contribution to community diversity based on abundance and average taxonomic distance to all other species: ```{r bubble, fig.width=10, fig.height=7, fig.alt="Bubble chart showing species contributions to diversity colored by family"} plot_bubble(community, tax_tree, color_by = "Family", title = "Species Contributions to Diversity") ``` **How to read:** - **X-axis**: Abundance (number of individuals) - **Y-axis**: Average taxonomic distance to all other species - **Bubble size**: Contribution weight (abundance x distance) Species in the upper right corner contribute most to taxonomic diversity (both abundant and taxonomically distinct). An isolated species with low abundance (lower right) contributes less than a taxonomically unique species that is also abundant. ### Radar Chart (Spider Plot) Provides a multivariate comparison of all indices simultaneously: ```{r radar, fig.width=8, fig.height=8, fig.alt="Radar chart comparing normalized diversity index values between diverse and dominant communities"} plot_radar(communities, tax_tree, title = "Diverse vs Dominant - Radar Comparison") ``` **How to read:** Each axis represents one diversity index, normalized to 0-1 range. A larger area indicates higher overall diversity. The shape reveals which aspects of diversity differ between communities. When the diverse and dominant community polygons overlap on presence/absence indices but diverge on abundance-weighted indices, it confirms that both communities share the same species pool but differ in evenness. ### Rarefaction Curve Visualizes how diversity estimates change with increasing sample size: ```{r rarefaction_plot, fig.width=8, fig.height=5, fig.alt="Rarefaction curve for Shannon index with bootstrap confidence interval"} plot_rarefaction(rare) ``` **How to read:** The x-axis shows the number of individuals sampled, the y-axis shows the estimated diversity index. The shaded band is the 95% bootstrap confidence interval. If the curve reaches a plateau at maximum sample size, your sampling effort is sufficient. A steep curve at the right edge suggests more sampling is needed to capture the full diversity. ## References - Deng, Y. (2016). Deng entropy. *Chaos, Solitons & Fractals*, 91, 549-553. - Ozkan, K. (2018). A new proposed measure for estimating taxonomic diversity. *Turkish Journal of Forestry*, 19(4), 336-346. - Clarke, K.R. & Warwick, R.M. (1998). A taxonomic distinctness index and its statistical properties. *Journal of Applied Ecology*, 35, 523-531. - Warwick, R.M. & Clarke, K.R. (1995). New 'biodiversity' measures reveal a decrease in taxonomic distinctness with increasing stress. *Marine Ecology Progress Series*, 129, 301-305.