---
title: "2. Efficient visPedigree Workflows"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{2. Efficient visPedigree Workflows}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6.5,
  fig.height = 6,
  dpi = 96,
  out.width = "100%"
)
```

This vignette summarizes efficient day-to-day workflows for `visPedigree` after the `tidyped` architecture updates. The goal is simple:

1. tidy once,
2. reuse the resulting `tidyped` object many times,
3. subset safely,
4. trace candidates explicitly when pedigree completeness matters.

For basic tidying, see `tidy-pedigree`. For downstream statistics, see `pedigree-analysis`.

## 1. Load packages and example data

```{r setup}
library(visPedigree)
library(data.table)
data(simple_ped, package = "visPedigree")
```

## 2. Tidy once, reuse many times

The most efficient workflow is to create a master `tidyped` object once and reuse it for plotting, tracing, inbreeding, and matrix calculations.

```{r tidy-once}
tp_master <- tidyped(simple_ped)
class(tp_master)
is_tidyped(tp_master)
pedmeta(tp_master)
```

This avoids repeated validation, founder insertion, loop checking, generation assignment, and integer re-indexing.

## 3. Fast repeated tracing from an existing `tidyped`

When the input is already a `tidyped` object and `cand` is supplied, `tidyped()` now uses a fast path. It skips the expensive global preprocessing steps and directly traces the requested candidates.

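
A rough way to check the fast path on your own data is to time both entry points. The sketch below assumes the `simple_ped` data used throughout this vignette; the object names `t_raw` and `t_fast` are illustrative only. On a small example pedigree the difference may be negligible or lost in timer noise, so treat the numbers as indicative rather than as a benchmark.

```{r fast-trace-timing}
library(visPedigree)
data(simple_ped, package = "visPedigree")

# Build the master object once.
tp_master <- tidyped(simple_ped)

# Full path: tidy the raw pedigree again, then trace the candidate.
t_raw <- system.time(
  tp_from_raw <- tidyped(simple_ped, cand = "J5X804", trace = "up")
)

# Fast path: trace directly from the existing tidyped object.
t_fast <- system.time(
  tp_from_master <- tidyped(tp_master, cand = "J5X804", trace = "up")
)

# Elapsed seconds for each route.
c(raw = t_raw[["elapsed"]], fast = t_fast[["elapsed"]])
```

For a more reliable comparison on a large pedigree, repeat each call several times and compare the medians.
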
```{r fast-trace}
tp_up <- tidyped(tp_master, cand = "J5X804", trace = "up", tracegen = 2)
tp_down <- tidyped(tp_master, cand = "J0Z990", trace = "down")

has_candidates(tp_up)
tp_up[, .(Ind, Sire, Dam, Cand)]
```

Recommended pattern:

```{r fast-trace-pattern, eval = FALSE}
# expensive once
# tp_master <- tidyped(raw_ped)

# cheap many times
# tp_a <- tidyped(tp_master, cand = ids_a, trace = "up")
# tp_b <- tidyped(tp_master, cand = ids_b, trace = "all", tracegen = 3)
# tp_c <- tidyped(tp_master, cand = ids_c, trace = "down")
```

## 4. Safe `data.table` usage on `tidyped`

A `tidyped` object is also a `data.table`, so by-reference workflows remain available.

### 4.1 Adding new columns is safe

```{r dt-modify}
tp_work <- copy(tp_master)
tp_work[, phenotype := seq_len(.N)]
class(tp_work)
head(tp_work[, .(Ind, phenotype)])
```

The `tidyped` class is preserved after `:=` operations.

### 4.2 Incomplete row subsetting now degrades safely

If row filtering removes required parents, the result is no longer a complete pedigree. In that case the object is downgraded to a plain `data.table` with a warning.

```{r incomplete-subset}
ped_year <- data.table(
  Ind = c("A", "B", "C", "D"),
  Sire = c(NA, NA, "A", "C"),
  Dam = c(NA, NA, "B", "B"),
  Year = c(2000, 2000, 2005, 2006)
)

tp_year <- tidyped(ped_year)
sub_dt <- tp_year[Year > 2005]
class(sub_dt)
sub_dt
```

This behavior prevents invalid integer pedigree indices from silently reaching C++ code.

Completeness-sensitive analyses now fail fast on such truncated subsets:

```{r incomplete-subset-error, error = TRUE}
inbreed(sub_dt)
```

### 4.3 Use explicit tracing when you need a valid sub-pedigree

If the goal is to keep a structurally valid pedigree around focal individuals, use candidate tracing instead of ad hoc row filtering.

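
To see why tracing succeeds where filtering fails, here is a conceptual sketch in plain R of what collecting ancestors involves. It is not the package's implementation: `trace_up()` is a hypothetical helper written for illustration, and `ped` is a plain `data.frame` copy of the `ped_year` table above.

```{r trace-up-sketch}
# Same toy pedigree as `ped_year`, as a plain data.frame.
ped <- data.frame(
  Ind  = c("A", "B", "C", "D"),
  Sire = c(NA, NA, "A", "C"),
  Dam  = c(NA, NA, "B", "B"),
  Year = c(2000, 2000, 2005, 2006),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the focal individuals plus all of their ancestors.
trace_up <- function(ped, ids) {
  keep <- character(0)
  frontier <- ids
  while (length(frontier) > 0) {
    keep <- union(keep, frontier)
    rows <- ped[ped$Ind %in% frontier, ]
    parents <- unique(c(rows$Sire, rows$Dam))
    # Visit only known parents that have not been collected yet.
    frontier <- setdiff(parents[!is.na(parents)], keep)
  }
  ped[ped$Ind %in% keep, ]
}

# Row filtering keeps D but drops its parents: C and B are reported missing.
filtered <- ped[ped$Year > 2005, ]
setdiff(c(filtered$Sire, filtered$Dam), filtered$Ind)

# Tracing keeps every ancestor, so the set of missing parents is empty.
traced <- trace_up(ped, "D")
setdiff(na.omit(c(traced$Sire, traced$Dam)), traced$Ind)
```

The package call in the next chunk performs the equivalent extraction and returns a `tidyped` object rather than a bare table.
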
```{r explicit-tracing}
valid_sub_tp <- tidyped(tp_year, cand = "D", trace = "up")
class(valid_sub_tp)
valid_sub_tp[, .(Ind, Sire, Dam, Cand)]
```

Then compute on the valid sub-pedigree and, if needed, filter the final result back to the focal individuals:

```{r explicit-analysis}
inbreed(valid_sub_tp)[Ind == "D", .(Ind, f)]
```

## 5. `splitped()` versus `pedsubpop()`

These two functions serve different purposes.

- `splitped()` returns the actual split pedigree objects.
- `pedsubpop()` returns a summary table.

```{r split-vs-summary}
sub_tps <- splitped(tp_master)
length(sub_tps)
class(sub_tps[[1]])

pedsubpop(tp_master)
```

Use `splitped()` when you need downstream analysis on each component. Use `pedsubpop()` when you only need the component summary.

## 6. Use accessors instead of manual attribute checks

The updated accessors are the preferred way to inspect object state.

```{r accessors}
tp_f <- inbreed(tp_master)

is_tidyped(tp_f)
has_inbreeding(tp_f)
has_candidates(tp_f)
pedmeta(tp_f)
```

This is preferable to hand-written checks such as `"f" %in% names(tp)` or manual attribute access scattered throughout user code.

## 7. Recommended high-efficiency workflow

A practical pattern for large pedigrees is:

```{r best-practice, eval = FALSE}
# 1. build one validated master object
# tp_master <- tidyped(raw_ped)

# 2. add analysis-specific columns in place
# tp_master[, phenotype := pheno_vector]
# tp_master[, cohort := year_vector]

# 3. extract valid candidate sub-pedigrees explicitly
# tp_sel <- tidyped(tp_master, cand = selected_ids, trace = "up", tracegen = 3)

# 4. run downstream analysis on either the full master or traced sub-pedigree
# pedstats(tp_master)
# pedmat(tp_sel)
# inbreed(tp_sel)
# visped(tp_sel)

# 5. split only when disconnected components really matter
# comps <- splitped(tp_master)
```

## 8. Practical rules of thumb

1. Call `tidyped()` on raw pedigree data once.
2. Reuse the resulting `tidyped` object as the master pedigree.
3. Use `tidyped(tp_master, cand = ...)` for valid local extraction.
4. Use ordinary row filtering only when a plain `data.table` result is acceptable.
5. Use `splitped()` for actual component objects and `pedsubpop()` for summaries.
6. Use `pedmeta()`, `is_tidyped()`, `has_inbreeding()`, and `has_candidates()` to inspect object state.

These rules keep workflows fast, explicit, and structurally safe.
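
Rules 4 and 6 can be combined into a small guard placed at the top of your own analysis functions. The sketch below is one possible shape, not part of `visPedigree`: `require_tidyped()` is a hypothetical helper, and it assumes that a truncated subset reports `FALSE` from `is_tidyped()`, as described in section 4.2.

```{r guard-sketch}
library(visPedigree)
library(data.table)

# Hypothetical helper (not part of visPedigree): stop early with a clear
# message when an object is no longer a tidyped pedigree.
require_tidyped <- function(x) {
  if (!is_tidyped(x)) {
    stop("Not a tidyped object; re-extract with tidyped(tp, cand = ...).",
         call. = FALSE)
  }
  invisible(x)
}

ped_year <- data.table(
  Ind = c("A", "B", "C", "D"),
  Sire = c(NA, NA, "A", "C"),
  Dam = c(NA, NA, "B", "B"),
  Year = c(2000, 2000, 2005, 2006)
)
tp_year <- tidyped(ped_year)

# A complete pedigree passes the guard silently.
require_tidyped(tp_year)

# A truncated subset is caught before it reaches any analysis code.
# The downgrade warning is suppressed here only to keep the output short.
sub_dt <- suppressWarnings(tp_year[Year > 2005])
res <- tryCatch(require_tidyped(sub_dt), error = function(e) conditionMessage(e))
res
```

Failing at the guard keeps the error message close to the filtering step that caused it, which is usually easier to debug than an error raised deep inside a downstream calculation.
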