---
title: "GPCM scope and current limitations"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{GPCM scope and current limitations}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
is_cran_check <- !isTRUE(as.logical(Sys.getenv("NOT_CRAN", "false")))
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  eval = !is_cran_check
)
```

`mfrmr` includes a bounded implementation of the Generalized Partial Credit Model (GPCM; Muraki 1992). The estimator is fully functional. Some downstream reporting helpers, however, are caveated or restricted, because score-side semantics under free discrimination differ from the Rasch-family case. This vignette documents which helpers are supported, which are supported only with caveats, and which are restricted. It also lists substitutes for the restricted helpers.

## Before fitting: model-choice triage

Do not choose `GPCM` only because it is the most flexible model in the menu. Start with the score interpretation.

| Model | Use when | Main risk if over-used |
|---|---|---|
| `RSM` | The rating scale is intended to share one category-threshold structure across the step facet. | Real threshold differences can be hidden in residual diagnostics. |
| `PCM` | Thresholds may differ by item, criterion, task, or another designated step facet, but rating events should still contribute equally after conditioning on the modeled facets. | It can absorb threshold heterogeneity without asking whether some levels are more discriminating. |
| bounded `GPCM` | The analysis explicitly allows discrimination-based reweighting and treats slopes as part of the substantive sensitivity question. | Better statistical fit can be mistaken for a better operational scoring rule. |

This ordering matters for reporting. `RSM` and `PCM` are the package's equal-weighting reference routes; bounded `GPCM` is a slope-aware extension.
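
The following base-R sketch shows what "slope-aware" means for a single rating event by computing GPCM category probabilities. It assumes a simplified parameterization. `theta` stands for the person measure minus the summed facet locations, `a` is the discrimination of the slope-facet element, and `tau` holds the step thresholds. The helper names are hypothetical and do not reproduce mfrmr's internal parameterization. Setting `a = 1` recovers the partial credit model.

```r
# Hypothetical helper (not part of mfrmr): GPCM category probabilities,
# P(X = k) proportional to exp(sum over v <= k of a * (theta - tau_v)).
gpcm_probs <- function(theta, a, tau) {
  # Category 0 has an empty sum; categories 1..K accumulate a * (theta - tau_v).
  eta <- c(0, cumsum(a * (theta - tau)))
  p <- exp(eta - max(eta))  # subtract the maximum for numerical stability
  p / sum(p)
}

# Hypothetical PCM counterpart: the same model with the slope fixed at 1.
pcm_probs <- function(theta, tau) {
  eta <- c(0, cumsum(theta - tau))
  exp(eta) / sum(exp(eta))
}

tau <- c(-1, 0, 1.2)

# With a = 1 the two functions agree.
p_pcm  <- pcm_probs(0.5, tau)
p_gpcm <- gpcm_probs(0.5, a = 1, tau = tau)

# A larger slope sharpens the category distribution around its mode.
p_steep <- gpcm_probs(0.5, a = 1.5, tau = tau)
round(rbind(pcm = p_pcm, gpcm_a1 = p_gpcm, gpcm_a1.5 = p_steep), 3)
```

Under the GPCM, the sufficient statistic for the person measure weights each rating by its slope. High-slope elements therefore count for more, which is the discrimination-based reweighting that the model-choice table asks you to justify.
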
If equal contribution of items, criteria, or raters is part of the validity argument, a better-fitting bounded `GPCM` should be reported as sensitivity evidence rather than as an automatic replacement.

## Report wording templates

Use wording that matches the model actually fitted:

- `RSM`: "We fit a many-facet rating-scale Rasch model, treating category thresholds as common across the step facet."
- `PCM`: "We fit a many-facet partial-credit Rasch model, allowing thresholds to vary by the designated step facet while retaining equal discrimination."
- bounded `GPCM`: "We fit a bounded generalized partial-credit many-facet model as a slope-aware sensitivity analysis; interpretation focused on whether discrimination-based reweighting changed the substantive conclusions."

Do not say that bounded `GPCM` "improves the score" solely because it improves the log-likelihood, `AIC`, or `BIC`. The model can fit better while changing the scoring contract, that is, how much each rating contributes to the measure.

## Checking the support boundary

`gpcm_capability_matrix()` is the canonical reference. It returns one row per helper family. Each row has a `Status` column that takes one of four values: `supported`, `supported_with_caveat`, `blocked`, or `deferred`. Each row also records the rationale and the evidence trail behind that classification. For blocked or deferred helpers, the `RecommendedRoute` column states what to do instead. `NextValidationStep` records what evidence would be needed before that route could be broadened.

```{r capability-supported}
library(mfrmr)
gpcm_capability_matrix("supported")[, c("Area", "Status")]
```

```{r capability-with-caveat}
gpcm_capability_matrix("supported_with_caveat")[, c("Area", "Status")]
```

```{r capability-blocked}
gpcm_capability_matrix("blocked")[, c("Area", "Status", "RecommendedRoute")]
```

```{r capability-deferred}
gpcm_capability_matrix("deferred")[, c("Area", "Status", "NextValidationStep")]
```

The matrix is intentionally conservative.
A row stays in `blocked` or `deferred` even when some lower-level component already runs, because the scope statement reflects the validation evidence rather than the raw code path.

## Source-grounded recovery interpretation

The bounded `GPCM` route follows Muraki's generalized partial credit model and its information-function extension. The package-specific `slope_regime` labels are narrower than that model theory. They summarize the centered log-slope spread of the simulation generator, so that recovery evidence can be read against a declared stress condition. They are not model-fit tests, and they are not literature-derived adequacy cut points.

For simulation reporting, read direct recovery checks in an ADEMP-style order (aims, data-generating mechanisms, estimands, methods, performance measures). Start with the data-generating mechanism, then the estimands and performance measures, and only then the row-level recovery diagnostics. In practice, this means:

1. Build or extract an explicit `mfrm_sim_spec`.
2. Run `evaluate_mfrm_recovery()` for the direct parameter-recovery question.
3. Run `assess_mfrm_recovery()` with practical RMSE and bias limits.
4. Read the resulting review object (called `recovery_review` here) in this order:
   - `summary(recovery_review)`;
   - `recovery_review$condition_reporting_notes`, then `recovery_review$condition_review`;
   - `recovery_review$diagnostic_reporting_notes`, then `recovery_review$diagnostic_review`, when optional diagnostics were retained;
   - `plot(recovery_review, type = "status")`, then `plot(recovery_review, type = "metrics", metric = "rmse")`.

For release-scale checks, the packaged `recovery-validation.R` protocol separates core release evidence from extended sensitivity cases. Read `topline_release_decision` before `condition_reporting_notes`, `condition_summary`, or row-level case tables. Treat `ExtendedSensitivityStatus` as sensitivity evidence, not as the core release gate by itself. Fit and separation operating characteristics belong in the diagnostic summary; they are not part of the top-line release-recovery gate.
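
Steps 2 and 3 rest on simple summaries of estimation error across replications. The base-R sketch below computes per-element bias and RMSE for slopes on the log scale. It then checks them against limits like those in the worked example. The sketch assumes slopes are identified only up to a common multiplicative scale, so generating and estimated slopes are both centered on the log scale before comparison. `center_log_slopes()` is a hypothetical helper, and the perturbed "estimates" are toy values, not package results. `assess_mfrm_recovery()` may aggregate across replications and parameter types differently.

```r
# Hypothetical helper (not part of mfrmr): center slopes on the log scale,
# assuming slopes are identified only up to a common multiplicative scale.
center_log_slopes <- function(slopes) {
  ls <- log(slopes)
  ls - mean(ls)  # centered log slopes sum to zero
}

# Generating slopes from the worked example's simulation specification.
true_slopes <- c(0.8, 1.0, 1.15, 1.05)
truth_c <- center_log_slopes(true_slopes)

# Two toy replications of "estimated" slopes: the true slopes perturbed
# multiplicatively. Illustration only; these are not package results.
est_reps <- rbind(
  true_slopes * exp(c(0.05, -0.05, 0.10, -0.10)),
  true_slopes * exp(c(0.15, -0.05, -0.10, 0.00))
)

# Centering forces each replication's errors to sum to zero, so bias is
# read per element across replications rather than within one replication.
est_c <- t(apply(est_reps, 1, center_log_slopes))
err <- sweep(est_c, 2, truth_c)  # estimate minus truth, one row per replication

metrics <- data.frame(
  element = paste0("C", seq_along(true_slopes)),  # hypothetical labels
  bias = colMeans(err),
  rmse = sqrt(colMeans(err^2))
)

# Practical limits matching the worked example: slope RMSE 0.25, |bias| 0.25.
metrics$within_limits <- metrics$rmse <= 0.25 & abs(metrics$bias) <= 0.25
metrics
```
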
Read `diagnostic_reporting_notes` first when deciding whether zero separation, reliability collapse, or df-sensitive ZSTD flags need explicit report language.

## What works today

The following routes are validated for bounded `GPCM`:

- **Fitting and core summaries** via `fit_mfrm(model = "GPCM", step_facet = ...)`. The validated default sets `slope_facet` equal to `step_facet` and uses the direct `MML` engine.
- **Posterior scoring and information** via `predict_mfrm_units()`, `sample_mfrm_plausible_values()`, `compute_information()`, and `plot_information()`.
- **Curve and category views** via `plot(fit, type = c("wright", "pathway", "ccc", "ccc_surface"))`, `category_structure_report()`, and `category_curves_report()`.
- **Slope-aware simulation specifications** via `build_mfrm_sim_spec()` and `simulate_mfrm_data()`.
- **Direct recovery checks** via `evaluate_mfrm_recovery()` and `assess_mfrm_recovery()`, including fitted bounded-GPCM slope recovery on the log-slope scale.

## What works with caveats

The following are exposed for `GPCM` but should be read as exploratory screens, not as Rasch-style invariance evidence:

- `diagnose_mfrm()` and the residual and unexpected-response stack: `unexpected_response_table()`, `displacement_table()`, `measurable_summary_table()`, `rating_scale_table()`, `interrater_agreement_table()`, `facet_quality_dashboard()`, `plot_qc_dashboard()`, `plot_marginal_fit()`, and `plot_marginal_pairwise()`.
- `reporting_checklist()` and `precision_review_report()` route to the supported direct tables and plots. The broader APA/QC/export family is available as caveated sensitivity-reporting output with explicit `gpcm_boundary` rows.
- `build_misfit_casebook()` inherits the exploratory screening framing of its underlying sources.
- `estimate_bias()` provides bounded-GPCM conditional screening rows with slope-aware information and profile-likelihood columns.
  Treat these rows as screening evidence for follow-up, not as standalone confirmatory fairness tests.
- `analyze_dff()`, `analyze_dif()`, `dif_interaction_table()`, `dif_report()`, `plot_dif_heatmap()`, and `plot_dif_summary()` provide bounded-GPCM DFF/DIF screening and reporting surfaces with explicit `gpcm_boundary` rows.
- `build_apa_outputs()`, `build_visual_summaries()`, `run_qc_pipeline()`, `build_mfrm_manifest()`, `build_mfrm_replay_script()`, `export_mfrm_bundle()`, package-native scorefile export, and `build_linking_review()` return caveated bounded-`GPCM` reporting or exploratory-review objects with explicit `gpcm_boundary` rows. When the required MML diagnostics are available, the package-native scorefile can include native structural delta-method expected-score SEs and score-side delta SEs (selected by `score_se_method`). Those SEs are not FACETS-equivalent score-side uncertainty.
- `evaluate_mfrm_design()`, `predict_mfrm_population()`, `evaluate_mfrm_diagnostic_screening()`, and `evaluate_mfrm_signal_detection()` are available as caveated role-based repeated simulation/refit routes. Treat their outputs as design-level or screening sensitivity evidence. They are not operational scoring, calibrated inferential testing, or arbitrary-facet planning validation.

The dashboard marks the fair-average panel unavailable under `GPCM`. Use `fair_average_table()` directly for the slope-aware element-conditional table, and `fair_average_table(fair_se = TRUE)` when you need structural fair-average SEs for non-person rows.

## What is intentionally restricted

The slope-aware `fair_average_table()` route and the package-native scorefile route are available under `GPCM`. Where the required MML diagnostics support them, both include native expected-score uncertainty and score-side delta SEs. Full FACETS-style score-side compatibility remains restricted, however, because free discrimination changes the relationship between the latent measure and operational score-side summaries.
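
The following base-R sketch shows how free discrimination changes the link between the latent measure and score-side summaries. Two criteria share the same thresholds but differ in slope (0.8 and 1.15, two of the slopes used in the worked example's simulation specification). It uses the textbook GPCM expected score and item information, where information equals the slope squared times the conditional score variance, with no scaling constant. `theta` again stands for the person measure minus the summed facet locations. All helpers are hypothetical. mfrmr's `compute_information()` and `fair_average_table()` use their own parameterization, and this sketch does not reproduce them.

```r
# Hypothetical helpers (not part of mfrmr).
gpcm_probs <- function(theta, a, tau) {
  eta <- c(0, cumsum(a * (theta - tau)))
  p <- exp(eta - max(eta))  # numerically stable softmax
  p / sum(p)
}

# Expected rating E[X | theta] = sum over k of k * P(X = k).
expected_score <- function(theta, a, tau) {
  p <- gpcm_probs(theta, a, tau)
  sum((seq_along(p) - 1) * p)
}

# Textbook GPCM item information: a^2 * Var(X | theta).
item_information <- function(theta, a, tau) {
  p <- gpcm_probs(theta, a, tau)
  k <- seq_along(p) - 1
  a^2 * (sum(k^2 * p) - sum(k * p)^2)
}

tau <- c(-1, 0, 1)  # symmetric thresholds shared by both criteria
theta_grid <- c(-1, 0, 1)
slopes <- c(flat = 0.8, steep = 1.15)

scores <- sapply(slopes, function(a)
  sapply(theta_grid, expected_score, a = a, tau = tau))
info <- sapply(slopes, function(a)
  sapply(theta_grid, item_information, a = a, tau = tau))
rownames(scores) <- rownames(info) <- paste0("theta=", theta_grid)

round(scores, 3)
round(info, 3)
```

Both criteria give the same expected rating at the center of the scale. Away from the center, the steeper criterion's expected rating moves faster with the measure, and at the center it carries more information. Under PCM the raw score is sufficient for the measure. Under GPCM the sufficient statistic is slope-weighted, so raw-score-based summaries no longer map onto the measure in the same way for every element.
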
The remaining restrictions are:

- `facets_output_contract_review()` still depends on FACETS-style compatibility semantics that are not generalized to free discrimination.
- Posterior-predictive checks, MCMC, and heavy-backend extensions are still future scope.
- Caveated reporting, export, linking, design-forecast, and screening helpers must keep their `gpcm_boundary` wording visible. They must not imply FACETS-equivalent score-side uncertainty, operational scoring, calibrated screening gates, or arbitrary-facet planning validation.

## Recommended substitutes

When a restricted helper is needed for a `GPCM` report, the practical paths are:

- Refit with `model = "PCM"` if the equal-discrimination assumption is defensible for the data. The full APA, output-contract, and fit-based export stack then becomes available, and `compare_mfrm()` quantifies the loss in fit.
- Keep the report on the `GPCM` fit itself and draft the manuscript section manually around the supported tables. Use `summary(fit)` for parameters, `diagnose_mfrm()` for residual fit, `facet_quality_dashboard()` for the per-facet quality summary, and `compute_information()` for precision evidence. The residual and dashboard outputs remain exploratory screens under `GPCM`.
- Generate the reproducibility manifest from a parallel `RSM` or `PCM` baseline fit. The two fits can be reported side by side in the same document, with the `GPCM` fit footnoted as the discrimination-aware counterpart.

The restricted helpers consult the same capability matrix at runtime. A blocked or deferred bounded-`GPCM` call stops and reports the relevant capability row, recommended route, and next validation step. It does not produce a partial score-side or unsupported-backend result. The condition class is `mfrmr_gpcm_scope_error`. The condition object carries `helper`, `area`, `status`, `recommended_route`, and `next_validation_step` fields, so wrappers can catch and route the failure without parsing the message text.
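
Because the guard signals a classed condition, a wrapper can dispatch on the class with base R's `tryCatch()`. The sketch below builds a stand-in condition that uses the class and field names described above, so it runs without fitting a model. The field values are placeholders. `mock_scope_error()` and `route_gpcm_call()` are hypothetical helpers. In real use, the condition would come from the blocked mfrmr call itself.

```r
# Hypothetical stand-in for the condition a blocked bounded-GPCM call signals.
# Class and field names follow the description above; values are placeholders.
mock_scope_error <- function() {
  structure(
    list(
      message = "Placeholder: helper is not available for bounded GPCM fits.",
      call = NULL,
      helper = "example_helper",
      area = "placeholder area",
      status = "blocked",
      recommended_route = "placeholder route",
      next_validation_step = "placeholder step"
    ),
    class = c("mfrmr_gpcm_scope_error", "error", "condition")
  )
}

# Hypothetical wrapper: returns the value of `expr`, or a one-row routing
# record when the GPCM scope guard fires. Other errors propagate unchanged.
route_gpcm_call <- function(expr) {
  tryCatch(
    expr,
    mfrmr_gpcm_scope_error = function(e) {
      data.frame(
        helper = e$helper,
        status = e$status,
        recommended_route = e$recommended_route,
        next_validation_step = e$next_validation_step,
        stringsAsFactors = FALSE
      )
    }
  )
}

routed <- route_gpcm_call(stop(mock_scope_error()))
routed
```

Dispatching on the class rather than on `conditionMessage()` keeps the wrapper stable if the message wording changes. Handlers registered only for `mfrmr_gpcm_scope_error` also leave unrelated errors untouched.
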
The release-readiness protocol checks that every blocked or deferred row in `gpcm_capability_matrix()` is either represented in the runtime guard coverage table or explicitly marked as roadmap-only. Call `gpcm_runtime_guard_coverage()` to inspect that table. For a shorter user-facing route map that points to both the support matrix and the guard coverage, use `mfrmr_output_guide("gpcm")`.

## A worked example

The `example_core` dataset includes a small synthetic block that supports a bounded `GPCM` fit. This example uses compact quadrature and iteration settings to keep optional local execution short. For final evidence, rerun it with the package default or a higher quadrature setting, and with a larger recovery design.

```r
library(mfrmr)

toy <- load_mfrmr_data("example_core")

# Compact settings (quad_points, maxit) keep this run short.
fit_gpcm <- fit_mfrm(
  data = toy,
  person = "Person",
  facets = c("Rater", "Criterion"),
  step_facet = "Criterion",
  score = "Score",
  model = "GPCM",
  method = "MML",
  quad_points = 7,
  maxit = 20
)
summary(fit_gpcm)

# Residual diagnostics: exploratory screens under GPCM.
diag_gpcm <- diagnose_mfrm(fit_gpcm)
summary(diag_gpcm)

# Information and precision evidence.
info <- compute_information(fit_gpcm)
plot_information(info)

# Direct parameter recovery from an explicit simulation specification.
rec_gpcm <- evaluate_mfrm_recovery(
  sim_spec = build_mfrm_sim_spec(
    n_person = 30,
    n_rater = 3,
    n_criterion = 4,
    raters_per_person = 2,
    model = "GPCM",
    step_facet = "Criterion",
    slope_facet = "Criterion",
    slopes = c(0.8, 1.0, 1.15, 1.05)
  ),
  reps = 10,
  model = "GPCM",
  fit_method = "MML",
  quad_points = 7,
  maxit = 20,
  include_diagnostics = TRUE,
  diagnostic_fit_df_method = "both",
  seed = 1
)

# Practical RMSE and bias limits.
review_gpcm <- assess_mfrm_recovery(
  rec_gpcm,
  max_rmse = c(facet = 0.5, step = 0.5, slope = 0.25),
  max_abs_bias = c(default = 0.25)
)

# Read the review in the documented order.
summary(review_gpcm)$overview
summary(review_gpcm)$reading_order
review_gpcm$condition_reporting_notes[, c(
  "ConditionArea", "ReportingAttention", "ConditionFinding"
)]
review_gpcm$condition_review[, c(
  "Model", "GPCMSlopeRegime", "StressLevel", "ScoreSupportStatus"
)]
review_gpcm$diagnostic_reporting_notes[, c(
  "Facet", "ReportingAttention", "DiagnosticFinding"
)]
summary(review_gpcm)$diagnostic_review
plot(review_gpcm, type = "status")
plot(review_gpcm, type = "metrics", metric = "rmse")

# For a release-scale smoke read:
# source(system.file("validation", "recovery-validation.R", package = "mfrmr"))
# validation <- mfrmr_run_recovery_validation(
#   case_ids = c("gpcm_slope_profile", "gpcm_high_dispersion_sparse"),
#   quick = TRUE,
#   verbose = FALSE
# )
# validation_summary <- summary(validation)
# validation_summary$reading_order
# validation_summary$topline_release_decision
# validation_summary$condition_reporting_notes
# validation_summary$condition_summary
# validation_summary$diagnostic_reporting_notes
# build_summary_table_bundle(validation_summary)$tables$reading_order
# build_summary_table_bundle(validation_summary)$tables$domain_decision_table
```

The fit, summary, residual-diagnostic, information, recovery, fair-average, and conditional bias-screening helpers all run under `GPCM`, subject to the caveats listed above. Caveated reporting helpers such as `build_apa_outputs(fit_gpcm)` return output with explicit `gpcm_boundary` rows. Calling a blocked or deferred helper on `fit_gpcm`, by contrast, stops with an `mfrmr_gpcm_scope_error` that points back to `gpcm_capability_matrix()` and produces no partial output.

## Roadmap

The boundary above is a release-scope statement, not a permanent design choice. Score-side semantics for free-discrimination polytomous models are on the roadmap for a future release. Until then, the matrix returned by `gpcm_capability_matrix()` is the binding contract.