Find out why your keys don’t match.
You ran a left join and lost 40% of your rows. dplyr says “many-to-many relationship.” joinspy says 12 keys have trailing spaces, 8 differ only by case, and 3 contain invisible Unicode characters. Then it fixes them.
library(joinspy)
join_spy(orders, customers, by = "customer_id")
repaired <- join_repair(orders, customers, by = "customer_id")
suggest_repairs(join_spy(orders, customers, by = "customer_id"))Most join failures come down to string-level problems in keys:
"Alice" vs "Alice " (trailing space,
invisible)"NYC" vs "nyc" (case)"Johansson" vs "Johannson" (one character
off)NAR won’t warn you about any of these. join_spy() catches
them before the join runs.
join_spy() examines keys before the join:
orders <- data.frame(
id = c("A", "B ", "c", "D"),
amount = c(100, 200, 300, 400),
stringsAsFactors = FALSE
)
customers <- data.frame(
id = c("A", "B", "C", "E"),
name = c("Alice", "Bob", "Carol", "Eve"),
stringsAsFactors = FALSE
)
join_spy(orders, customers, by = "id")
#> -- Join Diagnostic Report --
#> Match rate (left): 25%
#>
#> Issues Detected:
#> ! "B " has trailing whitespace (would match "B")
#> ! "c" vs "C" — case mismatch
#> x "D" has no match in right tablejoin_repair() fixes the issues, or previews what it
would change with dry_run = TRUE.
suggest_repairs() prints the R code instead of running
it.
join_repair(orders, customers, by = "id", dry_run = TRUE)
repaired <- join_repair(orders, customers, by = "id",
standardize_case = "upper")
suggest_repairs(join_spy(orders, customers, by = "id"))
#> x$id <- trimws(x$id)
#> x$id <- toupper(x$id)
#> y$id <- toupper(y$id)join_spy() also estimates result size for each join
type:
report <- join_spy(orders, customers, by = "id")
report$expected_rows
#> inner_join: 1
#> left_join: 4
#> right_join: 4
#> full_join: 7join_explain() works after the join, on the result:
result <- merge(orders, customers, by = "id", all.x = TRUE)
join_explain(result, orders, customers, by = "id", type = "left")
#> Result has same row count as left table
#> ! 3 left key(s) have no match in right tableThe package ships join wrappers (left_join_spy(),
inner_join_spy(), etc.) that run diagnostics before joining
and attach the report as an attribute. join_strict()
enforces cardinality (1:1, 1:m,
m:1, m:m) and errors on violation.
check_cartesian() flags many-to-many keys that would
multiply your row count. analyze_join_chain() handles
multi-step A-B-C sequences.
Joins work with tibbles, data.tables, and plain data frames.
# Install from CRAN
install.packages("joinspy")
# Or install development version from GitHub
# install.packages("pak")
pak::pak("gcol33/joinspy")| Package | Focus |
|---|---|
| dplyr 1.1+ | Cardinality checks via relationship argument |
| powerjoin | 12-level configurable checks, key preprocessing |
| joyn | Match-status reporting variable per row |
| tidylog | Logs row count changes after joins |
joinspy focuses on string-level key diagnostics: whitespace, case, encoding, typos, and type mismatches. It identifies which specific keys failed, why, and can fix them automatically.
“Software is like sex: it’s better when it’s free.” — Linus Torvalds
I’m a PhD student who builds R packages in my free time because I believe good tools should be free and open. I started these projects for my own work and figured others might find them useful too.
If this package saved you some time, buying me a coffee is a nice way to say thanks.
MIT (see the LICENSE.md file)
@software{joinspy,
author = {Colling, Gilles},
title = {joinspy: Diagnostic Tools for Data Frame Joins},
year = {2025},
url = {https://github.com/gcol33/joinspy}
}