---
title: "Quick Start with bigKNN"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Quick Start with bigKNN}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

`bigKNN` provides exact nearest-neighbour and radius-search routines that work
directly on `bigmemory::big.matrix` references. This quickstart walks through a
small end-to-end example: create a reference matrix, run exact `k`-nearest
neighbour and radius queries, and interpret the objects that come back.

```{r setup, include=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

if (!requireNamespace("bigmemory", quietly = TRUE)) {
  cat("This vignette requires the 'bigmemory' package.\n")
  knitr::knit_exit()
}

library(bigKNN)
library(bigmemory)
```

```{r helpers, include=FALSE}
knn_table <- function(result, query_ids, ref_ids) {
  do.call(rbind, lapply(seq_along(query_ids), function(i) {
    data.frame(
      query = query_ids[i],
      rank = seq_len(result$k),
      neighbor = ref_ids[result$index[i, ]],
      distance = signif(result$distance[i, ], 4),
      row.names = NULL
    )
  }))
}

radius_slice <- function(result, i, ref_ids) {
  start <- result$offset[i]
  end <- result$offset[i + 1L] - 1L

  if (start > end) {
    return(data.frame(neighbor = character(0), distance = numeric(0)))
  }

  data.frame(
    neighbor = ref_ids[result$index[start:end]],
    distance = signif(result$distance[start:end], 4),
    row.names = NULL
  )
}
```

# Create a Small Reference Matrix

The reference data for `bigKNN` lives in a `bigmemory::big.matrix`. For this
quickstart we will use six points in two dimensions and keep separate labels so
that the returned row indices are easy to read.

```{r create-reference}
reference_points <- data.frame(
  id = paste0("p", 1:6),
  x1 = c(1, 2, 1, 2, 3, 4),
  x2 = c(1, 1, 2, 2, 2, 3)
)

query_points <- data.frame(
  id = c("q1", "q2"),
  x1 = c(1.2, 2.8),
  x2 = c(1.1, 2.2)
)

reference <- as.big.matrix(as.matrix(reference_points[c("x1", "x2")]))
query_matrix <- as.matrix(query_points[c("x1", "x2")])

reference_points
query_points
```

In the examples below, `reference` is the `big.matrix` searched by `bigKNN`,
while `query_matrix` is an ordinary dense R matrix. The same APIs also accept
queries stored in another `big.matrix`, and the v3 search paths accept sparse
query matrices as well.

# First Exact `k`-Nearest-Neighbour Search

When `query` is left as `NULL`, `knn_bigmatrix()` searches the reference
against itself. In that case `exclude_self` defaults to `TRUE`, so no row is
returned as its own nearest neighbour.

```{r self-knn}
self_knn <- knn_bigmatrix(reference, k = 2)
self_knn
```

The raw result stores two `n_query x k` matrices:

- `index`: 1-based row indices into the reference matrix
- `distance`: the corresponding exact distances

```{r self-knn-components}
self_knn$index
round(self_knn$distance, 3)
```

A small helper table makes the same result easier to read:

```{r self-knn-table}
knn_table(self_knn, query_ids = reference_points$id, ref_ids = reference_points$id)
```
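
For a reference this small, the self-search can be cross-checked by brute force
in base R. The sketch below is not part of `bigKNN`: the names `pts`, `d`,
`brute_index`, and `brute_distance` are illustrative, and it repeats the six
points so that it runs on its own. Several points are equidistant here, so
`bigKNN` may break ties differently from `order()` and report a different
index for an equal distance.

```{r self-knn-bruteforce}
# Illustrative base-R cross-check; none of these names are bigKNN API.
pts <- cbind(
  x1 = c(1, 2, 1, 2, 3, 4),
  x2 = c(1, 1, 2, 2, 2, 3)
)
rownames(pts) <- paste0("p", 1:6)

# Full pairwise Euclidean distances (fine for six points, not for big data).
d <- as.matrix(dist(pts))

# Mimic exclude_self = TRUE by putting each point infinitely far from itself.
diag(d) <- Inf

# For every row keep the positions and values of the k smallest distances.
k <- 2
brute_index <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))
brute_distance <- t(apply(d, 1, function(row) unname(sort(row)[seq_len(k)])))

brute_index
round(brute_distance, 3)
```

The full distance matrix grows quadratically with the number of rows, which is
why this approach only makes sense as a sanity check on tiny inputs.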

# Searching New Query Points

To search new observations against the same reference, pass them through the
`query` argument. Here we ask for the three nearest reference rows for two new
points.

```{r query-knn}
query_knn <- knn_bigmatrix(
  reference,
  query = query_matrix,
  k = 3,
  exclude_self = FALSE
)

query_knn
knn_table(query_knn, query_ids = query_points$id, ref_ids = reference_points$id)
```

The returned indices are still row numbers in `reference`. That makes the
object compact and easy to use in later workflows, while a lookup table like
`reference_points` can translate those indices into human-readable labels when
needed.
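
The query search can be reproduced by hand in the same spirit. This is a
minimal base-R sketch under the assumption of Euclidean distance; `ref`, `qry`,
`dist_to_ref()`, and `brute_query` are hypothetical names introduced for
illustration, and the data are repeated so the chunk is self-contained. For
`q2`, two reference rows are equally far away in third place, so the third
neighbour reported by an exact search depends on how ties are broken.

```{r query-knn-bruteforce}
# Illustrative base-R cross-check; none of these names are bigKNN API.
ref <- cbind(x1 = c(1, 2, 1, 2, 3, 4), x2 = c(1, 1, 2, 2, 2, 3))
qry <- cbind(x1 = c(1.2, 2.8), x2 = c(1.1, 2.2))
ref_ids <- paste0("p", 1:6)

# Euclidean distance from one query point to every reference row.
dist_to_ref <- function(q, ref) sqrt(rowSums(sweep(ref, 2, q)^2))

k <- 3
brute_query <- do.call(rbind, lapply(seq_len(nrow(qry)), function(i) {
  d <- dist_to_ref(qry[i, ], ref)
  nearest <- order(d)[seq_len(k)]  # row numbers in ref, closest first
  data.frame(
    query = paste0("q", i),
    rank = seq_len(k),
    neighbor = ref_ids[nearest],   # translate row numbers into labels
    distance = signif(d[nearest], 4),
    row.names = NULL
  )
}))

brute_query
```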

# First Radius Search

Radius search returns every neighbour within a fixed distance threshold instead
of a fixed `k`. This is useful when the local neighbourhood size should vary by
query.

```{r radius-search}
radius_result <- radius_bigmatrix(
  reference,
  query = query_matrix,
  radius = 1.15,
  exclude_self = FALSE
)

radius_result
radius_result$n_match
radius_result$offset
```

`radius_bigmatrix()` uses a flattened output format:

- `index` and `distance` hold all matches back-to-back
- `n_match` gives the number of matches per query
- `offset` gives the 1-based start of each query's slice, followed by one
  final entry that points just past the last match

For query `i`, the matching rows live in
`index[offset[i]:(offset[i + 1] - 1)]`, with the same slice in `distance`.

```{r radius-slices}
radius_slice(radius_result, 1, reference_points$id)
radius_slice(radius_result, 2, reference_points$id)
```
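
The flattened layout can also be unpacked for all queries at once. The sketch
below works on a hand-built list, `toy`, that only mirrors the field names
described above. It is a hypothetical object, not output from `bigKNN`, and
the order of matches within each query is illustrative. It assumes `offset` is
1-based and has one more entry than there are queries, as the slicing rule
implies.

```{r radius-layout-sketch}
# Hypothetical hand-built object; it is NOT produced by bigKNN.
toy <- list(
  index    = c(1L, 2L, 3L, 5L, 4L),
  distance = c(0.22, 0.81, 0.92, 0.28, 0.82),
  n_match  = c(3L, 2L),
  offset   = c(1L, 4L, 6L)
)

# Label every flattened entry with the query it belongs to.
query_of_entry <- rep(seq_along(toy$n_match), times = toy$n_match)

# Splitting on a factor with explicit levels keeps queries that have no matches.
by_query <- split(
  toy$index,
  factor(query_of_entry, levels = seq_along(toy$n_match))
)
by_query

# Under the stated assumption, consecutive offsets differ by the match counts.
diff(toy$offset)
```

Using explicit factor levels means a query with zero matches still appears in
`by_query`, as an empty vector, rather than silently disappearing.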

If you only need the counts, `count_within_radius_bigmatrix()` avoids returning
the flattened neighbour vectors.

```{r radius-counts}
count_within_radius_bigmatrix(
  reference,
  query = query_matrix,
  radius = 1.15,
  exclude_self = FALSE
)
```
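
As a rough cross-check, the same counts can be derived in base R from a dense
query-by-reference distance matrix. The names `ref`, `qry`, `d`, and
`brute_counts` are illustrative, and the sketch assumes Euclidean distance and
an inclusive boundary (`distance <= radius`). Whether `bigKNN` treats the
boundary as inclusive is not stated here, but no distance in this example
equals the radius exactly, so the assumption does not affect the result.

```{r radius-counts-bruteforce}
# Illustrative base-R cross-check; none of these names are bigKNN API.
ref <- cbind(x1 = c(1, 2, 1, 2, 3, 4), x2 = c(1, 1, 2, 2, 2, 3))
qry <- cbind(x1 = c(1.2, 2.8), x2 = c(1.1, 2.2))
radius <- 1.15

# Rows are queries, columns are reference points.
d <- sqrt(
  outer(qry[, "x1"], ref[, "x1"], "-")^2 +
    outer(qry[, "x2"], ref[, "x2"], "-")^2
)

# Assumption: a point exactly on the boundary would count as a match.
brute_counts <- rowSums(d <= radius)
brute_counts
```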

# Choosing a Metric

`bigKNN` currently supports three exact metrics:

- `"euclidean"` for ordinary Euclidean distance
- `"sqeuclidean"` for squared Euclidean distance
- `"cosine"` for cosine distance, defined as `1 - cosine similarity`

The squared Euclidean metric preserves the same neighbour ordering as ordinary
Euclidean distance, but reports squared values. Cosine distance can choose a
different neighbour because it prefers similar direction rather than similar
absolute location.

```{r metric-comparison}
metric_summary <- do.call(rbind, lapply(
  c("euclidean", "sqeuclidean", "cosine"),
  function(metric) {
    result <- knn_bigmatrix(
      reference,
      query = query_matrix,
      k = 1,
      metric = metric,
      exclude_self = FALSE
    )

    data.frame(
      metric = metric,
      query = query_points$id,
      nearest = reference_points$id[result$index[, 1]],
      distance = signif(result$distance[, 1], 4),
      row.names = NULL
    )
  }
))

metric_summary
```

In this example, Euclidean and squared Euclidean agree on the nearest row for
each query. Cosine distance picks a different row for `q2`: it selects `p6`
instead of `p5`, because `p6` points in a more similar direction even though
`p5` is closer in space. Cosine distance requires non-zero rows in both the
reference and the query data.
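
The three metrics can be written out by hand to see why they rank points
differently. The helper functions below are illustrative base-R definitions
that follow the descriptions above, not the internals of `bigKNN`. They compare
query `q2` with reference points `p5` and `p6`, using coordinates copied from
the data at the start of this vignette.

```{r metric-by-hand}
# Illustrative definitions; these helpers are not part of bigKNN.
euclidean <- function(a, b) sqrt(sum((a - b)^2))
sqeuclidean <- function(a, b) sum((a - b)^2)
cosine_dist <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

q2 <- c(2.8, 2.2)
p5 <- c(3, 2)
p6 <- c(4, 3)

by_hand <- data.frame(
  neighbor = c("p5", "p6"),
  euclidean = c(euclidean(q2, p5), euclidean(q2, p6)),
  sqeuclidean = c(sqeuclidean(q2, p5), sqeuclidean(q2, p6)),
  cosine = c(cosine_dist(q2, p5), cosine_dist(q2, p6))
)

by_hand
```

Squaring is monotone for non-negative values, so the two Euclidean columns
always rank neighbours identically. The cosine column ignores vector length and
can therefore rank them the other way round.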

# Where to Go Next

This vignette focused on the smallest useful workflow. For larger or repeated
jobs, the next places to look are:

- `knn_prepare_bigmatrix()` and `knn_search_prepared()` for repeated exact
  queries against the same reference
- `knn_plan_bigmatrix()`, `knn_stream_bigmatrix()`, and
  `radius_stream_bigmatrix()` for memory-aware and streamed workflows
- `knn_graph_bigmatrix()`, `mutual_knn_graph_bigmatrix()`,
  `snn_graph_bigmatrix()`, and `radius_graph_bigmatrix()` for exact graph
  construction
- `recall_against_exact()` and `rerank_candidates_bigmatrix()` when `bigKNN`
  is being used as the exact ground-truth engine for approximate search