---
title: "iRfcb Introduction"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{iRfcb Introduction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
NOT_CRAN <- identical(Sys.getenv("NOT_CRAN"), "true")
knitr::opts_chunk$set(
  eval = NOT_CRAN
)
```

## Introduction

The `iRfcb` package is an open-source R package designed to streamline the analysis of Imaging FlowCytobot (IFCB) data, with a focus on supporting marine ecological research and monitoring. By integrating R and Python functionality, the package facilitates efficient handling and sharing of IFCB image data, extraction of key metadata, and preparation of outputs for further taxonomic, ecological, or spatial analyses.

This tutorial introduces the core functionality of `iRfcb`, with step-by-step instructions for data preprocessing, taxonomic analysis, and SHARK-compliant data export. For additional guides, such as quality control of IFCB data, data sharing, and integration with MATLAB, please refer to the other tutorials available on the project's [webpage](https://europeanifcbgroup.github.io/iRfcb/).

## Getting Started

### Installation

You can install the package from CRAN using:

```{r, eval=FALSE}
install.packages("iRfcb")
```

Load the `iRfcb` library:

```{r}
library(iRfcb)
```

### Download Sample Data

To get started, download sample data from the [SMHI IFCB Plankton Image Reference Library](https://doi.org/10.17044/scilifelab.25883455.v3) (Torstensson et al. 2024) with the following function:

```{r, message=FALSE}
# Define data directory
data_dir <- "data"

# Download and extract test data in the data folder
ifcb_download_test_data(dest_dir = data_dir)
```

## Extract IFCB Data

This section demonstrates a selection of general data extraction tools available in `iRfcb`.
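
The tools in this section rely on the IFCB sample naming convention, in which names such as `D20230314T001205_IFCB134` encode the sample timestamp and the instrument number. As a minimal base-R sketch of what such parsing involves, the hypothetical helper `parse_ifcb_name()` below splits a name into its parts. It assumes the pattern `DYYYYMMDDTHHMMSS_IFCBNNN`, a numeric ROI suffix on extracted images (e.g. `_00002.png`), and UTC timestamps. It is an illustration only, not the implementation behind `ifcb_convert_filenames()`.

```r
# Hypothetical helper (not part of iRfcb): split IFCB sample or image names
# into sample ID, UTC timestamp, instrument and optional ROI number.
# Assumes names such as "D20230314T001205_IFCB134.roi" or
# "D20230314T001205_IFCB134_00002.png".
parse_ifcb_name <- function(x) {
  # Drop any directory part and the file extension
  base <- sub("\\.[^.]+$", "", basename(x))

  # D + date (8 digits) + T + time (6 digits) + _IFCB + instrument number,
  # optionally followed by _ + ROI number
  pattern <- "^D(\\d{8}T\\d{6})_IFCB(\\d+)(?:_(\\d+))?$"
  if (!all(grepl(pattern, base, perl = TRUE))) {
    stop("Some names do not follow the assumed IFCB naming pattern")
  }

  stamp <- sub(pattern, "\\1", base, perl = TRUE)
  roi <- sub(pattern, "\\3", base, perl = TRUE)
  roi[roi == ""] <- NA_character_ # No ROI suffix, e.g. .roi/.hdr/.adc files

  data.frame(
    sample = sub(pattern, "D\\1_IFCB\\2", base, perl = TRUE),
    timestamp = as.POSIXct(stamp, format = "%Y%m%dT%H%M%S", tz = "UTC"),
    ifcb_number = paste0("IFCB", sub(pattern, "\\2", base, perl = TRUE)),
    roi = as.integer(roi),
    stringsAsFactors = FALSE
  )
}

# Example usage with one raw data file and one (assumed) extracted image name
parsed <- parse_ifcb_name(c(
  "data/data/2023/D20230314/D20230314T001205_IFCB134.roi",
  "D20230314T001205_IFCB134_00002.png"
))
print(parsed)
```

Both names map to the same sample and timestamp; only the image name yields a ROI number, which mirrors the extra `roi` column described for `ifcb_convert_filenames()`.
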

### Extract Timestamps from IFCB Sample Filenames

Extract timestamps from sample names or filenames:

```{r}
# List example sample files
filenames <- list.files("data/data/2023/D20230314", recursive = TRUE)

# Print filenames
print(filenames)

# Convert filenames to timestamps
timestamps <- ifcb_convert_filenames(filenames)

# Print result
print(timestamps)
```

If a filename includes a ROI number (e.g., an extracted `.png` image), a separate column, `roi`, is added to the output.

```{r}
# List example PNG filenames
filenames <- list.files("data/png/Alexandrium_pseudogonyaulax_050")

# Print filenames
print(filenames)

# Convert filenames to timestamps
timestamps <- ifcb_convert_filenames(filenames)

# Print result
print(timestamps)
```

### Calculate Volume Analyzed in ml

The analyzed volume of a sample can be calculated using data from its `.hdr` and `.adc` files.

```{r}
# Path to HDR file
hdr_file <- "data/data/2023/D20230314/D20230314T001205_IFCB134.hdr"

# Calculate volume analyzed (in ml)
volume_analyzed <- ifcb_volume_analyzed(hdr_file)

# Print result
print(volume_analyzed)
```

### Get Sample Runtime

Get the runtime from a `.hdr` file:

```{r}
# Get runtime from HDR file
run_time <- ifcb_get_runtime(hdr_file)

# Print result
print(run_time)
```

### Read Feature Data

Read all feature files (`.csv`) from a folder:

```{r}
# Read feature files from a folder
features <- ifcb_read_features("data/features/2023/",
                               verbose = FALSE) # Do not print progress bar

# Print output from the first sample in the list
print(features[[1]])

# Read only multiblob feature files
multiblob_features <- ifcb_read_features("data/features/2023",
                                         multiblob = TRUE,
                                         verbose = FALSE)

# Print output from the first sample in the list
print(multiblob_features[[1]])
```

## Extract Images from ROI files

IFCB images stored in `.roi` files can be extracted as `.png` files using the `iRfcb` package, as demonstrated below. Extract all images from a sample using the `ifcb_extract_pngs()` function.
You can specify the `out_folder`; by default, images are saved in a subdirectory within the same directory as the ROI file. The `gamma` argument can be adjusted to enhance image contrast, and an optional scale bar can be added by specifying `scale_bar_um`.

```{r}
# Extract all ROIs in a sample
ifcb_extract_pngs(
  "data/data/2023/D20230314/D20230314T001205_IFCB134.roi",
  gamma = 1, # Default gamma value
  scale_bar_um = 5 # Add a 5 micrometer scale bar
)
```

Extract specific ROIs:

```{r}
# Extract only ROI numbers 2 and 5
ifcb_extract_pngs("data/data/2023/D20230314/D20230314T003836_IFCB134.roi",
                  ROInumbers = c(2, 5))
```

To extract annotated images or classified results from MATLAB files, please see the `vignette("image-export-tutorial")` and `vignette("matlab-tutorial")` tutorials.

## Classify IFCB Images

IFCB images can be classified directly in R using a CNN model served by a [Gradio](https://www.gradio.app/) application. By default, the classification functions use a public example Space hosted on Hugging Face (`https://irfcb-classify.hf.space`). This Space has limited resources and is intended for testing and demonstration purposes. For large-scale or production classification, we recommend deploying your own instance of the [IFCB Classification App](https://github.com/EuropeanIFCBGroup/ifcb-inference-app) with your own model and passing its URL via the `gradio_url` argument.

### Available Models

Use `ifcb_classify_models()` to list the CNN models available on the Gradio server:

```{r, eval=FALSE}
ifcb_classify_models()
```

### Classify All Images in a Sample

`ifcb_classify_sample()` extracts images from a `.roi` file internally and returns predictions, so no separate extraction step is required:

```{r, eval=FALSE}
# Classify all images in a sample
results <- ifcb_classify_sample(
  "data/data/2023/D20230314/D20230314T001205_IFCB134.roi",
  verbose = FALSE
)

# Print result
print(results)
```

### Classify Pre-extracted PNG Images

If images have already been extracted, pass a vector of PNG file paths to `ifcb_classify_images()`:

```{r, eval=FALSE}
# List extracted PNG files
png_files <- list.files(
  "data/data/2023/D20230314/D20230314T001205_IFCB134",
  pattern = "\\.png$",
  full.names = TRUE
)

# Classify images
results <- ifcb_classify_images(png_files, verbose = FALSE)

# Print result
print(results)
```

Both functions return a data frame with `file_name`, `class_name`, `class_name_auto`, `score`, and `model_name` columns, and query the Gradio API at `https://irfcb-classify.hf.space` by default. Per-class F2-optimal thresholds are always applied: `class_name` contains the threshold-applied classification (`"unclassified"` when the score is below the threshold), while `class_name_auto` contains the winning class without any threshold. The `top_n` argument controls how many top predictions are returned per image, and `model_name` specifies which CNN model to use (default: `"SMHI NIVA ResNet50 V5"`).

### Save Classification Results

`ifcb_save_classification()` classifies all images in a `.roi` file and saves the full score matrix.
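
The relationship between a score matrix, per-class thresholds, and the `class_name`/`class_name_auto` columns described above can be sketched in base R. This is a minimal illustration, not the code used by `iRfcb` or the Gradio app. The class names, scores, and thresholds are made up, and the winning class's score is assumed to be compared against that class's threshold with `>=`. The hypothetical helper `f2_optimal_threshold()` shows one way an F2-optimal threshold could be chosen from labelled validation data. F2 weights recall more heavily than precision.

```r
# Illustrative score matrix: rows are ROIs, columns are classes (made-up values)
scores <- matrix(
  c(0.90, 0.05, 0.05,
    0.40, 0.35, 0.25,
    0.10, 0.20, 0.70),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("roi_1", "roi_2", "roi_3"),
                  c("Alexandrium", "Guinardia", "Dinophysis"))
)

# Made-up per-class thresholds
thresholds <- c(Alexandrium = 0.50, Guinardia = 0.60, Dinophysis = 0.65)

# Hypothetical helper: winning class per ROI, with and without thresholds
apply_class_thresholds <- function(scores, thresholds) {
  winner <- max.col(scores, ties.method = "first")
  class_auto <- colnames(scores)[winner]
  top_score <- scores[cbind(seq_len(nrow(scores)), winner)]
  # Assumption: the winning score must reach that class's threshold
  above <- top_score >= unname(thresholds[class_auto])
  data.frame(
    roi = rownames(scores),
    class_name = ifelse(above, class_auto, "unclassified"),
    class_name_auto = class_auto,
    score = top_score,
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: candidate threshold that maximises F2 for one class,
# given validation scores for that class and logical ground-truth labels
f2_optimal_threshold <- function(score, is_class,
                                 candidates = seq(0.05, 0.95, by = 0.05)) {
  f2 <- vapply(candidates, function(t) {
    predicted <- score >= t
    tp <- sum(predicted & is_class)
    fp <- sum(predicted & !is_class)
    fn <- sum(!predicted & is_class)
    if (tp == 0) return(0)
    precision <- tp / (tp + fp)
    recall <- tp / (tp + fn)
    # F-beta with beta = 2: (1 + 4) * P * R / (4 * P + R)
    5 * precision * recall / (4 * precision + recall)
  }, numeric(1))
  candidates[which.max(f2)]
}

print(apply_class_thresholds(scores, thresholds))
```

In this toy example, `roi_2` keeps `Alexandrium` as `class_name_auto` but becomes `"unclassified"` in `class_name`, because its top score (0.40) is below the assumed Alexandrium threshold (0.50).
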
Three output formats are supported via the `format` argument:

```{r, eval=FALSE}
# HDF5 (default): IFCB Dashboard v3 format (requires the hdf5r package)
ifcb_save_classification(
  "data/data/2023/D20230314/D20230314T001205_IFCB134.roi",
  output_folder = "output"
)
# Creates: output/D20230314T001205_IFCB134_class.h5

# MAT: IFCB Dashboard v1 format (requires Python with scipy)
ifcb_save_classification(
  "data/data/2023/D20230314/D20230314T001205_IFCB134.roi",
  output_folder = "output",
  format = "mat"
)
# Creates: output/D20230314T001205_IFCB134_class_v1.mat

# CSV: ClassiPyR-compatible format
ifcb_save_classification(
  "data/data/2023/D20230314/D20230314T001205_IFCB134.roi",
  output_folder = "output",
  format = "csv"
)
# Creates: output/D20230314T001205_IFCB134.csv
```

The output file contains `output_scores` (an N x C matrix of scores for N ROIs and C classes), `class_labels`, `roi_numbers`, per-class `thresholds`, and `class_labels_above_threshold`.

## Taxonomical Data

Maintaining up-to-date taxonomic data is essential for accurate species names and classifications, which directly affect calculations such as carbon concentrations in `iRfcb`. Up-to-date taxonomy also supports data harmonization by preventing misspellings, outdated synonyms, and inconsistent classifications. This consistency is crucial for integrating and comparing datasets across studies, regions, and time periods, and it improves the reliability of scientific outcomes.

### Taxon matching with WoRMS

Taxonomic names can be matched against the [World Register of Marine Species (WoRMS)](https://www.marinespecies.org/) to ensure accuracy and consistency. `iRfcb` includes a built-in function for taxon matching via the WoRMS API, `ifcb_match_taxa_names()`, with a retry mechanism to handle server errors, which makes it particularly useful in automated data pipelines. For additional tools and functionality, the R package [`worrms`](https://cran.r-project.org/package=worrms) provides a comprehensive suite of options for interacting with the WoRMS database.

```{r}
# Example taxa names
taxa_names <- c("Alexandrium_pseudogonyaulax", "Guinardia_delicatula")

# Retrieve WoRMS records
worms_records <- ifcb_match_taxa_names(taxa_names,
                                       verbose = FALSE) # Do not print progress bar

# Print result
print(worms_records)
```

### Check whether a class name is a diatom

`ifcb_is_diatom()` takes a vector of taxa names, cleans them, retrieves their classification records from WoRMS, and checks whether they belong to the specified diatom class. Only the first word of each name (the genus name) is used for the lookup. This can be useful when converting biovolumes to carbon according to Menden-Deuer and Lessard (2000), since separate conversion factors apply to diatoms and non-diatoms. See `vol2C_nondiatom()` and `vol2C_lgdiatom()` for the carbon calculations (internal functions not exported in the NAMESPACE; call them with `iRfcb:::`).

```{r}
# Read the class2use file and select classes 10 to 15 (six taxa)
class2use <- ifcb_get_mat_variable("data/config/class2use.mat")[10:15]

# Create a data frame with class names and results from `ifcb_is_diatom()`
class_list <- data.frame(class2use,
                         is_diatom = ifcb_is_diatom(class2use, verbose = FALSE))

# Print result
print(class_list)
```

The default diatom class is Bacillariophyceae, but it can be changed using the `diatom_class` argument.

### Find trophic type of plankton taxa

`ifcb_get_trophic_type()` takes a vector of taxa names and matches them against the **SMHI Trophic Type** list used in [SHARK](https://shark.smhi.se/en/).

```{r}
# Example taxa names
taxa_list <- c(
  "Acanthoceras zachariasii",
  "Nodularia spumigena",
  "Acanthoica quattrospina",
  "Noctiluca",
  "Gymnodiniales"
)

# Get trophic type for taxa
trophic_type <- ifcb_get_trophic_type(taxa_list)

# Print result
print(trophic_type)
```

## SHARK export

SMHI uses `iRfcb` to map IFCB data into the [SHARK](https://shark.smhi.se/hamta-data/) standard data delivery format. The column names of this format can be retrieved with `ifcb_get_shark_colnames()`, and an example submission is provided with `ifcb_get_shark_example()`.

```{r}
# Get column names from example
shark_colnames <- ifcb_get_shark_colnames()

# Print column names
print(shark_colnames)

# Load the example stored in `iRfcb`
shark_example <- ifcb_get_shark_example()

# Print the SHARK data submission example
print(shark_example)
```

This concludes the introductory tutorial for the `iRfcb` package. For additional guides, such as quality control of IFCB data, data sharing, and integration with MATLAB, please refer to the other tutorials available on the project's [webpage](https://europeanifcbgroup.github.io/iRfcb/). See how data pipelines can be built with `iRfcb` in the following [Example Project](https://github.com/nodc-sweden/ifcb-data-pipeline). Happy analyzing!

## Citation

```{r, echo=FALSE}
# Print citation
citation("iRfcb")
```

```{r, include=FALSE}
# Clean up
unlink(data_dir, recursive = TRUE)
```

## References

- Menden-Deuer, S. and Lessard, E. J. (2000). Carbon to volume relationships for dinoflagellates, diatoms, and other protist plankton. Limnology and Oceanography, 45(3), 569–579. https://doi.org/10.4319/lo.2000.45.3.0569
- Torstensson, A., Skjevik, A-T., Mohlin, M., Karlberg, M. and Karlson, B. (2024). SMHI IFCB Plankton Image Reference Library. SciLifeLab. Dataset. https://doi.org/10.17044/scilifelab.25883455.v3