
Data Science Core Objectives
- To support network investigators in biomedical, health, and community-based research with data science services through Data Science Consultation
- To develop and enhance biomedical and health-focused data science knowledge and capacity through the Educational Programming and Outreach Center

Data Science Consultation
The Data Science Core accepts requests for data science support and then matches investigators with faculty affiliates whose expertise aligns with the request. Any researcher at an NM-INBRE participating institution can use these services.
Statistical Support Offered
- Experimental design, sample size determination, and power calculations
- Planning and execution of studies using existing datasets, e.g., in public health
- Statistical analysis including data visualization, descriptive statistics, hypothesis testing, and regression modeling
- Use of statistical software including SAS, SPSS, and R
Bioinformatics Support Offered
- Sequencing and Molecular Biology Lab: The lab is equipped for DNA/RNA sequencing and molecular biology analyses. It currently uses Oxford Nanopore DNA/RNA sequencing and has expertise in other major platforms.
- Bioinformatics Analysis: NCGR offers broad bioinformatics expertise, including genome assembly and pangenomic, single-cell, transcriptomic, and metagenomic analyses, among others.
Request Statistical and Bioinformatics Support
Statistical Support from NMSU
Please use the form below to request support from the Data Science Core. We will review your request and reach out to schedule a meeting to learn more about your project.
Bioinformatics Support from NCGR
Please use the form below to request support from NCGR. We will review your request and reach out to schedule a meeting to learn more about your project.
Educational Programming and Outreach
The Educational Programming and Outreach Center offers workshops, asynchronous short courses, colloquiums, and undergraduate summer experiences designed to build data science capacity across the network.
Bioinformatics Training
NCGR trains students in bioinformatics, genomics, genetics, computing, and biology. NCGR provides onsite and virtual training at many levels, including K-12, undergraduate, graduate, postdoctoral, and faculty. It also runs in-person and virtual bioinformatics workshops on several topics for New Mexico/INBRE students and researchers and can create custom bioinformatics workshops.
Differential Expression Workshop
Analysis of gene expression changes in response to different treatments or conditions can yield important biological insights, including elucidating abiotic and biotic stress responses, uncovering disease and resistance mechanisms, and highlighting developmental changes. In this workshop, you will learn how to perform differential gene expression analysis, visualization, and pathway analysis. You will also learn how to perform coexpression analysis and identify genes driving expression shifts.
Metagenomics Workshop
Metagenomics approaches sequence and analyze communities of organisms, allowing the identification of community members and/or the genes and metabolic pathways that they harbor. In this workshop, you will learn how to analyze both 16S amplicon and whole genome shotgun sequence data, including filtering host reads (if appropriate), performing community analysis, creating metagenome-assembled genomes (MAGs), calculating measures of diversity, and visualizing results. You will also learn how to perform metatranscriptomic analyses that enable you to investigate genes that are being actively expressed in the community.
Single Cell Transcriptomics Workshop
A typical RNA sequencing experiment captures transcriptional changes at the bulk tissue level, representing a heterogeneous mixture of cells and providing an aggregated expression result. By performing RNA sequencing on isolated single cells, one can pinpoint cellular expression and consider transcriptome sequences from individual cells to investigate, for example, cell lineage relationships, variability between cells, and how a cell functions in its local environment. In this workshop, you will import single-cell data and cover topics such as normalization, quality control, dimensionality reduction, visualization, and data interpretation.
Pangenomics Workshop
Pangenomic analyses capitalize on the increasing availability of multiple, high-quality, conspecific genome assemblies and the genetic variation that they contain. Pangenome approaches remove reference bias and elucidate biological signals at a more comprehensive population or species scale. In this workshop, you will learn what pangenomes are, how to build a pangenome, and how to perform fundamental bioinformatic analyses to identify biological insights.
To register for a workshop or check upcoming dates, visit the NCGR workshop page below.
Explore the NCGR website to learn more about their full range of offerings.
Data Science Colloquium Series

The Bright Future of Applied Statistics
Rafael A. Irizarry, PhD
Professor and Chair of the Department of Data Science at Dana-Farber Cancer Institute, Professor of Applied Statistics at Harvard.
Presentation Given on March 21st, 2025

Harnessing the Power of Real-World Data to Build a Novel Cardiac Arrest Database
Dr. Ryan Huebinger
University of New Mexico Health Sciences Center
Presentation Scheduled for September 26th, 2025
Real-world datasets are powerful tools for conducting low-cost or free research and generating preliminary data for grant applications. In this lecture, Dr. Huebinger will cover different types of real-world data and their strengths and weaknesses. He will then use an example from his own research to show how these data can be used to address research questions in novel ways.
Click the link below to register and access the Zoom session.

Examining factors that contribute to racial and ethnic disparities in liver cancer with a multi-institutional, EHR-based epidemiologic cohort linked to population-based cancer registries.
Mindy Hébert-DeRouen, PhD, MPH
Assistant Professor, Department of Public Health Sciences, College of Health, Education, and Social Transformation, New Mexico State University
Presentation Scheduled for October 24th, 2025
Click the link below to register and access the Zoom session.
National Institute of General Medical Sciences (NIGMS) Sandbox Modules
The NIGMS Sandbox is a collection of cloud-based biomedical data science learning modules that teach students, researchers, clinicians, and others how to use the power of cloud technology for life sciences applications and research. Development and deployment of the NIGMS Sandbox is described in the following article from Briefings in Bioinformatics.
The modules in this section are grouped into recommended learning pathway categories.
Introduction to Biomedical Data Science

Fundamentals of Bioinformatics
Dartmouth College
In this module you will learn to use the Bash shell scripting language to work with common genomics file formats, create Conda environments, and troubleshoot command line errors.

Introduction to Data Science for Biology
San Francisco State University
In this module you will learn how to create a simple decision tree using a structured dataset, evaluate model performance quantitatively, and understand why machine learning models require retraining from time to time.

Introduction to Python for Biology
Northwest Nazarene University
The module prioritizes practical coding techniques for biological scientists who have limited or no background in programming in Python or other languages. The module also utilizes a blend of short instructional videos, interactive demonstrations, and hands-on exercises to facilitate self-directed learning and knowledge retention.

Introduction to R and LLMs for Biology
Duke University
This repository contains materials for an introductory data science module, part of the NIH NIGMS Sandbox initiative, designed for learners new to data science concepts and techniques. The module emphasizes practical applications using R programming and foundational statistical concepts, leveraging cloud computing on Google Cloud Platform (GCP) with Jupyter notebooks.
Introduction to Biomedical Machine Learning and Artificial Intelligence

Python and ML for Biomedical Data Science
University of Delaware
The module prioritizes practical, data-centric techniques, ensuring researchers can immediately apply their acquired data science and AI/ML knowledge to real-world problems. The module also utilizes a blend of engaging instructional videos, interactive tutorials, and hands-on exercises to facilitate self-directed learning and knowledge retention.

Analysis of Biomedical Data for Biomarker Discovery
University of Rhode Island
In this module you will learn how to use exploratory data analysis, linear models, regression, and machine learning to discover biomarkers for kidney disease. This learning module will introduce the user to basic concepts in biomarker discovery that the user is likely to encounter in the clinical and biomedical literature.

Biomedical Imaging Analysis Using AI/ML Approaches
University of Arkansas
In this module you will learn how to generate a neural network, manipulate datasets, train a neural network on the dataset, apply the trained neural network to a new dataset, and quantify its performance.
Introduction to Biomedical Genomics

Consensus Pathway Analysis in the Cloud
University of Nevada Reno
In this module, you will learn how to download expression data, conduct differential analysis, and perform enrichment analysis, meta-analysis, and visualization. This cloud-based learning module teaches pathway analysis, a term that describes the set of tools and techniques used in life sciences research to discover the biological mechanism behind a condition from high-throughput biological data.

DNA Methylation Sequencing Analysis with WGBS
University of Hawaii at Manoa
As one of the most abundant and well-studied epigenetic modifications, DNA methylation plays an essential role in normal cell development and has various effects on transcription, genome stability, and DNA packaging within cells.

ATAC-Seq and Single Cell ATAC-Seq Analysis
University of Nebraska Medical Center
In this module you will learn how to download raw sequence data, run differential peak identification, genome annotation, and transcription factor footprinting, and produce common plots and visualizations.

Chromatin Occupancy with Cut and Run
University of Nebraska Medical Center
This module covers the basic analysis and considerations for ChIP-seq, CUT&RUN, and CUT&Tag. Topics include quality control, filtering, alignment, deduplication, peak identification, visualization, and differential analysis of occupancy.

Integrating Multi-Omics Datasets
University of North Dakota
In this module you will learn how to analyze RNA-Seq, epigenetics, and integrated multiomics datasets in R with a Nextflow pipeline. This module will walk you through some of the techniques to integrate transcriptomic and epigenetic data.
Introduction to Metagenomics and Phylogenetics

Introduction to Amplicon-Based Metagenomics
University of Nevada Reno
This cloud-based learning module introduces the principles of 16S rRNA sequencing and its applications in microbial community analysis. This sequencing technique generates a vast amount of data. Understanding how to process and analyze these data through a series of computational steps is critical in studies of the human gut microbiome, among others.

Introduction to Phylogenetics
University of South Dakota
These submodules cover the end-to-end workflow of a standard phylogenetic analysis, from extracting a gene sequence to creating a phylogenetic tree to analyzing the tree. The phylogenetic analysis modules are intended for learners at the undergraduate through graduate level.

Introduction to Population Genomics
University of Wyoming
In this tutorial, we show users how to assemble restriction-site associated DNA sequence (RADseq) data and perform some basic population genetic and phylogenetic analyses. All tutorials are presented as Jupyter notebooks.

Comparative Prokaryotic Genomics
University of New Hampshire
This module introduces you to whole-genome sequencing and comparative genomics. You will work with numerous tools to assemble and assess a microbial genome, automate the process on many samples, and utilize the full dataset for comparative genomics analyses.

Introduction to Pangenomic Methods
National Center for Genome Resources
This module will introduce you to (graphical) pangenomics and walk you through a pangenomics pipeline. Specifically, you will learn how to build a pangenome graph, index the graph for analysis, map reads to the graph, call variants on the mapped reads, and visualize the graph.

Metagenomics Analysis of Biofilm-Microbiome
University of South Dakota
In this module you will learn how to use a Docker container to analyze amplicon sequencing metagenomics data with common tools such as qiime2 and PICRUSt2.
Introduction to Proteomics

Proteome Quantification
University of Arkansas for Medical Sciences
In this module you will learn about mass spectrometry data, statistical terminology for data preprocessing, data normalization, and differential abundance analysis for proteomics.

Proteome Structures and Docking
University of Arkansas for Medical Sciences
This module outlines the essential steps in the process of analyzing proteomics data and recommends commonly used tools and techniques for this purpose. This notebook describes mass spectrometry and statistical terminology for data preprocessing, normalization, and differential abundance analysis.
Introduction to RNAseq and Transcriptome Assembly

Explore RNA methylation using MeRIP-seq
University of Hawai'i Manoa
The MeRIP-seq data analysis tutorial is structured into four submodules, designed to comprehensively guide users through the complete workflow for RNA methylation analysis.

RNAseq Differential Expression Analysis
University of Maine
In this module you will learn how to download raw sequence data, run differential gene expression analysis, and produce common plots in R.

Transcriptome Assembly Refinement and Applications
MDI Biological Laboratory
In this module you will learn how to use a Nextflow pipeline to assemble and annotate a novel transcriptome using RNA-Seq data.

Transcriptome scRNASeq, miRNASeq, and Transcription Factors
University of Maine
The purpose of these tutorials is to help users familiarize themselves with RNA sequencing (RNA-Seq) analysis workflows using cloud computing. The tutorials go step by step through specific workflows for bulk RNA-Seq, small RNA-Seq, and single-cell RNA-Seq (scRNA-Seq).
Follow the link below to view all available modules.
Data Science Core Contacts
Charlotte Gard
Data Science Core Director,
Data Science Collaborative Consultation Center Director
New Mexico State University
cgard@nmsu.edu
Marshall Taylor
Educational Programming & Outreach Center Director
New Mexico State University
mtaylor2@nmsu.edu
Joann Mudge
NCGR Director
National Center for Genome Resources
jm@ncgr.org