
Data Science Core Objectives
- To support network investigators in biomedical, health, and community-based research with data science services through the Data Science Consultation
- To develop and enhance biomedical and health-focused data science knowledge and capacity through the Educational Programming and Outreach Center

Data Science Consultation
The Data Science Core accepts requests for data science support and then matches investigators to faculty affiliates whose expertise aligns with the request. Any researcher at an NM-INBRE participating institution can use these services.
Statistical Support Offered
- Experimental design, sample size determination, and power calculations
- Planning and execution of studies using existing datasets, e.g., in public health
- Statistical analysis including data visualization, descriptive statistics, hypothesis testing, and regression modeling
- Use of statistical software including SAS, SPSS, and R
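As a rough illustration of the kind of sample size question the consultation service can help with, the sketch below estimates the number of participants per group for a two-group comparison of means. It is only a minimal example, not the Core's method: it assumes a two-sided test, equal group sizes, and a normal approximation, and the function name `sample_size_per_group` and its default values are hypothetical choices for this illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (Cohen's d). This uses the
    normal approximation, so an exact t-test calculation may ask for
    one or two more participants per group.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = standard_normal.inv_cdf(power)           # quantile for the target power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)


# Medium effect (d = 0.5), 5% significance level, 80% power
print(sample_size_per_group(0.5))
```

Smaller expected effects require substantially larger samples, which is one reason to discuss study design with a statistician before data collection begins.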
Bioinformatics Support Offered
- Sequencing and Molecular Biology Lab: The lab is equipped for DNA/RNA sequencing and molecular biology, currently using Oxford Nanopore with access to discounted services and expertise in other major platforms.
- Bioinformatics Analysis: NCGR offers broad bioinformatics expertise, from genome assembly to single-cell and metagenomic analyses, using advanced tools like PacBio IsoSeq and pangenomics to connect genetic variation with traits.
Request Statistical and Bioinformatics Support
Statistical Support from NMSU
Please use the form below to request support from the Data Science Core. We will review your request and reach out to schedule a meeting to learn more about your project.
Bioinformatics Support from NCGR
Please use the form below to request support from NCGR. We will review your request and reach out to schedule a meeting to learn more about your project.
Educational Programming and Outreach
The Educational Programming and Outreach Center offers workshops, asynchronous short courses, colloquiums, and undergraduate summer experiences designed to build data science capacity across the network.
Bioinformatics Training
NCGR trains students in bioinformatics, genomics, genetics, computing, and biology. They provide onsite and virtual training across many levels, including K-12, undergraduate, graduate, postdoctoral, and faculty levels. They run in-person and virtual bioinformatics workshops for New Mexico/INBRE students and researchers across several topics.
Differential Gene Expression Workshop
In this 1-week workshop, we focus on a particular analysis area. For example, in the popular Differential Gene Expression (DE) workshop, we teach students the skills to independently analyze RNA-Seq data using the command line interface, analytical workflows, and current DE tools. As part of the course, we cover UNIX fundamentals, data QC, alignments, read count generation, featureCounts, and pathway analysis.
Multi-Omic Bioinformatics Intensive
In this multi-topic course, we teach students UNIX and cover other analysis areas such as assembly, differential gene expression, variant detection, metagenomics, and visualization methods. As with all our workshops, we work with students as a group and individually to ensure their success.
Pan-Genomics Workshop
In this 1-week workshop, students will be introduced to the concept of the pan-genome and receive instruction on how to analyze these data. Topics will include relevant UNIX skills, pan-genome representation and construction, visualization, and pan-genome analysis. Analysis topics will include read-mapping, alignment, variant/haplotype calling and annotation.
To register for a workshop or check upcoming dates, visit the NCGR workshop page below.
Explore the NCGR website to learn more about their full range of offerings.
Data Science Colloquium Series

The Bright Future of Applied Statistics
Rafael A. Irizarry, PhD
Professor and Chair of the Department of Data Science at Dana-Farber Cancer Institute, Professor of Applied Statistics at Harvard.
Presentation Given on March 21st, 2025
National Institute of General Medical Sciences (NIGMS) Sandbox Modules
The NIGMS Sandbox is a collection of cloud-based biomedical data science learning modules that teach students, researchers, clinicians, and others how to use the power of cloud technology for life sciences applications and research. Development and deployment of the NIGMS Sandbox are described in the following article from Briefings in Bioinformatics.
The modules in this section are grouped into recommended learning pathway categories.
Introduction to Biomedical Data Science

Fundamentals of Bioinformatics
Dartmouth College
In this module you will learn to use the Bash shell scripting language to work with common genomics file formats, create Conda environments, and troubleshoot command line errors.

Introduction to Data Science for Biology
San Francisco State University
In this module you will learn how to create a simple decision tree using a structured dataset, evaluate model performance quantitatively, and understand why machine learning models require retraining from time to time.

Introduction to Python for Biology
Northwest Nazarene University
The module prioritizes practical coding techniques for biological scientists who have limited or no background in programming in Python or other languages. The module also uses a blend of short instructional videos, interactive demonstrations, and hands-on exercises to facilitate self-directed learning and knowledge retention.

Introduction to R and LLMs for Biology
Duke University
This repository contains materials for an introductory data science module, part of the NIH NIGMS Sandbox initiative, designed for learners new to data science concepts and techniques. The module emphasizes practical applications using R programming and foundational statistical concepts, leveraging cloud computing on Google Cloud Platform (GCP) with Jupyter notebooks.
Introduction to Biomedical Machine Learning and Artificial Intelligence

Python and ML for Biomedical Data Science
University of Delaware
The module prioritizes practical, data-centric techniques, ensuring researchers can immediately apply their acquired data science and AI/ML knowledge to real-world problems. The module also uses a blend of engaging instructional videos, interactive tutorials, and hands-on exercises to facilitate self-directed learning and knowledge retention.

Analysis of Biomedical Data for Biomarker Discovery
University of Rhode Island
In this module you will learn how to use exploratory data analysis, linear models, regression, and machine learning to discover biomarkers for kidney disease. This learning module introduces basic concepts in biomarker discovery that you are likely to encounter in the clinical and biomedical literature.

Biomedical Imaging Analysis Using AI/ML Approaches
University of Arkansas
In this module you will learn how to generate a neural network, manipulate datasets, train a neural network on the dataset, apply the trained neural network to a new dataset, and quantify its performance.
Introduction to Biomedical Genomics

Consensus Pathway Analysis in the Cloud
University of Nevada, Reno
In this module, you will learn how to download expression data, conduct differential analysis, and perform enrichment analysis, meta-analysis, and visualization. This cloud-based learning module teaches pathway analysis, a term that describes the set of tools and techniques used in life sciences research to discover the biological mechanism behind a condition from high-throughput biological data.

DNA Methylation Sequencing Analysis with WGBS
University of Hawaii at Manoa
As one of the most abundant and well-studied epigenetic modifications, DNA methylation plays an essential role in normal cell development and has various effects on transcription, genome stability, and DNA packaging within cells.

ATAC-Seq and Single Cell ATAC-Seq Analysis
University of Nebraska Medical Center
In this module you will learn how to download raw sequence data; run differential peak identification, genome annotation, and transcription factor footprinting; and produce common plots and visualizations.

Chromatin Occupancy with Cut and Run
University of Nebraska Medical Center
This module covers the basic analysis and considerations for ChIP-seq, CUT&RUN, and CUT&Tag. Topics include quality control, filtering, alignment, deduplication, peak identification, visualization, and differential analysis of occupancy.

Integrating Multi-Omics Datasets
University of North Dakota
In this module you will learn how to analyze RNA-Seq, epigenetics, and integrated multi-omics datasets in R with a Nextflow pipeline. The module walks you through some of the techniques used to integrate transcriptomic and epigenetic data.
Introduction to Metagenomics and Phylogenetics

Introduction to Amplicon-Based Metagenomics
University of Nevada, Reno
This cloud-based learning module introduces the principles of 16S rRNA sequencing and its applications in microbial community analysis. This sequencing technique generates a vast amount of data. Understanding how to process and analyze these data through a series of computational steps is critical in studies related to the human gut microbiome, among others.

Introduction to Phylogenetics
University of South Dakota
These submodules cover the end-to-end workflow of a standard phylogenetic analysis, from extracting a gene sequence, to building a phylogenetic tree, to analyzing the tree. The phylogenetic analysis modules are suitable for learners from the undergraduate through the graduate level.

Introduction to Population Genomics
University of Wyoming
In this tutorial, we show users how to assemble restriction-site associated DNA sequence (RADseq) data and perform some basic population genetic and phylogenetic analyses. All tutorials are presented as Jupyter notebooks.

Comparative Prokaryotic Genomics
University of New Hampshire
This module introduces you to whole-genome sequencing and comparative genomics. You will work with numerous tools to assemble and assess a microbial genome, automate the process on many samples, and use the full dataset for comparative genomics analyses.

Introduction to Pangenomic Methods
National Center for Genome Resources
This module will introduce you to (graphical) pangenomics and walk you through a pangenomics pipeline. Specifically, you will learn how to build a pangenome graph, index the graph for analysis, map reads to the graph, call variants on the mapped reads, and visualize the graph.

Metagenomics Analysis of Biofilm-Microbiome
University of South Dakota
In this module you will learn how to use a Docker container to analyze amplicon sequencing metagenomics data with common tools such as QIIME 2 and PICRUSt2.
Introduction to Proteomics

Proteome Quantification
University of Arkansas for Medical Sciences
In this module you will learn about mass spectrometry data, statistical terminology for data preprocessing, data normalization, and differential abundance analysis for proteomics.

Proteome Structures and Docking
University of Arkansas for Medical Sciences
This module outlines the essential steps in the process of analyzing proteomics data and recommends commonly used tools and techniques for this purpose. This notebook describes mass spectrometry and statistical terminology for data preprocessing, normalization, and differential abundance analysis.
Introduction to RNAseq and Transcriptome Assembly

Explore RNA methylation using MeRIP-seq
University of Hawai'i Manoa
The MeRIP-seq data analysis tutorial is structured into four submodules, designed to guide users through the complete workflow for RNA methylation analysis.

RNAseq Differential Expression Analysis
University of Maine
In this module you will learn how to download raw sequence data, run differential gene expression analysis, and produce common plots in R.

Transcriptome Assembly Refinement and Applications
MDI Biological Laboratory
In this module you will learn how to use a Nextflow pipeline to assemble and annotate a novel transcriptome using RNA-Seq data.

Transcriptome scRNASeq, miRNASeq, and Transcription Factors
University of Maine
The purpose of these tutorials is to help users familiarize themselves with RNA sequencing (RNA-Seq) analysis workflows using cloud computing. The tutorials go step by step through specific workflows for bulk RNA-Seq, small RNA-Seq, and single-cell RNA-Seq (scRNA-Seq).
Data Science Core Contacts
Charlotte Gard
Data Science Core Director,
Data Science Collaborative Consultation Center Director
New Mexico State University
cgard@nmsu.edu
Marshall Taylor
Educational Programming & Outreach Center Director
New Mexico State University
mtaylor2@nmsu.edu
Joann Mudge
NCGR Director
National Center for Genome Resources
jm@ncgr.org