
Objectives
- To support investigators in biomedical, health, and community-based research across the network with data science services.
- To develop and enhance biomedical and health-focused data science knowledge and capacity across the network.
Types of Support Offered
- Experimental design, sample size determination, and power calculations
- Planning and execution of studies using existing datasets, e.g., in public health
- Statistical analysis including data visualization, descriptive statistics, hypothesis testing, and regression modeling
- Use of statistical software including SAS, SPSS, and R

Available Cores and Institutional Research Facilities at NCGR
Sequencing and Molecular Biology Lab
The lab is equipped for DNA/RNA sequencing and molecular biology experiments. NCGR currently performs only Oxford Nanopore sequencing, but it has agreements in place for discounted sequencing services on other platforms and a thorough understanding of the main sequencing technologies.
Bioinformatics Analysis
NCGR provides extensive bioinformatics expertise, covering genomic variation, expression analysis, epigenetics, single-cell studies, genome assembly, structural variation, and metagenomics. NCGR employs advanced tools and techniques like PacBio IsoSeq and pangenomic analyses to link genetic variation with phenotypic traits.
Bioinformatics Training
NCGR trains students in bioinformatics, genomics, genetics, computing, and biology. It provides onsite and virtual training at many levels, including K-12, undergraduate, graduate, postdoctoral, and faculty. It also runs in-person and virtual bioinformatics workshops on a range of topics for New Mexico/INBRE students and researchers.
Request Statistical Support
Data Science Core at NMSU
Please use the form below to request support from the Data Science Core. We will review your request and reach out to schedule a meeting to learn more about your project.
Support from NCGR
Please use the form below to request support from NCGR Cores. We will review your request and reach out to schedule a meeting to learn more about your project.
Educational Programming and Outreach
Differential Gene Expression Workshop
In this 1-week workshop format, we focus on a single analysis area. In the popular Differential Gene Expression (DE) workshop, for example, we teach students the skills to independently analyze RNA-Seq data using the command-line interface, analytical workflows, and current DE tools. The course covers UNIX fundamentals, data QC, alignment, read count generation with featureCounts, and pathway analysis.
Multi-Omic Bioinformatics Intensive
In this multi-topic course, we teach students UNIX and cover other analysis areas such as assembly, differential gene expression, variant detection, metagenomics and visualization methods. As with all our workshops, we work with students as a group and individually to ensure their success.
Pan-Genomics Workshop
In this 1-week workshop, students will be introduced to the concept of the pan-genome and learn how to analyze pan-genome data. Topics will include relevant UNIX skills, pan-genome representation and construction, visualization, and pan-genome analysis. Analysis topics will include read mapping, alignment, variant/haplotype calling, and annotation.
March 21st, 2025 Presentation by Rafael A. Irizarry, PhD
Professor and Chair of the Department of Data Science at Dana-Farber Cancer Institute, Professor of Applied Statistics at Harvard
Briefings In Bioinformatics
Briefings in Bioinformatics is an international forum for researchers and educators in the life sciences. The journal will also be of interest to mathematicians, statisticians and computer scientists who apply their work to biological problems. The journal publishes reviews for the users of databases and analytical tools of contemporary genetics, molecular and systems biology and is unique in providing practical help and guidance to the non-specialist in computerized methodology. Papers range in scope and depth, from the introductory level to specific details of protocols and analyses encompassing bacterial, plant, fungal, animal and human data.
National Institute of General Medical Sciences (NIGMS) Sandbox Modules
The NIGMS Sandbox is a collection of cloud-based biomedical data science learning modules to teach students, researchers, clinicians, and others how to use the power of cloud technology for life sciences applications and research.
Available Modules
The modules shown here are grouped into recommended learning pathway categories.
Introduction to Biomedical Data Science
Fundamentals of Bioinformatics
Dartmouth College
In this module you will learn to use the Bash shell scripting language to work with common genomics file formats, create Conda environments, and troubleshoot command line errors.
Introduction to Data Science for Biology
San Francisco State University
In this module you will learn how to create a simple decision tree using a structured dataset, evaluate model performance quantitatively, and understand why machine learning models require retraining from time to time.
Introduction to Python for Biology
Northern Nazarene University
The module prioritizes practical coding techniques for biological scientists who have limited or no background in programming in Python or other languages. The module also utilizes a blend of short instructional videos, interactive demonstrations, and hands-on exercises to facilitate self-directed learning and knowledge retention.
Introduction to R and LLMs for Biology
Duke University
This repository contains materials for an introductory data science module, part of the NIH NIGMS sandbox initiative, designed for learners new to data science concepts and techniques. The module emphasizes practical applications using R programming and foundational statistical concepts, leveraging cloud computing on Google Cloud Platform (GCP) with Jupyter notebooks.
Introduction to Biomedical Machine Learning and Artificial Intelligence
Python and ML for Biomedical Data Science
University of Delaware
The module prioritizes practical, data-centric techniques, ensuring researchers can immediately apply their acquired data science and AI/ML knowledge to real-world problems. The module also utilizes a blend of engaging instructional videos, interactive tutorials, and hands-on exercises to facilitate self-directed learning and knowledge retention.
Analysis of Biomedical Data for Biomarker Discovery
University of Rhode Island
In this module you will learn how to use exploratory data analysis, linear models, regression, and machine learning to discover biomarkers for kidney disease. This learning module will introduce the user to basic concepts in biomarker discovery that the user is likely to encounter in the clinical and biomedical literature.
Biomedical Imaging Analysis Using AI/ML Approaches
University of Arkansas
In this module you will learn how to generate a neural network, manipulate datasets, train a neural network on the dataset, apply the trained neural network to a new dataset, and quantify its performance.
Introduction to Biomedical Genomics
Consensus Pathway Analysis in the Cloud
University of Nevada, Reno
In this module, you will learn how to download expression data, conduct differential analysis, and perform enrichment analysis, meta-analysis, and visualization. This cloud-based learning module teaches pathway analysis, a term that describes the set of tools and techniques used in life sciences research to discover the biological mechanism behind a condition from high-throughput biological data.
DNA Methylation Sequencing Analysis with WGBS
University of Hawaii at Manoa
As one of the most abundant and well-studied epigenetic modifications, DNA methylation plays an essential role in normal cell development and has various effects on transcription, genome stability, and DNA packaging within cells.
ATAC-Seq and Single Cell ATAC-Seq Analysis
University of Nebraska Medical Center
In this module you will learn how to download raw sequence data, perform differential peak identification, genome annotation, and transcription factor footprinting, and produce common plots and visualizations.
Chromatin Occupancy with Cut and Run
University of Nebraska Medical Center
This module covers the basic analysis and considerations for ChIP-seq, CUT&RUN, and CUT&Tag. Topics include quality control, filtering, alignment, deduplication, peak identification, visualization, and differential analysis of occupancy.
Integrating Multi-Omics Datasets
University of North Dakota
In this module you will learn how to analyze RNA-Seq, epigenetic, and integrated multi-omics datasets in R with a Nextflow pipeline. The module walks you through some of the techniques for integrating transcriptomic and epigenetic data.
Introduction to Metagenomics and Phylogenetics
Introduction to Amplicon-Based Metagenomics
University of Nevada, Reno
This cloud-based learning module introduces the principles of 16S rRNA sequencing and its applications in microbial community analysis. This sequencing technique generates a vast amount of data. Understanding how to process and analyze this data through a series of computational steps is critical in studies related to the human gut microbiome, among others.
Introduction to Phylogenetics
University of South Dakota
These submodules cover the end-to-end workflow of a standard phylogenetic analysis, from extracting a gene sequence, to building a phylogenetic tree, to analyzing the tree. The phylogenetic analysis submodules are suitable for undergraduate through graduate levels.
Introduction to Population Genomics
University of Wyoming
In this tutorial, we show users how to assemble restriction-site associated DNA sequence (RADseq) data and perform some basic population genetic and phylogenetic analyses. All tutorials are presented as Jupyter notebooks.
Comparative Prokaryotic Genomics
University of New Hampshire
This module introduces you to whole-genome sequencing and comparative genomics. You will work with numerous tools to assemble and assess a microbial genome, automate the process on many samples, and utilize the full dataset for comparative genomics analyses.
Introduction to Pangenomic Methods
National Center for Genome Resources
This module will introduce you to (graphical) pangenomics and walk you through a pangenomics pipeline. Specifically, you will learn how to build a pangenome graph, index the graph for analysis, map reads to the graph, call variants on the mapped reads, and visualize the graph.
Metagenomics Analysis of Biofilm-Microbiome
University of South Dakota
In this module you will learn how to use a Docker container to analyze amplicon sequencing metagenomics data with common tools such as qiime2 and PICRUSt2.
Introduction to Proteomics
Proteome Quantification
University of Arkansas for Medical Sciences
In this module you will learn about mass spectrometry data, statistical terminology for data preprocessing, data normalization, and differential abundance analysis for proteomics.
Proteome Structures and Docking
University of Arkansas for Medical Sciences
This module outlines the essential steps in analyzing proteomics data and recommends commonly used tools and techniques for this purpose. This notebook describes mass spectrometry and statistical terminology for data preprocessing, normalization, and differential abundance analysis.
Introduction to RNAseq and Transcriptome Assembly
Explore RNA methylation using MeRIP-seq
University of Hawai’i at Manoa
The MeRIP-seq data analysis tutorial is structured into four submodules, designed to comprehensively guide users through the complete workflow for RNA methylation analysis.
RNAseq Differential Expression Analysis
University of Maine
In this module you will learn how to download raw sequence data, run differential gene expression analysis, and produce common plots in R.
Transcriptome Assembly Refinement and Applications
MDI Biological Laboratory
In this module you will learn how to use a Nextflow pipeline to assemble and annotate a novel transcriptome using RNA-Seq data.
Transcriptome scRNASeq, miRNASeq, and Transcription Factors
University of Maine
The purpose of these tutorials is to help users become familiar with RNA sequencing (RNA-Seq) analysis workflows using cloud computing. The tutorials go step by step through specific workflows for bulk RNA-Seq, small RNA-Seq, and single-cell RNA-Seq (scRNA-Seq).
Data Science Core Contacts
Charlotte Gard
Data Science Core Director
Data Science Collaborative Consulting Center Director
New Mexico State University
cgard@nmsu.edu
Marshall Taylor
Education & Outreach Director
New Mexico State University
mtaylor2@nmsu.edu
Joann Mudge
NCGR Director
National Center for Genome Resources
jm@ncgr.org