Objectives

  • To support investigators in biomedical, health, and community-based research across the network with data science services.
  • To develop and enhance biomedical and health-focused data science knowledge and capacity across the network.

Types of Support Offered

  • Experimental design, sample size determination, and power calculations
  • Planning and execution of studies using existing datasets, e.g., in public health 
  • Statistical analysis including data visualization, descriptive statistics, hypothesis testing, and regression modeling
  • Use of statistical software including SAS, SPSS, and R

Available Cores and Institutional Research Facilities at NCGR

Sequencing and Molecular Biology Lab

The lab has the equipment needed for DNA/RNA sequencing and molecular biology experiments. Currently, NCGR performs only Oxford Nanopore sequencing in-house, but it has agreements in place for discounted sequencing services on other platforms and a thorough understanding of the main sequencing technologies.

Bioinformatics Analysis

NCGR provides extensive bioinformatics expertise, covering genomic variation, expression analysis, epigenetics, single-cell studies, genome assembly, structural variation, and metagenomics. NCGR employs advanced tools and techniques like PacBio IsoSeq and pangenomic analyses to link genetic variation with phenotypic traits.

Bioinformatics Training

NCGR trains students in bioinformatics, genomics, genetics, computing, and biology. It provides onsite and virtual training at many levels, including K-12, undergraduate, graduate, postdoctoral, and faculty. It also runs in-person and virtual bioinformatics workshops on several topics for New Mexico/INBRE students and researchers.

Request Statistical Support

Data Science Core at NMSU

Please use the form below to request support from the Data Science Core. We will review your request and reach out to schedule a meeting to learn more about your project.

Support from NCGR 

Please use the form below to request support from the NCGR Cores. We will review your request and reach out to schedule a meeting to learn more about your project.

Educational Programming and Outreach

Differential Gene Expression Workshop

In this 1-week workshop, we focus on a particular analysis area. For example, in the popular Differential Gene Expression (DE) workshop, we teach students the skills to independently analyze RNA-Seq data using the command line interface, analytical workflows, and current DE tools. As part of the course, we cover UNIX fundamentals, data QC, alignments, read count generation, featureCounts, and pathway analysis.

Multi-Omic Bioinformatics Intensive

In this multi-topic course, we teach students UNIX and cover other analysis areas such as assembly, differential gene expression, variant detection, metagenomics and visualization methods. As with all our workshops, we work with students as a group and individually to ensure their success.

Pan-Genomics Workshop

In this 1-week workshop, students will be introduced to the concept of the pan-genome and receive instruction on how to analyze these data. Topics will include relevant UNIX skills, pan-genome representation and construction, visualization, and pan-genome analysis. Analysis topics will include read-mapping, alignment, variant/haplotype calling and annotation.

Follow the link below to explore the programs and opportunities available at NCGR.

March 21st, 2025 Presentation by Rafael A. Irizarry, PhD

Professor and Chair of the Department of Data Science at Dana-Farber Cancer Institute, Professor of Applied Statistics at Harvard

Briefings in Bioinformatics

Briefings in Bioinformatics is an international forum for researchers and educators in the life sciences. The journal will also be of interest to mathematicians, statisticians and computer scientists who apply their work to biological problems. The journal publishes reviews for the users of databases and analytical tools of contemporary genetics, molecular and systems biology and is unique in providing practical help and guidance to the non-specialist in computerized methodology. Papers range in scope and depth, from the introductory level to specific details of protocols and analyses encompassing bacterial, plant, fungal, animal and human data.

To view the latest and past issues, please follow the ‘Read More’ link below.

National Institute of General Medical Sciences (NIGMS) Sandbox Modules

The NIGMS Sandbox is a collection of cloud-based biomedical data science learning modules to teach students, researchers, clinicians, and others how to use the power of cloud technology for life sciences applications and research. 

Available Modules

If you’d like to view all available modules, click the ‘Learn More’ button below. The modules shown here are grouped into recommended learning pathway categories.

Introduction to Biomedical Data Science

Fundamentals of Bioinformatics

Dartmouth College

In this module you will learn to use the Bash shell scripting language to work with common genomics file formats, create Conda environments, and troubleshoot command line errors.

Introduction to Data Science for Biology

San Francisco State University

In this module you will learn how to create a simple decision tree using a structured dataset, evaluate model performance quantitatively, and understand why machine learning models require retraining from time to time. 

Introduction to Python for Biology

Northwest Nazarene University

The module prioritizes practical coding techniques for biological scientists who have limited or no background in programming in Python or other languages. The module also utilizes a blend of short instructional videos, interactive demonstrations, and hands-on exercises to facilitate self-directed learning and knowledge retention.

Introduction to R and LLMs for Biology

Duke University

This repository contains materials for an introductory data science module, part of the NIH NIGMS sandbox initiative, designed for learners new to data science concepts and techniques. The module emphasizes practical applications using R programming and foundational statistical concepts, leveraging cloud computing on Google Cloud Platform (GCP) with Jupyter notebooks.

Introduction to Biomedical Machine Learning and Artificial Intelligence

Python and ML for Biomedical Data Science

University of Delaware

The module prioritizes practical, data-centric techniques, ensuring researchers can immediately apply their acquired data science and AI/ML knowledge to real-world problems. The module also utilizes a blend of engaging instructional videos, interactive tutorials, and hands-on exercises to facilitate self-directed learning and knowledge retention.

Analysis of Biomedical Data for Biomarker Discovery

University of Rhode Island

In this module you will learn how to use exploratory data analysis, linear models, regression, and machine learning to discover biomarkers for kidney disease. This learning module will introduce the user to basic concepts in biomarker discovery that the user is likely to encounter in the clinical and biomedical literature.

Biomedical Imaging Analysis Using AI/ML Approaches

University of Arkansas

In this module you will learn how to generate a neural network, manipulate datasets, train a neural network on the dataset, apply the trained neural network to a new dataset, and quantify its performance. 

Introduction to Biomedical Genomics

Consensus Pathway Analysis in the Cloud

University of Nevada, Reno

In this module, you will learn how to download expression data, conduct differential analysis, and perform enrichment analysis, meta-analysis, and visualization. This cloud-based learning module teaches pathway analysis, a term that describes the set of tools and techniques used in life sciences research to discover the biological mechanism behind a condition from high-throughput biological data.

DNA Methylation Sequencing Analysis with WGBS

University of Hawaii at Manoa

As one of the most abundant and well-studied epigenetic modifications, DNA methylation plays an essential role in normal cell development and has various effects on transcription, genome stability, and DNA packaging within cells.

ATAC-Seq and Single Cell ATAC-Seq Analysis

University of Nebraska Medical Center

In this module you will learn how to download raw sequence data, run differential peak identification, genome annotation, transcription factor footprinting, and produce common plots and visualizations.

Chromatin Occupancy with Cut and Run

University of Nebraska Medical Center

This module covers the basic analysis and considerations for ChIP-seq, CUT&RUN, and CUT&Tag. Topics include quality control, filtering, alignment, deduplication, peak identification, visualization, and differential analysis of occupancy.

Integrating Multi-Omics Datasets

University of North Dakota

In this module you will learn how to analyze RNA-Seq, epigenetic, and integrated multi-omics datasets in R with a Nextflow pipeline. This module will walk you through some of the techniques used to integrate transcriptomic and epigenetic data.

Introduction to Metagenomics and Phylogenetics

Introduction to Amplicon-Based Metagenomics

University of Nevada, Reno

This cloud-based learning module introduces the principles of 16S rRNA sequencing and its applications in microbial community analysis. This sequencing technique generates a vast amount of data. Understanding how to process and analyze this data through a series of computational steps is critical in studies related to the human gut microbiome, among others.

Introduction to Phylogenetics

University of South Dakota

These submodules cover the end-to-end workflow of a standard phylogenetic analysis, from extracting a gene sequence, to creating a phylogenetic tree, to analyzing the tree. The phylogenetic analysis modules are suitable for learners at the undergraduate through graduate level.

Introduction to Population Genomics

University of Wyoming

In this tutorial, we show users how to assemble restriction-site associated DNA sequence (RADseq) data and perform some basic population genetic and phylogenetic analyses. All tutorials are presented as Jupyter notebooks.

Comparative Prokaryotic Genomics

University of New Hampshire

This module introduces you to whole-genome sequencing and comparative genomics. You will work with numerous tools to assemble and assess a microbial genome, automate the process on many samples, and utilize the full dataset for comparative genomics analyses.

Introduction to Pangenomic Methods

National Center for Genome Resources

This module will introduce you to (graphical) pangenomics and walk you through a pangenomics pipeline. Specifically, you will learn how to build a pangenome graph, index the graph for analysis, map reads to the graph, call variants on the mapped reads, and visualize the graph.

Metagenomics Analysis of Biofilm-Microbiome

University of South Dakota

In this module you will learn how to use a Docker container to analyze amplicon sequencing metagenomics data with common tools such as QIIME 2 and PICRUSt2.

Introduction to Proteomics

Proteome Quantification

University of Arkansas for Medical Sciences

In this module you will learn about mass spectrometry data, statistical terminology for data preprocessing, data normalization, and differential abundance analysis for proteomics.

Proteome Structures and Docking

University of Arkansas for Medical Sciences

This module outlines the essential steps in the process of analyzing proteomics data and recommends commonly used tools and techniques for this purpose. This notebook describes mass spectrometry and statistical terminology for data preprocessing, normalization, and differential abundance analysis.

Introduction to RNAseq and Transcriptome Assembly

Explore RNA methylation using MeRIP-seq

University of Hawaii at Manoa

The MeRIP-seq data analysis tutorial is structured into four submodules, designed to comprehensively guide users through the complete workflow for RNA methylation analysis.

RNAseq Differential Expression Analysis

University of Maine

In this module you will learn how to download raw sequence data, run differential gene expression analysis, and produce common plots in R.

Transcriptome Assembly Refinement and Applications

MDI Biological Laboratory

In this module you will learn how to use a Nextflow pipeline to assemble and annotate a novel transcriptome using RNA-Seq data.

Transcriptome scRNASeq, miRNASeq, and Transcription Factors

The University of Maine

The purpose of these tutorials is to help users familiarize themselves with RNA sequencing (RNA-Seq) analysis workflows using cloud computing. The tutorials go step by step through specific workflows for bulk RNA-Seq, small RNA-Seq, and single-cell RNA-Seq (scRNA-Seq).

Data Science Core Contacts

Charlotte Gard

Data Science Core Director

Data Science Collaborative Consulting Center Director

New Mexico State University
cgard@nmsu.edu

Marshall Taylor

Education & Outreach Director

New Mexico State University
mtaylor2@nmsu.edu

Joann Mudge

NCGR Director

National Center for Genome Resources
jm@ncgr.org