Data Science Core

Request Statistical & Bioinformatics Support >>

Educational Programming and Outreach >>

Data Science Core Objectives

To support network investigators in biomedical, health, and community-based research with data science services through Data Science Consultation
To develop and enhance biomedical and health-focused data science knowledge and capacity through the Educational Programming and Outreach Center

Data Science Consultation

The Data Science Core accepts requests for data science support, then matches investigators to faculty affiliates whose expertise aligns with the request. Any researcher at one of the NM-INBRE participating institutions can use these services.

Statistical Support Offered

Experimental design, sample size determination, and power calculations
Planning and execution of studies using existing datasets, e.g., in public health
Statistical analysis including data visualization, descriptive statistics, hypothesis testing, and regression modeling
Use of statistical software including SAS, SPSS, and R
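As a rough illustration of the kind of sample size and power calculation listed above, the sketch below uses the standard normal approximation for a two-sided, two-sample comparison of means. The function name and default values are hypothetical choices for this example and do not describe the Core's own procedures. A consultation would normally tailor the method to the actual study design.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group (hypothetical helper).

    Assumes two equal-sized groups, a two-sided test, and a standardized
    effect size (difference in means divided by the common standard
    deviation). Uses the normal approximation, so it can slightly
    underestimate the n required by an exact t-test calculation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must be between 0 and 1")

    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = standard_normal.inv_cdf(power)           # quantile for the target power

    # n per group = 2 * ((z_alpha + z_beta) / d)^2, rounded up
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 1.0):
        print(f"effect size {d}: {sample_size_per_group(d)} per group")
```

Smaller expected effects require sharply larger samples, because the required n grows with the inverse square of the effect size.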

Bioinformatics Support Offered

Sequencing and Molecular Biology Lab: The lab is equipped for DNA/RNA sequencing and molecular biology analyses. It currently uses Oxford Nanopore DNA/RNA sequencers and has expertise in other major platforms.
Bioinformatics Analysis: NCGR offers broad bioinformatics expertise, such as genome assembly, pangenomic, single-cell, transcriptomic and metagenomic analyses and beyond.

Request Statistical and Bioinformatics Support

Statistical Support from NMSU

Please use the form below to request support from the Data Science Core. We will review your request and reach out to schedule a meeting to learn more about your project.

Request Statistical Support Here

Bioinformatics Support from NCGR

Please use the form below to request support from NCGR. We will review your request and reach out to schedule a meeting to learn more about your project.

Request Bioinformatics Support Here

Educational Programming and Outreach

The Educational Programming and Outreach Center offers workshops, asynchronous short courses, colloquiums, and undergraduate summer experiences designed to build data science capacity across the network.

Bioinformatics Training

NCGR trains students in bioinformatics, genomics, genetics, computing, and biology. They provide onsite and virtual training across many levels, including K-12, undergraduate, graduate, postdoctoral, and faculty. They run in-person and virtual bioinformatics workshops for New Mexico/INBRE students and researchers across several topics and can also create custom bioinformatics workshops.



Differential Expression Workshop

Analysis of gene expression changes in response to different treatments or conditions can yield important biological insights, including elucidating abiotic and biotic stress responses, uncovering disease and resistance mechanisms, and highlighting developmental changes. In this workshop, you will learn how to perform differential gene expression analysis, visualization, and pathway analysis. In addition, you will learn how to perform coexpression analysis and identify genes driving expression shifts.



Metagenomics Workshop

Metagenomics approaches sequence and analyze communities of organisms, allowing the identification of community members and/or the genes and metabolic pathways that they harbor. In this workshop, you will learn how to analyze both 16S amplicon and whole genome shotgun sequence data, including filtering host reads (if appropriate), performing community analysis, creating metagenome-assembled genomes (MAGs), calculating measures of diversity, and visualizing results. You will also learn how to perform metatranscriptomic analyses that enable you to investigate genes that are being actively expressed in the community.
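To give a sense of what a diversity measure looks like in practice, the sketch below computes the Shannon diversity index from a list of per-taxon read counts. It is a minimal illustration and not workshop material: the function name and the example counts are hypothetical, and real analyses usually rely on dedicated packages and account for sequencing depth.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    `counts` is a sequence of non-negative abundances, one per taxon
    (hypothetical input format chosen for this example).
    """
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one count must be positive")

    # Taxa with zero counts contribute nothing to the sum.
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


if __name__ == "__main__":
    even_community = [25, 25, 25, 25]    # four equally abundant taxa
    skewed_community = [97, 1, 1, 1]     # one dominant taxon
    print(f"even:   {shannon_index(even_community):.3f}")
    print(f"skewed: {shannon_index(skewed_community):.3f}")
```

An evenly distributed community scores higher than one dominated by a single taxon, which is the pattern this index is designed to capture.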



Single Cell Transcriptomics Workshop

A typical RNA sequencing experiment captures transcriptional changes at the bulk tissue level, representing a heterogeneous mixture of cells and providing an aggregated expression result. By performing RNA sequencing on isolated single cells, one can pinpoint cellular expression and consider transcriptome sequences from individual cells to investigate, for example, cell lineage relationships, variability between cells, and how a cell functions in its local environment. In this workshop, you will import single-cell data and cover topics such as normalization, quality control, dimensionality reduction, visualization, and data interpretation.



Pangenomics Workshop

Pangenomic analyses capitalize on the increasing availability of multiple, high-quality, conspecific genome assemblies and the genetic variation that they contain. Pangenome approaches remove reference bias and elucidate biological signals at a more comprehensive population or species scale. In this workshop, you will learn what pangenomes are, how to build a pangenome, and how to perform fundamental bioinformatic analyses to identify biological insights.

To register for a workshop or check upcoming dates, visit the NCGR workshop page below.

Bioinformatics Workshops

Explore the NCGR website to learn more about their full range of offerings.

Learn More

Data Science Colloquium Series

The Bright Future of Applied Statistics

Rafael A. Irizarry, PhD

Professor and Chair of the Department of Data Science at Dana-Farber Cancer Institute, Professor of Applied Statistics at Harvard.

Presentation Given on March 21st, 2025

The increasing need for data scientists has prompted universities to explore effective ways to train skilled professionals in this field. Yet, there’s no clear agreement on the essential components (like foundational principles, expertise, skills, or knowledge) that constitute data science as an academic discipline. In my seminar, I’ll begin by examining the role of academia, especially applied statisticians, in educating future data scientists. Following this, I’ll share two or three examples from my research to illustrate the crucial role of statistical thinking in solving real-world problems.

Harnessing the Power of Real-World Data to Build a Novel Cardiac Arrest Database

Dr. Ryan Huebinger

University of New Mexico Health Sciences Center

Presentation Given on September 26th, 2025

Real-world datasets are powerful tools for conducting low-cost or free research and generating preliminary data for grant applications. In this lecture, Dr. Huebinger covers different types of real-world data and their strengths and weaknesses. He then uses an example from his own research to show how these data can be used to address research questions in novel ways.

Examining factors that contribute to racial and ethnic disparities in liver cancer with a multi-institutional, EHR-based epidemiologic cohort linked to population-based cancer registries.

Mindy Hébert-DeRouen, PhD, MPH

Assistant Professor, Department of Public Health Sciences, College of Health, Education, and Social Transformation, New Mexico State University

Presentation Given on October 24th, 2025

Use of EHR-based cohorts for research requires informed extraction, harmonization, and operationalization of EHR data. Dr. Hébert-DeRouen will describe the development of an EHR-based cohort with data from three healthcare systems linked to population-based cancer registry data to examine clinical and neighborhood factors that contribute to racial and ethnic disparities in liver cancer. She will detail the operationalization of detailed race/ethnicity (17 categories) and liver cancer risk factors from EHR data; illustrate the utility of linking EHR data to cancer registry data; and present results on racial/ethnic disparities in liver cancer as well as the relative prevalence and contribution of risk factors to liver cancer diagnosis across racial/ethnic groups.

Restricted-Use Federal Data for Health Research

Jani Little, PhD

Director, Rocky Mountain Research Data Center
Senior Research Associate, Institute of Behavioral Science
University of Colorado Boulder

Laura Argys, PhD

Professor of Economics
University of Colorado Denver

Presentation Given on November 7th, 2025

Through the Federal Statistical Research Data Center (FSRDC) program, approved university researchers gain access to national collections of population and health data collected internally and confidentially by federal statistical agencies. These data are NOT otherwise available for public use. Professor Laura Argys and Dr. Jani Little will provide an overview of restricted-use microdata resources, along with example research projects and thought-provoking ideas for future research.

Highlighted datasets include: National Survey of Children’s Health (NSCH); Mortality Disparities in American Communities (MDAC); National Health and Nutrition Examination Surveys (NHANES); National Health Interview Surveys (NHIS); and National Immunization Surveys (NIS).

Characterizing Psychiatric Multimorbidity Trajectories Around Treatment-Resistant Depression Using Longitudinal Electronic Health Records

William Ofosu Agyapong, PhD

Data Scientist, Laureate Institute for Brain Research

Presentation Given on April 17th, 2026

Dr. Agyapong presented a data-driven analysis of psychiatric multimorbidity trajectories aligned to treatment-resistant depression (TRD) onset using longitudinal electronic health records from the All of Us Research Program. By applying sequence analysis to diagnostic histories within a four-year window before and after TRD onset, he characterizes patterns of multimorbidity accumulation and identifies distinct trajectory clusters. He then examines demographic characteristics and polygenic risk scores to assess factors associated with different trajectory patterns.

National Institute of General Medical Sciences (NIGMS) Sandbox Modules

The NIGMS Sandbox is a collection of cloud-based biomedical data science learning modules to teach students, researchers, clinicians, and others how to use the power of cloud technology for life sciences applications and research. Development and deployment of the NIGMS Sandbox is described in the following article, from Briefings in Bioinformatics.

The modules in this section are grouped into recommended learning pathway categories.

Introduction to Biomedical Data Science >>

Introduction to Biomedical Machine Learning and Artificial Intelligence >>

Introduction to Biomedical Genomics >>

Introduction to Metagenomics and Phylogenetics >>

Introduction to Proteomics >>

Introduction to RNAseq and Transcriptome Assembly >>

Introduction to Biomedical Data Science

Fundamentals of Bioinformatics

Dartmouth College

Introduction Video

In this module you will learn to use the Bash shell scripting language to work with common genomics file formats, create Conda environments, and troubleshoot command line errors.