Sequencing & Bioinformatics Core

The mission of the New Mexico INBRE Sequencing and Bioinformatics Core (SBC) is to provide genomics and bioinformatics resources, tools, education, and training to enable INBRE researchers to make rapid scientific progress in their field of study. The SBC is housed at the National Center for Genome Resources, a nonprofit research institute located in Santa Fe, NM, with an international reputation for productivity and dedication to scientific achievement.

The National Center for Genome Resources (NCGR) has been at the forefront of bioinformatics since 1994 when it developed the first relational genome sequence database (GSDB), and thereafter with a series of innovative software tools and practices having a common theme of integrating and extracting value from –omics data. Establishing an NGS center in late 2007 and becoming the first Illumina sequencing certified service provider (CS-Pro®) in North America in 2008, NCGR and the New Mexico INBRE SBC have the capacity to transform the research of the NM-INBRE network and to prepare the next generation of biomedical scientists for the era of personalized medicine.

NCGR provides INBRE researchers with a collaborative sequencing and bioinformatics core that supports hypothesis- and discovery-driven research in the thematic research focus areas of the Network. The SBC’s specific objectives are to:

  • Advance cutting-edge knowledge discovery through innovative bioinformatics analysis techniques, resources, and tools.
  • Deliver and develop ground-breaking next-generation sequencing technologies and resources.
  • Engage the network in education and training through outreach, mentorships, internships, and symposia. 
  • Build and maintain research-enabling IT Infrastructure and mechanisms for communication within the network and to the public.

Sequencing

By establishing a Next Generation Sequencing (NGS) Center in late 2007, and being the first Illumina sequencing Certified Service Provider (CS-Pro®) in North America in 2008, NCGR and the NM INBRE Sequencing and Bioinformatics Core (SBC) are transforming the research of the INBRE network by providing de novo sequencing, resequencing and functional genomics applications of enormous translational impact. These include the molecular understanding of single-gene and complex disorders, cancer biology, traits of agronomic importance, environmental genomics and personalized medicine. This insight is achieved by combining NGS with the center’s sophisticated bioinformatics tools, software development, robust IT infrastructure and creative techniques for data mining and knowledge discovery.

The objective of the core is to provide the INBRE network with turn-key access to discovery-enabling sequencing and bioinformatics solutions, educational symposia, mentorship/internship programs and collaborative research support.

NCGR employs the latest methods and is a recognized Illumina and Agilent CS-Pro certified lab with sequencing instrumentation comprising two HiSeq 2000s, a GAIIx, and a Pacific Biosciences RS partially funded by NM INBRE. The lab uses the PerkinElmer (Caliper) LCGX high-throughput nucleic acid assessment instrument in conjunction with the Sciclone NGS liquid handling robot to QC samples and prepare libraries in a high-throughput manner. Sequencing services include whole genome and transcriptome shotgun, ChIP, small RNA, mate-pair, ultra-low sample input (1 ng) DNA, and targeted exome. Bioinformatics services include de novo assembly (transcriptome and genome), read count-based expression, variant detection, differential expression analysis, and custom bioinformatics for challenging analysis problems. A competitive pilot study RFP mechanism is used to establish collaborations in the INBRE network to solve pressing research questions in various organisms and clinical areas.

Since 2008, over 45 INBRE collaborations have been created, enabling 15 grant awards and 12 peer-reviewed publications in the areas of basic science and environmental research, assessment and modeling of human disease (in traditional and emerging model organisms), de novo genome assembly and comparative genomics of agents of infectious disease, and biological studies in human disease.

The NM-INBRE SBC increases the research capacity of INBRE investigators and their competitiveness in attaining research grants and manuscript publications by providing cutting-edge discovery techniques, fruitful collaborations, a yearly educational New Mexico BioInformatics, Science, and Technology (NMBIST) symposium, and bioinformatics-based educational initiatives for elementary through graduate school students.


Bioinformatics

NCGR has amassed a large portfolio of bioinformatics expertise and tools and provides researchers in academia and industry around the world with a conscientious knowledge-discovery partner for tackling complex analytical issues. The NM INBRE Sequencing and Bioinformatics Core (SBC) leverages this expertise, which includes:

  • Analyses of genomic variation, including variant detection, marker development, genotyping by sequencing, functional characterization, and phenotypic association studies.

  • Expression analyses, including gene- and isoform-level differential expression, allele-specific expression, small RNA studies, and pathway analyses.

  • Epigenetic analyses of single base pair resolution methylation states.

  • ChIP-Seq differential binding and chromatin modification studies.

  • Data-driven genome annotation and structural variation (CNV, translocations, novel insertion) studies.

  • Genome and transcriptome assembly and annotation.

NCGR offers extensive expertise in the use of proprietary, open-source, and in-house bioinformatics tools to support researchers’ pursuits. These include off-the-shelf tools such as the Statistical Analysis System (SAS), the gold standard in statistical software; its genomics analysis and visualization component, JMP-Genomics; Mathematica for computation and visualization; and GeneGo, a powerful functional database to explore pathway enrichment and analyze networks. There are a further 30+ open-source tools focused on genome and transcriptome analysis in the areas of variant detection, visualization, differential gene expression analysis, genome structural and functional annotation, genome and transcriptome de novo assembly, and pathway/network analysis. New tools are added to the NCGR/SBC arsenal continually for our network colleagues and students to leverage, learn, and use.

Presented below is a recommended bioinformatician toolkit of tools that are frequently used in bioinformatics analyses of next-generation sequencing projects. NCGR scientists and analysts have compiled this list of open-source and NCGR-developed tools for the NM-INBRE community to apply to their own projects. Bioinformatics is a constantly changing and dynamic field, so please note that these are suggestions as of Sep 20, 2013. It is always good to refer to tool home pages for the most up-to-date versions and use recommendations.

Analysis tools for our informatics analysis and training include proprietary tools, open-source tools, and tools we have developed in house.

Proprietary tools include the Statistical Analysis System (SAS), the gold standard in statistical software; its John’s Mac Program (JMP) Genomics analysis and visualization component; and Mathematica for computation and visualization. We use GeneGo, a powerful functional database, to explore pathway enrichment and network analysis in human, mouse, and rat projects.

Open-source tools include the following:

Alignment

  1. The Basic Local Alignment Search Tool (BLAST) to find sequence similarities between nucleotide or protein sequences, infer functional and evolutionary relationships, and identify species. BLAST+ allows faster searches as well as more flexibility in output formats and in the search input.
  2. The Genomic Mapping and Alignment Program for mRNA and EST Sequences (GMAP) for aligning single-end sequence reads. It has minimal hardware requirements and provides fast batch processing of large sequence sets. The Genomic Short-read Nucleotide Alignment Program (GSNAP) can align both single- and paired-end reads. NCGR’s Alpheus variant and expression detection pipeline uses GSNAP as its alignment algorithm.
  3. The Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome.
  4. Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
  5. For genome-wide comparison, we use MUMmer for the rapid alignment of very large DNA and amino acid sequences. It is particularly useful in comparing a de novo assembly to a known reference.
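
As an illustration of the idea behind Burrows-Wheeler-based aligners such as BWA and Bowtie 2, the following is a minimal Python sketch of exact read matching with a Burrows-Wheeler transform and backward search. It is not how those tools are implemented: real aligners use compressed indexes, handle mismatches and gaps, and score alignments. The function names `bwt_index` and `find_exact` and the toy reference are hypothetical and chosen only for this example.

```python
from collections import Counter


def bwt_index(reference: str):
    """Build a tiny, uncompressed index: suffix array, C table, and Occ table."""
    text = reference + "$"  # "$" is a sentinel that sorts before A, C, G, T
    # Naive suffix array construction; fine for toy inputs only.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT is the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in suffix_array)

    # first_row[c]: number of characters in the text strictly smaller than c
    counts = Counter(text)
    first_row, total = {}, 0
    for c in sorted(counts):
        first_row[c] = total
        total += counts[c]

    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return suffix_array, first_row, occ


def find_exact(read: str, index) -> list[int]:
    """Return sorted 0-based reference positions where `read` occurs exactly."""
    suffix_array, first_row, occ = index
    lo, hi = 0, len(suffix_array)  # half-open interval of matching suffixes
    for c in reversed(read):  # backward search: extend the match leftwards
        if c not in first_row:
            return []
        lo = first_row[c] + occ[c][lo]
        hi = first_row[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[i] for i in range(lo, hi))


if __name__ == "__main__":
    index = bwt_index("ACGTACGTTAGC")
    print(find_exact("ACGT", index))
    print(find_exact("TAG", index))
```

The search cost depends on the read length rather than the reference length, which is one reason this family of indexes suits short reads aligned against a long reference.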

Alignment Visualization

  1. Tablet is a lightweight, high-performance graphical viewer for next generation sequence assemblies and alignments.
  2. The Comparative Map and Trait Viewer (CMTV), developed by NCGR, is a comparative map and trait visualization framework that enables visual integration of genomic data from disparate data sources and allows rich client-side interactivity and manipulation. It is extensible through plugins for new data sources and algorithms.

Differential gene expression analysis

  1. The R Project (a language and environment for statistical computing and graphics) combined with packages from the Bioconductor suite such as DESeq and edgeR.
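
To give a sense of what read count-based differential expression starts from, here is a minimal Python sketch that normalizes raw counts to counts per million (CPM) and computes a log2 fold change between two samples. This is a deliberately simplified stand-in: DESeq and edgeR fit negative binomial models across replicates and report statistical significance, none of which is attempted here. The function names, the pseudocount of 1.0, and the gene counts are assumptions made up for this example.

```python
import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_changes(
    control: dict[str, int],
    treated: dict[str, int],
    pseudocount: float = 1.0,  # assumed value; avoids division by zero
) -> dict[str, float]:
    """log2(treated / control) per gene, after CPM normalization."""
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in cpm_control
    }


if __name__ == "__main__":
    # Hypothetical counts; the treated library is twice as deep as the control.
    control = {"geneA": 100, "geneB": 300, "geneC": 600}
    treated = {"geneA": 400, "geneB": 600, "geneC": 1000}
    for gene, lfc in log2_fold_changes(control, treated).items():
        print(f"{gene}\t{lfc:+.3f}")
```

Normalizing first matters: geneB has twice the raw count in the treated sample, but only because that library is twice as deep, so its fold change after normalization is zero.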

Full genome structural and functional annotation

  1. The Rapid Annotation using Subsystem Technology (RAST) server, and the Metagenome (MG)-RAST server for identifying species in mixed samples, such as ‘contaminating’ bacterial sequences.
  2. The Gene Locator and Interpolated Markov ModelER (GLIMMER), which typically finds 98–99% of all protein-coding genes.
  3. AUGUSTUS, a eukaryotic gene prediction tool that allows user-defined constraints.

Assembly Tools

  1. The Assembly by Short Sequence (ABySS) tool for both transcriptomic and genomic assemblies.
  2. The ALLPATHS-LG (Large Genomes) short-read assembler.
  3. The Mimicking Intelligent Read Assembly (MIRA) tool is a multi-pass DNA sequence data assembler/mapper for whole genome and EST projects. MIRA assembles reads from Sanger, 454, Solexa (Illumina), and IonTorrent sequencing and, more recently, PacBio.
  4. The Contig Assembly Program 3 (CAP3) for transcriptome assembly uses long reads (e.g. 454 and Sanger). This improved version 3 of CAP also addresses assembly errors due to repeats.
  5. The Phragment Assembly Program (PHRAP) for genome assembly uses long reads (e.g. 454 and Sanger). It is designed to improve assembly accuracy in the presence of low-quality sequence and repeats, and it can handle very large datasets.
  6. Short Oligonucleotide Analysis Package (SOAPdenovo) is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads.

Post Assembly Analysis

  1. The GapCloser program is designed to close the gaps that arise during the scaffolding process utilizing paired-end short read data.
  2. The Cluster Database at High Identity with Tolerance (CD-HIT) program is a database reduction program that removes repetitive or redundant sequences from the input FASTA file, producing a set of ‘non-redundant’ sequences.
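
The idea of reducing a sequence set to non-redundant representatives can be sketched in Python as a greedy pass over the sequences from longest to shortest. This is only an illustration under simplifying assumptions: shared k-mer fraction is used here as a crude proxy for sequence identity, whereas CD-HIT has its own filtering and identity calculation. The function names, the k-mer size of 4, the 0.9 threshold, and the contig sequences are all hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(query: set[str], target: set[str]) -> float:
    """Fraction of the query's k-mers that also occur in the target."""
    if not query:
        return 0.0
    return len(query & target) / len(query)


def greedy_nonredundant(
    records: dict[str, str], k: int = 4, threshold: float = 0.9
) -> list[str]:
    """Return names of representative sequences, longest first."""
    representatives: list[str] = []
    rep_kmers: dict[str, set[str]] = {}
    # Longest first, so each group is represented by its longest member.
    for name in sorted(records, key=lambda n: len(records[n]), reverse=True):
        query = kmers(records[name], k)
        redundant = any(
            shared_fraction(query, rep_kmers[rep]) >= threshold
            for rep in representatives
        )
        if not redundant:
            representatives.append(name)
            rep_kmers[name] = query
    return representatives


if __name__ == "__main__":
    records = {
        "contig1": "ATGGCGTACGTTAGCCTAGGA",
        "contig2": "GGCGTACGTTAGCC",  # contained within contig1
        "contig3": "TTTTTCCCCCGGGGGAAAAA",
    }
    print(greedy_nonredundant(records))
```

Processing the longest sequences first means a shorter contig that is contained in a longer one is dropped rather than the other way around.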

Protein/Peptide Prediction & Annotation

  1. ESTScan is a program that detects coding regions in DNA sequences. It can work with low-quality data and correct sequencing errors that lead to frameshifts.
  2. The Hidden Markov Model Scan (hmmscan) is a tool that takes a query sequence and searches it against the Pfam profile HMM library database, reporting significance scores and threshold values.

Pathway/Network analysis

  1. Cytoscape integrates biomolecular interaction networks with high-throughput expression data into a unified conceptual framework.
  2. The Database for Annotation, Visualization and Integrated Discovery (DAVID) is a program for performing gene-annotation, enrichment analysis, and functional annotation clustering on large gene sets to infer biological meaning.
For More Information Contact:
Faye D. Schilkey
Director, NM INBRE Sequencing and Bioinformatics Core
Phone: (505) 995-4449