
Gene expression analysis application

The TeraGenomics gene expression application supports analysis of Affymetrix GeneChip microarray data to answer questions such as:
  • Which genes are the most active within specific cells or regions of an organism?
  • Where do gene expression levels differ between cells or regions, or within the same cell or region at different points in time or under different conditions?
  • Which highly expressed genes may co-exist (and thus be related) across different cells or regions?

Key Differences from Other Tools
The TeraGenomics application is differentiated from other bioinformatics tools that support GeneChip analysis in two fundamental ways:
  • Speed and Scalability-- The application is built to run on Teradata, the fastest and most scalable database for analytical processing in commercial use. Teradata leads the worldwide market for multi-terabyte-class data warehouses in the retail, finance, transportation, and telecommunications industries, and is now being applied to the life sciences. TeraGenomics scales linearly to support hundreds or even thousands of users and the comparative analysis of data from thousands of GeneChips, increasing both efficiency and the potential for discovery.
  • Quality of the Analysis-- The application is designed to use the hundreds of thousands of detailed probe set measurements produced by GeneChips to obtain the most sensitive and precise analysis possible. This is in sharp contrast to most other tools, which, because of scalability limitations, reduce probe set data to summaries that can mask patterns and subtle effects. The application was designed by biologists to provide the flexibility needed for the exploratory drill-down and iterative filtering characteristic of gene expression analysis.
Application Summary
The TeraGenomics gene expression analysis application is accessed through a browser and organized into modules:
  • Load-- An automated load module provides a controlled process for secure file transfer, staging, uploading, and initial processing of GeneChip data in the database. Data can be loaded one chip at a time or in large batches, via HTTP, FTP, directly from a host, or from CD/DVD. Data value checks, mandatory metadata completion, and other quality controls can be applied flexibly. Loads can be scheduled or run in background mode. Metadata are added via drop-down menus or free-text entry, and the supplied metadata taxonomy can be modified. The application also supports loading GeneChip image (.dat) files, photos, and videos (e.g., illustrating experimental procedures) and linking them to their associated GeneChip data files.
  • Query-- The query module supports both standard and ad hoc analysis of GeneChip data. Chips can be selected for analysis by their entered metadata variables via pull-down menus, subject to chip-level access control rights. An advanced query function with a wizard-like interface is also included. A key timesaving feature is the comparison query, which computes pair-wise comparisons among selected GeneChip files using detailed probe set data. Multiple files can be selected and then individually designated as baseline or experimental for each comparison. Fully automating this process eliminates the many hours of data staging required to run such comparisons in other tools. Extensive pull-down controls are provided for filtering absolute, processed, and comparison files.
  • Export-- Although extensive analysis of GeneChip data is supported within the data warehouse, users can also export data to familiar desktop tools. A flexible export function outputs files in GeneSpring® format as well as in XML, CSV, and tab-delimited formats.
  • Data Mining-- An optional data mining module runs within the data warehouse and supports a variety of analyses, including descriptive statistics; data visualization; organization and partitioning; transformation and data reduction; and analytical algorithms such as clustering, regression, and sequence analysis.
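The comparison query in the Query module pairs files designated as baseline with files designated as experimental. The Python sketch below shows one plausible reading of that pairing. It assumes each selected file is compared with every file of the opposite designation. The names `comparison_pairs` and `run_comparisons` and the chip IDs are hypothetical and are not part of the TeraGenomics application. The source does not describe how the probe-set-level comparison itself is computed, so that step is left as a caller-supplied function.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple, TypeVar

R = TypeVar("R")

VALID_ROLES = {"baseline", "experimental"}


def comparison_pairs(designations: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every (baseline, experimental) pair of chip IDs.

    Assumption (hypothetical): every baseline chip is compared with every
    experimental chip; roles other than the two valid ones are rejected.
    """
    for chip, role in designations.items():
        if role not in VALID_ROLES:
            raise ValueError(f"chip {chip!r} has unknown role {role!r}")
    baselines = sorted(c for c, r in designations.items() if r == "baseline")
    experimentals = sorted(c for c, r in designations.items() if r == "experimental")
    # Cartesian product: one comparison per baseline/experimental combination.
    return list(product(baselines, experimentals))


def run_comparisons(
    designations: Dict[str, str],
    compare: Callable[[str, str], R],
) -> Dict[Tuple[str, str], R]:
    """Apply a caller-supplied comparison to each (baseline, experimental) pair.

    The comparison is a placeholder; the actual probe-set-level computation
    is not specified in the source.
    """
    return {pair: compare(*pair) for pair in comparison_pairs(designations)}


if __name__ == "__main__":
    # Hypothetical selection of four chips and their designations.
    selection = {
        "chip_01": "baseline",
        "chip_02": "baseline",
        "chip_03": "experimental",
        "chip_04": "experimental",
    }
    results = run_comparisons(selection, lambda b, e: f"{e} vs {b}")
    for (base, exp), label in results.items():
        print(base, exp, label)
```

With two baseline and two experimental chips, this yields four comparisons. The number of comparisons grows with the product of the two group sizes, which is why enumerating them by hand quickly becomes tedious.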
Acknowledgement 

The scientific approach and specific algorithms underlying the TeraGenomics gene expression analysis application were developed by Dr. Carrolee Barlow and her colleagues from the Laboratory of Genetics at the Salk Institute for Biological Studies, in La Jolla, CA, and Dr. David Lockhart, President and CSO of Ambit Biosciences, in San Diego. Dr. Barlow specializes in brain research, and she and Dr. Lockhart have pioneered methods to reliably detect subtle changes in gene expression through the large scale analysis of detailed probe hybridization data obtained with Affymetrix GeneChip oligonucleotide arrays.

Dr. Barlow and Dr. Lockhart participated in this effort solely as scientific collaborators; they are not investors in the companies involved and have no financial interest in the commercial success of the technologies.