Supplementary Materials: Text S1: Discovering biological progression underlying microarray samples.

When applied to a cell cycle time series microarray dataset, SPD was not provided any prior knowledge of the samples’ time order or of which genes are cell-cycle regulated, yet SPD recovered the correct time order and identified many genes that have been associated with the cell cycle. When applied to B-cell differentiation data, SPD recovered the correct order of the stages of normal B-cell differentiation and the linkage between preB-ALL tumor cells and their cell of origin, preB. When applied to mouse embryonic stem cell (ESC) differentiation data, SPD uncovered a landscape of ESC differentiation into various lineages, along with genes that represent both generic and lineage-specific processes. When applied to a prostate cancer microarray dataset, SPD identified gene modules that reflect a progression consistent with disease stages. SPD may be best viewed as a novel tool for synthesizing biological hypotheses because it provides a likely biological progression underlying a microarray dataset and, perhaps more importantly, the candidate genes that regulate that progression.

Author Summary

We present a novel computational approach, Sample Progression Discovery (SPD), to discover the biological progression underlying a microarray dataset. In contrast to the majority of microarray data analysis methods, which identify differences between sample groups (normal vs. cancer, treated vs. control), SPD aims to identify an underlying progression among individual samples, both within and across sample groups. We validated SPD’s ability to discover biological progression using datasets of cell cycle, B-cell differentiation, and mouse embryonic stem cell differentiation. We view SPD as a hypothesis generation tool when applied to datasets where the progression is unclear.
For example, when applied to a microarray dataset of cancer samples, SPD assumes that the cancer samples collected from individual patients represent different stages of an intrinsic progression underlying cancer development. The inferred relationship among the samples may therefore indicate a trajectory or hierarchy of cancer progression, which serves as a hypothesis to be tested. SPD is not limited to microarray data analysis and can be applied to a variety of high-dimensional datasets. We implemented SPD as a MATLAB graphical user interface, which is available at http://icbp.stanford.edu/software/SPD/.

Introduction

Biological processes of development, differentiation and aging are increasingly being described by the temporal ordering of highly orchestrated transcriptional programs [1]. When such processes are analyzed with gene expression microarrays at specified time points, a variety of computational methods are available to identify which genes vary, and how they vary, across some or all of the time points [2], [3], [4], [5], [6]. However, when microarray samples of a biological process are available but their ordering is not known, fewer methods are available to recover the correct ordering, especially when the underlying process contains branchpoints, as occurs in the differentiation of hematopoietic stem cells into myeloid and lymphoid lineages. We present a novel method, referred to as Sample Progression Discovery (SPD), to discover the progression among microarray samples, even if the progression contains branchpoints. In addition, SPD simultaneously identifies genes that define the progression. SPD can be used to generate biological hypotheses about a progressive relationship among samples and about the genes that serve as key candidate regulators of the underlying process. Recovery of an ordering among unordered objects has been analyzed in a variety of contexts.
In computer vision, images taken from random viewpoints and perspectives were ordered for the purpose of multi-view matching [7], where the ordering was based on predefined features that are invariant to different viewpoints. In genetics, spanning trees were applied to reconstruct genetic linkage maps [8], which amounts to an ordering of genetic markers. Using gene expression data from a small set of preselected genes, phylogenetic trees were constructed to study cancer progression among microarray tumor samples [9], [10]. Microarray samples were also ordered by a traveling salesman path from combinatorial optimization theory, but feature selection was not discussed [11], [12]. Although these methods proved useful in recovering an ordering from unordered objects, applied directly they cannot address the difficulties of extracting progression and differentiation hierarchy from microarray gene expression data. The algorithms in [7], [11], [12] assume a linear ordering of unordered objects and are therefore unable to reveal potential branchpoints. Furthermore, most existing methods order samples based on carefully designed or preselected.
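The contrast between linear orderings [11], [12] and tree-based orderings [8] can be illustrated on a toy example. This is a minimal sketch, not SPD's algorithm (this section does not describe how SPD builds its progression). It assumes Euclidean distance between samples and uses hypothetical two-gene profiles. A brute-force shortest open traveling salesman path recovers the order of samples that lie on a single trajectory. On samples that split into two lineages, however, any linear order must take a step longer than any edge of the minimum spanning tree. The tree, built here with Prim's algorithm, keeps both lineages and exposes the branchpoint as a sample with three or more neighbors.

```python
import math
from itertools import permutations

def path_length(points, order):
    """Total Euclidean length of visiting the samples in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))

def shortest_path_order(points):
    """Brute-force open traveling-salesman path (feasible only for a few samples)."""
    best_order, best_len = None, math.inf
    for order in permutations(range(len(points))):
        if order[0] > order[-1]:
            continue  # skip reversals: a path and its reverse have equal length
        length = path_length(points, order)
        if length < best_len:
            best_order, best_len = list(order), length
    return best_order

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (parent, child) pairs."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each sample to the growing tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest sample that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def branchpoints(edges, n):
    """Samples with three or more tree neighbors are candidate branchpoints."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return [i for i in range(n) if degree[i] >= 3]

# Hypothetical time-course samples on one trajectory, listed in scrambled order.
linear = {"t3": (3.0, 1.5), "t0": (0.0, 0.0), "t4": (4.0, 2.0),
          "t1": (1.0, 0.5), "t2": (2.0, 1.0)}
lin_names = list(linear)
lin_order = shortest_path_order([linear[k] for k in lin_names])
print([lin_names[i] for i in lin_order])  # ['t0', 't1', 't2', 't3', 't4']

# Hypothetical samples: a trunk (stem -> mid1 -> mid2) splitting into lineages A and B.
branched = {"A2": (4.0, 2.0), "stem": (0.0, 0.0), "B1": (3.0, -1.0),
            "mid1": (1.0, 0.0), "A1": (3.0, 1.0), "mid2": (2.0, 0.0),
            "B2": (4.0, -2.0)}
br_names = list(branched)
br_points = [branched[k] for k in br_names]
br_order = shortest_path_order(br_points)
max_step = max(math.dist(br_points[a], br_points[b])
               for a, b in zip(br_order, br_order[1:]))
tree = minimum_spanning_tree(br_points)
print(max_step >= 2.0)  # True: the linear order needs a step longer than any tree edge
print([br_names[i] for i in branchpoints(tree, len(br_points))])  # ['mid2']
```

The brute-force path search grows factorially with the number of samples and serves only to make the linear-ordering assumption concrete. Prim's algorithm as written runs in O(n^2) time on the complete graph of n samples.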