2005 PROCEEDINGS
Joint Annual RECOMB 2005 Satellite Workshops on Systems Biology and on Regulatory Genomics, San Diego, CA, USA; December 2-4, 2005, Revised Selected Papers. Lecture Notes in Computer Science, Springer Berlin / Heidelberg, Volume 4023/2006, (Print) 1611-3349 (Online May 16 2007) 0302-9743 . [Springer link]
2006 POSTER ABSTRACTS
P1: Positional Coexpression Gene Clustering in Zebrafish Genome
Wei Wu
Institute of Molecular and Cell Biology
wwu@imcb.nus.edu.sg
Microarray experiments provide deep insight into molecular evolution and how structure and function interrelate in a genome. Our study investigates whether neighboring and clustering genes in the zebrafish genome are co-expressed using the Affymetrix microarray data. A further analysis of the data set relates to the effect of intergenic distance, as we can predict that genes that are closer to each other would have a greater degree of co-expression than those that are more distant in the genome. A significant correlation between distance and co-expression is found, either with or without the inclusion of tandem duplicates. We also study the positional clustering of genes in the zebrafish genome. A significant trend for large clusters is recognized. The correlation of positional clustering of genes and the co-expression level of neighboring genes is also studied. A positive correlation between the significance of positional clustering and the degree of neighboring gene co-expression is found in the genome. Finally, we study the co-expression of genes with their gene ontology information, and find that the molecular function plays an important role in the gene co-expression.
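As an illustration only, the distance versus co-expression comparison described above could be set up as follows. This is a minimal sketch, assuming co-expression is measured by Pearson correlation of expression profiles and that genes are given as hypothetical (start, end, profile) tuples sorted along one chromosome; the abstract does not state the exact measure used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.
    Assumes neither profile is constant (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def neighbor_coexpression(genes):
    """genes: list of (start, end, profile), sorted by start (hypothetical layout).
    Returns (intergenic_distance, correlation) for each adjacent gene pair."""
    pairs = []
    for (s1, e1, p1), (s2, e2, p2) in zip(genes, genes[1:]):
        pairs.append((max(0, s2 - e1), pearson(p1, p2)))
    return pairs
```

The resulting (distance, correlation) pairs can then be tested for a trend, for example with a rank correlation.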
P2: Identification of a metastasis signature by combined transcriptome approach
Shuta Tomida, Kiyoshi Yanagisawa, Kattsumi Koshikawa
Nagoya University
Yasushi Yakabe, Tetsuya Mitsudomi, Hirotaka Osada
Aichi Cancer Center Research Institute
Takahashi Takashi
Nagoya University
s-tomida@med.nagoya-u.ac.jp
Widespread metastasis is the major cause of human lung cancer-related deaths, but much remains to be elucidated about the underlying mechanisms. Our genome-wide comparison of the expression profiles between a highly metastatic lung cancer cell line NCI-H460-LNM35 (LNM35) and its parental clone NCI-H460-N15 (N15) resulted in the identification of a cancer metastasis signature composed of 45 genes, and also provided functional insight through Gene Ontology analysis into how this 45-gene metastasis signature might contribute to the acquisition of metastatic potential. By applying the 45-gene metastasis signature to datasets of human cancer cases, we could show significant associations of this signature with a subset of cases with poor prognosis not only in the two datasets of cancers of the lung but also in that of the breast. These findings indicate that our combined approach of transcriptome analysis is an efficient means to search for genes possessing both clinical usefulness in terms of prognostic prediction in human cancer cases and clear functional relevance for studying cancer biology in relation to metastasis.
P3: Proteomic Trajectory Mapping of Biological Transformation Based on Two-State Model of Biological Transition
Hiroyuki Matsumoto, Hisao Haniu, Nobuaki Takemori, Anil Singh, John Ash, Naoka Komori
University of Oklahoma HSC
hiro-matsumoto@ouhsc.edu
Quantitative molecular parameters along the time axis are a prerequisite for understanding any biological transformation. We propose a two-state model of biological transition, assuming that 1) we can define a biological state by a set of molecular parameters such as proteins and metabolites, and 2) a transition of an initial state into its adjacent state can be described by a two-state model of biological transition. In the transition of state A into state B at the protein level (proteomic trajectory mapping), it is anticipated that some classes of proteins are expressed at a higher level in A and gradually decline as the transition progresses. We call this class of proteins "A-type". B-type proteins are expressed at low or null levels in A, and their expression increases along the transition. The third group of proteins is expressed constitutively throughout the transition (C-type). The fourth group is expressed transiently during the process (T-type). Reliable quantification of proteins at multiple data points is essential for this analysis. We will report a successful case of proteomic trajectory mapping of a postnatal mouse retina.
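The four protein classes described above can be illustrated with a small classifier over a quantified time course. This is only a sketch: the fold-change cutoff `fold` and the rule for detecting a transient peak are assumptions made for illustration, not criteria given in the abstract.

```python
def classify_trajectory(levels, fold=2.0):
    """Label one protein's time course (first point = state A, last = state B).
    `fold` is a hypothetical fold-change cutoff, not a value from the abstract."""
    eps = 1e-9  # avoids division by zero for proteins absent at a time point
    start, end, peak = levels[0] + eps, levels[-1] + eps, max(levels) + eps
    if peak / max(start, end) >= fold:
        return "T"  # transient: peak well above both end states
    if start / end >= fold:
        return "A"  # high in A, declining toward B
    if end / start >= fold:
        return "B"  # low or absent in A, rising toward B
    return "C"      # roughly constant throughout the transition
```

For example, a time course of `[10, 7, 4, 1]` would be labeled A-type and `[1, 6, 7, 1.2]` T-type under these assumed cutoffs.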
P4: Comparison of system response to individual and combined stresses using integrated OMICS approach
Harin H Kanani, Bhaskar Dutta, Maria I Klapa
University of Maryland, College Park
John Quackenbush
Dana-Farber Cancer Institute
kanani@umd.edu
Integrated transcriptomic and metabolomic analysis of a systematically perturbed biological system in a dynamic fashion can provide clues about gene and metabolic regulation, reconstruction of bio-reaction and gene regulation networks and even the function of unknown genes. As a case study, 12-day-old Arabidopsis thaliana liquid cultures were subjected to elevated CO2 levels (1%) and osmotic stress (50 mM NaCl), both individually and simultaneously, continuously for 30 hrs. The metabolomic and transcriptomic profiles were acquired by GC-MS and full-genome cDNA microarrays, respectively. Both transcriptomic and metabolomic data were further analyzed using multivariate statistical analysis. A novel GC-MS metabolomic data correction method and a time-series significance analysis method were developed and used. The comparison of individual and combined stress responses revealed the robustness of the NaCl stress response as compared to that of CO2. The results, analyzed in the context of biochemical pathways, demonstrated non-linearity in the system, as the response to the combined stress differed from the sum of the system responses to the individual stresses. The breadth and depth of the information obtained clearly demonstrate the usefulness of the systems biology approach.
P5: Individual synthetic lethal genetic screens reveal groups of physically connected proteins
Robert Prill and Andre Levchenko
Department of Biomedical Engineering, Johns Hopkins University
rprill@jhu.edu
One way to view a synthetic lethal (SL) genetic screen is to think of the query gene deletion as a preexisting condition, a genetic deficiency. The SL screen reveals a "hit list" of genes that are each independently necessary for life with respect to this preexisting condition. One of the challenges of high-throughput biology is to extract functional modules from structural interactions. Protein-interaction networks are so dense and so unreliable that connectivity alone does not motivate functional modules. By mapping SL hit list genes to their locations in the protein-network we have selected protein sub-networks that are more connected than random sub-networks. We believe that this is indicative of protein complex or pathway membership. We are now working to confirm that SL hit lists are functionally coupled using non-structural information, such as mRNA expression. We have presented a potential strategy for extracting functional modules from protein networks using synthetic lethal genetic screens.
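The comparison against random sub-networks mentioned above can be sketched as a simple permutation test. The code below is a minimal illustration, assuming the protein network is an undirected edge list and that "more connected" means "more edges among the selected proteins"; the function names, the sampling scheme, and the number of trials are assumptions, not the authors' procedure.

```python
import random

def edges_within(nodes, edges):
    """Count network edges whose two endpoints are both in `nodes`."""
    nodes = set(nodes)
    return sum(1 for a, b in edges if a in nodes and b in nodes)

def connectivity_pvalue(hits, edges, trials=1000, seed=0):
    """Empirical p-value that a random protein set of the same size
    is at least as connected as the SL hit list."""
    rng = random.Random(seed)
    all_nodes = sorted({n for e in edges for n in e})
    observed = edges_within(hits, edges)
    as_dense = sum(
        edges_within(rng.sample(all_nodes, len(hits)), edges) >= observed
        for _ in range(trials)
    )
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (as_dense + 1) / (trials + 1)
```

A realistic analysis would probably also control for node degree when drawing random sets, which this sketch does not do.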
P6: Bioinformatic approach to interpret the flexibility of extra-cellular matrixes
Ryuji Kato, Chiaki Kaga, Yasuyuki Tomita, Mina Okochi, Hiroyuki Honda
School of Engineering, Nagoya University
kato-r@nubio.nagoya-u.ac.jp
In tissue engineering and regenerative medicine, one of the hardest tasks is to provide a safe scaffold material that mimics the best tissue engineering environment for regenerating damaged tissues from the patient's own cells. In spite of the wide variety of extra-cellular matrixes (ECMs) existing in our body, little is understood about their complex contribution to cell growth. In current studies, ECM proteins are used without any specificity. On the other hand, it is understood that different matrixes for cell culture produce different patterns of differentiation and morphology. Therefore, as one of the objectives of proteomic research, hidden peptide motifs in ECMs serve as an interesting target for understanding tissue formation. For this objective, we have constructed the Peptide Array-Based Interaction Assay of Solid-Bound Peptides and Anchorage-Dependant Cells (PIASPAC) technology for proteomic study of cell-peptide interaction. We here report the combination of PIASPAC data with a Fuzzy Neural Network to interpret the sensitive and flexible recognition system between cells and ECMs.
P7: Learning to Annotate DNA-binding Proteins
Nitin Bhardwaj and Hui Lu
University of Illinois at Chicago
rlangl1@uic.edu
A protein's function depends in large part on interactions with other biological macromolecules. Given the ever-increasing number of protein structures solved each year, a protocol to annotate proteins by identifying interactions grows more necessary. Likewise, machine learning has grown in popularity, providing a robust method to model a variety of bioinformatics tasks. In this work, we have developed a machine learning protocol to identify proteins that bind to a particular macromolecule and apply it to proteins that bind DNA. In general, there is no theory to help pick the best machine learning algorithm. Thus, we perform a comparison of several classification algorithms known to perform well. We found AdaBoost on decision trees to yield the best accuracy, 88%, significantly outperforming all published works. We also attempted to identify the important attributes that contributed to this success. A graphical model based on boosted decision stumps is applied to study the relevant features. In summary, the current protocol identified physical characteristics important in DNA binding, rather than annotating function through sequence identity alone.
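To make the idea of boosted decision stumps concrete, here is a small self-contained AdaBoost sketch. It is a textbook version written for illustration, not the protocol used in the work above; the feature layout (one row of numeric features per protein, labels +1 for DNA-binding and -1 otherwise) is an assumption.

```python
from math import exp, log

def stump_predict(row, j, t, sign):
    """A decision stump: vote `sign` if feature j exceeds threshold t."""
    return sign if row[j] > t else -sign

def best_stump(X, y, w):
    """Pick the (feature, threshold, sign) stump with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def adaboost(X, y, rounds=10):
    """Train a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * log((1 - err) / err)
        model.append((alpha, j, t, sign))
        # Up-weight misclassified examples, down-weight correct ones.
        w = [wi * exp(-alpha * yi * stump_predict(row, j, t, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    score = sum(a * stump_predict(row, j, t, s) for a, j, t, s in model)
    return 1 if score >= 0 else -1
```

Because each stump uses a single feature, the features chosen in early rounds with large weights give a rough indication of which attributes matter most.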
P8: Gene regulation by cpcA and its role in pathogenicity in filamentous fungi
Betul Soyler and Zumrut Ogel
Middle East Technical University
betul@metu.edu.tr
The previously cloned and characterized cpcA gene of Aspergillus niger is found to display the same functions as GCN4 in Saccharomyces cerevisiae. In S. cerevisiae, Gcn4p activates transcription of various amino acid biosynthetic genes, pathway-specific activators, and aminoacyl-tRNA and purine synthetase genes. This regulatory response is named General Amino Acid Control (GAAC) in S. cerevisiae and cross-pathway control (CPC) in filamentous fungi. This study aims to determine the mode of action of the regulation of A. niger cpcA and the transcriptional activation effects of the same gene product in response to different stress conditions. The stress agents that may interact with the pathways concerned are phenyl lactic acid, rapamycin and 3-aminotriazole. The cpcA knockout strain constructed will be used in a microarray study. The outcome of this microarray study will further be used to target genes that play an important role in the regulation of cpcA, as well as the genes directly or indirectly regulated by cpcA.
P9: Accurate and reproducible label-free quantitative proteomics of brain synaptosomes
Arsalan S. Haqqani, Danica B. Stanimirovic, John Kelly
Institute for Biological Sciences, National Research Council, Canada
arsalan.haqqani@nrc.ca
Quantifying changes in protein abundance between samples is a key requirement for profiling changes in cell state at a molecular level. Mass spectrometry (MS)-based quantitative proteomic methods mainly utilize stable isotope labels, making it easy to identify differentially expressed proteins in two or more samples. As an alternative to isotope labeling, label-free MS-based quantitative methods have been gaining momentum as they alleviate some of the limitations of the labeling methods. We have developed a label-free approach that aligns images of MS spectra to allow relative quantification of peptides in multiple samples. In combination with various known bioinformatics tools, differentially expressed peptides are found and subsequently identified by tandem MS. Using a biological sample from brain synaptosomes, we have shown that the method has high quantitative accuracy (<15% of expected values) and quantitative reproducibility (<10% median coefficient of variation). In comparison with ICAT, a known labeling-based quantitative proteomic method, the label-free approach provided more comprehensive coverage of the proteome and addressed some of the limitations of ICAT, such as detection of cysteine-free proteins.
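The reproducibility figure quoted above is a median coefficient of variation across replicate measurements. A minimal sketch of that statistic is given below, assuming a hypothetical mapping from peptide identifier to its intensities in replicate runs and using the sample standard deviation; the abstract does not specify these details.

```python
from statistics import mean, median, stdev

def median_cv(intensities_by_peptide):
    """Median over peptides of (sample standard deviation / mean).
    Each peptide needs at least two replicate intensities and a non-zero mean."""
    cvs = [stdev(v) / mean(v) for v in intensities_by_peptide.values()]
    return median(cvs)
```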
P10: Correlation between gene expression profiles and protein-protein interactions within and across genomes
Nitin Bhardwaj and Hui Lu
University of Illinois at Chicago
nbhard2@uic.edu
Reliable predictions of protein-protein interactions from gene expression measurements would provide another route to function annotation. We investigate the genome-scale relationship of protein-protein interactions with gene expression using four evolutionarily diverse species. To strengthen the expression correlation between interacting pairs, we develop a protocol to integrate ortholog information into the analysis. In all four genomes, the likelihood of predicting protein interactions from expression increases multi-fold using our protocol. This strengthening of the correlation between interaction and expression data by adding evolutionary information suggests that co-expression among interacting protein pairs is more conserved than that among random ones. We extend this analysis to multi-protein interactions in interaction motifs of increasing complexity. We provide the first global evidence of co-expression among the constituents of these motifs and show that the degree of co-expression correlates strongly with the complexity of the motif. We further reinforce such co-expression of component proteins by integrating conservation data. These results suggest that co-expression levels of interaction motifs may co-vary to minimize deleterious effects of unbalanced stoichiometric concentrations of constituent proteins.
P11: Structural Bioinformatics Prediction and Modeling of Membrane-Binding Peripheral Proteins
Hui Lu
University of Illinois at Chicago
huilu@uic.edu
Membrane-binding peripheral proteins play important roles in many biological processes, including cell signaling and membrane trafficking, and bind the membrane mostly in a reversible manner. Since they do not have canonical transmembrane segments, it is difficult to identify them from their amino acid sequences. As a first step toward genome-scale identification of membrane-binding peripheral proteins, we have built a kernel-based machine learning protocol. Key features of known membrane-binding proteins, including electrostatic properties and amino acid composition, are calculated. The machine learning prediction accuracy is 90% on the test set. The protocol is then applied to the prediction of membrane binding properties of four C2 domains from novel protein kinases C sharing more than 50% identity. Only one of them was predicted to bind the membrane, which was verified experimentally with surface plasmon resonance analysis. These results suggest that our protocol can be used for predicting membrane-binding properties of a wide variety of modular domains. Furthermore, a purely sequence-based feature set is developed and applied to genome-scale identification of peripheral proteins.
P12: Insights into transcriptional and translational regulation by absolute protein expression profiling
Peng Lu, Christine Vogel, Rong Wang, Xin Yao, Edward M. Marcotte
Institute for Cellular and Molecular Biology; University of Texas at Austin
cvogel@mail.utexas.edu
We developed a new simple and robust method, called Absolute Protein Expression profiling (APEX), which generates absolute protein abundances for measurements from large-scale mass spectrometry experiments. APEX relies upon correcting each protein's mass spectrometry sampling depth (observed peptide count) by learned probabilities for identifying the peptides. APEX abundances agree with measurements from controls, Western blotting, flow cytometry, and 2D gels, and with known correlations with mRNA abundances and codon bias. We use our method to characterize protein expression in E. coli, yeast and mouse, and quantify expression levels across ~3-4 orders of magnitude and for concentrations down to <500 hundred molecules/cell. We compare APEX-based protein abundances to other data of transcriptional and translational activity, and find that up to >70% of protein expression levels are based on mRNA expression. Both eukaryotic and prokaryotic proteins are set per mRNA molecules independently of overall protein concentration. In addition, APEX-based protein abundances can be applied to calculations of other protein characteristics such as degradation rates, and we discuss some examples.
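The correction step described above can be illustrated in simplified form: each protein's observed peptide count is divided by the count expected from the learned peptide detection probabilities, and the ratios are rescaled to an assumed total number of protein molecules per cell. This sketch is an assumption-laden simplification (the names `expected` and `total_molecules` are hypothetical, and any further confidence weighting used by the actual method is omitted).

```python
def apex_like_abundances(observed, expected, total_molecules):
    """observed: protein -> observed peptide count.
    expected: protein -> expected count from learned detection probabilities.
    Returns protein -> estimated molecules per cell (simplified illustration)."""
    ratios = {p: observed[p] / expected[p] for p in observed}
    norm = sum(ratios.values())
    return {p: total_molecules * r / norm for p, r in ratios.items()}
```

The division by the expected count is what compensates for proteins whose peptides are intrinsically easier or harder to detect.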
P13: Reverse Engineering via Ranking
Jonathan R. Landers, Chris H. Wiggins, Christina Leslie, Anshul Kundaje
Columbia University
jrl2121@cs.columbia.edu
For nearly a decade, one of the central problems in molecular biology has been learning the structure and control of transcriptional regulatory networks from the copious but noisy data provided by high-throughput technologies such as the expression microarray. We present a large-margin approach to "reverse engineering" a predictive model of these regulatory networks which does not rely on discretization of the expression of the target genes to be predicted. The resulting approach, based on ranking, improves on approaches used with success based on classification. We show that we are able to reveal interactions between transcription factors and regulatory sequence elements ('motifs') which may be validated both statistically (by plotting theoretical prediction vs. unseen expression data) and biologically (by confirming interactions known in the literature). We show that exploiting real-valued data allows us to predict accurately on held-out continuous expression data. Moreover, we show that in discretizing the data we lose a significant amount of the information that is otherwise captured by retaining the real-valued expression in the ranking setting.
P14: Extensions of IsoformResolver protein profiling software to analysis of novel splice variants
K. Meyer-Arendt, M. Hamady, R. Knight, K.A. Resing, N.G. Ahn HHMI
University of Colorado
karen.meyer-arendt@colorado.edu
Advances have been made in peptide identification from high throughput tandem mass spectrometric proteomics, but improved methods are needed to identify peptides not found in protein databases and to infer the proteins from which these peptides derive. Our IsoformResolver software uses a peptide grouping approach to identify a minimal set of genes or proteins for a set of validated peptides. When applied at the protein level this approach results in a set of protein isoforms organized by protein families. Applied at the gene level, this approach facilitates delineation of alternatively spliced proteins. To identify novel peptides, we use a six frame translation against UniGene entries to create a list of peptides including those containing previously unidentified start and stop sites and splice junctions. We search mass spectrometric data against the peptides using database search programs, and apply high discrimination methods to validate the search results. Validated peptides are presented in a gene-centric manner with reference to known proteins. In this poster we outline all steps of this methodology and present preliminary results.
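Finding a minimal set of proteins that explains a set of validated peptides is a set-cover problem. The sketch below uses the standard greedy heuristic purely as an illustration; it is not the IsoformResolver algorithm, and the input mapping from peptide to candidate proteins is a hypothetical data layout.

```python
def minimal_protein_set(peptide_to_proteins):
    """Greedy set cover: repeatedly pick the protein that explains the most
    still-unexplained peptides. Ties are broken alphabetically."""
    protein_to_peptides = {}
    for pep, prots in peptide_to_proteins.items():
        for prot in prots:
            protein_to_peptides.setdefault(prot, set()).add(pep)
    # Peptides with no candidate protein cannot be covered and are skipped.
    uncovered = {pep for pep, prots in peptide_to_proteins.items() if prots}
    chosen = []
    while uncovered:
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & uncovered))
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen
```

The greedy result is not guaranteed to be the smallest possible set, but it is a common and fast approximation.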
P15: Regulon Structure of Arabidopsis
Wieslawa Mentzen, Nick Ransom, Basil J. Nikolau, Eve Syrkin Wurtele
Iowa State University
wiesia@iastate.edu
We apply combined bioinformatic approaches using genomic and transcriptomic data to investigate the transcriptional networks of three core metabolic processes in the context of the systems biology of the model plant Arabidopsis thaliana. As revealed by meta-analysis of a wide array of Arabidopsis transcriptomic data, these pathways (fatty acid biosynthesis, starch metabolism and leucine catabolism) are transcriptionally regulated, and the regulation extends not only across all pathway reactions but also to some substrate- and cofactor-producing reactions, thus defining a major transcriptionally co-regulated module. We extend the meta-analysis of the transcriptome to find groups of coexpressed genes (also called modules, or regulons) in the Arabidopsis genome. Major functionally coherent gene groups were identified. These comprise development, information processing, defense, and metabolism, as well as tissue- and organelle-specific processes.
P16: pFind 2.0: A Software Tool Suite for Peptide and Protein Identification via Tandem Mass Spectrometry
Dequan Li, Yan Fu, Haipeng Wang, Jingfen Zhang, Ruixiang Sun, Simin He, Leheng Wang
Institute of Computing Technology, Chinese Academy of Sciences
lhwang@jdl.ac.cn
In this paper we extend our earlier work and describe the latest version of the pFind system, pFind 2.0. In this version, we have designed and improved the pre-processing algorithms, the peptide-scoring algorithm, the toolbox for indexing protein databases for high-throughput proteomics, and an effective validation algorithm based on support vector machines. In addition, the pFind 2.0 system is implemented with an architecture designed for large-scale parallel and distributed searching. This design provides fault tolerance when running on inexpensive commodity clusters. As a result, the new pFind system obtains higher identification sensitivity, accuracy and speed than the previous version of pFind and other popular systems.
P17: PathwayOracle: Hypothesis Generation and Validation Tools for Signaling
Derek Ruths and Luay Nakhleh
Rice University
Jen-Te Tseng and Prahlad Ram
M. D. Anderson Cancer Center
druths@rice.edu
Altered signaling networks have been implicated as causes of many types of cancer. As a result, obtaining more complete knowledge of these networks is essential to the advancement of cancer diagnostic and therapeutic techniques. However, the task of discovering the structure and oncogenic properties of signaling networks is complicated by their inherent complexity, the vast and disparate data available about them, and the time and labor-intensive experimental methods required to identify their constituent transduction paths. In this poster, we present PathwayOracle, a suite of computational tools that generate and test experimental hypotheses relating to signaling networks based on existing experimental inhibition results, protein-protein interaction databases, non-parametric models of signaling networks, and novel algorithms that analyze network connectivity. These hypotheses can help biologists use all available knowledge and data to identify and design experiments that will yield key insights into previously unknown structure of a signaling network. As a result, the suite provides the dual benefit of helping the biologist both avoid experiments which have a low knowledge payoff and focus on experiments whose results will refine or add to an existing model of a signaling network. Presently, PathwayOracle includes four tools. The Downstream tool identifies compounds and reactions that lie downstream of a set of compounds in a network. The Knockout tool identifies the compounds and reactions that will be knocked down as a result of inhibiting a set of compounds. The Minimum Knockout tool computes the smallest set of compounds that must be inhibited in order to knockout a set of compounds. The Pathway Prediction tool constructs a set of biologically-likely hypothetical pathways that explain an experimental result which disagrees with an existing model of the signaling network. 
In addition to these features, PathwayOracle is under active development with ongoing research into improvements to existing functionality and new tools.
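As a rough illustration of what a downstream query involves, the sketch below treats the signaling network as a directed graph and collects everything reachable from a set of starting compounds by breadth-first search. The adjacency-dictionary representation and the function name are assumptions for illustration; PathwayOracle's actual network model and algorithms are not described in enough detail here to reproduce.

```python
from collections import deque

def downstream(network, sources):
    """network: node -> list of nodes it directly activates or produces.
    Returns every node reachable from `sources`, excluding the sources."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in network.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - set(sources)
```

With a toy network such as `{"EGFR": ["RAS"], "RAS": ["RAF"], "RAF": ["MEK"], "MEK": ["ERK"]}`, querying `["RAS"]` yields RAF, MEK and ERK.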
P18: The identification of known proteins from the parallel sequence assembly
Abhishek Pratap and Prateek Singh
Vellore Institute of Technology
abhishek.vit@gmail.com
Sequence data from automated pyrosequencing of a prokaryote are taken as the input for genome assembly. Our aim is to assemble these sequences into contigs and supercontigs, which can then be used for gene mapping and protein identification in that specific prokaryote. The sequences were cleaned and tagged for any repeats to store a final set of unique data in a database. First the k-tuple was removed and then all the strings which were substrings were tagged. The final non-redundant set of sequence data was then analyzed at the head/tail of the sequences to calculate the pairwise global alignment scores for a given user-defined window reading size. Each sequence was analyzed head to tail and vice versa using the Needleman-Wunsch algorithm. Using a graph traversal technique, for each sequence those sub-nodes were identified where the alignment score was greater than the cutoff value. This process is taken as an n,m-step recursive look-ahead strategy. The node having the maximum number of sub-nodes was then taken as the base for contig assembly. The same process is repeated when building supercontigs from contigs. In the next step we fragmented the input set of sequences and ran the same job in parallel on a cluster, with a thirty percent reduction in running time for contig building from the fragmented data source. The assembled contigs and supercontigs were then searched on the NCBI BLAST server using blastx, with a trial-and-error approach using different genetic codes. We were able to identify the presence of known proteins through gene identification.
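The head/tail comparison described above can be sketched with a standard Needleman-Wunsch global alignment score applied to a fixed window at the end of one read and the start of another. The scoring values (match, mismatch, gap) and the helper `overlap_score` are illustrative assumptions; the abstract does not give the parameters used.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of strings a and b with a linear gap penalty.
    Only the previous row of the dynamic-programming table is kept."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def overlap_score(left, right, window):
    """Score the tail of `left` against the head of `right` over `window` bases."""
    return needleman_wunsch(left[-window:], right[:window])
```

Read pairs whose `overlap_score` exceeds a chosen cutoff would become edges in the overlap graph that is then traversed to build contigs.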
P19: Assessing Reproducibility of Mass Spectrometry Experiments
Amol Prakash, Jennifer Sutton, Tori Richmond, Leo Bonilla
Thermo Electron
Brian Piening, Jeff Whiteaker, Heidi Zhang, Amanda Paulovich
Fred Hutchinson Cancer Research Institute
Jullian Watts and Dan Martin
Institute for Systems Biology
Ruedi Aebersold
Institute for Molecular Systems Biology, ETH
Benno Schwikowski
Institut Pasteur
amol.prakash@thermo.com
Many studies over the last decade have shown that mass spectrometry holds great promise for the early diagnosis of many diseases. One of the biggest challenges that needs to be overcome to achieve this goal is to make mass spectrometry experiments reproducible. While sample processing steps, experimental protocol and the instruments' settings have been shown to affect the experimental output, little progress has been made in developing methods to quantify reproducibility. Assessing reproducibility is critical to understanding the effects of the various factors, eliminating them, and thus making mass spectrometry data comparable over time and across different labs and instruments. Using a novel definition of reproducibility, we present a tool that efficiently measures experimental reproducibility for high-throughput mass spectrometry data and can form the basis of quality control in every lab performing mass spectrometry experiments. Using multiple high-throughput data sets from many labs involving varying instruments (TOF, LTQ, FT), experimental protocols and sample complexities (from simple peptide mixtures to human plasma), we show the wide applicability of this tool and the challenges that need to be solved before mass spectrometry becomes a powerful diagnostic technology. One of the interesting data sets is an FT-MS experiment on human plasma that involves immunoaffinity depletion and high-resolution LC-MS/MS.
P20: Computational search for transcription-factor binding sites: Statistical mechanics approach
Marko Djordjevic
Mathematical Biosciences Institute, The Ohio State University
mdjordjevic@math.ohio-state.edu
Identification of transcription factor binding sites within the regulatory segments of genomic DNA is an important step towards understanding gene regulatory networks. However, the majority of weight matrices used to computationally predict TF binding sites lead to large numbers of false positives, which is typically attributed to the unsuitable datasets from which they are constructed. As a desirable alternative to a dataset assembled from biological databases, binding sequences inferred from an experiment performed under controlled conditions (a biophysical experiment) can be used. In particular, SELEX (Systematic Evolution of Ligands by Exponential Enrichment) is an experimental procedure that allows extracting, from an initially random pool of DNA, those oligomers with high affinity for a given DNA-binding protein. What is a suitable experimental and computational procedure to infer parameters of transcription factor-DNA interaction from SELEX experiments? To answer this, we use a biophysical model of transcription factor-DNA interactions to quantitatively model SELEX. We show that a standard procedure is unsuitable for obtaining accurate interaction parameters. However, we show that a modified experiment in which the chemical potential is fixed through different rounds of the experiment allows robust generation of an appropriate data set. Further, based on our quantitative model, we propose a novel bioinformatics method of data analysis for such a modified experiment, and apply it to extract the interaction parameters for a mammalian transcription factor, CTF/NFI. Our algorithm results in a significantly improved false positive/false negative trade-off, as compared to both the standard information theory based method and a widely used empirically formulated procedure. Finally, I will briefly discuss our work in progress, which uses the inferred interaction parameters to address what are principal limitations to accurate computational prediction of TF.
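A common form of such a biophysical model, shown here only as an assumed illustration, scores a site by summing independent per-position binding energies and converts the energy into a binding probability through the chemical potential. The additive energy matrix, the units of kT, and the function names are assumptions; the abstract does not spell out the model's exact form.

```python
from math import exp

def binding_energy(site, energy_matrix):
    """Sum per-position energy contributions (in units of kT).
    energy_matrix: one dict per position mapping base -> energy."""
    return sum(col[base] for col, base in zip(energy_matrix, site))

def binding_probability(site, energy_matrix, mu):
    """Probability that the site is bound at chemical potential mu (in kT).
    Sites with energy well below mu are almost always bound."""
    return 1.0 / (1.0 + exp(binding_energy(site, energy_matrix) - mu))
```

In this picture, holding the chemical potential fixed across SELEX rounds keeps the selection threshold constant, which is what makes the selected sequences informative about the energy parameters.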
P21: Co-evolutionary Analysis of Domains in Interacting Proteins Reveals Insights into Domain-Domain Interactions Mediating Protein-Protein Interactions
Raja Jothi and Teresa M. Przytycka
NCBI/NLM/NIH
Praveen F. Cherukuri
Boston University & NCBI/NLM/NIH
Asba Tasneem
Booz Allen Hamilton Inc
jothi@ncbi.nlm.nih.gov
Recent advances in functional genomics have helped generate large-scale high-throughput protein interaction data. Such networks, though extremely valuable towards molecular level understanding of cells, do not provide any direct information about the regions (domains) in the proteins that mediate the interaction. We performed co-evolutionary analysis of domains in interacting proteins in order to understand the degree of co-evolution of interacting and non-interacting domains. Using a combination of sequence and structural analysis, we analyzed protein-protein interactions in F1-ATPase, Sec23p/Sec24p, DNA-directed RNA polymerase and nuclear pore complexes, and found that interacting domain pair(s) for a given interaction exhibits higher level of co-evolution than the non-interacting domain pairs. Motivated by this finding, we developed a computational method called RCDP to predict large-scale domain-domain interactions. Given a protein-protein interaction, RCDP predicts the domain pair(s) that is most likely to mediate the protein interaction. We applied this method on the yeast interactome to predict domain-domain interactions, and used known domain-domain interactions found in PDB crystal structures to validate our predictions. Our results show that the prediction accuracy of the proposed method is statistically significant. Comparison of our prediction results with those from two other methods reveals that only a fraction of predictions are shared by all the three methods, indicating that the proposed method can detect known interactions missed by other methods. We believe that the proposed method can be used with other methods to help identify previously unrecognized domain-domain interactions on a genome scale, and could potentially help reduce the search space for identifying interaction sites.
P22: Reconstructed dynamic regulatory maps reveal factors and mechanisms controlling stress response
Jason Ernst and Ziv Bar-Joseph
Carnegie Mellon University
Oded Vainas and Itamar Simon
Hebrew University Medical School
Christopher T. Harbison
Whitehead Institute for Biomedical Research
jernst@cs.cmu.edu
Even simple organisms have the ability to respond to internal and external stimuli. This response is carried out by a dynamic network of protein-DNA interactions that allows the specific regulation of genes needed for the response. We have developed a novel computational method that uses an Input-Output Hidden Markov Model to model these regulatory networks while taking into account their dynamic nature. Our method works by identifying bifurcation points, places in the time series where the expression of a subset of genes diverges from the rest of the genes. These points are annotated with the transcription factors regulating these transitions, resulting in a unified temporal map. Applying our method to study the yeast response to stress, we derive dynamic models that are able to recover many of the known aspects of these responses. Predictions made by our method have been experimentally validated, leading to new roles for Ino4 and Gcn4 in controlling the yeast stress response. The temporal cascade of factors reveals common pathways and highlights differences between master and secondary factors in the utilization of network motifs and in condition-specific regulation.
P23: Genome-Scale Reconstruction Of The Transcriptional And Translational Machinery Of E. coli: The "Dogma" Matrix
Ines Thiele, Neema Jamshidi, Matthew Yeung, Bernhard Ø. Palsson
Systems Biology Research Group, Dept. of Bioengineering, University of California, San Diego
ithiele@ucsd.edu
The reconstruction of metabolic networks has become an established procedure (Reed et al., 2006). To date, numerous comprehensive metabolic reconstructions are available for organisms in all domains of life. Here, we present the first comprehensive, manually curated, genome-scale reconstruction of the transcriptional and translational machinery of E. coli. The reconstruction is based on information from numerous primary and review publications, the recently published re-annotation of the E. coli K12 MG1655 genome (Riley et al., 2006) and three databases, namely EcoCyc, CyberCell, and tRNA DB. The dogma matrix consists of the following pathways: transcription, translation, mRNA degradation, stable RNA processing (5' and 3' trimming), tRNA and rRNA modification, tRNA charging, protein folding, protein complex formation, iron-sulfur-cluster formation, as well as metallo-ion and cofactor incorporation into proteins. The resulting network contains approximately 12,311 components, 14,772 reactions, 181 transcription units, and a total of 337 genes. The stoichiometric reactions are mass and charge balanced. In addition, the formulae, charge and molecular weight of all RNA, proteins, and complexes are calculated. This work represents the first comprehensive, stoichiometric and manually curated formulation of the central dogma of molecular biology.
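What "mass and charge balanced" means for a stoichiometric reaction can be illustrated with a minimal Python sketch. The metabolite table, the ATP hydrolysis example (with commonly used protonation states), and the helper name is_balanced are illustrative assumptions, not entries or code from the reconstruction itself.

```python
from collections import Counter

# Hypothetical metabolite table: elemental composition and net charge.
METABOLITES = {
    "atp": ({"C": 10, "H": 12, "N": 5, "O": 13, "P": 3}, -4),
    "h2o": ({"H": 2, "O": 1}, 0),
    "adp": ({"C": 10, "H": 12, "N": 5, "O": 10, "P": 2}, -3),
    "pi": ({"H": 1, "O": 4, "P": 1}, -2),
    "h": ({"H": 1}, 1),
}


def is_balanced(stoichiometry):
    """Return True if every element and the net charge sum to zero.

    stoichiometry maps a metabolite id to its coefficient
    (negative for consumed species, positive for produced species).
    """
    elements = Counter()
    charge = 0
    for metabolite, coefficient in stoichiometry.items():
        formula, metabolite_charge = METABOLITES[metabolite]
        for element, count in formula.items():
            elements[element] += coefficient * count
        charge += coefficient * metabolite_charge
    return charge == 0 and all(total == 0 for total in elements.values())


# ATP + H2O -> ADP + Pi + H+
atp_hydrolysis = {"atp": -1, "h2o": -1, "adp": 1, "pi": 1, "h": 1}
# Same reaction with the proton omitted: mass and charge no longer balance.
missing_proton = {"atp": -1, "h2o": -1, "adp": 1, "pi": 1}
```

Running such a check over every column of a stoichiometric matrix is one simple way to catch curation errors, although the abstract does not say how the balancing was actually verified.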
P24: Studies on Analysis of LC-MS Proteomics Data
Peicheng Du
IBM/Albert Einstein College of Medicine
Rajagopolan Sudha, Michael B. Prystowsky, Ruth Hogue Angeletti
Albert Einstein College of Medicine
peicheng2005@gmail.com
Liquid Chromatography-Mass Spectrometry (LC-MS) has been an important technology in protein profiling. Computational methods for LC-MS data analysis are necessary because of the huge amount of data. The challenging aspects of LC-MS data analysis include: 1) peak-picking/peptide-picking among noisy spectra, 2) retention time alignment to correct for retention time drift from run to run, and 3) matching peaks/peptides based on mass and retention time to construct peak/peptide arrays. We present novel methods to address each of these problems. 1) For peak-picking, we have developed methods to find elution peaks and remove solvent peaks, and these methods compare favorably to other methods, including vendor software; we have also developed an algorithm for deisotoping and charge assignment for complex spectra, which is based on statistical model selection and a prior peptide mass distribution. 2) For retention time alignment, we have developed a curve fitting method and compared it to a traditional time warping method, i.e. Correlation Optimized Warping. 3) For peak/peptide matching, the key is to reduce ambiguity in the matching process. We have developed a method to match peptides from multiple runs that aims to reduce ambiguity in the matching and specifies how to handle the ambiguous matches that remain. We have applied these methods to the analysis of cell line data for biomarker discovery.
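The idea of matching peaks on mass and retention time, and of setting ambiguous matches aside, can be sketched in Python as follows. This is a generic tolerance-window matcher written for illustration; the function name match_peaks, the tolerances, and the toy peak lists are assumptions and do not reproduce the authors' method.

```python
def match_peaks(run_a, run_b, mass_tol, rt_tol):
    """Match peaks between two runs by mass and retention time.

    Each run is a list of (mass, retention_time) tuples. A peak in run_a
    is matched when exactly one peak in run_b lies within both tolerances;
    peaks with several candidates are reported separately as ambiguous.
    """
    matches = {}
    ambiguous = {}
    for i, (mass_a, rt_a) in enumerate(run_a):
        candidates = [
            j
            for j, (mass_b, rt_b) in enumerate(run_b)
            if abs(mass_a - mass_b) <= mass_tol and abs(rt_a - rt_b) <= rt_tol
        ]
        if len(candidates) == 1:
            matches[i] = candidates[0]
        elif candidates:
            ambiguous[i] = candidates
    return matches, ambiguous


# Toy peak lists (made-up values): mass in Da, retention time in minutes.
run_a = [(500.25, 10.0), (600.30, 20.0)]
run_b = [(500.26, 10.2), (600.30, 19.8), (600.31, 20.3)]
matches, ambiguous = match_peaks(run_a, run_b, mass_tol=0.02, rt_tol=0.5)
```

Tightening the retention-time tolerance after alignment is one obvious way to shrink the ambiguous set, which is why alignment (step 2) precedes matching (step 3).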
P25: The integrated pathway and microarray resources for deciphering the biological hotspots
Kun-Nan Tsai, Tsung-Yeh Tsai, Chung-Ming Chen, Err-Cheng Chen
Institute of Biomedical Engineering, National Taiwan University
d91548013@ntu.edu.tw
Quickly locating the possible biological hotspots on a pathway, so that the significance, distribution, and clustering of the genes can be investigated further, yields important information about gene expression and benefits follow-up studies seeking biological insight. However, a set of readily available analytical methods is still lacking, so developing such an annotation method is essential and even urgent. Here, we present a knowledge-based approach to biological hotspots that includes the following steps: (1) profiling the significant gene-gene interactions through a pathway database, (2) using Fisher's exact test to analyze pathway information and rank the significant pathways, (3) annotating biological hotspots of genes in the pathways by density, quality, and cluster, and (4) investigating related genes and possibly associated diseases by referring to OMIM information. Our results illustrate the influence of the secreted protein from Mycobacterium tuberculosis on gene expression in human lung fibroblasts, at least its involvement in cell proliferation, cell motility, and cell survival. Furthermore, we show that the focal adhesion pathway contains more biological hotspots than any other pathway at 40h. The biological information confirmed against OMIM will contribute to research on how the secreted protein influences gene expression in human lung fibroblasts. Our experimental results also verify that the secreted protein causes extensive cell death in human lung fibroblasts at 48h. This evidence supports the practicability of our approach. We hope this information will contribute to further study of the prevention and cure of this disease.
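Step (2), ranking pathways with Fisher's exact test, can be sketched in Python using only the standard library. The one-sided (enrichment) form of the test, the function names, and the toy gene sets are assumptions made for illustration; the abstract does not state which variant of the test or which background set was used.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for pathway enrichment.

    Probability of observing at least k pathway genes among n selected
    genes, when K of the N background genes belong to the pathway.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def rank_pathways(selected, pathways, background_size):
    """Return (pathway, p-value) pairs sorted from most to least significant."""
    scores = {}
    for name, genes in pathways.items():
        overlap = len(selected & genes)
        scores[name] = fisher_enrichment_p(
            overlap, len(selected), len(genes), background_size
        )
    return sorted(scores.items(), key=lambda item: item[1])


# Toy example with made-up gene identifiers and a background of 10 genes.
toy_pathways = {"pathway_a": {"g1", "g2", "g3"}, "pathway_b": {"g7", "g8", "g9"}}
toy_selected = {"g1", "g2", "g3"}
ranking = rank_pathways(toy_selected, toy_pathways, background_size=10)
```

In practice the p-values would also need a multiple-testing correction before pathways are declared significant, which this sketch omits.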
P26: Connecting extracellular metabolomic profiles to intracellular metabolic states in yeast
Monica L. Mo, Markus Herrgard, Gregory Hannum, Bernhard Palsson
Dept. of Bioengineering, University of California, San Diego
mlmo@ucsd.edu
Metabolomics has emerged as a potentially powerful tool in the quantitative identification of disease- and pharmacologically-induced biological states. Extracellular metabolome data, in particular, can provide an insightful view of intracellular physiological states in a noninvasive manner. We used a genome-scale metabolic network model of S. cerevisiae, iMM910, to investigate how changes in the extracellular metabolome can be used to study systemic changes in intracellular metabolic states. The iMM910 metabolic network was reconstructed based on an existing genome-scale network, iND750, and includes 162 new genes and 301 new reactions. The network model was first validated by comparing 2,922 in silico single-gene knockout strain growth phenotype predictions to published experimental data. Extracellular metabolome data measured in response to environmental and genetic perturbations were integrated into the iMM910 model in the form of overflow secretion constraints. A random flux sampling approach was then used to characterize feasible genome-scale flux distributions allowed by these constraints. The predicted intracellular flux changes in response to perturbations were qualitatively consistent with experimental data on pathway activity changes. These results indicate that integrating extracellular metabolomic data into the constraint-based framework allows changes in intracellular metabolic states to be inferred. The methods developed in this work can be applied to the human metabolic network recently constructed in our laboratory to analyze biofluid metabolome variations related to disease and pharmacological effects.
P27: Comparative Study of Molecular Interaction Databases
Iliana Avila-Campillo, Aaron Chang, Carol Rohl
Informatics, Merck
iliana_avila-campillo@merck.com
There are approximately 222 public and proprietary databases of molecular interactions (Bader et al., 2006). These databases differ in the methods they use to obtain, curate, organize, and/or distribute their content. Systems Biology researchers typically use a "blender" strategy to build biological networks from a small subset of these databases, often selecting data sources on the basis of popularity, ease of use, and availability. While many efforts are under way to centralize and standardize the information in these databases, descriptive statistics offering insight into their similarities and differences can guide researchers in selecting data sources to build networks. Towards this end, we have compiled a preliminary set of individual and comparative statistics for mammalian interactions in a variety of public and proprietary databases including BIND, DIP, HPRD, IntAct, MINT, and the BioGrid, among others. The comparative statistics we collected include database content overlap based on different variables like experimental method, PubMed ID, and interaction type. Individual database statistics include percentage of interactions supported by a given number of PubMed IDs or experimental methods. We also considered the application of these statistics to the estimation of confidence values for individual interactions, and their effect in the layout and visualization of networks from different data sources.
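The two kinds of statistics described above, content overlap between databases and the share of interactions supported by a given number of PubMed IDs, can be sketched in Python as follows. The Jaccard index as the overlap measure, the function names, and the toy records are assumptions for illustration; they are not the statistics or data actually compiled by the authors.

```python
from collections import Counter
from itertools import combinations


def pairwise_overlap(databases):
    """Jaccard overlap for every pair of databases.

    databases maps a database name to a set of hashable interaction keys,
    for example sorted protein-identifier pairs.
    """
    result = {}
    for name_a, name_b in combinations(sorted(databases), 2):
        shared = databases[name_a] & databases[name_b]
        union = databases[name_a] | databases[name_b]
        result[(name_a, name_b)] = len(shared) / len(union) if union else 0.0
    return result


def support_distribution(interaction_pmids):
    """Fraction of interactions supported by exactly n PubMed IDs."""
    counts = Counter(len(pmids) for pmids in interaction_pmids.values())
    total = sum(counts.values())
    return {n: count / total for n, count in counts.items()}


# Toy records (made-up identifiers, not taken from any real database).
toy_databases = {
    "db_a": {("P1", "P2"), ("P2", "P3")},
    "db_b": {("P1", "P2"), ("P3", "P4")},
}
toy_support = {("P1", "P2"): {"pmid_1", "pmid_2"}, ("P2", "P3"): {"pmid_3"}}

overlap = pairwise_overlap(toy_databases)
distribution = support_distribution(toy_support)
```

The same functions apply unchanged when the interaction key is extended with the experimental method or interaction type, which gives the per-variable overlaps mentioned in the abstract.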
P28: Protein prioritization in condition-specific biological networks using global graph architecture
Zoltan Dzeso, Tatiana Nikolskaya, Andrej Bugrim
GeneGo, Inc
andrej@genego.com
We present a novel algorithm designed to evaluate the importance of individual nodes for providing connectivity in condition-specific biological networks. The algorithm starts with a condition-specific set of genes or proteins (e.g. differentially expressed genes). First, we construct a shortest-path network connecting these genes using the global database of interactions available in MetaCore™. Second, we evaluate the number of paths traversing each node in the shortest-path network in relation to the total number of paths going through this node in the global network. Using these numbers as well as the relative size of the initial gene set, we calculate p-values for each node in the shortest-path network, showing whether or not it is statistically significant in providing connectivity. We test the algorithm's ability to assign high significance to biologically validated targets by using a set of gene expression data from psoriatic patients. We show that it is able to uncover many genes that do not show up in a gene expression screen but are nevertheless highly related to disease pathways and/or validated by other molecular methods. Thus, the approach can be applied to uncovering new, higher-quality drug targets, validating existing targets, and cross-validating genomic, proteomic, or other types of data.
P29: Study of the role of Snf1 kinase in yeast Saccharomyces cerevisiae - a system biology approach
Renata Usaite
Center for Microbial Biotechnology, Technical University of Denmark
John Yates III
The Scripps Research Institute
Jens Nielsen and Lisbeth Olsson
Technical University of Denmark
ru@biocentrum.dtu.dk
In the yeast Saccharomyces cerevisiae, Snf1 kinase is a well-described key component of the glucose repression regulatory cascade. It is highly conserved among eukaryotes, and its mammalian homolog AMPK is responsible for energy homeostasis in cells, organs and whole bodies. Failure in the AMPK regulatory cascade leads to metabolic disorders, such as obesity or type II diabetes. Thus, undiscovered Snf1 kinase functions remain of high interest. Snf1 is involved in transcription and translation regulation. Through phosphorylation of transcription factors, Snf1 initiates expression of metabolic genes involved in catabolism of alternative carbon sources. It has been shown that it directly phosphorylates and inactivates metabolic targets (e.g. ACC1), and phosphorylates histones and components of other signal transduction pathways (e.g. HOG). A Systems Biology approach was used to uncover new Snf1 kinase functions: its new targets and the mechanisms through which Snf1 performs its regulation. To do so, a reference strain and delta snf1, delta snf4 and delta snf1 delta snf4 mutants were grown in steady-state chemostat cultivations, and samples for global transcriptome, proteome and metabolome analysis were collected. Data generated from the Affymetrix system (6000 gene expression profiles), 2D-uLC-MS/MS (1500 relative protein expression differences) and GC-MS (50 quantified metabolites) are being integrated in order to derive a global model for the role of Snf1. The initial transcriptome-proteome data analysis indicated the presence of both transcriptional and post-transcriptional regulation in carbon metabolism. Tor and Gcn4 target genes were found to be affected in Snf1 disruption strains. Further data analysis, integration of metabolome data, and additional evaluation of the phenotypes of the delta snf1, delta snf4 and delta snf1 delta snf4 strains are needed in order to confirm the newly proposed regulatory systems in the yeast Saccharomyces cerevisiae.
Identified novel functions of Snf1 kinase are expected to contribute to a better understanding of AMPK regulatory cascade in mammalian cells.
P30: Comparing Protein Interaction Networks via a Graph Match-and-Split Algorithm
Manikandan Narayanan and Richard M. Karp
University of California at Berkeley
nmani@cs.berkeley.edu
We present an algorithm that recursively matches and splits the nodes of two graphs to list similar sub-graphs between them. We use this graph-matching algorithm to develop a network comparison method, which detects functionally similar (conserved) protein modules between the protein interaction networks of two species. Our algorithm has provable guarantees on correctness and efficiency, unlike previous network comparison methods. Our algorithm's match-and-split framework is also quite general in allowing diverse local matching and connectivity criteria that define when two sub-graphs are similar. Using a lenient criterion based on connectedness and matching edges, coupled with a betweenness clustering heuristic, we apply our method to pairwise comparisons of the protein networks of human and the model organisms yeast, nematode worm and fruit fly. We evaluate the detected conserved modules using sensitivity and specificity measures against reference yeast protein complexes. In these evaluations, our method performs competitively with (and sometimes better than) two previous network comparison methods. Further, with proper homolog and species selection, our method performs better than a popular single-species clustering method. Hence, pairwise methods can under some conditions exploit cross-species conservation of protein sequences and interactions to detect functional modules better than single-species methods. Beyond these evaluations, we discuss the biology of a couple of conserved modules detected by our method. We demonstrate the utility of network comparison for transferring annotations from yeast proteins to human ones, and validate the predicted annotations.
P31: Inferring molecular interaction pathways from eQTL
Imran Rashid, Ka Yee Yeung, Roger Bumgarner, Walter L. Ruzzo, Ram Samudrala
University of Washington
irashid@cs.washington.edu
Recent studies have revealed that differential gene expression is sometimes tightly linked to variation in specific chromosomal locations. When gene expression is also associated with phenotypes such as disease, there is great interest in discovering the pathway connecting the genetic variation and the differential expression. However, this remains a difficult task. The chromosomal locations generally include many candidate causative genes. Furthermore, even when the causative gene is known, it is very difficult to predict all the molecules involved in the regulatory pathway. This is because genes, proteins and other biological molecules form highly context-dependent interaction networks that need to be elucidated.
One recent approach to these problems has been through a random walk across a weighted graph of known molecular interactions (Tu, 2006). This approach is very promising, because it makes use of known molecular interactions, and it produces encouraging results. Here, we extend this random walk approach in three ways. First, we implement efficient computations of the probability distribution of a random walk. Second, we rigorously characterize the results from the random walk approach. Whereas previous approaches have used simple voting, we analyze whether a confidence can be assigned to the predicted pathways. In addition, we compare the random walk approach with other graph-exploration techniques, including uniform-cost search. Finally, we extend the graph of molecular interactions to include computationally predicted interactions from the Bioverse (http://bioverse.compbio.washington.edu). The Bioverse primarily uses the interolog method for predicting protein-protein and protein-DNA interactions for more than 50 organisms, which results in significantly greater coverage than experimentally determined interactions. Using a combination of predicted interaction data and our graph searching algorithms, we are able to infer pathways that demonstrate causal relationships between genes in chromosomal loci and their corresponding expression phenotypes.
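Computing the probability distribution of a random walk over a weighted interaction graph can be sketched in Python as follows. This is a generic propagation scheme in which the transition probability is proportional to the edge weight; it is an assumption made for illustration and is not the implementation described here or in Tu (2006). The node names and weights are made up.

```python
def step_distribution(graph, distribution):
    """Advance a walk distribution by one step.

    graph maps each node to a dict {neighbor: weight}; the walker moves to
    a neighbor with probability proportional to the edge weight. Nodes
    without outgoing edges are treated as absorbing.
    """
    updated = {}
    for node, probability in distribution.items():
        neighbors = graph.get(node, {})
        total_weight = sum(neighbors.values())
        if total_weight == 0:
            updated[node] = updated.get(node, 0.0) + probability
            continue
        for neighbor, weight in neighbors.items():
            share = probability * weight / total_weight
            updated[neighbor] = updated.get(neighbor, 0.0) + share
    return updated


def walk_distribution(graph, start, steps):
    """Distribution over nodes after a fixed number of steps from start."""
    distribution = {start: 1.0}
    for _ in range(steps):
        distribution = step_distribution(graph, distribution)
    return distribution


# Toy graph: a gene in the linked locus reaches a target via two regulators.
toy_graph = {
    "locus_gene": {"tf": 3.0, "kinase": 1.0},
    "tf": {"target": 1.0},
    "kinase": {"target": 1.0},
    "target": {},
}
after_one = walk_distribution(toy_graph, "locus_gene", 1)
after_two = walk_distribution(toy_graph, "locus_gene", 2)
```

Propagating the whole distribution costs time proportional to the number of edges per step and avoids the sampling noise of simulating individual walkers, which is one reason an exact computation can be preferable to repeated simulation.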
P32: Reconstructing Regulatory Relations between Functional Units using Logic Analysis
Einat Sprinzak
UCLA-DOE Institute for Genomics and Proteomics
Shawn J. Cokus and Matteo Pellegrini
Department of MCD Biology, UCLA
Peter M. Bowers and David Eisenberg
UCLA-DOE Institute for Genomics and Proteomics and Howard Hughes Medical Institute
Todd O. Yeates
Department of Chemistry and Biochemistry, UCLA
einat@mbi.ucla.edu
Developing improved methods to infer relationships between pathways and complexes is essential for understanding higher order organization in the cell. Logic analysis identifies triplets of genes that obey certain logic relations. These ternary relations represent more complex relations intrinsic to biological systems. Here we applied logic analysis to microarray data to identify triplet relations between Saccharomyces cerevisiae genes. Our approach identifies triplet logic relations which cannot be detected by measuring pair-wise gene expression similarity. We mapped these gene triplets into relations between three types of higher order functional units: cellular complexes, metabolic pathways and regulatory modules. These mapping schemes allow the discovery of higher order relationships between such functional units. In the first scheme we mapped the gene triplets into relations between cellular complexes. For example, we found that three complexes involved in the translation process are related by the logic operator OR: eIF4A is underexpressed if EF-1 is underexpressed or the 40S ribosome is underexpressed. Moreover, we show that the cellular complexes exhibiting such relations are more often co-regulated. In the second scheme we mapped the gene triplets into relations between metabolic pathways. For example, we found that the logic operator OR relates the expression of certain Arginine/Proline pathway genes, or Glutamine metabolism pathway genes, with extracellular transport genes. In the third scheme, we mapped the gene triplets into relations between regulatory modules. We pre-defined the regulatory modules as groups of genes having similar expression profiles that are co-regulated by the same transcription factors. This scheme provides a way to classify many non-annotated genes into our pre-defined regulatory modules, thus providing additional insight into the function of these genes and their relations to genes in other modules.
Combined, these approaches provide a proof of principle of the ability to obtain higher order relations between genes and cellular modules using logic analysis.
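How an OR triplet can be informative when neither pairwise relation is can be illustrated with a short Python sketch. It scores relations with an uncertainty coefficient on binarized (underexpressed or not) profiles, which is assumed here as the scoring function; the function names and the toy profiles are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum(count / n * log2(count / n) for count in counts.values())


def uncertainty_coefficient(x, y):
    """U(x|y) = (H(x) + H(y) - H(x, y)) / H(x); 1 means y determines x."""
    h_x = entropy(x)
    if h_x == 0:
        return 0.0
    return (h_x + entropy(y) - entropy(list(zip(x, y)))) / h_x


def or_triplet_scores(a, b, c):
    """Return U(c|a), U(c|b) and U(c|a OR b) for binary profiles."""
    combined = [int(bool(a_i) or bool(b_i)) for a_i, b_i in zip(a, b)]
    return (
        uncertainty_coefficient(c, a),
        uncertainty_coefficient(c, b),
        uncertainty_coefficient(c, combined),
    )


# Toy binarized profiles over 8 conditions (1 = underexpressed).
gene_a = [1, 0, 0, 1, 0, 0, 1, 0]
gene_b = [0, 1, 0, 0, 1, 0, 1, 0]
gene_c = [1, 1, 0, 1, 1, 0, 1, 0]  # underexpressed whenever a or b is
u_a, u_b, u_ab = or_triplet_scores(gene_a, gene_b, gene_c)
```

In this toy case neither gene_a nor gene_b alone predicts gene_c well, while their OR combination determines it completely; a triplet would be reported when the combined score is high and both pairwise scores are low.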
P33: Qualitative Networks: A Symbolic Approach to Analyze Biological Signaling Networks
Marc A. Schaub
Ecole Polytechnique Fédérale de Lausanne (Now at Computer Science Department, Stanford University)
Thomas A. Henzinger and Jasmin Fisher
Ecole Polytechnique Fédérale de Lausanne
(email: Marc Schaub "firstname.lastname@epfl.ch")
A central goal of Systems Biology is to model and analyze biological signaling pathways that interact with one another to form complex networks. We propose an approach that uses formal verification to produce a model that is consistent with laboratory experimental observations. An initial model is constructed according to the mechanistic understanding of the studied biological process. Using simulation and model checking, we verify that all possible executions of the model adhere to a set of specifications derived from the experimental data. If they do not, the model cannot reproduce all experimental results, which suggests that it needs revision. One can then try other hypothetical models in silico. Once a hypothetical model reproduces the data, experimental validation of this model is required. In order to build discrete models at a similar level of abstraction to the one observed in experimental studies, we introduce Qualitative Networks, an extension of Boolean Networks. In this framework, variables representing the activity of biochemical components such as proteins and genes range over a small finite domain, allowing more flexibility than Boolean values. Interactions between components are represented by associating one target function with every component, which allows a rich set of behaviors to be modeled. We propose a symbolic algorithm for analyzing the steady state of these networks. This algorithm reasons over sets of states and uses partition reduction to scale to multi-cellular models of complex pathways for which an exhaustive exploration of the state space is intractable. We illustrate the usefulness of this approach through a model of the interaction between the Notch and the Wnt signaling pathways in mammalian skin, and its extensive analysis. The hypotheses formulated during the model improvement process suggest new avenues to explore experimentally. Hence, this approach has the potential to efficiently complement experimental studies.
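The notion of variables over a small finite domain, each driven by a target function, can be made concrete with a small Python sketch. It enumerates states explicitly, which is exactly what the symbolic algorithm is designed to avoid, so it only illustrates the semantics on a toy example. The update rule (each variable moves one level toward its target per step), the two-component network, and all names are assumptions.

```python
from itertools import product

MAX_LEVEL = 2  # each variable ranges over 0..MAX_LEVEL (assumed domain)

# Hypothetical target functions for two mutually inhibiting components.
TARGETS = {
    "x": lambda state: MAX_LEVEL - state["y"],
    "y": lambda state: MAX_LEVEL - state["x"],
}


def step(state):
    """Synchronous update: every variable moves one level toward its target."""
    new_state = {}
    for name, target_function in TARGETS.items():
        current = state[name]
        target = target_function(state)
        new_state[name] = current + (target > current) - (target < current)
    return new_state


def steady_states():
    """Exhaustively list the states that the update leaves unchanged."""
    names = sorted(TARGETS)
    found = []
    for levels in product(range(MAX_LEVEL + 1), repeat=len(names)):
        state = dict(zip(names, levels))
        if step(state) == state:
            found.append(levels)
    return found


fixed_points = steady_states()
```

With n variables and d levels the loop visits d to the power n states, which makes clear why reasoning over sets of states is needed for multi-cellular models.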
P34: Network based analysis of eQTL data
Silpa Suthram, Andreas Beyer, Trey Ideker
Department of Bioengineering, UCSD
ssuthram@ucsd.edu
High-throughput measurement of gene expression using microarrays has enabled the use of traditional linkage analysis techniques to study the effect of genetic variation on mRNA transcript levels. Genome-wide eQTL (expression Quantitative Trait Loci) data are being generated for a variety of species, and many approaches have been developed to understand and analyze the eQTL data. However, these genetic association studies still face a number of issues. Firstly, due to the spacing of genetic markers, several genes reside in a given locus, and it is often difficult to finely map the gene responsible for a certain trait. Moreover, a genetic link doesn't explain the molecular or biochemical cause of the association. Finally, all the large-scale measurements suffer from a high level of noise and statistical issues such as multiple testing. Recent years have also seen an increase in the accumulation of other sources of genome-wide biological data such as protein-protein and protein-DNA interactions. In this study, we suggest a new method for linking genetics-based data such as eQTLs to a regulatory network of protein interactions and analyzing them together. The main hypothesis is that genetic data sets encode genetic interactions between genes that are mediated in vivo by an array of protein-protein interactions. Hence, known protein interaction networks can be used to gain mechanistic insights for understanding the genetic interactions observed in the eQTL data.
P35: Transcription Factor Response to Radiation Damage in Saccharomyces cerevisiae
Matteo Pellegrini and Hung Pham
University of California, Los Angeles
matteop@mcdb.ucla.edu
The transcriptional response of Saccharomyces cerevisiae to 170 Grays of ionizing radiation has been previously profiled in both wild type and mec1 mutant strains. We have conducted an analysis of the transcription factor activities by analyzing this data in terms of transcription factor binding data. We use multiple linear and non-linear regression methods to estimate transcription factor activities. We find that Mig3 is the primary Mec1-dependent transcription factor, and may thus play an important role in regulating the cellular response to radiation damage in yeast. Mig3 is a transcriptional repressor involved in the response to toxic agents such as hydroxyurea that inhibit ribonucleotide reductase; phosphorylation by Snf1p or the Mec1p pathway inactivates Mig3p, allowing induction of damage response genes. Mig3 binds genes involved in various cell-cycle regulatory mechanisms as well as transposons. Thus we hypothesize that transposons are activated in yeast in response to radiation. The mammalian homologs of Mec1 and Mig3 are ATR and EGR1, respectively. EGR1 activates genes that are required for differentiation and mitogenesis. It is also known that EGR1 regulates radiation-induced apoptosis. We therefore speculate that EGR1 may be an important ATR-dependent stress response pathway.
P36: Predicting protein-protein interactions using amino-acid composition
Sushmita Roy, Terran Lane, Margaret Werner-Washburne
University of New Mexico
sroy@cs.unm.edu
Accurate prediction of protein-protein interactions is crucial for understanding the physical interaction layer in cells. Most computational approaches incorporate protein domain information to predict protein-protein interactions. In this work, we propose amino-acid composition as an important feature for predicting protein-protein interactions. We trained separate maximum-entropy classifiers on domains and on amino-acid composition of yeast proteins and compared their performance using the area under the ROC curve (AUC). We found that the classifier using amino-acid composition (AUC 0.71 ± 0.02) performed on par with, and sometimes better than, the classifier using domains (AUC 0.66 ± 0.04). We also found that a classifier that combined domains and amino-acid composition had improved performance (AUC 0.74 ± 0.03), indicating that amino-acid composition can boost the performance of classifiers trained on other features. We found similar results using a support vector machine classifier, illustrating that our results were not an artifact of a single classifier. It is surprising that a simple feature like amino-acid composition yields performance as good as or better than well-known but complex features such as protein domains. A possible explanation for this is that domains are local features that capture information confined to a portion of the protein sequence, whereas amino-acid composition is a global feature that captures information about the entire protein sequence. We examined protein pairs for which the amino-acid composition-based classifier correctly predicted interactions, but for which the domain-based classifier failed. We found that the domain-based classifier performed much worse as the number of domains present in the protein pairs increased. Because the protein domain is a local feature, a classifier that does not model dependencies between co-occurring domains is likely to degrade in performance with increasing domain count.
However, amino-acid composition, being a global feature, enables a classifier to be more resilient to performance degradation with increasing domains.
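The amino-acid composition feature itself is simple to compute, as the following Python sketch shows. Representing a protein pair by concatenating the two 20-dimensional composition vectors is an assumption made for illustration; the abstract does not specify how pairs were encoded, and the sequences below are made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Fraction of each standard amino acid in a protein sequence.

    Characters outside the 20 standard residues are ignored.
    """
    sequence = sequence.upper()
    counts = [sequence.count(residue) for residue in AMINO_ACIDS]
    total = sum(counts)
    return [count / total if total else 0.0 for count in counts]


def pair_features(sequence_a, sequence_b):
    """Feature vector for a protein pair: the two compositions concatenated."""
    return composition(sequence_a) + composition(sequence_b)


# Toy sequences, not real proteins.
features = pair_features("MKTAYIAKQR", "GSHMLEDPAA")
```

Because the vector has a fixed length regardless of protein size or domain count, it can be fed to any standard classifier, which fits the observation above that this feature degrades less as the number of domains grows.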
P37: Automated Identification of Disulfide Bonds in Proteins and Peptides from LC-MS/MS by use of the MassMatrix Database Search Software
Hua Xu and Michael A. Freitas
The Ohio State University
freitas.5@osu.edu
The characterization of protein disulfide linkages is essential to understanding their role in protein function. Tandem mass spectrometry of protein digests under non-reducing conditions has frequently been used to characterize disulfide bonds. The data produced from MS/MS experiments on disulfide-linked peptides may be extremely complicated, and the resulting analysis is laborious and time consuming. Many novel linkages may go undetected, especially for proteins with multiple disulfide bonds. In order to improve the ability to detect novel linkages, it is necessary to develop and/or refine bioinformatics software capable of detecting these linkages. This abstract describes an algorithm to identify disulfide bonds in peptides by use of tandem MS. The algorithm is included in the newly developed tandem MS database search program, MassMatrix. The algorithm uses the probabilistic scoring model in MassMatrix to achieve high-performance identification of disulfide linkages in proteins and peptides. Proteins and peptides with disulfide bonds can be identified with high confidence without chemical reduction or other derivatization. The approach was tested on several peptide and protein standards with known disulfide bonds. All disulfide bonds in the standard set were identified by MassMatrix. The algorithm was further tested on bovine pancreatic ribonuclease A. The four native disulfide bonds in this protein were detected by MassMatrix along with additional novel disulfide bonds. The MassMatrix algorithm offers an additional informatics-based approach to facilitate the discovery of disulfide bonds from tandem mass spectrometry data.
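Any search for disulfide-linked peptides rests on a simple mass relation: forming one disulfide bond between two cysteines removes two hydrogen atoms. The Python sketch below computes the expected monoisotopic mass of a linked peptide pair. It illustrates only this relation and says nothing about the MassMatrix scoring model; the function names and the truncated residue table are assumptions.

```python
# Monoisotopic residue masses in Da (only a few residues, for illustration).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "C": 103.00919,
    "K": 128.09496,
}
WATER = 18.01056
HYDROGEN = 1.007825


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


def disulfide_linked_mass(peptide_a, peptide_b):
    """Mass of two peptides joined by one inter-chain disulfide bond."""
    if "C" not in peptide_a or "C" not in peptide_b:
        raise ValueError("both peptides need a cysteine to form a disulfide")
    return peptide_mass(peptide_a) + peptide_mass(peptide_b) - 2 * HYDROGEN


# Two free cysteines joined by a disulfide give cystine (about 240.024 Da).
cystine_mass = disulfide_linked_mass("C", "C")
```

A database search can use this relation to generate candidate precursor masses for every pair of cysteine-containing peptides before fragment ions are scored.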
P38: Enzyme-pathway relations in metabolic networks
Gabriele Scheler
Stanford University
scheler@stanford.edu
We wanted to know how functional alterations of enzymes spread throughout a metabolic system by analyzing which pathways are linked by at least one common enzyme. This is a type of meta-analysis different from other direct reaction-substrate graphs. We used the BioCYC collection of databases, focussing on E. coli (EcoCYC). Out of 316 pathways, only 182 shared any enzyme with another one. Similarly, out of 1055 enzymes, only 289 had multiple functions (and only 9 enzymes took part in more than one reaction within the same pathway). This shows a high proportion of unique functions of enzymes, and a high rate of enzymatically encapsulated pathways. Creating a graph by linking pathways if they share an enzyme revealed a core component of 92 pathways (50%) which were all densely linked, while the other pathways were not connected to the core, and split into small clusters. Looking at the core component, we found that it had a strong hierarchical structure with highly linked pathways as roots, and two groups of branches: about half of the branches were shared among different roots, showing a densely connected structure, while the other half were specific to each root, showing a tree-like structure. We applied the same methods to 6 other microbes from the BioCYC database, with qualitatively similar results. The properties of the resulting graphs were quantified by probabilistic graph measures. The two metabolic systems of higher-order organisms (drosophila and human) in the BioCYC databases showed a very different pattern. Instead of a densely connected core, only highly segmented tree-like structures appeared, suggesting high modularity of enzyme-pathway relations. Implications for system robustness and evolutionary pressures can be drawn on the basis of this analysis, addressing the issues of a larger set of metabolic systems and making sure that pathway analysis is of comparable completeness.
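The graph construction described above, linking two pathways whenever they share an enzyme and then looking for a core component, can be sketched in Python as follows. The function names and the toy pathway-to-enzyme table are assumptions for illustration and are not taken from EcoCYC.

```python
from collections import defaultdict


def link_pathways(pathway_enzymes):
    """Adjacency sets: two pathways are linked if they share an enzyme."""
    pathways_by_enzyme = defaultdict(set)
    for pathway, enzymes in pathway_enzymes.items():
        for enzyme in enzymes:
            pathways_by_enzyme[enzyme].add(pathway)
    adjacency = {pathway: set() for pathway in pathway_enzymes}
    for members in pathways_by_enzyme.values():
        for pathway in members:
            adjacency[pathway] |= members - {pathway}
    return adjacency


def connected_components(adjacency):
    """Connected components of the pathway graph, largest first."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Toy table with made-up pathway and enzyme identifiers.
toy_pathways = {
    "p1": {"e1", "e2"},
    "p2": {"e2", "e3"},
    "p3": {"e3"},
    "p4": {"e9"},
}
adjacency = link_pathways(toy_pathways)
components = connected_components(adjacency)
```

The first component plays the role of the densely linked core, and pathways with an empty adjacency set correspond to the enzymatically encapsulated pathways counted in the abstract.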
P39: Targeted Combinatorial Modulation of Cytotoxic Cancer Therapies
Adrian Heilbut, Joseph Lehár, Grant Zimmermann, Curtis Keith
CombinatoRx Inc.
aheilbut@combinatorx.com
Simultaneous dysregulation of multiple cellular processes and redundant mechanisms of drug resistance are hallmarks of cancer. This observation is consistent with the networked nature of biology, and points to a need for drugs that act through multiple targets. CombinatoRx has developed a platform for high-throughput screening of combinations of small molecules in cell-based assays, which we have deployed to screen combinations of ~2000 approved pharmaceutical ingredients for discovery of therapeutically relevant synergies. Existing pharmaceuticals constitute a highly privileged set of compounds for therapeutic discovery, but many molecularly targeted research probes and development compounds are also available. Preclinical identification of optimal uses and combinations of targeted agents is a challenge for which our platform is well suited. Many existing cancer therapies work through DNA damage, which ultimately leads to apoptosis, senescence, or mitotic catastrophe. One strategy to improve efficacy and selectivity of such therapy is to modulate the signal transduction and regulatory mechanisms that mediate these outcomes. We conducted a preliminary screen to test over 400 targeted small-molecule probes for the capacity to modulate antiproliferative effects of the DNA damage agents topotecan, bleomycin, and oxaliplatin. Several synergies were identified, involving both probes expected to affect DNA damage responses and probes for which relevance to DNA damage could not have been anticipated. To validate the observed combination effects and to search for higher-order synergies conditional on DNA damage, we are screening all combinations involving 36 selected probes across several cell lines and in both the presence and absence of DNA damage. Profiles of combination effects observed can be used to classify small molecule probes or cell lines and correlated to probe annotations. 
Genotype and context-specific synergies and antagonisms identified may reflect novel biological interactions and suggest strategies for novel and improved combination therapies.
P40: Identification of Disease related Single Nucleotide Polymorphisms (SNPs) within Human Protein Domains
Areum Han
University of California, Los Angeles
Sungsam Gong
University of Cambridge
Jong Bhak
Korean Bioinformation Center
arhan@ucla.edu
Human proteins consist of domains, which are the core functional sites of a protein; protein domains are conserved in structure and sequence. In this study, we identified disease-related SNPs located in protein domains and investigated their patterns from the viewpoint of protein domains. To identify those SNPs, we annotated human proteins in the Ensembl database with protein domains and mapped all SNPs from dbSNP to those domains. Since the Ensembl database already provides Pfam domain annotation, we additionally performed a structure-based domain assignment with the PDB-ISL method, using SCOP version 1.69 and Ensembl human proteins. Domains were assigned by keeping BLAST-matched regions with an e-value of 1e-4 or lower. In total, 17,639 SNPs within SCOP domains and 28,238 SNPs within Pfam domains were identified. Linking those SNPs with diseases from the Online Mendelian Inheritance in Man database, we identified 128 novel single nucleotide polymorphisms related to diseases and protein domains. Those pairs include 127 distinct SNPs and 78 distinct diseases. Our disease-SNP-domain catalog is available via the web and provides insights into the selection and interpretation of genetic loci which cause human diseases.
P41: SPIKE - A Database, Visualization and Analysis Tool for Signaling Pathways
Ran Elkon1, Rita Vesterman1, Nira Amit1, Igor Ulitsky2, Mali Weisz1, Nir Orlev1, Giora Sternberg1, Ran Blekhman1, Jackie Assa2, Yosef Shiloh1, Ron Shamir2
1The David and Inez Myers Laboratory for Genetic Research, Department of Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel. 2School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.
ulitskyi@post.tau.ac.il
Our realization of the complexity of signaling networks that regulate cellular physiology is growing commensurate with the rapid expansion in biological knowledge. It is now clear that biological pathways that govern cellular development and responses to environmental challenges are not linear, parallel, and independent, but instead form a web of interlocking processes. Given the complexity of these networks, the assimilation and interpretation of the wealth of data collected on signaling pathways become an acute bioinformatics challenge. To cope with this challenge, we are developing SPIKE (Signaling Pathway Integrated Knowledge Engine). SPIKE is a knowledge base of signaling pathways, which can be utilized to analyze any signaling network. At present, we focus our efforts on populating SPIKE with data on networks induced by DNA damage in human cells. SPIKE contains three main software components: 1) a database of cellular signaling pathways; 2) a visualization package that allows interactive graphic representations of regulatory interactions; 3) an algorithmic inference engine that analyzes the networks, aiming to discover novel functional interplays between network components. SPIKE's database contains extensive and highly curated data on pathways induced by DNA damage, such as cell-cycle checkpoints, apoptosis and other stress responses. Our vision is that the database will be populated by a distributed and collaborative effort undertaken by multiple groups, with quality supervised by SPIKE's curators.
P42: Automation system of Protein Databank (PDB) with new classification using clustering
Muhammad Mahbubur Rahman, Tamnun E Mursalin, Anwarul Kabir
Computer Science, American International University-Bangladesh
mus_mahbub@hotmail.com
Research in bioinformatics is complex because it spans two knowledge domains, the biological and computer sciences. This paper introduces an efficient data mining approach for classifying proteins into useful groups by representing them in a hierarchical tree structure. Several techniques are used to classify proteins, but most of them have drawbacks in their grouping. Among them, the most efficient grouping technique is the one used by PSIMAP (Protein Structural Interactome Map). Although PSIMAP succeeds in incorporating most proteins, it fails to classify a few proteins, known as scale-free property proteins. Our technique overcomes this drawback and successfully maps all proteins into groups, including the scale-free property proteins that PSIMAP fails to group. Our approach selects six major attributes of proteins, a) structure comparison, b) sequence comparison, c) interactivity, d) connectivity, e) cluster index and f) taxonomy, to group the proteins from the widely used PDB databank, and groups them by their similar properties in a hierarchical tree structure.
P43: Extending TNFa-NF-kB signaling pathway model based
Mahesh Visvanathan
UMIT
mahesh.visvanathan@umit.at
The objective of this work is to investigate different mathematical models of the TNFa-NF-kB signaling pathway and to build a new pathway model using an integrative framework. The integrative framework consists of a database and design and simulation environments. A new TNFa pathway model was developed within this framework based on literature studies. The model was then compared with the protein interaction connectivity map, and several proteins common to the model and the connectivity map were identified. A comprehensive validation of the new pathway model under different experimental conditions is the focus of our ongoing research. Our preliminary results show that this approach further increases the understanding of signaling pathways.
P44: Extension of Naive Bayes and its Application to Systems Biology
Raja Loganantharaj
University of Louisiana
logan@cacs.louisiana.edu
Naïve Bayes has been successfully used in many applications, including those in bioinformatics and data mining. Naïve Bayes assumes positional independence, which makes the computation of the joint probability value easier at the expense of accuracy and of the underlying reality. Besides, the positive and negative prior probabilities are computed from the training instances, which often do not accurately reflect the real prior probabilities. In this paper we address those two issues. We have developed algorithms that automatically perturb the computed prior probabilities and search their neighborhood to maximize a given expected utility. To improve the accuracy, we introduce limited dependency around the neighborhood. We have demonstrated the extension by applying it to the problem of discriminating a TATA box from putative TATA boxes found in promoter regions of plant genomes. The best prediction accuracy of naïve Bayes with 10-fold cross-validation was 69%, while the extension gave a prediction accuracy of 79%. An artificial neural network solves the same problem with a best prediction accuracy of 78%.
P45: Inferring the Skeleton Cell Cycle Regulatory Network
Isabel Tienda Luna2, Yufang Yin1, Diego P. R. Padillo2, Yufei Huang1, Hong Cai1, Maribel Sanchez2, Yufeng Wang1
1Department of Biology, University of Texas at San Antonio
2University of Granada, Spain
yufeng.wang@utsa.edu
The development of new antimalarial drugs is urgently needed due to elevated drug resistance in the causative parasite Plasmodium falciparum. An intervention strategy based on the interruption of parasite cell cycle represents a systems-biology aided drug discovery approach. However, little is known about the components or the mechanism of parasite cell cycle control. In this proof of concept study, we attempted to infer the skeleton components using comparative genomic analysis and to uncover this genetic regulatory network (GRN) using a Variational Bayesian expectation maximization approach.
P46: Modeling protein-DNA binding time in Stochastic Discrete Event Simulation of Biological Processes
Preetam Ghosh, Samik Ghosh, Kalyan Basu, Sajal K Das
University of Texas at Arlington
ghosh@cse.uta.edu
To understand the stochastic behavior of biological systems, we propose an "in silico" stochastic event-based simulation that determines the temporal dynamics of different molecules. This paper presents a parametric model to determine the execution time of one biological function (i.e. simulation event), protein-DNA binding, by abstracting the function as a stochastic process of microlevel biological events using a probability measure. This probability is coarse-grained to estimate the stochastic behavior of the biological function. Our model considers the structural configurations of the DNA, proteins and the actual binding mechanism. We use a collision theory based approach to transform the thermal and concentration gradients of this biological process into the probability measure of the DNA-protein binding event. This information theoretic approach significantly reduces the complexity of the classical model of protein sliding along the DNA, improves the speed of computation and can bypass the speed-stability paradox. This model can produce acceptable estimates of DNA-protein binding time to be used by our event-based stochastic system simulator where the higher order uncertainties can be ignored. The results show good correspondence with available experimental estimates.
P47: A Peptide Screening Model for Database Searches of Mass Spectrometry
Bobbie-Jo M Webb-Robertson, William R Cannon, Christopher S Oehmen, Danny J Taasevigen, Mary S Lipton, Katrina M Waters
Pacific Northwest National Laboratory
bj@pnl.gov
The high-throughput capability of mass spectrometry for global proteomics is generating data at a scale that is quickly outpacing the ability to accurately analyze it. We offer a new approach to this analysis bottleneck that reduces the number of peptides searched in the computational peptide identification step. We describe a statistical model that discriminates true peptide identification candidates from those that cannot be biochemically identified with mass spectrometry. We demonstrate a machine learning method that describes peptides as sets of variables associated with amino acid content and various physicochemical properties, and that can predict peptide candidates with high accuracy (a ROC score of ~0.87). This study provides an approach to assign a priori knowledge to the peptide identification problem, as well as to screen peptides prior to analysis to alleviate the computational load on database search routines.
P48: Investigating Metabolism Organization using Petri Net based Compound Subsets Expansion
Stefano Lanzeni, Enza Messina, Francesco Archetti
Department of Computer Science and communication, University of Milan-Bicocca
stefano.lanzeni@unimib.it
Natural evolution acts on metabolism both at the level of network functionalities and at the level of single biochemical enzymes, breeding specificities in the metabolic structure. From an evolutionary point of view, a reaction is integrated into a metabolism if and only if its substrates are made available by metabolic reactions in previous generations. Here we develop a novel approach, based on Petri nets, for simulating and analyzing the process of progressive network expansion, and we apply it to a set of well-annotated microbial metabolisms. This analysis detects meaningful and conserved sub-networks of interconnected compounds by initially seeding an arbitrary set of metabolites and incorporating newly synthesized chemicals. Results confirm the crucial evolutionary relevance of metabolic cofactors and make it possible to characterize both the usage of compounds among organisms and the robustness of the metabolic network in response to gene deletions. Given its ability to characterize qualitative and quantitative differences between metabolisms, our method could be thought of as a first step towards the development of a network-based phylogeny.
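The expansion rule stated in this abstract (a reaction is incorporated only once all of its substrates are available) can be sketched as a simple fixed-point iteration. This is a plain set-based simplification and not the authors' Petri net implementation; the function name `expand`, the reaction encoding and the compound names are assumptions made for illustration.

```python
def expand(seed, reactions):
    """Grow a compound set from a seed, one generation at a time.

    reactions is a list of (substrates, products) pairs. A reaction fires
    in a generation only if all of its substrates are already available.
    Returns the final compound set and the list of per-generation additions.
    """
    available = set(seed)
    generations = [set(seed)]
    while True:
        new = set()
        for substrates, products in reactions:
            if set(substrates) <= available:
                new |= set(products) - available
        if not new:
            # Fixed point reached: no reaction adds a new compound.
            return available, generations
        available |= new
        generations.append(new)


# Hypothetical toy network: compound names are placeholders.
toy_reactions = [
    (("A", "B"), ("C",)),
    (("C",), ("D",)),
    (("E",), ("F",)),  # never fires: E is not reachable from the seed
]
scope, generations = expand({"A", "B"}, toy_reactions)
```

Removing a reaction from `toy_reactions` and re-running the expansion is one simple way to mimic the effect of a gene deletion on the reachable compound set.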
P49: Kinetic Modeling of Regulation of Gene Expression of ace Operon in E. coli.
Kirill Peskov, Goryanin Igor, Demin Oleg
Institute of theoretical and experimental biophysics
krillpeskov@gmail.com
A kinetic model taking into account all known experimental information and data about the regulation of ace operon expression was developed. It allowed us to predict stationary expression levels of the ace operon as a function of the concentrations of genetic regulators and coeffectors. Analysis of these profiles showed that significant expression of the ace operon genes, as seen in cells growing on acetate and other gluconeogenic substrates, is possible only in a fixed range of stationary coeffector concentrations. We examined three types of regulation of ace operon expression, induced by the presence in the system of different forms of IclR (full-length, truncated and PEP-insensitive).
P50: Non-linear Network Regulatory Patterns for Maintaining the Robustness of Biological Systems
Chin-Rang Yang
University of Texas Southwestern Medical Center at Dallas
chinrang.yang@utsouthwestern.edu
To simulate biological complexity, an Enzyme-Centric modeling approach has been developed. The idea is to learn the expert knowledge from the literature and build the fundamental enzyme catalytic and regulatory models that compose the biological network. With a human-friendly interface, the tool is modularized so that the model can easily be revised and regenerated in sync with frequently updated biological knowledge. This report further demonstrates the integration of upstream and downstream pathways and the non-linear network-level regulatory patterns which are essential for maintaining the robustness of the system. Examples shown are the amino acid biosynthetic network, the protein factory of RNA splicing and the signal transduction for calcium oscillation. These simulations reveal the value of Enzyme-Centric modeling in understanding how a living system maintains homeostasis (robustness) and continues to function while facing environmental stresses or genetic mutations.
P51: ConsensusPathDB - database for matching pathway annotation
Atanas Kamburov, Christoph Wierling, Hans Lehrach, Ralf Herwig
Max-Planck-Institute for Molecular Genetics
kamburov@molgen.mpg.de
The understanding of complex biological systems requires integration of different kinds of functional annotation. Several databases exist that comprise information on cellular processes, pathways and reactions. However, these databases are typically heterogeneous in the information that they contain and the problem domain they address so that the overlap between them is rather small. We have developed ConsensusPathDB, a database that helps the user to summarize and verify pathway information and to enrich a priori information in the process of model annotation. The database model allows the integration of information on metabolic, signal transduction and gene regulatory networks. As we use data from different sources with partially redundant information we have developed specific algorithms to reduce redundancy by identifying and merging identical reactions or interactions. Cellular reaction networks are stored in a PostgreSQL database and can be accessed under http://pybios.molgen.mpg.de/CPDB. ConsensusPathDB also assists the development, expansion and refinement of computational models of biological systems and the context-specific visualization of models provided in SBML.
P52: Comparative Analysis of Gene-Coexpression Networks Across Species
Shiquan Wu and Jing Li
Case Western Reserve University, Electrical Engineering and Computer Science
shiquan.wu@case.edu
This paper presents a large-scale comparative analysis of gene-coexpression networks across four plant species, Arabidopsis, barley, soybean, and wheat, over 1471 DNA microarrays. A total of 5164 metagenes are identified across the four species. Four gene-coexpression networks are constructed, one per species, by linking reliably coexpressed metagene pairs based on their expression profiles within each species. Similarly, an overall gene-coexpression network is constructed based on metagene expression profiles across the four species. Each network contains 50K-70K links among metagenes. Several recent studies have discussed gene-coexpression networks across various species, which reveal conserved genetic modules and functionally related genes, but no studies have been reported on gene-coexpression networks across crop species. This study is devoted to the comparative analysis of gene-coexpression networks in crop species. It is shown that: (1) the five gene-coexpression networks are scale-free and their degree distributions follow a power law; (2) they have the small-world property; (3) they share very similar network parameters: degree distributions, network diameters, clustering coefficients, and frequency distributions of correlation patterns; (4) they are non-random and stable. Further analysis can be carried out with these networks to investigate conserved functional modules and regulatory pathways across the four species. A web-based computing tool designed to visualize the expression of metagenes is available at http://cbc.case.edu/coexp.html.
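As an illustration of how a coexpression network of this kind might be built, the sketch below links two metagenes when the Pearson correlation of their expression profiles reaches a cutoff, then counts node degrees. The abstract does not state the authors' criterion for "reliably coexpressed", so the use of Pearson correlation, the threshold value, and all names and profiles here are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold):
    """Return the set of gene pairs whose correlation is >= threshold."""
    edges = set()
    for g, h in combinations(sorted(profiles), 2):
        if pearson(profiles[g], profiles[h]) >= threshold:
            edges.add((g, h))
    return edges


def degrees(profiles, edges):
    """Count how many links each gene has (0 for isolated genes)."""
    degree = {gene: 0 for gene in profiles}
    for g, h in edges:
        degree[g] += 1
        degree[h] += 1
    return degree


# Hypothetical expression profiles over four arrays.
toy_profiles = {
    "m1": [1, 2, 3, 4],
    "m2": [2, 4, 6, 8],
    "m3": [4, 3, 2, 1],
    "m4": [1, 3, 2, 4],
}
edges = coexpression_edges(toy_profiles, threshold=0.75)
degree = degrees(toy_profiles, edges)
```

The histogram of the values in `degree` is the degree distribution whose power-law shape the abstract refers to; a real analysis would need far more genes than this toy example to say anything about it.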
P53: A systems approach of transcriptional modules identification and analysis on their dynamics
Zhidong Tu, Xiaotu Ma, Ting Chen, Fengzhu Sun
University of Southern California, Biological Sciences
fsun@usc.edu
Extensive studies have shown that the transcriptome has modular structures. We developed a new approach for deriving transcriptional modules using gene expression profiles and linkage analysis. We show that most derived modules are coherent in function, highly correlated at expression levels, and enriched for specific transcription factors. However, rather loose within-module connections are observed when these modules are projected onto the protein physical interaction network. This feature of loose connections within the interaction network is clearly different from what has been observed in the heat shock response module. As the original experiments were performed on two normal yeast strains, our modules are mainly composed of naturally diverged genes. Therefore, we hypothesize that the transcriptome can undergo rather dynamic changes if important interactions are kept intact. We conclude, based on several other lines of evidence, that transcription modules are highly plastic and robust.
P54: Overlapping Protein Functional Module Discovery from Protein Interaction Networks
Wenyuan Li and Ying Liu
University of Texas at Dallas
ying.liu@utdallas.edu
Recent advances in high-throughput experiments have provided a wealth of interaction maps of several biomolecular networks, including metabolic and protein-protein interaction networks. The architecture of these molecular networks reveals important principles of cellular organization and molecular functions. Many efforts have been made to analyze and decompose such networks, i.e., to discover the dense regions in the networks, in order to identify protein complexes and functional modules. Most of the protein complexes and functional modules discovered are non-overlapping. However, many proteins are involved in more than one functional module. Furthermore, most of the current network analysis algorithms can only be applied to unweighted networks. Therefore, more accurate models are needed to decompose more general weighted networks into overlapping sub-networks, which correspond to the overlapping functional modules. We extended our previous Find Heavy Subgraphs (FindHSs) algorithm and proposed a novel algorithm, Find Overlapping Heavy Subgraphs (FindOHSs), to discover overlapping functional modules in both weighted and unweighted networks. An empirical study on two weighted networks and one unweighted network shows that FindOHSs can effectively and accurately discover the overlapping functional modules from protein-protein interaction networks.
P55: Peptide Identification by Spectral Matching of Tandem Mass Spectra using Hidden Markov Models
Xue Wu, Nathan Edwards, Chau-Wen Tseng
University of Maryland
wu@cs.umd.edu
Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. The peptide fragmentation spectra generated by these workflows exhibit characteristic fragmentation patterns that can be used to identify the peptide. In other fields, where the compounds of interest do not have the convenient linear structure of peptides, fragmentation spectra are identified by comparing new spectra with libraries of identified spectra, an approach called spectral matching. In contrast to sequence based tandem mass spectrometry search engines used for peptides, spectral matching can make use of the intensities of fragment peaks in library spectra to assess the quality of a match. We evaluate a hidden Markov model approach to spectral matching, in which many examples of a peptide's fragmentation spectrum are summarized in a generative probabilistic model that captures not only the expected ion intensities, but also the variation in the intensities of the peak. Results show HMMs can identify many additional mass spectra not identified by traditional tandem mass spectrometry database search engines such as X!Tandem.
P56: Computational prediction of Ciona intestinalis microRNA targets
Terry Gaasterland, Trina Norden-Krichmar
Scripps Institution of Oceanography, UCSD
tgaasterland@ucsd.edu
This study uses computational techniques to identify and characterize microRNA targets in the sea squirt, Ciona intestinalis. MicroRNAs are ~22 nucleotide non-coding RNAs which have been shown to regulate target gene expression. Because of their small size, temporal expression and regulatory mechanism involving imperfect binding to target genes, microRNAs and their targets are difficult to detect both computationally and experimentally. In a previous study, computational methods were used to predict 43 microRNAs in Ciona intestinalis. In this study, we implemented a target prediction algorithm which utilizes configurable microRNA target binding specificity parameters, grouping by binding location, functional gene assignments, and phylogenetic conservation in the closely related species Ciona savignyi. The target prediction procedure generated a manageable list of potential targets whose regulation by microRNA can be validated experimentally. The novel computational techniques implemented in this study can be applied to other organisms and serve to increase the understanding of the origins of non-coding RNA, embryological developmental pathways and microRNA-controlled gene regulatory networks.
P57: Computational Prediction of Tomato mosaic Virus encoded miRNAs and their possible targets
C.V.S Siva Prasad, Anamika Singh, Sarvendra Vikram Singh
Indian Institute of Information Technology
shiva@iiita.ac.in
Small RNA strands produced by non-coding DNA regions, namely interference RNA (RNAi), are currently an important topic in life-sciences research. One kind of interference RNA is microRNA (miRNA), which is known to regulate gene expression by controlling the protein translation mechanism during a variety of cell phenomena, such as proliferation of cancer and differentiation of stem cells and neurons. miRNA-related sequences are present in most genomes, but this information is hitherto unexplored and needs to be studied in much greater detail. miRNA-based research currently oscillates between computational biology and experimental biology. Precursor miRNAs were predicted on the basis of optimum free energies (dG), structural continuity, etc. Eight miRNA molecules were predicted in Tomato mosaic virus with a program based on the viro-miRNA algorithm. We computationally identified mature miRNA sequences of TMV and their complementary sequences. We have also predicted possible target genes for the predicted miRNAs.
P58: Asymmetric Complexity and Specificity in a fast evolving metabolic network
Almudena Trigo, Florencio Pazos, Alfonso Valencia, Ildefonso Cases
Spanish National Cancer Research Centre, Structural Biology and Biocomputing Programme
valencia@cnio.es
We have investigated the effect of the structure of a metabolic network on the properties of its components, the enzymes. To do so, we chose as a model the metabolic network composed of the microbial enzymes involved in the biodegradation of chemical pollutants. Two properties make this network ideal for our objective. First, as a consequence of the strong selective pressure and fierce competition inside complex microbial populations, the biodegradation metabolism has evolved very rapidly, thus making any organizational patterns more evident. Second, it has a clear convergent structure in which pathways direct compounds to the central metabolism, so we can relate enzyme properties to the distance from that central hub. Our results revealed an asymmetry in the properties of enzymes along the metabolic network, with increasing functional complexity in the periphery of the network and decreased enzymatic specificity close to the central metabolism. The relevance of these patterns for other biological networks is discussed.
P59: Incorporating prior information reduces the complexity of regulatory network models in cancer
Olivier Gevaert, Bart De Moor
Katholieke Universiteit Leuven/Dept. Electrical Engineering
olivier.gevaert@esat.kuleuven.be
Reverse engineering regulatory networks has been an intensively studied topic in bioinformatics. Integration of different sources of information could facilitate this task. In this paper we investigated the use of known protein-DNA interactions as a structure prior, in combination with Bayesian network learning, to facilitate the discovery of regulatory mechanisms that are (in)active in cancer. We used three publicly available data sets on ovarian, breast and lung cancer and were able to show that, for all three data sets, both the number of parameters and the variance of the posterior distribution were smaller when the prior was used. For two of the three data sets we also observed a reduction in the number of edges of the inferred networks. These findings provide evidence that integrating prior information leads to simpler, more robust models while keeping the same predictive performance on unseen data. Moreover, owing to our Bayesian approach, we can easily visualize and compare the prior, likelihood and posterior distributions over the network structures. This allows a detailed analysis of how the prior influences model choice.
P60: MAPK Module: Biological Basis, Structure, Mathematical Model, and Dynamical Analysis
Natasa A. Kablar
Lola Institute, Belgrade, Serbia
nkablar.ae01@gtalumni.org
In this paper we present the mitogen-activated protein kinase (MAPK) module: its biological definition, structure, and model. In the modeling stage, we build on the results of Kholodenko and include newly experimentally observed processes to capture more of the real dynamics of the cell: cross-linking among the different modules of MAPK and/or cross-linking with other pathways; the influence of phosphatases; and the influence of phosphorylated kinase kinase (KKP), found to have a profound effect on module dynamics. For the chosen set of experimentally verifiable parameters we perform a dynamic analysis. In investigating bifurcations, we find the Hopf bifurcation to be the only type observed.
P61: Protein Folding Information in Nucleic Acids which is Not Present in the Genetic Code
Jan C Biro
Homulus Informatics
jan.biro@sbcglobal.net
Nucleic acid subsequences comprising the 1st and/or 3rd codon residues in mRNAs express significantly higher free folding energy (FFE) than the subsequence containing only the 2nd residues (p<0.0001, n=81). This periodic FFE difference is not present in introns and therefore it is a specific physico-chemical characteristic of coding sequences and it might contribute to unambiguous definition of codon boundaries during translation. The FFE in the 1st and 3rd residues is additive, which suggests that these residues contain a significant number of complementary bases and contribute to selection for local RNA secondary structures in coding regions. This periodic, codon-related structure-forming of mRNAs indicates a connection between the structure of exons and translated proteins. The folding energy dot plots of RNAs and the residue contact maps of the coded proteins are indeed similar. Residue contact statistics using 81 different protein structures confirmed that amino acids that are coded by partially reverse and complementary codons (Watson-Crick (WC) base pairs at the 1st and 3rd codon positions and translated in reverse orientation) are preferentially co-located in protein structures.
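The subsequences compared in this abstract can be obtained by simple slicing of an in-frame coding sequence, as the minimal sketch below shows. The function name and the example sequence are hypothetical, and computing the folding energy of each subsequence would require an external RNA folding tool that is not shown here.

```python
def codon_position_subsequences(coding_sequence):
    """Split an in-frame coding sequence by codon position.

    Returns the 1st-, 2nd- and 3rd-residue subsequences, plus the
    subsequence made of the 1st and 3rd residues of every codon.
    """
    if len(coding_sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    first = coding_sequence[0::3]
    second = coding_sequence[1::3]
    third = coding_sequence[2::3]
    # Interleave 1st and 3rd residues, codon by codon.
    first_and_third = "".join(a + b for a, b in zip(first, third))
    return first, second, third, first_and_third


# Hypothetical three-codon mRNA fragment: AUG GCC UAA.
first, second, third, first_and_third = codon_position_subsequences("AUGGCCUAA")
```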
P62: Mass Spectrometric Analysis of Protein Oxidation in the 20S Proteasome: Implications of Thiol Oxidations on Phosphoproteomics
Justin W Torpey, Jessica Q Ho, Miklos Guttman, Gourisankar Ghosh
University of California San Diego
jtorpey@ucsd.edu
The 20S proteasome is a multicatalytic protein complex that plays an important role in intracellular protein degradation. Several post-translational modifications have previously been described, including phosphorylation, glycosylation, and cysteic acid. Here we investigate further the post-translational modifications present in a highly purified 20S proteasome complex. The complex was separated by 1-D SDS-PAGE and the gel bands analyzed by mass spectrometry. A database search that incorporated the phosphorylation (+80 Da) variable modification revealed a group of low-scoring phosphopeptides which comprises both true and false results. The single true phosphopeptide was a highly acidic peptide that scored low due to the ubiquitous presence of fragment ion losses of water and/or phosphoric acid. For the others, the false indication of phosphorylation was given by peptides with five additional oxygen atoms (+80 Da) - three from a cysteine sulfonic acid and one from each of two methionine sulfoxides. Our results indicate characteristics of tandem mass spectra that are useful in rapidly distinguishing between phosphorylated peptides and those with a high degree of thiol oxidation.
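The nominal +80 Da coincidence between phosphorylation and five added oxygen atoms can be checked with a few lines of arithmetic. The sketch below uses standard monoisotopic atomic masses rounded to six decimals; the 2000 Da peptide mass is an arbitrary illustrative value, and accurate-mass discrimination is not something the abstract itself claims to use.

```python
# Monoisotopic atomic masses in daltons (standard values, rounded).
H, O, P = 1.007825, 15.994915, 30.973762

# Net mass added by phosphorylation: HPO3.
phospho = H + P + 3 * O

# Net mass added by the oxidations named in the abstract.
sulfonic = 3 * O   # cysteine thiol -> cysteine sulfonic acid
sulfoxide = O      # methionine -> methionine sulfoxide
oxidized = sulfonic + 2 * sulfoxide

# Exact difference between the two nominally identical shifts.
delta = oxidized - phospho


def ppm(mass_difference, peptide_mass):
    """Express a mass difference relative to a peptide mass, in ppm."""
    return mass_difference / peptide_mass * 1e6


# Illustrative only: relative difference for a hypothetical 2000 Da peptide.
relative = ppm(delta, 2000.0)
```

Both shifts round to 80 Da, while their exact values differ by roughly 0.008 Da, which is consistent with the abstract's observation that a search using only the +80 Da variable modification returns both true and false phosphopeptide assignments.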
P63: A predictive approach to learning regulatory motifs and control in Caenorhabditis elegans
Xuejing Li, Anshul Kundaje, Byung Suk Lee, Chris Wiggins, Christina Leslie
Computer Science, Columbia University
abk2001@columbia.edu
The problem of inferring regulatory networks from high-throughput genomic data has become a central focus of systems biology. However, much of the computational work on this problem has been restricted to the single-celled model eukaryote S. cerevisiae. It is clearly important to investigate how well these computational approaches work in multi-cellular eukaryotes, especially since high-throughput data at the resolution of single cells are currently available in such organisms. In this work, we apply MEDUSA, an algorithm for learning transcriptional regulatory programs from gene expression and promoter sequence data, to study early embryonic development in the worm C. elegans. We find that MEDUSA is able to learn true regulatory elements, identify key regulators, and build a regulatory program that makes accurate predictions of transcriptional response in held-out data. MEDUSA represents the regulatory control logic of the organism as a single alternating decision tree that makes genome-wide, context-specific predictions of differential expression. We illustrate how this regulatory program can be used to reveal lineage-specific regulation, and we use the model to extract the dynamics of transcriptional regulation across developmental stages.
|