Rapid, accurate and cost-effective comparative analysis of biological data requires well-structured, integrated software and high-quality reference data. Structure, integration, and prediction, along with modelling and simulation, are the foundation of integrated bioinformatics. This foundation is necessary to support powerful methods of prediction, verification, navigation, querying and simulation, and ultimately to move toward a systems-level understanding of cellular processes.
Faced with an avalanche of biological data coming from next generation sequencers, microarrays and mass spectrometry, the challenge for research and development is to find the key functional data and transform it into knowledge which can be used to improve the discovery process. It goes without saying that excellent bioinformatics methods and data are required to do this, but it is also clear that this is not enough. The methods and data need to be connected and streamlined so that the scientist does not waste time in the error-prone process of transferring and converting data and results from one format to another. Furthermore, reference data must be properly interconnected so that important associations and relationships are not missed during the analytical process.
Structured, integrated bioinformatics provides the most effective and reliable framework for prediction and understanding of cellular processes in many fields, including research which could lead to new vaccines, diagnostic tests and antibiotics, optimized bioprocesses and enhanced probiotics (1).
Data alone is worth little. A sequence, an expression value, or a genetic interaction is significant only when it is complemented with additional information.
Properly exploited, data allows us to understand and predict how a system works. Large quantities of information must be collected, interpreted and structured in an organized way. Fortunately, the importance of relational databases is well accepted in biology. Relational databases offer clear advantages to biologists: data can be accompanied by their description, and can be completed with annotations, comments, and predictions. Data integrity and coherence can be verified systematically and assured. A database management system (DBMS) provides the underlying support for queries and gives users the flexibility to write their own queries according to their need.
The formal structure of a relational database allows the integration of data of diverse nature and origin and ensures that all complementary data is grouped and connected. This makes it possible to perform queries that take advantage of relevant, related data that otherwise would be disconnected. This is one of the ways an integrated system contributes to improving discovery opportunities that may remain overlooked when using non-integrated data and software.
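The kind of connected query described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the two tables, their columns, and the toy rows are hypothetical and do not reflect the actual schema or content of any database mentioned in this article. The point is simply that once genes and reactions share a key (here an EC number), a single join links data that would otherwise sit in separate files.

```python
import sqlite3

# Hypothetical, minimal schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene     (gene_id TEXT PRIMARY KEY, organism TEXT, ec_number TEXT);
CREATE TABLE reaction (reaction_id TEXT PRIMARY KEY, ec_number TEXT, pathway TEXT);
""")
conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", [
    ("g1", "org_A", "1.1.1.1"),
    ("g2", "org_A", "2.7.1.1"),
    ("g3", "org_B", "1.1.1.1"),
])
conn.executemany("INSERT INTO reaction VALUES (?, ?, ?)", [
    ("R1", "1.1.1.1", "pathway_X"),
    ("R2", "2.7.1.1", "pathway_X"),
    ("R3", "4.2.1.1", "pathway_Y"),
])

def pathways_for(organism):
    """Return (pathway, reaction_id, gene_id) rows reachable from an organism's genes."""
    return conn.execute(
        """
        SELECT r.pathway, r.reaction_id, g.gene_id
        FROM gene AS g
        JOIN reaction AS r ON r.ec_number = g.ec_number
        WHERE g.organism = ?
        ORDER BY r.pathway, r.reaction_id, g.gene_id
        """,
        (organism,),
    ).fetchall()

print(pathways_for("org_A"))
```

The parameterized query keeps the organism name out of the SQL string, which is the usual way to let users vary a query safely.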
Genostar's MicroB database is designed on this principle. MicroB gathers and connects genomic, protein, biochemical, metabolic, and taxonomic data on over 700 microbial organisms. The data come from GenomeReviews or RefSeq, UniProtKB and KEGG and are completed by functional classification data from ENZYME and GO, as well as by taxonomy data from NCBI. The database design enables customization of MicroB to permit integration of additional or proprietary data.
Constructing an appropriate query is key to finding relevant data. Anyone who has queried a relational database using SQL, a language not known for its user-friendliness, is familiar with the associated challenges. Genostar's Metabolic Pathway Builder is a powerful and practical software suite. It enables researchers to efficiently perform extensive genomic, protein and metabolic analyses of their experimental data or newly sequenced genomes, while drawing on the full content of MicroB for comparative, interactive work.
Metabolic Pathway Builder integrates a wide range of functions for navigating, querying, and interactively visualizing genomic, protein and metabolic data. After importing proprietary data on one or several organisms, the user can extract reference data from MicroB and answer questions such as:
For example, the genomic map below shows a pathogenicity island in Listeria monocytogenes. It consists of a set of coding regions (enlarged, and in red) for which there are no homologous regions, in terms of sequence similarity, in the genome of Listeria innocua.
The metabolic reactions catalyzed by these genes can be quickly identified, along with the metabolic pathways specific to the given organism or strain. The metabolic map below, integrated from KEGG and made interactive in Metabolic Pathway Builder, shows that the non-mevalonate pathway of the steroid biosynthesis map is complete only in L. monocytogenes (reactions colored in red). As shown (reactions colored in blue), two of the reactions in this pathway are not present in L. innocua. The same approach can be applied to sets of genes known to be co-expressed: using the software, it is easy to rapidly identify and display the metabolic reactions catalyzed by these genes, and more generally, to map expression profiles and expression kinetics on the metabolic pathways.
Comparisons can be made between species or strains and the questions can be posed in the inverse order. For example, from a set of metabolic reactions that composes a pathway, it is possible to retrieve the genes whose products catalyze the reactions, and study their location and organization in one or more operons on the chromosome. It is then possible to look for conserved regulatory regions upstream.
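Both directions of questioning, from genes to reactions and from reactions back to genes, reduce to simple set operations once the annotations are connected. The sketch below is a toy illustration, not the software's implementation: the pathway, the reaction identifiers, the strain names and the gene-to-reaction annotations are all hypothetical placeholders.

```python
# Hypothetical toy data: ordered reactions of one pathway, and
# gene -> catalyzed-reaction annotations for two strains.
PATHWAY = {"pathway_X": ["R1", "R2", "R3", "R4"]}

ANNOTATIONS = {
    "strain_A": {"a1": {"R1"}, "a2": {"R2"}, "a3": {"R3", "R4"}},
    "strain_B": {"b1": {"R1"}, "b2": {"R3"}},
}

def catalyzed(organism):
    """Set of all reactions catalyzed by at least one gene product of the organism."""
    return set().union(*ANNOTATIONS[organism].values())

def missing_reactions(pathway, organism):
    """Pathway reactions with no catalyzing gene in the organism, in pathway order."""
    present = catalyzed(organism)
    return [r for r in PATHWAY[pathway] if r not in present]

def genes_for(pathway, organism):
    """Inverse query: map each pathway reaction to the genes that catalyze it."""
    return {
        r: sorted(g for g, rs in ANNOTATIONS[organism].items() if r in rs)
        for r in PATHWAY[pathway]
    }

for strain in ("strain_A", "strain_B"):
    print(strain, "missing:", missing_reactions("pathway_X", strain))
print(genes_for("pathway_X", "strain_B"))
```

A pathway is complete in a strain when `missing_reactions` returns an empty list; the genes returned by `genes_for` are the natural starting point for examining operon organization and upstream regions.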
The results obtained in Metabolic Pathway Builder, through queries or by application of prediction methods, are immediately viewable in a form consistent with current standards. They can be assembled and organized in tabular format. These tables, with the images of genomic and metabolic maps, can be exported for use in reports.
The availability of high quality data is essential. Tools that can be used to verify the quality of experimental data and complete them are equally indispensable. The Metabolic Pathway Builder software suite provides biologists with a wide range of efficient and powerful predictive methods developed over many years of applied research in bioinformatics.
For instance, one of the methods in Metabolic Pathway Builder can be used to predict the enzymatic activity of a set of coding regions. Profiles based on protein-EC numbers have been pre-calculated and are updated regularly. The user can then apply the prediction method, along with these profiles, to a set of coding regions. The efficiency of the method is such that it can be applied to the whole genome. Experience has confirmed that this method regularly yields improvements in identification of enzymatic activity, even on well-annotated genomes.
Clearly, genomic analysis is not limited to the organisms in MicroB. Using Metabolic Pathway Builder, users can import one or more genomes in any standard format, either in the form of raw sequences or previously annotated. They can then apply well-founded, Markov-based methods to rapidly predict the full set of coding regions of a genome. Additionally, they can compare sequences or sets of contigs, identify specific regions of a strain, assemble contigs, and identify and characterize biologically significant regions in functional terms. The software is designed to evolve, and can easily incorporate additional methods or tasks.
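To give a feel for what a Markov-based coding-region score looks like, here is a deliberately simplified sketch. It is not the method used in Metabolic Pathway Builder, whose details are not given here: it assumes a first-order Markov chain, add-one smoothing, and tiny made-up training sequences, whereas practical gene finders typically rely on higher-order, codon-aware models trained on real genomes. The function names and sequences are hypothetical.

```python
import math

ALPHABET = "ACGT"

def train_markov(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | previous) with smoothing."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }

def log_odds(seq, coding, background):
    """Sum of log2 probability ratios over transitions; > 0 favors the coding model."""
    return sum(
        math.log2(coding[p][n] / background[p][n]) for p, n in zip(seq, seq[1:])
    )

# Made-up training data, for illustration only.
coding_model = train_markov(["ATGGCGGCGGCGTAA", "ATGGCCGCGGCGTGA"])
background_model = train_markov(["ATATATTTAATATAAT", "TTTAAATATATTAAAT"])

for window in ("GCGGCG", "ATATAT"):
    print(window, round(log_odds(window, coding_model, background_model), 2))
```

Sliding such a score along a genome and keeping the windows that score above zero is the basic idea; the smoothing keeps unseen transitions from producing a division by zero or an infinite log ratio.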
Once again, the key word, even for prediction, is integration. Genostar's software is based on technology that is particularly innovative in the way data are represented and how the results can be manipulated. The coherent and uniform model, based on entities and relationships, is what makes the flexible exchange among methods possible. For instance, the results of a method to predict coding regions can be immediately viewed on a genomic map, and subsequently used as input for the method that predicts enzymatic activity. These predictions are automatically taken into account by the queries that search for reactions and metabolic pathways. Researchers do not have to spend time converting data from one format to another. They can rely on efficient, built-in mechanisms that verify the compliance between data and methods. The result is that the discovery process is both accelerated and improved.
A systems biology perspective
Is it possible now to pursue the analytical process several steps further, to improve our knowledge and develop a more complete understanding of cellular functions and processes? In other words, is it possible to extend the static vision of entities and their interactions that software such as Metabolic Pathway Builder can provide toward a larger view that provides understanding of the dynamics of these interactions?
Genostar and the French National Institute for Research in Computer Science and Control (INRIA) have taken up this challenge with the joint development of GNA (Genetic Network Analyzer). GNA is designed to model and simulate molecular interaction networks through an innovative approach using piecewise-linear differential equations.
The principle is to associate with each gene a state variable representing its level of expression, typically measured by the corresponding mRNA concentration. In an ideal world, the rate of change of each variable would be linked to the values of the other variables through kinetic models in the form of differential equations. Currently, however, the available knowledge about genetic regulatory networks is not sufficient to reliably determine the values of many of the parameters of these equations.
GNA circumvents this problem by reducing the kinetic models to piecewise-linear differential equations (2). The resulting models are linear within each region of the state space delimited by the threshold concentrations, which permits analytical calculation of the steady states of the models. With qualitative knowledge of the inequalities between the parameters, GNA is able to simulate the behaviour of genetic regulatory networks. The concentrations of gene products such as mRNA are not quantitatively predicted over time, but the qualitative expression level and trend can be computed, displayed and compared with experimental results (3). Thus, while the model remains under-informed, its validity can be verified and it can evolve iteratively through comparison with experimental results.
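The following sketch illustrates the class of models involved, not GNA's algorithm. GNA works symbolically from parameter inequalities, whereas this example picks arbitrary numerical values and integrates with a plain Euler scheme. The network (two genes that repress each other), the parameter names k, gamma and theta, and their values are assumptions chosen only to satisfy the ordering 0 < theta < k/gamma. Within each region bounded by the thresholds the step functions are constant, so the equations are linear there.

```python
def step_minus(x, theta):
    """Step function s^-(x, theta): 1 below the threshold, 0 at or above it."""
    return 1.0 if x < theta else 0.0

def simulate(xa, xb, k=2.0, gamma=1.0, theta=1.0, dt=0.01, steps=2000):
    """Euler integration of a two-gene mutual-inhibition piecewise-linear model.

    dxa/dt = k * s^-(xb, theta) - gamma * xa
    dxb/dt = k * s^-(xa, theta) - gamma * xb
    """
    for _ in range(steps):
        dxa = k * step_minus(xb, theta) - gamma * xa
        dxb = k * step_minus(xa, theta) - gamma * xb
        xa, xb = xa + dt * dxa, xb + dt * dxb
    return xa, xb

def qualitative(x, theta=1.0):
    """Reduce a concentration to its qualitative level relative to the threshold."""
    return "high" if x > theta else "low"

for start in ((1.5, 0.2), (0.2, 1.5)):
    xa, xb = simulate(*start)
    print(start, "->", qualitative(xa), qualitative(xb))
```

With these assumed values, whichever gene starts above the threshold ends up expressed while the other is switched off. It is this qualitative outcome, rather than the exact concentrations, that would be compared with experimental observations.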
Genostar continues its close collaboration with its academic partners to develop tools for analyzing the complex state graphs that are the output of the qualitative simulations.
GNA paves the way for applied systems biology in keeping with the available data and knowledge on gene networks, and with the experimental devices available to measure gene expression. Several models demonstrate the adequacy and relevance of GNA in the study of networks with several dozen genes, including the carbon starvation response in E. coli (4), quorum sensing in P. aeruginosa (5), and the initiation of sporulation in B. subtilis (6).
Bioinformatics can significantly accelerate and improve the R&D process. Integrated solutions provide a structured, easily accessible foundation of validated methods and data. Metabolic Pathway Builder, MicroB and GNA enable life scientists to apply their scientific expertise and time more productively to answering their research questions, turning data into knowledge.
Genostar designed the MicroB database in collaboration with the French National Institute for Research in Computer Science and Control (INRIA), the Swiss Institute of Bioinformatics and the Laboratory of Alpine Ecology (LECA).
Genostar collaborates with the Ibis bioinformatics team at INRIA's Rhône-Alpes research center, and with the Adaptation and Pathogenicity of Micro-organisms Laboratory at the University of Grenoble, Joseph Fourier.
(1) Durand, P., Médigue, C., Morgat, A., Vandenbrouck, Y., Viari, A., Rechenmann, F., "Integration of data and methods for genome analysis", Current Opinion in Drug Discovery and Development, 6(3):346-352, 2003.
(2) de Jong, H. and Ropers, D., "Strategies for dealing with incomplete information in the modeling of molecular interaction networks", Briefings in Bioinformatics, 7(4):354-363, 2006.
(3) ---, "Qualitative approaches towards the analysis of genetic regulatory networks", System Modeling in Cellular Biology: From Concepts to Nuts and Bolts, ed. Z. Szallasi, V. Periwal, J. Stelling, MIT Press, Cambridge MA, 125-148, 2006.
(4) Batt, G., Ropers, D., de Jong, H., Geiselmann, J., Mateescu, R., Page, M., and Schneider, D., "Validation of qualitative models of genetic regulatory networks by model checking: Analysis of the nutritional stress response in Escherichia coli", Bioinformatics, 21(Suppl 1):i19-i28, 2005.
(5) Usseglio Viretta, A. and Fussenegger, M., "Modeling the quorum sensing regulatory network of human-pathogenic Pseudomonas aeruginosa", Biotechnology Progress, 20(3):670-678, 2004.
(6) de Jong, H., Geiselmann, J., Batt, G., Hernandez, C., Page, M., "Qualitative simulation of the initiation of sporulation in Bacillus subtilis", Bulletin of Mathematical Biology, 66(2):261-99, 2004.