Johnson & Johnson’s Qingqin Li tells NGP how bioinformatics helps us to understand biological processes.
“We could use the information derived from this type of study to better understand the relevant pathway and the relevant target”
-Qingqin Li
One of the primary goals of bioinformatics is to increase our understanding of biological processes by developing and applying computationally intensive techniques. Qingqin Li, an informaticist and Senior Project Lead in the Department of Pharmacogenomics at Johnson & Johnson, uses an example from next generation sequencing to illustrate how this works: “This is perhaps more focused on the computational loads than the computation techniques, even though the techniques in the sequence area have been in use for more than ten years. Those techniques are primarily focused on long reads, whereas next generation sequencing is more of a short-read technology.
“This technology allows us to identify normal polymorphisms and mutations within the whole genome or a predefined specific genetic region. Imagine we have two related biological conditions: we could use this type of approach to identify the underlying molecular difference that might explain the biological process driving the phenotypic difference.
“With next generation sequencing technology, each platform could generate hundreds of millions of sequence reads in a single sequencing reaction. This poses a big computational challenge for us. To address it, we set up a cluster computing environment so that we could divide the computational jobs into chunks. For example, for the human genome, we could divide the work by chromosome so that we could send out the computational jobs to different computing nodes simultaneously. Then, once the jobs are done, we could assemble the results back into a coherent result.
“Anybody who is working with next generation sequencing data would need to have some sort of infrastructure like this. This was not as much of an issue before we got into next generation sequencing technology, since the traditional server was sufficient to deal with the computational load. But as soon as we got into next generation sequencing, we realized that we would not be efficient if we did not have a cluster computing type of environment.”
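The divide-by-chromosome approach Li describes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the read records, the function names and the placeholder per-chunk analysis are hypothetical, and a local thread pool stands in for the separate computing nodes of a real cluster, where a job scheduler would dispatch each chunk.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical read records: (chromosome, position, sequence).
READS = [
    ("chr1", 100, "ACGT"),
    ("chr2", 250, "GGCA"),
    ("chr1", 300, "TTAG"),
    ("chrX", 50, "CCGT"),
]


def split_by_chromosome(reads):
    """Divide the full read set into one chunk per chromosome."""
    chunks = defaultdict(list)
    for chrom, position, sequence in reads:
        chunks[chrom].append((position, sequence))
    return dict(chunks)


def analyse_chunk(item):
    """Placeholder analysis for one chunk: count its reads and bases.

    A real pipeline would run alignment or variant calling here.
    """
    chrom, reads = item
    summary = {
        "reads": len(reads),
        "bases": sum(len(sequence) for _, sequence in reads),
    }
    return chrom, summary


def run(reads, max_workers=2):
    """Send each chunk to a worker, then assemble one combined result."""
    chunks = split_by_chromosome(reads)
    # The thread pool is a local stand-in for cluster nodes (an assumption).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(analyse_chunk, chunks.items()))


if __name__ == "__main__":
    print(run(READS))
```

Splitting by chromosome is convenient because the chunks are independent of one another, so the final assembly step only has to merge the per-chromosome results.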
Genetic variations
At Johnson & Johnson, the bioinformatics activities Li is involved in center on pharmacogenomics. “All of these activities are related to the fact that there are genetic variations within the human genome,” she explains. “While a few are critical to biological function, the majority of them may not be.
“Our job is to try to find the few that have phenotypic consequences, whether as disease risk, differential therapeutic efficacy or adverse events upon therapeutic intervention. People sometimes describe this as trying to find the needle in the haystack.
“The first activity is focused on drug targets. You can imagine that if there are sequence variations in the drug targets, some of them might interfere with the binding of the therapeutic agent; if there’s inadequate binding, you might expect insufficient efficacy.
“What we do is to systematically provide the target variability information so that this type of information can be taken into consideration during screening assay design and during high throughput screening. After all, that's the first step in identifying the lead compound for therapeutic intervention.”
Li and her colleagues also work on discovering biomarkers that affect efficacy and adverse events, using clinical samples. The team collects DNA samples from clinical trials, and through collaboration with the clinical and statistics teams, they also have access to the clinical data. They then perform experiments to genotype individuals by looking at a set of predefined genetic variants using either a candidate gene approach or a genome-wide association study approach.
“Both approaches allow us to look at the genetic sequence variation in a predefined set of loci,” Li points out. “Typically they are the ones that are common in the study population. People typically describe common as having a frequency greater than five percent in the population.
“By combining the genotype data derived from these loci and comparing that to the clinical information using statistical models, we are able to identify markers that impact efficacy or side effects. Because of the nature of this type of marker, we typically call this a common disease/common variant approach to identifying the biomarker.
“Another area is next generation sequencing. The next gen sequencing approach allows us to look at the rare variants within the human genome so that we can sequence a pre-selected panel of individuals and identify the rare variants, which include those not captured in the public variation databases. With the rare variants discovered, we are then in a position to do the association studies again, but now looking at the rare variants.”
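The distinction between common and rare variants that Li draws can be made concrete with a short Python sketch. It assumes each genotype is coded as the number of copies of the alternate allele (0, 1 or 2) carried by a diploid individual, and it applies the five percent frequency cut-off mentioned above. The variant names and genotype data are invented for illustration.

```python
COMMON_THRESHOLD = 0.05  # the "greater than five percent" convention


def minor_allele_frequency(genotypes):
    """Return the frequency of the less common allele at one locus.

    genotypes: alternate-allele counts (0, 1 or 2), one per individual.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    total_alleles = 2 * len(genotypes)  # two alleles per diploid individual
    alt_frequency = sum(genotypes) / total_alleles
    return min(alt_frequency, 1.0 - alt_frequency)


def classify_variants(genotype_table, threshold=COMMON_THRESHOLD):
    """Label each variant 'common' or 'rare' by minor allele frequency."""
    labels = {}
    for variant_id, genotypes in genotype_table.items():
        maf = minor_allele_frequency(genotypes)
        labels[variant_id] = "common" if maf > threshold else "rare"
    return labels


# Invented genotypes for 20 individuals at three hypothetical loci.
GENOTYPES = {
    "variant_a": [0] * 14 + [1] * 5 + [2] * 1,  # 7 of 40 alleles
    "variant_b": [0] * 19 + [1] * 1,            # 1 of 40 alleles
    "variant_c": [2] * 19 + [1] * 1,            # reference allele is the minor one
}

if __name__ == "__main__":
    print(classify_variants(GENOTYPES))
```

In a panel this small, a variant seen only once already sits near the threshold, which hints at why sequencing a panel of individuals is needed to find rare variants reliably.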
Development
How then is the information that comes from these activities used in the drug development process? “Imagine we have a biomarker that could be validated in different clinical populations and has a good enough clinical effect size,” Li says. “That would mean that the subjects carrying the biomarker, for example, have a much better response than the overall population or have fewer adverse events. If we have a marker like that, we could potentially use it to stratify the clinical population and use it in the subset of the population that has a better benefit/risk ratio.
“In addition, we could use the information derived from this type of study to better understand the relevant pathway and the relevant target for a given therapeutic area and feed it back into discovery as a next generation of drug targets.”
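The idea of stratifying a clinical population by biomarker status can be sketched in a few lines of Python. The subject records below are invented, not clinical results, and the sketch only compares raw response rates; a real analysis would rely on statistical models and on validation in independent populations, as Li notes.

```python
# Invented subject records: (carries_biomarker, responded_to_therapy).
SUBJECTS = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, True), (False, False),
    (False, False), (False, False), (False, False),
]


def response_rate(subjects):
    """Fraction of subjects in the group who responded."""
    if not subjects:
        raise ValueError("group is empty")
    responders = sum(1 for _, responded in subjects if responded)
    return responders / len(subjects)


def stratify_by_biomarker(subjects):
    """Compare response rates overall, in carriers and in non-carriers."""
    carriers = [s for s in subjects if s[0]]
    non_carriers = [s for s in subjects if not s[0]]
    return {
        "overall": response_rate(subjects),
        "carriers": response_rate(carriers),
        "non_carriers": response_rate(non_carriers),
    }


if __name__ == "__main__":
    for group, rate in stratify_by_biomarker(SUBJECTS).items():
        print(f"{group}: {rate:.2f}")
```

A marker is only useful for stratification when the gap between carriers and the overall population is large enough to matter clinically, which is the "effect size" condition in the quote above.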
The field of bioinformatics is in a constant state of development, and Li believes it will evolve in line with improvements in technology and will address the additional computational challenges posed by this new technology. “This has been my experience in the sequencing field, in gene expression and now in the genetic area. For example, in the genetic field ten years ago, people were probably still using microsatellite markers. But with the International HapMap Project taking place and also the advancement of chip technology, we see new chip platforms coming onboard, allowing scientists to look at 10,000 markers at a time and then hundreds of thousands of markers and now millions of markers at a time.
“The new methodology needed to handle this type of data will drive the bioinformatics field to advance and address the questions posed by the new technology.
“Bioinformatics is an interdisciplinary field. It will continue to involve people from different fields such as molecular biology, statistics, computer science and mathematics, working together to come up with new analytic methods and information platforms to address biological questions.”
Qingqin Li is an informaticist and Bioinformatics group leader in the CNS/Internal Medicine franchise Department of Pharmacogenomics at Johnson & Johnson Pharmaceutical Research and Development, LLC. She has been working at the company since 2002, and is responsible for successful execution of data management, analysis and reporting of large-scale exploratory genetic association research projects, as well as enablement of next generation sequencing technology.
