"Concise industry news from the US pharmaceutical industry..."

“Bio and life sciences researchers are not computer scientists”

Penguin Computing | www.penguincomputing.com

NGP: A need for high performance computing has made grid technology especially appealing for pharma companies, but many still struggle to get it right. What are the key things to bear in mind when embarking on a grid computing project?

DJ: Two things: keep it simple and do not take ownership away. First, keeping it simple applies to both end-user and admin transparency. The correct solution will not impose re-factoring of the workload nor will it require researchers to learn new approaches to workload management. As far as possible, the ideal grid should allow continued use of familiar job submission, monitoring, and management tools. Grids can be set up in this manner today with the right tool selection.

Second: ownership. Quite often, grid projects are approached strictly from the point of view of theoretical performance gains but fail to take into account the importance of key human factors at both the end-user and department manager level. While increasing utilization is important, it is critical to allow all participants to maintain some level of control or ownership over their ability to deliver. This ownership allows them to customize their environment, optimize their policies, and dedicate or re-assign resources as needed to meet local deadlines and needs. Taking that ownership away leads to a sense of futility and frustration.

The right solution will maintain the low-level sovereignty needed by researchers and managers while adding new access upon this foundation. This model greatly increases ‘buy-in’ from all parties and makes the grid an asset.

PN: One key thing to bear in mind is that grid technology, while useful for many applications, is often awkward for data-intensive life science and bioinformatics applications. Moving large genomic or cheminformatic databases between data centers raises issues of data synchronization and bandwidth cost. In fact, for some organizations, local, deployable, and easy-to-manage clusters coupled with secure portals may provide a more cost-effective solution.

NGP: What trends have you seen in terms of take-up over the last few years, and how is this likely to change in the short to medium-term? What are the obstacles to more widespread adoption?

PN: An undeniable recent trend is that data sets have been getting larger and larger. This requires massive computing power to conduct meaningful analysis. Many organizations choose clusters to meet that need. The obstacle involved is that they often go the ad hoc route and build their own cluster.

Building your own cluster can be daunting and error-prone because of the complexity involved in procuring, installing, and configuring compute clusters. It also typically slows researchers down, because of the time required to put the cluster together and get applications running on the compute nodes. This effort significantly lengthens time to discovery, the exact opposite of why you want the technology in the first place.

On the other hand, a properly designed cluster can provide ease of deployment and expansion, single-point administration, and a guarantee of data consistency, and can have a positive impact on time to discovery for even the largest data sets. It is just a question of how you get there.

DJ: The adoption of grid is definitely accelerating. As for obstacles, we see two issues frequently raised by our customers. First, while the ability to enable sharing exists through a number of grid products, what people are starting to realize is that the ability to truly control, manage, and even optimize sharing is key. A second major obstacle is the ‘rip-and-replace’ mentality of some solutions, requiring existing infrastructure and technologies to be thrown away. In many environments, this infrastructure represents years of skilled effort encapsulating countless nuances of the way an organization does business. Getting buy-in from a director to put his productivity at risk for six to 12 months while a new foreign technology is introduced may be little more than career suicide.

We have found that our customers really resonate with an alternate approach to grid, which leverages existing infrastructure and allows a grid project to be distributed across a number of small, transparent, and safe steps, where each step preserves existing capabilities while adding new grid benefits. No step is ever taken which cannot be backed out of in a very rapid and transparent manner. And that transparency is key. When all is said and done, users should be able to interact with the grid just as they did before with the cluster.

NGP: Are you witnessing any particularly innovative uses of high performance computing technology right now? And what potential does HPC have for the future of pharma?

PN: Innovative uses of HPC we are seeing today range from target identification using genomic and proteomic applications to high throughput lead selection using docking programs. This variety is because HPC widens the possibilities for discovery.

In addition, the use of computational fluid dynamics and finite element analysis is growing in the pharma arena. For example, we now see these methods applied to chemical and bio reactors, and the newest and widest use of finite element analysis is in packaging design.

The future potential for HPC in pharma is huge because the application of HPC technology in pharma enables organizations to transform the way discovery research is conducted.

DJ: We are excited about new technology enabling the creation of internal corporate on-demand computing centers. These on-demand centers allow department clusters the ability to dynamically request and provision private, dedicated resources from the shared pool. The on-demand management software guarantees that high-level corporate objectives are met while allowing individual departments the ability to adapt to changing circumstances. Basically, they allow a department cluster to manually request new resources now or in the future, or better yet, to automatically and dynamically grow with workload or replace failed resources on the fly. New resources are customized to match the department cluster and are integrated as a local resource. It is completely transparent, fully automatic, and eliminates the complexities and confusion often found in grids. For all intents and purposes, departments have complete control over a local cluster, which can now integrate remote resources and grow, shrink, or change as needs arise.
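
The grow, shrink, and replace behaviour described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the interface of any real product: the class names `SharedPool` and `DepartmentCluster`, the method names, and the rule that demand is measured in whole nodes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SharedPool:
    """Hypothetical corporate pool of spare nodes."""
    free_nodes: int

    def lease(self, wanted: int) -> int:
        """Grant up to `wanted` nodes and return how many were granted."""
        granted = max(0, min(wanted, self.free_nodes))
        self.free_nodes -= granted
        return granted

    def give_back(self, count: int) -> None:
        """Return previously leased nodes to the pool."""
        self.free_nodes += count


@dataclass
class DepartmentCluster:
    """Hypothetical department cluster that owns some nodes and borrows others."""
    local_nodes: int
    borrowed_nodes: int = 0

    @property
    def capacity(self) -> int:
        return self.local_nodes + self.borrowed_nodes

    def fail_local(self, count: int) -> None:
        """Record that `count` locally owned nodes have failed."""
        if not 0 <= count <= self.local_nodes:
            raise ValueError("count must be between 0 and local_nodes")
        self.local_nodes -= count

    def rebalance(self, pool: SharedPool, demand: int) -> None:
        """Borrow nodes when demand exceeds capacity, release them when it drops.

        Assumption: `demand` is the number of nodes the current workload needs.
        Only borrowed nodes are ever released; local nodes stay with the department.
        """
        if demand > self.capacity:
            self.borrowed_nodes += pool.lease(demand - self.capacity)
        else:
            surplus = min(self.borrowed_nodes, self.capacity - demand)
            pool.give_back(surplus)
            self.borrowed_nodes -= surplus


pool = SharedPool(free_nodes=10)
dept = DepartmentCluster(local_nodes=4)

dept.rebalance(pool, demand=7)   # workload grows: borrow from the pool
dept.fail_local(1)               # a local node fails
dept.rebalance(pool, demand=7)   # the failed node is replaced from the pool
dept.rebalance(pool, demand=2)   # workload shrinks: borrowed nodes go back
print(dept, pool)
```

The department keeps control of its own nodes throughout; only the borrowed share moves in and out of the shared pool.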

NGP: Pharma is increasingly becoming a ‘team sport’, with different organizations often working together to solve complex problems. Does this raise any issues with regard to the integration, analysis and sharing of data? And how can grid computing address some of these issues?

DJ: While grid is a great enabler for collaboration – allowing data and compute resource sharing – more must be done. Much as dynamic chatrooms can be created to bring like-minded people together into a private, personalized space, support for virtual organizations must continue to improve. This support must allow shared resource and data access, environment customization and a more cohesive dynamic environment that can be created and modified to address the changing needs of the group.

PN: The single biggest issue we face today is the same one we have faced historically – the dearth of reliable distributed data management systems or distributed file systems. Provisioning CPU cycles is not the problem today. Provisioning data is.

It is not uncommon to have to manage 500 Gb data sets and move them around within a discovery team. That is a challenge because file systems today are highly fragmented: none is ideal for all situations, and many lack reliability. Solving this problem is one of the biggest challenges pharma faces.
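
To give a rough sense of why moving data is the harder problem, the sketch below estimates a best-case transfer time. It is a back-of-the-envelope illustration only: it assumes the data set is 500 gigabytes, the link speeds are example values not taken from the interview, and latency, protocol overhead, and storage speed are ignored apart from a single optional `efficiency` factor.

```python
def transfer_hours(size_bytes: float, link_bits_per_s: float, efficiency: float = 1.0) -> float:
    """Best-case hours needed to move `size_bytes` over a link.

    `efficiency` is the fraction of the nominal link speed actually achieved
    (an assumption standing in for protocol overhead and contention).
    """
    if size_bytes < 0 or link_bits_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0, efficiency in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 3600


GB = 10**9
DATASET_BYTES = 500 * GB  # assumption: the 500 Gb figure read as 500 gigabytes

# Example link speeds in bits per second (illustrative values only).
LINKS = {"100 Mbit/s": 100e6, "1 Gbit/s": 1e9, "10 Gbit/s": 10e9}

for name, speed in LINKS.items():
    print(f"{name}: {transfer_hours(DATASET_BYTES, speed):.2f} hours at full link speed")
```

Even at full nominal speed, every copy of the data set ties up the link for the whole transfer, which is why synchronizing several copies across a team becomes costly.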

NGP: Is the security of sensitive data still seen as a hindrance to the proliferation of grid computing? How is this area being addressed?

DJ: There is no question that security is a major factor blocking many otherwise valuable applications of grid. While network encryption and credential management are important, they are only a first step. Many organizations are hesitant to reveal any aspect of their computing effort to outside sources, and this includes file names, job names, or even the fact that they are utilizing resources.

One approach we have used with success allows the creation of a full per-customer ‘Virtual Private Cluster’ (VPC) that embeds all aspects of usage inside a private and customized environment. With this model the outside world sees resources delivered to a closed black box. Inside that box a pristine, dedicated, and private environment is created. These VPCs are also well suited for security in that they can utilize dynamic VLANs, dynamic host security, virtualization and other security features.

PN: Security is a primary concern in the pharmaceutical community, understandable given the characteristics of the industry and the value of early discovery information. A properly designed grid architecture ensures the security of the computing resources and data.

NGP: As organizations progress beyond their initial grid deployments and extend to other parts of the enterprise (or even beyond the enterprise itself), is there a need for more standards and greater interoperability?

PN: Yes, there is a significant need for more standards and greater interoperability. It is especially important with respect to reliable, scalable distributed file systems.

DJ: As with other technologies, the success of grid will mandate continued standards efforts. These will come in protocols for job and data transfer, credential management, and the publishing of resource and service offerings. Right now, the technology may be moving faster than the standards and this may lead to some bumpy interfaces in the future.

PN: On a macro level, I think we need to remember that most bio and life sciences researchers are not computer scientists, and treating them as if they are has an impact on their productivity. It is critical to the industry that computer systems aimed at these scientists be as easy to use as workstations.

There has been a great deal of technology in recent years targeting the pharma space, including elaborate grid computing schemes. But these are incredibly complex and involved, even for sophisticated IT professionals. It is difficult to translate these technologies, however exciting and innovative they are in their own way, into real-world pharma information technology. It must be intuitive and easy to use for researchers, and simple and reliable to implement for IT. The bottom line is that if it's not easy to use, technology is essentially worthless to this target market.


David Jackson is founder and CTO of Cluster Resources, Inc., a leading provider of cluster, grid, and utility computing management software. He has worked in high performance computing (HPC) for over 15 years and has designed and developed popular open source and professional software, including the commercial Moab line of products – the most used grid and workload management solution of Top 500 clusters. He is a founding member of the Global Grid Forum and a key member of the DOE’s Scalable System Software Initiative.

Pauline Nist joined Penguin Computing as SVP of Product Development and Management in January 2006. Before joining Penguin, Pauline served as VP of Quality for HP’s Enterprise Storage and Servers Division and immediately prior to that, as VP and General Manager for HP’s NonStop Enterprise Division, where she was responsible for the development, delivery, and marketing of the NonStop family of servers, database, and middleware software. Pauline is a graduate of the Yale University Executive Management Program and Lessons in Leadership at Harvard University. She is a member of the Society of Women Engineers.

