
When you consider that most PCs stand idle for over 90 percent of the average day, it seems logical to find ways to take advantage of the wasted resources.
Grid technology has offered a solution by unifying pools of storage systems and networks into one ‘supercomputer’, regardless of their physical location or access point.
Pharma was one of the first industries to pioneer this new technology, recognizing its cost and time-effective benefits when processing, cleaning, cross-tabulating and comparing massive amounts of data vital to drug development. But whilst giants such as Johnson & Johnson are now reaping the benefits of running applications on the grid, the jury is still out on whether there should be more widespread adoption in the industry – not least because there is some confusion over what grid technology is and what adoption actually entails. NGP asked Simon McIntosh-Smith, Joerg Schwarz and John Powers to deliver their verdict.
Joerg Schwarz is Director of Healthcare and Life Sciences at Sun Microsystems, where he is responsible for global business and sales development. His industry group identifies industry trends, analyzes their impact on business processes and underlying IT priorities, and develops and delivers appropriate end-to-end solutions together with partners and all relevant Sun business units, client solution practices and regional sales organizations.
John Powers is Founder and President of Digipede Technologies. An economist with more than two decades of entrepreneurial experience, he has guided the development, marketing, and sale of software systems for more than 10 years. Prior to Digipede, Powers led two firms, most recently founding Energy Interactive (EI), a pioneering energy information software and services company that developed some of the first web-based information services for the electric utility industry.
Simon McIntosh-Smith is Director of Architecture & Applications at ClearSpeed Technology plc, and has been designing microprocessors and parallel applications for 15 years. He started his career at Inmos and ST Microelectronics, before moving to PixelFusion in 1999, and becoming a founding member of ClearSpeed in 2001. McIntosh-Smith now leads the development of ClearSpeed's application accelerators for pervasive high-performance, high-density computing.
NGP. A need for high performance computing has made grid technology especially appealing for pharma companies. What are the obstacles to more widespread adoption?
JS. People! The basic technologies for doing clusters and grids of various types are available and they work. The technologies will continue to evolve, of course, adding more security, better resource allocation, variable pricing, mixed mode computing, and so forth. The hardest part in setting up a grid is to get the participants to agree to offer their equipment to the pool, to set policies for “fair” access and accounting, to agree on protocols and procedures, and so forth. When multiple institutions are involved, the difficulties are compounded by different methods of operation and accounting.
Nonetheless, there are many examples of successful grids where these problems have been overcome. For example, Sun is involved in the White Rose Grid in the UK and the ACENET grid in Canada, among many others.
JP. In my view, the adoption of grid technology has been slow primarily because of the unnecessary complexity built into most grid solutions. There’s a perception in the market that ‘grid’ means a big consulting project followed by a bigger application integration project, followed by an even bigger IT project. It’s up to the vendors in this space to put a much higher priority on ease of deployment, ease of integration, and ease of use.
A second problem we’ve seen is application-specific distributed computing. Because of the complexity of early general-purpose grids, independent software vendors (ISVs) understandably went off on their own and built a certain amount of distributed computing capability into their own applications. But if your primary area of expertise is in biology or chemistry, you usually end up creating pretty limited distributed computing functionality, without the robustness most customers require. Further, customers are not interested in deploying a new grid every time they roll out an application – they want to use the same infrastructure to support all distributed applications.
As a result, customers often spend enormous sums building their own distributed computing environments, only to end up with limited functionality. Widespread adoption will only come when grid becomes radically easier to deploy and use.
SM. Grid technology delivers significant advantages but there are many different kinds of grids, so the first obstacle to adoption is simply one of understanding the different kinds of grid architectures and how they relate to the needs of your organization.
Grid technology is used to coordinate distributed compute and data resources and to make their capabilities available to groups of users that may themselves be distributed. While this provides the potential advantages of making efficient use of resources and connecting users to the tools they need, it also introduces complexities that do not need to be addressed in closely coupled, localized installations. This is the next obstacle. Each organization needs to weigh the benefits of implementing grids in their specific environment against the effort required to provide an effective, transparent service to their user communities.
NGP. How has take-up grown in the last few years and how is this likely to change over the next few years?
JP. Frankly, I’m not sure take-up has grown all that much recently. As we just discussed, widespread adoption will only take place when (a) customers begin to reap real benefits of grid, and (b) vendors make simplicity a priority.
The uptake in financial services has been very quick; grid is now a “must-have” in that market. In pharma, adoption has been much slower. There are some grid projects that have gone forward, but most have not progressed beyond a single application. In my opinion, a lot of grid vendors have ignored some basic market facts – most customers can’t spend months or years on grid IT projects, most applications can’t be grid-enabled from the command line, most computers run Windows, ease of use is a higher priority than multi-OS operation, and so on. Until grid vendors do a better job delivering value quickly, customers will continue to either wait or ‘roll their own’.
SM. A fundamental change has occurred in the commercial HPC space over the last few years. Industry standard systems have come to dominate the market by combining excellent performance with the cost benefits of volume markets. Grid technology has added to this trend by making more resources accessible to more users.
While scaling capability by adding more systems and CPUs to clustered architectures has been the best solution for many users in recent years, the costs in terms of power consumption, cooling, space and overall management complexity are now driving the need for more sophisticated approaches. Today’s visionaries are pushing the next wave of HPC system architectures by deploying hybrid clusters. These combine a range of complementary technologies to deliver the optimum match of overall system characteristics to required workloads while paying careful attention to the environmental and economic impact of those systems.
A typical hybrid architecture includes clusters of low cost servers, systems that combine multiple, multi-core CPUs with a large distributed or global shared memory capability and focused complementary solutions that accelerate functions that would otherwise be inefficient bottlenecks in a system designed for general purpose applications.
JS. A significant number of grids have been installed, and the rate of uptake will accelerate as grids become popular in the general business community as well as in research. Also, Sun has launched the first retail grid that is easy to access and use (network.com). You just pay US$1 per CPU per hour by credit card or PayPal account, without complicated contracts. There is a lot of interest in this form of the grid, as well as other variants that Sun is developing (developer grids, hosted grids, and so forth). The Sun Retail Grid is one of the first really practical manifestations of ‘computing as a utility’, which is an extension of one of the original founding principles of Sun: “the network is the computer.”
NGP. As technology constantly evolves, what advances have been made to make it easier to integrate and utilise data from disparate offices?
SM. Grid technology has evolved to the point where secure and dependable access to distributed systems and data can be used for certain classes of applications as a matter of routine. Initiatives such as the Human Genome Project and Folding@home [a distributed computing project designed to perform computationally intensive simulations of protein folding] demonstrate that progress. The next steps lie in the effective use of accelerator technologies to reduce file access latencies and the time and operational cost of lengthy simulations in disciplines like protein folding. Finally, to fully benefit from the heterogeneous resources that are inherent to grid architectures, applications need to be built on standard APIs and libraries that can provide a common method of accessing disparate technologies.
JP. Is it easier? Many businesses haven’t noticed. But sharing data hasn’t been the biggest problem; making that data available for the right applications in a reasonable workflow without unreasonable expertise requirements – that’s been quite a challenge. Our customers tell us they’re sick of spending such a large fraction of their time hacking in Perl instead of doing real science. How is it possible that in the 21st century, the people who are supposed to be discovering cures for disease spend half their time debugging scripts? So, associating data with applications in workflow without a whole lot of specialized skills has to be a priority.
JS. As for getting access to the data, Web Services and Service Oriented Architectures (SOA) are increasingly useful and provide a number of advantages for getting data out of disparate applications. Sun bought SeeBeyond and their products are now part of the Java Enterprise Suite, called JCAPS (Java Composite Application Platform Suite).
NGP. Which areas within the pharma industry are particularly benefiting from grid technology right now and what potential does it have for the future?
SM. It is the nature of high performance computing that users have an insatiable demand for resources. Resource limitations stimulate the generation of new approaches. New approaches and access to additional resources create advances that generate new problems, which in turn challenge the limits of available resources. The ability to sequence and assemble the human genome is a testament to the success of grid technology today. Yet the simulation of protein folding remains beyond the reach of commercial viability for everyday business. Projects like the folding@home initiative demonstrate the value of grid technology in developing future mainstream techniques.
JP. Certainly any compute-intensive R&D activity can benefit today. Compound screening for drug discovery is the area that seems farthest along. But we’re amazed at the new applications customers present to us all the time. Image processing is a huge area – in fact, processing data from any form of sensor is a growing field.
At Digipede, web-based applications are the fastest-growing area we see today. More and more organizations are putting interesting applications online, both for customer service and for research collaboration. We’ve been working with clients who want to share very sophisticated analysis applications with colleagues throughout the world – and as you expose these applications to external use, you need great scalability tools to deliver high quality of service under high and variable loads. Grid computing provides the ideal architecture to handle such scale-out scenarios.
Mainstream enterprise applications – from business intelligence to reporting and printing – are also increasingly being targeted, and that’s an area that will see increasing growth over the next few years. Mainframe migration is driving some of this, but user demand for wider access to the enterprise’s best information and tools is the long-term driver. As you expose internal applications as services in a Service Oriented Architecture (SOA), you face many of the same scalability issues I highlighted above.
JS. Computational grids for discovery research are popular because they help solve some peak load problems. Customers are able to get jobs completed without buying and supporting ever-larger data centers. There are various approaches, from ‘cycle scraping’ of idle cycles on various machines, to optimum load balancing on clusters, to buying cycles on services such as Sun Retail Grid (network.com) and other variants such as hosted services.
NGP. Is security still a major concern? How is this being addressed?
JS. Security, broadly defined, is THE major concern in the life science industry! For example, biotech and pharma companies must protect terabytes of research data for patent and NDA filings, must protect clinical research and trials data in line with all privacy regulations (such as 21 CFR Part 11 and HIPAA), and must deal with all of the normal business data typical of any manufacturing company.
Security actually takes many forms, from physical security of assets, to fault tolerance and error correction (a type of operational security), to identity management, to data integrity and preservation, to regulatory compliance.
JP. Grid has the potential to expose existing flaws in security, and it’s critical to integrate grid security with best practices already in place in the enterprise. Despite the knock they take in public, Microsoft has done a great job with security over the past two years, and their current round of product rollouts is making grid security far easier to administer.
The security concerns get even more complex if more than one enterprise is involved. We don’t see many pharma customers exploring (external) utility computing seriously, and security is one of the primary reasons given. To date, customers have shown more willingness to use their own computing resources more effectively than to expose themselves to an additional security risk.
SM. It is difficult to conceive of a time when security will not be a major concern of any individual or organization. However, security is the domain of technology suppliers and IT professionals that provide and manage access to resources rather than component suppliers like ClearSpeed Technology that deliver the resources themselves.
NGP. In the past, costs and licensing were deterring some companies from using grid technology. Is this still the case? What improvements have been made?
SM. Grids are inherently more complex than standalone systems and complexity always has a cost, but the costs of grid software and the resources that are constituents of grids have reduced considerably. As organizations have learned to implement grids effectively, the complexity costs have been outweighed by the benefits of improved resource utilization and access. While one of the benefits of grid infrastructures is to optimize the use of expensive software licences, licensing models themselves have been changing to accommodate grid adoption and technology shifts like multi-core CPU architectures.
Today’s challenges have shifted from system acquisition and licensing costs to operational constraints largely related to the space and power provisioning limitations of facilities and associated energy consumption expenses. The capabilities of grid architectures have now become an important tool in managing those costs by connecting users to resources such as systems utilizing ClearSpeed’s energy-efficient Advance accelerator boards, which provide dramatic performance enhancements without any material effect on facilities costs or power consumption.
JS. Much of Sun's software is free to try and open sourced, with quite reasonably priced support. All of our software is, of course, “grid-enabled”. Our multi-threaded CPU products (such as CoolThreads) have special prices for Oracle licensing, for example, and are optimised for grids. Other software vendors are slowly but steadily adopting a similar pricing approach for multi-threaded/multi-core products and grid-enabled applications.
JP. Cost is only justified by value. In a world where grid delivers marginal value only after arduous consulting projects, adoption will remain slow and customers will be highly sensitive to price. In a world where grid delivers great value out of the box, cost and licensing issues will become secondary.
The licensing models of many grid vendors remain perplexing, and the licensing models of other ISVs even more so. Customers tell us they want transparency in pricing, not complex negotiated agreements. One of the best decisions we made, in retrospect, was publishing our prices on our website, in a simple storefront where even large enterprise customers can get their pricing questions addressed.
As for ISVs with specialized applications, we continue to work with them on more rational licensing practices. Vendors who put in the effort to grid-enable their software also need to be open to aggressive pricing for processing nodes if they want wider adoption. In mainstream enterprise applications, there’s still a long way to go.