"Concise industry news from the US pharmaceutical industry..."
25 May 2011

Integrating Clinical Data

Hurley Consulting | www.hurleyconsulting.com

Integrating clinical data is often an arduous task. A complete and thorough assessment of any investigational product involves the integration and subsequent evaluation (summary and/or analysis) of all data that have been collected on the product. The time and effort required to accomplish what seems like a simple, straightforward task are often substantially underestimated. The primary explanation for the underestimate is that, while the summary and analysis component of the assessment is straightforward, the data preparation activities are often complex and labor-intensive due to lack of standardization.

There are many reasons for the intricacies of the data preparation. The main difficulties are usually attributable to the need to combine data from studies with different objectives and, as a result, databases with completely different structures. Such “data diversity” is expected between early Phase 1 and Phase 2 studies, adequate and well-controlled Phase 3 confirmatory studies, and product differentiation studies. However, the issue is often present even when combining data from similar Phase 3 studies.

Ideally, data for all studies being combined have been collected on common case report forms and have been stored either in a common database or in identical individual study databases. Admittedly, that ideal may be unrealistic in today’s world. A close approximation is if all of the study databases use the same variable names and have a similar data structure. When that happens, data integration is straightforward and can be accomplished in minimal time.

Such conformity is hardly ever the case. A major factor that inhibits conformity is the licensing of investigational drug products and marketed products, and the ongoing consolidation and joint-development alliances within the pharmaceutical industry. As a result, the studies that are being combined may have been conducted by more than one company, and the data are likely to be stored in completely different databases using completely different conventions. The task of combining the data into a common data structure can be formidable. That feat must be accomplished in order to integrate the data and provide an overall assessment of the drug.

A Case Study in Data Integration

We recently encountered just such a situation. There was a need to search safety data from 14 studies in order to evaluate the incidence of a specific event. The studies had been sponsored independently by two companies – one company sponsored eight of the studies, and the other company sponsored six of them.

The initial plan was to focus on adverse events and identify occurrences of the event of interest. After all of them had been identified, it could be determined whether they occurred while the subject was on-drug or off-drug. The effort was complicated because the adverse events were coded in COSTART in some of the studies and in MedDRA in others. The use of two coding dictionaries posed an additional, but easily manageable, challenge. (If it had been necessary, all of the verbatim terms would have been recoded to MedDRA.)

The event of interest, if it occurred, should have been classified as a serious adverse event (SAE). Thus, the first step in the project was to identify and then search the SAEs. Among the 14 studies, there were three distinct data structures for SAEs. One group of studies had a specific flag to identify SAEs; another group had a flag for each of the conditions that defined an SAE; and the third group did not have any identification of SAEs. In order to identify subjects with the event of interest, it was necessary to search 346 variables across the 14 studies.
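The three SAE structures described above can be reduced to one derived flag. The following is only a minimal sketch: the structure labels, the `serious` variable, and the criterion flag names are hypothetical and are not taken from the 14 study databases.

```python
from typing import Optional

# Hypothetical names for the per-criterion flags; real studies will differ.
CRITERION_FLAGS = (
    "death",
    "life_threatening",
    "hospitalization",
    "disability",
    "congenital_anomaly",
)


def is_serious(record: dict, structure: str) -> Optional[bool]:
    """Derive one seriousness flag from a study-specific SAE structure.

    Returns None when the study carries no SAE identification at all,
    so those records can be routed to manual review.
    """
    if structure == "single_flag":
        # Group 1: one explicit SAE flag per record.
        return record.get("serious") == "Y"
    if structure == "criterion_flags":
        # Group 2: one flag per condition that defines an SAE.
        return any(record.get(flag) == "Y" for flag in CRITERION_FLAGS)
    if structure == "no_flag":
        # Group 3: seriousness cannot be derived from the data.
        return None
    raise ValueError(f"unknown SAE structure: {structure}")
```

Returning `None` for the third group, rather than `False`, keeps "not serious" distinct from "not recorded", which matters when the goal is to avoid missing events.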

The next step was to search all adverse events and identify recorded events that may have been closely related to the event of interest. Identified records would need to be scrutinized to determine if they were associated with the event. The structure of the adverse event data had already been determined when the SAE search was being set up. The same 346 variables were searched, and events that indicated a potential occurrence of the event of interest were identified and further evaluated.

The complexity of the task increased again when it became obvious that the event of interest was not consistently recorded as an adverse event. In order to identify all occurrences, it would be necessary to search other components of the study databases. A search of the adverse event data sets alone would miss some occurrences of the event of interest, and the event would consequently be under-reported.

It was necessary to search virtually all text fields in all of the data sets for phrases or words that could be indicative of the event. A major effort was to define the search terms for phrases or character strings. Twenty-six search strings were defined and used in the subsequent search.
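A string search of this kind might look like the sketch below. The article does not list its 26 search strings or name the event, so the strings, data set names, and variable names here are invented placeholders.

```python
import re

# Invented placeholder strings; not the 26 strings used in the project.
SEARCH_STRINGS = ["faint", "syncop", "passed out"]


def compile_search(strings):
    """Build one case-insensitive pattern that matches any of the strings."""
    return re.compile("|".join(re.escape(s) for s in strings), re.IGNORECASE)


def find_hits(datasets, variables_to_search, pattern):
    """Return (data set, row number, variable, value) for every matching text field."""
    hits = []
    for dataset_name, records in datasets.items():
        for row_number, record in enumerate(records):
            for variable in variables_to_search.get(dataset_name, []):
                value = record.get(variable)
                if isinstance(value, str) and pattern.search(value):
                    hits.append((dataset_name, row_number, variable, value))
    return hits


# Hypothetical data sets, each a list of records keyed by variable name.
datasets = {
    "comments": [
        {"subject": "001", "comment": "Subject FAINTED after dosing"},
        {"subject": "002", "comment": "No complaints"},
    ],
    "medhist": [{"subject": "003", "history": "Prior syncopal episode"}],
}
variables_to_search = {"comments": ["comment"], "medhist": ["history"]}
hits = find_hits(datasets, variables_to_search, compile_search(SEARCH_STRINGS))
```

Matching on character strings such as a word stem, rather than whole words, is deliberately over-inclusive; every hit is only a potential occurrence that still needs review.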

The data from the 14 studies were stored in a total of 548 data sets. Those data sets contained 14,025 variables of which 8,866 were character variables. It was immediately obvious that it would not be necessary to search all of the character variables. Specifically, it was decided not to search variables of one or two characters in length. After removing those variables, 7,195 variables remained to be searched. Further assessment of the remaining variables revealed that some of them would not need to be searched (e.g., the verbatim adverse event terms that had already been “searched” during the targeted assessment of adverse events in the initial steps). Careful consideration led to a decision to search a total of 7,034 variables.
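The counts above imply that 1,671 character variables were dropped for being one or two characters long (8,866 minus 7,195) and a further 161 were excluded after review (7,195 minus 7,034). A selection step along these lines could be sketched as follows; the catalog layout and variable names are assumptions made for illustration.

```python
def select_searchable(variables, min_length=3, already_reviewed=frozenset()):
    """Keep character variables long enough to hold text and not yet reviewed."""
    selected = []
    for var in variables:
        if var["type"] != "char":
            continue  # numeric variables cannot hold free text
        if var["length"] < min_length:
            continue  # one- or two-character codes are skipped
        if (var["dataset"], var["name"]) in already_reviewed:
            continue  # e.g. verbatim AE terms searched in an earlier step
        selected.append(var)
    return selected


# Hypothetical variable catalog, one entry per variable in a data set.
catalog = [
    {"dataset": "ae", "name": "aeverb", "type": "char", "length": 200},
    {"dataset": "ae", "name": "aesev", "type": "char", "length": 1},
    {"dataset": "vitals", "name": "sysbp", "type": "num", "length": 8},
    {"dataset": "comments", "name": "comment", "type": "char", "length": 200},
]
searchable = select_searchable(catalog, already_reviewed={("ae", "aeverb")})
names = [var["name"] for var in searchable]

# Arithmetic implied by the counts reported in the article.
removed_for_length = 8866 - 7195
removed_after_review = 7195 - 7034
```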

A database was created to store information on all potential occurrences of the event of interest that were found through either the adverse event search or the string searches.

The next step was to look at each of the records in that data set and determine if the event of interest occurred while the subject was receiving study drug. The designs of the 14 studies varied, and it was necessary to review each of them individually to determine both the design and the data structure. Some of the studies included dose escalation and/or taper periods at the beginning or end of the “main treatment” period. Since the primary interest was in the dose used during treatment, it was necessary to decide if events in dose escalation or taper periods were on-drug or off-drug.

While the studies were all double blind, some of them had an open-label segment. Since it is well known that adverse events are reported more frequently in open-label periods, it was necessary to decide if events in these periods would be included as on-drug events.

After all decisions had been made, the dosing dates, i.e., treatment dates, were established for all subjects. Due to the multiple ways that dosing data were stored in the individual study data sets, this was the most time-consuming task of the project. It was then possible to determine the drug status (on-drug or off-drug) for each of the events.
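Once treatment dates exist for each subject, assigning drug status is a date comparison governed by the decisions made about escalation, taper, and open-label periods. The sketch below assumes a hypothetical layout in which each subject has a list of labeled periods; the period labels and the policy shown are illustrative, not the decisions actually made in the project.

```python
from datetime import date


def drug_status(event_date, periods, on_drug_kinds):
    """Classify an event as on-drug or off-drug.

    periods: list of (kind, start, end) tuples for one subject.
    on_drug_kinds: the period kinds that the project decided count as on-drug.
    """
    for kind, start, end in periods:
        if kind in on_drug_kinds and start <= event_date <= end:
            return "on-drug"
    return "off-drug"


# Hypothetical treatment periods for one subject.
periods = [
    ("escalation", date(2005, 1, 1), date(2005, 1, 14)),
    ("main", date(2005, 1, 15), date(2005, 4, 15)),
    ("taper", date(2005, 4, 16), date(2005, 4, 30)),
]

# Illustrative policy: only the main treatment period counts as on-drug.
main_only = {"main"}
status_main = drug_status(date(2005, 2, 1), periods, main_only)
status_escalation = drug_status(date(2005, 1, 5), periods, main_only)
```

Keeping the policy as an explicit argument makes it easy to rerun the classification if the decision about escalation, taper, or open-label periods changes.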

The final step was to merge individual subject information (e.g., age, race, sex, study treatment). As expected, the data structures varied among the 14 studies, and this too became an involved task. However, it was the most straightforward of all the data preparation activities. The subject information data set was created, and the subject data were merged with the event data. The identification of the events, along with information on the subjects with the events, was complete.
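The merge itself is simple once subject information sits in one common structure. A minimal sketch, assuming hypothetical `study` and `subject` keys that together identify a subject across the integrated data:

```python
def merge_subject_info(events, subjects):
    """Attach subject-level information to each event record."""
    by_key = {(s["study"], s["subject"]): s for s in subjects}
    merged = []
    for event in events:
        info = by_key.get((event["study"], event["subject"]), {})
        # Event fields take precedence if a name appears in both records.
        merged.append({**info, **event})
    return merged


# Hypothetical records for illustration only.
subjects = [
    {"study": "S01", "subject": "001", "age": 54, "sex": "F", "treatment": "active"},
]
events = [
    {"study": "S01", "subject": "001", "status": "on-drug"},
    {"study": "S02", "subject": "001", "status": "off-drug"},
]
merged = merge_subject_info(events, subjects)
```

The subject number alone is not used as the key, since the same number can appear in more than one study.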

Review of the Data Integration Process

When looking back at the project, it was immediately obvious that most of the time and effort expended was not directed at running the search and identifying the specific events. Instead, it was expended on data preparation activities. The major effort involved deciphering the multiple data structures, some of them with minimal documentation, before combining the information from those structures into a single database. In short, the major effort was data integration.

The data integration effort started with a review of the study protocol and case report forms. It was necessary to understand both the study objectives and the specific variables being collected. It is often the case that the same information is collected in a slightly different manner in different studies. The variables collected in one study do not necessarily correspond to similar variables in another study – even though they have been given the same variable name. In order to integrate data properly from such studies, the collected variables must be converted to common variables that represent the same concept. This conversion involves more than just changing variable names.

Conversion to common variables and variable names is not enough. The integrated data set must also contain information that defines the studies that are being integrated. Variables such as type of study (parallel or crossover) and type of blind (double blind, single blind, open label) must be created to record this information. Those “study variables” are usually not required for, and hence usually not included in, individual study databases. They must therefore be defined as part of the integration process.
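The conversion described in this section can be expressed as a per-study specification that renames variables, recodes values to a common scheme, and adds the study-level variables. The study identifiers, variable names, and codes in this sketch are all hypothetical.

```python
# Hypothetical per-study specifications; real mappings come from the
# protocols, case report forms, and data set documentation.
STUDY_SPECS = {
    "STUDY_A": {
        "rename": {"SEX": "sex", "AGE": "age"},
        "recode": {"sex": {"1": "M", "2": "F"}},
        "study_vars": {"design": "parallel", "blind": "double blind"},
    },
    "STUDY_B": {
        "rename": {"GENDER": "sex", "AGEYRS": "age"},
        "recode": {"sex": {"MALE": "M", "FEMALE": "F"}},
        "study_vars": {"design": "crossover", "blind": "open label"},
    },
}


def harmonize(study, record):
    """Convert one study-specific record to the common structure."""
    spec = STUDY_SPECS[study]
    common = {"study": study}
    for source_name, common_name in spec["rename"].items():
        value = record.get(source_name)
        # Recode the value if a mapping exists; otherwise keep it as-is.
        value = spec["recode"].get(common_name, {}).get(value, value)
        common[common_name] = value
    # Study-level variables are not in the source data and are added here.
    common.update(spec["study_vars"])
    return common


row_a = harmonize("STUDY_A", {"SEX": "2", "AGE": 61})
row_b = harmonize("STUDY_B", {"GENDER": "MALE", "AGEYRS": 47})
```

The recode step is what separates conversion from renaming: two studies may both store sex, but with different codes, so the values as well as the names have to be brought to a common definition.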

CDISC Standardization: A Solution

The current CDISC (Clinical Data Interchange Standards Consortium) initiative will make data integration much easier. This initiative is developing naming conventions for study data sets and standards for the structure and definition of variables included in them. Those standards include specific variable names and types.

Although the CDISC initiative is designed to facilitate the interchange of data between groups, there is no reason to delay its implementation until there is a need to exchange data. The use of a standardized naming convention and database structure, such as that of CDISC, will also benefit in-house programs by reducing the time and effort needed for data preparation when combining and analyzing data across studies. For example, when supporting Data Safety Monitoring Board activities for multiple clinical trials, standardization across trials is crucial for the timely production of data displays for safety assessments.

The reduction of data preparation time will facilitate the monitoring of data, both safety and efficacy, from ongoing clinical studies. As a result, it will be easier to pick up early signals of potential efficacy and of potential safety concerns. This ongoing data monitoring process, in turn, could lead to earlier submissions of drug applications and lower development costs.

A standardized naming convention and database structure should be implemented as quickly as possible at every company and organization that collects, records, and analyzes clinical data. CDISC offers the preferred standards to accomplish this goal.

