A Research Data Warehouse (RDW) is a database designed to accelerate scientific progress. In the medical field, an RDW typically contains current and past clinical, administrative and demographic information on a large number of patients, as well as data collected for specific research projects. Although an RDW is primarily a platform for researchers, it may also be used to create administrative and public health reports. The need for a research infrastructure separate from real-time patient care systems is clear. Researchers are often frustrated by the difficulty of finding, extracting, cleaning, interpreting and transforming data from real-time systems into useful analytical datasets, and they often spend a disproportionate amount of time on these preparatory activities rather than on the analytic phase of their research projects. An RDW aggregates clinical data from current electronic health record (EHR) systems, legacy systems, various public use datasets and research datasets into a single database. Three important characteristics of an RDW are: 1) a uniform schema with consistent data definitions and coding, using standard terminology, that aggregates and harmonizes current and legacy data from all sources; 2) interfaces that extract, transform and load (ETL) data from source systems into the RDW; and 3) a security model that is compliant with HIPAA, State regulations and institutional policies.
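To make the ETL and harmonization characteristics more concrete, here is a minimal Python sketch under stated assumptions; it is an illustration, not the RDW's actual interface. It assumes two hypothetical source systems, a current EHR (`"ehr"`) and a legacy system (`"legacy"`), that record patient sex with different local codes and birth dates in different formats. The code maps (`SEX_CODE_MAP`, `DATE_FORMATS`), the record fields (`id`, `sex`, `dob`) and the target row type `PatientRow` are all hypothetical. The target sex codes follow the FHIR AdministrativeGender values (`male`, `female`, `other`, `unknown`) to stand in for "standard terminology".

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical local-to-standard code maps, one per source system.
# Target values follow the FHIR AdministrativeGender codes.
SEX_CODE_MAP = {
    "ehr": {"M": "male", "F": "female", "O": "other", "U": "unknown"},
    "legacy": {"1": "male", "2": "female", "9": "unknown"},
}

# Hypothetical date formats used by each source system.
DATE_FORMATS = {"ehr": "%Y-%m-%d", "legacy": "%m/%d/%Y"}


@dataclass(frozen=True)
class PatientRow:
    """One row of a hypothetical uniform RDW patient table."""
    source: str
    patient_id: str
    sex: str
    birth_date: date


def extract(source: str, raw_records: list[dict]) -> list[tuple[str, dict]]:
    """Extract: tag each raw record with the system it came from."""
    return [(source, rec) for rec in raw_records]


def transform(source: str, rec: dict) -> PatientRow:
    """Transform: map local codes and date formats onto one definition."""
    # Local codes missing from the map are recorded as "unknown".
    sex = SEX_CODE_MAP[source].get(rec["sex"], "unknown")
    birth = datetime.strptime(rec["dob"], DATE_FORMATS[source]).date()
    return PatientRow(source, rec["id"], sex, birth)


def load(table: list[PatientRow], rows: list[PatientRow]) -> None:
    """Load: append harmonized rows to the single warehouse table."""
    table.extend(rows)


def run_etl(sources: dict[str, list[dict]]) -> list[PatientRow]:
    """Run extract, transform and load for every source system."""
    warehouse: list[PatientRow] = []
    for source, records in sources.items():
        extracted = extract(source, records)
        load(warehouse, [transform(src, rec) for src, rec in extracted])
    return warehouse


if __name__ == "__main__":
    rows = run_etl({
        "ehr": [{"id": "E1", "sex": "F", "dob": "1980-02-29"}],
        "legacy": [{"id": "L7", "sex": "1", "dob": "12/31/1950"}],
    })
    for row in rows:
        print(row)
```

Keeping one explicit code map per source system keeps each legacy system's quirks isolated in data rather than in branching logic, so adding a new source means adding a map entry rather than rewriting the transform step.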
The RDW will be a unique and valuable institutional resource for clinical, epidemiological and health services research. It can be used in support of the health care goals of various government agencies (FDA, AHRQ, NIH), including comparative effectiveness research, biosurveillance, patient safety monitoring, post-marketing drug surveillance, geographical mapping of morbidity and mortality rates (by zip code or census tract), genetic and genomic studies, and many other types of research. Because an RDW stores longitudinal data, it gives researchers the opportunity to study secular trends in disease and to follow the health and disease status of large patient cohorts over time.
In our Hybrid Data Warehouse model, an existing data warehouse can be coupled with an instance of Hadoop to leverage the benefits of Big Data without having to build a Big Data system from scratch. Using our hybrid architecture, a data warehouse can offload some of its workload to its Big Data component to reduce costs and improve response time. This reflects a new view of hybrid data architectures, in which data lakes and warehouses coexist. As the Hadoop/Spark data lake gains more definition and more deployments, it is beginning to look like something that will coexist with existing data warehouse technology. It is not an 'all or nothing' choice; it is a 'both' choice. The enterprise data warehouse will not go away, and architectures that couple data warehouses with data lakes are the future.
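One way to picture the offloading described above is as a routing rule that decides, per workload, whether the existing warehouse or the coupled Hadoop component should run it. The following is a minimal Python sketch under loudly stated assumptions: the `Query` fields, the criteria and the `SCAN_THRESHOLD_ROWS` cutoff are all hypothetical and are not taken from the Hybrid Data Warehouse model itself. It simply illustrates the idea of keeping latency-sensitive work on the warehouse while sending large batch scans, and anything that needs raw data held only in the lake, to Hadoop.

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    WAREHOUSE = "warehouse"   # existing relational data warehouse
    HADOOP = "hadoop"         # coupled Hadoop/Spark data lake


@dataclass(frozen=True)
class Query:
    """Hypothetical summary of a workload, estimated before execution."""
    name: str
    estimated_rows_scanned: int
    interactive: bool         # a user is waiting on the result
    touches_raw_data: bool    # needs raw files that (in this sketch) live only in the lake


# Hypothetical cutoff: non-interactive scans larger than this are offloaded.
SCAN_THRESHOLD_ROWS = 100_000_000


def route(query: Query) -> Engine:
    """Pick an engine for a query under the assumed offloading policy."""
    if query.touches_raw_data:
        # Raw data is assumed to be stored only in the lake.
        return Engine.HADOOP
    if query.interactive:
        # Latency-sensitive queries stay on the warehouse.
        return Engine.WAREHOUSE
    if query.estimated_rows_scanned > SCAN_THRESHOLD_ROWS:
        # Large batch scans are offloaded to reduce warehouse load and cost.
        return Engine.HADOOP
    return Engine.WAREHOUSE


if __name__ == "__main__":
    for q in [
        Query("dashboard", 5_000, True, False),
        Query("cohort_scan", 2_000_000_000, False, False),
    ]:
        print(q.name, "->", route(q).value)
```

In a real deployment the routing inputs would come from query plans or workload metadata, and the policy would be tuned to the site's costs; the point here is only that the warehouse and the lake can share the load rather than one replacing the other.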
Easy aggregation of clinical data from current EHR systems, legacy systems, various public use datasets and research datasets into a single database.
The Research Data Warehouse has proven to be a unique and valuable institutional resource for clinical, epidemiological and health services research.
Implement a security model that is compliant with HIPAA, State regulations and institutional policies.
A uniform schema with consistent data definitions and coding.