Mica-Opal IT infrastructure evaluation tutorial
Consider the scenario depicted in Figure 1, in which two studies, Study 1 and Study 2, aim to harmonize their data and analyze the harmonized datasets through a federated database infrastructure and web portal, while keeping individual-level data on their own servers. Maelstrom Research has developed such an infrastructure to support multi-center research projects. It is built on two web-based, open source software applications, Opal and Mica, both freely available at www.obiba.org. The infrastructure is used both to generate common-format (i.e. harmonized) datasets across participating studies and to allow federated analyses of these datasets without physically pooling data in a central location.
For the two studies participating in the data harmonization project illustrated in Figure 1, Opal is used to define and document DataSchemas (i.e. sets of variables targeted for harmonization) and to develop study-specific processing algorithms. These algorithms transform the study data stored in Opal into common (i.e. harmonized) DataSchema-format datasets that can be analyzed across Study 1 and Study 2. Mica, a web-based software application used to create web portals for epidemiological study consortia, is the second main component of the Maelstrom Research data harmonization IT infrastructure. In the project illustrated in Figure 1, Mica is used to federate the two independent Opal servers hosting harmonized datasets and to query and analyze the content of these datasets without pooling them in a central location. With the Mica-Opal federated infrastructure, both studies retain full control over individual-level data, since local Opal instances compute aggregate data before sending results to the central Mica web portal.
Figure 1: Typical infrastructure setup for a data harmonization and database federation project
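The federated principle described above, where each study server computes aggregates locally and only those aggregates reach the central portal, can be illustrated with a minimal Python sketch. This is not Opal or Mica code: the names LocalSummary, summarize_locally and combine are hypothetical, and the BMI values are invented for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class LocalSummary:
    """Aggregates that leave a study server; no individual-level rows."""
    n: int
    total: float
    total_sq: float


def summarize_locally(values):
    """Runs on a study's own server: reduce individual values to aggregates."""
    return LocalSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def combine(summaries):
    """Runs on the central portal: pool aggregates into overall n, mean and SD."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return n, mean, sqrt(variance)


# Invented individual-level data; each list stays on its own server.
study1_bmi = [22.0, 24.0, 26.0]
study2_bmi = [28.0, 30.0]

n, mean, sd = combine([summarize_locally(study1_bmi),
                       summarize_locally(study2_bmi)])
print(n, round(mean, 2), round(sd, 2))  # 5 26.0 3.16
```

The pooled mean and standard deviation match what a central analysis of all five values would give, yet only three numbers per study cross the network.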
This guide will walk users through the basics of Opal and Mica to build such an infrastructure and to conduct a data harmonization and database federation project.
By the end of this tutorial, users will know:
- how to build a study catalog for a network of studies using Mica
- how to develop a DataSchema, i.e. an annotated list of variables targeted for harmonization
- how to import and manage study datasets in Opal
- how to develop and execute processing algorithms in Opal to generate a harmonized dataset
- how to connect Opal servers and Mica to create a federated database network
- how to query distributed databases and produce summary statistics and contingency tables
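To make the terms in this list concrete, the following Python sketch shows a toy DataSchema, one processing algorithm per study, and a contingency table built from per-study counts. It only illustrates the ideas and is not Opal's actual scripting syntax: the variable names, source codings and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical DataSchema: target variables and their allowed categories.
DATASCHEMA = {
    "sex": {"label": "Sex of participant", "categories": {"male", "female"}},
    "current_smoker": {"label": "Currently smokes", "categories": {"yes", "no"}},
}

# Invented study-specific records, each with its own format and coding.
study1_rows = [{"SEX": 1, "SMOKE": "Y"}, {"SEX": 2, "SMOKE": "N"},
               {"SEX": 2, "SMOKE": "Y"}]
study2_rows = [{"gender": "M", "cigs_per_day": 0},
               {"gender": "F", "cigs_per_day": 10}]


def harmonize_study1(row):
    """Processing algorithm for Study 1: recode numeric and letter codes."""
    return {
        "sex": {1: "male", 2: "female"}[row["SEX"]],
        "current_smoker": "yes" if row["SMOKE"] == "Y" else "no",
    }


def harmonize_study2(row):
    """Processing algorithm for Study 2: derive smoking status from a count."""
    return {
        "sex": {"M": "male", "F": "female"}[row["gender"]],
        "current_smoker": "yes" if row["cigs_per_day"] > 0 else "no",
    }


def validate(row):
    """Check that a harmonized row conforms to the DataSchema."""
    for name, spec in DATASCHEMA.items():
        if row[name] not in spec["categories"]:
            raise ValueError(f"{name}={row[name]!r} is not in the DataSchema")
    return row


def contingency(rows, a, b):
    """Count co-occurrences of two harmonized variables within one study."""
    return Counter((r[a], r[b]) for r in rows)


harmonized1 = [validate(harmonize_study1(r)) for r in study1_rows]
harmonized2 = [validate(harmonize_study2(r)) for r in study2_rows]

# Each study tabulates its own data; only the counts are added together.
table = (contingency(harmonized1, "sex", "current_smoker")
         + contingency(harmonized2, "sex", "current_smoker"))
for cell, count in sorted(table.items()):
    print(cell, count)
```

Because both studies map their own codings onto the same DataSchema categories, their per-study counts can be summed cell by cell without sharing any individual records.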