Project

Funded by the R&D infrastructures programme of the Andalusian Plan for Research, Development and Innovation (PAIDI 2020, IE19_259 FPS).

This computational infrastructure is a crucial element in supporting the ongoing transformation of the Andalusian Health System from a passive data warehouse into an active entity capable of managing, interpreting and generating Real World Evidence (RWE), that is, new biomedical knowledge, from its Real World Data (RWD). The infrastructure is located within the security ring of the Andalusian Health System and is suitable for the analysis of clinical data, which is subject to strict security restrictions. In particular, it is an appropriate environment for working in collaboration with the Health Population Database (Base Poblacional de Salud, BPS), an unparalleled resource that contains comprehensive clinical data (diagnoses, treatments, drug prescriptions, laboratory analyses, usage of the health system, etc.) for more than 13 million patients, collected since 2001. The analysis of BPS RWD with artificial intelligence (AI) methodologies has immense potential to generate biomedical knowledge that can be rapidly translated into practice. For this reason, the infrastructure comprises not only conventional CPUs but also GPUs for the massively parallel computation required by AI and machine learning methodologies.

Data privacy and GDPR-compliant management of clinical data

The General Data Protection Regulation (GDPR), and specifically the Spanish regulation (Ley Orgánica de Protección de Datos Personales y Garantías de los Derechos Digitales), explicitly prohibits the processing of clinical data, with some exceptions. One of these is secondary use for research under certain conditions. As explained in the document BPS and research (Base Poblacional de Salud e Investigación), a data request requires:

  • A report describing the intended study.
  • A complete data protection impact assessment according to the “practical instructions for evaluation of impact in data protection for data subject to GDPR” (see also Garcia-Leon et al., 2020).
  • A report from the Andalusian Ethics Committee.

Because of the GDPR, extracting sensitive data from the secure environment of the BPS to external computing facilities for study is problematic: even when the data are anonymized, the risk of re-identification is not negligible and poses serious ethical challenges.

However, there is a solution that does not compromise data privacy: analyzing the data within the corporate network of the Andalusian Health System. Health systems typically do not possess computing facilities (let alone the highly parallel computing facilities needed to run AI algorithms), and this infrastructure has been set up to remedy that deficiency.

This infrastructure gives the Andalusian Health System the computing capacity it needs, including specific equipment to run AI algorithms, to carry out RWD analysis internally. The infrastructure thus drives the transformation of the Andalusian Health System from a mere passive data warehouse into a generator of biomedical knowledge that can be rapidly translated into practice.

The schema depicts how this infrastructure is used. Clinical projects are first outlined and, once accepted for consideration, are written up as clinical study reports that are submitted to the Ethics Committee. Once authorized, they are submitted to BPS along with the corresponding data protection impact assessment. The data are then transferred to the infrastructure, always within the secure environment of the corporate network of the Andalusian Health System. There, the study is carried out by FPS personnel, who belong to the Health System, and the results, which do not contain any private data, can be extracted from the secure environment.

figure1
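As an illustration only, the workflow described above can be sketched as a small state machine in Python. This is a hypothetical model written for this description, not software used by BPS or FPS: the names `Stage`, `Study`, `has_impact_assessment` and `results_contain_private_data` are assumptions introduced here, and the two checks simply encode the conditions stated in the text (an impact assessment accompanies the submission to BPS, and only results without private data leave the secure environment).

```python
from dataclasses import dataclass
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stages, in the order given in the text."""
    OUTLINED = auto()
    STUDY_REPORT = auto()
    ETHICS_APPROVED = auto()
    SUBMITTED_TO_BPS = auto()
    DATA_IN_INFRASTRUCTURE = auto()
    ANALYSED = auto()
    RESULTS_EXTRACTED = auto()


# Enum members iterate in definition order.
ORDER = list(Stage)


@dataclass
class Study:
    title: str
    has_impact_assessment: bool = False
    # Assume results are private until explicitly checked.
    results_contain_private_data: bool = True
    stage: Stage = Stage.OUTLINED

    def advance(self) -> Stage:
        """Move to the next stage, enforcing the two conditions from the text."""
        idx = ORDER.index(self.stage)
        if idx == len(ORDER) - 1:
            raise ValueError("study has already completed the workflow")
        nxt = ORDER[idx + 1]
        if nxt is Stage.SUBMITTED_TO_BPS and not self.has_impact_assessment:
            raise PermissionError("impact assessment required before submission to BPS")
        if nxt is Stage.RESULTS_EXTRACTED and self.results_contain_private_data:
            raise PermissionError("results containing private data cannot leave the secure environment")
        self.stage = nxt
        return nxt
```

In this sketch the data never leave the modelled secure environment: the only transition that crosses its boundary is the last one, and it is blocked unless the results have been marked as free of private data.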

This computational capacity, together with the immense amount of high-quality clinical data stored in BPS, makes the Andalusian Health System an international reference for this innovative concept of biomedical research. By enabling the health system to analyze its own data, it opens the way to innumerable RWD analyses and facilitates the secondary use of clinical data.