How Aster DM Healthcare used federated learning to better secure AI analysis of sensitive data


For the healthcare sector, siloed data is a major bottleneck for innovative use cases such as drug discovery, clinical trials, and predictive healthcare. Aster DM Healthcare, an Indian healthcare institution, has now found a solution to this problem that could lead to several cutting-edge applications.

A single patient generates nearly 80MB of data annually through imaging and electronic medical records. RBC Capital Markets projects that the annual growth rate of healthcare data will reach 36% by 2025. “Genomic data alone is predicted to be 2 to 40 exabytes by 2025, eclipsing the amount of data acquired by all other technological platforms,” it says.

Although AI-enabled solutions in areas such as medical imaging are helping to address pressing challenges such as staffing shortages and aging populations, accessing siloed data spread across hospitals, geographies, and health systems while complying with regulatory policies remains a massive challenge.


“In a distributed learning setup, data from different hospitals must be brought together to create a centralised data repository for model training, raising a lot of concerns about data privacy. Hospitals are sceptical about participating in such initiatives, fearing a loss of control over patient data, though they see immense value in it,” says Dr Harsha Rajaram, COO at Aster Telehealth, India & GCC. Its parent firm, Aster DM Healthcare, is a conglomerate with hospitals, clinics, pharmacies, and healthcare consultancy services in its portfolio.

To overcome these challenges, Aster Innovation and Research Centre, the innovation hub of Aster DM Healthcare, has deployed its Secure Federated Learning Platform (SFLP), which securely and rapidly enables access to anonymised, structured health data for research and collaboration.

Federated learning is a method of training AI algorithms on data stored at multiple decentralised sources without moving that data. The SFLP provides access to diverse data sources without compromising data privacy: the data remains at its source, while the model is trained across all of the sources.
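To make the idea concrete, here is a minimal sketch of federated averaging (FedAvg), one common way to combine locally trained models. It is a hypothetical illustration in plain Python, not OpenFL's API or Aster's implementation: each simulated site fits a toy linear model on its own data, and only the parameters are sent on to be averaged, weighted by each site's sample count. The function names and data are invented for the example.

```python
"""Minimal federated averaging (FedAvg) sketch.

Hypothetical illustration only: not OpenFL's API or Aster's SFLP code.
Each site fits y = w * x + b on its own data; only (w, b) leaves the site.
"""
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held locally by one site
Params = Tuple[float, float]         # (w, b) model parameters


def local_train(params: Params, data: Dataset, lr: float = 0.05, epochs: int = 20) -> Params:
    """Run plain gradient descent on mean squared error using only local data."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fed_avg(updates: List[Params], sizes: List[int]) -> Params:
    """Average site parameters, weighting each site by its number of samples."""
    total = sum(sizes)
    w = sum(p[0] * n for p, n in zip(updates, sizes)) / total
    b = sum(p[1] * n for p, n in zip(updates, sizes)) / total
    return w, b


def federated_training(sites: List[Dataset], rounds: int = 100) -> Params:
    """Each round: broadcast global params, train at every site, aggregate."""
    global_params: Params = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_train(global_params, data) for data in sites]
        global_params = fed_avg(updates, [len(d) for d in sites])
    return global_params


if __name__ == "__main__":
    # Two hypothetical sites whose data both follow y = 2x + 1.
    site_a = [(x / 10, 2 * (x / 10) + 1) for x in range(0, 10)]
    site_b = [(x / 10, 2 * (x / 10) + 1) for x in range(10, 30)]
    w, b = federated_training([site_a, site_b])
    print(f"w={w:.2f}, b={b:.2f}")  # prints w=2.00, b=1.00
```

Only `(w, b)` is passed between the sites and the aggregator; in a real deployment each site would run `local_train` on its own machine, and the exchanged parameters would be neural-network weights rather than two numbers.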

“The platform marks a paradigm shift by getting the compute to the data rather than getting the data to the compute,” says Dr Lalit Gupta, consultant AI scientist-innovation at Aster Digital Health.

“Federated technology provided us a platform through which we can unlock the immense potential of data to draw better insights into clinical, operational, and business challenges and tap into new opportunities without the fear of losing control of our data. It will allow data scientists from multiple organisations to perform AI training without sharing raw data. By gaining access to larger data sets, they can develop more accurate AI models. It will also ensure data compliance and governance,” COO Rajaram says.

The building blocks of SFLP

Before deploying the platform, Aster conducted a capability demonstration, or proof of concept, of the platform using hospital data from the Bengaluru and Vijayawada clusters of Aster Hospital.

“The platform comprised a two-node collaboration with machines physically located in Bengaluru and Vijayawada. The director/aggregator was in Bengaluru, with one envoy/collaborator in Bengaluru and the other in Vijayawada. The software setup included Ubuntu 20.04.02 with kernel version 5.4.0-65-generic, the OpenFL Python library for collaboration, and the PyTorch Python library for developing deep learning models, running on Nvidia Quadro RTX 6000 GPU hardware,” says Gupta.
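The director/envoy arrangement Gupta describes can be pictured with the hypothetical sketch below. The `Director` and `Envoy` classes are illustrative stand-ins, not OpenFL's own classes: each envoy holds its shard locally and takes one gradient-descent step (learning rate 0.25) on a toy squared-error objective, and the director averages the returned weights. The site names come from the article; the data is made up.

```python
"""Hypothetical sketch of the director/envoy roles in the pilot.

Illustrative stand-ins only, not OpenFL's Director or Envoy classes.
Each envoy keeps its shard on-site and returns only updated weights.
"""
from dataclasses import dataclass, field
from typing import List


@dataclass
class Envoy:
    site: str           # where the collaborator machine sits
    shard: List[float]  # local data; never sent to the director

    def train_round(self, weights: List[float]) -> List[float]:
        """One gradient step (lr=0.25) on mean (w - x)^2 over the local shard."""
        mean = sum(self.shard) / len(self.shard)
        # The gradient is 2 * (w - mean), so w - 0.25 * grad == w + 0.5 * (mean - w).
        return [w + 0.5 * (mean - w) for w in weights]


@dataclass
class Director:
    site: str
    envoys: List[Envoy] = field(default_factory=list)

    def run_round(self, weights: List[float]) -> List[float]:
        """Broadcast weights, collect each envoy's update, and average them."""
        updates = [envoy.train_round(weights) for envoy in self.envoys]
        return [sum(values) / len(values) for values in zip(*updates)]


if __name__ == "__main__":
    director = Director(
        site="Bengaluru",
        envoys=[
            Envoy(site="Bengaluru", shard=[1.0, 2.0, 3.0]),
            Envoy(site="Vijayawada", shard=[4.0, 5.0, 6.0]),
        ],
    )
    weights = director.run_round([0.0])
    print(weights)  # prints [1.75]
```

Repeating `run_round` moves the weight toward 3.5, the average of the two shard means. The unweighted average is only appropriate here because both shards are the same size; with unequal shards, a sample-weighted average would be the more usual choice.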


“The Aster IT team helped install and set up the three servers, enabled ports, installed the operating system and necessary drivers, and maintained the servers. The IT team also helped fetch the data from PACS and HIS that was required for the federated learning experiments,” he says. PACS, or picture archiving and communication system, is a medical imaging technology used to store and transmit electronic images and reports. An HIS, or health information system, is designed to manage healthcare data.

As part of the capability demonstration, more than 125,000 chest X-ray images, including 18,573 images from more than 30,000 unique patients in Bengaluru, were used to train a CheXNet AI model, developed in Python, to detect abnormalities in the X-rays. These additional images provided a 3% accuracy boost because they contributed real-world data that was otherwise not available for training the AI model.

The platform can accommodate any analytical tool and places no restrictions on data size. “We will decide on the size of the data based on the use case. In our capability demonstration experiments, we used a chest X-ray image database of around 30GB,” says COO Rajaram.

It took Aster about eight months, including four months for the capability demonstration, to deploy the system. The platform went live in June 2022. “We are in our early days, with hardware and software currently deployed at only two hospitals. We intend to extend these deployments to multiple hospitals and look forward to other providers joining hands to leverage the ecosystem,” says Rajaram.

Addressing new data security challenges

While federated learning is a well-acknowledged approach to addressing data privacy challenges, it also brings additional security risks, because the data and AI model assets, spread across multiple sites, are more exposed to possible hacking. Security capabilities must therefore be provided alongside the privacy protections.

A set of security-related instructions built into the servers' central processing units provides the required hardware-based memory encryption, isolating specific application code and data in memory. “The platform combines federated learning with security guarantees enabled by its hardware. This helps to protect data and AI models in storage, when transmitted over the network, and during execution of federated learning training jobs. The security features in the platform provide confidentiality, integrity, and attestation capabilities that prevent stealing or reverse-engineering of the data distribution,” says Rajaram.
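The hardware memory encryption itself cannot be reproduced in a short snippet, and the article does not name the specific CPU technology. As a software analogue of the integrity property only, the sketch below tags each serialised model update with an HMAC-SHA256 so the aggregator can reject updates altered in transit. It assumes the collaborator and aggregator already share a secret key; `SECRET_KEY` and the function names are hypothetical.

```python
"""Software illustration of integrity checking for model updates in transit.

Assumption: collaborator and aggregator already share SECRET_KEY. This is
not the platform's hardware mechanism, only an analogue of the integrity idea.
"""
import hashlib
import hmac
import json
from typing import Dict, List, Tuple

SECRET_KEY = b"hypothetical-shared-key"  # placeholder; never hard-code real keys


def sign_update(weights: Dict[str, List[float]], key: bytes = SECRET_KEY) -> Tuple[bytes, str]:
    """Serialise weights deterministically and attach an HMAC-SHA256 tag."""
    payload = json.dumps(weights, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def verify_update(payload: bytes, tag: str, key: bytes = SECRET_KEY) -> Dict[str, List[float]]:
    """Recompute the tag in constant time and reject the update on mismatch."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("model update failed integrity check")
    return json.loads(payload)


if __name__ == "__main__":
    payload, tag = sign_update({"layer1": [0.1, -0.2]})
    print(verify_update(payload, tag))  # prints {'layer1': [0.1, -0.2]}
    tampered = payload.replace(b"0.1", b"9.9")  # simulate tampering in transit
    try:
        verify_update(tampered, tag)
    except ValueError as err:
        print(err)  # prints model update failed integrity check
```

An HMAC only detects modification by someone without the key; confidentiality and attestation, which the platform attributes to its hardware, are outside what this sketch shows.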

“Annotation was already in our PACS system. We used its API for data extraction. Though anonymisation was not required since it was within our network, for the pilot we did anonymise the data from the back end,” he says.
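A back-end de-identification step of the kind Rajaram mentions might look like the hypothetical sketch below. The record layout, the field names (loosely modelled on DICOM attribute keywords), and `PSEUDONYM_KEY` are assumptions, not details of Aster's pipeline. Direct identifiers are dropped and the patient ID is replaced with a keyed hash, so images from the same patient stay linked without exposing the original ID.

```python
"""Hypothetical back-end de-identification step for exported image metadata.

Field names are loosely modelled on DICOM attribute keywords; the record
format and PSEUDONYM_KEY are assumptions for illustration only.
"""
import hashlib
import hmac
from typing import Any, Dict

PSEUDONYM_KEY = b"hypothetical-site-secret"  # kept on-site, never shared
DIRECT_IDENTIFIERS = {"PatientName", "PatientBirthDate", "PatientAddress"}


def pseudonymise(record: Dict[str, Any], key: bytes = PSEUDONYM_KEY) -> Dict[str, Any]:
    """Return a copy without direct identifiers and with PatientID keyed-hashed."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "PatientID" in cleaned:
        digest = hmac.new(key, str(cleaned["PatientID"]).encode("utf-8"), hashlib.sha256)
        cleaned["PatientID"] = digest.hexdigest()[:16]  # same input -> same pseudonym
    return cleaned


if __name__ == "__main__":
    record = {
        "PatientID": "MRN-001",
        "PatientName": "Jane Doe",
        "PatientBirthDate": "19800101",
        "Modality": "CR",
        "Annotation": "no abnormality",
    }
    print(sorted(pseudonymise(record)))  # prints ['Annotation', 'Modality', 'PatientID']
```

Strictly speaking, keyed hashing is pseudonymisation rather than full anonymisation, since anyone holding the key can recompute the pseudonym for a known patient ID.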
