Research project outline: A step towards secure, transformational collaboration in machine learning.

The original idea and proposal for my Summer 1 research project.

Utilising Homomorphic Encryption and Smart Contracts to develop a Quantum-safe Privacy Preserving Federated Learning Model for healthcare applications

Introduction: 

The Covid-19 pandemic was the perfect stage for Artificial Intelligence (AI) and Machine Learning (ML) to save the world, quite literally. The world needed a paradigm that could handle and process the corpus of data spilling out of every country and turn it into agile, empirical decisions and judgements. For healthcare systems around the world, AI was an exemplary solution, or at least could have been. However, when it mattered, AI mostly failed. The complex and uneven global context of the pandemic, together with the multitude of patient symptoms, underscored how difficult it is to compile comprehensive and diverse datasets without incentivised and trustworthy frameworks for data sharing and collaboration. This difficulty originates in the need for privacy and confidentiality of healthcare data, and in the absence of trust between independent and uncoordinated healthcare systems and research institutions. 

This calls for a convention for bringing together data and critical information while guaranteeing privacy, confidentiality and transparency. To this end, in this research project I will investigate and propose a blockchain-based Federated Learning model that utilises Homomorphic Encryption for healthcare applications.  

Federated Learning (FL) is an ML scheme that allows multiple actors to train a common machine learning model without sharing any raw data. In traditional FL, each actor runs training sessions in its own local environment, and the resulting model updates are then transmitted to a fixed central server for aggregation. The core privacy-preserving characteristic of FL is that actors never share their individual data; only model updates, i.e. the results of the training (the model weights), are shared. This allows the model to be trained on an extensive repository of heterogeneous data while keeping the raw data private. Hence, FL offers a promising approach to capitalising on sensitive data (such as healthcare data) by allowing collaboration without sharing raw data. But it still faces two major vulnerabilities:  

  1. Since it uses a single fixed server as the aggregator, this server is a single point of failure: a security attack on it, or physical damage to the device, could corrupt the entire system.  
  2. Inference attacks, where the central server or other adversaries could “infer” information about each actor's underlying raw data from the model updates that are transmitted to the central server. 
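
To make the aggregation step of traditional FL concrete, here is a minimal sketch of what the central server might compute. It assumes a weighted-average rule (each actor's update is weighted by its number of local samples), which is a common choice but is not specified in this proposal. The function name `federated_average` and the three example hospitals are hypothetical.

```python
from typing import List, Sequence


def federated_average(updates: Sequence[Sequence[float]],
                      num_samples: Sequence[int]) -> List[float]:
    """Average client weight vectors, weighting each by its local sample count."""
    if not updates or len(updates) != len(num_samples):
        raise ValueError("need exactly one sample count per client update")
    total = sum(num_samples)
    dim = len(updates[0])
    averaged = [0.0] * dim
    for weights, n in zip(updates, num_samples):
        if len(weights) != dim:
            raise ValueError("all updates must have the same length")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Hypothetical example: three hospitals send two-parameter updates.
# Only these weight vectors leave each hospital, never the raw records.
client_updates = [[0.2, 1.0], [0.4, 2.0], [0.6, 3.0]]
client_sizes = [100, 100, 200]
global_weights = federated_average(client_updates, client_sizes)
```

Note that the server sees every update in the clear, which is exactly what makes the second vulnerability (inference attacks) possible.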

Federated Learning on a blockchain is introduced as a safeguard against the first vulnerability. A blockchain is a shared and distributed ledger. It can be visualised as a network made up of various participants (called nodes). Performing federated learning on a blockchain means that, instead of being transmitted to a central server, model updates are sent to this dispersed network. A blockchain accommodates data sharing between untrusting participants through its use of smart contracts. A smart contract is a program that automates the execution of agreements, giving the nodes immediate certainty of outcomes.  
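
As a rough illustration of the two ideas in this paragraph, the sketch below models a ledger as a hash-linked list of blocks and a smart contract as a fixed rule that every submission must pass. It is a single-process toy, not a real blockchain and not Hyperledger Fabric code: there is no network, no consensus and no signatures. The class `UpdateLedger`, its methods and the node names are all hypothetical.

```python
import hashlib
import json
from typing import Dict, Iterable, List


class UpdateLedger:
    """Toy append-only ledger: each block stores the hash of the previous block."""

    GENESIS_HASH = "0" * 64

    def __init__(self, registered_nodes: Iterable[str]) -> None:
        self.registered_nodes = set(registered_nodes)
        self.blocks: List[Dict] = []

    @staticmethod
    def _digest(block: Dict) -> str:
        payload = json.dumps(block, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def submit_update(self, node_id: str, round_number: int,
                      update: List[float]) -> bool:
        """Contract-like rule: one update per registered node per round."""
        if node_id not in self.registered_nodes:
            return False
        if any(b["node_id"] == node_id and b["round"] == round_number
               for b in self.blocks):
            return False
        prev_hash = self._digest(self.blocks[-1]) if self.blocks else self.GENESIS_HASH
        self.blocks.append({"node_id": node_id, "round": round_number,
                            "update": list(update), "prev_hash": prev_hash})
        return True

    def is_valid(self) -> bool:
        """Check that every block still points at the hash of its predecessor."""
        expected = self.GENESIS_HASH
        for block in self.blocks:
            if block["prev_hash"] != expected:
                return False
            expected = self._digest(block)
        return True


ledger = UpdateLedger(["hospital_a", "hospital_b"])
accepted_a = ledger.submit_update("hospital_a", 1, [0.1, 0.2])
accepted_b = ledger.submit_update("hospital_b", 1, [0.3, 0.4])
duplicate = ledger.submit_update("hospital_a", 1, [9.9, 9.9])  # rejected by the rule
outsider = ledger.submit_update("unknown_lab", 1, [0.0, 0.0])  # not registered
```

Editing an earlier block changes its hash, so the link stored in the next block no longer matches and `is_valid()` returns `False`. In this toy the most recent block has no successor, so a change to it alone would go unnoticed; real platforms close that gap with replication and consensus across nodes.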

As the ledger can be read by every participant (and, on a public blockchain, by anyone), FL on a blockchain is still vulnerable to inference attacks from adversaries. To combat this, Homomorphic Encryption (HE) will be applied to the model updates. Modern HE schemes are typically built on lattice problems that are believed to resist attacks by quantum computers, which offers the crucial future-proof, quantum-safe security this project aims for. Unlike conventional encryption, HE allows computations to be carried out directly on the ciphertext without decrypting it; decrypting the result gives the same answer as performing the computation on the original data.  
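
The sketch below shows what "computing on ciphertext" means, using a toy version of the Paillier cryptosystem, in which multiplying two ciphertexts adds the hidden plaintexts. Two loud caveats: Paillier is only additively homomorphic and is NOT quantum-safe, so it is a stand-in for illustration and not the scheme this project would deploy; and the primes used here are far too small to be secure. The function names, the fixed-point scale of 1000 and the three example updates are all assumptions made for the example.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a toy Paillier key pair from two primes (insecure sizes, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    mu = pow(lam, -1, n)  # modular inverse; requires Python 3.8+
    return (n, g), (lam, mu)


def encrypt(public_key, m: int) -> int:
    n, g = public_key
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(public_key, private_key, c: int) -> int:
    n, _ = public_key
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(public_key, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = public_key
    return (c1 * c2) % (n * n)


# NOT quantum-safe and NOT secure: illustration of the homomorphic property only.
public_key, private_key = keygen(1009, 1013)

SCALE = 1000  # assumed fixed-point scale: 0.125 is encoded as 125
updates = [0.125, 0.250, 0.375]  # hypothetical non-negative weight updates
ciphertexts = [encrypt(public_key, round(u * SCALE)) for u in updates]

# The aggregator combines ciphertexts without ever seeing a plaintext update.
encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum = add_encrypted(public_key, encrypted_sum, c)

# Only the key holder can decrypt the aggregate and rescale it to an average.
average_update = decrypt(public_key, private_key, encrypted_sum) / SCALE / len(updates)
```

A lattice-based scheme would play the same role in the proposed system; negative weights and larger vectors need an encoding step that this toy leaves out.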

I chose this topic to show the viability of overcoming the longstanding challenges in data sharing and privacy. The potential impacts of my proposed model could extend beyond the domain of healthcare, offering a blueprint for material and consequential collaboration across diverse domains. As we navigate the complex landscape of the post-pandemic world, characterised by what some might describe as a state of “Permacrisis”, the need for intelligent computer systems and networks has never been more apparent. 

 

Methodology and Timeline: 

The first step of the project will be to design and train the benchmark single-dataset ML model. We will make use of a Convolutional Neural Network (CNN), a class of deep neural network, along with a dataset of X-ray images. A deep neural network is inspired by the structure and function of the human brain: it contains multiple layers of connected “neurons” that learn by extracting information from the multidimensional data (such as images and videos) fed into them. 
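
For readers unfamiliar with CNNs, the following sketch shows the core operation of a single convolutional layer: a small filter slides over an image and responds strongly where the pattern it encodes appears. It is written in plain Python for readability; the project itself would use a deep learning framework, and in a trained network the filter values are learned rather than written by hand. The 4x4 "image" and the edge filter are made up for illustration.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


def relu(feature_map: Matrix) -> Matrix:
    """Keep positive responses and set negative ones to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy image: dark on the left (0), bright on the right (1).
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
# Hand-written filter that responds to a dark-to-bright vertical edge.
edge_filter = [[-1.0, 1.0],
               [-1.0, 1.0]]

feature_map = relu(conv2d_valid(image, edge_filter))
# feature_map == [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The non-zero middle column marks where the edge sits in the image; stacking many such layers is what lets a CNN build up from edges to larger structures in an X-ray.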

Next, we will implement federated learning. Imagine multiple medical institutions, each possessing its own dataset of X-ray images and patient background, collaborating without directly sharing sensitive details. This phase involves simulating such a network and implementing a federated learning model within it. Each participant trains the model on its local data, and the resulting updates are aggregated on a central server. Within the scope of this project we will not have real participants; instead, we will simulate participants (by creating a large dataset) to demonstrate the improvement a model gains from the combined data compared with training on a smaller dataset alone. 
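
One way such a simulation could be organised is sketched below: a single dataset is split into shards, each shard plays the role of one institution, and training proceeds in rounds of local training followed by aggregation. To keep the example short and runnable, a one-parameter linear model (y = w * x) stands in for the CNN and a synthetic noise-free dataset stands in for the X-ray images. All names, the learning rate and the number of rounds are assumptions, not choices made in this proposal.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def partition(dataset: List[Sample], num_clients: int, seed: int = 0) -> List[List[Sample]]:
    """Shuffle the dataset and deal it out to simulated participants."""
    shuffled = list(dataset)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::num_clients] for i in range(num_clients)]


def local_train(w: float, shard: List[Sample], lr: float = 0.01, epochs: int = 5) -> float:
    """A few gradient-descent steps on mean squared error, using only local data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
        w -= lr * grad
    return w


def run_round(w_global: float, shards: List[List[Sample]]) -> float:
    """One round: every client trains locally, then updates are size-weighted."""
    total = sum(len(shard) for shard in shards)
    local_weights = [local_train(w_global, shard) for shard in shards]
    return sum(w * len(shard) for w, shard in zip(local_weights, shards)) / total


# Synthetic stand-in dataset following y = 3x.
dataset = [(float(x), 3.0 * x) for x in range(1, 11)]
shards = partition(dataset, num_clients=3)

w = 0.0
for _ in range(50):
    w = run_round(w, shards)
# w ends up very close to 3.0 even though no shard was ever shared
```

Shuffling before splitting gives every simulated participant a similar mix of data; sorting before splitting instead would mimic institutions whose patient populations differ.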

In the next pivotal phase, we elevate the security and transparency of our FL model by migrating it onto a blockchain platform, Hyperledger Fabric. This distributed ledger technology provides a tamper-resistant record of all model updates. To further fortify privacy, HE is introduced: because the model updates are encrypted, sensitive patient data cannot be inferred from them and remain confidential even on the blockchain. 

Finally, we will conduct comprehensive testing of the encrypted federated learning model on the blockchain. We will benchmark its accuracy, security, and efficiency against both the baseline single-dataset model and the unencrypted federated learning model. Subsequently, an in-depth analysis of the results will be conducted to evaluate the overall effectiveness of the proposed approach. 

 

 

 

Intended Outcomes: 

  • Learn the intricacies of developing and deploying secure ML models at scale, working with blockchain technology and cryptography, and gain experience leading a project in an academic/research setting. 
  • Develop an extensible and scalable framework to motivate institutions to collaborate on shared Artificial Intelligence models that hold immense transformative potential. 
  • Safeguard honest collaborators from malicious internal data breaches and from external hacking attempts. 
  • Collect a large enough dataset to learn from complex and global contexts, and demonstrate the improvements in model robustness that larger datasets bring. 

 
