Summer 1 Blog: Machine Learning, Cryptography and some Self Discovery?

A recap of my summer 1 research experience and a gentle introduction to Zero Knowledge Proofs! For anyone wondering: the poster image is the Utah/Newell Teapot, one of the world's first ever 3D models. It is considered the 3D-modelling equivalent of a "Hello, World!" program.

Over the summer (my first summer of the Laidlaw programme), I have been researching and experimenting with a framework that I hope will allow multiple institutions and organisations (healthcare providers in my experiments) to collaborate on machine learning models without having to worry about leaking private, confidential and sensitive data. My research project title is: Utilising Homomorphic Encryption, Smart Contracts and ZK-SNARKs to develop a Quantum-safe Federated Learning Model for Healthcare Applications. (I promise it’s not all buzzwords).

 In this blog post I delve into my experience of doing an independent research project. I will talk more about the research experiments and findings in a research report that is due to be published soon.

I would like to preface all of this by saying that without the Laidlaw programme and support from my supervisor (and a tiny bit of delusion?), I would not have dared to take on this venture: I had never built elaborate cryptosystems or worked in cybersecurity, and I had only a cursory understanding of how machine learning models work under the hood.

I eventually got started on my research; the first part entailed gauging the “state of the art”, which meant navigating a labyrinth of academic literature. The first paper I opened looked, at the time, like a tangled plate of spaghetti, brimming with elaborate polynomials and boilerplate jargon. Over the next couple of weeks, however, these seemingly “magical” papers started making sense to me, and it was really fulfilling to do a deep dive and properly understand the theoretical basis for a branch of computer science (my major at College) I hadn’t explored before. But as I worked through a mountain of papers, I realised that my initial research proposal already had a myriad of literature behind it. This led me to redefine the scope of my project in the hope of making material and somewhat original contributions to research. I had not included ZK-SNARKs in my initial proposal, but found that they were relatively under-researched in the context of federated learning and its applications in healthcare. This was a very exciting point during the summer, and probably the one that had the greatest impact on how my project would materialise over the coming weeks.

ZK-SNARKs (Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge) are a form of Zero Knowledge Proofs (ZKPs): these allow a party or agent (the prover) to prove that they possess, or have “knowledge” of, certain information without revealing anything else, including what that information is. ZK-SNARKs differ in some technicalities from other forms of ZKPs, but they share the same essence. This can be very mystifying, especially when trying to understand how to construct them, because they have several moving parts behind the scenes. But as I got into the depths of it, I began to see that, even with the “mystique”, ZK-SNARKs are absolutely elegant; they are the result of years of intricate mathematics and cryptography research.

For anyone interested, a simple example of a Zero Knowledge Proof is presented as an extra at the end of this post. It requires only a very elementary understanding of algebra. Heading back to my experience:

After studying up on the literature, I started designing and implementing the system I had proposed. This was probably the most daunting step. As an undergraduate, it felt intimidating to write research reports, engage with PhDs, and run simulations on a supercomputer. Despite the challenges, I found the experience immensely rewarding at every step. I started off by implementing a machine learning model and then a federated learning model; to put it lightly, the first few iterations were “astronomically underwhelming”. I spent days on end trying to fine-tune them, and there were times when I felt like an imposter, lost for direction. But there came a moment where I had to step back, evaluate where I was with the project and somewhat reset its scope. This meant taking on things I had not originally planned to do, cutting back on things I had proposed to do, and taking a more theoretical stance on my proposed system. It was slightly frustrating and felt somewhat fraudulent; but I realised that I don’t have to solve every issue within secure federated learning. I could still contribute positively by laying down an elaborate framework that paves the way for future work, while also presenting preliminary yet realistic and insightful simulation results.

Throughout the weeks of the project I have learnt so much academically, and even more so, I have realised my strengths and where I want to improve. I learnt that roadblocks are bound to occur and that experimental results aren’t always what you expect them to be, and that’s okay. At stages like these, it’s important to be resilient and confident, to slow down, to self-criticise (to an extent), and to contextualise where we stand: is the way we are moving the best way, and if it isn’t, how do we strategise and adapt to a change in direction?

Thank you for reading! If you haven’t dozed off yet and are interested, here is the promised example of a Zero Knowledge Proof:

Let’s say we have a prover, Peggy, who wants to prove that they know** a polynomial (of arbitrary degree) that the verifier, Victor, also knows, without revealing said polynomial.

** Knowing a polynomial means knowing its coefficients and its degree. 

More concretely: 

  • There exists a polynomial p(x) that Victor knows. 
  • Victor has not shared with Peggy what p(x) is.
  • Peggy wants to prove to Victor that they know this p(x), without revealing it.

Peggy can prove their knowledge of the polynomial with the following steps (a small sketch of these steps follows the list):

  1. Victor chooses a random value of x and evaluates the polynomial locally at this x.
  2. Victor gives x to Peggy and asks them to evaluate the polynomial in question.
  3. Peggy evaluates their polynomial at x and gives the result to Victor.
  4. Victor checks whether the local result is equal to Peggy’s result; if so, the statement is proven with high confidence.
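
For readers who like to see things concretely, below is a minimal Python sketch of steps 1 to 4. It is purely illustrative and deliberately insecure: the polynomial, the helper name evaluate and the range x is drawn from are choices I made for the example, and everything here happens in plaintext, whereas a real ZK-SNARK hides x and the evaluations behind encryption and further machinery.

    # Toy, insecure simulation of the evaluate-at-a-random-point check above.
    # Everything is plaintext here; a real ZK-SNARK never exposes these values.
    import random

    def evaluate(coeffs, x):
        # coeffs = [c0, c1, c2, ...] represents c0 + c1*x + c2*x^2 + ...
        return sum(c * x**i for i, c in enumerate(coeffs))

    # p(x) = x^3 - x^2 + x + 11, written with the constant term first
    victors_p = [11, 1, -1, 1]   # the polynomial Victor knows
    peggys_p  = [11, 1, -1, 1]   # the polynomial Peggy claims to know

    # Step 1: Victor picks a random x and evaluates the polynomial locally.
    x = random.randrange(1, 10**50)
    victors_value = evaluate(victors_p, x)

    # Steps 2-3: Victor sends x to Peggy; Peggy replies with their evaluation.
    peggys_value = evaluate(peggys_p, x)

    # Step 4: Victor accepts only if the two evaluations match.
    print("Proof accepted" if victors_value == peggys_value else "Proof rejected")

If you change a single coefficient in peggys_p, the check fails for almost every random x, which is exactly the property explained next.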

This works because of how polynomials behave, particularly the fact that the number of intersections of two unequal polynomials of degree at most d cannot exceed d. Put differently, it is impossible to find two unequal polynomials that share a continuous chunk of curve (beyond isolated points).

The degree of a polynomial is the greatest exponent of x:

p(x) = x³ − x² + x + 11 has degree 3.

This property of polynomials follows from the Fundamental Theorem of Algebra.
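
As a quick numerical sanity check of that intersection bound (using two polynomials I picked arbitrarily for illustration): the points where two polynomials meet are the roots of their difference, so two distinct polynomials of degree at most 3 can agree on at most 3 points.

    # The intersections of p and q are the roots of p - q, a non-zero
    # polynomial of degree <= 3, so there can be at most 3 of them.
    import numpy as np

    p = np.array([1, -1, 1, 11])   # p(x) = x^3 - x^2 + x + 11 (highest degree first)
    q = np.array([1,  0, 0,  0])   # q(x) = x^3

    roots = np.roots(p - q)        # solve p(x) - q(x) = 0
    real_roots = roots[np.isclose(roots.imag, 0)].real
    print(f"p and q intersect at {len(real_roots)} point(s): {real_roots}")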

Going back to the proof, please take note of the phrase “high confidence”. This is because, even though the probability of an accidentally equal evaluation is very small, it is not zero: suppose x is chosen from the range 1 to 10⁵⁰. Two unequal polynomials of degree at most d evaluate differently at no fewer than 10⁵⁰ − d of those points, hence the probability that the evaluations are equal by accident is at most d / 10⁵⁰, which can be considered negligible. Peggy and Victor can also do multiple rounds to gain even greater confidence.
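
To put rough numbers on that (the degree bound below is an assumption chosen just for illustration), here is a quick calculation of the single-round error and of how it shrinks over repeated rounds:

    # Back-of-the-envelope soundness error for the check above.
    from fractions import Fraction

    d = 3                  # assumed degree bound for the polynomials
    range_size = 10**50    # number of values x can be drawn from

    # A cheating prover only succeeds if x lands on one of the <= d points
    # where two distinct polynomials happen to agree.
    single_round_error = Fraction(d, range_size)
    print(float(single_round_error))        # ~3e-50

    # k independent rounds multiply the error: (d / 10^50)^k
    k = 3
    print(float(single_round_error ** k))   # ~2.7e-149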
