Grzegorz Jacenków - Privacy distillation: reducing reidentification risk of multimodal diffusion models

  • Added 13 Sep 2024
  • Knowledge distillation in neural networks refers to compressing a large model or dataset into a smaller version of itself. We introduce Privacy Distillation, a framework that allows a text-to-image generative model to teach another model without exposing it to identifiable data. Here, we are interested in the privacy issue faced by a data provider who wishes to share their data via a multimodal generative model. A question that immediately arises is: “How can a data provider ensure that the generative model is not leaking identifiable information about a patient?” Our solution consists of (1) training a first diffusion model on real data, (2) generating a synthetic dataset using this model and filtering it to exclude images with a re-identifiability risk, and (3) training a second diffusion model on the filtered synthetic data only. We show that datasets sampled from models trained with Privacy Distillation can effectively reduce re-identification risk whilst maintaining downstream performance.
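    The three steps above can be sketched roughly as follows. This is a toy illustration, not the method from the talk: the names `privacy_distillation`, `filter_synthetic` and `reidentification_risk`, the `train` and `sample` callables, the cosine-similarity risk score, and the 0.95 threshold are all assumptions made for the example. The abstract does not say how re-identification risk is measured.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reidentification_risk(candidate: Vector, real: List[Vector]) -> float:
    """Hypothetical risk score: highest similarity to any real sample."""
    return max((cosine_similarity(candidate, r) for r in real), default=0.0)


def filter_synthetic(
    synthetic: List[Vector], real: List[Vector], threshold: float = 0.95
) -> List[Vector]:
    """Keep only synthetic samples whose assumed risk score is below the threshold."""
    return [s for s in synthetic if reidentification_risk(s, real) < threshold]


def privacy_distillation(
    real: List[Vector],
    train: Callable[[List[Vector]], Any],
    sample: Callable[[Any, int], List[Vector]],
    n_samples: int,
    threshold: float = 0.95,
) -> Tuple[Any, List[Vector]]:
    """Run the three steps with caller-supplied (hypothetical) train/sample functions."""
    teacher = train(real)                               # (1) train on real data
    synthetic = sample(teacher, n_samples)              # (2) generate synthetic data...
    safe = filter_synthetic(synthetic, real, threshold)  # ...and filter risky samples
    student = train(safe)                               # (3) train on filtered data only
    return student, safe
```

    In this sketch, only the student model and the filtered synthetic set would be shared; the teacher and the real data stay with the data provider.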
    Currently a Data Scientist at Amazon, Grzegorz Jacenków specialises in multimodal learning research and large language models (LLMs). Prior to joining Amazon, he was a PhD student in Healthcare AI at The University of Edinburgh, where he also earned an MSc in Artificial Intelligence. His academic foundation was laid with a BSc in Computer Science with Business and Management from The University of Manchester. Notably, Grzegorz contributed to CERN as a technical student, addressing author disambiguation at Inspire-HEP. His research interests encompass multimodal alignment, low-resource learning, and leveraging knowledge graphs.
    The talk was delivered at the ML in PL Conference 2023 as part of the Contributed Talks. The conference was organized by the ML in PL Association, a non-profit organization.
    ML in PL Association website: mlinpl.org/
    ML in PL Conference 2023 website: conference2023...
    ML in PL Conference 2024 website: conference.mli...
    ---
    Founded on the experience of organizing the ML in PL Conference (formerly PL in ML), the ML in PL Association is a non-profit organization devoted to fostering the machine learning community in Poland and Europe and promoting a deep understanding of ML methods. Even though ML in PL is based in Poland, it seeks to provide opportunities for international cooperation.
