
Machine Learning (ML) requires vast amounts of data, but the datasets that enrich the models are often owned by different parties and protected by privacy, security, trade-secret, or regulatory requirements. Likewise, the applied ML models (e.g., classifiers) are often owned by different parties and may be proprietary, requiring stringent protection to reduce the threat of exposure of the input data and modeling results. Due to these limitations, organizations in the government and private sector are unable to cooperate fully in model training and development to achieve the best performance from ML systems.
Patriot Labs is interested in Cooperative Secure Learning (CSL): new methods of protecting data, models, and model outputs shared among multiple entities and/or stakeholder groups. The primary aim is to enable cooperation and the secure sharing of information so that parties can improve one another's ML models while preserving the privacy of each party's pre-existing datasets and models.
For purposes of this CFI, solutions should focus on delivering working prototypes of computational techniques for improving ML models, and on providing insights and methods that support privacy preservation and data security. Underlying algorithms will be evaluated on their accuracy and privacy as well as their computational feasibility. Demonstrations should be based on a realistic use case to which CSL can be applied and should reflect a realistic data- or model-sharing problem. The use case may span any discipline, including defense, medical, industrial, cyber, or other national security domains. It should include relevant group relationships, including privacy/security relationships among the group members, and clearly identify what information in the models and data needs to be protected.
Proposed solutions should describe the privacy technology approach for sharing information among parties, and how the secure representations of the data and models can be used for training models and sharing results appropriately. The development workplan should include: (i) a high-level description of the methodologies to be applied in the CSL solution, drawing from research in homomorphic encryption, secure multiparty computation, differential privacy, and other methods; (ii) an explanation of how the algorithm scales with the number of parties, data size, and ML model type and size; and (iii) an analysis of the tradeoffs between model performance vs. data privacy (accounting for security relationships) and model performance vs. model privacy.
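As one illustration of item (i), the sketch below shows the simplest of the named methodologies, differential privacy, applied to a model update before it is shared. This is a minimal example in Python; the function name privatize_update and the clip_norm, epsilon, and delta values are illustrative assumptions, not requirements of this CFI.

    import numpy as np

    def privatize_update(update, clip_norm=1.0, epsilon=1.0, delta=1e-5, rng=None):
        """Clip an update to bound its L2 sensitivity, then add Gaussian
        noise calibrated for (epsilon, delta)-differential privacy.
        All parameter defaults here are illustrative assumptions."""
        rng = rng or np.random.default_rng()
        norm = max(np.linalg.norm(update), 1e-12)
        clipped = update * min(1.0, clip_norm / norm)  # sensitivity <= clip_norm
        # Standard Gaussian-mechanism noise scale for the bounded sensitivity.
        sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
        return clipped + rng.normal(0.0, sigma, size=update.shape)

    # Each party privatizes locally; the aggregator only ever sees noisy,
    # norm-bounded contributions.
    shared = privatize_update(np.array([0.8, -1.5, 0.3]))

Note that the noise scale sigma grows as epsilon shrinks, which is precisely the model-performance-vs.-data-privacy tradeoff that item (iii) asks proposers to analyze.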
Development could follow a two-step approach, where Step 1 addresses a relatively simple use case and Step 2 addresses a more realistic use case reflective of real-world scenarios. Special consideration will be given to solutions that draw upon cryptographic methods (e.g., secure multiparty computation, homomorphic encryption), differential privacy, and other methodologies. While a complete software package of the CSL prototype may not be a practical outcome of this CFI effort, the effort should result in new methods that enable better-informed and more robust ML models without compromising privacy.
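On the cryptographic side, the following is a minimal sketch of additive secret sharing, the building block behind secure-aggregation protocols in secure multiparty computation. The field size, party count, and input values are illustrative assumptions; production protocols add pairwise masking, dropout handling, and integrity checks.

    import secrets

    PRIME = 2**61 - 1  # toy field size, chosen here for illustration only

    def share(value, n_parties):
        """Split an integer into additive shares that sum to value mod PRIME."""
        shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
        shares.append((value - sum(shares)) % PRIME)
        return shares

    # Three parties; each secret-shares its private value with the others.
    private_values = [12, 7, 30]
    all_shares = [share(v, 3) for v in private_values]
    # Party i locally sums the i-th share of every input ...
    partials = [sum(col) % PRIME for col in zip(*all_shares)]
    # ... and combining the partials reveals only the aggregate, never any
    # individual party's value.
    assert sum(partials) % PRIME == sum(private_values)

In a CSL setting, the same pattern would let parties jointly compute aggregate model statistics or gradient sums while each individual contribution stays hidden.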
