Publications

Differentially Private Learning Needs Better Model Initialization and Self-Distillation

Ngong, I. C., Near, J. P., & Mireshghallah, N.

NAACL 2025

TLDR

“Differentially private language models often produce poor quality text, but what exactly makes the text 'bad'? We first systematically analyze the types of errors in private models, finding two main categories: language errors (grammar, spelling, incomplete sentences) and inconsistencies (hallucinations, wrong attributions). Based on this analysis, we propose DPRefine: (1) strong initialization using filtered synthetic data before private training, and (2) self-distillation refinement after. This approach significantly reduces both types of errors while maintaining privacy guarantees, showing that understanding failure modes is key to improving private learning.”
Paper

Citation

@inproceedings{ngong-etal-2025-differentially,
    title = "Differentially Private Learning Needs Better Model Initialization and Self-Distillation",
    author = "Ngong, Ivoline C.  and
      Near, Joseph  and
      Mireshghallah, Niloofar",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.naacl-long.455/",
    pages = "9009--9027",
    ISBN = "979-8-89176-189-6"
}

Evaluating the Usability of Differential Privacy Tools with Data Practitioners

Ngong, I. C., Stenger, B., Near, J. P., & Feng, Y.

Symposium on Usable Privacy and Security (SOUPS 2024), co-located with USENIX

TLDR

“Differential privacy (DP) is essential for privacy-preserving data analytics, but its practical implementation is difficult. A study involving 24 US data practitioners evaluated the usability of four Python-based DP tools: DiffPrivLib, Tumult Analytics, PipelineDP, and OpenDP. Findings show these tools aid beginners in understanding DP, highlight the importance of user-friendly API design and documentation for learning and error reduction, and reveal a strong link between user satisfaction and tool effectiveness. The study emphasizes the need for a balance between usability and the learning required for proper DP application and offers suggestions to enhance the usability of DP tools for wider adoption.”
Paper
Citation

Ngong, I. C., Stenger, B., Near, J. P., & Feng, Y. (2023). Evaluating the Usability of Differential Privacy Tools with Data Practitioners. arXiv preprint arXiv:2309.13506.

OLYMPIA: A Simulation Framework for Evaluating the Concrete Scalability of Secure Aggregation Protocols

Ngong, I. C., Gibson, N., & Near, J. P.

2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)

TLDR

“OLYMPIA is a framework that allows for empirical evaluation of secure protocols via simulation. It provides an embedded language for defining protocols and a simulation framework for performance evaluation. OLYMPIA has been used to implement several recent secure aggregation protocols and to perform the first empirical comparison of their end-to-end running times. OLYMPIA is available as an open-source resource.”
Paper
Code
Citation

Ngong, I. C., Gibson, N., & Near, J. P. (2024, April). OLYMPIA: A Simulation Framework for Evaluating the Concrete Scalability of Secure Aggregation Protocols. In 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) (pp. 534-551). IEEE.

Different Deep Learning Based Classification Models for COVID-19 CT-Scans and Lesion Segmentation Through the cGAN-UNet Hybrid Method

Ngong, I. C., & Baykan, N. A.

Traitement du Signal, 2023

TLDR

“A two-stage system is proposed to assist radiologists in detecting COVID-19 from CT scans. The first stage uses a hybrid method to segment the infected regions from the images, with the cGAN-UNet hybrid system proving most successful with a Dice score of 92.32% and an IoU score of 86.41%. The second stage classifies the segmented images as either COVID-19 or not, using a Convolutional Neural Network (CNN), a PatchCNN, and a Capsule Neural Network (CapsNet), with the CNN achieving the highest success rate of 99.20%.”
Paper
Citation

Ngong, I. C., & Baykan, N. A. (2023). Different Deep Learning Based Classification Models for COVID-19 CT-Scans and Lesion Segmentation Through the cGAN-UNet Hybrid Method. Traitement du Signal, 40(1), 1.

Prediction Sensitivity: Continual Audit of Counterfactual Fairness in Deployed Classifiers

Maughan, K., Ngong, I. C., & Near, J. P.

arXiv, 2022

TLDR

“The paper presents prediction sensitivity, an approach for ongoing evaluation of counterfactual fairness in deployed AI classifiers. Prediction sensitivity is effective in identifying discrimination against individuals and does not require sensitive information at prediction time. It can detect violations of counterfactual fairness by answering the question of whether a prediction would have been different for an individual in a different demographic group. The empirical results demonstrate the effectiveness of prediction sensitivity in detecting violations of counterfactual fairness.”
Blog
Paper
Citation

Maughan, K., Ngong, I. C., & Near, J. P. (2022). Prediction Sensitivity: Continual Audit of Counterfactual Fairness in Deployed Classifiers. arXiv preprint arXiv:2202.04504.

Feature Extraction Methods for Predicting the Prevalence of Heart Disease

Ngong, I. C., & Baykan, N. A.

Springer, Cham, 2021

TLDR

“This paper presents an automatic classification technique using Support Vector Machine (SVM) kernel classifiers and feature extraction methods to accurately detect cardiac arrhythmias from ECG signals. The study shows that the CNN-SVM classifier with a polynomial kernel achieved the highest accuracy of 99.2% and compared favorably with other approaches in the literature. The technique has the potential to significantly reduce mortality rates by providing accurate and early detection of beat abnormalities.”
Paper
Slides

Citation

@inproceedings{ngong2021feature,
  title={Feature Extraction Methods for Predicting the Prevalence of Heart Disease},
  author={Ngong, Ivoline C and Baykan, Nurdan Akhan},
  booktitle={The Proceedings of the International Conference on Smart City Applications},
  pages={481--494},
  year={2021},
  organization={Springer}
}