* Co-first authorship, # Co-corresponding authorship
Preprint
PLayer-FL: A Principled Approach to Personalized Layer-wise Cross-Silo Federated Learning (arXiv, 2025)
Published
2025
A Universal Metric of Dataset Similarity for Cross-silo Federated Learning (Proceedings of IEEE International Conference on Data Mining, 2025)
KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code Embeddings (Proceedings of Machine Learning Research, 2025) accepted at Conference on Health, Inference, and Learning (CHIL'25).
Multi-domain rule-based phenotyping algorithms enable improved GWAS signal (npj Digital Medicine, 2025) accepted.
pC-SAC: A method for high-resolution 3D genome reconstruction from low-resolution Hi-C data (Nucleic Acids Research, 2025) 53(7).
Secure and Federated Quantitative Trait Loci Mapping with privateQTL (Cell Genomics, 2025) 5(2).
ScatTR: Estimating the Size of Long Tandem Repeat Expansions from Short-Reads (25th Conference of Research in Computational Molecular Biology (RECOMB), 2025) accepted.
Secure and scalable gene expression quantification with pQuant (Nature Communications, 2025) 16(1), 2380.
2024
Toward Identifying New Risk Aversions and Subsequent Limitations and Biases When Making De-identified Structured Data Sets Openly Available in a Post-LLM world (AMIA Annual Symposium Proceedings, 2024) 262.
Ultra-secure storage and analysis of genetic data for the advancement of precision medicine (Genome Biology, 2024) 25(1), 1-27.
On the overflow and p-adic theory applied to homomorphic encryption (The International Symposium on Cyber Security, Cryptology and Machine Learning [CSCML'24], 2024) accepted.
Private information leakage from single-cell count matrices (Cell, 2024) 187(23), 6537-6549.e10.
A framework for sharing of clinical and genetic data for precision medicine applications (Nature Medicine, 2024) 30, 3578–3589.
Privacy-preserving model evaluation for logistic and linear regression using homomorphically encrypted genotype data (Journal of Biomedical Informatics, 2024) 156:104678.
Examining the Generalizability of Pretrained De-identification Transformer Models on Narrative Nursing Notes (Applied Clinical Informatics, 2024) 1:1-198.
Predicting A/B compartments from histone modifications using deep learning (iScience, 2024) accepted.
2023
A generalizable physiological model for detection of delayed cerebral ischemia using federated learning (IEEE Bioinformatics and Biomedicine, 2023)
Assessing and mitigating privacy risks of sparse, noisy genotypes by local alignment to haplotype databases (Genome Research, 2023) 33:2156-2173.
Privacy-preserving patient clustering for personalized federated learning (Proceedings of Machine Learning Research, 2023) Machine Learning for Healthcare Conference.
The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models (Cell, 2023) 186(7):1493-1511.
LDmat: Efficiently Queryable Compression of Linkage Disequilibrium Matrices (Bioinformatics, 2023) 39(2): btad092.
Privacy-preserving cancer type prediction with homomorphic encryption (Scientific Reports, 2023) 13:1661.
2022
A genome-wide atlas of recurrent repeat expansions in human cancer (Nature, 2022)
Privacy-preserving model training for disease prediction using federated learning with differential privacy (44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2022) pp. 1358-1361.
Storing and analyzing a genome on a blockchain (Genome Biology, 2022) accepted.
Genome privacy and trust (Annual Review of Biomedical Data Science, 2022) Vol 5.
Functional genomics data: privacy risk assessment and technological mitigation (Nature Reviews Genetics, 2022) 23, 245–258.
Privacy-preserving genotype imputation with fully homomorphic encryption (Cell Systems, 2022) 13(2): 173-182.
2021
Recovering genotypes and phenotypes using allele-specific genes (Genome Biology, 2021) 22:263.
Fast and Scalable Private Genotype Imputation Using Machine Learning and Partially Homomorphic Encryption (IEEE Access, 2021) vol. 9, pp. 93097-93110.
2020
Data Sanitization to Reduce Private Information Leakage from Functional Genomics (Cell, 2020) 183(4): 903-917.
FANCY: fast estimation of privacy risk in functional genomics data (Bioinformatics, 2020) 36(21): 5145-5150.
Using blockchain to log genome dataset access: efficient storage and query (BMC Medical Genomics, 2020) 13(7): 1-9.
DiNeR: a Differential graphical model for analysis of co-regulation Network Rewiring (BMC Bioinformatics, 2020) 21:281.
Using Ethereum blockchain to store and query pharmacogenomics data via smart contracts (BMC Medical Genomics, 2020) 13:74.
An integrative ENCODE resource for cancer genomics (Nature Communications, 2020) 11(1): 1-11.
2017
Three-dimensional chromosome structures from energy landscape (Proceedings of the National Academy of Sciences, 2017) 113(43): 11991-11993.
Spatial organization of the budding yeast genome in the cell nucleus and identification of specific chromatin interactions from multi-chromosome constrained chromatin model (PLOS Computational Biology, 2017) 13(7): e1005658.
Computational construction of 3D chromatin ensembles and prediction of functional interactions of alpha-globin locus from 5C data (Nucleic Acids Research, 2017) 45(20): 11547-11558.
2016
Mechanisms of stochastic focusing and defocusing in biological reaction networks: Insight from accurate Chemical Master Equation (ACME) solutions (Conf Proc IEEE Eng Med Biol Soc., 2016) 1480-1483.
2014
Spatial confinement is a major determinant of the folding landscape of human chromosomes (Nucleic Acids Research, 2014) 42(13): 8223-8230.