Zachary Coalson

coalsonz[at]oregonstate[dot]edu

I am a first-year PhD student in Computer Science at Oregon State University. I study trustworthy and socially responsible machine learning under the supervision of Professor Sanghyun Hong as a member of the Trustworthy and Responsible AI Lab (TRUE). The goal of my research is to audit and improve the robustness of machine learning systems against adversarial threats, misuse, and other undesirable behaviors. Currently, I am particularly interested in improving the trustworthiness of large language models, for example by reducing toxicity and defending against jailbreaking.

I am also a 2025 NSF GRFP and GEM fellow.

Please do not hesitate to contact me to discuss my work or potential collaborations!

News

Dec 09, 2025 Our paper Hard Work Does Not Always Pay Off: On the Robustness of NAS to Data Poisoning is accepted to TMLR 2025!
Sep 18, 2025 Our paper IF-Guide: Influence Function-Guided Detoxification of LLMs is accepted to NeurIPS 2025!
Sep 10, 2025 I am starting my PhD in Computer Science at Oregon State University!
Aug 22, 2025 Our paper Demystifying the Resilience of Large Language Model Inference: An End-to-End Perspective is accepted to SC 2025!
Jun 30, 2025 I have been awarded the 2025 GEM Fellowship to support my PhD studies. Thank you GEM, PNNL, and OSU!
Jun 25, 2025 Our paper Harnessing Input-Adaptive Inference for Efficient VLN is accepted to ICCV 2025!
Jun 16, 2025 I will be interning at Pacific Northwest National Laboratory (PNNL) in the High-Performance Computing Group!
Jun 13, 2025 I am a 2025 NSF GRFP recipient! Thank you NSF!
May 26, 2025 I successfully defended my undergraduate honors thesis: On the Robustness of Neural Architecture Search to Data Poisoning Attacks. That about wraps up my undergraduate studies!
Feb 26, 2025 I have been awarded the ARCS Foundation Oregon Scholar Award to support my PhD studies. Thank you OSU!
Dec 10, 2024 Our paper PrisonBreak: Jailbreaking Large Language Models with at Most Twenty-Five Targeted Bit-flips is on arXiv!
Sep 21, 2023 Our paper (my first publication!) BERT Lost Patience Won’t Be Robust to Adversarial Slowdown is accepted to NeurIPS 2023!

Selected Publications

  1. PrisonBreak: Jailbreaking Large Language Models with at Most Twenty-Five Targeted Bit-flips
    Zachary Coalson, Jeonghyun Woo, Chris S. Lin, Joyce Qu, Yu Sun, Shiyang Chen, Lishan Yang, Gururaj Saileshwar, Prashant Nair, Bo Fang, and Sanghyun Hong
    2025
  2. IF-Guide: Influence Function-Guided Detoxification of LLMs
    Zachary Coalson, Juhan Bae, Nicholas Carlini, and Sanghyun Hong
    In The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025
  3. BERT Lost Patience Won’t Be Robust to Adversarial Slowdown
Zachary Coalson, Gabriel Ritter, Rakesh B. Bobba, and Sanghyun Hong
In The Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023