I am an incoming first-year PhD student in Computer Science at Oregon State University, studying trustworthy and socially responsible machine learning under the supervision of Professor Sanghyun Hong as a member of the Secure AI Systems Lab (SAIL). The goal of my research is to audit and improve the robustness of machine learning systems against adversarial threats, misuse, and other undesirable behaviors. Currently, I am particularly interested in improving the trustworthiness of large language models (for example, reducing toxicity and auditing jailbreak vulnerabilities). Please do not hesitate to contact me to discuss my work or potential collaborations!
June 2, 2025
Our paper on using influence functions to reduce language model toxicity is on arXiv.
May 26, 2025
I successfully defended my undergraduate honors thesis, On the Robustness of Neural Architecture Search to Data Poisoning Attacks. That about wraps up my undergraduate studies!
December 10, 2024
Our paper on jailbreaking large language models with adversarial bit-flips is on arXiv.
May 1, 2024
Our paper on poisoning attacks against neural architecture search is on arXiv.
September 21, 2023
Our paper (my first publication!) on the vulnerability of multi-exit language models to adversarial slowdown was accepted to NeurIPS 2023.