Chad Voegele
Work Experience
Currently, I'm researching techniques to run small language models (SLMs) on edge devices. This includes end-to-end quantization-aware pre-training and LoRA fine-tuning of <1B-parameter Mu language models, post-training quantization using state-of-the-art methods such as SpinQuant, and working with the QNN SDK to utilize Qualcomm NPUs.
I built the data plane component for the Amazon Bedrock foundation model service. I focused on providing an optimized inference server for Amazon's Titan models, including Large Language Models (LLMs).
I was a founding member of the Lookout for Equipment engineering team, focusing on the development of the Python analytics library jointly with the data science team. Additionally, I re-architected Monitron's back-end analytics flows to Java/Python for rapid prototyping and deployment.
Previously, I worked on big data analysis in Elastic Block Store (EBS).
As part of the survey data reporting team, I co-wrote a robust, asynchronous exports microservice used by several products. As an intern, I built and released a dynamic, high-dimensional pivot table for the real-time customer experience dashboard product.
My group designed, built, and supported over-the-counter (OTC) interest rate swap (IRS) clearing solutions. This included price, margin, and default fund model development.
Most of my work was focused on the portfolio margining offering that allows clients to achieve savings by offsetting OTC IRS risk with interest rate futures.
Education
Research Experience
I worked with Dr. Sreepathi Pai in Dr. Keshav Pingali's research group to study high-performance subgraph isomorphism and k-truss identification algorithms. We implemented parallel GPU and CPU k-truss algorithms using the IrGL framework and published the research for the 2017 IEEE HPEC Graph Challenge.