I'm an ML Engineer based in Boston, Massachusetts. I specialize in training large language models.
I work on AI systems in the NLP space. Some of my past projects have included large-scale LLM training, RAG-based tools, backport systems, and more.
I develop full-stack React/Next.js applications that transform complex solutions into accessible, shareable tools.
Take a look below at some of my featured work for clients from the past few years.
Here are some of my recent blog posts:
December 1, 2024
ZeRO to Hero: How to Build Your Own Adaptive Learning-Rate Optimizers
Learn how to implement popular adaptive optimizers such as Adam, AdamW, Adagrad, and RMSProp.
November 24, 2024
In this post, we begin our journey of rebuilding FSDP from scratch by re-implementing popular variants of the Stochastic Gradient Descent algorithm, building intuition for how they work and what limitations they bring.