Monday, 16 February 2026

Show HN: Andrej Karpathy's microgpt.py to C99 microgpt.c – 4,600x faster https://bit.ly/3MzBpn5

Andrej Karpathy showed us the GPT algorithm. I wanted to see the hardware limit.

The punchline: I made it go 4,600x faster in pure C code, with no dependencies, using nothing more exotic than a compiler's SIMD auto-vectorisation!

Andrej recently released microgpt.py - a brilliant, atomic look at the core of a GPT. As a low-latency developer, I couldn't resist seeing how fast it could go when you get closer to the metal. So, just for funzies, I spent a few hours building microgpt-c, a zero-dependency, pure C99 implementation featuring:

- 4,600x faster training vs the Python reference (tested on a MacBook Pro M2 Max; about 2,300x faster on Windows).
- SIMD auto-vectorisation for high-speed matrix operations (a sketch of the idea follows below).
- INT8 quantisation, reducing weight storage by ~8x. Training is slightly slower, but the storage saving is significant (see the second sketch below).
- Zero dependencies - just pure logic.

The amalgamation image is just for fun (and to show off the density!), but the GitHub repo contains the fully commented, structured code for anyone who wants to play with on-device AI. I have started to build something useful, like a simple C code static analyser - I will do a follow-up post.

Everything else is just efficiency... but efficiency is where the magic happens.

https://bit.ly/4rV63GC

February 17, 2026 at 01:06AM
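To give a feel for what "SIMD auto-vectorisation" means here: I don't know the exact loops in the repo, but a matrix-vector product written like the sketch below (restrict-qualified pointers, a contiguous inner loop, no dependency between rows) is the shape of code that clang and gcc will turn into NEON or SSE/AVX instructions at -O3 without any intrinsics. The function and variable names are mine, not the project's.

    #include <stddef.h>
    #include <stdio.h>

    /* Illustrative matrix-vector product y = W * x, with W stored
     * row-major (rows x cols). The inner loop is a plain reduction
     * over contiguous memory, which compilers auto-vectorise at -O3
     * without any hand-written intrinsics. */
    void matvec_f32(float *restrict y,
                    const float *restrict W,
                    const float *restrict x,
                    size_t rows, size_t cols)
    {
        for (size_t r = 0; r < rows; ++r) {
            float acc = 0.0f;
            const float *row = W + r * cols;
            for (size_t c = 0; c < cols; ++c)
                acc += row[c] * x[c];
            y[r] = acc;
        }
    }

    int main(void)
    {
        const float W[2 * 3] = { 1, 2, 3,
                                 4, 5, 6 };
        const float x[3] = { 1, 0, -1 };
        float y[2];
        matvec_f32(y, W, x, 2, 3);
        printf("%g %g\n", y[0], y[1]);   /* prints: -2 -2 */
        return 0;
    }

Compiling with something like cc -std=c99 -O3 and asking for a vectorisation report (-Rpass=loop-vectorize on clang, -fopt-info-vec on gcc) is a quick way to confirm the compiler actually vectorised the hot loops.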
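On the INT8 quantisation: the ~8x storage figure is consistent with going from 8-byte double-precision weights (what a pure-Python reference naturally uses) down to 1 byte per weight plus a scale factor. The sketch below shows a generic symmetric quantisation scheme to illustrate the idea; it is an assumption about the approach, not the repo's actual format, and every name in it is hypothetical.

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    /* Illustrative symmetric per-tensor quantisation: map floats onto
     * int8 in [-127, 127] with a single scale factor. Dequantisation
     * is q[i] * scale. Names and layout are hypothetical. */
    float quantise_i8(const float *w, int8_t *q, size_t n)
    {
        float max_abs = 0.0f;
        for (size_t i = 0; i < n; ++i) {
            float a = fabsf(w[i]);
            if (a > max_abs) max_abs = a;
        }
        float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
        for (size_t i = 0; i < n; ++i) {
            long r = lrintf(w[i] / scale);   /* round to nearest */
            if (r >  127) r =  127;          /* clamp to int8    */
            if (r < -127) r = -127;
            q[i] = (int8_t)r;
        }
        return scale;   /* stored alongside the int8 data */
    }

    int main(void)
    {
        const float w[4] = { 0.50f, -0.25f, 0.125f, -0.50f };
        int8_t q[4];
        float scale = quantise_i8(w, q, 4);
        for (size_t i = 0; i < 4; ++i)
            printf("%d -> %g\n", q[i], q[i] * scale);  /* dequantised */
        return 0;
    }

Keeping the arithmetic in float and only storing int8 (dequantising as q[i] * scale whenever a weight is used) would also explain why training gets slightly slower: each access pays a small conversion cost in exchange for the much smaller checkpoint.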
