Hardware for AI & Machine Learning - Everything You Wanted to Know
Last updated: December 2022
The growth of modern machine learning is heavily dependent on compute scaling far faster than Moore’s law: the compute used in the largest AI training runs has doubled roughly every 3-4 months, compared to the 18-24 month doubling of CPU performance under Moore’s law.
Specifically, per OpenAI: “since 2012, the amount of compute used in the largest AI training runs has been increasing exponentially with a 3.4-month doubling time (by comparison, Moore’s Law had a 2-year doubling period). Since 2012, this metric has grown by more than 300,000x (a 2-year doubling period would yield only a 7x increase). Improvements in compute have been a key component of AI progress, so as long as this trend continues, it’s worth preparing for the implications of systems far outside today’s capabilities.”
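To make those numbers concrete, here is a quick back-of-the-envelope check in Python (a sketch; the ~62-month span between AlexNet in 2012 and AlphaGo Zero in late 2017 is my approximation, and OpenAI’s figures are empirical fits to measured training runs):

    # Rough check of the "AI and Compute" arithmetic.
    # The 62-month span (AlexNet, 2012 -> AlphaGo Zero, late 2017) is approximate.
    def growth(months, doubling_months):
        # Total growth factor under exponential doubling every `doubling_months` months.
        return 2 ** (months / doubling_months)

    months = 62
    print(f"3.4-month doubling: {growth(months, 3.4):,.0f}x")             # ~300,000x
    print(f"24-month (Moore's law) doubling: {growth(months, 24):.0f}x")  # ~6-7x

Run in reverse, the same arithmetic is the post’s point: at Moore’s-law rates, the 2012-2017 period would have delivered less than 10x, not 300,000x.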
For newcomers, here is a quick overview of the semiconductor industry overall.
For those who want to understand why ML hardware is so important, the single best starting point is this excellent AI Accelerator article series (if you need more background, the standard textbook Computer Organization and Design by Patterson and Hennessy is a good reference). The series goes into detail on “how computer architecture principles drive artificial intelligence, how processors became a crucial part of today’s technology, and what ideas are being implemented by some of the world’s leading AI companies. It does not require an in-depth background in computer architecture, and it should be understandable to people that have a good grasp and intuition of software engineering, high-level programming principles, and how a computer system is built. People with a deeper hardware background can also benefit from reading this as a ‘back to basics’ refresher that demonstrates how fundamental ideas have culminated to drive multi-billion dollar industries.”
Beyond the series, this UC Berkeley ML hardware course is a good next step if you have the time. If you want to see how important hardware and GPUs are, check out the AI and Compute article from OpenAI, or this older overview from Cornell on AI hardware. The arXiv papers below are quite good too and complement each other (esp. the Chen and Reuther surveys, and LeCun’s informative piece). Finally, using RL to design better chips, which can then power better ML, is a scarily good idea that I hope takes off, as this Google/Nature chip design paper shows.
Why does this matter? The bigger picture is that as ML hardware improves and the GPU cost per FLOP comes down, we get closer to building computers that compete at the level of human intelligence. Earlier this year, DeepMind released an important generalist learning agent called Gato, which relies largely on the smart use of ML hardware and scaled compute.
Resources:
A Survey of Accelerator Architectures for Deep Neural Networks (Chen et al., 2020)
Survey of Machine Learning Accelerators (Reuther et al., MIT 2020)
Hardware for Machine Learning: Challenges and Opportunities (Sze et al., MIT 2017)
Google’s Tensor Processing Unit (TPU) papers (see the systolic-array sketch after this list)
TPU v1: In-Datacenter Performance Analysis of a Tensor Processing Unit
TPU v2: A Domain-Specific Architecture for Deep Neural Networks
TPU v3: same paper as above; see also Scale MLPerf-0.6 models on Google TPU-v3 Pods and Exploring the limits of Concurrency in ML Training on Google TPUs
TPU v4: Ten Lessons From Three Generations Shaped Google’s TPUv4i
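If you want a feel for what these TPU papers are describing, the heart of the design is a weight-stationary systolic array: weights sit fixed in a grid of multiply-accumulate (MAC) cells while activations stream in from one edge and partial sums flow through to another. Below is a minimal Python sketch of that dataflow (an intuition-builder under simplifying assumptions; the grid size, skewing, and cycle model are illustrative, not Google’s implementation):

    # Toy cycle-by-cycle model of a weight-stationary systolic array, C = A @ W.
    # Each cell (k, n) of a K x N grid holds weight W[k][n]. Activations are
    # skewed so that A[i][k] reaches column n at cycle t == i + k + n, exactly
    # when the partial sum for C[i][n] passes through row k of column n.
    def systolic_matmul(A, W):
        M, K = len(A), len(A[0])
        N = len(W[0])
        C = [[0] * N for _ in range(M)]
        last_cycle = (M - 1) + (K - 1) + (N - 1)
        for t in range(last_cycle + 1):
            for i in range(M):
                for k in range(K):
                    for n in range(N):
                        if t == i + k + n:  # this MAC fires on cycle t
                            C[i][n] += A[i][k] * W[k][n]
        return C

    A = [[1, 2], [3, 4]]  # activations (M x K), streamed in from the left
    W = [[5, 6], [7, 8]]  # weights (K x N), preloaded and held stationary
    assert systolic_matmul(A, W) == [[19, 22], [43, 50]]

The point of the t == i + k + n schedule is that every cell only ever needs its neighbors’ outputs from the previous cycle, which is why a TPU can pack tens of thousands of MACs with minimal control logic and very little memory traffic per operation.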