Machine Learning (including Deep Learning and Reinforcement Learning) for Engineers — A Technical Primer (Part 2)
Last updated: December 2022
How should a software developer, hacker, or coding cross-functional partner (engineering manager, product manager, project manager, etc.) who is fascinated by AI but has no ML-specific background build a base? Below are the resources that I used and would recommend.
I offer six stages to jump in:
Learn or review the required math (mostly calculus, statistics, and linear algebra);
Take some introductory machine learning (ML) courses;
Work on some of your own ML projects — learn by doing;
Take advanced reinforcement learning (RL) and deep learning (DL) courses;
Learn about AI ethics and governance so you’re a force for good, not evil or negligence; and
Start following leaders in the field and reading their papers.
I also have another post (Part 1) on AI/ML for less technical people (without math and CS backgrounds) here (it also has my general reader book and paper recommendations). Another perspective is Vicki Boykis’s “How I learn machine learning.”
1) Math for Machine Learning
Math for ML textbook: Solid intro textbook from Deisenroth et al. I would do a quick review here, and if you need more background, do the ICL course below.
Math for ML Coursera course: This fantastic course from ICL is the best place to start. I also enjoyed Khan Academy’s visualizations of many linear algebra concepts (having last studied it in high school!).
Essential Math for Machine Learning: Python Edition: An EdX course from Microsoft.
2) Intro ML Courses and Books
Machine Learning Courses: Old-school ML is a mix of supervised and unsupervised learning methods, and is a prerequisite for DL (if you want to deeply understand what you’re doing, instead of just aping some Keras code).
Andrew Ng’s Coursera Machine Learning class is the best — it was one of the most enjoyable academic experiences of my life, across many STEM and humanities fields (here is the YouTube version of Andrew Ng’s ML Class, updated as of 2018).
Anima Anandkumar at Caltech also has a solid ML course that is both more technical and goes into data and fairness issues. I’ve heard good things about Sebastian Thrun’s Udacity ML course, and the Columbia ML course by John Paisley.
Hands-on Machine Learning: Geron has the best intro-level textbook for ML and DL. It’s simply a delight. I would also get the 2nd edition of Chollet’s great Deep Learning with Python book (he was the creator of the powerful and very useful Keras framework that makes doing DL much easier). A newer machine learning book is “Patterns, Predictions, and Actions” by Hardt and Recht. Kubat also has a nice, simple Machine Learning textbook. Finally, I would highly recommend you try to build understandable systems, not black boxes, and so recommend this Interpretable ML textbook, though this paper from Microsoft Research suggests model interpretability is tricky and unreliable.
Machine Learning Glossary: Google has the most ML systems in production of any company in the world. They put together this useful glossary and the best sets of tips: Machine Learning in Production and Hidden Technical Debt in Machine Learning Systems.
Deep Learning Courses: Deep learning comes from “deep neural networks” — networks with more than one layer that have been shown to be great universal function approximators. So after getting some ML basics, above, there are two fantastic courses from Andrew Ng and Geoff Hinton. I think Andrew Ng is the best Deep Learning teacher out there, and Hinton gives a more advanced, research-level perspective (Hinton’s is not a good first DL class, but a third or fourth one). This DL class by some Amazon research scientists is first-rate too. DeepMind has a nice channel on DL and RL lectures. If you want to geek out on DL theory, there is a new textbook on The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks (the older classic is the Goodfellow, Bengio Deep Learning book).
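To make the “more than one layer” point concrete, here is a minimal sketch (my own toy example, not from any of the courses above) of a small Keras network fit to a noisy sine curve; this is the kind of function approximation those courses explain, with layer sizes and epochs chosen arbitrarily:

```python
# Toy sketch: a two-hidden-layer network as a function approximator.
import numpy as np
from tensorflow import keras

X = np.random.uniform(-3, 3, size=(1000, 1))
y = np.sin(X) + 0.1 * np.random.randn(1000, 1)  # noisy target function

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(1,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),  # linear output for regression
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=32, verbose=0)
print(model.predict(np.array([[1.0]])))  # should be close to sin(1.0)
```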
AWS Sagemaker and Google Colab: These are the two entry-level tools that will allow you to do your ML and DL research quickly and efficiently on the cloud. I would recommend you pick one and invest some time in understanding how to run notebooks there. I wish these had existed when I started my ML and DL studies. Some interesting suggestions for ML in production here.
Production-Grade, Scalable ML: This intro paper, with case studies on challenges in deploying ML into production, lays out the big-picture problem. There is a new book on the emerging field of MLOps and this book by a Google Cloud team on ML Design Patterns. The annual ScaledML conference has some nice talks too. In 2021, Stanford started offering this course on Production-ready Machine Learning system design (even just the notes and slides on data engineering are worth it), along with this interesting Seminar on ML systems. One unusual example of production ML is this Google paper “On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models.”
Early Deep Learning Textbook: This was the textbook published by Goodfellow and Bengio that had the high-level theory, but was a bit dense. It’s more of a decent reference that is already getting dated. For more color, see this hands-on DL textbook. Charu Aggarwal has a Deep Learning textbook and a Recommender Systems textbook, and Skansi has a simpler, easier Deep Learning text.
Russell/Norvig AI Textbook: The new, updated 4th edition is fantastic and gives a comprehensive overview of the field. Skip the 2nd and 3rd editions of this book unless you just want more details on history and context — it’s theoretical and gives a good foundation, but is not practical. The book is by two UC Berkeley profs; Norvig later ran the Google search technical team (the world’s largest and most profitable user of cutting-edge ML). Note that much of this textbook is devoted to dead branches of GOFAI (symbolic logic, genetic algorithms, etc), with only about a quarter covering the really useful machine learning topics. That said, better AI will likely come from these older branches (neural networks were considered dead till ~2009), but we need more fundamental research to revive them. I also like this short, surface-level ML intro.
Machine Learning Surveys: A good set of survey papers on ML. For more recent surveys by the best scholars, check out Michael Jordan’s Foundations and Trends® in Machine Learning. A nice git repo of past ML survey papers is here.
3) Project Ideas and Fun Videos
Most modern machine learning falls into large areas like computer vision (CV), natural language processing (NLP), anomaly detection, recommendation systems, conversational agents, cybersecurity, and so on. You can search online for neat projects, and below are a few I would recommend.
Over 200 Python tutorials: This is a great set of links to a bunch of hands-on tutorials to learn more ML topics. A tutorial gives introductory content to teach a concept succinctly (this excludes books and research papers). Tutorials are helpful when you’re trying to learn a specific niche topic or want to get different perspectives.
Datasets for Machine Learning: You can start by just playing with some fun datasets curated by Prof. Sebastian Raschka.
Deep Learning Gender from Names using LSTM Recurrent Neural Networks: Take people’s names as input and predict their likely gender. Collect errors to update your model and datasets.
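If you want a feel for how such a project is typically structured, here is a hedged sketch of a character-level LSTM classifier in Keras; the names and labels below are placeholders, not a real dataset, and a real project would add a proper train/test split:

```python
# Toy sketch: character-level LSTM for binary classification of names.
import numpy as np
from tensorflow import keras

names = ["mary", "john", "linda", "james"]              # placeholder data
labels = np.array([1, 0, 1, 0])                          # 1 = female, 0 = male (toy labels)
chars = sorted({c for n in names for c in n})
char_to_idx = {c: i + 1 for i, c in enumerate(chars)}    # 0 is reserved for padding
max_len = max(len(n) for n in names)

X = keras.preprocessing.sequence.pad_sequences(
    [[char_to_idx[c] for c in n] for n in names], maxlen=max_len)

model = keras.Sequential([
    keras.layers.Embedding(input_dim=len(chars) + 1, output_dim=16),
    keras.layers.LSTM(32),
    keras.layers.Dense(1, activation="sigmoid"),          # probability of class 1
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=10, verbose=0)
```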
Full Stack GPT-3 tutorials: Make a cool chatbot or some other interesting Q&A or text service quickly.
Fun ML art experiments: Over 200 experiments from the Google Culture team.
Object Detection in Photos Using YOLO: This is an advanced CV project to detect what is in photos.
Text Generation Using Transformers and GPT-2: This uses a pre-trained model to generate text — you enter one sentence and can generate paragraphs more. It’s easy to start but takes more work to finetune the model.
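As a taste of how little code a pre-trained model needs, here is a minimal sketch using the Hugging Face transformers pipeline with the small public gpt2 checkpoint (the generation settings are my own arbitrary choices, not the tutorial’s):

```python
# Toy sketch: generate text from a prompt with a pre-trained GPT-2.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "Machine learning for engineers is"
outputs = generator(prompt, max_length=60, num_return_sequences=1)
print(outputs[0]["generated_text"])
```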
Text Classification with Transformer Models: This looks at a few different transformer models and allows you to try simple classification.
Text Classification with HuggingFace Transformer Models: The startup has released some good models and easy tutorials.
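For classification, the same library gives you a near one-liner; this sketch uses the default sentiment pipeline (a DistilBERT model fine-tuned on SST-2) rather than anything specific from the tutorials above, and a custom task would fine-tune a model of your own instead:

```python
# Toy sketch: transformer-based text classification via a ready-made pipeline.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("This intro ML course was a delight."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```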
Create a Voice Assistant for Games (tutorial for FIFA): Play video games with voice commands using a Deep Learning powered wake-word detection engine.
Creating a Chatbot from Scratch using Keras and TensorFlow: Make your own chatbot in under two hours — the hard way.
From Deep Learning Foundations to Stable Diffusion: Learn to build diffusion models.
Word Vectors: Some good reads on learning dense vector representations of words (a short word2vec sketch follows this list):
Bag of Words Meets Bags of Popcorn (kaggle.com)
On word embeddings Part I, Part II, Part III (sebastianruder.com)
The amazing power of word vectors (acolyer.org)
word2vec Parameter Learning Explained (arxiv.org)
Word2Vec Tutorial — The Skip-Gram Model, Negative Sampling (mccormickml.com)
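To see what these posts are describing in code, here is a tiny gensim sketch that trains skip-gram word vectors on a toy corpus; `vector_size` is the gensim 4.x argument name (older versions call it `size`), and real corpora need far more text:

```python
# Toy sketch: train word2vec (skip-gram) vectors with gensim.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "chase", "cats"],
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1 -> skip-gram
print(model.wv["king"].shape)                  # 50-dimensional word vector
print(model.wv.most_similar("king", topn=2))   # nearest neighbours in vector space
```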
Encoder-Decoder: One of the most common architectures, which shows up in many more complex systems (like diffusion models); see the short Keras sketch after this list:
Attention and Memory in Deep Learning and NLP (wildml.com)
Sequence to Sequence Models (tensorflow.org)
Sequence to Sequence Learning with Neural Networks (NIPS 2014)
Machine Learning is Fun Part 5: Language Translation with Deep Learning and the Magic of Sequences (medium.com/@ageitgey)
How to use an Encoder-Decoder LSTM to Echo Sequences of Random Integers (machinelearningmastery.com)
tf-seq2seq (google.github.io)
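And here is the short Keras sketch promised above, loosely in the spirit of the “echo random integers” tutorial: an encoder LSTM compresses an input sequence into a state vector, and a decoder LSTM reproduces the sequence from that state. It is a simplification (no teacher forcing, arbitrary dimensions), not a faithful re-implementation of any of the posts:

```python
# Toy sketch: encoder-decoder LSTM that learns to echo integer sequences.
import numpy as np
from tensorflow import keras

vocab, seq_len, latent = 20, 6, 64
X = np.random.randint(1, vocab, size=(500, seq_len))        # random integer sequences
Y = keras.utils.to_categorical(X, num_classes=vocab)         # target: echo the input

# Encoder: keep only the final hidden and cell states.
enc_in = keras.Input(shape=(seq_len,))
enc_emb = keras.layers.Embedding(vocab, latent)(enc_in)
_, h, c = keras.layers.LSTM(latent, return_state=True)(enc_emb)

# Decoder: here it simply re-reads the input, initialized with the encoder state
# (a real translation model would feed the shifted target via teacher forcing).
dec_emb = keras.layers.Embedding(vocab, latent)(enc_in)
dec_out = keras.layers.LSTM(latent, return_sequences=True)(dec_emb, initial_state=[h, c])
out = keras.layers.Dense(vocab, activation="softmax")(dec_out)

model = keras.Model(enc_in, out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.fit(X, Y, epochs=5, verbose=0)
```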
Tesla’s Head of AI Karpathy on Software 2.0: A fun short video on why ML is really “Software 2.0”. Here is a 2020 video from Karpathy on AI for Self-Driving cars.
Jeff Dean on Systems and ML at Scaled ML: The Head of Google Brain has some nice points here — he’s a person you will want to track.
Scaled ML Conference Talks from 2017-present: These are a bunch of ML experts who run massive production systems, serving billions of people, talking about what interests them in the field.
4) Advanced RL and DL Classes
OK, it may have taken you 2–3 years to get through those basics above (if you work on it part-time and have a job). After you feel comfortable with deep learning, I would recommend diving into advanced DL, reinforcement learning, and meta-learning. Below are the cutting-edge classes as of early 2021.
Foundation Models and Large Language Models (LLMs): Once you have basic ML down, this Stanford foundation model paper will get you up to the cutting edge on large-scale models that are transforming research and industry, starting with large language models (BERT, GPT-3) and evolving into multimodal large models (CLIP, XLM-R, etc). The entire Stanford AI department coauthored this 160-page paper, and it’s worth a close read. GPT-3 is the classic LLM, though recent work suggests Pathways/PaLM is the top LLM, with interesting work done on the emergent abilities of LLMs (magical!), and Scaling Instruction-Finetuned Language Models. If you want to see an LLM take on scientific papers, check out the Galactica experiment from Meta.
Transformers and BERT: The transformer model showed what attention alone could do and seems to have displaced LSTMs and RNNs; it’s the dominant architecture as of 2022 for billion-scale ML. The Illustrated Transformer, Transformers from Scratch, and Kaiser’s Google/transformer talk are good overviews, followed by the same for BERT and GPT-2. Then there’s RoBERTa with TPUs. You can learn more about GPT-2 here and then train your own model here and in this Colab GPT-2 notebook. If you just want to mess around without coding and play with the output of transformer models, you can Write With a Transformer here or even play the old-school AI Dungeon game here. Here is a cool HuggingTweet notebook to play with. Finally, if you’re operating at billion-scale, efficient transformers like the Reformer and Linformer will be important, though this recent 2022 Google “Efficient Transformer” paper suggests old-fashioned transformers work best.
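If you want the core idea in a few lines, here is scaled dot-product attention, the basic building block those overviews explain, written in plain NumPy (a pedagogical sketch, not production code):

```python
# Toy sketch: scaled dot-product attention.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                                        # weighted sum of values

# Toy example: 3 tokens with 4-dimensional representations.
Q = K = V = np.random.randn(3, 4)
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```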
Multimodal Deep Learning: The aim of this field is to learn one coherent model from multiple data types (video, audio, text, sensors, etc). Here is an early paper by Andrew Ng’s team and a 2017 survey of multimodal deep learning. If you want to focus on the hard problem of video modeling, an Amazon team put this tutorial together. More recently, check out CLIP and DALL-E from OpenAI.
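As a hands-on taste of multimodal models, here is a hedged sketch of scoring image-caption matches with CLIP through the transformers library; the model id and usage follow the common Hugging Face example, but check the current docs before relying on it, and the image path is a placeholder:

```python
# Toy sketch: rank captions against an image with CLIP.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")                      # any local image
captions = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)    # probability of each caption
print(dict(zip(captions, probs[0].tolist())))
```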
Variational Auto-encoders: Still an emerging field with lots of great ideas — start with this survey.
Hardware for Machine Learning: This is generally a neglected area, but this UC Berkeley ML hardware course is the best starting point. If you want to see how important hardware and GPUs are, check out this AI and Compute article from OpenAI. There is also a shorter, older overview from Cornell on AI hardware. Finally, using RL to design better chips, which can then power better ML, is a scarily good idea that I hope takes off, as this Google/Nature chip design paper shows. More references on hardware for machine learning are here.
Reinforcement Learning: Emma Brunskill has a great intro RL course at Stanford, and I would pair it with the Coursera Univ. of Alberta RL course taught by Martha and Adam White. If you want to see what an old-school RL course is like, check out the RL lectures by David Silver at DeepMind (this one is harder). Sutton and Barto have the classic textbook on RL.
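To connect the textbook to code, here is a minimal tabular Q-learning sketch in the spirit of Sutton and Barto (my own toy example, assuming the gymnasium package is installed; hyperparameters are illustrative, not tuned):

```python
# Toy sketch: tabular Q-learning on a small gridworld.
import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1", is_slippery=False)
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1               # learning rate, discount, exploration rate

for episode in range(2000):
    state, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        action = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Q-learning update: move Q(s,a) toward reward + gamma * max_a' Q(s',a')
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
print(Q.max())
```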
Graph Networks and Causality: I would check out Judea Pearl’s books and then Scholkopf’s Elements of Causal Inference and this Gentle Intro to Graph Embeddings. I think the classic course is Daphne Koller’s Coursera course on Probabilistic Graphical Models.
Privacy: As datasets become more pervasive and go deeper into our lives, we will need better privacy frameworks. Check out Roth and Kearns’ book The Ethical Algorithm, along with this course from Prof. Kamath on Differential Privacy.
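To make “differential privacy” concrete, here is the textbook Laplace mechanism in a few lines (a generic sketch, not taken from the course above): noise scaled to sensitivity/epsilon is added before an aggregate statistic is released.

```python
# Toy sketch: the Laplace mechanism for a differentially private count.
import numpy as np

def private_count(values, threshold, epsilon=1.0):
    true_count = sum(v > threshold for v in values)
    sensitivity = 1  # adding or removing one person changes a count by at most 1
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

ages = [23, 35, 41, 29, 62, 55]
print(private_count(ages, threshold=40))  # noisy count of people older than 40
```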
Advanced Deep Learning, Reinforcement Learning, and Deep RL: I would recommend the more hands-on Deep RL Book by Maxim Lapan, the Foundations of Deep RL book by some Googlers, and this Deep Reinforcement Learning survey paper. Hands down the best lecture course on Deep RL is the one by UC Berkeley professor Sergey Levine, who is a star researcher and a great teacher. There’s also this fun book on Generative Deep Learning. Finally, OpenAI recently released their own, interesting deep RL course, Spinning Up RL, and they have a page with top DeepRL papers to survey. DeepMind has its in-house course from 2018 on Advanced DL and RL. And now even Deep Unsupervised Learning is getting closer to its promise. Khan has an interesting tutorial for Deep learning with Bayes, and here is a practical paper on Bayesian deep learning.
Meta-learning: What AlphaGo and AlphaZero have shown is that the cutting edge is in Deep Multi-Task and Meta Learning (taught by Chelsea Finn at Stanford). See the slides from Finn’s CVPR 2019 tutorial on meta-learning here (and another tutorial from 2021 here). Also, check out this paper on variable-shot meta-learning from Levine’s group. This is the material I plan to work through in 2022 — it’s the edge of my circle of competence, so I will stop here.
Diffusion models: These are the latest type of models that let people create striking visuals from text descriptions using a text encoder, an image information creator, and an image decoder. A great hands-on first intro is the FAST.AI diffusion model course, which pairs well with Alammar’s Illustrated Stable Diffusion and Scale’s Practical Guide to Diffusion Models. Cohere has a good guide to prompt engineering. For more technical detail on how diffusion models differ from GANs, VAEs, and flow-based models, see Lilian Weng’s Diffusion Model Primer, or this HuggingFace annotated paper on Diffusion models.
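If you just want to run one, here is a hedged sketch of that three-part pipeline via the Hugging Face diffusers library; the model id and arguments follow common late-2022 examples, and these APIs move fast, so check the current docs:

```python
# Toy sketch: text-to-image generation with a pre-trained Stable Diffusion pipeline.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")                                    # needs a GPU; drop fp16/cuda for CPU

image = pipe("an astronaut riding a horse, watercolor").images[0]
image.save("astronaut.png")
```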
5) Ethics and AI/ML
Courses on AI/ML Ethics: Don’t build evil, predatory, or negligent AI. It takes some time to educate yourself. Some courses I’m aware of are: FastAI’s Practical Data Ethics course; Stanford’s CS182: Ethics, Public Policy, and Technological Change course taught by Rob Reich et al.; UC Berkeley’s Human Contexts and Ethics of Data; MIT Media Lab’s Ethics and Governance of AI (lectures are here); Fairness in ML course; Harvard Law School’s Ethics and Governance of AI course; UMass course on Ethical Issues Surrounding Artificial Intelligence Systems and Big Data; and this CMU course Truth, Justice, and Algorithms that covers selected theoretical topics at the interface of computer science and economics. These seminars by Timnit Gebru and Emily Denton on Data Ethics and algorithmic bias are decent (though I don’t agree with everything they argue for). Stanford recently launched CS 384: Ethical and Social Issues in Natural Language Processing — which covers some real-life topics I’ve encountered in NLP AI systems I’ve built. Finally, here is an Ethics and Governance of AI Reading List from the Berkman Klein Center at Harvard. For the really nerdy ones, here is a list of tech ethics courses with syllabi. A specific, important sub-field is data ethics, though going from problems and ethics to Structuring Techlaw is really tough. A final bagatelle: this paper on AI for Social Good.
Books on Algorithms and AI Ethics: Start with this HBR article “A Practical Guide to Building Ethical AI”, and then: Kearns and Roth’s The Ethical Algorithm; this MIT short book on AI Ethics (which is just ok); Rediet Abebe’s thesis on Designing Algorithms for Social Good; and Lucy Suchman’s Human-Machine Configurations. End with some more details on Fairness and Machine Learning. Here is a longer reading list on the ethics of AI/ML. There are some “bad takes” on AI ethics too (heavy on politics, light on ML systems analysis or ethics); for that I’d suggest Kate Crawford’s Atlas of AI or Bender’s “On the Dangers of Stochastic Parrots.”
Principles from US Institutes, Companies, and Governments: Google came out with some early Responsible AI practices, and this was matched with the Asilomar AI Principles that a smart, multi-disciplinary group of researchers came up with. The US Department of Defense (via its Defense Innovation Board) came out with these AI Principles (for warfighting!). Finally, here is an Ethics & Algorithms Toolkit, a risk management framework for governments put together by some data science researchers.
European AI Principles: Europe tends to take a heavier hand with AI ethics codes. The EU has its official stand in the EU Ethics Guidelines for Trustworthy AI. There are also the less onerous and somewhat vague OECD AI Principles and the quite good and detailed WEF report, Empowering AI Leadership: An Oversight Toolkit for Boards of Directors. The EU has recently doubled down with the heavy EU AI Act, which may stifle AI development in the EU more than gently regulate it. More EU AI Act detail here.
Chinese AI Ethics: Meanwhile, China is the only country on par with the US in AI (arguably better for AI in production tools, worse for AI research). First, there are the Beijing AI Principles, developed by the Beijing Academy of Artificial Intelligence (BAAI), an organization backed by the Chinese Ministry of Science and Technology and the Beijing municipal government. The code was developed in collaboration with the most prominent and important technical organizations and tech companies working on AI in China, including Peking University, Tsinghua University, the Institute of Automation and Institute of Computing Technology within the Chinese Academy of Sciences, and the country’s big three tech firms: Baidu, Alibaba, and Tencent. Second, there’s the Ministry of Science and Technology of the People’s Republic of China (MOST), which established a National Governance Committee for the New Generation Artificial Intelligence and released the Governance principles for the new generation of artificial intelligence — Developing responsible artificial intelligence. More here on Ethical Principles and Governance Technology Development of AI in China. Note that China released new AI regulations in 2022, and it’s hard to know if they are enforced seriously or are just a CCP power play, but the Chinese AI laws should not be ignored.
6) Papers to Read and People to Follow
How to Read a CS Paper: Great basic intro, in case it’s been a while. Here is star ML researcher and teacher Andrew Ng on how to read an ML/DL paper.
Getting Started with Deep Learning Papers: Some good advice on getting started. I would take it and then tackle this roadmap: Deep Learning Paper Reading Roadmap, and then check out these DL resources. While you’re at it, I would read these classics, the early Deep Learning Nature paper (not easy), and the DL overview by Schmidhuber. Here is a fun applied NEJM paper by Jeff Dean and others on Machine Learning in Medicine.
Older, detailed book/paper recs from UC Berkeley: useful if you want to dig deeper into any sub-topic of AI (agents, search, planning, NLP, etc); my warning is that many of these are dead branches of “good old fashioned AI” (GOFAI).
Year-end ML research reviews: The top labs in the world put out an annual, year-end research review that is worth reading, as are the top pick papers in conferences like ICML, NeurIPS, IJCAI, CVPR, ECCV, ACL, EMNLP, RecSys, ScaledML, etc. Check out 2019-2021 reviews at Google Research, Microsoft Research, FAIR, IDSIA, DAIR.AI, etc.
ML and Data Science Newsletters and Podcasts:
Jack Clark Import AI (from OpenAI)
Deeplearning.ai’s The Batch (from Andrew Ng’s team)
Lex Fridman Podcast (it’s fabulous!)
TWIML AI Podcast (more technical, more on MLops)
“Academic Twitter” is a great way to follow ML experts and research organizations:
Fei-Fei Li
Michael Nielsen
Chris Olah
Ian Goodfellow
Anima Anandkumar — see her great talks here.
KDnuggets