
What is this book, and who’s it for?

“You don’t really understand something until you can explain it.”

This book is my attempt to synthesize my understanding of how LLMs work. It’s based on my reading of Build a Large Language Model (From Scratch) by Sebastian Raschka, as well as a lot of back and forth with AI chatbots to help me through the things I didn’t understand.

I wrote this book for myself, because there’s no better way to make sure you’ve learned something than to try to explain it. But it’s my hope that others may find it useful as well.

Feedback encouraged!

The bottom of the main nav (either the left pane or the icon in the top bar, depending on your screen size) includes a feedback form. It's an anonymous Google Sheets form: I don't track your email when you submit, and you don't need to be logged in.

I welcome any corrections, comments, or questions. Please include the page and chapter you were on, since the form won't capture them automatically.

The term “LLM”

LLMs — large language models — encompass a range of technologies. These include models that generate text, but also translation tools, classification tools, and others.

There are various architectures under the LLM umbrella, such as BERT (I’ll cover some of these in Other LLM models). But when most people talk about “LLMs”, they really mean the ones that can generate text and images — and specifically, an LLM architecture called Generative Pre-trained Transformer, or GPT.

Following that colloquial usage, this book will use “LLM” and “GPT” interchangeably.

Organization

Parts in the journey

I find it useful to think about LLMs in three hierarchical perspectives:

  1. The fundamental concepts

  2. Algebraic reformulations of those concepts

  3. The actual implementation

This book will primarily focus on the first two perspectives. It leaves the third essentially untouched, though I wrote an implementation of a GPT-2 LLM based on this book. (Let me know if you’d like me to tie this implementation more closely to the book!)

For more implementation details, you should refer to resources like Sebastian Raschka’s Build a Large Language Model (From Scratch) or Hugging Face’s course (which I haven’t read, but I hear good things about).

The book is organized into four parts:

  1. Introduction (you are here), which includes a very high-level overview of LLMs and a quick refresher on vectors and matrices

  2. The LLM, which will walk you through the architecture of an LLM from 0 to 60

  3. Training, which will discuss how an LLM learns the values that drive that architecture

  4. Further reading, which will talk about modern improvements to the LLM, as well as other, related ML technologies.

This book is meant to be read front-to-back

The driving principle behind this book's organization is that you should be able to read it front-to-back. This means each chapter builds only on material that came before it; you should never need to flip ahead to follow the discussion.

That said, I’ll sometimes need to tease ahead to topics that I’ll discuss in detail later. When I do, I’ll try to give just enough context to make the current thing I’m explaining make sense. I’ll provide cross-reference links where relevant, but you don’t need to click through to them.

(Of course, human learning being the way it is, you may still need to refer back to a section you've already read to remind yourself of it. Basically: yes to flipping back, no to flipping ahead.)

Callouts

Throughout the book, I'll use callouts: highlighted blocks set off from the main text. Some of these will be collapsed and expandable; others are just visual blocks.

What I assume about you

This book assumes high school math. Maybe a bit more, but not much.

The most advanced math topics are vectors and matrices, and even for those, the book includes an overview of what you need to know. Tensors also get a glancing mention, but again, I'll explain just what you need from them.
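In case a concrete picture helps, here's a minimal sketch of what these three objects look like in code. I'm using PyTorch purely as an illustration (an assumption on my part; you don't need to run any code to follow this book):

```python
import torch

# A vector: a 1-D array of numbers (one index).
v = torch.tensor([1.0, 2.0, 3.0])

# A matrix: a 2-D array of numbers (two indices: row, column).
M = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

# A tensor generalizes both to any number of dimensions.
# This one is 3-D: think "2 stacked 2x2 matrices".
T = torch.zeros(2, 2, 2)

print(v.shape)  # torch.Size([3])
print(M.shape)  # torch.Size([2, 3])
print(T.shape)  # torch.Size([2, 2, 2])

# Matrix-vector multiplication, the workhorse operation inside an LLM.
print(M @ v)    # tensor([14., 32.])
```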

It’s also helpful to have familiarity with derivatives, but you won’t have to know the nitty-gritty.

That said, this book will be getting into the specific math behind LLMs, so the more comfortable you are with math, the easier it’ll likely be to follow along.

Contributions

The source for this book is on my GitHub. Please feel free to suggest corrections there, especially if I got something factually wrong.