TODO
- “Why we focused on this”: modern LLMs (GPT, Claude, Llama) all use decoder-only architectures.
- “If you want to learn more about X”: pointers to other architectures and their use cases.
This would also be a good place to mention:
- Why encoder-decoder exists (translation and summarization, where the input and output are separate texts); see the cross-attention sketch after this list.
- Why BERT exists (bidirectional context for classification/understanding tasks).
- How your decoder-only model is optimized specifically for generation; the mask sketch after this list contrasts the two attention patterns.
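
A minimal sketch of the encoder-decoder point above, assuming PyTorch's `nn.MultiheadAttention` (the shapes and variable names here are illustrative, not from the post): the decoder's cross-attention lets every output position attend over the whole encoded input, which is why the architecture fits tasks where input and output are separate sequences.

```python
import torch
import torch.nn as nn

d_model, n_heads = 64, 4
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

# Encoded source sentence (e.g. the French input to a translator).
src = torch.randn(1, 10, d_model)
# Decoder states for the target tokens generated so far.
tgt = torch.randn(1, 7, d_model)

# Queries come from the target, keys/values from the source, so every
# target position can look at all 10 source positions.
out, attn = cross_attn(query=tgt, key=src, value=src)
print(out.shape)   # torch.Size([1, 7, 64])
print(attn.shape)  # torch.Size([1, 7, 10]) -- target-by-source attention map
```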
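And a companion sketch contrasting the attention patterns from the last two bullets (again an assumption-laden illustration, not the post's code): BERT attends bidirectionally over the whole input, while a decoder-only model applies a causal mask so each token sees only its past, which is exactly the property that makes autoregressive generation work.

```python
import torch

seq_len = 5

# BERT-style (bidirectional): no mask at all -- every token attends to
# every other token, left and right. Ideal for classification and other
# understanding tasks where the whole input is available at once.
bidirectional_mask = torch.zeros(seq_len, seq_len)

# Decoder-only (GPT-style): -inf above the diagonal, so token i can only
# attend to positions <= i. The model never sees "the future", which is
# what lets it generate text one token at a time.
causal_mask = torch.triu(
    torch.full((seq_len, seq_len), float("-inf")), diagonal=1
)

print(causal_mask)
# tensor([[0., -inf, -inf, -inf, -inf],
#         [0., 0., -inf, -inf, -inf],
#         [0., 0., 0., -inf, -inf],
#         [0., 0., 0., 0., -inf],
#         [0., 0., 0., 0., 0.]])
```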
It’s the bookend to your introduction: the intro sets the scope at the start, and this section provides the wider context at the end.