
Before we get into the meat of LLMs, let’s do a quick refresher on vectors, matrices, and tensors.

This subject can take up whole chapters of a math textbook, but we only need to know a few things:

If you know all that, feel free to skip this chapter.

Scalars, vectors, matrices, and tensors

For our purposes:

We refer to items within a vector by their index, starting with 1: $\mathbf{v}_1 = 1$ in the above.

We refer to items within a matrix by their row and column, in that order: $M_{1,2} = 5$.
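Most programming languages index from 0 rather than 1, which is an easy source of off-by-one errors when translating this notation into code. Here is a minimal sketch in plain Python, assuming vectors are lists and matrices are lists of rows. The helper names `vec_item` and `mat_item` and the example values are illustrative only:

```python
# Illustrative example values (not the ones from the definitions above).
v = [1, 2, 3]
M = [[1, 2, 3],
     [4, 5, 6]]

# Math notation counts from 1; Python lists count from 0, so subtract 1.
def vec_item(v, i):
    """Return v_i using the math convention (i starts at 1)."""
    return v[i - 1]

def mat_item(M, i, j):
    """Return M_{i,j}: row i first, then column j, both starting at 1."""
    return M[i - 1][j - 1]

print(vec_item(v, 1))     # 1
print(mat_item(M, 1, 2))  # 2
print(mat_item(M, 2, 1))  # 4
```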

If we think of vectors as “objects with a single index” and matrices as “objects with two indices”, a tensor just generalizes that to $N$ indices. That $N$ is called the tensor’s rank: a vector is a rank-1 tensor, and a matrix is a rank-2 tensor.
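To make “number of indices” concrete, suppose we store tensors as nested Python lists (an assumption for illustration; real tensor libraries store them differently). The rank is then how many times we have to index before reaching a plain number. The `rank` helper below is hypothetical and assumes regular, non-empty nesting:

```python
def rank(t):
    """Count how many indices it takes to reach a single number.

    Assumes t is a number or a regular (non-ragged), non-empty nested list.
    """
    r = 0
    while isinstance(t, list):
        r += 1
        t = t[0]  # descend one index level
    return r

scalar = 7                                      # no index needed
vector = [1, 2, 3]                              # v[i]
matrix = [[1, 2, 3], [4, 5, 6]]                 # M[i][j]
tensor3 = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # X[k][i][j]

print(rank(scalar), rank(vector), rank(matrix), rank(tensor3))  # 0 1 2 3
```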

We call the number of elements in a vector its size, or its dimensionality. The terms are essentially interchangeable, though I tend to use “size” when talking about the mechanics of math operations, and “dimensionality” when talking about how much information the vector carries.

Lastly, we can treat a vector of size $d$ as a matrix of size $1 \times d$ or $d \times 1$.

$$
\underbrace{ \begin{vmatrix} 1 & 2 & 3 \end{vmatrix} }_{\text{size 3 vector}} \longleftrightarrow \underbrace{ \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} }_{1 \times 3 \text{ matrix}} \longleftrightarrow \underbrace{ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} }_{3 \times 1 \text{ matrix}}
$$
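Assuming matrices are stored as Python lists of rows, the two matrix views of a size-3 vector can be sketched like this (the `shape` helper is illustrative, not from any library):

```python
v = [1, 2, 3]            # size-3 vector: a single index

row = [v]                # 1 x 3 matrix: one row holding three columns
col = [[x] for x in v]   # 3 x 1 matrix: three rows of one column each

def shape(m):
    """Return (rows, columns) of a list-of-rows matrix."""
    return (len(m), len(m[0]))

print(shape(row), row)  # (1, 3) [[1, 2, 3]]
print(shape(col), col)  # (3, 1) [[1], [2], [3]]
```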

Math operations

Overview

There are just a few operations we’ll need to understand:

Note that all of these operations work on vectors or matrices, not higher-rank tensors. When we work with higher-rank tensors, we’ll use some of the indices to slice those tensors into vectors or matrices, and then apply these operations to those slices. For example, given a rank-3 tensor $X_{k,i,j}$, we can think of each $X_{k,\text{ }\dots}$ as a matrix, and then apply some matrix operation to each one.
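A minimal sketch of that slicing idea, assuming a rank-3 tensor is stored as a nested Python list indexed `X[k][i][j]` (0-based, unlike the math notation). Each `X[k]` is an ordinary matrix, so we can apply a matrix operation, here transposition, to every slice. The `transpose` helper is hypothetical, written just for this example:

```python
def transpose(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(column) for column in zip(*m)]

# Rank-3 tensor X[k][i][j]: two 2 x 3 matrices stacked along the first index k.
X = [
    [[1, 2, 3],
     [4, 5, 6]],
    [[7, 8, 9],
     [10, 11, 12]],
]

# Fix k, treat X[k] as a matrix, and apply the matrix operation to that slice.
Y = [transpose(X_k) for X_k in X]

print(Y[0])  # [[1, 4], [2, 5], [3, 6]]
print(Y[1])  # [[7, 10], [8, 11], [9, 12]]
```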

dot products

Combines two vectors into a single scalar. Both vectors must be the same length.

$$
\mathbf{v} \cdot \mathbf{w} = \text{scalar number}
$$

matrix multiplication

Combines two matrices into another matrix. The number of columns in the first matrix must equal the number of rows in the second. The result has the same number of rows as the first matrix, and the same number of columns as the second.

$$
A_{ \underline{a} \times b } \cdot B_{ b \times \underline{c} } = C_{ \underline{a} \times \underline{c} }
$$

The expression $A \cdot B$ can also be written as just $AB$.

transposition

Swaps a matrix’s rows and columns, which you can visualize as flipping along its ╲ diagonal. This is denoted as $A^T$.

$$
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
$$

adding matrices

We can add two or more matrices as long as they’re the same size. This just means adding their corresponding elements:

$$
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} + \begin{bmatrix} 10 & 20 & 30 \\ 40 & 50 & 60 \end{bmatrix} = \begin{bmatrix} 11 & 22 & 33 \\ 44 & 55 & 66 \end{bmatrix}
$$

Adding vectors works the same way: we just treat an $n$-vector as a $1 \times n$ matrix:

$$
\begin{bmatrix} 1 & 2 & 3 \end{bmatrix} + \begin{bmatrix} 10 & 20 & 30 \end{bmatrix} = \begin{bmatrix} 11 & 22 & 33 \end{bmatrix}
$$

multiplying by a scalar

We can multiply a matrix (or vector) by a scalar, which just means applying the multiplication to each element:

$$
10 \cdot \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 10 & 20 & 30 \\ 40 & 50 & 60 \end{bmatrix}
$$
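Transposition, addition, and scalar multiplication are each a one-liner to write by hand. This is a sketch in plain Python assuming matrices are lists of rows; the helper names are illustrative, and the printed values reproduce the worked examples in the list above:

```python
def transpose(m):
    """Swap rows and columns (flip along the main diagonal)."""
    return [list(column) for column in zip(*m)]

def add(a, b):
    """Element-wise sum; both matrices must be the same size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must be the same size")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(s, m):
    """Multiply every element of m by the scalar s."""
    return [[s * x for x in row] for row in m]

A = [[1, 2, 3], [4, 5, 6]]
B = [[10, 20, 30], [40, 50, 60]]

print(transpose(A))  # [[1, 4], [2, 5], [3, 6]]
print(add(A, B))     # [[11, 22, 33], [44, 55, 66]]
print(scale(10, A))  # [[10, 20, 30], [40, 50, 60]]

# A vector is treated as a 1 x n matrix, so add() works on vectors too.
print(add([[1, 2, 3]], [[10, 20, 30]]))  # [[11, 22, 33]]
```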

Dot products

A dot product combines two vectors of the same size (dimensionality) into a single number.

The two vectors are often represented as a horizontal vector on the left and a vertical vector on the right, but that’s just a convention. It only matters that they have the same number of elements.

The dot product is simply a sum of terms, where each term is the product of the two vectors’ corresponding elements:

$$
\begin{bmatrix} \textcolor{red}{a} & \textcolor{limegreen}{b} & \textcolor{goldenrod}{c} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{red}{\alpha} \\ \textcolor{limegreen}{\beta} \\ \textcolor{goldenrod}{\gamma} \end{bmatrix} = \textcolor{red}{a \cdot \alpha} + \textcolor{limegreen}{b \cdot \beta} + \textcolor{goldenrod}{c \cdot \gamma}
$$

If the two vectors are normalized to have the same magnitude, the dot product measures how aligned they are: higher values mean more aligned.
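A minimal sketch of the dot product in plain Python; `dot` and `normalize` are illustrative helpers, not library functions. The second half scales vectors to magnitude 1 so the alignment reading is visible: the same direction gives 1, a 45° angle gives about 0.707, and the opposite direction gives -1:

```python
import math

def dot(v, w):
    """Sum of the products of corresponding elements."""
    if len(v) != len(w):
        raise ValueError("vectors must be the same size")
    return sum(a * b for a, b in zip(v, w))

def normalize(v):
    """Scale v so its magnitude (length) is 1."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32

a = normalize([1, 0])
same = normalize([2, 0])       # same direction as a, different length
diagonal = normalize([1, 1])   # 45 degrees away from a
opposite = normalize([-3, 0])  # points the other way

print(dot(a, same))      # 1.0
print(dot(a, diagonal))  # ~0.707
print(dot(a, opposite))  # -1.0
```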

Matrix multiplication

In the matrix multiplication of two matrices $A$ and $B$, each cell $(i, j)$ is the dot product of the corresponding row $i$ from $A$ and column $j$ from $B$. This produces a thorough mixing of the two inputs: every row from $A$ gets combined with every column from $B$.

$$
\begin{aligned}
C &= \begin{bmatrix} \textcolor{steelblue}{A_{1,1}} & \textcolor{steelblue}{A_{1,2}} \\ \textcolor{limegreen}{A_{2,1}} & \textcolor{limegreen}{A_{2,2}} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{red}{B_{1,1}} & \textcolor{goldenrod}{B_{1,2}} \\ \textcolor{red}{B_{2,1}} & \textcolor{goldenrod}{B_{2,2}} \end{bmatrix} \\[2.5em]
&= \begin{bmatrix} \begin{bmatrix} \textcolor{steelblue}{A_{1,1}} & \textcolor{steelblue}{A_{1,2}} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{red}{B_{1,1}} \\ \textcolor{red}{B_{2,1}} \end{bmatrix} \quad & \begin{bmatrix} \textcolor{steelblue}{A_{1,1}} & \textcolor{steelblue}{A_{1,2}} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{goldenrod}{B_{1,2}} \\ \textcolor{goldenrod}{B_{2,2}} \end{bmatrix} \\[2.25em] \begin{bmatrix} \textcolor{limegreen}{A_{2,1}} & \textcolor{limegreen}{A_{2,2}} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{red}{B_{1,1}} \\ \textcolor{red}{B_{2,1}} \end{bmatrix} \quad & \begin{bmatrix} \textcolor{limegreen}{A_{2,1}} & \textcolor{limegreen}{A_{2,2}} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{goldenrod}{B_{1,2}} \\ \textcolor{goldenrod}{B_{2,2}} \end{bmatrix} \end{bmatrix} \\[4em]
&= \begin{bmatrix} \textcolor{steelblue}{A_{1,1}} \cdot \textcolor{red}{B_{1,1}} + \textcolor{steelblue}{A_{1,2}} \cdot \textcolor{red}{B_{2,1}} & \quad \textcolor{steelblue}{A_{1,1}} \cdot \textcolor{goldenrod}{B_{1,2}} + \textcolor{steelblue}{A_{1,2}} \cdot \textcolor{goldenrod}{B_{2,2}} \\[1em] \textcolor{limegreen}{A_{2,1}} \cdot \textcolor{red}{B_{1,1}} + \textcolor{limegreen}{A_{2,2}} \cdot \textcolor{red}{B_{2,1}} & \quad \textcolor{limegreen}{A_{2,1}} \cdot \textcolor{goldenrod}{B_{1,2}} + \textcolor{limegreen}{A_{2,2}} \cdot \textcolor{goldenrod}{B_{2,2}} \end{bmatrix}
\end{aligned}
$$

When you multiply matrices:

For example:

$$
\begin{aligned}
&\begin{bmatrix} \textcolor{steelblue}{1} & \textcolor{steelblue}{2} \\ \textcolor{limegreen}{3} & \textcolor{limegreen}{4} \end{bmatrix} \begin{bmatrix} \textcolor{red}{5} & \textcolor{goldenrod}{6} \\ \textcolor{red}{7} & \textcolor{goldenrod}{8} \end{bmatrix} \\[1.5em]
=&\begin{bmatrix} \begin{bmatrix} \textcolor{steelblue}{1} & \textcolor{steelblue}{2} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{red}{5} \\ \textcolor{red}{7} \end{bmatrix} & \quad \begin{bmatrix} \textcolor{steelblue}{1} & \textcolor{steelblue}{2} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{goldenrod}{6} \\ \textcolor{goldenrod}{8} \end{bmatrix} \\[1.25em] \begin{bmatrix} \textcolor{limegreen}{3} & \textcolor{limegreen}{4} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{red}{5} \\ \textcolor{red}{7} \end{bmatrix} & \quad \begin{bmatrix} \textcolor{limegreen}{3} & \textcolor{limegreen}{4} \end{bmatrix} \cdot \begin{bmatrix} \textcolor{goldenrod}{6} \\ \textcolor{goldenrod}{8} \end{bmatrix} \end{bmatrix} \\[1.5em]
=&\begin{bmatrix} \textcolor{steelblue}{1} \cdot \textcolor{red}{5} + \textcolor{steelblue}{2} \cdot \textcolor{red}{7} & \textcolor{steelblue}{1} \cdot \textcolor{goldenrod}{6} + \textcolor{steelblue}{2} \cdot \textcolor{goldenrod}{8} \\ \textcolor{limegreen}{3} \cdot \textcolor{red}{5} + \textcolor{limegreen}{4} \cdot \textcolor{red}{7} & \textcolor{limegreen}{3} \cdot \textcolor{goldenrod}{6} + \textcolor{limegreen}{4} \cdot \textcolor{goldenrod}{8} \end{bmatrix} \\[1.5em]
=&\begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}
\end{aligned}
$$
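The row-times-column rule translates almost directly into code. This is a sketch in plain Python assuming matrices are lists of rows; `matmul` is an illustrative helper, not a library function. It reproduces the worked example above and enforces the shape rule (the first matrix's column count must equal the second's row count):

```python
def matmul(A, B):
    """Multiply an a x b matrix A by a b x c matrix B, giving an a x c matrix."""
    a, b = len(A), len(A[0])
    if len(B) != b:
        raise ValueError("A's column count must equal B's row count")
    c = len(B[0])
    # Cell (i, j) is the dot product of row i of A with column j of B.
    return [
        [sum(A[i][k] * B[k][j] for k in range(b)) for j in range(c)]
        for i in range(a)
    ]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]

# Shapes need not be square: (2 x 3) times (3 x 1) gives a 2 x 1 result.
print(matmul([[1, 2, 3], [4, 5, 6]], [[1], [0], [2]]))  # [[7], [16]]
```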