02 - Tensors¶
What this session is¶
About 45 minutes. Tensors are PyTorch's central data type - multi-dimensional arrays, with support for GPU acceleration and automatic differentiation. Almost every line of PyTorch code touches tensors.
What a tensor is¶
A tensor is a generalized array:
- A 0-dimensional tensor is a single number (scalar): 7.
- A 1-D tensor is a vector: [1, 2, 3].
- A 2-D tensor is a matrix: [[1, 2], [3, 4]].
- A 3-D tensor is a cube of numbers. (Often used for color images: [height, width, channels].)
- Higher: a batch of images, a batch of token sequences, etc.
Every tensor has a shape (its size per dimension) and a dtype (the type of each element).
Creating tensors¶
import torch
# From a Python list
a = torch.tensor([1, 2, 3])
print(a.shape, a.dtype) # torch.Size([3]) torch.int64
# As floats
b = torch.tensor([1.0, 2.0, 3.0])
print(b.shape, b.dtype) # torch.Size([3]) torch.float32
# Zeros, ones, random
z = torch.zeros(2, 3) # 2x3 of zeros
o = torch.ones(2, 3)
r = torch.randn(2, 3) # random normal (mean=0, std=1)
u = torch.rand(2, 3) # random uniform [0, 1)
i = torch.arange(0, 10) # 0, 1, 2, ..., 9
# An identity matrix
I = torch.eye(4)
Default float dtype is float32. Default int dtype is int64. You can specify a different one with the dtype argument, e.g. torch.tensor([1, 2, 3], dtype=torch.float64).
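A quick sketch of overriding the inferred dtype, both at creation time and by converting an existing tensor:

```python
import torch

# Override the inferred dtype at creation time
f64 = torch.tensor([1, 2, 3], dtype=torch.float64)
print(f64.dtype)           # torch.float64

# Or convert an existing tensor (conversion returns a new tensor)
i = torch.arange(5)        # int64 by default
f = i.float()              # float32 copy
print(i.dtype, f.dtype)    # torch.int64 torch.float32
```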
Shape and reshape¶
a = torch.arange(12)
print(a.shape) # torch.Size([12])
b = a.reshape(3, 4) # 3x4 matrix
print(b.shape) # torch.Size([3, 4])
c = a.reshape(2, 2, 3) # 2x2x3 tensor
print(c.shape) # torch.Size([2, 2, 3])
reshape(...) doesn't copy data when it can avoid it - it just changes the "view" on the underlying buffer.
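You can see the shared-buffer behavior directly: writing through the reshaped tensor changes the original.

```python
import torch

a = torch.arange(6)
b = a.reshape(2, 3)   # a view onto the same storage (no copy here)
b[0, 0] = 100         # writing through the view...
print(a[0])           # tensor(100) ...changes the original
```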
The -1 placeholder means "infer this dimension":
a = torch.arange(12)
b = a.reshape(-1, 4) # 3 rows of 4 (12/4)
c = a.reshape(2, -1) # 2 rows of 6 (12/2)
Useful in functions where you know all but one dimension.
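A common case: keep the batch dimension and flatten everything else, without hard-coding the per-sample size.

```python
import torch

batch = torch.randn(8, 3, 4)              # e.g. 8 samples of shape 3x4
flat = batch.reshape(batch.shape[0], -1)  # keep the batch dim, infer the rest
print(flat.shape)                         # torch.Size([8, 12])
```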
Indexing¶
Like NumPy, like Python lists, but extended:
m = torch.arange(12).reshape(3, 4)
# m is:
# tensor([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]])
m[0] # first row: tensor([0, 1, 2, 3])
m[0, 0] # first element: tensor(0)
m[:, 0] # first column: tensor([0, 4, 8])
m[1:, 2:] # rows 1+, cols 2+: tensor([[6, 7], [10, 11]])
m[0:2, 0:2] # 2x2 top-left
Slicing returns a view (shares storage). Modifying the slice modifies the original. Use .clone() if you need an independent copy.
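A small demonstration of the view-vs-copy distinction:

```python
import torch

m = torch.arange(12).reshape(3, 4)
row = m[0]           # a view: shares storage with m
row[0] = -1
print(m[0, 0])       # tensor(-1): the original changed

safe = m[1].clone()  # an independent copy
safe[0] = 999
print(m[1, 0])       # tensor(4): the original is untouched
```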
Arithmetic¶
Element-wise:
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
print(a + b) # [5, 7, 9]
print(a * b) # [4, 10, 18] element-wise multiply
print(a ** 2) # [1, 4, 9]
print(torch.exp(a)) # [e^1, e^2, e^3]
print(torch.sin(a))
Reductions (collapse a dimension):
m = torch.randn(3, 4)
m.sum() # scalar
m.sum(dim=0) # column sums (4 values)
m.sum(dim=1) # row sums (3 values)
m.mean()
m.max()
m.argmax() # index of maximum
dim= is the dimension to reduce over. dim=0 collapses the rows; dim=1 collapses the columns. Confusing the first time; you'll internalize it.
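A small concrete example makes the dim= behavior easier to see:

```python
import torch

m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])  # shape (2, 3)

print(m.sum(dim=0))  # tensor([5., 7., 9.])  - collapses rows: one sum per column
print(m.sum(dim=1))  # tensor([6., 15.])     - collapses columns: one sum per row
print(m.argmax())    # tensor(5) - index into the flattened tensor
```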
Matrix multiplication¶
The most-used operation in ML. It is not the same as element-wise *.
A = torch.randn(2, 3) # 2x3
B = torch.randn(3, 4) # 3x4
C = A @ B # 2x4 - matrix multiply
# or: torch.matmul(A, B)
The @ operator is matrix multiply. Two requirements:
- Inner dimensions match: (2, 3) @ (3, 4) works because both have 3 in the middle.
- Result is the outer dimensions: (2, 3) @ (3, 4) → (2, 4).
If they don't match, you get an error. Get the dimensions right first; everything else follows.
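A sketch of what the mismatch looks like in practice. PyTorch raises a RuntimeError that names both shapes, which makes these errors quick to diagnose:

```python
import torch

A = torch.randn(2, 3)
B = torch.randn(4, 5)            # inner dims 3 and 4 do not match
try:
    A @ B
except RuntimeError as e:
    print("shape error:", e)     # the message reports both shapes

# Fixing the inner dimension makes it work:
C = A @ torch.randn(3, 5)
print(C.shape)                   # torch.Size([2, 5])
```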
Broadcasting¶
When you operate on tensors of different shapes, PyTorch tries to make them match by broadcasting the smaller one along the matching dimensions:
a = torch.tensor([[1, 2, 3], [4, 5, 6]]) # shape (2, 3)
b = torch.tensor([10, 20, 30]) # shape (3,)
print(a + b)
# tensor([[11, 22, 33],
# [14, 25, 36]])
b was broadcast across the rows of a. Equivalent to adding [10, 20, 30] to each row.
The rules are precise, but the intuition is: align shapes from the right; dimensions of size 1 (or missing entirely) are stretched by repeating until they match.
When in doubt, print shapes. Most "shape mismatch" errors come from this; once you see the shapes, the fix is usually obvious.
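One broadcasting idiom worth knowing early: keepdim=True keeps a size-1 dimension so a reduction can broadcast back against the original tensor. Here it centers each row of a matrix:

```python
import torch

m = torch.randn(3, 4)

# m.mean(dim=1, keepdim=True) has shape (3, 1); aligned from the right
# against (3, 4), the size-1 dim broadcasts across the 4 columns.
centered = m - m.mean(dim=1, keepdim=True)
print(centered.shape)        # torch.Size([3, 4])
print(centered.mean(dim=1))  # approximately 0 for every row
```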
Move tensors to GPU¶
device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
print("using:", device)
a = torch.randn(1000, 1000).to(device)
b = torch.randn(1000, 1000).to(device)
c = a @ b # runs on GPU if device is cuda/mps
Tensors and operations have to be on the same device. Mixing CPU and GPU tensors raises an error.
Common idiom: define device once at the top of the script; .to(device) every tensor you create.
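A minimal sketch of that idiom (falls back to CPU when no accelerator is available, so it runs anywhere):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(4, 4).to(device)       # created, then moved
y = torch.randn(4, 4, device=device)   # or created directly on the device
z = x @ y                              # both on the same device: no mismatch
print(z.device)
```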
NumPy interop¶
PyTorch tensors and NumPy arrays interoperate:
import numpy as np
n = np.array([1, 2, 3])
t = torch.from_numpy(n) # tensor sharing memory with the array
back = t.numpy() # numpy array sharing memory with the tensor
If the tensor is on CPU, this is free (no copy). On GPU, you have to .cpu() first.
NumPy is the older sibling - PyTorch borrows most of its API conventions from NumPy. If you've used NumPy, PyTorch tensors will feel familiar.
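The memory sharing goes both ways, which is worth seeing once:

```python
import numpy as np
import torch

n = np.array([1, 2, 3])
t = torch.from_numpy(n)
n[0] = 99       # modify the NumPy array...
print(t[0])     # tensor(99): the tensor sees it (shared memory, no copy)
```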
Exercise¶
In a new script tensor_practice.py:
- Create a 5×3 tensor of random normal values. Print its shape and mean.
- Create the same tensor and add 1.0 to every element. (Hint: just tensor + 1.)
- Create a 3×3 identity matrix; create another 3×3 matrix with torch.arange(9).reshape(3, 3).float(). Multiply them with @. Result?
- Create a = torch.arange(20).reshape(4, 5). Get the third row. Get the second column. Get the bottom-right 2×2 submatrix.
- Broadcasting: create a = torch.zeros(3, 4) and b = torch.tensor([1, 2, 3, 4]). Compute a + b. What shape? What values?
- GPU (if available): create two 1000×1000 random matrices. Time how long a @ b takes on CPU vs your device. Use time.time() around the multiplications.
What you might wonder¶
"Why are tensors not just NumPy arrays?" PyTorch tensors add: GPU support, automatic differentiation (page 04), automatic device placement, and a richer API for ML-specific operations. They're NumPy++.
"What's float32 vs float16 vs bfloat16?"
Number formats with different precision/memory trade-offs. float32 (FP32) is the default - 4 bytes per number, lots of precision. float16 and bfloat16 are half-precision (2 bytes); used heavily in training large models for memory savings. Modern GPUs (Volta+) have tensor cores that specifically accelerate these.
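You can check the per-element memory cost directly with element_size():

```python
import torch

x32 = torch.randn(1000, dtype=torch.float32)
x16 = x32.to(torch.float16)    # half precision: half the memory
xbf = x32.to(torch.bfloat16)   # same 2 bytes, but a wider exponent range

print(x32.element_size())  # 4 bytes per element
print(x16.element_size())  # 2
print(xbf.element_size())  # 2
```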
"Why both reshape and view?"
view requires the data to be contiguous in memory. reshape may copy if needed. Prefer reshape; reach for view only when you've measured it matters.
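The contiguity difference shows up with transposes, which reorder how storage is read without moving any data:

```python
import torch

a = torch.arange(6).reshape(2, 3)
t = a.T                    # transpose: same storage, non-contiguous layout
print(t.is_contiguous())   # False

try:
    t.view(6)              # view cannot rearrange non-contiguous memory
except RuntimeError as e:
    print("view failed:", e)

flat = t.reshape(6)        # reshape copies when it has to
print(flat)                # tensor([0, 3, 1, 4, 2, 5])
```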
"My tensors are on different devices and I'm confused."
Set a device = ... constant at the top of your script. Always .to(device) after creation. This rule alone eliminates 80% of device-mismatch bugs.
Done¶
- Create tensors with various constructors.
- Reshape, index, slice.
- Use element-wise arithmetic, matrix multiplication.
- Use broadcasting confidently.
- Move tensors between CPU and GPU.
Next: Linear algebra you actually need →