02 - Tensors

What this session is

About 45 minutes. Tensors are PyTorch's central data type - multi-dimensional arrays, with support for GPU acceleration and automatic differentiation. Almost every line of PyTorch code touches tensors.

What a tensor is

A tensor is a generalized array:

  • A 0-dimensional tensor is a single number (scalar): 7.
  • A 1-D tensor is a vector: [1, 2, 3].
  • A 2-D tensor is a matrix: [[1, 2], [3, 4]].
  • A 3-D tensor is a cube of numbers. (Often used for color images. PyTorch's convention is [channels, height, width]; many image libraries load [height, width, channels].)
  • Higher: a batch of images, a batch of token sequences, etc.

Every tensor has a shape (its size per dimension) and a dtype (the type of each element).
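
As a quick check of the list above, the sketch below builds one tensor per dimensionality and prints its ndim, shape, and dtype. The variable names are illustrative only.

```python
import torch

scalar = torch.tensor(7)                    # 0-D
vector = torch.tensor([1, 2, 3])            # 1-D
matrix = torch.tensor([[1, 2], [3, 4]])     # 2-D
cube = torch.zeros(2, 3, 4)                 # 3-D

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # ndim is the number of dimensions; shape lists the size of each one
    print(name, t.ndim, tuple(t.shape), t.dtype)
```

Note that a 0-D tensor has an empty shape, and that integer inputs give an integer dtype.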

Creating tensors

import torch

# From a Python list
a = torch.tensor([1, 2, 3])
print(a.shape, a.dtype)            # torch.Size([3]) torch.int64

# As floats
b = torch.tensor([1.0, 2.0, 3.0])
print(b.shape, b.dtype)            # torch.Size([3]) torch.float32

# Zeros, ones, random
z = torch.zeros(2, 3)              # 2x3 of zeros
o = torch.ones(2, 3)
r = torch.randn(2, 3)              # random normal (mean=0, std=1)
u = torch.rand(2, 3)               # random uniform [0, 1)
i = torch.arange(0, 10)            # 0, 1, 2, ..., 9

# An identity matrix
I = torch.eye(4)

Default float dtype is float32. Default int dtype is int64. You can specify:

x = torch.zeros(2, 3, dtype=torch.float16)
y = torch.tensor([1, 2, 3], dtype=torch.int32)
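
To see what a dtype choice costs in memory, element_size() reports the bytes used per element. A minimal sketch, with illustrative variable names:

```python
import torch

x32 = torch.zeros(2, 3)                             # default float32
x16 = torch.zeros(2, 3, dtype=torch.float16)
i64 = torch.tensor([1, 2, 3])                       # default int64
i32 = torch.tensor([1, 2, 3], dtype=torch.int32)

for t in (x32, x16, i64, i32):
    # element_size() is bytes per element; nelement() is the element count
    total = t.element_size() * t.nelement()
    print(t.dtype, t.element_size(), "bytes each,", total, "bytes total")

# Convert an existing tensor to another dtype with .to(dtype)
as_float = i64.to(torch.float32)
print(as_float.dtype)
```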

Shape and reshape

a = torch.arange(12)
print(a.shape)                     # torch.Size([12])

b = a.reshape(3, 4)                # 3x4 matrix
print(b.shape)                     # torch.Size([3, 4])

c = a.reshape(2, 2, 3)             # 2x2x3 tensor
print(c.shape)                     # torch.Size([2, 2, 3])

reshape(...) doesn't copy data when it can avoid it - it just changes the "view" on the underlying buffer.
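
One way to observe this is to compare data_ptr(), the address of a tensor's first element. The sketch below shows the no-copy case, then a case where a copy is unavoidable because the input was transposed first. Variable names are illustrative.

```python
import torch

a = torch.arange(6)
b = a.reshape(2, 3)

# Contiguous input: reshape returns a view on the same buffer
same_buffer = a.data_ptr() == b.data_ptr()
print(same_buffer)                 # True

b[0, 0] = 100                      # writing through the view...
print(a[0].item())                 # ...changes the original: 100

# A transposed tensor is non-contiguous, so flattening it needs a copy
t = b.t()
flat = t.reshape(6)
copied = flat.data_ptr() != a.data_ptr()
print(t.is_contiguous(), copied)   # False True
```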

The -1 placeholder means "infer this dimension":

a = torch.arange(12)
b = a.reshape(-1, 4)               # 3 rows of 4 (12/4)
c = a.reshape(2, -1)               # 2 rows of 6 (12/2)

Useful in functions where you know all but one dimension.

Indexing

Like NumPy, like Python lists, but extended:

m = torch.arange(12).reshape(3, 4)
# m is:
# tensor([[ 0,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, 10, 11]])

m[0]              # first row: tensor([0, 1, 2, 3])
m[0, 0]           # first element: tensor(0)
m[:, 0]           # first column: tensor([0, 4, 8])
m[1:, 2:]         # rows 1+, cols 2+: tensor([[6, 7], [10, 11]])
m[0:2, 0:2]       # 2x2 top-left

Slicing returns a view (shares storage). Modifying the slice modifies the original. Use .clone() if you need an independent copy.
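
A short sketch of the difference, using illustrative names row_view and row_copy:

```python
import torch

m = torch.arange(12).reshape(3, 4)

row_view = m[0]            # a view: shares storage with m
row_copy = m[0].clone()    # an independent copy

row_view[0] = 99           # writes through to m
row_copy[1] = -1           # does not touch m

print(m[0].tolist())       # [99, 1, 2, 3]
print(row_copy.tolist())   # [0, -1, 2, 3]
```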

Arithmetic

Element-wise:

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
print(a + b)               # [5, 7, 9]
print(a * b)               # [4, 10, 18]    element-wise multiply
print(a ** 2)              # [1, 4, 9]
print(torch.exp(a))        # [e^1, e^2, e^3]
print(torch.sin(a))

Reductions (collapse a dimension):

m = torch.randn(3, 4)
m.sum()                    # scalar
m.sum(dim=0)               # column sums (4 values)
m.sum(dim=1)               # row sums (3 values)
m.mean()
m.max()
m.argmax()                 # index of maximum (into the flattened tensor)

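dim= is the dimension to reduce over, and that dimension disappears from the result. dim=0 collapses the rows, leaving one value per column; dim=1 collapses the columns, leaving one value per row. It's confusing the first time; you'll internalize it.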
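
With fixed numbers instead of random ones, the effect of dim= can be checked by hand. A minimal sketch (keepdim=True is an optional argument that keeps the reduced dimension with size 1):

```python
import torch

m = torch.tensor([[1., 2., 3., 4.],
                  [5., 6., 7., 8.],
                  [9., 10., 11., 12.]])      # shape (3, 4)

col_sums = m.sum(dim=0)                # dim 0 (size 3) disappears: shape (4,)
row_sums = m.sum(dim=1)                # dim 1 (size 4) disappears: shape (3,)
kept = m.sum(dim=1, keepdim=True)      # reduced dim kept as size 1: shape (3, 1)
flat_idx = m.argmax()                  # position in the flattened tensor

print(col_sums.tolist())               # [15.0, 18.0, 21.0, 24.0]
print(row_sums.tolist())               # [10.0, 26.0, 42.0]
print(tuple(kept.shape))               # (3, 1)
print(flat_idx.item())                 # 11
```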

Matrix multiplication

The most-used operation in ML. It is not the same as element-wise *.

A = torch.randn(2, 3)      # 2x3
B = torch.randn(3, 4)      # 3x4
C = A @ B                  # 2x4 - matrix multiply
# or: torch.matmul(A, B)

The @ operator is matrix multiply. Two requirements:

  • Inner dimensions match: (2, 3) @ (3, 4) works because both have 3 in the middle.
  • Result is the outer dimensions: (2, 3) @ (3, 4) → (2, 4).

If they don't match, you get an error. Get the dimensions right first; everything else follows.
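
To make the difference from element-wise * concrete, here is a small sketch with numbers you can check by hand:

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[10., 20.], [30., 40.]])

elementwise = A * B     # multiplies matching positions
matmul = A @ B          # rows of A dotted with columns of B

print(elementwise.tolist())   # [[10.0, 40.0], [90.0, 160.0]]
print(matmul.tolist())        # [[70.0, 100.0], [150.0, 220.0]]
```

For example, the top-left entry of the matrix product is 1*10 + 2*30 = 70, while the element-wise product there is just 1*10 = 10.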

Broadcasting

When you operate on tensors of different shapes, PyTorch tries to make them match by broadcasting: dimensions that are missing or have size 1 in one tensor are stretched to match the other:

a = torch.tensor([[1, 2, 3], [4, 5, 6]])     # shape (2, 3)
b = torch.tensor([10, 20, 30])                # shape (3,)

print(a + b)
# tensor([[11, 22, 33],
#         [14, 25, 36]])

b was broadcast across the rows of a. Equivalent to adding [10, 20, 30] to each row.

The rules are precise: align the shapes from the right, treat a missing dimension as size 1, and repeat size-1 dimensions to match. Any other mismatch is an error. For the example above:

a.shape = (2, 3)
b.shape = (3,)    → treated as (1, 3) → broadcast to (2, 3)

When in doubt, print shapes. Most "shape mismatch" errors come from this; once you see the shapes, the fix is usually obvious.
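
If you want to predict a broadcast result without building any tensors, torch.broadcast_shapes applies the same rules to plain shape tuples. A small sketch (the clash flag is just an illustrative variable):

```python
import torch

print(torch.broadcast_shapes((2, 3), (3,)))      # torch.Size([2, 3])
print(torch.broadcast_shapes((3, 1), (1, 4)))    # torch.Size([3, 4])

clash = False
try:
    # Aligned from the right, 3 meets 2: neither is 1, so this fails
    torch.broadcast_shapes((2, 3), (2,))
except RuntimeError as err:
    clash = True
    print("cannot broadcast:", err)
```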

Move tensors to GPU

device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
print("using:", device)

a = torch.randn(1000, 1000).to(device)
b = torch.randn(1000, 1000).to(device)
c = a @ b           # runs on GPU if device is cuda/mps

All tensors in an operation have to be on the same device. Mixing CPU and GPU tensors raises an error.

Common idiom: define device once at the top of the script, then .to(device) every tensor you create (or pass device=device when creating it).
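
As a variation on that idiom, most constructors accept a device= argument, which avoids building the tensor on the CPU first. The sketch below assumes only a CUDA-or-CPU choice for brevity; the tensor sizes are arbitrary.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(4, 4, device=device)    # allocated on the device directly
b = torch.zeros(4, 4, device=device)
c = torch.ones_like(a)                  # *_like inherits a's device and dtype

print(a.device, b.device, c.device)
result = a @ b + c                      # all operands share a device
```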

NumPy interop

PyTorch tensors and NumPy arrays interoperate:

import numpy as np

n = np.array([1, 2, 3])
t = torch.from_numpy(n)            # tensor sharing memory with the array
back = t.numpy()                   # numpy array sharing memory with the tensor

If the tensor is on CPU, this is free (no copy). On GPU, you have to .cpu() first.
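
"No copy" also means the two objects see each other's writes. A minimal sketch, contrasted with torch.tensor(), which copies its input:

```python
import numpy as np
import torch

n = np.array([1, 2, 3])
t = torch.from_numpy(n)           # shares memory with n

n[0] = 100                        # change the array...
print(t[0].item())                # ...and the tensor sees it: 100

independent = torch.tensor(n)     # torch.tensor() copies the data
n[1] = -5
print(independent.tolist())       # [100, 2, 3]
print(t.tolist())                 # [100, -5, 3]
```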

NumPy is the older sibling - PyTorch borrows most of its API conventions from NumPy. If you've used NumPy, PyTorch tensors will feel familiar.

Going deeper

You can make and manipulate tensors now. This section covers the cryptic errors that nearly every PyTorch beginner hits, probably within the first week, so that you can diagnose them on sight.

The error you'll see most: shape mismatch

Most early PyTorch errors are shapes that don't line up. The message looks scary but is precise:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (32x10 and 64x128)

Decode it: a matrix multiply needs the inner dimensions to match - (32x10) @ (64x128) fails because 10 != 64. The fix is making the inner dimensions agree: the first matrix's columns must equal the second's rows. The habit that prevents most of these: print .shape constantly.

print(x.shape)        # torch.Size([32, 10])   -- batch of 32, 10 features
print(w.shape)        # torch.Size([64, 128])  -- mismatch! 10 != 64

When any tensor op errors, your first move is always printing the shapes of the operands. The numbers in the error map directly to the numbers you print - the mismatch jumps out. Shape debugging is the single most-used PyTorch skill.
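
The whole loop (hit the error, print shapes, fix) can be sketched as below. The fix shown assumes the weight was meant to take 10 input features; in real code, which side to change depends on what the layer is supposed to do.

```python
import torch

x = torch.randn(32, 10)      # batch of 32, 10 features
w = torch.randn(64, 128)     # wrong: first dimension should be 10

try:
    y = x @ w
except RuntimeError as err:
    print(err)
    print("x:", tuple(x.shape), "w:", tuple(w.shape))   # (32, 10) vs (64, 128)

w = torch.randn(10, 128)     # inner dimensions now agree: 10 == 10
y = x @ w
print(tuple(y.shape))        # (32, 128)
```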

The device error: CPU vs GPU tensors

The second-most-common error - mixing tensors on different devices:

RuntimeError: Expected all tensors to be on the same device, but found at least
two devices, cuda:0 and cpu!

A tensor lives on the CPU or a specific GPU, and you can't do math across devices. This happens when your model is on the GPU but a batch of data is still on the CPU (or vice versa). The fix is moving everything to the same device:

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
x = x.to(device)              # data must go to the SAME device as the model
print(x.device, next(model.parameters()).device)   # confirm they match

The discipline: pick one device at the top, .to(device) your model once, and .to(device) every batch. "Expected all tensors to be on the same device" nearly always means a batch (or a newly created tensor) didn't get moved - check .device on both operands.

The dtype trap

Less common but baffling when it hits - mixing data types:

RuntimeError: expected scalar type Float but found Long

This means a tensor of integers (Long, i.e. int64) arrived where floats (Float, i.e. float32) were expected. Typical causes: image pixels loaded as integers, or labels passed where inputs belong. Fix with .float() / .long():

x = x.float()                 # cast to float32 for the model
y = y.long()                  # labels for CrossEntropyLoss must be Long
print(x.dtype, y.dtype)       # torch.float32 torch.int64

CrossEntropyLoss specifically wants float inputs (logits) and, when the targets are class indices, Long labels - a frequent source of this error.
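
A minimal sketch of the dtypes that work, using made-up logits and labels (4 samples, 3 classes). The int32 labels stand in for data that arrived with the wrong integer type and are cast explicitly before use.

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 3)              # float32: 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 2])     # int64 (Long): one class index per sample

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, labels)
print(logits.dtype, labels.dtype, loss.dtype)

# Labels that arrive with another integer dtype are cast with .long() first
labels_i32 = labels.to(torch.int32)
loss_again = loss_fn(logits, labels_i32.long())
print(torch.isclose(loss, loss_again).item())   # True
```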

The silent one: broadcasting did something you didn't expect

This doesn't error - it gives wrong results, which is worse. Broadcasting auto-expands mismatched shapes, sometimes not how you meant:

a = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1)
b = torch.tensor([10.0, 20.0])            # shape (2,)
(a + b).shape                              # (3, 2)! - broadcast to a 3x2 grid, probably not intended

When a result has a surprising shape, broadcasting silently expanded something. Print shapes before the op; if they're not what you expect, reshape explicitly (.unsqueeze(), .view(), .reshape()) to control it. Broadcasting is powerful but its silent expansion is a real correctness footgun.
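
A common place this bites is subtracting a (N,) target from a (N, 1) prediction. The sketch below uses made-up pred and target tensors and shows the accidental grid next to an explicit fix with .squeeze(1):

```python
import torch

pred = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1), e.g. a model output
target = torch.tensor([1.0, 2.0, 3.0])       # shape (3,)

wrong = pred - target                        # silently broadcasts to (3, 3)
print(tuple(wrong.shape))                    # (3, 3)

right = pred.squeeze(1) - target             # both operands are (3,)
print(tuple(right.shape), right.tolist())    # (3,) [0.0, 0.0, 0.0]
```

Taking a mean of wrong would still return a number, just not the one you wanted, which is why the shape check matters.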

Try it (with what you'll see)

  1. Deliberately multiply mismatched matrices (torch.randn(32,10) @ torch.randn(64,128)). Read the error, print both shapes, fix the inner dimension, watch it work.

  2. If you have a GPU: put a model on cuda, feed it a CPU tensor, hit the device error. Add .to(device) and fix it.

  3. Make a Long tensor, pass it where float is expected, see the dtype error, .float() it.

  4. Add (3,1) and (2,) tensors and look at the surprising (3,2) result - feel broadcasting expand silently.

Exercise

In a new script tensor_practice.py:

  1. Create a 5×3 tensor of random normal values. Print its shape and mean.

  2. Take the same tensor and add 1.0 to every element. (Hint: just tensor + 1.)

  3. Create a 3×3 identity matrix; create another 3×3 matrix with torch.arange(9).reshape(3, 3).float(). Multiply them with @. Result?

  4. Create a = torch.arange(20).reshape(4, 5). Get the third row. Get the second column. Get the bottom-right 2×2 submatrix.

  5. Broadcasting: create a = torch.zeros(3, 4) and b = torch.tensor([1, 2, 3, 4]). Compute a + b. What shape? What values?

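  6. GPU (if available): create two 1000×1000 random matrices. Time how long a @ b takes on CPU vs your device. Use time.time() around the multiplications. On CUDA, call torch.cuda.synchronize() before each clock reading, because GPU operations run asynchronously and the timer would otherwise stop before the work finishes.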

What you might wonder

"Why are tensors not just NumPy arrays?" PyTorch tensors add GPU support, automatic differentiation (page 04), explicit device placement with .to(device), and a richer API for ML-specific operations. They're NumPy++.

"What's float32 vs float16 vs bfloat16?" Number formats with different precision/memory trade-offs. float32 (FP32) is the default - 4 bytes per number, lots of precision. float16 and bfloat16 are half-precision (2 bytes) and are used heavily in training large models for memory savings. Tensor cores on modern NVIDIA GPUs accelerate them: float16 from Volta onward, bfloat16 from Ampere onward.

"Why both reshape and view?" view never copies: it requires the requested shape to be compatible with the tensor's existing memory layout (contiguous tensors always qualify) and raises an error otherwise. reshape returns a view when it can and copies when it must. Prefer reshape; reach for view when you need a guarantee that no copy happens.

"My tensors are on different devices and I'm confused." Set a device = ... constant at the top of your script. Always .to(device) after creation. This rule alone eliminates most device-mismatch bugs.

Done

  • Create tensors with various constructors.
  • Reshape, index, slice.
  • Use element-wise arithmetic, matrix multiplication.
  • Use broadcasting confidently.
  • Move tensors between CPU and GPU.

Next: Linear algebra you actually need →
