02 - Tensors

What this session is

About 45 minutes. Tensors are PyTorch's central data type - multi-dimensional arrays with support for GPU acceleration and automatic differentiation. Almost every line of PyTorch code touches tensors.

What a tensor is

A tensor is a generalized array:

  • A 0-dimensional tensor is a single number (scalar): 7.
  • A 1-D tensor is a vector: [1, 2, 3].
  • A 2-D tensor is a matrix: [[1, 2], [3, 4]].
  • A 3-D tensor is a cube of numbers. (Often used for color images: [height, width, channels].)
  • Higher: a batch of images, a batch of token sequences, etc.

Every tensor has a shape (its size per dimension) and a dtype (the type of each element).

Creating tensors

import torch

# From a Python list
a = torch.tensor([1, 2, 3])
print(a.shape, a.dtype)            # torch.Size([3]) torch.int64

# As floats
b = torch.tensor([1.0, 2.0, 3.0])
print(b.shape, b.dtype)            # torch.Size([3]) torch.float32

# Zeros, ones, random
z = torch.zeros(2, 3)              # 2x3 of zeros
o = torch.ones(2, 3)
r = torch.randn(2, 3)              # random normal (mean=0, std=1)
u = torch.rand(2, 3)               # random uniform [0, 1)
i = torch.arange(0, 10)            # 0, 1, 2, ..., 9

# An identity matrix
I = torch.eye(4)

The default float dtype is float32 and the default int dtype is int64. You can override either explicitly:

x = torch.zeros(2, 3, dtype=torch.float16)
y = torch.tensor([1, 2, 3], dtype=torch.int32)

Shape and reshape

a = torch.arange(12)
print(a.shape)                     # torch.Size([12])

b = a.reshape(3, 4)                # 3x4 matrix
print(b.shape)                     # torch.Size([3, 4])

c = a.reshape(2, 2, 3)             # 2x2x3 tensor
print(c.shape)                     # torch.Size([2, 2, 3])

reshape(...) doesn't copy data when it can avoid it - it just changes the "view" on the underlying buffer.
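
A quick sketch to see the sharing - data_ptr() reports the address of the underlying storage, and a write through the reshaped tensor is visible through the original:

a = torch.arange(12)
b = a.reshape(3, 4)
print(a.data_ptr() == b.data_ptr())   # True - same underlying storage
b[0, 0] = 99
print(a[0])                           # tensor(99) - the original sees the change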

The -1 placeholder means "infer this dimension":

a = torch.arange(12)
b = a.reshape(-1, 4)               # 3 rows of 4 (12/4)
c = a.reshape(2, -1)               # 2 rows of 6 (12/2)

Useful in functions where you know all but one dimension.
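
For example, a common sketch: flatten each image in a batch without hard-coding the image size.

images = torch.randn(8, 3, 32, 32)          # a batch of 8 RGB 32x32 images
flat = images.reshape(images.shape[0], -1)  # keep the batch dim, infer the rest
print(flat.shape)                           # torch.Size([8, 3072])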

Indexing

Indexing works like NumPy's - a superset of Python list indexing:

m = torch.arange(12).reshape(3, 4)
# m is:
# tensor([[ 0,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, 10, 11]])

m[0]              # first row: tensor([0, 1, 2, 3])
m[0, 0]           # first element: tensor(0)
m[:, 0]           # first column: tensor([0, 4, 8])
m[1:, 2:]         # rows 1+, cols 2+: tensor([[6, 7], [10, 11]])
m[0:2, 0:2]       # 2x2 top-left

Slicing returns a view (shares storage). Modifying the slice modifies the original. Use .clone() if you need an independent copy.
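
A small demonstration of the view semantics:

m = torch.arange(12).reshape(3, 4)
row = m[0]               # a view into m
row[0] = 99
print(m[0, 0])           # tensor(99) - the original changed too

safe = m[1].clone()      # an independent copy
safe[0] = -1
print(m[1, 0])           # tensor(4) - unchanged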

Arithmetic

Element-wise:

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
print(a + b)               # [5, 7, 9]
print(a * b)               # [4, 10, 18]    element-wise multiply
print(a ** 2)              # [1, 4, 9]
print(torch.exp(a))        # [e^1, e^2, e^3]
print(torch.sin(a))

Reductions (collapse a dimension):

m = torch.randn(3, 4)
m.sum()                    # scalar
m.sum(dim=0)               # column sums (4 values)
m.sum(dim=1)               # row sums (3 values)
m.mean()
m.max()
m.argmax()                 # index of maximum

dim= is the dimension to reduce over: dim=0 collapses the rows, leaving one value per column; dim=1 collapses the columns, leaving one value per row. Confusing the first time; you'll internalize it.
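
A small concrete case makes the rule stick:

m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])   # shape (2, 3)
print(m.sum(dim=0))                # tensor([5., 7., 9.])  - one value per column
print(m.sum(dim=1))                # tensor([ 6., 15.])    - one value per row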

Matrix multiplication

The most-used operation in ML. It is not the same as element-wise *.

A = torch.randn(2, 3)      # 2x3
B = torch.randn(3, 4)      # 3x4
C = A @ B                  # 2x4 - matrix multiply
# or: torch.matmul(A, B)

The @ operator is matrix multiply. Two rules:

  • The inner dimensions must match: (2, 3) @ (3, 4) works because both have 3 in the middle.
  • The result has the outer dimensions: (2, 3) @ (3, 4) → (2, 4).

If they don't match, you get an error. Get the dimensions right first; everything else follows.
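
A sketch of what the failure looks like:

A = torch.randn(2, 3)
B = torch.randn(4, 5)              # inner dims 3 and 4 don't match
try:
    A @ B
except RuntimeError as e:
    print(e)                       # the message names the mismatched shapes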

Broadcasting

When you operate on tensors of different shapes, PyTorch tries to make them match by broadcasting the smaller one along the matching dimensions:

a = torch.tensor([[1, 2, 3], [4, 5, 6]])     # shape (2, 3)
b = torch.tensor([10, 20, 30])                # shape (3,)

print(a + b)
# tensor([[11, 22, 33],
#         [14, 25, 36]])

b was broadcast across the rows of a. Equivalent to adding [10, 20, 30] to each row.

The rules are precise but the intuition is "align from the right; missing dimensions are filled in by repeating":

a.shape = (2, 3)
b.shape = (3,)   → treated as (1, 3) → broadcast to (2, 3)

When in doubt, print shapes. Most "shape mismatch" errors come from this; once you see the shapes, the fix is usually obvious.
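
A slightly trickier sketch - a column against a row:

col = torch.tensor([[10], [20]])   # shape (2, 1)
row = torch.tensor([1, 2, 3])      # shape (3,) -> treated as (1, 3)
print((col + row).shape)           # torch.Size([2, 3])
print(col + row)
# tensor([[11, 12, 13],
#         [21, 22, 23]])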

Move tensors to GPU

device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
print("using:", device)

a = torch.randn(1000, 1000).to(device)
b = torch.randn(1000, 1000).to(device)
c = a @ b           # runs on GPU if device is cuda/mps

All the tensors involved in an operation have to be on the same device; mixing CPU and GPU tensors raises an error.
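
A sketch of the mismatch (assuming device from the snippet above actually points at a GPU; on CPU the add simply succeeds):

cpu_t = torch.randn(3)             # lives on the CPU
dev_t = torch.randn(3).to(device)
try:
    cpu_t + dev_t
    print("no error - device is cpu, both operands match")
except RuntimeError as e:
    print(e)                       # complains about tensors on two devices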

Common idiom: define device once at the top of the script; .to(device) every tensor you create.

NumPy interop

PyTorch tensors and NumPy arrays interoperate:

import numpy as np

n = np.array([1, 2, 3])
t = torch.from_numpy(n)            # tensor sharing memory with the array
back = t.numpy()                   # numpy array sharing memory with the tensor

If the tensor is on CPU, this is free (no copy). On GPU, you have to .cpu() first.
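
A sketch of the GPU round-trip, using the device variable from earlier:

g = torch.randn(3).to(device)
# g.numpy() raises an error if g lives on a GPU
n = g.cpu().numpy()                # copy back to CPU first, then convert
print(type(n), n.shape)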

NumPy is the older sibling - PyTorch borrows most of its API conventions from NumPy. If you've used NumPy, PyTorch tensors will feel familiar.

Exercise

In a new script tensor_practice.py:

  1. Create a 5×3 tensor of random normal values. Print its shape and mean.

  2. Create the same tensor and add 1.0 to every element. (Hint: just tensor + 1.)

  3. Create a 3×3 identity matrix; create another 3×3 matrix with torch.arange(9).reshape(3, 3).float(). Multiply them with @. Result?

  4. Create a = torch.arange(20).reshape(4, 5). Get the third row. Get the second column. Get the bottom-right 2×2 submatrix.

  5. Broadcasting: create a = torch.zeros(3, 4) and b = torch.tensor([1, 2, 3, 4]). Compute a + b. What shape? What values?

  6. GPU (if available): create two 1000×1000 random matrices. Time how long a @ b takes on CPU vs your device. Use time.time() around the multiplications. (On CUDA, call torch.cuda.synchronize() before stopping the clock - GPU operations run asynchronously.)

What you might wonder

"Why are tensors not just NumPy arrays?" PyTorch tensors add: GPU support, automatic differentiation (page 04), automatic device placement, and a richer API for ML-specific operations. They're NumPy++.

"What's float32 vs float16 vs bfloat16?" Number formats with different precision/memory trade-offs. float32 (FP32) is the default - 4 bytes per number, lots of precision. float16 and bfloat16 are half-precision (2 bytes); used heavily in training large models for memory savings. Modern GPUs (Volta+) have tensor cores that specifically accelerate these.

"Why both reshape and view?" view requires the data to be contiguous in memory. reshape may copy if needed. Prefer reshape; reach for view only when you've measured it matters.

"My tensors are on different devices and I'm confused." Set a device = ... constant at the top of your script. Always .to(device) after creation. This rule alone eliminates 80% of device-mismatch bugs.

Done

  • Create tensors with various constructors.
  • Reshape, index, slice.
  • Use element-wise arithmetic, matrix multiplication.
  • Use broadcasting confidently.
  • Move tensors between CPU and GPU.

Next: Linear algebra you actually need →
