Week 15 - Threads, Processes, Subinterpreters, concurrent.futures

15.1 Conceptual Core

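  • concurrent.futures is the unified high-level API. Use ThreadPoolExecutor for I/O-bound work or C extensions that release the GIL; use ProcessPoolExecutor for CPU-bound pure-Python code.
  • multiprocessing start methods: fork (fast, but dangerous once the parent has threads, held locks, or an initialized CUDA context; the Linux default before 3.14), spawn (the default on macOS and Windows; safe but slower, since each worker starts a fresh interpreter), forkserver (a good middle ground on Linux, and its default from 3.14).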
  • Pickle is the IPC currency for multiprocessing. Things that can't be pickled (lambdas, locally defined classes, open file handles) cannot cross the boundary. cloudpickle is the third-party escape hatch.
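  • Subinterpreters (PEP 684/734): each interpreter has its own GIL, its own modules, its own sys. The per-interpreter GIL landed in the C API in 3.12 (PEP 684); the stdlib module concurrent.interpreters landed in 3.14 (PEP 734). Communication goes through a queue from interpreters.create_queue() or through shared memory. Lighter than processes, heavier than threads.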
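The executor choice and the pickle boundary described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names (is_picklable, sha256_hex, hash_with_threads, sqrt_with_processes) are hypothetical, and the worker counts are arbitrary.

```python
import hashlib
import math
import multiprocessing as mp
import pickle
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def is_picklable(obj) -> bool:
    """Return True if pickle.dumps accepts obj, i.e. it could be sent to a worker process."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def sha256_hex(blob: bytes) -> str:
    # hashlib releases the GIL while hashing large buffers,
    # so a thread pool can overlap this work.
    return hashlib.sha256(blob).hexdigest()


def hash_with_threads(blobs, max_workers=4):
    """Hash every blob on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sha256_hex, blobs))


def sqrt_with_processes(values, max_workers=2):
    """Run a picklable function on a process pool with an explicit start method."""
    # Choosing "spawn" explicitly avoids depending on the platform default.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx) as pool:
        return list(pool.map(math.sqrt, values))
```

Call sqrt_with_processes from under an `if __name__ == "__main__":` guard, because spawn re-imports the main module in every worker. Running is_picklable on a lambda or an open file handle returns False, which is the failure ProcessPoolExecutor would otherwise report at submit time.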

15.2 Mechanical Detail

  • multiprocessing.shared_memory.SharedMemory (3.8+): buffers shared across processes without copying. Pair with numpy.ndarray(shape, dtype=..., buffer=shm.buf) for big-array IPC.
  • multiprocessing.Manager: proxy objects for list, dict, etc. Convenient but slow - every operation is an IPC round trip.
  • os.fork() directly is rarely correct in modern Python; use multiprocessing or subprocess.
  • The free-threaded build (PEP 703): with python3.13t (experimental in 3.13), ThreadPoolExecutor becomes a true parallel CPU executor for pure-Python code. It is the likely future replacement for many ProcessPoolExecutor use cases.
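The SharedMemory workflow above (create, attach by name, close, unlink) might look like the sketch below. It uses only the standard library and struct to stay dependency-free; with NumPy installed, the same shm.buf could back an array instead. The function names are hypothetical, and the second handle is opened in the same process purely to show the attach-by-name step a worker process would perform.

```python
import struct
from multiprocessing import shared_memory


def write_ints(values):
    """Create a shared block holding the values as native 64-bit signed ints."""
    shm = shared_memory.SharedMemory(create=True, size=8 * len(values))
    struct.pack_into(f"{len(values)}q", shm.buf, 0, *values)
    return shm


def read_ints(name, count):
    """Attach to an existing block by name and copy out `count` ints."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return list(struct.unpack_from(f"{count}q", shm.buf, 0))
    finally:
        shm.close()  # drops this handle only; the owner still has to unlink


def round_trip(values):
    """Write values into shared memory, read them back through a second handle."""
    owner = write_ints(values)
    try:
        return read_ints(owner.name, len(values))
    finally:
        owner.close()
        owner.unlink()  # frees the block once every handle is closed


if __name__ == "__main__":
    print(round_trip([1, 2, 3]))
```

Only the block's name and the element count cross the process boundary, so the payload itself is never pickled.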

15.3 Lab - "Pick Your Parallelism"

For each workload, pick a model and justify:

  1. Compress 10k JPEGs in parallel.
  2. Run 10k HTTP requests against an external API (rate-limited).
  3. Compute SHA-256 of 10k 1MB blobs.
  4. Train 10 small models concurrently sharing a GPU.

Implement at least two of them three ways: threads, processes, asyncio. Benchmark each version, then write up which model is the right answer and why.
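For the asyncio variant of the rate-limited workload, one possible starting point is sketched below. fake_request is a hypothetical stand-in for a real HTTP call, and the semaphore caps requests in flight rather than requests per second, so treat it as a first approximation of a rate limit. The timed helper is likewise an assumption, included only to show where the measurement goes.

```python
import asyncio
import time


async def fake_request(i: int, delay: float = 0.01) -> int:
    # Hypothetical stand-in for one HTTP call; swap in a real async client.
    await asyncio.sleep(delay)
    return i


async def run_bounded(n: int, limit: int) -> list:
    """Run n fake requests with at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def one(i: int) -> int:
        async with sem:
            return await fake_request(i)

    # gather returns results in submission order.
    return await asyncio.gather(*(one(i) for i in range(n)))


def timed(fn, *args):
    """Return (elapsed_seconds, result) for a single call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - start, result


if __name__ == "__main__":
    elapsed, results = timed(asyncio.run, run_bounded(100, 10))
    print(f"{len(results)} requests in {elapsed:.3f}s")
```

The same timed helper can wrap the thread and process versions, so all three implementations are measured the same way.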

15.4 Idiomatic & Linter Drill

  • Enable ruff's S rules (security, bandit-style). They catch subprocess.run(..., shell=True) (S602) and pickle deserialization calls such as pickle.loads (S301), which are unsafe on untrusted input.
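The two patterns named above have straightforward safer forms, sketched here. Both function names are hypothetical, and JSON is only one possible replacement for pickle when the input is untrusted.

```python
import json
import subprocess
import sys


def run_python_snippet(code: str) -> str:
    """Run code in a child interpreter via an argv list, with no shell involved."""
    # Risky form, shown only for contrast:
    #   subprocess.run(f"python -c {code}", shell=True)
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


def load_untrusted(payload: bytes) -> dict:
    """Parse untrusted bytes as JSON instead of pickle.loads, which can execute code."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


if __name__ == "__main__":
    print(run_python_snippet("print(1 + 1)"))
    print(load_untrusted(b'{"user": "alice"}'))
```

Passing an argv list means the arguments are never parsed by a shell, so shell metacharacters in code are not interpreted.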

15.5 Production Hardening Slice

  • Add a deadlock-detection probe: a watchdog thread that runs py-spy dump against the process if the main loop hasn't ticked in 30s. Ship it as part of the hardening template.
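A minimal sketch of such a watchdog follows, assuming py-spy is installed and on PATH. The Watchdog class, its method names, and the faulthandler fallback are illustrative assumptions, not part of any template. Depending on the platform, py-spy may need extra privileges to attach to the process.

```python
import faulthandler
import os
import subprocess
import threading
import time


def dump_with_py_spy() -> None:
    """Dump this process's stacks with py-spy (assumed to be on PATH)."""
    try:
        subprocess.run(["py-spy", "dump", "--pid", str(os.getpid())], check=False)
    except FileNotFoundError:
        # Assumed fallback: print Python-level tracebacks from inside the process.
        faulthandler.dump_traceback(all_threads=True)


class Watchdog:
    """Calls on_stall every `timeout` seconds for as long as tick() is not called."""

    def __init__(self, timeout=30.0, on_stall=dump_with_py_spy, poll=1.0):
        self._timeout = timeout
        self._on_stall = on_stall
        self._poll = poll
        self._last_tick = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._last_tick = time.monotonic()
        self._thread.start()

    def tick(self) -> None:
        # The main loop calls this once per iteration.
        self._last_tick = time.monotonic()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._poll):
            if time.monotonic() - self._last_tick > self._timeout:
                self._on_stall()
                # Re-arm so a long stall produces one dump per timeout period.
                self._last_tick = time.monotonic()
```

Making on_stall injectable keeps the probe testable without py-spy: pass a callback that records the stall, use a short timeout, and skip calling tick().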
