Endianness: why the same bytes mean different numbers
A 32-bit integer is four bytes in memory. There are only 24 ways to arrange four bytes, but two of them won the historical lottery: little-endian (low byte first) and big-endian (high byte first). Every CPU, file format, and network protocol picks one. When two systems with different choices exchange bytes, the bits look identical and the numbers come out different.
This is the page you need before you debug your first binary file, your first network capture, or your first cross-platform serialization bug.
1. The example, three ways
Consider the 32-bit integer 0x12345678. Its four bytes are 12, 34, 56, 78 - "12" is the most-significant byte (it represents 0x12000000) and "78" is the least-significant.
Now write it into memory at address 100. Where does each byte go?
Big-endian ("most-significant byte at the lowest address"): address 100 holds 12, 101 holds 34, 102 holds 56, 103 holds 78.
The byte order in memory matches the order you would write the number on paper. This is also called "network byte order" because TCP/IP picked it in 1981 and never looked back.
Little-endian ("least-significant byte at the lowest address"): address 100 holds 78, 101 holds 56, 102 holds 34, 103 holds 12.
The byte order is reversed compared to how you write the number. This is what every modern x86 and x86-64 CPU uses, and what ARM defaults to on Linux, macOS, and Windows.
Same 32 bits, two orderings. If you hand a little-endian machine the bytes 12 34 56 78 and ask it to read a 32-bit integer, it answers 0x78563412 - which is 2,018,915,346, not 305,419,896. The hardware did not lie; the convention mismatched.
2. Why "endian"? Why two answers?
The names come from Jonathan Swift's Gulliver's Travels, where the Lilliputians fight a war over which end of a boiled egg to crack first. The 1980 paper that introduced the terminology used the joke deliberately: the choice is arbitrary, both sides have good arguments, and the fight is unwinnable.
Substantive arguments do exist:
- Big-endian is "natural" for humans reading hex dumps: a 32-bit value at offset 100 reads left-to-right exactly as you would write it. That makes it easier for protocol designers to specify and easier to debug with a hex viewer: a 16-bit field documented as 0x0800 shows up in the dump as 08 00.
- Little-endian is "natural" for arithmetic: when you do multi-word addition, you start with the low bytes and carry upward, which is exactly the order they sit in memory. Casting a pointer from uint32* to uint16* "just works" on little-endian - the low 16 bits live at the original address. On big-endian the same cast quietly returns the high 16 bits, which is a portability bug waiting to happen.
Neither argument is decisive. The split happened because different CPU vendors made different choices in the 1970s, and the formats and protocols that grew up around each side calcified the decision. Today essentially every general-purpose computer is little-endian (x86, x86-64, and ARM in its usual configuration), but the network and many long-lived binary file formats are big-endian.
3. Where each side wins
Little-endian today: all x86 and x86-64 CPUs (Intel, AMD), ARM on Linux/macOS/Windows (in its default mode), RISC-V (in its standard configuration), the Alpha CPUs of yesteryear (and Itanium under Linux and Windows), every laptop and phone you own.
Big-endian today: network byte order (IP, TCP, and UDP headers, DNS, TLS records, HTTP/2 binary framing), many multimedia file formats (PNG, JPEG markers, MIDI, AIFF), the JVM (class files and Java serialization), Motorola 68k (vintage), older SPARC, IBM mainframes (z/Architecture), some embedded devices.
Bi-endian (configurable at boot or per-page): ARM, MIPS, PowerPC, SPARC v9. In practice, each deployment locks to one order: ARM is almost always run little-endian and Linux on modern POWER has moved to little-endian (ppc64le), while AIX, Solaris on SPARC, and many MIPS-based network devices run big-endian.
4. The runtime check
How do you tell at runtime which side you are on? Write a 32-bit value, then look at the first byte:
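```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

func main() {
	var x uint32 = 0x01020304
	// View the four bytes of x in address order, lowest address first.
	bytes := (*[4]byte)(unsafe.Pointer(&x))
	fmt.Printf("first byte = 0x%02x\n", bytes[0])
	// The least-significant byte (0x04) at the lowest address means little-endian.
	if bytes[0] == 0x04 {
		fmt.Println("little-endian")
	} else {
		fmt.Println("big-endian")
	}
	// Go 1.21+: prints the name of the host's byte order, e.g. LittleEndian.
	fmt.Printf("encoding/binary.NativeEndian: %v\n", binary.NativeEndian)
}
```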
On x86-64 and ARM64 machines this prints 0x04 and "little-endian." Go's encoding/binary package also exposes binary.NativeEndian (Go 1.21+), a ByteOrder fixed at compile time to match the target architecture, so you rarely need to write the runtime check yourself.
The C version is the famous one-liner *(unsigned char *)&(unsigned int){1} == 1: if the first byte of 0x00000001 is 01, the low byte sits at the lowest address - little-endian.
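If you would rather avoid unsafe in Go, a sketch of the same test can ask binary.NativeEndian (Go 1.21+) to write the value 1 and look at where the 01 byte lands. Both function names are hypothetical; the second repeats the pointer trick from the program above only so the two checks can be compared.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// isLittleEndian writes 1 in the host's byte order and checks whether the
// 01 byte landed at the lowest index.
func isLittleEndian() bool {
	buf := make([]byte, 4)
	binary.NativeEndian.PutUint32(buf, 1)
	return buf[0] == 1
}

// isLittleEndianUnsafe is the pointer-based check, kept only for comparison.
func isLittleEndianUnsafe() bool {
	x := uint32(1)
	return (*[4]byte)(unsafe.Pointer(&x))[0] == 1
}

func main() {
	fmt.Println("little-endian:", isLittleEndian())
	fmt.Println("checks agree: ", isLittleEndian() == isLittleEndianUnsafe())
}
```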
5. Network byte order and the conversion functions
Because TCP/IP picked big-endian in 1981 and most CPUs are little-endian, every networking program has to byte-swap on the way in and on the way out. The BSD socket API named the functions:
- htonl - host-to-network long (32-bit)
- htons - host-to-network short (16-bit)
- ntohl / ntohs - the reverse (network-to-host)
These are named for an era when "long" meant 32 bits and "short" meant 16; htonl still takes a 32-bit uint32_t even on platforms where long is 64 bits. They are no-ops on a big-endian host and byte-swap on a little-endian one. Networking code on a busy Linux server calls them for nearly every header field it reads or writes.
In Go:
```go
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	var port uint16 = 8080
	buf := make([]byte, 2)
	binary.BigEndian.PutUint16(buf, port)     // write 0x1F 0x90 (port 8080 in network order)
	fmt.Printf("%x %x\n", buf[0], buf[1])     // 1f 90
	fmt.Println(binary.BigEndian.Uint16(buf)) // 8080 again
}
```
encoding/binary gives you BigEndian and LittleEndian as explicit objects with Uint16, Uint32, Uint64, PutUint16, PutUint32, PutUint64 methods. Always use these when writing or parsing wire formats; never assume native endianness when the data is going to disk or the wire.
6. Where it bites you in production
6.1 Binary file formats
PNG, GIF, JPEG metadata, TIFF, MIDI, FLAC, MP4 (mostly), Java .class files, ELF object files (the ELF header declares the byte order; little-endian on x86) - reading any of them requires knowing the spec's endianness for each field. The PNG spec, for example, is explicit: "All integers that require more than one byte shall be in network byte order." A naive int(buf[0]) | int(buf[1])<<8 parser, which assumes little-endian, will work on BMP files (which are little-endian) and silently corrupt every PNG it touches.
6.2 Network protocols
Every header field in IPv4, IPv6, TCP, UDP, ICMP, TLS, HTTP/2, DNS, NTP is big-endian. If you ever write a packet sniffer or a custom protocol handler and skip the byte-swap, the integers you decode will happen to be correct on big-endian hosts and wrong on little-endian ones. Worse, they will look like "valid but wrong" values (e.g., a port number of 36895 instead of 8080), which is harder to spot than an obvious crash.
6.3 Cross-architecture serialization
If you memcpy a struct from a little-endian sender into a big-endian receiver, every multi-byte field comes out reversed. Cross-platform binary serialization formats (Java's ObjectOutputStream, Protocol Buffers, Cap'n Proto, FlatBuffers, MessagePack) all fix one endianness in the spec and convert at runtime when the host differs; the failure mode they prevent is exactly this.
Protocol Buffers fixed-width fields are little-endian. Java serialization is big-endian. Cap'n Proto is little-endian. The choice is arbitrary; what matters is that the spec is explicit and consistent, so every implementation agrees.
6.4 Database storage formats
PostgreSQL stores integers in native endianness in its on-disk format. This means a PostgreSQL data directory created on x86 cannot be copied to a big-endian server and reopened - the integers in every page header would be reversed. The pg_dump utility re-serializes at the SQL level for portability. SQLite, by contrast, picks big-endian for its file format precisely so that databases are portable across CPUs. The trade-off is one byte-swap per integer on every read on x86 - tiny, but explicit.
6.5 The bswap instruction
x86 has a dedicated bswap instruction that reverses the bytes of a 32-bit or 64-bit register in a single instruction. ARM has REV. Compilers recognize byte-swap idioms ((x << 24) | ((x << 8) & 0xFF0000) | ...) and lower them to a single bswap. In Go, math/bits.ReverseBytes32 / ReverseBytes64 are compiler intrinsics on common targets, and the compiler combines the shift-and-or byte loads inside encoding/binary into a single load plus, when the order differs from the host's, a byte swap. If you ever benchmark a serialization path and see a single instruction where you wrote eight lines of shifting and masking, that is bswap doing its job.
7. Advanced: subtleties
7.1 Endianness only applies to multi-byte values
A single byte (uint8, int8, bool, char, an ASCII letter) has no endianness; a byte is a byte. Endianness is purely about how the bytes of a multi-byte value are ordered relative to each other. UTF-8 has no endianness because its code unit is a single byte: a multi-byte UTF-8 sequence has one fixed order defined by the encoding itself. UTF-16 does have endianness because each code unit is 16 bits, which is why the spec defines a Byte Order Mark (U+FEFF) that can appear at the start of a UTF-16 file: FE FF means big-endian, FF FE means little-endian. (See the character encodings page for the full UTF-16 story.)
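A minimal Go sketch of BOM-driven UTF-16 decoding, assuming the input starts with a BOM (real files may omit it, in which case you need an out-of-band rule). decodeUTF16 is a hypothetical helper: it picks binary.BigEndian or binary.LittleEndian from the first two bytes, assembles 16-bit code units, and hands them to the standard unicode/utf16 package, which also handles surrogate pairs.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16 chooses the byte order from the BOM, then decodes the rest.
func decodeUTF16(b []byte) (string, error) {
	if len(b) < 2 || len(b)%2 != 0 {
		return "", errors.New("need a BOM and an even number of bytes")
	}
	var order binary.ByteOrder
	switch {
	case b[0] == 0xFE && b[1] == 0xFF:
		order = binary.BigEndian
	case b[0] == 0xFF && b[1] == 0xFE:
		order = binary.LittleEndian
	default:
		return "", errors.New("no UTF-16 byte-order mark")
	}
	units := make([]uint16, 0, len(b)/2-1)
	for i := 2; i < len(b); i += 2 {
		units = append(units, order.Uint16(b[i:i+2]))
	}
	return string(utf16.Decode(units)), nil
}

func main() {
	be := []byte{0xFE, 0xFF, 0x00, 'H', 0x00, 'i'}
	le := []byte{0xFF, 0xFE, 'H', 0x00, 'i', 0x00}
	s1, _ := decodeUTF16(be)
	s2, _ := decodeUTF16(le)
	fmt.Println(s1, s2) // Hi Hi
}
```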
7.2 Bit order is separate from byte order
Inside a single byte, the bit numbering ("which bit is bit 0?") is a separate convention. Most software treats the least-significant bit as bit 0 (so 0b00000001 has bit 0 set). Many network specifications, including RFC 791 (IPv4), label bits left-to-right with bit 0 at the high end. When you read a packet diagram in an RFC and see "bits 0-3: version," check the RFC's bit-numbering convention - "bit 0" in IPv4 prose is the bit you would call "bit 7" or "bit 31" in code. Only the labels differ, so reading such a field needs a shift and a mask, not a bit reversal. Bit-reversal helpers such as Clang's __builtin_bitreverse32 or the Linux kernel's bitrev32 are for the rarer case where the bits themselves are stored in reversed order, as in some CRC and serial-link code.
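Translating RFC-style bit labels into shifts is mechanical once the word has been assembled from its big-endian bytes. rfcBits is a hypothetical Go helper, assuming a 32-bit word and start+width <= 32; the sample word 45 00 00 54 is the first 32 bits of a typical IPv4 header (version, IHL, type of service, total length).

```go
package main

import "fmt"

// rfcBits extracts width bits starting at RFC-style position start, where
// bit 0 is the most-significant bit of the 32-bit word.
// Assumes start+width <= 32.
func rfcBits(word uint32, start, width uint) uint32 {
	shift := 32 - start - width // distance from the field's last bit to the LSB
	return (word >> shift) & (uint32(1)<<width - 1)
}

func main() {
	word := uint32(0x45000054) // already decoded from big-endian bytes 45 00 00 54
	fmt.Println(rfcBits(word, 0, 4))   // version, RFC bits 0-3: 4
	fmt.Println(rfcBits(word, 4, 4))   // IHL, RFC bits 4-7: 5
	fmt.Println(rfcBits(word, 16, 16)) // total length, RFC bits 16-31: 84
}
```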
7.3 Mixed-endian (PDP-endian)
Some older machines used "middle-endian" or "PDP-endian" ordering for 32-bit values: bytes within each 16-bit half-word were little-endian, but the two half-words were stored big-endian (high half first). So 0x12345678 would be stored as 34 12 78 56. The 16-bit PDP-11 stored 32-bit values this way, and certain ARM floating-point representations used a similar layout. It is essentially extinct in modern hardware; the only place you are likely to see it is when reverse-engineering a binary format from a PDP-11 or another DEC system of that era. It is mentioned here so you recognize the term if you ever read about it; do not implement it.
7.4 Endianness of floating-point
On current mainstream hardware, IEEE 754 floats are stored using the same byte order as integers of the same size on the same machine. So on little-endian x86, the bits of a float32 1.0 (which is 0x3F800000) live in memory as 00 00 80 3F. The byte-swap routines (bswap, binary.BigEndian.PutUint32) work on the bit pattern interpreted as an integer - you do not write a "byte-swap a float" function; you reinterpret the float's bits as a uint32 (not a numeric conversion), swap, and reinterpret back. Go's math.Float32bits and math.Float32frombits exist for exactly this.
7.5 Endianness in SIMD lanes
A SIMD register (SSE, AVX, NEON) holds several values side by side in lanes. The byte order within each lane is the machine's native endianness, but the numbering of lanes within the register is a separate convention defined by each instruction set. AVX-512 famously labels "lane 0" as the low 128 bits, growing upward; PowerPC AltiVec numbered elements the opposite way. This rarely matters for production code, but it absolutely matters if you write intrinsics or read assembly listings.
8. The mental model to keep
- Endianness is byte order within a multi-byte value, not bit order, and it has nothing to do with single bytes.
- Two answers won: little-endian (your CPU) and big-endian (the network, many file formats).
- Use encoding/binary.BigEndian / LittleEndian (or your language's equivalent) for any value going to disk or the wire. Never assume native order.
- The conversion is symmetric and cheap (a single bswap instruction). The bug is forgetting to do it at all.
- A single byte has no endianness. UTF-8 has no endianness. UTF-16 does (and has a BOM to declare it).
The day endianness stops surprising you is the day you read a hex dump and instinctively reverse the bytes in your head when you see a 32-bit field.
9. Further reading
- IETF RFC 1700, "Assigned Numbers" - the canonical reference for network byte order.
- Danny Cohen, "On Holy Wars and a Plea for Peace" (1980) - the paper that named the camps.
- The PNG specification, sections on integer encoding - a clean, well-written example of an explicit big-endian binary format.
- Go's encoding/binary package documentation - the cleanest implementation of the read/write helpers in any modern standard library.
- Bit operations and Two's complement - both useful prerequisites.