Riiven Threads
JPEG
You've never seen a JPEG. You've seen what five sciences agreed to let your eyes notice.
A typical photo on this screen has lost about 90% of its original data. You will never notice, and that is deliberate. The decision about what to delete and what to keep was made by five sciences, each minding its own problem: a 1931 color chart, a 1974 math trick, a 1968 vision experiment, a 1960 engineering paper, and a 1952 packing algorithm. JPEG is not a compression format. It is a working model of your eye, written into a file by people working decades and disciplines apart.
- 64 coefficients
- Only 4 to 8 of them carry significant energy in a typical block after the DCT; the rest collapse toward zero.
- 50%
- About half of all DCT data is detail your eyes cannot resolve anyway.
- 2:1
- Typical compression without quantization; nearly all the real gains come from quantization.
- 30%
- Extra file weight every JPEG would carry if Huffman packing were skipped.
When the fields matured
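Each field had to produce a specific result before JPEG could exist as you know it. They arrived in this order: colorimetry (1931), Huffman coding (1952), Max's quantizer (1960), Campbell and Robson's vision experiments (1968), and the DCT (1974).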
Keystone
The math that sorts every patch of sky
Most of a photo is boring. A 1974 math trick can prove it, block by block.
Sort an 8×8 patch of pixels into 'smooth parts' (sky, skin) and 'busy parts' (eyelashes, leaves). Most of a typical photo is smooth. The DCT does this sort in microseconds, and once a patch is sorted, the boring parts collapse to almost nothing. JPEG's 10-to-1 size cut starts here.
Without this field
Without the DCT, JPEG has no way to separate perceptually important information (low-frequency structure) from unimportant detail (high-frequency noise). Discarding bits directly from raw pixel blocks to reach 10:1 produces visible damage almost immediately.
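After DCT, typical photo blocks have only 4 to 8 significant coefficients out of 64. The transform itself still outputs 64 numbers and discards nothing; that concentration is what later lets quantization and packing reach roughly a 10x reduction.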
How we know
The 8×8 DCT transforms a block of spatial pixel values into 64 frequency coefficients. Natural images concentrate energy in low frequencies, so after DCT most coefficients become small and cheap to encode. JPEG's 10:1 compression depends on this energy compaction: it hands quantization a short list of coefficients worth keeping.
Source: Discrete Cosine Transform (1974) · tier1
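The transform in that paragraph can be sketched directly in Python. This is a minimal, unoptimized version assuming the orthonormal DCT-II scaling (which matches the forward DCT formula in the JPEG standard) plus the subtract-128 level shift JPEG applies to 8-bit samples first. The ramp block and the significance threshold of 1.0 are hypothetical choices for illustration; real encoders use fast factorizations rather than this quadruple loop.

```python
import math

N = 8  # JPEG works on 8x8 blocks


def _scale(k: int) -> float:
    """Orthonormal DCT scale factor for frequency index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(k: int, x: int) -> float:
    """Value of the k-th cosine basis function at sample position x."""
    return math.cos((2 * x + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT-II of an 8x8 block (8 rows of 8 numbers)."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(u, x) * _basis(v, y)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse transform (DCT-III); recovers the block up to float rounding."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(u, x) * _basis(v, y)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def count_significant(coeffs, threshold=1.0):
    """Number of coefficients whose magnitude exceeds the threshold."""
    return sum(1 for row in coeffs for c in row if abs(c) > threshold)


# Hypothetical smooth block: a gentle brightness ramp, like a patch of sky.
pixels = [[100 + 4 * x + 2 * y for y in range(N)] for x in range(N)]
shifted = [[p - 128 for p in row] for row in pixels]  # JPEG's level shift
coeffs = dct2(shifted)

if __name__ == "__main__":
    print("significant coefficients:", count_significant(coeffs), "of", N * N)
    restored = idct2(coeffs)
    err = max(abs(a - b) for ra, rb in zip(restored, shifted) for a, b in zip(ra, rb))
    print("round-trip error:", err)
```

On the smooth ramp, only a handful of the 64 coefficients clear the threshold, and the inverse transform recovers the block to within floating-point error: the DCT concentrates energy but does not throw anything away.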
Sorting the data is only useful if you know which sorted parts the eye actually needs.
Striped patterns that mapped where vision goes blind
Two scientists in 1968 measured exactly what your eyes are blind to. JPEG aims to throw away that, and only that.
Campbell and Robson showed people striped patterns of different fineness and lowered the contrast until the stripes blurred into uniform gray, then mapped, in numbers, the detail your eyes can and cannot see. JPEG keeps the data your eyes can resolve and quietly deletes the data they cannot. That deletion is the loss in 'lossy' compression. The file does not shrink arbitrarily. It shrinks where vision is weakest.
Without this field
Without the contrast sensitivity function, JPEG has no principled way to decide which DCT coefficients to keep. Quantization without HVS data discards luminance information indiscriminately, producing visible blur rather than imperceptible loss at the same compression ratio.
About half the DCT data in a typical photo is detail your eyes can't resolve. Quantization deletes exactly that.
How we know
Campbell and Robson (1968) measured the contrast sensitivity function: human contrast sensitivity peaks at roughly 2 to 4 cycles per degree of visual angle and falls off rapidly above that. JPEG's quantization matrix rounds high-frequency DCT coefficients most coarsely, often to zero, because the eye cannot resolve them.
Source: Application of Fourier analysis to the visibility of gratings (1968) · tier1
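To connect the grating measurements to DCT coefficients, you need to know how many pixels fit in one degree of visual angle. A rough sketch, assuming a hypothetical display with 0.25 mm pixels viewed from 50 cm: DCT basis index u in an 8-pixel block completes u/16 cycles per pixel, so its frequency in cycles per degree scales with the viewing geometry. The function names are illustrative, not from any library.

```python
import math


def pixels_per_degree(viewing_distance_mm: float, pixel_pitch_mm: float) -> float:
    """How many pixels fit in one degree of visual angle at this distance."""
    mm_per_degree = 2 * viewing_distance_mm * math.tan(math.radians(0.5))
    return mm_per_degree / pixel_pitch_mm


def dct_index_to_cpd(u: int, ppd: float, n: int = 8) -> float:
    """Spatial frequency, in cycles per degree, of DCT basis index u.

    The basis cos((2x + 1) * u * pi / (2n)) advances u * pi / n radians per
    pixel, i.e. u / (2n) cycles per pixel.
    """
    return u / (2 * n) * ppd


if __name__ == "__main__":
    # Hypothetical setup: 0.25 mm pixels viewed from 50 cm.
    ppd = pixels_per_degree(500, 0.25)
    for u in range(8):
        cpd = dct_index_to_cpd(u, ppd)
        note = "above the 2-4 cpd peak" if cpd > 4 else ""
        print(f"u={u}: {cpd:5.1f} cycles/degree {note}")
```

Under these assumptions, u = 1 lands near the 2 to 4 cycles-per-degree peak and the higher indices fall well above it. Sit farther away or use a denser screen and every index shifts to a higher frequency.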
Knowing what the eye misses tells you where to delete aggressively and where to go gently.
Rounding numbers where your eyes will never notice
This is the step where JPEG actually deletes the parts of your photo it decided you wouldn't miss.
Apart from color subsampling, quantization is the only place a JPEG truly loses information. Every other step rearranges bits; this step rounds them: aggressively where the eye is blind, gently where the eye is sharp. Without it, files shrink only about 2×. With it, they shrink 10× to 50×, with little visible loss at typical settings.
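The rounding step can be sketched in a few lines. This sketch uses the example luminance table printed in Annex K of the JPEG standard (ITU-T T.81) and a made-up coefficient block whose magnitudes fall off with frequency; the helper names are hypothetical. Each coefficient is divided by its step size and rounded, and the decoder can only multiply back, so the error per coefficient is at most half a step.

```python
import math

# Example luminance quantization table from Annex K of ITU-T T.81.
# Many encoders scale it up or down with a quality setting.
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(coeffs, table=LUMA_Q):
    """Divide each coefficient by its step size, rounding half away from zero."""
    return [[int(math.copysign(math.floor(abs(c) / q + 0.5), c))
             for c, q in zip(crow, qrow)] for crow, qrow in zip(coeffs, table)]


def dequantize(levels, table=LUMA_Q):
    """What a decoder reconstructs: integer level times step size."""
    return [[lv * q for lv, q in zip(lrow, qrow)] for lrow, qrow in zip(levels, table)]


# Hypothetical DCT output whose magnitude falls off with frequency.
coeffs = [[600 / (1 + u + v) ** 2 for v in range(8)] for u in range(8)]
levels = quantize(coeffs)

if __name__ == "__main__":
    zeros = sum(row.count(0) for row in levels)
    print("zeroed coefficients:", zeros, "of 64")
```

The large steps toward the lower-right corner of the table push most high-frequency coefficients to exactly zero, which is what makes the later packing stages so effective.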
Without this field
Without quantization, JPEG's compression ratio is limited to roughly 2:1, which is about what lossless coding achieves on typical photographs. Everything beyond that (the 10:1 to 50:1 ratios consumers actually use) comes from quantization discarding coefficient precision.
Lossless JPEG: ~2:1 compression. With quantization: 10:1 to 50:1.
How we know
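Quantization is JPEG's main lossy step: divide each DCT coefficient by a step size from the quantization matrix, then round to the nearest integer. Max (1960) showed how to place quantization levels to minimize expected distortion for a given number of levels. JPEG uses a simpler uniform step for each coefficient, but its example matrices apply the same principle of spending precision where errors matter most, with step sizes hand-tuned against HVS data: aggressive for high frequencies, gentle for low.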
Source: Quantizing for Minimum Distortion (1960) · tier1
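Max's two conditions for a minimum-distortion quantizer are: each decision threshold sits midway between neighboring output levels, and each output level sits at the centroid (mean) of the values that map to it. The sketch below alternates those two conditions on a sample set rather than an analytic density, which is essentially Lloyd's closely related algorithm. The Laplacian-like test data is an assumption (a common rough model for AC coefficients), and JPEG encoders do not run this procedure; it only illustrates the optimum the paragraph refers to.

```python
import bisect
import random


def uniform_levels(lo: float, hi: float, levels: int):
    """Evenly spaced output levels across [lo, hi]."""
    return [lo + (hi - lo) * (i + 0.5) / levels for i in range(levels)]


def lloyd_max(samples, levels: int, iterations: int = 50):
    """Alternate Max's two optimality conditions on a set of samples."""
    xs = sorted(samples)
    recon = uniform_levels(xs[0], xs[-1], levels)
    for _ in range(iterations):
        # Condition 1: each decision threshold sits midway between neighbors.
        thresholds = [(a + b) / 2 for a, b in zip(recon, recon[1:])]
        buckets = [[] for _ in range(levels)]
        for x in xs:
            buckets[bisect.bisect_right(thresholds, x)].append(x)
        # Condition 2: each level moves to the mean of the samples it covers
        # (an empty bucket keeps its previous level).
        recon = [sum(b) / len(b) if b else r for b, r in zip(buckets, recon)]
    return recon


def mse(samples, recon):
    """Mean squared error when each sample snaps to its nearest level."""
    return sum(min((x - r) ** 2 for r in recon) for x in samples) / len(samples)


# Hypothetical data: Laplacian-like values with random sign.
rng = random.Random(0)
data = [rng.choice((-1, 1)) * rng.expovariate(0.1) for _ in range(2000)]

if __name__ == "__main__":
    lo, hi = min(data), max(data)
    print("uniform MSE:  ", mse(data, uniform_levels(lo, hi, 8)))
    print("Lloyd-Max MSE:", mse(data, lloyd_max(data, 8)))
```

Each iteration can only lower (or keep) the mean squared error, so the fitted levels never do worse than the evenly spaced starting point; on peaked data like this they crowd toward zero, where most values live.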
Detail is only part of the story: the eye also weighs brightness and color unequally, and JPEG exploits that too.
A coloring-book layer your eyes trust completely
A photo's brightness matters more to your eyes than its color. JPEG cuts the file in half before doing anything else, just by knowing this.
Imagine a black-and-white photo with a thin coloring-book layer on top. JPEG splits every photo into exactly that: a sharp brightness layer (Y) and a softer color layer (Cb, Cr). In its most common setting, it keeps the brightness at full resolution and halves the color resolution both horizontally and vertically. Your eyes do not notice. The file is 50% smaller before any 'compression' has happened.
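The 50% figure is simple sample counting. A sketch, assuming the common 4:2:0 layout (Cb and Cr stored at half width and half height) and plain 2×2 averaging as the downsampling filter; the standard leaves the filter choice to the encoder, so real encoders may differ. Function names are illustrative.

```python
import math


def subsample_420(channel):
    """Average each 2x2 neighborhood of a chroma channel (even dimensions assumed)."""
    return [[(channel[y][x] + channel[y][x + 1] + channel[y + 1][x] + channel[y + 1][x + 1]) / 4
             for x in range(0, len(channel[0]), 2)]
            for y in range(0, len(channel), 2)]


def sample_counts(width: int, height: int):
    """Samples stored at full color resolution vs. with 4:2:0 subsampling."""
    full = 3 * width * height                                # Y, Cb, Cr all full size
    chroma = math.ceil(width / 2) * math.ceil(height / 2)    # one Cb or Cr plane
    return full, width * height + 2 * chroma


if __name__ == "__main__":
    full, sub = sample_counts(640, 480)
    print(full, sub, sub / full)
```

For a 640×480 image this gives 921,600 samples at full color resolution and 460,800 with 4:2:0, exactly half.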
Without this field
Without perceptual color spaces, JPEG would compress in RGB, treating all three channels as equally important. Compression artifacts would show up as color shifts rather than luminance noise, destroying image structure at modest compression ratios.
Splitting brightness from color and halving the color half: 50% smaller file, eye notices nothing.
How we know
CIE 1931 quantified how wavelengths of light map to perceived color, defining the XYZ color space and the trichromatic matching functions. JPEG uses a derived space, YCbCr, which separates luminance (Y) from chrominance (Cb, Cr). This separation enables chroma subsampling: keep Y at full resolution, halve the color resolution in both directions. The file shrinks 50% before anything else happens.
Source: CIE 1931 2° Standard Observer (1931) · tier1
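For the color conversion itself, the JFIF file format used by most .jpg files specifies full-range YCbCr with the luma weights from ITU-R BT.601. A minimal per-pixel sketch, with round-half-up and clamping to 8 bits as assumed implementation choices:

```python
import math


def _to_byte(v: float) -> int:
    """Round half up and clamp to the 0..255 range of an 8-bit sample."""
    return max(0, min(255, math.floor(v + 0.5)))


def rgb_to_ycbcr(r: int, g: int, b: int):
    """Full-range YCbCr as defined for JFIF files."""
    y = 0.299 * r + 0.587 * g + 0.114 * b           # brightness
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference
    return _to_byte(y), _to_byte(cb), _to_byte(cr)


if __name__ == "__main__":
    print(rgb_to_ycbcr(255, 0, 0))      # pure red
    print(rgb_to_ycbcr(100, 100, 100))  # a mid gray
```

Any gray pixel maps to Cb = Cr = 128, meaning "no color", so all of a gray image's information lives in Y. Pure red comes out as (76, 85, 255).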
With brightness separated and color halved, whatever remains still needs to be packed as tightly as possible.
Short codes for common patterns, long ones for rare
After all the deletion, what's left needs to be packed. A 1952 algorithm packs it almost perfectly.
Common patterns get short codes; rare patterns get long codes. Among codes that spend a whole number of bits on each symbol, Huffman's algorithm does this packing optimally, within a fraction of a bit per symbol of the smallest size mathematically possible for the data that's left. Skip this step and JPEGs would be roughly 30% larger for no visible benefit.
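Huffman's construction is short enough to sketch in full: repeatedly merge the two least frequent symbols (or subtrees) until one tree remains, then read each code off the branches. The symbol counts below are hypothetical, chosen to be skewed the way quantized coefficients tend to be; real JPEG codes (run, size) symbols rather than raw values.

```python
import heapq
import itertools
import math


def huffman_code(freqs):
    """Build a prefix code by repeatedly merging the two least frequent nodes."""
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs one bit
        return {sym: "0" for sym in freqs}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def average_bits(freqs, code):
    """Expected code length in bits per symbol."""
    total = sum(freqs.values())
    return sum(f * len(code[s]) for s, f in freqs.items()) / total


def entropy_bits(freqs):
    """Shannon entropy: the lower bound on average bits per symbol."""
    total = sum(freqs.values())
    return -sum(f / total * math.log2(f / total) for f in freqs.values())


# Hypothetical symbol counts.
counts = {"0": 40, "1": 20, "-1": 20, "2": 8, "-2": 8, "EOB": 4}
code = huffman_code(counts)

if __name__ == "__main__":
    fixed = math.ceil(math.log2(len(counts)))
    print("fixed-length:", fixed, "bits/symbol")
    print("Huffman:     ", average_bits(counts, code), "bits/symbol")
    print("entropy:     ", round(entropy_bits(counts), 3), "bits/symbol")
```

With these counts Huffman averages 2.32 bits per symbol against 3 bits for a fixed-length code over six symbols, and stays within a fraction of a bit of the entropy.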
Without this field
Without variable-length entropy codes, JPEG would fall back to fixed-length encoding of the quantized coefficient stream, wasting 25 to 30% of file size. Huffman is what makes the final compression step nearly optimal.
Without smart packing of what's left, every JPEG would be ~30% larger for no visible reason.
How we know
Huffman's 1952 algorithm produces optimal prefix codes for any probability distribution: shorter bit sequences for common symbols, longer for rare ones. Baseline JPEG uses Huffman tables tuned to typical quantized-DCT-coefficient statistics, bringing the final stage to within a fraction of a bit per symbol of Shannon's theoretical minimum.
Source: A Method for the Construction of Minimum-Redundancy Codes (1952) · tier1
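Baseline JPEG does not Huffman-code coefficients one at a time. It first reads each quantized block in a zigzag order from low to high frequency, so the zeros cluster at the end, then turns runs of zeros into (run, value) pairs and closes the block with an end-of-block marker. The sketch below is simplified: real JPEG splits each value into a size category plus extra bits, caps runs at 15 zeros, and codes the DC coefficient as a difference from the previous block.

```python
def zigzag_order(n: int = 8):
    """(row, col) positions in JPEG's zigzag scan, low frequencies first."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Odd diagonals run down-left, even diagonals run up-right.
        order.extend(diag if s % 2 else reversed(diag))
    return order


def run_length(block):
    """Split a quantized block into its DC value and (zero_run, value) AC pairs."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for v in scan[1:]:  # scan[0] is the DC coefficient, coded separately
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:  # trailing zeros collapse into one end-of-block marker
        pairs.append("EOB")
    return scan[0], pairs


if __name__ == "__main__":
    block = [[0] * 8 for _ in range(8)]
    block[0][0], block[0][1], block[1][0], block[2][2] = 38, -3, 2, 1
    print(run_length(block))  # (38, [(0, -3), (0, 2), (9, 1), 'EOB'])
```

Sixty-three AC coefficients shrink to four symbols here; those symbols are what the Huffman tables actually encode.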
Watch
A visual companion to the fields above.
JPEG DCT Explained
Computerphile
Nearly every photo on every site you have ever loaded was filtered through a five-stage model of your own perception, built by people working decades and disciplines apart. A color committee from before television. A vision experiment from before the moon landing. A math paper from the year Nixon resigned. None of them woke up wanting to compress your beach photo. The most powerful engineering you encounter every day is the engineering you cannot see, by design. The lesson isn't that JPEG is clever. It's that the things you 'just see' on the internet are decisions made about your eyes, on your behalf, by people who are mostly dead now.
References
- Discrete Cosine Transform (1974) tier1
Ahmed, Natarajan & Rao, IEEE Transactions on Computers vol. C-23 (1974). The paper that introduced the DCT. Every JPEG encoder still uses this specific transform.
- Application of Fourier analysis to the visibility of gratings (1968) tier1
Campbell & Robson, Journal of Physiology vol. 197 (1968). Established that human visual sensitivity drops sharply above 2 to 4 cycles per degree, which is exactly what JPEG exploits.
- Quantizing for Minimum Distortion (1960) tier1
Joel Max, IRE Transactions on Information Theory vol. IT-6 (1960). Established the minimum-distortion quantizer design whose principle JPEG's quantization matrices follow in a simpler, uniform-step form.
- CIE 1931 2° Standard Observer (1931) tier1
Proceedings of the Commission Internationale de l'Éclairage, 1931. The color-matching functions that quantify human trichromatic perception. Every color space since rests on this foundation.
- A Method for the Construction of Minimum-Redundancy Codes (1952) tier1
David A. Huffman, Proceedings of the IRE vol. 40 (1952). The algorithm baseline JPEG encoders use for their final entropy-coding stage.