files.co

PDF glossary: terms and formats

What every PDF term and format actually means, in plain language. The jargon you run into, explained.

Formats

Concepts

OCR

OCR (Optical Character Recognition) turns a picture of text into real, selectable text. A scanned page or a photo of a document is, to a computer, just a grid of pixels: you cannot search it, copy from it, or have a screen reader read it. OCR analyzes those pixels, recognizes the letters and words, and produces an actual text layer.

AcroForm

AcroForm is the native, built-in form system in PDF. When a PDF has fillable fields (text boxes, checkboxes, radio buttons, dropdowns) that you can fill in without any special software, those are almost always AcroForm fields. It is the original PDF form technology and the one supported pretty much everywhere.

XFA

XFA (XML Forms Architecture) is an alternative form technology Adobe layered on top of PDF, where the form is described in XML rather than as native PDF objects. It was built for complex, dynamic forms, the kind that grow extra rows, recalculate totals, or change layout based on what you enter.

Metadata

Metadata is the information about a PDF that is not part of the visible page: title, author, subject, keywords, the software that created it, and the dates it was made or last changed. It travels inside the file alongside the content, even though you never see it on the page.

Compression

Compression makes a PDF smaller without changing what it looks like, or at least without changing it noticeably. PDFs balloon in size mainly because of images: a report full of high-resolution scans or photos can be tens of megabytes, too big to email and slow to load.
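
To see why scans dominate file size, it helps to work out how big a page image is before any compression. The sketch below assumes a US Letter page (8.5 x 11 inches) scanned at 300 DPI in 24-bit RGB; those numbers and the function name `raw_image_bytes` are illustrative choices, not something a particular tool uses.

```python
def raw_image_bytes(width_px, height_px, channels=3, bits_per_channel=8):
    """Size of an image stored with no compression at all."""
    return width_px * height_px * channels * bits_per_channel // 8


# Assumed example: US Letter at 300 DPI is 2550 x 3300 pixels.
page_bytes = raw_image_bytes(2550, 3300)
page_megabytes = page_bytes / 1_000_000

print(page_bytes)       # raw bytes for one scanned page
print(page_megabytes)   # the same figure in megabytes
```

Under these assumptions a single uncompressed page is roughly 25 MB, which is why a multi-page scan is unusable until its images are compressed.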

Embedded fonts

Embedded fonts are typefaces packaged inside the PDF itself, rather than borrowed from whatever computer happens to open the file. This is the mechanism behind one of PDF's core promises: the document looks the same everywhere.

Text layer

The text layer is the part of a PDF that holds real, selectable characters, as opposed to a picture of text. When you can click and drag to highlight words, search for a phrase, or copy a sentence out of a PDF, you are interacting with its text layer.

Watermark

A watermark is text or an image laid over the pages of a PDF, usually faint and repeated, to mark the document's status or ownership. Think of stamps like DRAFT, CONFIDENTIAL, a company logo, or a copyright notice sitting behind or above the content.

Linearization (Fast Web View)

Linearization, also called Fast Web View, is a way of reorganizing a PDF so it can start displaying before the whole file has downloaded. A normal PDF often needs to load completely before the first page appears; a linearized one is structured so page one shows up almost immediately.
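
A rough way to tell whether a file claims to be linearized is to look for the `/Linearized` key near the start of the file, since the PDF specification places the linearization dictionary within the first 1024 bytes. The sketch below is only a heuristic: the function name `looks_linearized` is made up for this example, the sample byte strings are hand-written fragments rather than valid PDFs, and a file that was edited after linearization may still carry the key without benefiting from it.

```python
def looks_linearized(pdf_bytes: bytes) -> bool:
    """Heuristic check: does the start of the file declare /Linearized?"""
    header = pdf_bytes[:1024]
    return header.startswith(b"%PDF-") and b"/Linearized" in header


# Hand-written fragments for illustration only (not complete PDF files).
sample = b"%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 52310 /N 3 >>\nendobj\n"
plain = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

print(looks_linearized(sample))
print(looks_linearized(plain))
```

For a real file, the same function could be fed the first kilobyte read from disk, for example `open(path, "rb").read(1024)`.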

Security

Images

Vector graphic

A vector graphic describes an image as shapes, lines, curves, and fills defined by math, rather than as a grid of pixels. Because it is math, a vector can scale to any size and stay perfectly sharp: a logo drawn as vectors looks crisp on a business card and on a billboard.

Raster image

A raster image is made of pixels, a fixed grid of tiny colored dots. Photographs, scans, and screenshots are all raster: zoom in far enough and you see the individual squares. This is the opposite approach to vector graphics, which are defined by math and scale infinitely.
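
The difference shows up as soon as you enlarge something. In the minimal sketch below, a tiny raster is upscaled by repeating pixels (nearest-neighbor, one of several possible methods), while a vector shape is enlarged by multiplying its coordinates. The function names and sample data are invented for this illustration.

```python
def upscale_raster(pixels, factor):
    """Nearest-neighbor upscaling: each pixel becomes a factor x factor block."""
    out = []
    for row in pixels:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def scale_vector(points, factor):
    """Scaling a vector shape only changes its coordinates, not its detail."""
    return [(x * factor, y * factor) for x, y in points]


raster = [[0, 1],
          [1, 0]]                                  # a 2 x 2 checkerboard
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]    # three corner points

big_raster = upscale_raster(raster, 3)     # 6 x 6 grid of visible blocks
big_triangle = scale_vector(triangle, 3)   # still exactly three points

for row in big_raster:
    print(row)
print(big_triangle)
```

The enlarged raster carries no new detail, only bigger squares, whereas the enlarged triangle is still described by three exact points that a renderer can draw sharply at any size.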

JPG / JPEG

JPG (also written JPEG, for Joint Photographic Experts Group) is the most common format for photographs and one of the most common image types embedded in PDFs. It uses lossy compression, throwing away detail the eye is least likely to notice, to make photo files dramatically smaller.
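
The core idea of lossy compression can be sketched with simple rounding. This is not the JPEG algorithm itself, which works on frequency coefficients of small pixel blocks; it only illustrates the step where nearby values are collapsed together and the exact originals can no longer be recovered. The sample values and the step size of 10 are arbitrary.

```python
def quantize(values, step):
    """Collapse each value to the index of its nearest multiple of `step`."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Rebuild approximate values; the rounding error is gone for good."""
    return [level * step for level in levels]


original = [52, 54, 61, 66, 70, 61, 64, 73]   # made-up brightness values
step = 10                                      # larger step = smaller file, more loss

levels = quantize(original, step)
restored = dequantize(levels, step)

print(levels)     # fewer distinct numbers, so easier to compress
print(restored)   # close to the original, but not identical
```

Raising `step` plays the role of lowering a quality setting: fewer distinct levels to store, at the cost of a larger gap between `original` and `restored`.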

PNG

PNG (Portable Network Graphics) is a lossless image format, meaning it compresses without throwing any detail away. What goes in comes back out pixel-for-pixel, which makes it ideal for screenshots, logos, icons, charts, and anything with sharp edges or flat areas of color.
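
Lossless means a round trip returns exactly the bytes that went in. PNG compresses its pixel data with DEFLATE, the same method Python's standard `zlib` module implements, so the property can be demonstrated without writing a PNG encoder. The sketch below skips everything else a real PNG does (filtering, chunks, checksums) and uses made-up pixel data with large flat areas of color.

```python
import zlib

# Made-up image data: 50 rows, each 40 white then 40 black RGB pixels.
row = bytes([255, 255, 255] * 40 + [0, 0, 0] * 40)
pixels = row * 50

compressed = zlib.compress(pixels, level=9)
restored = zlib.decompress(compressed)

print(len(pixels), len(compressed))   # original size vs compressed size
print(restored == pixels)             # True: nothing was thrown away
```

Flat areas of identical color compress especially well this way, which is part of why the format suits screenshots and logos.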

WebP

WebP is a modern image format from Google that aims to replace both JPG and PNG by doing what each does well. It supports lossy compression for photos and lossless compression with transparency for graphics, and at comparable quality it usually produces noticeably smaller files than the older formats.

TIFF

TIFF (Tagged Image File Format) is a high-quality raster format long favored for scanning, archiving, and professional imaging. It can store images losslessly at full fidelity and supports features that matter in document workflows, like multiple pages in a single file and color profiles for accurate print.

SVG

SVG (Scalable Vector Graphics) is an open, web-native vector format that describes images as shapes and paths in XML text. Like any vector graphic, it scales to any size without losing sharpness, so an SVG logo or icon stays crisp at every resolution, from a tiny favicon to a full-page header.
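
Because an SVG is just XML text, an ordinary XML parser can read it. The sketch below builds a small, made-up drawing (a circle and a triangular path) as a string and lists the shapes it contains using Python's standard library; the drawing itself is an invented example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# A tiny hand-written SVG: one circle and one closed triangular path.
svg_text = (
    f'<svg xmlns="{SVG_NS}" viewBox="0 0 100 100">'
    '<circle cx="50" cy="50" r="40" fill="teal"/>'
    '<path d="M 20 80 L 50 20 L 80 80 Z" fill="none" stroke="black"/>'
    '</svg>'
)

root = ET.fromstring(svg_text)

# Element tags come back as "{namespace}name"; keep only the name part.
shapes = [child.tag.split("}", 1)[1] for child in root]

print(shapes)
print(root.get("viewBox"))
```

The `viewBox` defines a coordinate system rather than a pixel size, which is what lets the same file render crisply at any dimensions.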

DPI / PPI

DPI (dots per inch), often used interchangeably with PPI (pixels per inch), measures how densely an image's detail is packed: how many dots fit into each inch when it is printed or scanned. It is the number that decides whether an image looks sharp or soft at a given size.
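
The relationship is plain arithmetic: pixels equal inches times DPI. The sketch below works it in both directions; the function names are invented for this example, and the page size and resolutions are common illustrative values rather than requirements.

```python
def pixels_needed(inches: float, dpi: int) -> int:
    """How many pixels are needed to cover a length at a given density."""
    return round(inches * dpi)


def print_size_inches(pixels: int, dpi: int) -> float:
    """How large a given number of pixels comes out at a given density."""
    return pixels / dpi


# A US Letter page (8.5 x 11 inches) at 300 DPI.
width_px = pixels_needed(8.5, 300)
height_px = pixels_needed(11, 300)
print(width_px, height_px)

# The same 3000-pixel-wide image at two densities.
print(print_size_inches(3000, 300))   # width in inches at 300 DPI
print(print_size_inches(3000, 72))    # width in inches at 72 DPI
```

The same pixels spread over more inches means lower density, which is why an image that looks fine on screen can turn soft when printed large.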