How libavif Extracts EXIF Metadata during Decoding
This article provides a technical overview of how the
libavif library extracts EXIF (Exchangeable Image File
Format) metadata when decoding an AVIF image. It covers the container
parsing process, the location of metadata payloads within the ISO Base
Media File Format (ISOBMFF), the handling of the EXIF header offset, and
how the final data is exposed to the application via the
libavif API.
1. Parsing the ISOBMFF Container
AVIF files are built on the ISO Base Media File Format (ISOBMFF)
standard. During the decoding process, libavif first parses
the file’s container structure to identify the image items (or tracks,
for image sequences), item properties, and metadata.
The library looks specifically for the top-level meta
(Metadata) box. Inside this box, libavif processes the
relationships between the primary image item and its associated metadata
items.
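To make the container walk concrete, here is a minimal sketch of reading ISOBMFF box headers and scanning for a top-level box such as meta. It is not libavif's code: the BoxHeader type and the functions read_be32, read_box_header and find_top_level_box are hypothetical names introduced for illustration, and the sketch assumes the whole file is already in memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: one parsed ISOBMFF box header. */
typedef struct {
    char     type[5];      /* four-character code, NUL-terminated */
    uint64_t size;         /* total box size in bytes, header included */
    size_t   header_size;  /* 8, or 16 when a 64-bit largesize is present */
} BoxHeader;

/* Reads a 32-bit big-endian unsigned integer. */
uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parses the header of the box starting at buf[0]. Returns 0 on success,
 * -1 if the data is truncated or the declared size is inconsistent. */
int read_box_header(const uint8_t *buf, size_t len, BoxHeader *out) {
    if (len < 8) return -1;
    uint32_t size32 = read_be32(buf);
    memcpy(out->type, buf + 4, 4);
    out->type[4] = '\0';
    out->header_size = 8;
    if (size32 == 1) {            /* a 64-bit largesize follows the type */
        if (len < 16) return -1;
        out->size = ((uint64_t)read_be32(buf + 8) << 32) | read_be32(buf + 12);
        out->header_size = 16;
    } else if (size32 == 0) {     /* box extends to the end of the data */
        out->size = len;
    } else {
        out->size = size32;
    }
    if (out->size < out->header_size || out->size > len) return -1;
    return 0;
}

/* Scans the top-level boxes for the given four-character code.
 * Returns the byte offset of the matching box, or -1 if absent or malformed. */
long find_top_level_box(const uint8_t *buf, size_t len, const char *fourcc) {
    size_t pos = 0;
    while (pos < len) {
        BoxHeader h;
        if (read_box_header(buf + pos, len - pos, &h) != 0) return -1;
        if (memcmp(h.type, fourcc, 4) == 0) return (long)pos;
        pos += (size_t)h.size;    /* skip to the next sibling box */
    }
    return -1;
}
```

Calling find_top_level_box(data, size, "meta") on a loaded file would give the offset at which the child boxes discussed in the following sections can be parsed.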
2. Locating the EXIF Item
Within the meta box, libavif reads the
iinf (Item Information) box, which lists all items present
in the file. It searches for an item entry where the
item_type is set to 'Exif'.
To confirm that this EXIF item belongs to the image being decoded,
the library checks the iref (Item Reference) box. It looks
for a reference of type 'cdsc' (content description) that
links the EXIF metadata item ID to the primary image item ID. The
reference originates from the EXIF item and points at the image it
describes.
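The matching logic can be sketched over already-parsed tables. The ItemInfo and ItemRef structs and the find_exif_item function below are hypothetical simplifications, not libavif types. The sketch assumes, following HEIF, that a 'cdsc' reference is recorded from the Exif item to the image item it describes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of one iinf entry. */
typedef struct {
    uint32_t id;
    char     type[5];   /* item_type four-character code */
} ItemInfo;

/* Hypothetical, simplified view of one iref entry. */
typedef struct {
    char     type[5];   /* reference type, e.g. "cdsc" */
    uint32_t from_id;   /* item that holds the reference */
    uint32_t to_id;     /* item being referred to */
} ItemRef;

/* Looks for an item of type 'Exif' that has a 'cdsc' reference to primary_id.
 * Returns 1 and stores the item ID in *exif_id when found, otherwise 0. */
int find_exif_item(const ItemInfo *items, size_t item_count,
                   const ItemRef *refs, size_t ref_count,
                   uint32_t primary_id, uint32_t *exif_id) {
    for (size_t i = 0; i < item_count; i++) {
        if (memcmp(items[i].type, "Exif", 4) != 0) continue;
        for (size_t r = 0; r < ref_count; r++) {
            if (memcmp(refs[r].type, "cdsc", 4) == 0 &&
                refs[r].from_id == items[i].id &&
                refs[r].to_id == primary_id) {
                *exif_id = items[i].id;
                return 1;
            }
        }
    }
    return 0;
}
```

An Exif item without a matching 'cdsc' reference is skipped here, which mirrors the ownership check described above.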
3. Retrieving the Raw Payload Offset
Once the EXIF item ID is identified, libavif queries the
iloc (Item Location) box. The iloc box
contains the structural blueprint of the file’s payload, detailing the
exact byte offsets and lengths of the extents (contiguous byte ranges)
that make up each item. Using this information, the parser locates and
extracts the raw bytes of the EXIF payload from the media file.
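As an illustration of how extents turn into a payload, the following sketch concatenates a list of extents into one buffer with bounds checking. The Extent struct and read_item_payload are hypothetical names. The sketch assumes the offsets are absolute file offsets (any base offset already applied) and that the whole file is in memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified view of one iloc extent. */
typedef struct {
    uint64_t offset;   /* absolute byte offset in the file (assumed) */
    uint64_t length;   /* number of bytes in this extent */
} Extent;

/* Concatenates the extents of one item into a newly allocated buffer.
 * Returns NULL if an extent falls outside the file, the item is empty,
 * or allocation fails. The caller frees the result with free(). */
uint8_t *read_item_payload(const uint8_t *file, size_t file_size,
                           const Extent *extents, size_t extent_count,
                           size_t *out_size) {
    size_t total = 0;
    for (size_t i = 0; i < extent_count; i++) {
        /* Each extent must lie entirely inside the file. */
        if (extents[i].offset > file_size ||
            extents[i].length > file_size - extents[i].offset) return NULL;
        /* Guard against overflow of the running total. */
        if (extents[i].length > SIZE_MAX - total) return NULL;
        total += (size_t)extents[i].length;
    }
    if (total == 0) return NULL;

    uint8_t *buf = malloc(total);
    if (buf == NULL) return NULL;

    size_t pos = 0;
    for (size_t i = 0; i < extent_count; i++) {
        memcpy(buf + pos, file + extents[i].offset, (size_t)extents[i].length);
        pos += (size_t)extents[i].length;
    }
    *out_size = total;
    return buf;
}
```

Validating every extent before allocating keeps a malformed iloc box from triggering an out-of-bounds read.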
4. Processing the EXIF Header Offset
According to the HEIF specification, on which AVIF builds, the EXIF
payload stored in the container does not start immediately with the
TIFF header. Instead, it begins with a preamble:
* The 4-Byte Offset: The first 4 bytes of the payload represent a
32-bit big-endian unsigned integer. This integer indicates the offset
(in bytes) from the end of this 4-byte field to the start of the actual
TIFF header.
* Skipping the Offset: libavif reads this 4-byte integer, validates
that the payload is large enough to accommodate the offset, and skips
the designated number of bytes (commonly 0, or a few bytes such as a
leading Exif\0\0 identifier) to arrive at the beginning of the TIFF
data. The TIFF header is recognizable by its byte order marker followed
by the magic number 42: II*\0 for little-endian or MM\0* for
big-endian data.
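The offset handling described above can be sketched as a small standalone function. This is an illustrative sketch rather than libavif's implementation, and locate_tiff_header is a hypothetical name. It reads the big-endian offset, checks it against the payload size, and confirms that a TIFF byte order marker is present at the computed position.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Locates the TIFF header inside a HEIF/AVIF Exif item payload.
 * On success returns 0 and stores a pointer to the TIFF header and the
 * number of bytes from there to the end of the payload.
 * Returns -1 if the payload is too small or no TIFF marker is found. */
int locate_tiff_header(const uint8_t *payload, size_t size,
                       const uint8_t **tiff, size_t *tiff_size) {
    static const uint8_t le_marker[4] = { 'I', 'I', 0x2A, 0x00 };
    static const uint8_t be_marker[4] = { 'M', 'M', 0x00, 0x2A };

    if (size < 4) return -1;

    /* 32-bit big-endian offset, counted from the end of this field. */
    uint32_t offset = ((uint32_t)payload[0] << 24) |
                      ((uint32_t)payload[1] << 16) |
                      ((uint32_t)payload[2] << 8)  |
                       (uint32_t)payload[3];

    size_t remaining = size - 4;
    /* The offset plus a 4-byte TIFF marker must fit in what is left. */
    if (offset > remaining || remaining - offset < 4) return -1;

    const uint8_t *p = payload + 4 + offset;
    if (memcmp(p, le_marker, 4) != 0 && memcmp(p, be_marker, 4) != 0) {
        return -1;
    }

    *tiff = p;
    *tiff_size = remaining - offset;
    return 0;
}
```

Comparing the offset against the remaining size before adding it to the pointer avoids both integer overflow and out-of-bounds reads on hostile input.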
5. Exposing the Metadata to the API
After stripping the container-specific offset, the raw EXIF data
block is copied into a newly allocated buffer in the destination
structure. In libavif, this data is stored in the avifImage
struct under the exif member, which contains:
* exif.data: A pointer to the raw TIFF EXIF byte buffer.
* exif.size: The size of the EXIF buffer in bytes.
The calling application can then access this buffer directly to parse
the metadata using external libraries like libexif or to
write it into a newly encoded image.
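As a small example of what an application might do with such a buffer, the sketch below reads the TIFF byte order and returns the number of entries in the first image file directory (IFD0). The function exif_ifd0_entry_count and its helpers are hypothetical. The sketch assumes the buffer starts at the TIFF header, as described above, so with libavif the arguments would be the exif.data pointer and exif.size value of the decoded avifImage. A real application would more likely hand the buffer to a full EXIF parser.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a 16-bit value in the byte order declared by the TIFF header. */
static uint16_t rd16(const uint8_t *p, int big_endian) {
    return big_endian ? (uint16_t)((p[0] << 8) | p[1])
                      : (uint16_t)((p[1] << 8) | p[0]);
}

/* Reads a 32-bit value in the byte order declared by the TIFF header. */
static uint32_t rd32(const uint8_t *p, int big_endian) {
    return big_endian
        ? ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
          ((uint32_t)p[2] << 8)  |  (uint32_t)p[3]
        : ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
          ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Returns the number of entries in IFD0 of a TIFF-formatted EXIF buffer,
 * or -1 if the buffer does not start with a valid TIFF header. */
int exif_ifd0_entry_count(const uint8_t *exif, size_t size) {
    int big_endian;
    if (size < 8) return -1;
    if (exif[0] == 'M' && exif[1] == 'M') {
        big_endian = 1;
    } else if (exif[0] == 'I' && exif[1] == 'I') {
        big_endian = 0;
    } else {
        return -1;
    }
    if (rd16(exif + 2, big_endian) != 42) return -1;   /* TIFF magic number */

    /* Offset of IFD0, measured from the start of the TIFF header. */
    uint32_t ifd0 = rd32(exif + 4, big_endian);
    if (ifd0 > size || size - ifd0 < 2) return -1;
    return rd16(exif + ifd0, big_endian);
}
```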