Handling Multiple Image Items in libavif

This article explains how the libavif library processes AVIF files containing multiple independent image items. It details how the decoder identifies the primary image, how it distinguishes between image sequences and static collections, and what happens to secondary or alternate images during the decoding process.

In the AVIF specification—which is built upon the High Efficiency Image File Format (HEIF) and the ISO Base Media File Format (ISOBMFF)—a single file can package multiple independent image items. These items can represent alternate resolutions, different crops, or entirely distinct images grouped together.

Primary Item Identification

When libavif parses an AVIF file containing multiple independent items, its default behavior is guided by the pitm (Primary Item) box in the file’s metadata.
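To make the role of the pitm box concrete, the following sketch decodes a standalone pitm box from raw bytes. It follows the ISOBMFF layout: a 32-bit size, the four-character type, a FullBox version and flags, then a 16-bit item ID (version 0) or a 32-bit item ID (version 1). The function name parse_pitm is hypothetical and is not part of libavif. The sketch assumes the box has already been located inside the meta box and rejects the 64-bit and to-end-of-file box size forms.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads an n-byte big-endian unsigned integer (n <= 4). */
static uint32_t read_be(const uint8_t *p, size_t n) {
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Hypothetical helper: parses one complete 'pitm' box, header included,
 * and stores the primary item ID in *item_id.
 * Returns 0 on success, -1 if the bytes are not a supported pitm box.
 * Boxes using the 64-bit "largesize" (size == 1) or "extends to end of
 * file" (size == 0) forms are rejected. */
int parse_pitm(const uint8_t *box, size_t len, uint32_t *item_id) {
    if (len < 12) return -1;                  /* size + type + version + flags */
    uint32_t size = read_be(box, 4);
    if (size < 12 || size > len) return -1;
    if (memcmp(box + 4, "pitm", 4) != 0) return -1;

    uint8_t version = box[8];                 /* box[9..11] hold the 24-bit flags */
    if (version > 1) return -1;
    size_t id_bytes = (version == 0) ? 2 : 4; /* unsigned int(16) vs. unsigned int(32) */
    if (size < 12 + id_bytes) return -1;

    *item_id = read_be(box + 12, id_bytes);
    return 0;
}
```

For example, the 14 bytes 00 00 00 0E 'p' 'i' 't' 'm' 00 00 00 00 00 01 describe a version-0 box whose primary item ID is 1.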

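The Default Action: When calling avifDecoderParse(), libavif scans the container and locates the designated primary item ID. (If the file also contains an image sequence track, the default source setting, AVIF_DECODER_SOURCE_AUTO, can select that track instead; calling avifDecoderSetSource() with AVIF_DECODER_SOURCE_PRIMARY_ITEM forces the primary item.)
Decoding: The standard decoding pipeline targets this primary image. The first call to avifDecoderNextImage() decodes the primary item together with its associated auxiliary items, most notably an alpha (transparency) plane. For a still image, a further call returns AVIF_RESULT_NO_IMAGES_REMAINING instead of moving on to another item. Auxiliary types that avifImage has no field for, such as depth maps, are not returned.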

Image Sequences vs. Image Collections

libavif distinguishes between multiple images based on how they are structured within the container:

Image Sequences (Animations): If the images are stored as a sequence (in a track, that is, a trak box inside the file's moov box), libavif treats them as frames of an animation. The developer can iterate through the frames in order with avifDecoderNextImage(), reading each frame's pixels from decoder->image and its timing, including the duration, from decoder->imageTiming.
Image Collections (Static Alternates): If the images are stored as independent, static items (an image collection) rather than a timed sequence, libavif does not automatically loop or cycle through them. By default, it only exposes and decodes the primary item.

Accessing Non-Primary Independent Items

If an AVIF file contains multiple independent static images and you need to access an item other than the primary one, the high-level libavif decoder API has specific design limitations:

Targeted Use Case: libavif is primarily optimized for the “primary image + auxiliary channels” or “animated image sequence” use cases.
API Exposure: The high-level API does not provide a direct, simple array interface to iterate and decode unrelated, independent static images.
Alternative Approaches: To extract independent, non-primary static images (such as a secondary image or an unlinked thumbnail), developers must either use a library that exposes every image item (such as libheif) or work below libavif's high-level decoder API: parse the container to identify the other item IDs, then direct decoding at those specific items.