How libavif Handles HEIF Spatial Grid Images

This article explores how the open-source library libavif processes spatial grid images as defined by the High Efficiency Image File Format (HEIF) specification. It covers the mechanisms libavif uses to decode grid-based image items into a single seamless image, how it encodes large images into tiled grids to optimize performance, and how it ensures compliance with the underlying ISO/IEC 23008-12 standard.

Understanding Spatial Grids in HEIF

The HEIF specification allows an image to be represented as a spatial grid of smaller, independently encoded image items (tiles). Rather than encoding a massive image as a single, resource-heavy bitstream, the image is divided into a 2D grid of rows and columns.

In the HEIF container structure, this is represented by a derived image item whose item type is the four-character code grid (Grid Image Item). The grid item itself does not contain compressed pixel data; instead, its payload carries metadata defining the output width and height and the number of rows and columns, while item references (of type dimg) point to the individual input image items that make up the tiles, listed in row-major order.
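To make the layout of that metadata concrete, here is a minimal Python sketch that parses a grid item payload following the ImageGrid syntax in ISO/IEC 23008-12: a version byte, a flags byte, rows minus one, columns minus one, then the output width and height as 16-bit fields (or 32-bit when bit 0 of flags is set). The names GridConfig and parse_grid_payload are illustrative only and are not part of libavif's API.

```python
import struct
from dataclasses import dataclass


@dataclass
class GridConfig:
    """Illustrative container for the fields of a grid item payload."""
    rows: int
    columns: int
    output_width: int
    output_height: int


def parse_grid_payload(payload: bytes) -> GridConfig:
    """Parse an ImageGrid payload (hypothetical helper, not a libavif function)."""
    if len(payload) < 4:
        raise ValueError("grid payload too short")
    version, flags, rows_minus_one, columns_minus_one = struct.unpack_from(
        ">BBBB", payload, 0
    )
    if version != 0:
        raise ValueError(f"unsupported grid version {version}")
    # Bit 0 of flags selects 32-bit dimension fields; otherwise they are 16-bit.
    fmt = ">II" if flags & 1 else ">HH"
    if len(payload) < 4 + struct.calcsize(fmt):
        raise ValueError("grid payload truncated")
    width, height = struct.unpack_from(fmt, payload, 4)
    return GridConfig(rows_minus_one + 1, columns_minus_one + 1, width, height)
```

Note that the row and column counts are stored minus one, so a single byte can describe up to 256 rows or columns.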

How libavif Decodes Grid Images

When libavif parses an AVIF file (which uses the HEIF container structure), it handles spatial grids through a structured, multi-step pipeline:

Parser Identification: The libavif parser reads the ISOBMFF (ISO Base Media File Format) box structure. If the primary item is of type grid, the library identifies it as a spatial grid image rather than a single coded image.
Metadata Validation: The library extracts the grid configuration from the grid item payload. It validates critical parameters, including the number of rows and columns, the target output dimensions, and the references to the individual tile items. libavif checks that the number of referenced tiles matches rows × columns and that the tiles' combined dimensions match or exceed the declared canvas size (any excess on edge tiles is cropped later).
Tile Decoding: libavif uses its configured AV1 decoder backend (such as dav1d or libaom) to decode the individual tiles. Depending on the library configuration and the decoder's capabilities, these tiles can be decoded sequentially or in parallel. Because the tiles are independent bitstreams, parallel decoding can improve performance on multi-core processors.
Reassembly: Once decoded, the individual pixel buffers of the tiles are stitched together in memory into a single, continuous frame buffer representing the final image. Any padding on the rightmost or bottommost tiles is cropped out according to the dimensions specified in the grid metadata.
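
The validation and reassembly steps above can be sketched in a few lines of Python. This is an illustration of the technique, not libavif's actual implementation: it assumes every tile has the same dimensions, that tiles arrive in row-major order, and that each decoded tile is a plain list of pixel rows. The function name stitch_tiles is hypothetical.

```python
def stitch_tiles(tiles, rows, columns, out_w, out_h):
    """Stitch equally sized tiles (row-major order) into one cropped canvas.

    Hypothetical helper: each tile is a list of rows, each row a list of pixels.
    """
    if len(tiles) != rows * columns:
        raise ValueError("tile count does not match rows * columns")
    tile_h = len(tiles[0])
    tile_w = len(tiles[0][0])
    for tile in tiles:
        if len(tile) != tile_h or any(len(row) != tile_w for row in tile):
            raise ValueError("all tiles must share the same dimensions")
    # The tiles together must cover the declared canvas.
    if tile_w * columns < out_w or tile_h * rows < out_h:
        raise ValueError("tiles do not cover the output canvas")

    canvas = []
    for y in range(out_h):
        tile_row, ty = divmod(y, tile_h)
        line = []
        for col in range(columns):
            line.extend(tiles[tile_row * columns + col][ty])
        # Crop padding on the rightmost tiles; rows past out_h are never visited.
        canvas.append(line[:out_w])
    return canvas
```

For example, four 2x2 tiles stitched into a 3x3 canvas lose one column from the right-hand tiles and one row from the bottom tiles.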

How libavif Encodes Grid Images

libavif also supports creating grid images. This is particularly useful for very large images (such as 8K resolutions or higher) where decoding a single massive AV1 frame would exceed the hardware limitations or memory profiles of mobile and low-power devices.

During encoding, the user or application can specify grid parameters (specifically, the number of columns and rows) via the API or command-line tools like avifenc.

Image Splitting: libavif takes the input image and divides it into the requested number of horizontal and vertical tiles.
Parallel Encoding: Each tile is encoded independently as an AV1 image item. This process can be heavily threaded, allowing different CPU cores to compress different parts of the image simultaneously, resulting in faster encoding speeds.
Container Assembly: The encoder writes the individual AV1 bitstreams into the AVIF container as standard image items. It then constructs the grid item, linking the tiles together and writing the necessary spatial relationship metadata so that any compliant HEIF/AVIF reader can reconstruct the image.
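
As a rough illustration of the image-splitting step, the Python sketch below divides an image into equally sized tiles in row-major order. The tile size is computed by ceiling division, and tiles that extend past the right or bottom edge are padded by repeating the nearest edge pixel. That padding strategy is an assumption made for this example and is not necessarily what libavif or avifenc does; split_into_tiles is a hypothetical name.

```python
def split_into_tiles(image, rows, columns):
    """Split an image (list of pixel rows) into rows * columns equal tiles.

    Hypothetical helper. Edge tiles are padded by replicating edge pixels,
    which is an assumption of this sketch.
    Returns (tiles in row-major order, tile_width, tile_height).
    """
    height = len(image)
    width = len(image[0])
    tile_h = -(-height // rows)      # ceiling division
    tile_w = -(-width // columns)

    tiles = []
    for r in range(rows):
        for c in range(columns):
            tile = []
            for ty in range(tile_h):
                # Clamp coordinates so out-of-range samples repeat the edge.
                y = min(r * tile_h + ty, height - 1)
                tile.append(
                    [image[y][min(c * tile_w + tx, width - 1)] for tx in range(tile_w)]
                )
            tiles.append(tile)
    return tiles, tile_w, tile_h
```

Recording the original width and height as the grid's output dimensions lets a reader crop the padding away again during reassembly.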