How libavif Handles HEIF Spatial Grid Images
This article explores how the open-source library
libavif processes spatial grid images as defined by the
High Efficiency Image File Format (HEIF) specification. It covers the
mechanisms libavif uses to decode grid-based image items
into a single seamless image, how it encodes large images into tiled
grids to optimize performance, and how it ensures compliance with the
underlying ISO/IEC 23008-12 standard.
Understanding Spatial Grids in HEIF
The HEIF specification allows an image to be represented as a spatial grid of smaller, independently encoded image items (tiles). Rather than encoding a massive image as a single, resource-heavy bitstream, the image is divided into a 2D grid of rows and columns.
In the HEIF container structure, this is represented by a specific
item type called grid (Grid Image Item). The grid item
itself does not contain compressed pixel data; instead, its payload holds
metadata defining the output width and height and the number of rows and
columns. The individual input image items that make up the tiles are
linked to the grid item through derived-image (dimg) item references.
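To make the grid metadata concrete, here is a minimal Python sketch of reading a grid item payload. It assumes the ImageGrid layout from ISO/IEC 23008-12 (version, flags, rows minus one, columns minus one, then output width and height as 16-bit or 32-bit fields selected by the lowest flag bit). The names GridConfig and parse_image_grid are hypothetical and are not part of libavif.

```python
import struct
from typing import NamedTuple


class GridConfig(NamedTuple):
    """Hypothetical container for the fields of a grid item payload."""
    rows: int
    columns: int
    output_width: int
    output_height: int


def parse_image_grid(payload: bytes) -> GridConfig:
    """Parse a grid item payload (illustrative sketch, not libavif code)."""
    if len(payload) < 4:
        raise ValueError("grid payload too short")
    version, flags, rows_minus_one, columns_minus_one = struct.unpack_from(
        ">BBBB", payload, 0
    )
    if version != 0:
        raise ValueError(f"unsupported grid version {version}")
    # Bit 0 of flags selects 32-bit size fields; otherwise they are 16-bit.
    size_format = ">II" if flags & 1 else ">HH"
    if len(payload) < 4 + struct.calcsize(size_format):
        raise ValueError("grid payload truncated")
    width, height = struct.unpack_from(size_format, payload, 4)
    if width == 0 or height == 0:
        raise ValueError("grid output dimensions must be non-zero")
    # Rows and columns are stored minus one, so a stored 0 means one tile.
    return GridConfig(rows_minus_one + 1, columns_minus_one + 1, width, height)
```

For example, a payload declaring 2 rows and 3 columns with a 4000 by 3000 canvas parses to GridConfig(rows=2, columns=3, output_width=4000, output_height=3000).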
How libavif Decodes Grid Images
When libavif parses an AVIF file (which uses the HEIF
container structure), it handles spatial grids through a structured,
multi-step pipeline:
- Parser Identification: The libavif demuxer parses the ISOBMFF (ISO Base Media File Format) box structure. If the primary item is of type grid, the library identifies it as a spatial grid image rather than a single coded image.
- Metadata Validation: The library extracts the grid configuration from the grid item payload. It validates critical parameters, including the number of rows and columns, the target output dimensions, and the references to the individual tile items. libavif checks that the tiles are laid out correctly and that their combined dimensions match or exceed the declared canvas size (handling any necessary cropping for edge tiles).
- Tile Decoding: libavif uses its configured AV1 decoder backend (such as dav1d or aom) to decode the individual tiles. Depending on the library configuration and the decoder's capabilities, these tiles can be decoded sequentially or in parallel. Because the tiles are independent, parallel decoding can improve performance on multi-core processors.
- Reassembly: Once decoded, the individual pixel buffers of the tiles are stitched together in memory into a single, continuous frame buffer representing the final image. Any padding on the rightmost or bottommost tiles is cropped out according to the dimensions specified in the grid metadata.
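The validation and reassembly steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not libavif's implementation: tiles are modeled as lists of rows of single-channel samples, they arrive in row-major order (top row first, left to right), and all tiles share one size. The name stitch_grid is hypothetical.

```python
from typing import List, Sequence

Tile = List[List[int]]  # rows of single-channel samples (an assumption)


def stitch_grid(
    tiles: Sequence[Tile],
    rows: int,
    columns: int,
    output_width: int,
    output_height: int,
) -> Tile:
    """Validate decoded tiles and stitch them into one cropped canvas."""
    if rows < 1 or columns < 1 or len(tiles) != rows * columns:
        raise ValueError("tile count does not match rows * columns")
    if not tiles[0] or not tiles[0][0]:
        raise ValueError("tiles must be non-empty")
    tile_height = len(tiles[0])
    tile_width = len(tiles[0][0])
    for tile in tiles:
        if len(tile) != tile_height or any(len(r) != tile_width for r in tile):
            raise ValueError("all tiles must have identical dimensions")
    # The tiled area must cover the declared canvas; excess is cropped below.
    if tile_width * columns < output_width or tile_height * rows < output_height:
        raise ValueError("tiles do not cover the declared output size")

    canvas: Tile = []
    for y in range(output_height):
        grid_row, y_in_tile = divmod(y, tile_height)
        line: List[int] = []
        for column in range(columns):
            line.extend(tiles[grid_row * columns + column][y_in_tile])
        canvas.append(line[:output_width])  # crop padding on the right edge
    return canvas  # rows past output_height were never copied (bottom crop)
```

A real decoder copies planes of YUV or RGB samples rather than Python lists, but the index arithmetic (which tile a canvas row falls in, and where the right and bottom crops apply) is the same idea.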
How libavif Encodes Grid Images
libavif also supports creating grid images. This is
particularly useful for very large images (such as 8K resolutions or
higher) where decoding a single massive AV1 frame would exceed the
hardware limitations or memory profiles of mobile and low-power
devices.
During encoding, the user or application can specify grid parameters
(specifically, the number of columns and rows) via the API or
command-line tools like avifenc.
- Image Splitting: The input image is divided into the requested number of columns and rows of tiles. The avifenc tool performs this split itself; through the C API, avifEncoderAddImageGrid expects the caller to supply the cell images.
- Parallel Encoding: Each tile is encoded independently as an AV1 image item. Because the tiles do not depend on one another, this work can be spread across threads, allowing different CPU cores to compress different parts of the image simultaneously and reducing encoding time.
- Container Assembly: The encoder writes the
individual AV1 bitstreams into the AVIF container as standard image
items. It then constructs the grid item, linking the tiles together and
writing the necessary spatial relationship metadata so that any compliant
HEIF/AVIF reader can reconstruct the image.
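The image-splitting step can be illustrated with a short Python sketch. It assumes one common scheme: cell size is the image size divided by the grid size, rounded up, so cells in the last column or row may come out smaller. Whether an encoder pads those edge cells or accepts them as-is depends on the implementation and version, so treat this as an illustration rather than libavif's exact behavior. The name split_into_cells is hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of single-channel samples (an assumption)


def split_into_cells(image: Image, columns: int, rows: int) -> List[Image]:
    """Split an image into rows * columns cells, returned in row-major order."""
    if columns < 1 or rows < 1:
        raise ValueError("grid must have at least one column and one row")
    height = len(image)
    width = len(image[0]) if height else 0
    if width == 0:
        raise ValueError("image must be non-empty")
    # Ceiling division: every cell except possibly the last gets this size.
    cell_width = -(-width // columns)
    cell_height = -(-height // rows)
    # Reject grids so fine that the last column or row would be empty.
    if cell_width * (columns - 1) >= width or cell_height * (rows - 1) >= height:
        raise ValueError("grid is too fine for this image size")

    cells: List[Image] = []
    for r in range(rows):
        y0 = r * cell_height
        for c in range(columns):
            x0 = c * cell_width
            cells.append(
                [line[x0:x0 + cell_width] for line in image[y0:y0 + cell_height]]
            )
    return cells
```

Each returned cell would then be encoded as its own AV1 image item, and the cell order matches the row-major order in which a grid item references its tiles.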