pynaviz.audiovideo.audio_handling

Classes

AudioHandler(audio_path[, stream_index, time])

Handler for reading and decoding audio frames from a file.

class pynaviz.audiovideo.audio_handling.AudioHandler(audio_path, stream_index=0, time=None)

Bases: BaseAudioVideo

Handler for reading and decoding audio frames from a file.

This class uses PyAV to access audio frames from a given file. It allows querying time-aligned audio samples between two timepoints, and provides audio shape and total length information.

Parameters:
  • audio_path (str | pathlib.Path) – Path to the audio file.

  • stream_index (int) – Index of the audio stream to decode (default is 0).

  • time (Optional[NDArray]) – Optional 1D time axis to associate with the samples. Must match the number of sample points in the audio file.

Raises:

ValueError – If the provided time axis is not 1D or does not match the number of sample points in the audio file.
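The two conditions on time can be checked before the handler is constructed. The sketch below is a hypothetical helper, not part of pynaviz; it only mirrors the documented requirements (1D, one timestamp per sample) with NumPy. The sample rate and sample count are assumed values chosen for illustration.

```python
import numpy as np


def check_time_axis(time, n_samples):
    """Hypothetical helper mirroring the documented checks on `time`."""
    time = np.asarray(time, dtype=float)
    if time.ndim != 1:
        raise ValueError(f"time must be 1D, got {time.ndim} dimensions")
    if time.shape[0] != n_samples:
        raise ValueError(
            f"time has {time.shape[0]} points, expected {n_samples}"
        )
    return time


# Assumed values: 2 seconds of audio sampled at 44.1 kHz.
sample_rate = 44100
n_samples = 2 * sample_rate

# One timestamp per sample, starting at 0 seconds.
time = np.arange(n_samples) / sample_rate
checked = check_time_axis(time, n_samples)
```

In practice n_samples would come from the first entry of the handler's shape property.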

Examples

>>> from pynaviz.audiovideo import AudioHandler
>>> ah = AudioHandler("example.mp3")
>>> # Get audio samples between 1.5 and 2.5 seconds.
>>> audio_trace = ah.get(1.5, 2.5)
>>> # Shape: (n_samples, n_channels)
>>> audio_trace.shape
(44100, 2)

get(start, end)

Extract decoded audio samples from the file between two timestamps.

This method decodes and returns the audio samples corresponding to the time interval [start, end] in seconds. Decoding begins from the nearest keyframe at or before start, and proceeds sequentially until the end timestamp is reached or exceeded. If the last decoded frame extends beyond end, trailing samples are truncated so that the returned array aligns with the requested time range.

Parameters:
  • start (float) – Start time of the segment to extract, in seconds.

  • end (float) – End time of the segment to extract, in seconds. Must be greater than start.

Returns:

A 2D NumPy array containing the decoded samples for the requested interval. The exact shape depends on the audio format and the number of channels, with the first dimension corresponding to time (samples) and the second to audio channels.

Return type:

numpy.typing.NDArray

Notes

  • The returned frames are decoded in sequence and concatenated before being transposed so that time is the first dimension.

  • If end falls between two frames, the last frame is partially trimmed to match the requested duration.
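The concatenate, transpose and trim steps described above can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not the pynaviz implementation: concat_and_trim is a hypothetical name, each decoded frame is assumed to be an array of shape (n_channels, n_samples_in_frame), the interval is treated as half-open, and leading samples before start are dropped as well.

```python
import numpy as np


def concat_and_trim(frames, first_sample_time, sample_rate, start, end):
    """Hypothetical sketch: join decoded frames and keep [start, end)."""
    # Join frames along the sample axis, then put time first:
    # (n_channels, n_samples) -> (n_samples, n_channels).
    data = np.concatenate(frames, axis=1).T

    # Convert the requested times to sample indices relative to the
    # first decoded sample, clamped to the available range.
    n = data.shape[0]
    i0 = max(0, round((start - first_sample_time) * sample_rate))
    i1 = min(n, round((end - first_sample_time) * sample_rate))
    return data[i0:i1]


# Assumed input: four frames of 1024 samples, 2 channels, 8 kHz.
signal = np.stack([np.arange(4096), -np.arange(4096)])
frames = np.split(signal, 4, axis=1)

# Request 0.1 s to 0.4 s; the last frame is only partially kept.
segment = concat_and_trim(frames, 0.0, 8000, start=0.1, end=0.4)
```

With these assumed inputs the segment spans sample indices 800 to 3199, so the fourth frame contributes only its first 128 samples.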

See also

av.AudioFrame <https://pyav.org/docs/stable/api/audio.html#module-av.audio.frame>

The PyAV frame object.

property index: numpy.typing.NDArray

Time axis corresponding to the audio samples.

Returns:

Array of timestamps with shape (num_samples,).

property shape: Tuple[int, int]

Shape of the audio data.

Returns:

Tuple (num_samples, num_channels) describing the audio shape.

property t: numpy.typing.NDArray

Time axis corresponding to the audio samples.

Returns:

Array of timestamps with shape (num_samples,).

property tot_length: float

Total duration of the audio in seconds.

Returns:

Total duration of the audio stream.
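For uniformly sampled audio, the quantities exposed by shape, t and tot_length are related through the sample rate. The sketch below illustrates that relationship with assumed values; whether pynaviz derives the duration this way or reads it from the container metadata is not stated here.

```python
import numpy as np

# Assumed values: 10 seconds of stereo audio at 44.1 kHz.
sample_rate = 44100
num_samples, num_channels = 441000, 2

# Default time axis: one timestamp per sample, starting at 0 seconds.
t = np.arange(num_samples) / sample_rate

# Duration in seconds implied by the sample count.
tot_length = num_samples / sample_rate
```

Under these assumptions the last timestamp is one sample period short of the total duration.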