The v1 format

Overview

The v1 format is the simplest way to represent a timeline. It supports cutting and changing speed from 0.0 to 99999.0 inclusive. v1 is a stable format. Developers are welcome to use it to make cuts for auto-editor and to use it for their own programs.

You can generate a v1 timeline file with auto-editor example.mp4 --export timeline:api=1 and it would look something like this:

{
  "version": "1",
  "source": "example.mp4",
  "chunks": [
    [0, 26, 1.0],
    [26, 34, 99999.0],
    [34, 396, 1.0],
    [396, 410, 99999.0],
    ...
  ],
}

v1 is a subset of JSON. ... is used to show that a variable amount of elements are allowed.

Auto-Editor can use the v1 format as input:

auto-editor input.json -o output.mkv

Limitations

Only a single file (source) is allowed. Additionally, v1 only supports "linear" timelines. That means sections further in the media cannot be put ahead in the timeline than sections before.

The Spec

There are only three keys that are required: "version", "source", and "chunks". If there are more keys present in the JSON, the parser should ignore them.

shown using TypeScript notation, the keys can be set to the following values.

interface v1 {
  version: "1";    // Must always be set as "1".

  source: string;  // Path to a media file. The path can be relative or absolute,
                   // but must be valid for the given platform.

  chunks: Chunk[]; // We'll cover this in the next section.
}

Chunks

Each Chunk element has 3 parts:

start (inclusive) and end (exclusive) represent a time range: selecting a segment from the original source.

start and end may be represented by floating point numbers but using BigIntegers is preferred. There is no hard limit how big start and end can be.

The speed 1.0 means to play the media at its normal rate. The speeds 99999.0 and 0.0 always mean cut a section off, don't include it.

It is valid for chunks to be an empty array. The first chunk must start with 0. All other chunks must have their start set be the preceding end's value (there can be no gaps).

The Implicit Timebase

start and end are in the timebase unit. Timebase determines how much actual time a length occurs. To determine the timebase, divide 1 by the average framerate of the source.

For example, if suppose input.mp4 has a framerate of 30/1, then 1/30 is the timebase. A chunk of [0, 1, 1.0] would then have a length of 1/30 of a second.

Examples in Implementation, Misc.

Here is a way to implement the v1 format as an interface.

in TypeScript:

interface v1 {
  version: "1";
  source: string;
  chunks: Array<[number, number, number]>;
}

in Python:

from dataclasses import dataclass
from typing import Literal

@dataclass
class v1:
    version: Literal["1"]
    source: str
    chunks: list[tuple[int, int, float]]

Here's a way to write a chunk predicate function in Palet:

; in Palet, `list?` is analogous to Python's tuple.
(define/c (chunk? [obj list?] -> bool?)
  (if (= obj.len 3)
    (let ([start (obj 0)] [end (obj 1)] [speed (obj 2)])
      ; Returns true only if all our conditions are true.
      (and
        start.nat? end.nat? (> end start)
        speed.float? (>= speed 0.0) (<= speed 99999.0)
      )
    )
    #f  ; Return false
  )
)