The v2 format

Overview

The v2 format is a linear timeline format, like v1, but instead of a single speed value per section it lets each section reference a group of actions (speed, volume, zoom, etc.). It also records an explicit timebase, so it does not need to probe the source for its framerate.

Like v1, v2 only supports a single source and a "linear" timeline: sections must be laid down in order, and sections later in the source cannot be placed ahead of earlier ones.

You can generate a v2 timeline file with auto-editor example.mp4 --export v2. The result looks something like this:

{
  "version": "2",
  "source": "example.mp4",
  "tb": "30/1",
  "effects": [
    [],
    ["speed:2.0"]
  ],
  "clips": [
    [0, 26, 0],
    [34, 396, 1]
  ]
}

v2 is a subset of JSON. Here, the range 26 to 34 is cut simply because no clip covers it; the second clip plays 34 to 396 at 2x speed.
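To make the implicit cut concrete, here is a minimal TypeScript sketch that lists the ranges no clip covers. The helper name findGaps is hypothetical and not part of Auto-Editor. It assumes the clips are already in order and treats end as exclusive. It cannot report an uncovered tail after the last clip, because the file does not record the length of the source.

```typescript
type Clip = [start: number, end: number, effect: number];

// Hypothetical helper: return the [start, end) ranges that no clip covers,
// up to the end of the last clip.
function findGaps(clips: Clip[]): Array<[number, number]> {
  const gaps: Array<[number, number]> = [];
  let cursor = 0; // end of the furthest clip seen so far
  for (const [start, end] of clips) {
    if (start > cursor) {
      gaps.push([cursor, start]);
    }
    cursor = Math.max(cursor, end);
  }
  return gaps;
}

// The clips from the example file above.
const exampleClips: Clip[] = [
  [0, 26, 0],
  [34, 396, 1],
];
const gaps = findGaps(exampleClips); // [[26, 34]]
```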

Auto-Editor can use the v2 format as input:

auto-editor input.v2 -o output.mkv

The Spec

There are five required keys: "version", "source", "tb", "effects", and "clips". If more keys are present in the JSON, the parser should ignore them.
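As a rough illustration of this rule, the sketch below reports which required keys are missing from a JSON document and silently accepts any extra ones. The function name missingKeys and the sample text are made up for this example; this is not how Auto-Editor's own parser is written.

```typescript
const REQUIRED_KEYS = ["version", "source", "tb", "effects", "clips"];

// Hypothetical check: return the required keys that are absent.
// Unknown keys are deliberately not reported, since parsers should ignore them.
function missingKeys(text: string): string[] {
  const data = JSON.parse(text);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("a v2 timeline must be a JSON object");
  }
  return REQUIRED_KEYS.filter(
    (key) => !Object.prototype.hasOwnProperty.call(data, key)
  );
}

// An incomplete file with one extra key ("comment").
const sample =
  '{"version": "2", "source": "example.mp4", "tb": "30/1", "comment": "ignored"}';
const missing = missingKeys(sample); // ["effects", "clips"]
```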

Shown using TypeScript notation, the keys can be set to the following values:

interface v2 {
  version: "2";      // Must always be set as "2".

  source: string;    // Path to a media file. The path can be relative or absolute,
                     // but must be valid for the given platform.

  tb: string;        // The timebase as a rational number, e.g. "30/1" or "30000/1001".

  effects: Effect[]; // The table of effects that clips reference by index.

  clips: Clip[];     // The sections of the source to lay down, in order.
}

Effects

effects is an array of effect groups. A Clip refers to one of these by its index in this array.

type Effect = string[]; // A list of action strings.

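Each action string in a group names one action to apply to the clip, such as "speed:2.0" in the example above. An empty group, like the first entry in that example, applies no actions.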
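This page does not define the syntax of an action string. Judging only from the two actions that appear here, "speed:2.0" and "cut", the sketch below assumes an action is either a bare name or a name and a value separated by the first colon. The Action interface and parseAction function are hypothetical, and real action strings may be richer than this.

```typescript
interface Action {
  name: string;
  value: string | null; // null when the action has no argument, e.g. "cut"
}

// Assumption: an action is "name" or "name:value", split at the first colon.
function parseAction(action: string): Action {
  const i = action.indexOf(":");
  if (i === -1) {
    return { name: action, value: null };
  }
  return { name: action.slice(0, i), value: action.slice(i + 1) };
}

// An effects table with an empty group, a speed group, and a cut group.
const effects: string[][] = [[], ["speed:2.0"], ["cut"]];
const parsed = effects.map((group) => group.map((a) => parseAction(a)));
```

The value is kept as a string so that the caller decides how to interpret it for each action name.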

Clips

Each Clip is a 3-element array:

type Clip = [start: number, end: number, effect: number];
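Together with the linear-timeline rule from the overview, this shape suggests a few sanity checks, sketched below. The checkClips function is hypothetical. It assumes that overlapping clips and clips with end at or before start are invalid, which this page does not state explicitly.

```typescript
type Clip = [start: number, end: number, effect: number];

// Hypothetical validator: returns a list of problems, empty when the clips look consistent.
function checkClips(clips: Clip[], effectCount: number): string[] {
  const problems: string[] = [];
  let previousEnd = 0;
  clips.forEach(([start, end, effect], i) => {
    if (start < previousEnd) {
      // Linear timeline: a clip may not reach back before an earlier one.
      problems.push("clip " + i + " starts before the previous clip ends");
    }
    if (end <= start) {
      // Assumption: empty or reversed clips are invalid.
      problems.push("clip " + i + " has no length");
    }
    if (effect < 0 || effect >= effectCount || Math.floor(effect) !== effect) {
      problems.push("clip " + i + " references a missing effect");
    }
    previousEnd = Math.max(previousEnd, end);
  });
  return problems;
}

const ok = checkClips([[0, 26, 0], [34, 396, 1]], 2);  // no problems
const bad = checkClips([[34, 396, 1], [0, 26, 0]], 2); // out of order
```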

Clips are processed in order. Each non-cut clip is appended to the end of the timeline, so the output is the concatenation of all the selected (non-cut) ranges. Cutting a range out is normally done by simply not covering it with a clip (leaving a gap), but pointing a clip at a ["cut"] effect achieves the same result.
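The concatenation rule can be sketched as a small function that adds up the output length in timebase units. The names speedOf and outputTicks are hypothetical. The sketch assumes that "speed:N" is the only action that changes a clip's length, that playing at speed N divides the length by N, and that the last speed action in a group wins.

```typescript
type Effect = string[];
type Clip = [start: number, end: number, effect: number];

// Assumption: only "speed:N" changes a clip's length; other actions are ignored here.
function speedOf(effect: Effect): number {
  let speed = 1;
  for (const action of effect) {
    if (action.indexOf("speed:") === 0) {
      speed = parseFloat(action.slice("speed:".length));
    }
  }
  return speed;
}

// Sum the lengths of all non-cut clips, in order, after applying speed.
function outputTicks(effects: Effect[], clips: Clip[]): number {
  let total = 0;
  for (const [start, end, index] of clips) {
    const effect = effects[index];
    if (effect === undefined) {
      throw new Error("clip references missing effect " + index);
    }
    if (effect.indexOf("cut") !== -1) {
      continue; // a cut clip contributes nothing to the output
    }
    total += (end - start) / speedOf(effect);
  }
  return total;
}

// The example file: 26 ticks at 1x plus 362 ticks at 2x.
const ticks = outputTicks([[], ["speed:2.0"]], [[0, 26, 0], [34, 396, 1]]); // 207
```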

The Timebase

start and end are in the timebase unit set by tb. For example, with "tb": "30/1" a clip of [0, 1, 0] has a length of 1/30 of a second.
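The conversion from timebase units to seconds can be sketched as follows. The function names are hypothetical. The sketch assumes tb is always written as "numerator/denominator" with positive numbers, and it multiplies before dividing so that timebases like "30000/1001" lose as little precision as possible.

```typescript
// Parse a timebase such as "30/1" or "30000/1001" into its two parts.
function parseTimebase(tb: string): { num: number; den: number } {
  const parts = tb.split("/");
  const num = Number(parts[0]);
  const den = Number(parts[1]);
  if (parts.length !== 2 || !(num > 0) || !(den > 0)) {
    throw new Error("invalid timebase: " + tb);
  }
  return { num, den };
}

// Convert a position or length in timebase units to seconds.
function ticksToSeconds(ticks: number, tb: string): number {
  const { num, den } = parseTimebase(tb);
  return (ticks * den) / num;
}

const oneTick = ticksToSeconds(1, "30/1");       // 1/30 of a second
const lastClipEnd = ticksToSeconds(396, "30/1"); // 13.2 seconds
const ntsc = ticksToSeconds(30000, "30000/1001"); // 1001 seconds
```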

See Also

The v1 format is simpler if you only need cutting and a single speed per section. The v3 format is nonlinear and supports multiple sources and overlapping layers.