The v2 format

Overview

The v2 format is a linear timeline format, like v1, but instead of a single speed value per section it lets each section reference a group of actions (speed, volume, zoom, etc.). It also records an explicit timebase, so it does not need to probe the source for its framerate.

Like v1, v2 only supports a single source and a "linear" timeline: sections must be laid down in order, and sections later in the source cannot be placed ahead of earlier ones. Like v1, it cannot represent transitions; exporting drops them and keeps the cuts.

You can generate a v2 timeline file by running auto-editor example.mp4 --export v2. The output looks something like this:

{
  "version": "2",
  "source": "example.mp4",
  "tb": "30/1",
  "effects": [
    [],
    ["speed:2.0"]
  ],
  "clips": [
    [0, 26, 0],
    [34, 396, 1]
  ]
}

The v2 format is a subset of JSON. In this example, the first clip plays 0–26 unaltered, the range 26–34 is cut simply because no clip covers it, and the second clip plays 34–396 at 2x speed.
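
To make the "gap means cut" rule concrete, here is a minimal sketch that lists the source ranges no clip covers. The helper name findGaps is hypothetical and not part of Auto-Editor. It assumes the clips are already in order and do not overlap, and it cannot report a trailing gap because the file does not record the source length.

```typescript
type Clip = [start: number, end: number, effect: number];

// Hypothetical helper: return the source ranges that no clip covers.
// Assumes clips are sorted and non-overlapping, as a linear timeline implies.
function findGaps(clips: Clip[]): [number, number][] {
  const gaps: [number, number][] = [];
  let cursor = 0; // first source tick not yet covered by a clip
  for (const [start, end] of clips) {
    if (start > cursor) {
      gaps.push([cursor, start]); // nothing covers cursor..start, so it is cut
    }
    cursor = Math.max(cursor, end);
  }
  return gaps;
}

// The clips from the example file above.
const exampleClips: Clip[] = [
  [0, 26, 0],
  [34, 396, 1],
];
console.log(findGaps(exampleClips)); // the only gap is [26, 34]
```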

Auto-Editor can use the v2 format as input:

auto-editor input.v2 -o output.mkv

The Spec

There are five required keys: "version", "source", "tb", "effects", and "clips". If there are more keys present in the JSON, the parser should ignore them.

Shown in TypeScript notation, the keys take the following values:

interface v2 {
  version: "2";      // Must always be set as "2".

  source: string;    // Path to a media file. The path can be relative or absolute,
                     // but must be valid for the given platform.

  tb: string;        // The timebase as a rational number, e.g. "30/1" or "30000/1001".

  effects: Effect[]; // The table of effects that clips reference by index.

  clips: Clip[];     // The sections of the source to lay down, in order.
}
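
As a rough illustration of these rules, the sketch below parses a v2 file, checks that every required key is present, and drops any extra keys. The names parseV2 and V2Timeline are hypothetical and not part of Auto-Editor. The checks cover only the top-level shape, not the contents of each effect or clip.

```typescript
type Effect = string[];
type Clip = [start: number, end: number, effect: number];

interface V2Timeline {
  version: "2";
  source: string;
  tb: string;
  effects: Effect[];
  clips: Clip[];
}

// Hypothetical helper: parse JSON text into a V2Timeline.
function parseV2(text: string): V2Timeline {
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("a v2 timeline must be a JSON object");
  }
  const obj = data as Record<string, unknown>;
  for (const key of ["version", "source", "tb", "effects", "clips"]) {
    if (!(key in obj)) throw new Error(`missing required key: ${key}`);
  }
  const { version, source, tb, effects, clips } = obj;
  if (version !== "2") throw new Error('"version" must be "2"');
  if (typeof source !== "string") throw new Error('"source" must be a string');
  if (typeof tb !== "string") throw new Error('"tb" must be a string');
  if (!Array.isArray(effects)) throw new Error('"effects" must be an array');
  if (!Array.isArray(clips)) throw new Error('"clips" must be an array');
  // Only the known keys are copied, so unknown keys are ignored.
  return {
    version: "2",
    source,
    tb,
    effects: effects as Effect[],
    clips: clips as Clip[],
  };
}
```

Copying only the known keys is one simple way to honor the "ignore extra keys" rule; a stricter reader could also validate each effect and clip entry.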

Effects

effects is an array of effect groups. A Clip refers to one of these by its index in this array.

type Effect = string[]; // A list of action strings.

Each effect group is a list of action strings:

[] — an empty list means "do nothing"; play the section unaltered at normal speed.
["cut"] — cut the section out (don't include it).
["speed:2.0"], ["volume:0.5"], ... — apply one or more actions. Each string is an action name with its arguments, the same syntax used by --set-action.
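
A reader of the format needs to split each action string into its name and arguments. The sketch below does this under one assumption drawn only from the examples above: the name is everything before the first colon, and the rest is an opaque argument string. The names Action, parseAction, and isCut are hypothetical, and the full --set-action grammar may be richer than this.

```typescript
interface Action {
  name: string;
  args: string | null; // raw text after the first colon, or null if there is none
}

// Hypothetical helper: split "speed:2.0" into { name: "speed", args: "2.0" }.
function parseAction(text: string): Action {
  const colon = text.indexOf(":");
  if (colon === -1) return { name: text, args: null };
  return { name: text.slice(0, colon), args: text.slice(colon + 1) };
}

// An effect group cuts its section if any of its actions is "cut".
function isCut(effect: string[]): boolean {
  return effect.some((action) => parseAction(action).name === "cut");
}

console.log(parseAction("speed:2.0")); // name "speed", args "2.0"
console.log(isCut([]));                // false: an empty group does nothing
console.log(isCut(["cut"]));           // true
```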

Clips

Each Clip is a 3-element array:

type Clip = [start: number, end: number, effect: number];

start (inclusive) and end (exclusive) select a time range from the source, in timebase units.
effect is an index into the effects array.
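
Because start and end are in timebase units, converting them to seconds requires the "tb" value. The sketch below assumes that "tb" gives the number of units per second (so "30/1" means 30 units per second, matching a 30 fps source) and that it is always written as "numerator/denominator". Both function names are hypothetical.

```typescript
// Hypothetical helper: split "30000/1001" into [30000, 1001].
// Assumes the "numerator/denominator" form shown in the spec examples.
function parseTimebase(tb: string): [number, number] {
  const match = /^(\d+)\/(\d+)$/.exec(tb);
  if (match === null) throw new Error(`invalid timebase: ${tb}`);
  const num = Number(match[1]);
  const den = Number(match[2]);
  if (num === 0 || den === 0) throw new Error(`timebase must be non-zero: ${tb}`);
  return [num, den];
}

// Convert a position in timebase units to seconds,
// assuming tb means "units per second".
function ticksToSeconds(ticks: number, tb: string): number {
  const [num, den] = parseTimebase(tb);
  return (ticks * den) / num;
}

console.log(ticksToSeconds(30, "30/1")); // 1
```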

Clips are processed in order. Each non-cut clip is appended to the end of the timeline, so the output is the concatenation of all the selected (non-cut) ranges. Cutting a range out is normally done by simply not covering it with a clip (leaving a gap), but pointing a clip at a ["cut"] effect has the same result.
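
The processing rule can be sketched as a single pass over the clips that collects the source ranges to concatenate. The function name keptRanges is hypothetical. The sketch reads "linear" as meaning each clip starts at or after the end of the previous one, which is an assumption, and it ignores every action other than "cut".

```typescript
type Effect = string[];
type Clip = [start: number, end: number, effect: number];

// Hypothetical helper: return the source ranges that end up in the output,
// in the order they are appended to the timeline.
function keptRanges(effects: Effect[], clips: Clip[]): [number, number][] {
  const kept: [number, number][] = [];
  let prevEnd = 0;
  for (const [start, end, effectIndex] of clips) {
    if (end < start) throw new Error(`clip ends before it starts: ${start}, ${end}`);
    // Assumption: a linear timeline never steps back into covered source.
    if (start < prevEnd) throw new Error(`clip at ${start} is out of order`);
    const effect = effects[effectIndex];
    if (effect === undefined) throw new Error(`no effect at index ${effectIndex}`);
    if (!effect.includes("cut")) kept.push([start, end]);
    prevEnd = end;
  }
  return kept;
}

// An explicit ["cut"] clip over 26-34 gives the same ranges as leaving a gap.
const sampleEffects: Effect[] = [[], ["speed:2.0"], ["cut"]];
const withExplicitCut: Clip[] = [[0, 26, 0], [26, 34, 2], [34, 396, 1]];
const withGap: Clip[] = [[0, 26, 0], [34, 396, 1]];
console.log(keptRanges(sampleEffects, withExplicitCut)); // ranges [0, 26] and [34, 396]
console.log(keptRanges(sampleEffects, withGap));         // the same two ranges
```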

The Timebase