Internet-Draft moq-mi November 2024
Cenzano-Ferret & Frindell Expires 16 May 2025
Workgroup: Media Over QUIC
Internet-Draft: draft-cenzano-moq-media-interop-latest
Published: November 2024
Intended Status: Informational
Expires: 16 May 2025
Authors: J. Cenzano-Ferret (Meta), A. Frindell (Meta)

MoQ Media Interop

Abstract

This document specifies a protocol for sending and receiving video and audio over Media over QUIC Transport [MOQT].

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at https://afrind.github.io/draft-cenzano-media-interop/draft-cenzano-moq-media-interop.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-cenzano-moq-media-interop/.

Discussion of this document takes place on the Media Over QUIC Working Group mailing list (mailto:moq@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/moq/. Subscribe at https://www.ietf.org/mailman/listinfo/moq/.

Source for this draft and an issue tracker can be found at https://github.com/afrind/draft-cenzano-media-interop.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 16 May 2025.


1. Introduction

This protocol specifies a simple mechanism for sending media (video and audio) over MOQT for both live-streaming and video conferencing (VC) use cases. The protocol is flexible enough to support this range of use cases.

Encoding parameters such as frame rate, resolution, and codec can change in the middle of a track.

The protocol defines a low-overhead packager (not LoC [loc]) and is extensible to other formats such as FMP4.

2. Protocol Operation

2.1. Track Names

The publisher selects a namespace of their choosing, and sends an ANNOUNCE message for this namespace.

Within that namespace, the publisher offers media tracks named videoX and audioX, where X is an integer starting at 0.

For example, if the publisher offers two audio tracks and one video track, the available track names are video0, audio0, and audio1.
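The naming scheme above can be sketched as follows. The helpers `track_names` and `parse_track_name` are hypothetical illustrations, not part of the protocol; they only show how a name maps to a media kind and a zero-based index.

```python
def track_names(num_video: int, num_audio: int) -> list[str]:
    """Track names a publisher would offer under the videoX / audioX scheme."""
    # Indices start at 0 independently for each media kind.
    video = [f"video{i}" for i in range(num_video)]
    audio = [f"audio{i}" for i in range(num_audio)]
    return video + audio


def parse_track_name(name: str) -> tuple[str, int]:
    """Split a track name such as 'audio1' into ('audio', 1)."""
    for kind in ("video", "audio"):
        suffix = name[len(kind):]
        if name.startswith(kind) and suffix.isdigit():
            return kind, int(suffix)
    raise ValueError(f"not a media track name: {name!r}")


print(track_names(1, 2))  # ['video0', 'audio0', 'audio1']
```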

The subscriber considers all tracks in the same namespace to be part of the same synchronization group (that is, their timestamps are aligned to the same timeline).

2.2. Mapping Tracks to MoQT Object Model

For the video track, the publisher begins a new group at the start of each IDR frame (so object 0 is always an IDR keyframe), and each group contains a single subgroup. Each object has the format described in Section 2.4.

For the audio track, the publisher begins a new group with each audio object, and each group contains a single subgroup. Each object has the format described in Section 2.4.
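A minimal sketch of how a publisher might assign MoQT locations under these two rules. It assumes group IDs start at 0 and that the single subgroup in each group has ID 0; neither value is specified above. `Location`, `video_locations`, and `audio_locations` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Location:
    group_id: int
    subgroup_id: int
    object_id: int


def video_locations(is_idr_flags: list[bool]) -> Iterator[Location]:
    """Video: every IDR frame opens a new group and becomes its object 0."""
    group_id, object_id = -1, 0
    for is_idr in is_idr_flags:
        if is_idr:
            group_id += 1
            object_id = 0
        elif group_id < 0:
            raise ValueError("video track must start with an IDR frame")
        # Assumption: the single subgroup of each group uses ID 0.
        yield Location(group_id, 0, object_id)
        object_id += 1


def audio_locations(num_frames: int) -> Iterator[Location]:
    """Audio: each object opens its own group, so object_id is always 0."""
    for group_id in range(num_frames):
        yield Location(group_id, 0, 0)
```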

TODO: The datagram forwarding preference could be used, but it is problematic if an audio frame does not fit in a single UDP payload.

2.3. Timestamps

To avoid fractional numbers and the rounding errors they introduce, timestamps are expressed with two integers:

- timestamp numerator (e.g., PTS, DTS, duration)
- timebase

To convert a timestamp into seconds:

timestamp(s) = timestamp numerator / timebase

Example:

PTS = 11, timebase = 30

PTS(s) = 11/30 ≈ 0.3667
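Because both parts are integers, the conversion can be done exactly. The sketch below uses Python's `fractions.Fraction` to keep the value exact, and shows an integer-only conversion into another timebase (here milliseconds) that rounds down. The helper names `to_seconds` and `rescale` are hypothetical.

```python
from fractions import Fraction


def to_seconds(numerator: int, timebase: int) -> Fraction:
    """Exact value of numerator / timebase, with no floating-point rounding."""
    return Fraction(numerator, timebase)


def rescale(numerator: int, from_timebase: int, to_timebase: int) -> int:
    """Express a timestamp in another timebase using integer math (rounds down)."""
    return numerator * to_timebase // from_timebase


print(to_seconds(11, 30))     # 11/30
print(rescale(11, 30, 1000))  # 366, i.e. 366 ms
```

Rescaling to a common timebase is one way a subscriber could compare timestamps from tracks in the same synchronization group that use different timebases.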

2.4. Object Format

{
  Media Type (i)
  Media payload (..)
}
Figure 1: MOQT Media object
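Fields marked (i) are variable-length integers. MOQT uses the QUIC variable-length integer encoding (RFC 9000, Section 16), in which the two most significant bits of the first byte select a total length of 1, 2, 4, or 8 bytes. The sketch below is one possible encoder and decoder, plus a hypothetical `build_media_object` helper that assembles Figure 1; it is illustrative rather than a reference implementation.

```python
def encode_varint(value: int) -> bytes:
    """Encode an integer as a QUIC variable-length integer (RFC 9000, Section 16)."""
    if value < 0:
        raise ValueError("varints cannot encode negative values")
    if value < 1 << 6:
        return value.to_bytes(1, "big")                            # prefix 00
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")                 # prefix 01
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")            # prefix 10
    if value < 1 << 62:
        return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")  # prefix 11
    raise ValueError("value too large for a varint")


def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, new_offset) for the varint starting at buf[offset]."""
    length = 1 << (buf[offset] >> 6)  # 1, 2, 4 or 8 bytes
    end = offset + length
    if end > len(buf):
        raise ValueError("truncated varint")
    value = int.from_bytes(buf[offset:end], "big")
    value &= (1 << (8 * length - 2)) - 1  # strip the 2-bit length prefix
    return value, end


def build_media_object(media_type: int, media_payload: bytes) -> bytes:
    """Figure 1: Media Type (i) followed by the type-specific Media payload."""
    return encode_varint(media_type) + media_payload
```

For example, `encode_varint(15293)` yields the two bytes `7b bd`, matching the sample in RFC 9000, Appendix A.1.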

2.4.1. Media Type

This value indicates the kind of media payload that follows.

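Table 1

+------+--------------------------------------+
| Code | Media payload                        |
+======+======================================+
| 0x0  | Video H264 in AVCC with LOC packager |
+------+--------------------------------------+
| 0x1  | Audio Opus bitstream                 |
+------+--------------------------------------+
| 0x2  | UTF-8 text                           |
+------+--------------------------------------+
| 0x3  | Audio AAC-LC in MPEG4                |
+------+--------------------------------------+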
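Since every code in Table 1 is smaller than 64, its minimal varint encoding is a single byte equal to the code. A receiver could therefore dispatch on the first byte of the object, as in this sketch. The enum member names and `peek_media_type` are hypothetical, and the sketch rejects longer (non-minimal) varint encodings rather than decoding them.

```python
from enum import IntEnum


class MediaType(IntEnum):
    """Media Type codes from Table 1 (member names are hypothetical)."""
    VIDEO_H264_AVCC = 0x0
    AUDIO_OPUS = 0x1
    UTF8_TEXT = 0x2
    AUDIO_AAC_LC_MPEG4 = 0x3


def peek_media_type(obj: bytes) -> MediaType:
    """Read the Media Type that starts a MOQT Media object (Figure 1)."""
    if not obj:
        raise ValueError("empty object")
    if obj[0] >> 6 != 0:
        # This sketch only accepts the minimal one-byte varint encoding.
        raise ValueError("multi-byte Media Type varint not supported here")
    return MediaType(obj[0])  # ValueError for codes not listed in Table 1
```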

2.4.2. Media payload

Carries the media-related information. Its format is determined by the Media Type.

2.4.2.1. Video H264 in AVCC with LOC packager format
{
  Seq ID (i)
  PTS Timestamp (i)
  DTS Timestamp (i)
  Timebase (i)
  Duration (i)
  Wallclock (i)
  Metadata Size (i)
  Metadata (..)
  Payload (..)
}
Figure 2: MOQT Media video h264 loc
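The following sketch serializes and parses the video Media payload of Figure 2 (everything after the Media Type varint). The class `H264Object` and its methods are hypothetical, and the varint helpers follow the QUIC encoding (RFC 9000, Section 16) used by MOQT. Metadata Size is derived from the metadata bytes, and the Payload is whatever remains after the metadata.

```python
from dataclasses import dataclass


def encode_varint(v: int) -> bytes:
    """QUIC variable-length integer (RFC 9000, Section 16)."""
    for length, prefix in ((1, 0x00), (2, 0x40), (4, 0x80), (8, 0xC0)):
        if 0 <= v < 1 << (8 * length - 2):
            return (v | prefix << (8 * length - 8)).to_bytes(length, "big")
    raise ValueError("value out of varint range")


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    length = 1 << (buf[pos] >> 6)
    if pos + length > len(buf):
        raise ValueError("truncated varint")
    raw = int.from_bytes(buf[pos:pos + length], "big")
    return raw & ((1 << (8 * length - 2)) - 1), pos + length


@dataclass
class H264Object:
    seq_id: int
    pts: int
    dts: int            # same as pts when B-frames are not used
    timebase: int
    duration: int       # 0 if not set
    wallclock_ms: int   # 0 if not set
    metadata: bytes     # AVCDecoderConfigurationRecord, or b"" if absent
    payload: bytes      # AVC1 NAL units with 4-byte length fields

    def serialize(self) -> bytes:
        ints = (self.seq_id, self.pts, self.dts, self.timebase,
                self.duration, self.wallclock_ms, len(self.metadata))
        return b"".join(map(encode_varint, ints)) + self.metadata + self.payload

    @classmethod
    def parse(cls, buf: bytes) -> "H264Object":
        pos, ints = 0, []
        for _ in range(7):  # six header fields plus Metadata Size
            value, pos = decode_varint(buf, pos)
            ints.append(value)
        *header, meta_size = ints
        if pos + meta_size > len(buf):
            raise ValueError("metadata runs past the end of the object")
        metadata = buf[pos:pos + meta_size]
        return cls(*header, metadata, buf[pos + meta_size:])
```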
2.4.2.1.1. Seq ID

Monotonically increasing counter for this media track.

2.4.2.1.2. PTS Timestamp

Presentation timestamp (PTS), expressed in Timebase units.

TODO: Varints cannot represent negative values, so encoding negative timestamps at the start of a stream (for example, during encoder priming) could be challenging.

2.4.2.1.3. DTS Timestamp

Decoding timestamp (DTS), expressed in Timebase units. It is not needed if B-frames are not used; in that case it should be the same value as PTS.

TODO: Varints cannot represent negative values, so encoding negative timestamps at the start of a stream (for example, during encoder priming) could be challenging.

2.4.2.1.4. Timebase

Number of units per second in which PTS, DTS, and Duration are expressed.

2.4.2.1.5. Duration

Duration, expressed in Timebase units. It is 0 if not set.

2.4.2.1.6. Wall Clock

Wall-clock time, in milliseconds since the Unix epoch, at which capture of this frame started. It is 0 if not set.

2.4.2.1.7. Metadata Size

Size in bytes of the Metadata section. It can be 0 if no metadata is sent.

2.4.2.1.8. Metadata

Extradata needed to decode this stream. This is the AVCDecoderConfigurationRecord as described in [ISO14496-15:2019], Section 5.3.3.1, with the field lengthSizeMinusOne = 3 (so the NAL unit length field is 4 bytes). If the AVCDecoderConfigurationRecord indicates any other length size, the receiver should fail with a “Protocol violation” error.

The publisher MUST send a new AVCDecoderConfigurationRecord whenever the encoding parameters change.
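A receiver could enforce the length-size rule by reading the fifth byte of the record, whose two low bits carry lengthSizeMinusOne (the six high bits are reserved). This sketch raises a hypothetical `ProtocolViolation` exception; how the error is actually surfaced to the MOQT session is not specified here.

```python
class ProtocolViolation(Exception):
    """Hypothetical stand-in for the "Protocol violation" error."""


def check_avc_config(record: bytes) -> None:
    """Reject an AVCDecoderConfigurationRecord unless lengthSizeMinusOne == 3.

    Leading bytes (ISO/IEC 14496-15): configurationVersion, AVCProfileIndication,
    profile_compatibility, AVCLevelIndication, then one byte holding six
    reserved bits followed by the 2-bit lengthSizeMinusOne.
    """
    if len(record) < 5:
        raise ProtocolViolation("AVCDecoderConfigurationRecord is truncated")
    length_size = (record[4] & 0x03) + 1
    if length_size != 4:
        raise ProtocolViolation(f"NAL length size is {length_size} bytes, expected 4")
```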

2.4.2.1.9. Payload

H264 bitstream in AVC1 format, as described in [ISO14496-15:2019], Section 5.3, using a 4-byte NAL unit length field.
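With a 4-byte length field, each NAL unit in the payload is preceded by its size as a 4-byte big-endian integer. The hypothetical helpers below split a payload into NAL units and read each unit's type from the low five bits of its first byte (type 5 is a coded slice of an IDR picture). A receiver could use this, for example, to check that object 0 of a group carries an IDR frame.

```python
def split_nal_units(payload: bytes) -> list[bytes]:
    """Split an AVC1-format payload whose NAL units carry 4-byte length prefixes."""
    nal_units, pos = [], 0
    while pos < len(payload):
        if pos + 4 > len(payload):
            raise ValueError("truncated NAL unit length field")
        size = int.from_bytes(payload[pos:pos + 4], "big")
        pos += 4
        if pos + size > len(payload):
            raise ValueError("NAL unit runs past the end of the payload")
        nal_units.append(payload[pos:pos + size])
        pos += size
    return nal_units


def nal_unit_type(nal_unit: bytes) -> int:
    """Low five bits of the NAL unit header (5 = coded slice of an IDR picture)."""
    return nal_unit[0] & 0x1F
```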

2.4.2.2. Audio Opus bitstream
{
  Seq ID (i)
  PTS Timestamp (i)
  Timebase (i)
  Sample Freq (i)
  Num Channels (i)
  Duration (i)
  Wall Clock (i)
  Payload (..)
}
Figure 3: MOQT Media audio Opus LOC
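A sketch of serializing and parsing the Opus Media payload of Figure 3 (everything after the Media Type varint). The AAC-LC object in Figure 5 has the same field order, so the same hypothetical `AudioObject` class could carry either payload. The varint helpers follow the QUIC encoding (RFC 9000, Section 16).

```python
from dataclasses import dataclass


def encode_varint(v: int) -> bytes:
    """QUIC variable-length integer (RFC 9000, Section 16)."""
    for length, prefix in ((1, 0x00), (2, 0x40), (4, 0x80), (8, 0xC0)):
        if 0 <= v < 1 << (8 * length - 2):
            return (v | prefix << (8 * length - 8)).to_bytes(length, "big")
    raise ValueError("value out of varint range")


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    length = 1 << (buf[pos] >> 6)
    if pos + length > len(buf):
        raise ValueError("truncated varint")
    raw = int.from_bytes(buf[pos:pos + length], "big")
    return raw & ((1 << (8 * length - 2)) - 1), pos + length


@dataclass
class AudioObject:
    seq_id: int
    pts: int
    timebase: int
    sample_freq: int
    num_channels: int
    duration: int       # 0 if not set
    wallclock_ms: int   # 0 if not set
    payload: bytes      # Opus packet(s), or an AAC raw_data_block()

    def serialize(self) -> bytes:
        ints = (self.seq_id, self.pts, self.timebase, self.sample_freq,
                self.num_channels, self.duration, self.wallclock_ms)
        return b"".join(map(encode_varint, ints)) + self.payload

    @classmethod
    def parse(cls, buf: bytes) -> "AudioObject":
        pos, ints = 0, []
        for _ in range(7):  # seven varint header fields
            value, pos = decode_varint(buf, pos)
            ints.append(value)
        return cls(*ints, buf[pos:])
```

For 20 ms Opus frames at 48 kHz, a publisher might use Timebase = 48000 and Duration = 960, so that PTS advances by 960 per object; these values are illustrative only.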
2.4.2.2.1. Seq Id

Monotonically increasing counter for this media track.

2.4.2.2.2. PTS Timestamp

Presentation timestamp (PTS), expressed in Timebase units.

TODO: Varints cannot represent negative values, so encoding negative timestamps at the start of a stream (for example, during encoder priming) could be challenging.

2.4.2.2.3. Timebase

Number of units per second in which PTS and Duration are expressed.

2.4.2.2.4. Sample Freq

Sample frequency of the original signal (before encoding).

2.4.2.2.5. Num Channels

Number of channels in the original signal (before encoding).

2.4.2.2.6. Duration

Duration, expressed in Timebase units. It is 0 if not set.

2.4.2.2.7. Wallclock

Wall-clock time, in milliseconds since the Unix epoch, at which capture of this frame started. It is 0 if not set.

2.4.2.2.8. Payload

Opus packets, as described in [RFC6716], Section 3.

2.4.2.3. UTF-8 Text
{
  Seq ID (i)
  Payload (..)
}
Figure 4: MOQT UTF-8 Text
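The text object is the simplest case: a Seq ID varint followed by UTF-8 bytes. The hypothetical helpers below build and parse it (varints per RFC 9000, Section 16); decoding raises `UnicodeDecodeError` if the payload is not valid UTF-8.

```python
def encode_varint(v: int) -> bytes:
    """QUIC variable-length integer (RFC 9000, Section 16)."""
    for length, prefix in ((1, 0x00), (2, 0x40), (4, 0x80), (8, 0xC0)):
        if 0 <= v < 1 << (8 * length - 2):
            return (v | prefix << (8 * length - 8)).to_bytes(length, "big")
    raise ValueError("value out of varint range")


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    length = 1 << (buf[pos] >> 6)
    if pos + length > len(buf):
        raise ValueError("truncated varint")
    raw = int.from_bytes(buf[pos:pos + length], "big")
    return raw & ((1 << (8 * length - 2)) - 1), pos + length


def build_text_payload(seq_id: int, text: str) -> bytes:
    """Figure 4: Seq ID varint followed by the UTF-8 encoded text."""
    return encode_varint(seq_id) + text.encode("utf-8")


def parse_text_payload(buf: bytes) -> tuple[int, str]:
    """Inverse of build_text_payload; raises UnicodeDecodeError on invalid UTF-8."""
    seq_id, pos = decode_varint(buf, 0)
    return seq_id, buf[pos:].decode("utf-8")
```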
2.4.2.3.1. Seq Id

Monotonically increasing counter for this media track.

2.4.2.3.2. Payload

Text encoded in UTF-8, as described in [RFC3629].

2.4.2.4. Audio AAC-LC in MPEG4 bitstream
{
  Seq ID (i)
  PTS Timestamp (i)
  Timebase (i)
  Sample Freq (i)
  Num Channels (i)
  Duration (i)
  Wall Clock (i)
  Payload (..)
}
Figure 5: MOQT Media audio AAC-LC MPEG4 LOC
2.4.2.4.1. Seq Id

Monotonically increasing counter for this media track.

2.4.2.4.2. PTS Timestamp

Presentation timestamp (PTS), expressed in Timebase units.

TODO: Varints cannot represent negative values, so encoding negative timestamps at the start of a stream (for example, during encoder priming) could be challenging.

2.4.2.4.3. Timebase

Number of units per second in which PTS and Duration are expressed.

2.4.2.4.4. Sample Freq

Sample frequency of the original signal (before encoding).

2.4.2.4.5. Num Channels

Number of channels in the original signal (before encoding).

2.4.2.4.6. Duration

Duration, expressed in Timebase units. It is 0 if not set.

2.4.2.4.7. Wallclock

Wall-clock time, in milliseconds since the Unix epoch, at which capture of this frame started. It is 0 if not set.

2.4.2.4.8. Payload

AAC frame (syntax element raw_data_block()), as described in section 4.4.2.1 of [ISO14496-3:2009].

3. References

[ISO14496-15:2019] "Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format", ISO ISO14496-15:2019, International Organization for Standardization, October, 2022.

[ISO14496-3:2009] "Information technology — Coding of audio-visual objects", ISO ISO14496-3:2009, International Organization for Standardization, September, 2009.

4. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

5. Security Considerations

TODO Security

6. IANA Considerations

This document has no IANA actions.

7. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC3629]
Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November 2003, <https://www.rfc-editor.org/rfc/rfc3629>.
[RFC6716]
Valin, JM., Vos, K., and T. Terriberry, "Definition of the Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716, September 2012, <https://www.rfc-editor.org/rfc/rfc6716>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/rfc/rfc8174>.

Acknowledgments

TODO acknowledge.

Authors' Addresses

Jordi Cenzano-Ferret
Meta
Alan Frindell
Meta