This document describes use cases and requirements for WebRTC in the live streaming multimedia ecosystem.

This is still a work in progress. The proposal is being incubated in the Chinese Web Interest Group.

Introduction

The current livestreaming multimedia ecosystem is showing diverse and robust growth trends. It covers livestreaming content from both institutions and individuals in various fields to meet the diverse needs of users. In this ecosystem, major tech companies are strategically involved in different areas to provide richer, practical, and entertaining livestreaming experiences.

China Mobile MIGU, an integrated media platform, focuses on providing diverse livestreaming content. Particularly in large-scale event livestreaming, MIGU offers real-time broadcasts of significant events like sports competitions and music concerts, meeting users' demand for immediate experiences.

ByteDance plays a crucial role in interactive entertainment, online education, and conferencing. Its TikTok platform has become a globally popular short video platform, providing users with abundant entertainment content, including livestreaming. ByteDance has also ventured into livestreamed educational content to accommodate learners seeking flexible learning approaches. In the business sector, its livestreamed conferencing solutions offer efficient remote communication and collaboration platforms for enterprises.

Alibaba Group demonstrates strong capabilities in the livestreaming sector. Its livestreaming e-commerce combines shopping and entertainment, offering consumers immersive shopping experiences. Through methods like hosts showcasing products and interactive Q&A sessions, consumers are engaged in making purchases. Additionally, Alibaba also offers fundamental livestreaming cloud services, spanning education and gaming sectors, providing technical support for creating livestreaming platforms for businesses and individuals.

Terminology

This document uses the following terms with the specific meanings defined here. Where possible, these meanings are consistent with common usage. However, note that common usage of some of these terms has multiple, ambiguous, or inconsistent meanings. The definitions here take precedence. When terms are used with the specific meanings defined here, they are capitalized. Where possible, references to existing standards defining these terms are given.

WebRTC
Web Real-Time Communication

Use Cases

Cloud Box (UC-CB)

A "Cloud Box" can be understood as a cloud-based private room, similar to the private room we have in our real lives. Friends can come together in these rooms to chat, watch movies, and watching sports events. In sports broadcasts, fans can participate in predictions and discussions about ongoing matches through the "Cloud Box" feature, enabling them to share their insights and forecasts in real-time. The Cloud Box feature bridges the gap between individuals, enhancing their immersive experience and interaction.

In this scenario, participants can engage in real-time interactions through WebRTC.

In order to provide a more immersive user experience, spatial audio codecs similar to Dolby Atmos are utilized in WebRTC when watching Ultra HD sports events.

Figure: MIGU Cloud Box

Real-time Live Commerce (UC-RLC)

Real-time Live Commerce, also known as real-time livestream shopping, is an innovative retail trend that merges real-time streaming with e-commerce. In Live Commerce sessions, businesses and brands showcase their products or services on live online platforms, such as social media and e-commerce websites, while simultaneously engaging with the audience in direct interactions. During these live sessions, hosts or salespeople introduce product features, demonstrate usage, address viewer inquiries, and offer exclusive deals or limited-time promotions.

Viewers can actively participate by using real-time commenting or chat features to interact with the hosts. They can ask questions, express their interest in purchasing, and even share their own product experiences with fellow viewers. When a viewer shows interest in a particular product, they can instantly make a purchase using dedicated links or designated purchasing methods.

Occasionally, invited guests can participate in live interactions with the hosts, aiding in sales through WebRTC-powered video/audio chat.

Figure: Real-time Live Commerce

Metaverse Convention Center (UC-MCC)

The Metaverse Convention Center is a type of cloud game that incorporates advanced technologies such as WebRTC for interactive communication, 3D rendering, and AI-generated avatars. It transforms traditional audio-video conferencing into virtual, gamified, and interactive experiences for remote meetings, office use, and events.

Users have the ability to create their own virtual characters, which can range from cartoon-style to realistic representations, and select from a variety of meeting rooms to align with their preferences.

In this scenario, all the resources are rendered on the cloud and then streamed to the browser using WebRTC. Any control commands in the browser are executed based on WebRTC.
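As a minimal sketch of this setup, the snippet below receives the cloud-rendered video on a receive-only transceiver and sends user input back over an RTCDataChannel. The channel label "control" and the JSON command schema are illustrative assumptions, not part of any existing protocol or API.

  // Minimal sketch: receiving a cloud-rendered stream and sending control
  // commands to the cloud renderer over an RTCDataChannel. The label
  // "control" and the JSON command format are illustrative assumptions.
  const pc = new RTCPeerConnection();

  // The cloud-rendered video arrives on a receive-only transceiver.
  pc.addTransceiver("video", { direction: "recvonly" });

  // A data channel carries user input back to the cloud renderer.
  const control = pc.createDataChannel("control", { ordered: true });

  control.onopen = () => {
    // Example command: move the user's avatar forward (hypothetical schema).
    control.send(JSON.stringify({ type: "move", direction: "forward", speed: 1.0 }));
  };

  // Render the cloud-generated stream in a <video> element.
  pc.ontrack = (event) => {
    const video = document.querySelector("video");
    if (video) {
      video.srcObject = event.streams[0];
    }
  };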

Figure: MIGU Metaverse Convention Center

Requirements

Generic Fragmentation and Reassembly Mechanism for Audio Transmission

This requirement is for UC-CB.

During audio transmission via WebRTC, each audio frame is encapsulated in an RTP packet. However, if the frame size exceeds the network's MTU (typically 1500 bytes), transmission of the packet can fail. To overcome this, packet fragmentation and reassembly are needed: fragmentation occurs at the sender, and reassembly happens at the receiver.

WebRTC handles fragmentation and reassembly for video transmission, but not for audio. It is the responsibility of the application or developer to handle fragmentation and reassembly of audio packets when using WebRTC for audio transmission.

A proposed fragmentation and reassembly workflow can be outlined as follows:

Figure: A proposed fragmentation and reassembly workflow
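As a rough illustration of such a workflow, the sketch below splits an encoded audio frame into fragments that fit under a typical 1500-byte MTU, prefixing each fragment with a two-byte header (fragment index and fragment count), and reassembles the frame at the receiver by sorting on the index. The header layout and the 1200-byte payload budget are assumptions for illustration only, not an existing WebRTC format.

  // Illustrative sketch of application-level fragmentation and reassembly for
  // large encoded audio frames. The 2-byte header (index, count) and the
  // 1200-byte fragment payload are assumptions, not an existing WebRTC format.
  const MAX_FRAGMENT_PAYLOAD = 1200; // keep each packet safely under a 1500-byte MTU

  function fragment(frame: Uint8Array): Uint8Array[] {
    const count = Math.ceil(frame.length / MAX_FRAGMENT_PAYLOAD);
    const fragments: Uint8Array[] = [];
    for (let i = 0; i < count; i++) {
      const payload = frame.subarray(i * MAX_FRAGMENT_PAYLOAD, (i + 1) * MAX_FRAGMENT_PAYLOAD);
      const packet = new Uint8Array(2 + payload.length);
      packet[0] = i;     // fragment index
      packet[1] = count; // total fragment count
      packet.set(payload, 2);
      fragments.push(packet);
    }
    return fragments;
  }

  function reassemble(packets: Uint8Array[]): Uint8Array {
    // Sort by fragment index and concatenate the payloads.
    const sorted = [...packets].sort((a, b) => a[0] - b[0]);
    const total = sorted.reduce((sum, p) => sum + p.length - 2, 0);
    const frame = new Uint8Array(total);
    let offset = 0;
    for (const p of sorted) {
      frame.set(p.subarray(2), offset);
      offset += p.length - 2;
    }
    return frame;
  }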

Statistics for Audio and Video Freeze

This requirement is for UC-CB, UC-MCC, and UC-RLC.

In the W3C Identifiers for WebRTC's Statistics API spec, the logic for marking a video freeze is defined as follows (see freezeCount):

  1. Calculate the linear average of the past 30 rendering frame durations and label it as avg_frame_duration_ms.
  2. Calculate the interval between the current frame and the previous one. If this value is greater than Max(3 * avg_frame_duration_ms, avg_frame_duration_ms + 150), it is considered a freeze.

In situations where the network is good, this logic is quite reasonable. However, on a weak network, avg_frame_duration_ms may be large, for example 150 ms. In that case the freeze trigger value would be 450 ms, which may not be reasonable, since an interval of around 200 ms is generally already perceived as a freeze. The spec could therefore define an API for setting the trigger duration for video freezes (i.e., when the interval between two frames exceeds this duration, a freeze event is recorded).
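For reference, the sketch below restates the freeze-marking logic described above and extends it with a hypothetical configurable trigger duration (freezeThresholdMs); that parameter is not part of the current Statistics API.

  // Sketch of the freeze-marking logic described above, with a hypothetical
  // configurable threshold (freezeThresholdMs). The spec's current rule uses
  // max(3 * avg_frame_duration_ms, avg_frame_duration_ms + 150).
  class FreezeDetector {
    private durations: number[] = []; // last 30 inter-frame intervals, in ms
    freezeCount = 0;

    constructor(private freezeThresholdMs?: number) {}

    onFrameRendered(intervalMs: number): void {
      if (this.durations.length >= 30) {
        const avg = this.durations.reduce((a, b) => a + b, 0) / this.durations.length;
        // Hypothetical override: use the configured threshold when provided.
        const threshold = this.freezeThresholdMs ?? Math.max(3 * avg, avg + 150);
        if (intervalMs > threshold) {
          this.freezeCount++;
        }
      }
      this.durations.push(intervalMs);
      if (this.durations.length > 30) {
        this.durations.shift();
      }
    }
  }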

Meanwhile, the W3C spec does not define metrics for audio-related freezes, although the underlying WebRTC code already implements them with the fields totalInterruptionDuration and interruptionCount. These metrics could be exposed to developers.
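As a sketch of how an application might read these metrics, the snippet below polls getStats() and looks for totalInterruptionDuration and interruptionCount on inbound audio stats. Since these fields are not standardized, their presence on the stats object is an assumption about the particular implementation.

  // Sketch: polling getStats() for audio interruption metrics. The fields
  // totalInterruptionDuration and interruptionCount exist in the underlying
  // WebRTC implementation but are not in the W3C spec, so they are read here
  // as optional, non-standard properties.
  async function readAudioFreezeStats(pc: RTCPeerConnection): Promise<void> {
    const report = await pc.getStats();
    report.forEach((stats) => {
      if (stats.type === "inbound-rtp" && stats.kind === "audio") {
        const s = stats as unknown as {
          interruptionCount?: number;
          totalInterruptionDuration?: number;
        };
        console.log("audio interruptions:", s.interruptionCount,
                    "total duration (s):", s.totalInterruptionDuration);
      }
    });
  }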

Receiving sounds from other devices when in background

This requirement is for UC-CB.

An iOS device that has the microphone enabled and then switches to the background cannot hear audio from other devices when one of them unmutes.

This issue can be reproduced as follows:

Fixed GOP encode interval

This requirement is for UC-RLC.

In the live-streaming scenario, the encoder needs to produce keyframes at a fixed GOP interval (e.g., GOP = 2 seconds). We hope an API can be provided to set the GOP interval.
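For illustration only, the sketch below shows the shape such an API could take as a hypothetical keyFrameInterval encoding parameter applied through RTCRtpSender.setParameters(); no such parameter exists in the current WebRTC API.

  // Illustration only: a hypothetical "keyFrameInterval" encoding parameter.
  // No such parameter exists in the current WebRTC API; this sketch shows the
  // kind of control the requirement asks for.
  async function setFixedGop(sender: RTCRtpSender, gopSeconds: number): Promise<void> {
    const params = sender.getParameters();
    for (const encoding of params.encodings) {
      // Hypothetical field: ask the encoder to emit a keyframe every gopSeconds.
      (encoding as unknown as { keyFrameInterval?: number }).keyFrameInterval = gopSeconds;
    }
    await sender.setParameters(params);
  }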

Enable RTCP-XR (RRTR/DLRR) in recv-only mode

This requirement is for UC-RLC.

In the live-streaming scenario, the player endpoint is generally in recv-only mode. If the player endpoint wants to calculate RTT (round-trip time), one possible approach is to use RTCP-XR [RFC3611]. Currently, WebRTC already supports RTCP-XR, but the offer generated by the browser does not contain it. We hope browsers will add an API to enable RTCP-XR.
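Until such an API exists, one possible workaround is SDP munging: inserting the rtcp-xr attribute defined in RFC 3611 into the locally generated offer. Whether a browser honors a munged rtcp-xr attribute is implementation-dependent, so the sketch below is an illustration rather than a guaranteed solution.

  // Sketch: adding an RTCP-XR attribute (RFC 3611) to the local offer via SDP
  // munging. Whether a given browser honors this munged attribute is
  // implementation-dependent; a dedicated API would avoid this workaround.
  async function offerWithRtcpXr(pc: RTCPeerConnection): Promise<RTCSessionDescriptionInit> {
    const offer = await pc.createOffer();
    // Insert "a=rtcp-xr:rcvr-rtt=all" after each media description line.
    const sdp = (offer.sdp ?? "").replace(/^m=.*$/gm, (line) => `${line}\r\na=rtcp-xr:rcvr-rtt=all`);
    await pc.setLocalDescription({ type: "offer", sdp });
    return { type: "offer", sdp };
  }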

Related issue