
Cache H264/H265 GOPs in order to allow readers to decode frames immediately #4189

Open: wants to merge 10 commits into main


jean343 (Contributor) commented Jan 23, 2025

GOP Cache

This PR introduces Group of Pictures (GOP) caching to MediaMTX to reduce startup latency for new readers. By caching the last GOP of each stream, new subscribers can start decoding the latest video immediately instead of waiting for the next keyframe. This improves the user experience, especially for streams with long keyframe intervals.

This works for both H264 and H265, as well as for RTSP and WebRTC.
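For illustration, keyframe detection for the two supported codecs could look like the sketch below. It is not MediaMTX code: `isH264Keyframe` and `isH265Keyframe` are hypothetical helpers, and the sketch assumes an access unit is available as a list of NAL units without start codes. For H.264 it treats an IDR slice (nal_unit_type 5) as a keyframe; for H.265 it accepts the IRAP picture types 16 to 21 (BLA, IDR and CRA) and ignores the reserved IRAP types 22 and 23.

```go
package main

import "fmt"

// isH264Keyframe reports whether an H.264 access unit (NAL units without
// start codes) contains an IDR slice; nal_unit_type is the low 5 bits.
func isH264Keyframe(au [][]byte) bool {
	for _, nalu := range au {
		if len(nalu) > 0 && nalu[0]&0x1F == 5 {
			return true
		}
	}
	return false
}

// isH265Keyframe reports whether an H.265 access unit contains an IRAP
// picture (BLA_W_LP=16 ... CRA_NUT=21); nal_unit_type is bits 1-6 of the
// first byte of the 2-byte NAL header.
func isH265Keyframe(au [][]byte) bool {
	for _, nalu := range au {
		if len(nalu) < 2 {
			continue
		}
		typ := (nalu[0] >> 1) & 0x3F
		if typ >= 16 && typ <= 21 {
			return true
		}
	}
	return false
}

func main() {
	h264IDR := [][]byte{{0x67}, {0x68}, {0x65, 0x88}} // SPS, PPS, IDR slice
	h264P := [][]byte{{0x41, 0x9a}}                   // non-IDR slice (type 1)
	h265IDR := [][]byte{{0x26, 0x01}}                 // IDR_W_RADL (type 19)
	h265P := [][]byte{{0x02, 0x01}}                   // TRAIL_R (type 1)

	fmt.Println(isH264Keyframe(h264IDR), isH264Keyframe(h264P)) // true false
	fmt.Println(isH265Keyframe(h265IDR), isH265Keyframe(h265P)) // true false
}
```

CRA pictures are random access points, but the leading pictures that follow them may not be decodable when playback starts at the CRA, so a stricter variant might accept only IDR pictures (types 19 and 20).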

Configurable Cache Settings:

Introduced a new configuration parameter gopCache in mediamtx.yml for enabling/disabling GOP caching.

Fix: #1209

aler9 (Member) commented Jan 24, 2025

This is great work. The important thing is to confirm that the feature works, and in which scenarios (protocols, codecs, with or without B-frames).
We can then adjust the details; the ones that come to mind are:

  • check on the GOP size to prevent RAM exhaustion
  • additional codecs, at least AV1

jean343 (Contributor Author) commented Jan 25, 2025

Thanks @aler9 for your kind comment!

The feature does work in the following scenarios:

Protocol:

  • WebRTC
  • RTSP

Codecs:

  • H.264
  • H.265

I have only tested videos without B-frames, since WebRTC does not support B-frames. Adjustments may be needed when handling B-frames over RTSP.

To limit RAM usage, nothing is cached until a keyframe arrives. This prevents unsupported codecs (whose keyframes are never detected) from storing anything, and saves a little memory for supported codecs.
If the GOP is very long, the cache is trimmed at 512 packets to conserve memory. In that case, clients need to wait for the next keyframe before video playback starts.
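The caching policy described above could be sketched as follows. This is an illustrative assumption, not the PR's code: `unit` and `gopCache` are hypothetical types, keyframe detection is reduced to a boolean field, and the 512 cap is modeled as a count of cached entries. A keyframe starts a fresh GOP, nothing is stored before the first keyframe, and once the cap is exceeded the oldest entries are trimmed, which drops the keyframe itself.

```go
package main

import "fmt"

// maxCachedUnits is the cap on cached entries (512, as described above).
const maxCachedUnits = 512

// unit is a hypothetical stand-in for a cached media unit.
type unit struct {
	keyframe bool
	pts      int64
}

// gopCache holds the units of the most recent GOP.
type gopCache struct {
	units []unit // nil until the first keyframe has been seen
}

// add updates the cache with a newly received unit.
func (c *gopCache) add(u unit) {
	if u.keyframe {
		// A keyframe starts a new GOP: discard the previous one.
		c.units = []unit{u}
		return
	}
	if c.units == nil {
		// No keyframe seen yet: cache nothing.
		return
	}
	c.units = append(c.units, u)
	if n := len(c.units); n > maxCachedUnits {
		// Keep only the newest entries; this drops the GOP's keyframe.
		c.units = c.units[n-maxCachedUnits:]
	}
}

// snapshot returns a copy of the cached units for a new reader.
func (c *gopCache) snapshot() []unit {
	return append([]unit(nil), c.units...)
}

func main() {
	var c gopCache
	c.add(unit{pts: 0}) // ignored: no keyframe yet
	c.add(unit{keyframe: true, pts: 1})
	c.add(unit{pts: 2})
	c.add(unit{pts: 3})
	fmt.Println(len(c.snapshot())) // 3
}
```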

As for additional codecs, I could not find a reliable way to detect their keyframes.

For WebRTC playback, the PTS and the RTP Timestamp need to be rewritten to prevent gaps; if they are set incorrectly, WebRTC pauses and stops playback.

For example, this is what an incorrect timestamp looks like:

(video attachment: Untitled.mov)

And this is what a correct timestamp looks like:

(video attachment: Screen.Recording.2025-01-23.at.1.40.16.PM.mov)
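The exact rewriting used in the PR is not spelled out here. One possible approach, sketched under that assumption, is to give the cached catch-up units closely spaced timestamps, so they are decoded almost immediately, and then subtract a constant offset from every live timestamp so the timeline continues without a jump. `tsRewriter` is a hypothetical helper operating on 32-bit RTP timestamps; unsigned arithmetic wraps modulo 2^32, as RTP timestamps do.

```go
package main

import "fmt"

// tsRewriter is a hypothetical per-reader helper that rewrites 32-bit RTP
// timestamps. uint32 arithmetic wraps modulo 2^32, like RTP timestamps.
type tsRewriter struct {
	step    uint32 // spacing between consecutive cached units, in clock ticks
	started bool
	next    uint32 // rewritten timestamp for the next cached unit
	offset  uint32 // subtracted from live timestamps
}

// cached rewrites the timestamp of a unit replayed from the GOP cache.
func (r *tsRewriter) cached(orig uint32) uint32 {
	if !r.started {
		// The first cached unit (the keyframe) keeps its original timestamp.
		r.next = orig
		r.started = true
	}
	ts := r.next
	r.next += r.step
	// Track how far the compressed timeline lags behind the original one.
	r.offset = orig - ts
	return ts
}

// live rewrites the timestamp of a unit received after the catch-up phase,
// so that it follows the last cached unit without a gap.
func (r *tsRewriter) live(orig uint32) uint32 {
	return orig - r.offset
}

func main() {
	// 90 kHz clock at 30 fps: 3000 ticks per frame; cached units 1 tick apart.
	r := &tsRewriter{step: 1}
	fmt.Println(r.cached(90000), r.cached(93000), r.cached(96000)) // 90000 90001 90002
	fmt.Println(r.live(99000))                                      // 93002
}
```

With this approach the catch-up frames are rendered in a quick burst, and because only the video timeline is shifted, an audio track would need a matching adjustment to stay in sync.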

yairzahavi commented

Your branch is missing some fixes needed to align it with the base branch.
I made them in my own fork, since I needed both the GOP cache and the base-branch alignment.

You can reference them here:
#4282

angry-beaver commented

@aler9 @jean343 This is a cool feature. Do you have any thoughts on when it will be completed?

aler9 (Member) commented Mar 10, 2025

@angry-beaver I need to finish a couple of other things, then I'll focus on this. In the meantime, @jean343 and @yairzahavi can try to move forward on their own.

yairzahavi commented

@jean343 I'll try to do this soon, including AV1 support and RAM-exhaustion prevention, and I'll ping you for a review. 🙌

jean343 (Contributor Author) commented Mar 14, 2025

Thanks, everyone, for the help. I merged from master and fixed the build!

@yairzahavi, the AV1 work is awesome. I merged the AV1 work into this branch and fixed the merge conflicts. I did not change the CacheLength logic, as it's not clear that the additional logic helps performance.

We should aim to make a final PR as co-authors!

Comment on lines +148 to +156
```go
	if s.CachedUnits != nil {
		s.CachedUnits = append(s.CachedUnits, u)
	}
	l := len(s.CachedUnits)
	if l > maxCachedGOPSize {
		s.CachedUnits = s.CachedUnits[l-maxCachedGOPSize:]
		sf.decodeErrLogger.Log(logger.Warn, "GOP cache is full, dropping packets")
	}
}
```

@jean343 You are right that the benefit of the change is not clear; moreover, it doesn't really work, and I have yet to figure out why.

The reason I tried to change it is that, looking at this code block, there seems to be an extra memory allocation when the cache exceeds maxCachedGOPSize: append may have to grow the backing array before the truncation runs, so by the time the slice is truncated, the allocation has already happened.
I also tried to reduce memory allocations and usage by allocating a fixed size up front.

I hope you have a solution or idea for this.
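The allocation concern can be reproduced with plain Go slices. This is a generic illustration of `append` semantics, not the PR's code; `appendAndTrim` is a hypothetical helper that mirrors the append-then-truncate pattern in the reviewed block.

```go
package main

import "fmt"

// appendAndTrim mirrors the append-then-truncate pattern: append first,
// then keep only the newest `limit` elements by reslicing.
func appendAndTrim(s []int, v, limit int) []int {
	s = append(s, v)
	if n := len(s); n > limit {
		s = s[n-limit:]
	}
	return s
}

func main() {
	const limit = 4
	s := make([]int, 0, limit)
	for i := 0; i < limit; i++ {
		s = appendAndTrim(s, i, limit)
	}
	fmt.Println(len(s), cap(s)) // 4 4

	// The fifth append exceeds the capacity, so Go allocates a larger array
	// before the trim runs; the trim only reslices that new array.
	s = appendAndTrim(s, limit, limit)
	fmt.Println(len(s), cap(s) > limit, s) // 4 true [1 2 3 4]
}
```

Because every trim advances the start of the slice, the remaining capacity keeps shrinking, so a long GOP triggers a new allocation each time that capacity runs out.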

jean343 (Contributor Author) commented

We could drop the entire cache once it reaches maxCachedGOPSize. That would solve the extra allocations, and once the cache reaches maxCachedGOPSize the cached GOP is incomplete anyway (its keyframe has been trimmed), so the video player would have to wait for the next keyframe regardless.
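A minimal sketch of this drop-the-whole-cache policy, combined with the fixed-size allocation mentioned earlier in the thread. Assumptions: `boundedGOPCache` and `unit` are hypothetical; the backing array is allocated once with a fixed capacity and reused for every GOP, so `append` never reallocates; on overflow, the cache is emptied and caching resumes only at the next keyframe.

```go
package main

import "fmt"

// unit is a hypothetical stand-in for a cached media unit.
type unit struct {
	keyframe bool
	payload  []byte
}

// boundedGOPCache stores at most a fixed number of units of the current GOP
// in a backing array that is allocated once and then reused.
type boundedGOPCache struct {
	units  []unit
	active bool // false before the first keyframe and after an overflow
}

func newBoundedGOPCache(capacity int) *boundedGOPCache {
	return &boundedGOPCache{units: make([]unit, 0, capacity)}
}

// reset empties the cache without freeing the backing array, zeroing the
// old entries so their payloads can be garbage-collected.
func (c *boundedGOPCache) reset() {
	for i := range c.units {
		c.units[i] = unit{}
	}
	c.units = c.units[:0]
}

func (c *boundedGOPCache) add(u unit) {
	if u.keyframe {
		c.reset()
		c.active = true
	}
	if !c.active {
		return
	}
	if len(c.units) == cap(c.units) {
		// GOP too long: drop it entirely and wait for the next keyframe.
		c.reset()
		c.active = false
		return
	}
	c.units = append(c.units, u)
}

// snapshot returns a copy of the cached GOP, or nil if none is usable.
func (c *boundedGOPCache) snapshot() []unit {
	if !c.active {
		return nil
	}
	return append([]unit(nil), c.units...)
}

func main() {
	c := newBoundedGOPCache(3)
	c.add(unit{keyframe: true})
	c.add(unit{})
	fmt.Println(len(c.snapshot())) // 2
	c.add(unit{})
	c.add(unit{}) // exceeds the capacity of 3: the whole GOP is dropped
	fmt.Println(len(c.snapshot())) // 0
}
```

Zeroing the entries on reset is a deliberate choice: reslicing to length zero alone would keep the old units' payloads reachable from the backing array until they are overwritten.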

yairzahavi commented

Another thing I noticed is that the GOP cache causes higher WebRTC jitter,
so I'm considering whether and how to solve this as well.

aler9 (Member) commented Mar 16, 2025

Hello, I've tested the patch. While the working principle is there, some aspects can be improved:

  1. the startup phase is unpleasant, as all past frames of the GOP are shown and played very fast (see attached video). I understand that this behavior is needed as a workaround to prevent freezing, but there must be some alternative. For instance, we had a similar problem when implementing the Playback server: there, the duration of past frames is set to zero, because the transport (MP4) allows that (more or less!). However, I know that this is not possible with most streaming protocols. An alternative might be to group multiple H264 access units into one big H264 access unit that contains all the frames.
  2. audio sync: this is linked to point 1. If you add or remove some delta T from the video, the audio will get out of sync.
  3. RTSP: this patch does not cover RTSP. In RTSP, packets are not written to individual readers; they are sent with a single call to ServerStream.WritePacketRTP, in order to support the multicast transport, in which each packet is sent to the network only once.
    There should be a mechanism such that, when an RTSP reader is created and its transport protocol is not multicast, the GOP frames are sent to it through ServerSession.WritePacketRTP.
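A rough sketch of the mechanism suggested in point 3, written against hypothetical types rather than the real gortsplib API: `packetWriter` stands in for a per-session writer such as ServerSession.WritePacketRTP (the real method takes different arguments), and `transport` for the reader's negotiated transport. Cached packets are replayed only to unicast readers.

```go
package main

import "fmt"

// transport is a hypothetical enum for the reader's negotiated transport.
type transport int

const (
	transportUDP transport = iota
	transportTCP
	transportMulticast
)

// packetWriter is a hypothetical stand-in for a per-session RTP writer.
type packetWriter interface {
	WritePacketRTP(pkt []byte) error
}

// replayGOP sends the cached packets to one newly created reader and returns
// how many were written. Multicast readers are skipped, because their
// packets are sent once for the whole group and there is no per-reader
// channel to replay on.
func replayGOP(t transport, w packetWriter, cached [][]byte) (int, error) {
	if t == transportMulticast {
		return 0, nil
	}
	for i, pkt := range cached {
		if err := w.WritePacketRTP(pkt); err != nil {
			return i, err
		}
	}
	return len(cached), nil
}

// recorder is a test writer that counts the packets it receives.
type recorder struct{ n int }

func (r *recorder) WritePacketRTP(pkt []byte) error {
	r.n++
	return nil
}

func main() {
	cached := [][]byte{{0x01}, {0x02}, {0x03}}
	rec := &recorder{}
	n, _ := replayGOP(transportUDP, rec, cached)
	fmt.Println(n, rec.n) // 3 3
	n, _ = replayGOP(transportMulticast, rec, cached)
	fmt.Println(n, rec.n) // 0 3
}
```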

The GOP caching feature has always been difficult to implement because it has to take into account how players react when they receive many access units at the same time. It involves testing all possible ways of sending the GOP, and digging into the source code of all players and into codec specifications.

The feature can be merged into the main branch only when a high level of compatibility with all major protocols and players is reached.

(video attachment: out.mp4)

jean343 (Contributor Author) commented Mar 16, 2025

Thank you @aler9 for testing and for your feedback.

I did not expect to uncover this many corner cases when I started implementation :)

  1. I agree that playing back all frames very quickly is not optimal. However, in our testing, with a GOP of about 12 s, it ends up being better than waiting up to 12 s for video playback.
    As you know, we can't set the duration to 0 s, and because the transport is UDP, we cannot send an unlimited number of packets instantly.
    We have tested sending only a keyframe and dropping all P-frames until the next keyframe. In this case, playback starts instantly and, as expected, freezes until the next keyframe. This suggests that we could possibly combine the P-frames into one...
    It would be amazing if you could help with merging the frames together into one!
  2. I do not believe audio desync should be a problem, because we do not play audio back during catch-up.
  3. Unicast RTSP is supported, and in this case, playing back all frames at the same time does work, which is great.
    Multicast RTSP should not have the GOP cache enabled; is there a check in the code I could add to disable that part?
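The keyframe-only variant tested in point 1 can be modeled as a small per-reader filter. This is an illustrative sketch with hypothetical types (`unit`, `keyframeOnlyReader`), not the code that was tested: the reader receives only the leading keyframe of the cached GOP, and live non-keyframes are dropped until the next keyframe, since they reference frames the reader never received.

```go
package main

import "fmt"

// unit is a hypothetical stand-in for a media unit.
type unit struct {
	keyframe bool
	pts      int64
}

// keyframeOnlyReader drops live non-keyframes until the next keyframe.
type keyframeOnlyReader struct {
	waiting bool
}

// start returns what to send from the cache: only its leading keyframe.
func (r *keyframeOnlyReader) start(cached []unit) []unit {
	r.waiting = true
	if len(cached) > 0 && cached[0].keyframe {
		return cached[:1]
	}
	return nil
}

// accept reports whether a live unit should be forwarded to the reader.
func (r *keyframeOnlyReader) accept(u unit) bool {
	if u.keyframe {
		r.waiting = false
	}
	return !r.waiting
}

func main() {
	cached := []unit{{keyframe: true, pts: 0}, {pts: 1}, {pts: 2}}
	r := &keyframeOnlyReader{}
	fmt.Println(len(r.start(cached))) // 1
	for _, u := range []unit{{pts: 3}, {keyframe: true, pts: 4}, {pts: 5}} {
		fmt.Println(u.pts, r.accept(u)) // 3 false, 4 true, 5 true
	}
}
```

The picture freezes on the cached keyframe while `accept` returns false, which matches the behavior described above.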


Successfully merging this pull request may close these issues.

Cache H264 GOPs in order to allow readers to decode frames immediately
4 participants