The original HD mp4 file 'movie.mp4' plays very smoothly. I extracted its H.264 stream and saved it as 'movie.264', then used mp4v2 to mux 'movie.264' into a new mp4 file, 'output.mp4'. However, 'output.mp4' played back very badly. Why?
I debugged the 'ClipTrack' function in 'mp4v2/test/OLD/mp4clip.cpp' and found a key argument of 'MP4WriteSample': 'renderingOffset'. Every example I found on the web for writing H.264 to mp4 with mp4v2 sets 'renderingOffset' to 0, whereas 'ClipTrack' passes to 'MP4WriteSample' the 'renderingOffset' value it just got from 'MP4ReadSample'.
First, let's make one concept clear. A 'sample' in the mp4 file format specification corresponds to one frame, i.e. the data needed to render one complete picture. A sample consists of one or more NALUs. In 'movie.264', a NALU whose start code is 0x00 0x00 0x00 0x01 begins a new sample, while a NALU that starts with 0x00 0x00 0x01 belongs to the current sample. A sample does not need to be an IDR key frame.
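A minimal C++ sketch of that grouping rule, assuming the stream follows the start-code convention described above. Note that H.264 Annex B only requires the 4-byte form for the first NALU of an access unit and for SPS/PPS, so with this rule alone SPS and PPS would each open their own sample; a robust splitter would also inspect the NALU types. `SplitSamples` and `StartCodeAt` are hypothetical helper names, not mp4v2 functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Raw bytes of one sample: all of its NALUs, start codes included.
using Sample = std::vector<uint8_t>;

// Returns true if an Annex B start code begins at position i and stores
// its length (4 for 00 00 00 01, 3 for 00 00 01) in len.
static bool StartCodeAt(const std::vector<uint8_t>& b, size_t i, size_t& len) {
    if (i + 4 <= b.size() && b[i] == 0 && b[i + 1] == 0 && b[i + 2] == 0 && b[i + 3] == 1) {
        len = 4;
        return true;
    }
    if (i + 3 <= b.size() && b[i] == 0 && b[i + 1] == 0 && b[i + 2] == 1) {
        len = 3;
        return true;
    }
    return false;
}

// Groups the NALUs of an Annex B buffer into samples using the rule above:
// a 4-byte start code opens a new sample, a 3-byte one continues the current one.
std::vector<Sample> SplitSamples(const std::vector<uint8_t>& es) {
    // Pass 1: record (position, start-code length) of every NALU.
    std::vector<std::pair<size_t, size_t>> nalus;
    for (size_t i = 0, len = 0; i < es.size();) {
        if (StartCodeAt(es, i, len)) {
            nalus.emplace_back(i, len);
            i += len;
        } else {
            ++i;
        }
    }
    // Pass 2: copy each NALU (up to the next start code) into its sample.
    std::vector<Sample> samples;
    for (size_t k = 0; k < nalus.size(); ++k) {
        size_t begin = nalus[k].first;
        size_t end = (k + 1 < nalus.size()) ? nalus[k + 1].first : es.size();
        if (nalus[k].second == 4 || samples.empty()) samples.emplace_back();
        samples.back().insert(samples.back().end(), es.begin() + begin, es.begin() + end);
    }
    return samples;
}
```

For a buffer holding a 4-byte NALU, a 3-byte NALU and another 4-byte NALU, this yields two samples: the first holds the first two NALUs, the second holds the last one.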
By storing the 'renderingOffset' value at the beginning of each sample in 'movie.264' and passing it to 'MP4WriteSample' when muxing, I did generate an 'output.mp4' that plays as smoothly as the original one.
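The post does not say exactly how the offset was stored, so here is one hypothetical layout as a C++ sketch: each record is a 4-byte big-endian renderingOffset, a 4-byte big-endian sample size, then the sample bytes. `Record`, `Pack` and `Unpack` are illustrative names, and the sketch assumes every offset fits in 32 bits (mp4v2's `MP4Duration` is 64-bit). Because the extra bytes are not H.264 data, such a file is no longer a plain Annex B stream that other tools can read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One sample plus the renderingOffset that MP4ReadSample reported for it.
// Assumption: the offset fits in 32 bits.
struct Record {
    uint32_t renderingOffset;
    std::vector<uint8_t> sample;
};

static void PutU32(std::vector<uint8_t>& out, uint32_t v) {
    out.push_back(static_cast<uint8_t>(v >> 24));
    out.push_back(static_cast<uint8_t>(v >> 16));
    out.push_back(static_cast<uint8_t>(v >> 8));
    out.push_back(static_cast<uint8_t>(v));
}

static uint32_t GetU32(const std::vector<uint8_t>& in, size_t pos) {
    return (uint32_t{in[pos]} << 24) | (uint32_t{in[pos + 1]} << 16) |
           (uint32_t{in[pos + 2]} << 8) | uint32_t{in[pos + 3]};
}

// Hypothetical layout per record: [offset, 4 bytes BE][size, 4 bytes BE][sample bytes].
std::vector<uint8_t> Pack(const std::vector<Record>& records) {
    std::vector<uint8_t> out;
    for (const Record& r : records) {
        PutU32(out, r.renderingOffset);
        PutU32(out, static_cast<uint32_t>(r.sample.size()));
        out.insert(out.end(), r.sample.begin(), r.sample.end());
    }
    return out;
}

// Reads the records back; asserts if the last record is truncated.
std::vector<Record> Unpack(const std::vector<uint8_t>& buf) {
    std::vector<Record> records;
    size_t pos = 0;
    while (pos + 8 <= buf.size()) {
        uint32_t offset = GetU32(buf, pos);
        uint32_t size = GetU32(buf, pos + 4);
        pos += 8;
        assert(pos + size <= buf.size() && "truncated record");
        records.push_back({offset, std::vector<uint8_t>(buf.begin() + pos, buf.begin() + pos + size)});
        pos += size;
    }
    return records;
}
```

On the muxing side, each unpacked record's `renderingOffset` would then go into the `renderingOffset` argument of `MP4WriteSample` instead of 0.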
Does 'renderingOffset' mean the timestamp of a sample? No, at least not exactly: watching the log, I found that its value did not increase over time.
I kept debugging mp4v2, came across the keyword 'ctts' (an atom), and found a link about it on the web.
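Now I understand: the decoding order of H.264 samples is slightly different from their rendering (presentation) order. A B-frame references a frame that is displayed after it, so that later frame has to be decoded, and therefore stored, first.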
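To see why the two orders differ, here is a small C++ sketch that derives a decoding order from a display-order pattern such as IBBPBBP. It assumes a simple GOP where B-frames are never used as references and each run of B-frames waits for the next I or P frame; `DecodeOrder` is a hypothetical helper, and real encoders (B-pyramids, open GOPs) can reorder differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Given frame types in display order (e.g. "IBBPBBP"), returns the display
// indices in the order they would be stored and decoded.
// Assumption: B-frames are non-reference and depend on the next I/P anchor.
std::vector<int> DecodeOrder(const std::string& display) {
    std::vector<int> order;
    std::vector<int> pendingB;  // B-frames still waiting for their future anchor
    for (int i = 0; i < static_cast<int>(display.size()); ++i) {
        if (display[i] == 'B') {
            pendingB.push_back(i);
        } else {                 // 'I' or 'P': an anchor frame
            order.push_back(i);  // the anchor is decoded first...
            order.insert(order.end(), pendingB.begin(), pendingB.end());  // ...then the Bs shown before it
            pendingB.clear();
        }
    }
    // Simplification: trailing B-frames with no following anchor keep display order.
    order.insert(order.end(), pendingB.begin(), pendingB.end());
    return order;
}
```

For "IBBPBBP" this gives 0, 3, 1, 2, 6, 4, 5: the P-frame at display position 3 is decoded before the two B-frames shown ahead of it.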
For example, here is an stts atom (sample durations in decoding order):
    decoding order   sample count   sample duration
    1,2,3,4          4              20
    5                1              40
    6,7,8            3              20
    9                1              60
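The stts entries are run-length encoded: each entry says how many consecutive samples share one duration. A minimal C++ sketch that expands them into a decoding timestamp per sample, assuming the first sample decodes at time 0; `SttsEntry` and `ExpandStts` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One stts entry: `sampleCount` consecutive samples each last `sampleDelta`.
struct SttsEntry {
    uint32_t sampleCount;
    uint32_t sampleDelta;
};

// DTS(1) = 0 and DTS(n + 1) = DTS(n) + duration(n).
std::vector<uint64_t> ExpandStts(const std::vector<SttsEntry>& stts) {
    std::vector<uint64_t> dts;
    uint64_t t = 0;
    for (const SttsEntry& e : stts) {
        for (uint32_t k = 0; k < e.sampleCount; ++k) {
            dts.push_back(t);
            t += e.sampleDelta;
        }
    }
    return dts;
}
```

For the table above, the DTS values of samples 1 to 9 come out as 0, 20, 40, 60, 80, 120, 140, 160, 180.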
And here is the matching ctts atom (composition offsets in decoding order):
    decoding order   sample count   sample offset
    1,2,3            3              20
    4                1              40
    5                1              0
    6                1              40
    7                1              0
    8                1              40
    9                1              0
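ctts is run-length encoded the same way. The C++ sketch below expands it into one offset per sample and also does the reverse, merging per-sample offsets back into entries. As far as I can tell, mp4v2 does something equivalent internally when it builds ctts from the `renderingOffset` passed with each `MP4WriteSample` call, but `CttsEntry`, `ExpandCtts` and `CompressCtts` here are illustrative names, not mp4v2 code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One ctts entry: `sampleCount` consecutive samples share `sampleOffset`.
struct CttsEntry {
    uint32_t sampleCount;
    uint32_t sampleOffset;
};

bool operator==(const CttsEntry& a, const CttsEntry& b) {
    return a.sampleCount == b.sampleCount && a.sampleOffset == b.sampleOffset;
}

// Run-length decode: one composition offset per sample, in decoding order.
std::vector<uint32_t> ExpandCtts(const std::vector<CttsEntry>& ctts) {
    std::vector<uint32_t> offsets;
    for (const CttsEntry& e : ctts) {
        for (uint32_t k = 0; k < e.sampleCount; ++k) offsets.push_back(e.sampleOffset);
    }
    return offsets;
}

// Run-length encode per-sample offsets, merging equal neighbours into one entry.
std::vector<CttsEntry> CompressCtts(const std::vector<uint32_t>& offsets) {
    std::vector<CttsEntry> ctts;
    for (uint32_t off : offsets) {
        if (!ctts.empty() && ctts.back().sampleOffset == off) {
            ++ctts.back().sampleCount;
        } else {
            ctts.push_back({1, off});
        }
    }
    return ctts;
}
```

The expanded offsets for the table above are 20, 20, 20, 40, 0, 40, 0, 40, 0: they go up and down rather than increasing, just like the 'renderingOffset' values in the log.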
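Each sample's rendering (presentation) time is its decoding time plus its ctts offset, i.e. PTS = DTS + offset, so the final rendering timing is: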
    decoding order    1   2   3   4   5   6   7   8   9
    duration          20  20  20  20  40  20  20  20  60
    rendering order   1   2   3   5   4   7   6   9   8
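To double-check the rendering order, here is a small C++ sketch that adds each sample's ctts offset to its decoding time and sorts the sample numbers by the result. The DTS values are the ones obtained by accumulating the stts durations (0, 20, 40, 60, 80, 120, 140, 160, 180); `RenderingOrder` is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Returns 1-based sample numbers sorted by presentation time,
// where PTS(n) = DTS(n) + offset(n); both inputs are in decoding order.
std::vector<int> RenderingOrder(const std::vector<uint64_t>& dts,
                                const std::vector<uint64_t>& offset) {
    assert(dts.size() == offset.size());
    std::vector<int> order(dts.size());
    std::iota(order.begin(), order.end(), 1);  // 1, 2, ..., n (decoding order)
    std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
        return dts[a - 1] + offset[a - 1] < dts[b - 1] + offset[b - 1];
    });
    return order;
}
```

With the ctts offsets above, the PTS values are 20, 40, 60, 100, 80, 160, 140, 200, 180, giving the order 1, 2, 3, 5, 4, 7, 6, 9, 8. Passing all-zero offsets, as the web examples did, makes PTS equal DTS, so the frames are shown in decoding order (1 to 9); that is consistent with the poor playback of the first 'output.mp4'.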
For reference, the atoms and abbreviations involved:

    mvhd: movie header
    tkhd: track header
    mdhd: media header
    stsd: sample descriptions
    stts: time to sample
    ctts: composition offset (composition time to sample)
    stss: sync sample table
    stsc: sample to chunk
    stsz: sample size
    stco/co64: chunk offset
    dts: decoding timestamp
    pts: presentation timestamp