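MP4 File Format Parsing
May 25, 2021 · 3065 words · 7 min read
Reference:
The MP4 file format documentation already covers the format in great detail.
This article parses a real MP4 file to take a quick look at each atom (box) it contains.
- The MP4 source file is as follows,
- downloaded from here, 640x360.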
## ftyp
[0x00000000 -> 0x0000001F]
00 00 00 20 | Size | 32
66 74 79 70 | Type | 'ftyp'
6D 70 34 32 | Major_Brand | 'mp42'
00 00 00 00 | Minor_Version | 0
6D 70 34 32 | Compatible_Brands | 'mp42'
6D 70 34 31 | Compatible_Brands | 'mp41'
69 73 6F 6D | Compatible_Brands | 'isom'
61 76 63 31 | Compatible_Brands | 'avc1'
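As a minimal sketch of how the ftyp dump above can be decoded, the snippet below reads the 8-byte box header (32-bit big-endian size plus four-character type) and then the brands. The helper names `parse_box_header` and `parse_ftyp` are illustrative, and the code assumes a plain 32-bit size field (no 64-bit extended size).

```python
import struct

# Bytes copied from the ftyp dump above (32 bytes in total).
FTYP = bytes.fromhex(
    "00 00 00 20 66 74 79 70 6D 70 34 32 00 00 00 00 "
    "6D 70 34 32 6D 70 34 31 69 73 6F 6D 61 76 63 31"
)


def parse_box_header(buf, offset=0):
    """Return (size, type) of the box that starts at `offset`."""
    size, box_type = struct.unpack_from(">I4s", buf, offset)
    return size, box_type.decode("ascii")


def parse_ftyp(buf):
    """Decode an ftyp box: major brand, minor version, compatible brands."""
    size, box_type = parse_box_header(buf)
    major, minor = struct.unpack_from(">4sI", buf, 8)
    # Everything after the first 16 bytes is a list of 4-byte brands.
    brands = [buf[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return {
        "size": size,
        "type": box_type,
        "major_brand": major.decode("ascii"),
        "minor_version": minor,
        "compatible_brands": brands,
    }


ftyp = parse_ftyp(FTYP)
print(ftyp)
```

The same header layout applies to every atom that follows, so `parse_box_header` could be called repeatedly, advancing by `size` each time, to walk sibling atoms.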
## moov
[0x00000020 -> 0x00004A3D]
00 00 4A 1E | Size | 18974
6D 6F 6F 76 | Type | 'moov'
...........
...........
## moov->mvhd
[0x00000028 -> 0x00000093]
00 00 00 6C 4| Size | 108
6D 76 68 64 4| Type | 'mvhd'
00 1| Version |
00 00 00 3| Flags |
D1 EA 27 21 4| Creation time |
D1 EA 27 21 4| Modification time |
00 00 02 58 4| Time scale | 600
00 00 47 8C 4| Duration | 18316 (18316 / 600 ≈ 30.53 s)
00 01 00 00 4| Preferred rate | 1.0 (16.16 fixed point)
01 00 2| Preferred volume | 1.0 (8.8 fixed point)
00 00 10| Reserved
00 00 00 00 | Reserved
00 00 00 00 | Reserved
00 01 00 00 36| Matrix structure 1
00 00 00 00 | Matrix structure 2
00 00 00 00 | Matrix structure 3
00 00 00 00 | Matrix structure 4
00 01 00 00 | Matrix structure 5
00 00 00 00 | Matrix structure 6
00 00 00 00 | Matrix structure 7
00 00 00 00 | Matrix structure 8
40 00 00 00 | Matrix structure 9
00 00 00 00 4| Preview time
00 00 00 00 4| Preview duration
00 00 00 00 4| Poster time
00 00 00 00 4| Selection time
00 00 00 00 4| Selection duration
00 00 00 00 4| Current time
00 00 00 03 4| Next track ID
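The time fields in the mvhd dump above can be turned into human-readable values with a few lines. This is a sketch that assumes the version 0 layout shown here, where creation time counts seconds since 1904-01-01 UTC and duration is expressed in time-scale units; the constant and function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the mvhd dump above.
CREATION_TIME = 0xD1EA2721
TIME_SCALE = 0x00000258
DURATION = 0x0000478C

# mvhd timestamps count seconds from midnight, January 1, 1904 (UTC).
MAC_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def mac_time(seconds):
    """Convert a 1904-based timestamp to an aware datetime."""
    return MAC_EPOCH + timedelta(seconds=seconds)


created = mac_time(CREATION_TIME)
duration_seconds = DURATION / TIME_SCALE

print(created.isoformat())
print(f"{duration_seconds:.3f} s")
```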
## moov->iods
[0x00000094 -> 0x000000BD]
00 00 00 2A | Size | 42
69 6F 64 73 | Type | 'iods'
00 00 00 00
10 80 80 80
19 00 4F FF
FF 29 7F FF
0E 80 80 80
04 00 00 00
01 0E 80 80
80 04 00 00
00 02
## moov->trak [container atom]
[0x000000BE -> 0x00003170]
00 00 30 B3 | Size | 12467
74 72 61 6B | Type | 'trak'
...........
...........
## moov->trak->tkhd
00 00 00 5C 4| Size | 92
74 6B 68 64 4| Type | 'tkhd'
00 1| Version
00 00 07 3| Flags
D1 EA 27 21 4| Creation time
D1 EA 27 21 4| Modification time
00 00 00 01 4| Track ID
00 00 00 00 4| Reserved
00 00 46 64 4| Duration
00 00 00 00 8| Reserved 1
00 00 00 00 | Reserved 2
00 00 2| Layer
00 00 2| Alternate group
00 00 2| Volume
00 00 2| Reserved
00 01 00 00 36| Matrix structure
00 00 00 00 | Matrix structure
00 00 00 00 | Matrix structure
00 00 00 00 | Matrix structure
00 01 00 00 | Matrix structure
00 00 00 00 | Matrix structure
00 00 00 00 | Matrix structure
00 00 00 00 | Matrix structure
40 00 00 00 | Matrix structure
02 80 00 00 | Track width | 0x02800000 >> 16 = 0x0280 = 640
01 68 00 00 | Track height | 0x01680000 >> 16 = 0x0168 = 360
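The width and height above are 16.16 fixed-point numbers, which is why the dump shifts them right by 16 bits. A small sketch of that conversion follows; it also converts the tkhd duration to seconds, assuming it is expressed in the movie time scale (600) from mvhd. The names are illustrative.

```python
# Values copied from the tkhd dump above.
TRACK_WIDTH = 0x02800000
TRACK_HEIGHT = 0x01680000
TKHD_DURATION = 0x00004664
MOVIE_TIME_SCALE = 600  # taken from the mvhd atom


def fixed_16_16(raw):
    """Interpret a 32-bit value as 16 integer bits and 16 fractional bits."""
    return raw / 65536


width = fixed_16_16(TRACK_WIDTH)
height = fixed_16_16(TRACK_HEIGHT)
track_seconds = TKHD_DURATION / MOVIE_TIME_SCALE

print(width, height, round(track_seconds, 3))
```

Dividing by 65536 rather than shifting keeps any fractional part, which matters for values such as non-integer display sizes.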
## moov->trak->edts [container atom]
00 00 00 24 4| Size | 36
65 64 74 73 4| Type | 'edts'
## moov->trak->edts->elst
00 00 00 1C 4| Size | 28
65 6C 73 74 4| Type | 'elst'
00 1| Version
00 00 00 3| Flags
00 00 00 01 4| Number of entries
00 00 46 64 4| Track duration | 18020
00 00 00 02 4| Media time | 2
00 01 00 00 4| Media rate | 1.0 (16.16 fixed point)
## moov->trak->mdia [container atom]
00 00 30 2B 4| Size | 12331
6D 64 69 61 4| Type | 'mdia'
## moov->trak->mdia->mdhd
00 00 00 20 4| Size | 32
6D 64 68 64 4| Type | 'mdhd'
00 1| Version
00 00 00 3| Flags
D1 EA 27 21 4| Creation time
D1 EA 27 21 4| Modification time
00 00 00 1E 4| Time scale | 30
00 00 03 85 4| Duration | 901
55 C4 2| Language | 'und' (https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap4/qtff4.html#//apple_ref/doc/uid/TP40000939-CH206-27005)
00 00 2| Quality
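The two language bytes above are a packed ISO 639-2/T code: three 5-bit values, each storing a lowercase letter minus 0x60. The sketch below unpacks them; it assumes the packed form is in use (QuickTime files may instead carry a legacy Macintosh language code for small values), and `decode_language` is an illustrative name.

```python
def decode_language(code):
    """Unpack three 5-bit letters, each stored as (ASCII code - 0x60)."""
    return "".join(chr(((code >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))


language = decode_language(0x55C4)
print(language)
```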
## moov->trak->mdia->hdlr
00 00 00 36 4| Size | 54
68 64 6C 72 4| Type | 'hdlr'
00 1| Version
00 00 00 3| Flags
00 00 00 00 4| Component type |
76 69 64 65 4| Component subtype | 'vide'
00 00 00 00 4| Component manufacturer
00 00 00 00 4| Component flags
00 00 00 00 4| Component flags mask
4C 2D 53 4D ~| Component name | 'L-SM'
41 53 48 20 | 'ASH '
56 69 64 65 | 'Vide'
6F 20 48 61 | 'o Ha'
6E 64 6C 65 | 'ndle'
72 00 | 'r\0'
## moov->trak->mdia->minf [container atom]
00 00 2F CD 4| Size | 12237
6D 69 6E 66 4| Type | 'minf'
...........
## moov->trak->mdia->minf->vmhd
00 00 00 14 4| Size | 20
76 6D 68 64 4| Type | 'vmhd'
00 1| Version
00 00 01 3| Flags
00 00 2| Graphics mode
00 00 6| Opcolor
00 00 00 00
## moov->trak->mdia->minf->dinf [container atom]
00 00 00 24 4| Size | 36
64 69 6E 66 4| Type | 'dinf'
## moov->trak->mdia->minf->dinf->dref
00 00 00 1C 4| Size | 28
64 72 65 66 4| Type | 'dref'
00 1| Version
00 00 00 3| Flags
00 00 00 01 4| Number of entries
00 00 00 0C ~| Size | 12
75 72 6C 20 | Type | 'url '
00 00 00 01 | Version and flags | flag 0x000001: the media data is in the same file
## moov->trak->mdia->minf->stbl [container atom]
00 00 2F 8D 4| Size | 12173
73 74 62 6C 4| Type | 'stbl'
Sample Description Atoms
## moov->trak->mdia->minf->stbl->stsd
00 00 00 AC 4| Size | 172
73 74 73 64 4| Type | 'stsd'
00 1| Version
00 00 00 3| Flags
00 00 00 01 4| Number of entries
00 00 00 9C ~| Size | 156
61 76 63 31 | Data format | 'avc1' -> H.264 video (https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap3/qtff3.html#//apple_ref/doc/uid/TP40000939-CH205-SW1)
00 00 00 00 6| Reserved
00 00
00 01 2| Data reference index
00 00 2| Version
00 00 2| Revision level
00 00 00 00 4| Vendor
00 00 00 00 4| Temporal quality
00 00 00 00 4| Spatial quality
02 80 2| Width | 640
01 68 2| Height | 360
00 48 00 00 4| Horizontal resolution | 72 dpi (16.16 fixed point)
00 48 00 00 4| Vertical resolution | 72 dpi (16.16 fixed point)
00 00 00 00 4| Data size
00 01 2| Frame count per sample
0A 41 32| Compressor name | length 10, 'AVC Coding'
56 43 20 43 |
6F 64 69 6E |
67 00 00 00 |
00 00 00 00 |
00 00 00 00 |
00 00 00 00 |
00 00 00 00 |
00 00
00 00 2| Depth
FF FF 2| Color table ID
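00 00 00 36 4| Size | 54
61 76 63 43 4| Type | 'avcC'
01 1| Version
64 1| AVCProfileIndication | 100 (High)
00 1| Profile compatibility
1F 1| AVCLevelIndication | 31 (level 3.1)
FF | NALU length size | (0xFF & 0x03) + 1 = 4 bytes
E1 | Number of SPS | 0xE1 & 0x1F = 1
00 1A | SPS length | 26
67 64 | SPS data
00 1F AC D9 | .......
40 A0 2F F9 | .......
70 11 00 00 | .......
03 00 01 00 | .......
00 03 00 3C | .......
0F 18 31 96 | SPS data
01 | Number of PPS | 1
00 05 | PPS length | 5
68 | PPS data
EB EC B2 2C | PPS data
FD F8 F8 00 | High profile extension | chroma_format = 1, bit_depth_luma_minus8 = 0, bit_depth_chroma_minus8 = 0, no SPS extensions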
00 00 00 10 4| Size | 16
70 61 73 70 4| Type | 'pasp'
00 00 00 01 4| Horizontal spacing | 1
00 00 00 01 4| Vertical spacing | 1
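The avcC payload in the stsd dump above carries the decoder configuration: profile, level, NALU length size, and the SPS/PPS parameter sets. The following is a rough sketch of reading it, using the bytes from the dump (everything after the 8-byte avcC header). `parse_avcc` is an illustrative helper, not a library API, and it ignores the trailing High-profile extension bytes.

```python
import struct

# avcC payload copied from the dump above (46 bytes after the box header).
AVCC = bytes.fromhex(
    "01 64 00 1F FF E1 00 1A "
    "67 64 00 1F AC D9 40 A0 2F F9 70 11 00 00 "
    "03 00 01 00 00 03 00 3C 0F 18 31 96 "
    "01 00 05 68 EB EC B2 2C "
    "FD F8 F8 00"
)


def read_parameter_sets(buf, pos, count):
    """Read `count` entries of (16-bit length, data); return (sets, new position)."""
    sets = []
    for _ in range(count):
        (length,) = struct.unpack_from(">H", buf, pos)
        sets.append(buf[pos + 2:pos + 2 + length])
        pos += 2 + length
    return sets, pos


def parse_avcc(buf):
    """Decode the fixed header and the SPS/PPS lists of an avcC payload."""
    version, profile, compat, level, length_byte, sps_byte = struct.unpack_from(">6B", buf, 0)
    sps, pos = read_parameter_sets(buf, 6, sps_byte & 0x1F)  # low 5 bits = SPS count
    pps, pos = read_parameter_sets(buf, pos + 1, buf[pos])   # full byte = PPS count
    return {
        "version": version,
        "profile": profile,
        "profile_compatibility": compat,
        "level": level,
        "nalu_length_size": (length_byte & 0x03) + 1,  # low 2 bits, stored minus one
        "sps": sps,
        "pps": pps,
    }


avcc = parse_avcc(AVCC)
print(avcc["profile"], avcc["level"], avcc["nalu_length_size"])
```

The NALU length size matters when reading samples from mdat: each NAL unit there is prefixed by a length field of that many bytes rather than by a start code.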
Time-to-Sample Atoms
Time-to-sample atoms store duration information for a media’s samples, providing a mapping from a time in a media to the corresponding data sample. The time-to-sample atom has an atom type of ‘stts’.
## moov->trak->mdia->minf->stbl->stts
00 00 00 18 4| Size | 24
73 74 74 73 4| Type | 'stts'
00 1| Version |
00 00 00 3| Flags |
00 00 00 01 4| Number of entries | 1
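00 00 03 85 4| Sample count | 901 frames
00 00 00 01 4| Sample duration | 1 tick of the media time scale, i.e. 1/30 s ≈ 0.0333 s
Consecutive samples with the same duration can be stored in a single entry to save space.
If the first 400 frames had a duration of 1 and the remaining 501 frames had a duration of 2, the table would look like this: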
Sample Count | Sample duration |
---|---|
400 | 1 |
501 | 2 |
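To make the run-length idea concrete, here is a small sketch that expands stts entries into one decode time per sample. It is run on both the real single-entry table from this file and the hypothetical 400/501 table above; `decode_times` is an illustrative name.

```python
def decode_times(stts_entries):
    """Expand (sample_count, sample_duration) runs into per-sample decode times.

    Returns the list of decode times and the total duration, both in
    time-scale units.
    """
    times = []
    current = 0
    for count, duration in stts_entries:
        for _ in range(count):
            times.append(current)
            current += duration
    return times, current


# The table actually stored in this file: one run of 901 samples.
real_times, real_total = decode_times([(901, 1)])

# The hypothetical two-entry table from the text above.
hypo_times, hypo_total = decode_times([(400, 1), (501, 2)])

print(real_times[-1], real_total)
print(hypo_times[399], hypo_times[400], hypo_times[401], hypo_total)
```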
Composition Offset Atom
Video samples in encoded formats have a decode order and a presentation order (also called composition order or display order). The composition offset atom is used when there are out-of-order video samples.
## moov->trak->mdia->minf->stbl->ctts
00 00 1B E8 4| Size | 7144
63 74 74 73 4| Type | 'ctts'
00 1| Version
00 00 00 3| Flags
00 00 03 7B 4| Entry count | 891
00 00 00 01 4| Sample count |
00 00 00 02 4| Composition offset
00 00 00 01 4| Sample count
00 00 00 05 4| Composition offset
...........
...........
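The ctts entries are run-length encoded in the same way as stts, and each offset is added to a sample's decode time to obtain its presentation time. Below is a sketch using only the first two entries visible in the dump (the rest are elided); the decode times 0 and 1 follow from the stts atom, and `composition_times` is an illustrative name.

```python
def composition_times(decode_times, ctts_entries):
    """Add (sample_count, composition_offset) runs to a list of decode times."""
    offsets = [offset for count, offset in ctts_entries for _ in range(count)]
    return [dts + offset for dts, offset in zip(decode_times, offsets)]


# First two samples only: decode times 0 and 1, offsets 2 and 5 from the dump.
first_pts = composition_times([0, 1], [(1, 2), (1, 5)])
print(first_pts)
```

The first presentation time is 2 rather than 0, which is presumably why the elst entry earlier carries a media time of 2: the edit list shifts playback so that the first displayed frame lands at time 0.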
Sync Sample Atoms
The sync sample atom identifies the key frames in the media. In a media that contains compressed data, key frames define starting points for portions of a temporally compressed sequence. The key frame is self-contained—that is, it is independent of preceding frames. Subsequent frames may depend on the key frame.
The sync sample atom provides a compact marking of the random access points within a stream. The table is arranged in strictly increasing order of sample number. If this table is not present, every sample is implicitly a random access point.
## moov->trak->mdia->minf->stbl->stss
00 00 00 3C 4| Size | 60
73 74 73 73 4| Type | 'stss'
00 1| Version
00 00 00 3| Flags
00 00 00 0B 4| Number of entries | 11 (the video has 11 key frames)
00 00 00 01 4| Sync sample 1 | 1
00 00 00 5B 4| Sync sample 2 | 91
00 00 00 B5 4| Sync sample 3 | 181
00 00 01 0F 4| Sync sample 4 | 271
00 00 01 69 4| Sync sample 5 | 361
00 00 01 C3 4| Sync sample 6 | 451
00 00 02 1D 4| Sync sample 7 | 541
00 00 02 77 4| Sync sample 8 | 631
00 00 02 D1 4| Sync sample 9 | 721
00 00 03 2B 4| Sync sample 10 | 811
00 00 03 85 4| Sync sample 11 | 901
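Because the table is sorted, a player that wants to seek to an arbitrary sample can binary-search it for the nearest preceding key frame and start decoding there. A minimal sketch with the 11 sample numbers from the dump above; `previous_sync_sample` is an illustrative name.

```python
from bisect import bisect_right

# Sync sample numbers from the stss dump above (1-based).
SYNC_SAMPLES = [1, 91, 181, 271, 361, 451, 541, 631, 721, 811, 901]


def previous_sync_sample(sample_number, sync_samples=SYNC_SAMPLES):
    """Return the last key frame at or before `sample_number` (1-based)."""
    index = bisect_right(sync_samples, sample_number)
    if index == 0:
        raise ValueError("no key frame at or before this sample")
    return sync_samples[index - 1]


print(previous_sync_sample(100))
print(previous_sync_sample(450))
```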
Sample Dependency Flags Atom
## moov->trak->mdia->minf->stbl->sdtp
00 00 03 91 4| Size | 913
73 64 74 70 4| Type | 'sdtp'
00 1| Version
00 00 00 3| Flags
A6 ~| Sample dependency flags
96 |
96 |
9A |
9A |
96 |
96 |
9A |
...........
...........
...........
QuickTime format (for each video sample):
- bit[7] - reserved, set to 1
- bit[6] - set to 1 if the POC of the current frame might be greater than the POC of the next frame (frame reordering takes place)
- bit[5] - 1 for an I-picture, otherwise 0
- bit[4] - 1 for a non-I-picture, otherwise 0
- bit[3] - 1 if ref_idc of the slice NALU is zero, otherwise 0
- bit[2] - 1 if ref_idc of the slice NALU is non-zero, otherwise 0
- bit[1] - 0 if the picture is redundant, otherwise 1 (redundant pictures are highly unlikely in MP4 files, so this bit is rarely 0)
MP4 format (for each video sample):
- bit[1:0] - set to 10b, which implies that no redundant pictures are present
- bit[3:2] - set to 10b if ref_idc of the current frame is 0, otherwise 01b
- bit[5:4] - set to 10b if the current frame is an I-picture, otherwise 01b
This media file uses the MP4 format, so:
[A6] 10 10 01 10 (I-frame)
[10] --> no redundant pictures are present
[01]------> ref_idc != 0
[10]---------> I-picture
[96] 10 01 01 10 (P-frame)
[10] --> no redundant pictures are present
[01]------> ref_idc != 0
[01]---------> not I-picture
[9A] 10 01 10 10 (B-frame)
[10] --> no redundant pictures are present
[10]------> ref_idc == 0
[01]---------> not I-picture
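The bit breakdown above can be sketched as a small decoder. It follows the MP4 layout listed earlier and reproduces the I/P/B labels used in this article; note that labelling every non-reference, non-I picture as "B" is a simplifying assumption (a reference B-frame would be reported as "P"). The function names are illustrative.

```python
def decode_sdtp(flags):
    """Split one sdtp byte into the three 2-bit fields described above."""
    redundancy_bits = flags & 0x03
    reference_bits = (flags >> 2) & 0x03
    picture_bits = (flags >> 4) & 0x03
    return {
        "is_i_picture": picture_bits == 0b10,
        "is_reference": reference_bits == 0b01,       # ref_idc != 0
        "no_redundant_pictures": redundancy_bits == 0b10,
    }


def frame_kind(flags):
    """Heuristic frame label based on the decoded fields."""
    fields = decode_sdtp(flags)
    if fields["is_i_picture"]:
        return "I"
    return "P" if fields["is_reference"] else "B"


# The first bytes of the sdtp dump above.
kinds = [frame_kind(b) for b in (0xA6, 0x96, 0x96, 0x9A, 0x9A, 0x96, 0x96, 0x9A)]
print("".join(kinds))
```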
Sample-to-Chunk Atoms
As samples are added to a media, they are collected into chunks that allow optimized data access. A chunk contains one or more samples. Chunks in a media may have different sizes, and the samples within a chunk may have different sizes. The sample-to-chunk atom stores chunk information for the samples in a media.
## moov->trak->mdia->minf->stbl->stsc
00 00 00 28 4| Size | 40
73 74 73 63 4| Type | 'stsc'
00 1| Version
00 00 00 3| Flags
00 00 00 02 4| Number of entries
00 00 00 01 4| First chunk
00 00 00 1F 4| Samples per chunk
00 00 00 01 4| Sample description ID
00 00 00 1E 4| First chunk
00 00 00 02 4| Samples per chunk
00 00 00 01 4| Sample description ID
First chunk | Samples per chunk | Sample description ID |
---|---|---|
chunk 1 | 31 | 1 |
chunk 30 | 2 | 1 |
This means:
- chunk 1 -> chunk 29: each chunk contains 31 samples
- chunk 30 -> last chunk: each chunk contains 2 samples
We saw earlier that the video has 901 samples in total, so we can work out:
901 = 29 * 31 + 2 * X --> X = 1
The video stream therefore has 30 chunks in total. The stco atom later in this article confirms that there are indeed 30 chunks.
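The chunk-count arithmetic above can be checked with a short sketch. It walks the stsc entries, takes the number of chunks in each run from the next entry's first chunk, and derives the length of the final run from the samples that remain. `count_chunks` is an illustrative name, and the code assumes the remaining samples divide evenly into the last run.

```python
def count_chunks(stsc_entries, total_samples):
    """Count chunks from (first_chunk, samples_per_chunk, description_id) entries."""
    chunks = 0
    remaining = total_samples
    for i, (first_chunk, per_chunk, _description_id) in enumerate(stsc_entries):
        if i + 1 < len(stsc_entries):
            # This run lasts until the next entry's first chunk.
            run = stsc_entries[i + 1][0] - first_chunk
        else:
            # The last run covers whatever samples are left.
            run, leftover = divmod(remaining, per_chunk)
            if leftover:
                raise ValueError("samples do not fill the last run evenly")
        chunks += run
        remaining -= run * per_chunk
    return chunks


total_chunks = count_chunks([(1, 31, 1), (30, 2, 1)], 901)
print(total_chunks)
```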
Sample Size Atoms
You use sample size atoms to specify the size of each sample in the media. Sample size atoms have an atom type of stsz.
## moov->trak->mdia->minf->stbl->stsz
00 00 0E 28 4| Size | 3624
73 74 73 7A 4| Type | 'stsz'
00 1| Version
00 00 00 3| Flags
00 00 00 00 4| Sample size | 0 (sizes differ, so each sample has its own entry below)
00 00 03 85 4| Number of entries | 901
00 00 88 08 4| Size of sample 1 | 34824
00 00 17 4C 4| Size of sample 2 | 5964
00 00 02 1C 4| Size of sample 3 | 540
...........
Chunk Offset Atoms
Chunk offset atoms identify the location of each chunk of data in the media’s data stream. Chunk offset atoms have an atom type of ‘stco’.
## moov->trak->mdia->minf->stbl->stco
00 00 00 88 4| Size | 136
73 74 63 6F 4| Type | 'stco'
00 1| Version
00 00 00 3| Flags
00 00 00 1E 4| Number of entries | 30
00 00 4A 4E 4| Chunk 1 | 19022
00 02 4C 51
00 03 73 C1
00 05 7E 35
00 06 D8 CB
00 08 89 43
00 0A B2 24
00 0C 2D 62
00 0D E3 A2
00 0F CB 14
00 11 5A E7
00 13 22 3D
00 14 FD 56
00 16 93 0F
00 18 2B F1
00 19 EA B2
00 1B 59 0A
00 1C BC 7C
00 1E 66 AC
00 1F A1 97
00 21 01 F8
00 22 87 FF
00 23 DE 28
00 25 44 47
00 26 D1 50
00 28 62 BD
00 2A 0D 22
00 2B D3 15
00 2D 77 4A
00 2F 13 F0 4| Chunk 30 | 3085296
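Sample Group Description Atom
## moov->trak->mdia->minf->stbl->sgpd
00 00 00 18 4| Size | 24
73 67 70 64 4| Type | 'sgpd'
01 1| Version
00 00 00 3| Flags
72 6F 6C 6C 4| Grouping type | 'roll'
00 00 00 02 4| Default length | 2
00 00 00 00 4| Entry count | 0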
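Sample-to-Group Atom
## moov->trak->mdia->minf->stbl->sbgp
00 00 00 1C 4| Size | 28
73 62 67 70 4| Type | 'sbgp'
00 1| Version
00 00 00 3| Flags
72 6F 6C 6C 4| Grouping type | 'roll'
00 00 00 01 4| Entry count | 1
00 00 03 85 4| Sample count | 901
00 00 00 00 4| Group description index | 0 (these samples belong to no group of this type)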
Those are all the atoms related to the video stream. To summarize:
moov->trak->mdia->mdhd
- Time scale --> the video's time scale / time base, 30
- Duration --> the video's duration, 901
moov->trak->mdia->minf->stbl->stsd
- Data format --> 'avc1' --> H.264
- Width --> 640
- Height --> 360
moov->trak->mdia->minf->stbl->stts
- Number of entries --> 1
- Sample count --> total number of frames, 901
- Sample duration --> the duration of each video frame --> from this we can compute the DTS of each frame
Sample index | Sample duration | Sample dts |
---|---|---|
0 | 1 | 0 |
1 | 1 | 1 |
2 | 1 | 2 |
3 | 1 | 3 |
… | … | … |
900 | 1 | 900 |
The last frame has dts = 900 and the time scale (time base) is 30, so it is decoded at 900 / 30 = 30 s; including its own duration, the track lasts 901 / 30 ≈ 30.03 s.
moov->trak->mdia->minf->stbl->ctts
- all entries --> the composition offset ∆ of each frame, which turns its DTS into a PTS
PTS = DTS + ∆
Sample index | Sample dts | Composition offset | Sample pts |
---|---|---|---|
0 | 0 | 2 | 2 |
1 | 1 | 5 | 6 |
… | … | … | … |
moov->trak->mdia->minf->stbl->stss
- Number of entries --> the number of key frames in the video
- ~ --> the sample numbers of all key frames
moov->trak->mdia->minf->stbl->stsz
- Number of entries --> total number of frames
- ~ --> the size of each video frame
moov->trak->mdia->minf->stbl->stsc
moov->trak->mdia->minf->stbl->stco
moov->trak->mdia->minf->stbl->stsz
- ~ --> together these give the file offset of each sample
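As a sketch of that last step, the function below combines the three tables: stsc says how many samples each chunk holds, stco gives where each chunk starts, and stsz gives each sample's size, so a sample's offset is its chunk's offset plus the sizes of the samples before it in the same chunk. The input values here are made up for illustration (the real tables are partly elided above), and `sample_offsets` is an illustrative name.

```python
def sample_offsets(stsc_entries, chunk_offsets, sample_sizes):
    """Return the absolute file offset of every sample.

    stsc_entries:  (first_chunk, samples_per_chunk, description_id), 1-based chunks
    chunk_offsets: stco values, one per chunk
    sample_sizes:  stsz values, one per sample
    """
    offsets = []
    sample = 0
    entry = 0
    for chunk_number, chunk_offset in enumerate(chunk_offsets, start=1):
        # Switch to the next stsc entry once its first chunk is reached.
        while entry + 1 < len(stsc_entries) and stsc_entries[entry + 1][0] <= chunk_number:
            entry += 1
        position = chunk_offset
        for _ in range(stsc_entries[entry][1]):
            offsets.append(position)
            position += sample_sizes[sample]
            sample += 1
    return offsets


# Hypothetical layout: chunks 1-2 hold 3 samples each, chunk 3 holds 2.
toy_offsets = sample_offsets(
    stsc_entries=[(1, 3, 1), (3, 2, 1)],
    chunk_offsets=[100, 400, 900],
    sample_sizes=[10, 20, 30, 40, 50, 60, 70, 80],
)
print(toy_offsets)
```

Applied to the file analysed here, the first sample would start at the first stco offset, 0x4A4E, and the second at 0x4A4E + 0x8808 = 0xD256, since the first sample is 0x8808 bytes long.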