MP4 File Format Parsing

May 25, 2021 · 3065 words · 7 min read

References:

The MP4 file format documentation already covers all of this in great detail.

This post parses a real MP4 file and takes a brief look at each atom/box it contains.
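Every atom starts with a 4-byte big-endian size and a 4-byte type, so the whole file can be walked generically. Below is a minimal Python sketch of such a walker. It assumes the file fits in memory. It only descends into the container atoms listed in `CONTAINERS`, which is a hypothetical set covering the atoms in this post, not a complete list. It also handles the two special size values defined by ISO/IEC 14496-12: `1` means a 64-bit size follows the type, and `0` means the atom runs to the end of its enclosing range.

```python
import struct
from typing import Iterator, Optional, Tuple

# Hypothetical set: container atoms that appear in this post. Atoms such as
# stsd or dref also hold children, but after extra fields, so they are not listed.
CONTAINERS = {b"moov", b"trak", b"edts", b"mdia", b"minf", b"dinf", b"stbl"}


def walk_boxes(data: bytes, start: int = 0, end: Optional[int] = None,
               path: str = "") -> Iterator[Tuple[str, int, int]]:
    """Yield (path, offset, size) for each atom in data[start:end], depth first."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, typ = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # the atom extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"bad atom size {size} at offset {pos:#x}")
        name = f"{path}->{typ.decode('latin-1')}" if path else typ.decode("latin-1")
        yield name, pos, size
        if typ in CONTAINERS:
            yield from walk_boxes(data, pos + header, pos + size, name)
        pos += size
```

For a real file, something like `for p, off, size in walk_boxes(open("sample.mp4", "rb").read()): print(p, hex(off), size)` should list the same atom tree as this post. `sample.mp4` is a placeholder name.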

## ftyp
[0x00000000 -> 0x0000001F]

00 00 00 20    | Size              | 32
66 74 79 70    | Type              | 'ftyp'
6D 70 34 32    | Major_Brand       | 'mp42'
00 00 00 00    | Minor_Version     | 0
6D 70 34 32    | Compatible_Brands | 'mp42'
6D 70 34 31    | Compatible_Brands | 'mp41'
69 73 6F 6D    | Compatible_Brands | 'isom'
61 76 63 31    | Compatible_Brands | 'avc1'
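As a small check of the layout above, this sketch decodes the 32 ftyp bytes. After the 8-byte header come the major brand and the minor version. Then come as many 4-byte compatible brands as fit in the remaining size. `parse_ftyp` is a hypothetical helper name, not part of any library.

```python
import struct

# The 32 bytes of the ftyp atom from the dump above.
FTYP = bytes.fromhex(
    "00000020 66747970 6d703432 00000000 "
    "6d703432 6d703431 69736f6d 61766331"
)


def parse_ftyp(buf: bytes) -> dict:
    """Decode an ftyp atom: header, major brand, minor version, compatible brands."""
    size, typ = struct.unpack_from(">I4s", buf, 0)
    if typ != b"ftyp":
        raise ValueError(f"expected 'ftyp', got {typ!r}")
    major, minor = struct.unpack_from(">4sI", buf, 8)
    # Everything after the 16 fixed bytes is a list of 4-byte brand codes.
    brands = [buf[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return {
        "size": size,
        "major_brand": major.decode("ascii"),
        "minor_version": minor,
        "compatible_brands": brands,
    }


print(parse_ftyp(FTYP))
# {'size': 32, 'major_brand': 'mp42', 'minor_version': 0, 'compatible_brands': ['mp42', 'mp41', 'isom', 'avc1']}
```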
## moov 
[0x00000020 -> 0x00004A3D]
00 00 4A 1E    | Size              | 18974
6D 6F 6F 76    | Type              | 'moov'
...........
...........
## moov->mvhd 
[0x00000028 -> 0x00000093]
00 00 00 6C    4| Size              | 108
6D 76 68 64    4| Type              | 'mvhd'
00             1| Version           |
   00 00 00    3| Flags             |
D1 EA 27 21    4| Creation time     | 
D1 EA 27 21    4| Modification time |
00 00 02 58    4| Time scale        | 600
00 00 47 8C    4| Duration          | 18316 (18316 / 600 ≈ 30.53 s)
00 01 00 00    4| Preferred rate    | 1.0
01 00          2| Preferred volume  | 1.0
      00 00   10| Reserved 
00 00 00 00     | Reserved 
00 00 00 00     | Reserved 
00 01 00 00   36| Matrix structure 1
00 00 00 00     | Matrix structure 2
00 00 00 00     | Matrix structure 3
00 00 00 00     | Matrix structure 4
00 01 00 00     | Matrix structure 5
00 00 00 00     | Matrix structure 6
00 00 00 00     | Matrix structure 7
00 00 00 00     | Matrix structure 8
40 00 00 00     | Matrix structure 9
00 00 00 00    4| Preview time
00 00 00 00    4| Preview duration 
00 00 00 00    4| Poster time
00 00 00 00    4| Selection time
00 00 00 00    4| Selection duration
00 00 00 00    4| Current time
00 00 00 03    4| Next track ID
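The version-0 mvhd fields sit at fixed offsets, so they can be read directly. The sketch below uses a hypothetical helper, `parse_mvhd`, that handles version 0 only. It rebuilds the 108 bytes from the dump and decodes:

- the time scale and duration;
- the 16.16 fixed-point rate and the 8.8 fixed-point volume;
- the creation time, which counts seconds since midnight, January 1, 1904 (UTC).

```python
import struct
from datetime import datetime, timedelta, timezone

# The 108-byte mvhd atom from the dump above (version 0).
MVHD = bytes.fromhex(
    "0000006c 6d766864 00000000 d1ea2721 d1ea2721 00000258 0000478c "
    "00010000 0100 0000 00000000 00000000 "
    "00010000 00000000 00000000 00000000 00010000 00000000 "
    "00000000 00000000 40000000 "
    "00000000 00000000 00000000 00000000 00000000 00000000 00000003"
)

MAC_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def parse_mvhd(buf: bytes) -> dict:
    """Decode the main fields of a version-0 mvhd atom (32-bit times and duration)."""
    if buf[4:8] != b"mvhd" or buf[8] != 0:
        raise ValueError("this sketch only handles version-0 mvhd atoms")
    ctime, mtime, timescale, duration = struct.unpack_from(">IIII", buf, 12)
    rate, volume = struct.unpack_from(">ih", buf, 28)  # 16.16 and 8.8 fixed point
    (next_track_id,) = struct.unpack_from(">I", buf, 104)
    return {
        "creation_time": MAC_EPOCH + timedelta(seconds=ctime),
        "modification_time": MAC_EPOCH + timedelta(seconds=mtime),
        "timescale": timescale,
        "duration": duration,
        "seconds": duration / timescale,
        "rate": rate / 0x10000,
        "volume": volume / 0x100,
        "next_track_id": next_track_id,
    }


m = parse_mvhd(MVHD)
print(m["timescale"], m["duration"], round(m["seconds"], 2))  # 600 18316 30.53
```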
## moov->iods
[0x00000094 -> 0x000000BD]
00 00 00 2A    | Size              | 0x2A = 42
69 6F 64 73    | Type              | 'iods'
00 00 00 00    | Version / Flags
10 80 80 80
19 00 4F FF
FF 29 7F FF
0E 80 80 80
04 00 00 00
01 0E 80 80
80 04 00 00
00 02      
## moov->trak  [container atom]
[0x000000BE -> 0x00003170]
00 00 30 B3    | Size              | 12467
74 72 61 6B    | Type              | 'trak'    
...........
...........
## moov->trak->tkhd
00 00 00 5C    4| Size              | 92
74 6B 68 64    4| Type              | 'tkhd'    
00             1| Version
   00 00 07    3| Flags 
D1 EA 27 21    4| Creation time
D1 EA 27 21    4| Modification time
00 00 00 01    4| Track ID
00 00 00 00    4| Reserved
00 00 46 64    4| Duration          | 18020 (in the movie time scale of 600: ≈ 30.03 s)
00 00 00 00    8| Reserved 1
00 00 00 00     | Reserved 2
00 00          2| Layer         
      00 00    2| Alternate group
00 00          2| Volume
      00 00    2| Reserved
00 01 00 00   36| Matrix structure
00 00 00 00     | Matrix structure
00 00 00 00     | Matrix structure
00 00 00 00     | Matrix structure
00 01 00 00     | Matrix structure
00 00 00 00     | Matrix structure
00 00 00 00     | Matrix structure
00 00 00 00     | Matrix structure
40 00 00 00     | Matrix structure
02 80 00 00     | Track width       |  0x02800000 >> 16 = 0x0280 = 640
01 68 00 00     | Track height      |  0x01680000 >> 16 = 0x0168 = 360
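The same fixed-point reading applies to the matrix and the track size. In the 3×3 matrix {a b u, c d v, x y w}, the entries a, b, c, d, x and y are 16.16 fixed point, while u, v and w are 2.30 fixed point. That is why the last entry `40 00 00 00` equals 1.0. The sketch below decodes both with hypothetical helpers `decode_matrix` and `track_size`.

```python
import struct


def decode_matrix(raw: bytes) -> list:
    """Decode the 36-byte matrix {a b u, c d v, x y w}.

    a, b, c, d, x, y are signed 16.16 fixed point; u, v, w are signed 2.30.
    """
    values = struct.unpack(">9i", raw)
    rows = []
    for r in range(3):
        row = values[3 * r:3 * r + 3]
        rows.append([row[0] / (1 << 16), row[1] / (1 << 16), row[2] / (1 << 30)])
    return rows


def track_size(raw: bytes) -> tuple:
    """Decode tkhd width and height (two unsigned 16.16 fixed-point numbers)."""
    width, height = struct.unpack(">II", raw)
    return width / (1 << 16), height / (1 << 16)


MATRIX = bytes.fromhex(
    "00010000 00000000 00000000 "
    "00000000 00010000 00000000 "
    "00000000 00000000 40000000"
)
print(decode_matrix(MATRIX))                           # identity matrix
print(track_size(bytes.fromhex("02800000 01680000")))  # (640.0, 360.0)
```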
## moov->trak->edts
## moov->trak->edts->elst
00 00 00 24    4| Size              | 36
65 64 74 73    4| Type              | 'edts'
00 00 00 1C    4| Size              | 28
65 6C 73 74    4| Type              | 'elst'
00             1| Version
   00 00 00    3| Flags
00 00 00 01    4| Number of entries
00 00 46 64    4| Track duration    | 18020 (movie time scale)
00 00 00 02    4| Media time        | 2 (media time scale)
00 01 00 00    4| Media rate        | 1.0
## moov->trak->mdia [container atom]
00 00 30 2B    4| Size              | 12331
6D 64 69 61    4| Type              | 'mdia'
## moov->trak->mdia->mdhd
00 00 00 20    4| Size              | 32
6D 64 68 64    4| Type              | 'mdhd'
00             1| Version
   00 00 00    3| Flags
D1 EA 27 21    4| Creation time
D1 EA 27 21    4| Modification time
00 00 00 1E    4| Time scale        | 30
00 00 03 85    4| Duration          | 901
55 C4          2| Language          | 'und' (https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap4/qtff4.html#//apple_ref/doc/uid/TP40000939-CH206-27005)
      00 00    2| Quality
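The 2-byte language value `55 C4` packs an ISO 639-2/T code. After one padding bit, there are three 5-bit values, each storing a letter's ASCII code minus 0x60. The sketch below (hypothetical helper names) unpacks it and converts the mdhd duration to seconds. QuickTime also allows small values (below 0x400) to be Macintosh language codes, which this sketch ignores.

```python
def decode_language(code: int) -> str:
    """Unpack an ISO 639-2/T code from a 16-bit mdhd language field.

    Bit 15 is padding; the remaining 15 bits hold three 5-bit values, each
    being the letter's ASCII code minus 0x60.
    """
    return "".join(chr(((code >> shift) & 0x1F) + 0x60) for shift in (10, 5, 0))


def to_seconds(duration: int, timescale: int) -> float:
    """Convert a duration expressed in time-scale units into seconds."""
    return duration / timescale


print(decode_language(0x55C4))        # und
print(round(to_seconds(901, 30), 3))  # 30.033
```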
## moov->trak->mdia->hdlr
00 00 00 36    4| Size              | 54
68 64 6C 72    4| Type              | 'hdlr'
00             1| Version
   00 00 00    3| Flags
00 00 00 00    4| Component type    |
76 69 64 65    4| Component subtype | 'vide'
00 00 00 00    4| Component manufacturer
00 00 00 00    4| Component flags
00 00 00 00    4| Component flags mask
4C 2D 53 4D    ~| 'L-SM' 
41 53 48 20     | 'ASH '
56 69 64 65     | 'Vide'
6F 20 48 61     | 'o Ha'
6E 64 6C 65     | 'ndle'
72 00           | 'r\0'
## moov->trak->mdia->minf [container atom]
00 00 2F CD   4| Size              | 12237
6D 69 6E 66   4| Type              | 'minf'
...........
## moov->trak->mdia->minf->vmhd
00 00 00 14   4| Size              | 20
76 6D 68 64   4| Type              | 'vmhd'
00            1| Version
   00 00 01   3| Flags
00 00         2| Graphics mode
      00 00   6| Opcolor
00 00 00 00
## moov->trak->mdia->minf->dinf [container atom]
## moov->trak->mdia->minf->dinf->dref
00 00 00 24   4| Size              | 36
64 69 6E 66   4| Type              | 'dinf'
00 00 00 1C   4| Size              | 28
64 72 65 66   4| Type              | 'dref'
00            1| Version
   00 00 00   3| Flags
00 00 00 01   4| Number of entries
00 00 00 0C   ~| Size             | 12
75 72 6C 20    | Type             | 'url '
00 00 00 01    | Version / Flags  | flag 1: the media data is in this same file
## moov->trak->mdia->minf->stbl [container atom]
00 00 2F 8D   4| Size             | 12173
73 74 62 6C   4| Type             | 'stbl'

Sample Description Atoms

## moov->trak->mdia->minf->stbl->stsd
00 00 00 AC   4| Size             | 172
73 74 73 64   4| Type             | 'stsd'
00            1| Version
   00 00 00   3| Flags
00 00 00 01   4| Number of entries
00 00 00 9C   ~| Size             | 156  
61 76 63 31    | Data format      | 'avc1' -> H.264 video (https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFChap3/qtff3.html#//apple_ref/doc/uid/TP40000939-CH205-SW1)
00 00 00 00   6| Reserved
00 00 
      00 01   2| Data reference index 
00 00         2| Version
      00 00   2| Revision level
00 00 00 00   4| Vendor
00 00 00 00   4| Temporal quality
00 00 00 00   4| Spatial quality
02 80         2| Width             | 640
      01 68   2| Height            | 360
00 48 00 00   4| Horizontal resolution | 72 dpi (16.16 fixed point)
00 48 00 00   4| Vertical resolution   | 72 dpi (16.16 fixed point)
00 00 00 00   4| Data size
00 01         2| Frame count per sample | 1
      0A 41  32| Compressor name  | length 10, 'AVC Coding'
56 43 20 43    |
6F 64 69 6E    |
67 00 00 00    | 
00 00 00 00    | 
00 00 00 00    | 
00 00 00 00    | 
00 00 00 00    |
00 00 
      00 00   2| Depth              
FF FF         2| Color table ID
      00 00   4| Size            | 0x00000036 = 54
00 36          |
      61 76   4| Type            | 'avcC'
63 43          |
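      01      1| configurationVersion | 1
         64   1| AVCProfileIndication | 100 (High profile)
00            1| profile_compatibility
   1F         1| AVCLevelIndication   | 31 (Level 3.1)
      FF      1| lengthSizeMinusOne   | 0xFF & 0x03 = 3 -> each NALU is prefixed by a 4-byte length
         E1   1| numOfSequenceParameterSets | 0xE1 & 0x1F = 1
00 1A         2| SPS length           | 26
      67 64    | SPS data (0x67 -> nal_unit_type 7, SPS)
00 1F AC D9    | .......
40 A0 2F F9    | .......
70 11 00 00    | .......
03 00 01 00    | .......
00 03 00 3C    | .......
0F 18 31 96    | SPS data (end)
01            1| numOfPictureParameterSets | 1
   00 05      2| PPS length           | 5
         68    | PPS data (0x68 -> nal_unit_type 8, PPS)
EB EC B2 2C    | PPS data (end)
FD F8 F8 00   4| High-profile extension: chroma_format 1 (4:2:0), bit depth luma 8, bit depth chroma 8, 0 SPS extensions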
00 00 00 10   4| Size            | 16
70 61 73 70   4| Type            | 'pasp'
00 00 00 01   4| hSpacing        | 1
00 00 00 01   4| vSpacing        | 1 -> square pixels (1:1)
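The avcC record annotated above can be parsed field by field. The sketch below uses hypothetical helpers `parse_avcc` and `read_param_sets`, and takes its input with the 8-byte size/type header already stripped. It follows the AVCDecoderConfigurationRecord layout from ISO/IEC 14496-15. The trailing chroma and bit-depth bytes are only read for the High-family profiles (100, 110, 122, 144), which is when that extension is present.

```python
import struct

# Payload of the avcC atom from the dump above (the 8-byte size/type header removed).
AVCC = bytes.fromhex(
    "01 64 00 1f ff e1 00 1a "
    "67 64 00 1f ac d9 40 a0 2f f9 70 11 00 00 03 00 01 00 00 03 00 3c 0f 18 31 96 "
    "01 00 05 68 eb ec b2 2c "
    "fd f8 f8 00"
)


def read_param_sets(p: bytes, pos: int, count: int):
    """Read `count` (16-bit length, bytes) parameter sets starting at pos."""
    out = []
    for _ in range(count):
        (n,) = struct.unpack_from(">H", p, pos)
        out.append(p[pos + 2:pos + 2 + n])
        pos += 2 + n
    return out, pos


def parse_avcc(p: bytes) -> dict:
    info = {
        "version": p[0],
        "profile": p[1],
        "compatibility": p[2],
        "level": p[3],
        "length_size": (p[4] & 0x03) + 1,  # bytes in each NALU length prefix
    }
    info["sps"], pos = read_param_sets(p, 6, p[5] & 0x1F)  # low 5 bits: SPS count
    num_pps = p[pos]
    info["pps"], pos = read_param_sets(p, pos + 1, num_pps)
    # High-family profiles append chroma format and bit depths.
    if info["profile"] in (100, 110, 122, 144) and pos + 4 <= len(p):
        info["chroma_format"] = p[pos] & 0x03
        info["bit_depth_luma"] = (p[pos + 1] & 0x07) + 8
        info["bit_depth_chroma"] = (p[pos + 2] & 0x07) + 8
    return info


a = parse_avcc(AVCC)
print(a["profile"], a["level"], a["length_size"])  # 100 31 4
```

A demuxer would typically use `length_size` to split each sample into NAL units. It would then prepend the SPS/PPS when converting to an Annex B byte stream.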

Media Data Atom Types

Time-to-Sample Atoms

Time-to-sample atoms store duration information for a media’s samples, providing a mapping from a time in a media to the corresponding data sample. The time-to-sample atom has an atom type of ‘stts’.

## moov->trak->mdia->minf->stbl->stts
00 00 00 18    4| Size              | 24
73 74 74 73    4| Type              | 'stts'
00             1| Version           |
   00 00 00    3| Flags             |
00 00 00 01    4| Number of entries | 1
00 00 03 85    4| Sample count      | 901
00 00 00 01    4| Sample duration   | 1, i.e. one unit of the time scale: 1/30 s ≈ 0.033333 s

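Consecutive samples with the same duration share a single entry, which saves space.

For example, if the first 400 frames had a duration of 1 and the following 501 frames a duration of 2, the table would be:

Sample count | Sample duration
400          | 1
501          | 2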
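Because stts is run-length encoded, the decoding time of each sample is the running sum of the durations before it. A minimal sketch follows; `stts_to_dts` is a hypothetical helper name. It expands the entries into per-sample decoding timestamps (DTS) in media time-scale units and also returns the total duration.

```python
def stts_to_dts(entries):
    """Expand stts (sample_count, sample_delta) entries into per-sample DTS values.

    The DTS of sample n is the sum of the deltas of all samples before it,
    in units of the media time scale. Also returns the total duration.
    """
    dts, t = [], 0
    for count, delta in entries:
        for _ in range(count):
            dts.append(t)
            t += delta
    return dts, t


dts, total = stts_to_dts([(901, 1)])  # the single entry in this file
print(dts[:4], dts[-1], total)        # [0, 1, 2, 3] 900 901
```

With a time scale of 30, the total of 901 matches the mdhd duration (≈ 30.03 s).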

Composition Offset Atom

Video samples in encoded formats have a decode order and a presentation order (also called composition order or display order). The composition offset atom is used when there are out-of-order video samples.

## moov->trak->mdia->minf->stbl->ctts
00 00 1B E8    4| Size            | 7144
63 74 74 73    4| Type            | 'ctts'
00             1| Version
   00 00 00    3| Flags
00 00 03 7B    4| Entry count     | 891
00 00 00 01    4| Sample count    |
00 00 00 02    4| Composition offset
00 00 00 01    4| Sample count
00 00 00 05    4| Composition offset
...........
...........
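The ctts entries use the same run-length layout as stts. Each sample's presentation time is its decoding time plus its composition offset. The sketch below uses a hypothetical helper, `apply_ctts`, and checks it against the first two entries in the dump, where sample 0 has offset 2 and sample 1 has offset 5. Per ISO/IEC 14496-12, offsets are unsigned in version 0 of the atom and signed in version 1; the sketch does not depend on which.

```python
def apply_ctts(dts, ctts_entries):
    """Add ctts composition offsets to decode times: PTS = DTS + offset.

    ctts_entries are (sample_count, composition_offset) pairs, run-length
    encoded like stts.
    """
    offsets = []
    for count, offset in ctts_entries:
        offsets.extend([offset] * count)
    if len(offsets) != len(dts):
        raise ValueError("ctts and stts describe different sample counts")
    return [d + o for d, o in zip(dts, offsets)]


# First two ctts entries from the dump: (1, 2) and (1, 5); their DTS are 0 and 1.
print(apply_ctts([0, 1], [(1, 2), (1, 5)]))  # [2, 6]
```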

Sync Sample Atoms

The sync sample atom identifies the key frames in the media. In a media that contains compressed data, key frames define starting points for portions of a temporally compressed sequence. The key frame is self-contained—that is, it is independent of preceding frames. Subsequent frames may depend on the key frame.

The sync sample atom provides a compact marking of the random access points within a stream. The table is arranged in strictly increasing order of sample number. If this table is not present, every sample is implicitly a random access point.

## moov->trak->mdia->minf->stbl->stss
00 00 00 3C    4| Size            | 60
73 74 73 73    4| Type            | 'stss'
00             1| Version
   00 00 00    3| Flags
00 00 00 0B    4| Number of entries   | 11 key frames in this video
00 00 00 01    4| Sample number 1     |   1
00 00 00 5B    4| Sample number 2     |  91
00 00 00 B5    4| Sample number 3     | 181
00 00 01 0F    4| Sample number 4     | 271
00 00 01 69    4| Sample number 5     | 361
00 00 01 C3    4| Sample number 6     | 451
00 00 02 1D    4| Sample number 7     | 541
00 00 02 77    4| Sample number 8     | 631
00 00 02 D1    4| Sample number 9     | 721
00 00 03 2B    4| Sample number 10    | 811
00 00 03 85    4| Sample number 11    | 901
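A typical use of stss is seeking: to decode sample n, a player starts from the last sync sample at or before n. Here is a minimal sketch using a binary search over the table above; `sync_sample_for` is a hypothetical helper name. Following the text, a missing stss atom (passed as `None`) means every sample is a sync sample.

```python
import bisect
from typing import List, Optional

# Sample numbers (1-based) from the stss dump above.
KEYFRAMES = [1, 91, 181, 271, 361, 451, 541, 631, 721, 811, 901]


def sync_sample_for(sample: int, sync_samples: Optional[List[int]]) -> int:
    """Return the last sync sample at or before `sample` (both 1-based)."""
    if sync_samples is None:  # no stss atom: every sample is a sync sample
        return sample
    i = bisect.bisect_right(sync_samples, sample)
    if i == 0:
        raise ValueError("no sync sample at or before this sample")
    return sync_samples[i - 1]


print(sync_sample_for(100, KEYFRAMES))  # 91
```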

Sample Dependency Flags Atom

## moov->trak->mdia->minf->stbl->sdtp
00 00 03 91    4| Size             | 913
73 64 74 70    4| Type             | 'sdtp'
00             1| Version
   00 00 00    3| Flags
A6             ~| Sample dependency flags           
   96           | 
      96        | 
         9A     |
9A              |
   96           | 
      96        | 
         9A     |
...........
...........
...........

QuickTime format (for each video sample):

  • bit[7] - reserved, set to 1
  • bit[6] - set to 1 if the POC of the current frame might be greater than the POC of the next frame (frame reordering takes place)
  • bit[5] - 1 if I-picture, otherwise 0
  • bit[4] - 1 if not an I-picture, otherwise 0
  • bit[3] - 1 if ref_idc of the slice NALU is zero, otherwise 0
  • bit[2] - 1 if ref_idc of the slice NALU is non-zero, otherwise 0
  • bit[1] - 0 if the picture is redundant, otherwise 1 (redundant pictures are highly unlikely in MP4 files, so this bit is rarely 0)

MP4 Format (for each video sample):

  • bit[1:0] - set 10b , this implies that no redundant pictures present
  • bit[3:2] - set 10b if ref_idc of the current frame is 0, otherwise set 01b
  • bit[5:4] - set 10b if current frame is I-picture, otherwise set 01b

This file uses the MP4 format, so:

[A6]  10 10 01 10       (I-frame)
              [10] -->  this implies that no redundant pictures present
           [01]------>  ref_idc != 0
        [10]--------->  I-picture

[96]  10 01 01 10       (P-frame)
              [10] -->  this implies that no redundant pictures present
           [01]------>  ref_idc != 0
        [01]--------->  not I-picture

[9A]  10 01 10 10       (B-frame)
              [10] -->  this implies that no redundant pictures present
           [10]------>  ref_idc == 0
        [01]--------->  not I-picture
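The bit diagrams above can be reproduced by splitting each sdtp byte into four 2-bit fields. The sketch below uses the ISO/IEC 14496-12 field names. The top two bits are `is_leading`, where 10b means the sample is not a leading sample. `frame_kind` (a hypothetical helper) applies the same I/P/B labelling as this post. Strictly speaking, "not referenced" does not guarantee a B-frame, so treat that label as a heuristic.

```python
def decode_sdtp(flags: int) -> dict:
    """Split one sdtp byte into its four 2-bit fields (ISO/IEC 14496-12 names)."""
    return {
        "is_leading": (flags >> 6) & 0b11,             # 2: not a leading sample
        "sample_depends_on": (flags >> 4) & 0b11,      # 2: I-picture, 1: depends on others
        "sample_is_depended_on": (flags >> 2) & 0b11,  # 2: ref_idc == 0, 1: referenced
        "sample_has_redundancy": flags & 0b11,         # 2: no redundant coding
    }


def frame_kind(flags: int) -> str:
    """Heuristic label used in this post: I, referenced non-I ('P'), non-referenced ('B')."""
    f = decode_sdtp(flags)
    if f["sample_depends_on"] == 2:
        return "I"
    return "P" if f["sample_is_depended_on"] == 1 else "B"


print([frame_kind(b) for b in (0xA6, 0x96, 0x96, 0x9A, 0x9A)])  # ['I', 'P', 'P', 'B', 'B']
```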

Sample-to-Chunk Atoms

As samples are added to a media, they are collected into chunks that allow optimized data access. A chunk contains one or more samples. Chunks in a media may have different sizes, and the samples within a chunk may have different sizes. The sample-to-chunk atom stores chunk information for the samples in a media.

## moov->trak->mdia->minf->stbl->stsc
00 00 00 28     4| Size           | 40
73 74 73 63     4| Type           | 'stsc'
00              1| Version
   00 00 00     3| Flags
00 00 00 02     4| Number of entries
00 00 00 01     4| First chunk
00 00 00 1F     4| Samples per chunk
00 00 00 01     4| Sample description ID
00 00 00 1E     4| First chunk
00 00 00 02     4| Samples per chunk
00 00 00 01     4| Sample description ID
First chunk | Samples per chunk | Sample description ID
1           | 31                | 1
30          | 2                 | 1

This means:

  • chunks 1 -> 29 each contain 31 samples
  • chunks 30 -> last chunk each contain 2 samples

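From earlier we know the video has 901 samples in total, so the number X of chunks from chunk 30 onward follows from:

901 = 29 * 31 + 2 * X  --> X = 1

So the video stream has 29 + 1 = 30 chunks in total, which the stco atom later in this post confirms.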
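The same expansion can be written generically. Each stsc entry applies from its `first_chunk` up to the chunk before the next entry's `first_chunk`, and the last entry runs through the final chunk, whose number comes from the stco entry count. This is a minimal sketch; `samples_per_chunk` is a hypothetical helper name.

```python
def samples_per_chunk(stsc, chunk_count):
    """Expand stsc entries into a per-chunk sample count list.

    Each entry is (first_chunk, samples_per_chunk, sample_description_id) with
    1-based chunk numbers; an entry applies until the next entry's first_chunk,
    and the last entry runs through chunk_count (the stco entry count).
    """
    counts = []
    for i, (first, per_chunk, _desc_id) in enumerate(stsc):
        last = stsc[i + 1][0] - 1 if i + 1 < len(stsc) else chunk_count
        counts.extend([per_chunk] * (last - first + 1))
    return counts


counts = samples_per_chunk([(1, 31, 1), (30, 2, 1)], chunk_count=30)
print(len(counts), counts[0], counts[-1], sum(counts))  # 30 31 2 901
```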

Sample Size Atoms

You use sample size atoms to specify the size of each sample in the media. Sample size atoms have an atom type of stsz.

## moov->trak->mdia->minf->stbl->stsz
00 00 0E 28     4| Size          | 3624
73 74 73 7A     4| Type          | 'stsz'
00              1| Version
   00 00 00     3| Flags
00 00 00 00     4| Sample Size   | 0 -> samples differ in size, so a per-sample table follows
00 00 03 85     4| Number of entries | 901
00 00 88 08     4| Size of sample 1  | 34824
00 00 17 4C     4| Size of sample 2  | 5964
00 00 02 1C     4| Size of sample 3  | 540
........... 

Chunk Offset Atoms

Chunk offset atoms identify the location of each chunk of data in the media’s data stream. Chunk offset atoms have an atom type of ‘stco’.

## moov->trak->mdia->minf->stbl->stco
00 00 00 88     4| Size         | 136
73 74 63 6F     4| Type         | 'stco'
00              1| Version
   00 00 00     3| Flags
00 00 00 1E     4| Number of entries | 30
00 00 4A 4E     4| Chunk 1      | 19022
00 02 4C 51 
00 03 73 C1 
00 05 7E 35 
00 06 D8 CB 
00 08 89 43 
00 0A B2 24 
00 0C 2D 62 
00 0D E3 A2 
00 0F CB 14 
00 11 5A E7 
00 13 22 3D 
00 14 FD 56 
00 16 93 0F 
00 18 2B F1 
00 19 EA B2 
00 1B 59 0A 
00 1C BC 7C 
00 1E 66 AC 
00 1F A1 97 
00 21 01 F8 
00 22 87 FF 
00 23 DE 28 
00 25 44 47 
00 26 D1 50 
00 28 62 BD 
00 2A 0D 22 
00 2B D3 15 
00 2D 77 4A 
00 2F 13 F0     4| Chunk 30     | 3085296

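Sample Group Description Atom (sgpd)

## moov->trak->mdia->minf->stbl->sgpd
00 00 00 18     4| Size              | 24
73 67 70 64     4| Type              | 'sgpd'
01              1| Version           | 1
   00 00 00     3| Flags
72 6F 6C 6C     4| Grouping type     | 'roll'
00 00 00 02     4| Default length    | 2 (present in version 1)
00 00 00 00     4| Entry count       | 0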

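Sample-to-Group Atom (sbgp)

## moov->trak->mdia->minf->stbl->sbgp
00 00 00 1C     4| Size              | 28
73 62 67 70     4| Type              | 'sbgp'
00              1| Version           | 0
   00 00 00     3| Flags
72 6F 6C 6C     4| Grouping type     | 'roll'
00 00 00 01     4| Entry count       | 1
00 00 03 85     4| Sample count      | 901
00 00 00 00     4| Group description index | 0 -> none of the 901 samples belongs to a 'roll' group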

That covers all the atoms related to the video stream.

moov->trak->mdia->mdhd

  • TimeScale –> time scale / time base of the video: 30
  • Duration –> duration of the video: 901 (901 / 30 ≈ 30.03 s)

moov->trak->mdia->minf->stbl->stsd

  • Data format –> ‘avc1’ –> H.264
  • Width –> 640
  • Height –> 360

moov->trak->mdia->minf->stbl->stts

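  • Number of entries –> 1 (every frame has the same duration)
  • Sample count –> total number of frames: 901
  • Sample duration –> duration of each video frame –> summing the durations gives each frame's decoding timestamp (DTS)

Sample index | Sample duration | Sample DTS
0            | 1               | 0
1            | 1               | 1
2            | 1               | 2
3            | 1               | 3
...          | ...             | ...
900          | 1               | 900

The last frame has DTS = 900. With a time scale (time base) of 30 it starts at 900 / 30 = 30 s, and since it lasts 1/30 s, the total duration is 901 / 30 ≈ 30.03 s.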

moov->trak->mdia->minf->stbl->ctts

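  • all –> the composition offset ∆ of each video frame, i.e. how far its presentation time lies after its decoding time

PTS = DTS + ∆

Taking the first two ctts entries (∆ = 2 and ∆ = 5) together with the DTS values from stts:

Sample index | DTS | ∆   | PTS
0            | 0   | 2   | 2
1            | 1   | 5   | 6
...          | ... | ... | ...

The elst entry's media time of 2 starts presentation at media time 2, which is exactly the PTS of sample 0, so that frame is shown at time 0.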

moov->trak->mdia->minf->stbl->stss

  • Number of entries –> number of key frames in the video
  • ~ –> sample numbers of all key frames

moov->trak->mdia->minf->stbl->stsz

  • Number of entries –> total number of frames
  • ~ –> size of each video frame

moov->trak->mdia->minf->stbl->stsc, moov->trak->mdia->minf->stbl->stco, moov->trak->mdia->minf->stbl->stsz

  • ~ –> together they give the file offset of every sample
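Putting the three tables together: for each chunk, stco gives the starting offset and stsc gives how many samples it holds. Those samples are stored back to back, so their sizes from stsz are accumulated. The sketch below is a minimal version; `sample_offsets` is a hypothetical helper name. The example uses the real chunk 1 offset (0x4A4E) and the first three stsz sizes, but the second chunk's offset and the last three sizes are made up purely for illustration.

```python
def sample_offsets(stco, stsc, sample_sizes):
    """Compute the absolute file offset of every sample.

    stco: chunk offsets; stsc: (first_chunk, samples_per_chunk, desc_id) entries
    sorted by 1-based first_chunk; sample_sizes: the stsz table.
    """
    offsets, s = [], 0
    for chunk_no, chunk_offset in enumerate(stco, start=1):
        # The last stsc entry whose first_chunk <= chunk_no applies to this chunk.
        per_chunk = [n for first, n, _ in stsc if first <= chunk_no][-1]
        pos = chunk_offset
        for _ in range(per_chunk):
            offsets.append(pos)
            pos += sample_sizes[s]  # samples in a chunk are stored back to back
            s += 1
    if s != len(sample_sizes):
        raise ValueError("stsc/stco do not account for every stsz entry")
    return offsets


print(sample_offsets([0x4A4E, 1_000_000], [(1, 3, 1)], [34824, 5964, 540, 10, 20, 30]))
# [19022, 53846, 59810, 1000000, 1000010, 1000030]
```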