OpenCV VideoCapture drops the last frames when reading a video

晚星

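1. Problem description

I used to process video datasets by extracting frames first and then reading the images. Recently I wanted to read directly from the video files, and found that this is sometimes faster (with the same PyTorch DataLoader parameters, loading one epoch of UCF101 dropped from 7.4 min to 2.8 min). However, when processing videos with the opencv-python library, the total frame count returned by cv2.VideoCapture.get() differs from the number of frames that can actually be read one by one, so the last part of the video cannot be read. Here is a test on one video from UCF101: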

import cv2

capture = cv2.VideoCapture('/home/data/ucf101/video/GolfSwing/v_GolfSwing_g22_c02.avi')
# Frame count as reported by VideoCapture
len_video = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
print('num of video frames :', len_video)
n = 0       # frames read successfully
lose = []   # indices at which read() failed
for i in range(len_video):
    ret, frame = capture.read()
    if ret:
        n += 1
    else:
        lose.append(i)
print('index :{}, read: {}'.format(i, n))
print(lose)
------------------------- print ---------------------------
num of video frames : 300
index :299, read: 244
[244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 
263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 
282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299]

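As you can see, the video is reported to have 300 frames, but cv2 only reads the first 244 correctly; the remaining 56 are lost. Many videos in UCF101 lose their last few frames, or their last dozen or so, in the same way.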
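
Since the reported frame count cannot be trusted here, one option is to count frames by reading until read() fails. This is only a minimal sketch: count_readable_frames is a hypothetical helper name, and it accepts any object whose read() method behaves like that of cv2.VideoCapture.

```python
def count_readable_frames(capture):
    """Count frames by calling read() until it reports failure.

    `capture` is assumed to behave like cv2.VideoCapture: read() returns
    a (success_flag, frame) pair.
    """
    n = 0
    while True:
        ret, _frame = capture.read()
        if not ret:
            break
        n += 1
    return n

# Intended use with OpenCV (assumes opencv-python is installed):
#   import cv2
#   capture = cv2.VideoCapture(path)
#   readable = count_readable_frames(capture)
#   capture.release()
```

For the video above this would be expected to return 244 rather than 300, in line with the loop output. The cost is a full decoding pass over the file.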

VideoCapture.set() fails in a similar way. Continuing from the code above:

flag = capture.set(cv2.CAP_PROP_POS_FRAMES, 294)
ret, frame = capture.read()
print(flag, ret, frame)
------------------------- print ---------------------------
True False None

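I found a similar problem in the issues of the opencv project on GitHub, and that issue has been open for more than two years without being resolved...

One commenter there offered this explanation: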

It seems to me that the problem is that VideoCapture ignores all duplicate frames, while FFmpeg will return duplicates without issues. In consequence, when calling VideoCapture::set with CAP_PROP_POS_FRAMES you will get a frame offset by something approximate (or likely equal) to the number of duplicate frames in the video file, or the number of duplicate frames in video file up to the frame you've tried to set to..

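He believes VideoCapture ignores duplicate frames, whereas FFmpeg returns all frames without problems. As a result, the number of frames VideoCapture actually yields is roughly the total frame count minus the number of duplicate frames. Indeed, extracting frames from this video with ffmpeg gives 300 images, many of which are consecutive duplicates. This is common across the whole dataset.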
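
To check how many frames of a decoded clip repeat their predecessor, something like the following sketch can be used. It assumes the frames are already available as numpy arrays (for example, loaded from the images extracted with ffmpeg), and count_consecutive_duplicates is a hypothetical name. Whether this count matches the number of frames VideoCapture drops has not been verified here.

```python
import numpy as np

def count_consecutive_duplicates(frames):
    """Count frames that are pixel-identical to the frame right before them."""
    duplicates = 0
    previous = None
    for frame in frames:
        if previous is not None and np.array_equal(frame, previous):
            duplicates += 1
        previous = frame
    return duplicates
```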

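2. Solution

There are many similar questions on StackOverflow.

Most of the answers locate frames by time rather than by frame index, that is, they call set() with CV_CAP_PROP_POS_MSEC (exposed as cv2.CAP_PROP_POS_MSEC in the cv2 Python module). This is also similar to how video IO in torchvision works.

I decided to switch to skvideo instead. skvideo.io.vread(path) reads the whole video into a numpy array and does not have the problem above; it is suited to small clips. For large videos, reading frame by frame is faster, and skvideo.io.vreader(path) can be used.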
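
A minimal sketch of time-based seeking is shown here. frame_to_msec and read_frame_at are hypothetical helper names, the timestamp is derived from CAP_PROP_FPS under the assumption of a constant frame rate, and this has not been checked against the UCF101 video above, so it may or may not recover the missing frames.

```python
def frame_to_msec(frame_index, fps):
    """Convert a frame index to a timestamp in milliseconds (constant fps assumed)."""
    return 1000.0 * frame_index / fps

def read_frame_at(path, frame_index):
    """Seek by time instead of by frame index; return the frame or None."""
    import cv2  # imported lazily so frame_to_msec has no dependencies

    capture = cv2.VideoCapture(path)
    try:
        fps = capture.get(cv2.CAP_PROP_FPS)
        if fps <= 0:
            return None  # frame rate unknown, cannot compute a timestamp
        capture.set(cv2.CAP_PROP_POS_MSEC, frame_to_msec(frame_index, fps))
        ret, frame = capture.read()
        return frame if ret else None
    finally:
        capture.release()
```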

import skvideo.io
from PIL import Image

v = skvideo.io.vread('/home/wx/data/ucf101/video/GolfSwing/v_GolfSwing_g22_c02.avi')
print(v.shape)
image = Image.fromarray(v[294])
print(image)
------------------------- print ---------------------------
(300, 240, 320, 3)
<PIL.Image.Image image mode=RGB size=320x240 at 0x7F50345543C8>
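
For larger files, the frame-by-frame reader avoids holding the whole video in memory. The sketch below fetches a single frame from the skvideo.io.vreader generator; nth_frame and nth_frame_from_video are hypothetical helper names.

```python
from itertools import islice

def nth_frame(frames, index):
    """Return the index-th item of an iterable of frames, or None if it is too short."""
    return next(islice(frames, index, None), None)

def nth_frame_from_video(path, index):
    """Decode frames lazily with skvideo and stop once the requested one is reached."""
    import skvideo.io  # imported lazily; assumes sk-video and ffmpeg are installed

    return nth_frame(skvideo.io.vreader(path), index)
```

nth_frame works on any iterable, so it can be tried on a plain list before pointing it at a real video.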

The tutorial is linked below. Install with pip install sk-video.

Update: skvideo raised an error while reading videos

RuntimeError: Traceback (most recent call last):
  File "/home/praateek/.local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 40, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "<ipython-input-10-a53d73bbb9a6>", line 32, in __getitem__
    vid = skvideo.io.vread(self.__xs[index])
  File "/usr/local/lib/python2.7/dist-packages/skvideo/io/io.py", line 148, in vread
    for idx, frame in enumerate(reader.nextFrame()):
  File "/usr/local/lib/python2.7/dist-packages/skvideo/io/ffmpeg.py", line 293, in nextFrame