
I've been working on a Python application which uses OpenCV to read frames from a video and create a composite of the "activity", i.e. the things that have changed from one frame to the next. To do that, I only really want to check one frame per second or so.

For a long time I've been using the following code (simplified, with some error checking, classes, etc removed for brevity) to get the video object and the first frame:

video_capture = cv2.VideoCapture(video_fullpath)

def get_frame(time):
    video_capture.set(cv2.CAP_PROP_POS_MSEC, time)
    capture_success, this_frame = video_capture.read()
    return this_frame

this_frame = get_frame(0)

The process of getting subsequent frames, using the latter two lines of code above, is really slow. On a 2015 MacBook Pro it takes 0.3-0.4s to get each frame (at 1sec intervals in the video, which is a ~100MB .mp4 video file). By comparison, the rest of my operations, which compare each frame to its predecessor, are very quick - typically less than 0.01s.
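(For anyone profiling the same thing: it's worth timing the fetch in isolation before reaching for threads. A minimal sketch using `time.perf_counter`, with a dummy stand-in for the real `get_frame` call:)

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Dummy stand-in for get_frame(t); the real call would seek and decode.
def fake_get_frame(t):
    time.sleep(0.05)
    return "frame"

frame, elapsed = timed(fake_get_frame, 1000)
print(elapsed >= 0.05)  # True
```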

I've therefore been looking at multi-threading, but I'm struggling.

I can get multi-threading working on a "lookahead" basis, i.e. whilst I'm processing one frame I can be getting the next one. And once I'm done processing the previous frame, I'll wait for the "lookahead" operation to finish before continuing. I do that with the following code:

frames = {}

def fetch_frame(time):
    frames[time] = get_frame(time)

def get_frame_async(time):
    # Fetch this frame now if the lookahead thread hasn't already cached it
    if time not in frames:
        fetch_frame(time)
    # Look ahead: start fetching the *next* frame while this one is processed
    next_frame_thread = Thread(target=fetch_frame, args=(time + time_increment,))
    next_frame_thread.start()
    return frames[time], next_frame_thread

while True:
    this_frame, next_frame_thread = get_frame_async(prev_frame.time + time_increment)
    << do processing of this_frame ... >>
    next_frame_thread.join()

The above seems to be working, but because the seeking operation is so slow compared to everything else it doesn't actually save much time - in fact it's difficult to see any benefit at all.

I then wondered whether I could be getting multiple frames in parallel. However, whenever I try I get a range of errors, mostly related to async_lock (e.g. Assertion fctx->async_lock failed at libavcodec/pthread_frame.c:155). I wonder whether this is simply that an OpenCV VideoCapture object can't seek to multiple places at once... which would seem reasonable. But if that's true, is there any way to speed this operation up significantly?

I've been using a few different sources, including this one https://nrsyed.com/2018/07/05/multithreading-with-opencv-python-to-improve-video-processing-performance/ which shows huge speed-ups, but I'm struggling with why I'm getting these errors around async_lock. Is it just the seek operation? I can't find any examples of multithreading whilst seeking around the video - just examples of people reading all frames sequentially.

Any tips or guidance on where / which parts are most likely to benefit from multithreading (or another approach) would be most welcome. This is my first attempt at multithreading, so completely accept I might have missed something obvious! Based on this page (https://www.toptal.com/python/beginners-guide-to-concurrency-and-parallelism-in-python), I was a bit overwhelmed by the range of different options available.

Thanks!

How about dividing the process into 2 parts: (1) converting video frames into image files, (2) processing the images? I think it's easier to parallelize these processes. – Yohanes Gultom Oct 4, 2018 at 22:13

Try reading consecutive frames from the video as fast as you can, without any positioning at all, and you will see that it is very fast indeed. I am suggesting you read all the frames but discard, say, 29 out of 30 if your video is 30 fps and you only want 1 fps. – Mark Setchell Oct 4, 2018 at 22:25

Do you know if it's any faster to seek to a specific frame instead of a specific time, i.e. POS_FRAMES instead of POS_MSEC? Also, to piggyback on @MarkSetchell's idea, you could only read() when you actually want the frame, and otherwise just grab(), which won't spend the time to decode the frame (since you're just going to toss it anyway). Or, in the same vein, grab() all frames and then only retrieve() the grabbed frame every however many frames. – alkasm Oct 5, 2018 at 2:46

Thanks for the comments. I'll try a few variations out this afternoon and see what the timing looks like for each. I'll report back my results. – DaveWalker Oct 6, 2018 at 10:21

Based on the comments on the original question I've done some testing and thought it worth sharing the (interesting) results. Big savings potential for anyone using OpenCV's VideoCapture.set(CAP_PROP_POS_MSEC) or VideoCapture.set(CAP_PROP_POS_FRAMES).

I've done some profiling comparing three options:

1. GET FRAMES BY SEEKING TO TIME:

frames = {}
def get_all_frames_by_ms(time):
    while True:
        video_capture.set(cv2.CAP_PROP_POS_MSEC, time)
        capture_success, frames[time] = video_capture.read()
        if not capture_success:
            break
        time += 1000

2. GET FRAMES BY SEEKING TO FRAME NUMBER:

frames = {}
def get_all_frames_by_frame(time):
    while True:
        # Note my test video is 12.333 FPS, and time is in milliseconds
        video_capture.set(cv2.CAP_PROP_POS_FRAMES, int(time/1000*12.333))
        capture_success, frames[time] = video_capture.read()
        if not capture_success:
            break
        time += 1000

3. GET FRAMES BY GRABBING ALL, BUT RETRIEVING ONLY ONES I WANT:

frames = {}
def get_all_frames_in_order():
    prev_time = -1
    while True:
        grabbed = video_capture.grab()
        if grabbed:
            time_s = video_capture.get(cv2.CAP_PROP_POS_MSEC) / 1000
            if int(time_s) > int(prev_time):
                # Only retrieve and save the first frame in each new second
                retrieve_success, frames[int(time_s)] = video_capture.retrieve()
            prev_time = time_s
        else:
            break

Running through those three approaches, the timings (from three runs of each) are as follows:

  • Seek to time (1): 33.78s, 29.65s, 29.24s
  • Seek to frame number (2): 31.95s, 29.16s, 28.35s
  • Grab all, retrieve selected (3): 11.81s, 10.76s, 11.73s

In each case it's saving 100 frames at 1sec intervals into a dictionary, where each frame is a 3072x1728 image, from a .mp4 video file. All on a 2015 MacBook Pro with a 2.9 GHz Intel Core i5 and 8GB RAM.
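As a side note, the "first frame of each new second" decimation used in approach 3 is easy to sanity-check in isolation against synthetic timestamps (the helper name is mine, not OpenCV's):

```python
def first_in_each_second(timestamps_ms):
    """Return indices of the first timestamp in each new whole second."""
    keep, prev_sec = [], -1
    for i, t in enumerate(timestamps_ms):
        sec = int(t / 1000)
        if sec > prev_sec:
            keep.append(i)
            prev_sec = sec
    return keep

# Timestamps for a 12.333 fps video: frame k arrives at k * 1000 / 12.333 ms
ts = [k * 1000 / 12.333 for k in range(40)]
print(first_in_each_second(ts))  # [0, 13, 25, 37]
```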

Conclusions so far: if you're interested in retrieving only some frames from a video, it's very much worth running through all frames in order, grabbing each one, but only retrieving those you're interested in - as an alternative to reading (which grabs and retrieves in one go). That gave me an almost 3x speed-up.

I've also re-looked at multi-threading on this basis. I've got two test functions - one that gets the frames, and another that processes them once they're available:

frames = {}

def get_all_frames_in_order():
    prev_time = -1
    while True:
        grabbed = video_capture.grab()
        if grabbed:
            time_s = video_capture.get(cv2.CAP_PROP_POS_MSEC) / 1000
            if int(time_s) > int(prev_time):
                # Only retrieve and save the first frame in each new second
                retrieve_success, frames[int(time_s)] = video_capture.retrieve()
            prev_time = time_s
        else:
            break

def process_all_frames_as_available(processing_time):
    # frames is keyed by whole seconds, so step in seconds (not msec)
    prev_time = 0
    while True:
        this_time = prev_time + 1
        if this_time in frames and prev_time in frames:
            # Dummy processing loop - just sleeps for the specified time
            sleep(processing_time)
            prev_time += 1
            if prev_time + 1 > video_duration:
                break
        else:
            # If the frames aren't ready yet, wait a short time before trying again
            sleep(0.02)
    

For this testing, I then called them either one after the other (sequentially, single-threaded), or with the following multi-threaded code:

get_frames_thread = Thread(target=get_all_frames_in_order)
get_frames_thread.start()
process_frames_thread = Thread(target=process_all_frames_as_available, args=(0.02,))
process_frames_thread.start()
get_frames_thread.join()
process_frames_thread.join()
    

Based on that, I'm now happy that multi-threading is working effectively and saving a significant amount of time. I generated timings for the two functions above separately, and then together in both single-threaded and multi-threaded modes. The results are below, grouped by the time in seconds that the "processing" of each frame takes (which in this case is just a dummy delay):

get_all_frames_in_order - 2.99s

Process time = 0.02s per frame:
process_all_frames_as_available - 0.97s
single-threaded - 3.99s
multi-threaded - 3.28s

Process time = 0.1s per frame:
process_all_frames_as_available - 4.31s
single-threaded - 7.35s
multi-threaded - 4.46s

Process time = 0.2s per frame:
process_all_frames_as_available - 8.52s
single-threaded - 11.58s
multi-threaded - 8.62s

As you can hopefully see, the multi-threading results are very good. Essentially, running both functions in parallel takes only ~0.2s longer than the slower of the two takes on its own.

Hope that helps someone!
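One refinement worth noting: the sleep(0.02) polling in process_all_frames_as_available can be replaced with a blocking queue.Queue, which hands each frame over as soon as it's ready. A sketch with dummy frames standing in for the real grab/retrieve loop (all names here are mine):

```python
from queue import Queue
from threading import Thread

def produce_frames(out_q, timestamps):
    """Producer: stand-in for the grab/retrieve loop."""
    for t in timestamps:
        out_q.put((t, f"frame@{t}s"))  # real code would put the decoded image
    out_q.put(None)  # sentinel: no more frames

def consume_frames(in_q, results):
    """Consumer: blocks until the next frame is ready instead of polling."""
    while True:
        item = in_q.get()
        if item is None:
            break
        t, frame = item
        results.append(t)  # real code would compare frame to its predecessor

q = Queue(maxsize=8)  # bounded, so the reader can't run far ahead of processing
results = []
producer = Thread(target=produce_frames, args=(q, range(5)))
consumer = Thread(target=consume_frames, args=(q, results))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(results)  # [0, 1, 2, 3, 4]
```

The bounded queue also caps memory use, which matters with 3072x1728 frames.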

I think you have a msec-sec bug in your code. You put seconds into frames, but you increment this_time = prev_time + 1000 and use this_time in frames. – Gazihan Alankus Sep 5, 2019 at 11:27

Coincidentally, I've worked on a similar problem, and I have created a Python library (more of a thin wrapper) for reading videos. The library is called mydia.

The library does not use OpenCV. It uses FFmpeg as the backend for reading and processing videos.

mydia supports custom frame selection, frame resizing, grayscale conversion and much more. The documentation can be viewed here.

So, if you want to select N frames per second (where N = 1 in your case), the following code would do it:

import numpy as np
from mydia import Videos

video_path = "path/to/video"

def select_frames(total_frames, num_frames, fps, *args):
    """This function will return the indices of the frames to be captured"""
    N = 1
    t = np.arange(total_frames)
    f = np.arange(num_frames)
    mask = np.resize(f, total_frames)
    return t[mask < N][:num_frames].tolist()

# Let's assume that the duration of your video is 120 seconds
# and you want 1 frame for each second
# (therefore, setting `num_frames` to 120)
reader = Videos(num_frames=120, mode=select_frames)
video = reader.read(video_path)  # A video tensor/array
    

The best part is that, internally, only the frames that are required are read, which makes the process much faster (and is, I believe, what you are looking for).

The installation of mydia is extremely simple and can be viewed here.

This might have a slight learning curve, but I believe it is exactly what you are looking for.

Moreover, if you have multiple videos, you could use multiple workers to read them in parallel. For instance:

from mydia import Videos
path = "path/to/video"
reader = Videos()
video = reader.read(path, workers=4)

Depending on your CPU, this could give you a significant speed-up.

Hope this helps!

I'd like to comment on your performance results. No reputation points, so I have to write an answer :)

From your experiments it seems that reading all frames and picking only the interesting ones is the fastest.

This might seem strange, but the reason is that you're reading an encoded file (probably H.264 or another codec). The nature of encoded video (as opposed to raw frames) is that most frames are not stored in full, but as a diff between the particular frame and previous (and sometimes next!) frames. So to get a particular frame, the decoder must gather some nearby frames and decode them before it can decode the desired one.

Reading all frames one by one, on the other hand, is very well optimized and fast, because the decoder expects the frames to be decoded in order and holds all the required data in nearby memory.

If you repeat your experiments with a raw video, where the frames are not tied together, your results may vary significantly - suddenly, accessing a random frame might be just as fast as, or even faster than, reading all frames, since you skip some of them.
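The keyframe effect described above can be illustrated with a toy cost model (the interval and costs here are made up for illustration; real GOP structures vary by encoder settings):

```python
def random_access_cost(target, keyframe_interval=30):
    """Toy model: a frame can only be decoded starting from the
    nearest preceding keyframe (I-frame)."""
    last_keyframe = (target // keyframe_interval) * keyframe_interval
    return target - last_keyframe + 1  # frames decoded to reach `target`

# Seeking straight to frame 65 still forces frames 60..65 to be decoded:
print(random_access_cost(65))  # 6
# Sequential reading decodes each frame exactly once, which is why grabbing
# every frame and discarding most of them can beat repeated seeks.
```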
