OpenCV 4: Parallel Computing with cv::parallel_for_

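In computer vision and image processing, OpenCV (the Open Source Computer Vision Library) is one of the most widely used libraries. As image resolutions grow and processing tasks become more complex, real-time processing becomes harder to achieve. To address this, OpenCV provides a parallel processing facility that can noticeably improve performance on multi-core machines. This article shows how to use it to speed up an image processing task, with convolution as the example.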

OpenCV's parallel framework

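OpenCV has shipped a parallel framework since version 2.4. It lets code run in parallel across multiple cores or processors and offers a simple, efficient way to write code that takes advantage of multi-core CPUs. OpenCV 4 keeps this framework and extends it with support for newer hardware and platforms.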

The parallel framework tutorial in the official documentation provides a detailed guide and sample code showing how to use OpenCV's cv::parallel_for_ function.

The cv::parallel_for_ function

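cv::parallel_for_ is the core of OpenCV's parallel framework. It runs a loop in parallel: the iteration range is split into sub-ranges, and different sub-ranges can be executed on different threads. The function takes a cv::Range object and an object that implements the cv::ParallelLoopBody interface.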

cv::parallel_for_(cv::Range(0, count), MyParallelLoopBody());

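Here, MyParallelLoopBody must implement the virtual void operator()(const cv::Range& range) const method of the cv::ParallelLoopBody interface. The range argument is the sub-range assigned to the current call, so the body should loop from range.start up to, but not including, range.end.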
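
To make the sub-range contract concrete, here is a minimal sketch in standard C++ of how a range might be divided into contiguous stripes. It is a simplified model, not OpenCV's actual scheduling; IndexRange, splitIntoStripes, and sumStripe are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for cv::Range: the half-open interval [start, end).
struct IndexRange {
    int start;
    int end;
};

// Splits `whole` into at most `nstripes` contiguous, non-overlapping stripes.
// Assumption: a simple ceiling-division split; a real backend may choose
// stripe sizes differently.
std::vector<IndexRange> splitIntoStripes(IndexRange whole, int nstripes) {
    std::vector<IndexRange> stripes;
    const int total = whole.end - whole.start;
    if (total <= 0 || nstripes <= 0) return stripes;
    const int count = std::min(nstripes, total);
    const int chunk = (total + count - 1) / count;  // ceiling division
    for (int begin = whole.start; begin < whole.end; begin += chunk) {
        stripes.push_back({begin, std::min(begin + chunk, whole.end)});
    }
    return stripes;
}

// A loop body in the style of operator()(const cv::Range&): it only touches
// indices inside the stripe it was given.
long long sumStripe(const std::vector<int>& data, const IndexRange& r) {
    long long sum = 0;
    for (int i = r.start; i < r.end; ++i) sum += data[i];
    return sum;
}
```

With 10 indices and 4 stripes this model yields [0,3), [3,6), [6,9), and [9,10). Each stripe would be handed to one call of operator(), and together the stripes cover every index exactly once.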

Parallel convolution example

We create two parallel convolution classes, parallelConv and parallelConvByRow, both derived from cv::ParallelLoopBody. parallelConv parallelizes over individual pixels, while parallelConvByRow parallelizes over image rows.

parallelConv

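The parallelConv constructor takes the source image, the destination image, and the convolution kernel. It also computes the kernel radius and pads the source image with a border so that boundary pixels can be processed. The destination image must already be allocated with the same size as the source.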

class parallelConv : public cv::ParallelLoopBody
{
private:
	Mat m_src;    // source image, padded with a replicated border
	Mat& m_dst;   // destination image, allocated by the caller
	Mat m_kernel; // square kernel of doubles with odd side length
	int sz;       // kernel radius
public:
	parallelConv(Mat src, Mat& dst, Mat kernel): m_src(src), m_dst(dst), m_kernel(kernel), sz(kernel.rows / 2)
	{
		cv::copyMakeBorder(src, m_src, sz, sz, sz, sz, cv::BORDER_REPLICATE);
	}
	virtual void operator()(const cv::Range& range) const override
	{
		for (int r = range.start; r < range.end; ++r)
		{
			// Map the flat index r to row i and column j (std::div needs <cstdlib>)
			const std::div_t pos = std::div(r, m_dst.cols);
			const int i = pos.quot, j = pos.rem;
			double value = 0;
			for (int k = -sz; k <= sz; ++k)
			{
				auto sptr = m_src.ptr(i + sz + k);
				for (int l = -sz; l <= sz; ++l)
					value += m_kernel.at<double>(k + sz, l + sz) * sptr[j + sz + l];
			}
			m_dst.at<uchar>(i, j) = cv::saturate_cast<uchar>(value);
		}
	}
};

In operator(), we iterate over every pixel index in the given range and compute the convolution for that pixel.
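
The per-pixel version treats the image as one flat index space of rows * cols entries. The following sketch shows the mapping between a flat index and a (row, column) pair; PixelPos, toPixel, and toFlat are hypothetical helper names, not part of OpenCV.

```cpp
// Hypothetical helper type: position of a pixel in a row-major image.
struct PixelPos {
    int row;
    int col;
};

// Flat index -> (row, col) for an image with `cols` columns.
PixelPos toPixel(int r, int cols) {
    return {r / cols, r % cols};
}

// (row, col) -> flat index; the inverse of toPixel.
int toFlat(PixelPos p, int cols) {
    return p.row * cols + p.col;
}
```

For a 3-column image, flat index 7 maps to row 2, column 1. This division runs once per pixel, which is extra work that a row-based traversal does not need.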

parallelConvByRow

Like parallelConv, the parallelConvByRow class takes the source image, the destination image, and the convolution kernel, and pads the source image with a border.

class parallelConvByRow : public cv::ParallelLoopBody
{
private:
	Mat m_src;    // source image, padded with a replicated border
	Mat& m_dst;   // destination image, allocated by the caller
	Mat m_kernel; // square kernel of doubles with odd side length
	int sz;       // kernel radius
	int cols;     // number of columns in the unpadded source
public:
	parallelConvByRow(Mat src, Mat& dst, Mat kernel)
		: m_src(src), m_dst(dst), m_kernel(kernel), sz(kernel.rows / 2), cols(src.cols)
	{
		cv::copyMakeBorder(src, m_src, sz, sz, sz, sz, cv::BORDER_REPLICATE);
	}
	virtual void operator()(const cv::Range& range) const override
	{
		for (int i = range.start; i < range.end; ++i)
		{
			if (i >= m_dst.rows)
				continue;
			auto dptr = m_dst.ptr<uchar>(i);
			for (int j = 0; j < cols; ++j)
			{
				double value = 0;
				for (int k = -sz; k <= sz; ++k)
				{
					auto sptr = m_src.ptr(i + sz + k);
					for (int l = -sz; l <= sz; ++l)
						value += m_kernel.at<double>(k + sz, l + sz) * sptr[j + sz + l];
				}
				dptr[j] = cv::saturate_cast<uchar>(value);
			}
		}
	}
};

In operator(), we iterate over every row in the given range and compute the convolution for each pixel in that row.
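
Both classes read from a padded copy of the source so that the kernel window never leaves the image. As a sketch of what replicate padding does, the function below pads a row-major 8-bit buffer by sz pixels on each side by clamping coordinates to the nearest edge pixel. It uses a plain std::vector rather than cv::Mat, and padReplicate is a hypothetical name used only here.

```cpp
#include <algorithm>
#include <vector>

// Pads a row-major `rows` x `cols` 8-bit image by `sz` pixels on every side,
// repeating the nearest edge pixel (the rule behind cv::BORDER_REPLICATE).
// The result has (rows + 2 * sz) rows and (cols + 2 * sz) columns.
std::vector<unsigned char> padReplicate(const std::vector<unsigned char>& src,
                                        int rows, int cols, int sz) {
    const int prows = rows + 2 * sz;
    const int pcols = cols + 2 * sz;
    std::vector<unsigned char> out(static_cast<std::size_t>(prows) * pcols);
    for (int i = 0; i < prows; ++i) {
        // Clamp the source row into [0, rows - 1]
        const int si = std::max(0, std::min(i - sz, rows - 1));
        for (int j = 0; j < pcols; ++j) {
            // Clamp the source column into [0, cols - 1]
            const int sj = std::max(0, std::min(j - sz, cols - 1));
            out[i * pcols + j] = src[si * cols + sj];
        }
    }
    return out;
}
```

Output pixel (i, j) then corresponds to padded pixel (i + sz, j + sz), which is why the convolution loops index the source with i + sz + k and j + sz + l.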

Performance comparison

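Comparing the execution time of the sequential convolution with the two parallel versions shows that parallelization improves performance. In the run shown below (4 threads), the sequential version takes about 0.31 s, per-pixel traversal about 0.23 s, and row-by-row traversal about 0.17 s. The gain is most noticeable for large images or large kernels.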

// Sequential method
auto start_seq = std::chrono::high_resolution_clock::now();
seqConv(src, dst_seq, kernel);
auto end_seq = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> diff_seq = end_seq - start_seq;
std::cout << "Time taken by sequential method: " << diff_seq.count() << " s" << std::endl;
// Method 1: traverse the whole image pixel by pixel
auto start1 = std::chrono::high_resolution_clock::now();
parallelConv obj1(src, dst1, kernel);
cv::parallel_for_(cv::Range(0, src.rows * src.cols), obj1);
auto end1 = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> diff1 = end1 - start1;
std::cout << "Time taken by whole image traversal: " << diff1.count() << " s" << std::endl;
// Method 2: traverse row by row
auto start2 = std::chrono::high_resolution_clock::now();
parallelConvByRow obj2(src, dst2, kernel);
cv::parallel_for_(cv::Range(0, src.rows), obj2);
auto end2 = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> diff2 = end2 - start2;
std::cout << "Time taken by row-by-row traversal: " << diff2.count() << " s" << std::endl;

Output:

Time taken by sequential method: 0.308864 s
Time taken by whole image traversal: 0.2328 s
Time taken by row-by-row traversal: 0.169044 s
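
The three timing blocks follow the same pattern, so the measurement and the resulting speedup can be expressed with small helpers. This is a sketch in standard C++; timeSeconds, speedup, and efficiency are hypothetical names, and the thread count of 4 is taken from the cv::setNumThreads(4) call in the complete code.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename Work>
double timeSeconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Speedup of a parallel run relative to the sequential baseline.
double speedup(double sequentialSeconds, double parallelSeconds) {
    return sequentialSeconds / parallelSeconds;
}

// Fraction of the ideal linear speedup achieved with `threads` threads.
double efficiency(double sequentialSeconds, double parallelSeconds, int threads) {
    return speedup(sequentialSeconds, parallelSeconds) / threads;
}
```

Applied to the times reported above, speedup(0.308864, 0.2328) is about 1.33 and speedup(0.308864, 0.169044) is about 1.83, so with 4 threads the row-by-row version reaches roughly 46% of ideal linear scaling.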

Complete code

#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <chrono>
#include <cstdlib>
using cv::Mat;
// Sequential reference implementation
void seqConv(Mat src, Mat& dst, Mat kernel)
{
	const int rows = src.rows, cols = src.cols;
	dst = Mat(rows, cols, src.type());
	int sz = kernel.rows / 2;
	Mat src_padded;
	cv::copyMakeBorder(src, src_padded, sz, sz, sz, sz, cv::BORDER_REPLICATE);
	for (int i = 0; i < rows; ++i)
	{
		auto dptr = dst.ptr<uchar>(i);
		for (int j = 0; j < cols; ++j)
		{
			double value = 0;
			for (int k = -sz; k <= sz; ++k)
			{
				auto sptr = src_padded.ptr<uchar>(i + sz + k);
				for (int l = -sz; l <= sz; ++l)
					value += kernel.ptr<double>(k + sz)[l + sz] * sptr[j + sz + l];
			}
			dptr[j] = cv::saturate_cast<uchar>(value);
		}
	}
}
// Parallel convolution over a flat range of pixel indices
class parallelConv : public cv::ParallelLoopBody
{
private:
	Mat m_src;
	Mat& m_dst;
	Mat m_kernel;
	int sz;
public:
	parallelConv(Mat src, Mat& dst, Mat kernel): m_src(src), m_dst(dst), m_kernel(kernel), sz(kernel.rows / 2)
	{
		cv::copyMakeBorder(src, m_src, sz, sz, sz, sz, cv::BORDER_REPLICATE);
	}
	virtual void operator()(const cv::Range& range) const override
	{
		for (int r = range.start; r < range.end; ++r)
		{
			// Map the flat index r to row i and column j
			const std::div_t pos = std::div(r, m_dst.cols);
			const int i = pos.quot, j = pos.rem;
			double value = 0;
			for (int k = -sz; k <= sz; ++k)
			{
				auto sptr = m_src.ptr(i + sz + k);
				for (int l = -sz; l <= sz; ++l)
					value += m_kernel.at<double>(k + sz, l + sz) * sptr[j + sz + l];
			}
			m_dst.at<uchar>(i, j) = cv::saturate_cast<uchar>(value);
		}
	}
};
// Parallel convolution over a range of rows
class parallelConvByRow : public cv::ParallelLoopBody
{
private:
	Mat m_src;
	Mat& m_dst;
	Mat m_kernel;
	int sz;
	int cols;
public:
	parallelConvByRow(Mat src, Mat& dst, Mat kernel)
		: m_src(src), m_dst(dst), m_kernel(kernel), sz(kernel.rows / 2), cols(src.cols)
	{
		cv::copyMakeBorder(src, m_src, sz, sz, sz, sz, cv::BORDER_REPLICATE);
	}
	virtual void operator()(const cv::Range& range) const override
	{
		for (int i = range.start; i < range.end; ++i)
		{
			if (i >= m_dst.rows)
				continue;
			auto dptr = m_dst.ptr<uchar>(i);
			for (int j = 0; j < cols; ++j)
			{
				double value = 0;
				for (int k = -sz; k <= sz; ++k)
				{
					auto sptr = m_src.ptr(i + sz + k);
					for (int l = -sz; l <= sz; ++l)
						value += m_kernel.at<double>(k + sz, l + sz) * sptr[j + sz + l];
				}
				dptr[j] = cv::saturate_cast<uchar>(value);
			}
		}
	}
};
int main(int argc, char* argv[])
{
	cv::setNumThreads(4);
	Mat src = cv::imread(R"(C:\4.jpg)", cv::IMREAD_GRAYSCALE); // read the image as grayscale
	if (src.empty())
	{
		std::cerr << "Could not read the image!" << std::endl;
		return 1;
	}
	Mat kernel = (cv::Mat_<double>(7, 7) << 0, 0, 0, 0, 0, 0, 0,
		0, 0, -1, -1, -1, 0, 0,
		0, -1, -1, -1, -1, -1, 0,
		0, -1, -1, 24, -1, -1, 0,
		0, -1, -1, -1, -1, -1, 0,
		0, 0, -1, -1, -1, 0, 0,
		0, 0, 0, 0, 0, 0, 0);
	Mat dst1, dst2, dst_seq;
	dst1 = Mat::zeros(src.size(), src.type());
	dst2 = Mat::zeros(src.size(), src.type());
	// Untimed first run, executed before any measurement starts
	parallelConv obj(src, dst1, kernel);
	cv::parallel_for_(cv::Range(0, src.rows * src.cols), obj);
	// Sequential method
	auto start_seq = std::chrono::high_resolution_clock::now();
	seqConv(src, dst_seq, kernel);
	auto end_seq = std::chrono::high_resolution_clock::now();
	std::chrono::duration<double> diff_seq = end_seq - start_seq;
	std::cout << "Time taken by sequential method: " << diff_seq.count() << " s" << std::endl;
	// Method 1: traverse the whole image pixel by pixel
	auto start1 = std::chrono::high_resolution_clock::now();
	parallelConv obj1(src, dst1, kernel);
	cv::parallel_for_(cv::Range(0, src.rows * src.cols), obj1);
	auto end1 = std::chrono::high_resolution_clock::now();
	std::chrono::duration<double> diff1 = end1 - start1;
	std::cout << "Time taken by whole image traversal: " << diff1.count() << " s" << std::endl;
	// Method 2: traverse row by row
	auto start2 = std::chrono::high_resolution_clock::now();
	parallelConvByRow obj2(src, dst2, kernel);
	cv::parallel_for_(cv::Range(0, src.rows), obj2);
	auto end2 = std::chrono::high_resolution_clock::now();
	std::chrono::duration<double> diff2 = end2 - start2;
	std::cout << "Time taken by row-by-row traversal: " << diff2.count() << " s" << std::endl;
	cv::imshow("Original Image", src);
	cv::imshow("Sequential Method", dst_seq);
	cv::imshow("Whole Image Traversal", dst1);
	cv::imshow("Row-by-Row Traversal", dst2);
	cv::waitKey(0);
	return 0;
}

WeChat official account: coding日记

