C++ std::accumulate and std::reduce

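cppreference has a test case, but I could not reproduce the results it reports, possibly because of the CPU model and the compiler options.

Different compilers also optimize differently, but the overall trend is the same.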
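The reduce.cpp source is not reproduced in this post, so here is only a minimal sketch of what such a timing harness might look like. The names TimedSum, time_sum, timed_accumulate and timed_reduce are hypothetical, and the execution-policy overloads (seq, par) are deliberately left out because, depending on the standard library, they may need an extra parallel backend at link time.

```cpp
#include <chrono>
#include <numeric>
#include <vector>

// Result of one timed run: the computed sum and the elapsed wall-clock time.
template <typename T>
struct TimedSum {
    T value;
    double millis;
};

// Runs f once and measures it with a monotonic clock.
template <typename F>
auto time_sum(F&& f) -> TimedSum<decltype(f())> {
    const auto t0 = std::chrono::steady_clock::now();
    auto value = f();
    const auto t1 = std::chrono::steady_clock::now();
    return {value, std::chrono::duration<double, std::milli>(t1 - t0).count()};
}

// Strict left-to-right fold; the accumulator has the type of the initial value.
template <typename T>
TimedSum<T> timed_accumulate(const std::vector<T>& v) {
    return time_sum([&] { return std::accumulate(v.begin(), v.end(), T{}); });
}

// std::reduce (C++17, <numeric>) is allowed to regroup and reorder the additions.
template <typename T>
TimedSum<T> timed_reduce(const std::vector<T>& v) {
    return time_sum([&] { return std::reduce(v.begin(), v.end(), T{}); });
}
```

For an integer vector such as std::vector<long>(1000, 1), both functions should return the same value and differ only in the measured time. A single timed run is noisy, so repeating each measurement a few times is probably worthwhile.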

$ g++ -std=c++17 -O3 -march=native reduce.cpp -o reduce && ./reduce
std::accumulate (double) result 10000000.7 took 192.5 ms
std::reduce (seq, double) result 10000000.7 took 72.7 ms
std::reduce (par, double) result 10000000.7 took 72.3 ms
std::accumulate (long) result 100000007 took 37.1 ms
std::reduce (seq, long) result 100000007 took 45.8 ms
std::reduce (par, long) result 100000007 took 46.4 ms

Clang optimizes std::accumulate better, at least for double (90.4 ms versus 192.5 ms with GCC).

$ clang++ -std=c++17 -O3 -march=native reduce.cpp -o reduce && ./reduce
std::accumulate (double) result 10000000.7 took 90.4 ms
std::reduce (seq, double) result 10000000.7 took 45.1 ms
std::reduce (par, double) result 10000000.7 took 46.3 ms
std::accumulate (long) result 100000007 took 36.4 ms
std::reduce (seq, long) result 100000007 took 49.4 ms
std::reduce (par, long) result 100000007 took 49.5 ms

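std::execution::seq and std::execution::par make no difference, with either GCC or Clang. A plausible reason is that par has no parallel backend to use here: with libstdc++ the parallel algorithms rely on Intel TBB, which the commands above do not link, so par may simply run sequentially.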

In everyday projects int and float are the types used most, but after adding these two types to the test I was even more confused.

$ g++ -std=c++17 -O3 -march=native reduce.cpp -o reduce && ./reduce
std::accumulate (double) result 10000000.7 took 192.3 ms
std::reduce (seq, double) result 10000000.7 took 72.8 ms
std::reduce (par, double) result 10000000.7 took 72.2 ms
std::accumulate (float) result 10000000.8 took 279.4 ms
std::reduce (seq, float) result 8388608.0 took 70.1 ms
std::reduce (par, float) result 8388608.0 took 70.4 ms
std::accumulate (long) result 100000007 took 40.5 ms
std::reduce (seq, long) result 100000007 took 54.9 ms
std::reduce (par, long) result 100000007 took 54.9 ms
std::accumulate (int) result 100000007 took 20.7 ms
std::reduce (seq, int) result 100000007 took 27.5 ms
std::reduce (par, int) result 100000007 took 28.7 ms

Is GCC really this bad at float? Clang does much better on float by comparison, yet GCC handles int better than Clang in std::reduce (27.5 ms versus 45.4 ms).

$ clang++ -std=c++17 -O3 -march=native reduce.cpp -o reduce && ./reduce
std::accumulate (double) result 10000000.7 took 89.9 ms
std::reduce (seq, double) result 10000000.7 took 45.4 ms
std::reduce (par, double) result 10000000.7 took 47.3 ms
std::accumulate (float) result 10000000.8 took 83.8 ms
std::reduce (seq, float) result 8388608.0 took 27.9 ms
std::reduce (par, float) result 8388608.0 took 27.7 ms
std::accumulate (long) result 100000007 took 36.9 ms
std::reduce (seq, long) result 100000007 took 47.0 ms
std::reduce (par, long) result 100000007 took 47.5 ms
std::accumulate (int) result 100000007 took 18.6 ms
std::reduce (seq, int) result 100000007 took 45.4 ms
std::reduce (par, int) result 100000007 took 46.3 ms
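One detail in the float rows is worth a second look: std::reduce prints 8388608.0 (which is 2^23) while std::accumulate prints 10000000.8. That pattern is consistent with accumulate carrying its sum in double and reduce carrying it in float, where the sum stops growing once the addend falls below half the spacing between adjacent floats. This is only a guess, since the benchmark source is not shown. The sketch below assumes 100000007 copies of 0.1f (inferred from the printed sums, not confirmed); all function names are hypothetical, and the "by fours" grouping is just one regrouping that std::reduce is permitted to use, not necessarily what either library does.

```cpp
#include <cstddef>

// Left-to-right sum of n copies of x, carried in a float accumulator.
inline float sum_in_float(float x, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += x;
    return sum;
}

// Same order, but carried in a double accumulator, as
// std::accumulate(first, last, 0.0) would do for float elements.
inline double sum_in_double(float x, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += x;
    return sum;
}

// Hypothetical regrouping: add four elements together first,
// then fold that partial sum into the float total.
inline float sum_in_float_by_fours(float x, std::size_t n) {
    float sum = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const float pair1 = x + x;
        const float pair2 = x + x;
        sum += pair1 + pair2;
    }
    for (; i < n; ++i) sum += x;  // leftover elements, one at a time
    return sum;
}
```

With IEEE 754 single precision and no -ffast-math, sum_in_float(0.1f, 100000007) should stall at 2097152 (2^21) and the by-fours variant at 8388608 (2^23), while the double accumulator stays close to 10000000.8. If that is what happens in the benchmark, the float timings compare two different computations and not only two implementations of the same sum.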

A baseline is still needed, so let's hand-write a summation loop.

template<typename T>
T type_sum(const T *data, size_t size) {
    T sum{};