C++ std::accumulate vs. std::reduce
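cppreference has a test case for this, but I could not reproduce the effect it describes; that probably comes down to the CPU model and the compiler flags.
Different compilers also optimize differently, though the overall trend is the same.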
$ g++ -std=c++17 -O3 -march=native reduce.cpp -o reduce && ./reduce
std::accumulate (double) result 10000000.7 took 192.5 ms
std::reduce (seq, double) result 10000000.7 took 72.7 ms
std::reduce (par, double) result 10000000.7 took 72.3 ms
std::accumulate (long) result 100000007 took 37.1 ms
std::reduce (seq, long) result 100000007 took 45.8 ms
std::reduce (par, long) result 100000007 took 46.4 ms
Clang, by contrast, optimizes std::accumulate better, at least for double:
$ clang++ -std=c++17 -O3 -march=native reduce.cpp -o reduce && ./reduce
std::accumulate (double) result 10000000.7 took 90.4 ms
std::reduce (seq, double) result 10000000.7 took 45.1 ms
std::reduce (par, double) result 10000000.7 took 46.3 ms
std::accumulate (long) result 100000007 took 36.4 ms
std::reduce (seq, long) result 100000007 took 49.4 ms
std::reduce (par, long) result 100000007 took 49.5 ms
std::execution::seq and std::execution::par make no difference with either GCC or Clang.
The types I use most in everyday projects are int and float. After adding those two to the test, I was even more confused:
$ g++ -std=c++17 -O3 -march=native reduce.cpp -o reduce && ./reduce
std::accumulate (double) result 10000000.7 took 192.3 ms
std::reduce (seq, double) result 10000000.7 took 72.8 ms
std::reduce (par, double) result 10000000.7 took 72.2 ms
std::accumulate (float) result 10000000.8 took 279.4 ms
std::reduce (seq, float) result 8388608.0 took 70.1 ms
std::reduce (par, float) result 8388608.0 took 70.4 ms
std::accumulate (long) result 100000007 took 40.5 ms
std::reduce (seq, long) result 100000007 took 54.9 ms
std::reduce (par, long) result 100000007 took 54.9 ms
std::accumulate (int) result 100000007 took 20.7 ms
std::reduce (seq, int) result 100000007 took 27.5 ms
std::reduce (par, int) result 100000007 took 28.7 ms
Is GCC really that bad at float? Clang does far better here, yet GCC beats Clang on int, at least with std::reduce:
$ clang++ -std=c++17 -O3 -march=native reduce.cpp -o reduce && ./reduce
std::accumulate (double) result 10000000.7 took 89.9 ms
std::reduce (seq, double) result 10000000.7 took 45.4 ms
std::reduce (par, double) result 10000000.7 took 47.3 ms
std::accumulate (float) result 10000000.8 took 83.8 ms
std::reduce (seq, float) result 8388608.0 took 27.9 ms
std::reduce (par, float) result 8388608.0 took 27.7 ms
std::accumulate (long) result 100000007 took 36.9 ms
std::reduce (seq, long) result 100000007 took 47.0 ms
std::reduce (par, long) result 100000007 took 47.5 ms
std::accumulate (int) result 100000007 took 18.6 ms
std::reduce (seq, int) result 100000007 took 45.4 ms
std::reduce (par, int) result 100000007 took 46.3 ms
I still need a baseline, so let me hand-write a summation loop:
template<typename T>
T type_sum(const T *data, size_t size) {
T sum{};