我正在用GDB调试一个24线程的程序,现在我已经找到了代码中的哪一行出现了错误,但是我无法从GDB的输出中看出错误是什么。下面这行代码导致了这个错误,它只是一个正常的插入到地图结构中。
current_node->children.insert(std::pair<string, ComponentTrieNode*>(comps[j], temp_node));
我用GDB查出错误发生在哪个线程,并切换到该线程,backtrace
命令显示了堆栈中的函数调用。(最后几行试图打印一个函数中一些变量的值,但失败了)。
我应该怎么做才能清楚地知道发生了什么错误?
[root@localhost nameComponentEncoding]# gdb NCE_david
GNU gdb (GDB) Fedora (7.2.90.20110429-36.fc15)
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /mnt/disk2/experiments_BLOODMOON/two_stage_bloom_filter/programs/nameComponentEncoding/NCE_david...done.
(gdb) r /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
Starting program: /mnt/disk2/experiments_BLOODMOON/two_stage_bloom_filter/programs/nameComponentEncoding/NCE_david /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
[Thread debugging using libthread_db enabled]
[New Thread 0x7fffd2bf5700 (LWP 13129)]
[New Thread 0x7fffd23f4700 (LWP 13130)]
[New Thread 0x7fffd1bf3700 (LWP 13131)]
[New Thread 0x7fffd13f2700 (LWP 13132)]
[New Thread 0x7fffd0bf1700 (LWP 13133)]
[New Thread 0x7fffd03f0700 (LWP 13134)]
[New Thread 0x7fffcfbef700 (LWP 13135)]
[New Thread 0x7fffcf3ee700 (LWP 13136)]
[New Thread 0x7fffcebed700 (LWP 13137)]
[New Thread 0x7fffce3ec700 (LWP 13138)]
[New Thread 0x7fffcdbeb700 (LWP 13139)]
[New Thread 0x7fffcd3ea700 (LWP 13140)]
[New Thread 0x7fffccbe9700 (LWP 13141)]
[New Thread 0x7fffcc3e8700 (LWP 13142)]
[New Thread 0x7fffcbbe7700 (LWP 13143)]
[New Thread 0x7fffcb3e6700 (LWP 13144)]
[New Thread 0x7fffcabe5700 (LWP 13145)]
[New Thread 0x7fffca3e4700 (LWP 13146)]
[New Thread 0x7fffc9be3700 (LWP 13147)]
[New Thread 0x7fffc93e2700 (LWP 13148)]
[New Thread 0x7fffc8be1700 (LWP 13149)]
[New Thread 0x7fffc83e0700 (LWP 13150)]
[New Thread 0x7fffc7bdf700 (LWP 13151)]
this is thread 1
this is thread 7
this is thread 14
this is thread 18
this is thread 2
this is thread 19
this is thread 6
this is thread 8
this is thread 24
base: 64312646
this is thread 11
this is thread 5
this is thread 12
this is thread 13
this is thread 3
this is thread 15
this is thread 16
this is thread 17
this is thread 4
this is thread 20
this is thread 21
this is thread 22
this is thread 23
this is thread 9
this is thread 10
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc8be1700 (LWP 13149)]
std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=@0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
126 __x->_M_right = __y->_M_left;
(gdb) info threads
Id Target Id Frame
24 Thread 0x7fffc7bdf700 (LWP 13151) "NCE_david" compare (__n=<optimized out>, __s2=<optimized out>, __s1=<optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/char_traits.h:257
(... other 22 threads not listed)
2 Thread 0x7fffd2bf5700 (LWP 13129) "NCE_david" compare (__n=<optimized out>, __s2=<optimized out>, __s1=<optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/char_traits.h:257
1 Thread 0x7ffff7fe57a0 (LWP 13126) "NCE_david" strtok () at ../sysdeps/x86_64/strtok.S:76
(gdb) thread 22
[Switching to thread 22 (Thread 0x7fffc8be1700 (LWP 13149))]
#0 std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=@0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
126 __x->_M_right = __y->_M_left;
(gdb) bt
#0 std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=@0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
#1 0x0000003cdd26e848 in std::_Rb_tree_insert_and_rebalance (__insert_left=<optimized out>, __x=0x7fffc0005ba0, __p=<optimized out>, __header=...)
at ../../../../libstdc++-v3/src/tree.cc:266
#2 0x00000000004029ca in std::_Rb_tree<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*>, std::_Select1st<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> > >::_M_insert_ (this=0x608108, __x=<optimized out>, __p=0x16cd3e30, __v=...)
at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/stl_pair.h:87
#3 0x0000000000402b7d in std::_Rb_tree<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*>, std::_Select1st<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ComponentTrieNode*> > >::_M_insert_unique (this=0x608108, __v=...)
at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/stl_tree.h:1281
#4 0x000000000040444c in insert (__x=..., this=0x608108) at /usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/stl_map.h:518
#5 ComponentTrie::add_prefix (this=0x7fffffffe2e0, prefix_input=<optimized out>, port=10) at ComponentTrie_david.cpp:112
#6 0x0000000000401c3b in main._omp_fn.0 () at NameComponentEncoding_david.cpp:277
#7 0x0000003cd2607fea in gomp_thread_start (xdata=<optimized out>) at ../../../libgomp/team.c:115
#8 0x0000003cd0607cd1 in start_thread (arg=0x7fffc8be1700) at pthread_create.c:305
#9 0x0000003cd02dfd3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) p 'ComponentTrie::add_prefix(char*, int)'::comps[j]
No symbol "comps" in specified context.
(gdb) p 'ComponentTrie::add_prefix(char*, int)'::prefix
No symbol "prefix" in specified context.
编辑:我用valgrind --tool=memcheck
运行了这个代码,结果如下。
[root@localhost nameComponentEncoding]# valgrind --tool=memcheck ./NCE_david /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
(... many lines omitted)
==13261==
==13261== Thread 11:
==13261== Invalid read of size 1
==13261== at 0x3CD02849BC: strtok (strtok.S:141)
==13261== by 0x40426A: ComponentTrie::add_prefix(char*, int) (ComponentTrie_david.cpp:99)
==13261== by 0x40242C: main._omp_fn.0 (NameComponentEncoding_david.cpp:531)
==13261== by 0x3CD2607FE9: gomp_thread_start (team.c:115)
==13261== by 0x3CD0607CD0: start_thread (pthread_create.c:305)
==13261== by 0x3CD02DFD3C: clone (clone.S:115)
==13261== Address 0x234422c02 is not stack'd, malloc'd or (recently) free'd
==13261==
==13261== Invalid read of size 1
==13261== at 0x3CD02849EC: strtok (strtok.S:167)
==13261== by 0x40426A: ComponentTrie::add_prefix(char*, int) (ComponentTrie_david.cpp:99)
==13261== by 0x40242C: main._omp_fn.0 (NameComponentEncoding_david.cpp:531)
==13261== by 0x3CD2607FE9: gomp_thread_start (team.c:115)
==13261== by 0x3CD0607CD0: start_thread (pthread_create.c:305)
==13261== by 0x3CD02DFD3C: clone (clone.S:115)
==13261== Address 0x234422c02 is not stack'd, malloc'd or (recently) free'd
==13261==
Insertion and lookup cost time(us): 994669532 67108864 14.821731 0.067469
component number:4849478, state number: 2545847
Parallel threads:24
==13261==
==13261== HEAP SUMMARY:
==13261== in use at exit: 4,239,081,584 bytes in 76,746,193 blocks
==13261== total heap usage: 80,050,114 allocs, 3,303,921 frees, 4,323,622,103 bytes allocated
==13261==
==13261== LEAK SUMMARY:
==13261== definitely lost: 0 bytes in 0 blocks
==13261== indirectly lost: 0 bytes in 0 blocks
==13261== possibly lost: 4,111,951,106 bytes in 74,746,429 blocks
==13261== still reachable: 127,130,478 bytes in 1,999,764 blocks
==13261== suppressed: 0 bytes in 0 blocks
==13261== Rerun with --leak-check=full to see details of leaked memory
==13261==
==13261== For counts of detected and suppressed errors, rerun with: -v
==13261== Use --track-origins=yes to see where uninitialised values come from
==13261== ERROR SUMMARY: 45 errors from 30 contexts (suppressed: 6 from 6)
1 个回答
0 人赞同
我们知道,程序在这一行发生了隔离故障。
current_node->children.insert(std::pair<string, ComponentTrieNode*>(comps[j], temp_node));
从堆栈跟踪中,我们知道segfault发生在std::map
的红黑树实现深处。
#0 std::local_Rb_tree_rotate_left (__x=0xa057c90, __root=@0x608118) at ../../../../libstdc++-v3/src/tree.cc:126
126 __x->_M_right = __y->_M_left;
这意味着。
The segfault could be caused by:
evaluating __x->_M_right
evaluating __y->_M_left
storing the right hand side to the left hand side of __x->_M_right = __y->_M_left
std::map::insert()
being called implies that the segfault was NOT caused while building the arguments to the call. In particular comps[j]
is not out of bounds.
这使我认为你的堆在这个时候已经被以前的内存操作错误破坏了,std::map::insert()
中的崩溃是一个症状,而不是一个原因。
在Valgrind memcheck工具下运行你的程序。
$ valgrind --tool=memcheck /mnt/disk2/experiments_BLOODMOON/two_stage_bloom_filter/programs/nameComponentEncoding/NCE_david /mnt/disk2/FIB_with_port/10_1.txt /mnt/disk2/trace/a_10_1.trace /mnt/disk2/FIB_with_port/10_2.txt
并在事后仔细阅读Valgrind的输出,找出你程序中的第一个内存错误。
Valgrind是作为一个虚拟的CPU来实现的,所以你的程序会减慢30倍左右。这很耗费时间,但应该可以让你在排除问题方面取得进展。
除了Valgrind之外,你可能还想试试为libstdc++
容器启用调试模式:
要使用libstdc++的调试模式,请用编译器标志-D_GLIBCXX_DEBUG编译你的应用程序。注意这个标志会改变标准类模板的大小和行为,比如std::vector,因此你只能把用调试模式编译的代码和没有调试模式编译的代码连接起来,如果在两个翻译单元之间没有传递容器的实例化。
如果你的程序没有使用外部库,那么就可以用以下方法重建整个程序-D_GLIBCXX_DEBUG添加到CXXFLAGS中的Makefile
应该可以工作。否则,你需要知道C++容器是否在使用和不使用调试标志编译的组件之间传递。
Valgrind Log Review
我很惊讶你在一个多线程程序中使用strtok()
。是ComponentTrie::add_prefix()从来没有从两个线程中同时调用?在通过检查strtok()
是如何被使用的来修复无效读取的同时,在组件Trie_david.cpp:99你可能想把strtok()
替换成strtok_r()也是如此。
Concurrent Access to STL Containers
标准的C++容器是explicitly documented不做线程同步。