So this month is going to be my no-printf-debugging month: I will only modify the source code of a program to debug it as a last resort.

Meet the culprit

Today I had a quick look at a problem with the compiler itself. The compiler segfaulted while compiling code with very long lists (PR#5368).
The following bash command generates a file (big_list.ml) that triggers the failure:

cat > big_list.ml <<EOF
let big x = [
$(yes "true;" | head -n 100000)
]
EOF

Looking at backtraces

This smelled like a stack overflow: the stack has a fixed size, and if you chain too many function calls you blow through it and may get a segfault. Sure enough, after raising the stack size limit (ulimit -s 50000, i.e. roughly 50 MB) the compilation ran fine, so we are most likely looking for a stack overflow. Those are usually caused by non-tail-recursive functions and are really easy to find:
- load the binary in gdb with gdb --args ocamlopt.opt big_list.ml
- run it (run) until it blows up
- look at the stack (bt): one or several functions should keep appearing over and over.
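To make the "non-tail-recursive" point concrete, here is a minimal OCaml sketch (sum_naive and sum_tail are names made up for illustration, not code from the compiler). The first function still has work to do after its recursive call returns, so it keeps one stack frame per list element; the second passes an accumulator so the recursive call is in tail position and runs in constant stack space.

```ocaml
(* Non-tail-recursive: the addition happens after the recursive call
   returns, so every element keeps a stack frame alive. On a long
   enough list this overflows the stack. *)
let rec sum_naive = function
  | [] -> 0
  | x :: rest -> x + sum_naive rest

(* Tail-recursive: the recursive call is the last thing [go] does, so
   it does not grow the stack, whatever the length of the list. *)
let sum_tail l =
  let rec go acc = function
    | [] -> acc
    | x :: rest -> go (acc + x) rest
  in
  go 0 l

let () =
  let small = List.init 1_000 (fun i -> i) in
  (* Both versions agree on short inputs. *)
  assert (sum_naive small = sum_tail small);
  (* prints 499500 *)
  Printf.printf "%d\n" (sum_tail small)
```

A function shaped like sum_naive is exactly what shows up over and over in the bt output when the input is deep enough.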

Using breakpoints

In my case the backtrace was a bit anticlimactic:

#0  0x000000000058150d in camlIdent__find_same_16167 ()
#1  0x0000000000000000 in ?? ()

The lack of a proper backtrace could be due to one of several things:
- ocamlopt's calling convention is not the same as C's, and this could throw gdb off.
- the OCaml runtime has code to detect stack overflows (./asmrun/signals_asm.c). It works by registering a signal handler for SIGSEGV, examining the faulting address and raising an exception (Stack_overflow) if the fault is a stack overflow. This code runs inside a Unix signal handler, a very restricted environment in which you are not allowed to do much (e.g. you cannot call malloc); it might be doing something illegal and/or messing up the stack.
Still, the top frame tells us where the overflow happened: camlIdent__find_same_16167. The OCaml compiler assigns symbols to functions following the naming convention caml<module name>__<function name>_<integer>. In this case the function is find_same in the Ident module (typing/ident.ml). Let's find out who is calling this function by using breakpoints. Now, before starting the program in gdb, we set a breakpoint on the function:

(gdb) break camlIdent__find_same_16167
Breakpoint 1 at 0x5814f0
(gdb) run
Starting program: /opt/ocaml-exp/bin/ocamlopt.opt big_list.ml

Breakpoint 1, 0x00000000005814f0 in camlIdent__find_same_16167 ()

We want to cross this breakpoint enough times to get a nice, fat backtrace:

(gdb) ignore 1 500
Will ignore next 500 crossings of breakpoint 1.
(gdb) continue
Continuing.

Breakpoint 1, 0x00000000005814f0 in camlIdent__find_same_16167 ()

Looking at the backtrace (bt), we can now clearly see that camlTypecore__type_construct_206357 appears a lot on the stack and, sure enough, the corresponding code in typing/typecore.ml is not tail recursive. In our case the easiest solution is probably to change our code generator to output the list in chunks:

let v0 = []
let v1 = true :: true :: ..... :: v0
let v2 = true :: true :: ..... :: v1
....
let v = vn
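The chunking idea can be sketched as a small OCaml generator. This is a hypothetical helper (chunked_source and its chunk_size and elem parameters are made up here, not the original code generator): it emits v0 = [] and then one binding per chunk, each prepending at most chunk_size elements to the previous binding, so no single list expression is deeper than chunk_size.

```ocaml
(* Hypothetical generator: emit OCaml source for a list of [n] copies of
   [elem], split into bindings of at most [chunk_size] elements each, so
   that no single list expression is deep enough to blow the type
   checker's stack. *)
let chunked_source ?(chunk_size = 1_000) ~elem n =
  let buf = Buffer.create 4096 in
  Buffer.add_string buf "let v0 = []\n";
  (* [emit i remaining]: v<i> is the last binding written so far;
     returns the index of the final binding. *)
  let rec emit i remaining =
    if remaining = 0 then i
    else begin
      let k = min chunk_size remaining in
      Printf.bprintf buf "let v%d =" (i + 1);
      for _i = 1 to k do
        Printf.bprintf buf " %s ::" elem
      done;
      Printf.bprintf buf " v%d\n" i;
      emit (i + 1) (remaining - k)
    end
  in
  let last = emit 0 n in
  Printf.bprintf buf "let v = v%d\n" last;
  Buffer.contents buf

let () =
  print_string (chunked_source ~chunk_size:2 ~elem:"true" 5)
```

With chunk_size:2 and 5 elements this prints the bindings v0 = [], v1 = true :: true :: v0, v2 = true :: true :: v1, v3 = true :: v2 and finally v = v3. Each binding is type-checked separately, so the recursion depth is bounded by chunk_size rather than by the total length of the list.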

Finding a function's symbol

Last but not least: if you wanted to put a breakpoint on the function type_argument in typecore.ml, you'd have to figure out its symbol name:

> nm /opt/ocaml-exp/bin/ocamlopt.opt | grep camlTypecore__type_argument
0000000000526b70 T camlTypecore__type_argument_206355
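If you do this often, the caml<module name>__<function name>_<integer> convention is easy to decode mechanically. Here is a minimal sketch (parse_symbol is a hypothetical helper) that splits a symbol into module, function and integer suffix. It assumes the module name contains no "__" itself, which does not hold for library-prefixed modules such as Stdlib__List, and that the suffix is purely numeric.

```ocaml
(* Hypothetical helper: split a native-code symbol following the
   caml<module>__<function>_<integer> convention into its parts.
   Assumes the module name itself contains no "__". *)
let parse_symbol (sym : string) : (string * string * int) option =
  let prefix = "caml" in
  let plen = String.length prefix in
  let len = String.length sym in
  if len <= plen || String.sub sym 0 plen <> prefix then None
  else
    (* The first "__" after the "caml" prefix ends the module name. *)
    let rec find_sep i =
      if i + 1 >= len then None
      else if sym.[i] = '_' && sym.[i + 1] = '_' then Some i
      else find_sep (i + 1)
    in
    match find_sep plen with
    | None -> None
    | Some sep ->
      (* The integer suffix follows the last '_' of the symbol. *)
      match String.rindex_opt sym '_' with
      | Some last when last > sep + 2 ->
        let modname = String.sub sym plen (sep - plen) in
        let fname = String.sub sym (sep + 2) (last - sep - 2) in
        let stamp = String.sub sym (last + 1) (len - last - 1) in
        (match int_of_string_opt stamp with
         | Some n -> Some (modname, fname, n)
         | None -> None)
      | _ -> None

let () =
  let nm_line = "0000000000526b70 T camlTypecore__type_argument_206355" in
  (* nm prints: address, symbol type, symbol name. *)
  match String.split_on_char ' ' nm_line with
  | [ _addr; _kind; sym ] ->
    (match parse_symbol sym with
     | Some (m, f, n) -> Printf.printf "module=%s function=%s id=%d\n" m f n
     | None -> print_endline "not an OCaml symbol")
  | _ -> print_endline "unexpected nm line"
```

On the nm line above this prints module=Typecore function=type_argument id=206355. Function names may contain underscores (find_same, type_argument), which is why the sketch takes the integer after the last '_' rather than the first.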