ICS-E4020: Debugging your code

Correctness issues

In general, always try to isolate the problem first! Figure out what is the smallest, simplest code that still does something unexpected. Remember that you are not limited to use our makefiles and test scripts, but you can always develop e.g. your own unit tests.

Also make sure that you are using Maari-A computers. You do not need to be physically there, you can use ssh.

Strange bugs, segmentation faults, etc.?

First try AddressSanitizer — see below for more information.

Still unexplained segmentation faults?

It might be a stack overflow. Unfortunately, a stack overflow is typically reported as a segmentation fault. In the classroom computers, the stack size limit is approx. 8MB. Do not allocate large arrays on the stack. If you need to allocate storage for megabytes of data, use the heap.

Different behaviour with vs. without make?

It might be a stack overflow (see above). Gnu Make accidentally changes the stack size limits so you might see a stack overflow (segmentation fault) when you run your code directly, yet it might work fine if you run it with make.

Random results? Strange results?

You might be reading wrong parts of the memory. Try AddressSanitizer — see below for more information.

You might be reading memory that is not initialised. Try malloc debugging options — see below for more information.

My CUDA code does not seem to work at all?

Check for errors. Wrap all CUDA API calls in error-checking macros, and also check for errors after each kernel launch. For example, you can define a macro like this:

    #define CHECK_CUDA_ERROR(call) do { \
        cudaError_t result_ = (call); \
        if (result_ != cudaSuccess) { \
            fprintf(stderr, #call " failed: %s\n", \
                    cudaGetErrorString(result_)); \
            exit(1); \
        } \
    } while(0)

And use it like this:

    ...
    CHECK_CUDA_ERROR(cudaMalloc((void**)&x, n));
    CHECK_CUDA_ERROR(cudaMalloc((void**)&y, n));
    ...
    kernel<<<dimGrid, dimBlock>>>(params);
    CHECK_CUDA_ERROR(cudaGetLastError());
    ...

Performance issues

Please always read the task-specific hints first!

Poor performance, strange benchmarks?

Make sure there is no other load on the machine that you use for benchmarking. Try uptime and top to see what is the current load and who is running what there.

Better code seems to run slower?

Read the assembly code of the relevant part, see below for mode details.

Here is one example of a seemingly counterintuitive issue you may encounter: branch predication vs. prediction.

Tools

Using AddressSanitizer — and understanding its output

To enable AddressSanitizer, run make clean and make DEBUG=2 — see README.md for more details on make parameters.

In our environment, AddressSanitizer output may be a bit hard to read; here is a quick tutorial that hopefully helps. I am using a buggy implementation of MF1 as an example. I have included only relevant parts of the messages:

    $ make DEBUG=2 test
    ...
    ... ERROR: AddressSanitizer: stack-buffer-overflow on ...
    READ of size 4 at ...
        #0 0x4034a3 (.../mf1/mf-test+0x4034a3)
    ...
    Address ... is located at offset 36 ...
    This frame has 2 object(s):
        [32, 36) 'in'
    ...

So program mf-test crashed, because we were trying to read at an address that is outside the bounds of any array. Furthermore, it happened to be right after the end of array in, so probably we were simply trying to read beyond the end of array.

This happened in the code at address 0x4034a3, which we can translate to a source code line:

    $ addr2line -e mf-test 0x4034a3
    .../mf1/mf.cc:32

Apparently the problem was at line 32 of mf.cc, which in my case contained:

    float v = in[i + j * nx];

Now it is easy to figure out precisely what went wrong; it turns out that the index i was one too large in certain cases.

Debugging bugs related to uninitialised memory

In C and C++, memory allocation functions typically do not guarantee that memory is initialised with zeros. However, it is easy to forget to initialise newly allocated memory, and in many cases your program may accidentally work correctly as newly allocated memory often happens to contain all zeros.

To better detect bugs related to the use of uninitialised memory accesses on Linux, try to set the environment variable MALLOC_PERTURB_, for example, as follows:

    MALLOC_PERTURB_=191 make test

On Mac OS X you can try to set the environment variable MallocScribble, for example, as follows:

    MallocScribble=1 make test

In the Linux example, the value 191 is convenient as it will fill newly allocated memory with the value 64. If you interpret such values as doubles or floats, you will get reasonable values (other choices might accidentally hide some errors if you get e.g. very small values).

GDB quick start

We will use MF1 here as an example. One part of its test suite is a small program called mf-test which will call mf function with some small inputs. Let us compile it with debug options and open it in the debugger, using GDB with its text user interface:

    $ make clean
    $ make DEBUG=1
    $ gdb -tui mf-test

Then use command b to set a breakpoint in function mf and start the program with command r; it will execute until it reaches function mf:

    (gdb) b mf
    Breakpoint 1 at ...: file mf.cc, line ...
    (gdb) r
    Starting program: .../mf-test
    ...
    Breakpoint 1 ...

You should have a split-screen text user interface, with the upper half of the screen showing your source code and the lower half showing the GDB console. You can scroll the source code with arrow keys; highlighting shows which line will be executed next. You can now e.g. execute code line by line with command n:

    (gdb) n
    (gdb) n
    ...

At any point you can also show the contents of any variable with command p:

    (gdb) p ny
    ... = 1
    (gdb) p in[0]
    ... = 0

We can continue execution until we reach the breakpoint again with c:

    (gdb) c
    Continuing.

    Breakpoint 1 ...

Other highly useful commands include bt for showing the stack backtrace (who called us), and q for quitting GDB. See the manual for more information; the split-screen text user interface is documented in section TUI.

Reading assembly code

The makefiles provide two ways of outputting the assembly code produced by the compiler. For example, if you want to see the compiled version of mf.cc, try the following commands:

    make mf.asm1
    make mf.asm2

Then open the file mf.asm1 or mf.asm2 in your text editor. Both of these try to produce somewhat readable assembly code, but it may depend on your luck which of these is more readable in your case.

Usually the assembly code is very long, and the most challenging part is finding the relevant part of it quickly. Here is one trick that you can use:

Another trick is to simply search for the relevant instruction. For example, you can search for vmulps to find all places in which you are multiplying float8_t vectors.