log in | about 

Folks who program for RISC-processors need to ensure proper alignment of certain data types. The Intel people, however, may not have such concerns. Neither they need to fear about crashing on unaligned data reads, nor unaligned reads have a substantial impact on performance. Unless, of course, we explicitly use special assembly store/load instructions that do require the data to be aligned. In this case, if we mess up, we get a deserved punishment for being fancy.

Simply speaking, folks writing in plain standard C++ should not worry about such things, right? Wrong, compilers are becoming increasingly smart and they may automatically unroll your loops. In that, they will generate Single Instruction Multiple Data assembly instructions. These instructions, which are used to carry out operations on small vectors, may require aligned data.

Consider the following code:

  1. double dist(const double*x,const double*y,size_t qty){
  2. double res=0;
  3. for (size_t i = 0; i < qty; i++)
  4. res += x[i] - y[i];
  5. return res;
  6. }

There is a loop and this loop can be unrolled. More specifically, a compiler may generate instructions to carry out 2 additions/subtractions if the CPU supports SSE extensions, or 4 additions/subtractions in the case of AVX extensions being available. This technique is commonly known as vectorization.

On the up side, vectorization can boost performance of your programs. On the down side, a compiler may use load/store instructions that assume the data to be aligned. In the above example, the compiler may assume that pointers x and y are aligned on an 8-byte boundary. But what if they are not? Your program will crash! Turns out this is a known issue, which was reported as a bug. Yet, the GCC folks closed the issue without fixing, because alignment behavior is, apparently, not enforced by the standard.

For those who want to learn more details, I created a reproducible example. If you type make, this will create two programs testalign_crash and testalign_nocrash (as well as corresponding assembly code). I used GNU C++ 4.7, but the example might not work in your specific case. This is why I also committed the assembly code that is produced in both cases: One where the program crashes and another when it does not. In the assembly of the crashing program there is an aligned read command vmovapd. If you replace all unaligned reads vmovapd with their unaligned "siblings" vmovupd, you will get the program that works without crashing (see the assembly code vectorized_manually_fixed.s) To compile the assembly file, just type:

gcc vectorized_manually_fixed.s -lstdc++

I think that one needs to understand the issue at least in some detail, unless she has spare 12 hours to debug weird program behavior. What is really "nice": An alignment-violation segfault looks exactly the same way as memory corruption segfaults. At least, I don't know how to tell the difference.

UPDATE1: It is also recommended to use -Wcast-align, though, I suspect this is not a bullet-proof recommendation.
UPDATE2: Also note that there is apparently no good reason for GCC folks to use aligned reads (performancewise it's about the same as unaligned ones). But with these kind of optimizations, a lot of old code could crash now.
UPDATE3: Following a hint by Nathan Kurz, I added handlers to catch segmentation fault and bus error signals. Now, you can see that the OS/CPU generate the segmentation fault signal, not the bus error signal. To do so, you need to uncomment the line:

  1. // Uncomment this line to see which signal handler is activated.
  2. //#ifdef CATCH_SIGBUS