This post is about the Clang C++ compiler and how we use it here at Belvedere. If you’re not familiar with Clang, it is a C++ compiler built on top of the LLVM toolchain. The documentation is quite good and can be found here: http://clang.llvm.org/docs/index.html
Our interest in Clang began after watching this video from Google’s Chandler Carruth, which is essentially a live demo of many of the tools built into and around the Clang compiler. If you haven’t seen it before, it is a bit long (over 90 minutes in total), but definitely worth watching.
The use of Clang here at Belvedere grew out of a Hackathon project. The goal of the project was to evaluate how Clang could help us improve the quality of our software and possibly make development easier at the same time. The rest of this post talks about what we have done, which tools we are currently using, and how we are planning to use Clang in the future.
WHAT WE HAVE DONE
This section covers how Belvedere has used Clang to improve the quality of our software, mainly discussing the Clang compiler itself and Clang’s sanitizers.
The Clang Compiler Itself
Our first step for the Hackathon was just getting the code to compile with Clang. This seemed like it should be easy, but it actually consumed a large portion of the Hackathon. The largest hurdle wasn’t fixing compilation errors; it was hooking Clang into our existing build infrastructure. We use CMake, and we had to update our infrastructure to tell CMake where to find all of the libraries that our build links against, whereas our current compiler did a lot of this for us automatically. Once we got past this, our Hackathon team was able to actually compile our code. This is when we saw how great Clang’s warning and error messages really are. Just getting our code to compile under Clang allowed us to find numerous bugs.
Here are examples of a couple of gems that the Clang compiler found (you can just assume that a, b, and c are ints).
d = (a + b) , c;
We were actually surprised that this compiled, and it was a good opportunity to brush up on some esoteric uses of the comma operator. In case you aren’t sure what this code does (I sure wasn’t): assignment binds more tightly than the comma, so c is evaluated and thrown away, and the statement is basically the same as:
d = (a + b);
What is great about Clang is that even though this is valid C++ code, it was able to warn us that this was probably not what we wanted to do. Once we saw this, we were able to fix the typo. The code was actually supposed to do this:
d = (a + b) / c;
The second gem looked like this:
d = a; + b + c;
Again, we were surprised that this even compiled with an errant semicolon placed after a. However, it turns out that this is also valid C++: d = a; is a complete statement, and + b + c; is a second expression statement (the leading + is a unary plus) whose result is computed and then discarded. Again, the Clang compiler was smart enough to warn us that this is probably not what we wanted to do.
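To make the behavior of these two typos concrete, here is a minimal sketch that wraps each one in a small function. The function names are made up for illustration and are not from our codebase. Clang typically reports both discarded results under its unused-value warnings.

```cpp
#include <cassert>

// Hypothetical helpers that reproduce the two typos discussed above.

// Comma typo: parsed as (d = (a + b)), c; so c is evaluated and discarded.
int comma_typo(int a, int b, int c) {
    int d = 0;
    d = (a + b) , c;
    return d;  // a + b
}

// Semicolon typo: "d = a;" is one statement, "+ b + c;" is a second
// expression statement whose result is thrown away.
int semicolon_typo(int a, int b, int c) {
    int d = 0;
    d = a; + b + c;
    return d;  // a
}

// What the first line was meant to compute.
int intended(int a, int b, int c) {
    assert(c != 0);  // guard against division by zero
    return (a + b) / c;
}
```

With a = 6, b = 4, and c = 2, comma_typo returns 10, semicolon_typo returns 6, and intended returns 5, so neither typo produces the value the author had in mind.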
Once we got everything compiling, we were able to move on to some of Clang’s more interesting tools.
SANITIZERS (a.k.a. run-time error detection)
For us, the most exciting tools provided by Clang are called “sanitizers,” which enable you to detect a whole gamut of errors at runtime. The main sanitizers are:
- Address Sanitizer: Detects memory errors, including out-of-bounds accesses, use-after-free, memory leaks, etc. [see Chandler’s video 1:04:00]
- Memory Sanitizer: Detects uninitialized reads [see Chandler’s video 1:26:20]
- Thread Sanitizer: Detects race conditions between threads [see Chandler’s video 1:13:10]
- Undefined Behavior Sanitizer: Detects misaligned pointers, null pointers, signed integer overflow, etc.
Once we got our code compiling, we started turning on these sanitizers and running them with our automated tests.
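Each sanitizer is switched on with a -fsanitize= compiler flag (address, memory, thread, or undefined). As a rough illustration of the kind of bug Thread Sanitizer is designed to flag, here is a minimal sketch. The function names are invented for this example, and the build command in the comment is just one plausible invocation.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example, not taken from our codebase.
// One possible build: clang++ -g -fsanitize=thread -pthread race.cpp

// Racy version: two threads increment a plain int with no synchronization.
// This is a data race, so the returned value is not reliable.
int racy_count(int iterations) {
    int counter = 0;
    auto work = [&counter, iterations] {
        for (int i = 0; i < iterations; ++i) {
            ++counter;  // unsynchronized read-modify-write
        }
    };
    std::thread first(work);
    std::thread second(work);
    first.join();
    second.join();
    return counter;
}

// Fixed version: std::atomic makes each increment indivisible.
int safe_count(int iterations) {
    std::atomic<int> counter{0};
    auto work = [&counter, iterations] {
        for (int i = 0; i < iterations; ++i) {
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread first(work);
    std::thread second(work);
    first.join();
    second.join();
    return counter.load();
}
```

Calling racy_count in a binary built with the thread sanitizer should produce a data race report pointing at the increment, while safe_count should run cleanly and return exactly twice the iteration count.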
When we turned on the address sanitizer, it immediately found memory leaks and out-of-bounds array accesses. The impressive thing here was not only that it could find these errors (other tools can do this too), but that the error messages were crystal clear, telling us exactly where the issue occurred so that we could easily fix it.
Here is an example of a buffer overflow report (the output is pretty verbose, so this is the abbreviated version):
==60803==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffffffddea at pc 0x0000006fd3fd bp 0x7fffffffddb0 sp 0x7fffffffd560
Address 0x7fffffffddea is located in stack of thread T0 at offset 42 in frame
#0 0x84f36f in FunctionName() FileName.cpp:57
[32, 42) 'result' <== Memory access at offset 42 overflows this variable
The amount of information here is truly impressive. From this output we can see that a buffer overflow occurred, the function it happened in with the line number, the name of the variable, and the thread on which this occurred. We have all the information that we need to quickly find and resolve the problem.
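For readers who want to see a report like this for themselves, here is a minimal sketch of code that could trigger one. It is not our actual code: the function name and buffer contents are invented, and the 10-byte array is only chosen to be consistent with the [32, 42) range in the report above.

```cpp
#include <cstddef>

// Hypothetical reproduction, not taken from our codebase.
// One possible build:
//   clang++ -g -fsanitize=address -fno-omit-frame-pointer overflow.cpp

// Reads one byte from a 10-byte stack buffer. Valid indices are 0 through 9;
// index 10 reads one byte past the end of 'result'.
char read_result(std::size_t index) {
    char result[10] = "123456789";  // 9 characters plus the terminating '\0'
    return result[index];
}
```

Calling read_result(10) in a binary built with the address sanitizer should abort with a stack-buffer-overflow report that names result, while any index from 0 through 9 runs normally.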
Undefined Behavior Sanitizer
We were even more impressed when we turned on the undefined behavior sanitizer, which found some interesting and very hard-to-see issues with alignment.
As an example, say that we have a data structure, MyStruct, that we want to ensure is aligned to a cache line (64 bytes).
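The actual definition is not reproduced here. For illustration only, a minimal sketch of what such a type could look like follows. The members are invented, and using alignas to request the alignment is an assumption; a compiler-specific alignment attribute would express the same intent.

```cpp
#include <cstddef>

// Hypothetical definition: the member names and sizes are made up.
// alignas(64) asks the compiler to place every MyStruct on a 64-byte boundary.
struct alignas(64) MyStruct {
    long sequence;
    char payload[48];
};

// The compiler pads the struct so that its size is a multiple of its alignment.
static_assert(alignof(MyStruct) == 64, "MyStruct should be cache-line aligned");
static_assert(sizeof(MyStruct) % 64 == 0, "size should be a multiple of 64");
```

Note that the alignment requirement is part of the type, but a general-purpose allocator that is only handed a byte count knows nothing about it.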
Then in our code we do this:
MyStruct* myStruct = (MyStruct*) malloc(sizeof(MyStruct));
The problem here is subtle. We want our data structure to be aligned, but we are not asking for aligned memory: malloc only sees a size, so it has no way of knowing about the 64-byte requirement. What we really wanted was this:
MyStruct* myStruct = (MyStruct*) aligned_alloc(64, sizeof(MyStruct));
posix_memalign would have also been acceptable here.
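Putting the pieces together, here is a minimal sketch of the corrected allocation along with a helper for checking alignment by hand. The MyStruct layout and the helper names are invented for this example, and it assumes a platform whose stdlib.h provides the C11 aligned_alloc function (for example, Linux with glibc).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // aligned_alloc, free (assumes a C11-capable C library)

// Hypothetical stand-in for the real structure.
struct alignas(64) MyStruct {
    char payload[64];
};

// Returns true if p sits on a multiple of 'alignment' bytes.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Allocates storage that honors MyStruct's alignment. aligned_alloc expects
// the size to be a multiple of the alignment, which sizeof guarantees here.
// Returns a null pointer on failure; the caller releases the memory with free().
MyStruct* make_aligned() {
    void* raw = aligned_alloc(alignof(MyStruct), sizeof(MyStruct));
    return static_cast<MyStruct*>(raw);
}
```

A pointer returned by make_aligned should always pass is_aligned(p, 64). A pointer from plain malloc may or may not, which is exactly why this kind of bug can hide for a long time without a tool like the undefined behavior sanitizer.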
Advantages Over Other Tools
A key benefit of these sanitizers is that they typically add only a 2x to 5x slowdown over non-instrumented code. By comparison, Valgrind can often impose a 30x slowdown. This low overhead allows us to run the full version of our binaries with the sanitizers, whereas running our full binary through Valgrind is often infeasible. We have even leveraged these sanitizers when developing new applications, allowing us to catch bugs such as threading issues before beginning manual testing (which is where we typically find these types of problems). Our engineers can now identify issues much earlier in the development cycle, reducing the time needed to get high-quality code into production.
WHAT WE’RE PLANNING TO DO IN THE FUTURE
We have barely started with all of the tools that Clang has to offer, but we have already been able to see some great results. We will be continuing to leverage the tools that it provides and integrate it further into our system.
Our next step will be to add Clang to our automated process for building and running tests. Even though Clang’s features are incredibly powerful, getting it fully integrated into our automated unit and regression testing system was beyond the scope of what we could do in our Hackathon. The challenge has been that instead of naively duplicating our current system for two compilers, which would double the amount of work our build system needs to do, we decided to take a step back and re-evaluate our process for building and running our automated tests. This has allowed us to take advantage of the strengths of each of these compilers without increasing the load on our build system. Once this has been completed, we plan to start turning on the sanitizers to take our automated testing to the next level.
We have also been experimenting with ClangFormat to automatically format code based on Belvedere’s coding standards, allowing us to make our codebase more consistent and readable [see Chandler’s video 0:26:00]. This nicely integrates into our preferred text editor (SublimeText), and we have been using it with good results.
We have also started integrating include-what-you-use to ensure that we no longer include header files that we don’t need, thereby speeding up our builds.
There are several other tools that we are interested in using, but haven’t gotten around to trying out yet:
- Static Analysis via Clang-Tidy
- IntelliSense-like auto-complete (SublimeText also has some plug-ins for this, but it is not yet working with our build environment)
Lastly, we are currently evaluating Clang solely as a development tool. In the future, we will probably investigate using it in our production environment. We have a lot of work to do to get there, and we would also have to prove that it generates faster code than our current compiler (in our industry, performance is crucial), but we are excited to give it a try.