A few months back I signed up for the C++ Grandmaster Certification course with the idea that not only would it be educational, it’d be productive and a good resume piece. The course consists of writing a complete, bootstrapped C++11 compiler (sans optimizer) targeting Linux AMD64 from the ground up, using no 3rd party libraries. This is an interesting feat, and the premise seems to be that there’s no better way to show that you completely understand a programming language than to write a complete, standards-compliant, bootstrapped compiler for that language. I disagree, and despite how wonderfully ambitious this idea is, I’m quitting.
I think it goes without saying, but I’ll say it. Writing a C++ compiler is a monumental undertaking. The latest publicly available working draft of the C++11 standard is a whopping 1314 pages long. Compare that to the R5RS Scheme standard which, in its entirety, is only 50 pages, including a table of contents, a few title and acknowledgement pages, and an index. I feel that I can say, with some degree of certainty, that the C++ standard is one of the largest programming language specifications ever written.
The course defines the target platform and OS to be a vanilla Ubuntu AMD64 installation, meaning all compilers submitted for grading must compile under GCC 4.7.3 with (more importantly) libstdc++ 4.7, which doesn’t have C++11 regex support. I feel like writing a compiler without some form of regular expressions or pattern matching libraries is like, to borrow a wonderful quote from James Hague’s blog, “the equivalent of writing a novel without using the letter ‘e’.” This wouldn’t be so much of an issue if external libraries were allowed, or if we could use clang and libc++. However, the former is expressly forbidden and the latter isn’t available on the grading server.
What exactly does it mean to be a ‘grandmaster’ at C++? Is it really just knowing the standard back to front with all of its caveats and cobwebs? Is it having a deep understanding of register assignment or aliasing detection algorithms working in the bowels of GCC’s code generator? I think this is important knowledge, but it doesn’t really have much to do with being a great C++ programmer. F1 drivers are surely aware of how their car works, and what their inputs are translated into when driving. However, to be the F1 world champion, you don’t have to understand the dynamics of compression ratios or know the tolerances and bevel angles on the pistons. You just have to understand how to drive fast, and how to make your car go as fast as it can. Maybe some of this very deep knowledge helps or is interesting, but it changes frequently, and is extremely context-specific. The best thing you can do is adjust your driving inputs, benchmark the results, and repeat.
Given the formidableness of the task at hand, it’d be nice to have something useful to show for your efforts. While the course is centered around building a compiler, course-takers are required to agree that they will not release the source code for their projects, during or after the course. Given that the end result will be a non-optimizing, single-platform, heavily special-cased and restrictive C++ compiler, it’s hard to see what makes it worthwhile. If half of all the effort spent towards this course was directed at improving clang’s documentation, or writing desperately-needed modern C++ development tools, we might be able to make a difference in the industry, instead of writing a bunch of crappy compilers for the sake of passing a few unit tests and getting a pat on the back that ultimately means nothing.
Since there seem to be a few themes that have cropped up in the discussion of this post (stemming from my poor explanation) I thought it’d be good to clarify a few things and state my opinion regarding them.
Regex: I admit to over-stressing the lack of regex when talking about toolchain deficiencies. It happened to be an annoying caveat of using a Mac with clang and libc++ for development and gcc with libstdc++ for testing. I was using regex for decomposing literal tokens into their respective components. For example, “1.53e-2_asd” is split into its mantissa “1.53”, exponent “-2,” and user-defined literal suffix “_asd”. Rewriting this without regex was simple, but frankly it was just busy work. I ended up writing what essentially became an incomplete, buggy, slow, verbose regular expression engine because of a deficiency in the grading server toolchain. Indeed, preventing people from having to write code like this is the whole point of having <regex> in the standard library. The point I was trying to make with this was that prohibiting 3rd party libraries is just making things difficult for the sake of being difficult.
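To give a rough idea of what that hand-rolled replacement looks like, here is a minimal sketch of splitting a decimal floating literal without <regex>. This is not my course code: the names `LiteralParts` and `split_literal` are made up for illustration, and it ignores hex floats, digit validation, and everything else a real lexer has to care about.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type; the field names are illustrative only.
struct LiteralParts {
    std::string mantissa;  // e.g. "1.53"
    std::string exponent;  // e.g. "-2" (empty when there is no exponent)
    std::string suffix;    // e.g. "_asd" (empty when there is no suffix)
};

// Splits a decimal floating literal such as "1.53e-2_asd" by hand.
// Simplified sketch: no hex floats and no validation of the input.
inline LiteralParts split_literal(const std::string& text) {
    LiteralParts parts;
    const std::size_t n = text.size();
    std::size_t i = 0;

    // True when position k is in range and holds a decimal digit.
    auto is_digit = [&](std::size_t k) {
        return k < n && std::isdigit(static_cast<unsigned char>(text[k])) != 0;
    };

    // Mantissa: a run of digits containing at most one '.'.
    bool seen_dot = false;
    while (i < n && (is_digit(i) || (text[i] == '.' && !seen_dot))) {
        if (text[i] == '.') seen_dot = true;
        ++i;
    }
    parts.mantissa = text.substr(0, i);

    // Exponent: 'e' or 'E', an optional sign, then at least one digit.
    // If no digit follows, the 'e' is left for the suffix instead.
    if (i < n && (text[i] == 'e' || text[i] == 'E')) {
        std::size_t j = i + 1;
        if (j < n && (text[j] == '+' || text[j] == '-')) ++j;
        if (is_digit(j)) {
            const std::size_t start = i + 1;  // keep the sign, drop the 'e'
            while (is_digit(j)) ++j;
            parts.exponent = text.substr(start, j - start);
            i = j;
        }
    }

    // Whatever remains is treated as the user-defined literal suffix.
    parts.suffix = text.substr(i);
    return parts;
}
```

Calling `split_literal("1.53e-2_asd")` fills in the three fields with the pieces from the example above. With <regex> the same job is roughly one pattern with three capture groups, which is the whole complaint.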
Difficulty: When I first started this course, I knew it would be difficult. In fact, the challenge of it was one of the things that really drove me to it initially. I actually really enjoyed working on it. It wasn’t that I couldn’t understand the standard’s language, or that writing a lexer and parser is really difficult at all. The process of writing the initial steps of a front-end was trivial, albeit very precise and detailed. Coupled with writing everything from scratch, including parts of the standard library that libc++ hasn’t implemented, it amounted to just busy work for the sake of busy work.
It was also mentioned that while the C++ standard is over 1300 pages, ‘only’ 400 or so deal with compilation, while the rest specify the standard library. This is absolutely true, however implementing a full standard library is also part of the course.
C++ Competency: Being able to recite the standard by heart is not necessarily what makes a good C++ programmer, nor is a complete understanding of every dusty corner of the 40 years of backwards compatibility built into the front-end. If you’re using C++ seriously in 2013, you’re using it because it compiles to fast, small executables. Therefore it’s imperative that you understand how to structure your code such that it’s robust, easy to use correctly, and most importantly, as friendly as possible to the optimizer. Given that this course only goes as far as basic code generation, you’re missing out on the knowledge you need to be a ‘grandmaster’ C++ programmer anyway. It’s more valuable to know how to structure code so as to avoid inhibiting SSA decomposition than to know all the trigraph sequences and valid Unicode ranges by heart.
The point: The whole point of this post is that there are a million better things to spend your time on that ultimately give back to the community and move the industry forward.