Making a C Compiler Using LLVM


Anyone who is new to compilers and jumps straight into building tools with LLVM, or who just wants to understand how it works, has probably found its documentation hard to get into. I am writing this post to help you understand how to develop a compiler frontend on top of the LLVM backend.

The most important thing to decide before writing a compiler is the language it will compile.

In this post we will develop a compiler frontend for an imperative language like C. It will be a multipass frontend that supports features such as:

  • Loop handling
  • Contextual Semantics
  • Input/Output (only in the form of print)
  • Conditional Statements
  • Binary operations
  • Commenting (both inline and paragraph-wise)
  • JIT compilation

The project is available at https://github.com/SatyendraBanjare/C-LLVM-compiler .

Our compiler frontend will output LLVM IR, which can be further analyzed and optimized using LLVM.

[NOTE] I expect readers to be familiar, at least in theory, with compilers, lexing and parsing.

To learn about the tools used, please go through these links beforehand.

References :

  • http://aquamentus.com/tut_lexyacc.html
  • http://dinosaur.compilertools.net/flex/flex_11.html
  • https://www.univ-orleans.fr/lifo/Members/Mirian.Halfeld/Cours/TLComp/l3-0708-LexA.pdf
  • https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.genprogc/yaac_file_declarations.htm
  • https://gist.github.com/serge-sans-paille/aa332fa22692fcdfdc51

CodeOut

Let’s begin by creating the lexical rules. Following the links mentioned above, this is how the final lexer file should look. Most of it is self-explanatory. A small trick is used to implement comments; how it works is explained in http://aquamentus.com/tut_lexyacc.html . When a comment starts, we switch the lexer into the comment / comment_oneline state and do nothing until the comment is over.
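The state switching described above can be sketched in plain C, outside of flex. This is only an illustration of the idea, not the project's lexer: the function name strip_comments is hypothetical, and I am assuming C-style delimiters (/* ... */ for paragraph comments and // for one-line comments), which this post does not spell out. String literals are ignored to keep the sketch short.

```c
#include <stddef.h>

/* Lexer states, mirroring the comment / comment_oneline states above. */
enum lex_state { NORMAL, COMMENT, COMMENT_ONELINE };

/*
 * Copy src into dst while dropping comments (assumed delimiters:
 * slash-star ... star-slash and double slash). dst must have room for
 * strlen(src) + 1 bytes. Returns the number of characters written,
 * not counting the terminating NUL.
 */
size_t strip_comments(const char *src, char *dst)
{
    enum lex_state state = NORMAL;
    size_t out = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        char next = src[i + 1];   /* at worst this reads the NUL */

        switch (state) {
        case NORMAL:
            if (c == '/' && next == '*') {
                state = COMMENT;          /* enter paragraph comment */
                i++;                      /* consume the '*' as well */
            } else if (c == '/' && next == '/') {
                state = COMMENT_ONELINE;  /* enter one-line comment */
                i++;                      /* consume the second '/' */
            } else {
                dst[out++] = c;           /* ordinary character */
            }
            break;
        case COMMENT:
            /* Do nothing until the closing delimiter shows up. */
            if (c == '*' && next == '/') {
                state = NORMAL;
                i++;                      /* consume the '/' as well */
            }
            break;
        case COMMENT_ONELINE:
            /* Do nothing until the end of the line. */
            if (c == '\n') {
                state = NORMAL;
                dst[out++] = c;           /* keep the newline itself */
            }
            break;
        }
    }
    dst[out] = '\0';
    return out;
}
```

In a flex file the same three states would be declared as start conditions and the "do nothing" branches become rules with empty actions.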

At this point we have developed the vocabulary of our language. Let us now develop the reasoning and grammar.

This is the first part, where we describe the various token values, type values and associativity rules. We create a union of the block and expression types that are described later. Finally, we create a map from each variable name to its value.
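To make the "map from variable name to value" concrete, here is a minimal sketch in C. It is not the data structure from the grammar file: the names symtab, symtab_set and symtab_get are hypothetical, values are assumed to be plain ints, and a fixed-size array with linear search stands in for a real map.

```c
#include <string.h>

#define MAX_VARS 64   /* assumed capacity, for illustration only */
#define MAX_NAME 32   /* assumed maximum identifier length + 1 */

struct symbol {
    char name[MAX_NAME];
    int value;
};

struct symtab {
    struct symbol vars[MAX_VARS];
    int count;
};

/* Return the entry for name, or NULL if it has not been stored yet. */
static struct symbol *symtab_find(struct symtab *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->vars[i].name, name) == 0)
            return &t->vars[i];
    return NULL;
}

/* Insert or update a variable. Returns 0 on success and -1 if the
 * table is full or the name does not fit. */
int symtab_set(struct symtab *t, const char *name, int value)
{
    struct symbol *s = symtab_find(t, name);
    if (s == NULL) {
        if (t->count == MAX_VARS || strlen(name) >= MAX_NAME)
            return -1;
        s = &t->vars[t->count++];
        strcpy(s->name, name);   /* length was checked above */
    }
    s->value = value;
    return 0;
}

/* Look a variable up. Returns 0 and stores the value in *out when the
 * name is known, -1 otherwise. */
int symtab_get(struct symtab *t, const char *name, int *out)
{
    struct symbol *s = symtab_find(t, name);
    if (s == NULL)
        return -1;
    *out = s->value;
    return 0;
}
```

A parser action for an assignment would call symtab_set, and an action for a variable reference would call symtab_get and report an error on -1.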

This is the second part, where we describe all the expressions. We create blocks that will be used later when writing the implementation.

Here are some extra but important methods that help in checking the code. A variable’s type is checked here too.
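As a rough idea of what such a check can look like, the sketch below validates an assignment against a list of declarations. Everything in it is an assumption made for illustration: the type names TYPE_INT and TYPE_DOUBLE, the result codes, and the function check_assignment do not come from the grammar file.

```c
#include <string.h>

/* Hypothetical set of types the language might support. */
enum var_type { TYPE_INT, TYPE_DOUBLE };

enum check_result { CHECK_OK, CHECK_UNDECLARED, CHECK_TYPE_MISMATCH };

/* One declared variable: its name and its declared type. */
struct decl {
    const char *name;
    enum var_type type;
};

/*
 * Check that a value of type value_type may be assigned to the
 * variable called name. The variable must have been declared, and
 * its declared type must match exactly (no implicit conversions
 * in this sketch).
 */
enum check_result check_assignment(const struct decl *decls, int n_decls,
                                   const char *name,
                                   enum var_type value_type)
{
    for (int i = 0; i < n_decls; i++) {
        if (strcmp(decls[i].name, name) == 0)
            return decls[i].type == value_type ? CHECK_OK
                                               : CHECK_TYPE_MISMATCH;
    }
    return CHECK_UNDECLARED;
}
```

A checker like this runs before code generation, so that an undeclared or mistyped variable is reported as a source error instead of producing invalid IR.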

Here is the complete file: https://gist.github.com/SatyendraBanjare/a9f12d927be4c3fc0537a41ea2573b4d

Now that we have developed our grammar, let us implement it.

  1. Create a wrapper header for the implemented methods.

  2. Define the functions to be used.

  3. Write the final wrapper.
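To give a feel for what the code generation step produces, here is a small self-contained sketch that walks an expression tree and writes textual LLVM IR into a buffer. It is not taken from the project: instead of calling the LLVM libraries it simply prints IR as text, it only handles integer literals with +, - and *, and the names expr, emitter, gen and emit_main are all hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Expression tree node: an integer literal or a binary operation. */
struct expr {
    char op;                       /* 0 = literal, else '+', '-', '*' */
    int value;                     /* used only when op == 0 */
    const struct expr *lhs, *rhs;  /* used only when op != 0 */
};

/* Output buffer plus a counter for naming temporaries (%t0, %t1, ...). */
struct emitter {
    char *buf;
    size_t cap, len;
    int next_tmp;
};

/* Append formatted text to the buffer, truncating if it is full. */
static void emit(struct emitter *e, const char *fmt, ...)
{
    if (e->len >= e->cap)
        return;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(e->buf + e->len, e->cap - e->len, fmt, ap);
    va_end(ap);
    if (n > 0) {
        size_t room = e->cap - e->len - 1;
        e->len += (size_t)n < room ? (size_t)n : room;
    }
}

/* Emit instructions for x and write the operand that holds its
 * result (a constant such as "3" or a temporary such as "%t0")
 * into out. */
static void gen(struct emitter *e, const struct expr *x,
                char *out, size_t out_cap)
{
    if (x->op == 0) {
        snprintf(out, out_cap, "%d", x->value);
        return;
    }
    char l[16], r[16];
    gen(e, x->lhs, l, sizeof l);   /* operands first (post-order) */
    gen(e, x->rhs, r, sizeof r);
    const char *name = x->op == '+' ? "add"
                     : x->op == '-' ? "sub" : "mul";
    snprintf(out, out_cap, "%%t%d", e->next_tmp++);
    emit(e, "  %s = %s i32 %s, %s\n", out, name, l, r);
}

/* Wrap the expression in a main function that returns its value.
 * Returns the number of characters written to buf. */
size_t emit_main(const struct expr *body, char *buf, size_t cap)
{
    struct emitter e = { buf, cap, 0, 0 };
    char result[16];
    emit(&e, "define i32 @main() {\nentry:\n");
    gen(&e, body, result, sizeof result);
    emit(&e, "  ret i32 %s\n}\n", result);
    return e.len;
}
```

For the tree of 2 + 3 * 4 this emits a mul into %t0, an add into %t1 and a ret of %t1. If LLVM is installed, saving that text to a .ll file and running it with lli should be a quick way to check it.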

Compile & Run

This is how the Makefile is written. It is basically a series of steps: lexical analysis, parsing and final code generation.

To build, simply run make.

Testing

To test that our compiler works, just run

./compiler test.c