LLVM


LLVM or Low Level Virtual Machine is a powerful compiler technology used to develop compiler frontends and backends. Ever since I got to know about LLVM, I wanted to try it out and finally made a simple C to LLVM compiler by looking for resources online. In this blog I would be expressing some of the cool features that I got to know about this awesome software.

How LLVM works ?

The basic idea behind this whole project is creating a middleware for all the frontend ‘High level’ programming languages before finally generating the assembly codes. Thie middleware is called Intermediate Representation (IR) that can be optimized and tinkered upon without having to rewrite the entire code generation logic. Not only producing the optimized IR can be done independently, producing the assembly code out of the same IR for different architectures can be done separately. This saves a lot of time and reduces unnecessary redundany of code.

An example LLVM IR for Hello World code in C :

Currently most of tech leads use LLVM as thier compiler backend. This also adds on to why one should learn about LLVM.

Reversing LLVM IR to C/CPP

The commonly known tool is called Clang that is a C compiler created by Apple as a part of LLVM project. It is quite cool and has a lot of functionality that can be easily googled up. One of the most amazing things that I found was reversing the LLVM IR to possible C code. Now this is one important thing that can be taken advantage and be used to a different language’s code that produces the same LLVM IR.

Here is reversed CPP code for the above LLVM IR.

Producing different Architecture codes

One more awesome functionality recently added is that you can produce LLVM IR for a differnet hardware architecture without having to even run code on that architecture system.

Creating an actual Compiler

These are just important thing I found while working on making a compiler. There are a lot of tutorials available on how to do so, here are just common things that are used in accomplishing the same.

Blocks

Using LLVM to write a backend for the compiler uses the idea of blocks. The different blocks can be equipped with feature sets individually as per wish. These blocks are then linked with each other while generating the IR. The makeLLVMmodule() uses methods to stackup these blocks according to language grammar provided and generates a LLVM module that is run to execute the program.

Inbuilt functions

Each of Blocks are accompanied by Function Pointers that tells the LLVM that these are functions that need to be analyzed. Luckily one need not to write all the function logic for some basic things like Binary Operation, these come handy as inbuilt functions in LLVM. for some trivial C/CPP functions can also be accessed using twine().

SSA

In Compiler Theory, SSA stands for Static Single Assignment and LLVM sticks to it while generating very basic optimized code. SSA can be added easily to one’s codegen logic.

Optimization

There are different prebuilt optimizations available that can be top level accessed using Pass Manager. One can customize a new one if required and run using the pass manager.