docs/9 - How is a toolchain constructed.txt
author "Yann E. MORIN" <yann.morin.1998@free.fr>
Sun May 05 18:34:20 2013 +0200 (2013-05-05)
File.........: 9 - How is a toolchain constructed.txt
Copyright....: (C) 2011 Yann E. MORIN <yann.morin.1998@free.fr>
License......: Creative Commons Attribution Share Alike (CC-by-sa), v2.5


How is a toolchain constructed? /
_______________________________/

This is the result of a discussion with Francesco Turco <mail@fturco.org>:
  http://sourceware.org/ml/crossgcc/2011-01/msg00060.html

Francesco has a nice tutorial for beginners, along with a sample,
step-by-step procedure to build a toolchain for an ARM target from an
x86_64 Debian host:
  http://fturco.org/wiki/doku.php?id=debian:cross-compiler

Thank you Francesco for initiating this!


I want a cross-compiler! What is this toolchain you're speaking about? |
-----------------------------------------------------------------------+

A cross-compiler is in fact a collection of different tools set up to
work closely together. The tools are arranged so that they are chained,
in a kind of cascade, where the output from one becomes the input to the
next, to ultimately produce the actual binary code that runs on a
machine. So, we call this arrangement a "toolchain". When a toolchain is
meant to generate code for a machine different from the machine it runs
on, it is called a cross-toolchain.


So, what are those components in a toolchain? |
----------------------------------------------+

The components that play a role in the toolchain are first and foremost
the compiler itself. The compiler turns source code (in C, C++, whatever)
into assembly code. The compiler of choice is the GNU compiler collection,
well known as 'gcc'.

The assembly code is translated by the assembler into object code. The
assembler is part of the binary utilities, such as the GNU 'binutils'.

Once the different object code files have been generated, they have to be
aggregated together to form the final executable binary. This is called
linking, and is achieved with the use of a linker. The GNU 'binutils' also
come with a linker.
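
The cascade described above can be sketched as three command lines, one
per stage. This is only an illustration: the 'arm-unknown-linux-gnueabi-'
prefix, the hello.c file name and the pipeline_commands() helper are
assumptions made up for the example. The last stage goes through the gcc
driver, which in turn calls the binutils linker, rather than calling the
linker directly.

```python
# Sketch only: the tool prefix and the file names are assumptions.
def pipeline_commands(source, prefix="arm-unknown-linux-gnueabi-"):
    """Return the command lines of the cascade for a single C file."""
    stem = source.rsplit(".", 1)[0]
    return [
        # Compiler: C source -> assembly code ('gcc -S' stops after compiling).
        [prefix + "gcc", "-S", source, "-o", stem + ".s"],
        # Assembler (from binutils): assembly code -> object code.
        [prefix + "as", stem + ".s", "-o", stem + ".o"],
        # Link step, run through the gcc driver so that the C library and
        # the start files are passed to the linker.
        [prefix + "gcc", stem + ".o", "-o", stem],
    ]


if __name__ == "__main__":
    # Nothing is executed here: the commands are only printed.
    for command in pipeline_commands("hello.c"):
        print(" ".join(command))
```

Each command consumes the file produced by the previous one, which is the
"chain" in toolchain.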

So far, we get a complete toolchain that is capable of turning source code
into actual executable code. Depending on the Operating System, or the lack
thereof, running on the target, we also need the C library. The C library
provides a standard abstraction layer that performs basic tasks (such as
allocating memory, printing output on a terminal, managing file access...).
There are many C libraries, each targeted to different systems. For the
Linux /desktop/, there is glibc or eglibc or even uClibc; for embedded
Linux, you have a choice of eglibc or uClibc; while for systems without an
Operating System, you may use newlib, dietlibc, or even none at all. There
are a few other C libraries, but they are not as widely used, and/or are
targeted to very specific needs (eg. klibc is a very small subset of the C
library aimed at building constrained initial ramdisks).

Under Linux, the C library needs to know the API to the kernel to decide
what features are present, and if needed, what emulation to include for
missing features. That API is provided by the kernel headers. Note: this
is Linux-specific (and potentially applies to a very few other OSes); the
C libraries on other OSes do not need the kernel headers.


And now, how are all these components chained together? |
--------------------------------------------------------+

So far, all major components have been covered, but there is a specific
order in which they need to be built. Here we see what the dependencies
are, starting with the compiler we want to ultimately use. We call that
compiler the 'final compiler'.

  - the final compiler needs the C library, to know how to use it,
but:
  - building the C library requires a compiler

A needs B which needs A. This is the classic chicken'n'egg problem... This
is solved by building a stripped-down compiler that does not need the C
library, but is capable of building it. We call it a bootstrap, initial, or
core compiler. So here is the new dependency list:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
but:
  - the core compiler needs the C library headers and start files, to know
    how to use the C library

B needs C which needs B. Chicken'n'egg, again. To solve this one, we will
need to build a C library that will only install its headers and start
files. The start files are a very few files that gcc needs to be able to
turn on thread local storage (TLS) on an NPTL system. So now we have:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
  - the core compiler needs the C library headers and start files, to know
    how to use the C library
but:
  - building the start files requires a compiler

Geez... C needs D which needs C, yet again. So we need to build a yet
simpler compiler, that needs neither the headers nor the start files.
This compiler is also a bootstrap, initial or core compiler. In order
to differentiate the two core compilers, let's call that one "core pass 1",
and the former one "core pass 2". The dependency list becomes:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core pass 2 compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler
  - we need a core pass 1 compiler

And as we said earlier, the C library also requires the kernel headers.
The kernel headers themselves have no prerequisite, so end of story in
this case:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core pass 2 compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler and the kernel headers
  - we need a core pass 1 compiler
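
The chicken'n'egg situations above are cycles in a dependency graph, and
splitting the compiler into several passes is what removes them. The
following is a minimal sketch of that idea, not something taken from
crosstool-NG: find_cycle() and the two dictionaries are hypothetical names
introduced for the illustration.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None."""
    state = {}  # name -> "visiting" (on the current path) or "done"

    def visit(name, path):
        if state.get(name) == "done":
            return None
        if state.get(name) == "visiting":
            # We came back to a component that is still being resolved.
            return path[path.index(name):] + [name]
        state[name] = "visiting"
        for dep in deps.get(name, []):
            cycle = visit(dep, path + [name])
            if cycle:
                return cycle
        state[name] = "done"
        return None

    for name in deps:
        cycle = visit(name, [])
        if cycle:
            return cycle
    return None


# The naive view: a single compiler and a single C library.
naive = {"compiler": ["C library"], "C library": ["compiler"]}

# The view with two core passes and a headers-and-start-files step.
split = {
    "final compiler": ["complete C library"],
    "complete C library": ["core pass 2 compiler"],
    "core pass 2 compiler": ["C library headers and start files"],
    "C library headers and start files": ["core pass 1 compiler",
                                          "kernel headers"],
    "core pass 1 compiler": [],
    "kernel headers": [],
}

if __name__ == "__main__":
    print(find_cycle(naive))
    print(find_cycle(split))
```

With the naive graph a cycle between the compiler and the C library is
reported; with the split graph no cycle is left, so a build order exists.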

We need to add a few new requirements. The moment we compile code for the
target, we need the assembler and the linker. Such code is, of course,
built from the C library, so we need to build the binutils before the C
library start files and the complete C library itself. Also, some code
in gcc will turn out to run on the target as well. Luckily, the binutils
themselves have no prerequisite. So, our dependency chain is as follows:

  - the final compiler needs the C library, to know how to use it, and the
    binutils
  - building the C library requires a core pass 2 compiler and the binutils
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library, and the binutils
  - building the start files requires a compiler, the kernel headers and the
    binutils
  - the core pass 1 compiler needs the binutils

Which translates into this order to build the components:

  1 binutils
  2 core pass 1 compiler
  3 kernel headers
  4 C library headers and start files
  5 core pass 2 compiler
  6 complete C library
  7 final compiler
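
The step from the dependency chain to the build order can be reproduced
with a depth-first walk that emits a component only after everything it
needs. This is a sketch under assumptions: the DEPS table is a
transcription of the list above (reading "a compiler" for the start files
as the core pass 1 compiler), build_order() is a hypothetical helper, and
the walk assumes there is no cycle left in the table.

```python
# Each component maps to what it needs, in the order the text lists it.
DEPS = {
    "final compiler": ["complete C library", "binutils"],
    "complete C library": ["core pass 2 compiler", "binutils"],
    "core pass 2 compiler": ["C library headers and start files",
                             "binutils"],
    "C library headers and start files": ["core pass 1 compiler",
                                          "kernel headers", "binutils"],
    "core pass 1 compiler": ["binutils"],
    "kernel headers": [],
    "binutils": [],
}


def build_order(deps, goal):
    """Return the components needed for 'goal', prerequisites first."""
    order = []

    def visit(name):
        if name in order:
            return  # already scheduled
        for dep in deps[name]:
            visit(dep)
        order.append(name)  # every prerequisite is scheduled by now

    visit(goal)
    return order


if __name__ == "__main__":
    for step, name in enumerate(build_order(DEPS, "final compiler"), 1):
        print(step, name)
```

With this table the walk yields the same seven steps as above. Other valid
orders exist: the kernel headers, for instance, depend on nothing and could
just as well be installed first.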

Yes! :-) But are we done yet?

In fact, no, there are still missing dependencies. As far as the tools
themselves are concerned, we do not need anything else.

But gcc has a few pre-requisites. It relies on a few external libraries to
perform some non-trivial tasks (such as handling complex numbers in
constants...). There are a few options to get those libraries. First, one
may think of relying on a Linux distribution to provide those libraries.
Alas, they were not widely available until very, very recently. So, if the
distro is not too recent, chances are that we will have to build those
libraries (which we do below). The affected libraries are:

  - the GNU Multiple Precision Arithmetic Library, GMP
  - the C library for multiple-precision floating-point computations with
    correct rounding, MPFR
  - the C library for the arithmetic of complex numbers, MPC

The dependencies for those libraries are:

  - MPC requires GMP and MPFR
  - MPFR requires GMP
  - GMP has no pre-requisite

So, the build order becomes:

  1 GMP
  2 MPFR
  3 MPC
  4 binutils
  5 core pass 1 compiler
  6 kernel headers
  7 C library headers and start files
  8 core pass 2 compiler
  9 complete C library
 10 final compiler

Yes! Or yet some more?

This is now sufficient to build a functional toolchain. So if you've had
enough for now, you can stop here. Or if you are curious, you can continue
reading.

gcc can also make use of a few other external libraries. These additional,
optional libraries are used to enable advanced features in gcc, such as
loop optimisation (GRAPHITE) and Link Time Optimisation (LTO). If you want
to use these, you'll need three additional libraries:

To enable GRAPHITE:
  - the Parma Polyhedra Library, PPL
  - the Chunky Loop Generator, using the PPL backend, CLooG/PPL

To enable LTO:
  - the ELF object file access library, libelf

The dependencies for those libraries are:

  - PPL requires GMP
  - CLooG/PPL requires GMP and PPL
  - libelf has no pre-requisites

The list now looks like this (optional libs marked with a *):

  1 GMP
  2 MPFR
  3 MPC
  4 PPL *
  5 CLooG/PPL *
  6 libelf *
  7 binutils
  8 core pass 1 compiler
  9 kernel headers
 10 C library headers and start files
 11 core pass 2 compiler
 12 complete C library
 13 final compiler
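
A list like this one can be checked mechanically: every step must come
after the steps it requires. The sketch below does that, and also shows
that dropping the starred steps gives back the ten-step list. The names
STEPS, DEPS and violations() are hypothetical. Making each compiler pass
require MPC and the binutils is an assumption used to model "gcc has a few
pre-requisites"; the library dependencies are the ones listed above.

```python
# (name, optional) pairs, in the order of the list above.
STEPS = [
    ("GMP", False), ("MPFR", False), ("MPC", False),
    ("PPL", True), ("CLooG/PPL", True), ("libelf", True),
    ("binutils", False), ("core pass 1 compiler", False),
    ("kernel headers", False),
    ("C library headers and start files", False),
    ("core pass 2 compiler", False), ("complete C library", False),
    ("final compiler", False),
]

DEPS = {
    "MPFR": ["GMP"],
    "MPC": ["GMP", "MPFR"],
    "PPL": ["GMP"],
    "CLooG/PPL": ["GMP", "PPL"],
    # Assumption: every gcc pass needs the companion libraries.
    "core pass 1 compiler": ["MPC", "binutils"],
    "core pass 2 compiler": ["MPC", "binutils"],
    "final compiler": ["MPC", "binutils"],
}


def violations(order, deps):
    """List (step, prerequisite) pairs whose prerequisite is not built first."""
    position = {name: index for index, name in enumerate(order)}
    return [
        (name, dep)
        for name in order
        for dep in deps.get(name, [])
        if dep not in position or position[dep] > position[name]
    ]


full = [name for name, optional in STEPS]
mandatory = [name for name, optional in STEPS if not optional]

if __name__ == "__main__":
    print(violations(full, DEPS))
    print(violations(mandatory, DEPS))
```

Both calls should report no violation, whereas an order that builds MPFR
before GMP would be flagged.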

This list is now complete! Wouhou! :-)


So the list is complete. But why does crosstool-NG have more steps? |
--------------------------------------------------------------------+

The thirteen steps above are the necessary steps, from a theoretical point
of view. In reality, though, there are small differences; there are three
different reasons for the additional steps in crosstool-NG.

First, the GNU binutils do not support some kinds of output. It is not possible
to generate 'flat' binaries with binutils, so we have to use another component
that adds this support: elf2flt. Another binary utility called sstrip has been
added. It allows for super-stripping the target binaries, although it is not
strictly required.

Second, crosstool-NG can also build some additional debug utilities to run on
the target. This is where we build, for example, the cross-gdb, the gdbserver
and the native gdb (the last two run on the target, the first runs on the
same machine as the toolchain). The others (strace, ltrace, DUMA and dmalloc)
are absolutely not related to the toolchain, but are nice-to-have stuff that
can greatly help when developing, so are included as goodies (and they are
quite easy to build, so it's OK; more complex stuff is not worth the effort
to include in crosstool-NG).