docs/9 - How is a toolchain constructed.txt
changeset 2322 892351110ce8 (Sun Feb 27 15:27:54 2011 +0100)
parent 2321 d896b85e8738
child 2563 e17f35b05539

File.........: 9 - How is a toolchain constructed.txt
Copyright....: (C) 2011 Yann E. MORIN <yann.morin.1998@anciens.enib.fr>
License......: Creative Commons Attribution Share Alike (CC-by-sa), v2.5


How is a toolchain constructed? /
_______________________________/

This is the result of a discussion with Francesco Turco <mail@fturco.org>:
  http://sourceware.org/ml/crossgcc/2011-01/msg00060.html

Francesco has a nice tutorial for beginners, along with a sample,
step-by-step procedure to build a toolchain for an ARM target from an
x86_64 Debian host:
  http://fturco.org/wiki/doku.php?id=debian:cross-compiler

Thank you Francesco for initiating this!


I want a cross-compiler! What is this toolchain you're speaking about? |
-----------------------------------------------------------------------+

A cross-compiler is in fact a collection of different tools set up to work
tightly together. The tools are arranged in a way that they are chained, in
a kind of cascade, where the output from one becomes the input to another
one, to ultimately produce the actual binary code that runs on a machine.
So, we call this arrangement a "toolchain". When a toolchain is meant to
generate code for a machine different from the machine it runs on, it is
called a cross-toolchain.


So, what are those components in a toolchain? |
----------------------------------------------+

The components that play a role in the toolchain are first and foremost
the compiler itself. The compiler turns source code (in C, C++, whatever)
into assembly code. The compiler of choice is the GNU compiler collection,
well known as 'gcc'.

The assembly code is then translated into object code by the assembler,
which is provided by the binary utilities, such as the GNU 'binutils'.

Once the different object code files have been generated, they have to be
aggregated together to form the final executable binary. This is called
linking, and is achieved with the use of a linker. The GNU 'binutils' also
come with a linker.
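To make the cascade concrete, here is a minimal Python sketch that only builds (and never runs) the command line of each stage. It assumes the cross tools share a common prefix; 'arm-unknown-linux-gnueabi-' is just a hypothetical example. The final link goes through the gcc driver, which calls the binutils linker and adds the C library for us:

```python
from typing import List


def cascade(prefix: str, source: str) -> List[List[str]]:
    """Sketch: argv of each stage; 'prefix' is a hypothetical tool prefix."""
    stem = source.rsplit(".", 1)[0]
    asm, obj, exe = stem + ".s", stem + ".o", stem
    return [
        # compiler: C source -> assembly code
        [prefix + "gcc", "-S", source, "-o", asm],
        # assembler (from the binutils): assembly code -> object code
        [prefix + "as", asm, "-o", obj],
        # linker: object code -> executable, through the gcc driver, which
        # invokes the binutils linker (ld) and adds the C library
        [prefix + "gcc", obj, "-o", exe],
    ]


if __name__ == "__main__":
    for argv in cascade("arm-unknown-linux-gnueabi-", "hello.c"):
        print(" ".join(argv))
```

Each stage consumes what the previous one produced: 'hello.c' becomes 'hello.s', then 'hello.o', then 'hello'.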

So far, we have a complete toolchain that is capable of turning source code
into actual executable code. Depending on the Operating System, or the lack
thereof, running on the target, we also need the C library. The C library
provides a standard abstraction layer that performs basic tasks (such as
allocating memory, printing output on a terminal, managing file access...).
There are many C libraries, each targeted to different systems. For the
Linux /desktop/, there is glibc or eglibc or even uClibc; for embedded Linux,
you have a choice of eglibc or uClibc; while for systems without an Operating
System, you may use newlib, dietlibc, or even none at all. There are a few
other C libraries, but they are not as widely used, and/or are targeted to
very specific needs (eg. klibc is a very small subset of the C library aimed
at building constrained initial ramdisks).

Under Linux, the C library needs to know the API to the kernel to decide
what features are present, and if needed, what emulation to include for
missing features. That API is provided by the kernel headers. Note: this
is Linux-specific (and potentially a very few others); the C libraries on
other OSes do not need the kernel headers.


And now, how are all these components chained together? |
--------------------------------------------------------+

So far, all major components have been covered, but there is a specific
order in which they need to be built. Here we see what the dependencies are,
starting with the compiler we want to ultimately use. We call that compiler
the 'final compiler'.

  - the final compiler needs the C library, to know how to use it,
but:
  - building the C library requires a compiler

A needs B which needs A. This is the classic chicken'n'egg problem... This
is solved by building a stripped-down compiler that does not need the C
library, but is capable of building it. We call it a bootstrap, initial, or
core compiler. So here is the new dependency list:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
but:
  - the core compiler needs the C library headers and start files, to know
    how to use the C library
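Those two lists can be checked mechanically. The Python sketch below is our own illustration, not crosstool-NG code: it encodes each list as a graph mapping a component to what it needs, and runs a depth-first search that reports a cycle as soon as it meets a component already on its current path. Both lists still contain one:

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


def find_cycle(deps: Graph) -> Optional[List[str]]:
    """Depth-first search; return one dependency cycle, or None if acyclic."""
    state: Dict[str, str] = {}  # component -> "visiting" or "done"
    path: List[str] = []        # components on the current DFS path

    def visit(node: str) -> Optional[List[str]]:
        state[node] = "visiting"
        path.append(node)
        for need in deps.get(node, []):
            if state.get(need) == "visiting":
                # 'need' is already on the path: we went round a loop
                return path[path.index(need):] + [need]
            if need not in state:
                cycle = visit(need)
                if cycle:
                    return cycle
        path.pop()
        state[node] = "done"
        return None

    for node in deps:
        if node not in state:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# First list: only one compiler exists, so the C library needs the final one.
FIRST: Graph = {"final compiler": ["C library"], "C library": ["final compiler"]}
# Second list: the core compiler needs the headers and start files, which
# are still part of the C library at this point.
SECOND: Graph = {
    "final compiler": ["C library"],
    "C library": ["core compiler"],
    "core compiler": ["C library"],
}

if __name__ == "__main__":
    print(" -> ".join(find_cycle(FIRST)))
    print(" -> ".join(find_cycle(SECOND)))
```

For the first list it reports final compiler -> C library -> final compiler, and for the second one C library -> core compiler -> C library.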

B needs C which needs B. Chicken'n'egg, again. To solve this one, we will
need to build a C library that will only install its headers and start
files. The start files are a very few object files (such as crt1.o, crti.o
and crtn.o with glibc) that gcc needs to be able to turn on thread local
storage (TLS) on an NPTL system. So now we have:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
  - the core compiler needs the C library headers and start files, to know
    how to use the C library
but:
  - building the start files requires a compiler

Geez... C needs D which needs C, yet again. So we need to build a yet
simpler compiler, that needs neither the headers nor the start files. This
compiler is also a bootstrap, initial or core compiler. In order to
differentiate the two core compilers, let's call that one "core pass 1",
and the former one "core pass 2". The dependency list becomes:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler
  - we need a core pass 1 compiler
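In practice, a compiler that needs neither the C library headers nor the start files is obtained through gcc's configure options. The sketch below only assembles argument lists. '--without-headers', '--with-newlib', '--disable-shared', '--disable-threads', '--enable-languages' and '--with-sysroot' are real gcc configure options, but this exact combination is a common recipe assumed for illustration, not the set crosstool-NG actually passes; the target triplet and paths are hypothetical:

```python
from typing import List


def gcc_configure_args(target: str, prefix: str, sysroot: str,
                       core_pass_1: bool) -> List[str]:
    """Sketch: configure arguments for a core pass 1 or a final gcc."""
    args = [f"--target={target}", f"--prefix={prefix}"]
    if core_pass_1:
        args += [
            "--without-headers",    # no C library headers available yet
            "--with-newlib",        # with --without-headers: build without libc
            "--disable-shared",     # a shared libgcc would need the C library
            "--disable-threads",    # no thread library available yet
            "--enable-languages=c",
        ]
    else:
        args += [
            f"--with-sysroot={sysroot}",  # where headers and libc are installed
            "--enable-languages=c,c++",
        ]
    return args


if __name__ == "__main__":
    print(" ".join(gcc_configure_args("arm-unknown-linux-gnueabi", "/opt/x-tools",
                                      "/opt/x-tools/sysroot", core_pass_1=True)))
```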

And as we said earlier, the C library also requires the kernel headers.
The kernel headers themselves have no requirements, so the story ends here
for them:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler and the kernel headers
  - we need a core pass 1 compiler

We need to add a few new requirements. The moment we compile code for the
target, we need the assembler and the linker. Such code is, of course,
built from the C library, so we need to build the binutils before the C
library start files, and before the complete C library itself. Also, some
code in gcc (such as libgcc) ends up running on the target as well. Luckily,
the binutils have no requirements of their own. So, our dependency chain is
as follows:

  - the final compiler needs the C library, to know how to use it, and the
    binutils
  - building the C library requires a core pass 2 compiler and the binutils
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library, and the binutils
  - building the start files requires a compiler, the kernel headers and the
    binutils
  - the core pass 1 compiler needs the binutils

This translates into the following order to build the components:

  1 binutils
  2 core pass 1 compiler
  3 kernel headers
  4 C library headers and start files
  5 core pass 2 compiler
  6 complete C library
  7 final compiler
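Finding such an order is a topological sort of the dependency chain. Here is a Python sketch of Kahn's algorithm over the chain above: a component gets built once all of its needs are built. When several components are ready at the same time, the sketch picks the one declared first; that tie-break is an arbitrary choice of ours, so other orders are just as valid (the kernel headers could as well come first):

```python
import heapq
from typing import Dict, List

# The dependency chain above: component -> what it needs.
DEPS: Dict[str, List[str]] = {
    "binutils": [],
    "core pass 1 compiler": ["binutils"],
    "kernel headers": [],
    "C library headers and start files":
        ["core pass 1 compiler", "kernel headers", "binutils"],
    "core pass 2 compiler": ["C library headers and start files", "binutils"],
    "complete C library": ["core pass 2 compiler", "binutils"],
    "final compiler": ["complete C library", "binutils"],
}


def build_order(deps: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm; ties are broken by declaration order in 'deps'."""
    rank = {name: i for i, name in enumerate(deps)}
    missing = {name: len(needs) for name, needs in deps.items()}
    users: Dict[str, List[str]] = {name: [] for name in deps}
    for name, needs in deps.items():
        for need in needs:
            users[need].append(name)
    # Min-heap of (declaration rank, name) for components whose needs are met.
    ready = [(rank[name], name) for name, count in missing.items() if count == 0]
    heapq.heapify(ready)
    order: List[str] = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for user in users[name]:
            missing[user] -= 1
            if missing[user] == 0:
                heapq.heappush(ready, (rank[user], user))
    if len(order) != len(deps):
        raise ValueError("dependency cycle: some components are never ready")
    return order


if __name__ == "__main__":
    for step, name in enumerate(build_order(DEPS), start=1):
        print(f"{step:2} {name}")
```

With this declaration order, the sketch prints the seven steps listed above. Feeding it the very first dependency list instead (the final compiler and the C library needing each other) raises the error, since no component ever becomes ready.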

Yes! :-) But are we done yet?

In fact, no, there are still missing dependencies. As far as the tools
themselves are concerned, we do not need anything else.

But gcc has a few pre-requisites. It relies on a few external libraries to
perform some non-trivial tasks (such as handling complex numbers in
constants...). There are a few options to get those libraries. First, one
may think to rely on a Linux distribution to provide those libraries. Alas,
they were not widely available until very, very recently. So, if the distro
is not too recent, chances are that we will have to build those libraries
(which we do below). The affected libraries are:

  - the GNU Multiple Precision Arithmetic Library, GMP
  - the C library for multiple-precision floating-point computations with
    correct rounding, MPFR
  - the C library for the arithmetic of complex numbers, MPC

The dependencies for those libraries are:

  - MPC requires GMP and MPFR
  - MPFR requires GMP
  - GMP has no pre-requisite

So, the build order becomes:

  1 GMP
  2 MPFR
  3 MPC
  4 binutils
  5 core pass 1 compiler
  6 kernel headers
  7 C library headers and start files
  8 core pass 2 compiler
  9 complete C library
 10 final compiler
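The same list can also be obtained by resolving dependencies depth-first from the component we actually want, a bit like make walks prerequisites. This Python sketch assumes that every gcc pass (core pass 1, core pass 2 and final) needs GMP, MPFR and MPC, and lists each component's needs in the order they should be visited; a component is appended only once everything it needs has been appended:

```python
from typing import Dict, List, Optional, Set

# Assumption: every gcc pass needs the three arithmetic libraries.
GCC_LIBS = ["GMP", "MPFR", "MPC"]

# Component -> what it needs, each list in the order we want it visited.
DEPS: Dict[str, List[str]] = {
    "GMP": [],
    "MPFR": ["GMP"],
    "MPC": ["GMP", "MPFR"],
    "binutils": [],
    "kernel headers": [],
    "core pass 1 compiler": GCC_LIBS + ["binutils"],
    "C library headers and start files":
        ["core pass 1 compiler", "kernel headers", "binutils"],
    "core pass 2 compiler":
        GCC_LIBS + ["C library headers and start files", "binutils"],
    "complete C library": ["core pass 2 compiler", "binutils"],
    "final compiler": GCC_LIBS + ["complete C library", "binutils"],
}


def resolve(target: str, deps: Dict[str, List[str]],
            done: Optional[List[str]] = None,
            visiting: Optional[Set[str]] = None) -> List[str]:
    """Depth-first: build everything 'target' needs, then 'target' itself."""
    done = [] if done is None else done
    visiting = set() if visiting is None else visiting
    if target in done:
        return done
    if target in visiting:
        raise ValueError(f"dependency cycle through {target!r}")
    visiting.add(target)
    for need in deps[target]:
        resolve(need, deps, done, visiting)
    visiting.discard(target)
    done.append(target)  # post-order: all its needs are already in 'done'
    return done


if __name__ == "__main__":
    for step, name in enumerate(resolve("final compiler", DEPS), start=1):
        print(f"{step:2} {name}")
```

Asking for the final compiler yields the ten steps above, in that order; asking for MPC alone only yields GMP, MPFR and MPC.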

Yes! Or yet some more?

This is now sufficient to build a functional toolchain. So if you've had
enough for now, you can stop here. Or if you are curious, you can continue
reading.

gcc can also make use of a few other external libraries. These additional,
optional libraries are used to enable advanced features in gcc, such as
loop optimisation (GRAPHITE) and Link Time Optimisation (LTO). If you want
to use these, you'll need three additional libraries:

To enable GRAPHITE:
  - the Parma Polyhedra Library, PPL
  - the Chunky Loop Generator, using the PPL backend, CLooG/PPL

To enable LTO:
  - the ELF object file access library, libelf

The dependencies for those libraries are:

  - PPL requires GMP
  - CLooG/PPL requires GMP and PPL
  - libelf has no pre-requisites

The list now looks like this (optional libraries are marked with a *):

  1 GMP
  2 MPFR
  3 MPC
  4 PPL *
  5 CLooG/PPL *
  6 libelf *
  7 binutils
  8 core pass 1 compiler
  9 kernel headers
 10 C library headers and start files
 11 core pass 2 compiler
 12 complete C library
 13 final compiler
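Since the starred entries depend on which gcc features we want, the list is really a plan to filter, then to check. In this Python sketch (the 'plan' and 'check_order' names are ours, purely illustrative), the optional libraries are kept only for the enabled features, and the resulting order is verified against the library dependencies given in this document:

```python
from typing import Dict, List, Optional, Tuple

# The thirteen steps above; True marks an optional (*) entry.
STEPS = [
    ("GMP", False), ("MPFR", False), ("MPC", False),
    ("PPL", True), ("CLooG/PPL", True), ("libelf", True),
    ("binutils", False), ("core pass 1 compiler", False),
    ("kernel headers", False), ("C library headers and start files", False),
    ("core pass 2 compiler", False), ("complete C library", False),
    ("final compiler", False),
]

# Dependencies between the companion libraries, as listed in the text.
LIB_DEPS: Dict[str, List[str]] = {
    "MPFR": ["GMP"], "MPC": ["GMP", "MPFR"],
    "PPL": ["GMP"], "CLooG/PPL": ["GMP", "PPL"], "libelf": [],
}


def plan(graphite: bool = False, lto: bool = False) -> List[str]:
    """Keep the mandatory steps, plus the optional ones for enabled features."""
    wanted = set()
    if graphite:
        wanted |= {"PPL", "CLooG/PPL"}
    if lto:
        wanted.add("libelf")
    return [name for name, optional in STEPS if not optional or name in wanted]


def check_order(order: List[str],
                deps: Dict[str, List[str]]) -> Optional[Tuple[str, str]]:
    """Return the first (component, need) built in the wrong order, or None."""
    position = {name: i for i, name in enumerate(order)}
    for name in order:
        for need in deps.get(name, []):
            if need not in position or position[need] > position[name]:
                return name, need
    return None


if __name__ == "__main__":
    for step, name in enumerate(plan(graphite=True, lto=True), start=1):
        print(f"{step:2} {name}")
    print(check_order(plan(graphite=True), LIB_DEPS))
```

Without any optional feature, the plan falls back to the ten steps of the previous list.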

This list is now complete! Wouhou! :-)


So the list is complete. But why does crosstool-NG have more steps? |
--------------------------------------------------------------------+

The thirteen steps above are the necessary steps, from a theoretical point
of view. In reality, though, there are small differences; there are three
different reasons for the additional steps in crosstool-NG.

First, the GNU binutils do not support some kinds of output. It is not possible
to generate 'flat' binaries with binutils, so we have to use another component
that adds this support: elf2flt. Another binary utility, sstrip, has also been
added. It allows for super-stripping the target binaries, although it is not
strictly required.

Second, some C libraries require another step after the final compiler is
built, to install additional stuff. This is the case for mingw and newlib.
Hence the libc_finish step.

Third, crosstool-NG can also build some additional debug utilities, mostly to
run on the target. This is where we build, for example, the cross-gdb, the
gdbserver and the native gdb (the last two run on the target, the first runs
on the same machine as the toolchain). The others (strace, ltrace, DUMA and
dmalloc) are absolutely not related to the toolchain, but are nice-to-have
stuff that can greatly help when developing, so they are included as goodies
(and they are quite easy to build, so it's OK; more complex stuff is not
worth the effort to include in crosstool-NG).