File.........: 9 - Build procedure overview.txt
Copyright....: (C) 2011 Yann E. MORIN <yann.morin.1998@free.fr>
License......: Creative Commons Attribution Share Alike (CC-by-sa), v2.5


How is a toolchain constructed? /
_______________________________/


This is the result of a discussion with Francesco Turco <mail@fturco.org>:
  http://sourceware.org/ml/crossgcc/2011-01/msg00060.html

Francesco has a nice tutorial for beginners, along with a sample, step-by-
step procedure to build a toolchain for an ARM target from an x86_64 Debian
host:
  http://fturco.org/wiki/doku.php?id=debian:cross-compiler

Thank you Francesco for initiating this!


I want a cross-compiler! What is this toolchain you're speaking about? |
-----------------------------------------------------------------------+

A cross-compiler is in fact a collection of different tools set up to work
tightly together. The tools are arranged so that they are chained, in a
kind of cascade, where the output from one becomes the input to the next,
to ultimately produce the actual binary code that runs on a machine. So, we
call this arrangement a "toolchain". When a toolchain is meant to generate
code for a machine different from the machine it runs on, it is called a
cross-toolchain.


So, what are those components in a toolchain? |
----------------------------------------------+

The components that play a role in the toolchain are first and foremost
the compiler itself. The compiler turns source code (in C, C++, whatever)
into assembly code. The compiler of choice is the GNU Compiler Collection,
well known as 'gcc'.

The assembly code is translated by the assembler into object code. The
assembler is provided by the binary utilities, such as the GNU 'binutils'.

Once the different object code files have been generated, they have to be
aggregated to form the final executable binary. This is called linking,
and is achieved with a linker. The GNU 'binutils' also come with a linker.
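
As a rough illustration of the cascade described above, the sketch below
only prints the commands that would take two C files to an executable; it
does not run them. The tool prefix and the file names are made up for the
example. In practice the gcc driver calls the assembler and the linker
itself, and a bare linker call like this one would also need the start
files and the C library.

```python
# Sketch only: builds the command lines, does not execute anything.
# PREFIX and the file names are hypothetical.
PREFIX = "arm-unknown-linux-gnueabi-"


def pipeline(sources, output):
    """Return the compile, assemble and link commands for `sources`."""
    commands = []
    objects = []
    for src in sources:
        stem = src.rsplit(".", 1)[0]
        asm, obj = stem + ".s", stem + ".o"
        # 1. compiler: C source -> assembly code
        commands.append([PREFIX + "gcc", "-S", src, "-o", asm])
        # 2. assembler (binutils): assembly code -> object code
        commands.append([PREFIX + "as", asm, "-o", obj])
        objects.append(obj)
    # 3. linker (binutils): object files -> executable
    commands.append([PREFIX + "ld", *objects, "-o", output])
    return commands


for cmd in pipeline(["main.c", "util.c"], "prog"):
    print(" ".join(cmd))
```

Each stage consumes the file produced by the previous one, which is the
chaining that gives the toolchain its name.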

So far, we have a complete toolchain that is capable of turning source code
into actual executable code. Depending on the Operating System, or the lack
thereof, running on the target, we also need the C library. The C library
provides a standard abstraction layer that performs basic tasks (such as
allocating memory, printing output on a terminal, managing file access...).
There are many C libraries, each targeted at different systems. For the
Linux /desktop/, there is glibc, eglibc or even uClibc; for embedded Linux,
you have a choice of eglibc or uClibc; while for systems without an
Operating System, you may use newlib, dietlibc, or even none at all. There
are a few other C libraries, but they are not as widely used, and/or are
targeted at very specific needs (e.g. klibc is a very small subset of the C
library aimed at building constrained initial ramdisks).

Under Linux, the C library needs to know the API of the kernel to decide
what features are present and, if needed, what emulation to include for
missing features. That API is provided by the kernel headers. Note: this
is Linux-specific (and potentially applies to a very few others); the C
libraries on other OSes do not need the kernel headers.


And now, how are all these components chained together? |
--------------------------------------------------------+

So far, all major components have been covered, but there is a specific
order in which they need to be built. Here we see what the dependencies
are, starting with the compiler we want to ultimately use. We call that
compiler the 'final compiler'.

  - the final compiler needs the C library, to know how to use it,
but:
  - building the C library requires a compiler

A needs B which needs A. This is the classic chicken'n'egg problem... This
is solved by building a stripped-down compiler that does not need the C
library, but is capable of building it. We call it a bootstrap, initial, or
core compiler. So here is the new dependency list:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
but:
  - the core compiler needs the C library headers and start files, to know
    how to use the C library

B needs C which needs B. Chicken'n'egg, again. To solve this one, we will
need to build a C library that will only install its headers and start
files. The start files are a very few files that gcc needs to be able to
turn on thread local storage (TLS) on an NPTL system. So now we have:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
  - the core compiler needs the C library headers and start files, to know
    how to use the C library
but:
  - building the start files requires a compiler

Geez... C needs D which needs C, yet again. So we need to build a yet
simpler compiler, that needs neither the headers nor the start files. This
compiler is also a bootstrap, initial or core compiler. In order to
differentiate the two core compilers, let's call that one "core pass 1",
and the former one "core pass 2". The dependency list becomes:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core pass 2 compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a core pass 1 compiler
  - we need a core pass 1 compiler
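
The chicken'n'egg situations above are circular dependencies, and they can
be detected mechanically. The sketch below is only an illustration: it
assumes Python 3.9 or later for the standard graphlib module, the node
names are just the labels used in this document, and the 'staged' graph is
one reading of the list above (start files built by the core pass 1
compiler).

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def is_buildable(needs):
    """True if the 'needs' graph has no circular dependency."""
    try:
        TopologicalSorter(needs).prepare()
    except CycleError:
        return False
    return True


# Naive view: one compiler, one C library, each needing the other.
naive = {
    "compiler": {"C library"},
    "C library": {"compiler"},
}

# View with the two core compilers (assumed reading of the list above).
staged = {
    "final compiler": {"C library"},
    "C library": {"core pass 2 compiler"},
    "core pass 2 compiler": {"C library headers and start files"},
    "C library headers and start files": {"core pass 1 compiler"},
    "core pass 1 compiler": set(),
}

print(is_buildable(naive))   # False
print(is_buildable(staged))  # True
```

Splitting the compiler into passes, and the C library into "headers and
start files" and "complete library", is what removes the cycle.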

And as we said earlier, the C library also requires the kernel headers.
The kernel headers themselves have no requirement, so end of story in this
case:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core pass 2 compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a core pass 1 compiler and the
    kernel headers
  - we need a core pass 1 compiler

We need to add a few new requirements. The moment we compile code for the
target, we need the assembler and the linker. Such code is, of course,
built from the C library, so we need to build the binutils before the C
library start files, and before the complete C library itself. Also, some
code in gcc will turn out to run on the target as well, so the compilers
need the binutils too. Luckily, the binutils have no requirement. So, our
dependency chain is as follows:

  - the final compiler needs the C library, to know how to use it, and the
    binutils
  - building the C library requires a core pass 2 compiler and the binutils
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library, and the binutils
  - building the start files requires a core pass 1 compiler, the kernel
    headers and the binutils
  - the core pass 1 compiler needs the binutils

Which translates into this order to build the components:

  1 binutils
  2 core pass 1 compiler
  3 kernel headers
  4 C library headers and start files
  5 core pass 2 compiler
  6 complete C library
  7 final compiler
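
The order above can be checked against the dependency chain mechanically.
The sketch below is only an illustration: the names are the labels used in
this document, and reading the compiler needed for the start files as the
core pass 1 compiler is an assumption (it is the only compiler available
at that point).

```python
# 'needs' relation transcribed from the dependency chain above.
NEEDS = {
    "binutils": [],
    "core pass 1 compiler": ["binutils"],
    "kernel headers": [],
    # Assumption: the start files are built with the core pass 1 compiler.
    "C library headers and start files":
        ["core pass 1 compiler", "kernel headers", "binutils"],
    "core pass 2 compiler":
        ["C library headers and start files", "binutils"],
    "complete C library": ["core pass 2 compiler", "binutils"],
    "final compiler": ["complete C library", "binutils"],
}

BUILD_ORDER = [
    "binutils",
    "core pass 1 compiler",
    "kernel headers",
    "C library headers and start files",
    "core pass 2 compiler",
    "complete C library",
    "final compiler",
]


def is_valid_order(order, needs):
    """True if every component is built after everything it needs."""
    built = set()
    for step in order:
        if any(dep not in built for dep in needs[step]):
            return False
        built.add(step)
    # Every known component must have been built exactly once.
    return len(built) == len(needs) == len(order)


print(is_valid_order(BUILD_ORDER, NEEDS))  # True
```

The order is not the only valid one: the kernel headers have no
requirement, so building them first would pass the same check.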

Yes! :-) But are we done yet?

In fact, no, there are still missing dependencies. As far as the tools
themselves are concerned, we do not need anything else.

But gcc has a few pre-requisites. It relies on a few external libraries to
perform some non-trivial tasks (such as handling complex numbers in
constants...). There are a few options to obtain those libraries. First,
one may think of relying on a Linux distribution to provide them. Alas,
they were not widely available until very, very recently. So, if the distro
is not recent enough, chances are that we will have to build those
libraries (which we do below). The affected libraries are:

  - the GNU Multiple Precision Arithmetic Library, GMP
  - the C library for multiple-precision floating-point computations with
    correct rounding, MPFR
  - the C library for the arithmetic of complex numbers, MPC

The dependencies for those libraries are:

  - MPC requires GMP and MPFR
  - MPFR requires GMP
  - GMP has no pre-requisite
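
These three dependencies leave only one possible order. As a small sketch
(assuming Python 3.9 or later for the standard graphlib module), a
topological sort of the list above gives it directly:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each library mapped to the libraries it requires, as listed above.
requires = {
    "MPC": {"GMP", "MPFR"},
    "MPFR": {"GMP"},
    "GMP": set(),
}

order = list(TopologicalSorter(requires).static_order())
print(order)  # ['GMP', 'MPFR', 'MPC']
```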

So, the build order becomes:

  1 GMP
  2 MPFR
  3 MPC
  4 binutils
  5 core pass 1 compiler
  6 kernel headers
  7 C library headers and start files
  8 core pass 2 compiler
  9 complete C library
 10 final compiler

Yes! Or yet some more?

This is now sufficient to build a functional toolchain. So if you've had
enough for now, you can stop here. Or if you are curious, you can continue
reading.

gcc can also make use of a few other external libraries. These additional,
optional libraries are used to enable advanced features in gcc, such as
loop optimisation (GRAPHITE) and Link Time Optimisation (LTO). If you want
to use these, you'll need three additional libraries:

To enable GRAPHITE:
  - the Parma Polyhedra Library, PPL
  - the Chunky Loop Generator, using the PPL backend, CLooG/PPL

To enable LTO:
  - the ELF object file access library, libelf

The dependencies for those libraries are:

  - PPL requires GMP
  - CLooG/PPL requires GMP and PPL
  - libelf has no pre-requisites

The list now looks like (optional libs with a *):

  1 GMP
  2 MPFR
  3 MPC
  4 PPL *
  5 CLooG/PPL *
  6 libelf *
  7 binutils
  8 core pass 1 compiler
  9 kernel headers
 10 C library headers and start files
 11 core pass 2 compiler
 12 complete C library
 13 final compiler
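
Since three of those steps are optional, the actual list depends on which
gcc features are wanted. The sketch below is only an illustration of that
selection; the function and its two parameters are made up for the example
and are not crosstool-NG options.

```python
def build_steps(graphite=False, lto=False):
    """Return the ordered build steps for the chosen optional features."""
    steps = ["GMP", "MPFR", "MPC"]
    if graphite:
        # PPL comes first because CLooG/PPL requires it.
        steps += ["PPL", "CLooG/PPL"]
    if lto:
        steps += ["libelf"]
    steps += [
        "binutils",
        "core pass 1 compiler",
        "kernel headers",
        "C library headers and start files",
        "core pass 2 compiler",
        "complete C library",
        "final compiler",
    ]
    return steps


for number, step in enumerate(build_steps(graphite=True, lto=True), start=1):
    print(f"{number:3} {step}")
```

With both features enabled this prints the thirteen steps above; with
neither, it falls back to the ten-step list.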

This list is now complete! Wouhou! :-)


So the list is complete. But why does crosstool-NG have more steps? |
--------------------------------------------------------------------+

The thirteen steps above are the necessary steps, from a theoretical point
of view. In reality, though, there are small differences; there are three
different reasons for the additional steps in crosstool-NG.

First, the GNU binutils do not support some kinds of output. It is not
possible to generate 'flat' binaries with binutils, so we have to use
another component that adds this support: elf2flt. Another binary utility
called sstrip has also been added. It allows for super-stripping the target
binaries, although it is not strictly required.

Second, some C libraries require another step after the compiler is built,
to install additional stuff. This is the case for mingw and newlib. Hence
the libc_finish step.

Third, crosstool-NG can also build some additional debug utilities, most of
which run on the target. This is where we build, for example, the
cross-gdb, the gdbserver and the native gdb (the last two run on the
target, the first runs on the same machine as the toolchain). The others
(strace, ltrace, DUMA and dmalloc) are absolutely not related to the
toolchain, but are nice-to-have stuff that can greatly help when
developing, so they are included as goodies (and they are quite easy to
build, so it's OK; more complex stuff is not worth the effort to include in
crosstool-NG).