File.........: 9 - Build procedure overview.txt
Copyright....: (C) 2011 Yann E. MORIN <yann.morin.1998@free.fr>
License......: Creative Commons Attribution Share Alike (CC-by-sa), v2.5


How is a toolchain constructed? /
_______________________________/

This is the result of a discussion with Francesco Turco <mail@fturco.org>:
  http://sourceware.org/ml/crossgcc/2011-01/msg00060.html

Francesco has a nice tutorial for beginners, along with a sample, step-by-
step procedure to build a toolchain for an ARM target from an x86_64 Debian
host:
  http://fturco.org/wiki/doku.php?id=debian:cross-compiler

Thank you Francesco for initiating this!


I want a cross-compiler! What is this toolchain you're speaking about? |
-----------------------------------------------------------------------+

A cross-compiler is in fact a collection of different tools set up to
work tightly together. The tools are arranged in a way that they are
chained, in a kind of cascade, where the output from one becomes the
input to another one, to ultimately produce the actual binary code that
runs on a machine. So, we call this arrangement a "toolchain". When
a toolchain is meant to generate code for a machine different from the
machine it runs on, this is called a cross-toolchain.


So, what are those components in a toolchain? |
----------------------------------------------+

The components that play a role in the toolchain are first and foremost
the compiler itself. The compiler turns source code (in C, C++, whatever)
into assembly code. The compiler of choice is the GNU Compiler Collection,
better known as 'gcc'.

The assembly code is translated by the assembler into object code. The
assembler is part of the binary utilities, such as the GNU 'binutils'.

Once the different object code files have been generated, they have to be
aggregated together to form the final executable binary. This is called
linking, and is achieved with the use of a linker. The GNU 'binutils' also
come with a linker.
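
The three stages described so far (compile, assemble, link) can be spelled
out as three command lines. The following is only a sketch: it builds the
command lines without running them, the function name stage_commands is made
up for this illustration, and the prefix 'arm-unknown-linux-gnueabi-' is an
assumed example of a cross-toolchain prefix, not something this document
prescribes.

```python
def stage_commands(source, prefix=""):
    """Return the command lines that turn one C file into an executable.

    'prefix' is a hypothetical cross-toolchain prefix; an empty prefix
    means the native tools of the machine are used.
    """
    stem = source.rsplit(".", 1)[0]
    return [
        # 1. compiler: C source -> assembly code (-S stops after compiling)
        [prefix + "gcc", "-S", source, "-o", stem + ".s"],
        # 2. assembler (from binutils): assembly code -> object code
        [prefix + "as", stem + ".s", "-o", stem + ".o"],
        # 3. link: the gcc driver hands the object file to the binutils
        #    linker and adds the C library and the start files
        [prefix + "gcc", stem + ".o", "-o", stem],
    ]


if __name__ == "__main__":
    for argv in stage_commands("hello.c", "arm-unknown-linux-gnueabi-"):
        print(" ".join(argv))
```

In day-to-day use the gcc driver runs all three stages in one go; splitting
them here only makes the cascade visible.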

So far, we get a complete toolchain that is capable of turning source code
into actual executable code. Depending on the Operating System, or the lack
thereof, running on the target, we also need the C library. The C library
provides a standard abstraction layer that performs basic tasks (such as
allocating memory, printing output on a terminal, managing file access...).
There are many C libraries, each targeted to different systems. For the
Linux /desktop/, there is glibc or eglibc or even uClibc; for embedded
Linux, you have a choice of eglibc or uClibc; while for systems without an
Operating System, you may use newlib, dietlibc, or even none at all. There
are a few other C libraries, but they are not as widely used, and/or are
targeted to very specific needs (e.g. klibc is a very small subset of the C
library aimed at building constrained initial ramdisks).

Under Linux, the C library needs to know the API to the kernel to decide
what features are present, and if needed, what emulation to include for
missing features. That API is provided by the kernel headers. Note: this
is Linux-specific (and potentially applies to a very few others); the C
library on other OSes does not need the kernel headers.


And now, how are all these components chained together? |
--------------------------------------------------------+

So far, all major components have been covered, but there is a specific
order in which they need to be built. Here we see what the dependencies are,
starting with the compiler we want to ultimately use. We call that compiler
the 'final compiler'.

  - the final compiler needs the C library, to know how to use it,
but:
  - building the C library requires a compiler
    79
yann@2320
    80
A needs B which needs A. This is the classic chicken'n'egg problem... This
yann@2320
    81
is solved by building a stripped-down compiler that does not need the C
yann@2320
    82
library, but is capable of building it. We call it a bootstrap, initial, or
yann@2320
    83
core compiler. So here is the new dependency list:
yann@2320
    84
yann@2320
    85
  - the final compiler needs the C library, to know how to use it,
yann@2320
    86
  - building the C library requires a core compiler
yann@2320
    87
but:
yann@2320
    88
  - the core compiler needs the C library headers and start files, to know
yann@2320
    89
    how to use the C library

B needs C which needs B. Chicken'n'egg, again. To solve this one, we will
need to build a C library that will only install its headers and start
files. The start files are a very few files that gcc needs to be able to
turn on thread local storage (TLS) on an NPTL system. So now we have:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
  - the core compiler needs the C library headers and start files, to know
    how to use the C library
but:
  - building the start files requires a compiler

Geez... C needs D which needs C, yet again. So we need to build a yet
simpler compiler, that does not need the headers and does not need the
start files. This compiler is also a bootstrap, initial or core compiler. In
order to differentiate the two core compilers, let's call that one "core
pass 1", and the former one "core pass 2". The dependency list becomes:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler
  - we need a core pass 1 compiler

And as we said earlier, the C library also requires the kernel headers.
The kernel headers themselves have no pre-requisites, so end of story in
this case:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler and the kernel headers
  - we need a core pass 1 compiler

We need to add a few new requirements. The moment we compile code for the
target, we need the assembler and the linker. Such code is, of course,
built from the C library, so we need to build the binutils before the C
library start files, and the complete C library itself. Also, some code
in gcc will turn out to run on the target as well. Luckily, the binutils
themselves have no pre-requisites. So, our dependency chain is as follows:

  - the final compiler needs the C library, to know how to use it, and the
    binutils
  - building the C library requires a core pass 2 compiler and the binutils
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library, and the binutils
  - building the start files requires a compiler, the kernel headers and the
    binutils
  - the core pass 1 compiler needs the binutils

Which translates into this order to build the components:

  1 binutils
  2 core pass 1 compiler
  3 kernel headers
  4 C library headers and start files
  5 core pass 2 compiler
  6 complete C library
  7 final compiler
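
Going from the dependency chain to a build order is a topological sort. The
sketch below assumes Python 3.9 or later (for the standard graphlib module)
and encodes the chain as a dictionary; the dictionary and the helper names
build_order and respects are made up for this illustration, and "a compiler"
for the start files is taken to mean the core pass 1 compiler.

```python
from graphlib import TopologicalSorter

# component -> components that must be built before it
DEPENDS_ON = {
    "binutils": set(),
    "kernel headers": set(),
    "core pass 1 compiler": {"binutils"},
    "C library headers and start files": {
        "core pass 1 compiler", "kernel headers", "binutils"},
    "core pass 2 compiler": {
        "C library headers and start files", "binutils"},
    "complete C library": {"core pass 2 compiler", "binutils"},
    "final compiler": {"complete C library", "binutils"},
}

# the order given in the text
DOCUMENT_ORDER = [
    "binutils",
    "core pass 1 compiler",
    "kernel headers",
    "C library headers and start files",
    "core pass 2 compiler",
    "complete C library",
    "final compiler",
]


def build_order(depends_on):
    """Return one order in which every requirement comes first."""
    return list(TopologicalSorter(depends_on).static_order())


def respects(order, depends_on):
    """Check that each component appears after all it depends on."""
    position = {name: index for index, name in enumerate(order)}
    return all(position[dep] < position[name]
               for name, deps in depends_on.items() for dep in deps)


if __name__ == "__main__":
    for step, name in enumerate(build_order(DEPENDS_ON), start=1):
        print(f"{step:>3} {name}")
```

The sorter is free to place the kernel headers anywhere before the C library
headers and start files, so its output may differ slightly from the list
above; both orders satisfy the same constraints.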

Yes! :-) But are we done yet?

In fact, no, there are still missing dependencies. As far as the tools
themselves are concerned, we do not need anything else.

But gcc has a few pre-requisites. It relies on a few external libraries to
perform some non-trivial tasks (such as handling complex numbers in
constants...). There are a few options to build those libraries. First, one
may think of relying on a Linux distribution to provide those libraries.
Alas, they were not widely available until very, very recently. So, if the
distro is not too recent, chances are that we will have to build those
libraries (which we do below). The affected libraries are:

  - the GNU Multiple Precision Arithmetic Library, GMP
  - the C library for multiple-precision floating-point computations with
    correct rounding, MPFR
  - the C library for the arithmetic of complex numbers, MPC

The dependencies for those libraries are:

  - MPC requires GMP and MPFR
  - MPFR requires GMP
  - GMP has no pre-requisite

So, the build order becomes:

  1 GMP
  2 MPFR
  3 MPC
  4 binutils
  5 core pass 1 compiler
  6 kernel headers
  7 C library headers and start files
  8 core pass 2 compiler
  9 complete C library
 10 final compiler
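
The first three steps follow from the library dependencies alone. As a
minimal sketch, a depth-first walk that lists pre-requisites before the
library that needs them gives the same order; the function name
prerequisites_first is made up for this illustration, and the walk assumes
the dependencies contain no loop.

```python
# library -> libraries it requires, as listed in the text
REQUIRES = {
    "GMP": [],
    "MPFR": ["GMP"],
    "MPC": ["GMP", "MPFR"],
}


def prerequisites_first(target, requires, done=None):
    """List everything 'target' needs, then 'target' itself.

    Assumes 'requires' has no circular dependency.
    """
    if done is None:
        done = []
    for dep in requires[target]:
        prerequisites_first(dep, requires, done)
    if target not in done:
        done.append(target)
    return done


if __name__ == "__main__":
    print(prerequisites_first("MPC", REQUIRES))  # ['GMP', 'MPFR', 'MPC']
```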

Yes! Or yet some more?

This is now sufficient to build a functional toolchain. So if you've had
enough for now, you can stop here. Or if you are curious, you can continue
reading.

gcc can also make use of a few other external libraries. These additional,
optional libraries are used to enable advanced features in gcc, such as
loop optimisation (GRAPHITE) and Link Time Optimisation (LTO). If you want
to use these, you'll need three additional libraries:

To enable GRAPHITE:
  - the Parma Polyhedra Library, PPL
  - the Chunky Loop Generator, using the PPL backend, CLooG/PPL

To enable LTO:
  - the ELF object file access library, libelf

The dependencies for those libraries are:

  - PPL requires GMP
  - CLooG/PPL requires GMP and PPL
  - libelf has no pre-requisites

The list now looks like (optional libs with a *):

  1 GMP
  2 MPFR
  3 MPC
  4 PPL *
  5 CLooG/PPL *
  6 libelf *
  7 binutils
  8 core pass 1 compiler
  9 kernel headers
 10 C library headers and start files
 11 core pass 2 compiler
 12 complete C library
 13 final compiler
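
The thirteen-step list and the earlier ten-step list are the same list with
and without the optional libraries. The sketch below records each step with
an "optional" flag, checks the list against the library dependencies stated
above, and renumbers it when the optional steps are left out. The data layout
and the names numbered and violations are assumptions made for this
illustration.

```python
# (step name, optional?) in the order of the list above
STEPS = [
    ("GMP", False), ("MPFR", False), ("MPC", False),
    ("PPL", True), ("CLooG/PPL", True), ("libelf", True),
    ("binutils", False),
    ("core pass 1 compiler", False),
    ("kernel headers", False),
    ("C library headers and start files", False),
    ("core pass 2 compiler", False),
    ("complete C library", False),
    ("final compiler", False),
]

# library -> libraries it requires, as stated in the text
LIB_REQUIRES = {
    "MPFR": ["GMP"],
    "MPC": ["GMP", "MPFR"],
    "PPL": ["GMP"],
    "CLooG/PPL": ["GMP", "PPL"],
}


def numbered(steps, with_optional=True):
    """Return the steps as numbered lines, optionally skipping the * ones."""
    kept = [name for name, optional in steps if with_optional or not optional]
    return [f"{index:>3} {name}" for index, name in enumerate(kept, start=1)]


def violations(steps, requires):
    """Return (library, requirement) pairs that are built in the wrong order."""
    names = [name for name, _ in steps]
    return [(lib, dep)
            for lib, deps in requires.items() for dep in deps
            if names.index(dep) > names.index(lib)]


if __name__ == "__main__":
    print("\n".join(numbered(STEPS, with_optional=False)))
    print(violations(STEPS, LIB_REQUIRES))
```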

This list is now complete! Wouhou! :-)


So the list is complete. But why does crosstool-NG have more steps? |
--------------------------------------------------------------------+

The thirteen steps above are the necessary steps, from a theoretical point
of view. In reality, though, there are small differences; there are three
different reasons for the additional steps in crosstool-NG.

First, the GNU binutils do not support some kinds of output. It is not possible
to generate 'flat' binaries with binutils, so we have to use another component
that adds this support: elf2flt. Another binary utility called sstrip has been
added. It allows for super-stripping the target binaries, although it is not
strictly required.

Second, crosstool-NG can also build some additional debug utilities to run on
the target. This is where we build, for example, the cross-gdb, the gdbserver
and the native gdb (the last two run on the target, the first runs on the
same machine as the toolchain). The others (strace, ltrace, DUMA and dmalloc)
are absolutely not related to the toolchain, but are nice-to-have stuff that
can greatly help when developing, so are included as goodies (and they are
quite easy to build, so it's OK; more complex stuff is not worth the effort
to include in crosstool-NG).