File.........: 9 - Build procedure overview.txt
Copyright....: (C) 2011 Yann E. MORIN <yann.morin.1998@anciens.enib.fr>
License......: Creative Commons Attribution Share Alike (CC-by-sa), v2.5


How is a toolchain constructed? /
_______________________________/

This is the result of a discussion with Francesco Turco <mail@fturco.org>:
  http://sourceware.org/ml/crossgcc/2011-01/msg00060.html

Francesco has a nice tutorial for beginners, along with a sample, step-by-
step procedure to build a toolchain for an ARM target from an x86_64 Debian
host:
  http://fturco.org/wiki/doku.php?id=debian:cross-compiler

Thank you Francesco for initiating this!


I want a cross-compiler! What is this toolchain you're speaking about? |
-----------------------------------------------------------------------+

A cross-compiler is in fact a collection of different tools set up to
work tightly together. The tools are arranged so that they are chained,
in a kind of cascade, where the output from one becomes the input to the
next, to ultimately produce the actual binary code that runs on a
machine. So, we call this arrangement a "toolchain". When a toolchain is
meant to generate code for a machine different from the machine it runs
on, it is called a cross-toolchain.


So, what are those components in a toolchain? |
----------------------------------------------+

First and foremost among the components of a toolchain is the compiler
itself. The compiler turns source code (in C, C++, whatever) into assembly
code. The compiler of choice is the GNU Compiler Collection, well known as
'gcc'.

The assembly code is translated by the assembler into object code. The
assembler is part of the binary utilities, such as the GNU 'binutils'.

Once the different object code files have been generated, they have to be
aggregated together to form the final executable binary. This is called
linking, and is achieved with the use of a linker. The GNU 'binutils' also
come with a linker.
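
As a rough illustration of the cascade described above, the sketch below only
builds the command lines for each stage; it does not run them. The target
prefix 'arm-unknown-linux-gnueabi-' and the file name 'hello.c' are
assumptions made for the example, and the link step goes through the gcc
driver (which in turn calls the binutils linker) rather than calling 'ld'
directly.

```python
from pathlib import PurePosixPath


def stage_commands(source, prefix=""):
    """Return the compile, assemble and link command lines for one C file.

    `prefix` is the cross-toolchain prefix (empty for a native toolchain).
    """
    src = PurePosixPath(source)
    asm = src.with_suffix(".s")
    obj = src.with_suffix(".o")
    exe = src.with_suffix("")
    return [
        # 1. compiler: C source -> assembly code
        [prefix + "gcc", "-S", str(src), "-o", str(asm)],
        # 2. assembler (binutils): assembly code -> object code
        [prefix + "as", str(asm), "-o", str(obj)],
        # 3. linker, driven by gcc: object code -> executable
        [prefix + "gcc", str(obj), "-o", str(exe)],
    ]


if __name__ == "__main__":
    # Hypothetical target prefix and source file, for illustration only.
    for cmd in stage_commands("hello.c", "arm-unknown-linux-gnueabi-"):
        print(" ".join(cmd))
```

With an empty prefix, the same three stages describe a native toolchain; only
the names of the tools change, not the way they are chained.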

So far, we get a complete toolchain that is capable of turning source code
into actual executable code. Depending on the Operating System, or the lack
thereof, running on the target, we also need the C library. The C library
provides a standard abstraction layer that performs basic tasks (such as
allocating memory, printing output on a terminal, managing file access...).
There are many C libraries, each targeted to different systems. For the
Linux /desktop/, there is glibc or eglibc or even uClibc; for embedded Linux,
you have a choice of eglibc or uClibc; while for systems without an Operating
System, you may use newlib, dietlibc, or even none at all. There are a few
other C libraries, but they are not as widely used, and/or are targeted to
very specific needs (eg. klibc is a very small subset of the C library aimed
at building constrained initial ramdisks).

Under Linux, the C library needs to know the API to the kernel to decide
what features are present, and if needed, what emulation to include for
missing features. That API is provided by the kernel headers. Note: this
is Linux-specific (and potentially applies to a very few others); the C
libraries on other OSes do not need the kernel headers.


And now, how are all these components chained together? |
--------------------------------------------------------+

So far, all major components have been covered, but there is a specific
order in which they need to be built. Here we see what the dependencies
are, starting with the compiler we want to ultimately use. We call that
compiler the 'final compiler'.

  - the final compiler needs the C library, to know how to use it,
but:
  - building the C library requires a compiler

A needs B which needs A. This is the classic chicken'n'egg problem... This
is solved by building a stripped-down compiler that does not need the C
library, but is capable of building it. We call it a bootstrap, initial, or
core compiler. So here is the new dependency list:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
but:
  - the core compiler needs the C library headers and start files, to know
    how to use the C library

B needs C which needs B. Chicken'n'egg, again. To solve this one, we will
need to build a C library that will only install its headers and start
files. The start files are a handful of files that gcc needs to be able to
turn on thread local storage (TLS) on an NPTL system. So now we have:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
  - the core compiler needs the C library headers and start files, to know
    how to use the C library
but:
  - building the start files requires a compiler

Geez... C needs D which needs C, yet again. So we need to build a yet
simpler compiler, that needs neither the headers nor the start files. This
compiler is also a bootstrap, initial or core compiler. In order to
differentiate the two core compilers, let's call that one "core pass 1",
and the former one "core pass 2". The dependency list becomes:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core pass 2 compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler
  - we need a core pass 1 compiler
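
The chicken'n'egg loops described above can be sketched as a cycle check on a
dependency graph. This is only an illustration using Python's standard
'graphlib' module (Python 3.9 or later); the node names are taken from the
lists above, and treating "a compiler" as one single node in the naive case
is an assumption made to show the loop.

```python
from graphlib import CycleError, TopologicalSorter

# Each key depends on the items in its value set.
# With one single "compiler" node, the two requirements form a loop.
NAIVE = {
    "compiler": {"C library"},
    "C library": {"compiler"},
}

# Splitting the compiler into passes breaks the loops.
SPLIT = {
    "final compiler": {"C library"},
    "C library": {"core pass 2 compiler"},
    "core pass 2 compiler": {"C library headers and start files"},
    "C library headers and start files": {"core pass 1 compiler"},
    "core pass 1 compiler": set(),
}


def find_cycle(deps):
    """Return the cycle reported by graphlib, or None if deps can be ordered."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as err:
        # The second element of args is the list of nodes forming the cycle.
        return err.args[1]
    return None


if __name__ == "__main__":
    print("naive:", find_cycle(NAIVE))
    print("split:", find_cycle(SPLIT))
```

The naive graph is rejected because no component can be built first, while
the split graph has a clear starting point: the core pass 1 compiler.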

And as we said earlier, the C library also requires the kernel headers.
The kernel headers themselves have no prerequisites, so end of story in
this case:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core pass 2 compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler and the kernel headers
  - we need a core pass 1 compiler

We need to add a few new requirements. The moment we compile code for the
target, we need the assembler and the linker. Such code is, of course,
part of the C library, so we need to build the binutils before the C
library start files, and before the complete C library itself. Also, some
code in gcc ends up running on the target as well. Luckily, the binutils
have no prerequisites of their own. So, our dependency chain is as follows:

  - the final compiler needs the C library, to know how to use it, and the
    binutils
  - building the C library requires a core pass 2 compiler and the binutils
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library, and the binutils
  - building the start files requires a compiler, the kernel headers and the
    binutils
  - the core pass 1 compiler needs the binutils

Which translates into this order to build the components:

  1 binutils
  2 core pass 1 compiler
  3 kernel headers
  4 C library headers and start files
  5 core pass 2 compiler
  6 complete C library
  7 final compiler
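
The step from the dependency chain to the build order can be sketched as a
topological sort. The snippet below is an illustration only, using Python's
standard 'graphlib' module (Python 3.9 or later); it assumes that the start
files are built with the core pass 1 compiler, and it is not how crosstool-NG
itself computes its steps.

```python
from graphlib import TopologicalSorter

# Prerequisites of each component, following the dependency chain above.
DEPS = {
    "binutils": set(),
    "kernel headers": set(),
    "core pass 1 compiler": {"binutils"},
    "C library headers and start files": {
        "core pass 1 compiler",
        "kernel headers",
        "binutils",
    },
    "core pass 2 compiler": {"C library headers and start files", "binutils"},
    "complete C library": {"core pass 2 compiler", "binutils"},
    "final compiler": {"complete C library", "binutils"},
}


def build_order(deps):
    """Return one valid build order: every prerequisite comes first."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, name in enumerate(build_order(DEPS), start=1):
        print(step, name)
```

Several orders satisfy these constraints: the kernel headers have no
prerequisite, so the sort may place them before the binutils or the core
pass 1 compiler. Any such order is as valid as the one listed above.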

Yes! :-) But are we done yet?

In fact, no, there are still missing dependencies. As far as the tools
themselves are concerned, we do not need anything else.

But gcc has a few pre-requisites. It relies on a few external libraries to
perform some non-trivial tasks (such as handling complex numbers in
constants...). There are a few options to get those libraries. First, one
may think of relying on a Linux distribution to provide them. Alas, they
were not widely available until very, very recently. So, if the distro is
not too recent, chances are that we will have to build those libraries
ourselves (which we do below). The affected libraries are:

  - the GNU Multiple Precision Arithmetic Library, GMP
  - the C library for multiple-precision floating-point computations with
    correct rounding, MPFR
  - the C library for the arithmetic of complex numbers, MPC

The dependencies for those libraries are:

  - MPC requires GMP and MPFR
  - MPFR requires GMP
  - GMP has no pre-requisite

So, the build order becomes:

  1 GMP
  2 MPFR
  3 MPC
  4 binutils
  5 core pass 1 compiler
  6 kernel headers
  7 C library headers and start files
  8 core pass 2 compiler
  9 complete C library
 10 final compiler
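
The library pre-requisites above can be checked mechanically against a
proposed order. The helper below is a minimal sketch; the function name
'order_violations' is made up for this example and is not part of any tool.

```python
# Prerequisites of the three gcc companion libraries, as listed above.
LIB_DEPS = {
    "GMP": [],
    "MPFR": ["GMP"],
    "MPC": ["GMP", "MPFR"],
}


def order_violations(order, deps):
    """Return (component, prerequisite) pairs that `order` builds too late."""
    position = {name: index for index, name in enumerate(order)}
    violations = []
    for name, prerequisites in deps.items():
        for prerequisite in prerequisites:
            if position[prerequisite] > position[name]:
                violations.append((name, prerequisite))
    return violations


if __name__ == "__main__":
    # The order used above: nothing is built before its prerequisites.
    print(order_violations(["GMP", "MPFR", "MPC"], LIB_DEPS))
    # A wrong order: MPC would be built before GMP and MPFR.
    print(order_violations(["MPC", "GMP", "MPFR"], LIB_DEPS))
```

An empty result means the order respects every listed dependency; each pair
in a non-empty result names a component and the prerequisite it is missing
at the time it would be built.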

Yes! Or yet some more?

This is now sufficient to build a functional toolchain. So if you've had
enough for now, you can stop here. Or if you are curious, you can continue
reading.

gcc can also make use of a few other external libraries. These additional,
optional libraries are used to enable advanced features in gcc, such as
loop optimisation (GRAPHITE) and Link Time Optimisation (LTO). If you want
to use these, you'll need three additional libraries:

To enable GRAPHITE:
  - the Parma Polyhedra Library, PPL
  - the Chunky Loop Generator, using the PPL backend, CLooG/PPL

To enable LTO:
  - the ELF object file access library, libelf

The dependencies for those libraries are:

  - PPL requires GMP
  - CLooG/PPL requires GMP and PPL
  - libelf has no pre-requisites

The list now looks like (optional libs with a *):

  1 GMP
  2 MPFR
  3 MPC
  4 PPL *
  5 CLooG/PPL *
  6 libelf *
  7 binutils
  8 core pass 1 compiler
  9 kernel headers
 10 C library headers and start files
 11 core pass 2 compiler
 12 complete C library
 13 final compiler

This list is now complete! Wouhou! :-)
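
How the optional libraries slot into the list can be sketched as follows.
The function name 'build_steps' and its 'graphite' and 'lto' flags are
assumptions made for this example; they are not crosstool-NG options.

```python
REQUIRED_LIBS = ["GMP", "MPFR", "MPC"]
GRAPHITE_LIBS = ["PPL", "CLooG/PPL"]   # only needed for GRAPHITE
LTO_LIBS = ["libelf"]                  # only needed for LTO
TOOLCHAIN_STEPS = [
    "binutils",
    "core pass 1 compiler",
    "kernel headers",
    "C library headers and start files",
    "core pass 2 compiler",
    "complete C library",
    "final compiler",
]


def build_steps(graphite=False, lto=False):
    """Return the ordered step list, adding optional libraries on request."""
    steps = list(REQUIRED_LIBS)
    if graphite:
        steps += GRAPHITE_LIBS
    if lto:
        steps += LTO_LIBS
    return steps + TOOLCHAIN_STEPS


if __name__ == "__main__":
    for number, name in enumerate(build_steps(graphite=True, lto=True), start=1):
        print(number, name)
```

With both flags off, the function returns the ten-step list; with both on,
it returns the thirteen-step list shown above.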


So the list is complete. But why does crosstool-NG have more steps? |
--------------------------------------------------------------------+

The thirteen steps above are the necessary steps, from a theoretical point
of view. In reality, though, there are small differences; there are three
different reasons for the additional steps in crosstool-NG.

First, the GNU binutils do not support some kinds of output. It is not possible
to generate 'flat' binaries with binutils, so we have to use another component
that adds this support: elf2flt. Another binary utility called sstrip has been
added. It allows for super-stripping the target binaries, although it is not
strictly required.

Second, some C libraries require another step after the compiler is built, to
install additional stuff. This is the case for mingw and newlib. Hence the
libc_finish step.

Third, crosstool-NG can also build some additional debug utilities to run on
the target. This is where we build, for example, the cross-gdb, the gdbserver
and the native gdb (the last two run on the target, the first runs on the
same machine as the toolchain). The others (strace, ltrace, DUMA and dmalloc)
are absolutely not related to the toolchain, but are nice-to-have stuff that
can greatly help when developing, so are included as goodies (and they are
quite easy to build, so it's OK; more complex stuff is not worth the effort
to include in crosstool-NG).