File.........: 9 - Build procedure overview.txt
Copyright....: (C) 2011 Yann E. MORIN
License......: Creative Commons Attribution Share Alike (CC-by-sa), v2.5


How is a toolchain constructed? /
_______________________________/

This is the result of a discussion with Francesco Turco:
http://sourceware.org/ml/crossgcc/2011-01/msg00060.html

Francesco has a nice tutorial for beginners, along with a sample,
step-by-step procedure to build a toolchain for an ARM target from an
x86_64 Debian host:
http://fturco.org/wiki/doku.php?id=debian:cross-compiler

Thank you, Francesco, for initiating this!


I want a cross-compiler! What is this toolchain you're speaking about? |
-----------------------------------------------------------------------+

A cross-compiler is in fact a collection of different tools set up to work
tightly together. The tools are chained in a kind of cascade, where the
output from one becomes the input to the next, to ultimately produce the
actual binary code that runs on a machine. So, we call this arrangement a
"toolchain". When a toolchain is meant to generate code for a machine
different from the machine it runs on, it is called a cross-toolchain.


So, what are those components in a toolchain? |
----------------------------------------------+

The first and foremost component of the toolchain is the compiler itself.
The compiler turns source code (in C, C++, whatever) into assembly code. The
compiler of choice is the GNU Compiler Collection, better known as 'gcc'.

The assembler then translates the assembly code into object code. The
assembler is part of the binary utilities, such as the GNU 'binutils'.

Once the different object code files have been generated, they have to be
aggregated together to form the final executable binary. This is called
linking, and is done by a linker. The GNU 'binutils' also come with a
linker.

So far, we have a complete toolchain that is capable of turning source code
into actual executable code. Depending on the Operating System, or the lack
thereof, running on the target, we also need the C library. The C library
provides a standard abstraction layer that performs basic tasks (such as
allocating memory, printing output on a terminal, managing file access...).
There are many C libraries, each targeted at different systems. For the
Linux /desktop/, there is glibc, eglibc or even uClibc; for embedded Linux,
you have a choice of eglibc or uClibc; while for systems without an
Operating System, you may use newlib, dietlibc, or even none at all. There
are a few other C libraries, but they are not as widely used, and/or are
targeted at very specific needs (e.g. klibc is a very small subset of the C
library aimed at building constrained initial ramdisks).

Under Linux, the C library needs to know the API to the kernel to decide
what features are present, and, if needed, what emulation to include for
missing features. That API is provided by the kernel headers. Note: this is
specific to Linux (and possibly a very few other systems); the C libraries
on other OSes do not need the kernel headers.


And now, how are all these components chained together? |
--------------------------------------------------------+

So far, all major components have been covered, but they have to be built
in a specific order. Here we see what the dependencies are, starting with
the compiler we ultimately want to use. We call that compiler the 'final
compiler'.

  - the final compiler needs the C library, to know how to use it,
but:
  - building the C library requires a compiler

A needs B which needs A. This is the classic chicken'n'egg problem... This
is solved by building a stripped-down compiler that does not need the C
library, but is capable of building it. We call it a bootstrap, initial, or
core compiler. So here is the new dependency list:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
but:
  - the core compiler needs the C library headers and start files, to know
    how to use the C library

B needs C which needs B. Chicken'n'egg, again. To solve this one, we will
need to build a C library that only installs its headers and start files.
The start files are a handful of files that gcc needs in order to enable
thread local storage (TLS) on an NPTL system. So now we have:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core compiler
  - the core compiler needs the C library headers and start files, to know
    how to use the C library
but:
  - building the start files requires a compiler

Geez... C needs D which needs C, yet again.
So we need to build an even simpler compiler, one that needs neither the
headers nor the start files. This compiler is also a bootstrap, initial or
core compiler. In order to differentiate the two core compilers, let's call
that one "core pass 1", and the former one "core pass 2". The dependency
list becomes:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core pass 2 compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler
  - we need a core pass 1 compiler

And as we said earlier, the C library also requires the kernel headers.
The kernel headers themselves have no requirements, so the chain ends there
in this case:

  - the final compiler needs the C library, to know how to use it,
  - building the C library requires a core pass 2 compiler
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library
  - building the start files requires a compiler and the kernel headers
  - we need a core pass 1 compiler

We need to add a few new requirements. The moment we compile code for the
target, we need the assembler and the linker. The C library is, of course,
such code, so we need to build the binutils before both the C library
start files and the complete C library itself. Also, some code in gcc is
built to run on the target as well, so gcc needs the binutils too. Luckily,
the binutils themselves have no requirements.
So, our dependency chain is as follows:

  - the final compiler needs the C library, to know how to use it, and the
    binutils
  - building the C library requires a core pass 2 compiler and the binutils
  - the core pass 2 compiler needs the C library headers and start files,
    to know how to use the C library, and the binutils
  - building the start files requires a compiler, the kernel headers and the
    binutils
  - the core pass 1 compiler needs the binutils

This translates into the following build order:

  1 binutils
  2 core pass 1 compiler
  3 kernel headers
  4 C library headers and start files
  5 core pass 2 compiler
  6 complete C library
  7 final compiler

Yes! :-) But are we done yet?

In fact, no, there are still missing dependencies. As far as the tools
themselves are concerned, we do not need anything else.

But gcc has a few pre-requisites. It relies on a few external libraries to
perform some non-trivial tasks (such as handling complex numbers in
constants...). There are a few ways to obtain those libraries. First, one
may think of relying on a Linux distribution to provide them. Alas, they
were not widely available until very, very recently. So, if the distro is
not too recent, chances are that we will have to build those libraries
(which we do below).
The affected libraries are:

  - the GNU Multiple Precision Arithmetic Library, GMP
  - the C library for multiple-precision floating-point computations with
    correct rounding, MPFR
  - the C library for the arithmetic of complex numbers, MPC

The dependencies for those libraries are:

  - MPC requires GMP and MPFR
  - MPFR requires GMP
  - GMP has no pre-requisites

So, the build order becomes:

   1 GMP
   2 MPFR
   3 MPC
   4 binutils
   5 core pass 1 compiler
   6 kernel headers
   7 C library headers and start files
   8 core pass 2 compiler
   9 complete C library
  10 final compiler

Yes! Or is there yet more?

This is now sufficient to build a functional toolchain. So if you've had
enough for now, you can stop here. Or, if you are curious, you can continue
reading.

gcc can also make use of a few other external libraries. These additional,
optional libraries are used to enable advanced features in gcc, such as
loop optimisation (GRAPHITE) and Link Time Optimisation (LTO).
If you want to use these, you'll need three additional libraries:

To enable GRAPHITE:
  - the Parma Polyhedra Library, PPL
  - the Chunky Loop Generator, using the PPL backend, CLooG/PPL

To enable LTO:
  - the ELF object file access library, libelf

The dependencies for those libraries are:

  - PPL requires GMP
  - CLooG/PPL requires GMP and PPL
  - libelf has no pre-requisites

The list now looks like this (optional libraries are marked with a *):

   1 GMP
   2 MPFR
   3 MPC
   4 PPL *
   5 CLooG/PPL *
   6 libelf *
   7 binutils
   8 core pass 1 compiler
   9 kernel headers
  10 C library headers and start files
  11 core pass 2 compiler
  12 complete C library
  13 final compiler

This list is now complete! Wouhou! :-)


So the list is complete. But why does crosstool-NG have more steps? |
--------------------------------------------------------------------+

The thirteen steps above are the necessary steps, from a theoretical point
of view. In reality, though, there are small differences; there are three
different reasons for the additional steps in crosstool-NG.

First, the GNU binutils do not support some kinds of output. It is not
possible to generate 'flat' binaries with binutils, so we have to use
another component that adds this support: elf2flt. Another binary utility,
called sstrip, has also been added. It allows for super-stripping the
target binaries, although it is not strictly required.

Second, some C libraries require another step after the compiler is built,
to install additional stuff.
This is the case for mingw and newlib. Hence the
libc_finish step.

Third, crosstool-NG can also build some additional debug utilities. This is
where we build, for example, the cross-gdb, the gdbserver and the native gdb
(the last two run on the target, the first runs on the same machine as the
toolchain). The others (strace, ltrace, DUMA and dmalloc) are not related to
the toolchain at all, but they are nice to have and can greatly help during
development, so they are included as goodies (they are also quite easy to
build; more complex tools are not worth the effort of including in
crosstool-NG).
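
The dependency reasoning walked through in this document boils down to a
topological sort: each component is a node, and each "needs" is an edge. As
a minimal sketch (a hypothetical helper written for this overview, not code
taken from crosstool-NG), the Python script below encodes the dependencies
stated above and orders them with Kahn's algorithm. The function names
toolchain_deps() and build_order() are made up for illustration. One
assumption is also added: the text only says that "gcc" needs GMP, MPFR and
MPC (plus, optionally, PPL, CLooG/PPL and libelf), so every gcc pass is
assumed to need them. Components that become ready at the same time are
taken in the order they are listed, which is why the output matches the
lists above; another tie-break would give a different, equally valid order.

```python
import heapq

# Companion libraries gcc needs, as listed in the text. ASSUMPTION: every gcc
# pass (core pass 1, core pass 2, final) is taken to need all of them.
GCC_LIBS = ["GMP", "MPFR", "MPC"]
OPTIONAL_LIBS = ["PPL", "CLooG/PPL", "libelf"]  # for GRAPHITE and LTO


def toolchain_deps(with_optional=True):
    """Map each component to the components it needs (hypothetical helper).

    Components are listed in the document's order; build_order() only uses
    that order to break ties.
    """
    gcc_libs = GCC_LIBS + (OPTIONAL_LIBS if with_optional else [])
    deps = {
        "GMP": [],
        "MPFR": ["GMP"],
        "MPC": ["GMP", "MPFR"],
        "PPL": ["GMP"],
        "CLooG/PPL": ["GMP", "PPL"],
        "libelf": [],
        "binutils": [],
        "core pass 1 compiler": ["binutils"] + gcc_libs,
        "kernel headers": [],
        "C library headers and start files":
            ["core pass 1 compiler", "kernel headers", "binutils"],
        "core pass 2 compiler":
            ["C library headers and start files", "binutils"] + gcc_libs,
        "complete C library":
            ["core pass 2 compiler", "kernel headers", "binutils"],
        "final compiler": ["complete C library", "binutils"] + gcc_libs,
    }
    if not with_optional:
        deps = {name: needs for name, needs in deps.items()
                if name not in OPTIONAL_LIBS}
    return deps


def build_order(deps):
    """Kahn's topological sort; ties go to the component listed first."""
    rank = {name: i for i, name in enumerate(deps)}
    missing = {name: len(set(needs)) for name, needs in deps.items()}
    users = {name: [] for name in deps}
    for name, needs in deps.items():
        for need in set(needs):
            users[need].append(name)

    # Min-heap of components whose requirements have all been built.
    ready = [(rank[name], name) for name, count in missing.items() if count == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for user in users[name]:
            missing[user] -= 1
            if missing[user] == 0:
                heapq.heappush(ready, (rank[user], user))

    # Anything left over is stuck in a cycle: nothing can be built first.
    if len(order) != len(deps):
        stuck = sorted(set(deps) - set(order), key=rank.get)
        raise ValueError(f"chicken'n'egg: dependency cycle among {stuck}")
    return order


if __name__ == "__main__":
    for step, name in enumerate(build_order(toolchain_deps()), start=1):
        print(f"{step:2} {name}")
```

Running the script prints the thirteen steps in the same order as the
complete list above, and toolchain_deps(with_optional=False) yields the
ten-step list. Feeding build_order() the naive graph from the chicken'n'egg
discussion, {"compiler": ["C library"], "C library": ["compiler"]}, raises
ValueError: no component ever becomes ready, which is exactly the cycle that
the core pass 1 and core pass 2 compilers break.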