Introduction
One of the hardest parts of getting the correct binary for an embedded
system is the linker template. This little write-up comes from my experience of
getting a C++ unit test framework (CppUTest) to run on the target hardware.
I build my linker templates in multiple parts, as some parts are specific to a single
processor and some are generic to virtually any embedded system.
Requirements
There are a number of things that you need to get right for an embedded target.
Memory
In an embedded target the system will have a number of different types of memory, e.g. static RAM and Flash. There may be multiple blocks of each memory
type with different characteristics, e.g. access speed.
The challenge is to get the correct compiler output into the correct
memory block. This may involve some work from the startup code, e.g.
copying initialised data from Flash to SRAM.
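The copy from Flash to SRAM mentioned above can be sketched in C as below. This is only a minimal sketch, not the startup code from my project: the function names are made up for illustration, and on a real target the pointers would come from labels declared in the linker template (names such as _sidata, _sdata and _edata are common but are an assumption here).

```c
#include <stdint.h>

/* Copy the initialised-data image from its load address (Flash) to its
 * run address (SRAM), one 32-bit word at a time.
 * Assumption: on target, load/start/end would be linker-provided labels
 * (for example _sidata, _sdata, _edata); those names are hypothetical. */
static void copy_data(const uint32_t *load, uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = *load++;
    }
}

/* Clear the uninitialised-data region so that C statics start at zero.
 * Assumption: on target, start/end would be labels such as _sbss/_ebss. */
static void zero_bss(uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = 0u;
    }
}
```

Both functions would be called from the reset code before main, since nothing that relies on initialised or zeroed statics can run until they are done.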
Hardware Startup
On reset the processor will start running code from a known point. This is not
totally accurate for Cortex-M, where the address of the code is stored at a known
location instead. We need startup code that configures the hardware for us, such
as memory controllers for external memory.
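To make the Cortex-M case concrete, here is a small host-side sketch of the first two vector table entries: the initial stack pointer followed by the address of the reset code. The struct, the stand-in stack and simulate_reset are all hypothetical names for illustration. On the target the core does this in hardware, and the table would sit in whatever section the linker template places at the start of Flash.

```c
typedef void (*handler_t)(void);

/* First two words of a Cortex-M vector table. */
struct vector_table_head {
    void     *initial_sp;  /* word 0: value loaded into the stack pointer */
    handler_t reset;       /* word 1: address of the code run after reset */
};

static unsigned char demo_stack[256];  /* stand-in for target RAM */
static int reset_ran = 0;

/* Hypothetical reset handler. A real one would configure the hardware,
 * set up the C/C++ runtime and then call main(). */
static void Reset_Handler(void)
{
    reset_ran = 1;
}

/* Assumption: on target this table is placed at the start of Flash
 * by the linker template. */
static const struct vector_table_head vectors = {
    demo_stack + sizeof demo_stack,  /* stacks grow down from the top */
    Reset_Handler,
};

/* Host-side imitation of what the core does on reset. */
static void *simulate_reset(const struct vector_table_head *vt)
{
    void *sp = vt->initial_sp;  /* the core loads SP from word 0 */
    vt->reset();                /* then fetches word 1 and jumps there */
    return sp;
}
```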
C Runtime
The C runtime library has some startup requirements of its own, which
handle its internal setup. It may also set up
code to be run on exit from the application's main function.
C++ runtime startup
The runtime requirements for C++ are greater, and this is where we need to be
more aware of what we are doing. For C++ you need to ensure that constructors
are called for static and global objects. This is where you need to read the
compiler documentation, as each compiler generates this information in a
different way. See the link under References for gcc.
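As an illustration of what "calling the constructors" can look like, the sketch below assumes the compiler emits an array of function pointers that the startup code walks in order, which is how gcc's init_array works. The two ctor_ functions and demo_init_array are stand-ins for compiler-generated entries. On a target the array bounds would be labels from the linker template (conventionally __init_array_start and __init_array_end in GNU toolchains, but check your own template).

```c
#include <stddef.h>

typedef void (*init_fn)(void);

/* Record of which stand-in constructors ran, and in what order. */
static int call_log[4];
static size_t call_count = 0;

/* Stand-ins for the entries a compiler would generate for static objects. */
static void ctor_first(void)  { call_log[call_count++] = 1; }
static void ctor_second(void) { call_log[call_count++] = 2; }

/* Stand-in for the array the linker assembles from the input sections. */
static const init_fn demo_init_array[] = { ctor_first, ctor_second };

/* Call every entry between start and end, in the order they were linked. */
static void run_init_array(const init_fn *start, const init_fn *end)
{
    for (const init_fn *fn = start; fn != end; ++fn) {
        (*fn)();
    }
}
```

If the linker template drops or reorders these entries, static objects are silently left unconstructed, which is why the sections holding them have to be kept even though nothing references them.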
Compiler necessities
There may be other requirements from the compiler, e.g. the use of a
compiler-specific library.
Compiler options
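There are a number of compiler options that can be used to control
the compiler output. These options can also be used to help reduce the image
size. With gcc, for example, -ffunction-sections and -fdata-sections place each
function and data item in its own section, so that the linker's --gc-sections
option can discard the ones that are never referenced.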
Linker sections
The input to the linker is a number of compiled objects and object libraries
containing named sections. It is these named sections that are pulled into the
resulting binary. For some of these the required sections are easy to identify
as they will be used to satisfy external link requirements of other objects.
There will also be some sections that must be included even without any
reference to them.
The linker template
The references section links to the linker template files.
The main one of interest is stm32f-sections.ld.
1) First of all we need to make sure the interrupt vectors are placed at the
beginning of Flash memory.
2) We then include all referenced text sections (executable code).
3) We also include read-only data (const declarations).
4) We then have the init and fini sections. These are not sorted but included
in the order the objects are provided to the linker. They include the various
crt files, which must be kept in order because the compiler-provided parts do not
constitute a valid procedure unless included in the correct order, e.g. crti
provides the function prologue for the init function.
5) We then have the preinit_array, init_array and fini_array. These are arrays of
function addresses generated by the compiler, to be called at certain phases of the program.
6) We then create the section of code to run constructors, followed by a section
for destructors. As this will be an embedded system we could probably exclude them.
Again there is compiler-provided prologue and epilogue code (crtbegin and crtend).
7) We then pull in code to do with exception handling and unwinding, that is, if we
compile with exception support.
8) We then pull in all the initialised data. This is given addresses in RAM but stored
in Flash. The startup code has to copy this from Flash to RAM before main is called.
9) We then allocate space in RAM for uninitialised data (bss).
10) As a final check we allocate stack space. This is not where the stack will be unless
we really are using all the RAM. Its purpose is to cause a link error if the RAM size is too
small.
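The check in the last step is done by the linker itself, but the arithmetic behind it is simple enough to sketch in C. Everything here is hypothetical: the struct, the function name and the sizes are for illustration only and would normally be read from the map file. The point is just that initialised data, bss and the minimum stack reservation together must fit in the RAM block.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical RAM budget; all sizes are in bytes. */
struct ram_budget {
    uint32_t ram_size;        /* size of the RAM block in the memory map */
    uint32_t data_size;       /* initialised data copied in from Flash */
    uint32_t bss_size;        /* zero-initialised data */
    uint32_t min_stack_size;  /* stack space reserved as a link-time check */
};

/* Returns true when data, bss and the reserved stack all fit in RAM.
 * The sum is done in 64 bits so that large sizes cannot wrap around. */
static bool ram_fits(const struct ram_budget *b)
{
    uint64_t needed = (uint64_t)b->data_size + b->bss_size + b->min_stack_size;
    return needed <= b->ram_size;
}
```

When this condition fails in a real build, the linker reports that the RAM region has overflowed, which is the link error the stack reservation is there to provoke.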
Notes
The linker template declares some labels that are used by the startup code, e.g. to find
the initialised data.
Improvements
Some ARM Cortex-M devices have multiple RAM banks, e.g. the STM32F4 has Core Coupled Memory (CCM)
that is only accessible from the ARM core. This CCM could be used for the processor stack to reduce
contention when using DMA.