2756 lines
106 KiB
Plaintext
2756 lines
106 KiB
Plaintext
this is a rough draft, if/when I complete this draft I will at some point
|
|
go back through and rework it to improve it.
|
|
Update: draft 2. I went through almost all of this and cleaned it up.
|
|
Update: draft 3. Lots of typos and misspellings that I had missed before
|
|
|
|
THIS IS NOT AN ASSEMBLY LANGUAGE TUTORIAL, IT DOES HAVE A LOT OF
|
|
ASSEMBLY LANGUAGE IT IT. IF YOU ARE STUCK FOCUSING ON THE ASSEMBLY
|
|
LANGUAGE YOU ARE MISSING OUT, THE FOCUS IS CONTROLLING THE TOOLS SO
|
|
THAT THINGS ARE PLACED WHERE WE WANT THEM TO BE PLACED SO THE PROCESSOR
|
|
BOOTS RIGHT AND LAUNCHES OUR C PROGRAM, AND SO OUR C FUNCTIONS CAN
|
|
CALL OTHER C FUNCTIONS. ASSEMBLY LANGUAGE KNOWLEDGE IS NOT REQUIRED
|
|
FOR THIS TUTORIAL. ASSEMBLY LANGUAGE KNOWLEDGE IS NOT REQUIRED FOR
|
|
THIS TUTORIAL. ASSEMBLY LANGUAGE KNOWLEDGE IS NOT REQUIRED FOR THIS
|
|
TUTORIAL.
|
|
|
|
See the top level README for information on where to find the
|
|
schematic and programmers reference manual for the ARM processor
|
|
on the Raspberry Pi. Also find information on how to load and run
|
|
these programs.
|
|
|
|
This was originally written for the ARM11 based Raspberry Pi since
|
|
then a Cortex-A7 based (Raspberry Pi 2) has come out. When you get
|
|
to this point the ARM11 based uses a file named kernel.img the
|
|
Cortex-A7 uses one named kernel7.img. I will use kernel.img in the
|
|
text, but if you are on a Raspberry Pi 2 use kernel7.img instead.
|
|
|
|
The purpose of this tutorial is to give you a foundation for bare
|
|
metal programming. The actual touching of registers and making the
|
|
chip do things is not addressed here, that is the purpose of the
|
|
individual blinker and uart examples. This tutorial is about mastering
|
|
the toolchain to understand the foundation of those programs and also
|
|
to allow you to create your own and hopefully avoid common traps.
|
|
|
|
First and foremost, what is bare metal programming? You are going to
|
|
get different answers to that question from people who say they are
|
|
bare metal programmers. I would say most of them are right despite the
|
|
difference of opinion on specific details.
|
|
|
|
To try to generalize my opinion of this I would start by saying that
|
|
bare metal programming means you are talking to the hardware directly,
|
|
bypassing an operating system, or certainly if you have no real/formal
|
|
operating system running. Processors/computers do not require operating
|
|
systems to run. Operating systems are just programs anyway themselves
|
|
perhaps being considered bare metal programming.
|
|
|
|
|
|
To begin bare metal programming you start by understanding how the
|
|
processor boots, how and where it loads and executes its first
|
|
instruciton, and then making programs that fit that model, placing the
|
|
first instruction of your program such that the processor executes it
|
|
when it boots.
|
|
|
|
The second generalization I will make is that with bare metal programming
|
|
you are often programming registers and memory for peripherals directly.
|
|
For example printf() is not bare metal, there are way too many layers of
|
|
stuff often landing in system calls which are often tied to an operating
|
|
system. That doesnt mean you cant rig up a printf that works in a bare
|
|
metal environment, but it does contradict the concept of bare metal.
|
|
This of course is a gray area for the definition. For example if you
|
|
wanted to read items off of or write things to the sd card, using a
|
|
filesystem most programmers even if they create all the code from
|
|
scratch are going to end up with some sort of layered approach, at one
|
|
end is low level bare metal talking to registers that wiggle things on a
|
|
bus somewhere on the other end some sort of open file or create file,
|
|
read file, close file, etc. Being your own creation it doesnt have to
|
|
conform to any other file function call standard fopen(), fclose(),
|
|
etc. So what happens when one person writes some bare metal code, no
|
|
operating system involved, that can open, read, write, close files on
|
|
the sd card on the Raspberry Pi, then shares that code? Is that bare
|
|
metal? Tough question.
|
|
|
|
I have seen some folks argue that you are not bare metal if you are
|
|
not writing in assembly. I would argue back maybe you are not bare
|
|
metal if you are not writing machine code. I keep my bare metal
|
|
definition to no operating system (unless the operating system IS the
|
|
bare metal program you are writing) and programming peripherals,
|
|
etc, directly from your program, or through libraries but not through
|
|
an operating system.
|
|
|
|
To continue this tutorial you are going to be exposed to my personal
|
|
preferences which are not a bare metal thing in general but my personal
|
|
bare metal things. These will be explained as we go. I have been
|
|
around the block many times, I have been burned by compilers and
|
|
manuals and other things and am trying to share some of those experiences
|
|
at the same time when I had been around the block fewer times I was that
|
|
person that refused to take someone elses code as is. I always had to
|
|
rewrite it myself before even trying it. What I have learned since is
|
|
that unless the other persons programming environment or tools or
|
|
whatever are not so painful to get up and running, you should make an
|
|
attempt to use their environment with their code the way they do it.
|
|
In particular for these kinds of things that you have not learned and
|
|
dont know how to do but the author appears to know how to do, THEN,
|
|
start to make that code your own. Eventually if you are like me,
|
|
completely replacing all of it including the environment. Other than
|
|
the potential pain of trying to get their environment up and running,
|
|
this path of just trying it their way then re-inventing the wheel to
|
|
make it your own, will have greater success sooner and less frustration.
|
|
|
|
I assume you are running Linux. The things I am doing here for the
|
|
most part can be done easily in Windows or on a MAC, but I am not going
|
|
to get into explaining certain things three times or N times to cover
|
|
all the possible operating system variations. I tend to run a 64
|
|
bit Linux, I switched from Ubuntu to Linux Mint when the post gnome 2
|
|
disaster happened. Linux Mint has worked to salvage the Linux desktop
|
|
for everyone else and I am using Mint now. I do have a number of
|
|
computers or laptops that I develop on and not all run the same distro
|
|
or version. For the most part the focus will be on using the gnu tools
|
|
(binutils and gcc) and other than forward slashes vs backslashes in
|
|
path names there should be nothing operating system specific about
|
|
this discussion.
|
|
|
|
So as soon as we say no operating system, we open a big can of worms.
|
|
That is as big a problem as the fear of programming peripherals directly,
|
|
perhaps the biggest problem of bare metal programming. Why is it a
|
|
problem? Well lets think about the classic hello world C program and
|
|
maybe what you do or dont realize is going on. In some way, shape, or
|
|
form you have installed a C compiler on your computer, and they tell
|
|
you how to compile your first hello world program and it works. One or
|
|
a few includes, the main() function and a single printf() call. Well
|
|
there is a HUGE amount of stuff behind that program, it is not one
|
|
trivial line of code. A myriad of C libraries required, math libraries,
|
|
etc all to support the uber generic printf function and whatever format
|
|
string you might send to it. That is just scratching the surface
|
|
the C libraries that are linked in, a number of them have an intimate
|
|
relationship with the operating system. The C libraries nor printf
|
|
code itself handles the console directly, it makes calls to the operating
|
|
system and its myriad of drivers that ultimately illuminate pixels on
|
|
the screen. When you go bare metal YOU have to do all of this, a hello
|
|
world printf() program should NOT be your first bare metal program.
|
|
Generally your first bare metal program is turning an led on and off
|
|
assuming the hardware folks have provided an led you can turn on and
|
|
off with software (usually a good idea for them to do that). Later a
|
|
uart with individual characters then later a string, but a formatted
|
|
string, perhaps never.
|
|
|
|
Note this discussion is limited to assembly language and C. This is one
|
|
of those personal preference things. In my opinion if you want to be
|
|
a bare metal programmer you need to know C, no exceptions. And at least
|
|
some assembly, dont have to be an assembly guru, just enough to get
|
|
into your C program and perhaps support interrupts or other exceptions.
|
|
You should work to make your C programming strong though.
|
|
|
|
Another one of my simplifications in life is I try to avoid C library
|
|
calls in my bare metal C programs and further I try to avoid compiler
|
|
specific library calls, we will see what that means in a bit.
|
|
|
|
A C compiler is just a program that takes an input and produces an
|
|
output. That program is compiled to run on a particular computer, my
|
|
computer. That compiler's job is to create other programs that will
|
|
also run natively on my computer. The Raspberry Pi uses an ARM
|
|
processor, most computers out there (servers, desktops and laptops) are
|
|
running some flavor of the x86 instruction set, generally Intel or AMD
|
|
chips. ARM is a completely separate company from intel and AMD and
|
|
their processors use a completely different and incompatible in any
|
|
way instruction set. On a side note Intel and AMD make chips, ARM does
|
|
not make chips it just sells its processor designs to people who make
|
|
chips.
|
|
|
|
It is quite possible to use a compiler on my computer to generate a
|
|
program that runs on an ARM processor. A general term for a compiler
|
|
that runs on one computer but produces output (instructions) that are
|
|
for another computer/instruction set is called a cross compiler.
|
|
|
|
Just because a compiler is open source does not mean that that compiler
|
|
can be made to be a cross compiler. Some/many compilers in history are
|
|
targetted to their native platform and not cross compiler capable. GCC
|
|
is designed to generate code for many different instruction sets on
|
|
the backend. And itself can be built as a cross compiler, but the way
|
|
GCC works for each architecture you want to target you need to compile
|
|
gcc for that architecture. LLVM/Clang for example is designed from
|
|
the ground up to be both a traditional compiler and a Just In Time tool,
|
|
so its output remains mostly target independent until Just In Time. I
|
|
suspect it is mostly used as a static compiler though. It has a backend
|
|
that turns the generic into target specific. A big difference from
|
|
the gnu tools is that the default build of this backend can output
|
|
for any of the supported targets with the one tool. No need to re-build
|
|
for each desired target.
|
|
|
|
Just because a compiler CAN be built as a cross compiler does not mean
|
|
it is a good compiler, the more generic you get the more you take away
|
|
from tuning for a particular instruction set. Both GNU tools and LLVM
|
|
do a pretty good job in general for each target. Understanding that
|
|
each target is maintained to some extent by individuals and different
|
|
individuals produce different quality code so either of these toolchains
|
|
might have a bad apple or two due to the maturity of the target or the
|
|
individual or team working on it but other targets may be mature.
|
|
|
|
This tutorial is going to focus primarily on the gnu toolchain,
|
|
which is one of those that can be used as a cross compiler but is not
|
|
trivial to make it a cross compiler.
|
|
|
|
Fairly soon you will need some tools. At first we only need binutils
|
|
which is GNU's collection of assembler and linker tools. There are
|
|
other tools in there, the assembler and linker are the first we care
|
|
about. This is NOT a tutorial on teaching assembly language, you will
|
|
see some, but just enough to get a C programming running. That means
|
|
we will need a C compiler as well fairly soon. Now I say that this
|
|
is a non-trivial task. Since this is more of a moving target than
|
|
this README (hopefully), see the file TOOLCHAIN in this directory for
|
|
info on finding a gnu toolchain for your platform.
|
|
|
|
As with C libraries, I also try to not use gcc libraries (I will let
|
|
you figure out what that means). This is one of those personal things
|
|
not a general bare metal thing, and the benefit here is that I am only
|
|
relying on the compiler to do the job of compiling, turn C into ASM.
|
|
Dont try to do more than that. I become less dependent on the specific
|
|
compiler and the code is more portable.
|
|
|
|
So you will need a GNU ARM cross compiler toolchain. binutils and gcc
|
|
at a minimum, more than that is beyond the scope of this tutorial, have
|
|
fun. If you cant get that toolchain up you may be stuck at this point.
|
|
Now the one get out of jail free card you have here is that your
|
|
Raspberry Pi can run Linux, and you can get a native, non-cross-compiler
|
|
ARM gnu toolchain on your Raspberry Pi when running Linux fairly easy.
|
|
Simply prepare a Linux sd card for your Raspberry Pi and use it as a
|
|
normal computer. At the price point of a Raspberry Pi, if you want to
|
|
do it this way you might want to have a second Raspberry Pi. One as a
|
|
Linux development machine where you create the programs and the other as
|
|
the bare metal machine where you try to run those programs. Where you
|
|
see arm-none-eabi-gcc for example, on an ARM based Linux system just
|
|
type gcc instead. If you are using the Linux cross compiler you may
|
|
have something like arm-Linux-gnueabi-gcc. If I have done my work right
|
|
then any one of these will work. If you are on an x86 computer though
|
|
the gcc command by itself WILL NOT WORK. Let me say that again WILL
|
|
NOT WORK (it builds x86 programs not ARM).
|
|
|
|
Well beyond the scope of this document but you can also run Linux in a
|
|
virtual machine like qemu, and within that virtual machine like running
|
|
on a Raspberry Pi, you can then use a native ARM compiler. And there
|
|
are other ARM based boards as well the BeagleBones and such that can
|
|
run Linux and have a native gnu toolchain.
|
|
|
|
For bare metal the first thing we have to learn is how does our
|
|
processor/computer boot. We have to know this so we can make our
|
|
program work, we have to build our program so that the first
|
|
instruction in our program is placed in the computer such that it is
|
|
the first instruction run by the computer.
|
|
|
|
The Raspberry Pi is very much NON STANDARD with respect to how the ARM
|
|
is brought up. ARM processors boot in one of two ways normally. The
|
|
normal way an ARM boots is the first instruction executed its at address
|
|
0x00000000. The Cortex-M processors specifically (the Raspberry Pi does
|
|
NOT use a Cortex-M) the ADDRESS of the first instruction executed is at
|
|
address 0x00000004, the processor reads 0x00000004 then uses the value
|
|
read as an address, and then starts executing there. The Raspberry Pi
|
|
contains two primary processors one is a GPU, a processor dedicated to
|
|
graphics processing. It is a fully capable general purpose processor
|
|
with floating point and other features that allow it to be used for
|
|
graphics as well. The GPU and the ARM share the rest of the chip
|
|
resources for the most part, they share the same RAM, they share the
|
|
peripherals, etc. The GPU boots first, how exactly, I dont know, it
|
|
eventually reads and things from the sd card, then it reads the file
|
|
kernel.img which it loads into ram for us. Then the GPU controls the
|
|
ARM boot. So where does the GPU place the ARM code? What address?
|
|
Well that is part of the problem. From our (users) perspective, the
|
|
firmware available at the time that the Raspberry Pi first hit the
|
|
streets was placing kernel.img in memory such that the first instruction
|
|
it executed that we had control over was at address 0x00000000.
|
|
Understand that the purpose for the Raspberry Pi is to run Linux (for
|
|
educational purposes) and at least on ARM, the Linux kernel (also
|
|
known as a kernel image) is typically loaded at ARM address 0x00008000.
|
|
So those early (to us) kernel.img files had 0x8000 bytes of padding.
|
|
Later this was changed to a typical kernel.img that instead of being
|
|
loaded at address 0x00000000 was loaded at 0x00008000.
|
|
|
|
So the typical setup is the GPU copies the kernel.img contents to
|
|
address 0x00008000 in the ARM address space, then it places code at
|
|
address 0x00000000 which does a little bit of prep then branches to the
|
|
kernel.img code at offset 0x00008000. Since kernel.img is our entry
|
|
point, it is the ARM boot code that we can control, we have to build our
|
|
program based on where the bytes in this file are placed and how it is
|
|
used. The presence of a file named config.txt and its contents can
|
|
change the way the GPU boots the ARM, including moving where this file
|
|
is placed and/or what address the ARM boots. All of these things
|
|
combined can put the contents of the file in memory where you didnt
|
|
expect and your program may not run properly.
|
|
|
|
Here is another one of my personal preferences to deal with. I prefer
|
|
to use the most current GPU firmware files from the Raspberry Pi
|
|
repository: bootcode.bin and start.elf. I prefer to not use config.txt,
|
|
not have a file named that on the sd card, and the only other file
|
|
being kernel.img that I create instead of the one from the Raspberry Pi
|
|
folks. This means that I prefer to deal with how the kernel.img file
|
|
is used for the Linux folks. From the time that I received my first
|
|
Raspberry Pi to the present, the up to date bootcode.bin and start.elf
|
|
have placed kernel.img at 0x00008000 in ARM address space, and that is
|
|
my ARM entry point. 0x00008000 is the location for the first ARM
|
|
instruction that we choose to control.
|
|
|
|
So now we are ready to approach our first program. We know that our
|
|
program is a file named kernel.img which is just a binary file that
|
|
is copied to ARM memory space at address 0x00008000. We have built
|
|
and/or installed a gnu cross compiler for ARM, at a minimum binutils
|
|
and gcc.
|
|
|
|
Now now for another preference of mine. If you think about your C
|
|
programming experience, although you may have been taught to avoid
|
|
global variables at all costs you know they exist and you have or
|
|
should have been taught at least something about them. Even if you
|
|
have not you have no doubt initialized static local variables:
|
|
|
|
unsigned int apple;
|
|
unsigned int orange = 5;
|
|
int main ( void )
|
|
{
|
|
static unsigned int pear = 7;
|
|
unsigned int peach;
|
|
...
|
|
}
|
|
|
|
With the code above as a C programmer your are taught that apple will
|
|
have the value zero, orange and pear will have the values indicated in
|
|
the code when the body of your main program runs. Now you should also
|
|
know that peach will be undefined, you have to assign it a value before
|
|
you can safely use it.
|
|
-How does all of that happen?
|
|
-Is there C code that runs before main() is called that prepares memory
|
|
so that your program has those memory locations filled with values?
|
|
|
|
If that were the case and it was C code, and that C code made the same
|
|
assumptions about variables being pre-initialized, would there be C code
|
|
that preceeds that code? This feels like a "Which came first, the
|
|
chicken or the egg" problem. But it is not. The answer is there is
|
|
some code written in assembly language the is executed before main() is
|
|
called and that assembly language code prepares these memory locations
|
|
so that when your C code starts apple, orange and pear have the proper
|
|
values loaded. This assembly language code is often called the
|
|
bootstrap code. A very appropriate term for us as that small bit of
|
|
assembly language code will both be the boot code for the ARM, the first
|
|
instructions, that we control, that the ARM runs and it is also the
|
|
code that we are using to prepare memory, etc so that the C programs
|
|
work as desired.
|
|
|
|
And this is my preference on this with respect to bare metal. For the
|
|
code that follows and much of the code in my repos, I DO NOT support the
|
|
initializing of variables in the way described above. If you were to
|
|
take one of my examples and add the apple orange and pear variables
|
|
above you should not expect to get 0, 5, and 7. Further what you do
|
|
find you should not expect to find every time, simply make no assumptions
|
|
about the starting contents of variables. This is my preference not a
|
|
generic bare metal thing. It is a problem that you have to solve for
|
|
generic bare metal programming and this is how I solved it. When you
|
|
finish this tutorial go over to the bssdata directory, and read about
|
|
why I do it the way I do it and what other work you have to do to insure
|
|
those variables are pre-initialized before main() is called. The short
|
|
answer is it involves toolchain specific things you have to do, and I
|
|
prefer to lean toward more portable including portable across toolchains
|
|
(minimizing effort to port) solutions. So one thing is I try to make
|
|
my C code so that it does not use "implementation defined" features of
|
|
the language (that do not port from one compiler to another, inline
|
|
assembly for example). Second I try to keep the boot code and linker
|
|
scripts, etc as simple as possible with a little sacrifice on adding
|
|
some more code. Linker scripts in particular are toolchain specific
|
|
and the the entry label and perhaps other boostrap items are also
|
|
toolchain specific. You will see what all of that means in the bssdata
|
|
directory.
|
|
|
|
Also note that I do not use main() as the entry point funciton in my
|
|
code. The first time I learned all of this stuff the compiler tools I
|
|
was using at the time would add extra junk to your binary when it saw
|
|
the word main(). If you used some other name then it would not add
|
|
that junk, and not bloat the binary. The Raspberry Pi has relatively
|
|
lots of memory at 128KB+ for the ARM. In the embedded bare metal
|
|
programming world you very often face 8KB or 16Kb or 32KB etc and you
|
|
cannot afford the toolchain sucking up chunks of that memory with stuff
|
|
you are not using. Part of bare metal programming is you being in
|
|
control of everything, the code, the peripherals, and the binary.
|
|
|
|
Good, bad, or otherwise the GNU tools dominate, binutils which includes
|
|
an assembler, linker and library tools and gcc which includes a C
|
|
compiler and can include other things. One of the pro's is that when
|
|
you learn the GNU tools for one platform most of that knowledge
|
|
translates to other platforms (learn embedded ARM with gnu tools and
|
|
the learning curve for MIPS is much smaller). What are the tools we
|
|
are going to be using? We should at this point already know that gcc
|
|
is the C compiler and we can compile our programs into something called
|
|
an object or your experience may be limited to creating binaries from
|
|
your C program without seeing any of the intermediate files. There is
|
|
actually a bit of hidden magic that goes on. When you compile your
|
|
hello world program on your Linux machine, the first one or few files
|
|
generated is your C code in different forms they make another file
|
|
which is your C code plus all of the includes expanded into that file.
|
|
Eventually the actual C compiler is called and that turns the C code
|
|
into assembly language in a text file. Yes, assembly language. Then
|
|
the assembler is called by the compiler and the assembler assembles
|
|
the assembly language into an object file, which in this case is a
|
|
flavor of binary file that has most of the instructions in machine code
|
|
but is not a complete binary because there may be some functions or
|
|
variables in other objects that wont be resolved until link time. For
|
|
our hello world printf to output something it needs to link with a C
|
|
library which makes system calls and may or may not have to link with
|
|
other stuff. So the linker takes the object that came from our code
|
|
and links that with these other items and creates a binary that is
|
|
compatible with the operating system we are running.
|
|
|
|
The next thing we have to know is there can be a difference between the
|
|
entry point into our program and the first instruction in the program.
|
|
If you think about it most programs we use a compiler for run on
|
|
operating systems. The operating system loads the program from the
|
|
filesystem into memory and then performs a jump into that memory, it
|
|
can jump to any address. It may or may not do that but it is at least
|
|
possible on a system that is already running. But for booting a
|
|
processor we cannot change the processor to boot anywhere we want and on
|
|
the Raspberry Pi we cant or at least shouldnt try to change its habit
|
|
of executing the first instruction in the kernel.img file. So we have
|
|
to make sure we control the whole linking process to insure that happens.
|
|
|
|
I think we have enough ammo to stop chatting and start writing some
|
|
programs. I hope you dont hate me at this point but this tutorial
|
|
is not actually going to run any programs on the Raspberry Pi, in order
|
|
to build a brick wall someone has to show you how to mix the mortar and
|
|
how to build that wall one layer at a time, the right amount of mortar
|
|
per layer, how to keep the rows straight and keep the wall from leaning
|
|
one way or the other. I mentioned at the beginning that bare metal
|
|
programming is as much about knowing and manipulating the compiler tools
|
|
as it is about manipulating peripheral registers. Before we can even
|
|
begin to talk about peripherals we have to have code that actually
|
|
runs on the hardware. We will touch on perhiperals in the sense
|
|
that I will borrow from my other programs in this repository that
|
|
already talk about the peripheral side of bare metal. This directory
|
|
is about the compiler side of bare metal. Your takeaway here is being
|
|
able to understand why my bare metal examples work.
|
|
|
|
The GNU linker is looking for a label named _start to know where the
|
|
entry point of the program is. It is possible to override or replace
|
|
this with something on the linker command line, it is easy enough to
|
|
just use that label, so we will do that.
|
|
|
|
The bare minimum bootstrap code for this processor would be to set
|
|
the stack pointer and to branch to our C program. Now I use notmain()
|
|
as the name of my entry point into C. But you ask: What is a stack
|
|
pointer? You should have learned about stacks in general in your prior
|
|
programming training or experience. The stack is nothing more than a
|
|
chunk of memory. How it differs from memory is not that it is special
|
|
because it is not, it is how it is accessed. Our apple and orange
|
|
variables above are global, they are at a fixed place in memory, lets
|
|
say they end up after compiling and linking these variables end up at
|
|
addresses 0x1234 and 0x1238 respectively. Any code in any function that
|
|
wants to access them will after compiling and linking be accessing those
|
|
addresses. But what about our peach variable above, that is a local
|
|
variable and you may have been told that that "lives on the stack".
|
|
Instead of being at a fixed address in memory, the peach variable will,
|
|
after compiling and linking be at a fixed OFFSET in memory, offset
|
|
relative to what? Relative to the stack pointer at some point in time
|
|
in the function. The stack pointer is simply a register that holds a
|
|
number which is an address in memory. Not special memory just memory on
|
|
this platform the same memory we use for our program and our variables.
|
|
When the compiler converts our C code into assembly code one of the
|
|
things it has to do is manage these local variables and other things.
|
|
Any C function that has local variables will cause the compiler to
|
|
create code that moves the stack pointer as a way to allocate memory
|
|
for that variable. We will cover this topic more as we go, for now
|
|
understand that the minimum bootstrap code for this platform is to set
|
|
the stack pointer and then to branch to our top level C function. Here
|
|
is some code that does that:
|
|
|
|
.globl _start
|
|
_start:
|
|
mov sp,#0x00010000
|
|
b notmain
|
|
|
|
Now I told you this is not a lesson in assembly language programming,
|
|
but we will be looking at assembly language even if we dont know exactly
|
|
what all the code means or does. Many may disagree with me but
|
|
disassembling your program is one of the fastest and easiest ways to
|
|
debug your bare metal code. I will keep saying this, a big part of bare
|
|
metal programming is knowing your compiler tools, very often, esp with
|
|
bootstrap code your bug may not be in the code itself but in the way you
|
|
used the tools, the command lines or linker scripts that you used to
|
|
compile and link that code. Get it wrong and no matter how bug free
|
|
your C code is it will not run and you will have a hard time figuring
|
|
it out without looking at what the compiler and linker generated. So
|
|
the above code starts with a directive .globl, .global also works, both
|
|
do the same thing, declare the label _start as global meaning it is
|
|
visible to the linker. In C everything (functions and non-local
|
|
variables) is global unless you put the word static in front of it then
|
|
it becomes local:
|
|
|
|
static unsigned int apple;
|
|
unsigned int orange:
|
|
|
|
The apple variable which becomes a label or an address in assembler
|
|
would not be global, where orange would be marked as global.
|
|
|
|
We read above that _start is a special name the linker is looking for.
|
|
The linker interprets this as our entry point. Since we are not running
|
|
this program on an operating system for example it doesnt actually
|
|
matter if _start is our entry point, but for places where it is used
|
|
it is a good habit to place it at our entry point for sake of habit. And
|
|
that is what we are doing here.
|
|
|
|
The mov sp line basically says put the number 0x00010000 in the
|
|
register named sp, which is an alias for r13. R13 in the ARM is a
|
|
register that has special use as the stack pointer. Registers in a
|
|
processor are very much like variables in a C program in how they are
|
|
used.
|
|
|
|
And the last line b notmain means branch to notmain. Branch is also
|
|
known as a jump in other assembly languages and is exactly like a goto
|
|
in C.
|
|
|
|
We are going to start using the tools that you installed, this step
|
|
may be a major research project for you or it might just work. You
|
|
might only need to set the path to your tools to make this all work
|
|
( "baremetal >" being the command prompt):
|
|
|
|
baremetal > arm-none-eabi-as --version
|
|
arm-none-eabi-as: command not found
|
|
baremetal > PATH=/opt/gnuarm/bin/:$PATH
|
|
baremetal > arm-none-eabi-as --version
|
|
GNU assembler (GNU Binutils) 2.22
|
|
Copyright 2011 Free Software Foundation, Inc.
|
|
This program is free software; you may redistribute it under the terms of
|
|
the GNU General Public License version 3 or later.
|
|
This program has absolutely no warranty.
|
|
This assembler was configured for a target of `arm-none-eabi'.
|
|
|
|
Your path may be and probably is different than mine. If you dont
|
|
get the command not found, then you wont need to mess with the PATH
|
|
it is ready to go. Again this may be a research project for you or it
|
|
may just work or somewhere in the middle.
|
|
|
|
The gnu assembler is a program named "as". When we make it a cross
|
|
assembler to not confuse it with the as assembler that we need for the
|
|
operating system we are running on, we add a prefix to the name. A
|
|
common one you will find in this day and age for gnu tools is
|
|
arm-none-eabi-. That will be tacked on the front of everything in the
|
|
GNU tools that we care about and that is the one I will be using. You
|
|
may have arm-linux-gnueabi- or you may have arm-elf- or arm-thumb-elf-
|
|
or many other prefixes. Although they can vary in theory, the way I
|
|
write my code, they should mostly come close to working.
|
|
|
|
Lets say I called that small bit of assembly bootstrap.s
|
|
|
|
baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o
|
|
baremetal > arm-none-eabi-objdump -D bootstrap.o
|
|
|
|
bootstrap.o: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00000000 <_start>:
|
|
0: e3a0d801 mov sp, #65536 ; 0x10000
|
|
4: eafffffe b 0 <notmain>
|
|
|
|
|
|
So I have assembled the code into an object file. The default object
|
|
file format is elf. Then objdump -D disassembles that object file
|
|
so that we can see the machine code and other things the assembler
|
|
did.
|
|
|
|
So what do I mean by elf format? Well you may or may not know that
|
|
the term binary when you are talking about a program running the
|
|
binary loading the binary, compiling to binary. Is a loaded term
|
|
sometimes it is all binary bits and bytes that make up your program.
|
|
Most of the time, esp when running on an operating system, that file
|
|
is a mixture of the bits and bytes of your program that are wrapped by
|
|
a file format that contains things like debugging information and
|
|
other things.
|
|
|
|
If the file only contained the machine code and data that makes up the
|
|
program it would only need these 8 bytes (this is not a real, functioning
|
|
program remember).
|
|
|
|
e3 a0 d8 01
|
|
ea ff ff fe
|
|
|
|
How would the disassembler then know from that the names of things like
|
|
_start and notmain? The answer is the file is not 8 bytes it is
|
|
larger
|
|
|
|
baremetal > ls -al bootstrap.o
|
|
-rw-r--r-- 1 root root 664 Sep 23 13:47 bootstrap.o
|
|
|
|
baremetal > hexdump -C bootstrap.o
|
|
00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
|
|
00000010 01 00 28 00 01 00 00 00 00 00 00 00 00 00 00 00 |..(.............|
|
|
00000020 94 00 00 00 00 00 00 05 34 00 00 00 00 00 28 00 |........4.....(.|
|
|
00000030 09 00 06 00 01 d8 a0 e3 fe ff ff ea 41 15 00 00 |............A...|
|
|
00000040 00 61 65 61 62 69 00 01 0b 00 00 00 06 01 08 01 |.aeabi..........|
|
|
00000050 2c 01 00 2e 73 79 6d 74 61 62 00 2e 73 74 72 74 |,...symtab..strt|
|
|
00000060 61 62 00 2e 73 68 73 74 72 74 61 62 00 2e 72 65 |ab..shstrtab..re|
|
|
00000070 6c 2e 74 65 78 74 00 2e 64 61 74 61 00 2e 62 73 |l.text..data..bs|
|
|
00000080 73 00 2e 41 52 4d 2e 61 74 74 72 69 62 75 74 65 |s..ARM.attribute|
|
|
00000090 73 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |s...............|
|
|
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
|
|
000000b0 00 00 00 00 00 00 00 00 00 00 00 00 1f 00 00 00 |................|
|
|
....
|
|
|
|
You can see at offset 0x34 in the file we see the 8 bytes of our program.
|
|
|
|
There are many file formats supported by the GNU tools. Elf is the
|
|
default format for ARM based programs and many others as well. But we
|
|
can convert those into other formats using another of the binutils tools
|
|
and we will have to use that tool for the Raspberry Pi. First off
|
|
notice that the .elf file format is binary itself most of the information
|
|
is not directly human readable you need to use other programs (like
|
|
objdump) to extract information from that file. Another format that you
|
|
will see "binaries" in is the intel hex file format. This is an ASCII
|
|
format file making it easier for us to read and manipulate as programmers
|
|
and hack at if so desired...You will still find this format used in various
|
|
corners of the embedded world. Many rom/flash programmers support this
|
|
file format, many bootloaders (like my bootloader07) support this
|
|
format.
|
|
|
|
baremetal > arm-none-eabi-objcopy bootstrap.o -O ihex bootstrap.hex
|
|
baremetal > cat bootstrap.hex
|
|
:0800000001D8A0E3FEFFFFEAB6
|
|
:00000001FF
|
|
|
|
The objcopy command line takes a command line option -O with some
|
|
predefined name like binary, ihex, srec, and others. If possible it
|
|
determines the file format of the input file (bootstrap.o in this case)
|
|
and then converts what it can to the output file format.
|
|
|
|
baremetal > arm-none-eabi-objcopy bootstrap.o -O binary a.bin
|
|
baremetal > arm-none-eabi-objcopy bootstrap.hex -O binary b.bin
|
|
arm-none-eabi-objcopy: Unable to recognise the format of the input file `bootstrap.hex'
|
|
baremetal > arm-none-eabi-objcopy -I ihex bootstrap.hex -O binary b.bin
|
|
baremetal > ls -al *.bin
|
|
-rw-r--r-- 1 root root 8 Sep 23 14:04 a.bin
|
|
-rw-r--r-- 1 root root 8 Sep 23 14:04 b.bin
|
|
baremetal > diff a.bin b.bin
|
|
baremetal > hexdump -C a.bin
|
|
00000000 01 d8 a0 e3 fe ff ff ea |........|
|
|
00000008
|
|
|
|
That little exercise shows how to take just the bytes of our program
|
|
and put them in what we would most accurately call a binary file, just
|
|
the 8 bytes of our program nothing more nothing less. We will need
|
|
to do this for the Raspberry Pi. Notice how objcopy was not able
|
|
to recognize the file format for the intel hex file and we had to
|
|
specify it using the -I.
|
|
|
|
To see the file formats supported by objcopy try this:
|
|
|
|
baremetal > arm-none-eabi-objcopy --info
|
|
BFD header file version (GNU Binutils) 2.22
|
|
elf32-littlearm
|
|
(header little endian, data little endian)
|
|
arm
|
|
elf32-bigarm
|
|
(header big endian, data big endian)
|
|
arm
|
|
elf32-little
|
|
(header little endian, data little endian)
|
|
arm
|
|
elf32-big
|
|
(header big endian, data big endian)
|
|
arm
|
|
srec
|
|
(header endianness unknown, data endianness unknown)
|
|
arm
|
|
symbolsrec
|
|
(header endianness unknown, data endianness unknown)
|
|
arm
|
|
verilog
|
|
(header endianness unknown, data endianness unknown)
|
|
arm
|
|
tekhex
|
|
(header endianness unknown, data endianness unknown)
|
|
arm
|
|
binary
|
|
(header endianness unknown, data endianness unknown)
|
|
arm
|
|
ihex
|
|
(header endianness unknown, data endianness unknown)
|
|
arm
|
|
|
|
We have tried intel hex or ihex and I want to show you another ASCII
|
|
based one called srec or s record
|
|
|
|
baremetal > arm-none-eabi-objcopy bootstrap.o -O srec bootstrap.srec
|
|
baremetal > cat bootstrap.srec
|
|
S0110000626F6F7473747261702E7372656335
|
|
S10B000001D8A0E3FEFFFFEAB2
|
|
S9030000FC
|
|
|
|
You can use wikipedia to get the definitions for the intel hex and
|
|
s-record file formats and very easily write a program that parses those
|
|
files and extracts things, maybe write your own disassembler for
|
|
educational purposes or write a bootloader or an instruction set
|
|
simulator or any place where you need to take a compiler/assembler/linker
|
|
generated program and read it for any reason. Let me point out that
|
|
the elf specification is as readily available and although there are
|
|
libraries out there to parse those files, it is as easy to make an elf
|
|
parser as it is to make an ihex or srec parser. If you make it yourself
|
|
then you dont rely on some third party library that is going to change
|
|
over time causing your code to no longer work or have to change to
|
|
conform to some new standard for that library.
|
|
|
|
So now lets make our first C program, this is not hello world, even
|
|
simpler it does nothing, so we think:
|
|
|
|
void notmain ( void )
|
|
{
|
|
}
|
|
|
|
baremetal > arm-none-eabi-gcc -O2 -c notmain.c -o notmain.o
|
|
baremetal > arm-none-eabi-objdump -D notmain.o
|
|
|
|
notmain.o: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00000000 <notmain>:
|
|
0: e12fff1e bx lr
|
|
|
|
So what does bx lr mean? Bx is an ARM instruction that means branch
|
|
exchange, and lr is the link register. When you call a function in
|
|
your C code your expectation is that the processor will jump somewhere
|
|
and execute the code in the function then it will come back and
|
|
keep running your program/code after that function call.
|
|
|
|
...
|
|
a = b + 7;
|
|
c = fun(a);
|
|
d = c * 5;
|
|
...
|
|
|
|
After calling the function fun() we expect the code to come back and run
|
|
d = c * 5. Well the way the ARM does it is the call to a function uses
|
|
an instruction called branch link, which saves the address of the code
|
|
after the function call in a register called the link register. Then
|
|
at some point we encounter one of a couple instructions in ARM that
|
|
will allow the program to jump to the address in the link register
|
|
returning to where we were executing just after the function call. One
|
|
is the branch exchange and the other is a mov pc = lr
|
|
|
|
bx lr
|
|
|
|
or
|
|
|
|
mov pc,lr
|
|
|
|
Depending on the tools and how you use them you should mostly see the
|
|
bx lr in assembly and in the code generated by the compiler if you dont
|
|
then there may be a reason which you may or may not be concerned about
|
|
at this time. I will keep saying this, this is not a tutorial on
|
|
assembly language, but you may already see that assembly language is
|
|
required in order to start up C code, and I argue required in order
|
|
to debug bare metal code. I am only touching on a little bit of
|
|
asm readability which is a long way away from teaching how to program in
|
|
assembly language. I have to cover some basics so that we can get to
|
|
our C code and also so we can see what the compiler and tools are doing.
|
|
|
|
So now we have two objects bootstrap.o and notmain.o that we need to
|
|
link together. Way above we talked about having our program start at
|
|
address 0x8000, so lets try linking for the first time.
|
|
|
|
baremetal > arm-none-eabi-ld -Ttext 0x00008000 bootstrap.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eaffffff b 8008 <notmain>
|
|
|
|
00008008 <notmain>:
|
|
8008: e12fff1e bx lr
|
|
|
|
Cool, our first Raspberry Pi bare metal program. Problem is we cannot
|
|
run this, for a number of reasons. First off I intentionally used the
|
|
wrong instruction in the bootstrap code, second this is an elf file
|
|
not a bin file. How do we fix these things?
|
|
|
|
So now that I have mentioned the link register and how it is used to get
|
|
back from one function after calling it. If you think about the
|
|
compilers job, at one level it doesnt really know or care what the name
|
|
of your function is or its purpose. When compiling the code in the
|
|
main() function it for the most part doesnt care if it is called main()
|
|
or notmain() or pickle() it does a job, it assumes that function is
|
|
called from another function and it uses the proper return instruction.
|
|
Since we called notmain() from assembly we should be prepared for the
|
|
notmain() function to return, so we should have used a branch link
|
|
instruction and put some code after the call to the notmain function.
|
|
If notmain() returns then we are pretty much done so we can put the
|
|
processor into an infinite loop, waiting for the user to turn the power
|
|
off to try another program.
|
|
|
|
.globl _start
|
|
_start:
|
|
mov sp,#0x00010000
|
|
bl notmain
|
|
hang: b hang
|
|
|
|
So bl notmain performs a branch and link, branch like the b instruction
|
|
is exactly like a goto in C, a branch and link is like calling a
|
|
function in C. So we have to remember to put something after the branch
|
|
link in case the function returns. In this case we send it into an
|
|
infinite loop.
|
|
|
|
So here we go we have patched up bootstrap.s and need to assemble it
|
|
and link it with notmain.o
|
|
|
|
baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o
|
|
baremetal > arm-none-eabi-ld -Ttext 0x00008000 bootstrap.o notmain.o -o hello.elf
|
|
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eb000000 bl 800c <notmain>
|
|
|
|
00008008 <hang>:
|
|
8008: eafffffe b 8008 <hang>
|
|
|
|
0000800c <notmain>:
|
|
800c: e12fff1e bx lr
|
|
|
|
...
|
|
|
|
baremetal > arm-none-eabi-objcopy hello.elf -O binary kernel.img
|
|
baremetal > hexdump -C kernel.img
|
|
00000000 01 d8 a0 e3 00 00 00 eb fe ff ff ea 1e ff 2f e1 |............../.|
|
|
00000010
|
|
|
|
Now we have a file that we can put on our sd card and run. It does
|
|
nothing that we can see, so it isnt much use to us, but it will work.
|
|
|
|
We can see that the linker has prepared the program such that our first
|
|
instruciton is at address 0x8000. We load the stack pointer and
|
|
call notmain(). Notmain does what it does (nothing) and returns from
|
|
the function call which takes us back to the hang line which is an
|
|
infinite loop, hang branches to hang forever or until the power is
|
|
turned off.
|
|
|
|
A few things you should have noticed. When we disassembled the object
|
|
files the address was zero not 0x8000. Well the object files are by
|
|
definition incomplete programs, even if everything we are going to
|
|
run is there we should use the linker to polish that file.
|
|
|
|
This is a disassembly of the object file bootstrap.o
|
|
|
|
Disassembly of section .text:
|
|
|
|
00000000 <_start>:
|
|
0: e3a0d801 mov sp, #65536 ; 0x10000
|
|
4: eafffffe b 0 <notmain>
|
|
|
|
Also notice that when we disassembled that object the instruction was
|
|
a branch to address zero but it had a note of notmain, well there wasnt
|
|
a notmain in that code, something linker has to fix later. Once
|
|
we linked we saw:
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eaffffff b 8008 <notmain>
|
|
|
|
00008008 <notmain>:
|
|
8008: e12fff1e bx lr
|
|
|
|
that the instruction changed from eafffffe to eaffffff, this is
|
|
something the linker did when it figured out where notmain was going
|
|
to be in memory it had to go back and fix all the references to notmain.
|
|
Which includes instructions.
|
|
|
|
The other thing you might have noticed is Disassembly of section .text
|
|
what is a section and what is .text and what does text have to do with
|
|
my programs machine code?
|
|
|
|
Well, and this is not limited to GNU tools, for the sanity of the
|
|
compiler and assembler and linker folks, portions of our programs
|
|
are broken into categories. There is the program itself, the machine
|
|
code and some other items that are needed for the machine code to run
|
|
these are for some historical reason that I have not researched called
|
|
.text. Or the .text segment. The .data segment like the apple and
|
|
orange global variables way above. Data actually is broken up into
|
|
different segments sometimes, and in particular with the GNU tools.
|
|
Most of the code out there that has global variables the globals are
|
|
not defined, not initialized in the code, but the language declares
|
|
those as assumed to be zero when you start using them (if you have
|
|
not changed them before you read them). So there is a special data
|
|
segment called .bss which holds all of our .data that when we start
|
|
running C code should be zero. These are lumped together so that some
|
|
code can easily go through that chunk of memory and zero that
|
|
area before branching to the C entry point. Another segment we may
|
|
encounter is the .rodata segment. Sometimes even with GNU tools you
|
|
may find the read only data in the .text segment.
|
|
|
|
For fun lets make one of each:
|
|
|
|
|
|
unsigned int apple;
|
|
unsigned int orange=5;
|
|
const unsigned int pickle=9;
|
|
|
|
void notmain ( void )
|
|
{
|
|
static unsigned int pear=7;
|
|
unsigned int peach;
|
|
}
|
|
|
|
arm-none-eabi-gcc -O2 -c notmain.c -o notmain.o
|
|
baremetal > arm-none-eabi-objdump -D notmain.o
|
|
|
|
notmain.o: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00000000 <notmain>:
|
|
0: e12fff1e bx lr
|
|
|
|
Disassembly of section .data:
|
|
|
|
00000000 <orange>:
|
|
0: 00000005 andeq r0, r0, r5
|
|
|
|
Disassembly of section .rodata:
|
|
|
|
00000000 <pickle>:
|
|
0: 00000009 andeq r0, r0, r9
|
|
|
|
|
|
So we see that the code is in .text. The pre-initialized variable
|
|
orange is in .data. And the read only variable pickle is in .rodata.
|
|
What happened to apple and pear and peach and where is the .bss segment?
|
|
Well notice that I used -O2 on the gcc command line this means
|
|
optimization level 2. -O0 or optimizaiton level 0 means no optimization
|
|
-O1 means some and -O2 is the maximum safe level of optimization using
|
|
the gcc compiler. There is a -O3 but we are not supposed to trust that
|
|
to be as well tested as -O2. I am not going to get into that but
|
|
recommend you use -O2 often, esp with embedded bare metal where size and
|
|
speed are important. I use it here because it produces much less code
|
|
than no optimization, you can play with compiling and disassembling these
|
|
things on your own with less or without optimization to see what
|
|
happens.
|
|
|
|
So our program didnt actually use use apple, or pear or peach so the
|
|
compiler optimized those away. We didnt use orange or pickle either
|
|
but because those were defined as something and were also both global
|
|
variables the compiler when making an object doesnt know if other code
|
|
is using those variables so it has to generate something for them for
|
|
linking with other code.
|
|
|
|
Lets try to resolve this:
|
|
|
|
unsigned int apple;
|
|
unsigned int orange=5;
|
|
const unsigned int pickle=9;
|
|
|
|
void notmain ( void )
|
|
{
|
|
static unsigned int pear=7;
|
|
unsigned int peach;
|
|
apple+=pear;
|
|
}
|
|
|
|
|
|
baremetal > arm-none-eabi-gcc -O2 -c notmain.c -o notmain.o
|
|
baremetal > arm-none-eabi-objdump -D notmain.o
|
|
|
|
notmain.o: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00000000 <notmain>:
|
|
0: e59f300c ldr r3, [pc, #12] ; 14 <notmain+0x14>
|
|
4: e5932000 ldr r2, [r3]
|
|
8: e2822007 add r2, r2, #7
|
|
c: e5832000 str r2, [r3]
|
|
10: e12fff1e bx lr
|
|
14: 00000000 andeq r0, r0, r0
|
|
|
|
Disassembly of section .data:
|
|
|
|
00000000 <orange>:
|
|
0: 00000005 andeq r0, r0, r5
|
|
|
|
Disassembly of section .rodata:
|
|
|
|
00000000 <pickle>:
|
|
0: 00000009 andeq r0, r0, r9
|
|
|
|
So we still see a .data segment and a .rodata and .text, but no .bss
|
|
dont worry about that just yet. I will just tell you that since the
|
|
pear and peach variables are limited in scope to being within the notmain
|
|
function and the notmain function is so simple that the optimizer has
|
|
optimized out the peach variable completely and simply taken the
|
|
number 7 and added it to the global variable apple as a constant
|
|
basically the optimizer has replaced our code with:
|
|
|
|
void notmain ( void )
|
|
{
|
|
apple+=7;
|
|
}
|
|
|
|
We are just disassembling the object though, which is only part of the
|
|
picture, to see the whole picture we need to link
|
|
|
|
baremetal > arm-none-eabi-ld -Ttext 0x8000 -Tdata 0x9000 -Tbss 0xA000 bootstrap.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eb000000 bl 800c <notmain>
|
|
|
|
00008008 <hang>:
|
|
8008: eafffffe b 8008 <hang>
|
|
|
|
0000800c <notmain>:
|
|
800c: e59f300c ldr r3, [pc, #12] ; 8020 <notmain+0x14>
|
|
8010: e5932000 ldr r2, [r3]
|
|
8014: e2822007 add r2, r2, #7
|
|
8018: e5832000 str r2, [r3]
|
|
801c: e12fff1e bx lr
|
|
8020: 0000a000 andeq sl, r0, r0
|
|
|
|
Disassembly of section .data:
|
|
|
|
00009000 <__data_start>:
|
|
9000: 00000005 andeq r0, r0, r5
|
|
|
|
Disassembly of section .bss:
|
|
|
|
0000a000 <apple>:
|
|
a000: 00000000 andeq r0, r0, r0
|
|
|
|
Disassembly of section .rodata:
|
|
|
|
00008024 <pickle>:
|
|
8024: 00000009 andeq r0, r0, r9
|
|
|
|
|
|
So our apple variable has appeared is in the .bss section. Notice
|
|
on the linker command line I specified a few things the text segment
|
|
address and data and bss but not the rodata. The linker again has
|
|
put the .text where we said and where we need it at 0x8000 we said
|
|
to put .data at 0x9000 and it is there and notice it has the value
|
|
5 from our orange variable. .bss is where we said at 0xA000. Since
|
|
we didnt specify a home for .rodata notice how the linker has just
|
|
tacked it onto the end of .text the last thing in .text was a four
|
|
byte address at address 0x8020, so the next address after that is 0x8024
|
|
and that is where the .rodata variable pickle is placed and has
|
|
the value 9 that we pre-initialized.
|
|
|
|
I want to point something out here that is very important for general
|
|
bare metal programming. What do we have above, something like 12 32
|
|
bit numbers which is 12*4 = 48 bytes. So if I make this a true
|
|
binary (memory image) we should see 48 bytes right? Well you would be
|
|
wrong:
|
|
|
|
baremetal > ls -al hello.elf
|
|
-rwxr-xr-x 1 root root 38002 Sep 23 15:06 hello.elf
|
|
baremetal > arm-none-eabi-objcopy hello.elf -O binary kernel.img
|
|
baremetal > ls -al kernel.img
|
|
-rwxr-xr-x 1 root root 4100 Sep 23 15:17 kernel.img
|
|
baremetal > hexdump -C kernel.img
|
|
00000000 01 d8 a0 e3 00 00 00 eb fe ff ff ea 0c 30 9f e5 |.............0..|
|
|
00000010 00 20 93 e5 07 20 82 e2 00 20 83 e5 1e ff 2f e1 |. ... ... ..../.|
|
|
00000020 00 a0 00 00 09 00 00 00 00 00 00 00 00 00 00 00 |................|
|
|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
|
|
*
|
|
00001000 05 00 00 00 |....|
|
|
00001004
|
|
|
|
We can see that the first thing in the file is our code that lives
|
|
at address 0x8000, understand that the file offset and the memory offset
|
|
are not the same. What is important is that first thing in the file
|
|
ends up at 0x8000 and since it is our entry code we are good from that
|
|
perspective. Now why isnt the file 48 bytes? Because a binary file
|
|
when we define it as a memory image means that if we have a few things
|
|
at 0x8000 a few things at 0x9000 and a few things at 0xA000 in order
|
|
for those things to be in the right place in the file they need to be
|
|
spaced apart, the file has to have some filler to put the important
|
|
things at the right place.
|
|
|
|
If this is at 0x8000
|
|
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
|
|
And this is at 0x9000
|
|
|
|
9000: 00000005 andeq r0, r0, r5
|
|
|
|
Then they are 0x1000 bytes apart. The * in the hexdump output means
|
|
I am skipping a bunch of zeros, there is nothing you are missing. The
|
|
hexdump output verifies that these two items are 0x1000 byte apart.
|
|
|
|
00000000 01 d8 a0 e3
|
|
|
|
00001000 05 00 00 00
|
|
|
|
If you keep up with bare metal embedded programming you will no doubt
|
|
at some point come across a system that has the program memory space
|
|
in a flash at some high address say 0x80000000 and the memory
|
|
where you can put your .data is at some lower address say 0x20000000.
|
|
|
|
You can very easily try this with the code we have written simply try
|
|
a different linker command line.
|
|
|
|
baremetal > arm-none-eabi-ld -Ttext 0x8000 -Tdata 0x9000 -Tbss 0xA000 bootstrap.o notmain.o -o hello.elf
|
|
baremetal > ls -al hello.elf
|
|
-rwxr-xr-x 1 root root 38002 Sep 23 15:26 hello.elf
|
|
baremetal > arm-none-eabi-ld -Ttext 0x80000000 -Tdata 0x20000000 -Tbss 0xA000 bootstrap.o notmain.o -o hello.elf
|
|
baremetal > ls -al hello.elf
|
|
-rwxr-xr-x 1 root root 66710 Sep 23 15:27 hello.elf
|
|
|
|
Our file grew but if you were to try to objcopy to a -O binary format
|
|
(I recommend you DO NOT do this). What is going to happen?
|
|
|
|
|
|
80000000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
|
|
20000000: 00000005 andeq r0, r0, r5
|
|
|
|
There are 0x60000000 bytes between these two items, that means the
|
|
binary file created would at least be 0x60000000 bytes which is
|
|
1.6 GigaBytes. If you are like me you probably dont always have
|
|
1.6Gig of disk space handy. Much less wanting it to be filled with a
|
|
single file which is mostly zeros. You can start to see the appeal for
|
|
these not really a binary binary file formats like elf and ihex and
|
|
srec. They only define the real data and dont have to hold the zero
|
|
filler.
|
|
|
|
The stuff I wrote in the bssdata directory continues with understanding
|
|
how to control the GNU tools and segments. For the Raspberry Pi we
|
|
dont need to deal with all of this, you are actually missing out on
|
|
some of the experience (pain).
|
|
|
|
Here is something else I hope you caught:
|
|
|
|
baremetal > arm-none-eabi-ld -Ttext 0x8000 -Tdata 0x9000 -Tbss 0xA000 bootstrap.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eb000000 bl 800c <notmain>
|
|
|
|
00008008 <hang>:
|
|
8008: eafffffe b 8008 <hang>
|
|
|
|
0000800c <notmain>:
|
|
800c: e59f300c ldr r3, [pc, #12] ; 8020 <notmain+0x14>
|
|
8010: e5932000 ldr r2, [r3]
|
|
8014: e2822007 add r2, r2, #7
|
|
8018: e5832000 str r2, [r3]
|
|
801c: e12fff1e bx lr
|
|
8020: 0000a000 andeq sl, r0, r0
|
|
|
|
Disassembly of section .data:
|
|
|
|
00009000 <__data_start>:
|
|
9000: 00000005 andeq r0, r0, r5
|
|
|
|
Disassembly of section .bss:
|
|
|
|
0000a000 <apple>:
|
|
a000: 00000000 andeq r0, r0, r0
|
|
|
|
Disassembly of section .rodata:
|
|
|
|
00008024 <pickle>:
|
|
8024: 00000009 andeq r0, r0, r9
|
|
|
|
I dont expect you to know that the notmain assembly code is reading the
|
|
thing at 0x8020
|
|
|
|
8020: 0000a000 andeq sl, r0, r0
|
|
|
|
Which the linker has filled in with the address to the apple variable
|
|
which is in .bss.
|
|
|
|
baremetal > arm-none-eabi-objcopy hello.elf -O binary kernel.img
|
|
baremetal > ls -al kernel.img
|
|
-rwxr-xr-x 1 root root 4100 Sep 23 15:36 kernel.img
|
|
|
|
4100 bytes. 0x8000 + 4100 = 0x8000 + 0x1004 = 0x9004 the binary
|
|
only includes an image of memory from 0x8000 to 0x9003 the objcopy
|
|
to -O binary did not include bss it was chopped off. Why? because
|
|
in part where we specified it and because in part the toolchain
|
|
expects that the .bss segment will be zeroed by the bootstrap code
|
|
and not waste space in the binary image for that data.
|
|
|
|
But what if we were to do this:
|
|
|
|
baremetal > arm-none-eabi-ld -Ttext 0x8000 -Tdata 0xA000 -Tbss 0x9000 bootstrap.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objcopy hello.elf -O binary kernel.img
|
|
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eb000000 bl 800c <notmain>
|
|
|
|
00008008 <hang>:
|
|
8008: eafffffe b 8008 <hang>
|
|
|
|
0000800c <notmain>:
|
|
800c: e59f300c ldr r3, [pc, #12] ; 8020 <notmain+0x14>
|
|
8010: e5932000 ldr r2, [r3]
|
|
8014: e2822007 add r2, r2, #7
|
|
8018: e5832000 str r2, [r3]
|
|
801c: e12fff1e bx lr
|
|
8020: 00009000 andeq r9, r0, r0
|
|
|
|
Disassembly of section .data:
|
|
|
|
0000a000 <__data_start>:
|
|
a000: 00000005 andeq r0, r0, r5
|
|
|
|
Disassembly of section .bss:
|
|
|
|
00009000 <apple>:
|
|
9000: 00000000 andeq r0, r0, r0
|
|
|
|
Disassembly of section .rodata:
|
|
|
|
00008024 <pickle>:
|
|
8024: 00000009 andeq r0, r0, r9
|
|
|
|
|
|
baremetal > ls -al kernel.img
|
|
-rwxr-xr-x 1 root root 8196 Sep 23 15:40 kernel.img
|
|
baremetal > hexdump -C kernel.img
|
|
00000000 01 d8 a0 e3 00 00 00 eb fe ff ff ea 0c 30 9f e5 |.............0..|
|
|
00000010 00 20 93 e5 07 20 82 e2 00 20 83 e5 1e ff 2f e1 |. ... ... ..../.|
|
|
00000020 00 90 00 00 09 00 00 00 00 00 00 00 00 00 00 00 |................|
|
|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
|
|
*
|
|
00002000 05 00 00 00 |....|
|
|
00002004
|
|
|
|
Know your tools, know your tools, know your tools. Now we have important
|
|
stuff at 0x8000 and 0xA000
|
|
|
|
8000: e3a0d801
|
|
|
|
a000: 00000005
|
|
|
|
The file is now 8196 bytes
|
|
|
|
0x8000 + 8196 = 0x8000 + 0x2004 = 0xA004
|
|
|
|
And the objcopy -O binary has filled in the spaces with zeros so our
|
|
.bss segment is there in the binary AND it is filled with zeros! Need
|
|
I say it again a big part of bare metal programming is knowing your
|
|
tools?
|
|
|
|
|
|
One more thing:
|
|
|
|
unsigned int apple;
|
|
void notmain ( void )
|
|
{
|
|
apple+=7;
|
|
}
|
|
|
|
|
|
baremetal > arm-none-eabi-gcc -O2 -c notmain.c -o notmain.o
|
|
baremetal > arm-none-eabi-ld -Ttext 0x8000 bootstrap.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eb000000 bl 800c <notmain>
|
|
|
|
00008008 <hang>:
|
|
8008: eafffffe b 8008 <hang>
|
|
|
|
0000800c <notmain>:
|
|
800c: e59f300c ldr r3, [pc, #12] ; 8020 <notmain+0x14>
|
|
8010: e5932000 ldr r2, [r3]
|
|
8014: e2822007 add r2, r2, #7
|
|
8018: e5832000 str r2, [r3]
|
|
801c: e12fff1e bx lr
|
|
8020: 00010024 andeq r0, r1, r4, lsr #32
|
|
|
|
Disassembly of section .bss:
|
|
|
|
00010024 <apple>:
|
|
10024: 00000000 andeq r0, r0, r0
|
|
|
|
|
|
We saw before that when we didnt declare a .rodata on the command line
|
|
it tacked it onto the end of .text, but in this case it didnt tack
|
|
.bss onto the end of .text it added 0x2000 bytes of padding then it
|
|
added it on there. Why? who knows. The bottom line though is that
|
|
we need to take more control over how we tell the linker to do things.
|
|
In the GNU world this is through what is often called a linker script
|
|
yet another programming language that is parsed by the linker tool
|
|
where we can go to or beyond the level of crazy complication. And
|
|
as you can guess I dont do that, I try for the minimal linker script
|
|
I dont want to be tied to a tool, I want my code to be as portable
|
|
as possible with minimal work. Linker scripts are painful, because
|
|
so many are so complicated, few if any simple examples, it took me a
|
|
while to make to make this simple script and keep it working, I have
|
|
actually had three different solutions which I thought each time where
|
|
the simple, end all, be all, GNU linker script, they werent they worked
|
|
on one version of tools and later failed. At this point I wouldnt be
|
|
surprised if this script also fails some day.
|
|
|
|
MEMORY
|
|
{
|
|
ram : ORIGIN = 0x8000, LENGTH = 0x1000
|
|
}
|
|
|
|
SECTIONS
|
|
{
|
|
.text : { *(.text*) } > ram
|
|
.bss : { *(.bss*) } > ram
|
|
}
|
|
|
|
baremetal > arm-none-eabi-ld -T lscript bootstrap.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eb000000 bl 800c <notmain>
|
|
|
|
00008008 <hang>:
|
|
8008: eafffffe b 8008 <hang>
|
|
|
|
0000800c <notmain>:
|
|
800c: e59f300c ldr r3, [pc, #12] ; 8020 <notmain+0x14>
|
|
8010: e5932000 ldr r2, [r3]
|
|
8014: e2822007 add r2, r2, #7
|
|
8018: e5832000 str r2, [r3]
|
|
801c: e12fff1e bx lr
|
|
8020: 00008024 andeq r8, r0, r4, lsr #32
|
|
|
|
Disassembly of section .bss:
|
|
|
|
00008024 <apple>:
|
|
8024: 00000000 andeq r0, r0, r0
|
|
|
|
How about that now it is all packed together nice and tight.
|
|
|
|
And to take this one step further:
|
|
|
|
|
|
unsigned int apple;
|
|
unsigned int orange=5;
|
|
const unsigned int banana=9;
|
|
void notmain ( void )
|
|
{
|
|
apple+=7;
|
|
}
|
|
|
|
baremetal > arm-none-eabi-gcc -O2 -c notmain.c -o notmain.o
|
|
baremetal > arm-none-eabi-ld -T lscript bootstrap.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eb000000 bl 800c <notmain>
|
|
|
|
00008008 <hang>:
|
|
8008: eafffffe b 8008 <hang>
|
|
|
|
0000800c <notmain>:
|
|
800c: e59f300c ldr r3, [pc, #12] ; 8020 <notmain+0x14>
|
|
8010: e5932000 ldr r2, [r3]
|
|
8014: e2822007 add r2, r2, #7
|
|
8018: e5832000 str r2, [r3]
|
|
801c: e12fff1e bx lr
|
|
8020: 00008028 andeq r8, r0, r8, lsr #32
|
|
|
|
Disassembly of section .rodata:
|
|
|
|
00008024 <banana>:
|
|
8024: 00000009 andeq r0, r0, r9
|
|
|
|
Disassembly of section .bss:
|
|
|
|
00008028 <apple>:
|
|
8028: 00000000 andeq r0, r0, r0
|
|
|
|
Disassembly of section .data:
|
|
|
|
0000802c <orange>:
|
|
802c: 00000005 andeq r0, r0, r5
|
|
|
|
|
|
baremetal > arm-none-eabi-objcopy hello.elf -O binary kernel.img
|
|
baremetal > ls -al kernel.img
|
|
-rwxr-xr-x 1 root root 48 Sep 23 16:58 kernel.img
|
|
|
|
There we go, 12 items all packed up tight in 48 bytes of binary
|
|
|
|
|
|
00000000 01 d8 a0 e3 00 00 00 eb fe ff ff ea 0c 30 9f e5 |.............0..|
|
|
00000010 00 20 93 e5 07 20 82 e2 00 20 83 e5 1e ff 2f e1 |. ... ... ..../.|
|
|
00000020 28 80 00 00 09 00 00 00 00 00 00 00 05 00 00 00 |(...............|
|
|
00000030
|
|
|
|
|
|
All this work so far and we have not seen the stack, we have not seen
|
|
our local variables.
|
|
|
|
|
|
bootstrap.s
|
|
|
|
.globl _start
|
|
_start:
|
|
mov sp,#0x00010000
|
|
bl notmain
|
|
hang: b hang
|
|
|
|
notmain.c
|
|
|
|
extern unsigned int fun ( unsigned int );
|
|
void notmain ( void )
|
|
{
|
|
unsigned int x;
|
|
|
|
x=fun(5);
|
|
}
|
|
|
|
fun.c
|
|
|
|
extern unsigned int more_fun ( unsigned int );
|
|
unsigned int fun ( unsigned int x )
|
|
{
|
|
static unsigned int pear = 7;
|
|
pear+=more_fun(x+3);
|
|
return(pear+1);
|
|
}
|
|
|
|
more_fun.c
|
|
|
|
unsigned int more_fun ( unsigned int x )
|
|
{
|
|
return(x+7);
|
|
}
|
|
|
|
baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o
|
|
baremetal > arm-none-eabi-gcc -O2 -c notmain.c -o notmain.o
|
|
baremetal > arm-none-eabi-gcc -O2 -c fun.c -o fun.o
|
|
baremetal > arm-none-eabi-gcc -O2 -c more_fun.c -o more_fun.o
|
|
baremetal > arm-none-eabi-ld -T lscript bootstrap.o notmain.o fun.o more_fun.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eb000000 bl 800c <notmain>
|
|
|
|
00008008 <hang>:
|
|
8008: eafffffe b 8008 <hang>
|
|
|
|
0000800c <notmain>:
|
|
800c: e92d4008 push {r3, lr}
|
|
8010: e3a00005 mov r0, #5
|
|
8014: eb000001 bl 8020 <fun>
|
|
8018: e8bd4008 pop {r3, lr}
|
|
801c: e12fff1e bx lr
|
|
|
|
00008020 <fun>:
|
|
8020: e92d4008 push {r3, lr}
|
|
8024: e2800003 add r0, r0, #3
|
|
8028: eb000007 bl 804c <more_fun>
|
|
802c: e59f3014 ldr r3, [pc, #20] ; 8048 <fun+0x28>
|
|
8030: e5932000 ldr r2, [r3]
|
|
8034: e0800002 add r0, r0, r2
|
|
8038: e5830000 str r0, [r3]
|
|
803c: e2800001 add r0, r0, #1
|
|
8040: e8bd4008 pop {r3, lr}
|
|
8044: e12fff1e bx lr
|
|
8048: 00008054 andeq r8, r0, r4, asr r0
|
|
|
|
0000804c <more_fun>:
|
|
804c: e2800007 add r0, r0, #7
|
|
8050: e12fff1e bx lr
|
|
|
|
Disassembly of section .data:
|
|
|
|
00008054 <pear.4055>:
|
|
8054: 00000007 andeq r0, r0, r7
|
|
|
|
|
|
So the first thing we see is that our local global (static local)
|
|
variable pear now has its own address in memory, it did not get
|
|
optimized out.
|
|
|
|
I dont expect you to know assembly language but what I want to you to
|
|
see is a continuation of what we discussed before with respect to the
|
|
branch link instruction and the link register. The ARM instruction
|
|
set uses branch link (bl) to make function calls. The branch means
|
|
goto or jump or branch the program to some address. The link means
|
|
preserve a link back to the calling function, the hardware puts
|
|
the address of the instruciton after the branch link in the link
|
|
register so that you can return. But what happens if you have
|
|
a function that calls a function? Wont the second call overwrite the
|
|
link register, making it so you cannot return to the original
|
|
function? Yes, on the surface that is true, this is where the stack
|
|
comes in. Notice how the function fun() starts with a push and in
|
|
the brackets is the link register lr, this means save these items
|
|
on the stack and move the stack pointer. So say the stack pointer
|
|
was at address 0x1020 when this function was called, this means
|
|
that after the push the stack pointer is now 0x1018. At address
|
|
0x1018 the contents of r3 will be stored and at address 0x101C the
|
|
contents of lr, the address used to return to whomever called fun().
|
|
If the first thing we did in fun() was call fun() again then
|
|
the stack pointer would go from 0x1018 to 0x1010, address 0x1010 would
|
|
get the contents of r3 and 0x1014 would get the contents of the link
|
|
register the address this instance of the fun() can needs to return,
|
|
this of course would be an infinite loop, so we didnt do that. What
|
|
we did do is add 3 to the incoming value and call more_fun() this
|
|
branch link call to more fun modifies the link register. More_fun
|
|
does its thing, we go through the rest of the fun() code then we pop
|
|
r3 and lr off of the stack. Because the stack pointer has not moved
|
|
due to any other code relative to where it was when the push at the
|
|
beginnning happened, that means r3 gets back the value it had when that
|
|
push was executed and the link register also gets back its prior value,
|
|
the value we needed to return to the fun() calling function. So that
|
|
bx lr that follows the pop returns to the proper place in notmain().
|
|
So you can see with a very small application we still need the stack
|
|
set up meaning we need the stack pointer initialized in our bootstrap
|
|
code. The compiler assumes it has been done, if we dont and leave
|
|
that register out of our control we can get into trouble fast.
|
|
|
|
You may be asking why did I make those tiny functions separate files?
|
|
This is from experience, I knew that I was using the optimizer and
|
|
I knew what the optimizer would do. This is important learning curve
|
|
stuff for bare metal:
|
|
|
|
notmain.c
|
|
|
|
unsigned int more_fun ( unsigned int x )
|
|
{
|
|
return(x+7);
|
|
}
|
|
unsigned int fun ( unsigned int x )
|
|
{
|
|
static unsigned int pear = 7;
|
|
pear+=more_fun(x+3);
|
|
return(pear+1);
|
|
}
|
|
void notmain ( void )
|
|
{
|
|
unsigned int x;
|
|
x=fun(5);
|
|
}
|
|
|
|
baremetal > arm-none-eabi-gcc -O2 -c notmain.c -o notmain.o
|
|
baremetal > arm-none-eabi-ld -T lscript bootstrap.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eb00000a bl 8034 <notmain>
|
|
|
|
00008008 <hang>:
|
|
8008: eafffffe b 8008 <hang>
|
|
|
|
0000800c <more_fun>:
|
|
800c: e2800007 add r0, r0, #7
|
|
8010: e12fff1e bx lr
|
|
|
|
00008014 <fun>:
|
|
8014: e59f3014 ldr r3, [pc, #20] ; 8030 <fun+0x1c>
|
|
8018: e5932000 ldr r2, [r3]
|
|
801c: e282200a add r2, r2, #10
|
|
8020: e0820000 add r0, r2, r0
|
|
8024: e5830000 str r0, [r3]
|
|
8028: e2800001 add r0, r0, #1
|
|
802c: e12fff1e bx lr
|
|
8030: 0000804c andeq r8, r0, ip, asr #32
|
|
|
|
00008034 <notmain>:
|
|
8034: e59f300c ldr r3, [pc, #12] ; 8048 <notmain+0x14>
|
|
8038: e5932000 ldr r2, [r3]
|
|
803c: e282200f add r2, r2, #15
|
|
8040: e5832000 str r2, [r3]
|
|
8044: e12fff1e bx lr
|
|
8048: 0000804c andeq r8, r0, ip, asr #32
|
|
|
|
Disassembly of section .data:
|
|
|
|
0000804c <pear.4056>:
|
|
804c: 00000007 andeq r0, r0, r7
|
|
|
|
|
|
So you say "What is different". we still have each of the functions
|
|
fun() more_fun() and notmain(), I see the local global variable pear
|
|
has a home, etc. But the key difference is that notmain() has been
|
|
greatly optimized. Notice how notmain does not call fun, if it doesnt
|
|
call fun then that doesnt call more_fun() what the...If you follow the
|
|
math in the code
|
|
|
|
notmain passes a 5 to fun.
|
|
|
|
fun passes 5+3 = 8 to morefun
|
|
|
|
morefun returns 8+7 = 15
|
|
|
|
fun saves 15 in pear
|
|
then returns 15+1 = 16
|
|
|
|
So if we wanted to optimize this code and had visibility to all of the
|
|
functions we could optimize all of this code to be:
|
|
|
|
pear = 15;
|
|
x=16;
|
|
|
|
Actually notice how we dont do anything with the x variable in the
|
|
notmain function, we compute it but dont do anything with it? There
|
|
is no reason to actually compute that variable, it is not used it
|
|
gets optimized out so all of this code boils down to this:
|
|
|
|
pear = 15;
|
|
|
|
And that is all that the notmain() function does, even though notmain
|
|
is not supposed to know about pear which is a local static variable
|
|
in another function, nevertheless the notmain() code is writing a 15
|
|
to pear.
|
|
|
|
I separated the files so that the compilers optimizer could not see
|
|
all of the functions and would not be able to optimize to this level.
|
|
|
|
So for example if you wanted to speed test a function, that you suspect
|
|
is slow, you might want to do something like this:
|
|
|
|
start=get_timer_tick();
|
|
answer=fun(5,6);
|
|
end=get_timer_tick();
|
|
runtime=end-start;
|
|
|
|
Where fun is some complicated algorithm or other code that you want
|
|
to speed test. It is very important that the fun() code and this
|
|
code that calls it ARE NOT OPTIMIZED TOGETHER. Because you hardcoded
|
|
the inputs for test purposes
|
|
|
|
fun(5,6)
|
|
|
|
where they normally might be variables:
|
|
|
|
fun(a,b)
|
|
|
|
The optimizer if allowed might simply replace all of your complicated
|
|
algorithm with:
|
|
|
|
start=get_timer_tick();
|
|
answer=42;
|
|
end=get_timer_tick();
|
|
runtime=end-start;
|
|
|
|
And this may lead you to believe that this is not the code causing
|
|
your performance problems. Or hopefully you realize that this code
|
|
is executing way too fast and there is something wrong with your
|
|
experiment. Knowing enough assembly code to see what is going on
|
|
will clue you into the optimization, just like in the notmain() example
|
|
above.
|
|
|
|
Lets go back to some basics and common mistakes.
|
|
|
|
First you may ask why am I calling the assembler and linker and gcc
|
|
all separate, cant I just put it all on one gcc command line? Sure,
|
|
you can but you are giving up control to the compiler and that
|
|
requires even more knowledge to get the command line right to get it
|
|
to build the program you want it to build. Sometimes to get the
|
|
compiler to do what you want or of you have borrowed some code you
|
|
might have to have GCC do the assembling or linking. Some folks like
|
|
to put C stuff like defines and comment symbols in their assembler code
|
|
which works fine if you feed it through gcc, but it is not assembly
|
|
language it is some sort of hybrid. Doesnt stop people from doing it,
|
|
and when you borrow that code you either have to fix the code or use the
|
|
C compiler as an assembler.
|
|
|
|
bootstrap.s
|
|
|
|
.globl _start
|
|
_start:
|
|
mov sp,#0x00010000
|
|
bl notmain
|
|
hang: b hang
|
|
|
|
notmain.c
|
|
|
|
void notmain ( void )
|
|
{
|
|
}
|
|
|
|
lscript
|
|
|
|
MEMORY
|
|
{
|
|
ram : ORIGIN = 0x8000, LENGTH = 0x18000
|
|
}
|
|
|
|
SECTIONS
|
|
{
|
|
.text : { *(.text*) } > ram
|
|
.bss : { *(.bss*) } > ram
|
|
.rodata : { *(.rodata*) } > ram
|
|
.data : { *(.data*) } > ram
|
|
}
|
|
|
|
You might try this
|
|
|
|
baremetal > arm-none-eabi-gcc -Xlinker -T -Xlinker lscript bootstrap.s notmain.c -o hello.elf
|
|
/gnuarm/lib/gcc/arm-none-eabi/4.7.1/../../../../arm-none-eabi/bin/ld: cannot find crt0.o: No such file or directory
|
|
collect2: error: ld returned 1 exit status
|
|
|
|
Well crt0.o is the bootstrap code the toolchain wants to use.
|
|
|
|
So lets try it this way
|
|
|
|
baremetal > arm-none-eabi-gcc -nostdlib -nostartfiles -ffreestanding -Xlinker -T -Xlinker lscript bootstrap.s notmain.c -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eb000000 bl 800c <notmain>
|
|
|
|
00008008 <hang>:
|
|
8008: eafffffe b 8008 <hang>
|
|
|
|
0000800c <notmain>:
|
|
800c: e52db004 push {fp} ; (str fp, [sp, #-4]!)
|
|
8010: e28db000 add fp, sp, #0
|
|
8014: e28bd000 add sp, fp, #0
|
|
8018: e8bd0800 pop {fp}
|
|
801c: e12fff1e bx lr
|
|
|
|
Now I happen to always use the -nostdlib -nostartfiles -ffreestanding
|
|
with GCC when making bare metal.
|
|
|
|
Also note that I dont use
|
|
|
|
#include <stdio.h>
|
|
#include <stdlib.h>
|
|
|
|
and so on.
|
|
|
|
Well I dont use C libraries, I dont want those triggering the tools
|
|
to add more junk. Might not happen with GCC but I have seen it happen
|
|
elsewhere. Also you have to have your paths right to find those files
|
|
(that you arent using).
|
|
|
|
|
|
|
|
Here is a mistake you might make
|
|
|
|
|
|
baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o
|
|
baremetal > arm-none-eabi-gcc -O2 -c notmain.c -o notmain.o
|
|
baremetal > arm-none-eabi-ld -T lscript notmain.o bootstrap.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <notmain>:
|
|
8000: e12fff1e bx lr
|
|
|
|
00008004 <_start>:
|
|
8004: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8008: ebfffffc bl 8000 <notmain>
|
|
|
|
0000800c <hang>:
|
|
800c: eafffffe b 800c <hang>
|
|
|
|
Changing the order of the items on the linker command line has changed
|
|
where they are placed in the final binary. And in this case we
|
|
are in trouble, this code wont work because the first instruction of
|
|
the boot strap is not at address 0x8000.
|
|
|
|
Now changing the linker script to have the name of the boot code in
|
|
the script and have that line before the rest of the .text
|
|
|
|
MEMORY
|
|
{
|
|
ram : ORIGIN = 0x8000, LENGTH = 0x18000
|
|
}
|
|
|
|
SECTIONS
|
|
{
|
|
.text : { bootstrap.o } > ram
|
|
.text : { *(.text*) } > ram
|
|
.bss : { *(.bss*) } > ram
|
|
.rodata : { *(.rodata*) } > ram
|
|
.data : { *(.data*) } > ram
|
|
}
|
|
|
|
|
|
|
|
baremetal > arm-none-eabi-ld -T lscript notmain.o bootstrap.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eb000006 bl 8024 <notmain>
|
|
|
|
00008008 <hang>:
|
|
8008: eafffffe b 8008 <hang>
|
|
800c: 00001541 andeq r1, r0, r1, asr #10
|
|
8010: 61656100 cmnvs r5, r0, lsl #2
|
|
8014: 01006962 tsteq r0, r2, ror #18
|
|
8018: 0000000b andeq r0, r0, fp
|
|
801c: 01080106 tsteq r8, r6, lsl #2
|
|
8020: 0000012c andeq r0, r0, ip, lsr #2
|
|
|
|
00008024 <notmain>:
|
|
8024: e12fff1e bx lr
|
|
|
|
That fixes it, but there is other junk in our file now, not the perfect
|
|
solution. I prefer to use ld and specify the bootstrap code first
|
|
on the command line. And when developing a new program I disassemble
|
|
the binary before running it the first time to make sure the boot code
|
|
is where I wanted it.
|
|
|
|
|
|
Here is a situation you have a lot of data, perhaps it is a large
|
|
graphic image or a bunch of font data or something like that
|
|
|
|
bootstrap.s
|
|
|
|
.globl _start
|
|
_start:
|
|
mov sp,#0x00010000
|
|
bl notmain
|
|
hang: b hang
|
|
|
|
somedata.s
|
|
|
|
.space 0x10000000,0
|
|
|
|
notmain.c
|
|
|
|
void notmain ( void )
|
|
{
|
|
}
|
|
|
|
lscript
|
|
|
|
MEMORY
|
|
{
|
|
ram : ORIGIN = 0x8000, LENGTH = 0xF0000000
|
|
}
|
|
|
|
SECTIONS
|
|
{
|
|
.text : { *(.text*) } > ram
|
|
.bss : { *(.bss*) } > ram
|
|
.rodata : { *(.rodata*) } > ram
|
|
.data : { *(.data*) } > ram
|
|
}
|
|
|
|
|
|
baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o
|
|
baremetal > arm-none-eabi-as somedata.s -o somedata.o
|
|
baremetal > arm-none-eabi-gcc -O2 -c notmain.c -o notmain.o
|
|
baremetal > arm-none-eabi-ld -T lscript bootstrap.o somedata.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: eb000001 bl 8010 <__notmain_veneer>
|
|
|
|
00008008 <hang>:
|
|
8008: eafffffe b 8008 <hang>
|
|
800c: 00000000 andeq r0, r0, r0
|
|
|
|
00008010 <__notmain_veneer>:
|
|
8010: e51ff004 ldr pc, [pc, #-4] ; 8014 <__notmain_veneer+0x4>
|
|
8014: 10008018 andne r8, r0, r8, lsl r0
|
|
...
|
|
|
|
10008018 <notmain>:
|
|
10008018: e12fff1e bx lr
|
|
|
|
You are telling me: I dont see the problem.
|
|
The reason is the linker fixed the problem.
|
|
|
|
I am trying to put the tool in a position where it has assembled a
|
|
single instruction for the branch link, which is limited in how
|
|
far in memory it can go. What the linker did is it created some
|
|
code near the branch link, somewhere it could reach and used that
|
|
as what I call a trampoline. The tools have performed the branch
|
|
link at the right place so the return address is in the link register
|
|
then it used location that reads a value from memory and puts that
|
|
in the program counter meaning it branches to that address. Being a
|
|
branch it does not modify the link register so notmain doesnt know
|
|
any better how the program got there it returns to the right place.
|
|
|
|
If we combine the two into one file
|
|
|
|
bootstrap.s
|
|
|
|
.globl _start
|
|
_start:
|
|
mov sp,#0x00010000
|
|
bl notmain
|
|
hang: b hang
|
|
.space 0x10000000,0
|
|
|
|
and dont use somedata.s
|
|
|
|
baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o
|
|
baremetal > arm-none-eabi-ld -T lscript bootstrap.o notmain.o -o hello.elf
|
|
bootstrap.o: In function `_start':
|
|
(.text+0x4): relocation truncated to fit: R_ARM_CALL against symbol `notmain' defined in .text section in notmain.o
|
|
|
|
Now the problem is that the linker is unable to find a place close enough
|
|
to the bl instruction to put a trampoline so it has to error out. This
|
|
is not necessarily the exact error message I was after but it will do.
|
|
|
|
The ARM instructions have quite a bit of a reach. Other instruction
|
|
sets have different limitations as to how far a branch can go and
|
|
how you place the object files on the command line can affect how
|
|
far the branches have to go to get from one place to another and
|
|
the linker may not be able to patch it.
|
|
|
|
|
|
At this point I hope you have more than enough of a feel for the kinds
|
|
of things you need to know from a gnu toolchain perspective to get
|
|
started with ARM bare metal programming on the Raspberry Pi.
|
|
|
|
Also, a side effect is that I hope that you can see without actually
|
|
buying any hardware or running any code we were able to perform many
|
|
experiments and learn many things about the tools. It doesnt matter
|
|
what instruction set or computer you can often do similar things,
|
|
certainly with the GNU tools, create simple functions compile and
|
|
disassemble just that function, or link it with something simple
|
|
enough to get the linker to stop complaining.
|
|
|
|
|
|
|
|
|
|
Now I am going to move into thumb mode, which creates a number of
|
|
other problems that can be quite difficult to find.
|
|
|
|
Traditionally ARM has used 32 bit instructions, fixed instruction
|
|
length. Then the thumb instruction set was added. The original
|
|
thumb instruction set had a one to one relationship with a full
|
|
sized ARM instruction. I have no direct knowledge but assume that
|
|
the thumb instructions were converted to ARM instructions before
|
|
being executed so that there only needed to be one execution unit in
|
|
the processor. The thumb instructions are 16 bits wide, originally
|
|
fixed length, thumb2 extensions to the thumb instruction set create a
|
|
bit of a mess with 16 and 32 bit thumb instructions. The 16 bit
|
|
instructions provide some cost and performance benefits for embedded
|
|
systems. First off you can pack more instructions into the same
|
|
amount of memory, understanding that it may take more instructions to
|
|
perform the same task using thumb instructions than it would have using
|
|
ARM. My experiments at the time showed about 10-15% more instructions,
|
|
but half the memory so that was a fair tradeoff. I know of one platform
|
|
that went so far as to use 16 bit memory busses, which actually made
|
|
thumb mode run much faster than ARM mode on that platform. That
|
|
platform is/was the Nintendo Gameboy Advance.
|
|
|
|
There are very specific rules for switching modes between the two modes.
|
|
Specifically you have to use the bx (or blx) instruction. When you use
|
|
the bx instruction the least significant bit of the address in the
|
|
register you are using determines if the mode you switching to as
|
|
you branch is ARM mode or thumb mode. ARM mode the bit is zero,
|
|
thumb mode the bit is a 1. This may not be obvious and the ARM
|
|
documents are a little misleading or incorrect as to what valid
|
|
bits you can have in that register. Note that that lower bit
|
|
is stripped off it is only used by the bx instruction itself the
|
|
address in the program counter always has the lower two bits zero
|
|
for ARM mode (4 byte instructions) and the lower bit zero for
|
|
thumb instructions (2 or 4 byte instructions). Note the bx/blx
|
|
instruction is not the only way to switch modes, sometimes you can
|
|
use the pop instruction, but bx works the same way on all ARM
|
|
architectures that I know of, the other solutions (pop for example)
|
|
vary in if/how they work for switching modes depending on the ARM
|
|
architecture in question. So that makes for very unportable code
|
|
across ARM if you are not careful. When in doubt just use BX.
|
|
|
|
Here again the goal is not to teach assembly but you may want to
|
|
get the ARM Architectural Reference Manual for this platform
|
|
(see the top level README file) so that you can look at the
|
|
ARM and thumb instructions as well as other things that describe at
|
|
least in part what I am talking about. For example this flavor of
|
|
ARM boots in a normal ARM way meaning the exception table is filled
|
|
with 32 bit ARM instructions that get executed. Address 0x00000000
|
|
contains the instruction executed on reset, 0x00000004 some other
|
|
exception and so on, one for interrupt one for fast interrupt one
|
|
for data abort, one for prefetch abort, etc. At least the traditional
|
|
ARM exception table, in recent years both the Cortex-M which is
|
|
different and the ARM exception table are seeing changes from the past.
|
|
Anyway, I bring this up because it is important to know that in this
|
|
case all exceptions are entered in ARM mode, even if you were in thumb
|
|
mode when you were interrupted or otherwise had an exception. The cpsr
|
|
contains a T bit which is the mode bit, when you return from the
|
|
interrupt or exception the cpsr is restored along with your
|
|
program counter and you return to the mode you were in. This is the
|
|
exception to the rule that you use bx to change modes (or blx).
|
|
|
|
So the ARM is going to come out of reset in ARM mode and whatever
|
|
mechanism that the Raspberry Pi uses to have our code at 0x8000 run we
|
|
start running our code in full 32 bit ARM mode.
|
|
|
|
You probably know that the C language has somewhat of a standard
|
|
every so often that standard is re-written and if you want to make a
|
|
C compiler that conforms to that standard...well you conform or at
|
|
least try. Assembly language in general does not have a standard.
|
|
A company designs a chip, which means they create an instruction set,
|
|
binary machine code instructions, and generally they create an
|
|
assembly language so that they can write down and talk about those
|
|
instructions using mnemonics instead of patterns of ones and zeros.
|
|
And not always but often if that company actually wants to sell those
|
|
processors, so they create or hire someone to create an assembler and
|
|
a compiler or few. Assembly language, like C language, has
|
|
directives that are not actually code like #pragma in C for example
|
|
you are using that to talk to the compiler not using it as code
|
|
necessarily. Assembly has those as well, many of them. It is in the
|
|
processor vendors best interest to use the same assembly language
|
|
syntax for the instructions in the processor manual in the assembler
|
|
that they create or have someone create for them. But that manual
|
|
although you might consider it a standard, is not, the machine code is
|
|
the hard and fast standard, the ASCII assembly language is fair game and
|
|
anyone can create their own assembly language for that processor
|
|
with whatever syntax and directives that they want. ARM has a nice
|
|
set of compiler tools, or at least when I worked at a place that paid
|
|
for the tools for a few years and tried them they were very nice and
|
|
conformed of course to the ARM documents. GNU assembler, in true
|
|
GNU assembler fashion does not like to conform to the vendors assembly
|
|
language and generally makes some sort of a mess out of it. Fortunately
|
|
the ARM mess is nowhere near as bad as the x86 mess. Subtle things
|
|
like the comment symbol are the most glaring problems with GNU assembler
|
|
for ARM. Anyway, I dont remember the syntax or directives for the
|
|
ARM tools, the ARM tools have evolved anyway. At the time I did try
|
|
to write asm that would compile on both ARMs tools and gnus tools with
|
|
minimal massaging, and you will forever see me use ;@ for comments
|
|
instead of @ because this ; is the proper, almost universal, symbol for
|
|
a comment in assembly languages from many vendors. This @ is not.
|
|
Combined like this ;@ and you get code that is commented in both worlds
|
|
equally. Enough with that rant, this asm code will continue to be GNU
|
|
assembler specific as that is the toolchain I am using, I dont know if
|
|
it works on any other assembler, I keep the directives to a bare
|
|
minimum though.
|
|
|
|
Another side effect of thumb and in particular thumb2 is that ARM
|
|
decided to change their syntax in subtle ways to come up with a unified
|
|
syntax, for example to perform the addition r0 = r0 + r1
|
|
|
|
Thumb:
|
|
add r0,r1
|
|
|
|
ARM
|
|
add r0,r0,r1
|
|
|
|
Early on you had to write all three registers, but for thumb part of
|
|
the reduction is one source and the destination have to be the same
|
|
register for many of the alu instructions. Now even not the unified but
|
|
certainly the unified syntax attempted to resolve this into a dumbed
|
|
down instruction set. Naturally the unfied cant do everythign of every
|
|
one of the flavors (ARM, thumbv1 and v2), for the most part you basically
|
|
get to write thumb code and have it assemble for ARM without complaints.
|
|
The GNU assembler has also adopted the unified syntax and relaxed its
|
|
rules on the non-unified syntax. I have not switched over to using the
|
|
unified syntax...yet. Eventually I will be forced to and then at that
|
|
time I will likely always use it...
|
|
|
|
There are games you need to play with assembly language directives
|
|
using the GNU assembler in order to get the tool to properly create
|
|
thumb address for use with the bx instruction so you dont have to
|
|
be silly and add one or or one to the address before you use it.
|
|
|
|
So our normal ARM boostrap code:
|
|
|
|
.globl _start
|
|
_start:
|
|
mov sp,#0x00010000
|
|
bl notmain
|
|
hang: b hang
|
|
|
|
For running in thumb mode I recommend going all the way, run everything
|
|
you can in thumb. We have to have some bootstrap in ARM mode, but after
|
|
that it makes your life easier from a compiling and linking perspective
|
|
to go all thumb after the bootstrap. lets dive in.
|
|
|
|
bootstrap.s
|
|
|
|
|
|
.code 32
|
|
.globl _start
|
|
_start:
|
|
mov sp,#0x00010000
|
|
ldr r0,thumbstart_add
|
|
bx r0
|
|
|
|
thumbstart_add: .word thumbstart
|
|
|
|
;@ ----- ARM above, thumb below
|
|
.thumb
|
|
|
|
.thumb_func
|
|
thumbstart:
|
|
bl notmain
|
|
hang: b hang
|
|
|
|
|
|
notmain.c
|
|
|
|
void notmain ( void )
|
|
{
|
|
}
|
|
|
|
lscript
|
|
|
|
MEMORY
|
|
{
|
|
ram : ORIGIN = 0x8000, LENGTH = 0x18000
|
|
}
|
|
|
|
SECTIONS
|
|
{
|
|
.text : { *(.text*) } > ram
|
|
.bss : { *(.bss*) } > ram
|
|
.rodata : { *(.rodata*) } > ram
|
|
.data : { *(.data*) } > ram
|
|
}
|
|
|
|
|
|
|
|
baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o
|
|
|
|
|
|
baremetal > arm-none-eabi-gcc -mthumb -O2 -c notmain.c -o notmain.o
|
|
baremetal > arm-none-eabi-ld -T lscript bootstrap.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: e59f0000 ldr r0, [pc] ; 800c <thumbstart_add>
|
|
8008: e12fff10 bx r0
|
|
|
|
0000800c <thumbstart_add>:
|
|
800c: 00008011 andeq r8, r0, r1, lsl r0
|
|
|
|
00008010 <thumbstart>:
|
|
8010: f000 f802 bl 8018 <notmain>
|
|
|
|
00008014 <hang>:
|
|
8014: e7fe b.n 8014 <hang>
|
|
8016: 46c0 nop ; (mov r8, r8)
|
|
|
|
00008018 <notmain>:
|
|
8018: 4770 bx lr
|
|
801a: 46c0 nop ; (mov r8, r8)
|
|
|
|
|
|
So we see the ARM instructions mov sp, ldr r0, and bx r0. These
|
|
are 32 bit instructions and most of them start with an E which makes
|
|
them kind of stand out in a crowd. The .code 32 directive tells
|
|
the assembler to assemble the following code using 32 bit arm
|
|
instructions or at least until I tell you otherwise. the .thumb
|
|
directive is me telling the assembler otherwise. Start assembling
|
|
using 16 bit thumb instructions. Yes the bl is actually two separate
|
|
16 bit instructions and are documented by ARM as such, but always shown
|
|
as a pair in disassembly. It is not a 32 bit instruction.
|
|
|
|
The .thumb_func is used to tell the assembler that the label
|
|
that follows is branch destination for thumb code, when you see this
|
|
label set the lsbit so that I dont have to play any games to switch
|
|
or stay in the right mode. You can see that the thumbstart label
|
|
is at address 0x8010, but the thumbstart_add is 0x8011, the thumbstart
|
|
address with the lsbit set, so that when it hits the bx instruction
|
|
it tells the processor that we want to be in thumb mode. Note that
|
|
bx can be used even if you are staying in the same mode, that is the key
|
|
to it, if you have used the proper address you dont care what
|
|
mode you are branching to. You can write code that calls functions
|
|
and the code making the call can be thumb mode and the code you are
|
|
calling can be ARM mode and so long as the compiler and/or you has
|
|
not messed up, it will properly switch back and forth. Problem is
|
|
the compiler doesnt always get it right. You may see or hear
|
|
the word interwork or thumb interwork (command line options for the
|
|
compiler/tools) which puts extra stuff in there to hopefully have
|
|
it all work out. I prefer as you know to use few/no gcclib or
|
|
clib canned functions (which can be in the wrong mode depending on
|
|
your tools and how lucky you are when linking) and I prefer other
|
|
than the asm startup code to remain as thumb pure as possible to
|
|
minimize any of these problems. This part of the tutorial of course
|
|
is not necessarily about staying thumb pure but showing the problems
|
|
or at least possible problems you will no doubt see when trying to use
|
|
thumb mode.
|
|
|
|
So the simple program above all worked out fine, by remembering to
|
|
place the .thumb_func directive before the label we told the assembler
|
|
to compute the right address, what if we forgot?
|
|
|
|
|
|
.code 32
|
|
.globl _start
|
|
_start:
|
|
mov sp,#0x00010000
|
|
ldr r0,thumbstart_add
|
|
bx r0
|
|
|
|
thumbstart_add: .word thumbstart
|
|
|
|
;@ ----- ARM above, thumb below
|
|
.thumb
|
|
|
|
thumbstart:
|
|
bl notmain
|
|
hang: b hang
|
|
|
|
|
|
baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o
|
|
baremetal > arm-none-eabi-ld -T lscript bootstrap.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: e59f0000 ldr r0, [pc] ; 800c <thumbstart_add>
|
|
8008: e12fff10 bx r0
|
|
|
|
0000800c <thumbstart_add>:
|
|
800c: 00008010 andeq r8, r0, r0, lsl r0
|
|
|
|
00008010 <thumbstart>:
|
|
8010: f000 f802 bl 8018 <notmain>
|
|
|
|
00008014 <hang>:
|
|
8014: e7fe b.n 8014 <hang>
|
|
8016: 46c0 nop ; (mov r8, r8)
|
|
|
|
00008018 <notmain>:
|
|
8018: 4770 bx lr
|
|
801a: 46c0 nop ; (mov r8, r8)
|
|
|
|
|
|
Not a single peep from the compiler tools and we have created perfectly
|
|
broken code. It is hard to see in the dump above if you dont know
|
|
what to look for but it will make for a very long day or very expensive
|
|
waste of time playing with thumb if you dont know what to look for.
|
|
that little 0x8010 being loaded into r0 and then the bx r0 in ARM mode
|
|
is telling the processor to branch to address 0x8010 AND STAY IN ARM
|
|
MODE. But the instructions at 0x8010 and the ones that follow are
|
|
thumb mode, they might line up with some sort of ARM instruction
|
|
and the ARM may limp along executing gibberish, but at some point
|
|
in a normal sized program it will hit a pair of thumb instructions
|
|
whose binary pattern are not a valid ARM instruction and the arm
|
|
will fire off the undefined instruction exception. One wee little
|
|
bit is all the difference between success and massive failure in the
|
|
above code.
|
|
|
|
Now lets try mixing the modes and see what the tool does. I am running
|
|
a somewhat cutting edge gcc and binutils as of this writing:
|
|
|
|
baremetal > arm-none-eabi-gcc --version
|
|
arm-none-eabi-gcc (GCC) 4.7.1
|
|
Copyright (C) 2012 Free Software Foundation, Inc.
|
|
This is free software; see the source for copying conditions. There is NO
|
|
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
|
|
|
baremetal > arm-none-eabi-as --version
|
|
GNU assembler (GNU Binutils) 2.22
|
|
Copyright 2011 Free Software Foundation, Inc.
|
|
This program is free software; you may redistribute it under the terms of
|
|
the GNU General Public License version 3 or later.
|
|
This program has absolutely no warranty.
|
|
This assembler was configured for a target of `arm-none-eabi'.
|
|
|
|
I have been using the gnu tools for ARM since the 2.95.x days of gcc.
|
|
starting with thumb in the 3.x.x days pretty much every version from
|
|
then to the present. And there have been good ones and bad ones as
|
|
to how the mixing of modes is resolved. I have to say these newer
|
|
versions are doing a better job, but I know in recent months I did
|
|
trip it up, will see if I can again.
|
|
|
|
Fixing our bootstrap and not using the -mthumb option, builds ARM code:
|
|
|
|
baremetal > arm-none-eabi-gcc -O2 -c notmain.c -o notmain.o
|
|
baremetal > arm-none-eabi-ld -T lscript bootstrap.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: e59f0000 ldr r0, [pc] ; 800c <thumbstart_add>
|
|
8008: e12fff10 bx r0
|
|
|
|
0000800c <thumbstart_add>:
|
|
800c: 00008011 andeq r8, r0, r0, lsl r0
|
|
|
|
00008010 <thumbstart>:
|
|
8010: f000 f806 bl 8020 <__notmain_from_thumb>
|
|
|
|
00008014 <hang>:
|
|
8014: e7fe b.n 8014 <hang>
|
|
8016: 46c0 nop ; (mov r8, r8)
|
|
|
|
00008018 <notmain>:
|
|
8018: e12fff1e bx lr
|
|
801c: 00000000 andeq r0, r0, r0
|
|
|
|
00008020 <__notmain_from_thumb>:
|
|
8020: 4778 bx pc
|
|
8022: 46c0 nop ; (mov r8, r8)
|
|
8024: eafffffb b 8018 <notmain>
|
|
|
|
|
|
very nicely handled. after thumbstart they use a bl instruction
|
|
as we had in the assemblly language code so that the link register
|
|
is filled in not only with a return address but the return address
|
|
with the lsbit set so that we return to the right mode with a bx lr
|
|
instruction. Instead of branching right to the ARM code though
|
|
which would not work you cannot use bl to switch modes, they
|
|
branch to what I call a trampoline, when they hit
|
|
__notmain_from_thumb the link register is prepped to return to address
|
|
0x8014. I am not teaching you assembly just how to see what is going
|
|
on, but this next thing is advanced even for assembly programmers.
|
|
In whichever mode the program counter points to two instructions ahead
|
|
so in this case we are running instruction 0x8020 bx pc in thumb mode
|
|
thumb mode is 2 bytes per instruction, two instructions ahead is the
|
|
address 0x8024 and note that that address has a zero in the lsbit so
|
|
this is a cool trick, the linker by adding these instructions at a
|
|
four byte aligned address (lower two bits are zero) 0x8020 then doing
|
|
a bx pc, and sticking a nop in between although I dont think it matters
|
|
what is there. The bx pc causes a switch to ARM mode and a branch to
|
|
address 0x8024, which being a trampoline to bounce off of, that instruction
|
|
bounces us back to 0x8018 which is the ARM instruction we wanted
|
|
to get to. this is all good, this code will run properly.
|
|
|
|
|
|
You may or may not know that compilers for a processor follow a "calling
|
|
convention" or binary interface or whatever term you like. It is a set
|
|
of rules for generating the code for a function so that you can have
|
|
functions call functions call functions and any function can
|
|
return values and the code generated will all work without having to
|
|
have some secret knowledge into the code for each function calling it.
|
|
Conform to the calling convention and the code will all work together.
|
|
Now the conventions are not hard and fast rules any more than assembly
|
|
language is a standard for any particular processor. These things
|
|
change from time to time in some cases. For the ARM, in general across
|
|
the compilers I have used the first four registers r0,r1,r2,r3 are
|
|
used for passing the first up to 16 bytes worth of parameters, r0 is
|
|
used for returning things, etc. I find it surprising how often
|
|
I see someone who is trying to write a simple bit of assembly what
|
|
the calling convention is for a particular processor using a particular
|
|
compiler. Most often gcc for example. Well why dont you ask the
|
|
compiler itself it will tell you, for example:
|
|
|
|
unsigned int fun ( unsigned int a, unsigned int b )
|
|
{
|
|
return((a>>1)+b);
|
|
}
|
|
|
|
|
|
baremetal > arm-none-eabi-gcc -O2 -c fun.c -o fun.o
|
|
baremetal > arm-none-eabi-objdump -D fun.o
|
|
|
|
fun.o: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00000000 <fun>:
|
|
0: e08100a0 add r0, r1, r0, lsr #1
|
|
4: e12fff1e bx lr
|
|
|
|
So what did I just figure out? Well if I had that function in C and
|
|
used that compiler and linked in that object code it would work with
|
|
other code created by that compiler, so that object code must follow
|
|
the calling convention. What I figured out is from that trivial experiment
|
|
is that if I want to make a function in assembly code that uses two
|
|
inputs and one output (unsigned 32 bits each) then the first parameter,
|
|
a in this case, is passed in r0, the second is passed in r1, and the
|
|
return value is in r0. let me jump to a complete different processor
|
|
for a second.
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00000000 <fun>:
|
|
0: b8 63 00 41 l.srli r3,r3,0x1
|
|
4: 44 00 48 00 l.jr r9
|
|
8: e1 64 18 00 l.add r11,r4,r3
|
|
|
|
This is not ARM but some completely different instruction set, and the
|
|
compiler for it has a different calling convention. What I see here is
|
|
that the first parameter is passed in register r3, the second parameter
|
|
is passed in r4 and the return value goes back in r11. and it just
|
|
so happens that the link register is r9.
|
|
|
|
Yes, it is true that I have not yet figured out what registers
|
|
I can modify without preserving them and what registers I have to
|
|
preserve, etc, etc. You can figure that out with these simple experiments
|
|
with practice. Because sometimes you may think you have found the
|
|
docment describing the calling convention only to find you have not.
|
|
And as far as preservation, if in doubt preserve everything but the
|
|
return registers...
|
|
|
|
So if you have looked at my work you see that I prefer to perform
|
|
singular memory accesses using hand written assembly routines like
|
|
PUT32 and GET32. Not going to say why here and now, I have mentioned
|
|
it elsewhere and it doesnt matter for this discussion. Lets accept
|
|
it and move on to use it, a quick thumb experiment:
|
|
|
|
|
|
baremetal > arm-none-eabi-gcc -mthumb -O2 -c fun.c -o fun.o
|
|
baremetal > arm-none-eabi-objdump -D fun.o
|
|
|
|
fun.o: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00000000 <fun>:
|
|
0: 0840 lsrs r0, r0, #1
|
|
2: 1808 adds r0, r1, r0
|
|
4: 4770 bx lr
|
|
6: 46c0 nop ; (mov r8, r8)
|
|
|
|
r0 is first paramter, r1 second, and return value is r0.
|
|
|
|
So to create a PUT32 in thumb mode, since we already have some
|
|
assembly in our project, lets just put it there:
|
|
|
|
bootstrap.s
|
|
|
|
.code 32
|
|
.globl _start
|
|
_start:
|
|
mov sp,#0x00010000
|
|
ldr r0,thumbstart_add
|
|
bx r0
|
|
|
|
thumbstart_add: .word thumbstart
|
|
|
|
;@ ----- ARM above, thumb below
|
|
.thumb
|
|
|
|
.thumb_func
|
|
thumbstart:
|
|
bl notmain
|
|
hang: b hang
|
|
|
|
.thumb_func
|
|
.globl PUT32
|
|
PUT32:
|
|
str r1,[r0]
|
|
bx lr
|
|
|
|
|
|
And use it in notmain.c
|
|
|
|
void PUT32 ( unsigned int, unsigned int );
|
|
void notmain ( void )
|
|
{
|
|
PUT32(0x0000B000,0x12345678);
|
|
}
|
|
|
|
And make notmain ARM code
|
|
|
|
|
|
|
|
baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o
|
|
baremetal > arm-none-eabi-gcc -O2 -c notmain.c -o notmain.o
|
|
baremetal > arm-none-eabi-ld -T lscript bootstrap.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: e59f0000 ldr r0, [pc] ; 800c <thumbstart_add>
|
|
8008: e12fff10 bx r0
|
|
|
|
0000800c <thumbstart_add>:
|
|
800c: 00008011 andeq r8, r0, r1, lsl r0
|
|
|
|
00008010 <thumbstart>:
|
|
8010: f000 f818 bl 8044 <__notmain_from_thumb>
|
|
|
|
00008014 <hang>:
|
|
8014: e7fe b.n 8014 <hang>
|
|
|
|
00008016 <PUT32>:
|
|
8016: 6001 str r1, [r0, #0]
|
|
8018: 4770 bx lr
|
|
801a: 46c0 nop ; (mov r8, r8)
|
|
|
|
0000801c <notmain>:
|
|
801c: e92d4008 push {r3, lr}
|
|
8020: e3a00a0b mov r0, #45056 ; 0xb000
|
|
8024: e59f1008 ldr r1, [pc, #8] ; 8034 <notmain+0x18>
|
|
8028: eb000002 bl 8038 <__PUT32_from_arm>
|
|
802c: e8bd4008 pop {r3, lr}
|
|
8030: e12fff1e bx lr
|
|
8034: 12345678 eorsne r5, r4, #125829120 ; 0x7800000
|
|
|
|
00008038 <__PUT32_from_arm>:
|
|
8038: e59fc000 ldr ip, [pc] ; 8040 <__PUT32_from_arm+0x8>
|
|
803c: e12fff1c bx ip
|
|
8040: 00008017 andeq r8, r0, r7, lsl r0
|
|
|
|
00008044 <__notmain_from_thumb>:
|
|
8044: 4778 bx pc
|
|
8046: 46c0 nop ; (mov r8, r8)
|
|
8048: eafffff3 b 801c <notmain>
|
|
804c: 00000000 andeq r0, r0, r0
|
|
|
|
So we start in arm, use 0x8011 to swich to thumb mode at address 0x8010
|
|
trampoline off to get to 0x801C entering notmain in ARM mode. and we
|
|
branch link to another trampoline. This one is not complicated as
|
|
we did this ourselves right after _start. Load a register with
|
|
the address orred with one. 0x8017 fed to bx means switch to thumb
|
|
mode and branch to 0x8016 which is our PUT32 in thumb mode.
|
|
|
|
lets go the other way, PUT32 in ARM mode called from thumb code
|
|
|
|
|
|
baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o
|
|
baremetal > arm-none-eabi-gcc -mthumb -O2 -c notmain.c -o notmain.o
|
|
baremetal > arm-none-eabi-ld -T lscript bootstrap.o notmain.o -o hello.elf
|
|
baremetal > arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: e59f0000 ldr r0, [pc] ; 800c <thumbstart_add>
|
|
8008: e12fff10 bx r0
|
|
|
|
0000800c <thumbstart_add>:
|
|
800c: 00008019 andeq r8, r0, r9, lsl r0
|
|
|
|
00008010 <PUT32>:
|
|
8010: e5801000 str r1, [r0]
|
|
8014: e12fff1e bx lr
|
|
|
|
00008018 <thumbstart>:
|
|
8018: f000 f802 bl 8020 <notmain>
|
|
|
|
0000801c <hang>:
|
|
801c: e7fe b.n 801c <hang>
|
|
801e: 46c0 nop ; (mov r8, r8)
|
|
|
|
00008020 <notmain>:
|
|
8020: b508 push {r3, lr}
|
|
8022: 20b0 movs r0, #176 ; 0xb0
|
|
8024: 0200 lsls r0, r0, #8
|
|
8026: 4903 ldr r1, [pc, #12] ; (8034 <notmain+0x14>)
|
|
8028: f7ff fff2 bl 8010 <PUT32>
|
|
802c: bc08 pop {r3}
|
|
802e: bc01 pop {r0}
|
|
8030: 4700 bx r0
|
|
8032: 46c0 nop ; (mov r8, r8)
|
|
8034: 12345678 eorsne r5, r4, #125829120 ; 0x7800000
|
|
|
|
|
|
And we did it, this code is broken and will not work. Can you see
|
|
the problem? PUT32 is in ARM mode at address 0x8010. Notmain is
|
|
thumb code. You cannot use a branch link to get to ARM mode from
|
|
thumb mode you have to use bx (or blx). The bl 0x8010 will start
|
|
executing the code at 0x8010 as if it were thumb instructions, and
|
|
you might get lucky in this case and survive long enogh to run
|
|
into the thumbstart code which in this case puts you right back into
|
|
notmain sending you into an infinite loop. One might hope that at
|
|
least the ARM machine code at 0x8010 is not valid thumb machine code
|
|
and will cause an undefined instruction exception which if you bothered
|
|
to make an exception handler for you might start to see why the
|
|
code doesnt work.
|
|
|
|
It was very easy to fall into this trap, and very very hard to find
|
|
out where and why the failure is until you have lived the pain or been
|
|
shown where to look. Even with me showing you where to look you may
|
|
still end up spending hours or days on this. But as you do know
|
|
as an experienced programmer each time you spend hours or days on
|
|
some bug, you learn from that experience and the next time you
|
|
are much faster at recognizing the problem and where to look. If you
|
|
happen to get bitten a few times you should get very fast at finding
|
|
the problem.
|
|
|
|
If I add this
|
|
|
|
notmain.c
|
|
|
|
extern unsigned int fun ( unsigned int, unsigned int );
|
|
extern void PUT32 ( unsigned int, unsigned int );
|
|
void notmain ( void )
|
|
{
|
|
fun(123,456);
|
|
PUT32(0x0000B000,0x12345678);
|
|
}
|
|
|
|
and this
|
|
|
|
|
|
unsigned int fun ( unsigned int a, unsigned int b )
|
|
{
|
|
return((a>>1)+b);
|
|
}
|
|
|
|
|
|
dwelch-desktop baremetal # arm-none-eabi-gcc -O2 -c fun.c -o fun.o
|
|
dwelch-desktop baremetal # arm-none-eabi-ld -T lscript bootstrap.o notmain.o fun.o -o hello.elf
|
|
dwelch-desktop baremetal # arm-none-eabi-objdump -D hello.elf
|
|
|
|
hello.elf: file format elf32-littlearm
|
|
|
|
|
|
Disassembly of section .text:
|
|
|
|
00008000 <_start>:
|
|
8000: e3a0d801 mov sp, #65536 ; 0x10000
|
|
8004: e59f0000 ldr r0, [pc] ; 800c <thumbstart_add>
|
|
8008: e12fff10 bx r0
|
|
|
|
0000800c <thumbstart_add>:
|
|
800c: 00008019 andeq r8, r0, r9, lsl r0
|
|
|
|
00008010 <PUT32>:
|
|
8010: e5801000 str r1, [r0]
|
|
8014: e12fff1e bx lr
|
|
|
|
00008018 <thumbstart>:
|
|
8018: f000 f802 bl 8020 <notmain>
|
|
|
|
0000801c <hang>:
|
|
801c: e7fe b.n 801c <hang>
|
|
801e: 46c0 nop ; (mov r8, r8)
|
|
|
|
00008020 <notmain>:
|
|
8020: b508 push {r3, lr}
|
|
8022: 21e4 movs r1, #228 ; 0xe4
|
|
8024: 0049 lsls r1, r1, #1
|
|
8026: 207b movs r0, #123 ; 0x7b
|
|
8028: f000 f80e bl 8048 <__fun_from_thumb>
|
|
802c: 20b0 movs r0, #176 ; 0xb0
|
|
802e: 0200 lsls r0, r0, #8
|
|
8030: 4902 ldr r1, [pc, #8] ; (803c <notmain+0x1c>)
|
|
8032: f7ff ffed bl 8010 <PUT32>
|
|
8036: bc08 pop {r3}
|
|
8038: bc01 pop {r0}
|
|
803a: 4700 bx r0
|
|
803c: 12345678 eorsne r5, r4, #125829120 ; 0x7800000
|
|
|
|
00008040 <fun>:
|
|
8040: e08100a0 add r0, r1, r0, lsr #1
|
|
8044: e12fff1e bx lr
|
|
|
|
00008048 <__fun_from_thumb>:
|
|
8048: 4778 bx pc
|
|
804a: 46c0 nop ; (mov r8, r8)
|
|
804c: eafffffb b 8040 <fun>
|
|
|
|
fun() which is in ARM mode, when called from notmain() which is thumb
|
|
mode is handled properly. So there is something there that tells the
|
|
linker that fun is ARM and needs a mode change.
|
|
|
|
When we use .thumb_func for thumb functions in assembly that triggers
|
|
the linker to do the right thing. I wonder if there is something
|
|
in ARM functions in assembly that we can use to do the same thing.
|
|
|
|
This is another one of my personal preferences: when using thumb mode
|
|
on an ARM booting system I use the minimal ARM code to get into thumb
|
|
mode in the bootstrap code then everywhere else I stay in thumb mode
|
|
as far as I know. If there is a time where I need ARM mode then I
|
|
am careful to see if the tools changed mode properly or I may do my
|
|
own mode change the tools dont have to get it right.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
this is a rough draft, if/when I complete this draft I will at some point
|
|
go back through and rework it to improve it.
|