more typos fixed in baremetal
108
baremetal/README
@@ -1,6 +1,7 @@
 this is a rough draft, if/when I complete this draft I will at some point
 go back through and rework it to improve it.
 Update: draft 2. I went through almost all of this and cleaned it up.
+Update: draft 3. Lots of typos and misspellings that I had missed before
 
 THIS IS NOT AN ASSEMBLY LANGUAGE TUTORIAL, IT DOES HAVE A LOT OF
 ASSEMBLY LANGUAGE IT IT. IF YOU ARE STUCK FOCUSING ON THE ASSEMBLY
@@ -881,14 +882,14 @@ orange global variables way above. Data actually is broken up into
 different segments sometimes, and in particular with the GNU tools.
 Most of the code out there that has global variables the globals are
 not defined, not initialized in the code, but the language declares
-those are assumed to be zero when you start using them (if you have
-not changed them before you used them). So there is a special data
-segment called .bss which holds all of our global variables that when
-we start running C code should be zero. These are lumped together so
-that some code can easily go through that chunk of memory and zero that
+those as assumed to be zero when you start using them (if you have
+not changed them before you read them). So there is a special data
+segment called .bss which holds all of our .data that when we start
+running C code should be zero. These are lumped together so that some
+code can easily go through that chunk of memory and zero that
 area before branching to the C entry point. Another segment we may
 encounter is the .rodata segment. Sometimes even with GNU tools you
 may find the read only data in the .text segment.
 
 For fun lets make one of each:
 
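Reviewer note on the hunk above: the zeroing step it describes can be
sketched in a few lines of C. This is not code from the README. The
name zero_region and its start/end parameters are hypothetical, and on
a real target the two addresses would normally come from symbols
defined in the linker script, which is an assumption here because the
script is not part of this diff.

```c
#include <stdint.h>

/* Hypothetical helper: clear every 32 bit word in [start, end).
 * On a bare metal target start/end would mark the .bss boundaries
 * (an assumption, this diff does not show where they come from). */
void zero_region(uint32_t *start, uint32_t *end)
{
    /* volatile keeps the stores as an explicit word by word loop */
    volatile uint32_t *p = start;

    while (p < (volatile uint32_t *)end) {
        *p++ = 0;
    }
}
```

Startup code would call something like this once, before branching to
the C entry point, so that uninitialized globals read as zero.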
@@ -932,10 +933,10 @@ Well notice that I used -O2 on the gcc command line this means
 optimization level 2. -O0 or optimizaiton level 0 means no optimization
 -O1 means some and -O2 is the maximum safe level of optimization using
 the gcc compiler. There is a -O3 but we are not supposed to trust that
-to be as tested as -O2. I am not going to get into that but recommend
-you use -O2 often, esp with embedded bare metal where size and speed
-are important. I use it here because it produces much less code than
-no optimization, you can play with compiling and disassembling these
+to be as well tested as -O2. I am not going to get into that but
+recommend you use -O2 often, esp with embedded bare metal where size and
+speed are important. I use it here because it produces much less code
+than no optimization, you can play with compiling and disassembling these
 things on your own with less or without optimization to see what
 happens.
 
@@ -1056,7 +1057,8 @@ the value 9 that we pre-initialized.
 I want to point something out here that is very important for general
 bare metal programming. What do we have above, something like 12 32
 bit numbers which is 12*4 = 48 bytes. So if I make this a true
-binary we should see 48 bytes right? Well you would be wrong:
+binary (memory image) we should see 48 bytes right? Well you would be
+wrong:
 
 baremetal > ls -al hello.elf
 -rwxr-xr-x 1 root root 38002 Sep 23 15:06 hello.elf
@@ -1126,7 +1128,7 @@ There are 0x60000000 bytes between these two items, that means the
 binary file created would at least be 0x60000000 bytes which is
 1.6 GigaBytes. If you are like me you probably dont always have
 1.6Gig of disk space handy. Much less wanting it to be filled with a
-singel file which is mostly zeros. You can start to see the appeal for
+single file which is mostly zeros. You can start to see the appeal for
 these not really a binary binary file formats like elf and ihex and
 srec. They only define the real data and dont have to hold the zero
 filler.
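Reviewer note on the hunk above: 0x60000000 is 1,610,612,736 bytes,
which is where the 1.6 figure comes from (counting in powers of two it
is 1.5 GiB). A minimal sketch of the arithmetic follows. The function
name is hypothetical, and so are any addresses you feed it, since the
two items the README refers to are outside this diff.

```c
#include <stdint.h>

/* Size of a flat binary that has to cover everything from the lowest
 * loaded address up to (not including) end_addr. Whatever lies between
 * the real items is padding, which is why the file gets so large. */
uint64_t flat_image_size(uint32_t lowest_addr, uint32_t end_addr)
{
    return (uint64_t)end_addr - (uint64_t)lowest_addr;
}
```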
@@ -1520,7 +1522,7 @@ variable pear now has its own address in memory, it did not get
 optimized out.
 
 I dont expect you to know assembly language but what I want to you to
-see is a continuation what we discussed before with respect to the
+see is a continuation of what we discussed before with respect to the
 branch link instruction and the link register. The ARM instruction
 set uses branch link (bl) to make function calls. The branch means
 goto or jump or branch the program to some address. The link means
@@ -1689,7 +1691,7 @@ runtime=end-start;
 
 And this may lead you to believe that this is not the code causing
 your performance problems. Or hopefully you realize that this code
-is executing way to fast and there is something wrong with your
+is executing way too fast and there is something wrong with your
 experiment. Knowing enough assembly code to see what is going on
 will clue you into the optimization, just like in the notmain() example
 above.
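Reviewer note on the hunk above: a sketch of why a timed loop can run
"way too fast". Both function names are hypothetical and this is not
the README's test code. At -O2 the compiler is allowed to collapse the
first loop because nothing observable happens inside it, so the
end-start measurement around it tells you very little.

```c
#include <stdint.h>

/* The optimizer may reduce this to "return n" because the loop has no
 * side effects, so timing it measures almost nothing. */
uint32_t count_plain(uint32_t n)
{
    uint32_t i;
    uint32_t total = 0;

    for (i = 0; i < n; i++) total++;
    return total;
}

/* volatile forces a real load and store on every pass, so the loop
 * survives optimization and its run time scales with n. */
uint32_t count_volatile(uint32_t n)
{
    volatile uint32_t total = 0;
    uint32_t i;

    for (i = 0; i < n; i++) total++;
    return total;
}
```

Both return the same value. Disassembling the two at -O2 is the quick
way to see which one still contains a loop.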
@@ -1705,10 +1707,9 @@ compiler to do what you want or of you have borrowed some code you
 might have to have GCC do the assembling or linking. Some folks like
 to put C stuff like defines and comment symbols in their assembler code
 which works fine if you feed it through gcc, but it is not assembly
-code it is some sort of hybrid. Doesnt stop people from doing it, and
-when you borrow that code you either have to fix the code or use the C
-compiler as an assembler.
-
+language it is some sort of hybrid. Doesnt stop people from doing it,
+and when you borrow that code you either have to fix the code or use the
+C compiler as an assembler.
 
 bootstrap.s
 
@@ -2004,7 +2005,7 @@ instructions provide some cost and performance benefits for embedded
 systems. First off you can pack more instructions into the same
 amount of memory, understanding that it may take more instructions to
 perform the same task using thumb instructions than it would have using
-ARM. My experiements at the time showed about 10-15% more instructions,
+ARM. My experiments at the time showed about 10-15% more instructions,
 but half the memory so that was a fair tradeoff. I know of one platform
 that went so far as to use 16 bit memory busses, which actually made
 thumb mode run much faster than ARM mode on that platform. That
@@ -2021,7 +2022,13 @@ bits you can have in that register. Note that that lower bit
 is stripped off it is only used by the bx instruction itself the
 address in the program counter always has the lower two bits zero
 for ARM mode (4 byte instructions) and the lower bit zero for
-thumb instructions (2 or 4 byte instructions).
+thumb instructions (2 or 4 byte instructions). Note the bx/blx
+instruction is not the only way to switch modes, sometimes you can
+use the pop instruction, but bx works the same way on all ARM
+architectures that I know of, the other solutions (pop for example)
+vary in if/how they work for switching modes depending on the ARM
+architecture in question. So that makes for very unportable code
+across ARM if you are not careful. When in doubt just use BX.
 
 Here again the goal is not to teach assembly but you may want to
 get the ARM Architectural Reference Manual for this platform
@@ -2054,19 +2061,18 @@ least try. Assembly language in general does not have a standard.
 A company designs a chip, which means they create an instruction set,
 binary machine code instructions, and generally they create an
 assembly language so that they can write down and talk about those
-instructions without going insane with confusion and/or pain. And
-not always but often if that company actually wants to sell those
-processors they create or hire someone to create an assembler and
+instructions using mnemonics instead of patterns of ones and zeros.
+And not always but often if that company actually wants to sell those
+processors, so they create or hire someone to create an assembler and
 a compiler or few. Assembly language, like C language, has
 directives that are not actually code like #pragma in C for example
 you are using that to talk to the compiler not using it as code
-necessarily. Assembly has those as well, many of them. The vendor
-will often at a minimum use the syntax for the assembly language
-instructions in the manual they create or have someone create to
-provide to users of this processor they want to sell and if smart
-will have the assembler match that manual. But that manual although
-you might consider it a standard, is not, the machine code is the
-hard and fast standard, the ASCII assembly language is fair game and
+necessarily. Assembly has those as well, many of them. It is in the
+processor vendors best interest to use the same assembly language
+syntax for the instructions in the processor manual in the assembler
+that they create or have someone create for them. But that manual
+although you might consider it a standard, is not, the machine code is
+the hard and fast standard, the ASCII assembly language is fair game and
 anyone can create their own assembly language for that processor
 with whatever syntax and directives that they want. ARM has a nice
 set of compiler tools, or at least when I worked at a place that paid
@@ -2084,7 +2090,9 @@ instead of @ because this ; is the proper, almost universal, symbol for
 a comment in assembly languages from many vendors. This @ is not.
 Combined like this ;@ and you get code that is commented in both worlds
 equally. Enough with that rant, this asm code will continue to be GNU
-assembler specific I dont know if it works on any other assembler.
+assembler specific as that is the toolchain I am using, I dont know if
+it works on any other assembler, I keep the directives to a bare
+minimum though.
 
 Another side effect of thumb and in particular thumb2 is that ARM
 decided to change their syntax in subtle ways to come up with a unified
@@ -2210,16 +2218,16 @@ instructions or at least until I tell you otherwise. the .thumb
 directive is me telling the assembler otherwise. Start assembling
 using 16 bit thumb instructions. Yes the bl is actually two separate
 16 bit instructions and are documented by ARM as such, but always shown
-as a pair in disassembly.
+as a pair in disassembly. It is not a 32 bit instruction.
 
 The .thumb_func is used to tell the assembler that the label
 that follows is branch destination for thumb code, when you see this
 label set the lsbit so that I dont have to play any games to switch
 or stay in the right mode. You can see that the thumbstart label
-is at address 0x8010, but the thumb_start add is 0x8011, the thumbstart
+is at address 0x8010, but the thumbstart_add is 0x8011, the thumbstart
 address with the lsbit set, so that when it hits the bx instruction
 it tells the processor that we want to be in thumb mode. Note that
-bx is used even if you are staying in the same mode, that is the key
+bx can be used even if you are staying in the same mode, that is the key
 to it, if you have used the proper address you dont care what
 mode you are branching to. You can write code that calls functions
 and the code making the call can be thumb mode and the code you are
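Reviewer note on the hunk above: the lsbit rule can be modelled in a
few lines of C. This is only a sketch of the behaviour the text
describes, not processor documentation, and the names bx_result and
bx_model are hypothetical. It assumes an ARM mode target is already
word aligned.

```c
#include <stdint.h>

struct bx_result {
    uint32_t pc;     /* address execution continues at */
    int      thumb;  /* 1 = thumb mode, 0 = ARM mode */
};

/* Model of how bx treats the register value: bit 0 selects the mode
 * and is stripped before the value reaches the program counter.
 * Assumption: an ARM mode target has its lower two bits zero. */
struct bx_result bx_model(uint32_t reg)
{
    struct bx_result r;

    r.thumb = (int)(reg & 1u);
    r.pc = reg & ~1u;
    return r;
}
```

Feeding it 0x8011 gives thumb mode at 0x8010, matching the
thumbstart/thumbstart_add pair in the text, while an even address
such as 0x8018 stays in ARM mode.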
@@ -2385,16 +2393,17 @@ address 0x8024, which being a trampoline to bounce off of, that instruction
 bounces us back to 0x8018 which is the ARM instruction we wanted
 to get to. this is all good, this code will run properly.
 
+
 You may or may not know that compilers for a processor follow a "calling
 convention" or binary interface or whatever term you like. It is a set
 of rules for generating the code for a function so that you can have
 functions call functions call functions and any function can
 return values and the code generated will all work without having to
 have some secret knowledge into the code for each function calling it.
-conform to the calling convention and the code will all work together.
+Conform to the calling convention and the code will all work together.
 Now the conventions are not hard and fast rules any more than assembly
-language is a standard for any particular processor. these things
-change from time to time in some cases. For the arm, in general across
+language is a standard for any particular processor. These things
+change from time to time in some cases. For the ARM, in general across
 the compilers I have used the first four registers r0,r1,r2,r3 are
 used for passing the first up to 16 bytes worth of parameters, r0 is
 used for returning things, etc. I find it surprising how often
@@ -2424,7 +2433,7 @@ Disassembly of section .text:
 So what did I just figure out? Well if I had that function in C and
 used that compiler and linked in that object code it would work with
 other code created by that compiler, so that object code must follow
-the calling convention. what I figured out is from that trivial experiment
+the calling convention. What I figured out is from that trivial experiment
 is that if I want to make a function in assembly code that uses two
 inputs and one output (unsigned 32 bits each) then the first parameter,
 a in this case, is passed in r0, the second is passed in r1, and the
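Reviewer note on the hunk above: the experiment it describes needs
nothing more than a two input, one output function. The body below is
an assumption, since the README's own source for this function is
outside the diff. Only the shape (two unsigned 32 bit inputs, one
unsigned 32 bit result) comes from the text.

```c
/* Hypothetical stand-in for the function being disassembled:
 * two unsigned 32 bit inputs, one unsigned 32 bit result.
 * unsigned int is 32 bits wide on the ARM targets discussed here. */
unsigned int fun(unsigned int a, unsigned int b)
{
    return a + b;
}
```

Compile it with -O2 -c using the cross compiler in question and
disassemble the object. Whichever registers feed the add are the
parameter registers, and whichever one holds the sum at the return
is the return register.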
@@ -2439,14 +2448,15 @@ Disassembly of section .text:
 4: 44 00 48 00 l.jr r9
 8: e1 64 18 00 l.add r11,r4,r3
 
-Call me twisted an evil toward you but, what I see here is that
-the first parameter is passed in register r3, the second parameter
+This is not ARM but some completely different instruction set, and the
+compiler for it has a different calling convention. What I see here is
+that the first parameter is passed in register r3, the second parameter
 is passed in r4 and the return value goes back in r11. and it just
 so happens that the link register is r9.
 
 Yes, it is true that I have not yet figured out what registers
 I can modify without preserving them and what registers I have to
-preserve, etc, etc. You can figure that out with these simple experiements
+preserve, etc, etc. You can figure that out with these simple experiments
 with practice. Because sometimes you may think you have found the
 docment describing the calling convention only to find you have not.
 And as far as preservation, if in doubt preserve everything but the
@@ -2455,8 +2465,8 @@ return registers...
 So if you have looked at my work you see that I prefer to perform
 singular memory accesses using hand written assembly routines like
 PUT32 and GET32. Not going to say why here and now, I have mentioned
-it elsewhere and it doesnt matter for this discussion. Moving on, lets
-do a quick thumb experiment:
+it elsewhere and it doesnt matter for this discussion. Lets accept
+it and move on to use it, a quick thumb experiment:
 
 
 baremetal > arm-none-eabi-gcc -mthumb -O2 -c fun.c -o fun.o
@@ -2567,12 +2577,12 @@ Disassembly of section .text:
 
 So we start in arm, use 0x8011 to swich to thumb mode at address 0x8010
 trampoline off to get to 0x801C entering notmain in ARM mode. and we
-branch link to another trampoline. this one is not complicated as
-we did this ourselves right after _start. load a register with
+branch link to another trampoline. This one is not complicated as
+we did this ourselves right after _start. Load a register with
 the address orred with one. 0x8017 fed to bx means switch to thumb
-mode and branch to 0x8016 which is our put32 in thumb mode.
+mode and branch to 0x8016 which is our PUT32 in thumb mode.
 
-lets go the other way, put32 in ARM mode called from thumb code
+lets go the other way, PUT32 in ARM mode called from thumb code
 
 
 baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o
@@ -2620,7 +2630,7 @@ Disassembly of section .text:
 And we did it, this code is broken and will not work. Can you see
 the problem? PUT32 is in ARM mode at address 0x8010. Notmain is
 thumb code. You cannot use a branch link to get to ARM mode from
-thumb mode you have to use bx (or blx). the bl 0x8010 will start
+thumb mode you have to use bx (or blx). The bl 0x8010 will start
 executing the code at 0x8010 as if it were thumb instructions, and
 you might get lucky in this case and survive long enogh to run
 into the thumbstart code which in this case puts you right back into
@@ -2630,7 +2640,7 @@ and will cause an undefined instruction exception which if you bothered
 to make an exception handler for you might start to see why the
 code doesnt work.
 
-it was very easy to fall into this trap, and very very hard to find
+It was very easy to fall into this trap, and very very hard to find
 out where and why the failure is until you have lived the pain or been
 shown where to look. Even with me showing you where to look you may
 still end up spending hours or days on this. But as you do know