adding the beginnings of learning assembly from C
474
learnasmfromc/LAFC.txt
Normal file
@@ -0,0 +1,474 @@
Learn Assembly Language from C

This is an attempt to learn assembly language by compiling simple C
code segments and analyzing what is going on.

You will want to go to http://infocenter.arm.com. Along the left side
expand ARM architecture, then expand Reference Manuals, then click on
ARMv5 Reference Manual.

Then on the right side of the page, low and centered, click on PDF
Version.

These may or may not be direct links. If not, follow the instructions
above.

Reference Manuals
http://infocenter.arm.com/help/topic/com.arm.doc.set.architecture/index.html

ARMv5 Reference Manual
http://infocenter.arm.com/help/topic/com.arm.doc.subset.architecture.reference/index.html#v5

This might be a direct link to the pdf.
https://silver.arm.com/download/download.tm?pv=1073121

You may need to create a user name and password, it only costs you
an email address...

I know that the Raspberry Pi uses an ARMv6. The document they now
call the ARMv5 Architecture Reference Manual is a direct descendant
of what was simply called the ARM ARM (ARM Architecture Reference
Manual). It probably became too complicated to try to cover all the
architecture variations in one document, so they started making a new
manual for each architecture and this one stopped here. It is still a
good starting point: it includes the classic 32 bit ARM instructions
and the original 16 bit thumb instructions. A good place to build a
foundation for the ARM instruction set.

I have included a copy of the build_arm script that I use to download
and build a GNU based ARM toolchain from sources. This is the toolchain
I use for my projects. I maintain the script in a different github
repo, build_gcc, so this is just a copy and may get stale. The real one
is in my build_gcc repo. I have stopped running Windows myself and have
stopped trying to maintain a Windows build script. I preferred mingw to
cygwin, but building was possible on both. Even better, just download
one of the many pre-built toolchains out there.

On Linux or Windows you can get a pre-built GNU toolchain here

https://launchpad.net/gcc-arm-embedded

which should work well enough, or go here

http://www.mentor.com/embedded-software/sourcery-tools/sourcery-codebench/editions/lite-edition/

Under ARM Processors select the Download the EABI Release link, then
fill in your email/whatever. They will email you a link to download the
Linux or Windows version. This formerly-CodeSourcery-now-Mentor-Graphics
lite version is more complete than the one I use.

Now we can start... We have to master the compiler first. It is quite
easy to let the optimizer in the compiler remove your code, for example

void fun ( void )
{
    unsigned int a;
    unsigned int b;
    unsigned int c;
    a = 5;
    b = 7;
    c = a+b;
}

Assuming you have your toolchain in place, this is how we are going to
learn asm for every example:

arm-none-eabi-gcc -O2 test.c -c -o test.o
arm-none-eabi-objdump -D test.o

You may have to adjust the prefix in front of -gcc and -objdump to
arm-none-linux-gnueabi or arm-elf or whatever... it should all work
fine.

------ cut -------
test.o:     file format elf32-littlearm


Disassembly of section .comment:

00000000 <.comment>:
   0:   43434700    movtmi  r4, #14080
------ cut -------

Here is the problem. There is no .text. The stuff it did disassemble
wasn't really code, it was some text (the bytes 0x43434700 are part of
the string "GCC"), and the disassembler just chewed on it anyway. What
happened is that the function DOES NOTHING. Think about it: there are
no inputs, there are no outputs, it calls no functions, and the math it
does means nothing because the result is sent nowhere.

So let's try this

unsigned int fun ( void )
{
    unsigned int a;
    unsigned int b;
    unsigned int c;
    a = 5;
    b = 7;
    c = a+b;
    return(c);
}

Run those same two commands

Disassembly of section .text:

00000000 <fun>:
   0:   e3a0000c    mov r0, #12
   4:   e12fff1e    bx  lr

So we can learn a little here, but it wasn't really what we wanted: the
addition was removed. The optimizer knew that we were simply adding
5+7=12, so it just moves the answer 12 into register r0 and the function
returns.

In the ARM ARM there is a chapter titled ARM Instructions, under that
Alphabetical list of ARM instructions, and under that each instruction
has its own subsection.

So we start with MOV.

We see an encoding diagram and then some Syntax

MOV{<cond>}{S} <Rd>, <shifter_operand>

As with other things you are by now used to, {these brackets} mean
optional. Rd in this case means the destination register. For
shifter_operand we have to dig deeper.

The condition field and the S bit we will get to later.

Most processors use "registers". If you look that word up in the
dictionary it talks about a book in which records of acts, events,
names, etc., are kept, or variations on that type of a register. The
word is not really incorrectly used here. It is not a book, but it is a
place where we keep information, bits. Most processors have several of
them: some only one or two, some hundreds, but 4, 8, 16, or 32 is
typical. The ARM as far as we are concerned has 16 (the newer 64 bit
ARM, which has a different instruction set, has 31 general purpose
registers).

Now when I say "some processors" I don't mean that some ARM processors
have this and some ARM processors have that. What I mean is that
different processor architectures are different. An Intel x86 processor
is different from an ARM, which is different from a MIPS, which is
different from a PowerPC, and so on. There are many, many different
processor architectures designed and sold by many different companies.
Once you learn one instruction set (assembly language) it is not hard
to learn a second or third and so on. They are more similar than they
are different. Yes, this does mean that ARM processors are different
from x86. They are not compatible, you can't directly run the code
compiled for one on the other.

So registers, we use them here, there are 16 of them in this ARM. In
C programs we have variables; we can make as many of them as we want
(within reason) and call them what we want. Registers come with fixed
names, and we are going to stick with their proper names for the most
part.

Registers in the ARM instruction set are mostly general purpose,
meaning one is the same as the other, they don't have special features
or powers. BUT... a few of them do have special features or powers, in
that they are tied to some instructions. r0-r12 are general purpose,
nothing special about them. r13 is also known as the stack pointer or
sp, to be talked about later. For now it is really general purpose,
but it is commonly used as the stack pointer so we will assume it has
that special property. r14 is also known as the link register or lr.
Think about what happens when we call a function in C

a=7;
b=5;
printf("blah");
c=b+a;

We know enough at the C level that the call to printf() means our
program changes path, runs through all the code in printf(), then comes
back to the line after the one where we called printf().

Assembly works the same way but of course much simpler, lower level.
We call a function with the bl instruction, Branch and Link, which you
can look up. We have to talk about r15 for a second first. r15 in this
ARM is the program counter or pc. It is the register that keeps track
of where we are in our program. It holds the address of the
instructions we are fetching and executing. Thinking in C, in the code
above the pc would be the line number perhaps, keeping track of where
we are. Now when you call a function that you expect to return from,
you need to do two things. You need to save the address of the
line/instruction after your branch, and you need to then branch to the
code where the function you are calling lives. (branch, jump, goto, all
the same thing)

So to return from a function we need to put the address of the
instruction after the call back in the pc, so that execution continues
where we were before the function call. Basically the end of printf()
needs to point the pc back to the c=b+a; line. Now since you can call
printf() from a zillion different places you can't hardcode that return
address in printf(), it has to be more flexible. So r14, a.k.a. lr, is
used. When we call a function with bl, lr will contain the return
address.
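
To make the bookkeeping concrete, here is a minimal sketch in C of what
bl and a return do to lr and pc. It is a toy model and not how a real
emulator would be written: the struct, the function names and the
addresses in the usage note are made up for illustration, and it
assumes ARM mode, where every instruction is 4 bytes.

```c
#include <stdint.h>

/* Toy model holding only the two registers this discussion cares about. */
struct cpu {
    uint32_t lr; /* r14, the link register */
    uint32_t pc; /* r15, the program counter */
};

/* Model of a bl located at address bl_addr: save the address of the
   instruction after the bl in lr, then branch to the target. */
void branch_and_link(struct cpu *c, uint32_t bl_addr, uint32_t target)
{
    c->lr = bl_addr + 4u; /* next instruction, 4 bytes further on */
    c->pc = target;
}

/* Model of the return: copy the saved return address back into the pc. */
void return_to_caller(struct cpu *c)
{
    c->pc = c->lr;
}
```

With these hypothetical addresses, branch_and_link(&c, 0x8000, 0x9000)
leaves 0x8004 in c.lr and 0x9000 in c.pc, and return_to_caller(&c) then
puts 0x8004 back in c.pc no matter where the call came from.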

00000000 <fun>:
   0:   e3a0000c    mov r0, #12
   4:   e12fff1e    bx  lr

So our program is moving the value 12 into r0 and then returning to
the calling function using the return address in lr. The old way to do
the return was

mov pc,lr

And for what we are doing for now that is fine. But then ARM created
this thumb instruction set thing where the instructions are 16 bits
instead of 32, basically a completely different instruction set, and
to bounce between thumb and ARM mode you use the bx instruction. You
can look that up in your manual. It may be a little confusing
depending on how they worded it. And some of the manuals are kinda
wrong in places. You may already know that there is no perfect
programmers reference manual, and ARM is no different. We may run
across some errors.

The traditional ARM instructions are 32 bits wide, and as a rule they
must be on aligned addresses, basically a multiple of 4 bytes, so
0x0, 0x4, 0x8, 0xC, 0x10, and so on. The lower two address bits must
be zero for ARM instructions. The traditional thumb instructions
are 16 bits wide and must be aligned so the lower bit is always zero,
0x0, 0x2, 0x4, 0x6, 0x8, and so on. That leaves the lsbit of any
instruction address free to carry information. What they did for the bx
instruction is this: if the register you give it, the lr in this case,
has an lsbit of 1, then the processor knows the target is a thumb
instruction. It strips that lsbit off (makes it a zero) and starts
fetching thumb instructions at that address. If the register specified
in the bx instruction contains an address with an lsbit of zero, then
the bx instruction puts that value in the pc and starts fetching ARM
instructions in ARM mode. The beauty of this is that the bl
instruction does the complement. If the bl happens in thumb mode then
the lr is loaded with the return address|1, the lsbit is set. If the bl
happens in ARM mode then the lsbit is not set in the lr.
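
The lsbit trick is easier to see in code. This is a simplified sketch
of the rule described above, not a full description of bx: the struct
and function names are invented for illustration, and the addresses in
the usage note are arbitrary.

```c
#include <stdint.h>

/* Where the processor fetches next, and in which mode. */
struct fetch_state {
    uint32_t pc;    /* address of the next instruction to fetch */
    int      thumb; /* 1 = thumb mode, 0 = ARM mode */
};

/* Model of "bx reg": the lsbit of the register picks the mode and is
   stripped off before the value lands in the pc. */
struct fetch_state bx(uint32_t reg_value)
{
    struct fetch_state s;
    s.thumb = (int)(reg_value & 1u);
    s.pc = reg_value & ~1u;
    return s;
}

/* Value a bl leaves in lr: the return address, with the lsbit set
   when the bl itself executed in thumb mode. */
uint32_t bl_link_value(uint32_t return_addr, int thumb)
{
    return thumb ? (return_addr | 1u) : return_addr;
}
```

So bx(0x8001) gives pc 0x8000 in thumb mode, bx(0x8004) gives pc 0x8004
in ARM mode, and feeding bl_link_value() straight into bx() always
lands back in the mode the caller was in.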

The mov pc,lr instruction simply copies the value in lr into the pc.
pc and lr are shortcut names, you can also write

mov r15,r14

Some folks would call this intel syntax (vs att). This is an ARM, it
is neither intel nor att nor anything else. I prefer this style where
the destination is on the left (with an exception of course). Replace
that comma with an equals sign

mov r15=r14

when you read that code: I am putting the thing on the right into the
thing on the left.

The way the ARM processors work, the mov instruction does not work like
the bx, it simply copies the register and does not switch modes. If
you were in thumb mode, let's say, and you called a function in ARM
mode and it did a mov pc,lr, you would make the processor very upset,
because the lr has its lsbit set and the processor won't fetch an
instruction at an unaligned address (more on aligned and unaligned
later).

So our little two line program, which didn't do what we wanted, was
still quite the talking point.

00000000 <fun>:
   0:   e3a0000c    mov r0, #12
   4:   e12fff1e    bx  lr

One more thing and we are done with this one: the #12 on the right
there. As we saw above, think of the comma as an equals, r0=#12. The #
is just a syntax thing to help the assembler parse our code, just like
brackets and semicolons and such are used in C to help the parser and
the human keep track of things.

Now forgetting about thumb for now, the ARM instruction set is known as
a fixed length instruction set. All of the ARM instructions are
32 bits, no more, no less. Other instruction sets like the Intel x86
are variable length instruction sets. You can have instructions as
small as one byte and some that are many bytes long. There are pros
and cons to each approach. One of the cons of fixed length
instructions, made worse when the instruction is the same size as a
register, is loading constants. Well, explain how you would encode
0xABCD1234 into a single instruction

mov r0,#0xABCD1234

and still have some other bits left to tell the processor this is a mov
and the destination is r0? The answer is you can't. ARM's approach is
confusing at first, but in this case the value 12 fits in the bits we
have, so

mov r0,#12

fits in a single instruction.

This number at the end there is called an immediate. The bit pattern
for that 12 is encoded in the instruction itself, in the immediate
vicinity if you will.
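
You can see the 12 sitting in the machine code e3a0000c. Here is a
small sketch that pulls the immediate back out of a data processing
instruction. It peeks ahead at the encoding described in the ARM ARM
(an 8 bit value in bits 7..0, rotated right by twice the 4 bit field in
bits 11..8) and it assumes the word really is an immediate form
instruction. The function names are mine, not ARM's.

```c
#include <stdint.h>

/* Rotate a 32 bit value right by n bits. */
uint32_t ror32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Immediate operand of an ARM data processing instruction that uses
   the immediate form: imm8 rotated right by 2 * rotate. */
uint32_t dp_immediate(uint32_t insn)
{
    uint32_t imm8 = insn & 0xFFu;      /* bits 7..0  */
    unsigned rot = (insn >> 8) & 0xFu; /* bits 11..8 */
    return ror32(imm8, 2u * rot);
}
```

dp_immediate(0xe3a0000c) gives back 12, the low byte of the
instruction with no rotation applied.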

Let's make the compiler deal with an immediate that ARM cannot encode
in a single instruction. Since I happen to know how ARM does this I
can pick one at will...

unsigned int fun ( void )
{
    unsigned int a;
    unsigned int b;
    unsigned int c;
    a = 0x1200;
    b = 0x0034;
    c = a+b;
    return(c);
}

00000000 <fun>:
   0:   e59f0000    ldr r0, [pc]    ; 8 <fun+0x8>
   4:   e12fff1e    bx  lr
   8:   00001234    andeq   r1, r0, r4, lsr r2

You should know this, but in case you don't: not all compilers produce
the same machine/assembly code from the same high level language.
You might actually get a different answer here than I do depending
on your compiler. And I don't just mean gcc vs clang vs borland vs
microsoft. You can easily have gcc produce things different ways
depending on the command line settings or the version of gcc you are
using and so on. So just because I happen to get these results for this
code today doesn't mean you will. You just have to roll with what my
compiler is producing and then figure out what yours is doing later.

What they did here is recognize that they couldn't encode a

mov r0,#0x1234

into a single instruction. That is invalid, the assembler will complain
if you try. So they put the 32 bit number 0x00001234 in a memory
location nearby, then they said read this 32 bit thing from memory and
put all 32 bits in r0. With that technique they can have any 32 bit
pattern they want.
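
How does the compiler know 0x1234 will not fit? Peeking ahead at the
rule in the ARM ARM, a data processing immediate is an 8 bit value
rotated right by an even number of bits. The sketch below just tries
all sixteen rotations. The function name is made up, and this is my
illustration of the rule, not the code gcc uses.

```c
#include <stdint.h>

/* Returns 1 if value can be written as an 8 bit constant rotated
   right by an even amount (0, 2, 4, ... 30), otherwise 0. */
int arm_immediate_encodable(uint32_t value)
{
    unsigned rot;

    for (rot = 0; rot < 32; rot += 2) {
        /* Rotating left by rot undoes a rotate right by rot. */
        uint32_t undone = rot ? (value << rot) | (value >> (32u - rot))
                              : value;
        if (undone <= 0xFFu)
            return 1;
    }
    return 0;
}
```

12, 0x34 and 0x1200 all pass, which is why each could be a mov on its
own. 0x1234 and 0xABCD1234 have their set bits spread across more than
8 bits, so they fail and have to come from memory instead.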

A beauty of a fixed length instruction set is that you can be lazy with
your disassembler, you can assume everything is an instruction and
just disassemble it. So the andeq r1 stuff is not real, it is not
an instruction, it is our 0x00001234 data. If you happened to have
that andeq instruction just like that then the machine code would
be 0x00001234. So sometimes with these ARM disassemblers you have
to just know which is instructions and which is data.
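
Part of why data turns into odd looking mnemonics is the condition
field. The top four bits of every ARM instruction word select a
condition, and the disassembler prints it as a suffix. A small sketch,
with a function name of my own choosing and the suffix table taken from
the condition field section of the ARM ARM:

```c
#include <stdint.h>

/* Suffix a disassembler prints for the condition field, bits 31..28.
   0xE is "always" and is normally printed as no suffix at all. */
const char *arm_condition_suffix(uint32_t word)
{
    static const char *const names[16] = {
        "eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
        "hi", "ls", "ge", "lt", "gt", "le", "",
        "nv" /* 0xF: historically "never", not a normal condition */
    };
    return names[word >> 28];
}
```

The data word 0x00001234 starts with 0x0, so it comes out as andeq. The
0x43434700 from the .comment section starts with 0x4, hence movtmi. The
real instructions e3a0000c and e12fff1e start with 0xE and get no
suffix.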

Now the ldr r0,[pc], that is a real instruction. ldr means load
register, or load into a register the value at some address. The
address, for syntax parsing and human readable purposes, is in
[brackets]. And in this case it is the program counter. So get the
address that is in the program counter, read from memory at that
address, and place that value in r0.

Now you should be asking, but isn't the pc the address of our
instruction? Shouldn't that load 0xe59f0000 instead of 0x00001234? Well
this is one of those pipeline things you may have heard of. These days
the pipe is actually deeper, but for backward compatibility the rule
stayed the same and we just have to know it. For ARM the rule is
whenever you use the pc in an instruction it points two instructions
ahead, that is, it points at the address after the next instruction.
In this case the bx lr is the next instruction, so while we are
executing the instruction at address 0 the pc is pointing at address 8,
the pc contains 0x00000008, two ahead. So this is actually loading from
memory at address 0x00000008, which holds the value 0x00001234, and
putting that 0x00001234 into r0.
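
The two ahead rule is simple arithmetic, sketched here in C. The
function names are invented for illustration. The second function
assumes the immediate offset form of ldr with the pc as the base
register (12 bit offset in bits 11..0, bit 23 choosing add or
subtract), which is the form e59f0000 uses.

```c
#include <stdint.h>

/* Value an ARM mode instruction sees when it reads the pc: its own
   address plus 8, i.e. two 4 byte instructions ahead. */
uint32_t arm_pc_as_operand(uint32_t insn_addr)
{
    return insn_addr + 8u;
}

/* Address read by "ldr rd,[pc,#offset]" located at insn_addr. */
uint32_t ldr_pc_relative_address(uint32_t insn_addr, uint32_t insn)
{
    uint32_t offset = insn & 0xFFFu;              /* bits 11..0 */
    uint32_t base = arm_pc_as_operand(insn_addr);

    /* Bit 23 set means add the offset, clear means subtract it. */
    return (insn & (1u << 23)) ? base + offset : base - offset;
}
```

For our ldr at address 0 with machine code 0xe59f0000 the offset is
zero, so ldr_pc_relative_address(0, 0xe59f0000) is 8, the address where
the 0x00001234 lives.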

Moving on.

The problem with the code so far, from a "how do I see an add"
perspective, is that we have told the compiler what the inputs to the
addition are, so the compiler can do that addition for us at compile
time instead of at runtime. If we want to see the compiler generate an
add operation then we have to hide the operands from it by making them
inputs to the function. We also need the function to do something, so
we have to return something as well, and to see the add the function
has to return the sum or something derived from it. Otherwise the
addition serves no purpose and will be removed as dead code.

unsigned int fun ( unsigned int a, unsigned int b )
{
    return(a+b);
}

Compiling that two parameter fun() generates

00000000 <fun>:
   0:   e0800001    add r0, r0, r1
   4:   e12fff1e    bx  lr

Okay, should have said something by now but here goes. Compilers use
a calling convention in order to manage the code being generated. It
is up to the compiler at the end of the day what that convention is.
Some processor vendors will try to encourage or dictate the calling
convention, figuring they know more about their processor and how
it interacts with compiled code. Sometimes not. Sometimes the same
processor may have different conventions from different compilers or
versions of compilers. Naturally objects made with different conventions
won't necessarily link together and run.

A calling convention, by this or other names, is a list of rules if you
will for knowing where to find the inputs to a function, where to place
the output, in some cases where to find the return address, and so on.

In the case of ARM, for these simple 32 bit variables the first
parameter, a in this case, will always be in the r0 register, the
second in r1, and so on up to r3. After those four registers
(r0,r1,r2,r3) are used, the stack holds the rest. We will get to the
stack later, and we may get to more complicated situations where the
r0-r3 usage gets more confusing.

For the time being the function parameters are in r0,r1,r2... and the
compiler can assume this when it compiles a function. You may have
noticed we are not compiling an entire program, we are only compiling
one function into one object. Yet the compiler knows where the operands
are because it always uses the same set of rules. In this case a comes
into the function in r0 and b comes into the function in r1.
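
As a quick reference for the rule just described, here is a sketch
that maps a parameter position to where it arrives. It only covers the
simple case in this text (plain 32 bit parameters), and both function
names are hypothetical. sum6 is there so you have something with more
than four parameters to compile and disassemble yourself.

```c
/* Where the n-th parameter (counting from 0) arrives on entry to a
   function, for simple 32 bit parameters only. */
const char *arm_param_location(unsigned index)
{
    static const char *const regs[4] = { "r0", "r1", "r2", "r3" };
    return index < 4 ? regs[index] : "stack";
}

/* a..d arrive in r0..r3, e and f have to come in on the stack. */
unsigned int sum6(unsigned int a, unsigned int b, unsigned int c,
                  unsigned int d, unsigned int e, unsigned int f)
{
    return a + b + c + d + e + f;
}
```

Compiling sum6 the same way as the other examples should show loads
from the stack for the last two parameters, though the exact
instructions will depend on your compiler.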

The add operation above, like many of the ARM math operations, can be
read by your mind this way

add r0,r0,r1 ; the syntax
r0=r0+r1     ; what you should see/think

Now I left this out before, but these functions are returning something
as well. The calling convention, so much as we need to know for now,
puts the return value in r0. Just like we know every caller built by
this compiler will place the operands in the same registers before
calling our function, our function will place the return value in a
known place so the function we return to can find that return value.

So we have successfully prevented the compiler from optimizing out our
addition as dead code by hiding the inputs and forcing the result to be
an output.
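
Here are the three situations side by side as a recap. The function
names are made up so all three can live in one file, and what the
optimizer does with each is what I would typically expect at -O2, not
a guarantee for every compiler.

```c
/* No inputs and no outputs: the optimizer is free to drop all of it. */
void fun_dead(void)
{
    unsigned int a = 5;
    unsigned int b = 7;
    unsigned int c;

    c = a + b;
    (void)c; /* only silences the unused variable warning */
}

/* The result is returned but both inputs are known at compile time,
   so the compiler can fold the addition into the constant 12. */
unsigned int fun_folded(void)
{
    unsigned int a = 5;
    unsigned int b = 7;

    return a + b;
}

/* Inputs unknown until runtime and the result is returned: compiled
   on its own, this one has to contain a real add. */
unsigned int fun_live(unsigned int a, unsigned int b)
{
    return a + b;
}
```

Compile this file with the same two commands and compare the three
functions in the disassembly. Only fun_live should contain an add.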

Let's get slightly more complicated.

unsigned int fun ( unsigned int a, unsigned int b )
{
    return(a+b+7);
}

00000000 <fun>:
   0:   e2811007    add r1, r1, #7
   4:   e0810000    add r0, r1, r0
   8:   e12fff1e    bx  lr

So the compiler chose to add 7 to b (b is held in r1 here) and then
add a to that, leaving the result in r0. We know that a+b = b+a, so why
the compiler did it that way, r0,r1,r0 instead of r0,r0,r1, we may
never know. It works.
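
If you want to convince yourself those instructions compute a+b+7, you
can step through them by hand with an array standing in for the
registers. This is a toy sketch with a made up function name, not an
emulator.

```c
#include <stdint.h>

/* Walks through the three instructions of fun, using r[0]..r[15] as
   stand-ins for the ARM registers r0..r15. */
uint32_t run_fun(uint32_t a, uint32_t b)
{
    uint32_t r[16] = { 0 };

    r[0] = a;           /* calling convention: a arrives in r0 */
    r[1] = b;           /* calling convention: b arrives in r1 */

    r[1] = r[1] + 7u;   /* add r1, r1, #7 */
    r[0] = r[1] + r[0]; /* add r0, r1, r0 */

    return r[0];        /* bx lr: the caller finds the result in r0 */
}
```

run_fun(3, 4) returns 14, the same as 3+4+7 in the C source.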
42
learnasmfromc/build_arm
Executable file
@@ -0,0 +1,42 @@

# Usage
# sudo ./build_arm

# Setup vars
export TARGET=arm-none-eabi
export PREFIX=/opt/gnuarm
export PATH=$PATH:$PREFIX/bin
# Number of parallel make jobs, adjust to suit your machine
export JN='-j 8'

# Start clean: remove old build directories, sources and archives
rm -rf build-*
rm -rf gcc-*
rm -rf binutils-*

# Get archives
wget http://ftp.gnu.org/gnu/binutils/binutils-2.23.2.tar.bz2
wget http://ftp.gnu.org/gnu/gcc/gcc-4.8.2/gcc-4.8.2.tar.bz2

# Extract archives
bzip2 -dc binutils-2.23.2.tar.bz2 | tar -xf -
bzip2 -dc gcc-4.8.2.tar.bz2 | tar -xf -

# Build binutils
mkdir build-binutils
cd build-binutils
../binutils-2.23.2/configure --target=$TARGET --prefix=$PREFIX
# Replace makeinfo with the no-op ":" so the docs are not built
echo "MAKEINFO = :" >> Makefile
make $JN all
sudo make install

# Build GCC
mkdir ../build-gcc
cd ../build-gcc
../gcc-4.8.2/configure --target=$TARGET --prefix=$PREFIX --without-headers --with-newlib --with-gnu-as --with-gnu-ld --enable-languages='c'
make $JN all-gcc
sudo make install-gcc

# Build libgcc.a
make $JN all-target-libgcc CFLAGS_FOR_TARGET="-g -O2"
sudo make install-target-libgcc
5
learnasmfromc/test.c
Normal file
@@ -0,0 +1,5 @@

unsigned int fun ( unsigned int a, unsigned int b )
{
    return(a+b+7);
}