From 9adfc2d3edafabad58691db38a583a5a73e9a648 Mon Sep 17 00:00:00 2001 From: dwelch Date: Tue, 19 Jan 2016 10:56:10 -0500 Subject: [PATCH] more typos fixed in baremetal --- baremetal/README | 108 ++++++++++++++++++++++++++--------------------- 1 file changed, 59 insertions(+), 49 deletions(-) diff --git a/baremetal/README b/baremetal/README index fc58a44..86dbaf3 100644 --- a/baremetal/README +++ b/baremetal/README @@ -1,6 +1,7 @@ this is a rough draft, if/when I complete this draft I will at some point go back through and rework it to improve it. Update: draft 2. I went through almost all of this and cleaned it up. +Update: draft 3. Lots of typos and misspellings that I had missed before THIS IS NOT AN ASSEMBLY LANGUAGE TUTORIAL, IT DOES HAVE A LOT OF ASSEMBLY LANGUAGE IT IT. IF YOU ARE STUCK FOCUSING ON THE ASSEMBLY @@ -881,14 +882,14 @@ orange global variables way above. Data actually is broken up into different segments sometimes, and in particular with the GNU tools. Most of the code out there that has global variables the globals are not defined, not initialized in the code, but the language declares -those are assumed to be zero when you start using them (if you have -not changed them before you used them). So there is a special data -segment called .bss which holds all of our global variables that when -we start running C code should be zero. These are lumped together so -that some code can easily go through that chunk of memory and zero that +those as assumed to be zero when you start using them (if you have +not changed them before you read them). So there is a special data +segment called .bss which holds all of our .data that when we start +running C code should be zero. These are lumped together so that some +code can easily go through that chunk of memory and zero that area before branching to the C entry point. Another segment we may encounter is the .rodata segment. Sometimes even with GNU tools you -may find the read only data in the .text segment. 
+may find the read only data in the .text segment. For fun lets make one of each: @@ -932,10 +933,10 @@ Well notice that I used -O2 on the gcc command line this means optimization level 2. -O0 or optimizaiton level 0 means no optimization -O1 means some and -O2 is the maximum safe level of optimization using the gcc compiler. There is a -O3 but we are not supposed to trust that -to be as tested as -O2. I am not going to get into that but recommend -you use -O2 often, esp with embedded bare metal where size and speed -are important. I use it here because it produces much less code than -no optimization, you can play with compiling and disassembling these +to be as well tested as -O2. I am not going to get into that but +recommend you use -O2 often, esp with embedded bare metal where size and +speed are important. I use it here because it produces much less code +than no optimization, you can play with compiling and disassembling these things on your own with less or without optimization to see what happens. @@ -1056,7 +1057,8 @@ the value 9 that we pre-initialized. I want to point something out here that is very important for general bare metal programming. What do we have above, something like 12 32 bit numbers which is 12*4 = 48 bytes. So if I make this a true -binary we should see 48 bytes right? Well you would be wrong: +binary (memory image) we should see 48 bytes right? Well you would be +wrong: baremetal > ls -al hello.elf -rwxr-xr-x 1 root root 38002 Sep 23 15:06 hello.elf @@ -1126,7 +1128,7 @@ There are 0x60000000 bytes between these two items, that means the binary file created would at least be 0x60000000 bytes which is 1.6 GigaBytes. If you are like me you probably dont always have 1.6Gig of disk space handy. Much less wanting it to be filled with a -singel file which is mostly zeros. You can start to see the appeal for +single file which is mostly zeros. 
You can start to see the appeal for these not really a binary binary file formats like elf and ihex and srec. They only define the real data and dont have to hold the zero filler. @@ -1520,7 +1522,7 @@ variable pear now has its own address in memory, it did not get optimized out. I dont expect you to know assembly language but what I want to you to -see is a continuation what we discussed before with respect to the +see is a continuation of what we discussed before with respect to the branch link instruction and the link register. The ARM instruction set uses branch link (bl) to make function calls. The branch means goto or jump or branch the program to some address. The link means @@ -1689,7 +1691,7 @@ runtime=end-start; And this may lead you to believe that this is not the code causing your performance problems. Or hopefully you realize that this code -is executing way to fast and there is something wrong with your +is executing way too fast and there is something wrong with your experiment. Knowing enough assembly code to see what is going on will clue you into the optimization, just like in the notmain() example above. @@ -1705,10 +1707,9 @@ compiler to do what you want or of you have borrowed some code you might have to have GCC do the assembling or linking. Some folks like to put C stuff like defines and comment symbols in their assembler code which works fine if you feed it through gcc, but it is not assembly -code it is some sort of hybrid. Doesnt stop people from doing it, and -when you borrow that code you either have to fix the code or use the C -compiler as an assembler. - +language it is some sort of hybrid. Doesnt stop people from doing it, +and when you borrow that code you either have to fix the code or use the +C compiler as an assembler. bootstrap.s @@ -2004,7 +2005,7 @@ instructions provide some cost and performance benefits for embedded systems. 
First off you can pack more instructions into the same amount of memory, understanding that it may take more instructions to perform the same task using thumb instructions than it would have using -ARM. My experiements at the time showed about 10-15% more instructions, +ARM. My experiments at the time showed about 10-15% more instructions, but half the memory so that was a fair tradeoff. I know of one platform that went so far as to use 16 bit memory busses, which actually made thumb mode run much faster than ARM mode on that platform. That @@ -2021,7 +2022,13 @@ bits you can have in that register. Note that that lower bit is stripped off it is only used by the bx instruction itself the address in the program counter always has the lower two bits zero for ARM mode (4 byte instructions) and the lower bit zero for -thumb instructions (2 or 4 byte instructions). +thumb instructions (2 or 4 byte instructions). Note the bx/blx +instruction is not the only way to switch modes, sometimes you can +use the pop instruction, but bx works the same way on all ARM +architectures that I know of, the other solutions (pop for example) +vary in if/how they work for switching modes depending on the ARM +architecture in question. So that makes for very unportable code +across ARM if you are not careful. When in doubt just use BX. Here again the goal is not to teach assembly but you may want to get the ARM Architectural Reference Manual for this platform @@ -2054,19 +2061,18 @@ least try. Assembly language in general does not have a standard. A company designs a chip, which means they create an instruction set, binary machine code instructions, and generally they create an assembly language so that they can write down and talk about those -instructions without going insane with confusion and/or pain. 
And -not always but often if that company actually wants to sell those -processors they create or hire someone to create an assembler and +instructions using mnemonics instead of patterns of ones and zeros. +And not always but often if that company actually wants to sell those +processors, they create or hire someone to create an assembler and a compiler or few. Assembly language, like C language, has directives that are not actually code like #pragma in C for example you are using that to talk to the compiler not using it as code -necessarily. Assembly has those as well, many of them. The vendor -will often at a minimum use the syntax for the assembly language -instructions in the manual they create or have someone create to -provide to users of this processor they want to sell and if smart -will have the assembler match that manual. But that manual although -you might consider it a standard, is not, the machine code is the -hard and fast standard, the ASCII assembly language is fair game and +necessarily. Assembly has those as well, many of them. It is in the +processor vendors best interest to use the same assembly language +syntax for the instructions in the processor manual in the assembler +that they create or have someone create for them. But that manual +although you might consider it a standard, is not, the machine code is +the hard and fast standard, the ASCII assembly language is fair game and anyone can create their own assembly language for that processor with whatever syntax and directives that they want. ARM has a nice set of compiler tools, or at least when I worked at a place that paid @@ -2084,7 +2090,9 @@ instead of @ because this ; is the proper, almost universal, symbol for a comment in assembly languages from many vendors. This @ is not. Combined like this ;@ and you get code that is commented in both worlds equally. 
Enough with that rant, this asm code will continue to be GNU -assembler specific I dont know if it works on any other assembler. +assembler specific as that is the toolchain I am using, I dont know if +it works on any other assembler, I keep the directives to a bare +minimum though. Another side effect of thumb and in particular thumb2 is that ARM decided to change their syntax in subtle ways to come up with a unified @@ -2210,16 +2218,16 @@ instructions or at least until I tell you otherwise. the .thumb directive is me telling the assembler otherwise. Start assembling using 16 bit thumb instructions. Yes the bl is actually two separate 16 bit instructions and are documented by ARM as such, but always shown -as a pair in disassembly. +as a pair in disassembly. It is not a 32 bit instruction. The .thumb_func is used to tell the assembler that the label that follows is branch destination for thumb code, when you see this label set the lsbit so that I dont have to play any games to switch or stay in the right mode. You can see that the thumbstart label -is at address 0x8010, but the thumb_start add is 0x8011, the thumbstart +is at address 0x8010, but the thumbstart_add is 0x8011, the thumbstart address with the lsbit set, so that when it hits the bx instruction it tells the processor that we want to be in thumb mode. Note that -bx is used even if you are staying in the same mode, that is the key +bx can be used even if you are staying in the same mode, that is the key to it, if you have used the proper address you dont care what mode you are branching to. You can write code that calls functions and the code making the call can be thumb mode and the code you are @@ -2385,16 +2393,17 @@ address 0x8024, which being a trampoline to bounce off of, that instruction bounces us back to 0x8018 which is the ARM instruction we wanted to get to. this is all good, this code will run properly. 
+ You may or may not know that compilers for a processor follow a "calling convention" or binary interface or whatever term you like. It is a set of rules for generating the code for a function so that you can have functions call functions call functions and any function can return values and the code generated will all work without having to have some secret knowledge into the code for each function calling it. -conform to the calling convention and the code will all work together. +Conform to the calling convention and the code will all work together. Now the conventions are not hard and fast rules any more than assembly -language is a standard for any particular processor. these things -change from time to time in some cases. For the arm, in general across +language is a standard for any particular processor. These things +change from time to time in some cases. For the ARM, in general across the compilers I have used the first four registers r0,r1,r2,r3 are used for passing the first up to 16 bytes worth of parameters, r0 is used for returning things, etc. I find it surprising how often @@ -2424,7 +2433,7 @@ Disassembly of section .text: So what did I just figure out? Well if I had that function in C and used that compiler and linked in that object code it would work with other code created by that compiler, so that object code must follow -the calling convention. what I figured out is from that trivial experiment +the calling convention. 
What I figured out from that trivial experiment is that if I want to make a function in assembly code that uses two inputs and one output (unsigned 32 bits each) then the first parameter, a in this case, is passed in r0, the second is passed in r1, and the @@ -2439,14 +2448,15 @@ Disassembly of section .text: 4: 44 00 48 00 l.jr r9 8: e1 64 18 00 l.add r11,r4,r3 -Call me twisted an evil toward you but, what I see here is that -the first parameter is passed in register r3, the second parameter +This is not ARM but some completely different instruction set, and the +compiler for it has a different calling convention. What I see here is +that the first parameter is passed in register r3, the second parameter is passed in r4 and the return value goes back in r11. and it just so happens that the link register is r9. Yes, it is true that I have not yet figured out what registers I can modify without preserving them and what registers I have to -preserve, etc, etc. You can figure that out with these simple experiements +preserve, etc, etc. You can figure that out with these simple experiments with practice. Because sometimes you may think you have found the docment describing the calling convention only to find you have not. And as far as preservation, if in doubt preserve everything but the @@ -2455,8 +2465,8 @@ return registers... So if you have looked at my work you see that I prefer to perform singular memory accesses using hand written assembly routines like PUT32 and GET32. Not going to say why here and now, I have mentioned -it elsewhere and it doesnt matter for this discussion. Moving on, lets -do a quick thumb experiment: +it elsewhere and it doesnt matter for this discussion. 
Lets accept +it and move on to use it, a quick thumb experiment: baremetal > arm-none-eabi-gcc -mthumb -O2 -c fun.c -o fun.o @@ -2567,12 +2577,12 @@ Disassembly of section .text: So we start in arm, use 0x8011 to swich to thumb mode at address 0x8010 trampoline off to get to 0x801C entering notmain in ARM mode. and we -branch link to another trampoline. this one is not complicated as -we did this ourselves right after _start. load a register with +branch link to another trampoline. This one is not complicated as +we did this ourselves right after _start. Load a register with the address orred with one. 0x8017 fed to bx means switch to thumb -mode and branch to 0x8016 which is our put32 in thumb mode. +mode and branch to 0x8016 which is our PUT32 in thumb mode. -lets go the other way, put32 in ARM mode called from thumb code +lets go the other way, PUT32 in ARM mode called from thumb code baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o @@ -2620,7 +2630,7 @@ Disassembly of section .text: And we did it, this code is broken and will not work. Can you see the problem? PUT32 is in ARM mode at address 0x8010. Notmain is thumb code. You cannot use a branch link to get to ARM mode from -thumb mode you have to use bx (or blx). the bl 0x8010 will start +thumb mode you have to use bx (or blx). The bl 0x8010 will start executing the code at 0x8010 as if it were thumb instructions, and you might get lucky in this case and survive long enogh to run into the thumbstart code which in this case puts you right back into @@ -2630,7 +2640,7 @@ and will cause an undefined instruction exception which if you bothered to make an exception handler for you might start to see why the code doesnt work. -it was very easy to fall into this trap, and very very hard to find +It was very easy to fall into this trap, and very very hard to find out where and why the failure is until you have lived the pain or been shown where to look. 
Even with me showing you where to look you may still end up spending hours or days on this. But as you do know