From 098d730b68330c333695e36d3a9dd5d0cae1dd65 Mon Sep 17 00:00:00 2001
From: dwelch67 <dwelch@dwelch.com>
Date: Thu, 17 Apr 2014 15:23:37 -0400
Subject: [PATCH] baremetal draft 2

---
 baremetal/README | 1140 +++++++++++++++++++++++++---------------------
 1 file changed, 614 insertions(+), 526 deletions(-)

diff --git a/baremetal/README b/baremetal/README
index 36bdcbd..4f9e274 100644
--- a/baremetal/README
+++ b/baremetal/README
@@ -1,5 +1,17 @@
 this is a rough draft, if/when I complete this draft I will at some point
 go back through and rework it to improve it.
+Update: draft 2.  I went through almost all of this and cleaned it up.
+
+THIS IS NOT AN ASSEMBLY LANGUAGE TUTORIAL, IT DOES HAVE A LOT OF
+ASSEMBLY LANGUAGE IN IT.  IF YOU ARE STUCK FOCUSING ON THE ASSEMBLY
+LANGUAGE YOU ARE MISSING OUT, THE FOCUS IS CONTROLLING THE TOOLS SO
+THAT THINGS ARE PLACED WHERE WE WANT THEM TO BE PLACED SO THE PROCESSOR
+BOOTS RIGHT AND LAUNCHES OUR C PROGRAM, AND SO OUR C FUNCTIONS CAN
+CALL OTHER C FUNCTIONS.  ASSEMBLY LANGUAGE KNOWLEDGE IS NOT REQUIRED
+FOR THIS TUTORIAL.  ASSEMBLY LANGUAGE KNOWLEDGE IS NOT REQUIRED FOR
+THIS TUTORIAL.  ASSEMBLY LANGUAGE KNOWLEDGE IS NOT REQUIRED FOR THIS
+TUTORIAL.
+
 
 
 
@@ -16,51 +28,48 @@ basics using the Raspberry Pi.
 First and foremost, what is bare metal programming?  You are going to
 get different answers to that question from people who say they are
 bare metal programmers.  I would say most of them are right despite the
-difference of opinion.
+difference of opinion on specific details.
 
 To try to generalize my opinion of this I would start by saying that
 bare metal programming means you are talking to the hardware directly,
 bypassing an operating system, or certainly if you have no real/formal
 operating system running.  Processors/computers do not require operating
 systems to run.  Operating systems are just programs anyway themselves
-perhaps being considered bare metal programming.  You start by understanding
-how the processor boots, how and where it loads and executes its
-first instruciton, and then making programs that fit that model, placing
-the first instruction of your program such that the processor executes
-it when it boots.
+perhaps being considered bare metal programming.
+
+
+To begin bare metal programming you start by understanding how the
+processor boots, how and where it loads and executes its first
+instruction, and then making programs that fit that model, placing the
+first instruction of your program such that the processor executes it
+when it boots.
 
 The second generalization I will make is that with bare metal programming
 you are often programming registers and memory for peripherals directly.
-For example printf() is not bare metal, way to many layers of stuff
-often landing in system calls which are often tied to an operating system.
-That doesnt mean you cant rig up a printf that works in a bare metal
-environment, but it does contradict the concept of bare metal.  This
-of course is a gray area for the definition.  For example if you wanted
-to read items off of or write things to the sd card, using a filesystem
-most programmers even if they create all the code from scratch are going
-to end up with some sort of layered approach, at one end is low level
-bare metal talking to registers that wiggle things on a bus somewhere
-on the other end some sort of open file or create file, read file, close
-file, etc.  Being your own creation it doesnt have to conform to any
-other file function call standard fopen(), fclose(), etc.  So what
-happens when one person writes some bare metal code, no operating system
-involved, that can open, read, write, close files on the sd card on
-the raspberry pi, then shares that code?  Does it lose its bare metal
-status?  Tough question.  I would say no, but at the same time if you
-look around at most of my public work I am trying to teach how to
-use some of the peripherals in a device by programming them directly,
-I am usually not interested in borrowing other chunks of code, I am
-personally not interested in making some robot or whatever that performs
-a task, I want to turn on an led, find out how to program the uart
-directly so that it works, etc.
+For example printf() is not bare metal, there are way too many layers of
+stuff often landing in system calls which are often tied to an operating
+system.  That doesnt mean you cant rig up a printf that works in a bare
+metal environment, but it does contradict the concept of bare metal.
+This of course is a gray area for the definition.  For example if you
+wanted to read items off of or write things to the sd card, using a
+filesystem most programmers even if they create all the code from
+scratch are going to end up with some sort of layered approach, at one
+end is low level bare metal talking to registers that wiggle things on a
+bus somewhere on the other end some sort of open file or create file,
+read file, close file, etc.  Being your own creation it doesnt have to
+conform to any other file function call standard fopen(), fclose(),
+etc.  So what happens when one person writes some bare metal code, no
+operating system involved, that can open, read, write, close files on
+the sd card on the raspberry pi, then shares that code?  Is that bare
+metal? Tough question.
 
 I have seen some folks argue that you are not bare metal if you are
 not writing in assembly.  I would argue back maybe you are not bare
-metal if you are not writing machine code directly.  I keep my bare
-metal definition to no operating system (unless the operating system
-IS the bare metal program you are writing) and programming peripherals,
-etc, directly from your program.  Or at least not through some system
-calls in a rom monitor/debugger nor an operating system.
+metal if you are not writing machine code.  I keep my bare metal
+definition to no operating system (unless the operating system IS the
+bare metal program you are writing) and programming peripherals,
+etc, directly from your program, or through libraries but not through
+an operating system.
 
 To continue this tutorial you are going to be exposed to my personal
 preferences which are not a bare metal thing in general but my personal
@@ -70,50 +79,53 @@ manuals and other things and am trying to share some of those experiences
 at the same time when I had been around the block fewer times I was that
 person that refused to take someone elses code as is.  I always had to
 rewrite it myself before even trying it.  What I have learned since is
-that unless the other persons programming environment or tools or whatever
-are not so painful to get up and running, you should make an attempt to
-use their environment with their code the way they do it.  For these
-kinds of things that you have not learned and dont know how to do but
-the author appears to know how to do.  THEN, start to make that code
-your own.  Eventually if you are like me, completely replacing all of
-it including the environment.  Other than the potential pain of trying
-to get their environment up and running, this path of just trying it
-their way then re-inventing the wheel to make it your own, will have
-greater success sooner and less frustration.
+that unless the other persons programming environment or tools or
+whatever are too painful to get up and running, you should make an
+attempt to use their environment with their code the way they do it,
+at least for the kinds of things that you have not learned and dont
+know how to do but the author appears to know how to do.  THEN, start
+to make that code your own, eventually, if you are like me, completely
+replacing all of it including the environment.  Other than the potential
+pain of trying to get their environment up and running, this path of
+trying it their way first, then re-inventing the wheel to make it your
+own, will lead to success sooner and with less frustration.
 
 I assume you are running linux.  The things I am doing here for the
-most part can be done easily in Windows or on a mac, but I am not going
+most part can be done easily in Windows or on a Mac, but I am not going
 to get into explaining certain things three times or N times to cover
 all the possible operating system variations.  I tend to run a 64
-bit linux, often a bit older as I hated what Ubuntu did and gnome, but
-since linux mint fixed some of those ubuntu/gnome problems I am a bit
-closer to the most current releases.  I have a number of computers
-or laptops that I develop on and not all run the same distro or version.
-For the most part the focus will be on using the gnu tools (binutils
-and gcc) and other than forward slashes vs backslashes in path names
-there should be nothing operating system specific about this discussion.
+bit linux.  I switched from Ubuntu to Linux Mint when the post gnome 2
+disaster happened.  Linux Mint has worked to salvage the linux desktop
+for everyone else and I am using Mint now.  I do have a number of
+computers or laptops that I develop on and not all run the same distro
+or version.  For the most part the focus will be on using the gnu tools
+(binutils and gcc) and other than forward slashes vs backslashes in
+path names there should be nothing operating system specific about
+this discussion.
 
 So as soon as we say no operating system, we open a big can of worms.
 That is as big a problem as the fear of programming peripherals directly,
-perhaps the biggest problem of bare metal programming.  Why is it a problem?
-Well lets think about the classic hello world C program and maybe what
-you do or dont realize is going on.  In some way shape or form you have
-installed a C compiler on your computer, and they tell you how to
-compile your first hello world program and it works.  One or a few
-includes, the main() function and a single printf() call.  Well there
-is a HUGE amount of stuff behind that program, it is not one trivial
-line of code.  A myriad of C libraries required, math libraries, etc
-all to support the uber generic printf function and whatever format
+perhaps the biggest problem of bare metal programming.  Why is it a
+problem?  Well lets think about the classic hello world C program and
+maybe what you do or dont realize is going on.  In some way shape or
+form you have installed a C compiler on your computer, and they tell
+you how to compile your first hello world program and it works.  One or
+a few includes, the main() function and a single printf() call.  Well
+there is a HUGE amount of stuff behind that program, it is not one
+trivial line of code.  A myriad of C libraries required, math libraries,
+etc all to support the uber generic printf function and whatever format
 string you might send to it.  That is just scratching the surface
 the C libraries that are linked in, a number of them have an intimate
 relationship with the operating system.  The C libraries nor printf
 code itself handles the console directly, it makes calls to the operating
 system and its myriad of drivers that ultimately illuminate pixels on
-the screen.  When you go bare metal YOU have to do all of this, a
-hello world printf() program is NOT your first bare metal program.
+the screen.  When you go bare metal YOU have to do all of this, a hello
+world printf() program should NOT be your first bare metal program.
 Generally your first bare metal program is turning an led on and off
 assuming the hardware folks have provided an led you can turn on and
-off with software (usually a good idea for them to do that).
+off with software (usually a good idea for them to do that).  Later
+comes a uart sending individual characters, then later a string, but a
+formatted string, perhaps never.
 
 Note this discussion is limited to assembly language and C.  This is one
 of those personal preference things.  In my opinion if you want to be
@@ -123,115 +135,151 @@ into your C program and perhaps support interrupts or other exceptions.
 You should work to make your C programming strong though.
 
 Another one of my simplifications in life is I try to avoid C library
-calls in my bare metal C programs and even worse I try to avoid
-compiler specific library calls, we will see what that means in a bit.
+calls in my bare metal C programs and further I try to avoid compiler
+specific library calls, we will see what that means in a bit.
 
-So when we write programs using our C compiler that run on the same
-computer that we are writing and compiling the programs on, means the
-compiler itself is made up of instructions native to that processor
-and is creating programs using instructions native to that processor.
-The raspberry pi uses an ARM processor, most computers out there (I
-include laptops when I say computers in this context) are running
-intel chips using some flavor of the x86 instruction set.  ARM is
-a completely separate company from intel and their processors use a
-completely different and incompatible in any way instruction set.  So
-there is a good chance you need a cross compiler.  A cross compiler
-loosely means you are crossing over a boundary from one processor
-to another.  In this case a compiler that is made up of x86 instructions
-that is creating programs that use ARM instructions.  And then it gets
-worse than that there are a myriad of C compilers out there some
-only run on certain operating systems, some or more flexible, some can
-be made to be cross compilers, some cannot.  Some are easy to turn into
-a cross compiler, some are not.  This tutorial is going to focus
-primarily on the gnu toolchain, which is one of those that can be used
-as a cross compiler but is not trivial to make it a cross compiler.
+A C compiler is just a program that takes an input and produces an
+output.  That program is compiled to run on a particular computer, my
+computer.  That compiler's job is to create other programs that will
+also run natively on my computer.  The raspberry pi uses an ARM
+processor, most computers out there (servers, desktops and laptops) are
+running some flavor of the x86 instruction set, generally Intel or AMD
+chips.  ARM is a completely separate company from Intel and AMD and
+its processors use a completely different instruction set that is
+incompatible in every way.  On a side note Intel and AMD make chips,
+ARM does not make chips, it just sells its processor designs to people
+who make chips.  It is quite possible to use a compiler on my computer
+to generate a program that runs on an ARM processor.  The general term
+for a compiler that runs on one computer but produces output
+(instructions) for another computer/instruction set is a cross compiler.
+Just because a compiler is open source doesnt mean that that compiler
+can be made to be a cross compiler.  Some/many compilers in history are
+targeted to their native platform and are not cross compiler capable.
+GCC is designed to generate code for many different instruction sets
+on the backend, and it can be built as a cross compiler, but the way
+GCC works you need to build gcc separately for each architecture you
+want to target.  LLVM/Clang for example was designed with Just In
+Time compilation in mind, so its intermediate output remains mostly
+target independent until Just In Time.  It also contains, and I would
+assume this is the more widely used path, a backend that turns it into
+a compiler, you dont have to wait until JIT, you can get targeted
+output now.  To take this further, LLVM in its default build has all
+the targets built in at once, you build LLVM/Clang one time and can use
+it as a cross compiler for many targets, you dont have to do a separate
+build per target.  And just because a compiler CAN be built as a cross
+compiler doesnt mean it is a good compiler, the more generic you get
+the more you take away from tuning for a particular instruction set.
+Both the GNU tools and LLVM do a pretty good job in general for each
+target.  Understand that each target is maintained to some extent by
+individuals, and different individuals produce different quality code,
+so either of these toolchains might have a bad apple or two due to
+the maturity of the target or the individual or team working on it,
+while other targets may be mature.
+
+This tutorial is going to focus primarily on the gnu toolchain,
+which is one of those that can be used as a cross compiler but is not
+trivial to build as one.
 
 Fairly soon you will need some tools.  At first we only need binutils
-which is gnu's collection of assembler and linker tools.  there are
+which is GNU's collection of assembler and linker tools.  There are
 other tools in there, the assembler and linker are the first we care
 about.  This is NOT a tutorial on teaching assembly language, you will
 see some, but just enough to get a C programming running.  That means
 we will need a C compiler as well fairly soon.   Now I say that this
 is a non-trivial task.  The more trivial way to do this is to go to
 http://codesourcery.com (which is not codesourcery anymore but now
-part of mentor graphics, it is easier on me to just remember the codesourcery
-link).  You are looking for the Lite version of their compiler this
-is a free version (you might have to give up an email address to get it)
-of their tools.  Not limited necessarily, just means that you dont get
-any tech support for it.  If you get a pay-for version from them then
-you get some level of support for the toolchain.  Now because of how
-I use the gnu tools (no C libraries, no gcc libraries) it doesnt matter
-which one you get the Linux compiler or the eabi compiler will both
-work just fine.  The non-linux, eabi compiler is the more correct one
-to use for bare metal programming.  Another tool alternative is to
-go and find one of the hobby gnu based toolchains, winarm, yagarto, devkitarm,
-etc.  Or you can build your own...sometimes...and sometimes that can
-turn into a long research project.   The buildgcc directory of this
-Raspberry Pi repository has scripts for building on linux, now there are
-a number of packages you need to install before that will work and
-I am not going to get into all of that.  Another path would be to
-have buildroot build you a toolchain.  Buildroot's goal is to build
-something to run on your system, and to do that it needs a cross compiler
-and to do that it tries to do all the work for you, so you are likely
-to end up with a longer build time and a lot more stuff that you wanted
-but you might have better success actually getting a cross compiler
-built from sources if that is interesting.
+part of mentor graphics, it is easier on me to just remember the
+codesourcery link).  You are looking for the Lite version of their
+compiler this is a free version (you might have to give up an email
+address to get it) of their tools.  Lite does not necessarily mean
+limited, it just means that you dont get any tech support for it.  If
+you get a pay-for version from them then you get some level of support
+for the toolchain.  Now because of how I use the gnu tools (no C
+libraries, no gcc libraries) it usually doesnt matter which one you
+get, the Linux compiler or the embedded eabi compiler will both work
+just fine.  The non-linux, eabi compiler is the more correct one to
+use for bare metal programming.  This is one of those personal things
+not a general bare metal thing, and the benefit here is that I am only
+relying on the compiler to do the job of compiling, turn C into ASM.
+Dont try to do more than that.  I become less dependent on the specific
+compiler and the code is more portable, more of you and myself too can
+use it over time.
 
-You will need a gnu ARM cross compiler toolchain.  binutils and gcc at
-a minimum, more than that is beyond the scope of this tutorial, have
+Another pre-built toolchain you may want to get, also or instead, is
+https://launchpad.net/gcc-arm-embedded.  Another tool alternative is to
+go and find one of the hobby gnu based toolchains, winarm, yagarto,
+devkitarm, etc.  Or you can build your own...sometimes...and sometimes
+that can turn into a long research project.  I have a build_gcc
+repository at github (https://github.com/dwelch67/build_gcc) that has
+scripts for building gcc based cross compilers for a few targets as
+well as a script for building LLVM/Clang from sources.  These are
+the scripts I use myself and the toolchains built are the ones I use
+at work and at home.  There are a number of packages you will need to
+have installed on your system and I wont get into that here.
+
+So you will need a GNU ARM cross compiler toolchain.  binutils and gcc
+at a minimum, more than that is beyond the scope of this tutorial, have
 fun.  If you cant get that toolchain up you may be stuck at this point.
 Now the one get out of jail free card you have here is that your
-raspberry pi runs linux, and you can get a native, non-cross-compiler
+raspberry pi can run linux, and you can get a native, non-cross-compiler
 ARM gnu toolchain on your raspberry pi when running linux fairly easy.
-At the price point of a raspberry pi, if you want to do it this way
-you might want to have a second raspberry pi.  One as a linux development
-machine where you create the programs and the other as the bare metal
-machine where you try to run those programs.  Where you see
-arm-none-eabi-gcc for example, on an arm based linux system just type
-gcc instead.  if you are using the linux cross compiler you may have
-something like arm-linux-gnueabi-gcc.  If I have done my work right then
-any one of these will work.  if you are on an x86 computer though
-the gcc command by itself WILL NOT WORK.  Let me say that again WILL
-NOT WORK.
+Simply prepare a raspbian sd card and use it.  At the price point of a
+raspberry pi, if you want to do it this way you might want to have a
+second raspberry pi.  One as a linux development machine where you
+create the programs and the other as the bare metal machine where you
+try to run those programs.  Where you see arm-none-eabi-gcc for example,
+on an arm based linux system just type gcc instead.  If you are using
+the linux cross compiler you may have something like
+arm-linux-gnueabi-gcc.  If I have done my work right then any one of
+these will work.  If you are on an x86 computer though the gcc command
+by itself WILL NOT WORK.  Let me say that again WILL NOT WORK.
 
-The first thing we have to learn is how does our processor/computer
-boot.  We have to know this so we can make our program work, we have
-to build our program so that the first instruction in our program
-is placed in the computer such that it is the first instruction
-run by the computer.  The Raspberry Pi is very much NON STANDARD with
-respect to how the ARM is brought up.  ARM processors boot in one of
-two ways normally.  The normal way an ARM boot is the first instruction
-executed its at address 0x00000000.  The Cortex-M processors specifically
-(the Raspberry Pi does NOT use a Cortex-M) the address of the first
-instruction executed is at address 0x00000004, the processor reads
-0x00000004 then uses the value read as an address, and then starts
-executing there.  The Raspberry Pi contains to primary processors one
-is a GPU, a processor dedicated to graphics processing.  It is a fully
-capable general purpose processor with floating point and other features
-that allow it to be used for graphics as well.  The gpu and the ARM
-share the rest of the processor for the most part, they share the same
-RAM, they share the peripherals, etc.  The GPU boots first, it reads
+It is well beyond the scope of this document, but you can also run ARM
+Linux in an emulator like qemu, and within that virtual machine, just
+like running on a Raspberry Pi, you can use a native ARM compiler.
+There are other ARM boards as well, the BeagleBones and such, that can
+natively compile.
+
+For bare metal the first thing we have to learn is how does our
+processor/computer boot.  We have to know this so we can make our
+program work, we have to build our program so that the first
+instruction in our program is placed in the computer such that it is
+the first instruction run by the computer.  The Raspberry Pi is very
+much NON STANDARD with respect to how the ARM is brought up.  ARM
+processors boot in one of two ways normally.  The normal way an ARM
+boots is that the first instruction executed is at address 0x00000000.
+On the Cortex-M processors specifically (the Raspberry Pi does NOT use
+a Cortex-M) the ADDRESS of the first instruction is stored at address
+0x00000004, the processor reads 0x00000004 then uses the value read as
+an address, and then starts executing there.  The Raspberry Pi contains
+two primary processors, one is a GPU, a processor dedicated to graphics
+processing.  It is a fully capable general purpose processor with
+floating point and other features that allow it to be used for graphics
+as well.  The gpu and the ARM share the rest of the chip resources for
+the most part, they share the same RAM, they share the peripherals, etc.
+The GPU boots first, how exactly, I dont know, it eventually reads
 things from the sd card, then it reads the file kernel.img which it
-loads into ram.  Then the gpu controls the ARM boot.
-
-So where does the GPU place the ARM code?  What address?  Well that is
-part of the problem.  From our (users) perspective, the firmware available
-at the time that the Raspberry Pi first hit the streets was placing
-kernel.img in memory such that it is at ARM address 0x00000000.  Understand
-that the purpose for the Raspberry Pi is to run linux (for educational
-purposes) and at least on arm, the linux kernel (also known as a kernel
-image) is typically loaded at ARM address 0x8000.  So those early (to us)
-kernel.img files had 0x8000 bytes of padding.  Later this was changed
-to a typical kernel.img that instead of being loaded at address 0x00000000
-was loaded at 0x00008000.  Since kernel.img is our entry point, it is
-the ARM boot code that we can control, we have to build our program
-based on where this file is placed and how it is used.  The presense of
-a file named config.txt and its contents can change the way the GPU
-boots the ARM, including moving where this file is placed and/or what
-address the ARM boots.  All of these things combined can put the contents
-of the file in memory where you didnt expect and your program may not
-run very long once it goes to an address that does not have the data
-or instructions it needs.
+loads into ram.  Then the gpu controls the ARM boot.  So where does the
+GPU place the ARM code?  What address?  Well that is part of the problem.
+From our (users) perspective, the firmware available at the time that
+the Raspberry Pi first hit the streets was placing kernel.img in
+memory such that the first instruction it executed that we had control
+over was at address 0x00000000.  Understand that the purpose for the
+Raspberry Pi is to run linux (for educational purposes) and at least on
+ARM, the linux kernel (also known as a kernel image) is typically loaded
+at ARM address 0x8000.  So those early (to us) kernel.img files had
+0x8000 bytes of padding.  Later this was changed to a typical kernel.img
+that instead of being loaded at address 0x00000000 was loaded at
+0x00008000.  The GPU would place at address 0x00000000 (where an ARM
+processor like this starts executing) an instruction that would branch
+to the first instruction we controlled at address 0x8000.
+Since kernel.img is our entry point, it is the ARM boot code that we
+can control, we have to build our program based on where this file is
+placed and how it is used.  The presence of a file named config.txt and
+its contents can change the way the GPU boots the ARM, including moving
+where this file is placed and/or what address the ARM boots.  All of
+these things combined can put the contents of the file in memory where
+you didnt expect and your program may not run properly.
 
 Here is another one of my personal preferences to deal with.  I prefer
 to use the most current GPU firmware files from the Raspberry Pi
@@ -241,19 +289,21 @@ only other file beeing kernel.img that I am creating instead of the one
 from the Raspberry Pi folks.  This means that I prefer to deal with
 how the kernel.img file is used for the linux folks.  From the time that
 I received my first Raspberry Pi to the present, the up to date
-bootcode.bin, loader.bin, and start.elf have placed kerne.img at 0x00008000
-in ARM address space, and that is our ARM entry point.  0x00008000 is
-the location for the first ARM instruction that we can control.
+bootcode.bin, loader.bin, and start.elf have placed kernel.img at
+0x00008000 in ARM address space, and that is my ARM entry point.
+0x00008000 is the location for the first ARM instruction that we can
+control.
 
 So now we are ready to approach our first program.  We know that our
 program is a file named kernel.img which is just a binary file that
 is copied to ARM memory space at address 0x00008000.  We have built
 and/or installed a gnu cross compiler for ARM, at a minimum binutils
-and gcc.  And now for another preference of mine, but this is one that
-you will find a number of other folks controlling as well.  If you think
-about your C programming experience, although you may have been taught
-to avoid global variables at all costs you know they exist and you have
-or should have been taught at least something about them.  Even if you
+and gcc.
+
+Now for another preference of mine.  If you think about your C
+programming experience, although you may have been taught to avoid
+global variables at all costs you know they exist and you have or
+should have been taught at least something about them.  Even if you
 have not you have no doubt initialized static local variables:
 
 unsigned int apple;
@@ -265,97 +315,101 @@ int main ( void )
     ...
 }
 
-With the code above as a C programmer you are not only under the impression
-the language dictates that apple will have the value zero, orange and pear
-will have the values indicated in the code when you start.  Now you should
-also know that peach will be undefined, you have to assign it a value
-before you can safely use it.   How does all of that happen?  Is there
-C code that runs before main() is called that prepares memory so that
-your program has those memory locations filled with values?  If that were
-the case and it was C code, and that C code made the same assumptions
-about variables being pre-initialized, would there be C code that preceeds
-that code?  This feels like a "Which came first, the chicken or the egg"
-problem.  But it is not.  The answer is there is some code written in
-assembly language the is executed before main() is called and that assembly
-language code prepares these memory locations so that when your C code
-starts apple, orange and pear have the proper values loaded.  This assembly
-language code is often called the bootstrap code.  A very appropriate
-term for us as that small bit of assembly language code will both be
-the boot code for the ARM, the first instructions, that we control, that
-the ARM runs and it is also the code that we are using to prepare memory,
-etc so that the C programs work as desired.
+With the code above as a C programmer you are under the impression,
+and the language dictates, that apple will have the value zero, and
+orange and pear will have the values indicated in the code when you
+start.  Now you should also know that peach will be undefined, you have
+to assign it a value before you can safely use it.
+-How does all of that happen?
+-Is there C code that runs before main() is called that prepares memory
+so that your program has those memory locations filled with values?
+
+If that were the case and it was C code, and that C code made the same
+assumptions about variables being pre-initialized, would there be C code
+that precedes that code?  This feels like a "Which came first, the
+chicken or the egg" problem.  But it is not.  The answer is there is
+some code written in assembly language that is executed before main() is
+called and that assembly language code prepares these memory locations
+so that when your C code starts apple, orange and pear have the proper
+values loaded.  This assembly language code is often called the bootstrap
+code.  A very appropriate term for us as that small bit of assembly
+language code will both be the boot code for the ARM, the first
+instructions, that we control, that the ARM runs and it is also the
+code that we are using to prepare memory, etc so that the C programs
+work as desired.
 
 Here comes another one of my preferences.  For the code that follows
 and much of the code in my repos, I DO NOT support the initializing of
 variables.  If you were to take one of my examples and add the apple
-orange and pear variables above you would not get 0, 5, and 7 you would
-expect to find some garbage values, or maybe zeros if you are lucky for
-all of those variables but something that you should not anticipate or
-expect to be the same every time.  When you finish this tutorial go
-over to the bssdata directory, and read about why I do it the way I do it
-and what other work you have to do to insure those variables are pre-initialized
+orange and pear variables above you should not expect to get 0, 5, and
+7.  Further, whatever you do find you should not expect to find every
+time; simply make no assumptions about the starting contents of variables.
+This is my preference, not a generic bare metal thing.  It is a problem
+that you have to solve for generic bare metal programming and this is
+how I solved it.  When you finish this tutorial go over to the bssdata
+directory, and read about why I do it the way I do it and what other
+work you have to do to ensure those variables are pre-initialized
 before main() is called.  The short answer is it involves toolchain
 specific things you have to do, and I prefer to lean toward more portable
-including portable across toolchains (minimizing effort to port) solutions
-so I try to make my C code so that it does not use "implementation defined"
-features of the language (that do not port from one compiler to another)
-and try to keep the boot code and linker scripts, etc as simple as possible
-with a little sacrifice on adding some more code.  You will see what
-all of that means.  Also note that I do not use main() as the entry point
-funciton in my code.  The first time I learned all of this stuff the
-compiler tools I was using at the time would add extra junk to your binary
-when it saw the word main().  If you used some other name then it would
-not add that junk, and not bloat the binary.  The Raspberry Pi has
-relatively lots of memory at 128KB + for the ARM.  In the embedded
-bare metal programming world you very often face 8KB or 16Kb or 32KB
-etc and you cannot afford the toolchain sucking up chunks of that
-memory with stuff you are not using.  Part of bare metal programming
-is you being in control of everything, the code, the peripherals, and
-the binary.
+including portable across toolchains (minimizing effort to port)
+solutions.  So one thing is I try to make my C code so that it does not
+use "implementation defined" features of the language (that do not port
+from one compiler to another, inline assembly for example).  Second
+I try to keep the boot code and linker scripts, etc as simple as possible
+with a little sacrifice on adding some more code.  Linker scripts in
+particular are toolchain specific and the entry label and perhaps
+other bootstrap items are also toolchain specific.  You will see what
+all of that means in the bssdata directory.
 
-Good, bad, or otherwise the gnu tools dominate, binutils which includes
+Also note that I do not use main() as the entry point function in my
+code.  The first time I learned all of this stuff the compiler tools I
+was using at the time would add extra junk to your binary when it saw
+the word main().  If you used some other name then it would not add
+that junk, and not bloat the binary.  The Raspberry Pi has relatively
+lots of memory at 128KB + for the ARM.  In the embedded bare metal
+programming world you very often face 8KB or 16KB or 32KB etc and you
+cannot afford the toolchain sucking up chunks of that memory with stuff
+you are not using.  Part of bare metal programming is you being in
+control of everything, the code, the peripherals, and the binary.
+
+Good, bad, or otherwise the GNU tools dominate, binutils which includes
 an assembler, linker and library tools and gcc which includes a C
 compiler and can include other things.  One of the pro's is that when
-you learn the gcc tools for one platform most of that knowledge translates
-to other platforms (learn embedded ARM with gnu tools and the learning
-curve for MIPS is much smaller).  What are the tools we are going to
-be using?  We should at this point already know that gcc is the C compiler
-and we can compile our programs into something called an object or your
-experience may be limited to creating binaries from your C program.  There
-is actually a bit of hidden magic that goes on.  When you compile your
-hello world program on your Linux machine, first off the C code is
-compiled into assembly language, yes, in text, assembly language.  Then
+you learn the gcc tools for one platform most of that knowledge
+translates to other platforms (learn embedded ARM with GNU tools and
+the learning curve for MIPS is much smaller).  What are the tools we
+are going to be using?  We should at this point already know that gcc
+is the C compiler and we can compile our programs into something called
+an object, or your experience may be limited to creating binaries from
+your C program without seeing any of the intermediate files.  There is
+actually a bit of hidden magic that goes on.  When you compile your
+hello world program on your Linux machine, the first one or few files
+generated are your C code in different forms; the preprocessor makes
+another file which is your C code with all of the includes expanded
+into it.  Eventually the actual C compiler is called and that turns the
+C code into assembly language in a text file.  Yes, assembly language.  Then
 the assembler is called by the compiler and the assembler assembles
-the assembly language into an object file, which in this case is a flavor
-of binary file that has most of the instructions in machine code but is
-not a compilete binary because there may be some functions or variables
-in other objects that wont be resolved until link time.  Now the hello
-world C code is made into an object.  to make it something we can
-run on our operating system it has to be linked with some bootstrap
-code which is some assembly (crt0.S in the gnu world) that at some point
-has been made into an object file (crt0.o in the gnu world).  We also
-have printf() in our hello world program, which is made up of a large
-pile of other C library calls, these C libraries were all C and assembly
-files that were made into objects and likely the objects were put into
-a single file called a library which is just an easier way to manage
-a bunch of object files.  Combine the bootstrap code the library files
-add to that the object created from our one line hello world printf
-and call the linker.  The linker takes the object files and links them
-together like a chain.  For example printf() is a function call the object
-made by our C code is not able to resolve printf in that code, there is
-no printf() function in our program so it is an external function call,
-it cannot resolve that function in that object file so it leaves something
-dangling waiting for the linker to later connect it.
+the assembly language into an object file, which in this case is a
+flavor of binary file that has most of the instructions in machine code
+but is not a complete binary because there may be some functions or
+variables in other objects that wont be resolved until link time.  For
+our hello world printf to output something it needs to link with a C
+library which makes system calls and may or may not have to link with
+other stuff.  So the linker takes the object that came from our code
+and links that with these other items and creates a binary that is
+compatible with the operating system we are running.
 
 The next thing we have to know is there can be a difference between the
 entry point into our program and the first instruction in the program.
 If you think about it most programs we use a compiler for run on
 operating systems.  The operating system loads the program from the
 filesystem into memory and then performs a jump into that memory, it
-can jump to any address.  That does not make any sense for this platform.
-The GPU is going to load the program at an address and cause the ARM
-to start executing at that address so our entry point needs to be at
-the beginning.
+can jump to any address.  It may or may not do that but it is at least
+possible on a system that is already running.  But for booting a
+processor we cannot change the processor to boot anywhere we want and on
+the Raspberry Pi we cant or at least shouldnt try to change its habit
+of executing the first instruction in the kernel.img file.  So we have
+to make sure we control the whole linking process to ensure that happens.
 
 I think we have enough ammo to stop chatting and start writing some
 programs.  I hope you dont hate me at this point but this tutorial
@@ -368,42 +422,43 @@ programming is as much about knowing and manipulating the compiler tools
 as it is about manipulating peripheral registers.  Before we can even
 begin to talk about peripherals we have to have code that actually
 runs on the hardware.  We will touch on perhiperals in the sense
-that I will borrow from my other programs in this repository that already
-talk about the peripheral side of bare metal.  This directory is about
-the compiler side of bare metal.
+that I will borrow from my other programs in this repository that
+already talk about the peripheral side of bare metal.  This directory
+is about the compiler side of bare metal.  Your takeaway here is being
+able to understand why my bare metal examples work.
 
-The gnu linker is looking for a label named _start to know where the
+The GNU linker is looking for a label named _start to know where the
 entry point of the program is.  It is possible to override or replace
 this with something on the linker command line, it is easy enough to
-just use the label that we will do that.
+just use that label, so we will do that.
 
 The bare minimum bootstrap code for this processor would be to set
-the stack pointer and to branch to our main() program.  Now I use
-notmain() as the name of my entry point into C.  What is a stack pointer?
-You should have learned about stacks in general in your prior programming
-training or experience.  The stack is nothing more than a chunk of
-memory.  How it differs from memory is not that it is special because it
-isnt, it is how it is accessed.  Our apple and orange variables above
-are global, they are at a fixed place in memory, lets say they end up
-after compiling and linking to be at addresses 0x1234 and 0x1238
-respectively.  Any code in any function that wants to access them will
-after compiling and linking be accessing those addresses.   But what about
-our peach variable above, that is a local variable and you may have been
-told that that "lives on the stack"  Instead of being at a fixed address
-in memory, the peach variable will, after compiling and linking be at
-a fixed OFFSET in memory, offset relative to what?  Relative to the
-stack pointer at some point in time in the function.  The stack pointer
-is simply a register that holds a number which is an address in memory.
-Not special memory just memory on this platform the same memory we use
-for our program and our variables.  When the compiler converts our C
-code into assembly code one of the things it has to do is manage these
-local varaibles and other things.  Any C function that has local
-variables will cause the compiler to create code that moves the
-stack pointer as a way to allocate memory for that variable.  We will
-cover this topic more as we go, for now understand that the minimum
-bootstrap code for this platform is to set the stack pointer and then
-to branch to our top level C function.  Here is some code thae does
-that:
+the stack pointer and to branch to our C program.  Now I use notmain()
+as the name of my entry point into C.  But you ask:  What is a stack
+pointer?  You should have learned about stacks in general in your prior
+programming training or experience.  The stack is nothing more than a
+chunk of memory.  How it differs from memory is not that it is special
+because it isnt, it is how it is accessed.  Our apple and orange
+variables above are global, they are at a fixed place in memory, lets
+say that after compiling and linking these variables end up at
+addresses 0x1234 and 0x1238 respectively.  Any code in any function that
+wants to access them will after compiling and linking be accessing those
+addresses.   But what about our peach variable above, that is a local
+variable and you may have been told that that "lives on the stack".
+Instead of being at a fixed address in memory, the peach variable will,
+after compiling and linking be at a fixed OFFSET in memory, offset
+relative to what?  Relative to the stack pointer at some point in time
+in the function.  The stack pointer is simply a register that holds a
+number which is an address in memory.  Not special memory just memory on
+this platform the same memory we use for our program and our variables.
+When the compiler converts our C code into assembly code one of the
+things it has to do is manage these local variables and other things.
+Any C function that has local variables will cause the compiler to
+create code that moves the stack pointer as a way to allocate memory
+for that variable.  We will cover this topic more as we go, for now
+understand that the minimum bootstrap code for this platform is to set
+the stack pointer and then to branch to our top level C function.  Here
+is some code that does that:
 
 .globl _start
 _start:
@@ -412,20 +467,20 @@ _start:
 
 Now I told you this is not a lesson in assembly language programming,
 but we will be looking at assembly language even if we dont know exactly
-what all the code means or does.  Many may disagree with me but disassembling
-your program is one of the fastest and easiest ways to debug your bare
-metal code.  I will keep saying this, a big part of bare metal programming
-is knowing your compiler tools, very often, esp with bootstrap code your
-bug may not be in the code itself but in the way you used the tools, the
-command lines or linker scripts that you used to compile that code.
-Get it wrong and no matter how bug free your code is it will not run and
-you will have a hard time figuring it out without looking at what the
-compiler and linker generated.  So the above code starts with a directive
-.globl, I think .global also works, both do the same thing, declare the
-label _start as global meaning it is visible to the linker.  In C
-everything (functions and non-local variables) is global unless you
-put the word static in front of it then it becomes
-local:
+what all the code means or does.  Many may disagree with me but
+disassembling your program is one of the fastest and easiest ways to
+debug your bare metal code.  I will keep saying this, a big part of bare
+metal programming is knowing your compiler tools, very often, esp with
+bootstrap code your bug may not be in the code itself but in the way you
+used the tools, the command lines or linker scripts that you used to
+compile and link that code.  Get it wrong and no matter how bug free
+your C code is it will not run and you will have a hard time figuring
+it out without looking at what the compiler and linker generated.  So
+the above code starts with a directive .globl, .global also works, both
+do the same thing, declare the label _start as global meaning it is
+visible to the linker.  In C everything (functions and non-local
+variables) is global unless you put the word static in front of it then
+it becomes local:
 
 static unsigned int apple;
 unsigned int orange:
@@ -440,10 +495,11 @@ matter if _start is our entry point, but for places where it is used
 it is a good habit to place it at our entry point for sake of habit.  And
 that is what we are doing here.
 
-The mov sp,  line basicall says put the number 0x00010000 in the reigster
-named sp, which is an alias for r13.  R13 in the ARM is a register that
-has special use as the stack pointer.  Registers in a processor are
-very much like variables in a C program in how they are used.
+The mov sp, line basically says put the number 0x00010000 in the
+register named sp, which is an alias for r13.  R13 in the ARM is a
+register that has special use as the stack pointer.  Registers in a
+processor are very much like variables in a C program in how they are
+used.
 
 And the last line b notmain means branch to notmain.  Branch is also
 known as a jump in other assembly languages and is exactly like a goto
@@ -453,10 +509,10 @@ We are going to start using the tools that you installed, this step
 may be a major research project for you or it might just work.  You might
 only need to set the path to your tools to make this all work:
 
-> arm-none-eabi-as --version
+baremetal > arm-none-eabi-as --version
 arm-none-eabi-as: command not found
-> PATH=/gnuarm/bin/:$PATH
-> arm-none-eabi-as --version
+baremetal > PATH=/gnuarm/bin/:$PATH
+baremetal > arm-none-eabi-as --version
 GNU assembler (GNU Binutils) 2.22
 Copyright 2011 Free Software Foundation, Inc.
 This program is free software; you may redistribute it under the terms of
@@ -468,14 +524,15 @@ Your path may be and probably is different than mine.  Again this
 may be a research project for you or it may just work or somewhere
 in the middle.
 
-The gnu assembler is a program named as.  When we make it a cross assembler
-to not confuse it with the as assembler that we need for the operating
-system we are running on, we add a prefix to the name.  A common one you
-will find in this day and age for gnu tools is arm-none-eabi-.  That
-will be tacked on the front of everything and that is the one I will be
-using.  You may have arm-linux-gnueabi- or you may have arm-elf- or
-arm-thumb-elf- or many other prefixes.  Although they can vary in theory,
-the way I write my code, they should mostly come close to working.
+The gnu assembler is a program named as.  When we make it a cross
+assembler to not confuse it with the as assembler that we need for the
+operating system we are running on, we add a prefix to the name.  A
+common one you will find in this day and age for gnu tools is
+arm-none-eabi-.  That will be tacked on the front of everything in the
+GNU tools that we care about and that is the one I will be using.  You
+may have arm-linux-gnueabi- or you may have arm-elf- or arm-thumb-elf-
+or many other prefixes.  Although they can vary in theory, the way I
+write my code, they should mostly come close to working.
 
 Lets say I called that small bit of assembly bootstrap.s
 
@@ -504,14 +561,18 @@ sometimes it is all binary bits and bytes that make up your program.
 Most of the time, esp when running on an operating system, that file
 is a mixture of the bits and bytes of your program but wrapped by
 a file format that contains things like debugging information or other
-things, for example the global name _start is shown in the disassembly
-if all that was in the binary file was the 8 bytes
+things.
+
+If the file only contained the machine code and data that makes up the
+program it would only need these 8 bytes (this is not a real, functioning
+program remember).
 
 e3 a0 d8 01
 ea ff ff fe
 
-How does the disassembler know about the names _start and notmain?  the
-answer is the file is not 8 bytes it is larger
+How would the disassembler then know, from just those bytes, the names
+of things like _start and notmain?  The answer is the file is not 8
+bytes, it is larger
 
 baremetal > ls -al bootstrap.o
 -rw-r--r-- 1 root root 664 Sep 23 13:47 bootstrap.o
@@ -533,28 +594,29 @@ baremetal > hexdump -C bootstrap.o
 
 You can see at offset 0x34 in the file we see the 8 bytes of our program.
 
-There are many file formats supported by the gnu tools.  Elf is the
-default format for arm based programs and many others as well.  But we
+There are many file formats supported by the GNU tools.  Elf is the
+default format for ARM based programs and many others as well.  But we
 can convert those into other formats using another of the binutils tools
 and we will have to use that tool for the Raspberry Pi.  First off
 notice that the .elf file format is binary itself most of the information
-is not directly human readable you need to use other programs (like objdump)
-to extract information from that file.  Another format that you will
-see "binaries" in is the intel hex file format.  This is an ascii format
-file making it easier for us to read and manipulate as programmers and
+is not directly human readable you need to use other programs (like
+objdump) to extract information from that file.  Another format that you
+will see "binaries" in is the intel hex file format.  This is an ASCII
+format file making it easier for us to read and manipulate as programmers and
 hack at if so desired...You will still find this format used in various
 corners of the embedded world.  Many rom/flash programmers suppor this
-file format, many bootloaders (like my bootloader01) support this format.
+file format, many bootloaders (like my bootloader01) support this
+format.
 
 baremetal > arm-none-eabi-objcopy bootstrap.o -O ihex bootstrap.hex
 baremetal > cat bootstrap.hex
 :0800000001D8A0E3FEFFFFEAB6
 :00000001FF
 
-The objcopy command line takes a command line option -O with some predefined
-name like binary, ihex, srec, and others. If possible it determines
-the file format of the input file (bootstrap.o in this case) and then
-converts what it can to the output file format.
+The objcopy command line takes a command line option -O with some
+predefined name like binary, ihex, srec, and others. If possible it
+determines the file format of the input file (bootstrap.o in this case)
+and then converts what it can to the output file format.
 
 baremetal > arm-none-eabi-objcopy bootstrap.o -O binary a.bin
 baremetal > arm-none-eabi-objcopy bootstrap.hex -O binary b.bin
@@ -572,8 +634,8 @@ That little exercise shows how to take just the bytes of our program
 and put them in what we would most accurately call a binary file, just
 the 8 bytes of our program nothing more nothing less.  We will need
 to do this for the raspberry pi.  Notice how objcopy was not able
-to recognize the file format for the intel hex file and we had to specify
-it using the -I.
+to recognize the file format for the intel hex file and we had to
+specify it using the -I.
 
 To see the file formats supported by objcopy try this:
 
@@ -610,7 +672,7 @@ ihex
  (header endianness unknown, data endianness unknown)
   arm
 
-We have tried intel hex or ihex and I want to show you another ascii
+We have tried intel hex or ihex and I want to show you another ASCII
 based one called srec or s record
 
 baremetal > arm-none-eabi-objcopy bootstrap.o -O srec bootstrap.srec
@@ -619,18 +681,18 @@ S0110000626F6F7473747261702E7372656335
 S10B000001D8A0E3FEFFFFEAB2
 S9030000FC
 
-You can use wikipedia to get the definitions for the intel hex and s record
-file formats and very easily write a program that parses those files and
-extracts things, maybe write your own disassembler for educational
-purposes or write a bootloader or an instruction set simulator or any
-place where you need to take a compiler/assembler/linker generated
-program and read it for any reason.  Let me point out that the elf
-specification is as readily available and although there are libraries
-out there to parse those files, it is as easy to make an elf parser
-as it is to make an ihex or srec parser.  And you dont rely on some
-third party library that is going to change over time causing your
-code to no longer work or have to change to conform to some new
-standard for that library.
+You can use wikipedia to get the definitions for the intel hex and
+s record file formats and very easily write a program that parses those
+files and extracts things, maybe write your own disassembler for
+educational purposes or write a bootloader or an instruction set
+simulator or any place where you need to take a compiler/assembler/linker
+generated program and read it for any reason.  Let me point out that
+the elf specification is as readily available and although there are
+libraries out there to parse those files, it is as easy to make an elf
+parser as it is to make an ihex or srec parser.  If you make it yourself
+then you dont rely on some third party library that is going to change
+over time causing your code to no longer work or have to change to
+conform to some new standard for that library.
 
 So now lets make our first C program, this is not hello world, even
 simpler it does nothing, so we think:
@@ -650,7 +712,7 @@ Disassembly of section .text:
 00000000 <notmain>:
    0:   e12fff1e    bx  lr
 
-So what does bx lr mean?  Bx is an ARM instruciton that means branch
+So what does bx lr mean?  Bx is an ARM instruction that means branch
 exchange, and lr is the link register.  When you call a function in
 your C code your expectation is that the processor will jump somewhere
 and execute the code in the function then it will come back and
@@ -663,13 +725,13 @@ keep running your program/code after that funcion call.
 ...
 
 After calling the function fun() we expect the code to come back and run
-d = c * 5.  Well the way the arm does it is the call to a function uses
+d = c * 5.  Well the way the ARM does it is the call to a function uses
 an instruction called branch link, which saves the address of the code
 after the function call in a register called the link register.  Then
-at some point we encounter one of a couple instructions in arm that
-will allow the program to jump to the address in the link register returning
-to where we were executing just after the function call.  One is
-the branch exchange and the other is a mov pc = lr
+at some point we encounter one of a couple instructions in ARM that
+will allow the program to jump to the address in the link register
+returning to where we were executing just after the function call.  One
+is the branch exchange and the other is a mov pc = lr
 
 bx lr
 
@@ -680,18 +742,17 @@ mov pc,lr
 Depending on the tools and how you use them you should mostly see the
 bx lr in assembly and in the code generated by the compiler if you dont
 then there may be a reason which you may or may not be concerned about
-at this time.  I will keep saying this, this is not a tutorai on
+at this time.  I will keep saying this, this is not a tutorial on
 assembly language, but you may already see that assembly language is
 required in order to start up C code, and I argue required in order
 to debug bare metal code.  I am only touching on a little bit of
 asm readability which is a long way away from teaching how to program in
-assembly language.  I have to cover some basics so that we can get
-to our C code and also so we can see what the compiler and tools are doing.
-
-So now we have to objects bootstrap.o and notmain.o that we need to link
-together.  Way above we talked about having our program start at address
-0x8000, so lets try linking for the first time.
+assembly language.  I have to cover some basics so that we can get to
+our C code and also so we can see what the compiler and tools are doing.
 
+So now we have two objects bootstrap.o and notmain.o that we need to
+link together.  Way above we talked about having our program start at
+address 0x8000, so lets try linking for the first time.
 
 baremetal > arm-none-eabi-ld -Ttext 0x00008000 bootstrap.o notmain.o -o hello.elf
 baremetal > arm-none-eabi-objdump -D hello.elf
@@ -711,20 +772,21 @@ Disassembly of section .text:
 Cool, our first Raspberry Pi bare metal program.  Problem is we cannot
 run this, for a number of reasons.  First off I intentionally used the
 wrong instruction in the bootstrap code, second this is an elf file
-not a bin file.  how do we fix these things?
+not a bin file.  How do we fix these things?
 
 So now that I have mentioned the link register and how it is used to get
-back from one function after calling it.  If you think about the compilers
-job, at one level it doesnt really know or care what the name of your
-function is or its purpose, when compiling the code in the main() function
-it for the most part doesnt care if it is called main() or notmain()
-or pickle() it does a job, it assumes that function is called from another
-function and it uses the proper return instruction.  Since we called
-notmain() from assembly we should be prepared for the notmain() function
-to return, so we should have used a branch link instruction and put
-some code after the call to the notmain function.  If notmain() returns
-then we are pretty much done so we can put the processor into an infinite
-loop, waiting for the user to turn the power off to try another program.
+back from one function after calling it.  If you think about the
+compilers job, at one level it doesnt really know or care what the name
+of your function is or its purpose, when compiling the code in the
+main() function it for the most part doesnt care if it is called main()
+or notmain() or pickle() it does a job, it assumes that function is
+called from another function and it uses the proper return instruction.
+Since we called notmain() from assembly we should be prepared for the
+notmain() function to return, so we should have used a branch link
+instruction and put some code after the call to the notmain function.
+If notmain() returns then we are pretty much done so we can put the
+processor into an infinite loop, waiting for the user to turn the power
+off to try another program.
 
 .globl _start
 _start:
@@ -733,15 +795,13 @@ _start:
 hang: b hang
 
 So bl notmain performs a branch and link, branch like the b instruction
-is exactly like a goto in C.  The link part of it means save the address
-of the next instruction in the link register so that we can branch
-back to it after the function call.  In this case we send it into an
-infinite loop.  Need to remember to do something if we had simply changed
-the b to a bl in boostrap.s when the processor returned from our call
-to notmain it would start executing through whatever the linker placed
-after the b notmain instruction.  So here we go we  have patched up
-bootstrap.s and need to assemble it and link it with notmain.o
+is exactly like a goto in C, a branch and link is like calling a
+function in C.  So we have to remember to put something after the branch
+link in case the function returns. In this case we send it into an
+infinite loop.
 
+So here we go, we have patched up bootstrap.s and need to assemble it
+and link it with notmain.o
 
 baremetal > arm-none-eabi-as bootstrap.s -o bootstrap.o
 baremetal > arm-none-eabi-ld -Ttext 0x00008000 bootstrap.o notmain.o -o hello.elf
@@ -774,8 +834,8 @@ Now we have a file that we can put on our sd card and run.  It does
 nothing that we can see, so it isnt much use to us, but it will work.
 
 We can see that the linker has prepared the program such that our first
-instruciton is at address 0x8000.  we load the stack pointer and
-call notmain() not main does what it does (nothing) and returns from
+instruction is at address 0x8000.  We load the stack pointer and
+call notmain().  notmain() does what it does (nothing) and returns from
 the function call which takes us back to the hang line which is an
 infinite loop, hang branches to hang forever or until the power is
 turned off.
@@ -785,6 +845,8 @@ files the address was zero not 0x8000.  Well the object files are by
 definition incomplete programs, even if everything we are going to
 run is there we should use the linker to polish that file.
 
+This is a disassembly of the object file bootstrap.o
+
 Disassembly of section .text:
 
 00000000 <_start>:
@@ -805,34 +867,35 @@ Disassembly of section .text:
 00008008 <notmain>:
     8008:   e12fff1e    bx  lr
 
-that the instruction changed from eafffffe to eaffffff, this is something
-the linker did when it figured out where notmain was going to be in
-memory it had to go back and fix all the references to notmain. which
-includes instructions.
+that the instruction changed from eafffffe to eaffffff.  This is
+something the linker did: once it figured out where notmain was going
+to be in memory it had to go back and fix all the references to notmain,
+which includes instructions.
 
 The other thing you might have noticed is Disassembly of section .text
-what is a section and what is .text and what does text hve to do with
+what is a section and what is .text and what does text have to do with
 my programs machine code?
 
-Well, and this is not limited to gnu tools, for the sanity of the
-compiler and assembler and linker folks portions of our programs
+Well, and this is not limited to GNU tools, for the sanity of the
+compiler and assembler and linker folks, portions of our programs
 are broken into categories.  There is the program itself, the machine
 code and some other items that are needed for the machine code to run
 these are for some historical reason that I have not researched called
-.text.  Or the .text segment.  Data like the orange and pear stuff way
-above in an example is in the .data segment.  Data actually is broken
-up into different segments sometimes, and in particular with the gnu
-tools.  Most of the code out there that has global variables the
-globals are not defined, not initialized in the code, but the language
-declares those are assumed to be zero when you start using them (if you
-have not changed them before you used them).  So there is a special
-data segment called .bss which holds all of our global variables that
-when we start are going to be zero.  These are lumped together so that
-some code can easily go through that chunk of memory and zero that
+.text.  Or the .text segment.  Data, like the apple and orange global
+variables way above, goes in the .data segment.  Data is actually
+broken up into different segments sometimes, in particular with the
+GNU tools.  In most code that has global variables the globals are not
+initialized in the code, but the language declares that they are
+assumed to be zero when you start using them (if you have not changed
+them before you used them).  So there is a special data segment called
+.bss which holds all of our global variables that should be zero when
+we start running C code.  These are lumped together so that some code
+can easily go through that chunk of memory and zero that
 area before branching to the C entry point.  Another segment we may
-encounter is the .rodata segment.  Sometimes even with gnu tools you
-may find the read only data in the .text segment.  For fun lets
-make one of each:
+encounter is the .rodata segment.  Sometimes even with GNU tools you
+may find the read only data in the .text segment.
+
+For fun lets make one of each:
 
 
 unsigned int apple;
@@ -867,31 +930,26 @@ Disassembly of section .rodata:
    0:   00000009    andeq   r0, r0, r9
 
 
-So we see that the code is in .text.  The pre-initialized variable orange
-is in .data.  And the read only variable pickle is in .rodata.  What
-happened to apple and pear and peach and where is this .bss you were
-talking about?  Well notice that I used -O2 on the gcc command line this
-means optimization level 2.  -O0 or optimizaiton level 0 means no optimization
+So we see that the code is in .text.  The pre-initialized variable
+orange is in .data.  And the read only variable pickle is in .rodata.
+What happened to apple and pear and peach and where is the .bss segment?
+Well notice that I used -O2 on the gcc command line, this means
+optimization level 2.  -O0 or optimization level 0 means no optimization
 -O1 means some and -O2 is the maximum safe level of optimization using
-the gcc compiler.  The optimization level is modulo 3 of whatever you feed
-it so -O3 is the max optimization but it is not considered as reliable
-because it is a little cutting edge and it is not widely used. the -O2
-level is used by the compiler when compiling your operating system like
-Linux and other things so I would argue the -O2 option is the most tested
-flavor of output from the compiler.  for whatever reason -O3 is taught
-to be scary and avoided, yet you will see it used by some because it is
-not so scary if you know what is going on and how to debug the problems
-it may create.  I am not going to get into that but recommend you use
--O2 often, esp with embedded bare metal where size and speed are important.
-I use it here because it produces much less code than no optimization,
-you can play with compiling and disassembling these things on your
-own with less or without optimization to see what happens.
+the gcc compiler.  There is a -O3 but we are told not to trust it to be
+as well tested as -O2.  I am not going to get into that but recommend
+you use -O2 often, especially with embedded bare metal where size and
+speed are important.  I use it here because it produces much less code
+than no optimization.  You can play with compiling and disassembling
+these things on your own with less or without optimization to see what
+happens.
 
-So we didnt use apple, or pear or peach so the compiler optimized those
-away.  We didnt use orange or pickle either but because those were
-defined as something and were also both global variables the compiler
-when making an object doesnt know if other code is using those variables
-so it has to generate something for them for linking with other code.
+So our program didnt actually use apple, or pear or peach so the
+compiler optimized those away.  We didnt use orange or pickle either
+but because those were defined as something and were also both global
+variables the compiler when making an object doesnt know if other code
+is using those variables so it has to generate something for them for
+linking with other code.
 
 Lets try to resolve this:
 
@@ -988,12 +1046,12 @@ Disassembly of section .rodata:
     8024:   00000009    andeq   r0, r0, r9
 
 
-So our apple variable has appeared as has the .bss section.  Notice
+So our apple variable has appeared, it is in the .bss section.  Notice
 on the linker command line I specified a few things the text segment
 address and data and bss but not the rodata.  The linker again has
 put the .text where we said and where we need it at 0x8000 we said
 to put .data at 0x9000 and it is there and notice it has the value
-5 from our orange varaible.  .bss is where we said at 0xA000.  Since
+5 from our orange variable.  .bss is where we said at 0xA000.  Since
 we didnt specify a home for .rodata notice how the linker has just
 tacked it onto the end of .text  the last thing in .text was a four
 byte address at address 0x8020, so the next address after that is 0x8024
@@ -1023,12 +1081,12 @@ We can see that the first thing in the file is our code that lives
 at address 0x8000, understand that the file offset and the memory offset
 are not the same.  What is important is that first thing in the file
 ends up at 0x8000 and since it is our entry code we are good from that
-perspective.  Now why isnt the file 48 bytes?  Because a binary file when
-we define it as a memory image means that if we have a few things at 0x8000
-a few things at 0x9000 and a few things at 0xA000 in order for those things
-to be in the right place in the file they need to be spaced apart, the
-file has to have some filler to put the important things at the right
-place.
+perspective.  Now why isnt the file 48 bytes?  Because a binary file,
+when we define it as a memory image, means that if we have a few things
+at 0x8000, a few things at 0x9000 and a few things at 0xA000, then for
+those things to be in the right place in the file they need to be
+spaced apart.  The file has to have some filler to put the important
+things at the right place.
 
 If this is at 0x8000
 
@@ -1070,17 +1128,18 @@ Our file grew but if you were to try to objcopy to a -O binary format
 20000000:   00000005    andeq   r0, r0, r5
 
 There are 0x60000000 bytes between these two items, that means the
-binary file created would at least be 0x60000000 bytes which is 1.6 gigabytes
-If you are like me you probably dont always have 1.6Gig of disk space
-handy.  Much less wanting it to be filled with a singel file which is
-mostly zeros.  You can start to see the appeal for these not really
-a binary binary file formats like elf and ihex and srec. they only
-define the real data and dont have to hold the zero filler.
+binary file created would at least be 0x60000000 bytes which is
+about 1.6 gigabytes.  If you are like me you probably dont always have
+1.6Gig of disk space handy.  Much less wanting it to be filled with a
+single file which is mostly zeros.  You can start to see the appeal of
+these not really binary "binary" file formats like elf and ihex and
+srec.  They only define the real data and dont have to hold the zero
+filler.
 
-The bssdata directory gets into the things you need to do to deal with
-these problems on those kinds of systems.  For the Raspberry Pi we dont
-need to deal with all of this.  So you are actually not gaining some
-of these experiences by using this platform.
+The stuff I wrote in the bssdata directory continues with understanding
+how to control the GNU tools and segments.  For the Raspberry Pi we
+dont need to deal with all of this, so you are actually missing out on
+some of the experience (pain).
 
 Here is something else I hope you caught:
 
@@ -1122,7 +1181,8 @@ Disassembly of section .rodata:
 00008024 <pickle>:
     8024:   00000009    andeq   r0, r0, r9
 
-I dont expect you to know that the assembly code is reading 0x8020
+I dont expect you to know that the notmain assembly code is reading the
+thing at 0x8020
 
     8020:   0000a000    andeq   sl, r0, r0
 
@@ -1206,9 +1266,9 @@ The file is now 8196 bytes
 0x8000 + 8196 = 0x8000 + 0x2004 = 0xA004
 
 And the objcopy -O binary has filled in the spaces with zeros so our
-.bss segment is there AND it is filled with zeros!  Need I say it again
-a big part of bare metal programming is knowing your tools.
-
+.bss segment is there in the binary AND it is filled with zeros!  Need
+I say it again, a big part of bare metal programming is knowing your
+tools?
 
 
 One more thing:
@@ -1254,17 +1314,18 @@ it tacked it onto the end of .text, but in this case it didnt tack
 .bss onto the end of .text it added 0x2000 bytes of padding then it
 added it on there.  Why?  who knows.  The bottom line though is that
 we need to take more control over how we tell the linker to do things.
-In the gnu world this is through what is often called a linker script
+In the GNU world this is through what is often called a linker script
 yet another programming language that is parsed by the linker tool
 where we can go to or beyond the level of crazy complication.  And
 as you can guess I dont do that, I try for the minimal linker script
 I dont want to be tied to a tool, I want my code to be as portable
 as possible with minimal work.  Linker scripts are painful, because
-so many are so complicated it took me a long time to make this simple
-script and keep it working, I have actually had three different solutions
-which I thought each time where the simple, end all be all gnu linker
-script, they werent they worked on one version of tools and later failed.
-At this point I wouldnt be surprised if this script also fails some day.
+so many are so complicated, with few if any simple examples.  It took
+me a while to make this simple script and keep it working.  I have
+actually had three different solutions which I thought each time were
+the simple, end all, be all, GNU linker script.  They werent, they
+worked on one version of tools and later failed.  At this point I
+wouldnt be surprised if this script also fails some day.
 
 MEMORY
 {
@@ -1305,7 +1366,6 @@ Disassembly of section .bss:
 00008024 <apple>:
     8024:   00000000    andeq   r0, r0, r0
 
-
 How about that now it is all packed together nice and tight.
 
 And to take this one step further:
@@ -1373,7 +1433,7 @@ There we go, 12 items all packed up tight in 48 bytes of binary
 
 
 All this work so far and we have not seen the stack, we have not seen
-or local variables.
+our local variables.
 
 
 bootstrap.s
@@ -1471,7 +1531,7 @@ set uses branch link (bl) to make function calls.  The branch means
 goto or jump or branch the program to some address.  The link means
 preserve a link back to the calling function, the hardware puts
 the address of the instruciton after the branch link in the link
-register so that you can return.  but what happens if you have
+register so that you can return.  But what happens if you have
 a function that calls a function?  Wont the second call overwrite the
 link register, making it so you cannot return to the original
 function?  Yes, on the surface that is true, this is where the stack
@@ -1486,20 +1546,20 @@ If the first thing we did in fun() was call fun() again then
 the stack pointer would go from 0x1018 to 0x1010, address 0x1010 would
 get the contents of r3 and 0x1014 would get the contents of the link
 register the address this instance of the fun() can needs to return,
-this of course would be an infinite loop, so we didnt do that.  what
+this of course would be an infinite loop, so we didnt do that.  What
 we did do is add 3 to the incoming value and call more_fun() this
 branch link call to more fun modifies the link register.  More_fun
 does its thing, we go through the rest of the fun() code then we pop
 r3 and lr off of the stack.  Because the stack pointer has not moved
-due to any other code relative to where it was when the push at the beginnning
-happened, that means r3 gets back the value it had when that push was
-executed and the link register also gets back its prior value, the value
-we needed to return to the fun() calling function.  So that bx lr that
-follows the pop returns to the proper place in notmain().  so you can
-see with a very small application we still need the stack set up
-meaning we need the stack pointer initialized in our bootstrap code.
-The compiler assumes it has been done, if we dont and leave that register
-out of our control we can get into trouble fast.
+due to any other code relative to where it was when the push at the
+beginning happened, that means r3 gets back the value it had when that
+push was executed and the link register also gets back its prior value,
+the value we needed to return to the fun() calling function.  So that
+bx lr that follows the pop returns to the proper place in notmain().
+So you can see with a very small application we still need the stack
+set up meaning we need the stack pointer initialized in our bootstrap
+code.  The compiler assumes it has been done, if we dont and leave
+that register out of our control we can get into trouble fast.
 
 You may be asking why did I make those tiny functions separate files?
 This is from experience, I knew that I was using the optimizer and
@@ -1604,9 +1664,9 @@ to pear.
 
 I separated the files so that the compilers optimizer could not see
 all of the functions and would not be able to optimize to this level.
-Not just if but when you for example want to test some code that
-you suspect is the reason why your embedded program is too slow you
-might do something like this:
+
+So for example if you wanted to speed test a function that you suspect
+is slow, you might want to do something like this:
 
 start=get_timer_tick();
 answer=fun(5,6);
@@ -1624,7 +1684,7 @@ where they normally might be variables:
 
 fun(a,b)
 
-the optimizer if allowed might simply replace all of your complicated
+The optimizer if allowed might simply replace all of your complicated
 algorithm with:
 
 start=get_timer_tick();
@@ -1642,18 +1702,17 @@ above.
 Lets go back to some basics and common mistakes.
 
 First you may ask why am I calling the assembler and linker and gcc
-all separate, cant I just put it all on one gcc command line?  Sure, you
-can but you are giving up control to the compiler and that requires
-even more knowledge to get the command line right to get it to build
-the program you want it to build.  Sometimes to get the compiler to
-do what you want or of you have borrowed some code you might have
-to have gcc do the assembling or linking.  Some folks like to put
-C stuff like defines and comment symbols in their assembler code which
-works fine if you feed it through gcc, but it is not assembly code it
-is some sort of hybrid.  Doesnt stop people from doing it, and when
-you borrow that code you either have to fix the code or use the C compiler
-as an assembler.
-
+all separate, cant I just put it all on one gcc command line?  Sure,
+you can but you are giving up control to the compiler and that
+requires even more knowledge to get the command line right to get it
+to build the program you want it to build.  Sometimes to get the
+compiler to do what you want or if you have borrowed some code you
+might have to have GCC do the assembling or linking.  Some folks like
+to put C stuff like defines and comment symbols in their assembler code
+which works fine if you feed it through gcc, but it is not assembly
+code it is some sort of hybrid.  Doesnt stop people from doing it, and
+when you borrow that code you either have to fix the code or use the C
+compiler as an assembler.
 
 
 bootstrap.s
@@ -1718,7 +1777,7 @@ Disassembly of section .text:
     801c:   e12fff1e    bx  lr
 
 Now I happen to always use the -nostdlib -nostartfiles -ffreestanding
-with gcc when making bare metal.
+with GCC when making bare metal.
 
 Also note that I dont use
 
@@ -1728,10 +1787,9 @@ Also note that I dont use
 and so on.
 
 Well I dont use C libraries, I dont want those triggering the tools
-to add more junk.  Might not happen with gcc but I have seen it happen
-elsewhere.
-
-
+to add more junk.  Might not happen with GCC but I have seen it happen
+elsewhere.  Also you have to have your paths right to find those files
+(that you arent using).
 
 
 
@@ -1760,8 +1818,8 @@ Disassembly of section .text:
 
 Changing the order of the items on the linker command line has changed
 where they are placed in the final binary.  And in this case we
-are in trouble, this is not working code we dont execute the bootstrap
-code.
+are in trouble, this code wont work because the first instruction of
+the bootstrap is not at address 0x8000.
 
 Now changing the linker script to have the name of the boot code in
 the script and have that line before the rest of the .text
@@ -1877,7 +1935,7 @@ Disassembly of section .text:
 10008018 <notmain>:
 10008018:   e12fff1e    bx  lr
 
-You are telling me:  I dont see the problem..
+You are telling me:  I dont see the problem.
 The reason is the linker fixed the problem.
 
 I am trying to put the tool in a position where it has assembled a
@@ -1887,7 +1945,7 @@ code near the branch link, somewhere it could reach and used that
 as what I call a trampoline.  The tools have performed the branch
 link at the right place so the return address is in the link register
 then it used location that reads a value from memory and puts that
-in the program counter meaning it branches to that address.  being a
+in the program counter meaning it branches to that address.  Being a
 branch it does not modify the link register so notmain doesnt know
 any better how the program got there it returns to the right place.
 
@@ -1913,7 +1971,7 @@ Now the problem is that the linker is unable to find a place close enough
 to the bl instruction to put a trampoline so it has to error out.  This
 is not necessarily the exact error message I was after but it will do.
 
-The arm instructions have quite a bit of a reach other instruction
+The ARM instructions have quite a bit of a reach.  Other instruction
 sets have different limitations as to how far a branch can go and
 how you place the object files on the command line can affect how
 far the branches have to go to get from one place to another and
@@ -1924,6 +1982,17 @@ At this point I hope you have more than enough of a feel for the kinds
 of things you need to know from a gnu toolchain perspective to get
 started with ARM bare metal programming on the Raspberry Pi.
 
+Also, as a side effect, I hope that you can see that without actually
+buying any hardware or running any code we were able to perform many
+experiments and learn many things about the tools.  It doesnt matter
+what instruction set or computer, you can often do similar things,
+certainly with the GNU tools: create a simple function, compile and
+disassemble just that function, or link it with something simple
+enough to get the linker to stop complaining.
+
+
+
+
 Now I am going to move into thumb mode, which creates a number of
 other problems that can be quite difficult to find.
 
@@ -1935,23 +2004,22 @@ the thumb instructions were converted to ARM instructions before
 being executed so that there only needed to be one execution unit in
 the processor.  The thumb instructions are 16 bits wide, originally
 fixed length, thumb2 extensions to the thumb instruction set create a
-bit of a mess with 16 and 32 bit thumb instructions along with the
-32 bit ARM instructions.  The 16 bit instructions provide some cost
-and performance benefits for embedded systems.  First off you can
-pack more instructions into the same amount of memory, understanding
-that it may take more instructions to perform the same task using
-thumb instructions than it would have using ARM.  My experiements at
-the time showed about 10-15% more instructions, but half the memory
-so that was a fair tradeoff.  I know of one platform that went so far
-as to use 16 bit memory busses, which actually made thumb mode run
-much faster than ARM mode on that platform.  That platform is/was
-the Nintendo Gameboy Advance.
+bit of a mess with 16 and 32 bit thumb instructions.  The 16 bit
+instructions provide some cost and performance benefits for embedded
+systems.  First off you can pack more instructions into the same
+amount of memory, understanding that it may take more instructions to
+perform the same task using thumb instructions than it would have using
+ARM.  My experiments at the time showed about 10-15% more instructions,
+but half the memory so that was a fair tradeoff.  I know of one platform
+that went so far as to use 16 bit memory busses, which actually made
+thumb mode run much faster than ARM mode on that platform.  That
+platform is/was the Nintendo Gameboy Advance.
 
 There are very specific rules for switching modes between the two modes.
-Specifically you have to use the bx instruction.  When you use
+Specifically you have to use the bx (or blx) instruction.  When you use
 the bx instruction the least significant bit of the address in the
 register you are using determines if the mode you switching to as
-you branch is arm mode or thumb mode.  Arm mode the bit is zero,
+you branch is ARM mode or thumb mode.  ARM mode the bit is zero,
 thumb mode the bit is a 1.  This may not be obvious and the ARM
 documents are a little misleading or incorrect as to what valid
 bits you can have in that register.  Note that that lower bit
@@ -1966,24 +2034,23 @@ get the ARM Architectural Reference Manual for this platform
 ARM and thumb instructions as well as other things that describe at
 least in part what I am talking about.  For example this flavor of
 ARM boots in a normal ARM way meaning the exception table is filled
-with 32 bit ARM instructions that get executed.  address 0x00000000
+with 32 bit ARM instructions that get executed.  Address 0x00000000
 contains the instruction executed on reset, 0x00000004 some other
 exception and so on, one for interrupt one for fast interrupt one
 for data abort, one for prefetch abort, etc.  At least the traditional
-ARM exception table, in recent years both the Cortex-M which is different
-and the ARM exception table are seeing changes from the past.  Anyway,
-I bring this up because it is important to know that in this case all
-exceptions are entered in ARM mode, even if you were in thumb mode
-when you were interrupted or otherwise had an exception.  The cpsr
+ARM exception table, in recent years both the Cortex-M which is
+different and the ARM exception table are seeing changes from the past.
+Anyway, I bring this up because it is important to know that in this
+case all exceptions are entered in ARM mode, even if you were in thumb
+mode when you were interrupted or otherwise had an exception.  The cpsr
 contains a T bit which is the mode bit, when you return from the
 interrupt or exception the cpsr is restored along with your
 program counter and you return to the mode you were in.  This is the
-exception to the rule that you use bx to change modes (actually there
-is a blx instruction as well but I rarely if ever see it used).
+exception to the rule that you use bx to change modes (or blx).
 
-So the arm is going to come out of reset in arm mode and whatever
-mechanism (I can guess) that the Raspberry Pi uses to have our code
-at 0x8000 run we start running our code in full 32 bit ARM mode.
+So the ARM is going to come out of reset in ARM mode and, whatever
+mechanism the Raspberry Pi uses to have our code at 0x8000 run, we
+start running our code in full 32 bit ARM mode.
 
 You probably know that the C language has somewhat of a standard
 every so often that standard is re-written and if you want to make a
@@ -1992,7 +2059,7 @@ least try.  Assembly language in general does not have a standard.
 A company designs a chip, which means they create an instruction set,
 binary machine code instructions, and generally they create an
 assembly language so that they can write down and talk about those
-instructions without going insane with confusing and/or pain.  And
+instructions without going insane with confusion and/or pain.  And
 not always but often if that company actually wants to sell those
 processors they create or hire someone to create an assembler and
 a compiler or few.  Assembly language, like C language, has
@@ -2004,28 +2071,50 @@ instructions in the manual they create or have someone create to
 provide to users of this processor they want to sell and if smart
 will have the assembler match that manual.  But that manual although
 you might consider it a standard, is not, the machine code is the
-hard and fast standard, the ascii assembly language is fair game and
+hard and fast standard, the ASCII assembly language is fair game and
 anyone can create their own assembly language for that processor
 with whatever syntax and directives that they want.  ARM has a nice
 set of compiler tools, or at least when I worked at a place that paid
 for the tools for a few years and tried them they were very nice and
-conformed of course to the arm documents.  Gnu assembler, in true
-gnu assembler fashion does not like to conform to the vendors assembly
-language and generally makes some sort of a mess out of it.  fortunately
-the arm mess is nowhere near as bad as the x86 mess.  Subtle things
-like the comment symbol are the most glaring problems with gnu assembler
-for arm.  Anyway, I dont remember the syntax or directives for the
-arm tools, the arm tools have evolved anyway.  At the time I did try
+conformed of course to the ARM documents.  GNU assembler, in true
+GNU assembler fashion does not like to conform to the vendors assembly
+language and generally makes some sort of a mess out of it.  Fortunately
+the ARM mess is nowhere near as bad as the x86 mess.  Subtle things
+like the comment symbol are the most glaring problems with GNU assembler
+for ARM.  Anyway, I dont remember the syntax or directives for the
+ARM tools, the ARM tools have evolved anyway.  At the time I did try
 to write asm that would compile on both ARMs tools and gnus tools with
-minimal massaging, and you will forever see me use ;@ for comments instead
-of @ because this ; is the proper, almost universal, symbol for a comment
-in assembly languages from many vendors.  This @ is not.  combined like
-this ;@ and you get code that is commented in both worlds equally.  Enough
-with that rant, this asm code will continue to be gnu assembler specific
-I dont know if it works on any other assembler.
+minimal massaging, and you will forever see me use ;@ for comments
+instead of @ because this ; is the proper, almost universal, symbol for
+a comment in assembly languages from many vendors.  This @ is not.
+Combined like this ;@ you get code that is commented in both worlds
+equally.  Enough with that rant, this asm code will continue to be GNU
+assembler specific, I dont know if it works on any other assembler.
+
+Another side effect of thumb and in particular thumb2 is that ARM
+decided to change their syntax in subtle ways to come up with a unified
+syntax, for example to perform the addition r0 = r0 + r1
+
+Thumb:
+add r0,r1
+
+ARM:
+add r0,r0,r1
+
+Early on you had to write all three registers, but for thumb part of
+the reduction is that one source and the destination have to be the
+same register for many of the alu instructions.  Now even the
+non-unified, but certainly the unified, syntax attempts to resolve this
+into a dumbed down instruction set.  Naturally the unified syntax cant
+do everything for every flavor (ARM, thumbv1 and v2), but for the most
+part you basically get to write thumb code and have it assemble for ARM
+without complaints.  The GNU assembler has also adopted the unified
+syntax and relaxed its rules on the non-unified syntax.  I have not
+switched over to using the unified syntax...yet.  Eventually I will be
+forced to and then at that time I will likely always use it...
 
 There are games you need to play with assembly language directives
-using the gnu assembler in order to get the tool to properly create
+using the GNU assembler in order to get the tool to properly create
 thumb address for use with the bx instruction so you dont have to
 be silly and add one or or one to the address before you use it.
 
@@ -2124,13 +2213,12 @@ them kind of stand out in a crowd.  The .code 32 directive tells
 the assembler to assemble the following code using 32 bit arm
 instructions or at least until I tell you otherwise.  the .thumb
 directive is me telling the assembler otherwise.  Start assembling
-using 16 bit thumb instructions.  yes the bl is actually two 16
-bit instructions, at least I can make an argument to defend that,
-I have no actual knowledge of how ARM did or does decode those, I
-just know how I would do it (and have done it in my thumb simulator).
+using 16 bit thumb instructions.  Yes the bl is actually two separate
+16 bit instructions, documented by ARM as such, but always shown as a
+pair in disassembly.
 
-the .thumb_func is used to tell the assembler that the label
-that follows is an entry point for thumb code, when you see this
+The .thumb_func is used to tell the assembler that the label
+that follows is a branch destination for thumb code, when you see this
 label set the lsbit so that I dont have to play any games to switch
 or stay in the right mode.  You can see that the thumbstart label
 is at address 0x8010, but the thumb_start add is 0x8011, the thumbstart
@@ -2145,13 +2233,13 @@ not messed up, it will properly switch back and forth.  Problem is
 the compiler doesnt always get it right.  You may see or hear
 the word interwork or thumb interwork (command line options for the
 compiler/tools) which puts extra stuff in there to hopefully have
-it all work out.  I prefer as you know to use few/now gcclib or
+it all work out.  I prefer as you know to use few/no gcclib or
 clib canned functions (which can be in the wrong mode depending on
 your tools and how lucky you are when linking) and I prefer other
-than the asm startup code to remain as thumb pure as possible to minimize
-any of these problems.  this part of the tutorial of course is
-not necessarily about staying thumb pure but showing the problems or
-at least possible problems you will no doubt see when trying to use
+than the asm startup code to remain as thumb pure as possible to
+minimize any of these problems.  This part of the tutorial of course
+is not necessarily about staying thumb pure but showing the problems
+or at least possible problems you will no doubt see when trying to use
 thumb mode.
 
 So the simple program above all worked out fine, by remembering to