raspberrypi/blinker07/README


See the top level README file for more information on documentation
and how to run these programs.

Derived from prior blinker examples and uart04, this is an example using
the system timer interrupt.

For starters, perhaps the ARM should not be using these interrupts; they
are only halfway documented.  As of this writing it appears that the
gpu is using counter match 0 and 2, so CM1 and CM3 are not being used.
This example uses CM1.

The documentation says that the status flag will assert when the lower
half of the system timer matches the counter match register.   It also
says that the software interrupt service routine should change the
match register.  Basically software has to keep putting the counter match
out in front of the timer.  If you want an interrupt every 1234 counts
then on each interrupt you need to add 1234 to the counter match register,
the hardware wont do it for you.  The manual says in one place that
you write zero and read dont care, but elsewhere says that you write
one to clear the status bits.  Write one to clear appears to be
how it works.  So when the counter match status flag is set we
write a 1 to that bit location in the counter status register (write a
2 to counter status to clear counter match 1).
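
A minimal sketch in C of the re-arm and clear steps described above.  The
register layout (CS, CLO, CHI, C0 to C3 as consecutive 32 bit words) is
the usual description of this system timer and is an assumption here, as
are the struct, macro and function names.  The function takes a pointer
so it can be aimed at the real peripheral or at plain memory.

```c
#include <stdint.h>

/* Assumed layout of the system timer registers (not from this README). */
struct systimer_regs {
    volatile uint32_t cs;   /* counter match status, write 1 to clear */
    volatile uint32_t clo;  /* lower 32 bits of the free running counter */
    volatile uint32_t chi;  /* upper 32 bits of the counter */
    volatile uint32_t c0;   /* counter match 0 (appears used by the gpu) */
    volatile uint32_t c1;   /* counter match 1 (used by this example) */
    volatile uint32_t c2;   /* counter match 2 (appears used by the gpu) */
    volatile uint32_t c3;   /* counter match 3 */
};

#define CS_MATCH1 (1u << 1)  /* bit 1 of CS, so the value written is 2 */

/* If CM1 has matched, move the match value out in front of the timer
   by interval counts and clear the flag.  Returns 1 if a match was
   handled, 0 if the flag was not set. */
int cm1_rearm(struct systimer_regs *t, uint32_t interval)
{
    if ((t->cs & CS_MATCH1) == 0) return 0;
    t->c1 = t->c1 + interval;  /* 32 bit wraparound is what we want */
    t->cs = CS_MATCH1;         /* write 1 to the bit to clear it */
    return 1;
}
```

On the hardware the write to CS clears the flag.  If the struct lives in
plain memory the write simply stores the value 2.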

To figure out what interrupt line this was tied to, I used some uart
code so I could print stuff out, then enabled all interrupts by writing
0xFFFFFFFF to both interrupt enable registers.  Then I read the current
count, added 0x00400000 and wrote CM1 with that value, and went into
an infinite loop printing the interrupt status registers.  By doing this
with both CM1 and CM3 I figured out that irq 1 goes with CM1 and irq 3
goes with CM3.
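
When staring at printed status words, a small helper that reports which
bit is set saves some counting.  This is only a sketch and the function
name is made up for illustration.

```c
#include <stdint.h>

/* Return the number of the lowest set bit in a 32 bit status word,
   or -1 if no bits are set. */
int lowest_set_bit(uint32_t status)
{
    int bit;
    for (bit = 0; bit < 32; bit++) {
        if (status & (1u << bit)) return bit;
    }
    return -1;
}
```

A status word of 0x2 reports bit 1 and 0x8 reports bit 3, which is the
pattern described above for CM1 and CM3.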

The last bit of information required is how do you clear the interrupt.
When writing the 1 to the counter status register to clear the match
flag, that also clears the pending interrupt in the interrupt status
register.

This example demonstrates multiple things.  First it uses generic polling
of the system timer to blink the led on and off three times.  Then it
uses the counter match register and the status flag to time four on/off
blink cycles.  The blink rate is twice the speed of the first three.
Next it enables the interrupt in the interrupt controller, not to the ARM,
not yet, just to the chips interrupt controller.  When enabled the interrupt
line reflects the counter status match flag, so basically the next three
blinks are done the same way as the prior four except it is sampling
the match status using the interrupt controller status.  These blinks
are slower than the prior four.  The last thing it does is enable the
interrupt to the ARM and use the interrupt to indicate the counter
match hit.  The ARM code then computes a new counter match and waits
for the next hit.  This loop happens to make the blink rate slower each
time so you can perhaps tell you are in that loop.  Eventually the
timer interval will be so large that it wraps back to a small number
and starts blinking fast again, then progressively slower.  This should
take a long while to happen.
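
The polling stage can be sketched in C as below.  This is an assumption
about how such a wait is commonly written, not a copy of the example's
code.  Comparing the elapsed count (now minus start) in unsigned 32 bit
arithmetic keeps working when the lower half of the timer wraps.

```c
#include <stdint.h>

/* Return 1 once at least interval counts have passed since start.
   The unsigned subtraction gives the right elapsed count even when
   the 32 bit timer value has wrapped past zero. */
int timer_elapsed(uint32_t now, uint32_t start, uint32_t interval)
{
    return (uint32_t)(now - start) >= interval;
}
```

A polling loop would keep reading the timer and calling timer_elapsed()
until it returns 1, then toggle the led and take a new start value.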

You really need to get the ARM ARM (ARM Architecture Reference Manual)
for this architecture and/or get the oldest one on their web
site, which is currently the ARMv5 ARM (it includes the ARMv4 as well,
this is the original ARM ARM before it was split into multiple documents).
The ARM ARM describes the exception process in more detail.

The short answer is that starting at address 0x00000000 in ARM address
space there are a number of exception vectors.  Unlike many other processors
these are not addresses for the handlers, these are instructions that get
executed.  Being one word in size, you probably want those to be
branch instructions or ldr pc instructions.
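
For reference, here is the classic ARM exception vector layout as a C
enum.  The offsets are the ones the ARM ARM gives for vectors starting
at 0x00000000.  The enum and function names are mine.

```c
#include <stdint.h>

/* Byte offsets of the classic ARM exception vectors, one 32 bit
   instruction each. */
enum arm_vector {
    VEC_RESET          = 0x00,
    VEC_UNDEFINED      = 0x04,
    VEC_SWI            = 0x08,
    VEC_PREFETCH_ABORT = 0x0C,
    VEC_DATA_ABORT     = 0x10,
    VEC_RESERVED       = 0x14,
    VEC_IRQ            = 0x18,
    VEC_FIQ            = 0x1C
};

/* Address of a vector for a table that starts at base. */
uint32_t vector_address(uint32_t base, enum arm_vector v)
{
    return base + (uint32_t)v;
}
```

With a base of 0x0000 the irq vector lands at 0x0018, which is the
address that matters for this example.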

The way I am using the Raspberry Pi is letting the gpu load the arm
program (kernel.img) at address 0x8000.  The gpu then puts an instruction
at address zero (and some other stuff between 0x0000 and 0x8000 for linux)
then lets the ARM boot.  I am not linux so dont care about the stuff between
0x0000 and 0x8000.  I do need to change at least the memory location for
the interrupt handler so that when the interrupt occurs the ARM executes
my handler.

Using basic ARM knowledge and letting the assembler and compiler do some
of the work I create an exception table at 0x8000 in such a way that
it can be copied to 0x0000 and still work.

Look at the beginning of vectors.s, which for any of my programs to
work needs to be compiled and linked such that _start is at address 0x8000,
the first thing in the .bin file.

The assembly code uses .word to allocate 32 bit memory locations which
will each hold an address to a handler.

reset_handler:      .word reset

reset_handler is the label.  .word means I want to allocate 32 bit items
and reset is the name of another label.  The assembler does some of the
work then the linker does the rest to determine what the final value
of the reset labels ARM address is.  That address is placed in the binary
in this allocated space.  Which can be seen in the disassembly:

00008020 <reset_handler>:
    8020:   00008040    andeq   r8, r0, r0, asr #32

...

00008040 <reset>:
    8040:   e3a00902    mov r0, #32768  ; 0x8000


Here is where the ARM knowledge, or at least more of it, comes in.
Although the disassembly shows the instruction loading from address
0x8020 or 0x8024 or whatever, the instruction actually uses a pc
relative address.  You can partially tell this from the disassembly:
[pc,#24] means pc value plus 24 (0x18), it doesnt mean 0x8020, etc.

00008000 <_start>:
    8000:   e59ff018    ldr pc, [pc, #24]   ; 8020 <reset_handler>
    8004:   e59ff018    ldr pc, [pc, #24]   ; 8024 <undefined_handler>
    8008:   e59ff018    ldr pc, [pc, #24]   ; 8028 <swi_handler>

More ARM knowledge.  From a programmers perspective the PC is two
instructions ahead.  You are in arm mode when you hit these exceptions
so the PC is 8 bytes ahead, so at address 0x8000 the PC is 0x8008 when
you execute that instruction.  Add 24 (0x18) to the PC, 0x8008+0x18 = 0x8020,
and you get the address 0x8020.  The instruction is now ldr pc,[0x8020].
Memory location 0x8020 holds the value 0x8040 which is what is loaded into
the program counter and we begin executing at 0x8040 which is the reset
handler, that is what we wanted.
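
The arithmetic above can be checked with a few lines of C.  This sketch
assumes the instruction word is the immediate offset form of ldr with
the pc as the base register (like 0xe59ff018), and does not verify
that.  The function name is made up.

```c
#include <stdint.h>

/* For an ARM mode "ldr rd,[pc,#+/-imm12]" instruction located at
   insn_addr, return the address the load reads from.  The pc reads
   as the instruction address plus 8. */
uint32_t ldr_pc_relative_source(uint32_t insn_addr, uint32_t insn)
{
    uint32_t imm12 = insn & 0xFFFu;   /* bits 11:0, byte offset */
    uint32_t pc = insn_addr + 8u;     /* two instructions ahead */
    if (insn & (1u << 23)) return pc + imm12;  /* U bit set: add */
    return pc - imm12;                         /* U bit clear: subtract */
}
```

Calling it with address 0x8000 and instruction 0xe59ff018 gives 0x8020,
the location of reset_handler in the disassembly.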

Here is the tricky bit.  What if we copied both the ldr pc instructions
and the list of addresses, all of it, to address 0x0000?  (At runtime,
after all the compiling and linking were long over and we are running.)

Instead of the addresses being this:

    8018:   e59ff018    ldr pc, [pc, #24]   ; 8038 <irq_handler>
  ...
00008038 <irq_handler>:
    8038:   000080c4    andeq   r8, r0, r4, asr #1

The copy of the data/instructions would now have these addresses:

    0018:   e59ff018    ldr pc, [pc, #24]
  ...
00000038 <irq_handler>:
    0038:   000080c4    andeq   r8, r0, r4, asr #1

When the interrupt occurs the ARM runs the instruction at address 0x0018
which says to take the value of the PC (two ahead remember, so the pc is
0x20), add 24 (which is 0x18) giving 0x0038 as the address to read from.
It reads 0x80C4 and loads that into the program counter so that the
next instruction executed is the one at 0x80C4, which is where our
interrupt handler really is.

Basically this is some position independent code with some absolute
addresses for the handlers, the address stuff is done by the assembler
and linker so we dont have to.
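
To convince yourself that the table works from either location, here is
a small simulation in C.  It is a sketch under stated assumptions: the
image is modelled as 16 words (eight ldr pc,[pc,#24] instructions
followed by eight handler addresses), and all of the names are made up.

```c
#include <stdint.h>

#define TABLE_WORDS 16            /* 8 instructions then 8 addresses */
#define LDR_PC_PC_24 0xe59ff018u  /* ldr pc,[pc,#24] */

/* Build a 16 word image like the one at _start: eight identical
   ldr pc,[pc,#24] instructions followed by eight handler addresses. */
void build_table(uint32_t image[TABLE_WORDS], const uint32_t handlers[8])
{
    int i;
    for (i = 0; i < 8; i++) {
        image[i] = LDR_PC_PC_24;
        image[8 + i] = handlers[i];
    }
}

/* Work out where the vector at byte offset vec would send the pc if
   the image were placed at address base.  base drops out of the
   lookup, which is why the copy at 0x0000 behaves like the one at
   0x8000. */
uint32_t vector_target(const uint32_t image[TABLE_WORDS], uint32_t base,
                       uint32_t vec)
{
    uint32_t insn = image[vec / 4];
    uint32_t source = base + vec + 8u + (insn & 0xFFFu);
    return image[(source - base) / 4];
}
```

With 0x80C4 in the irq slot, vector_target() returns 0x80C4 for vector
0x18 whether base is 0x8000 or 0x0000.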


These instructions right after reset perform the copy of instructions and
data from where our program was loaded and started (0x8000) to where we
need the exception table (0x0000).

    mov r0,#0x8000
    mov r1,#0x0000
    ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9}
    stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9}
    ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9}
    stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9}

ldmia means load multiple, and the IA means increment after.  It uses
the value in r0 as an address (when executing the first of the
two ldmia instructions r0 is 0x8000) so it loads 8 words starting at
0x8000 into registers r2 through r9.  If there is an exclamation point
after the base register (which there is) then it modifies that register
to point to the next word after the last one we loaded.  We read 8 words
or 32 bytes at address 0x8000 so the last thing it does (increment after
the load) is add 0x20 and save, so r0 is now 0x8020.

stmia is like the load but a store, r1 starts off as 0x0000 so it stores
those 8 words from 0x8000 to 0x0000, then it adds 0x20 to r1.

So the second ldmia is going to read 8 more words from 0x8020 and the
second stmia is going to write those words to 0x0020.
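
For readers more comfortable in C, one ldmia/stmia pair with writeback
behaves roughly like the function below.  This is only a model for
experimenting on a host machine; the example itself does the copy in
assembly, and the function name is made up.

```c
#include <stdint.h>

/* Copy 8 words the way an "ldmia r0!,{r2-r9}" / "stmia r1!,{r2-r9}"
   pair does: load 8 words from *src, store them to *dst, and leave
   both pointers advanced by 0x20 bytes (the writeback). */
void copy8_words(const uint32_t **src, uint32_t **dst)
{
    uint32_t scratch[8];   /* stands in for r2..r9 */
    int i;
    for (i = 0; i < 8; i++) scratch[i] = (*src)[i];  /* ldmia */
    *src += 8;                                       /* r0! */
    for (i = 0; i < 8; i++) (*dst)[i] = scratch[i];  /* stmia */
    *dst += 8;                                       /* r1! */
}
```

Calling it twice copies 16 words, 0x40 bytes, which matches the two
ldmia/stmia pairs.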

The second ldm and stm do not have to have the exclamation point as we dont
care about r0 and r1 after that.  The ia part and the exclamation point
are separate things in the instruction encoding.  Two bits select the
addressing mode: one picks increment or decrement and the other picks
before or after, giving ia, ib, da and db.  The exclamation point is a
separate bit in the instruction that enables or disables the saving of
the final address to the base register.  So if you were to do this:

    ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9}
    stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9}
    ldm r0,{r2,r3,r4,r5,r6,r7,r8,r9}
    stm r1,{r2,r3,r4,r5,r6,r7,r8,r9}

the assembler treats a plain ldm or stm as ldmia or stmia (increment
after is the default) so the copy still works, and when you
then disassemble it might look like this, depending on the tool:

    ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9}
    stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9}
    ldm r0,{r2,r3,r4,r5,r6,r7,r8,r9}
    stm r1,{r2,r3,r4,r5,r6,r7,r8,r9}
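
As far as the encoding goes, ARM mode ldm/stm instructions carry
separate P (before/after), U (increment/decrement) and W (writeback)
bits.  The sketch below pulls those fields out of an instruction word so
you can check what the assembler produced.  The struct and function
names are made up.

```c
#include <stdint.h>

/* Fields of an ARM mode ldm/stm instruction word. */
struct ldm_stm_fields {
    int before;        /* P bit (24): 1 = adjust address before transfer */
    int increment;     /* U bit (23): 1 = addresses go up, 0 = down */
    int writeback;     /* W bit (21): 1 = the "!" form, base updated */
    int load;          /* L bit (20): 1 = ldm, 0 = stm */
    unsigned base;     /* base register number, bits 19:16 */
    unsigned reglist;  /* one bit per register r0..r15, bits 15:0 */
};

struct ldm_stm_fields decode_ldm_stm(uint32_t insn)
{
    struct ldm_stm_fields f;
    f.before    = (int)((insn >> 24) & 1u);
    f.increment = (int)((insn >> 23) & 1u);
    f.writeback = (int)((insn >> 21) & 1u);
    f.load      = (int)((insn >> 20) & 1u);
    f.base      = (insn >> 16) & 0xFu;
    f.reglist   = insn & 0xFFFFu;
    return f;
}
```

For 0xe8b003fc, the word for ldmia r0!,{r2-r9}, this reports increment,
after, writeback, load, base r0 and a register list of 0x03fc.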

It was easy to cut and paste the two lines as is, and if I wanted to
cut and paste more sets to copy more data it is easy.  So I left that
extra info on those latter instructions even though I am not using it.

So what those first 6 instructions did was to basically copy 0x40 bytes
from 0x8000 to 0x0000.  Since these are very early in the boot we are
not using registers r2 to r9 so that made it easy to use them as scratch
registers.  If we had waited to copy the 0x40 bytes until later, a loop
or some other way of copying that data likely would have been needed since
many of those registers may be used by other code.

Note that the Cortex-M processors from arm, which only execute in thumb
mode and cannot execute ARM mode instructions, boot differently and have
different exception tables.  The Cortex-M processors have addresses not
instructions in the table and each flavor of Cortex-M, or worse each
implementation, has different definitions for each of those entries.
The first few are the same then it diverges and they can have hundreds
of entries in the vector table.  The classic ARM table though has not
varied for many flavors of ARM cores and good or bad all interrupts
funnel into the same handler (or handlers if you count the fiq).

The classic ARM design also has separate stack pointers for each mode.
Interrupt is a mode, when you get an interrupt you switch from whatever
mode you were in (supervisor for example) to interrupt mode, which means
you are using a different stack pointer.  This is all described in words
and pictures in the ARM ARM.  This means that if we are going to support
interrupts not only do we need to set our application stack pointer but
we also need to set aside some memory for the interrupt stack and point
the interrupt stack pointer to it.

How do you change the interrupt stack pointer if you are not in interrupt
mode?  Well you have to be in interrupt mode.  How do you get into
interrupt mode?  Well you modify the cpsr which contains the mode bits
and that magically changes you to that mode.  You can do this from any
mode to any mode except from user mode, you cant get out of user mode by
changing the bits.  We are not in user mode on boot and never switch to
it in any of my examples so we dont have to worry about getting out of it
(normally you use an svc/swi instruction and have a software interrupt
handler that does things that are protected from user mode).

So the next bit of code after copying the exception handler stuff
switches into irq mode and fiq mode and sets their stack pointers.  The
fiq is there in case you want to experience that mode, it is mostly the
same as irq but you have another bank of registers so you dont have to
preserve as many registers, making the handler a little faster, as in
fast irq (fiq).  I do not demonstrate fiq here.


    ;@ (PSR_IRQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
    mov r0,#0xD2
    msr cpsr_c,r0
    mov sp,#0x8000

    ;@ (PSR_FIQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
    mov r0,#0xD1
    msr cpsr_c,r0
    mov sp,#0x4000

    ;@ (PSR_SVC_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
    mov r0,#0xD3
    msr cpsr_c,r0
    mov sp,#0x8000000

    ;@ SVC MODE, IRQ ENABLED, FIQ DIS
    ;@mov r0,#0x53
    ;@msr cpsr_c, r0


The cpsr is also where you enable or disable the arm interrupt and fast
interrupt.  We want to start off with interrupts disabled so when
switching back to SVC mode we also make sure that they are disabled.
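
The magic numbers 0xD2, 0xD1, 0xD3 and 0x53 are just the mode bits ORed
with the two interrupt disable bits.  A sketch in C that rebuilds them;
the macro names follow the comments in the assembly above and the
function name is made up.

```c
#include <stdint.h>

/* cpsr control field bits. */
#define PSR_FIQ_MODE 0x11u
#define PSR_IRQ_MODE 0x12u
#define PSR_SVC_MODE 0x13u
#define PSR_FIQ_DIS  0x40u   /* F bit set: fiq masked */
#define PSR_IRQ_DIS  0x80u   /* I bit set: irq masked */

/* Build the value written with msr cpsr_c for a mode with the given
   interrupts masked. */
uint32_t cpsr_control(uint32_t mode, int irq_disabled, int fiq_disabled)
{
    uint32_t value = mode & 0x1Fu;   /* mode is the low five bits */
    if (irq_disabled) value |= PSR_IRQ_DIS;
    if (fiq_disabled) value |= PSR_FIQ_DIS;
    return value;
}
```

So cpsr_control(PSR_SVC_MODE, 0, 1) gives 0x53, the value in the
commented out lines that enable the irq while staying in SVC mode.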

So the irq stack starts at 0x8000 (first location is 0x7FFC) and the fiq
stack is at 0x4000 (0x3FFC).  If you have re-compiled this program or
modified your config.txt to have the gpu load your program at, say,
address 0x0000 then these stacks may collide with your program and you
need to move them.  Likewise I have put the SVC stack at 0x8000000
(0x7FFFFFC) and if you are using that memory you need to move that as
well.  Bare metal memory management is part of bare metal programming.
YOU decide where things are and either hard code them in your code or
define them, directly or indirectly, through the linker script.
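
A small C sketch for sanity checking stack placement on paper.  It
assumes the usual full descending stack, where a push first decrements
sp and then stores.  The function names and the max_bytes parameter
(how far you expect a stack to grow) are made up, and max_bytes must not
exceed initial_sp.

```c
#include <stdint.h>

/* With a full descending stack the first push of one word stores at
   the initial sp minus 4. */
uint32_t first_stack_slot(uint32_t initial_sp)
{
    return initial_sp - 4u;
}

/* Return 1 if a stack that starts at initial_sp and may grow down by
   max_bytes would touch the region [start, start + length). */
int stack_hits_region(uint32_t initial_sp, uint32_t max_bytes,
                      uint32_t start, uint32_t length)
{
    uint32_t lowest = initial_sp - max_bytes;   /* lowest byte used */
    return lowest < start + length && start < initial_sp;
}
```

An irq stack at 0x8000 does not touch a program loaded at 0x8000, but it
would collide with a program loaded at 0x0000 that is large enough to
reach up into the stack.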

The last thing I am going to say about the interrupt handler is that I
made it pretty stupid and mostly in C.

irq:
    push {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,lr}
    bl c_irq_handler
    pop  {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,lr}
    subs pc,lr,#4

When you read the ARM ARM you will see that the proper way to return
from an interrupt is using a subs pc,lr,#4.  Since you interrupted
application code which was likely using some or most of the registers you
need to preserve those registers, in particular the link register, lr.
So what my assembly wrapper does is preserve all the registers, call a
C function, upon return from that C function restore the registers, then
return from interrupt.  Just like any other textbook interrupt handler.

You need to remember and understand that C code in an interrupt handler
needs to be lean and mean, get in get out dont mess around.  This example
simply modifies a global variable (has to be declared as volatile to
be shared properly between the handler and the rest of the app code) and
the application detects that to know the interrupt happened.  That adds
some latency but it is okay since our eyes will not see that difference
in the led blinks.
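
The handler/application handshake can be sketched in C like this.  It is
not the code from this example: the variable and function names are
made up, and the handler takes a pointer to the counter status register
so the sketch can be tried against an ordinary variable.

```c
#include <stdint.h>

/* Shared between the handler and the application, so volatile. */
volatile uint32_t irq_counter;

/* What a lean C handler might do: note that the interrupt happened
   and clear its source (write 2 to the counter status register to
   clear counter match 1). */
void irq_handler_sketch(volatile uint32_t *counter_status)
{
    irq_counter = irq_counter + 1u;
    *counter_status = 2u;
}

/* Application side: returns 1 if the handler has run since the last
   call, using a caller-held copy of the count it last saw. */
int irq_seen(uint32_t *last_seen)
{
    uint32_t now = irq_counter;   /* one read of the volatile */
    if (now == *last_seen) return 0;
    *last_seen = now;
    return 1;
}
```

The application loop would call irq_seen() until it returns 1, then
toggle the led and compute the next counter match value.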