See the README one level up about where to find the reference manual
for the stm32f4 and schematics for the board.

This experiment was prompted by a post on the bare metal forum at
Raspberry Pi.

VFP issues with denormal numbers posted by mstorsjo

I need to try this against another processor, but have not yet. What
is going on is that when an operation is performed on a denormal or
subnormal number (a number too small to be represented in the normal
format, think of 0.0000..something with so many zeros there is not
enough negative exponent) the operating system is called to handle the
exception. Upon return the next operation does not complete properly.

In IEEE-754 single precision float (look it up on Wikipedia) an exponent
field of all zeros is special: it indicates either the number zero or a
denormal number.
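
To make the rule concrete, here is a minimal C sketch that sorts a 32 bit
pattern into zero, denormal, or normal. The enum and the classify_f32 name
are made up for illustration, they are not part of this example's sources.

```c
#include <stdint.h>

/* Classification of an IEEE-754 single precision bit pattern. */
enum f32_class { F32_ZERO, F32_DENORMAL, F32_NORMAL, F32_INF_OR_NAN };

/* Hypothetical helper, not part of the example sources. */
enum f32_class classify_f32(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFF;   /* 8 exponent bits */
    uint32_t fraction = bits & 0x007FFFFF;     /* 23 fraction bits */

    if (exponent == 0)
        return (fraction == 0) ? F32_ZERO : F32_DENORMAL;
    if (exponent == 0xFF)
        return F32_INF_OR_NAN;
    return F32_NORMAL;
}
```

With this, 0x00012345 and 0x00000111 classify as denormal, 0x00000000 as
zero, and 0x3F800000 as normal.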

hexstring(m4add(0x3F800000,0x00000000,0x3F801230,0x3F801110));
hexstring(m4add(0x00012345,0x00000111,0x3F801230,0x3F801110));
hexstring(m4add(0x00000111,0x00012345,0x3F801230,0x3F801110));

0x3F800000 is a normal number, the value 1.0.
0x00000000 has the special exponent but is not a denormal, it is the
value zero.
The other numbers that start with 0x3F8 are also just fine, they are a
smidge larger than 0x3F800000.

This line has no denormals
hexstring(m4add(0x3F800000,0x00000000,0x3F801230,0x3F801110));

.globl m4add
m4add:
    vmov s0,r0
    vmov s1,r1
    vmov s2,r2
    vmov s3,r3
    vadd.f32 s4,s0,s1
    vadd.f32 s5,s2,s3
    vmov r0,s5
    bx lr

The assembly floating point code adds the first two numbers, then
adds the second two numbers, and returns the result of the second add.

The output of this program is

12345678
400011A0
00012345
00000111
12345678

So 0x400011A0 is supposed to be the result of the floating point numbers
0x3F801230 and 0x3F801110 added together. You were perhaps expecting
0x3F802340; it doesn't quite work that way. The sum is a little over
2.0, so the exponent goes up by one and the fraction bits shift right by
one. If you take the 0x11A0 portion of the result, shift it back left by
one bit, and look at it as binary

11A0 shifted left one bit =
00010001101000000
then break that into hex values along different boundaries
0 0010 0011 0100 0000
there is the 0x234
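
If you want to check that result without the board, a rough host-side
sketch in C is below. It assumes the host float is IEEE-754 single
precision, and add_f32_bits is a made-up name, it is not part of this
example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side check, not part of the example sources.
   Assumes float is a 32 bit IEEE-754 single precision value. */
uint32_t add_f32_bits(uint32_t a_bits, uint32_t b_bits)
{
    float a, b, sum;
    uint32_t sum_bits;

    memcpy(&a, &a_bits, sizeof a);   /* reinterpret the bits as a float */
    memcpy(&b, &b_bits, sizeof b);
    sum = a + b;
    memcpy(&sum_bits, &sum, sizeof sum_bits);
    return sum_bits;
}
```

add_f32_bits(0x3F801230, 0x3F801110) comes back as 0x400011A0, the same
value the VFP produced.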

These two lines do have denormals. Both of the first two numbers,
the ones in the first add, are denormals.
hexstring(m4add(0x00012345,0x00000111,0x3F801230,0x3F801110));
hexstring(m4add(0x00000111,0x00012345,0x3F801230,0x3F801110));

Bear with me for a second on this tangent. In my blinker07 example I
showed one way to move/create an exception table at address 0x00000000
in ARM address space. My program expects to be loaded at 0x8000, as a
normal Raspberry Pi Linux kernel.img is. This example shows another
solution. If you look at the encoding of the branch and branch link
instructions (same encoding, one bit distinguishes link or not) you will
see 0xEAxxxxxx for a branch (0xEBxxxxxx for a branch link), where xxxxxx
is a value that indicates the offset in words from the address of the
branch plus 8.

    805c: eb000094 bl 82b4 <notmain>

0x82b4 - 0x805C = 0x258
0x258 / 4 = 0x96
0x96 - 2 = 0x94
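
The same arithmetic as a small C sketch, for ARM (not Thumb) branches with
the always condition only. encode_branch is a made-up helper name, nothing
in this example's sources is called that.

```c
#include <stdint.h>

/* Hypothetical helper: encode an ARM b or bl with condition "always".
   from is the address of the branch, to is the destination. */
uint32_t encode_branch(uint32_t from, uint32_t to, int link)
{
    /* The pc reads as the branch address plus 8, and the offset is
       counted in words, so subtract 8 and divide by 4. Unsigned
       wraparound plus the 24 bit mask handles backward branches. */
    uint32_t offset = ((to - from - 8) >> 2) & 0x00FFFFFFu;

    return (link ? 0xEB000000u : 0xEA000000u) | offset;
}
```

encode_branch(0x805C, 0x82B4, 1) gives 0xEB000094, matching the
disassembly line above.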

The short answer is I am placing a bunch of branch plus 0x8000 instructions
at the first locations in memory, so 0x0000 will branch to 0x8000, 0x0004
will branch to 0x8004 and so on. Then I have a more proper table at 0x8000.
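
A sketch of what those low memory words look like, in C. It assumes only
the eight ARM exception vectors at 0x00 through 0x1C are filled in (the
real code may write more), and build_low_vectors is a made-up name.

```c
#include <stdint.h>

#define LOW_VECTORS 8   /* assumption: the eight vectors at 0x00..0x1C */

/* Hypothetical helper: fill words[] with "b (address + 0x8000)" for
   each low vector address. */
void build_low_vectors(uint32_t words[LOW_VECTORS])
{
    for (uint32_t i = 0; i < LOW_VECTORS; i++) {
        uint32_t from = i * 4;          /* 0x0000, 0x0004, ... */
        uint32_t to = from + 0x8000;    /* matching entry at 0x8000 */

        /* ARM branch: offset in words relative to the branch plus 8 */
        words[i] = 0xEA000000u | (((to - from - 8) >> 2) & 0x00FFFFFFu);
    }
}
```

Because the distance is the same for every entry, every word comes out as
0xEA001FFE. On the target each word would then be stored at its address
with something like PUT32.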

.globl _start
_start:
    b reset
    b undef
    bl other
    bl other
    bl other
    bl other
    bl other
    bl other
    bl other
    bl other
    bl other
    bl other
    bl other
    bl other
    bl other
    bl other

Yes, there are too many items there; it doesn't hurt. I used the other
routine (at first I didn't have the b undef in there, all of the lines
other than reset were bl other) to figure out which exception was being
hit. It turned out to be the undefined instruction exception.

I then simply had the undef handler return as if the instruction had
been handled

undef:
    movs pc,lr

You begin to see what mstorsjo found. The instruction following the
bad operation is messed up.

Through more experiments I found more interesting information. In
an old ARM ARM at least, the undefined exception is supposed to fill in
the link register with the address of the instruction after the
undefined one. To verify that, based on the disassembly

000080b4 <m4add>:
    80b4: ee010a10 vmov s2, r0

before calling m4add I put this line in notmain()

PUT32(0x80B4,0xFFFFFFFF);

You indeed get the bad instruction 0xFFFFFFFF.

So remove the 0xFFFFFFFF thing and let the program run.

    80bc: ee012a10 vmov s2, r2
    80c0: ee013a90 vmov s3, r3
    80c4: ee302a20 vadd.f32 s4, s0, s1
    80c8: ee712a21 vadd.f32 s5, s2, s3
    80cc: ee120a90 vmov r0, s5
    80d0: e12fff1e bx lr

00000BAD
EE712A21

which is the second add...strange. Rearrange the vmovs so that the
second add gets the denormals

    vmov s2,r0
    vmov s3,r1
    vmov s0,r2
    vmov s1,r3

00000BAD
EE120A90

    80c4: ee302a20 vadd.f32 s4, s0, s1
    80c8: ee712a21 vadd.f32 s5, s2, s3
    80cc: ee120a90 vmov r0, s5

which is the floating point instruction after the bad one. Interesting.
Let's add some nops

    nop
    nop
    vadd.f32 s4,s0,s1
    nop
    nop
    vadd.f32 s5,s2,s3
    nop
    nop
    vmov r0,s5
    nop
    nop

00000BAD
EE120A90

No change, interesting. Let's put back the vmovs

00000BAD
EE712A21

So what it appears to be doing is giving you the address of the floating
point instruction after the problem.

One more experiment

.globl m4add
m4add:
    vmov s0,r0
    vmov s1,r1
    vmov s2,r2
    vmov s3,r3
    vadd.f32 s4,s0,s1
    b skipper
    vmov r0,s5
    bx lr

skipper:
    vadd.f32 s5,s2,s3
    vmov r0,s5
    bx lr

00000BAD
EE712A21

    80c4: ee302a20 vadd.f32 s4, s0, s1
    80c8: ea000001 b 80d4 <skipper>
    80cc: ee120a90 vmov r0, s5
    80d0: e12fff1e bx lr

000080d4 <skipper>:
    80d4: ee712a21 vadd.f32 s5, s2, s3
    80d8: ee120a90 vmov r0, s5
    80dc: e12fff1e bx lr

Good luck writing a handler for that. If you work your way backward
from 0x80d4 the prior float instruction is the vmov at 0x80cc, not the
vadd that caused the problem. Basically good luck figuring out the prior
float instruction based on the link register given.

So putting it all back together

hexstring(0x12345678);
hexstring(m4add(0x3F800000,0x00000000,0x3F801230,0x3F801110));
hexstring(m4add(0x00012345,0x00000111,0x3F801230,0x3F801110));
hexstring(m4add(0x00000111,0x00012345,0x3F801230,0x3F801110));
hexstring(m4add(0x3F801230,0x3F801110,0x00000111,0x00012345));
hexstring(m4add(0x3F800000,0x00000000,0x3F801230,0x3F801110));
hexstring(0x22222222);
hexstring(m4add2(0x3F801230,0x3F801110,0x3F800000,0x00000000));
hexstring(m4add2(0x00000111,0x00012345,0x3F801230,0x3F801110));
hexstring(m4add2(0x3F801230,0x3F801110,0x00000111,0x00012345));
hexstring(0x12345678);

12345678
400011A0
00012345
00000111
3F801230
3F800000
22222222
3F801230
00000111
3F801230
12345678

From the ARM1176jzfs TRM

The VFP11 coprocessor handles exceptions, other than inexact exceptions,
imprecisely with respect to both the state of the ARM11 processor and
the state of the VFP11 coprocessor. It detects an exceptional instruction
after the instruction passes the point for exception handling in the
ARM11 processor. It then enters the exceptional state and signals the
presence of an exception by refusing to accept a subsequent VFP
instruction. The instruction that triggers exception handling bounces
to the ARM11 processor. The bounced instruction is not necessarily the
instruction immediately following the exceptional instruction. Depending
on sequence of instructions that follow, the bounce can occur several
instructions later.

So this means that the exception showing up on a later floating point
instruction is not unexpected.
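
As a toy model of what the experiments above showed, the C sketch below
assumes the bounce lands on the next VFP instruction in execution order
after the exceptional one. That is a simplification: the TRM only says the
bounce can occur several instructions later. find_bounced and the is_vfp
array are made up for illustration.

```c
#include <stddef.h>

/* Toy model only. is_vfp[i] is nonzero when instruction i (in
   execution order) is a VFP instruction. Returns the index of the
   instruction that bounces, or -1 if no VFP instruction follows the
   exceptional one. */
int find_bounced(const int *is_vfp, size_t count, size_t exceptional)
{
    for (size_t i = exceptional + 1; i < count; i++) {
        if (is_vfp[i])
            return (int)i;
    }
    return -1;
}
```

For the original m4add (four vmovs, two vadds, a vmov, then bx lr) an
exceptional first vadd at index 4 gives index 5, the second vadd, and an
exceptional second vadd gives index 6, the vmov r0,s5. Nops in between do
not change which instruction is reported.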

The exception bit in the FPEXC register is set, and until it is cleared
the fpu appears not to work. So I modified the undefined handler
to restore the FPEXC to have the SIMD/FPU enabled and the exception
bit cleared, and to return to the bounced instruction (lr minus 4) so
that it gets executed this time.

undef:
    push {r0}
    mov r0,#0x40000000
    fmxr fpexc,r0
    pop {r0}
    subs pc,lr,#4
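
The 0x40000000 constant is easier to read with the two FPEXC bits named.
A minimal C sketch, where the macro and function names are my own labels
and not something from the example sources:

```c
#include <stdint.h>

#define FPEXC_EX 0x80000000u   /* bit 31: exception flag */
#define FPEXC_EN 0x40000000u   /* bit 30: VFP enable */

/* Nonzero when the VFP is sitting in the exceptional state. */
int fpexc_exception_pending(uint32_t fpexc)
{
    return (fpexc & FPEXC_EX) != 0;
}

/* Value the handler writes back: enabled, exception flag clear. */
uint32_t fpexc_resume_value(void)
{
    return FPEXC_EN;
}
```

So a FPEXC reading of 0xC0000000 means enabled with an exception pending,
and writing 0x40000000 leaves it enabled with the exception cleared.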

In order to properly preserve the r0 register I went ahead and set up
the undefined mode stack pointer up front.

    ;@ (PSR_UND_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
    mov r0,#0xDB
    msr cpsr_c,r0
    mov sp,#0x00100000

    ;@ (PSR_SVC_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
    mov r0,#0xD3
    msr cpsr_c,r0
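
The 0xDB and 0xD3 values are just the pieces named in the comments ORed
together. A small C sketch of that, using the names from the comments and
the values from the ARM CPSR layout (cpsr_control is a made-up helper):

```c
#include <stdint.h>

#define PSR_UND_MODE 0x1Bu   /* mode bits for Undefined mode */
#define PSR_SVC_MODE 0x13u   /* mode bits for Supervisor mode */
#define PSR_FIQ_DIS  0x40u   /* F bit: FIQ masked */
#define PSR_IRQ_DIS  0x80u   /* I bit: IRQ masked */

/* Hypothetical helper: control byte for a mode with both interrupts masked. */
uint32_t cpsr_control(uint32_t mode)
{
    return mode | PSR_FIQ_DIS | PSR_IRQ_DIS;
}
```

cpsr_control(PSR_UND_MODE) is 0xDB and cpsr_control(PSR_SVC_MODE) is 0xD3.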

So now it works better

hexstring(0x12345678);
hexstring(m4add(0x3F800000,0x00000000,0x3F801230,0x3F801110));
hexstring(m4add(0x00012345,0x00000111,0x3F801230,0x3F802220));
hexstring(m4add(0x00000111,0x00012345,0x3F801230,0x3F803330));
hexstring(m4add(0x3F801230,0x3F801110,0x00000111,0x00012345));
hexstring(m4add(0x3F800000,0x00000000,0x3F801230,0x3F804440));
hexstring(0x22222222);
hexstring(m4add2(0x3F801230,0x3F805550,0x3F800000,0x00000000));
hexstring(m4add2(0x00000111,0x00012345,0x3F801230,0x3F806660));
hexstring(m4add2(0x3F801230,0x3F807770,0x00000111,0x00012345));
hexstring(0x12345678);

12345678
400011A0
40001A28
400022B0
400022B0 <-- stale result
40002B38
22222222
400033C0
400033C0 <-- stale result
400044D0
12345678

So the non-denormal operations worked. The denormal operations don't
actually execute, so the floating point register is not changed. To
demonstrate this, m4vmov loads a known pattern into s4 and s5 before
some of the calls.

.globl m4vmov
m4vmov:
    vmov s4,r0
    vmov s5,r0
    bx lr

hexstring(0x12345678);
hexstring(m4add(0x3F800000,0x00000000,0x3F801230,0x3F801110));
hexstring(m4add(0x00012345,0x00000111,0x3F801230,0x3F802220));
m4vmov(0xABCDABCD);
hexstring(m4add(0x00000111,0x00012345,0x3F801230,0x3F803330));
m4vmov(0xABCDABCD);
hexstring(m4add(0x3F801230,0x3F801110,0x00000111,0x00012345));
m4vmov(0xABCDABCD);
hexstring(m4add(0x3F800000,0x00000000,0x3F801230,0x3F804440));
hexstring(0x22222222);
hexstring(m4add2(0x3F801230,0x3F805550,0x3F800000,0x00000000));
m4vmov(0xABCDABCD);
hexstring(m4add2(0x00000111,0x00012345,0x3F801230,0x3F806660));
m4vmov(0xABCDABCD);
hexstring(m4add2(0x3F801230,0x3F807770,0x00000111,0x00012345));
hexstring(0x12345678);

12345678
400011A0
40001A28
400022B0
ABCDABCD <-- stale
40002B38
22222222
400033C0
ABCDABCD <-- stale
400044D0
12345678