adding asmdelay experiment, one of many reasons why benchmarks can't be trusted.
boards/pizero/asmdelay/Makefile
@@ -0,0 +1,35 @@

ARMGNU ?= arm-none-eabi
#ARMGNU ?= arm-linux-gnueabi

COPS = -Wall -O2 -nostdlib -nostartfiles -ffreestanding

all : asmdelay.bin

clean :
	rm -f *.o
	rm -f *.bin
	rm -f *.hex
	rm -f *.elf
	rm -f *.list
	rm -f *.img
	rm -f *.bc
	rm -f *.clang.s

start.o : start.s
	$(ARMGNU)-as start.s -o start.o

asmdelay.o : asmdelay.c
	$(ARMGNU)-gcc $(COPS) -c asmdelay.c -o asmdelay.o

periph.o : periph.c
	$(ARMGNU)-gcc $(COPS) -c periph.c -o periph.o

asmdelay.bin : memmap start.o periph.o asmdelay.o
	$(ARMGNU)-ld start.o periph.o asmdelay.o -T memmap -o asmdelay.elf
	$(ARMGNU)-objdump -D asmdelay.elf > asmdelay.list
	$(ARMGNU)-objcopy asmdelay.elf -O ihex asmdelay.hex
	$(ARMGNU)-objcopy asmdelay.elf -O binary asmdelay.bin

boards/pizero/asmdelay/README
@@ -0,0 +1,249 @@

See the top level README file for more information on documentation
and how to run these programs.

This demonstrates the performance differences of a two instruction
loop. The machine code is the same, but where you put it in memory,
with and without the cache and branch prediction, makes a vast
difference in performance.

.globl ASMDELAY
ASMDELAY:
    subs r0,r0,#1
    bne ASMDELAY
    bx lr

The two instructions in the loop are the subs and bne, so the
differences do not come from compilers or compiler options. The same
two instructions run 131 thousand times (0x20000 passes through the
loop). Should I explain this or my theories on this or not?

Here is the punch line

min      max      difference
00016DDE 003E025D 003C947F

Yes! The minimum is 0.71 timer ticks per loop on average, less than one
tick per instruction! How is that possible? (The counts come from the
ARM timer free running counter, which need not tick at the CPU clock
rate.)

And the worst case I could get was 43 times slower! How could those
two instructions on the same chip/board execute at such vastly
different speeds? Do you really want to know just how bogus benchmarks
really are? This is only a small taste. Apply these simple things
to any benchmark, then add the differences between compilers given the
same source code. Many folks don't realize that the same source code
can execute several times faster or slower simply by changing compiler
options. Likewise two different compilers, or two versions of the same
compiler, can and do produce different results. With source
distributions like gcc or llvm, just how the compiler itself was built
can change its output, even when the different builds are given the
same command line options.

Simple alignment tricks like adding or removing a single instruction
in the right place will move the rest of the binary up or down in
memory, changing where it falls within what I call fetch lines or
cache lines (two separate but similar terms).

I have performed this stunt many times in many ways, and there are
things that can be done to widen the performance gap further. Adding
some magic number of nops between the subs and bne should help branch
prediction and save time on the fast side, and cost more fetches per
loop on the slow side. Not going to do that today, these two
instructions are enough.

This time around I am using self modifying code. Traditionally I would
re-assemble with more or fewer nops out front of the loop under test
to adjust its alignment.

Using the disassembly of the loop in start.s

0000802c <ASMDELAY>:
    802c: e2500001 subs r0, r0, #1
    8030: 1afffffd bne 802c <ASMDELAY>
    8034: e12fff1e bx lr

We can see the raw instructions. The conditional branch is pc relative,
not absolute, so the code is position independent and the same three
words can be written to any address as is.

    PUT32(ra+0x00,0xe2500001);
    PUT32(ra+0x04,0x1afffffd);
    PUT32(ra+0x08,0xe12fff1e);

I learned something new on this one. Another ARM was doing fine, but
the Raspberry Pi Zero was hanging with branch prediction enabled. I
didn't know there was a prefetch flush you needed to do. I went way
overboard and used flushes, DMBs and DSBs liberally, needed or not.
The prefetch flush is what made the Pi work.

Should I dive into this or not? Hmm...

12345678 12345678 12345678 12345678 12345678
0019F158
0019F149
0019F0FE
0019F142
0019F1C6
00045C3F
00045C28
00045C27
00045C28
0000004A
00000031
00000031
00000031
00000041
00000031
00000031
00000031
C0000000 C0000000 C0000000 C0000000
00050078
00050078
C0006000 002200D2 002200D2 002200D2 00000000
C0006000 002200A6 002200A6 002200D2 0000002C
C0006000 00220145 002200A6 00220145 0000009F
C0006008 00220173 002200A6 00220173 000000CD
C0006010 00280096 002200A6 00280096 0005FFF0
C0006010 00280104 002200A6 00280104 0006005E
C000601C 003E015C 002200A6 003E015C 001C00B6
C000601C 003E01AA 002200A6 003E01AA 001C0104
C000602C 0022009D 0022009D 003E01AA 001C010D
C000603C 003E01BC 0022009D 003E01BC 001C011F
C000603C 003E0211 0022009D 003E0211 001C0174
C0006060 0022005E 0022005E 003E0211 001C01B3
C00060FC 003E024D 0022005E 003E024D 001C01EF
00050078
00050878
C0006000 001E0119 001E0119 001E0119 00000000
C0006000 001E00FB 001E00FB 001E0119 0000001E
C0006000 001E00C0 001E00C0 001E0119 00000059
C0006004 00200101 001E00C0 00200101 00020041
C0006008 001E00AD 001E00AD 00200101 00020054
C000600C 0020015F 001E00AD 0020015F 000200B2
C0006010 001E00A0 001E00A0 0020015F 000200BF
C0006014 00200177 001E00A0 00200177 000200D7
C000601C 003C010A 001E00A0 003C010A 001E006A
C000601C 003C01C0 001E00A0 003C01C0 001E0120
C0006028 001E008D 001E008D 003C01C0 001E0133
C000603C 003C01EC 001E008D 003C01EC 001E015F
C0006040 001E0065 001E0065 003C01EC 001E0187
C000605C 003C0252 001E0065 003C0252 001E01ED
C000609C 003C0258 001E0065 003C0258 001E01F3
C00060B0 001E0064 001E0064 003C0258 001E01F4
00050878
00050078
C0006000 0005B72B 0005B72B 0005B72B 00000000
C0006000 0005B6F1 0005B6F1 0005B72B 0000003A
C000601C 0005B731 0005B6F1 0005B731 00000040
C0006058 0005B732 0005B6F1 0005B732 00000041
C0006078 0005B73B 0005B6F1 0005B73B 0000004A
00051078
00051878
C0006000 00016E12 00016E12 00016E12 00000000
C0006000 00016DDE 00016DDE 00016E12 00000034
C0006004 000224E4 00016DDE 000224E4 0000B706
C000601C 000224F0 00016DDE 000224F0 0000B712
00051878
00051078
80000000 80000000 80000000 80000000
00050078
00050078
80006000 002200E1 002200E1 002200E1 00000000
80006000 002200C5 002200C5 002200E1 0000001C
80006000 002200B8 002200B8 002200E1 00000029
80006000 002200E7 002200B8 002200E7 0000002F
80006004 002200E9 002200B8 002200E9 00000031
80006004 002200AE 002200AE 002200E9 0000003B
80006004 0022018A 002200AE 0022018A 000000DC
80006008 00220075 00220075 0022018A 00000115
8000600C 0022005F 0022005F 0022018A 0000012B
80006010 00280105 0022005F 00280105 000600A6
8000601C 003E0168 0022005F 003E0168 001C0109
8000601C 003E01B7 0022005F 003E01B7 001C0158
8000603C 003E024B 0022005F 003E024B 001C01EC
800060FC 003E025A 0022005F 003E025A 001C01FB
00050078
00050878
80006000 001E00B2 001E00B2 001E00B2 00000000
80006000 001E00CD 001E00B2 001E00CD 0000001B
80006000 001E0158 001E00B2 001E0158 000000A6
80006004 00200102 001E00B2 00200102 00020050
80006004 0020010F 001E00B2 0020010F 0002005D
80006004 002001FC 001E00B2 002001FC 0002014A
80006008 001E006F 001E006F 002001FC 0002018D
80006008 001E005C 001E005C 002001FC 000201A0
8000601C 003C0161 001E005C 003C0161 001E0105
8000601C 003C0267 001E005C 003C0267 001E020B
8000603C 003C026C 001E005C 003C026C 001E0210
80006048 001E005B 001E005B 003C026C 001E0211
00050878
00050078
80006000 0005B711 0005B711 0005B711 00000000
80006000 0005B6F3 0005B6F3 0005B711 0000001E
80006004 0005B721 0005B6F3 0005B721 0000002E
80006018 0005B732 0005B6F3 0005B732 0000003F
80006018 0005B6F1 0005B6F1 0005B732 00000041
80006058 0005B733 0005B6F1 0005B733 00000042
00051078
00051878
80006000 00016E0A 00016E0A 00016E0A 00000000
80006000 00016DDF 00016DDF 00016E0A 0000002B
80006000 00016DDE 00016DDE 00016E0A 0000002C
80006004 000224E4 00016DDE 000224E4 0000B706
8000601C 000224F0 00016DDE 000224F0 0000B712
00051878
00051078
40000000 40000000 40000000 40000000
00050078
00050078
40006000 002200C8 002200C8 002200C8 00000000
40006000 00220118 002200C8 00220118 00000050
40006004 002200BB 002200BB 00220118 0000005D
40006004 00220190 002200BB 00220190 000000D5
40006008 002200A2 002200A2 00220190 000000EE
4000600C 00220073 00220073 00220190 0000011D
40006010 0028009C 00220073 0028009C 00060029
40006010 002800AF 00220073 002800AF 0006003C
40006010 002800BC 00220073 002800BC 00060049
40006014 002800DD 00220073 002800DD 0006006A
4000601C 003E014D 00220073 003E014D 001C00DA
4000601C 003E015F 00220073 003E015F 001C00EC
4000601C 003E0175 00220073 003E0175 001C0102
4000601C 003E0255 00220073 003E0255 001C01E2
4000603C 003E025D 00220073 003E025D 001C01EA
400060AC 0022005F 0022005F 003E025D 001C01FE
00050078
00050878
40006000 001E010C 001E010C 001E010C 00000000
40006000 001E0109 001E0109 001E010C 00000003
40006000 001E00DD 001E00DD 001E010C 0000002F
40006004 002000D4 001E00DD 002000D4 0001FFF7
40006004 00200103 001E00DD 00200103 00020026
40006004 00200196 001E00DD 00200196 000200B9
40006008 001E00AD 001E00AD 00200196 000200E9
40006010 001E007C 001E007C 00200196 0002011A
4000601C 003C025F 001E007C 003C025F 001E01E3
40006020 001E0073 001E0073 003C025F 001E01EC
40006020 001E006F 001E006F 003C025F 001E01F0
4000603C 003C0267 001E006F 003C0267 001E01F8
40006040 001E0069 001E0069 003C0267 001E01FE
400060B0 001E0066 001E0066 003C0267 001E0201
400060D0 001E0057 001E0057 003C0267 001E0210
00050878
00050078
40006000 0005B712 0005B712 0005B712 00000000
40006000 0005B6F3 0005B6F3 0005B712 0000001F
40006000 0005B6F1 0005B6F1 0005B712 00000021
40006008 0005B716 0005B6F1 0005B716 00000025
4000600C 0005B71E 0005B6F1 0005B71E 0000002D
40006018 0005B729 0005B6F1 0005B729 00000038
4000601C 0005B72F 0005B6F1 0005B72F 0000003E
4000605C 0005B730 0005B6F1 0005B730 0000003F
40006078 0005B733 0005B6F1 0005B733 00000042
00051078
00051878
40006000 00016E0A 00016E0A 00016E0A 00000000
40006000 00016DDE 00016DDE 00016E0A 0000002C
40006004 000224E5 00016DDE 000224E5 0000B707
4000601C 000224F0 00016DDE 000224F0 0000B712
4000603C 000224F2 00016DDE 000224F2 0000B714
00051878
00051078
00016DDE 003E025D 003C947F
12345678
boards/pizero/asmdelay/asmdelay.c
@@ -0,0 +1,289 @@

//-------------------------------------------------------------------
// Copyright (C) 2010 Netronome Systems
//-------------------------------------------------------------------

//d6004024 <ASMDELAY>:
//d6004024: e2500001 subs r0, r0, #1
//d6004028: 1afffffd bne d6004024 <ASMDELAY>
//d600402c: e12fff1e bx lr

#define ARM_TIMER_LOD 0x2000B400
#define ARM_TIMER_VAL 0x2000B404
#define ARM_TIMER_CTL 0x2000B408
#define ARM_TIMER_DIV 0x2000B41C
#define ARM_TIMER_CNT 0x2000B420


extern void PUT32 ( unsigned int, unsigned int );
extern unsigned int GET32 ( unsigned int );
extern void ASMDELAY ( unsigned int );
extern void uart_init(void);
extern void hexstrings ( unsigned int );
extern void hexstring ( unsigned int );
extern void HOP ( unsigned int, unsigned int );

extern unsigned int GET_CONTROL ( void );
extern void SET_CONTROL ( unsigned int );
extern void CLR_CONTROL ( unsigned int );
extern void start_l1cache ( void );
extern void stop_l1cache ( void );
extern void invalidate_l1cache ( void );
extern void PrefetchFlush ( void );


unsigned int gmin,gmax;

void do_it ( unsigned int base )
{
    unsigned int ra;
    unsigned int beg,end;
    unsigned int rb;
    unsigned int min,max;
    unsigned int rc;

    stop_l1cache(); //just in case
    invalidate_l1cache();


    hexstrings(base);
    hexstrings(base);
    hexstrings(base);
    hexstring(base);


    //pass 1: cache off, control register bit 11 (branch prediction) cleared
    hexstring(GET_CONTROL());
    CLR_CONTROL(1<<11);
    hexstring(GET_CONTROL());

    max=0;
    min=0; min--; //wraps to 0xFFFFFFFF
    for(ra=base+0x6000;ra<base+0x6100;ra+=4)
    {
        unsigned int flag;

        //copy the three instruction ASMDELAY routine to address ra
        PUT32(ra+0x00,0xe2500001); //subs r0,r0,#1
        PUT32(ra+0x04,0x1afffffd); //bne back to the subs
        PUT32(ra+0x08,0xe12fff1e); //bx lr
        GET32(ra+0x08);
        PrefetchFlush();

        for(rc=0;rc<4;rc++)
        {
            beg=GET32(ARM_TIMER_CNT);
            HOP(0x20000,ra); //run the copy at ra for 0x20000 loops
            end=GET32(ARM_TIMER_CNT);
            rb=end-beg;
            flag=0;
            if(rb>gmax) gmax=rb;
            if(rb<gmin) gmin=rb;
            if(rb>max) { flag++; max=rb; }
            if(rb<min) { flag++; min=rb; }
            if(flag)
            {
                //only print runs that set a new min or max
                hexstrings(ra);
                hexstrings(rb);
                hexstrings(min);
                hexstrings(max);
                hexstring(max-min);
            }
        }
    }

    //pass 2: cache off, control register bit 11 (branch prediction) set
    hexstring(GET_CONTROL());
    SET_CONTROL(1<<11);
    hexstring(GET_CONTROL());
    if(1)
    {
        max=0;
        min=0; min--;
        for(ra=base+0x6000;ra<base+0x6100;ra+=4)
        {
            unsigned int flag;

            PUT32(ra+0x00,0xe2500001);
            PUT32(ra+0x04,0x1afffffd);
            PUT32(ra+0x08,0xe12fff1e);
            GET32(ra+0x08);
            PrefetchFlush();


            for(rc=0;rc<4;rc++)
            {
                beg=GET32(ARM_TIMER_CNT);
                HOP(0x20000,ra);
                end=GET32(ARM_TIMER_CNT);
                rb=end-beg;
                flag=0;
                if(rb>gmax) gmax=rb;
                if(rb<gmin) gmin=rb;
                if(rb>max) { flag++; max=rb; }
                if(rb<min) { flag++; min=rb; }
                if(flag)
                {
                    hexstrings(ra);
                    hexstrings(rb);
                    hexstrings(min);
                    hexstrings(max);
                    hexstring(max-min);
                }
            }
        }
    }
    //pass 3: cache on, control register bit 11 (branch prediction) cleared
    hexstring(GET_CONTROL());
    CLR_CONTROL(1<<11);
    hexstring(GET_CONTROL());
    ra=GET32(ARM_TIMER_CNT);


    start_l1cache();

    max=0;
    min=0; min--;
    for(ra=base+0x6000;ra<base+0x6100;ra+=4)
    {
        unsigned int flag;

        PUT32(ra+0x00,0xe2500001);
        PUT32(ra+0x04,0x1afffffd);
        PUT32(ra+0x08,0xe12fff1e);
        GET32(ra+0x08);
        PrefetchFlush();

        invalidate_l1cache(); //the first of the four runs starts cold
        for(rc=0;rc<4;rc++)
        {
            beg=GET32(ARM_TIMER_CNT);
            HOP(0x20000,ra);
            end=GET32(ARM_TIMER_CNT);
            rb=end-beg;
            flag=0;
            if(rb>gmax) gmax=rb;
            if(rb<gmin) gmin=rb;
            if(rb>max) { flag++; max=rb; }
            if(rb<min) { flag++; min=rb; }
            if(flag)
            {
                hexstrings(ra);
                hexstrings(rb);
                hexstrings(min);
                hexstrings(max);
                hexstring(max-min);
            }
        }
    }

    //pass 4: cache on, control register bit 11 (branch prediction) set
    hexstring(GET_CONTROL());
    SET_CONTROL(1<<11);
    hexstring(GET_CONTROL());

    max=0;
    min=0; min--;
    for(ra=base+0x6000;ra<base+0x6100;ra+=4)
    {
        unsigned int flag;

        PUT32(ra+0x00,0xe2500001);
        PUT32(ra+0x04,0x1afffffd);
        PUT32(ra+0x08,0xe12fff1e);
        GET32(ra+0x08);
        PrefetchFlush();

        invalidate_l1cache();
        for(rc=0;rc<4;rc++)
        {
            beg=GET32(ARM_TIMER_CNT);
            HOP(0x20000,ra);
            end=GET32(ARM_TIMER_CNT);
            rb=end-beg;
            flag=0;
            if(rb>gmax) gmax=rb;
            if(rb<gmin) gmin=rb;
            if(rb>max) { flag++; max=rb; }
            if(rb<min) { flag++; min=rb; }
            if(flag)
            {
                hexstrings(ra);
                hexstrings(rb);
                hexstrings(min);
                hexstrings(max);
                hexstring(max-min);
            }
        }
    }


    //leave branch prediction and the cache off for the next caller
    hexstring(GET_CONTROL());
    CLR_CONTROL(1<<11);
    hexstring(GET_CONTROL());

    stop_l1cache();
}
//-------------------------------------------------------------------------
int notmain ( void )
{
    //unsigned int ra,rb;
    unsigned int ra;
    unsigned int beg,end;

    //uart_init();
    hexstrings(0x12345678);
    hexstrings(0x12345678);
    hexstrings(0x12345678);
    hexstrings(0x12345678);
    hexstring(0x12345678);

    gmax=0;
    gmin=0; gmin--;

    PUT32(ARM_TIMER_CTL,0x00000000);
    PUT32(ARM_TIMER_CTL,0x00000200);

    beg=GET32(ARM_TIMER_CNT);
    ASMDELAY(100000);
    end=GET32(ARM_TIMER_CNT);
    hexstring(end-beg);

    for(ra=0;ra<4;ra++)
    {
        beg=GET32(ARM_TIMER_CNT);
        ASMDELAY(100000);
        end=GET32(ARM_TIMER_CNT);
        hexstring(end-beg);
    }
    start_l1cache();
    for(ra=0;ra<4;ra++)
    {
        beg=GET32(ARM_TIMER_CNT);
        ASMDELAY(100000);
        end=GET32(ARM_TIMER_CNT);
        hexstring(end-beg);
    }
    invalidate_l1cache();
    for(ra=0;ra<4;ra++)
    {
        beg=GET32(ARM_TIMER_CNT);
        ASMDELAY(10);
        end=GET32(ARM_TIMER_CNT);
        hexstring(end-beg);
    }
    invalidate_l1cache();
    for(ra=0;ra<4;ra++)
    {
        beg=GET32(ARM_TIMER_CNT);
        ASMDELAY(10);
        end=GET32(ARM_TIMER_CNT);
        hexstring(end-beg);
    }
    stop_l1cache();

    do_it(0xC0000000);
    do_it(0x80000000);
    do_it(0x40000000);

    hexstrings(gmin); hexstrings(gmax); hexstring(gmax-gmin);
    hexstring(0x12345678);

    return(0);
}
boards/pizero/asmdelay/memmap
@@ -0,0 +1,12 @@

MEMORY
{
    ram : ORIGIN = 0x8000, LENGTH = 0x1000
}

SECTIONS
{
    .text : { *(.text*) } > ram
    .bss : { *(.bss*) } > ram
}
boards/pizero/asmdelay/periph.c
@@ -0,0 +1,152 @@

//-------------------------------------------------------------------------
//-------------------------------------------------------------------------

#define PBASE 0x20000000

extern void PUT32 ( unsigned int, unsigned int );
extern void PUT16 ( unsigned int, unsigned int );
extern void PUT8 ( unsigned int, unsigned int );
extern unsigned int GET32 ( unsigned int );
extern void dummy ( unsigned int );

#define ARM_TIMER_CTL (PBASE+0x0000B408)
#define ARM_TIMER_CNT (PBASE+0x0000B420)

#define GPFSEL1 (PBASE+0x00200004)
#define GPSET0 (PBASE+0x0020001C)
#define GPCLR0 (PBASE+0x00200028)
#define GPPUD (PBASE+0x00200094)
#define GPPUDCLK0 (PBASE+0x00200098)

#define AUX_ENABLES (PBASE+0x00215004)
#define AUX_MU_IO_REG (PBASE+0x00215040)
#define AUX_MU_IER_REG (PBASE+0x00215044)
#define AUX_MU_IIR_REG (PBASE+0x00215048)
#define AUX_MU_LCR_REG (PBASE+0x0021504C)
#define AUX_MU_MCR_REG (PBASE+0x00215050)
#define AUX_MU_LSR_REG (PBASE+0x00215054)
#define AUX_MU_MSR_REG (PBASE+0x00215058)
#define AUX_MU_SCRATCH (PBASE+0x0021505C)
#define AUX_MU_CNTL_REG (PBASE+0x00215060)
#define AUX_MU_STAT_REG (PBASE+0x00215064)
#define AUX_MU_BAUD_REG (PBASE+0x00215068)

//GPIO14 TXD0 and TXD1
//GPIO15 RXD0 and RXD1
//------------------------------------------------------------------------
unsigned int uart_lcr ( void )
{
    //despite the name this returns the line status register (LSR)
    return(GET32(AUX_MU_LSR_REG));
}
//------------------------------------------------------------------------
unsigned int uart_recv ( void )
{
    //wait until LSR bit 0 says a received byte is available
    while(1)
    {
        if(GET32(AUX_MU_LSR_REG)&0x01) break;
    }
    return(GET32(AUX_MU_IO_REG)&0xFF);
}
//------------------------------------------------------------------------
unsigned int uart_check ( void )
{
    if(GET32(AUX_MU_LSR_REG)&0x01) return(1);
    return(0);
}
//------------------------------------------------------------------------
void uart_send ( unsigned int c )
{
    //wait until LSR bit 5 says the transmitter can take another byte
    while(1)
    {
        if(GET32(AUX_MU_LSR_REG)&0x20) break;
    }
    PUT32(AUX_MU_IO_REG,c);
}
//------------------------------------------------------------------------
void uart_flush ( void )
{
    while(1)
    {
        if((GET32(AUX_MU_LSR_REG)&0x100)==0) break;
    }
}
//------------------------------------------------------------------------
void hexstrings ( unsigned int d )
{
    //unsigned int ra;
    unsigned int rb;
    unsigned int rc;

    //print d as eight hex digits, most significant first, then a space
    rb=32;
    while(1)
    {
        rb-=4;
        rc=(d>>rb)&0xF;
        if(rc>9) rc+=0x37; else rc+=0x30; //10..15 become 'A'..'F'
        uart_send(rc);
        if(rb==0) break;
    }
    uart_send(0x20);
}
//------------------------------------------------------------------------
void hexstring ( unsigned int d )
{
    //same as hexstrings but ends the line with CR LF
    hexstrings(d);
    uart_send(0x0D);
    uart_send(0x0A);
}
//------------------------------------------------------------------------
void uart_init ( void )
{
    unsigned int ra;

    PUT32(AUX_ENABLES,1);
    PUT32(AUX_MU_IER_REG,0);
    PUT32(AUX_MU_CNTL_REG,0);
    PUT32(AUX_MU_LCR_REG,3);
    PUT32(AUX_MU_MCR_REG,0);
    PUT32(AUX_MU_IER_REG,0);
    PUT32(AUX_MU_IIR_REG,0xC6);
    PUT32(AUX_MU_BAUD_REG,270);
    ra=GET32(GPFSEL1);
    ra&=~(7<<12); //gpio14
    ra|=2<<12; //alt5
    ra&=~(7<<15); //gpio15
    ra|=2<<15; //alt5
    PUT32(GPFSEL1,ra);
    PUT32(GPPUD,0);
    for(ra=0;ra<150;ra++) dummy(ra);
    PUT32(GPPUDCLK0,(1<<14)|(1<<15));
    for(ra=0;ra<150;ra++) dummy(ra);
    PUT32(GPPUDCLK0,0);
    PUT32(AUX_MU_CNTL_REG,3);
}
//------------------------------------------------------------------------
void timer_init ( void )
{
    //0xF9+1 = 250
    //250MHz/250 = 1MHz
    PUT32(ARM_TIMER_CTL,0x00F90000);
    PUT32(ARM_TIMER_CTL,0x00F90200);
}
//-------------------------------------------------------------------------
unsigned int timer_tick ( void )
{
    return(GET32(ARM_TIMER_CNT));
}
//-------------------------------------------------------------------------
//-------------------------------------------------------------------------


//-------------------------------------------------------------------------
//
// Copyright (c) 2012 David Welch dwelch@dwelch.com
//
// Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
//
//-------------------------------------------------------------------------
boards/pizero/asmdelay/start.s
@@ -0,0 +1,136 @@

;@-------------------------------------------------------------------------
;@-------------------------------------------------------------------------

.globl _start
_start:
    mov sp,#0x8000
    bl notmain
hang: b hang

.globl PUT32
PUT32:
    str r1,[r0]
    bx lr

.globl GET32
GET32:
    ldr r0,[r0]
    bx lr

.globl dummy
dummy:
    bx lr

.globl GETPC
GETPC:
    mov r0,lr
    bx lr


.globl BRANCHTO
BRANCHTO:
    bx r0

.globl ASMDELAY
ASMDELAY:
    subs r0,r0,#1
    bne ASMDELAY
    bx lr
    bne ASMDELAY
    bne ASMDELAY
    bne ASMDELAY
    nop


.globl HOP
HOP:
    bx r1 ;@ jump to the code at r1, r0 and lr pass through unchanged

.globl GET_CONTROL
GET_CONTROL:
    MRC p15,0,r0,c1,c0,0
    bx lr

.globl SET_CONTROL
SET_CONTROL:
    MRC p15,0,r1,c1,c0,0
    orr r1,r0
    mov r0,#0
    MCR p15, 0, r0, c7, c10, 4 ;@ gross overkill
    MCR p15, 0, r0, c7, c10, 5 ;@ gross overkill
    MCR p15, 0, r0, c7, c5, 4 ;@ gross overkill
    MCR p15, 0, r0, c7, c5, 6 ;@ gross overkill
    MCR p15,0,r1,c1,c0,0
    MCR p15, 0, r0, c7, c10, 4 ;@ gross overkill
    MCR p15, 0, r0, c7, c10, 5 ;@ gross overkill
    MCR p15, 0, r0, c7, c5, 4 ;@ gross overkill
    MCR p15, 0, r0, c7, c5, 6 ;@ gross overkill
    bx lr

.globl CLR_CONTROL
CLR_CONTROL:
    MRC p15,0,r1,c1,c0,0
    bic r1,r0
    MCR p15,0,r1,c1,c0,0
    bx lr


.globl PrefetchFlush
PrefetchFlush:
    mov r0,#0 ;@ these operations expect a zero in the register, as SET_CONTROL does
    MCR p15, 0, r0, c7, c5, 4 ;@ flush prefetch buffer
    MCR p15, 0, r0, c7, c5, 6 ;@ flush branch target cache
    bx lr

.globl start_l1cache
start_l1cache:
    mov r0, #0
    mcr p15, 0, r0, c7, c7, 0 ;@ invalidate caches
    mcr p15, 0, r0, c8, c7, 0 ;@ invalidate tlb
    MCR p15, 0, r0, c7, c10, 4 ;@ DSB needed?
    MCR p15, 0, r0, c7, c10, 5 ;@ DMB needed?
    mrc p15, 0, r0, c1, c0, 0
    orr r0,r0,#0x1000 ;@ set control register bit 12
    mcr p15, 0, r0, c1, c0, 0
    bx lr

.globl stop_l1cache
stop_l1cache:
    mrc p15, 0, r0, c1, c0, 0
    bic r0,r0,#0x1000 ;@ clear control register bit 12
    mcr p15, 0, r0, c1, c0, 0
    bx lr

    ;@ the two lines below follow the bx lr and are never executed
    MCR p15, 0, r0, c7, c10, 4 ;@ DSB
    MCR p15, 0, r0, c7, c10, 5 ;@ DMB


.globl invalidate_l1cache
invalidate_l1cache:
    mov r0, #0
    mcr p15, 0, r0, c7, c7, 0 ;@ invalidate caches
    mcr p15, 0, r0, c8, c7, 0 ;@ invalidate tlb

    MCR p15, 0, r0, c7, c5, 4 ;@ gross overkill
    MCR p15, 0, r0, c7, c5, 6 ;@ gross overkill


    MCR p15, 0, r0, c7, c10, 4 ;@ DSB needed?
    MCR p15, 0, r0, c7, c10, 5 ;@ DMB needed?
    bx lr

;@-------------------------------------------------------------------------
;@-------------------------------------------------------------------------


;@-------------------------------------------------------------------------
;@
;@ Copyright (c) 2012 David Welch dwelch@dwelch.com
;@
;@ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
;@
;@ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
;@
;@ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
;@
;@-------------------------------------------------------------------------