See the top level README for information on where to find the schematic and programmers reference manual for the ARM processor on the raspberry pi. Also find information on how to load and run these programs. The code in this directory is for the old/original-ish raspberry pi which you can still get raspberry pi 1 if you will. The pi2 directory is for the raspberry pi 2 the one with the quad core arm. The plus directory is for the raspberry pi B+ and A+. You have to match the right program to the right raspberry pi. This simple example sets up a small stack, not much to this program so the stack doesnt need much room. It then enables gpio16 as an output. Then goes into a loop that sets the gpio, waits, resets it, waits, repeat. gpio16 is shown on the schematic to connect to the OK led. One of the bank of leds on the corner near the audio out and host usb ports. The blink rate for me is a few blinks a second perhaps. I normally set my stack pointer to be at the top of some bank of ram if only one ram bank in the system then the top of that ram bank. See the top level README, they force us to start somewhere other than zero so for such simple programs like these I am setting the program to start at 0x8000 and the stack to start at 0x7FFC. Note that on an ARM the stack pointer (r13) points at the first address after the top of the stack so setting r13 to 0x8000 means the stack starts at 0x7FFC and works down away from our program. vectors.s is the entry point for this program, even an application on an operating system like linux has some assembly up front before calling the main function. For this processor the minimum is to to set up the stack pointer and call the main function. Because some compilers add extra stuff if they see a main() funtion I use some function name other than main() typically for embedded systems like this. If you dont have any pre-initialized variables, and dont assume that un-initialized variables are zero, then you dont have or need a .data and wont need to zero .bss. Typically the asm that preceeds the call to the main function would prepare the .data segment if needed and zero the .bss segment. Also the linker script is usually more complicated to initialize global variables with the addresses and sizes of the segments. I dont do these things so my startup code only needs to set the stack pointer and branch to the main function. The example includes a Makefile that is capable of building using gnu/gcc tools or a hybrid clang(llvm)/gnu binutils to experience an alternate C compiler. The reason for the dummy function is that when a compiler optimizes code like this: for(ra=0;ra<0x1000;ra++) continue; It replaces it with this equivalent: ra = 0x1000; Which we dont want, we want the program to actually burn some time so we can see the led with our slow eyes. The compiler doesnt know what dummy does because the asm for it is not something the C compiler can inspect. So for(ra=0;ra<0x1000;ra++) dummy(ra); To properly implement your program the C compiler must in order call dummy(0), dummy(1), dummy(2), etc. For smaller loops it may choose to unroll the loop. that is fine. Another solution is to declare ra volatile and that will cause first the loop to not get optimized, and ra to be read from and saved to memory each time through the loop. I like one approach you may like another. The program is simple refer to the broadcom arm document for information on these registers. Remember that in the broadcom document the addresses will be 0x7Exxxxxx not 0x20xxxxxx.