re-writing mmu example WIP
231
mmu/README
@@ -4,52 +4,161 @@ and how to run these programs.

 This example demonstrates MMU basics.

-So what an MMU does or at least what an MMU does for us is it translates
-virtual addresses into physical addresses as well as checking access
-permissions. And lastly it gives us tighter control over cachable
+So what an MMU does, or at least what an MMU does for us, is it
+translates virtual addresses into physical addresses as well as
+checking access permissions, and gives us control over cacheable
 regions.

-So what does all of that mean? Well so far, FROM THE ARM'S PERSPECTIVE
-we have been using physical addresses. We know from looking at the
-Broadcom manual for this chip that the actual physical addresses
-for peripherals for example as far as the manual is concerned are at
-addresses that start with 0x7E, but in the ARM's address space we use
-0x20. So there is already an address translation going on somewhere.
-But from the ARM's address bus perspective the 0x20... addresses are
-the physical addresses as is the memory we use starting at 0x00000000.
+So what does all of that mean?

-So this address for example
+There is a boundary inside the chip around the ARM core. Part of that
+boundary is the memory interface for the ARM, for lack of a better
+term how the ARM accesses the world. Nothing special, all processors
+have some sort of address and data based interface, and your
+peripherals or edge of the chip or whatever is address and data based.
+That boundary uses physical addresses, that boundary is on the "chip
+side" or "world side" of the ARM's mmu. Within the ARM core there is
+the "processor side" of the mmu, and all accesses to the world go
+through the mmu. That is everything that is address based, all
+flavors of load and store.

-#define ARM_TIMER_CTL 0x2000B408
+When the ARM powers up the mmu is disabled, which means all accesses
+pass through unmodified, making the "processor side" or virtual address
+space equal to the world side physical address space. All of the
+examples thus far, blinkers and such, are based on physical addresses.
+We already know that elsewhere in the chip is another address
+translation of some sort, because the manual is written for 0x7Exxxxxx
+based addresses, but the ARM's physical addresses for those same things
+are 0x20xxxxxx for the raspi 1 and 0x3Fxxxxxx for the raspi 2. For this
+discussion we only care about the ARM mmu processor side and the far
+side (world side, physical address side).

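A rough sketch in C of that other translation, the one outside the ARM.
The function name and the macros are made up for illustration, and it
assumes the 0x7Exxxxxx window maps linearly onto the ARM physical
window, which is all the text above claims.

```c
#include <stdint.h>

/* Illustrative names, not taken from any header in this repo. */
#define BUS_PERIPHERAL_BASE    0x7E000000u  /* addresses as the manual writes them */
#define RASPI1_PERIPHERAL_BASE 0x20000000u  /* ARM physical base on the raspi 1 */
#define RASPI2_PERIPHERAL_BASE 0x3F000000u  /* ARM physical base on the raspi 2 */

/* Convert a manual (0x7Exxxxxx) address into the ARM physical address
 * for the given board base. Assumes a simple linear window. */
uint32_t bus_to_arm_physical(uint32_t bus_addr, uint32_t arm_base)
{
    return arm_base + (bus_addr - BUS_PERIPHERAL_BASE);
}
```

With the raspi 1 base, a manual address such as 0x7E00B408 comes out as
0x2000B408, and with the raspi 2 base as 0x3F00B408.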
-we consider to be a physical address within the arms address space, as
-is the address 0x00008000 where we assume our program is loaded before
-the GPU lets the ARM start.
+So when I say the mmu translates virtual addresses into physical
+addresses, what that means is on the processor side you may have
+one address you are accessing, but that does not have to be equal to
+the physical address. Let's say for example I am running a program on
+an operating system, Linux let's say, and I need to compile that
+program before I can use it and I need to link it for an address space,
+so let's say that I link it to enter at address 0x8000 and use memory
+from 0x00000000 to whatever I need and/or whatever is available. So
+that is all fine, except what if I have two programs and I want both
+running "at the same time"? How can both use the same address space
+without clobbering each other? The answer is that neither is physically
+at that address. WHEN RUNNING, each of them sees the virtual address
+space 0x00000000 to some number, but in reality program 1 might have
+that mapped to the physical address 0x01000000, and program 2 might
+have its 0x00000000 to some number mapped to 0x02000000. So when
+program 1 thinks it is writing to address 0xABCDE it is really writing
+to 0x010ABCDE, and when program 2 thinks it is writing to address
+0xABCDE it is really writing to 0x020ABCDE.

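The two program case above can be modeled in a few lines of C. This is
only a model of the arithmetic, not mmu code: the macro and function
names are hypothetical, and it treats each program's mapping as one
contiguous window starting at virtual address 0.

```c
#include <stdint.h>

/* Hypothetical physical bases, using the numbers from the text. */
#define PROGRAM1_PHYS_BASE 0x01000000u
#define PROGRAM2_PHYS_BASE 0x02000000u

/* Both programs use the same virtual addresses. Which physical
 * address gets touched depends on whose mapping is active. */
uint32_t virt_to_phys(uint32_t phys_base, uint32_t vaddr)
{
    return phys_base + vaddr;
}
```

Calling virt_to_phys(PROGRAM1_PHYS_BASE, 0xABCDE) gives 0x010ABCDE and
virt_to_phys(PROGRAM2_PHYS_BASE, 0xABCDE) gives 0x020ABCDE, the same
virtual address landing in two different places.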
-Basically ignore the man behind the curtain, you generally dont deal
-with this, the ARM is usually the main processor and the memory system
-is designed around it rather than what we have in this chip.
+It is technically possible that some mmu out there might be able to
+translate any address into any address, but certainly not the ARM
+mmus; you cannot have virtual 0x12345678 = physical 0xAAAABCDE. From a
+hardware perspective, and hopefully a programmer's perspective, it
+makes most sense to draw a line in the address and the upper side gets
+translated and the lower stays the same. For example there is one
+mmu block size in the arm that is on one megabyte boundaries. One
+megabyte is 20 bits, so with a 32 bit address space the lower 20 bits
+don't change between virtual and physical but the upper 12 can/do. So
+address 0x12345678 virtual could be mapped to 0xCDE45678 using a
+one megabyte mmu table entry. The ARM mmu also allows for 4Kbyte
+pages for example, which means the lower 12 bits of the virtual and
+physical are the same but the upper 20 bits can be changed when going
+from virtual to physical.

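The "draw a line in the address" idea is just masking. A minimal sketch
in C, assuming the physical base has already been looked up in the mmu
table; the function and macro names are illustrative, not ARM's.

```c
#include <stdint.h>

#define SECTION_SHIFT 20u  /* 1 MByte section: lower 20 bits pass through */
#define PAGE_SHIFT    12u  /* 4 KByte page: lower 12 bits pass through */

/* Keep the low 'shift' bits of the virtual address and take the
 * upper bits from the physical base. shift must be less than 32. */
uint32_t translate(uint32_t vaddr, uint32_t phys_base, unsigned shift)
{
    uint32_t offset_mask = (1u << shift) - 1u;
    return (phys_base & ~offset_mask) | (vaddr & offset_mask);
}
```

translate(0x12345678, 0xCDE00000, SECTION_SHIFT) gives 0xCDE45678, and
with a 4 KByte page translate(0x12345678, 0xAAAAB000, PAGE_SHIFT) gives
0xAAAAB678. Only the bits above the line change.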
-So physical addresses are the addresses that are used on the ARM's
-address bus when accessing memory or peripherals. When we power up the
-MMU is off and the addresses we use when we write programs are physical
-addresses. But the MMU sits in the middle, when it is turned on then
-the addresses we program with are considered virtual addresses, the
-MMU converts them into physical addresses and the physical address
-goes out on the address bus. So we could for example program the
-MMU to have the virtual address 0x00000000 map to the physical address
-0x00100000 for example. Now we cannot have any address map to any
-address we cannot have 0x01234 map to 0x45678 for example, it doesnt
-work that way. If it did we would need a directory of addresses that
-is larger than the amount of memory we have, if we wanted to convert
-any address to any address we would need a look up table 4GBytes in
-size if any byte address could be any other byte address.
+What does access permission mean? Let's think about program 1 and
+program 2 above. We don't want program 1 to be able to invade program
+2's memory space; that would make hacking a computer super easy if any
+program could access the ram used by any other program (the operating
+system can, sure, but we have to trust the operating system and not
+trust any rogue program). So when a program running at the application
+level is accessing something there has to be a mechanism to check the
+permissions of each access to make sure that that application is
+allowed. If not allowed, the mmu has to abort the access and somehow
+call the operating system to handle this. Different processor families
+handle this differently. Initially we don't care as we are still
+running as the super user, which is also bound by the mmu, we just need
+to make sure we set the permissions so that we can access everything
+we care to access.

-What really happens is we can break up the space into blocks and the
-whole block is virtualized somewhere else so for example we will in
-this example have the virtual address range 0x00200000 to 0x002FFFFF
-access the physical 0x00000000 to 0x000FFFFF range. Hopefully this
-will make sense soon...
+What does cacheable regions mean? We know from polling the uart to
+see if there is a spot in the tx buffer for the next character that
+reads of the uart need to actually go to the uart register to read
+that status. But this is a memory mapped design, hardware registers
+like the uart status are accessed in the same way as some ram that
+contains a variable used in a program, using load and store
+instructions with some address. We can use the instruction cache
+without the mmu, first because arm allows us to, second because the
+arm's internal bus has a signal (or set of signals) that
+differentiates fetch read cycles from data read cycles. The mmu when
+disabled passes that through and it hits the cache, which has
+different controls for the instruction or i cache and the data or d
+cache. So without the mmu we can enable instruction caching, and only
+instruction fetches get cached. I hope you know what that means: the
+cache is fast ram closer to the processor. When you do a read from
+slow dram on the far side, a copy is kept in the cache (if the cache
+for that access type and address space is enabled) so that if you
+read that address a second time before that prior read is evicted, the
+second and subsequent reads come from the closer, faster ram and
+return an answer much faster. Because fast ram is expensive you have
+a relatively small amount, so only the last small number of answers is
+stored there. Make too many reads at different addresses and some
+answers have to be evicted to make room for new answers. If the mmu
+is disabled then all accesses are marked as "cacheable" or able to be
+cached, if the cache for that type (i or d) is enabled. So you see
+the uart problem. If we were to enable the d cache with the mmu off
+then all data accesses would be cached. So if we are in a tight loop
+polling the uart to wait for a spot in the tx buffer, the first time
+through the loop we read the uart status and it actually goes to the
+uart to get that status. If the tx buffer does not have a spot then
+we continue to loop. The second read though gets the copy of the
+first read from the cache, which says no room yet. The third read
+gets the copy of the first read from the cache, which says there is
+no room yet. This continues forever, even after the uart has space
+for a character, as we have stopped actually talking to the uart, we
+are reading a stale copy of the status register. This is true for
+any hardware peripheral register or ram. We cannot cache some or all
+of the peripheral address space. We want data accesses to be cached
+for all or most of ram but not for peripherals. In order to do that
+you usually use the mmu, and for each of the chunks of address space
+controlled by an mmu entry there are bits in that entry that control
+whether or not that address space is cacheable. So with the mmu we
+could make the general purpose memory cacheable but the hardware
+peripherals not. This example will show that.
+
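The stale status read can be shown with a toy model in C. This is not
real hardware and not how the ARM cache is implemented: the struct, its
fields, and the one-entry "cache" are all invented here just to show
why a cached peripheral read never sees the register change.

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model: one pretend uart status register with an optional
 * one-entry data cache in front of it. All names are hypothetical. */
struct toy_bus {
    uint32_t uart_status;   /* what the peripheral reports right now */
    bool     cacheable;     /* stands in for the cacheable bit of an mmu entry */
    bool     cache_valid;   /* true once a copy has been stored */
    uint32_t cached_value;  /* the stored copy */
};

uint32_t toy_read_status(struct toy_bus *bus)
{
    if (bus->cacheable && bus->cache_valid)
        return bus->cached_value;       /* hit: never reaches the uart */

    uint32_t value = bus->uart_status;  /* miss or uncached: real read */
    if (bus->cacheable) {
        bus->cached_value = value;
        bus->cache_valid = true;
    }
    return value;
}
```

With cacheable set, the first read returns the real status and every
later read returns that same copy no matter what uart_status becomes.
With cacheable clear, every read goes to the register, which is what we
want for peripherals.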
+Now something not mentioned above is the notion of virtual memory, do
+not confuse that with virtual address space. We now know that you can
+allow the application some virtual address space to operate in, and if
+it goes outside that space the operating system is alerted and takes
+over. What if we wanted to do that on purpose? Two very simple
+examples of this follow. First, what if we wanted to pretend we have
+more memory than we really have? Doesn't make too much sense on the
+raspberry pi but makes a lot of sense on your desktop/laptop. You
+might have 4GB of ram, but one or more TB of disk space. Wouldn't it
+be cool if a program that is using some ram but is not running just
+this moment could have its ram saved to disk to free up that ram for
+another program that is running, and then later when the first
+program needs its ram we swap the ram back from disk to memory so it
+can use it as memory? That is exactly how swap or virtual memory
+works. We let the program run off the end of its space and crash into
+a protection fault, but instead of issuing an error and stopping the
+program the operating system knows how much ram this program thinks it
+has. If the access is within that range, then it looks for more ram
+for this program. If there is some free it simply maps it in using
+the mmu, if not then it hopefully swaps some ram from some other
+application to disk, freeing some ram for this application. The
+second simplest use case would be a virtual machine, when I have say
+vmware running a virtual computer on a computer. What if I want to
+have the virtual machine access the network? I could make a range of
+address space that the virtual machine thinks is the network
+peripheral and let the virtual machine free run in some space. When
+it tries to access the network peripheral the operating system is
+alerted to the protection fault, but instead of stopping the program
+and issuing an error, it fakes the peripheral access and lets the
+program keep running.
+
+All very cool stuff but it requires first and foremost that all memory
+accesses are funneled through a memory management unit or mmu of some
+flavor.

 As with all baremetal programming, wading through documentation is
 the bulk of the job. Definitely true here, with the unfortunate
@@ -59,17 +168,18 @@ are technically using an ARMv6 (architecture version 6) but when
 you go to http://infocenter.arm.com and look at the Reference Manuals
 there is an ARMv5 and then ARMv7 and ARMv8, but no ARMv6. Well
 the ARMv5 manual is actually the original ARM ARM, that I assume they
-realized couldnt maintain all the architecture variations forever,
-so the perhaps wisely went to one ARM ARM per rev. With respect to the
-MMU, that started in ARMv5 and with ARMv6 there were some changes
-made but it still has a backwards compatible mode such that programs
-that use the MMU (linux for example) dont necessarily need an overhaul
-every version. So you can look at the various architectural reference
-manuals or sometimes technical reference manuals for specific cores
-and see descriptions of the MMU tables and addressing but the
-part I mentioned as unfortunate is that the drawings and descriptions
-dont have the same look and feel. They have the same basic content
-though.
+realized they couldn't maintain all the architecture variations forever
+in one document, so they perhaps wisely went to one ARM ARM per rev.
+With respect to the MMU, that started in ARMv5 and with ARMv6 there
+were some changes made but it still has a backwards compatible mode such
+that programs that use the MMU (linux for example) don't necessarily
+need an overhaul every version (or need a lot of if-then-else code
+to cover all the supported architectures in one binary). So you can
+look at the various architectural reference manuals or sometimes
+technical reference manuals for specific cores and see descriptions
+of the MMU tables and addressing, but the part I mentioned as
+unfortunate is that the drawings and descriptions don't have the same
+look and feel. They have the same basic content though.

 I am mostly using the ARMv5 Architectural Reference Manual. Possibly
 an older one than the one on ARM's page. ARM DDI0100I. Where the I is
@@ -78,11 +188,24 @@ particular with respect to the MMU, so it is probably the right
 manual for this processor, although you could use the ARMv7 and be
 careful to ignore features added in v7.

-So there are blocks they call sections and blocks they call pages. If we
-were to simply take every possible address and make a look up table
-and the contents of the table are the physical address, we could then
-translate any virtual address to any physical address, but it would
-take up to 4GBytes for that table. (and we would have to access
+So there are blocks they call sections and blocks they call pages.
+If we were to simply take every possible address and make a look up
+table and the contents of the table are the physical address, we could
+then translate any virtual address to any physical address, but it
+would take up to 4Giga-entries for that table for a 32 bit address
+space and each entry of the table would need to be more than 4 bytes,
+32 bits for the new address then some others for permissions and
+enables, so it would make no sense to have an mmu table larger than
+everything we would ever access.
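To put numbers on that, here is a small C sketch that counts table
entries for a 32 bit address space at different block sizes. The
function names are made up, and the bytes per entry is passed in as a
parameter rather than assumed, since the text only says a byte-granular
entry would need more than 4 bytes.

```c
#include <stdint.h>

/* Entries needed to cover a 32 bit address space when each entry
 * maps a block of 2^block_shift bytes. block_shift must be <= 32. */
uint64_t table_entries(unsigned block_shift)
{
    return (uint64_t)1 << (32u - block_shift);
}

/* Total table size in bytes for a given entry size. */
uint64_t table_bytes(unsigned block_shift, unsigned bytes_per_entry)
{
    return table_entries(block_shift) * bytes_per_entry;
}
```

A per-byte table (block_shift 0) needs 4294967296 entries, so even at
only 4 bytes each it would be 16 GBytes, bigger than the whole address
space. One megabyte blocks (block_shift 20) need 4096 entries, and at
4 bytes per entry that is 16 KBytes. Four kilobyte blocks
(block_shift 12) need 1048576 entries.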
+
+
+
+
+
+re-write in progress.
+
+
+. (and we would have to access
 everything as bytes since a scheme like that would allow the four
 bytes in an instruction or other word sized access to be in up to
 four different physical places) That is not exactly what happens