re-writing mmu example, work in progress

This commit is contained in:
dwelch
2015-10-13 17:30:49 -04:00
parent fc2286bcb6
commit ab8f770476

See the top level README file for more information on documentation
and how to run these programs.
This example demonstrates ARM MMU basics.
You will need the ARM ARM (ARM Architectural Reference Manual) for
ARMv5. I have a couple of pages included in this repo, but you still
will need the ARM ARM.
(This ONLY works on the Raspi 1 for now, will get a Raspi 2 version
working at some point).
-- NEED TO RE-WRITE THIS AGAIN, SUBPAGES ENABLED, COARSE 1KB TABLES --
So what an MMU does, or at least what an MMU does for us, is it
translates virtual addresses into physical addresses as well as
checking access permissions, and gives us control over cachable
regions.
So what does all of that mean?
There is a boundary inside the chip around the ARM core, part of that
boundary is the memory interface for the ARM for lack of a better term
how the ARM accesses the world. Nothing special, all processors have
some sort of address and data based interface between the processor and
the ram and peripherals. That boundary uses physical addresses, that
boundary is on the memory side or "world side" of the ARM's mmu.
Within the ARM core there is the "processor side" of the mmu, and all
load and store (and fetch) accesses to the world go through the mmu.
When the ARM powers up the mmu is disabled, which means all accesses
pass through unmodified making the "processor side" or virtual address
space equal to the world side physical address space. All of my
examples thus far, blinkers and such are based on physical addresses.
We already know that elsewhere in the chip is another address
translation of some sort, because the manual is written for 0x7Exxxxxx
based addresses, but the ARM's physical addresses for those same things
are 0x20xxxxxx for the raspi 1 and 0x3Fxxxxxx for the raspi 2. For this
discussion we dont care about that other mystery address translation,
we care about the ARM and the ARM mmu.
So when I say the mmu translates virtual addresses into physical
addresses. What that means is on the processor side there is an address
you are accessing, but that does not have to be the same address on
the physical address side of the mmu. Lets say for example I am
running a program on an operating system, Linux lets say. I need
to compile that program before I can use it and I need to link it for
an address space, so lets say that I link it to enter at address 0x8000
and use memory from 0x0000 to whatever I need and/or whatever is
available. So that is all fine, except what if I have two programs
and I want both running "at the same time", how can both use the same
address space without clobbering each other? The answer is neither is
really at that address space. WHEN RUNNING, each sees the virtual
address space 0x00000000 to some number, but in reality program 1
might have that mapped to the physical address 0x01000000 and program
2 might have its 0x00000000 to some number mapped to 0x02000000.
So when program 1 thinks it is writing to address 0xABCDE it is really
writing to 0x010ABCDE and when program 2 thinks it is writing to
address 0xABCDE it is really writing to 0x020ABCDE.
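The two-program picture above is nothing more than substituting upper
address bits. A minimal C sketch of the idea (the function name
to_physical, the 16MB window size, and the bases are mine, taken from
the numbers in the text, not anything the hardware defines):

```c
#include <stdint.h>

/* Hypothetical per-program mapping: the OS gives each program the
   same virtual window starting at 0 but backs it with a different
   physical region.  The translation keeps the offset within the
   window and swaps in the program's physical base. */
static uint32_t to_physical(uint32_t program_base, uint32_t virt)
{
    return program_base | (virt & 0x00FFFFFFu); /* 16MB window, sketch only */
}
```

With program 1's base of 0x01000000, virtual 0xABCDE comes out as
0x010ABCDE, and with program 2's base of 0x02000000 the same virtual
address comes out as 0x020ABCDE, matching the text.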
If you think about it, it doesnt make any sense to allow any virtual
address to map to any physical address, for example from 0x12345678
to 0xAABBCCDD. Think about it, we are talking about a 32 bit address
space or 4Giga addresses. If we allowed any address to convert to
any other address we would need a 4Giga to 4Giga map, we would actually
need 16Gigabytes just to hold the 4Giga physical addresses worst case.
To cut to the chase, ARM has one option where the top 12 bits of the
virtual get translated to 12 bits of physical, the lower 20 bits in
that case are the same between the virtual and physical. This means
we can control 1MByte of address space with one definition, and have
4096 entries in some table somewhere to convert from virtual to
physical. That is quite manageable. The minimum we would need to
store are the 12 replacement bits per table entry, but ARM uses a full
32 bit entry, which for this 1MB flavor, has the 12 physical bits plus
some other control bits.
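That arithmetic is small enough to write down and check. A sketch of
the numbers (the macro and function names are mine):

```c
#include <stdint.h>

/* 1MB sections: the 32 bit address space is carved into chunks
   selected by the top 12 bits of the address. */
#define SECTION_SHIFT 20                             /* 1MB = 2^20      */
#define NUM_SECTIONS  (1u << (32 - SECTION_SHIFT))   /* 4096 sections   */
#define TABLE_BYTES   (NUM_SECTIONS * 4u)            /* one 32 bit word
                                                        per entry       */

/* Which table entry covers a given virtual address. */
static uint32_t section_index(uint32_t virt)
{
    return virt >> SECTION_SHIFT; /* top 12 bits */
}
```

So the whole 4GB space is described by 4096 entries of 4 bytes each,
a 16KByte table.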
What does cachable regions mean? We know from polling the uart to
see if there is a spot in the tx buffer for the next character that
reads of the uart status need to actually go to the uart register to
read that status. But this is a memory mapped design, hardware
registers like the uart status are accessed in the same way as some
ram that contains a variable used in a program, using load and store
instructions with some address. We can use the instruction cache
without the mmu, one because arm allows us to, second because the
arms internal bus has a signal (or set of) that differentiates fetch
read cycles from data read cycles. The mmu when disabled passes
that through and it hits the cache, which has different controls for
the instruction or i cache and the data or d cache. So without the
mmu we can enable instruction caching, and only instruction fetches
get cached. I hope you know what that means, the cache is fast ram
closer to the processor, when you do a read from slow dram on the far
side a copy is kept in the cache (if the cache for that access type
and address space are enabled) so that if you read that address a
second time before that prior read is evicted, the second and
subsequent reads come from the closer, faster ram and return an
answer much faster. Because fast ram is expensive you have a
relatively small amount, so only the last small number of answers is
stored there, make too many reads at different addresses and some
answers have to be evicted to make room for new answers. If the mmu
is disabled then all accesses are marked as "cacheable" or able to be
cached, if the cache for that type (i or d) is enabled. So you see
the uart problem. If we were to enable the d cache with the mmu off
then all data accesses would be cached. So if in a tight loop polling
the uart to wait for a spot in the tx buffer, the first time through
the loop we read the uart status and it actually goes to the uart to
get that status. If the tx buffer does not have a spot, then we
continue to loop, but the second read gets the copy of the first read
from the cache, which says no room yet, the third read gets the copy
of the first read from the cache which says there is no room yet.
This continues forever, even after the uart has space for a character,
as we have stopped actually talking to the uart, we are reading a
stale copy of the status register. This is true for any hardware
peripheral register or ram. We cannot cache some or all of the
peripheral address space. We want data accesses to be cached for all
or most of ram but not for peripherals. In order to do that usually
you use the mmu, and for each of the chunks of address space
controlled by an mmu entry there are bits in that entry that control
whether or not that address space is cacheable. So with the mmu we
could make the general purpose memory cacheable but the hardware
peripherals not. This example will show that.
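To make that per-chunk cache control concrete: in the ARM ARM's
first-level section descriptor the low bits are control bits, bits
[1:0] = 0b10 mark a 1MB section, bit 2 (B, bufferable) and bit 3
(C, cacheable) control caching, and bits [11:10] hold the access
permissions. A sketch of building descriptors this way (the helper
name and the AP = 0b11 full-access choice are mine; verify the exact
encodings against your copy of the ARM ARM):

```c
#include <stdint.h>

#define DESC_SECTION 0x2u         /* bits [1:0] = 0b10 -> 1MB section */
#define DESC_B       (1u << 2)    /* bufferable                       */
#define DESC_C       (1u << 3)    /* cacheable                        */
#define DESC_AP_FULL (0x3u << 10) /* AP = 0b11, read/write            */

/* First-level section descriptor: top 12 bits are the physical
   section base, the rest is control bits. */
static uint32_t section_desc(uint32_t phys_base, uint32_t flags)
{
    return (phys_base & 0xFFF00000u) | flags | DESC_SECTION;
}
```

So general purpose ram would get section_desc(base, DESC_C | DESC_B |
DESC_AP_FULL) and the peripheral sections would get section_desc(base,
DESC_AP_FULL), leaving C and B clear so reads always go to the
hardware.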
Now something not mentioned above is the notion of virtual memory, do
not confuse that with virtual address space. We now know that you can
allow the application some virtual address space to operate in and if
it goes outside that space the operating system is alerted and takes
over. What if we wanted to do that on purpose? The simplest example:
what if we wanted to pretend we have more memory than we really have?
Doesnt make too much sense on the raspberry pi but makes a lot of
sense on your desktop/laptop. You might have 4GB of ram, but one or
more TB of disk space. Wouldnt it be cool if a program that is using
some ram but is not running just this moment could have its ram saved
to disk to free up that ram for another program that is running, and
then later when that other program needs its ram we swap the ram back
from disk to memory so it can use it as memory? That is exactly how
swap or virtual memory works. We let the program run off the end of
its space and crash into a protection fault, but instead of issuing
an error and stopping the program, the operating system knows how
much ram this program thinks it has. If the access is within that
range, then it looks for more ram for this program, if there is some
free it simply maps it in using the mmu, if not then it hopefully
swaps some ram from some other application to disk, freeing some ram
for this application. The second simplest use case would be a virtual
machine, when I have say vmware running a virtual computer on a
computer. What if I want to have the virtual machine access the
network? I could make a range of address space that the virtual
machine thinks is the network peripheral and let the virtual machine
free run in some space. When it tries to access the network
peripheral the operating system is alerted to the protection fault,
but instead of stopping the program and issuing an error, it fakes
the peripheral access and lets the program keep running.
What is meant by access permissions? Lets think about those two
programs running "at the same time" on some operating system (Linux
for example), you dont want to allow one program to gain access to
the operating systems data nor some other programs data. Some
operating systems allow it, sure, ones that are meant for only
running trusted and well mannered programs. But you dont want some
video game on your home computer to have access to your banking
account data in another window/program. The mechanisms vary across
processor families but an important job for the mmu is to provide a
protection mechanism, such that when a particular program has a time
slice on the processor there is some mechanism to allow or restrict
memory spaces. If some code accesses an address that it does not
have permission for then an abort happens and the processor is
notified. An interesting side effect of this is that this doesnt
have to be fatal, in fact it could be by design. Think of a virtual
machine, you could let the virtual machine software run on the
processor, and when it accesses one of its peripherals the real
operating system gets an abort, but instead of killing the virtual
machine it actually simulates the peripheral and lets the virtual
machine keep running. Another one that you have probably run into is
when you run out of ram in your computer, the notion of virtual
memory, which is different than virtual address space. Virtual
memory in this case is when your program ventures off the end of its
allowed address space into ram it thinks it has. The operating
system gets an abort, finds some ram from some other program, swaps
that ram to disk for example, then allows the program that was
running to have a little more ram by mapping it back in and allowing
it to run. Later when the program whose data got swapped to disk
needs it, it swaps back, and whatever was in the ram it swaps with
then goes to disk. The term swap comes from the idea that these
blocks of ram are swapped back and forth to disk, program A's ram
goes to disk and is swapped with program T's, then program T's is
swapped with program K's and so on. This is why, starting right
after you venture off that edge from real ram to virtual, your
computers performance drops dramatically and disk activity goes way
up, the more things running the more swapping going on, and disk is
significantly slower than ram.
As with all baremetal programming, wading through documentation is
the bulk of the job. Definitely true here, with the unfortunate
problem that ARM's docs dont all look the same from one Architectural
Reference Manual to another. We have this other problem that we
are technically using an ARMv6 (architecture version 6) (for the
raspi 1) but when you go to ARM's website there is an ARMv5 and then
ARMv7 and ARMv8, but no ARMv6. Well the ARMv5 manual is actually the
original ARM ARM, and I assume they realized they couldnt maintain
all the architecture variations forever in one document, so they
perhaps wisely went to one ARM ARM per rev. With respect to the MMU,
the ARMv5 reference manual covers the ARMv4 (I didnt know there was
an mmu option there), ARMv5 and ARMv6, and there is a mode such that
you can have the same code/tables and it works on all three, meaning
you dont have to if-then-else your code based on whatever
architecture you find. This raspi 1 example is based on subpages
enabled, which is this legacy or compatibility mode across the three.
I am mostly using the ARMv5 Architectural Reference Manual,
ARM DDI0100I, where the I is the rev of that ARM ARM document. The
ARMv5 ARM does show ARMv6 stuff in particular with respect to the
MMU, so it is probably the right manual for this processor.
So there are blocks they call sections and blocks they call pages.
The 1MB sections mentioned above are called...sections. The ARM
mmu also has blocks of smaller sizes, 4096 byte pages for
example, and we will touch on those two sizes. The 4096 byte one is
called a small page.
As mentioned above, 32 bit address space, 1MB is 20 bits, so 32-20 is
12 bits or 4096 possible combinations, the address space is broken
up into 4096 1MB sections. The top 12 bits of the virtual address
get translated to 12 bits of physical. No rules on the translation,
you can have virtual = physical, or any combination, or have
a bunch of virtual sections point at the same physical space,
whatever you want/need.
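The simplest such table is a flat map, virtual = physical for every
section: one loop filling all 4096 entries with a descriptor whose
physical base equals the virtual section. A sketch (the names are
mine, the flags value is a placeholder, and the table is an ordinary
array standing in for the real 16KB-aligned physical memory):

```c
#include <stdint.h>

#define NUM_SECTIONS 4096u

/* Stand-in for the 16KB first-level table; on real hardware this
   would live at a 16KB-aligned physical address instead. */
static uint32_t mmu_table[NUM_SECTIONS];

static void identity_map(uint32_t flags)
{
    uint32_t i;
    for (i = 0; i < NUM_SECTIONS; i++)
    {
        /* entry i covers virtual i<<20; point it at physical i<<20 */
        mmu_table[i] = (i << 20) | flags;
    }
}
```

After this every 1MB section translates to itself, and the flags word
carries the per-section control (section type, cache, permissions).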
ARM uses the term Virtual Memory System Architecture or VMSA and
they say things like VMSAv6 to talk about the ARMv6 VMSA. There
is a section in the ARM ARM titled Virtual Memory System Architecture.
In there we see the coprocessor registers, specifically CP15 register
2 is the translation table base register.
So the ARMv5 ARM ARM (ARM Architectural Reference Manual) is what
we need now. See the top level README for finding this document,
I have included a few pages in the form of postscript, any decent pdf
viewer should be able to handle these files. Before the pictures
though, the section in question is titled Virtual Memory System
Architecture. In the CP15 subsection, register 2 is the translation
table base register. There are three opcodes which give us access to
three things, TTBR0, TTBR1 and the control register.
First we read this comment
If N = 0 always use TTBR0. When N = 0 (the reset case), the translation
table base is backwards compatible with earlier versions of the
architecture.
That is the one we want, we will leave that as N = 0 and not touch it
and use TTBR0
Now what the TTBR0 description is initially telling me is that bits
31 down to 14-N, or 14 in our case since N = 0, hold the base address, in
PHYSICAL address space. Note the mmu cannot possibly go through the
mmu to figure out how to go through the mmu, the mmu itself only
operates on physical space and has direct access to it. In a second
we are going to see that we need the base address for the mmu table
to be aligned to 16384 bytes (2 to the power 14, the lower 14 bits
of our TLB base address need to be all zeros).
We write that register using
mcr p15,0,r0,c2,c0,0 ;@ tlb base
TLB = Translation Lookaside Buffer. As far as we are concerned think
of it as an array of 32 bit integers, each integer (descriptor) being
used to completely or partially convert from virtual to physical and
describe permissions and caching.
My example is going to have a define called MMUTABLEBASE which will
be where we start our TLB table.
Here is the reality of the world. Some folks struggle with bit
manipulation, orring and anding and shifting and such, some dont.
The MMU is logic, so it operates on these tables in the way that
logic would, meaning from a programmers perspective it is a lot of
bit manipulation but otherwise is relatively simple, something a
program could do. As programmers we need to know how the logic uses
portions of the virtual address to look into this descriptor table
or TLB, and then extracts from those bits the next thing it needs to
do. We have to know this so that for a particular virtual address we
can place the descriptor we want in the place where the hardware is
going to find it. So we need a few lines of code plus some basic
understanding of what is going on. Just like bit manipulation causes
some folks to struggle, reading a chapter like this mmu chapter is
equally daunting. It is nice to have someone hold your hand through
it. Hopefully I am doing more good than bad in that respect.
There is a file, section_translation.ps, in this repo, you should be
able to use a pdf viewer to open this file. The figure on the
second page shows just the address translation from virtual to
physical for a 1MB section. This picture uses X instead of N, we are
using N = 0 so that means X = 0. The translation table base at the
top of the diagram is our MMUTABLEBASE, the address in physical space
of the beginning of our first level TLB or descriptor table. The
first thing we need to do is find the table entry for the virtual
address in question (the Modified virtual address in this diagram;
as far as we are concerned it is unmodified, it is the virtual
address we intend to use). The first thing we see is the lower
14 bits of the translation table base are SBZ = should be zero.
Basically we need to have the translation table base aligned on a
16Kbyte boundary (2 to the 14th is 16K). It would not make sense
to use all zeros as the translation table base, we have our reset
and interrupt vectors at and near address zero in the arms address
space, so the first sane address would be 0x00004000. The first
level descriptor table is indexed by the top 12 bits of the virtual
address, 4096 entries, that is 16KBytes (not a coincidence), and
0x4000 + 0x4000 is 0x8000, where our arm programs entry point is, so
we have space there if we want to use it. But any address with the
lower 14 bits being zero will work, so long as you have enough memory
at that address and you are not clobbering anything else that is
using that memory space.
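The alignment requirement reduces to one mask, the lower 14 bits of
the table base must be zero. A trivial check (the function name is
mine):

```c
#include <stdint.h>

/* With N = 0 the TTBR0 base must be aligned to 2^14 = 16384 bytes,
   so a valid base has its lower 14 bits clear. */
static int valid_table_base(uint32_t base)
{
    return (base & 0x3FFFu) == 0;
}
```

0x00004000 and 0x00008000 pass, anything with a one anywhere in the
low 14 bits does not.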
You should be able to find the same picture in your ARM ARM that I have
stolen here. The subsection titled "Hardware page table translation"
So what this picture is showing us is that we take the top 12 bits
of the virtual address, multiply by 4 or shift left 2, and add that
to the translation table base, this gives the address for the first
level descriptor for that virtual address. The diagram shows the
first level fetch, which returns a 32 bit value that we have placed
in the table. If the lower 2 bits of that first level descriptor are
0b10 then this is a 1MB Section. If a 1MB section then the top 12
bits of the first level descriptor replace the top 12 bits of the
virtual address to convert it into a physical address. Understand
here first and foremost that, so long as we do the N = 0 thing, the
first thing the mmu does is look at the top 12 bits of the virtual
address, always, to find the first level descriptor. If the lower
two bits of the first level descriptor are not 0b10 then we get into
a second level descriptor and more virtual bits come into play, but
for now if we start by learning just 1MB sections, the conversion
from virtual to physical only cares about the top 12 bits of the
address. So for 1MB sections we dont have to concentrate on every
actual address we are going to access, we only need to think about
the 1MB aligned ranges. The uart for example on the raspi 1 has
a number of registers that start with 0x202150xx, if we use a 1MB
section for those we only care about the 0x202xxxxx part of the
address. To not have to change our code we would want to have
virtual = physical for that and not mark it as cacheable.
So if my MMUTABLEBASE was 0x00004000 and I had a virtual address of
0x12345678 then the hardware is going to take the top 12 bits of that
address 0x123, multiply by 4 and add that to the MMUTABLEBASE.
0x4000+(0x123<<2) = 0x448C, and that is the address the mmu is going
to use for the first-level lookup. Ignoring the other bits in the
descriptor for now, if the first-level descriptor has the value
0xABC00002, the lower two bits are 0b10, a 1MB section, so the top
12 bits replace the virtual addresses top 12 bits and our 0x12345678
is converted to the physical address 0xABC45678.
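Both steps of that worked example are a couple lines of bit
arithmetic. A sketch (the function names are mine):

```c
#include <stdint.h>

/* Address of the first-level descriptor for a virtual address:
   table base + (top 12 bits of the virtual address) * 4. */
static uint32_t first_level_addr(uint32_t table_base, uint32_t virt)
{
    return table_base + ((virt >> 20) << 2);
}

/* 1MB section translation: the descriptor's top 12 bits replace the
   virtual address's top 12 bits, the lower 20 bits pass through. */
static uint32_t section_translate(uint32_t desc, uint32_t virt)
{
    return (desc & 0xFFF00000u) | (virt & 0x000FFFFFu);
}
```

With a table base of 0x4000 and virtual 0x12345678 the lookup address
comes out 0x448C, and with the descriptor 0xABC00002 the translation
comes out 0xABC45678, matching the walkthrough above.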
Now they have this optional thing called a supersection, which is a
16MB sized thing rather than 1MB, and one might think that that
would make life easier, right? Wrong. No matter what, assuming the
N = 0 thing, the first level descriptor is found using the top 12
bits of the virtual address, so in order to do some 16MB thing you
need 16 entries, one for each of the possible 1MB sections. If you
are already generating 16 descriptors you might as well just make
them 1MB sections, you can read up on the differences between super
sections and sections and try them if you want. For what I am doing
here we dont need them, just wanted to point out you still need 16
entries per super section.
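Since the first-level lookup always uses the top 12 bits, covering a
16MB supersection means writing the same descriptor into 16
consecutive first-level slots. A sketch of just that slot arithmetic
(the names are mine, the table is an ordinary array standing in for
the real one, and the supersection-specific descriptor bits are
ignored here; check the ARM ARM for the full encoding):

```c
#include <stdint.h>

static uint32_t mmu_table[4096];

/* A 16MB supersection starts on a 16MB boundary, so its first
   section index is a multiple of 16, and the same descriptor is
   repeated in all 16 slots. */
static void map_supersection(uint32_t virt, uint32_t desc)
{
    uint32_t first = (virt >> 20) & ~0xFu; /* round down to 16MB */
    uint32_t i;
    for (i = 0; i < 16; i++)
        mmu_table[first + i] = desc;
}
```

So mapping virtual 0x01000000 fills entries 0x010 through 0x01F,
which is why a supersection saves no table space over 16 plain
sections.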
Hopefully I have not lost you yet with this address manipulation,
and maybe you are one step ahead of me, yes EVERY load and store with
the mmu enabled requires at least one mmu table lookup, the mmu when it
accesses this memory does not go through itself, but EVERY other fetch
and load and store. Which does have a performance hit, they do have
a bit of a cache in the mmu to store the last so many tlb lookups.
That helps, but you cannot avoid the mmu having to do the conversion
on every address.
In the ARM ARM I am looking at, the subsection on first-level
descriptors has a table:
Table B4-1 First-level descriptor format (VMSAv6, subpages enabled)
What this is telling us is that if the first-level descriptor, the
32 bit number we place in the right place in the TLB, has the lower
two bits 0b10 then that entry is a 1MB section and the mmu can get
everything it needs from that first level descriptor. But if the
lower two bits are 0b01 then this is a coarse page table entry and
we have to go to a second level descriptor to complete the
conversion from virtual to physical. Not every address will need
this, only the address ranges we want divided more finely than
1MB. Or the other way of saying it is, if we want to control an
address range in chunks smaller than 1MB then we need to use pages
not sections. You can certainly use pages for the whole world, but
if you do the math, 4096 byte pages would mean your mmu table needs
to be 4MB+16K worst case. And you have to do more work to set that
all up.
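That worst case arithmetic works out like this: the first-level table
still needs its 4096 entries, and each entry can point at a coarse
page table of 256 second-level entries (4KB small pages covering
1MB). A quick check of the 4MB+16K figure (the macro names are mine):

```c
#include <stdint.h>

#define FIRST_LEVEL_BYTES  (4096u * 4u)  /* 16KB first-level table    */
#define COARSE_TABLE_BYTES (256u * 4u)   /* 1KB per 1MB of small pages */

/* Worst case: every one of the 4096 sections uses a coarse table. */
#define WORST_CASE_BYTES (FIRST_LEVEL_BYTES + 4096u * COARSE_TABLE_BYTES)
```

4096 coarse tables of 1KB each is 4MB, plus the 16KB first-level
table.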
The coarse_translation.ps file I have included in t
-- REWRITE IN PROGRESS HERE ---
If you look in the ARM ARM at the first level descriptor format, the
lower two bits of the value read at that address tell the mmu hardware