mirror of
https://github.com/xomboverlord/xomb.git
synced 2026-01-11 10:16:36 +01:00
Adds various documentation on the memory, process, and upcall model.
@@ -3,3 +3,34 @@ The XOmB exokernel multiplexes hardware resources mostly through the use of the
|
||||
This means that the kernel needs to allocate memory pages in order to allocate the page table entries. So, it maintains the data structure responsible for keeping track of free pages of physical memory. Applications may request specific physical pages to be allocated. Applications may create resources which are represented by page table entries... in our case, the kernel is a PML4, a process is a PML3, and thus a resource is either a PML2 (PD) or a first-level page table (PT). An application or library OS (libOS) can do this by asking the kernel to allocate a resource with a particular physical page to be used as its relative page table root.
|
||||
|
||||
The kernel should be somewhat aware of hardware memory mapped ranges and allocate resources for those that can be securely provisioned when asked by a user application via a library OS. The kernel does not create those resources itself, however, or have any logic that is specific to any hardware except what it needs for debugging itself. An 'init' process will eventually exist that will allocate some of those initial resources that it can pass off to driving libOSes and applications that run later.
|
||||
|
||||
The kernel has the following flags for page table entries where some are hardware-defined and others are specifically maintained by the kernel:
|
||||
|
||||
* `ReadOnly` - When set, this page cannot be written to. This is generally an existing hardware-defined flag.
|
||||
* `NoExecute` - When set, this page cannot be executed as though it were runnable code. This is generally an existing hardware-defined flag, but not always available on all systems.
|
||||
* `Present` - Whether or not there is meant to be an actual page mapped in at the given moment. When it is clear, the page is 'not present' and therefore will fault on any attempt to access. One use of this is to have an otherwise valid entry, but one where the page data has been saved elsewhere and needs to be restored when it faults. This is generally an existing hardware-defined flag.
|
||||
* `Owner` - When set for a root table of a Resource or Process within the Process userspace address space, the current process owns the Resource or is the parent of the Process. This grants extra ability to the calling Process when maintaining the Resource or Process that simple grantees of the Resource do not have. See [RESOURCE](RESOURCE.md) and [PROCESS](PROCESS.md) files for more information about how a Resource or Process is maintained.
|
||||
* `Grant` - When set for an entry that points to a Resource, it denotes that the Resource is on Grant and not owned. When the kernel walks the page table structure, if it hits an entry that is marked as a `Grant`, that calling Process cannot affect that Resource in any way.
|
||||
|
||||
The only flags likely to not already exist in hardware are the `Owner` and `Grant` flags. The kernel reserves these. Library OSes can then establish what other flags might exist on their own (although there may not be any bits left). The kernel will facilitate setting any non-reserved bits in page table entries for the benefit of library OSes. For instance, a library OS will likely want to implement copy-on-write behavior and may want to introduce a `CopyOnWrite` flag for this purpose. When a write to a read-only page faults, the library OS can decide to then ask the kernel to map in a particular physical page. The kernel will zero this page.
|
||||
|
||||
When the flags are specified to a related kernel system call, they are supplied in this order, where the first item is the least significant bit: `Present`, `ReadOnly`, `NoExecute`. To mark a page as `NoExecute` and `Present`, the flags field would equal 5 (binary `101`). For a `Present` and `ReadOnly` page, the flags are equal to 3. The `Owner` bit cannot be specified, as only the kernel uses this flag. The flags field can also specify user-defined bits. These bits are the 8th bit (starting from 0) on up. So, if a library OS wanted to create a `CopyOnWrite` bit as the first user-defined bit, specifying that with the `Present` bit set would mean sending hex `0x101` (decimal 257) for the `flags` field. The kernel sets the user-defined bits in its own order by mapping them to available bits in the hardware page table entries (PTEs). A system call fails with `INVALID_FLAGS` when its flags field has bits set that the kernel does not expect or, more specifically, user-defined bits that cannot be accommodated.
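As a rough illustration of the flag arithmetic above, the following Python sketch packs flags into a single integer. The names `PageFlags`, `user_flag`, `encode`, and `USER_BIT_BASE` are hypothetical and not part of the kernel interface. The position of the first user-defined bit is an assumption, set to bit 8 here so that the `0x101` example value works out.

```python
from enum import IntFlag

# Assumption: user-defined bits start at bit 8, which is the position that
# makes Present + CopyOnWrite equal 0x101 as in the text.
USER_BIT_BASE = 8


class PageFlags(IntFlag):
    """Kernel-defined flags, least significant bit first."""
    PRESENT = 1 << 0
    READ_ONLY = 1 << 1
    NO_EXECUTE = 1 << 2


def user_flag(index: int) -> int:
    """Return the mask of the index-th user-defined flag (hypothetical helper)."""
    if index < 0:
        raise ValueError("user flag index must be non-negative")
    return 1 << (USER_BIT_BASE + index)


# A library OS might claim the first user-defined bit for copy-on-write.
COPY_ON_WRITE = user_flag(0)


def encode(*flags: int) -> int:
    """OR the given flags together into the value passed as `flags`."""
    value = 0
    for flag in flags:
        value |= int(flag)
    return value


print(encode(PageFlags.PRESENT, PageFlags.NO_EXECUTE))   # 5
print(encode(PageFlags.PRESENT, PageFlags.READ_ONLY))    # 3
print(hex(encode(PageFlags.PRESENT, COPY_ON_WRITE)))     # 0x101
```

Keeping the user-defined base in one constant means a library OS only has to change a single value if the kernel places those bits elsewhere.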
|
||||
|
||||
The kernel system calls related to page allocation and mapping have an enumerated return value to indicate the error or 0 if successful. The `MapperError` error codes are as follows:
|
||||
|
||||
* `0`: `SUCCESS` - Success! It performed the specified action.
|
||||
* `1`: `INVALID_FLAGS` - Invalid flags were specified.
|
||||
* `2`: `NOT_FREE` - The physical page requested is already allocated elsewhere.
|
||||
* `3`: `NOT_ALLOCATED` - The physical page cannot be freed because it was not allocated.
|
||||
* `4`: `INVALID_SOURCE` - The given virtual address is not a valid virtual address for the current Process.
|
||||
* `5`: `INVALID_TARGET` - The given virtual address to map to is not a valid virtual address for the current Process.
|
||||
|
||||
The kernel provides these basic physical memory primitive functions:
|
||||
|
||||
* `ALLOC_PAGE(physicalAddress, virtualAddress, flags) -> MapperError` - Allocates the given physical page to the given virtual address of the Process, if there is no page currently mapped there. `virtualAddress` points to the specific page table entry to modify, using the recursive entry (`PML4[510]`) to do so. The kernel will mark that physical page allocated. This returns `NOT_FREE` if that physical page is already allocated. Sets the flags on the leaf PML1 page table entry. The `flags` field is a set of flags that are not in any hardware-specific order. They are defined by the system. See the flag ordering described above. The `flags` field must include the `Present` flag; otherwise the call fails and returns `INVALID_FLAGS`. If the given `virtualAddress` is not aligned to a page table entry, the page table structure does not exist, or it is not one that is owned by the current Process, then it fails with an `INVALID_TARGET` error. Ownership is decided by whether or not there is an `Owner` flag when walking the page table structure while not seeing a `Grant` flag before getting to the point specified by the given `virtualAddress`. The root of the process is always considered marked `Owner`. Therefore, you can only allocate pages into a Resource or child Process that is 'owned' by the calling Process.
|
||||
* `REMAP_PAGE(virtualAddress, targetAddress) -> MapperError` - Atomically moves the physical page that is allocated to the page table entry denoted by the given `virtualAddress` to being mapped into the empty page table entry given by `targetAddress`. Fails by returning `INVALID_SOURCE` if the `virtualAddress` is not aligned to a page table entry, is in a non-existing page table structure, is unmapped, or not owned by the current Process. Ownership is known by finding an `Owner` flag set when walking the page table structure. The root of the process is always considered marked `Owner`. Ownership is always void if a `Grant` flag is found when walking the structure instead. Does nothing if `virtualAddress` and `targetAddress` are effectively the same. Fails by returning `INVALID_TARGET` if the `targetAddress` is not aligned to a page table entry, is within a page table structure that does not exist, or it is not one that is owned by the current Process. It keeps the same flags.
|
||||
* `CHMOD_PAGE(virtualAddress, flags) -> MapperError` - Sets new flags on the page table entry at the given virtual address. Must be aligned to a page. Userspace processes can use the recursive page index `PML4[510]` in order to target an inner page table entry. Fails by returning `INVALID_SOURCE` if the page table entry to update is not in userspace. Fails by returning `INVALID_FLAGS` if the `flags` provided cannot be accommodated or specify flags that do not exist. Fails by returning `INVALID_SOURCE` if `virtualAddress` is not aligned to a page table entry, is within a nonexistent page table structure, or the page table structure is not owned by the calling Process. Ownership is known by finding an `Owner` flag set when walking the page table structure. The root of the process is always considered marked `Owner`. Ownership is always void if a `Grant` flag is found when walking the structure instead.
|
||||
* `UNMAP_PAGE(virtualAddress) -> MapperError` - Clears the page mapped at the page table entry at the given `virtualAddress`. This frees that physical page while voiding out that page table entry within the Process. Fails by returning `NOT_ALLOCATED` if the physical page is not one that is in the allocatable space tracked by the page allocator. Fails by returning `INVALID_SOURCE` if the `virtualAddress` is unmapped or if the page table structure is not owned by the calling Process. Ownership is known by finding an `Owner` flag set when walking the page table structure. The root of the process is always considered marked `Owner`. Ownership is always void if a `Grant` flag is found when walking the structure instead.
|
||||
* `MAP_ZERO(virtualAddress, flags) -> MapperError` - Maps in a kernel maintained 'zero' page which is a page that is prewritten with 0s. The kernel forcibly maps this in read-only. Therefore, this fails by returning `INVALID_FLAGS` if `flags` does not indicate the `ReadOnly` and `Present` flags. This fails with `INVALID_TARGET` if the `virtualAddress` is not aligned to a page table entry, is indicating a page table structure that does not exist, or is not owned by the calling Process. Ownership is known by finding an `Owner` flag set when walking the page table structure. The root of the process is always considered marked `Owner`. Ownership is always void if a `Grant` flag is found when walking the structure instead.
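Every primitive above repeats the same ownership rule, so a small sketch may help. It is one reading of that rule, not the kernel's implementation: the walk is reduced to the list of flags found on each entry from the Process root down to the target, and the `EntryFlags` bit values and the `is_owned` name are hypothetical.

```python
from enum import IntFlag


class EntryFlags(IntFlag):
    """Kernel-maintained flags; the bit positions here are made up."""
    NONE = 0
    OWNER = 1
    GRANT = 2


def is_owned(path_flags) -> bool:
    """Decide ownership from the flags seen while walking to the target.

    The root of the Process always counts as `Owner`, so the walk starts
    out owned. Under this reading, the only thing that can void ownership
    is meeting a `Grant` entry before reaching the target.
    """
    return not any(flags & EntryFlags.GRANT for flags in path_flags)


# Walking through an owned Resource root keeps ownership.
print(is_owned([EntryFlags.NONE, EntryFlags.OWNER]))                   # True
# Walking through a granted Resource voids it, whatever comes after.
print(is_owned([EntryFlags.NONE, EntryFlags.GRANT, EntryFlags.NONE]))  # False
```

A primitive such as `ALLOC_PAGE` would return `INVALID_TARGET` (or `INVALID_SOURCE` for the source-side calls) whenever a check like this comes back false.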
|
||||
|
||||
The kernel physical page bitmap that shows which pages are allocated and which ones are not is a readable structure at a well-known virtual address. All library OSes can read the bitmap to find a free page. Therefore, a library OS can implement its own page allocator, although it must cooperate with other page allocators on the system.
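A library OS allocator built on that bitmap could start from something like the sketch below. The bitmap layout is an assumption, since it is not specified here: one bit per physical page, a set bit meaning allocated, least significant bit first within each byte, and 4 KiB pages. The function name `find_free_page` is hypothetical.

```python
from typing import Optional


def find_free_page(bitmap: bytes, page_size: int = 4096) -> Optional[int]:
    """Return the physical address of the first free page, or None.

    Assumed layout: bit N of the bitmap describes physical page N, with
    1 = allocated and 0 = free, bits counted from the low end of each byte.
    """
    for byte_index, byte in enumerate(bitmap):
        if byte == 0xFF:
            continue  # all eight pages in this byte are taken
        for bit in range(8):
            if not (byte >> bit) & 1:
                return (byte_index * 8 + bit) * page_size
    return None


# Pages 0-10 allocated, page 11 free.
print(hex(find_free_page(bytes([0xFF, 0b00000111]))))  # 0xb000
```

Because the bitmap is only read here, another allocator may claim the page first. The caller would still pass the result to `ALLOC_PAGE` and retry with a different page if it returns `NOT_FREE`.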
|
||||
|
||||
72
docs/PROCESS.md
Normal file
@@ -0,0 +1,72 @@
|
||||
A Process is just a page table. It is a PML4 page table that is placed as the current page table (via CR3 on x86-64) when the Process is to be run. Any Resource that is accessible by a Process is any Resource that is mapped into that Process virtual address space via that root page table. The kernel maintains the allocated Process structures as leaves in a `PML3` space off of `PML4[507]`. Even though a Process is a root page table (PML4), they are stored as PML2 nodes within this Process mapping space. That means the kernel can be aware of a total of 262K active processes and be able to view and manipulate the Process root page table's PML4 and PML3 entries. In order for the kernel to manipulate the PML2 or PML1 entries of a non-active Process, the kernel has to context switch to the Process first, and perhaps context switch back before returning. This means one Process manipulating another Process address space in this way would be a relatively expensive operation since it would incur a TLB flush in most cases.
|
||||
|
||||
These Process objects are otherwise handled the same way as any Resource (see [RESOURCE.md](RESOURCE.md) for more information about these primitives.) A Process has an identifier (a `processId`) that is constructed as the kernel-space virtual address of that root page table from within that `PML4[507]` virtual space.
|
||||
|
||||
When a process wants to create a child process, it simply has the kernel create a process page table (a PML4) from the given physical page. It will map that process root page table as an intermediate page table in the address space of the calling Process to the virtual address desired by the calling Process. From here, the parent Process will need to allocate the subsequent Resource for the child Process that will contain the code and data for that eventual program. It then grants each Resource to the child Process. Then it optionally changes the owner to that child Process so that this Process can further affect its own page table on its own. It can then yield to the child Process.
|
||||
|
||||
Yielding to a process is just a system call (`YIELD_PROCESS`) to switch address spaces to the address space of the given process. The process is identified by its root page table in memory, which is typically called `processId`. So, yield takes one argument: the `processId`, which is the virtual address of a process's root page table from within the kernel's process map. The yield call is used for cooperative scheduling. Preemption is done by a user process via a libOS process scheduler that listens to the system timer. The kernel does not implement preemption.
|
||||
|
||||
There is no `fork` action in the exokernel proper. Implementing a `fork` semantic would just be to create a Process and then grant each Resource to the new process. Any extra semantics one wishes to impose to satisfy whatever definition of 'forking' an OS has in mind is up to that library OS process implementation.
|
||||
|
||||
Any parent Process "owns" the child Process. Ownership is related to the flags set when mapping in that child Process root page table into the parent Process. That page table entry that points to the child's would-be PML4 is marked with the `Owner` flag. When the kernel wants to validate ownership (the parent status) of the calling Process, it looks at the Process-local Resource pointed to via the `processAddress` and looks for the `Owner` flag and also looks to see that the physical page pointed to by that page table entry is the same as that PML4 of the child Process via the `processId` and walking the structure within the process map (`PML4[507]`) in the kernel's memory. They need to match to confirm that relationship.
|
||||
|
||||
Since the kernel maintains the mapping of Process PML4s as PML2s off of `PML4[507]`, the leaves of this virtual address space point to the potential PML2s of that child Process. It is worth noting that when a Process is allocated, it is *only* an empty PML4. It is up to library OSes in userspace to denote what the page table structure of a Process actually looks like. However, considering a saturated page table structure for a hypothetical Process, the kernel mapping can always manipulate the PML2s of any running Process. This means that granting any PML2 (or larger) Resource to another Process can be done without context switching away from the granting Process. It also means that loading a new Process (say from a shell application) can be done by:
|
||||
|
||||
* Allocating a PML4 via `ALLOC_PROCESS` and mapping it into a PML2 of the calling parent Process.
|
||||
* Allocating a PML3 page in that child Process by allocating the PML1 in the appropriate place in the parent Process.
|
||||
* Allocating a PML2-sized Resource in the parent at some PML2 in the calling parent Process.
|
||||
* Loading the executable code into the PML2 by allocating the necessary page structure and leaf pages and filling them with executable and data content from a binary executable.
|
||||
* Granting the PML2-sized Resource to the child Process via `GRANT_RESOURCE`. This effectively updates the PML3 in the child to point to the PML2 root of the calling Process.
|
||||
* Changing the owner for this Resource via `CHOWN_RESOURCE` which now gives ownership to the child Process.
|
||||
|
||||
All of these require absolutely no context switching and can be done with the global page table mapping in the kernel. It is possible to grant a PML1-sized (2MB on x86-64) Resource to another Process and modify the page table of that Process without a context switch. With very careful Resource management, most metadata and Process page structures are visible from the global page table available to the kernel and manipulation can be done cheaply.
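The 2MB figure follows from the x86-64 4-level paging geometry of 4 KiB pages and 512 entries per table. The short sketch below works out the address range covered by a structure rooted at each level; the function name `span_bytes` is hypothetical.

```python
PAGE_SIZE = 4096   # bytes per leaf page on x86-64 with 4 KiB pages
ENTRIES = 512      # entries per page table at every level


def span_bytes(level: int) -> int:
    """Bytes of virtual address space covered by a table at `level`.

    Level 1 is a PML1 (page table), level 2 a PML2, and so on up to the PML4.
    """
    if not 1 <= level <= 4:
        raise ValueError("level must be between 1 and 4")
    return PAGE_SIZE * ENTRIES ** level


for level in range(1, 5):
    print(f"PML{level}: {span_bytes(level) // (1024 * 1024)} MiB")
```

So a PML1-rooted Resource spans 2 MiB, a PML2-rooted one 1 GiB, and a PML3-rooted one 512 GiB.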
|
||||
|
||||
The kernel also maintains a backward reference that maps physical addresses of the root page tables to Process metadata. This metadata contains the actual `processId` that would normally be used and also the kernel-aware upcalls. The `onYield` upcall and `onFree` upcall are virtual addresses that will be the instruction pointers upon yielding to the Process. Each of these upcalls will pass a single argument of the last running `processId`. The `onFree` upcall occurs when a parent Process is hinting that it is about to deallocate the Process page table so the child Process can responsibly respond and deallocate its own resources. Generally, yielding back to the calling Process is expected. For more information about upcalls refer to [UPCALL](UPCALL.md).
|
||||
|
||||
The kernel system calls related to process allocation and mapping have an enumerated return value to indicate the error or 0 if successful. The `ProcessError` error codes are as follows:
|
||||
|
||||
* `0`: `SUCCESS` - Successful operation with no other return value expected.
|
||||
* `1`: `INVALID_FLAGS` - Invalid flags were specified.
|
||||
* `2`: `NOT_FREE` - The physical page is not free and cannot be used to allocate a structure.
|
||||
* `4`: `INVALID_SOURCE` - The given virtual address for the resource is invalid. When allocating, this means that the `resourceAddress` specified is not free. On other calls, it means there's no valid resource at `resourceAddress`.
|
||||
* `5`: `INVALID_TARGET` - The processId to yield to was invalid.
|
||||
* `6`: `NO_ROOM` - We reached the maximum number of processes.
|
||||
|
||||
The kernel offers several primitive functions to facilitate the creation and control of processes:
|
||||
|
||||
* `ALLOC_PROCESS(physicalAddress, processAddress) -> ProcessError | processId` - Spawns a child process that will be mapped to the provided userspace address of the calling Process. Returns the `processId`, which is a virtual address in kernel space that, via the recursive page table entry (`PML4[510]`), points to the root page table of the new process. The `processAddress` needs to be a valid page table entry to attach the root page table for the child Process, using the recursive entry `PML4[510]` to do so. Fails with `INVALID_TARGET` if the page table structure does not exist or is not owned by the calling Process. This fails with `NO_ROOM` if there are no available processes left in the system because the kernel's process map is full. This fails with `NOT_FREE` if the `physicalAddress` specified is not actually free.
|
||||
* `YIELD_PROCESS(processId) -> ProcessError` - Cooperatively yields to the given Process. The `onYield` upcall of the target Process will be passed the calling Process `processId`. Effectively, this swaps the current root page table (PML4 via CR3) to the one for the given process. The process has to deal with restoring its own state. There's no context being stored because the kernel is effectively stateless except for maintaining access to all address spaces. On success, this function never returns. If the `processId` is in any way invalid, it will return with the `INVALID_TARGET` error.
|
||||
* `FREE_PROCESS(processId, processAddress) -> ProcessError` - Yields to a child process that was previously allocated via an earlier `ALLOC_PROCESS` call to allow it to deal with closing itself. It calls the `onFree` upcall while passing the current Process `processId`. The calling Process might expect the target Process to yield back. Fails with `INVALID_SOURCE` if the calling Process is not the parent of the given Process (that is, `processAddress` does not point to the same physical page as the PML4 within the address space rooted at `processId`). On success, the Process is effectively freed and can no longer be the target of a yield.
|
||||
|
||||
The kernel has the following check (which is identical to the one for checking if a Process owns a Resource) of whether or not the calling Process is the parent of the given Process, which it checks on the Process free operation:
|
||||
|
||||
* `is_mapped(processId, processAddress) -> boolean` - This looks at the `processId` and ensures that it is a virtual address that runs through the kernel's Process map. Let's say that we maintain a mapping of all processes on `PML4[507]`, so the 507th index of the root page table points to a PML3 that contains, as leaves of the tree, the root page tables of all known Process objects. We can then tell very easily if `processId` is a virtual address that uses the recursive entry (`PML4[510]`) to point to the physical page of the Process root. That physical page must be the same one pointed to by the given `processAddress`. We also verify that the `processAddress` is not in higher memory, which is always owned by privileged kernel code. If all of these hold true, the current calling Process owns the given Process as it is the parent of the given Process.
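The check can be sketched as follows, with the page table walks replaced by dictionary lookups. Everything apart from the `is_mapped` name and the `PML4[507]` index is an assumption made for illustration: `process_map` stands in for the kernel's Process map, `caller_entries` for the caller's page table entries, and the `Entry.maps_kernel_half` field for whatever decoding the kernel uses to reject entries in higher memory.

```python
from typing import Dict, NamedTuple

PROCESS_MAP_INDEX = 507  # PML4 slot holding the kernel's Process map


class Entry(NamedTuple):
    """Stand-in for a page table entry in the calling Process."""
    physical_page: int      # physical page the entry points to
    maps_kernel_half: bool  # True if the entry lives in higher memory


def pml4_index(vaddr: int) -> int:
    """Extract the PML4 index (bits 39-47) of a virtual address."""
    return (vaddr >> 39) & 0x1FF


def is_mapped(process_id: int, process_address: int,
              process_map: Dict[int, int],
              caller_entries: Dict[int, Entry]) -> bool:
    # The processId must run through the Process map at PML4[507].
    if pml4_index(process_id) != PROCESS_MAP_INDEX:
        return False
    root = process_map.get(process_id)
    entry = caller_entries.get(process_address)
    if root is None or entry is None:
        return False
    # Higher memory is always owned by privileged kernel code.
    if entry.maps_kernel_half:
        return False
    # Both sides must reach the same physical root page.
    return entry.physical_page == root


pid = (0xFFFF << 48) | (PROCESS_MAP_INDEX << 39) | 0x3000
process_map = {pid: 0x200000}
entries = {0x10: Entry(0x200000, False), 0x18: Entry(0x300000, False)}
print(is_mapped(pid, 0x10, process_map, entries))  # True
print(is_mapped(pid, 0x18, process_map, entries))  # False
```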
|
||||
|
||||
The kernel uses this procedure to create a Process (`ALLOC_PROCESS`):
|
||||
|
||||
* Marks the given `physicalAddress` allocated (and otherwise fails with `NOT_FREE`)
|
||||
* Uses this physical page as the root page table of the child Process.
|
||||
* Maps this root page table into the calling Process in the page table entry it specified.
|
||||
* It marks this page table entry with the `Owner` flag.
|
||||
* It maps this root page table into the Process map (`PML4[507]`) such that it is a PML2 there.
|
||||
* It determines the `processId` which is the virtual address that points to that PML2 in the Process map.
|
||||
* It stores this `processId` along with other initial metadata as the value in the Process hash using the physical address as the key. This can be used to forcibly kill a child Process when the owning parent Process is forcibly killed itself or crashes. Other metadata contained within here are the destination addresses of upcalls.
|
||||
* It returns the `processId` to the calling Process.
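The procedure above can be modelled in a few lines of Python to make the bookkeeping visible. This is a toy under stated assumptions, not kernel code: `ToyKernel` and its fields are hypothetical, page tables are plain dictionaries, the `processId` is faked as a slot address under `PML4[507]`, and the `INVALID_TARGET` validation of `processAddress` is left out. The capacity check is done before the page is marked allocated so that a failed call leaves no state behind.

```python
from enum import IntEnum


class ProcessError(IntEnum):
    SUCCESS = 0
    INVALID_FLAGS = 1
    NOT_FREE = 2
    INVALID_SOURCE = 4
    INVALID_TARGET = 5
    NO_ROOM = 6


PAGE_SIZE = 4096
# Canonical (sign-extended) virtual base of PML4[507].
PROCESS_MAP_BASE = (0xFFFF << 48) | (507 << 39)


class ToyKernel:
    def __init__(self, free_pages, max_processes):
        self.free_pages = set(free_pages)
        self.max_processes = max_processes
        self.process_map = {}     # processId -> physical root page
        self.process_hash = {}    # physical root page -> metadata
        self.parent_entries = {}  # (caller, processAddress) -> (page, flag)

    def alloc_process(self, caller, physical_address, process_address):
        if physical_address not in self.free_pages:
            return ProcessError.NOT_FREE
        if len(self.process_map) >= self.max_processes:
            return ProcessError.NO_ROOM
        # Mark the page allocated; it becomes the child's root page table.
        self.free_pages.remove(physical_address)
        # Map the root into the caller's entry and mark it Owner.
        self.parent_entries[(caller, process_address)] = (physical_address, "Owner")
        # Map the root into the Process map; its slot address is the processId.
        process_id = PROCESS_MAP_BASE + len(self.process_map) * PAGE_SIZE
        self.process_map[process_id] = physical_address
        # Store metadata in the Process hash, keyed by physical address.
        self.process_hash[physical_address] = {
            "processId": process_id, "onYield": None, "onFree": None,
        }
        return process_id


kernel = ToyKernel(free_pages={0x1000, 0x2000}, max_processes=1)
pid = kernel.alloc_process("init", 0x1000, 0x40000000)
print(hex(pid))
print(kernel.alloc_process("init", 0x1000, 0x40001000).name)  # NOT_FREE
print(kernel.alloc_process("init", 0x2000, 0x40001000).name)  # NO_ROOM
```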
|
||||
|
||||
The kernel uses this procedure to yield to a Process (`YIELD_PROCESS`):
|
||||
|
||||
* Validates that `processId` is a virtual address that points into the Process map at `PML4[507]`. Fails with `INVALID_TARGET` if it is not.
|
||||
* Parses the `processId` to determine the physical address of that root page table.
|
||||
* Pulls out the process metadata from the Process hash.
|
||||
* Switches the root page table of the system to this one.
|
||||
* Flushes any virtual address translation caches (TLB, etc)
|
||||
* Returns to the `onYield` upcall by looking it up within the process metadata.
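To make the order of the yield steps concrete, here is a toy model of them. All names (`ToyCpu`, `yield_process`, the dictionary shapes) are hypothetical, and the upcall is modelled as an ordinary Python callable. On real hardware a successful yield does not return to the caller; the return value here exists only so the sketch can be exercised.

```python
PROCESS_MAP_INDEX = 507
INVALID_TARGET = 5  # ProcessError code from the table above


class ToyCpu:
    """Minimal stand-in for the state a yield touches."""
    def __init__(self):
        self.cr3 = None
        self.tlb_flushes = 0


def yield_process(cpu, process_map, process_hash, caller_id, process_id):
    # Validate that processId points into the Process map at PML4[507].
    in_map = (process_id >> 39) & 0x1FF == PROCESS_MAP_INDEX
    if not in_map or process_id not in process_map:
        return INVALID_TARGET
    # Resolve the physical root page and pull out the metadata.
    root = process_map[process_id]
    metadata = process_hash[root]
    # Switch the root page table, then flush translation caches.
    cpu.cr3 = root
    cpu.tlb_flushes += 1
    # Enter the target at its onYield upcall, passing the last processId.
    return metadata["onYield"](caller_id)


base = (0xFFFF << 48) | (PROCESS_MAP_INDEX << 39)
parent_id, child_id = base, base + 0x1000
process_map = {child_id: 0x200000}
process_hash = {0x200000: {"onYield": lambda last: ("resumed", last)}}

cpu = ToyCpu()
print(yield_process(cpu, process_map, process_hash, parent_id, child_id))
print(yield_process(cpu, process_map, process_hash, parent_id, 0x1234))  # 5
```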
|
||||
|
||||
The kernel uses this procedure to free a Process (`FREE_PROCESS`):
|
||||
|
||||
* Validates that `processId` is a virtual address that points into the Process map at `PML4[507]`. Fails with `INVALID_SOURCE` if it is not.
|
||||
* Parses the `processId` to determine the physical address of that root page table.
|
||||
* Validates that the calling Process is the owner by checking that the physical page pointed to by the given `processAddress` is the same as the root page table of the given Process. If `processAddress` is not a valid page table entry or does not equal the expected address, it fails with `INVALID_SOURCE`.
|
||||
* Yields to the `onFree` upcall of the other process while passing the calling Process `processId`.
|
||||
* Does not return to the calling Process. The calling Process must understand that it needs to listen to a timer to preempt and properly delete the child if it wants to ensure that the child Process is gone.
|
||||
* **Note**: When the calling Process regains control, it can just remove the root page table of the child Process, effectively stopping it from existing. The `FREE_PROCESS` call is just here to provide a kernel means of securely calling the `onFree` upcall.
|
||||
39
docs/RESOURCE.md
Normal file
@@ -0,0 +1,39 @@
|
||||
A Resource is a region of memory or some other entity that can be adequately and securely accessed via the virtual memory system. It is represented as a PML3, PML2, PML1, or as a single page depending on the needs and size of that resource. A Process has access to a Resource if and only if that Resource is accessible via a virtual address during the runtime of that Process. That is, the Resource is currently mapped into the page table hierarchy of the Process, which is itself represented by a PML4. The kernel keeps track of all Resources via them being mapped into the virtual address space of the kernel. A Resource is then identified via a 64-bit identifier that is exactly the virtual address within that mapping of the root page table structure (PML3, PML2, PML1, or single page via the recursive entry at `PML4[510]`) of that Resource. We often lovingly refer to these by another name: a Gib.
|
||||
|
||||
A ResourceGroup is a set of resources that itself can be securely accessed via the virtual memory system. This is generally only ever a PML3 (which can contain one or more PML2, PML1, or single-page sized Resources), a PML2 (which can thus contain a set of PML1 or single-page sized resources), or a very compact PML1 containing potentially a single-page Resource in one or more of its entries. A process can access a ResourceGroup if and only if it is accessible via a virtual address during the runtime of that Process. That is, if that ResourceGroup is currently mapped into the page table hierarchy of the Process. Consequently, the one or more Resources that are themselves mapped into the ResourceGroup page table structure are accessible by a Process if and only if that ResourceGroup itself is accessible. The access control and subsequent security is provided by the virtual memory hardware. There is nothing actually special about a ResourceGroup as it is just a Resource. It is just a useful concept that is given this special name for clarity. A Process can add a Resource to a ResourceGroup by simply allocating (or being granted) a Resource inside the address space of the ResourceGroup. The kernel does not see any distinction here. When a ResourceGroup is freed, however, the deallocation via walking the page table structure will also free any `Owner` marked Resource roots, which cascade the action by freeing these internal Resource objects along with their parent. This is abnormal behavior, however, since a competent application deallocates its own Resource objects page by page on its own.
|
||||
|
||||
A region of memory is the most obvious resource to encapsulate. This is just a set of pre-allocated and marked physical pages that are organized into a page table structure rooted by a PML3, PML2, or PML1 with the necessary leaves mapping to the aforementioned marked physical pages. When the root of this Resource is mapped into a Process, it is then accessible. If it is granted to a second Process, that memory is shared potentially with different access flags. Interestingly, the page table primitives allow for sections of the memory Resource to be marked independently readonly or no-execute in parts divisible by the page size of the architecture and its available access control flags. A Process can also map the Resource with particular flags that will take precedence. For instance, if the Process can only attach the Resource to itself read-only, the technically writable pages within will be inaccessible for writes when the root page table of the Resource is mapped into a page table entry within the Process page table with the read-only flag set. When granting a Resource as read-only, the target Process can read the data but not modify it with the same page tables as the owning Process, which can write to it freely. Only the owning Process can affect the page table flags of the intermediate page tables and entries within. It does this with the memory primitive function `CHMOD_PAGE` described in [MEMORY](MEMORY.md).
|
||||
|
||||
A memory-mapped IO (MMIO) region that controls a piece of hardware via reads and writes of otherwise non-tangible memory addresses is another obvious choice for a resource. This can provide a means of driving some hardware and perhaps controlling atomic, single-unit access to some device via ensuring that it is only mapped into one particular Process page table at a time. There are interesting implications of controlling hardware access via the same virtual memory system as normal memory regions. For one, page faults can indicate that the hardware is currently inaccessible if the kernel has unmapped it in order to pre-emptively schedule its use elsewhere. Another is the use of read-only flags to allow a process to perhaps read the state of a device but not drive it, which can allow consumers of hardware states while securely multiplexing that device for single-use writing elsewhere. One of the main goals of an exokernel is to allow multiple driver implementations to co-exist to suit the needs of disparate applications. This allows such an ecosystem by allowing userspace pre-emption of devices so that their own library OSes can drive hardware. This is only strengthened by hardware device and DMA virtualization, such as SR-IOV and IOMMU, which allow for more efficient multiplexing and for DMA that can be mapped via virtual addresses and controlled by userspace code.
|
||||
|
||||
The kernel system calls related to resource allocation and mapping have an enumerated return value to indicate the error or either 0 or an id if successful. The `ResourceError` error codes are as follows:
|
||||
|
||||
* `0`: `SUCCESS` - Successful with no other return value.
|
||||
* `1`: `INVALID_FLAGS` - Invalid flags were specified.
|
||||
* `2`: `NOT_FREE` - The physical page is not free and cannot be used to allocate a structure.
|
||||
* `4`: `INVALID_SOURCE` - The given virtual address for the resource is invalid. When allocating, this means that the `resourceAddress` specified is not free. On other calls, it means there's no valid resource at `resourceAddress`.
|
||||
* `5`: `INVALID_TARGET` - The given virtual address for the target resource when changing owner is invalid or the virtual address to hold the newly allocated Resource is invalid.
|
||||
* `6`: `NO_ROOM` - We reached the maximum number of resources available in the system or we cannot fit the requested Resource size.
|
||||
|
||||
The kernel offers several primitive functions to facilitate the creation and control of resources:
|
||||
|
||||
* `ALLOC_RESOURCE(physicalAddress, resourceAddress) -> ResourceError | resourceId` - Creates an empty Resource with an implied depth using the given unallocated `physicalAddress` as the page. The `resourceAddress` is the virtual address of the userspace page table entry (PTE) to write the Resource root using the `PML4[510]` recursive route. The `depth` is thus determined by the `resourceAddress` by virtue of the type of page table structure the PTE would be written to. If the PTE is written to a PML3 in the Process, then the Resource must be a PML2 root and have a depth of 2. The `depth` then determines the size of the virtual address space the Resource will represent. The kernel will apply the `Owner` flag on the Resource when mapping it to the Process address space at the given `resourceAddress`. The kernel will essentially create the root page table for the Resource and nothing else while mapping that into the Process address space. Given that this succeeds, it will return the `resourceId` for the Resource, which is the virtual address in the kernel that points to kernel-space memory for the root page table of that Resource. **Note**: This does not return `0` on success. Instead, it returns a `resourceId` which, treated as an unsigned value, is always larger than `0xff`. This fails with `INVALID_TARGET` if the `resourceAddress` points to a page table structure that does not exist or is not owned by the calling Process. This fails with `NO_ROOM` if we have hit the Resource cap or there's no room for it. This fails with `NOT_FREE` if the provided `physicalAddress` is not actually free.
|
||||
* `FREE_RESOURCE(resourceId, resourceAddress) -> ResourceError` - Frees an existing Resource identified by `resourceId`. Resources are reference counted. A grantee of a Resource can also call this, which unmaps the Resource from that grantee. When the owner frees the Resource, it is revoked from every Process that had been granted it. Fails with `INVALID_SOURCE` if the Resource does not exist.
* `GRANT_RESOURCE(resourceId, resourceAddress, processId, targetResourceAddress, flags) -> ResourceError` - Attaches a shared copy of the Resource to the given Process. This adds a page table entry at the `targetResourceAddress` in the target Process with the given `flags` plus the `Grant` flag. Resources are reference counted, and this increments the reference count. The Resource can be unmapped from the target Process by calling `REVOKE_RESOURCE` with the same arguments, minus `flags`. The Resource will be forcibly revoked if the owning Process calls `FREE_RESOURCE` on it or otherwise has it deallocated at cleanup. Fails with `INVALID_FLAGS` if any of the `flags` are invalid. Fails with `INVALID_SOURCE` if the Resource does not exist, is not owned by the calling Process, or if the `resourceAddress` is not the same Resource as indicated by `resourceId`. Fails with `INVALID_TARGET` if the Process given by `processId` does not exist or if the `targetResourceAddress` is not free in that Process, that is, it is already occupied in its page table.
* `REVOKE_RESOURCE(resourceId, resourceAddress, processId, targetResourceAddress) -> ResourceError` - Detaches a Resource from a grantee that had earlier been granted this Resource via `GRANT_RESOURCE`. This unmaps the page table root for this Resource from the target Process given by `processId`. Fails with `INVALID_SOURCE` if the Resource does not exist or is not owned by the calling Process. Fails with `INVALID_TARGET` if the given Process does not have the `targetResourceAddress` mapped in or if the `targetResourceAddress` is not the Resource indicated by `resourceId`.
* `CHOWN_RESOURCE(resourceId, resourceAddress, processId, targetResourceAddress) -> ResourceError` - Updates the owner of the Resource to the given Process. Atomically swaps the `Owner` flags on each Process's mapped-in Resource. Fails with `INVALID_SOURCE` if the Resource does not exist or is not owned by the calling Process. Fails with `INVALID_SOURCE` if the given Process does not have the Resource mapped in. Fails with `INVALID_TARGET` if the Process given by `processId` does not exist. Fails with `INVALID_TARGET` if the Resource at `targetResourceAddress` is not mapped into that Process or is not the same Resource as indicated by `resourceId`.
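The way `ALLOC_RESOURCE` derives a depth from `resourceAddress`, and the rule for telling a `resourceId` apart from an error code, can be sketched in a few lines. This is only an illustration, not the kernel's code: it assumes x86-64 4-level paging with 9-bit table indices, 4 KiB pages and 8-byte entries, and every function name here (`table_indices`, `entry_table_level`, `implied_depth`, `recursive_address`, `is_resource_id`) is hypothetical.

```python
RECURSIVE_SLOT = 510   # PML4[510], the recursive entry named in the text
INDEX_MASK = 0x1FF     # 9 bits per page table index (assumed 4-level paging)
SHIFTS = (39, 30, 21, 12)


def table_indices(vaddr):
    """Split a virtual address into (PML4, PML3, PML2, PML1) indices."""
    return tuple((vaddr >> shift) & INDEX_MASK for shift in SHIFTS)


def entry_table_level(resource_address):
    """Level (1..4) of the table holding the entry at resource_address.

    Each leading index equal to the recursive slot moves the walk one
    level up: one leading 510 addresses a PML1, two a PML2, and so on.
    Returns None if the address does not use the recursive route.
    """
    level = 0
    for index in table_indices(resource_address):
        if index != RECURSIVE_SLOT:
            break
        level += 1
    return level or None


def implied_depth(resource_address):
    """Depth of a Resource whose root entry is written at resource_address."""
    level = entry_table_level(resource_address)
    return None if level is None else level - 1


def recursive_address(indices, entry):
    """Canonical 64-bit address of entry number `entry` reached via `indices`."""
    vaddr = 0
    for index, shift in zip(indices, SHIFTS):
        vaddr |= index << shift
    vaddr |= entry * 8            # entries are 8 bytes wide
    if vaddr & (1 << 47):         # sign-extend bit 47 into bits 48..63
        vaddr |= 0xFFFF << 48
    return vaddr


def is_resource_id(result):
    """Unsigned results above 0xff are ids; anything else is an error code."""
    return (result & 0xFFFF_FFFF_FFFF_FFFF) > 0xFF
```

For instance, an entry reached through the indices `(510, 510, 510, 1)` lives in a PML3, so `implied_depth` reports 2, which matches the PML2-rooted case described for `ALLOC_RESOURCE`.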
The kernel creates a Resource by using this procedure:
* Allocate a root page table by allocating a single physical page.
* Map this into the kernel memory space. The kernel maintains all created resources in a single root PML3, so it can manage up to 512 3-level resources. It would be rare to create a 3-level resource (512 GiB of virtual space), so we can expect mostly 1 GiB Resources rooted at a PML2 and 2 MiB Resources rooted at the PML1 level. The kernel can keep 256M second-level (1 GiB) resources mapped in at a time. The identifier for a Resource is, then, the virtual address that uses the recursive mapping (`PML4[510]`) to point to the root table of the Resource. Whenever a Process gives the Resource id, the kernel can securely check that the physical page pointed to by that virtual address is also present within the Process's root page table.
* The kernel places a record in its hash table that maps the physical address of that root page to the virtual address which serves as that Resource id. This lets the kernel look the Resource up if it ever needs to deallocate it forcibly. Normally, the application will deallocate the Resource at some point before it ends execution. However, the program may crash before doing so.
* Then, it can map this Resource page table into the Process so that the Process now owns the Resource. It does this by mapping the new Resource into the position requested by the Process via `resourceAddress`. The virtual address given by `resourceAddress` effectively points to the same physical page that the kernel's own copy of the Resource, known as `resourceId`, is mapped to. This gives the kernel a constant-time check for ownership: the `resourceId` is a virtual address that points to the same physical address as the virtual address `resourceAddress`. The kernel should map the Resource in with the `Owner` flag, since it was the calling Process that created the Resource. (See `docs/MEMORY.md` for information about these flags.)
* It can then return to the userspace application the `resourceId` which is a virtual address constructed using the recursive page table entry (`PML4[510]`) to point to the root page table for the Resource.
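The bookkeeping in the steps above can be modeled with ordinary dictionaries. The sketch below is a toy, not the kernel's implementation: `ToyKernel`, its fields, and the placeholder id scheme are hypothetical, the order of the error checks is an assumption, and errors are returned by name because only some numeric codes are listed in this section.

```python
OWNER = "Owner"  # flag name from the text; its bit encoding is not modeled


class ToyKernel:
    """Toy model of Resource creation using plain Python containers."""

    def __init__(self, free_pages, max_resources):
        self.free_pages = set(free_pages)   # unallocated physical pages
        self.max_resources = max_resources  # stand-in for the Resource cap
        self.id_by_root = {}                # hash table: root page -> resourceId
        self.next_id = 0x1000               # placeholder ids, always above 0xff

    def alloc_resource(self, process_map, physical_address, resource_address):
        """Return a resourceId (int) on success or an error name (str)."""
        if physical_address not in self.free_pages:
            return "NOT_FREE"
        if len(self.id_by_root) >= self.max_resources:
            return "NO_ROOM"
        # Step 1: take the single physical page as the root page table.
        self.free_pages.remove(physical_address)
        # Step 2: "map" it into kernel space; that address becomes the id.
        resource_id = self.next_id
        self.next_id += 0x1000
        # Step 3: record root page -> id so a crashed owner can be cleaned up.
        self.id_by_root[physical_address] = resource_id
        # Step 4: map the root into the Process with the Owner flag.
        process_map[resource_address] = (physical_address, {OWNER})
        # Step 5: hand the id back to userspace.
        return resource_id
```

Here `process_map` stands in for the calling Process's page tables, keyed by `resourceAddress`. A real kernel would write page table entries instead, but the order of operations is the same.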
The kernel performs the following ownership check on every Resource operation:
* `is_mapped(resourceId, resourceAddress) -> boolean` - This looks at the `resourceId` and ensures that it is a virtual address that runs through the kernel's Resource map. Let's say that we maintain a mapping of all resources on `PML4[508]`, so the 508th index of the root page table points to a PML3 that contains, as leaves of the tree, the root page tables of all known Resource objects. We can then tell very easily if `resourceId` is a virtual address that uses the recursive entry (`PML4[510]`) to point to the physical page of the Resource root. That physical page must be the same one pointed to by the given `resourceAddress`. We also verify that the `resourceAddress` is not in higher memory, which is always owned by privileged kernel code. If all of these hold true, the current calling Process owns the given Resource.
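A minimal sketch of this check, under stated assumptions, might look like the following. The `translate` callable (virtual address to physical page, or `None`) and the `known_ids` set are hypothetical stand-ins for the page table walk and for the kernel's Resource map, and `KERNEL_HALF_START` assumes the usual x86-64 canonical higher half with 48-bit virtual addresses.

```python
KERNEL_HALF_START = 0xFFFF_8000_0000_0000  # assumed start of higher memory


def is_mapped(translate, resource_id, resource_address, known_ids):
    """Return True if both addresses resolve to the same Resource root page.

    translate: callable mapping a virtual address to a physical page or None.
    known_ids: ids that run through the kernel's Resource map.
    """
    if resource_id not in known_ids:
        return False                      # not an id the kernel handed out
    if resource_address >= KERNEL_HALF_START:
        return False                      # higher memory belongs to the kernel
    root = translate(resource_id)
    return root is not None and root == translate(resource_address)
```

Both lookups are single translations, which is what makes the check constant-time: no list of owners has to be searched.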
The kernel grants a Resource to another Process by using this procedure:
11
docs/UPCALL.md
Normal file
@@ -0,0 +1,11 @@
An upcall is an entrypoint to a Process. For more information about the semantics of a Process, see [PROCESS](PROCESS.md). An entrypoint is the instruction that is jumped to when yielding to the Process. Every Process needs at least one valid upcall to be functional: the `onYield` upcall. This is the entrypoint used when the Process is simply the target of a yield call. It holds the value of the instruction pointer that is established as the kernel yields CPU control to that Process.

Only the Process itself or its owning parent can establish the upcalls for a Process. It does so via the `MAP_UPCALL` system call, which stores the given address in the corresponding entry of the kernel's Process metadata.

Another type of upcall is `onFault`, which occurs when a page fault happens during the runtime of the Process or when a different Process faults on an address of a Resource owned by the targeted Process. This upcall also receives the `processId` of the Process that was active during the fault. The normal context of the fault is also available, following the conventions of the particular hardware.

Here are the available enumerated upcalls securely facilitated by the exokernel:
* `1`: `YIELD` - `onYield` upcall which is just a normal execution path.
* `2`: `FREE` - The `onFree` upcall is a hint that the Process is about to be deallocated. It can only be invoked by an owning Process.
* `3`: `FAULT` - `onFault` upcall happens on a page fault.
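To make the upcall table concrete, here is a small sketch of per-Process upcall metadata and a `MAP_UPCALL`-style setter. The enumeration values come from the list above. Everything else is an assumption made for illustration: the `ProcessMeta` class, the `map_upcall` signature, and the boolean return value are hypothetical, since this document does not specify the system call's arguments or results.

```python
from enum import IntEnum


class Upcall(IntEnum):
    YIELD = 1   # onYield: normal execution path
    FREE = 2    # onFree: hint that the Process is about to be deallocated
    FAULT = 3   # onFault: page fault


class ProcessMeta:
    """Hypothetical slice of the kernel's per-Process metadata."""

    def __init__(self, process_id, parent_id=None):
        self.process_id = process_id
        self.parent_id = parent_id   # owning parent, if any
        self.upcalls = {}            # Upcall -> entrypoint address

    def is_functional(self):
        """A Process needs at least a valid onYield entrypoint."""
        return Upcall.YIELD in self.upcalls


def map_upcall(caller_id, target, upcall, address):
    """Store an entrypoint; only the Process or its owning parent may do so."""
    if caller_id not in (target.process_id, target.parent_id):
        return False
    target.upcalls[Upcall(upcall)] = address   # rejects unknown upcall numbers
    return True
```

In this model a parent would typically map `YIELD` for a new child before first yielding to it, since a Process without an `onYield` entrypoint has nowhere to start executing.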