A Process is just a page table. It is a PML4 page table that is installed as the current page table (via CR3 on x86-64) when the Process is to be run. A Resource is accessible to a Process exactly when it is mapped into that Process's virtual address space through that root page table. The kernel maintains the allocated Process structures as leaves in a PML3 space off of PML4[507]. Even though a Process is a root page table (PML4), each one is stored as a PML2 node within this Process mapping space. That means the kernel can be aware of a total of 262K active processes and can view and manipulate the PML4 and PML3 entries of any Process root page table. To manipulate the PML2 or PML1 entries of a non-active Process, the kernel has to context switch to that Process first, and perhaps context switch back before returning. One Process manipulating another Process's address space in this way is therefore a relatively expensive operation, since it incurs a TLB flush in most cases.
These Process objects are otherwise handled the same way as any Resource (see RESOURCE.md for more information about these primitives.) A Process has an identifier (a processId) that is constructed as the kernel-space virtual address of that root page table from within that PML4[507] virtual space.
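Because a processId is a virtual address inside the PML4[507] window, it can help to see how such an address is composed. The following is a minimal sketch of standard x86-64 4-level paging arithmetic, not the kernel's code. The helper names (`make_vaddr`, `split_vaddr`, `in_process_map`) are hypothetical, and which lower-level indices the kernel actually picks for a given Process is not specified here.

```python
# Sketch only: generic x86-64 4-level address arithmetic.
PROCESS_MAP_INDEX = 507   # PML4 slot named in the text
ENTRIES = 512             # entries per page table
PAGE_SIZE = 4096          # 4 KiB pages

def canonical(addr48):
    """Sign-extend a 48-bit address into a canonical 64-bit address."""
    if addr48 & (1 << 47):
        return addr48 | 0xFFFF_0000_0000_0000
    return addr48

def make_vaddr(pml4, pml3=0, pml2=0, pml1=0, offset=0):
    """Compose a virtual address from four 9-bit indices and a page offset."""
    for index in (pml4, pml3, pml2, pml1):
        assert 0 <= index < ENTRIES
    assert 0 <= offset < PAGE_SIZE
    addr = (pml4 << 39) | (pml3 << 30) | (pml2 << 21) | (pml1 << 12) | offset
    return canonical(addr)

def split_vaddr(vaddr):
    """Return (pml4, pml3, pml2, pml1, offset) for a virtual address."""
    return (
        (vaddr >> 39) & 0x1FF,
        (vaddr >> 30) & 0x1FF,
        (vaddr >> 21) & 0x1FF,
        (vaddr >> 12) & 0x1FF,
        vaddr & 0xFFF,
    )

def in_process_map(vaddr):
    """True if vaddr is canonical and falls inside the PML4[507] window."""
    return (vaddr >> 48) == 0xFFFF and split_vaddr(vaddr)[0] == PROCESS_MAP_INDEX

# The 512 GiB window covered by PML4[507]:
WINDOW_START = make_vaddr(PROCESS_MAP_INDEX)
WINDOW_END = make_vaddr(PROCESS_MAP_INDEX, 511, 511, 511, PAGE_SIZE - 1)
```

Under these assumptions the window runs from `0xFFFFFD8000000000` to `0xFFFFFDFFFFFFFFFF`, so a cheap range or index check is enough to reject a processId that cannot possibly name a Process.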
When a Process wants to create a child Process, it simply has the kernel create a Process page table (a PML4) from a given physical page. The kernel maps that root page table as an intermediate page table in the address space of the calling Process, at the virtual address the calling Process requested. From here, the parent Process needs to allocate each Resource that will contain the code and data for the eventual program and grant it to the child Process. It can then optionally change the owner of each Resource to the child Process so that the child can further modify its own page table. Finally, it can yield to the child Process.
Yielding to a process is just a system call (YIELD_PROCESS) to switch address spaces to the address space of the given process. The process is identified by its root page table in memory which is typically called processId. So, yield takes one argument: The processId, which is the virtual address of a process's root page table from within the kernel's process map. The yield call is used for cooperative scheduling. Preemption is done by a user process via a libos process scheduler that listens to the system timer. The kernel does not implement preemption.
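Since preemption is left to a libos scheduler, the policy for choosing the next Process lives entirely in userspace. As an illustration only, a round-robin policy could look like the sketch below. The `RoundRobin` class and its `on_timer_tick` method are hypothetical names, not part of any libos described here; the value returned is what the scheduler would pass to YIELD_PROCESS.

```python
from collections import deque

class RoundRobin:
    """Hypothetical libos policy: pick the next processId on each timer tick."""

    def __init__(self, process_ids):
        self.queue = deque(process_ids)

    def on_timer_tick(self):
        """Return the processId to yield to next, or None if nothing is runnable."""
        if not self.queue:
            return None
        next_id = self.queue.popleft()
        self.queue.append(next_id)   # rotate to the back for the next round
        return next_id

    def remove(self, process_id):
        """Stop scheduling a Process, for example after it has been freed."""
        self.queue.remove(process_id)
```

A different libos could swap in priorities or deadlines without any kernel change, which is the point of keeping preemption out of the kernel.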
There is no fork action in the exokernel proper. Implementing a fork semantic would just be to create a Process and then grant each Resource to the new process. Any extra semantics one wishes to impose to satisfy whatever definition of 'forking' an OS has in mind is up to that library OS process implementation.
Any parent Process "owns" its child Process. Ownership is recorded in the flags set when the child Process root page table is mapped into the parent Process: the page table entry that points to the child's would-be PML4 is marked with the Owner flag. When the kernel wants to validate ownership (the parent status) of the calling Process, it looks at the Process-local Resource pointed to via the processAddress and checks two things. First, the entry must carry the Owner flag. Second, the physical page that entry points to must be the same page as the child's PML4, which the kernel finds by taking the processId and walking the structure within the process map (PML4[507]) in the kernel's memory. Both must hold to confirm the relationship.
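The two-part ownership test can be sketched at the level of a single page table entry. This is an illustration under assumptions: the text does not say which bit holds the Owner flag, so bit 9 (one of the bits x86-64 leaves available to software) is used as a placeholder, and `is_parent` is a hypothetical helper name.

```python
PRESENT = 1 << 0
OWNER = 1 << 9                       # assumption: a software-available PTE bit
FRAME_MASK = 0x000F_FFFF_FFFF_F000   # bits 12..51 hold the physical frame address

def entry_frame(entry):
    """Physical page address stored in a page table entry."""
    return entry & FRAME_MASK

def is_parent(parent_entry, child_root_phys):
    """Check the entry found at processAddress against the child's root page.

    parent_entry    -- raw entry in the calling Process's page table
    child_root_phys -- physical address of the child's PML4, as resolved
                       from the processId through the Process map
    """
    return (
        bool(parent_entry & PRESENT)
        and bool(parent_entry & OWNER)
        and entry_frame(parent_entry) == child_root_phys
    )
```

An entry that maps the right page without the Owner flag fails the check, as does an Owner entry that points at a different page.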
Since the kernel maintains the mapping of Process PML4s as PML2s off of PML4[507], the leaves of this virtual address space point to the potential PML2s of that child Process. It is worth noting that when a Process is allocated, it is only an empty PML4. It is up to library OSes in userspace to denote what the page table structure of a Process actually looks like. However, considering a saturated page table structure for a hypothetical Process, the kernel mapping can always manipulate the PML2s of any running Process. This means that granting any PML2 (or larger) Resource to another Process can be done without context switching away from the granting Process. It also means that loading a new Process (say from a shell application) can be done by:
- Allocating a PML4 via `ALLOC_PROCESS` and mapping it into a PML2 of the calling parent Process.
- Allocating a PML3 page in that child Process by allocating the PML1 in the appropriate place in the parent Process.
- Allocating a PML2-sized Resource in the parent at some PML2 in the calling parent Process.
- Loading the executable code into the PML2 by allocating the necessary page structure and leaf pages and filling them with executable and data content from a binary executable.
- Granting the PML2-sized Resource to the child Process via `GRANT_RESOURCE`. This effectively updates the PML3 in the child to point to the PML2 root of the calling Process.
- Changing the owner for this Resource via `CHOWN_RESOURCE`, which now gives ownership to the child Process.
All of these require absolutely no context switching and can be done with the global page table mapping in the kernel. It is possible to grant a PML1-sized (2MB on x86-64) Resource to another Process and modify the page table of that Process without a context switch. With very careful Resource management, most metadata and Process page structures are visible from the global page table available to the kernel and manipulation can be done cheaply.
The kernel also maintains a backward reference that maps physical addresses of the root page tables to Process metadata. This metadata contains the actual processId that would normally be used and also the kernel-aware upcalls. The onYield upcall and onFree upcall are virtual addresses that will be the instruction pointers upon yielding to the Process. Each of these upcalls will pass a single argument of the last running processId. The onFree upcall occurs when a parent Process is hinting that it is about to deallocate the Process page table so the child Process can responsibly respond and deallocate its own resources. Generally, yielding back to the calling Process is expected. For more information about upcalls refer to UPCALL.
The kernel system calls related to process allocation and mapping have an enumerated return value to indicate the error or 0 if successful. The ProcessError error codes are as follows:
- `0`: `SUCCESS` - Successful operation with no other return value expected.
- `1`: `INVALID_FLAGS` - Invalid flags were specified.
- `2`: `NOT_FREE` - The physical page is not free and cannot be used to allocate a structure.
- `4`: `INVALID_SOURCE` - The given virtual address for the resource is invalid. When allocating, this means that the `resourceAddress` specified is not free. On other calls, it means there's no valid resource at `resourceAddress`.
- `5`: `INVALID_TARGET` - The processId to yield to was invalid.
- `6`: `NO_ROOM` - We reached the maximum number of processes.
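A userspace wrapper might mirror these codes as an enumeration. The sketch below copies the values exactly as listed (there is no entry for 3). The `describe` and `decode_alloc_result` helpers are hypothetical, and the way a `ProcessError | processId` result is told apart is an assumption: it relies on a processId being a kernel-space address and therefore never equal to one of the small codes.

```python
from enum import IntEnum

class ProcessError(IntEnum):
    SUCCESS = 0
    INVALID_FLAGS = 1
    NOT_FREE = 2
    # 3 does not appear in the list above, so it is left undefined here.
    INVALID_SOURCE = 4
    INVALID_TARGET = 5
    NO_ROOM = 6

def describe(code):
    """Name a return code, tolerating values the list does not define."""
    try:
        return ProcessError(code).name
    except ValueError:
        return f"UNKNOWN({code})"

def decode_alloc_result(value):
    """Split a combined 'ProcessError | processId' result (assumed encoding)."""
    try:
        return ("status", ProcessError(value))
    except ValueError:
        return ("processId", value)
```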
The kernel offers several primitive functions to facilitate the creation and control of processes:
- `ALLOC_PROCESS(physicalAddress, processAddress) -> ProcessError | processId` - Spawns a child process that will be mapped to the provided userspace address of the calling Process. Returns the `processId`, which is a virtual address in kernel space that, via the recursive page table entry (`PML4[510]`), points to the root page table of the new process. The `processAddress` needs to be a valid page table entry to attach the root page table for the child Process using the recursive entry `PML4[510]` to do so. Fails with `INVALID_TARGET` if the page table structure does not exist or is not owned by the calling Process. This fails with `NO_ROOM` if there are no available processes left in the system because the kernel's process map is full. This fails with `NOT_FREE` if the `physicalAddress` specified is not actually free.
- `YIELD_PROCESS(processId) -> ProcessError` - Cooperatively yields to the given Process. The `onYield` upcall of the target Process will be passed the calling Process `processId`. Effectively, this swaps the current root page table (PML4 via CR3) to the one for the given process. The process has to deal with restoring its own state. There's no context being stored because the kernel is effectively stateless except for maintaining access to all address spaces. On success, this function never returns. If the `processId` is in any way invalid, it will return with the `INVALID_TARGET` error.
- `FREE_PROCESS(processId, processAddress) -> ProcessError` - Yields to a child process that was previously allocated via an earlier `ALLOC_PROCESS` call to allow it to deal with closing itself. It calls the `onFree` upcall while passing the current Process `processId`. The calling Process might expect that the target Process yield back. Fails with `INVALID_SOURCE` if the calling Process is not the parent of the given Process (that is, `processAddress` does not point to the same physical page as the PML4 within the address space rooted at `processId`). It does not return on success.
The parent Process effectively frees a child Process on its own during its next quantum by deallocating all pages of the child Process.
The kernel has the following check (which is identical to the one for checking if a Process owns a Resource) of whether or not the calling Process is the parent of the given Process, which it checks on the Process free operation:
`is_mapped(processId, processAddress) -> boolean` - This looks at the `processId` and ensures that it is a virtual address that runs through the kernel's Process map. Let's say that we maintain a mapping of all processes on `PML4[507]`, so the 507th index of the root page table points to a PML3 that contains, as leaves of the tree, the root page tables of all known Process objects. We can then tell very easily if `processId` is a virtual address that uses the recursive entry (`PML4[510]`) to point to the physical page of the Process root. That physical page must be the same one pointed to by the given `processAddress`. We also verify that the `processAddress` is not in higher memory, which is always owned by privileged kernel code. If all of these hold true, the current calling Process owns the given Process as it is the parent of the given Process.
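The check can be modelled with two lookup tables standing in for real page table walks. This is a sketch under stated assumptions, not the kernel's implementation: `process_map` (processId to root physical page) and `caller_entries` (processAddress to the physical page its entry points at) are hypothetical stand-ins, "higher memory" is taken to mean the upper canonical half starting at `0xFFFF800000000000`, and for simplicity a processId is only required to lie in the PML4[507] window.

```python
PROCESS_MAP_INDEX = 507
HIGHER_HALF_START = 0xFFFF_8000_0000_0000  # assumption: kernel-owned range

def pml4_index(vaddr):
    return (vaddr >> 39) & 0x1FF

def is_mapped(process_id, process_address, process_map, caller_entries):
    """Model of the parent check described above."""
    # 1. processId must run through the kernel's Process map and be known.
    if pml4_index(process_id) != PROCESS_MAP_INDEX or process_id not in process_map:
        return False
    # 2. processAddress must not be in kernel-owned higher memory.
    if process_address >= HIGHER_HALF_START:
        return False
    # 3. Both must resolve to the same physical page (the child's root).
    return caller_entries.get(process_address) == process_map[process_id]
```

A missing entry at `processAddress` resolves to `None` and so never matches a real physical page.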
The kernel uses this procedure to create a Process (ALLOC_PROCESS):
- Marks the given `physicalAddress` allocated (and otherwise fails with `NOT_FREE`).
- Uses this physical page as the root page table of the child Process.
- Maps this root page table into the calling Process in the page table entry it specified.
- It marks this page table entry with the `Owner` flag.
- It maps this root page table into the Process map (`PML4[507]`) such that it is a PML2 there.
- It determines the `processId`, which is the virtual address that points to that PML2 in the Process map.
- It stores this `processId` along with other initial metadata as the value in the Process hash using the physical address as the key. This can be used to forcibly kill a child Process when the owning parent Process is forcibly killed itself or crashes. Other metadata contained within here are the destination addresses of upcalls.
- It returns the `processId` to the calling Process.
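The steps above can be traced in a small model that uses dictionaries in place of page tables. Everything here is illustrative: `KernelModel`, the `caller_table` dictionary, and the slot-to-address formula for the processId are placeholders, and the model checks all failure conditions before changing any state, which is one possible ordering rather than the documented one.

```python
NOT_FREE, INVALID_TARGET, NO_ROOM = 2, 5, 6
PROCESS_MAP_BASE = 0xFFFF_FD80_0000_0000   # start of the PML4[507] window
PAGE_SIZE = 4096

class KernelModel:
    def __init__(self, free_pages, max_processes):
        self.free_pages = set(free_pages)
        self.max_processes = max_processes
        self.process_map = {}    # processId -> physical address of root table
        self.process_hash = {}   # physical address -> Process metadata

    def alloc_process(self, caller_table, physical_address, process_address):
        # caller_table maps a processAddress to its entry (None when empty).
        if process_address not in caller_table:
            return INVALID_TARGET            # page table structure is missing
        if len(self.process_map) >= self.max_processes:
            return NO_ROOM                   # Process map is full
        if physical_address not in self.free_pages:
            return NOT_FREE
        self.free_pages.remove(physical_address)        # mark page allocated
        caller_table[process_address] = {               # map into the parent
            "frame": physical_address,
            "owner": True,                              # Owner flag
        }
        slot = len(self.process_map)                    # placeholder slot choice
        process_id = PROCESS_MAP_BASE + slot * PAGE_SIZE  # placeholder formula
        self.process_map[process_id] = physical_address
        self.process_hash[physical_address] = {
            "processId": process_id,
            "onYield": None,                            # upcalls set up later
            "onFree": None,
        }
        return process_id
```

Keying the metadata by physical address is what lets the kernel go from a root page table back to its processId and upcalls.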
The kernel uses this procedure to yield to a Process (YIELD_PROCESS):
- Validates that `processId` is a virtual address that points into the Process map at `PML4[507]`. Fails with `INVALID_TARGET` if it is not.
- Parses the `processId` to determine the physical address of that root page table.
- Pulls out the process metadata from the Process hash.
- Switches the root page table of the system to this one.
- Flushes any virtual address translation caches (TLB, etc).
- Returns to the `onYield` upcall by looking it up within the process metadata.
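The yield path can be followed with a toy CPU state in place of real registers. This is a sketch, assuming dictionary stand-ins for the Process map and Process hash. The `Cpu` fields and the `yield_process` signature (which takes the caller's processId explicitly) are hypothetical, and where the real call never returns on success, the model returns `None`.

```python
from dataclasses import dataclass

INVALID_TARGET = 5
PROCESS_MAP_INDEX = 507

@dataclass
class Cpu:
    cr3: int = 0           # physical address of the active root page table
    rip: int = 0           # next instruction pointer
    arg: int = 0           # single argument passed to the upcall
    tlb_flushes: int = 0   # counts modelled TLB flushes

def yield_process(cpu, process_map, process_hash, caller_id, process_id):
    # Validate that processId points into the Process map.
    in_window = (process_id >> 39) & 0x1FF == PROCESS_MAP_INDEX
    if not in_window or process_id not in process_map:
        return INVALID_TARGET
    root_phys = process_map[process_id]     # resolve the root page table
    metadata = process_hash[root_phys]      # look up the Process metadata
    cpu.cr3 = root_phys                     # switch address spaces
    cpu.tlb_flushes += 1                    # stale translations are dropped
    cpu.rip = metadata["onYield"]           # enter the target's upcall
    cpu.arg = caller_id                     # last running processId
    return None
```

No register state of the caller is saved anywhere in the model, matching the point that the target Process restores its own state.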
The kernel uses this procedure to free a Process (FREE_PROCESS):
- Validates that `processId` is a virtual address that points into the Process map at `PML4[507]`. Fails with `INVALID_SOURCE` if it is not.
- Parses the `processId` to determine the physical address of that root page table.
- Validates ownership by the calling Process by checking that the physical page pointed to by the given `processAddress` is the same as the root page table of the given Process. If `processAddress` is not a valid page table entry or does not equal the expected address, it fails with `INVALID_SOURCE`.
- Yields to the `onFree` upcall of the other process while passing the calling Process `processId`.
- Does not return to the calling Process. The calling Process must understand that it needs to listen to a timer to preempt and properly delete the child if it wants to ensure that the child Process is gone.
- Note: When the calling Process regains control, it can just remove the root page table of the child Process, effectively stopping it from existing. The `FREE_PROCESS` call is just here to provide a kernel means of securely calling the `onFree` upcall.