xomb/docs/RESOURCE.md

A Resource is a region of memory or some other entity that can be adequately and securely accessed via the virtual memory system. It is represented as a PML3, PML2, PML1, or as a single page depending on the needs and size of that resource. A Process has access to a Resource if and only if that Resource is accessible via a virtual address during the runtime of that Process. That is, the Resource is currently mapped into the page table hierarchy of the Process, which is itself represented by a PML4. The kernel keeps track of all Resources by mapping them into its own virtual address space. A Resource is then identified by a 64-bit identifier that is exactly the virtual address, within that mapping, of the Resource's root page table structure (a PML3, PML2, PML1, or single page, reached via the recursive entry at PML4[510]). We often lovingly refer to these by another name: a Gib.
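To make the recursive addressing concrete, here is a minimal sketch of how a virtual address that reaches a page table through the recursive entry could be computed. It assumes x86-64 four-level paging with 48-bit canonical addresses; the names `table_address` and `canonical` are hypothetical and do not come from the kernel source.

```python
RECURSIVE_SLOT = 510  # PML4 index of the recursive entry, as stated in the text
ENTRIES = 512         # entries per page table


def canonical(addr):
    """Sign-extend bit 47 into bits 48..63 (x86-64 canonical form)."""
    addr &= (1 << 48) - 1
    if addr & (1 << 47):
        addr |= 0xFFFF << 48
    return addr


def table_address(path):
    """Virtual address of the table reached by walking `path` from the PML4.

    path == []            -> the PML4 itself
    path == [i4]          -> the PML3 at PML4[i4]
    path == [i4, i3]      -> a PML2
    path == [i4, i3, i2]  -> a PML1
    """
    if len(path) > 3 or any(not 0 <= i < ENTRIES for i in path):
        raise ValueError("path must hold at most three indices in 0..511")
    # Every level we do not walk is spent looping through the recursive slot.
    slots = [RECURSIVE_SLOT] * (4 - len(path)) + list(path)
    addr = 0
    for shift, index in zip((39, 30, 21, 12), slots):
        addr |= index << shift
    return canonical(addr)
```

For example, `hex(table_address([508]))` gives the address at which the PML3 hanging off PML4[508] becomes visible as an ordinary page of memory.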

A ResourceGroup is a set of resources that itself can be securely accessed via the virtual memory system. This is generally only ever a PML3 (which can contain one or more PML2, PML1, or single-page-sized Resources), a PML2 (which can thus contain a set of PML1 or single-page-sized Resources), or a very compact PML1 containing potentially a single-page Resource in one or more of its entries. A Process can access a ResourceGroup if and only if it is accessible via a virtual address during the runtime of that Process. That is, if that ResourceGroup is currently mapped into the page table hierarchy of the Process. Consequently, the one or more Resources that are mapped into the ResourceGroup page table structure are accessible by a Process if and only if the ResourceGroup itself is accessible. The access control and subsequent security is provided by the virtual memory hardware. There is nothing actually special about a ResourceGroup as it is just a Resource. It is just a useful concept that is given this special name for clarity. A Process can add a Resource to a ResourceGroup by simply allocating (or being granted) a Resource inside the address space of the ResourceGroup. The kernel does not see any distinction here.
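The sizes involved follow directly from the table geometry. The sketch below assumes 4 KiB pages and 512 entries per table (standard x86-64 values) and uses a hypothetical `depth` number where 0 is a single page and 3 is a PML3 root.

```python
PAGE_SIZE = 4096  # assumed 4 KiB base pages
ENTRIES = 512     # entries per page table
ROOT_NAMES = {0: "single page", 1: "PML1", 2: "PML2", 3: "PML3"}


def span_bytes(depth):
    """Bytes of virtual address space covered by a Resource of this depth."""
    if depth not in ROOT_NAMES:
        raise ValueError("depth must be 0, 1, 2 or 3")
    return PAGE_SIZE * ENTRIES ** depth


def max_members(depth):
    """Most Resources of the next smaller size that a group of this depth holds."""
    if depth not in (1, 2, 3):
        raise ValueError("only a PML1, PML2 or PML3 can act as a group")
    return ENTRIES
```

A PML1-rooted Resource therefore spans 2 MiB, a PML2-rooted one 1 GiB, and a PML3-rooted one 512 GiB.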

A region of memory is the most obvious resource to encapsulate. This is just a set of pre-allocated and marked physical pages that are organized into a page table structure rooted by a PML3, PML2, or PML1 with the necessary leaves mapping to the aforementioned marked physical pages. When the root of this Resource is mapped into a Process, it is then accessible. If it is granted to a second Process, that memory is shared, potentially with different access flags. Interestingly, the page table primitives allow sections of the memory Resource to be independently marked read-only or no-execute, in units of the architecture's page size and subject to its available access control flags. A Process can also map the Resource with particular flags that will take precedence. For instance, if the Process can only attach the Resource to itself read-only, the technically writable pages within will be inaccessible for writes when the root page table of the Resource is mapped into a page table entry within the Process page table with the read-only flag set. When a Resource is granted as read-only, the target Process uses the same page tables as the owning Process yet can only read the data, while the owning Process can write to it freely. Only the owning Process can affect the page table flags of the intermediate page tables and entries within. It does this with the memory primitive function CHMOD_PAGE described in MEMORY.
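The precedence rule can be sketched as follows. On x86-64 a write is permitted only if every entry along the walk is writable, and execution is denied if any entry sets no-execute. The `Entry` type and `effective_access` function are hypothetical illustrations, not kernel structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One page table entry along a walk, reduced to the two flags of interest."""
    writable: bool
    no_execute: bool = False


def effective_access(walk):
    """Return (can_write, can_execute) for a walk from the Process PML4 to a page."""
    can_write = all(entry.writable for entry in walk)
    can_execute = not any(entry.no_execute for entry in walk)
    return can_write, can_execute


# The shared tables inside the Resource mark the page writable.
inside_resource = [Entry(writable=True), Entry(writable=True)]
# The owner attaches the root writable; the grantee attaches it read-only.
owner_walk = [Entry(writable=True)] + inside_resource
grantee_walk = [Entry(writable=False)] + inside_resource
```

Here `effective_access(owner_walk)` allows writes while `effective_access(grantee_walk)` does not, even though both walks share every table below the attachment point.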

A memory-mapped IO (MMIO) region, which controls a piece of hardware via reads and writes of otherwise non-tangible memory addresses, is another obvious choice for a resource. This can provide a means of driving some hardware and perhaps controlling atomic, single-unit access to some device by ensuring that it is only mapped into one particular Process page table at a time. There are interesting implications of controlling hardware access via the same virtual memory system as normal memory regions. For one, page faults can indicate that the hardware is currently inaccessible if the kernel has unmapped it in order to pre-emptively schedule its use elsewhere. Another is the use of read-only flags to allow a Process to read the state of a device but not drive it, which allows consumers of hardware state to coexist while the device is securely multiplexed for single-use writing elsewhere. One of the main goals of an exokernel is to allow multiple driver implementations to co-exist to suit the needs of disparate applications. The Resource model enables such an ecosystem by allowing userspace pre-emption of devices so that each application's own library OS can drive hardware. This is only strengthened by hardware device and DMA virtualization, such as SR-IOV and IOMMU, which allow for more efficient multiplexing and for DMA that can be mapped via virtual addresses and controlled by userspace code.

The kernel system calls related to resource allocation and mapping return an enumerated value: an error code on failure, and either 0 or a resource id on success. The ResourceError codes are as follows:

  • 0: SUCCESS - Successful with no other return value.
  • 1: INVALID_FLAGS - Invalid flags were specified.
  • 2: NOT_FREE - The physical page is not free and cannot be used to allocate a structure.
  • 4: INVALID_SOURCE - The given virtual address for the resource is invalid. When allocating, this means that the resourceAddress specified is not free. On other calls, it means there's no valid resource at resourceAddress.
  • 5: INVALID_TARGET - The given virtual address for the target resource when changing owner is invalid or the virtual address to hold the newly allocated Resource is invalid.
  • 6: NO_ROOM - We reached the maximum number of resources available in the system or we cannot fit the requested Resource size.
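A minimal sketch of how userspace might represent and decode these return values. The numeric values are copied from the list above (note that 3 is not assigned there); the `decode` helper is hypothetical and relies on the statement in the ALLOC_RESOURCE description that a resource id is always larger than 0xff.

```python
from enum import IntEnum


class ResourceError(IntEnum):
    SUCCESS = 0
    INVALID_FLAGS = 1
    NOT_FREE = 2
    INVALID_SOURCE = 4
    INVALID_TARGET = 5
    NO_ROOM = 6


def decode(raw):
    """Split a raw return value into (error, resource_id).

    Values above 0xff are treated as a resource id from a successful call.
    Any other value must be one of the listed codes, else ValueError is raised.
    """
    if raw > 0xFF:
        return ResourceError.SUCCESS, raw
    return ResourceError(raw), None
```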

The kernel offers several primitive functions to facilitate the creation and control of resources:

  • ALLOC_RESOURCE(physicalAddress, resourceAddress) -> ResourceError | resourceId - Creates an empty Resource with an implied depth using the given unallocated physicalAddress as the page. The resourceAddress is the virtual address of the userspace page table entry (PTE) to write the Resource root using the PML4[510] recursive route. The depth is thus determined by the resourceAddress by virtue of the type of page table structure the PTE would be written to. If the PTE is written to a PML3 in the Process, then the Resource must be a PML2 root and have a depth of 2. The depth then determines the size of the virtual address space the Resource will represent. The kernel will apply the Owner flag on the Resource when mapping it to the Process address space at the given resourceAddress. The kernel will essentially create the root page table for the Resource and nothing else while mapping that into the Process address space. If this succeeds, it will return the resourceId for the Resource, which is the virtual address in the kernel that points to kernel-space memory for the root page table of that Resource. Note: This does not return 0 on success. Instead, it returns a resourceId which, interpreted as an unsigned value, is always larger than 0xff. This fails with INVALID_TARGET if the resourceAddress points to a page table structure that does not exist or is not owned by the calling Process. This fails with NO_ROOM if we have hit the Resource cap or there's no room for it. This fails with NOT_FREE if the provided physicalAddress is not actually free.
  • GRANT_RESOURCE(resourceId, resourceAddress, processId, targetResourceAddress, flags) -> ResourceError - Attaches a shared copy of the Resource to the given Process. This will effectively add a page table entry signified by the targetResourceAddress in the target Process with the given flags and also give it the Grant flag. If any flags are invalid, it will fail with the INVALID_FLAGS error. Resources are reference counted. This will increment the reference count. The Resource can be unmapped from the target Process by calling REVOKE_RESOURCE with the same arguments. Fails with INVALID_SOURCE if the Resource does not exist or is not owned by the calling Process. Fails with INVALID_TARGET if the given Process does not have the targetResourceAddress free, that is, the entry is already occupied in its own page table. Fails with INVALID_TARGET if the given Process provided by processId does not exist. Fails with INVALID_SOURCE if the resourceAddress is not the same Resource as indicated by resourceId.
  • REVOKE_RESOURCE(resourceId, resourceAddress, processId, targetResourceAddress) -> ResourceError - Detaches a Resource from a grantee that had been granted this Resource via GRANT_RESOURCE earlier. This unmaps the page table root for this Resource from the target Process given by processId. This fails with INVALID_SOURCE if the Resource does not exist or is not owned by the current Process. Fails with INVALID_TARGET if the given Process does not have the targetResourceAddress mapped in or if the targetResourceAddress is not the Resource indicated by resourceId.
  • CHOWN_RESOURCE(resourceId, resourceAddress, processId, targetResourceAddress) -> ResourceError - Updates the owner of the Resource to the given Process. Atomically swaps the Owner flags on each Process's mapped-in Resource. Fails with INVALID_SOURCE if the Resource does not exist or is not owned by the calling Process. Fails with INVALID_SOURCE if the given Process does not have the Resource mapped in. Fails with INVALID_TARGET if the given Process provided by processId does not exist. Fails with INVALID_TARGET if the given Resource provided by targetResourceAddress is not mapped into the given Process provided by processId or if that Resource is not the same Resource as indicated by resourceId.
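The way ALLOC_RESOURCE infers a depth from resourceAddress can be sketched by counting how many leading table indices of the address pass through the recursive slot. This is a minimal sketch assuming x86-64 four-level paging and 8-byte entries; `split` and `implied_depth` are hypothetical names, and the sketch ignores the ambiguity that arises when a genuine path index also happens to be 510.

```python
RECURSIVE_SLOT = 510  # PML4[510], per the text


def split(addr):
    """Return the four 9-bit table indices and the 12-bit page offset."""
    indices = [(addr >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    return indices, addr & 0xFFF


def implied_depth(resource_address):
    """Depth of the Resource whose root would be written at this PTE address.

    1 recursive hop  -> the PTE lives in a PML1 -> single page (depth 0)
    2 recursive hops -> the PTE lives in a PML2 -> PML1 root   (depth 1)
    3 recursive hops -> the PTE lives in a PML3 -> PML2 root   (depth 2)
    4 recursive hops -> the PTE lives in the PML4 -> PML3 root (depth 3)
    """
    indices, offset = split(resource_address)
    if offset % 8:
        raise ValueError("a page table entry address must be 8-byte aligned")
    hops = 0
    for index in indices:
        if index != RECURSIVE_SLOT:
            break
        hops += 1
    if hops == 0:
        raise ValueError("address does not use the recursive route")
    return hops - 1
```

This matches the example in the text: a PTE written into a PML3 of the Process implies a PML2-rooted Resource of depth 2.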

The kernel creates a Resource by using this procedure (ALLOC_RESOURCE):

  • Allocate a root page table for the Resource using the given physicalAddress and fail with NOT_FREE if that physical page is not free.
  • Map this into the kernel memory space. The kernel maintains all created resources in a single root PML3. So, it can manage up to 512 3-level resources. It would be rare to create a 3-level resource (512 GiB of virtual space), so we can expect mostly 1 GiB Resources, which are rooted at PML2, and 2 MiB Resources, which are rooted at the PML1 level. The kernel can keep 256M second-level (1 GiB) resources mapped in at a time. The identifier for a Resource is, then, the virtual address that uses the recursive mapping (PML4[510]) to point to the root table of the Resource. Whenever a Process gives the Resource id, the kernel can securely check that the physical page pointed to by that virtual address is also present within the Process's root page table.
  • The kernel places a record in its hash table for mapping the physical address of that root page to the virtual address which serves as that Resource id. This is so it may look it up in the event that it needs to deallocate the Resource forcibly at any point. Normally, the application will always deallocate the Resource at some point before it ends execution. However, the program may crash.
  • Then, it can map this Resource page table into the Process so that the Process now owns the Resource. It does this by mapping the new resource into the position requested by the Process via resourceAddress. The virtual address given by resourceAddress is effectively pointing to the same physical page that the kernel's own copy of the Resource is mapped to, which is known as resourceId. Therefore, this provides the kernel with a constant-time check for ownership: the resourceId is a virtual address which is pointing to the same physical address as the virtual address of resourceAddress. It should map the resource in with the Owner flag since it was the calling Process that created the Resource. (See docs/MEMORY.md for information about these flags)
  • It can then return to the userspace application the resourceId which is a virtual address constructed using the recursive page table entry (PML4[510]) to point to the root page table for the Resource.
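The procedure above can be sketched as a toy model in which page tables are plain dictionaries. Everything here is an assumption for illustration: `ToyKernel`, the `id_base` window, and the representation of a Process as a dict from PTE address to either `None` (an empty slot) or a `(physical_page, flags)` tuple. A missing key stands for a page table structure that does not exist.

```python
NOT_FREE, INVALID_SOURCE, INVALID_TARGET, NO_ROOM = 2, 4, 5, 6
PAGE_SIZE = 4096


class ToyKernel:
    def __init__(self, free_pages, capacity, id_base=0x100000):
        self.free_pages = set(free_pages)  # unallocated physical pages
        self.capacity = capacity           # cap on live Resources
        self.id_base = id_base             # hypothetical start of the kernel's Resource window
        self.roots = {}                    # resourceId -> physical root page (kernel mapping)
        self.by_phys = {}                  # physical root page -> resourceId (hash table)

    def alloc_resource(self, process, physical_address, resource_address):
        # The physical page must be free.
        if physical_address not in self.free_pages:
            return NOT_FREE
        # The PTE slot must exist in the caller's tables and be empty.
        if resource_address not in process:
            return INVALID_TARGET
        if process[resource_address] is not None:
            return INVALID_SOURCE
        if len(self.roots) >= self.capacity:
            return NO_ROOM
        self.free_pages.remove(physical_address)
        # Map the root into the kernel; the address of that mapping is the id.
        resource_id = self.id_base + len(self.roots) * PAGE_SIZE
        self.roots[resource_id] = physical_address
        # Record physical page -> id so the Resource can be reclaimed after a crash.
        self.by_phys[physical_address] = resource_id
        # Map the same physical page into the caller with the Owner flag.
        process[resource_address] = (physical_address, {"owner"})
        return resource_id
```

The model never frees a Resource, so handing out ids by counting live entries is safe here; a real allocator would need to track freed slots.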

The kernel has the following check of ownership, which it checks on every Resource operation:

  • is_mapped(resourceId, resourceAddress) -> boolean - This looks at the resourceId and ensures that it is a virtual address that runs through the kernel's Resource map. Let's say that we maintain a mapping of all resources on PML4[508], so the 508th index of the root page table points to a PML3 that contains, as leaves of the tree, the root page tables of all known Resource objects. We can then tell very easily if resourceId is a virtual address that uses the recursive entry (PML4[510]) to point to the physical page of the Resource root. That physical page must be the same one pointed to by the page table entry given by resourceAddress. We also verify that the resourceAddress is not in higher memory, which is always owned by privileged kernel code. If all of these hold true, the current calling Process owns the given Resource.
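A minimal sketch of this check, with the page table walks replaced by dictionary lookups. It assumes the kernel's Resource map can be modelled as a dict from resourceId to physical root page, and it treats resourceAddress as a plain virtual address where the Resource is attached, ignoring the recursive encoding. The higher-half boundary shown is the usual x86-64 value and is an assumption about this kernel's layout.

```python
HIGHER_HALF_START = 0xFFFF800000000000  # assumed start of kernel-owned addresses


def is_mapped(resource_id, resource_address, kernel_roots, process_ptes):
    """True if the caller's entry and the kernel's entry name the same physical page.

    kernel_roots: resourceId -> physical root page (the kernel's Resource map)
    process_ptes: resourceAddress -> physical page the caller's entry points to
    """
    # Higher memory always belongs to privileged kernel code.
    if resource_address >= HIGHER_HALF_START:
        return False
    # The id must run through the kernel's Resource map.
    root = kernel_roots.get(resource_id)
    if root is None:
        return False
    # Both virtual addresses must resolve to the same physical page.
    return process_ptes.get(resource_address) == root
```

Both lookups are constant time, which mirrors the constant-time ownership check described for ALLOC_RESOURCE.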

The kernel grants a Resource to another Process by using this procedure (GRANT_RESOURCE):

  • Validates that the resourceId matches the mapped in resourceAddress and otherwise fails with INVALID_SOURCE.
  • Validates that the current Process owns the given Resource.
  • Validates that the processId is a valid process; otherwise fails with INVALID_TARGET.
  • Validates that the targetResourceAddress is pointing to a valid and empty page table entry that is owned by the target Process. Otherwise fails with INVALID_TARGET.
  • Writes the page table entry in the target Process to point to the same physical address as resourceAddress but without the Owner flag and with the Grant flag.
  • Returns SUCCESS to the original calling Process.
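The grant steps can be sketched with the same kind of toy model. Page tables are dicts from PTE address to `None` (empty) or a `(physical_page, flags)` tuple, `processes` maps a process id to its table, and `VALID_FLAGS` is a hypothetical flag set; none of these names come from the kernel. Reference counting is left out for brevity.

```python
SUCCESS, INVALID_FLAGS, INVALID_SOURCE, INVALID_TARGET = 0, 1, 4, 5
VALID_FLAGS = {"read-only", "no-execute"}  # hypothetical grantable flags


def grant_resource(kernel_roots, processes, caller, resource_id,
                   resource_address, process_id, target_address, flags):
    flags = set(flags)
    if not flags <= VALID_FLAGS:
        return INVALID_FLAGS
    # resourceId must match what the caller has mapped at resourceAddress.
    root = kernel_roots.get(resource_id)
    source = processes[caller].get(resource_address)
    if root is None or source is None or source[0] != root:
        return INVALID_SOURCE
    # The caller must own the Resource.
    if "owner" not in source[1]:
        return INVALID_SOURCE
    # The target Process must exist.
    target_table = processes.get(process_id)
    if target_table is None:
        return INVALID_TARGET
    # The target slot must exist and be empty.
    if target_address not in target_table or target_table[target_address] is not None:
        return INVALID_TARGET
    # Same physical root, Grant flag set, Owner flag absent.
    target_table[target_address] = (root, flags | {"grant"})
    return SUCCESS
```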

The kernel revokes a Resource from another Process by using this procedure (REVOKE_RESOURCE):

  • Validates that the resourceId matches the mapped in resourceAddress and otherwise fails with INVALID_SOURCE.
  • Validates that the current Process owns the given Resource.
  • Validates that the processId is a valid process; otherwise fails with INVALID_TARGET.
  • Validates that the targetResourceAddress is pointing to a valid page table entry that is owned by the target Process and points to the given Resource. Otherwise fails with INVALID_TARGET.
  • Voids the page table entry in the target Process.
  • Returns SUCCESS to the original calling Process.

The kernel changes ownership of a Resource by using this procedure (CHOWN_RESOURCE):

  • Validates that the resourceId matches the mapped in resourceAddress and otherwise fails with INVALID_SOURCE.
  • Validates that the current Process owns the given Resource.
  • Validates that the processId is a valid process; otherwise fails with INVALID_TARGET.
  • Validates that the targetResourceAddress is pointing to a valid page table entry that is owned by the target Process and points to the given Resource. Otherwise fails with INVALID_TARGET.
  • Atomically swaps the page table entries. Failing atomicity, it can set the Grant bit and clear the Owner bit in the calling Process and then set the Owner bit and clear the Grant bit on the target Process.
  • Returns SUCCESS to the original calling Process.
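The ownership change, using the non-atomic fallback order from the last step, can be sketched as below. As before this is a toy model: page tables are dicts from PTE address to a `(physical_page, flags)` tuple, `processes` maps a process id to its table, and all names are hypothetical.

```python
SUCCESS, INVALID_SOURCE, INVALID_TARGET = 0, 4, 5


def chown_resource(kernel_roots, processes, caller, resource_id,
                   resource_address, process_id, target_address):
    # resourceId must match the caller's mapping, and the caller must own it.
    root = kernel_roots.get(resource_id)
    source = processes[caller].get(resource_address)
    if root is None or source is None or source[0] != root:
        return INVALID_SOURCE
    if "owner" not in source[1]:
        return INVALID_SOURCE
    # The target Process must exist and already have this Resource mapped.
    target_table = processes.get(process_id)
    if target_table is None:
        return INVALID_TARGET
    target = target_table.get(target_address)
    if target is None or target[0] != root:
        return INVALID_TARGET
    # Demote the caller first so there are never two owners at once.
    processes[caller][resource_address] = (root, (source[1] - {"owner"}) | {"grant"})
    # Then promote the target; any other flags it held are preserved.
    target_table[target_address] = (root, (target[1] - {"grant"}) | {"owner"})
    return SUCCESS
```

Demoting before promoting means that, if the swap is interrupted, the Resource is briefly ownerless rather than doubly owned. Whether that is the preferable failure mode is a design choice this sketch does not settle.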