Each container was taking up more than 3KB of space in boot-time structures.
This was due to having 4 pagers and 32 boot-time capabilities for each. This
caused the boot-time kernel size to vary considerably with the number of
capabilities. The new numbers are optimal.
In particular, we now always have a single pager per container, even though
the array structures allow more. A single pager keeps container-wide
privileges and management simpler.
It is not very straightforward to reach a capability list. We
now use a single function to look up a capability by its id together
with the list that contains it, since the two are frequently needed
together (e.g. cap removal and destruction).
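A minimal sketch of what such a lookup can look like; all names here
(struct capability, struct cap_list, cap_find_by_id) are placeholders
rather than the actual identifiers in the tree:

    /* Sketch only: find a capability by id across a set of lists and also
     * report which list holds it, since removal/destruction need both. */
    struct capability {
        unsigned int capid;
        struct capability *next;
    };

    struct cap_list {
        struct capability *head;
    };

    struct capability *cap_find_by_id(struct cap_list **lists, int nlists,
                                      unsigned int capid,
                                      struct cap_list **owner)
    {
        for (int i = 0; i < nlists; i++)
            for (struct capability *c = lists[i]->head; c; c = c->next)
                if (c->capid == capid) {
                    if (owner)
                        *owner = lists[i];
                    return c;
                }
        return 0;
    }

Callers doing cap removal or destruction then get the capability and its
containing list from a single call.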
Implemented a protocol between a client and its pager for requesting and
receiving a capability to ipc to another client of the pager.
The pager first ensures the request from its client is valid.
It then tries to use a greater capability that it possesses to
produce the new capability that the client requested. Once the kernel
has validated the right capability and replicated/reduced it to the
client's need, it is granted to the client.
In posix, test0 does inter-space ipc to test extended ipc. This
correctly fails when the only cap given to all tasks in the container
is the cap to ipc to the pager.
To overcome this problem, the tasks that fork to ipc to each other
request the capabilities to do so from the pager.
The pager finds its own wider ipc capability over the container, replicates
it, validates and reduces it to the desired boundaries (i.e. just ipc between
the two spaces), and grants it as IMMUTABLE to the requesting tasks.
This protocol may be useful for implementing a client/server capability
request relationship. The code builds but is untested.
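As an illustration of the grant path described above, the pager side
might look roughly like this; struct task, struct capability, and every
helper below are stand-ins, not the actual interfaces in the tree:

    #include <errno.h>

    struct task;            /* placeholder types */
    struct capability;

    /* Assumed helpers standing in for real pager/kernel interfaces. */
    int pager_owns(struct task *t, unsigned int peer_tid);
    struct capability *pager_find_wide_ipc_cap(struct task *t, unsigned int peer_tid);
    struct capability *cap_replicate(struct capability *c);
    void cap_reduce(struct capability *c, struct task *a, unsigned int peer_tid);
    int cap_grant(struct capability *c, struct task *to, unsigned int flags);

    #define CAP_IMMUTABLE 0x1   /* placeholder flag value */

    int pager_handle_ipc_cap_request(struct task *requester, unsigned int peer_tid)
    {
        struct capability *wide, *new;

        /* 1. Validate that the request from our client is legitimate. */
        if (!pager_owns(requester, peer_tid))
            return -EPERM;

        /* 2. Find the pager's own wider ipc capability over the container. */
        if (!(wide = pager_find_wide_ipc_cap(requester, peer_tid)))
            return -ENOENT;

        /* 3. Have it replicated and reduced to just ipc between the two spaces. */
        if (!(new = cap_replicate(wide)))
            return -ENOMEM;
        cap_reduce(new, requester, peer_tid);

        /* 4. Grant the reduced capability to the requester as immutable. */
        return cap_grant(new, requester, CAP_IMMUTABLE);
    }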
Thread ids now contain their container ids in the top 2 nibbles.
Threads in other containers can be addressed by changing those
two nibbles. Addressing of inter-container threads is
subject to capabilities.
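For illustration, assuming 32-bit ids, the encoding can be pictured with
macros like these (the names and exact shift are assumptions beyond "top
2 nibbles"):

    /* Top 2 nibbles (bits 31-24) of a 32-bit thread id hold the container id. */
    #define TID_CID_SHIFT       24
    #define TID_CID_MASK        (0xffUL << TID_CID_SHIFT)

    #define tid_to_cid(tid)     (((tid) & TID_CID_MASK) >> TID_CID_SHIFT)
    #define tid_set_cid(tid, cid) \
        (((tid) & ~TID_CID_MASK) | (((unsigned long)(cid)) << TID_CID_SHIFT))

Addressing a thread in another container then amounts to rewriting those
two nibbles, with the kernel's capability checks deciding whether the
access is allowed.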
Task ids are now unsigned, as the container ids will need to be encoded
in those id fields as well.
For requests that require an even more comprehensive id input (such as
thread creation), the container id is also added, so that threads
_could_ potentially be created in other containers as well.
VIRTMEM and PHYSMEM are, theoretically, resources to be protected
separately from a MAP resource, which is meant to protect the syscall
privileges.
In practice MAP is always used together with a VIRTMEM and a PHYSMEM
resource, so each VIRTMEM/PHYSMEM resource is now merged with
the MAP capability, combining the micro-permission bits.
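A rough picture of the merged capability, with made-up bit names and
layout (the actual flags in the tree may differ):

    /* Illustrative only: one capability carrying both the map syscall
     * permission bits and the memory-type bits it is always paired with. */
    #define CAP_MAP_READ        (1 << 0)
    #define CAP_MAP_WRITE       (1 << 1)
    #define CAP_MAP_EXEC        (1 << 2)
    #define CAP_MAP_UNMAP       (1 << 3)
    #define CAP_MAP_VIRTMEM     (1 << 4)    /* covers a virtual memory range */
    #define CAP_MAP_PHYSMEM     (1 << 5)    /* covers a physical memory range */

    struct map_capability {
        unsigned long access;   /* combination of the bits above */
        unsigned long start;    /* start of the covered memory range */
        unsigned long end;      /* end of the covered memory range */
    };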
We still have to keep the pager structs because they hold intermediate
data during boot-up, such as for transferring capability lists to the
boot stack one by one, and then to the newly generated ktcbs.
Previously all pending events were handled on return from exceptions
in process context. This caused threads that run in userspace
and take no exceptions to leave their pending events unhandled indefinitely.
Now the scheduler handles them in irq context as well.
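Conceptually the change amounts to a check like the one below running
from the timer irq path as well, not only on exception return (struct
ktcb and the two helpers are placeholders):

    struct ktcb;                                    /* kernel thread control block */
    int  task_has_pending_events(struct ktcb *t);   /* placeholder helpers */
    void task_handle_pending_events(struct ktcb *t);

    /* Sketch: the scheduler tick, running in irq context, also flushes
     * pending events, so a thread that never traps still gets to see them. */
    void sched_tick(struct ktcb *current)
    {
        if (task_has_pending_events(current))
            task_handle_pending_events(current);

        /* ... normal timeslice accounting / preemption decision ... */
    }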
Multi-threaded apps can now wait on children to be destroyed.
WAIT_ON is useful when a child exits with an exit code and the pager
of the child does not want to take the hassle of destroying it via an
ipc. It provides an alternative method of synchronous thread destruction,
where the child destroys itself directly rather than the parent issuing
a destroy on it explicitly.
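A minimal usage sketch of that model; thread_wait and thread_exit below
are hypothetical wrappers, not necessarily the real library calls:

    void thread_exit(int code);         /* child destroys itself with an exit code */
    int  thread_wait(int child_tid);    /* parent blocks until the child exits */

    /* Child: finishes its work and tears itself down directly. */
    void child_main(void)
    {
        /* ... do work ... */
        thread_exit(0);
    }

    /* Parent: instead of issuing an explicit destroy via ipc, it simply
     * waits for the child to destroy itself and collects the exit code. */
    int parent_wait_for(int child_tid)
    {
        return thread_wait(child_tid);
    }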
Reiterating again to simplify:
Working:
- Pager issues destroy, client also issues exit;
they work in sync.
Missing:
- Pager killing itself
- Pager killing all children while killing itself
- Pager waiting on children
The first issue is related to the time distribution when a new child is
created. If the parent has one tick left, then both child and parent
receive zero ticks. When combined with the values
current_irq_nest_count = 1
voluntary_preempt = 0
this prevented the scheduler from being invoked.
The second issue is related to the overall time distribution. When a
thread runs out of time, its new timeslice is calculated by the formula
below:
new_timeslice = (thread_prio * SCHED_TICKS) / total_prio
If total_prio is the sum of the priorities of all the threads in the
system, a thread risks ending up with zero ticks. In the new scheme,
total_prio is derived from the priority types in the system, so it is
fixed. Every thread gets a timeslice in proportion to its priority, and
there is no risk of ending up with zero ticks.
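A quick worked example of the difference, using assumed numbers
(SCHED_TICKS = 100, a thread priority of 4, 200 runnable threads, and
priority types 1+2+4+8 are all made-up figures):

    #include <stdio.h>

    #define SCHED_TICKS 100

    int main(void)
    {
        int thread_prio = 4;

        /* Old scheme: total_prio sums over all threads. With 200 threads
         * of priority 4, total_prio = 800 and 4 * 100 / 800 truncates to 0. */
        int total_prio_old = 200 * 4;
        printf("old timeslice: %d\n", thread_prio * SCHED_TICKS / total_prio_old);

        /* New scheme: total_prio is derived from the priority types only
         * (1 + 2 + 4 + 8 = 15 assumed here), independent of thread count,
         * so the slice can no longer collapse to zero. */
        int total_prio_new = 1 + 2 + 4 + 8;
        printf("new timeslice: %d\n", thread_prio * SCHED_TICKS / total_prio_new);

        return 0;
    }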
We moved the initial list of a pager's caps from the ktcb to the task's
space, since the task is expected to trust its space.
Most references to task->cap_list had to change. Although a single
cap list only tells part of the story about a task's caps, the
TASK_CAP_LIST macro works for us to get the first private set of
caps that a task has.
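For illustration only, the macro can be pictured along these lines (the
space/cap_list field names are assumptions):

    /* Sketch: a task's first private cap set now hangs off its space. */
    #define TASK_CAP_LIST(task)     (&(task)->space->cap_list)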
Capability checking for the thread_control, exregs, mutex, cap_control,
ipc, and map system calls.
The envisaged model is implemented in code that compiles, but
actual functionality hasn't been tested.
Need to add:
- Dynamic assignment of initial resources, matching what's
defined in the configuration.
- A paged-thread-group, since that would be a logical group of
separation from a capability point-of-view.
- Resource ids for various tasks. E.g.
- Memory capabilities don't have target resources.
- Thread capability assumes current container for THREAD_CREATE.
- Mutex syscall assumes current thread (this one may not need
any changing)
- cap_control syscall assumes current thread. It may be that
another thread's capability list is manipulated.
Last but not least:
- A simple and easy-to-use userspace library for dynamic expansion
of resource domains as new resources are created such as threads.
Pagers can now share their own private capabilities with their
paged children, or with siblings with whom they have a common pager
ancestor.
Added the flags CAP_SHARE_CHILD and CAP_SHARE_SIBLINGS for that.
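As a sketch of how a pager might use this, with cap_share standing in as
a hypothetical wrapper around the cap_control syscall and the flag
values made up (only the flag names come from the change):

    #define CAP_SHARE_CHILD     0x1     /* placeholder values */
    #define CAP_SHARE_SIBLINGS  0x2

    int cap_share(unsigned int capid, unsigned int share_flags);  /* hypothetical */

    /* A pager shares one of its private caps with its paged children, and
     * another with siblings that have the same pager ancestor. */
    void pager_share_caps(unsigned int child_cap, unsigned int sibling_cap)
    {
        cap_share(child_cap, CAP_SHARE_CHILD);
        cap_share(sibling_cap, CAP_SHARE_SIBLINGS);
    }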
Notion of pager hierarchy introduced using the existing but unused
pagerid field.
Thread creation now has two more flags, TC_AS_PAGER and TC_SHARE_PAGER.
The former sets the creator as the new thread's pager, the latter sets
the creator's pager as the new thread's pager.
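Illustrative use of the two flags through a hypothetical create_thread
wrapper (only the flag names come from the change; the wrapper and flag
values are assumptions):

    #define TC_AS_PAGER     0x10    /* placeholder values */
    #define TC_SHARE_PAGER  0x20

    int create_thread(unsigned int flags);  /* hypothetical thread_control wrapper */

    void spawn_examples(void)
    {
        /* New thread is paged by its creator. */
        create_thread(TC_AS_PAGER);

        /* New thread shares the creator's own pager. */
        create_thread(TC_SHARE_PAGER);
    }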
Thread group capability sharing now correctly carries shared capabilities
to the thread group leader's tgr_cap_list, and this list is checked
during capability checking.