Pagers can now share their own private capabilities with their
paged children, or their siblings with whom they have a common pager
ancestor.
Added flags CAP_SHARE_CHILD and CAP_SHARE_SIBLINGS for that.
Notion of pager hierarchy introduced using the existing but unused
pagerid field.
Thread creation now has two more flags TC_AS_PAGER and TC_SHARE_PAGER.
The former sets creator as pager, the latter sets creator's pager as pager.
Thread group capability sharing now correctly carries shared capabilities
to the thread group leader's tgr_cap_list list, and this list is checked
during capability checking.
Capabilities will be shared among collection of threads. A pager
will have a right to share its own capabilities with its space,
its thread group and its container.
Currently sharing is possible with only all of the caps. Next,
it will be support for cap splitting, granting, and partial sharing
and granting.
Added support for pagers that fault to suspend and become zombies
along with all the threads that they manage. Zombie killing is to
be done at a later time, from this special zombie queue.
The implementation works same as a suspension, with the added action
that the thread is moved to a queue in kernel container.
Issues:
- A page-faulting thread suspends if receives -1 from pager page fault ipc.
This is fine if pager is about to delete the thread, but it is not if
it is a buggy pager.
- Need to find a way to completely get rid of suspended pager.
- A method of deleting suspended tasks could remedy both cases above.
Any thread that touches a utcb inside the kernel now properly checks
whether the utcb is mapped on its owner, and whether the mapped physical
address matches that of the current thread's tables. If not the tables
are updated.
This way, even though page tables become incoherent on utcb address
change situations (such as fork() exit(), execve()) they get updated
as they are referenced.
Since mappings are added only conditionally, caches are flushed only
when an update is necessary.
It makes more sense to have a scheduler (or runqueue pair) per-cpu
rather than per-container. This provides more flexible global scheduling
policy that is also simpler due to all scheduling details being controlled
from a single point.
Status:
- Capability initialization is a bit hacky with dummy current etc.
- All container caps belong to the pager
- Tasks refer to their pager's capabilities for mutex allocation - Hacky.
- Kernel container keeps quantitative caps and memory caps in separate lists - Hacky.
These will all evolve and get fixed.
Previously during ipc copy, only the currently active task flags were
checked. This means the flags of whoever doing the actual copy was used
in the ipc. Now flags are stored in the ktcb and checked by the copy routine.
Current use of the flags is to determine short/full/extended ipc.
Benefits & Facts:
- Messages up to 2 kilobytes may be sent.
- Both parties may use non-disjoint user buffers. E.g. any userspace address.
- Userspace buffers can page fault.
- Page faults punish timeslice of only the faulting thread.
- Any number of extended ipcs can take place at any one time, since
only ktcbs of ipc parties are engaged. No global buffer is used.
- This also provides smp-safety benefit.
Disadvantages:
- There is triple copying penalty. This has to be done:
- Sender buffer to sender ktcb
- Sender ktcb to receiver ktcb
- Receiver ktcb to receiver buffer.
This is due to the fact that buffers can be on non-disjoint userspace addresses.
If you want to avoid disadvantages and lose some of the benefits,
(e.g. address freedom, shorter copy size) use FULL IPC.
- Proper releasing of user pmd and pgds when a space is not used.
- Proper releasing of task, space ids.
- At occasions a starting thread gets bogus SPSR, this needs investigating.
- At a very rare occasion arch_setup_new_thread() had a kernel data abort during
register copying from one task to another. Needs investigating.
- Fixed potential concurrency bugs due to preemption being enabled.
- Introduced a new address space structure to better account for
address spaces and page tables.
- Currently executes fine up to forking. Will investigate.
- KIP's pointer to UTCB seems to work with existing l4lib ipc functions.
- Works up to clone()
- In clone we mmap() the same UTCB on each new thread - excessive.
- Generally during page fault handling, cloned threads may fault on the same page
multiple times even though a single handling would be enough for all of them.
Need to detect and handle this.
Added setting of utcb address to l4_thread_control.
This is going to be moved to exchange_registers() since we need to pass
both the utcb physical and virtual address and exregs fits such context
modification better than thread_control.
- Scheduler was increasing total priorities only when resuming tasks had 0 ticks.
This caused forked tasks that have parent's share of ticks to finish their jobs,
if these tasks exited quick enough, they would cause the total priorities to deduce
without increasing it in the first place. This is now fixed.
- Also strengthened rq locking, now both queues are locked before touching any.
- Also removed task suspends in irq, this would cause a race condition on ticks and
runqueues, since neither is protected against irqs.
- Implemented reasonable way to suspend task.
- A task that has a pending suspend would be interrupted
from its sleep via the suspender task.
- If suspend was raised and right after, task became about to sleep,
then scheduler wakes it up.
- If suspend was raised when task was in user mode, then an irq suspends it.
- Also suspends are checked at the end of a syscall so that if suspend was
raised because of a syscall from the task, the task is suspended before it
goes back to user mode.
- This mechanism is very similar to signals, and it may lead as a base for
implementing signal handling.
- Implemented common vma dropping for shadow vm object dropping and task exiting.
- Updated sleeping paths such that a task is atomically put into
a runqueue and made RUNNABLE, or removed from a runqueue and made SLEEPING.
- Modified vma dropping sources to handle both copy_on_write() and exit() cases
in a common function.
- Added the first infrastructure to have a pager to suspend a task and wait for
suspend completion from the scheduler.
A new scheduler replaces the old one.
- There are no sched_xxx_notify() calls that ask scheduler to change task state.
- Tasks now have priorities and different timeslices.
- One second interval is distributed among processes.
- There are just runnable and expired queues.
- SCHED_GRANULARITY determines a maximum running boundary for tasks.
- Scheduler can now detect a safe point and suspend a task.
Interruptible blocking is implemented.
- Mutexes, waitqueues and ipc are modified to have an interruptible nature.
- Sleep information is stored on the ktcb. (which waitqueue? etc.)
- test0 now forks 16 tasks that each modify a global variable.
- scheduler now gives 1/10th of a second per task. It also does not increase timeslice
of a task that has scheduled.
- When a memory is granted to the kernel, the distribution of this memory to memcaches
was calculated in a complicated way. This is now simplified.