Pagers kill all children but suspend themselves.
Currently not straightforward for a pager to delete its own tcb and quit.
It should take all allocator locks without sleeping, remove itself from
scheduler queue and then delete itself and quit. This is not so easy now
as some allocation locks are mutexes. (Address space lock, ktcb/space
allocators etc.)
An easier approach would be to have a kernel thread or a superior thread
that would delete the pager
Reiterating again to simplify:
Working:
- Pager issues destroy, client also issues exit
they work in sync.
Missing
- Pager killing itself
- Pager killing all children while killing itself
- Pager waiting on children
One is related to the time distribution when a new child is created.
If the parent has one tick left, then both child and parent received
zero tick. When combined with
current_irq_nest_count = 1
voluntary_preempt = 0
values, this caused the scheduler from being invoked.
Second is related to the overall time distribution. When a thread
runs out of time, its new time slice is calculated by the below
formula:
new_timeslice = (thread_prio * SCHED_TICKS) / total_prio
If we consider total_prio is equal to the sum of the priorities of
all the threads in the system, it imposes a problem of getting
zero tick. In the new scenario, total_prio is equal to the priority
types in the system so it is fixed. Every thread gets a timeslice
in proportion of their priorities. Thus, there is no risk of taking
zero tick.
Capability checking for thread_control, exregs, mutex, cap_control,
ipc, and map system calls.
The visualised model is implemented in code that compiles, but
actual functionality hasn't been tested.
Need to add:
- Dynamic assignment of initial resources matching with what's
defined in the configuration.
- A paged-thread-group, since that would be a logical group of
seperation from a capability point-of-view.
- Resource ids for various tasks. E.g.
- Memory capabilities don't have target resources.
- Thread capability assumes current container for THREAD_CREATE.
- Mutex syscall assumes current thread (this one may not need
any changing)
- cap_control syscall assumes current thread. It may happen to
be that another thread's capability list is manipulated.
Last but not least:
- A simple and easy-to-use userspace library for dynamic expansion
of resource domains as new resources are created such as threads.
Pagers can now share their own private capabilities with their
paged children, or their siblings with whom they have a common pager
ancestor.
Added flags CAP_SHARE_CHILD and CAP_SHARE_SIBLINGS for that.
Notion of pager hierarchy introduced using the existing but unused
pagerid field.
Thread creation now has two more flags TC_AS_PAGER and TC_SHARE_PAGER.
The former sets creator as pager, the latter sets creator's pager as pager.
Thread group capability sharing now correctly carries shared capabilities
to the thread group leader's tgr_cap_list list, and this list is checked
during capability checking.
Added support for pagers that fault to suspend and become zombies
along with all the threads that they manage. Zombie killing is to
be done at a later time, from this special zombie queue.
The implementation works same as a suspension, with the added action
that the thread is moved to a queue in kernel container.
Issues:
- A page-faulting thread suspends if receives -1 from pager page fault ipc.
This is fine if pager is about to delete the thread, but it is not if
it is a buggy pager.
- Need to find a way to completely get rid of suspended pager.
- A method of deleting suspended tasks could remedy both cases above.
Upon fork, child was created in a new space but as a copy of any
cloned thread in the parent space. This was due to the search of forker thread
by its space id (which is shared among many cloned threads).
Now fixed.
modified: src/api/thread.c
modified: tasks/mm0/src/task.c
- Proper releasing of user pmd and pgds when a space is not used.
- Proper releasing of task, space ids.
- At occasions a starting thread gets bogus SPSR, this needs investigating.
- At a very rare occasion arch_setup_new_thread() had a kernel data abort during
register copying from one task to another. Needs investigating.
- Fixed potential concurrency bugs due to preemption being enabled.
- Introduced a new address space structure to better account for
address spaces and page tables.
- Currently executes fine up to forking. Will investigate.
- KIP's pointer to UTCB seems to work with existing l4lib ipc functions.
- Works up to clone()
- In clone we mmap() the same UTCB on each new thread - excessive.
- Generally during page fault handling, cloned threads may fault on the same page
multiple times even though a single handling would be enough for all of them.
Need to detect and handle this.
Added setting of utcb address to l4_thread_control.
This is going to be moved to exchange_registers() since we need to pass
both the utcb physical and virtual address and exregs fits such context
modification better than thread_control.
- Directory creation, file read/write is OK.
- Cannot reuse old task's fds. They are not recycled for some reason.
- Problems with fork/clone/exit. They fail for a reason.
It turned out we used one version of kmalloc for malloc() and another for kfree()!
Now fixed.
Added parent-child relationship to tasks. Need to polish handling CLONE_PARENT and THREAD.
A new or forked thread will have its tgid same as its unique thread id.
A cloned thread (i.e. in the same space) will get its parent's tgid if
the parent tgid is supplied as the suggested tgid in the ids field. Otherwise
the thread will have its tgid same as its unique thread id.
Previously we also allocated a tgid from a separate pool, but this doesn't
make sense. It is cleaner to have the unique thread id get used also as the tgid.
- Implemented reasonable way to suspend task.
- A task that has a pending suspend would be interrupted
from its sleep via the suspender task.
- If suspend was raised and right after, task became about to sleep,
then scheduler wakes it up.
- If suspend was raised when task was in user mode, then an irq suspends it.
- Also suspends are checked at the end of a syscall so that if suspend was
raised because of a syscall from the task, the task is suspended before it
goes back to user mode.
- This mechanism is very similar to signals, and it may lead as a base for
implementing signal handling.
- Implemented common vma dropping for shadow vm object dropping and task exiting.
- Updated sleeping paths such that a task is atomically put into
a runqueue and made RUNNABLE, or removed from a runqueue and made SLEEPING.
- Modified vma dropping sources to handle both copy_on_write() and exit() cases
in a common function.
- Added the first infrastructure to have a pager to suspend a task and wait for
suspend completion from the scheduler.
A new scheduler replaces the old one.
- There are no sched_xxx_notify() calls that ask scheduler to change task state.
- Tasks now have priorities and different timeslices.
- One second interval is distributed among processes.
- There are just runnable and expired queues.
- SCHED_GRANULARITY determines a maximum running boundary for tasks.
- Scheduler can now detect a safe point and suspend a task.
Interruptible blocking is implemented.
- Mutexes, waitqueues and ipc are modified to have an interruptible nature.
- Sleep information is stored on the ktcb. (which waitqueue? etc.)
- Fixed do_mmap() so that it returns mapped address, and various bugs.
- A child seems to fork with new setup, but with incorrect return value.
Need to use and test exregs() for fork + clone.
- Shmat searches an unmapped area if input arg is invalid, do_mmap()
should do this.
- Added mutex_trylock()
- Implemented most of exchange_registers()
- thread_control() now needs a lock for operations that can modify thread context.
- thread_start() does not initialise scheduler flags, now done in thread_create.
TODO:
- Fork/clone'ed threads should retain their context in tcb, not syscall stack.
- exchange_registers() calls in userspace need cleaning up.