Previously, all pending events were handled on return from exceptions
in process context. This caused threads that run purely in userspace
and take no exceptions to leave their pending events unhandled
indefinitely. The scheduler now handles them in irq context as well.
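As a rough illustration only (the flag, struct and helper names below are
invented, not the actual kernel symbols), the pending-event check now runs
both on the exception-return path and from the scheduler tick in irq context:

    /* Sketch only: pending events checked from both paths. Flag, struct and
     * function names are illustrative, not the actual kernel symbols. */
    #define TASK_PENDING_EVENTS     (1 << 0)

    struct ktcb {
            unsigned int flags;
    };

    /* Assumed helper that acts on whatever events are pending. */
    static void handle_pending_events(struct ktcb *task)
    {
            /* ... deliver or act on each pending event ... */
            task->flags &= ~TASK_PENDING_EVENTS;
    }

    /* Exception-return path (process context), as before. */
    void return_from_exception(struct ktcb *current)
    {
            if (current->flags & TASK_PENDING_EVENTS)
                    handle_pending_events(current);
    }

    /* Scheduler tick (irq context), covering threads that never take
     * an exception in process context. */
    void scheduler_tick(struct ktcb *current)
    {
            if (current->flags & TASK_PENDING_EVENTS)
                    handle_pending_events(current);
    }
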
Pagers kill all children but suspend themselves.
Currently it is not straightforward for a pager to delete its own tcb and
quit. It would have to take all allocator locks without sleeping, remove
itself from the scheduler queue, and then delete itself and quit. This is
not so easy right now because some allocation locks are mutexes (the
address space lock, the ktcb/space allocators, etc.).
An easier approach would be to have a kernel thread, or a superior thread,
delete the pager.
To reiterate and simplify:
Working:
- Pager issues destroy, client also issues exit;
  they work in sync.
Missing:
- Pager killing itself
- Pager killing all children while killing itself
- Pager waiting on children
One issue is related to the time distribution when a new child is created.
If the parent has one tick left, then both child and parent receive
zero ticks. When combined with the values
current_irq_nest_count = 1
voluntary_preempt = 0
this prevented the scheduler from being invoked.
The second issue is related to the overall time distribution. When a
thread runs out of time, its new timeslice is calculated by the formula
below:
new_timeslice = (thread_prio * SCHED_TICKS) / total_prio
If total_prio is the sum of the priorities of all the threads in the
system, a thread risks ending up with zero ticks. In the new scenario,
total_prio is derived from the priority types in the system, so it is
fixed. Every thread gets a timeslice in proportion to its priority, so
there is no risk of receiving zero ticks.
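A small worked example of the difference (all numbers below are made up
for illustration, not taken from the kernel):

    /* Illustrative arithmetic only; SCHED_TICKS and the priorities below
     * are made-up values. */
    #define SCHED_TICKS     100     /* assumed: ticks distributed per interval */

    static inline int new_timeslice(int thread_prio, int total_prio)
    {
            return (thread_prio * SCHED_TICKS) / total_prio;
    }

    /*
     * Old scheme: total_prio = sum of the priorities of all threads.
     * With 200 threads of priority 1, total_prio = 200 and
     * new_timeslice(1, 200) = 0, i.e. a zero-tick timeslice.
     *
     * New scheme: total_prio is fixed, derived from the priority types
     * rather than from the number of threads. With, say, total_prio = 7,
     * new_timeslice(1, 7) = 14, so even the lowest-priority thread gets
     * a non-zero slice no matter how many threads exist.
     */
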
Added support for pagers that fault to suspend and become zombies
along with all the threads that they manage. Zombie killing is to be
done at a later time, from a special zombie queue.
The implementation works the same way as a suspension, with the added
action that the thread is moved to a queue in the kernel container.
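A minimal sketch of that added action, assuming a zombie list hangs off the
kernel container; the structure and field names are assumptions, not the
actual kernel code:

    /* Sketch: park a suspended (zombie) task on the kernel container's
     * zombie queue for later destruction. All names are illustrative. */
    #define TASK_SUSPENDED  2

    struct ktcb {
            int state;
            struct ktcb *zombie_next;
    };

    struct kernel_container {
            struct ktcb *zombie_queue;      /* singly-linked list of zombies */
    };

    /* Works like a normal suspension, with the extra queuing step. */
    void zombify_task(struct kernel_container *kcont, struct ktcb *task)
    {
            task->state = TASK_SUSPENDED;
            task->zombie_next = kcont->zombie_queue;
            kcont->zombie_queue = task;
    }
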
Issues:
- A page-faulting thread suspends if it receives -1 from the pager's page
  fault ipc. This is fine if the pager is about to delete the thread, but
  not if the pager is simply buggy.
- Need to find a way to completely get rid of a suspended pager.
- A method of deleting suspended tasks could remedy both cases above.
It makes more sense to have a scheduler (or runqueue pair) per-cpu
rather than per-container. This provides a more flexible global scheduling
policy that is also simpler, since all scheduling details are controlled
from a single point.
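Roughly the layout this points at; the names and the fixed CPU count below
are assumptions for illustration:

    /* Sketch: per-cpu schedulers, each with a runnable/expired runqueue pair. */
    #define NCPUS   4       /* assumed; the real value comes from platform config */

    struct runqueue {
            struct ktcb *head;      /* first task on this queue */
            int total_prio;         /* accounting used for timeslices */
    };

    struct scheduler {
            struct runqueue rq[2];  /* [0] = runnable, [1] = expired */
    };

    static struct scheduler per_cpu_scheduler[NCPUS];
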
- Fixed potential concurrency bugs due to preemption being enabled.
- Introduced a new address space structure to better account for
address spaces and page tables.
- Currently executes fine up to forking. Will investigate.
- KIP's pointer to UTCB seems to work with existing l4lib ipc functions.
- Works up to clone()
- In clone we mmap() the same UTCB on each new thread - excessive.
- Generally during page fault handling, cloned threads may fault on the
  same page multiple times even though handling it once would be enough
  for all of them. We need to detect and handle this; see the sketch below.
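One possible way to detect the redundant faults, sketched under the
assumption that the clones share an address space whose page tables can be
queried for an existing mapping; all names here are invented:

    /* Sketch: skip redundant page fault handling for threads that share an
     * address space. All function and field names are invented. */
    struct address_space;

    struct fault_info {
            unsigned long address;
            struct address_space *space;
    };

    /* Assumed query: is this virtual address already mapped with the
     * required permissions in the shared page tables? */
    extern int page_mapped(struct address_space *space, unsigned long address);
    extern int do_page_fault(struct fault_info *fault);

    int handle_page_fault(struct fault_info *fault)
    {
            /*
             * Another clone may have faulted on this page already and had
             * it mapped; in that case there is nothing left to do.
             */
            if (page_mapped(fault->space, fault->address))
                    return 0;

            return do_page_fault(fault);
    }
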
Added setting of the utcb address to l4_thread_control().
This is going to be moved to exchange_registers(), since we need to pass
both the utcb physical and virtual addresses, and exregs fits such context
modification better than thread_control().
sched_resume_async() used to forbid the current task from waking itself
up, since it seemed a task could never be runnable and need to wake itself
up at the same time. However, there is a special case in the scheduler
where a task that is about to sleep may notice it has a pending event and
wake itself up asynchronously. Since all the sleep preparation has already
been done and the scheduler code is a safe zone, it is safe to undo it all
and resume the about-to-sleep task from within the scheduler.
We may want to put a BKPT in the pager's suspend routine where it waits
for the sleeping task to resume itself, to see whether such a wait ever
succeeds; it rarely happens.
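A sketch of that special case, with invented flag and helper names, just to
show where the undo happens:

    /* Sketch: inside the scheduler, a task that was about to sleep may have
     * picked up a pending event; undo the sleep preparation and keep it
     * runnable. All names are illustrative. */
    #define TASK_SLEEPING           0
    #define TASK_RUNNABLE           1
    #define TASK_PENDING_EVENTS     (1 << 0)

    struct ktcb {
            int state;
            unsigned int flags;
    };

    extern void rq_add_task(struct ktcb *task);     /* assumed runqueue insert */

    void sched_check_about_to_sleep(struct ktcb *current)
    {
            /*
             * Sleep preparation is already done and we are in scheduler
             * code, which is a safe zone, so undoing it here is harmless.
             */
            if (current->state == TASK_SLEEPING &&
                (current->flags & TASK_PENDING_EVENTS)) {
                    current->state = TASK_RUNNABLE;
                    rq_add_task(current);
            }
    }
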
- The scheduler was increasing total priorities only when the resuming
  task had 0 ticks. Forked tasks that ran on their parent's share of ticks
  and exited quickly enough would therefore decrease the total priorities
  without ever having increased them in the first place. This is now fixed
  (see the sketch below).
- Also strengthened runqueue locking; both queues are now locked before
  touching either one.
- Also removed task suspension in irq context; it would cause a race
  condition on ticks and runqueues, since neither is protected against irqs.
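In other words, the increment and decrement of the total priority count have
to be symmetric. A rough sketch of the intended invariant, with assumed names:

    /* Sketch: keep total_prio raised for every task that sits on a runqueue
     * and lower it exactly once when the task leaves, regardless of how many
     * ticks it had when it was resumed. Names are illustrative. */
    struct ktcb {
            int priority;
            int ticks_left;
    };

    static int total_prio;  /* sum of priorities of all queued tasks */

    void sched_enqueue_task(struct ktcb *task)
    {
            /* Previously this was skipped when task->ticks_left > 0. */
            total_prio += task->priority;
    }

    void sched_dequeue_task(struct ktcb *task)
    {
            total_prio -= task->priority;
    }
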
- Implemented a reasonable way to suspend a task.
- A task that has a pending suspend is interrupted out of its sleep by the
  suspender task.
- If a suspend was raised and the task became about to sleep right after,
  the scheduler wakes it up.
- If a suspend was raised while the task was in user mode, an irq suspends it.
- Suspends are also checked at the end of a syscall, so that if the suspend
  was raised because of a syscall from the task, the task is suspended
  before it goes back to user mode.
- This mechanism is very similar to signals, and it may serve as a base for
  implementing signal handling.
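The check points above could be pictured roughly as follows; the flag, struct
and helper names are hypothetical stand-ins for the actual kernel symbols,
and the about-to-sleep case is handled inside the scheduler itself:

    /* Sketch of the suspend check points. All names are illustrative. */
    #define TASK_SUSPENDING         (1 << 1)

    struct ktcb {
            unsigned int flags;
    };

    extern void do_suspend(struct ktcb *task);              /* assumed */
    extern void wake_up_interrupted(struct ktcb *task);     /* assumed */

    /* 1) The suspender interrupts the target out of its sleep. */
    void suspend_sleeping_task(struct ktcb *task)
    {
            task->flags |= TASK_SUSPENDING;
            wake_up_interrupted(task);
    }

    /* 2) Irq path: the task was caught running in user mode. */
    void irq_check_suspend(struct ktcb *current)
    {
            if (current->flags & TASK_SUSPENDING)
                    do_suspend(current);
    }

    /* 3) Syscall return path: the suspend was raised during the task's
     *    own syscall. */
    void syscall_return_check_suspend(struct ktcb *current)
    {
            if (current->flags & TASK_SUSPENDING)
                    do_suspend(current);
    }
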
- Implemented common vma dropping for shadow vm object dropping and task exiting.
- Updated sleeping paths so that a task is atomically put into a runqueue
  and made RUNNABLE, or removed from a runqueue and made SLEEPING (see the
  sketch after this list).
- Modified vma dropping sources to handle both copy_on_write() and exit() cases
in a common function.
- Added the first infrastructure for having a pager suspend a task and wait
  for suspend completion from the scheduler.
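For the atomic RUNNABLE/SLEEPING transition mentioned a few items above, a
minimal sketch assuming a spinlock-protected runqueue; the lock and helper
names are invented:

    /* Sketch: state changes and runqueue membership are done under the same
     * lock, so no path can observe a RUNNABLE task off the queue or a
     * SLEEPING task still on it. All names are invented. */
    #define TASK_SLEEPING   0
    #define TASK_RUNNABLE   1

    struct ktcb {
            int state;
    };

    /* Assumed locking and runqueue primitives. */
    extern void spin_lock(void *lock);
    extern void spin_unlock(void *lock);
    extern void rq_insert(struct ktcb *task);
    extern void rq_remove(struct ktcb *task);
    extern void *rq_lock;

    void task_make_runnable(struct ktcb *task)
    {
            spin_lock(rq_lock);
            rq_insert(task);
            task->state = TASK_RUNNABLE;
            spin_unlock(rq_lock);
    }

    void task_make_sleeping(struct ktcb *task)
    {
            spin_lock(rq_lock);
            rq_remove(task);
            task->state = TASK_SLEEPING;
            spin_unlock(rq_lock);
    }
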
A new scheduler replaces the old one.
- There are no sched_xxx_notify() calls that ask the scheduler to change
  task state.
- Tasks now have priorities and different timeslices.
- A one-second interval is distributed among the processes.
- There are just runnable and expired queues.
- SCHED_GRANULARITY determines a maximum running boundary for tasks.
- Scheduler can now detect a safe point and suspend a task.
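A compressed sketch of how those pieces might fit together: the one-second
interval, priorities, the SCHED_GRANULARITY cap and the runnable/expired
queue pair. Everything below is illustrative rather than the actual code:

    /* Sketch: one-second interval split among tasks by priority, a hard cap
     * of SCHED_GRANULARITY consecutive ticks per run, and a runnable/expired
     * queue pair. All names and values are illustrative. */
    #define SCHED_TICKS             100     /* ticks per one-second interval */
    #define SCHED_GRANULARITY       10      /* max consecutive ticks per run */

    struct ktcb {
            int timeslice;          /* ticks left in this interval */
            int run_ticks;          /* consecutive ticks in the current run */
    };

    struct runqueue {
            struct ktcb *head;
    };

    static struct runqueue rq_runnable, rq_expired;

    extern void rq_move(struct ktcb *task, struct runqueue *to);    /* assumed */
    extern void schedule(void);                                     /* assumed */

    void sched_timer_tick(struct ktcb *current)
    {
            current->timeslice--;
            current->run_ticks++;

            if (current->timeslice <= 0) {
                    /* Interval share used up: park on the expired queue. */
                    rq_move(current, &rq_expired);
                    schedule();
            } else if (current->run_ticks >= SCHED_GRANULARITY) {
                    /* Ran long enough in one go: let someone else run. */
                    current->run_ticks = 0;
                    schedule();
            }
    }
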
Interruptible blocking is implemented.
- Mutexes, waitqueues and ipc have been modified to be interruptible.
- Sleep information is stored on the ktcb (which waitqueue, etc.).
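A rough idea of what storing the sleep information on the ktcb enables: a
suspender can find the sleeper's waitqueue and unlink it. The names below
are hypothetical:

    /* Sketch: the ktcb remembers which waitqueue it sleeps on, so an
     * interrupting task can unlink it and wake it with an error code that
     * the mutex/ipc sleep path returns as -EINTR. Names are hypothetical. */
    #define EINTR   4

    struct waitqueue_head;

    struct ktcb {
            struct waitqueue_head *waiting_on;      /* set when the task sleeps */
            int wait_result;                        /* 0, or -EINTR on interrupt */
    };

    /* Assumed helpers. */
    extern void wqh_remove_task(struct waitqueue_head *wqh, struct ktcb *task);
    extern void task_wake(struct ktcb *task);

    void wake_up_task_interruptible(struct ktcb *task)
    {
            if (task->waiting_on) {
                    wqh_remove_task(task->waiting_on, task);
                    task->waiting_on = 0;
                    task->wait_result = -EINTR;
                    task_wake(task);
            }
    }
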
- test0 now forks 16 tasks that each modify a global variable.
- The scheduler now gives 1/10th of a second per task. It also does not
  increase the timeslice of a task that has scheduled.
- When memory is granted to the kernel, the distribution of this memory to
  memcaches was calculated in a complicated way. This is now simplified.
- Fixed do_mmap() so that it returns the mapped address, and fixed various
  other bugs.
- A child seems to fork with the new setup, but with an incorrect return
  value. Need to use and test exregs() for fork + clone.
- shmat() searches for an unmapped area if the input argument is invalid;
  do_mmap() should do this as well.
- Added mutex_trylock()
- Implemented most of exchange_registers()
- thread_control() now needs a lock for operations that can modify thread context.
- thread_start() does not initialise scheduler flags; this is now done in
  thread_create().
TODO:
- Fork/clone'd threads should retain their context in the tcb, not on the
  syscall stack.
- exchange_registers() calls in userspace need cleaning up.