It turned out we used one version of kmalloc for malloc() and another for kfree()!
Now fixed.
Added parent-child relationship to tasks. Need to polish handling CLONE_PARENT and THREAD.
A new or forked thread will have its tgid same as its unique thread id.
A cloned thread (i.e. in the same space) will get its parent's tgid if
the parent tgid is supplied as the suggested tgid in the ids field. Otherwise
the thread will have its tgid same as its unique thread id.
Previously we also allocated a tgid from a separate pool, but this doesn't
make sense. It is cleaner to have the unique thread id get used also as the tgid.
l4_unmap now returns -1 if given range was only partially unmapped.
do_munmap() now only unmaps address ranges that have correspondence in
the unmapped vmas. Trying to unmap regions with no correspondent vmas
causes problems in corner cases, e.g. mm0 that tries to mmap its own
address space during initialisation would unmap its whole address space
and fail to execute.
sched_resume_async() used to forbit current tasks to wake up themselves
since it seems tasks can never be runnable to wake themselves up. However
there's a special case in the scheduler where a task that is about to sleep
may notice it has a pending event and wake itself up asynchronously. Since
all sleeping preparation has already been done and scheduler code is a safe
zone, it is safe to undo it all and resume about-to-sleep task in the scheduler.
We may want to put a BKPT in the pager's suspend routine if it waits for the
sleeping task to resume itself, to see if such a wait is successful. It rarely happens.
File open was failing when using 2 files with same name. TODO: Look at it in the future.
Need to increase writeable file size in fs0. 16 pages don't work.
- Scheduler was increasing total priorities only when resuming tasks had 0 ticks.
This caused forked tasks that have parent's share of ticks to finish their jobs,
if these tasks exited quick enough, they would cause the total priorities to deduce
without increasing it in the first place. This is now fixed.
- Also strengthened rq locking, now both queues are locked before touching any.
- Also removed task suspends in irq, this would cause a race condition on ticks and
runqueues, since neither is protected against irqs.
- Implemented reasonable way to suspend task.
- A task that has a pending suspend would be interrupted
from its sleep via the suspender task.
- If suspend was raised and right after, task became about to sleep,
then scheduler wakes it up.
- If suspend was raised when task was in user mode, then an irq suspends it.
- Also suspends are checked at the end of a syscall so that if suspend was
raised because of a syscall from the task, the task is suspended before it
goes back to user mode.
- This mechanism is very similar to signals, and it may lead as a base for
implementing signal handling.
- Implemented common vma dropping for shadow vm object dropping and task exiting.
- Updated sleeping paths such that a task is atomically put into
a runqueue and made RUNNABLE, or removed from a runqueue and made SLEEPING.
- Modified vma dropping sources to handle both copy_on_write() and exit() cases
in a common function.
- Added the first infrastructure to have a pager to suspend a task and wait for
suspend completion from the scheduler.
A new scheduler replaces the old one.
- There are no sched_xxx_notify() calls that ask scheduler to change task state.
- Tasks now have priorities and different timeslices.
- One second interval is distributed among processes.
- There are just runnable and expired queues.
- SCHED_GRANULARITY determines a maximum running boundary for tasks.
- Scheduler can now detect a safe point and suspend a task.
Interruptible blocking is implemented.
- Mutexes, waitqueues and ipc are modified to have an interruptible nature.
- Sleep information is stored on the ktcb. (which waitqueue? etc.)
- test0 now forks 16 tasks that each modify a global variable.
- scheduler now gives 1/10th of a second per task. It also does not increase timeslice
of a task that has scheduled.
- When a memory is granted to the kernel, the distribution of this memory to memcaches
was calculated in a complicated way. This is now simplified.
- Fixed do_mmap() so that it returns mapped address, and various bugs.
- A child seems to fork with new setup, but with incorrect return value.
Need to use and test exregs() for fork + clone.
- Shmat searches an unmapped area if input arg is invalid, do_mmap()
should do this.
- Added mutex_trylock()
- Implemented most of exchange_registers()
- thread_control() now needs a lock for operations that can modify thread context.
- thread_start() does not initialise scheduler flags, now done in thread_create.
TODO:
- Fork/clone'ed threads should retain their context in tcb, not syscall stack.
- exchange_registers() calls in userspace need cleaning up.
Stopped working on self_spawn() - going to finish clone() syscall first.
Arch-specific clone() library call that does ipc() and cloned child setup.
- Need to finish thread_create() that satisfy clone() necessities. i.e. setting up its stack.
Question: Does the pager (and thus the microkernel) have to explicitly set SP_USR?
Once the call is known to be successful, the library could set it.
- Added automatic utcb map/prefaulting of forked tasks for fs0
so that it does not need to explicitly request those tasks from mm0.
Eliminating fs0 requests to mm0 reduce deadlock possibilities.
- Replaced kmalloc with a public malloc implementation because of a bug in kmalloc.
- Fixed a kfree bug. default_release_pages was trying to free page_array pages.
- Adding prefaulting of fs0 to avoid page fault deadlocks.
- Fixed a bug that a vmo page_cache equivalence would simply drop a link to
an original vmo, even if the vmo could have more pages outside the page cache,
or if the vmo was not a shadow vmo.
- Fixed a bug with page allocator where recursion would corrupt global variables.
- Now going to fix or re-write a simpler page allocator that works.
Normally common kernel entries in 2nd level page table need not get
copied to new tasks since those entries will be used commonly by all tasks.
On fork, we were copying those unnecessarily.
Child needs rewound function stack in order to reach registers r9-r12
that have original userspace values. But we jump to return_from_syscall
without rewinding the stack. Therefore to ease context restore, we save
r9-r12 on the stack as well upon syscall entry.
This copies the parent kernel stack to child only for the part where
the previous context is saved. Then the child registers are modified
so that it would begin execution from returning of the system call.
serving mm0, if it page faults, system deadlocks because mm0 is waiting to be served by vfs.
FIX: To fix this, mm0 will need to fork itself and keep a separate thread solely for
page fault handling.
sys_timer accumulates timer ticks into seconds, minutes, hours and days.
It's left to the user to calculate from days into a date. It is not yet
known if the calculation is even roughly correct.
Reduced 2 kmem_reclaim/grant calls into one kmem_control call.
Removed some commented out code.
Removed excessive printfs.
Fixed spid not initialising for mm0
Fixed some faults with fs0.
TODO:
- Need to store vfs files in a separate list.
- Need to define vnum as a vfs-file-specific data, i.e. in priv_data field of vm_file.
- Need to then fix vfs_receive_sys_open.
Tasks boot fine up to doing ipc using their utcbs.
UTCB PLAN:
- Push ipc registers into private environment instead of a shared utcb,
but map-in a shared utcb to pass on long data to server tasks.
- Shared utcb has unique virtual address for every thread.
- Forked child does inherit parent's utcb, but cannot use it to communicate to
any server. It must explicitly obtain its own utcb for that.
- Clone could have a flag to explicitly not inherit parent utcb, which is the
right thing to do.
- MM0 serves a syscall to obtain self utcb.
- By this method, upon forks tasks don't need to map-in a utcb unless they want
to pass long data.
Next issues: For every read fault, the fault must traverse the
vma's object stack until the page is found. The problem was that
we were only searching the first object, that object was a writable
shadow, and the shadow didn't have the read-only page, and the 0
return value was interpreted with IS_ERR() and failed, so address
0 was mapped into the location, and QEMU blew off.
utcb as a shared page instead of the message registers.
Implemented the code that passes task information from mm0 to fs0
using the fs0 utcb. The code seems to work OK but:
There's an issue with anon pages that they end up on the same swapfile
and with same file offsets (e.g. utcb and stack at offset 0). Need to
fix this issue but otherwise this implementation seems to work.
TODO:
- Separate anon regions into separate vmfiles.
- Possibly map the stacks from virtual files so that they can be
read from userspace in the future for debugging.
- Possibly utcb could be created as a shared memory object using shmget/shmat
during startup.