Pagers kill all children but suspend themselves.
Currently not straightforward for a pager to delete its own tcb and quit.
It should take all allocator locks without sleeping, remove itself from
scheduler queue and then delete itself and quit. This is not so easy now
as some allocation locks are mutexes. (Address space lock, ktcb/space
allocators etc.)
An easier approach would be to have a kernel thread or a superior thread
that would delete the pager
Capability checking for thread_control, exregs, mutex, cap_control,
ipc, and map system calls.
The visualised model is implemented in code that compiles, but
actual functionality hasn't been tested.
Need to add:
- Dynamic assignment of initial resources matching with what's
defined in the configuration.
- A paged-thread-group, since that would be a logical group of
seperation from a capability point-of-view.
- Resource ids for various tasks. E.g.
- Memory capabilities don't have target resources.
- Thread capability assumes current container for THREAD_CREATE.
- Mutex syscall assumes current thread (this one may not need
any changing)
- cap_control syscall assumes current thread. It may happen to
be that another thread's capability list is manipulated.
Last but not least:
- A simple and easy-to-use userspace library for dynamic expansion
of resource domains as new resources are created such as threads.
Added support for pagers that fault to suspend and become zombies
along with all the threads that they manage. Zombie killing is to
be done at a later time, from this special zombie queue.
The implementation works same as a suspension, with the added action
that the thread is moved to a queue in kernel container.
Issues:
- A page-faulting thread suspends if receives -1 from pager page fault ipc.
This is fine if pager is about to delete the thread, but it is not if
it is a buggy pager.
- Need to find a way to completely get rid of suspended pager.
- A method of deleting suspended tasks could remedy both cases above.
Any thread that touches a utcb inside the kernel now properly checks
whether the utcb is mapped on its owner, and whether the mapped physical
address matches that of the current thread's tables. If not the tables
are updated.
This way, even though page tables become incoherent on utcb address
change situations (such as fork() exit(), execve()) they get updated
as they are referenced.
Since mappings are added only conditionally, caches are flushed only
when an update is necessary.
test0 now successfully runs its beginning.
test0 SConscript has a dependency problem.
Issues to be investigated:
- vm_file and vnodes need to be merged fully in all functions.
- libposix shared page references need to be removed.
- Any references to VFS_TID, PAGER_TID need to be removed.
Changes:
It is now possible to use do_mmap() from within mm0.
- pager_new_virtual()/delete_virtual() return addresses that are
disjoint from find_unmapped_area() used by mmap() interface for
anonymous or not-fixed areas.
- find_unmapped_area() now uses task->map_start task->map_end instead
of task->start and task->end. task->start/end are still valid task
space addresses for mmap(), but finding a new address is limited to
map_start/map_end.
- We have both interfaces because mmap() is only useful for backed-files.
When the pager needs to access a user memory range for example, that is
not backed by a file and thus we need to use pager_new_virtual() instead
of mmap() for mapping.
Previously during ipc copy, only the currently active task flags were
checked. This means the flags of whoever doing the actual copy was used
in the ipc. Now flags are stored in the ktcb and checked by the copy routine.
Current use of the flags is to determine short/full/extended ipc.
- Test0 has a full ipc mr read/write test.
- A full ipc occurs for definite only if both parties use the FULL IPC flag.
Otherwise the thread that makes the ipc copy rules on whether it was a short
or a full copy.
- Added a full ipc send/recv test
- Removed non-zero value checking in r2 for ipc that was there
to catch inadvertent full ipc calls.
- Added correct hanlding for read/write mrs for current status of utcb.
TODO:
- Add mapping of every utcb to every task for privileged access so that
the kernel can access every utcb without switching spaces.
- Removal of same mappings
- Upon thread creation need to copy page tables accordingly i.e.
each task will have its own utcb mapped with USER access, but every
other utcb as kernel access only. Need to handle this case upon page
table copying.
Increased inode block pointers to 40. The current maximum allowed (and checked).
Updates to file size after every file write ensures subsequent writes can
correctly operate using updated file size information (i.e. not try to add
more pages that are already present). We cannot do this inside write() because
directory writes rely on byte-granularity updates on file buffers, whereas
file updates are by page-granularity (currently).
File open was failing when using 2 files with same name. TODO: Look at it in the future.
Need to increase writeable file size in fs0. 16 pages don't work.
A new scheduler replaces the old one.
- There are no sched_xxx_notify() calls that ask scheduler to change task state.
- Tasks now have priorities and different timeslices.
- One second interval is distributed among processes.
- There are just runnable and expired queues.
- SCHED_GRANULARITY determines a maximum running boundary for tasks.
- Scheduler can now detect a safe point and suspend a task.
Interruptible blocking is implemented.
- Mutexes, waitqueues and ipc are modified to have an interruptible nature.
- Sleep information is stored on the ktcb. (which waitqueue? etc.)
- Added automatic utcb map/prefaulting of forked tasks for fs0
so that it does not need to explicitly request those tasks from mm0.
Eliminating fs0 requests to mm0 reduce deadlock possibilities.
- Replaced kmalloc with a public malloc implementation because of a bug in kmalloc.
- Fixed a kfree bug. default_release_pages was trying to free page_array pages.
- Adding prefaulting of fs0 to avoid page fault deadlocks.
- Fixed a bug that a vmo page_cache equivalence would simply drop a link to
an original vmo, even if the vmo could have more pages outside the page cache,
or if the vmo was not a shadow vmo.
- Fixed a bug with page allocator where recursion would corrupt global variables.
- Now going to fix or re-write a simpler page allocator that works.
Tasks boot fine up to doing ipc using their utcbs.
UTCB PLAN:
- Push ipc registers into private environment instead of a shared utcb,
but map-in a shared utcb to pass on long data to server tasks.
- Shared utcb has unique virtual address for every thread.
- Forked child does inherit parent's utcb, but cannot use it to communicate to
any server. It must explicitly obtain its own utcb for that.
- Clone could have a flag to explicitly not inherit parent utcb, which is the
right thing to do.
- MM0 serves a syscall to obtain self utcb.
- By this method, upon forks tasks don't need to map-in a utcb unless they want
to pass long data.
Next issues: For every read fault, the fault must traverse the
vma's object stack until the page is found. The problem was that
we were only searching the first object, that object was a writable
shadow, and the shadow didn't have the read-only page, and the 0
return value was interpreted with IS_ERR() and failed, so address
0 was mapped into the location, and QEMU blew off.
utcb as a shared page instead of the message registers.
Implemented the code that passes task information from mm0 to fs0
using the fs0 utcb. The code seems to work OK but:
There's an issue with anon pages that they end up on the same swapfile
and with same file offsets (e.g. utcb and stack at offset 0). Need to
fix this issue but otherwise this implementation seems to work.
TODO:
- Separate anon regions into separate vmfiles.
- Possibly map the stacks from virtual files so that they can be
read from userspace in the future for debugging.
- Possibly utcb could be created as a shared memory object using shmget/shmat
during startup.
Environment is backed by a special per-task file maintained by mm0 for each task.
This file is filled in by the env pager, by simple copying of env data into the
faulty page upon a fault. UTCB and all anon regions (stack) could use the same
scheme.
Fixed IS_ERR(x) to accept negative values that are above -1000 for errors. This
protects against false positives for pointers such as 0xE0000000.
modified: include/l4/generic/scheduler.h
modified: include/l4/macros.h
modified: src/arch/arm/exception.c
modified: tasks/fs0/include/linker.lds
modified: tasks/libl4/src/init.c
modified: tasks/libposix/shm.c
new file: tasks/mm0/include/env.h
modified: tasks/mm0/include/file.h
new file: tasks/mm0/include/lib/addr.h
deleted: tasks/mm0/include/lib/vaddr.h
modified: tasks/mm0/include/task.h
new file: tasks/mm0/include/utcb.h
new file: tasks/mm0/src/env.c
modified: tasks/mm0/src/fault.c
modified: tasks/mm0/src/file.c
modified: tasks/mm0/src/init.c
new file: tasks/mm0/src/lib/addr.c
modified: tasks/mm0/src/lib/idpool.c
deleted: tasks/mm0/src/lib/vaddr.c
modified: tasks/mm0/src/mmap.c
modified: tasks/mm0/src/shm.c
modified: tasks/mm0/src/task.c
new file: tasks/mm0/src/utcb.c
modified: tasks/test0/include/linker.lds
Headers 3 headers related to message registers and utcbs are now merged under
utcb.h in libl4. Some message register definitions used by the kernel are now
moved into kernel's glue/message.h. This avoids the duplication of same
definitions. Also the total number of mregs are now determined by arch-specific
kernel header, which is good.
Modified ipc handling so that from now on the kernel inspects and sets
the sender id if the receiver is receiving from L4_ANYTHREAD. This posed
a security problem since the receiver could not trust the sender for
sender information.