It is possible that in a mult-threaded address space write-faults
(or faults in general) on the same address can occur. This is because
threads may become runnable during the handling of the first fault.
This causes duplicate faults on the same private page in the same address space.
On the case of write faulted private page, this causes a spurious page allocation
Currently this case is detected and handled, but in the future we need
a generic way of detecting duplicate faults (of any kind) and cease duplicate
IPC at a very early stage. This is not done yet as it requires knowledge of the
PTEs of physical pages in the pager (like reverse mapping in Linux).
It turned out we used one version of kmalloc for malloc() and another for kfree()!
Now fixed.
Added parent-child relationship to tasks. Need to polish handling CLONE_PARENT and THREAD.
When sys_munmap() splits a vma, the new vma had no copy of the objects
in the original vma. Now we fixed that using a vma_copy_links() function
which can also be used as part of fork().
File open was failing when using 2 files with same name. TODO: Look at it in the future.
Need to increase writeable file size in fs0. 16 pages don't work.
- Fixed an important bug with shadow object handling.
When a shadow is dropped, if there are references left
to it, both the object in front and dropped object becomes
a shadow of the original object underneath. We had thought
of this case but had not increase the shadow count.
- Added a test mechanism that tests the number of objects,
vmfiles, shadows etc. by first counting them and trying to
reach the same number by other means, i.e. per-object-shadow counts.
It discovered a plethora of bugs.
- Added new set of functions to register objects, files and tasks
globally with the pager, these functions introduce a refcount as
well as adding structures to linked lists.
- fork/exit now seems to work stably i.e. no negative shadow counts etc.
- Added cleaner allocation of shm addresses by moving the allocation to do_mmap().
- Added deletion routine for all objects: shadow, vm_file of type vfs_file, shm_file, etc.
- Need to make sure objects get deleted properly after exit().
- Currently we allow a single, unique virtual address for each shm segment.
- Implemented reasonable way to suspend task.
- A task that has a pending suspend would be interrupted
from its sleep via the suspender task.
- If suspend was raised and right after, task became about to sleep,
then scheduler wakes it up.
- If suspend was raised when task was in user mode, then an irq suspends it.
- Also suspends are checked at the end of a syscall so that if suspend was
raised because of a syscall from the task, the task is suspended before it
goes back to user mode.
- This mechanism is very similar to signals, and it may lead as a base for
implementing signal handling.
- Implemented common vma dropping for shadow vm object dropping and task exiting.
- Updated sleeping paths such that a task is atomically put into
a runqueue and made RUNNABLE, or removed from a runqueue and made SLEEPING.
- Modified vma dropping sources to handle both copy_on_write() and exit() cases
in a common function.
- Added the first infrastructure to have a pager to suspend a task and wait for
suspend completion from the scheduler.
Now all system calls can simply return their final values and they
will be sent to client parties from a single location. Should have had this
simple cleanup a long time ago.
For clone, file descriptor and vm area structures need to be
separate from the tcb and reached via a pointer so that they
can be shared among multiple tcbs.
- Adding prefaulting of fs0 to avoid page fault deadlocks.
- Fixed a bug that a vmo page_cache equivalence would simply drop a link to
an original vmo, even if the vmo could have more pages outside the page cache,
or if the vmo was not a shadow vmo.
- Fixed a bug with page allocator where recursion would corrupt global variables.
- Now going to fix or re-write a simpler page allocator that works.
Added a list of links for vm objects so they can follow
the links that point at them.
More succinct handling of the case where a vm object
is dropped. Now depending on the object's number of link
references and shadow references, upon a drop it could
either be merged, deleted or kept.
Added opener reference count for vm files. Now files
have opener count, objects have shadow and link count.
Link count is also meaningful for how many tasks have
mmap'ed that object.
R/W shadow objects must be made read-only during a fork.
When searching for a writeable shadow object, it can only be the first
object on the vma object chain, therefore not traversing the object
chain and checking only the first object is sufficient.
Factored out mapping of the physical page as the final generic code
after all fault-specific handling is done.
Fixed the error that zero page didn't have an owner (devzero).
Fixed the error that struct dirent did not have the record length
field as u16 as expected by userspace.
Normally anon && shared pages need to COW once and any newer
accesses by other sharing processes should simply map the COW'ed
pages to themselves as writeable. We didn't have that but instead
every ANON && SHARED write fault was going into the COW path, causing
the same offset'ed page to be COW'ed twice.
Next: Factor out the l4_map code to a common point at the end of all
fault handling paths.
For anonymous shm, mmap now adds a shm_file and devzero behind it
as two vm_objects. Faults are handled by copy_on_write(). Just as
shadows copy r/w pages from original files, it should copy r/w
pages from devzero into the shm_file in front.
shmat/shmget uses mmap to set-up their areas.
Untested yet so bugs expected.
modified: tasks/libl4/src/init.c
modified: tasks/mm0/include/shm.h
modified: tasks/mm0/include/vm_area.h
modified: tasks/mm0/src/fault.c
modified: tasks/mm0/src/mmap.c
modified: tasks/mm0/src/shm.c
Tasks boot fine up to doing ipc using their utcbs.
UTCB PLAN:
- Push ipc registers into private environment instead of a shared utcb,
but map-in a shared utcb to pass on long data to server tasks.
- Shared utcb has unique virtual address for every thread.
- Forked child does inherit parent's utcb, but cannot use it to communicate to
any server. It must explicitly obtain its own utcb for that.
- Clone could have a flag to explicitly not inherit parent utcb, which is the
right thing to do.
- MM0 serves a syscall to obtain self utcb.
- By this method, upon forks tasks don't need to map-in a utcb unless they want
to pass long data.
Next issues: For every read fault, the fault must traverse the
vma's object stack until the page is found. The problem was that
we were only searching the first object, that object was a writable
shadow, and the shadow didn't have the read-only page, and the 0
return value was interpreted with IS_ERR() and failed, so address
0 was mapped into the location, and QEMU blew off.
utcb as a shared page instead of the message registers.
Implemented the code that passes task information from mm0 to fs0
using the fs0 utcb. The code seems to work OK but:
There's an issue with anon pages that they end up on the same swapfile
and with same file offsets (e.g. utcb and stack at offset 0). Need to
fix this issue but otherwise this implementation seems to work.
TODO:
- Separate anon regions into separate vmfiles.
- Possibly map the stacks from virtual files so that they can be
read from userspace in the future for debugging.
- Possibly utcb could be created as a shared memory object using shmget/shmat
during startup.
Environment is backed by a special per-task file maintained by mm0 for each task.
This file is filled in by the env pager, by simple copying of env data into the
faulty page upon a fault. UTCB and all anon regions (stack) could use the same
scheme.
Fixed IS_ERR(x) to accept negative values that are above -1000 for errors. This
protects against false positives for pointers such as 0xE0000000.
modified: include/l4/generic/scheduler.h
modified: include/l4/macros.h
modified: src/arch/arm/exception.c
modified: tasks/fs0/include/linker.lds
modified: tasks/libl4/src/init.c
modified: tasks/libposix/shm.c
new file: tasks/mm0/include/env.h
modified: tasks/mm0/include/file.h
new file: tasks/mm0/include/lib/addr.h
deleted: tasks/mm0/include/lib/vaddr.h
modified: tasks/mm0/include/task.h
new file: tasks/mm0/include/utcb.h
new file: tasks/mm0/src/env.c
modified: tasks/mm0/src/fault.c
modified: tasks/mm0/src/file.c
modified: tasks/mm0/src/init.c
new file: tasks/mm0/src/lib/addr.c
modified: tasks/mm0/src/lib/idpool.c
deleted: tasks/mm0/src/lib/vaddr.c
modified: tasks/mm0/src/mmap.c
modified: tasks/mm0/src/shm.c
modified: tasks/mm0/src/task.c
new file: tasks/mm0/src/utcb.c
modified: tasks/test0/include/linker.lds
This will help when syscalls have long arguments individual
utcbs can be mapped to server tasks and kept mapped in until the
tasks die, as opposed to map requests every time a server task maps
a different utcb at the same virtual address.
The changes have preparation code to also passing the utcb info
through the stack as part of the environment.
To sum up env and arg regions have also been added above the stack and
env region is to be used to pass on the utcb address information at
task startup.
Added reading pages from the page cache into user buffer for sys_read.
Increases stack sizes to 4 pages.
Updated README to include more details about multi-pager environments.