Added a memlayout.txt, revised README, reduced env size to 4kb

Bahadir Balban
2008-03-18 18:21:09 +00:00
parent d2aa9a552b
commit 26e6366014
5 changed files with 150 additions and 475 deletions

30
README

@@ -1,7 +1,7 @@
Codezero Microkernel 'Toy' release
-Copyright (C) 2007 Bahadir Balban
+Copyright (C) 2007, 2008 Bahadir Balban
What is Codezero?
@@ -62,11 +62,13 @@ for its own memory partition. This feature provides the option of having an
adjustable mixture of generalisation and specialisation of system services at
the same run-time, by using a combination of Codezero's abstract posix-like
page/file management services and an application-specific pager that depends on
-its own paging abilities. For example a critical task could both use posix-like
-files benefiting from the abstraction and simplification that it brings, but at
-the same time rely on its own page-fault handling for its critical data so that
-even though it handles its memory in a specialised way, it does not depend on
-another pager's grace for correct, stable operation.
+its own paging abilities. For example a critical task could both use mm0/fs0's
+posix-like files benefiting from the abstraction and simplification that it
+brings, but at the same time rely on its own page-fault handling for its
+critical data so that even though it handles its memory in a specialised way,
+it does not depend on another pager's grace for correct, stable operation.
+Similarly, a whole operating system can be virtualised and both native and
+virtualised applications can run on the same run-time.
License:
@@ -128,13 +130,13 @@ opportunity to incorporate the latest ideas in OS technology.
Can you summarise all this? Why should I use Codezero, again?
Codezero is an operating system that targets embedded systems. It supports the
-most fundamental posix features. Different from other posix-like systems,
-it is based on a microkernel design. It supports modern features such as
-demand-paging, virtual filesystem support. It has a cleanly separated set of
-services, and it is small. For these reasons it is a good candidate as systems
-software to be used on embedded systems. Currently it has little or no users,
-therefore compared to systems with a saturated user base it is possible to
-tailor it rapidly towards the needs of any users who want to be the first to
-incorporate it to their needs.
+most fundamental posix calls and modern features such as demand-paging and has a
+virtual filesystem layer. Different from other posix-like systems, it is based
+on a microkernel design. It has a cleanly separated set of services, it is small
+and well-focused. Its source code is also freely available. For these reasons it
+is a good candidate as systems software to be used on embedded platforms.
+Currently it has little or no users, therefore compared to systems with a
+saturated user base it is possible to tailor it rapidly towards the needs of any
+users who want to be the first to incorporate it for their needs.

460
TODO

@@ -1,460 +0,0 @@
Build System:
=============
1) Pass on configuration parameters to scons from a configuration file: (Platform name, CPP defines.)
2) Build files in a separate directory. - DONE
3) Build files in subdirectories depending on CPP defines for the files in
those subdirectories.
CML2 will cover mostly for (1) and (3)
What to do next:
1) Define a more interesting linker script.
2) Add uart code to get immediate printfs. (For both loader + payload image)
3) Try linking with the loader and loading the payload image with the loader.
What to do next:
4) Define a KIP structure and page.
5) Implement a page table section.
6) Implement routines to modify page tables.
7) Define a platform memory description (Peripherals, memory pool).
8) Define a mapper.
9) Map platform peripheral memory.
10) Implement a memory allocator.
11) Implement a page fault handler for kernel page faults.
12) Implement arch-specific cache and tlb operations
The Big TODOs:
--------------
1) A memory manager
2) System calls and IPC calls.
3) Scheduler/Context switching.
4) Virtual memory support for v5 and v6.
The key approach:
-----------------
Progressing with minimal steps at a time.
Progress Log:
-------------
1) Don't forget to modify control register to enable high-vectors.
2) Investigate ROM/System bits, access permissions and domains.
3) Don't forget to map page tables region as unbufferable/uncacheable.
---- Done ----
1) Enable caches/wb after jumping to virtual memory. See if this helps.
2) Flush caches/wb right after jumping before any data references. See if this
also helps.
*) Don't forget to modify control register to enable high-vectors.
---- Done ----
1) Test printascii and printk
*) Don't forget to modify control register to enable high-vectors.
Was not accessing the right uart offset, which caused precise external aborts.
---- Done ----
1) Implement a boot memory allocator
2) Allocate memory for secondary level page tables -> At least to describe
kernel memory. For example:
first level table
------>>> section
------>>> section        second level table
------>>>>>>>>>>>>>>>---------->>>>>>>>>>>>> Timer Page
------>>> section    ---------->>>>>>>>>>>>> UART Page
                     ---------->>>>>>>>>>>>> VIC Page
Here, one secondary table would be enough to map devices with page
granularity. This would be the table that describes the first IO region.
3) Add add/remove mapping functions with page size granularity.
*) Don't forget to modify control register to enable high-vectors.
---- Done ----
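The two-level arrangement described in item 2 above can be illustrated by the
index arithmetic of the ARMv5 translation table format, where a first-level
table has 4096 entries covering 1 MB each and a coarse second-level table has
256 entries covering 4 KB each. This is only a sketch: the macro and function
names are hypothetical and are not taken from the Codezero sources.

```c
#include <stdint.h>

#define SECTION_SHIFT   20      /* each first-level entry covers 1 MB */
#define PAGE_SHIFT      12      /* each second-level entry covers 4 KB */
#define PMD_ENTRIES     256     /* entries in one coarse second-level table */

/* Index into the first-level table (4096 entries). */
static inline uint32_t pgd_index(uint32_t vaddr)
{
	return vaddr >> SECTION_SHIFT;
}

/* Index into the coarse second-level table selected by pgd_index(). */
static inline uint32_t pmd_index(uint32_t vaddr)
{
	return (vaddr >> PAGE_SHIFT) & (PMD_ENTRIES - 1);
}
```

Because one coarse table spans exactly 1 MB, device pages that fall inside the
same 1 MB IO region share a single second-level table, which matches the note
above that one secondary table is enough for the first IO region.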
1) Add remove_mapping() function to remove early boot one-to-one mapping etc.
(This might internally handle both section and page mappings depending on
given size)
2) Test kmalloc/alloc_page. Implement solution for their dependency on each
other.
---- Done ---
1) Make sure to build 3 main executables, for kmalloc/alloc_page/memcache
2) Make sure to be able to pass parameters as arguments to each test.
3) Write python scripts that look for expected output.
(Determine criteria other than init_state == end_state)
---- Done ----
Reading python script output by hand works well for now.
1) Must allocate PGD's from a memcache? i.e. a cache with 10 pgds (175K)
should suffice.
2) Must allocate PMD's from a memcache? a cache with ~100 pmds, should suffice
(100K).
These must be allocated as unmapped/uncached. (use __alloc_page()?) Don't
think they need phys_to_ptab kind of conversion. They can have regular virtual
offsets as long as they're mapped with correct settings. (AP's and c/b bits)
3) Implement tcbs. User-space thread spawning and scheduler. -> Tough one!
- tcb's must keep track of the pgd_ptr of each task. which in turn are connected
to their pmds.
- each new pgd must copy all kernel-space pmds.
- each pgd has distinct pmds for userspace area and same pmds for kernel area.
How to initiate a svc task, switch back and forth with the kernel:
- Load the task.
- Kernel is aware of where it is loaded.
TODO:
- Load qemu/insight tutorial to wikia
- Try loader on a board and/or qemu
- Add bootdesc to kernel image, and try reading it and discovering inittask
from it.
NEXT:
- Try jumping to inittask from init.c
- Create a syscall page.
Done the inittask jumping. About to create syscalls.
TODO:
Link the inittask with l4lib. This will reveal what it exactly does
in userspace to call the system calls. i.e. what offset it jumps to,
and what it fills its registers with.
Then use this information to implement the first system call.
e.g. it would call a dummy system call by reading the kip.
the system call could then dump the state of all registers to see
what it passed to the kernel.
We need to do all this checking because it seems there's no syscall
page but just kip.
Hints:
See user.S. Implemented in kernel, it ensures sp contains syscall offset.
this is then checked in the syscall. Both implemented in kernel means
this can be changed without breaking users. e.g. it could be
the direct vector address in kip (like 0xFFFFFF00), they could all be swi's,
SVC and USR LR's determine where they came from and where to return.
Wrote the syscall page. Test each system call from userspace, i.e. detect
what is called etc.
TODO:
Done all of above. Variety of system calls can be called. However,
it fails in various ways. Must fix this. qemu works up to some 4 system calls.
It is unknown why any more causes weird exceptions.
Real hardware fails in any setup, with Invalid CPU state. needs investigating.
TODO:
Fixed everything, wasn't saving context upon context switch. kernel corrupted
the registers.
TODO:
Fixed anything that wasn't there. Added irqs and scheduler. Now. We need to
sort out per process kernel context.
^
| Userspace
V
...
sp_svc
^
|
|
|
| 4 KB Page
|
|
|
V
tcb
I think every process must have unique virtual page for stack + ktcb.
If the same page is used (but different physical pages) Then I can't keep track
of ktcbs when they aren't runnable.
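If the ktcb sits at the base of the 4 KB page and the SVC stack grows down from
the top of the same page, as the diagram above suggests, then the ktcb address
can be recovered from the SVC stack pointer by masking. The sketch below
assumes exactly that placement; ktcb_from_sp() and KSTACK_PAGE_SIZE are
hypothetical names used for illustration only.

```c
#include <stdint.h>

#define KSTACK_PAGE_SIZE   0x1000u   /* 4 KB page shared by ktcb and SVC stack */

/*
 * Assumption: the ktcb is at the base of the page and the SVC stack
 * grows down from the top of the same page, so clearing the low
 * 12 bits of the stack pointer yields the ktcb address.
 */
static inline uintptr_t ktcb_from_sp(uintptr_t sp_svc)
{
	return sp_svc & ~((uintptr_t)KSTACK_PAGE_SIZE - 1);
}
```

One caveat of this scheme: with a full-descending stack, an empty stack has sp
equal to the end of the page, which masks to the next page, so the lookup is
only valid once something has been pushed (or if sp - 1 is masked instead).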
Now the paths.
-> USR running. (USR Mode)
-> System call occurs. (USR Mode -> SVC Mode)
- Save user context, (in the ktcb) load sp_svc, continue.
-> IRQ occurs ((USR Mode, SVC Mode, or IRQ Mode) -> IRQ Mode)
- If from USR Mode, save user context into ktcb->user_context.
- If from SVC Mode, save kernel context in the ktcb->kernel_context.
- If from IRQ Mode, save irq context in the irq stack.
-> Scheduling occurs (IRQ Mode -> (USR Mode or SVC Mode))
- Restore user context
CHANGES:
Forget above design. Each interrupter saves the context of interruptee on its
stack. E.g. svc saves usr, irq saves svc, or usr. Only that because irqs are
reentrant, irqs change to svc and save context to svc stack. Only upon context
switches the context is restored from the stack and pushed to ktcb context
frame. This way at any one time a non-runnable thread could have svc or usr
context in its frame depending on what mode it was interrupted and blocked.
TODO:
- 8-byte aligned stack on irq handler.
- What do those cpsr_fcxt flags mean? When are they needed?
Done:
- Tie up jump_usr(). Also check whether calling void schedule(void) push
anything to stack. It really shouldn't. (But I'm sure it pushes
r0-r3,r12,lr.) Need to rewind those.
- Is current macro correct? Check.
TODO:
- 8-byte aligned stack on irq handler.
- What do those cpsr_fcxt flags mean? When are they needed?
- Limit irq nesting so that it never overflows irq stack.
Things to do next:
------------------
- Add new tasks. - Done (Added a compile-time roottask)
- Add multi-page task support. (With data/text/bss sections transferred to bootdesc.)
- Implement all 9 system calls.
- Remove malformed linked lists from allocators.
- Add vfs task.
- Add 6 major POSIX calls. (open/close/read/write/creat/seek)
- Add microwindows/Keyboard/Mouse/CLCD support.
- Add FAT32? filesystem.
- Add Device driver framework. (Bus support, device probe/matching)
- Add ethernet + lwip stack.
Things to do right now:
-----------------------
readelf is broken for finding --lma-start-end and --find-firstpage. Fix those.
- Fixed in half an hour.
More quick TODOs:
-----------------
- Pass variable arguments, to every syscall with right args.
Previous stuff is done. Now to add:
------------------------------------
- I am implementing IPC system call. Currently utcb's are copied from one
thread to another (not even compiled but the feature is there), but no
synchronisation is implemented.
- TODO: Add waitqueues for ipc operations. E.g. a thread that does ipc_send()
waits for the thread that does ipc_receive() to wake it up or vice versa.
- Looks like every ipc instance between 2 unique threads requires a unique
waitqueue.
- If there are n threads, there could be a maximum of n(n-1) / 2 ipc instances
(e.g. a mesh) if simultaneous ipcs are allowed. Wait, it's not allowed, so
for n threads there can be at most n / 2 ipc instances, which is better for
waitqueue complexity.
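The two bounds in the last item can be checked with a few lines of arithmetic.
The function names are hypothetical; the second function assumes each thread
takes part in at most one ipc at a time, as the note concludes.

```c
/* Upper bound on distinct sender/receiver pairs among n threads (a mesh). */
static inline unsigned long ipc_pairs_total(unsigned long n)
{
	return n * (n - 1) / 2;
}

/* Pairs that can rendezvous at once when each thread is in at most one ipc. */
static inline unsigned long ipc_pairs_simultaneous(unsigned long n)
{
	return n / 2;
}
```

For 8 threads this gives 28 possible pairs but only 4 that can be active at
the same moment, which is the number a waitqueue scheme would need to cover.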
Done:
- Some hash tables to keep ipc rendezvous. - Not working with lists yet.
TODO:
- At least wait_event() must be a macro and the sleep condition must be
checked after the spinlock is acquired (and before the sleepers++ in case the
condition is the sleepers count). Currently there is a race about this.
- Figure the problem with lists.
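The ordering asked for in the wait_event() item (take the spinlock, then test
the condition, then count the sleeper) could look roughly like the sketch
below. It is not the Codezero implementation: struct waitqueue,
wq_lock()/wq_unlock() and wait_event_prepare() are hypothetical, the spinlock
is modelled with a C11 atomic_flag, and the actual blocking and wake-up are
left out.

```c
#include <stdatomic.h>

struct waitqueue {
	atomic_flag lock;       /* spinlock protecting sleepers */
	int sleepers;           /* number of threads counted as sleeping */
};

static inline void wq_lock(struct waitqueue *wq)
{
	while (atomic_flag_test_and_set_explicit(&wq->lock, memory_order_acquire))
		;       /* spin until the flag was previously clear */
}

static inline void wq_unlock(struct waitqueue *wq)
{
	atomic_flag_clear_explicit(&wq->lock, memory_order_release);
}

/*
 * A macro rather than a function, so that `condition` is evaluated
 * here, while the lock is held, and before sleepers is incremented.
 * `must_sleep` is set to 1 if the caller should go on to block,
 * and to 0 if the condition already holds.
 */
#define wait_event_prepare(wq, condition, must_sleep)   \
do {                                                    \
	wq_lock(wq);                                    \
	if (condition) {                                \
		(must_sleep) = 0;                       \
	} else {                                        \
		(wq)->sleepers++;                       \
		(must_sleep) = 1;                       \
	}                                               \
	wq_unlock(wq);                                  \
} while (0)
```

If this were a plain function, the condition would be evaluated at the call
site before the lock is taken, which is the race the item describes.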
TODO:
- MAPPINGS: There must be an update_mappings() function that updates all new
physical to virtual mappings that are common for every process. For example,
any common kernel mapping that can be accessed by any process requires this.
Currently add_mapping() adds the mapping for the current process only.
We need a storyline for initialisation and interaction of critical server
tasks after the microkernel is finished initialising.
The Storyline:
--------------
Microkernel initialises itself.
Microkernel has filled in page_map array.
Microkernel has filled in physmem descriptor.
Microkernel reads bootdesc.
Microkernel allocates, maps, starts mm0. (Memory manager)
Microkernel allocates, maps, starts name0. (Naming server)
Microkernel allocates, maps, starts pm0. (Process manager)
== Servers Start ==
name0 waiting for start message from mm0.
pm0 waiting for start message from mm0.
mm0 invokes request_bootdesc on Microkernel.
mm0 invokes request_pagemap on Microkernel.
mm0 initialises the page allocator.
mm0 initialises the page_map arrays.
== mm0 in full control of name0 and pm0 address spaces, and can serve memory requests. ==
mm0 starts pm0.
pm0 invokes request_procdesc on Microkernel. (Learn what processes are running, and relationships)
== pm0 in full control of name0 and pm0 process information, and can serve process requests. ==
mm0 starts name0.
name0 initialises its naming service.
name0 waiting for advertise_method_list.
== Method Advertise Stage ==
mm0 invokes advertise_method_list to name0. (Tell who can invoke what method on mm0)
pm0 invokes advertise_method_list to name0. (Tell who can invoke what method on pm0)
== name0 in full control of what all servers can invoke on each other. ==
== Method Request Stage ==
pm0 invokes request_method_list on name0. (Learn what method pm0 can invoke on who)
pm0 initialises its remote method array.
mm0 invokes request_method_list on name0. (Learn what method mm0 can invoke on who)
mm0 initialises its remote method array.
== All servers in full awareness of what methods they can invoke on other servers ==
Remote methods:
---------------
Remote methods can pass up to the architecture-defined number of bytes without
copying to/from each other, and setting up a connection. Alternatively for
larger data sizes, a connection is set-up (a shared memory area) and a *single
structure* of arbitrary size can be read/written. Remote method invocation
semantics typically forbid using data types any more complex than a single raw
structure of arbitrary size. This greatly simplifies the communication
semantics and data transfer requirements. For a readily set-up connection, the
remote method invocation cost is the cost of a single ipc, (i.e. context
switch to server and back to client.) also the shared memory is currently
uncached. RMI in the usual case does *not* copy data. There's no object-level
support, e.g. no marshalling, no dynamic type discovery or any other object
oriented programming bloat. Objects *are* policy anyway. There's no support
for network-transparent communication either. If all these things were
supported the system would end up becoming large and complex like Chorus or
Sprite and guaranteed to fail.
Context Switch between servers:
-------------------------------
It is the intention that context switches between the critical servers have
minimal overhead. For example, the single-address-space linux kernel can
switch between kernel threads quite cheaply, because there's no need to change
page table mappings and therefore no cache/tlb trashing.
The idea is to link and run critical servers in non-overlapping address space
areas (e.g. simply link at its physical address,
Quick todo:
-----------
1) Fix page allocator as a library. - Done
2) Reduce physmem descriptor's fields. - Done
3) Export memcache as a library. - Done
4) Fix the problem of unmapping multiply-mapped pages.
5) Fix the problem of assuming a free() always occurs from kernel virtual addresses.
4-5) Hint: Set up the physical page array with mapping information.
6) Sort out how to map virt-to-phys phys-to-virt in tasks. Kernel virt-to-phys
must be replaced.
7) Refactor ipc_send()/ipc_recv() add_mapping/add_mapping_pgd(). Remove some
gotos.
The revised storyline, with no naming yet:
--------------
Microkernel initialises itself.
Microkernel has filled in page_map array.
Microkernel has filled in physmem descriptor.
Microkernel reads bootdesc. (3 tasks: mm0, pm0, task3)
Microkernel allocates, maps, starts mm0. (Memory manager)
Microkernel allocates, maps, starts pm0. (Process manager)
== Servers Start ==
pm0 waiting for start message from mm0.
mm0 invokes request_bootdesc on Microkernel.
mm0 invokes request_pagemap on Microkernel.
mm0 initialises the page allocator.
mm0 initialises the memory bank and page descriptors.
mm0 sets up its own address space temporarily.
== mm0 is somewhat initialised and can serve memory requests. ==
mm0 starts pm0.
pm0 invokes request_procdesc on Microkernel. (Learn what processes are running, and relationships)
pm0 sets up task_desc structures for running tasks.
== pm0 is somewhat initialised and can serve task-related requests. ==
pm0 calls mock-up execute() to demonstrate demand paging on the third task's execution.
Long Term TODOs:
-------------------
- Finish inittask (pager/task manager)
- Start on fs server (vfs + a filesystem)
- Finish: fork, execve, mmap, open, close, create, read, write
Current todo:
==============
- Use shmat/shmget/shmdt to map block device areas to FS0 and start implementing the VFS.
todo:
- Generate 4 vmfiles:
- env, stack, data, bss.
- Fill in env as a private file.
As faults occur on env, simply map file to process.
- Create an empty data, bss and stack file.
As faults occur on real data, copy on write onto proc->data file, by creating shadows.
As faults occur on devzero, copy on write onto proc->stack file, by creating shadows.
As faults occur on bss, copy on write onto proc->bss file, by creating shadows.
FORK:
If a fork occurs, copy all vmas into new task.
Find all RW and VM_PRIVATE regions. All RW shadows are eligible.
Create a fork file for each RW/VM_PRIVATE region. E.g.
task->fork->data
task->fork->stack
task->fork->bss
All RW/PRIVATE shadows become RO, with task->fork owners, rather than their original
owners e.g. proc->data, proc->stack etc. All pages under shadow are moved onto those files.
Increase file refcount for forker tasks.
As faults occur on fork->stack/bss/data, copy on write onto proc->stack/bss/data, by making
shadows RW again and copying those faulted pages from fork files onto the proc->x files.

99
docs/memlayout.txt Normal file

@@ -0,0 +1,99 @@
Virtual memory layout on Codezero/ARMv5:
========================================
0xFFFF FFFF  .---------------.   End of virtual memory
             | Syscall page  |
0xFFFF F000  |---------------|
             |   Reserved    |
0xFFFF 1000  |---------------|
             |  Vector page  |
0xFFFF 0000  |---------------|
             |   Reserved    |
0xF900 0000  |---------------|   UTCB area ends
             |               |
             |      ...      |
             |   ---------   |
             |   UTCB page   |
             |   ---------   |
             |   UTCB page   |
0xF800 0000  |---------------|   UTCB area starts
             |               |
             |   Codezero    |
             |  Microkernel  |
             |               |
0xF000 0000  |---------------|
             |               |
             |   MM0 pager   |
             |               |
0xE000 0000  |---------------|
             |               |
             |               |
             |   Reserved    |
             |               |
             |               |
             |      ...      |
0x2000 0000  |---------------|   User task area ends
             |               |
             |               |
             |               |
             |     Task      |
             | Address Space |
             |               |
             |               |
0x1000 0000  |---------------|   User task area starts
             |               |
             |               |
             |   Reserved    |
             |               |
             |               |
0x0          '---------------'   Start of virtual memory
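As an illustration of how the boundaries in this layout partition the 32-bit
address space, the sketch below maps a virtual address to the region that
contains it. The enum and function names are hypothetical; only the boundary
addresses come from the diagram.

```c
#include <stdint.h>

enum vm_region {
	VM_RESERVED,
	VM_USER_TASK,
	VM_MM0_PAGER,
	VM_KERNEL,
	VM_UTCB,
	VM_VECTOR_PAGE,
	VM_SYSCALL_PAGE,
};

/* Walk the boundaries from the top of the address space downwards. */
static inline enum vm_region vm_region_of(uint32_t vaddr)
{
	if (vaddr >= 0xFFFFF000u) return VM_SYSCALL_PAGE;
	if (vaddr >= 0xFFFF1000u) return VM_RESERVED;
	if (vaddr >= 0xFFFF0000u) return VM_VECTOR_PAGE;
	if (vaddr >= 0xF9000000u) return VM_RESERVED;
	if (vaddr >= 0xF8000000u) return VM_UTCB;
	if (vaddr >= 0xF0000000u) return VM_KERNEL;
	if (vaddr >= 0xE0000000u) return VM_MM0_PAGER;
	if (vaddr >= 0x20000000u) return VM_RESERVED;
	if (vaddr >= 0x10000000u) return VM_USER_TASK;
	return VM_RESERVED;
}
```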
User task layout on Codezero/ARMv5:
===================================
0x2000 0000  .---------------.   End of user task address space
             |4KB Environment|
0x1FFF F000  |---------------|
             |  16KB Stack   |
             |       |       |
             |       v       |
             |               |
0x1FFF B000  |---------------|
             |               |
             |    Memory     |
             |   available   |
             |  for mmap()   |
             |               |
0x1xxx x000  |---------------|
             |      BSS      |
             |---------------|
             |     Data      |
             |---------------|
             |     Text      |
0x1000 0000  '---------------'   Start of user task address space
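The region boundaries at the top of the user task area follow from the region
sizes. The sketch below assumes the environment and stack are packed
contiguously downwards from the end of the task area, with the 4 KB and 16 KB
sizes named in the diagram; the macro names other than SZ_4K and SZ_16K are
hypothetical.

```c
#define SZ_4K           0x1000u
#define SZ_16K          0x4000u

#define USER_AREA_START 0x10000000u
#define USER_AREA_END   0x20000000u

/* Assumption: regions are packed downwards from the end of the area. */
#define ENV_START       (USER_AREA_END - SZ_4K)   /* 4 KB environment */
#define STACK_TOP       ENV_START                 /* stack grows down from here */
#define STACK_BOTTOM    (STACK_TOP - SZ_16K)      /* 16 KB stack */
```

Under these assumptions ENV_START is 0x1FFFF000 and STACK_BOTTOM is 0x1FFFB000;
everything between the end of BSS and STACK_BOTTOM is what remains for mmap().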

34
tasks/libposix/env.c Normal file

@@ -0,0 +1,34 @@
/*
 * Environment accessor functions
 *
 * Copyright (C) 2008 Bahadir Balban
 */

#include <string.h>
#include <stdlib.h>

char **__environ;

/*
 * Search for given name in name=value string pairs located
 * in the environment segment, and return the pointer to value
 * string.
 */
char *getenv(const char *name)
{
	char **envp = __environ;
	int length;

	if (!envp)
		return 0;

	length = strlen(name);
	while(*envp) {
		if (memcmp(name, *envp, length) == 0 &&
		    (*envp)[length] == '=')
			return *envp + length + 1;
		envp++;
	}
	return 0;
}
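To show the lookup in isolation, here is a stand-alone variant that can be
exercised without the environment segment. It is a sketch, not part of the
commit: env_lookup() is a hypothetical name, it takes the environment vector
as a parameter instead of reading __environ, and it uses strncmp() so that the
comparison stops at the terminating NUL of an entry shorter than the name.

```c
#include <string.h>

/*
 * Hypothetical stand-alone variant of the lookup above. Returns a
 * pointer to the value part of the first "name=value" entry whose
 * name matches exactly, or NULL if there is none.
 */
static char *env_lookup(char **envp, const char *name)
{
	size_t length;

	if (!envp || !name)
		return NULL;

	length = strlen(name);
	for (; *envp; envp++) {
		/* Match the name, then require '=' right after it. */
		if (strncmp(name, *envp, length) == 0 &&
		    (*envp)[length] == '=')
			return *envp + length + 1;
	}
	return NULL;
}
```

Given the vector { "HOME=/", "PATH=/bin", NULL }, looking up "PATH" returns a
pointer to "/bin", while "PAT" returns NULL because the character following
the matched prefix is not '='.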


@@ -20,7 +20,7 @@
#define TASK_FILES_MAX 32
/* POSIX minimum is 4Kb */
-#define DEFAULT_ENV_SIZE SZ_16K
+#define DEFAULT_ENV_SIZE SZ_4K
#define DEFAULT_STACK_SIZE SZ_16K
#define DEFAULT_UTCB_SIZE PAGE_SIZE