Pwnos design Document Version 0a



Download 135.4 Kb.
Page5/6
Date31.01.2017
Size135.4 Kb.
#13944
1   2   3   4   5   6

Booting


The actions for which the master boot record (MBR) code is responsible are:

  • Read the rest of the boot loader from the following sectors on the boot drive

  • Find and switch to a video mode based on desired resolution and either 24-bit or 32-bit colour

  • Disable interrupts, etc.

  • Initialize the GDT data

  • Enable 32-bit protected mode and jump to the rest of the boot loader

The actions for which the rest of the boot loader is responsible are:



  • Find and save relevant ACPI data as given by the BIOS. This includes data about the CPUs, memory, interrupts, and devices. This information is critical to the functionality of PwnOS.

  • Ensure that the CPU and system meet the requirements for PwnOS.

  • Configure the I/O APIC to route all I/O interrupts to the bootstrap processor.

  • Set Memory Type Range Registers (MTRRs) and Page Attribute Table (PAT) MSR to configure memory caching.

  • Configure Local APIC (LAPIC), and calibrate LAPIC timer using Programmable Interval Timer (PIT) during the waits of the INIT-SIPI-SIPI protocol.

  • Wait for all CPUs to configure memory caching.

  • Configure hard-coded paging setup on all CPUs.

  • Switch all CPUs to 64-bit mode.

  • Configure PCI for DMA with ATA devices and/or with USB devices.

  • Identify all ATA devices and look for all NTFS partitions.

  • Find an NTFS partition containing “PwnOS\Core.bin” and “PwnOS\Main.exe”.

  • Load the pieces of Core.bin to their appropriate places in virtual memory.

  • Initialise the modules of PwnOS (Thread Scheduler, I/O, Page Memory, ...)

  • Make this boot loader have core data structures as if it was a real process.

  • Call CreateProcess on “PwnOS\Main.exe”.

  • Call DestroyProcess on the fake boot process, to free its resources.

Inter-CPU communication in the boot loader (in order to initialise the non-bootstrap processors) is done using the mechanism built into the LAPIC, i.e. by sending special interrupts to other CPUs over the APIC bus.


The physical memory layout during boot is as follows (but may be subject to significant changes).

bootphysicalmemory.png

Figure : Physical Memory During Boot

The 15 stacks are for the up to 15 CPUs during boot. The ATA scratch memory is a buffer for loading data from disk during boot. The paging data is only used during boot, as the page directories are kept in their own page after boot.


For more information on bootstrapping, the I/O APIC, and ACPI, see (9), (10), and (11).

Memory

Page Memory Management


In order to ensure that page management and address translation are efficient and simple in PwnOS, the x86-64 option of using a 3-level page table tree with 2MB pages instead of a 4-level page table tree with 4KB pages is used. This means that page allocations and deallocations require much less updating of data, and fewer Translation Lookaside Buffer (TLB) invalidations.
It also means that all but 8KB (i.e. the PML4 and the PDPT) of the page table tree for the first 512GB of memory can be fit into a single 2MB page. This allows for the virtual addresses of these page directories (PDs) to be constant, making lookup by PwnOS very fast. It also allows the processor to cache entries for address translation much more reliably. For amounts of RAM significantly more than 512GB, additional measures may need to be taken, but for reasonably foreseeable amounts of RAM (about 16TB), this approach of fixed-address PDs works sufficiently (since it only requires 64MB of address space for 16TB of RAM). The 2MB pages do not cause a significant waste of physical or virtual memory, since few processes will be present on the system.
Page allocations will be assigned physical memory immediately (if there is enough), to avoid unnecessary and expensive page faults later. The option is given to reserve pages, though, which can be put into physical memory later.
Since the assumption is made that there are very few processes running on the system at any given time, and allocations/deallocations of pages are not frequent, the brute force TLB invalidation algorithm is sufficient, and in fact, just as efficient as elaborate algorithms for TLB invalidation. The brute force algorithm is:

  1. Interrupt all CPUs other than the current one with an indication that page tables have changed.

  2. Invalidate those entries on every CPU to ensure that they are not in the TLB.

  3. Resume all CPUs.

Although there may be many CPUs, it is probable that all of them are currently either:



  • running different threads of the same process

  • running the thread scheduler, in which case the CPU will not be interrupted and page tables will be updated immediately anyway, or

  • idle

This then means that most CPUs that can be updated need updating, so there is no significant loss in using the brute force algorithm.
Use of a page file (hence the dependency on I/O in the dependency diagram of the Organisational Overview section), if necessary, will be done in a very standard way. That is, pages that are paged-out will be marked as not present, and the bits that are then available in the corresponding entry will be used to indicate which page in the page file contains the data. Memory will only be put to the page file if necessary, or if there is idle CPU time and physical memory is 90% occupied or worse.
To keep track of the physical-to-virtual address mapping, how recently used physical pages are (with a variant of aging), and where free physical pages are, a reverse page table tree is used. Like the page tables, this structure also has fixed virtual addresses for faster lookup. Unlike the page tables, this structure must be updated at periodic intervals to update the age of used physical pages. However, if no page file is used, this update isn’t needed, and since the pages are large (2MB), this periodic interval can be very long compared to with 4KB pages, e.g. 100ms to 1 minute or longer, depending on the application.
The read-only memory pages used for common libraries will be in a fixed virtual address range, for convenient dynamic or static linking to these libraries (“static” in this case meaning that the addresses are hard-coded in the compiled application). Being read-only, fixed address, and common to all processes means that these pages never need to have their TLB entries invalidated, and so TLB misses for these libraries should almost never happen (also because of the 2MB page size). The ability to share other memory between processes may eventually be added, but that is not of concern at the moment, because the focus is on systems with few processes and many threads.
The overall virtual memory layout is as follows.

virtualmemory.png

Figure : Overall Virtual Memory Layout

Naturally, each process must have its own page table tree, but the two page directories for the memory from 2GB to 4GB will be shared between all processes.


For more detailed information on page table trees and their maintenance, please see (9).

Heap Memory Management


The code for managing heap memory in PwnOS is accessible directly from PL3 and PL0. To prevent accidental corruption of this code, and to improve performance, it is kept in read-only pages shared among all processes. Each process will have a heap, and the core code will have its own heap.
Heap memory in PwnOS is maintained using a data structure representing two AVL trees stored at the end of the heap, along with a header of general data about the heap.
heap.png
These data are stored at the end of heap memory instead of the beginning of heap memory to aid in allocation of aligned blocks of memory within the heap.
One of the two AVL trees (the “address tree”) is a tree of all ranges of memory within the usable portion of the heap, both free and allocated, sorted on the address of the range. The other AVL tree (the “free tree”) is a tree of all free ranges within the heap, sorted on the size of the range. Having these trees ensures that memory can be allocated with “best fit” (or a variant thereof), and that memory can be freed, both in O(log n) time, where n is the number of ranges. The operations of finding an allocation’s size from its address and finding the start of an allocation from an address it contains also run in O(log n) time.
However, in order to avoid significant complications and/or performance hits (both asymptotic and clock-time), the two trees must share compound nodes to represent these ranges. As a concrete example to illustrate what this means, suppose that the usable portion of the heap has the following ranges. (Suppose that ranges starting with “f” are free, and those starting with “a” are allocated.)
heapranges.png

Figure : Example Heap Memory Ranges

These ranges could have the following address tree and free tree.


heaptrees.png

Figure : Heap Memory Range Trees

Together, the trees would then be the following.


heapcompoundtree.png

Figure : Heap Memory Range Trees with Compound Nodes

Manipulation of these trees is identical to that of normal AVL trees, with the exception that upon removal of a node from the address tree, the node space left vacant must be filled by moving the node that is first in memory to that position. Additionally, the size of the free range preceding the trees must be updated upon adding and removing address nodes. If ever this free range reaches a size of zero, the heap cannot have more allocated because the trees would then intersect allocated ranges. Alternatively, another heap could be allocated in such a case, to extend the first, without requiring a significant change in the tree management.


All allocations on the heap are aligned to 16 bytes to support SSE# operations that require aligned memory operands. The AllocateAlignedMemory function allows for alignment to higher powers of two. Despite this alignment, the number of bytes requested is not rounded up to a multiple of the alignment. This is so that when checking whether a certain address is in the allocated range using GetAllocationStart and GetAllocationSize, it will be correctly determined that any bytes past the end are not allocated. Also, no range less than 16 bytes will be recorded as a free memory range.


Download 135.4 Kb.

Share with your friends:
1   2   3   4   5   6




The database is protected by copyright ©ininet.org 2024
send message

    Main page