Paged VMM HM
Paged Virtual Memory Management (VMM) (11/05/2003)
With the advent of 64-bit address architectures, the ancient principle of paged Virtual Memory Management (VMM) experienced a revival in the early 2000s. Originally, the driving force behind VMM systems (paged, segmented, or both) was the hunger for more addressable program memory than was available in physical memory. The scarcity was caused by the high cost of memory: early core memories carried a high price tag due to the tedious manual labor involved in making them. The days of high cost per byte of memory are gone, but the days of having insufficient physical memory have returned with 64-bit addresses. What applications are sufficiently served with just a few Gigabytes of physical memory? Only trivial ones.
Paged VMM is based on the idea that, when physical storage is small, some memory areas can be relocated out to disk while other areas are moved back from disk into memory when needed. Disk space abounds while main memory is limited. This relocation in and out, called swapping, can be handled transparently, thus imposing no additional constraint on the application programmer. The system must detect situations in which an address references an object that is on disk, and must then perform the hidden swap-in automatically.
Paged VMM trades speed for address range. The loss in speed is caused by the need to map logical addresses to physical ones, and by the more than occasional disk accesses. A typical disk access can be hundreds of thousands to millions of times more expensive, measured in cycles, than a memory access (a load or a store operation). However, if virtual memory mapping allows large programs to run, albeit slowly, that previously could not execute at all due to their high memory demands, then the trade-off seems worth the loss in speed. The real trade-off is between executing memory-hungry programs slowly and not executing them at all.
Synopsis
-
Steps of Paged Virtual Memory Management
-
The Mapping Steps (Two-Level Mapping)
-
Definitions
-
History of Paging
-
Goals and Methods of Paging
-
An Unrealistic Paging Scheme
-
A Realistic Paging Scheme for 32-bit Architecture
-
Typical PD and PT Entries
Steps of Paged Virtual Memory Management
-
Instruction references a logical address
-
VMM determines whether the logical address maps onto a resident page
-
If yes, the memory access completes. In a system with L1 cache, such an access is fast
-
If such an access is a store operation (aka write), that fact is recorded
-
If not on a resident page, the corresponding swapped-out page is found on disk and made available in memory; if the page has never existed, it is created
-
Making a new page available requires finding memory space (page frame)
-
If such space can be allocated from unused memory (usually during initial program execution), it is now reserved
-
If no page frame is available, a currently resident page is swapped out and the freed space is reused for the new page
-
Should the page to be swapped out be dirty, it must be written to disk
-
Otherwise a copy already exists on disk and the swap-out requires no disk write
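The steps above can be sketched as a small simulation. This is only an illustrative model, not an actual operating system mechanism: the class name `PagedMemory`, the FIFO choice of the page to swap out, and the decision to treat a newly created page as dirty (because no disk copy exists yet) are all assumptions made for the example.

```python
from collections import OrderedDict


class PagedMemory:
    """Toy model of demand paging with a fixed number of page frames."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()  # page number -> dirty flag, oldest first
        self.on_disk = set()           # pages that have a copy on disk
        self.page_faults = 0
        self.disk_reads = 0
        self.disk_writes = 0

    def access(self, page, is_store=False):
        """Reference a page; return "hit" or "fault"."""
        if page in self.resident:
            if is_store:
                self.resident[page] = True  # record the write (dirty bit)
            return "hit"

        self.page_faults += 1
        if len(self.resident) >= self.num_frames:
            # No free page frame: swap out a resident page (FIFO is an assumption).
            victim, dirty = self.resident.popitem(last=False)
            if dirty:
                self.disk_writes += 1       # dirty page must be written to disk
                self.on_disk.add(victim)
            # A clean page already has a disk copy, so nothing is written.

        if page in self.on_disk:
            self.disk_reads += 1            # swap-in from disk
            dirty = is_store
        else:
            dirty = True                    # created for the first time, no disk copy yet
        self.resident[page] = dirty
        return "fault"
```

With two page frames, the reference string 0, 1, 0, 2 produces three page faults and one hit, and the fourth reference forces page 0 out to disk.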
The Mapping Steps (Two-Level Mapping)
-
Instruction references a logical address (la)
-
Processor finds start address of Page Directory (PD)
-
Logical address is partitioned into three bit fields, Page Directory Index, Page Table (PT) Index, and Page Offset
-
Entry in PD is found by adding PD Index left-shifted by 2 to the start address of PD; this yields a PT address
-
Add PT Index left-shifted by 2 to previously found PT address; this yields Page Address
-
Add Page Offset to previously found Page Address; this yields byte address
-
Along the way there may have been up to 3 page faults: on the PD page, the PT page, and the user page
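The mapping steps can be sketched in a few lines. The sketch assumes a 32-bit logical address split into a 10-bit PD Index, a 10-bit PT Index, and a 12-bit Page Offset, which follows from 4 kB pages and 4-byte entries; the function names and the use of a dictionary as stand-in for physical memory are hypothetical, and present bits and page faults are ignored.

```python
OFFSET_BITS = 12   # 4 kB pages
INDEX_BITS = 10    # 1024 four-byte entries fit in one 4 kB page
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def split_logical_address(la):
    """Partition a logical address into (PD index, PT index, page offset)."""
    offset = la & OFFSET_MASK
    pt_index = (la >> OFFSET_BITS) & INDEX_MASK
    pd_index = (la >> (OFFSET_BITS + INDEX_BITS)) & INDEX_MASK
    return pd_index, pt_index, offset


def translate(la, pd_base, memory):
    """Walk PD and PT; `memory` maps physical entry addresses to 32-bit entries."""
    pd_index, pt_index, offset = split_logical_address(la)
    # Index left-shifted by 2 because each entry is 4 bytes wide.
    pt_base = memory[pd_base + (pd_index << 2)] & ~OFFSET_MASK
    page_base = memory[pt_base + (pt_index << 2)] & ~OFFSET_MASK
    return page_base + offset
```

For example, with the PD at 0x1000, a PD entry at 0x1004 holding 0x5000, and a PT entry at 0x500C holding 0x9000, the logical address 0x00403ABC splits into indices 1 and 3 with offset 0xABC.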
Definitions
Demand Paging:
Policy that allocates a page in physical memory only if an address on that page is actually referenced (demanded) in the executing program.
Dirty Bit:
Data structure (single-bit suffices) that tells whether the associated page was written after its last swap-in (or creation).
Global Page:
A page that is used in more than one program; typically found in multi-programming environment with shared pages.
Logical Address:
Address as defined by the architecture. Synonym on Intel architecture: Linear address. Antonym: physical address.
Page:
A portion of logical memory that is fixed in size. The start address of a page is an integer multiple of the page size. Antonym: Segment. A logical page is placed into a physical page frame.
Page Frame:
A portion of physical memory that is fixed in size to one page. It starts at a boundary that is evenly divisible by the page size. Total physical memory should be an integral multiple of the page size.
Page Directory:
A list of addresses of Page Tables. Typically this directory consumes an integral number of pages as well. In addition to page table addresses, each entry also contains information about presence, access rights, written to or not, global, etc. similar to Page Table entries.
Page Directory Base Register (pdbr):
Resource (typically a register) that holds the address of the Page Directory page.
Page Table Base Register (ptbr):
Resource (typically a register) that holds the address of the Page Table. Used in a single-level paging scheme; a dual-level scheme uses the pdbr instead.
Page Fault:
Logical address references a page that is not resident. Consequently, space must be found for the page referenced, and that page must be (created or) swapped in.
Page Frame:
A physical piece of memory, of page-size, and page-size aligned, into which a page of actual information can be placed.
Page Table:
A list of addresses of Pages. Typically each page table consumes an integral number of pages. In addition to page addresses, each entry also contains information about presence, access rights, written to or not, global, etc. similar to Page Directory entries.
Physical Memory:
Main memory actually available physically on a processor. Antonym: Logical memory.
Present Bit:
Single-bit data structure that tells whether the associated page is resident or swapped out onto disk.
Resident:
Attribute of a memory object referenced by executing code: if the object is physically in memory it is resident; otherwise it is non-resident.
Swap-in:
Transfer of a page of information from secondary storage to primary storage (into a page frame in memory); from disk to physical memory.
Swap-out:
Transfer of a page of information from primary to secondary storage; from physical memory to disk.
Thrashing:
Excessive amount of swapping. When this happens, performance is severely degraded. This indicates that fewer page frames are allocated than the working set requires.
Translation Look-Aside Buffer:
Special-purpose cache for storing Page Directory and Page Table entries.
Virtual Contiguity:
Memory management policy that separates physical from logical memory. In particular, virtual contiguity creates the impression that two addresses are adjacent in main memory, while in reality they are an arbitrary number of physical locations apart from one another.
Virtual Memory:
Memory management policy that separates physical from logical memory. In particular, virtual memory can create the impression that a larger amount of memory is addressable than is really available on the target.
Working Set:
That number of allocated physical page frames that ensures that the program executes without thrashing. The amount of physical memory to execute a system must exceed the working set by the amount of memory necessary for all other system functions.
History of Paging
-
Invented about 1960 at University of Manchester for Atlas Computer
-
Used commercially in KDF-9 computer of English Electric Co.
-
Ever since claimed to have been invented by most computer manufacturers
-
In fact, KDF-9 was one of the major architectural milestones in computer history aside from von Neumann’s and Atanasoff’s machines; incorporated first cache and VMM, had hardware display for stack frame addresses, etc.
-
In the late 1960s to early 1980s, memory was expensive, processors were expensive and getting fast, and programs grew large. Insufficient memories were common and were one aspect of the growing Software Crisis of the late 1970s
-
16-bit minicomputers and 32-bit mainframes became common; also 18-bit address architectures (CDC and Cyber) of 60-bit words were common
-
Paging grew increasingly popular: fast execution was gladly traded for a large address range
-
By the mid 1980s, memories had become cheaper and faster, and large ones prevailed
-
By the late 1980s, memories had become cheaper yet, the address range remained 32-bit, and large physical memories became possible and available
-
Several supercomputers were designed with operating systems that provided no memory virtualization at all, no memory mapping. For example, Cray systems were built without VMM. The Intel Hypercube NX ® operating system had no virtual memory management
-
Just when VMM was falling into disfavor, the addressing limitation of 32 bits started constraining programs. In the early 1990s, 64-bit architectures started becoming commonplace rather than an exception
-
Intermediate steps between architectural jumps: The Harris 3-byte system with 24-bit addresses did not quite make the jump to 32 bits. The Pentium Pro ® with 36 bits in extended addressing mode did not quite make the jump to 64-bit addresses. Even the early Itanium ® family processors had only 44 physical address bits, not quite reaching the full 64-bit physical address space
-
By the end of the 1990s, 64-bit addresses were common, and 64-bit integer arithmetic was performed in hardware, no longer via slow library extensions
-
Pentium Pro has 4kB pages by default, 4MB pages if page size extension (pse) bit is set in normal addressing mode
-
However, if the physical address extension (pae) bit is set, the large page size is 2 MB instead of 4 MB
Goals and Methods of Paging (VMM)
-
Make full logical (virtual) address space available, even if smaller physical memory installed
-
Perform mapping transparently. Thus, if at a later time the target receives a larger physical memory, the same program will run unchanged, only faster
-
Map logical onto physical address, even if this costs some performance
-
Implement the mapping in a way that the overhead is small in relation to the total program execution
-
A necessary requirement for low overhead is a sufficiently large working set, but the mapping data must also be quickly accessible:
-
This is accomplished by caching Page Directory and Page Table entries
-
Or by placing the complete Page Directory into a special cache
An Unrealistic Paging Scheme
-
Assume 32-bit architecture, byte-addressable, 4-byte words, 4kB page size
-
Single-level paging mechanism; multi-level mapping is covered later
-
With 4kB page size, the rightmost 12 bits of each page address are all 0 and therefore implied
-
Page Table thus has 2^20 entries (32 - 12 = 20 index bits), each entry typically 4 bytes
-
Page offset of 12 bits identifies each byte on a 4kB page
-
With each Page Table entry consuming 4 bytes, the result is a page table of 4 MB
-
This is not a good paging scheme, since the page table alone already consumes 4 MB of the scarce resource, physical memory. Note that Page Tables should be resident as long as some entries point to resident user pages
-
Thus the overhead may exceed the total available resource - memory
-
Moreover, almost all entries are empty; their associated pages do not exist yet, pointers are null
-
Problem: The one data structure that should be resident is too large, consuming all/most of the physical memory that is so scarce
-
So: break it into smaller units!
-
Disadvantage of additional mapping: more memory accesses
-
Advantage: Avoiding large 4MB Page Table
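The arithmetic behind the 4 MB figure, and the saving from breaking the table into page-sized units, can be checked with a short calculation. The helper `two_level_overhead` is a hypothetical name; it assumes one page for the Page Directory plus one page for each Page Table actually in use.

```python
ADDRESS_BITS = 32
PAGE_SIZE = 4 * 1024      # 4 kB
ENTRY_SIZE = 4            # bytes per Page Table entry

offset_bits = PAGE_SIZE.bit_length() - 1   # 12 bits address a byte within a page
index_bits = ADDRESS_BITS - offset_bits    # 20 bits select the page
entries = 1 << index_bits                  # 2^20 entries
table_bytes = entries * ENTRY_SIZE         # 4 MB for the single-level table


def two_level_overhead(num_page_tables):
    """Bytes used by one Page Directory page plus the Page Tables in use."""
    return PAGE_SIZE * (1 + num_page_tables)


print(table_bytes // (1024 * 1024), "MB single-level")
print(two_level_overhead(1) // 1024, "kB two-level with one Page Table in use")
```

A small program whose pages all fall within one 4 MB region needs only one Page Table, so the mapping overhead drops from 4 MB to 8 kB. If all 1024 Page Tables are in use, the two-level scheme costs slightly more than the single table.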
A Realistic Paging Scheme for 32-bit Architecture
-
Again: 32-bit architecture, byte-addressable, 4-byte words, 4kB page size
-
Two-level paging mechanism, consisting of Page Directory and Page Tables in addition to user pages
-
With 4kB page size, the rightmost 12 bits of any page address are again 0 and therefore implied
Mechanism to determine a physical address:
-
Have HW register point to Page Directory
-
Or else implement Page Directory as a special-purpose cache
-
Or place PD into a-priori known memory location
-
But there must be a way to locate the PD, best without extra memory access
-
Design principle: have user pages, pages of Page Tables, and pages of Page Directory look the same: all consume a page frame each
-
Every logical address is broken into three parts: two indices + one offset
-
Note that Page Directory Index is indeed an index; to find the entry in PD, first << 2 (multiply by 4), add this to the start address of PD, then PD entry found
-
Similarly, Page Table Index is an index; to find the entry: << 2 and add result to the start address of PT found in previous step, thus entry in PT found
-
The PT entry holds the address of the user page; its rightmost 12 bits are implied to be all 0
-
The rightmost 12 bits of a logical address define the page offset within the found user page
-
Since pages are 4 kB in size, 12 bits suffice to identify any byte in a page
-
Given that the address of a user page is found, add the offset to its address, and the final byte address is identified
-
Disadvantage: Multiple memory accesses, total of 3
-
In fact, any of these 3 accesses could cause a page fault, resulting in up to 3 swap-ins
-
Performance loss could be tremendous, i.e. several decimal orders of magnitude slower than a single memory access
-
Thus some of these data structures must be cached
-
In Intel Pentium Pro ® the Translation Look-Aside Buffer (TLB) is a special purpose cache of PD or PT entries
-
It is also possible to cache the complete PD, since it occupies only a single 4 kB page
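The effect of caching can be illustrated with a sketch that counts accesses to the PD and PT. It is a simplification under stated assumptions: the class `Translator` is hypothetical, the cache stores complete page-number-to-frame translations rather than individual PD or PT entries, replacement simply evicts the oldest cached translation, and a 10/10/12 split of the logical address is assumed.

```python
class Translator:
    """Two-level address translation with a tiny translation cache (TLB-like)."""

    def __init__(self, pd_base, memory, tlb_capacity=4):
        self.pd_base = pd_base
        self.memory = memory            # physical entry address -> 32-bit entry
        self.tlb = {}                   # logical page number -> page frame address
        self.tlb_capacity = tlb_capacity
        self.table_accesses = 0         # memory accesses spent on PD and PT

    def _read_entry(self, address):
        self.table_accesses += 1
        return self.memory[address]

    def translate(self, la):
        page_number, offset = la >> 12, la & 0xFFF
        if page_number in self.tlb:
            # Cache hit: no PD or PT access is needed.
            return self.tlb[page_number] + offset

        pd_index, pt_index = page_number >> 10, page_number & 0x3FF
        pt_base = self._read_entry(self.pd_base + (pd_index << 2)) & ~0xFFF
        frame = self._read_entry(pt_base + (pt_index << 2)) & ~0xFFF

        if len(self.tlb) >= self.tlb_capacity:
            # Evict the oldest cached translation (dicts keep insertion order).
            self.tlb.pop(next(iter(self.tlb)))
        self.tlb[page_number] = frame
        return frame + offset
```

The first reference to a page costs two table accesses in addition to the data access itself; subsequent references to the same page cost none, as long as the translation stays cached.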
Typical PD and PT Entries
-
Entries in PD and PT need 20 bits for address; lower 12 bits implied
-
Other 12 bits can be used for additional information
-
P-Bit: Is referenced page present? (aka resident)
-
R/W/E bits: Can the referenced page be read, written, executed, or all of these?
-
User/Supervisor bit: OS dependent, is page reserved for privileged code?
-
Typically the P-Bit is positioned for quick, easy access: rightmost bit in 4-byte word
-
Some information is unique to PD or PT
-
For example, P-Bit not needed in PD entries, if Page Tables must be present
-
On systems with varying page sizes, page size information must be recorded
-
See examples below:
If the Operating System exercises a policy of never swapping out Page Tables, then the respective PT entries in the PD do not need to track whether a PT was modified (i.e. the Dirty bit). Hence there may be reasons for PT and PD entries to have minor differences. But in general, the format and structure of PTs and PDs are identical, and user pages, PTs and PDs are all the same size.
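A possible entry layout can be sketched as follows. Only the 20-bit address in the upper bits and the P-Bit in the rightmost position come from the description above; the positions of the other flag bits and all names are assumptions chosen for illustration.

```python
# Flag positions: PRESENT in bit 0 as described; the others are assumed.
PRESENT = 1 << 0
WRITABLE = 1 << 1
USER = 1 << 2
DIRTY = 1 << 6

ADDRESS_MASK = 0xFFFFF000   # upper 20 bits hold the page (or PT) address


def make_entry(frame_address, flags):
    """Combine a 4 kB-aligned address with flag bits into one 32-bit entry."""
    if frame_address & ~ADDRESS_MASK:
        raise ValueError("address must be 4 kB aligned and fit in 32 bits")
    return frame_address | flags


def frame_address(entry):
    """Recover the address; the lower 12 bits are implied to be 0."""
    return entry & ADDRESS_MASK


def is_present(entry):
    return bool(entry & PRESENT)


def is_dirty(entry):
    return bool(entry & DIRTY)
```

Because the address is page aligned, its lower 12 bits are free to carry the flags, so address and flags share one 4-byte word without losing information.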