0.2 PAE: 32- vs. 64-Bit Systems
Addressing physical memory above 4 GB requires more than the 32 bits of address offered by the standard operating mode of Intel 32-bit processors. To this end, Intel introduced a 36-bit physical addressing mode called Physical Address Extension (PAE), starting with the Intel Pentium Pro processor.
This article describes some techniques that Microsoft Windows and several UNIX operating systems use to support applications that rely on PAE addressing. Because processes running in these environments have 32-bit pointers, the operating system must manage and present PAE's 36 bits of address in a way that applications can practically use. The key question is how the operating system solves this problem. How well it handles these issues, in terms of performance, functionality, ease of programming, and reliability, determines how useful its large memory support is.
PAE is supported only on 32-bit versions of Windows; 64-bit versions of Windows do not support PAE. For information about device driver and system requirements for 64-bit versions of Windows, see 64-bit System Design. The Address Windowing Extensions (AWE) API is supported on 32-bit systems, and also on x64 systems for both native and WOW64 applications.
Although PAE is usually associated with support for more than 4 GB of RAM, it can also be enabled on Windows XP SP2, Windows Server 2003 SP1, and later 32-bit versions of Windows to support hardware-enforced Data Execution Prevention (DEP).
The information in this article applies to Windows 2000, Windows XP Professional, Windows Server 2003, and later versions of these operating systems, referred to collectively as "Windows" in this article.
0.3 Technical Background
Address Translation in Standard 32-Bit Mode
All IA-32 processors (Intel Pentium, Pentium Pro, Pentium II Xeon, and Pentium III Xeon) support 32 bits of physical address (4 GB), which lets a running application address 4 GB of virtual address space. The system must translate the 32-bit virtual addresses that applications and the operating system use into the 32-bit physical addresses used by the hardware. (The Pentium Pro was the first IA-32 processor to support PAE, but 36-bit physical addresses also require chipset support, which was usually lacking.)
Windows uses two levels of mapping to do the translation, which is facilitated by a set of data structures called page directories and page tables that the memory manager creates and maintains.
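As a minimal illustration of the two-level mapping, the Python sketch below splits a 32-bit virtual address the way IA-32 does with 4-KB pages and no PAE: 10 bits select a page directory entry, 10 bits select a page table entry, and 12 bits give the byte offset. The page directory and page tables are modeled as plain dictionaries, and the function names are hypothetical; this is an illustration of the address arithmetic, not how the Windows memory manager is written.

```python
from typing import Dict, Tuple


def split_va_32(va: int) -> Tuple[int, int, int]:
    """Split a 32-bit virtual address for standard (non-PAE) paging with 4-KB pages."""
    if not 0 <= va < 1 << 32:
        raise ValueError("virtual address must fit in 32 bits")
    pde_index = (va >> 22) & 0x3FF  # bits 31..22: one of 1,024 page directory entries
    pte_index = (va >> 12) & 0x3FF  # bits 21..12: one of 1,024 page table entries
    offset = va & 0xFFF             # bits 11..0: byte within the 4-KB page
    return pde_index, pte_index, offset


def translate_32(va: int, page_directory: Dict[int, Dict[int, int]]) -> int:
    """Walk a toy two-level structure: directory -> page table -> 20-bit frame number."""
    pde_index, pte_index, offset = split_va_32(va)
    page_table = page_directory[pde_index]  # a missing key plays the role of a not-present PDE
    frame = page_table[pte_index]           # a missing key plays the role of a not-present PTE
    if not 0 <= frame < 1 << 20:
        raise ValueError("a 32-bit physical address leaves 20 bits for the frame number")
    return (frame << 12) | offset           # 20 + 12 = 32 bits
```

For example, `split_va_32(0x00403ABC)` yields directory index 1, table index 3, and offset `0xABC`; if that table entry holds frame `0x12345`, the physical address is `0x12345ABC`.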
PSE Mode
IA-32 supports two methods to access memory above 4 GB (32 bits). PSE (Page Size Extension) was the first method; it shipped with the Pentium II. PSE offers a compatibility advantage because it keeps the page table entry (PTE) size at 4 bytes. However, the only practical way to implement it is through a driver, and this approach suffers from significant performance limitations because reading and writing memory above 4 GB requires a buffer copy. PSE mode is used in the PSE36 RAM disk usage model.
PSE uses a standard 1,024-entry page directory and no page tables, extending the page size to 4 MB (which eliminates one level of indirection in that mode). Each page directory entry (PDE) contains 14 bits of address, which, combined with the 22-bit byte index, yield the 36-bit extended physical address. Both 4-KB and 4-MB pages are supported simultaneously below 4 GB, with 4-KB pages handled in the standard way.
Note that pages located above 4 GB must use PSE mode with 4-MB pages.
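The PSE arithmetic can be sketched the same way. The hypothetical helpers below treat the 14 address bits from a PDE as a single frame number and append the 22-bit byte index of the virtual address. They assume the caller has already extracted those 14 bits and do not model where the bits sit inside a real PDE.

```python
PSE_PAGE_SHIFT = 22  # 4-MB pages: the low 22 bits of the address are the byte index
PSE_FRAME_BITS = 14  # address bits supplied by the PDE (hypothetical, pre-extracted)


def pse36_directory_index(va: int) -> int:
    """Bits 31..22 of the virtual address select one of 1,024 PDEs; no page table is used."""
    return (va >> PSE_PAGE_SHIFT) & 0x3FF


def pse36_physical(pde_frame: int, va: int) -> int:
    """Combine a 14-bit frame number from the PDE with the 22-bit byte index of `va`."""
    if not 0 <= pde_frame < 1 << PSE_FRAME_BITS:
        raise ValueError("PDE frame number must fit in 14 bits")
    if not 0 <= va < 1 << 32:
        raise ValueError("virtual address must fit in 32 bits")
    byte_index = va & ((1 << PSE_PAGE_SHIFT) - 1)
    return (pde_frame << PSE_PAGE_SHIFT) | byte_index  # 14 + 22 = 36 bits
```

Any frame number of `1 << 10` or higher places the 4-MB page at or above 4 GB, since 1,024 pages of 4 MB cover exactly the first 4 GB.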
PAE Mode
PAE is the second method for accessing memory above 4 GB, and it has been widely implemented. PAE maps up to 64 GB of physical memory into a 32-bit (4-GB) virtual address space using either 4-KB or 2-MB pages. The page directories and page tables are extended to 8-byte entries, which widens the base addresses of page tables and page frames from 20 bits to 24 bits. These four extra bits complete the 36-bit physical address.
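For PAE with 4-KB pages, IA-32 translation also gains a third level: a page directory pointer table (PDPT) with four entries sits above the page directories. Because entries are now 8 bytes, a 4-KB table holds 512 entries, so the 32-bit virtual address splits as 2 + 9 + 9 + 12 bits. The sketch below, with hypothetical function names, shows that split and how a 24-bit frame base plus the 12-bit offset forms a 36-bit physical address.

```python
from typing import Tuple

ENTRY_SIZE = 8                          # PAE directory and table entries are 8 bytes
ENTRIES_PER_TABLE = 4096 // ENTRY_SIZE  # so a 4-KB table holds 512 entries (9 index bits)
MAX_PHYSICAL = 1 << 36                  # 36-bit physical addresses span 64 GB


def split_va_pae_4k(va: int) -> Tuple[int, int, int, int]:
    """Split a 32-bit virtual address for PAE paging with 4-KB pages."""
    if not 0 <= va < 1 << 32:
        raise ValueError("virtual address must fit in 32 bits")
    pdpt_index = (va >> 30) & 0x3  # bits 31..30: one of 4 PDPT entries
    pd_index = (va >> 21) & 0x1FF  # bits 29..21: one of 512 PDEs
    pt_index = (va >> 12) & 0x1FF  # bits 20..12: one of 512 PTEs
    offset = va & 0xFFF            # bits 11..0: byte within the 4-KB page
    return pdpt_index, pd_index, pt_index, offset


def pae_physical_4k(frame: int, offset: int) -> int:
    """Combine a 24-bit page frame base from a PTE with a 12-bit byte offset."""
    if not 0 <= frame < 1 << 24:
        raise ValueError("PAE page frame base is 24 bits")
    if not 0 <= offset < 1 << 12:
        raise ValueError("offset within a 4-KB page is 12 bits")
    return (frame << 12) | offset  # 24 + 12 = 36 bits
```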
Windows supports PAE with 4-KB pages. PAE also supports 2-MB pages, a mode on which many UNIX operating systems rely. In 2-MB page mode, address translation is done without page tables: the PDE supplies the page frame address directly.
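With 2-MB pages, the page table level disappears: the low 21 bits of the virtual address are the byte offset, and the PDE supplies the remaining 15 bits of the 36-bit physical address. A minimal sketch, again with hypothetical function names:

```python
from typing import Tuple

PAGE_2MB_SHIFT = 21  # 2-MB pages: the low 21 bits are the byte offset


def split_va_pae_2m(va: int) -> Tuple[int, int, int]:
    """Split a 32-bit virtual address for PAE paging with 2-MB pages (no page table)."""
    if not 0 <= va < 1 << 32:
        raise ValueError("virtual address must fit in 32 bits")
    pdpt_index = (va >> 30) & 0x3                # bits 31..30: one of 4 PDPT entries
    pd_index = (va >> PAGE_2MB_SHIFT) & 0x1FF    # bits 29..21: one of 512 PDEs
    offset = va & ((1 << PAGE_2MB_SHIFT) - 1)    # bits 20..0: byte within the 2-MB page
    return pdpt_index, pd_index, offset


def pae_physical_2m(frame: int, offset: int) -> int:
    """The PDE supplies a 15-bit frame number directly; 15 + 21 = 36 bits."""
    if not 0 <= frame < 1 << 15:
        raise ValueError("2-MB frame number must fit in 15 bits")
    if not 0 <= offset < 1 << PAGE_2MB_SHIFT:
        raise ValueError("offset within a 2-MB page is 21 bits")
    return (frame << PAGE_2MB_SHIFT) | offset
```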
0.4 Operating System Implementation and Application Support
The next issue is how the operating system can manage and present PAE's 36 bits of address in such a way that an application (with 32-bit pointers) can practically use the additional memory.
There are five application support models. The first two (Server Consolidation and Large Cache) are handled entirely within the operating system and require no application changes. The next two (Application Windowing and Process Fork) require applications to be changed to use API extensions for large memory. The last model (PSE36 RAM Disk) is implemented in a driver and requires no changes to the operating system, but applications must be changed to use the driver.
1. Server Consolidation
A PAE-enabled operating system should be able to use all of the physical memory in the system to load multiple applications (App #1, App #2, ..., App #N), each with up to 4 GB of virtual address space. On a system without PAE, the result can be a great deal of paging, because physical memory is limited to 4 GB.
With the additional physical memory available in PAE mode, an operating system can keep more of these applications in memory without paging. This is valuable for server consolidation, where a single server typically must host multiple applications. Note that no application changes are required to support this capability.
2. Large Cache
An operating system can also use the additional memory made available by PAE as a data cache. If the operating system supports this feature, applications need not be recoded to take advantage of it. Windows Advanced Server and Datacenter Server support caching on a PAE platform and can use all of the available memory.
3. Application Windowing
A PAE-enabled operating system can introduce an API that gives a properly coded application access to physical memory anywhere in the system, even above 4 GB, through a "window" in its virtual address space. Ideally, the API for allocating "high" physical memory and for creating or moving the window should be fast and simple to use. This model is highly advantageous for applications that need fast access to large amounts of data in memory.
Sharing high memory between processes can add considerable complexity to the API and its implementation, so Windows avoids this kind of sharing.
In addition, supporting paging of high memory makes the operating system much harder to design and implement, and makes deterministic performance harder to achieve. Windows avoids paging high memory as well.
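To make the windowing idea concrete, here is a toy Python model, not the Windows API: a small "window" of page-sized slots stands in for a process's virtual address range, and a larger pool of page frames stands in for physical memory that may lie above 4 GB. The class and method names are hypothetical. In Windows, the corresponding AWE functions are `AllocateUserPhysicalPages` and `MapUserPhysicalPages`; the model only loosely mimics their effect and, like Windows, supports neither sharing nor paging of the pool.

```python
from typing import List, Optional, Tuple

PAGE_SIZE = 4096  # assumed page size for the toy model


class ToyWindowedMemory:
    """Hypothetical model of application windowing (not a real API)."""

    def __init__(self, total_frames: int, window_slots: int) -> None:
        # The pool of page frames stands in for (possibly high) physical memory.
        self.frames = [bytearray(PAGE_SIZE) for _ in range(total_frames)]
        self.allocated = 0
        # Each window slot holds the number of the frame it maps, or None if unmapped.
        self.window: List[Optional[int]] = [None] * window_slots

    def allocate_frames(self, count: int) -> List[int]:
        """Hand out the next `count` unused frames from the pool."""
        if self.allocated + count > len(self.frames):
            raise MemoryError("not enough frames in the pool")
        start = self.allocated
        self.allocated += count
        return list(range(start, start + count))

    def map_window(self, slot: int, frame: Optional[int]) -> None:
        """Point one window slot at an allocated frame, or unmap it with None."""
        if frame is not None and not 0 <= frame < self.allocated:
            raise ValueError("frame has not been allocated")
        self.window[slot] = frame  # only the mapping changes; no data is copied

    def _locate(self, window_offset: int) -> Tuple[bytearray, int]:
        slot, offset = divmod(window_offset, PAGE_SIZE)
        frame = self.window[slot]
        if frame is None:
            raise ValueError("window slot is not mapped")
        return self.frames[frame], offset

    def write_byte(self, window_offset: int, value: int) -> None:
        page, offset = self._locate(window_offset)
        page[offset] = value

    def read_byte(self, window_offset: int) -> int:
        page, offset = self._locate(window_offset)
        return page[offset]
```

Moving the window is only an update of a slot's frame number; the data in the pool stays where it is, which is what lets a well-designed windowing API remap quickly.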
4. Process Fork and Shared Memory
This application support model splits the current process into two or more nearly identical copies. The user and system stacks, the allocated data space, and the registers are all copied. The major difference is that one copy (the parent) keeps the original process ID (PID), while the other (the child) gets a new PID. The fork call returns a PID value: zero in the child, and the child's PID in the parent.
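These fork semantics can be observed with Python's `os.fork`, which wraps the POSIX `fork` call and is unavailable on Windows. In this sketch (the function name `fork_demo` is hypothetical), the child changes a variable and reports its own PID through a pipe; the parent confirms that `fork` returned the child's PID and that the change stayed in the child's copy.

```python
import os
from typing import Tuple


def fork_demo() -> Tuple[int, int, int, int]:
    """Return (pid seen by parent, pid reported by child, parent's value, child exit code)."""
    data = [0]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork returned zero. Always leave via os._exit so the child
        # never continues running the parent's code.
        try:
            os.close(read_fd)
            data[0] = 42  # modifies only the child's copy of the data space
            os.write(write_fd, str(os.getpid()).encode())
        finally:
            os._exit(0)
    # Parent: fork returned the child's PID.
    os.close(write_fd)
    child_reported = int(os.read(read_fd, 64).decode())
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    return pid, child_reported, data[0], os.WEXITSTATUS(status)
```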
5. PSE36 RAM Disk
Through use of a kernel device driver, much like a RAM disk, it is possible to use memory above 4 GB with no change whatsoever to the operating system. Compatibility between the base operating system (running in 32-bit mode) and the driver (using PSE36 mode) is maintained because the page tables keep their 4-byte entries. This very low development impact comes with several trade-offs:
• Performance degrades because all I/O must be double buffered.
• The impact on application development is not appreciably less than that of the current APIs.
• It cannot be used as a "consolidation server" because all applications share the same 4 GB physical memory space.
Design Implementation
Operating system implementations of large memory support must address these issues directly in order to succeed. The design choices made in handling them directly affect the simplicity, reliability, and performance of the operating system.