Memory sizes of Microsoft Windows Operating Systems


Technical Issues with Large Memory Support in IA32






Memory Sharing and Inter-Process Communications
Wherever memory remapping is used to allocate memory to processes, as is common to many PAE variants, memory sharing is problematic. The physical memory being remapped lies "outside" the process's virtual address space, so it is less connected to the process in the sense of sharing the process's internal access and security controls, as well as those provided by the operating system.

Applying access and security controls to such memory greatly increases the bookkeeping required of the operating system memory manager, as well as the API set the application developer must use. This undermines the high performance that fast remap operations otherwise make possible. It is also important to remember that IPC and memory sharing can still take place between two processes' virtual address spaces in any case, regardless of the physically mapped memory each may be using.



TLB Shoot-down
A Translation Lookaside Buffer (TLB) is a processor cache that holds recently used logical-to-physical page-table translations. Once the TLB is loaded, the processor needs to read the page directories only infrequently (on TLB misses) unless a task switch occurs.

During a remap operation, every processor must be left with valid logical-to-physical mappings on chip. Remap operations therefore require a TLB shoot-down, because the remap invalidates the existing logical-to-physical association (where "logical" means the application/process view of memory).

There is a performance impact while the processor (or processors) reload the TLB. All operating systems have this issue, and in the case of PAE memory support, they ameliorate it in different ways:

- Windows allows a single application to "batch" the required remap operations so that they all happen at once, causing only one TLB shoot-down and one performance dip instead of many scattered remaps, each with its own performance cost. This is quite adequate for large applications, which typically run on single-purpose systems.

- Other operating systems provide "victim" buffers or allow one process to share another process's mappings, but at the cost of more synchronization and API complexity.

Windows XP also provides this "batch" (Scatter/Gather) functionality, and the performance of these operations has been improved in Windows Server 2003, Enterprise Edition and Datacenter Edition.
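The benefit of batching can be illustrated with a toy cost model. Everything here is hypothetical (the `ToyCpu` class and remap routines are not Windows APIs); the point is only that N individual remaps pay for N shoot-downs, while one batched remap pays for a single shoot-down:

```python
# Toy model of TLB shoot-down cost. Unbatched remaps flush the TLB
# once per operation; a batched remap flushes once for the whole set.
# All names are illustrative, not Windows APIs.

class ToyCpu:
    def __init__(self):
        self.tlb_flushes = 0      # shoot-downs paid for so far

    def flush_tlb(self):
        self.tlb_flushes += 1

def remap_individually(cpu, remaps):
    for _virtual, _physical in remaps:
        # ... update page tables for this one mapping ...
        cpu.flush_tlb()           # each remap triggers its own shoot-down

def remap_batched(cpu, remaps):
    for _virtual, _physical in remaps:
        pass                      # ... update page tables, no flush yet ...
    cpu.flush_tlb()               # one shoot-down covers the whole batch

# Sixteen page remaps, done both ways.
remaps = [(v, v + 0x1000) for v in range(0, 0x10000, 0x1000)]

cpu_a, cpu_b = ToyCpu(), ToyCpu()
remap_individually(cpu_a, remaps)
remap_batched(cpu_b, remaps)
print(cpu_a.tlb_flushes, cpu_b.tlb_flushes)  # 16 1
```

The real cost of each flush (reloading translations on every processor) is what makes the difference between 16 shoot-downs and 1 significant on a busy multiprocessor system.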

I/O
To one degree or another, all the PAE variants support both 32-bit and 64-bit DMA I/O devices and their attendant drivers. However, there are a number of provisos and conditions.

Kernel and memory organization
Typically, kernel memory space organization is unchanged from the standard kernel for the operating system. In many cases, items such as the memory pool size remain the same. For backward compatibility, PCI base address registers (BARs) remain the same. Larger memory sizes cause some shifting of kernel address space, usually when between 16 GB and 32 GB of memory is physically present in the system.

One difference between operating systems is whether memory allocations are dynamic:

- Some operating systems require the administrator to configure the amount of memory used for various purposes (caching, mapping, consolidation, and so on).

- Windows does not require the administrator to configure memory allocations, because usage is dynamic, within the constraints of the APIs used.

Hardware Support
The PCI standard provides a method whereby adapters can physically address more than 4 GB of memory by sending the high 32 bits and the low 32 bits of the address in two separate address cycles. This is called Dual Address Cycle (DAC), and it is used both by 32-bit adapters that understand 64-bit addresses but have only 32 address lines and by adapters that do have 64 address lines. It is a backward compatibility feature.

Given the way PCI addresses memory beyond 32 bits, there is a subtle failure mode. Any I/O range that spans two 4-GB regions must be treated specially. If it is not, the address will be decoded correctly for only one part of the transfer, and the remaining part will land at an incorrect memory location, corrupting memory and crashing the system, crashing the application, or silently corrupting data at that location. Applications cannot prevent this because they see only virtual addresses and have no visibility into the physical level. All operating systems that use PAE face this problem, but some do not explicitly prevent it, depending instead on the device driver to take the correct actions.

Windows, however, explicitly prevents this problem. When an I/O range spans in this fashion, Windows returns two separate addresses and ranges to the device and driver. The final special case is a transfer that starts below 4 GB and extends beyond it: no DAC is required for the portion below 4 GB, but DAC is required for the rest of the transfer. Again, Windows returns two separate addresses and ranges in this case to prevent memory corruption.
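The boundary handling described above can be sketched as a splitting routine. The helper below is hypothetical (it is not the Windows HAL interface); it simply cuts a physical I/O range at every 4-GB line so that each piece has a constant high 32 bits of address:

```python
FOUR_GB = 1 << 32

def split_dma_range(base, length):
    """Split the physical range [base, base+length) at 4-GB boundaries.

    Each returned (address, length) piece lies entirely within one
    4-GB region, so its high 32 address bits are constant and it can
    be programmed with a single DAC (or non-DAC, if below 4 GB) setup.
    Hypothetical helper for illustration, not a Windows API.
    """
    pieces = []
    while length > 0:
        room = FOUR_GB - (base % FOUR_GB)   # distance to next 4-GB line
        chunk = min(length, room)
        pieces.append((base, chunk))
        base += chunk
        length -= chunk
    return pieces

# A 16-KB transfer that starts 8 KB below the 4-GB line is split in two;
# the first piece needs no DAC, the second does.
for addr, ln in split_dma_range(0xFFFFE000, 0x4000):
    print(hex(addr), hex(ln))
# 0xffffe000 0x2000
# 0x100000000 0x2000
```

A range that sits entirely within one 4-GB region comes back unchanged as a single piece, matching the common case described in the text.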

Obviously, DAC-capable or 64-bit adapters and drivers provide the best performance, because no buffering of I/O occurs. Buffering is required, however, whenever the adapter and driver cannot use more than 32 bits of address information. All operating systems that use PAE-mode addressing support this "double buffering" in some fashion as a backward compatibility feature. The buffering carries a performance penalty that depends on several factors:

- Adapter hardware performance

- Driver performance

- Operating system support provided for double buffering

- Amount of physical memory installed in the system

As physical memory increases, a growing share of I/O targets addresses beyond 32 bits relative to those below 32 bits. In most cases the operating system provides double buffering transparently, although some Unix variants provide no assistance here and require 32-bit devices and drivers to manage their own double-buffering routines and allocations.
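A minimal sketch of the decision behind double buffering (the helper name and return values are hypothetical, for illustration only): when a device cannot decode the full physical address of a buffer, the OS must stage the data through a "bounce" buffer that lies within the device's reach:

```python
def plan_transfer(phys_addr, length, device_addr_bits=32):
    """Decide how a DMA transfer must be performed.

    Returns 'direct' when the whole buffer is reachable by the device,
    or 'bounce' when the OS must first copy the data to a buffer below
    the device's addressing limit (double buffering). Hypothetical
    helper for illustration; real kernels do this in the DMA layer.
    """
    limit = 1 << device_addr_bits
    if phys_addr + length <= limit:
        return "direct"
    return "bounce"

# The same 8-KB buffer sitting just above 4 GB:
# a 64-bit (or DAC-capable) device reaches it directly,
# while a 32-bit-only device needs the extra copy.
buf = ((1 << 32) + 0x1000, 0x2000)
print(plan_transfer(*buf, device_addr_bits=64))  # direct
print(plan_transfer(*buf, device_addr_bits=32))  # bounce
```

The extra copy in the "bounce" case is exactly the performance penalty the factors above determine; as more of physical memory lies above 4 GB, more transfers fall into that case.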

Driver Issues
Typically, device drivers must be modified in a number of small ways. Although the actual code changes may be small, they can be difficult, because without PAE memory addressing a device driver can assume that physical addresses fit within the same 32-bit limits as virtual addresses. PAE makes this assumption untrue.

Several assumptions and shortcuts that could previously be used safely no longer apply. In general, these fall into three categories:

- Buffer alignment: code that allocates and aligns shared memory buffers must be modified so that it does not ignore the upper 32 bits of the physical address.

- Address truncation: address information must not be truncated in any of the many places it may be stored.

- Segregation of address types: virtual and physical address references must be strictly segregated so that DMA operations do not transfer data to or from random memory locations.
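The truncation hazard can be reproduced in a few lines. The function below deliberately models the bug (it is not real driver code): storing a PAE physical address in a legacy 32-bit field silently drops the high bits, so a later DMA would target a completely different page:

```python
def store_in_32bit_field(phys_addr):
    """Model a driver structure whose address field is only 32 bits wide.

    With PAE, physical addresses can exceed 32 bits; the mask models
    the silent truncation that occurs when such an address is stored
    in a legacy 32-bit field. Deliberately buggy, for illustration.
    """
    return phys_addr & 0xFFFFFFFF

good_addr = 0x1_2345_6000            # a physical page above 4 GB
stored = store_in_32bit_field(good_addr)

print(hex(good_addr))                # 0x123456000
print(hex(stored))                   # 0x23456000 (a different page!)
print(stored == good_addr)           # False: DMA here corrupts memory
```

Nothing at the point of truncation fails visibly, which is why the resulting corruption is so hard to diagnose; addresses below 4 GB pass through unchanged and hide the bug until large memory is installed.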

PAE mode can be enabled on Windows XP SP2, Windows Server 2003 SP1, and later versions of Windows to support hardware-enforced DEP. However, many device drivers designed for these systems may not have been tested on configurations with PAE enabled. To limit the impact on device driver compatibility, changes were made to the hardware abstraction layer (HAL) in Windows XP SP2 and Windows Server 2003 SP1 Standard Edition to limit the physical address space to 4 GB. Driver developers are encouraged to read about DEP.

Paging
Most operating systems that support PAE also support some form of virtual memory paging for the physical memory beyond 4 GB. This usually comes with restrictions, such as limiting the boot/system paging file to 4 GB or spreading the paging file (or files) across multiple operating-system-organized volumes (not necessarily physical spindles).

Although this allows the obvious benefits of virtual memory, the downside is the performance impact on applications that have one or more of the following characteristics:

- Use a large amount of physical memory for their data sets

- Do a great deal of I/O

- Have large executable working sets

Finally, paging support typically comes at the expense of increasing the API set and slowing development and version migration.

User APIs
All operating systems supporting PAE have APIs that allow processes to use physical memory beyond the virtual address range possible on IA-32 processors. These differ primarily in how much support they provide for the items described earlier: memory sharing, inter-process communication, paging, and so on. Windows provides a simple and straightforward API set, the Address Windowing Extensions (AWE), which consists of only five API calls; the most complex competing API set is four times larger and involves both kernel- and user-level calls.

The proliferation of proprietary APIs, some of which are tied directly to the processor architecture (at kernel level), makes porting applications from one Unix variant to another expensive, time-consuming, and a constant struggle to balance cost against performance optimization. Windows provides an API set that is simple, fast, and completely portable between 32-bit and 64-bit hardware platforms, requiring only a recompile to function.



Page Size
Almost all operating systems supporting PAE use differing page sizes when providing physical memory beyond 4 GB to an application. The primary exception is Windows, which presents only 4-KB pages to applications on IA-32 platforms (this differs on Itanium-based platforms).

The issue with varying page sizes is the additional complexity applications need in order to function correctly with differing memory allocation sizes, as well as subtle effects arising from the assumptions almost all applications make about page size. Although research shows that a small class of applications can benefit from larger page sizes (2 MB or 4 MB), because each TLB entry then spans a greater address range, as a general rule applications do not benefit from larger page sizes.
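The TLB-reach argument behind larger pages is simple arithmetic. The entry count below is illustrative only; real TLB sizes vary by processor:

```python
def tlb_reach(entries, page_size):
    """Address range covered by a fully populated TLB, in bytes."""
    return entries * page_size

KB, MB = 1024, 1024 * 1024
ENTRIES = 64  # illustrative entry count; actual TLBs vary by processor

# With 4-KB pages, 64 entries cover 256 KB; with 4-MB pages, 256 MB.
print(tlb_reach(ENTRIES, 4 * KB) // KB, "KB")
print(tlb_reach(ENTRIES, 4 * MB) // MB, "MB")
```

That thousand-fold difference in reach is why the few applications with very large, densely accessed working sets can benefit, while typical applications, whose hot data already fits within small-page reach, do not.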


