在读DirectXSDK中关于内存池的描述时，对于Video memory, AGP memory and System memory这三个概念比较模糊，google了一下，找到了一些很好的解释。引用如下：
Those two identifiers are hints to the driver for how the buffer will be used, to optimize how the card accesses the data. They make sense even without AGP memory.
On systems with AGP memory, there are three classes of memory:
1) System Memory. This is cached, and reasonably fast to read from and write to with the CPU. However, it typically needs an additional copy before the graphics card can use it. System and scratch pool memory goes here.
2) AGP Memory. This is still CPU-local RAM, but it is not cached. This means that it's slow to read from, and it's slow to write to, UNLESS you write sequentially, without doing too much other memory traffic inbetween, and overwrite every byte, so that the write combiners don't need to fetch lines from RAM to do a combine. Thus, generating software-transformed vertices as a stream into this buffer might still be fast. For the GPU, the AGP memory is directly accessible, so no additional copy is needed. Dynamic pool memory goes here.
3) Video Memory. This is RAM that's local to the GPU. It typically has insanely high throughput. It is accessible over the bus for the CPU, but going over the bus is really slow; typically both for reading and for writing. Thus writing directly into this memory (or even worse, reading out of it), is not recommended. Default pool memory goes here.
On systems with PCI-Express, some of the AGP vs system memory differences are reduced, but the usage hints you're giving the driver ("I will change the data by writing it sequentially" vs "I will not change the data much") are still useful for optimizing performance.
Video memory is the memory chips physically located on the card. The card can easily access this memory, while reading it from the CPU is extremely slow.
AGP memory a part of your main memory on the motherboard that has been set aside for talking to the graphics card. The card and your CPU can access this memory at a decent speed.
This pageshows that your BIOS "AGP aperture size" controls the size of your AGP memory, and explains how "reducing the AGP aperture size won't save you any RAM. Again, what setting the AGP aperture size does is limit the amount of RAM the AGP bus can appropriate when it needs to. It is not used unless absolutely necessary. So, setting a 64MB AGP aperture doesn't mean 64MB of your RAM will be used up as AGP memory. It will only limit the maximum amount that can be used by the AGP bus to 64MB (with a usable AGP memory size of only 26MB)."
1) video memory can mean one of two things depending on the context the term is used in:
a. video memory is generally any memory which is used by the graphics chip.
b. video memory (correctly "local video memory") is memory that exists on the graphic card itself (i.e. RAM chips that live on the graphics card, they are 'local' to the graphics chip).
2) AGP memory is main memory on your system motherboard that has been specially assigned for graphics use. The "AGP Aperture" setting in your system BIOS controls this assignment. The more you have assigned for AGP use, the less you have for general system use. AGP memory is sometimes also known as "non-local video memory".
3a) 'Local' video memory is very fast for the graphics chip to read from and write to because it is 'local' to the graphics chip.
3b) 'Local' video memory is extremely slow to read from using for the system CPU, and reasonably slow to write to using the system CPU. This is for a number of reasons; partly because the memory is physically on a different board (the graphics card) to the CPU (i.e. it's not 'local' for the CPU); partly because that memory isn't cached at all for reads using the CPU, and only burst cached for writes; partly due to the way data transfers over bus standards such as AGP must be done.
4a) AGP memory is reasonably fast for the graphics chip to read from or write to, but not as fast as local video memory.
4b) AGP memory is fairly slow to read from using the system CPU because it is marked as "Write Combined" so any reads don't benefit from the L2 and L1 caches (i.e. each read is effectively a cache-miss). AGP memory is however faster than local video memory to read from using the CPU since it is local to the CPU.
4c) AGP memory is reasonably fast to write to using the system CPU. Although not fully cached, "Write Combined" memory uses a small buffer that collects sequential writes to memory (32 or 64 bytes IIRC) and writes them out in one go. This is why sequential access of vertex data using the CPU is preferable for performance.
5) D3DUSAGE_DYNAMIC is only a hint to the display driver about how you intend using that resource, usually it will give you AGP memory, but it isn't guaranteed (so don't rely it!).
6) Generally, for vertex buffers which you need to Lock() and update using the CPU regularly at runtime should be D3DUSAGE_DYNAMIC, and all others should be static.
7) Graphics drivers use techniques such as "buffer renaming" where multiple copies of the buffer are created and cycled through to reduce the chance of stalls when dynamic resources are locked. This is why it's essential to use the D3DLOCK_DISCARD and D3DLOCK_NOOVERWRITE locking flags correctly if you want good performance. It's also one of the many reasons you shouldn't rely on the data pointer from a Lock() after the resource has been unlocked.
8) General advice for good performance: - treat all graphics resources as write-only for the CPU, particularly those in local video memory. CPU reads from graphics resources is a recipe for slowness.
- CPU writes to locked graphics resources should be done sequentially.
- it's better to write all of a vertex out to memory with the CPU than it is to skip elements of it. Skipping can harm the effectiveness of write combining, and even cause hidden reads in some situations (and reads are bad - see above).
since the "local video memory" is fast for video card to manipulate, and the video card dedicated to GRAPHICS PROCESS,why bother to use the "AGP memory"? is that only because the "local video memory" may be not enough for graphic data storage? what role does the CPU play in the process of graphics??
Yes. That's one of the main reasons. AGP comes from a time (~10 years ago!) when a typical graphics card would have, say, 2MB of local video memory and a typical PC system had 64-128MB of main system memory, so it made sense to set some system memory aside for situations where there wasn't enough local memory.
In these days of monster graphics cards with 512MB of local video memory, it's less likely used as an overflow.
Another reason is dynamic graphics data - any data that needs to be regularly modified with the CPU is usually better off in AGP memory (it's write combined, but it's local to the CPU too, so uses less CPU time to access) Not very much these days. Mostly application-side jobs like writing vertex data into locked buffers, object culling, traversing scene graphs, loading resources into main memory, things like that. On the D3D and device driver side: handling the D3D API, swizzling and other conversion when some types of resources are locked/unlocked [I believe some GPUs can even do their own swizzling now though], and setting up the command buffer for the GPU.
Before hardware T&L, the CPU also handled all vertex processing.
The fact that modern GPUs now handle so much of the graphics pipeline makes avoiding unnecessary serialization between CPU and GPU all the more important (i.e. stalls where one has a resource locked and the other wants to use it), thus things like buffer renaming. Serialization between CPU and GPU throws away the GPUs processing ability.