Video Memory



Video RAM (VRAM) is a dual-ported variant of dynamic RAM (DRAM) that was once commonly used to store the framebuffer in graphics adapters. Most modern computers and game consoles do not use this form of memory, and dual-ported VRAM should not be confused with other forms of video memory.

Samsung Electronics VRAM

VRAM was invented by F. Dill, D. Ling and R. Matick at IBM Research in 1980, with a patent issued in 1985 (US Patent 4,541,075).[1] Its first commercial use was in a high-resolution graphics adapter introduced in 1986 by IBM for its RT PC system. Before VRAM, dual-ported memory was expensive enough that bitmapped graphics at higher resolutions were limited to high-end workstations. VRAM improved overall framebuffer throughput, allowing low-cost, high-resolution, high-speed color graphics, and was a key enabler of the proliferation of graphical user interfaces (GUIs) at the time.

VRAM has two sets of data output pins, and thus two ports that can be used simultaneously. The first port, the DRAM port, is accessed by the host computer in a manner very similar to traditional DRAM. The second port, the video port, is typically read-only and is dedicated to providing a high throughput, serialized data channel for the graphics chipset.[2]

Typical DRAM arrays access a full row of bits (i.e. a word line) of up to 1,024 bits at one time, but use only one or a few of these for actual data, discarding the remainder. Since DRAM cells are read destructively, each accessed row must be sensed and re-written, so 1,024 sense amplifiers are typically used. VRAM makes full use of the excess bits instead of discarding them, in a simple way. If each horizontal scan line of a display is mapped to a full word, then upon reading one word and latching all 1,024 bits into a separate row buffer, these bits can subsequently be streamed serially to the display circuitry. This leaves the DRAM array free to be accessed (read or write) for many cycles, until the row buffer is almost depleted. A complete DRAM read cycle is required only to fill the row buffer, leaving most DRAM cycles available for normal accesses.

Such operation is described in the paper 'All points addressable raster display memory' by R. Matick, D. Ling, S. Gupta, and F. Dill, IBM Journal of R&D, Vol. 28, No. 4, July 1984, pp. 379–393. To use the video port, the controller first uses the DRAM port to select the row of the memory array that is to be displayed. The VRAM then copies that entire row to an internal row buffer, which is a shift register. The controller can then continue to use the DRAM port for drawing objects on the display. Meanwhile, the controller feeds a clock called the shift clock (SCLK) to the VRAM's video port. Each SCLK pulse causes the VRAM to deliver the next data bit, in strict address order, from the shift register to the video port. For simplicity, the graphics adapter is usually designed so that the contents of a row, and therefore of the shift register, correspond to a complete horizontal line on the display.
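
As a rough illustration of this mechanism, here is a toy model in C++ (the sizes and names are illustrative only, not taken from any datasheet):

    #include <array>
    #include <cstdint>
    #include <cstdio>

    constexpr int kRowBits = 1024; // bits per word line
    constexpr int kRows    = 512;  // word lines in the array

    struct ToyVram {
        std::array<std::array<uint8_t, kRowBits>, kRows> cells{}; // DRAM array
        std::array<uint8_t, kRowBits> rowBuffer{}; // internal shift register
        int shiftPos = 0;

        // DRAM port: ordinary random access for the host/drawing engine.
        void write(int row, int col, uint8_t bit) { cells[row][col] = bit; }

        // Row transfer: one DRAM cycle latches a full word line into the
        // shift register; the DRAM port is then free for other accesses.
        void loadRow(int row) { rowBuffer = cells[row]; shiftPos = 0; }

        // Video port: each SCLK pulse streams the next bit in address order.
        uint8_t sclk() { return rowBuffer[shiftPos++ % kRowBits]; }
    };

    int main() {
        static ToyVram vram;
        vram.write(0, 3, 1);  // host draws via the DRAM port
        vram.loadRow(0);      // latch scan line 0 into the shift register
        for (int i = 0; i < 8; ++i)
            std::printf("%d", vram.sclk()); // display clocks out the bits
        std::printf("\n");
    }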

Through the 1990s, many graphics subsystems used VRAM, with the number of megabits touted as a selling point. In the late 1990s, synchronous DRAM technologies gradually became affordable, dense, and fast enough to displace VRAM, even though SDRAM is only single-ported and requires more overhead. Nevertheless, many of the VRAM concepts of internal, on-chip buffering and organization have been used and improved in modern graphics adapters.

References

  1. US Patent 4,541,075, retrieved 2017-06-07.
  2. SM55161A 262144×16 bit VRAM data sheet (PDF), Austin Semiconductor, retrieved 2009-03-02.

While developing and playing PC games on Windows/WDDM, it is common for stuttering (uneven frame times) to start happening when enabling graphics features or increasing the screen resolution. There are many possible root causes for stuttering; one of the most common is video-memory overcommitment, which happens when an application uses more video memory than is physically available on the GPU. In this article, I will describe a method we have been using at NVIDIA to determine whether video-memory overcommitment is happening and causing stuttering on Windows Vista, 7, 8 or 8.1. (The method described in this article may not apply to Windows 10/WDDMv2, which has a different memory residency model.)

I assume that you already have a way to monitor the CPU & GPU frame times in your game engine (and/or you are using FRAPS to measure the CPU frame times), and you have identified that your game is stuttering badly at some location, on a certain GPU, with specific graphics settings. In a blog article and GDC 2015 talk, Iain Cantlay discussed how one can quantify stuttering by using frame-time percentiles. Various possible causes & fixes for stuttering were discussed by Cem Cebenoyan at GDC China 2012, John McDonald at GDC 2014, and Iain Cantlay at GDC 2015. Video-memory overcommitment is one of the most common causes of stuttering.
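
For reference, frame-time percentiles are simple to compute from recorded frame times. Here is a minimal sketch (the function name is mine, not from the referenced talks):

    #include <algorithm>
    #include <vector>

    // Returns the p-th percentile (0..100) of the recorded frame times (ms).
    double FrameTimePercentile(std::vector<double> frameTimesMs, double p) {
        if (frameTimesMs.empty())
            return 0.0;
        std::sort(frameTimesMs.begin(), frameTimesMs.end());
        size_t idx = static_cast<size_t>((p / 100.0) * (frameTimesMs.size() - 1));
        return frameTimesMs[idx];
    }

    // Example: a large spread between the 50th and 99th percentiles means
    // frame times are uneven, i.e. the game is stuttering.
    // double p50 = FrameTimePercentile(samples, 50.0);
    // double p99 = FrameTimePercentile(samples, 99.0);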

Now, you may suspect that some stutter could be caused by video-memory overcommitment and you may not be sure how best to prove this hypothesis. Monitoring the amount of per-GPU used video memory with a tool such as GPU-Z, reducing the texture quality (to lower the video memory footprint), or testing with a GPU with more video memory can give you hints. However, for various reasons, the GPU-Z “Memory Used” counter may be below the amount of available dedicated video memory while the application is actually still overcommitting video memory. So in general, looking at the used video memory alone is not enough to know whether you are overcommitting.

Note that in this context, the term “committed” is equivalent to “referenced”, that is, a resource “used” (or bound) in any render call (clear, copy, draw or dispatch call). Only the resources that are referenced in an actual render call are considered to be placed in (dedicated) video memory. Resources that are just created but never referenced in any render call will never be committed to video memory.

At NVIDIA, we have developed a method for determining whether any Windows/WDDM graphics application is overcommitting video memory, using GPUView. GPUView is a free tool provided by Microsoft as part of the Windows Performance Toolkit (which ships with the Windows 8.1 SDK). It can be used with all GL and D3D applications on Windows Vista and later. If you are not familiar with the tool yet, introductions to GPUView are available on Matthew Fisher’s website and in Jon Story’s GDC 2012 presentation. The tool also ships with a help file which I recommend checking out.

How To Detect Video Memory Overcommitment

To determine whether a Windows application is running out of video memory or not, the first thing I do is capture a GPUView trace (see Appendix) from a run where stuttering is happening consistently. I then open up the trace in GPUView and:

  1. Check the GPU Hardware Queues
  2. Check the CPU Context Queues
  3. Check for any EvictAllocation Events

In this article, I am going to take the example of a DX11 application running on a GeForce GTX 680 (with 2 GB of dedicated video memory) on Windows 8, and I am going to use the GPUView build from the Win 8.1 SDK. I have captured 3 GPUView traces from the same in-game location, with different screen resolutions and super-sampling settings: 1920x1200 and 2560x1600 with no super-sampling, and 2560x1600 with 120% super-sampling (that is, an effective 3072x1920 resolution).

Step 1: Check the GPU Hardware Queues

Here is how the GPU hardware queues look near the end of the GPUView traces, in time intervals containing six Present packets. In both cases, the top hardware queue is the Graphics queue for the application — which contains the hashed Present packets and other graphics packets — and the bottom-most hardware queue is the Copy Engine queue which contains the red Paging packets.

Figure 1. GPU Queues in the 1920x1200 trace. No stuttering.

Figure 2. GPU Queues in the 3072x1920 trace. Heavy stuttering.

The times and percentages highlighted in the red boxes below each hardware queue are, respectively, the total amount of GPU time and the fraction of the elapsed time that the queue was not empty in the current time interval.

In the 1920x1200 trace, the Graphics queue was occupied for 100% of the time, so there was no problem at the queue level.

In the 3072x1920 trace, the Graphics queue was occupied only 54.1% of the time and there were large gaps of variable length in the queue. In practice, each of these gaps in the GPU Graphics queue results in a GPU frame-time stutter.

Note that you can have large gaps in the GPU Graphics queue without stuttering, if the gaps are of consistent length. The reason these gaps cause stuttering is that they are of variable length.

Let’s zoom in to the first frame from Figure 2 — after the first Present packet and up to the next Present packet. In this frame, the Graphics queue is occupied for only 22.9% of the time. If we select the largest Graphics-queue gap in this frame, we see it is taking 66 ms, as displayed in the bottom-right corner of the GPUView window:

Figure 3. Large 66ms gap in the GPU Graphics Queue, in the 3072x1920 trace.

Step 2: Check the CPU Context Queues

If we scroll down in GPUView and look at the CPU Graphics queue from the application (the one which has the hashed Present packets and the correct process name), we see that the CPU queue always contains between one and two Present packets. (A maximum of two is expected in this case because this application is limiting the number of queued frames to at most two, by using event queries. We happen to know this from talking to the application developer – the event query usage details are not explicitly discernible from the GPUView trace itself.) The point is that the CPU Graphics queue is never empty; if it was empty, that would indicate a problem.
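
For context, here is a minimal sketch of how an application can cap its number of queued frames with D3D11 event queries. This illustrates the general technique only; it is not the actual code of the application being traced:

    #include <d3d11.h>

    static const UINT kMaxQueuedFrames = 2;
    static ID3D11Query* s_frameEnd[kMaxQueuedFrames];
    static UINT64 s_frameCount = 0;

    void CreateFrameQueries(ID3D11Device* device) {
        D3D11_QUERY_DESC qd = { D3D11_QUERY_EVENT, 0 };
        for (UINT i = 0; i < kMaxQueuedFrames; ++i)
            device->CreateQuery(&qd, &s_frameEnd[i]);
    }

    // Call once per frame, right after Present().
    void LimitQueuedFrames(ID3D11DeviceContext* ctx) {
        // Signal an event at the end of the frame just submitted.
        ctx->End(s_frameEnd[s_frameCount % kMaxQueuedFrames]);
        ++s_frameCount;
        if (s_frameCount < kMaxQueuedFrames)
            return; // not enough frames in flight yet
        // Block until the frame submitted kMaxQueuedFrames ago has been
        // consumed by the GPU, keeping the CPU at most two frames ahead.
        ID3D11Query* oldest = s_frameEnd[s_frameCount % kMaxQueuedFrames];
        while (ctx->GetData(oldest, nullptr, 0, 0) == S_FALSE) {
            // Spin; a real engine might Sleep(0) or do other CPU work here.
        }
    }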

Figure 4. CPU & GPU queues for one GPU frame in the 3072x1920 trace.

We now know that the large gaps in the GPU Graphics queue are not caused by any CPU-GPU sync point or by the application being CPU bound. Otherwise, the CPU queue would go empty at some point.

Step 3: Check for any EvictAllocation Events

The EvictAllocation events can be listed by going to Tools -> Event List and selecting the “DxgKrnl EvictAllocation” events in the GUID List.

The 1920x1200 trace contains no EvictAllocation events during gameplay. In contrast, the 3072x1920 trace has 244 of those events in the current time interval:

Figure 5. EvictAllocation events confirming video-memory overcommitment.

In GPUView, each of these selected events is visualized with a red vertical bar on the timeline. As shown in Figure 5, there are multiple such events per frame, and the events are correlated with the gaps in the GPU Graphics queue and with activity in dxgmms1.sys, the OS video memory manager.

If we zoom out and look at a 40s interval from this trace, we see that there are EvictAllocation events all over:

Figure 6. EvictAllocation events in the 3072x1920 trace over a 40 second interval.

Finally, in the 2560x1600 trace with no super-sampling enabled, there are still some GPU-queue gaps, and EvictAllocation events correlated with those gaps. So the application is overcommitting the 2 GB of video memory not only at 3072x1920 but also, to a lesser extent, at 2560x1600.

Figure 7. EvictAllocation events in the 2560x1600 trace causing a stutter.

Summary

Overall, you know video-memory overcommitment is causing stuttering if:

  1. There are large gaps (>1ms) of variable length in the GPU Graphics queue.
  2. There are no gaps in the game’s CPU Graphics queue.
  3. There are DxgKrnl EvictAllocation events happening during gameplay.
  4. The EvictAllocation events are correlated with the GPU Graphics queue gaps.

TIP: The first thing you can do when opening a GPUView trace is enable all DxgKrnl EvictAllocation events in the Event List. If you don’t see any such events, then you know you don’t have any video-memory overcommitment.

Appendix: Using the GPUView Reference Chart

Whenever a resource is created, our driver fills in an array of “Preferred Segments”, which the OS Video Memory Manager uses to decide whether a resource should be promoted to or evicted from video memory at any given time.

GPUView lets us visualize the percentage of referenced resources that are currently in their preferred segment (also known as “P0”), as well as their fallback segments (P1, P2, etc.). To enable this visualization, you can go to the Charts menu and click on “Toggle Reference Charts”. This adds a Reference Chart below each CPU context queue, for example:

Figure 8. Reference Chart for the CPU Graphics queue in the 3072x1920 trace.

TIP: In practice, if you see any non-zero P2 percentage in any reference charts for your application, then you know you are overcommitting video memory.

Note that this is not a necessary condition, though: if there are no P2 references, you may still be overcommitting video memory, and you can check for GPU idle gaps and EvictAllocation events to make sure. Still, it’s nice to be able to confirm that video-memory overcommitment is happening using another data point.

Appendix: Monitoring Available Video Memory

Note that the amount of video memory that is currently available to any application in the whole system can be queried by using the NvAPI_GPU_GetMemoryInfo function from NVAPI.

Here is an example helper class that does it:
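
The sketch below assumes the NVAPI SDK headers and import library are available; the class and method names are illustrative, and NVAPI reports all of these counters in kilobytes:

    #include <nvapi.h> // from the NVIDIA NVAPI SDK; link nvapi(64).lib

    // Illustrative helper: queries per-GPU video-memory counters via NVAPI.
    class VideoMemoryQuery {
    public:
        bool Init() {
            if (NvAPI_Initialize() != NVAPI_OK)
                return false;
            NvU32 gpuCount = 0;
            NvPhysicalGpuHandle gpus[NVAPI_MAX_PHYSICAL_GPUS];
            if (NvAPI_EnumPhysicalGPUs(gpus, &gpuCount) != NVAPI_OK ||
                gpuCount == 0)
                return false;
            m_gpu = gpus[0]; // first physical GPU, for simplicity
            return true;
        }

        // Fills 'info' with the current counters (values in KB).
        bool GetMemoryInfo(NV_DISPLAY_DRIVER_MEMORY_INFO& info) const {
            info = {};
            info.version = NV_DISPLAY_DRIVER_MEMORY_INFO_VER;
            return NvAPI_GPU_GetMemoryInfo(m_gpu, &info) == NVAPI_OK;
        }

    private:
        NvPhysicalGpuHandle m_gpu = nullptr;
    };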

The dedicatedVideoMemory and availableDedicatedVideoMemory counters are constant, and availableDedicatedVideoMemory may be lower than dedicatedVideoMemory.

As for the curAvailableDedicatedVideoMemory counter, it is variable and depends on the graphics applications currently running on the system. If you see this counter drop below 50 MB, then you are likely overcommitting video memory (or soon will be) and can verify whether that is the case by using GPUView. You may want to query it every frame and display it on screen in QA builds.
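
For example, a per-frame QA check might look like this (reusing the hypothetical helper above):

    // Per-frame QA check: warn when available dedicated video memory is low.
    void CheckVideoMemoryBudget(const VideoMemoryQuery& query) {
        NV_DISPLAY_DRIVER_MEMORY_INFO info;
        if (!query.GetMemoryInfo(info))
            return;
        // NVAPI reports the counters in KB; 50 MB == 51200 KB.
        if (info.curAvailableDedicatedVideoMemory < 50 * 1024) {
            // Likely overcommitting video memory (or about to): display an
            // on-screen warning and confirm with GPUView.
        }
    }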

Appendix: How To Capture a GPUView Trace

I am assuming that your system has enough RAM (16 GB is typically enough) so that capturing the trace in memory does not significantly affect system performance. If you have enough RAM, the trace should reflect the underlying problems well.

Using a single computer

For convenience, I use a batch file containing this single line:

cmd /K "cd C:\Program Files (x86)\Windows Kits\8.1\Windows Performance Toolkit\gpuview"

And I launch this batch file with Right Click -> “Run as Administrator”. (If you don’t launch this command prompt as Administrator, the logging may fail.)

To capture a trace, I normally do the following:

  1. Type “log light” in the GPUView command line (to start capturing the trace)
  2. Launch the game in full-screen mode and reach the repro location
  3. Wait for 20 seconds (ideally without moving the camera to get more stable results)
  4. ALT-Tab back to the GPUView cmd line and type “log” (to stop capturing)
  5. Run “GPUView.exe Merged.etl” to open up the GPUView trace

Note that it is possible to start the capture after the game has been launched. However, we have found that the CPU queues are not getting captured correctly when doing so on Win8. Starting the capture before launching the game (which is what I am recommending) does not have this corruption problem. This has the drawback of generating larger traces but avoids any potential doubts or confusion.

Using two computers

This method assumes that you have a second computer available and connected to the same network as the main computer you want to capture a GPUView trace from. On the second computer, you can download the PsExec package and launch this command line:

PsExec \\YourComputerName -s -u YourDomain\YourUserName cmd /K "cd C:\Program Files (x86)\Windows Kits\8.1\Windows Performance Toolkit\gpuview"

This opens, on the second computer, a command prompt that executes remotely on the main computer. You can then type “log light” to start logging and “log” to stop logging.

The “-s” PsExec argument runs the remote command prompt in the System account, which gives it the administrator rights needed for logging.