Archive for February, 2012

Why is GPU emulation so demanding? [Part 1: How to emulate a GPU]

Recently scummos from the Dolphin forums created a thread asking why GPU emulation in Dolphin is so demanding. This article series is meant to give some insight into the reasons behind this. Explaining this topic in-depth is very hard if you don’t know how hardware emulation works, but I’ll try to describe the most important concepts.

In order to understand how GPU emulation works, you should first ask yourself the question: How is the GPU even accessed by the currently running application? From an application programmer’s perspective it’s fairly simple because their are multiple Application Programming Interfaces (API) which can be used for this task. On PC hardware, you have Direct3D and OpenGL for this. On gaming consoles, the console vendor usually includes an own GPU programming API in the console’s Software Development Kit (SDK). The Gamecube’s/Wii’s GPU API is called GX. These GPU APIs aren’t really special, they’re just a bunch of functions programmers can use to access the GPU; for example an IDirect3DDevice9 object in Direct3D9 has a method called Clear() which can be used to clear the render target’s color/depth/stencil buffers.ย IDirect3DDevice9::CreateTexture() can be used to create a texture object with the specified parameters. IDirect3DDevice9::DrawIndexedPrimitive() will render some triangles to the screen. Similar calls exist in other GPU APIs (e.g. OpenGL: glClear, glGetTexture, glDrawArrays).

However, these functions aren’t the lowest level of access, their implementation are actually just a bunch of C code (or whatever language was used for implementing them) as well. Internally, those functions access the GPU registers in order to directly program the GPU.

Now you probably wonder what GPU registers are. For that, you’ll need to understand the concept of memory-mapped I/O (MMIO). It boils down to using the same address bus for both memory (RAM) and hardware devices (more precisely: I/O devices). That means that a certain part of the whole address range is dedicated to RAM, i.e. when the CPU tries to access that part, the request will transparently be redirected to the RAM. Another part of the whole address range might be dedicated to one of the I/O devices, etc. Take a look at Wikipedia for an example of a memory map. Assuming we have a dedicated chunk of mapped memory for the GPU, we’ll refer to that chunk as the GPU register block. Furthermore the GPU register block can be seen as a group of equally sized registers (Flipper uses 4 bytes for each register). We can refer to a register either by its index or (more commonly) by its byte offset from the beginning of the GPU register block. Each GPU register usually is linked to a specific part of GPU functionality (“execute a clear operation”) or a specific GPU configuration parameter (“color used for clearing the render target”).

Using MMIO, a GPU (or an I/O device in general) may expose certain capability to the CPU via the GPU registers. By reading from certain registers, the CPU may request information from the GPU. By writing to certain registers, the CPU may tell the GPU to execute some action (or to setup some parameters). For example, the implementation of a function like IDirect3DDevice9::Clear (see above) will write to some register X to tell the GPU what clear color to use will write to another register Y to tell the GPU to actually execute the clear operation. The nice thing about this is: glClear (OpenGL) and the clear functionality of GX work the same way internally! Of course, the GPU registers will be different on each GPU (I cheated a bit on this, IDirect3DDevice9::Clear actually sits a few layers above the GPU registers; one of them is the GPU driver which knows the GPU registers of the accessed hardware). In emulation, however, this isn’t an issue because, well, a console always uses the same hardware ๐Ÿ˜‰

Note that this is the lowest-level way to access the GPU, so there’s nothing which may go wrong when emulating on register level (I’ll dedicate another blog post to explain what can go wrong).

But how do we actually emulate the Gamecube’s GPU (Flipper) or the Wii’s GPU (Hollywood)? First, let’s note that whenever I just say Flipper or Hollywood, I usually refer to both of them (unless noted otherwise). That’s because they are both using the same register set (something which is uncommon even for different GPU generations from the same vendor) and the only real difference is the clock rate and the amount of texture memory (both don’t really matter in Dolphin).

The way GPU emulation works is fairly simple once you’ve got perfect CPU emulation (ha! :P). Whenever the CPU accesses memory, the CPU emulation code checks what the accessed address is mapped to. If it’s mapped to the GPU, we’ve got a bunch of handler functions which will check what the accessed register is supposed to do and execute the proper action then. In case of an emulator like Dolphin which tries to use the GPU where appropriate for this, this includes translating the GPU registers to the available Direct3D/OpenGL API functions (using these APIs is the only way to access the GPU on PC hardware by the way because of user-mode restrictions, i.e. we can’t directly access the GPU registers from Dolphin; only GPU drivers may do so).

For example, Flipper has a range of registers called BP memory which describe the current render pipeline configuration (it’s similar to the pixel shader stage in modern GPU APIs or to render and sampler states in D3D9). Dolphin has a list of these BP registers and defines a name for each one for better readability in BPMemory.h. The actual emulation of BP registers happens in BPStructs.cpp in the BPWritten() function which gets called whenever the BP memory has been written to. It has a fairly big “switch” statement which basically just check what register was written to in order to call the proper handler function; you might notice that many register handlers don’t do anything at all (e.g. BPMEM_LOADTLUT0 which just “break”s) because they’re just general parameters which don’t need any immediate reaction but just affect behavior of later actions.

Our render target clearing example from above is chosen a bit unfortunate, but I’ll explain it anyway: If thereย was a GXClear function in the GX API (there isn’t), it would poke the BPMEM_CLEAR_AR and BPMEM_CLEAR_GB registers and write the alpha/red and green/blue components of the color used for clearring the back buffer there. To actually execute the clear operation the function would also poke the BPMEM_TRIGGER_EFB_COPY register. The value written to that register is described by the UPE_Copy struct in BPMemory.h: The 12th bit (named “clear” in the Dolphin code) is set to 1 (true) if a clear operation should be executed. Checking the implementation in BPStructs.cpp under the case BPMEM_TRIGGER_EFB_COPY, you can see that if (PE_copy.clear) is true, the function ClearScreen (implemented in BPFunctions.cpp) is called which does some other fancy stuff and then calls g_renderer->ClearScreen which is the backend specific implementation. Looking at the various Render.cpp files in each video backend’s source directory (e.g. OpenGL) you can see that these methods don’t do anything else than calling glClear, i.e. the respective clear function of the GPU API used for emulation!

Well, almost, because when you look at Direct3D9/11 you can see that they don’t use those functions – because they differ a bit in functionality from their GX counterpart. This is a limitation of the emulation in our hardware accelerated video backends: We can only emulate the hardware functionality which is actually exposed to us from the GPU APIs used for emulating Flipper. In case of ClearScreen a workaround is possible, but this isn’t possible for all Flipper features. That’s the nice thing about software rasterizer backends: You don’t end up debugging a strange graphics glitch just to find out that it can’t be fixed with the limited functionality exposed by Direct3D9 but you can just go ahead and fix it ๐Ÿ˜€

I’ll further elaborate on the limitations of Direct3D and OpenGL when emulating Flipper in another blog post, but that’s it for now – all you need to know about how GPU emulation works at the lowest level. Or at least a starting point for further reading ๐Ÿ˜‰