Why is GPU emulation so demanding? [Part 1: How to emulate a GPU]

Recently scummos from the Dolphin forums created a thread asking why GPU emulation in Dolphin is so demanding. This article series is meant to give some insight into the reasons behind this. Explaining this topic in depth is very hard if you don’t know how hardware emulation works, but I’ll try to describe the most important concepts.

In order to understand how GPU emulation works, you should first ask yourself: how is the GPU even accessed by the currently running application? From an application programmer’s perspective it’s fairly simple, because there are multiple Application Programming Interfaces (APIs) which can be used for this task. On PC hardware, you have Direct3D and OpenGL for this. On gaming consoles, the console vendor usually includes its own GPU programming API in the console’s Software Development Kit (SDK). The Gamecube’s/Wii’s GPU API is called GX. These GPU APIs aren’t really special; they’re just a bunch of functions programmers can use to access the GPU. For example, an IDirect3DDevice9 object in Direct3D9 has a method called Clear() which can be used to clear the render target’s color/depth/stencil buffers. IDirect3DDevice9::CreateTexture() can be used to create a texture object with the specified parameters. IDirect3DDevice9::DrawIndexedPrimitive() will render some triangles to the screen. Similar calls exist in other GPU APIs (e.g. OpenGL: glClear, glTexImage2D, glDrawElements).

However, these functions aren’t the lowest level of access; their implementations are just a bunch of C code (or whatever language was used for implementing them) as well. Internally, those functions access the GPU registers in order to program the GPU directly.

Now you probably wonder what GPU registers are. For that, you’ll need to understand the concept of memory-mapped I/O (MMIO). It boils down to using the same address bus for both memory (RAM) and hardware devices (more precisely: I/O devices). That means that a certain part of the whole address range is dedicated to RAM, i.e. when the CPU tries to access that part, the request will transparently be redirected to the RAM. Another part of the whole address range might be dedicated to one of the I/O devices, etc. Take a look at Wikipedia for an example of a memory map. Assuming we have a dedicated chunk of mapped memory for the GPU, we’ll refer to that chunk as the GPU register block. Furthermore the GPU register block can be seen as a group of equally sized registers (Flipper uses 4 bytes for each register). We can refer to a register either by its index or (more commonly) by its byte offset from the beginning of the GPU register block. Each GPU register usually is linked to a specific part of GPU functionality (“execute a clear operation”) or a specific GPU configuration parameter (“color used for clearing the render target”).

Using MMIO, a GPU (or an I/O device in general) may expose certain capabilities to the CPU via the GPU registers. By reading from certain registers, the CPU may request information from the GPU. By writing to certain registers, the CPU may tell the GPU to execute some action (or to set up some parameters). For example, the implementation of a function like IDirect3DDevice9::Clear (see above) will write to some register X to tell the GPU what clear color to use, and will then write to another register Y to tell the GPU to actually execute the clear operation. The nice thing about this is: glClear (OpenGL) and the clear functionality of GX work the same way internally! Of course, the GPU registers will be different on each GPU. (I cheated a bit here: IDirect3DDevice9::Clear actually sits a few layers above the GPU registers; one of those layers is the GPU driver, which knows the GPU registers of the accessed hardware.) In emulation, however, this isn’t an issue because, well, a console always uses the same hardware 😉
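
To make the “register X, register Y” idea a bit more concrete, here’s a minimal C++ sketch. Everything in it is made up for illustration: the register offsets, the GpuRegisterBlock type and the ClearRenderTarget function don’t belong to any real GPU or driver, and the register block is just an array instead of a real address range on the bus.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical register byte offsets; a real GPU documents its own layout.
constexpr std::size_t REG_CLEAR_COLOR = 0x10;    // "register X": clear color
constexpr std::size_t REG_CLEAR_TRIGGER = 0x14;  // "register Y": start the clear

// Stand-in for the GPU register block: a group of equally sized 4-byte registers.
// On real hardware this would be a fixed address range, not an array.
struct GpuRegisterBlock
{
  std::array<std::uint32_t, 64> regs{};

  // Registers are addressed by their byte offset from the start of the block.
  void Write(std::size_t byte_offset, std::uint32_t value)
  {
    assert(byte_offset % 4 == 0 && byte_offset / 4 < regs.size());
    regs[byte_offset / 4] = value;
  }

  std::uint32_t Read(std::size_t byte_offset) const
  {
    assert(byte_offset % 4 == 0 && byte_offset / 4 < regs.size());
    return regs[byte_offset / 4];
  }
};

// What a driver-level clear function boils down to: two register writes.
void ClearRenderTarget(GpuRegisterBlock& gpu, std::uint32_t rgba)
{
  gpu.Write(REG_CLEAR_COLOR, rgba);  // configure: which color to clear with
  gpu.Write(REG_CLEAR_TRIGGER, 1);   // act: execute the clear operation
}
```

Note how the first write only sets a parameter while the second one triggers an action; that distinction shows up again in the emulator code.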

Note that this is the lowest-level way to access the GPU, so in principle nothing gets lost when emulating at the register level (I’ll dedicate another blog post to explaining what can still go wrong in practice).

But how do we actually emulate the Gamecube’s GPU (Flipper) or the Wii’s GPU (Hollywood)? First, let’s note that whenever I just say Flipper or Hollywood, I usually refer to both of them (unless noted otherwise). That’s because they both use the same register set (something which is uncommon even for different GPU generations from the same vendor), and the only real differences are the clock rate and the amount of texture memory (neither of which really matters in Dolphin).

The way GPU emulation works is fairly simple once you’ve got perfect CPU emulation (ha! :P). Whenever the CPU accesses memory, the CPU emulation code checks what the accessed address is mapped to. If it’s mapped to the GPU, we’ve got a bunch of handler functions which check what the accessed register is supposed to do and then execute the proper action. An emulator like Dolphin tries to use the host GPU for this where appropriate, so this includes translating the GPU register accesses to the available Direct3D/OpenGL API functions. (By the way, using these APIs is the only way to access the GPU on PC hardware because of user-mode restrictions, i.e. we can’t directly access the GPU registers from Dolphin; only GPU drivers may do so.)
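
The address check described above could be sketched roughly like this. This is not Dolphin’s actual code: the memory map constants, the EmulatedSystem type and the handler name are assumptions picked for the example, and a real emulator has many more mapped regions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical memory map for illustration only; not Flipper's real addresses.
constexpr std::uint32_t RAM_BASE = 0x00000000;
constexpr std::uint32_t RAM_SIZE = 0x00010000;
constexpr std::uint32_t GPU_BASE = 0x0C000000;
constexpr std::uint32_t GPU_SIZE = 0x00000100;

// True if a 4-byte access at `address` lies completely inside [base, base + size).
bool InRange(std::uint32_t address, std::uint32_t base, std::uint32_t size)
{
  return address >= base && address - base <= size - 4;
}

struct EmulatedSystem
{
  std::vector<std::uint8_t> ram = std::vector<std::uint8_t>(RAM_SIZE);
  int gpu_write_count = 0;
  std::uint32_t last_gpu_offset = 0;
  std::uint32_t last_gpu_value = 0;

  // Called for every 32-bit store the emulated CPU executes.
  // Returns false if the address is not mapped to anything.
  bool Write32(std::uint32_t address, std::uint32_t value)
  {
    if (InRange(address, RAM_BASE, RAM_SIZE))
    {
      // Plain memory: store the bytes (big-endian, like the console's PowerPC CPU).
      const std::uint32_t offset = address - RAM_BASE;
      ram[offset + 0] = static_cast<std::uint8_t>(value >> 24);
      ram[offset + 1] = static_cast<std::uint8_t>(value >> 16);
      ram[offset + 2] = static_cast<std::uint8_t>(value >> 8);
      ram[offset + 3] = static_cast<std::uint8_t>(value);
      return true;
    }
    if (InRange(address, GPU_BASE, GPU_SIZE))
    {
      // Not memory at all: hand the access to the GPU emulation code.
      OnGpuRegisterWrite(address - GPU_BASE, value);
      return true;
    }
    return false;
  }

  void OnGpuRegisterWrite(std::uint32_t offset, std::uint32_t value)
  {
    // A real emulator decides here what the register does and acts on it;
    // this sketch only records the access.
    ++gpu_write_count;
    last_gpu_offset = offset;
    last_gpu_value = value;
  }
};
```

The important bit is that the game never notices the difference: it performs an ordinary store, and the emulator decides whether that store ends up in RAM or in a GPU register handler.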

For example, Flipper has a range of registers called BP memory which describe the current render pipeline configuration (it’s similar to the pixel shader stage in modern GPU APIs or to render and sampler states in D3D9). Dolphin has a list of these BP registers in BPMemory.h and defines a name for each one for better readability. The actual emulation of BP registers happens in BPStructs.cpp in the BPWritten() function, which gets called whenever BP memory has been written to. It has a fairly big “switch” statement which basically just checks which register was written to in order to call the proper handler function. You might notice that many register handlers don’t do anything at all (e.g. BPMEM_LOADTLUT0, which just “break”s) because they’re just general parameters which don’t need any immediate reaction but only affect the behavior of later actions.
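
Here’s a heavily simplified sketch of what such a handler looks like in spirit. The register names are taken from the text above, but the numeric values, the BPState type and the function signature are placeholders for this example and shouldn’t be mistaken for Dolphin’s real definitions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Names follow the article; the numeric values are placeholders for this sketch.
enum BPRegister : std::uint32_t
{
  BPMEM_LOADTLUT0 = 0x01,
  BPMEM_TRIGGER_EFB_COPY = 0x02,
};

struct BPState
{
  std::array<std::uint32_t, 256> regs{};  // shadow copy of every BP register
  std::vector<std::string> actions;       // log of immediate actions, for demonstration
};

// Simplified stand-in for a BPWritten()-style handler.
void BPWritten(BPState& state, std::uint32_t reg, std::uint32_t value)
{
  assert(reg < state.regs.size());

  // Always remember the value: later actions may depend on it.
  state.regs[reg] = value;

  switch (reg)
  {
  case BPMEM_LOADTLUT0:
    // Pure parameter: nothing to do right now, it only affects later actions.
    break;
  case BPMEM_TRIGGER_EFB_COPY:
    // Trigger register: writing to it requires an immediate reaction.
    state.actions.push_back("efb_copy");
    break;
  default:
    break;
  }
}
```

Storing the value before the switch is what lets the “do nothing” cases work: the parameter sits in the shadow copy until some trigger register needs it.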

Our render target clearing example from above is a bit unfortunately chosen, but I’ll explain it anyway: If there was a GXClear function in the GX API (there isn’t), it would poke the BPMEM_CLEAR_AR and BPMEM_CLEAR_GB registers and write the alpha/red and green/blue components of the color used for clearing the back buffer there. To actually execute the clear operation, the function would also poke the BPMEM_TRIGGER_EFB_COPY register. The value written to that register is described by the UPE_Copy struct in BPMemory.h: the 12th bit (named “clear” in the Dolphin code) is set to 1 (true) if a clear operation should be executed. Checking the implementation in BPStructs.cpp under the case BPMEM_TRIGGER_EFB_COPY, you can see that if PE_copy.clear is true, the function ClearScreen (implemented in BPFunctions.cpp) is called, which does some other fancy stuff and then calls g_renderer->ClearScreen, the backend-specific implementation. Looking at the various Render.cpp files in each video backend’s source directory (e.g. OpenGL), you can see that these methods do nothing other than call glClear, i.e. the respective clear function of the GPU API used for emulation!
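
The clear path can be boiled down to a few lines. Again, this is only a sketch and not the code from BPStructs.cpp: the ClearRegs and FakeBackend types are invented, the packing of the color components inside the two registers is my assumption for the example, and the “12th bit” is taken to mean bit 11 when counting from zero.

```cpp
#include <cassert>
#include <cstdint>

// The "12th bit" of the trigger value, i.e. bit 11 when counting from zero.
constexpr std::uint32_t COPY_CLEAR_BIT = 1u << 11;

// Shadow copies of the two clear color registers.
// Assumed packing for this sketch: first component in bits 8-15, second in bits 0-7.
struct ClearRegs
{
  std::uint32_t clear_ar = 0;  // alpha / red
  std::uint32_t clear_gb = 0;  // green / blue
};

// Stand-in for the backend-specific renderer; a real backend would call
// something like glClear here.
struct FakeBackend
{
  int clear_calls = 0;
  std::uint32_t last_argb = 0;

  void ClearScreen(std::uint32_t argb)
  {
    ++clear_calls;
    last_argb = argb;
  }
};

// Combine the two 16-bit register values into one 32-bit ARGB color.
std::uint32_t PackARGB(const ClearRegs& regs)
{
  assert(regs.clear_ar <= 0xFFFF && regs.clear_gb <= 0xFFFF);
  return (regs.clear_ar << 16) | regs.clear_gb;
}

// Called when the emulated CPU writes to the trigger register.
void OnTriggerEfbCopy(const ClearRegs& regs, std::uint32_t trigger_value, FakeBackend& backend)
{
  // Only clear if the game actually asked for it.
  if (trigger_value & COPY_CLEAR_BIT)
    backend.ClearScreen(PackARGB(regs));
}
```

The color registers were written earlier and did nothing on their own; only the write to the trigger register turns them into an actual clear call on the host GPU.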

Well, almost, because when you look at Direct3D9/11 you can see that they don’t use those functions – because they differ a bit in functionality from their GX counterpart. This is a limitation of the emulation in our hardware accelerated video backends: We can only emulate the hardware functionality which is actually exposed to us from the GPU APIs used for emulating Flipper. In case of ClearScreen a workaround is possible, but this isn’t possible for all Flipper features. That’s the nice thing about software rasterizer backends: You don’t end up debugging a strange graphics glitch just to find out that it can’t be fixed with the limited functionality exposed by Direct3D9 but you can just go ahead and fix it 😀

I’ll further elaborate on the limitations of Direct3D and OpenGL when emulating Flipper in another blog post, but that’s it for now – all you need to know about how GPU emulation works at the lowest level. Or at least a starting point for further reading 😉

  1. #1 by herpderp on 25. February 2012 - 01:57

    mind=blown, nonetheless an interesting read. I’ve noticed that you like roflstomping n00bs(read. educating) when it comes to knowledge of dolphin. You should visit 4chan’s new board /vg/. There’s always an emu thread going with hundred of posts by clueless fags trying out dolphin. You should post there lawl, cottonvibes has already shed some light there.

    btw, would it be possible to implement an ability to load shaders into d3d backends, like enbseries does, however, natively, w/o any hooking/injecting(which is limited to x86 as well : / ). KrossX has created a hack for pcsx2 that achives this by exploiting their built-in fxaa support = d. It’d be very a useful feature. Personally, I’d do some fxaa combined with sharpening and color correction, such as saturation.

    • #2 by no_cluez on 25. February 2012 - 14:49

      Heh, thanks for the pointer but I guess I’m busy enough keeping an eye on the Dolphin forums ;D

      About loading shaders into d3d backends: I guess you’re referring to post-processing shaders like in OpenGL? That’d perfectly be possible and wouldn’t require too much effort, but at the moment I’m not interested in adding support for it. However, if you happen to know some C I could tell you what code you need to look at for implementing it by yourself.

  2. #3 by herpington on 29. February 2012 - 15:35

    well, i was mostly talking about some more advanced shaders in .fx format that require no re-compilation and can be tweaked on-the-fly. They are directx exclusive as far as i know. I was reading pcsx2’s interface that’s responsible for handling it but it’s nowhere near similar to dolphin so i gave up = d. Check recent pcsx2 commits, they added a nice shader called shadeboost, really cool stuff.

  3. #4 by fake on 27. November 2012 - 05:40

    so THATS why it’s so slow. So why not have a specific driver for dolphin, and ‘translate’ between the two GPUs? Sure you’d have to re-code the translation for every GPU, but it would be a huge improvement

    • #5 by no_cluez on 19. December 2012 - 10:28

      That’s actually a very good question (and sorry for the late reply btw) :>

      Having raw access to the host GPU would solve some problems related to resource management and probably give a noticeable speedup over using the D3D or OpenGL layers. However, you’re still limited by what the host GPU actually supports. For example, Flipper supports something called “Bounding Boxes”, which basically tells you the top-left-most and bottom-right-most pixel position that has been drawn to recently. That’s a pretty exotic feature and not something that I would expect to be supported by PC GPUs at all, so you’d still not be able to emulate it correctly.

      Also, I’m not sure you’re aware of this but GPU drivers are a hell of a project to create on their own. Have a look at the open-source driver for radeon GPUs on Linux: These guys actually get all their docs directly from AMD and still need months to get GPUs working at all, and /years/ to get them running efficiently! Even with recent developments like Gallium3D, it would be a tremendous effort to get this working in an emulator project.

      That said, IMO time would be spent better if people just focused on getting a good software renderer working – you don’t suffer any feature limitations there and given the right design it should be perfectly possible to achieve full fps rendering, even with a software renderer.
