This tutorial is part of a Collection: 04. DirectX 12 - Braynzar Soft Tutorials
03. Initializing DirectX 12

This tutorial will get us started using Direct3D 12

####Introduction to Direct3D 12####
DirectX 12 is Microsoft's latest iteration of the DirectX APIs. With DirectX 12 comes Direct3D 12, which is the graphics API in the DirectX API collection (other APIs include DirectSound, DirectInput, DirectDraw, etc). Direct3D 12 can perform much better than any previous iteration of Direct3D. It provides lower level control over the graphics hardware, which allows for more efficient use of threads: we are able to use multiple threads to populate command lists. Part of having more control means we are now responsible for a lot more, such as CPU/GPU synchronization and memory management.

Direct3D 12 also minimizes CPU overhead by using pre-compiled pipeline state objects and reusable groups of commands (*bundles*). In the initialization stage of our application, we will create many pipeline state objects, which consist of shaders (vertex, pixel, etc) and other pipeline states (blending, rasterizer, primitive topology, etc). Then during runtime, the driver does not have to create the pipeline state when we change the state of the pipeline as it did in Direct3D 11. Instead, we provide a pipeline state object, and when we call draw, it will use that pipeline state, so we do not have the overhead of creating the pipeline state on the fly. We can also create groups of commands during initialization, called *Bundles*, which we can reuse over and over.

Another cool thing about Direct3D 12 is that it has far fewer API calls, around 200 according to MSDN (and about one third of those do all the hard work).
What we will be learning in this tutorial are the following:

##Overview of the Graphics Pipeline##
*- The Compute Shader
- Input Assembler (IA) Stage
- Vertex Shader (VS) Stage
- Hull Shader (HS) Stage
- Tessellator (TS) Stage
- Domain Shader (DS) Stage
- Geometry Shader (GS) Stage
- Stream Output (SO) Stage
- Rasterizer Stage (RS)
- Pixel Shader (PS) Stage
- Output Merger (OM) Stage*

##Overview of how Direct3D 12 works##
*- The Device
- Pipeline State Objects
- Command Lists
- Bundles
- Command Queues
- Command Allocators
- Resources
- Descriptors (Resource Views)
- Descriptor Tables
- Descriptor Heaps
- Root Signatures
- Resource Barriers
- Fences and Fence Events
- Overview of Application Flow Control for Direct3D 12
- Multithreading in Direct3D 12*

##Initializing Direct3D 12##
*- Creating a device
- Creating a command queue
- Creating a swap chain
- Creating a descriptor heap
- Creating a command allocator
- Creating a root signature
- Compiling and Creating shader bytecode
- Creating a pipeline state object
- Creating a command list
- Creating a fence and fence event*

####Overview of the Graphics Pipeline####
*The Graphics Pipeline* is a sequence of processes, called *Stages*, that run on graphics hardware. We push data into the pipeline, which runs the data through these stages to get a final 2D image representing the 3D scene. We are also able to use the graphics pipeline to stream out processed geometry from the Stream Output stage. Some of the pipeline stages can only be configured (*Fixed Function*), while others can be programmed (*Programmable*). The stages that can be programmed are called *Shaders*, and they are programmed in the *.[https://msdn.microsoft.com/en-us/library/windows/desktop/bb509561(v=vs.85).aspx][High Level Shading Language (HLSL)]*.
+[http://www.braynzarsoft.net/image/100202][Direct3D Graphics Pipeline]

The shaders in the graphics pipeline are:
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb205146(v=vs.85).aspx][Vertex Shader]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/ff476340(v=vs.85).aspx][Hull Shader]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/ff476340(v=vs.85).aspx][Domain Shader]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb205146(v=vs.85).aspx][Geometry Shader]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb205146(v=vs.85).aspx][Pixel Shader]

##The Compute Shader##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/ff476331(v=vs.85).aspx][MSDN Compute Shader]
The compute shader (a.k.a. Dispatch Pipeline) is used to do extremely fast computations by using the GPU as a sort of parallel processor alongside the CPU. This does not have to have anything to do with graphics. For example, you could do very expensive operations, such as accurate collision detection, on the GPU using the compute shader pipeline. The compute shader will not be discussed in this lesson.

##Input Assembler (IA) Stage##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/bb205116(v=vs.85).aspx][MSDN Input Assembler Stage]
The first stage of the graphics pipeline is called the Input Assembler (IA) Stage. This is a fixed function stage, which means we do not do the programming to implement it. Instead, we instruct the device to configure the IA so that it knows how to create geometric primitives like triangles, lines or points from the data we give it in the form of buffers containing vertex and index data. We provide an *Input Layout* to the IA so that it knows how to read the vertex data. After it assembles the data into primitives, it feeds those primitives to the rest of the pipeline. The IA stage also has another function.
As it's putting together the primitives, it attaches system-generated values to the primitives (primitive id, instance id, vertex id, etc). These values are identified by strings called *.[https://msdn.microsoft.com/en-us/library/windows/desktop/bb509647(v=vs.85).aspx][Semantics]*. An example of an input layout we might provide the IA could look like this:

D3D12_INPUT_ELEMENT_DESC layout[] = {
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
};

This input layout tells the IA that each vertex in the vertex buffer has one element, which should be bound to the "POSITION" parameter in the vertex shader. It also says that this element starts at the first byte of the vertex (the fifth member, the aligned byte offset, being 0), and contains 3 floats, each being 32 bits, or 4 bytes (third member, DXGI_FORMAT_R32G32B32_FLOAT). We will talk more about the Input Layout in a later tutorial.

##Vertex Shader (VS) Stage##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/bb205146(v=vs.85).aspx#Vertex_Shader_Stage][MSDN Vertex Shader Stage]
The VS is the first shader (programmable) stage, which means we have to program it ourselves. The VS Stage is what ALL the vertices go through after the primitives have been assembled in the IA. Every vertex drawn will be put through the VS. With the VS, you are able to do things like transformation, scaling, lighting, displacement mapping and things like that. The Vertex Shader must always be implemented for the pipeline to work, even if the vertices in the program do not need to be modified. The simplest vertex shader would just pass the vertex position on to the next stage:

float4 main(float4 pos : POSITION) : SV_POSITION { return pos; }

This vertex shader simply returns the input position. Notice the POSITION right after the pos in the VS parameters. This is an example of a *Semantic*.
When we create our vertex (input) layout, we specify POSITION for the position values of our vertex, so they will be sent to this parameter in the VS. You can change the name from POSITION if you want, as long as the input layout and the shader use the same name.

##Hull Shader (HS) Stage##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/ff476340(v=vs.85).aspx][MSDN Tessellation Stages]
The HS stage is the first of three optional stages, called the *.[https://msdn.microsoft.com/en-us/library/windows/desktop/ff476340(v=vs.85).aspx][Tessellation Stages]*. The *Tessellation Stages* include the Hull Shader, Tessellator, and Domain Shader stages. They all work together to implement something called tessellation. What tessellation does is take a primitive, such as a triangle or line, and divide it up into many smaller sections to increase the detail of models, and it does this extremely fast. It creates all these new primitives on the GPU before they are put onto the screen, and they are not saved to memory, so this saves a lot of time compared to creating them on the CPU, where they would need to be stored in memory. You can take a simple low-poly model and turn it into a very highly detailed model using tessellation.

So, back to the Hull Shader. This is another programmable stage. I'm not going to go into detail, but what this stage does is calculate how and where to add new vertices to a primitive to make it more detailed. It then sends this data to the Tessellator Stage and the Domain Shader Stage.

##Tessellator (TS) Stage##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/ff476340(v=vs.85).aspx][MSDN Tessellation Stages]
The tessellator stage is the second stage in the tessellation process. This is a Fixed Function stage. What this stage does is take the input from the Hull Shader, and actually do the dividing of the primitive. It then passes the data out to the Domain Shader.
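To make the "dividing" idea a bit more concrete, here is a minimal CPU-side sketch in plain C++ that splits one triangle into four smaller ones at the edge midpoints. This is only an illustration and not how Direct3D does it: the real work happens on the GPU in the tessellation stages, and the names `Vec3`, `Triangle`, `Midpoint`, `SubdivideOnce` and `Subdivide` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper types for this sketch; they are not Direct3D types.
struct Vec3 { float x, y, z; };
struct Triangle { Vec3 a, b, c; };

// Point halfway between p and q.
Vec3 Midpoint(const Vec3& p, const Vec3& q) {
    return Vec3{ (p.x + q.x) * 0.5f, (p.y + q.y) * 0.5f, (p.z + q.z) * 0.5f };
}

// Splits one triangle into four by connecting the midpoints of its edges.
// The winding order of the input is preserved in every output triangle.
std::vector<Triangle> SubdivideOnce(const Triangle& t) {
    const Vec3 ab = Midpoint(t.a, t.b);
    const Vec3 bc = Midpoint(t.b, t.c);
    const Vec3 ca = Midpoint(t.c, t.a);
    return {
        Triangle{ t.a, ab, ca },
        Triangle{ ab, t.b, bc },
        Triangle{ ca, bc, t.c },
        Triangle{ ab, bc, ca },
    };
}

// Applies SubdivideOnce to every triangle, `levels` times.
std::vector<Triangle> Subdivide(std::vector<Triangle> tris, int levels) {
    for (int i = 0; i < levels; ++i) {
        std::vector<Triangle> next;
        next.reserve(tris.size() * 4);
        for (const Triangle& t : tris) {
            const std::vector<Triangle> parts = SubdivideOnce(t);
            next.insert(next.end(), parts.begin(), parts.end());
        }
        tris = next;
    }
    return tris;
}
```

Each level multiplies the triangle count by four, so two levels turn one triangle into 16. Note that every new vertex still lies in the plane of the original triangle, so dividing alone does not change the shape of the model.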
##Domain Shader (DS) Stage##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/ff476340(v=vs.85).aspx][MSDN Tessellation Stages]
This is the third of three stages in the tessellation process. This is a programmable stage. What this stage does is take the positions of the new vertices from the Hull Shader Stage, and transform the vertices received from the Tessellator Stage to create more detail, since just adding more vertices in the center of a triangle or line would not increase the detail in any way. Then it passes the vertices to the Geometry Shader Stage.

##Geometry Shader (GS) Stage##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/bb205146(v=vs.85).aspx#Geometry_Shader_Stage][MSDN Geometry Shader Stage]
This is another optional shader stage. It's also another programmable stage. It accepts primitives as input, such as 3 vertices for triangles, 2 for lines, and one for a point. It can also take data from edge-adjacent primitives as input, like an additional 2 vertices for a line, or an additional 3 for a triangle. An advantage of the GS is that it can create or destroy primitives, where the VS cannot (it takes in one vertex, and outputs one). We could turn one point into a quad or a triangle with this stage, which makes it perfect for use in a particle engine, for example. We are able to pass data from the GS to the rasterizer stage, and/or through the Stream Output to a vertex buffer in memory. We'll learn more about this shader stage in a later tutorial.

##Stream Output (SO) Stage##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/bb205121(v=vs.85).aspx][MSDN Stream Output Stage]
This stage is used to obtain vertex data from the pipeline, specifically from the Geometry Shader Stage, or from the Vertex Shader Stage if there is no GS. Vertex data sent to memory from the SO is put into one or more vertex buffers. Vertex data output from the SO is always sent out as lists, such as line lists or triangle lists.
Incomplete primitives are NEVER sent out; they are just silently discarded, like in the vertex and geometry stages. Incomplete primitives are primitives such as triangles with only 2 vertices or a line with only one vertex.

##Rasterizer Stage (RS)##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/bb205125(v=vs.85).aspx][MSDN Rasterizer Stage]
The RS takes the vector information (shapes and primitives) sent to it and turns it into pixels by interpolating per-vertex values across each primitive. It also handles clipping, which is basically cutting away the parts of primitives that are outside the view of the screen. This is decided by what we call the *Viewport*, which we can set in code.

##Pixel Shader (PS) Stage##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/bb205146(v=vs.85).aspx#Pixel_Shader_Stage][MSDN Pixel Shader Stage]
This stage does calculations on and modifies each pixel that will be seen on the screen, such as lighting on a per-pixel basis. It is another programmable shader, and an optional stage. The RS invokes the pixel shader once for each pixel in a primitive. Like we said before, the values and attributes of each vertex in a primitive are interpolated across the entire primitive in the RS. Basically it's like the vertex shader: where the vertex shader has a 1:1 mapping (it takes in one vertex and returns one vertex), the pixel shader also has a 1:1 mapping (it takes in one pixel and returns one pixel). The job of the pixel shader is to calculate the final color of each pixel fragment. A pixel fragment is each potential pixel that will be drawn to the screen. For example, say there is a solid square behind a solid circle. The pixels in the square are pixel fragments and the pixels in the circle are pixel fragments.
Each has a chance to be written to the screen, but once it gets to the Output Merger Stage, which decides the final pixel to be drawn to the screen, it will see the depth value of the circle is less than the depth value of the square, so only the pixels from the circle will be drawn. The PS outputs a 4D color value. An example of a simple Pixel Shader might look like this:

float4 main() : SV_TARGET { return float4(1.0f, 1.0f, 1.0f, 1.0f); }

This pixel shader sets every pixel covered by geometry to white. In other words, with this pixel shader, any geometry drawn will be completely white.

##Output Merger (OM) Stage##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/bb205120(v=vs.85).aspx][MSDN Output Merger Stage]
The final stage in the pipeline is the Output Merger Stage. Basically this stage takes the pixel fragments and depth/stencil buffers and determines which pixels are actually written to the render target. It also applies blending based on the blend model and blend factor we set. The render target is a Texture2D resource which we bind to the OM using a command list. Once the scene has finished rendering onto the render target, we can call present on the swap chain to display the results!

####Overview of how Direct3D 12 works####
This is just an overview of Direct3D 12. Later tutorials will get into more depth.

+[http://www.braynzarsoft.net/image/100204][Overview of Direct3D 12]

##Pipeline State Objects (PSO)##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899196(v=vs.85).aspx][MSDN Pipeline States]
*Pipeline State Objects* are represented by the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788705(v=vs.85).aspx][ID3D12PipelineState] interface, and created with the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788663(v=vs.85).aspx][CreateGraphicsPipelineState()] method of the device interface.
To set a *Pipeline State Object*, you can call the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903918(v=vs.85).aspx][SetPipelineState()] method of a *Command List*. This interface is part of what makes Direct3D 12 perform so well. During initialization time you will create many of these *Pipeline State Objects*, then setting them with a *Command List* takes very little CPU overhead, since the pipeline state object is already created by the time it is set, and setting it on the GPU is as simple as passing a pointer. There is no limit to how many of these you can create. When creating a *Pipeline State Object*, you must fill out a .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770370(v=vs.85).aspx][D3D12_GRAPHICS_PIPELINE_STATE_DESC] structure. This structure will determine the state of the pipeline when the *Pipeline State Object* is set. Most of the pipeline states can be set in the *Pipeline State Object*, but there are a couple that cannot be set in the *Pipeline State Object*, and are instead set by a *Command List*. 
**States that can be set in the pipeline state object**
- Shader bytecode for vertex, pixel, domain, hull, and geometry shaders (.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770405(v=vs.85).aspx][D3D12_SHADER_BYTECODE])
- The stream output buffer (.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770410(v=vs.85).aspx][D3D12_STREAM_OUTPUT_DESC])
- The blend state (.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770339(v=vs.85).aspx][D3D12_BLEND_DESC])
- The rasterizer state (.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770387(v=vs.85).aspx][D3D12_RASTERIZER_DESC])
- The depth/stencil state (.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770356(v=vs.85).aspx][D3D12_DEPTH_STENCIL_DESC])
- The input layout (.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770378(v=vs.85).aspx][D3D12_INPUT_LAYOUT_DESC])
- The primitive topology (.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770385(v=vs.85).aspx][D3D12_PRIMITIVE_TOPOLOGY_TYPE])
- The number of render targets bound at once (this is separate from the number of swap-chain back buffers: in this tutorial we have 2 back buffers for double buffering, but you could use 3 for triple buffering. Swap chains have a limit of 3 queued frames before DXGI will start blocking in Present())
- Render Target View formats (.[https://msdn.microsoft.com/en-us/library/windows/desktop/bb173059(v=vs.85).aspx][DXGI_FORMAT])
- Depth Stencil View format (.[https://msdn.microsoft.com/en-us/library/windows/desktop/bb173059(v=vs.85).aspx][DXGI_FORMAT])
- Sample description (.[https://msdn.microsoft.com/en-us/library/windows/desktop/bb173072(v=vs.85).aspx][DXGI_SAMPLE_DESC])

**States that are set by the Command List**
- Resource Bindings (includes .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn986883(v=vs.85).aspx][vertex buffers], .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn986882(v=vs.85).aspx][index buffers], .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn986886(v=vs.85).aspx][stream output targets], .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn986884(v=vs.85).aspx][render targets], .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903908(v=vs.85).aspx][descriptor heaps], and graphics root arguments)
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903900(v=vs.85).aspx][Viewports]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903899(v=vs.85).aspx][Scissor Rectangles]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903886(v=vs.85).aspx][Blend factor]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903887(v=vs.85).aspx][Depth/Stencil reference value]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903885(v=vs.85).aspx][Primitive topology order and adjacency type]

Pipeline states that were set by a *Pipeline State Object* are not inherited by *Command Lists* (when a *Command Queue* executes more than one *Command List* at a time, pipeline states set by *Pipeline State Objects* in a previous command list are not inherited by the next *Command List* in the queue) or by *Bundles* (pipeline states set by *Pipeline State Objects*
from the calling *Command List* are not inherited by *Bundles*). The initial graphics pipeline state for both *Command Lists* and *Bundles* is set at creation time of the *Command List* or *Bundle*. Pipeline states that were not set by a *Pipeline State Object* are also not inherited by *Command Lists*. *Bundles*, on the other hand, inherit all graphics pipeline states that are not set with a *Pipeline State Object*. When a *Bundle* changes the pipeline state through a method call, that state persists back to the *Command List* after the *Bundle* has finished executing.

The default graphics pipeline states not set by *Pipeline State Objects* for *Command Lists* and *Bundles* are:
- The Primitive Topology is set to .[https://msdn.microsoft.com/en-us/library/windows/desktop/ff728726(v=vs.85).aspx#D3D_PRIMITIVE_TOPOLOGY_UNDEFINED][D3D_PRIMITIVE_TOPOLOGY_UNDEFINED]
- Viewports are set to all zeros
- Scissor Rectangles are set to all zeros
- Blend factor is set to zeros
- Depth/Stencil reference value is set to zeros
- Predication is disabled

You can set the pipeline state for a *Command List* back to defaults by calling the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903847(v=vs.85).aspx][ClearState] method. If you call this method on a *Bundle*, the call to the command list's Close() function will return *E_FAIL*.

Resource bindings that are set by *Command Lists* are inherited by *Bundles* that the command list executes. Resource bindings that are set by *Bundles* also stay set for the calling command list when the bundle finishes executing.

##The Device##
The device is represented by the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788650(v=vs.85).aspx][ID3D12Device] interface. The device is a virtual adapter which we use to create command lists, pipeline state objects, root signatures, command allocators, command queues, fences, resources, descriptors and descriptor heaps.
Computers may have more than one GPU, so we can use a DXGI factory to enumerate the adapters and find the first one that supports feature level 11 (compatible with Direct3D 12) and is not a software device. One of Direct3D 12's biggest features is that it's a lot more compatible with multi-threaded applications. In this tutorial we will only create one device, for the first adapter we find that is compatible with Direct3D 12, but we could actually find all compatible adapters and use them all. Once we find the adapter index we want to use, we create a device by calling .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770336(v=vs.85).aspx][D3D12CreateDevice()].

##Command Lists (CL)##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899205(v=vs.85).aspx][MSDN Command Lists and Bundles]

+[http://www.braynzarsoft.net/image/100206][Command Lists]

Command lists are represented by the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770465(v=vs.85).aspx][ID3D12CommandList] interface, and created with the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788656(v=vs.85).aspx][CreateCommandList()] method of the device interface. We use *Command Lists* to record commands we want to execute on the GPU. Commands may include setting the pipeline state, setting resources, transitioning resource states (*Resource Barriers*), setting the vertex/index buffer, drawing, clearing the render target, setting the render target view, executing *bundles* (groups of commands), etc. Command lists are associated with a *Command Allocator*, which stores the commands on the GPU. When we first create a command list, we need to specify what kind of command list it is using a .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770348(v=vs.85).aspx][D3D12_COMMAND_LIST_TYPE] flag, and provide a command allocator the list is associated with. There are 4 types of command lists: direct, bundle, compute, and copy.
We talk about direct and bundle command lists in this tutorial. A direct command list is a command list that the GPU can execute. Direct command lists need to be associated with a direct command allocator (a command allocator created with the D3D12_COMMAND_LIST_TYPE_DIRECT flag). To set a *Command List* to the recording state, we call the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903895(v=vs.85).aspx][Reset()] method of the command list, providing a *Command Allocator* and a *Pipeline State Object*. Passing *NULL* as an argument for the *Pipeline State Object* is valid, and will set a default pipeline state. When we finish populating the command list, we must call the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903855(v=vs.85).aspx][Close()] method to take the command list out of the recording state. After we call Close() we are able to use a *Command Queue* to execute the command list. As soon as we execute a command list, we are able to reset it, even if the GPU is not finished with it (the commands running on the GPU are stored by the *Command Allocator* once we call execute). This allows us to reuse the memory allocated to the command list (on the CPU side, not the GPU side where commands are stored in memory by the *Command Allocator*). We will do this in this tutorial, and in the multithreading section I will explain this a little better.
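The recording rules above can be sketched with a small toy model in plain C++. This is not the Direct3D 12 API: `ToyCommandList` and `ToyCommandQueue` are made-up classes for illustration, the toy list starts out closed as a simplifying assumption, and the queue simply copies the commands it is given to stand in for the storage that outlives a reset of the list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a command list. This is NOT the D3D12 interface; it only
// models the "recording" / "not recording" states described above.
class ToyCommandList {
public:
    // Like Reset(): enter the recording state. Fails if already recording.
    bool Reset() {
        if (recording_) return false;
        commands_.clear();
        recording_ = true;
        return true;
    }
    // Adds a command; only allowed while recording.
    bool Record(const std::string& command) {
        if (!recording_) return false;
        commands_.push_back(command);
        return true;
    }
    // Like Close(): leave the recording state.
    bool Close() {
        if (!recording_) return false;
        recording_ = false;
        return true;
    }
    bool IsRecording() const { return recording_; }
    const std::vector<std::string>& Commands() const { return commands_; }

private:
    bool recording_ = false;  // assumption: this toy list starts out closed
    std::vector<std::string> commands_;
};

// Toy stand-in for a command queue. It copies the commands on submission,
// which plays the role of the storage that outlives the list's Reset().
class ToyCommandQueue {
public:
    // Accepts only closed lists, mirroring "call Close() before executing".
    bool Execute(const ToyCommandList& list) {
        if (list.IsRecording()) return false;
        submitted_.push_back(list.Commands());
        return true;
    }
    std::size_t SubmittedListCount() const { return submitted_.size(); }
    const std::vector<std::string>& SubmittedList(std::size_t i) const {
        return submitted_.at(i);
    }

private:
    std::vector<std::vector<std::string>> submitted_;
};
```

In this model, executing a list that is still recording is rejected, and resetting the list right after a successful execute does not disturb what was already submitted.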
##Bundles##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899205(v=vs.85).aspx][MSDN Command Lists and Bundles]
*Bundles* are represented by the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770465(v=vs.85).aspx][ID3D12CommandList] interface, the same as *Direct Command Lists*. The only difference is that when creating a bundle (by calling the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788656(v=vs.85).aspx][CreateCommandList()] method), you create it with the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770348(v=vs.85).aspx][D3D12_COMMAND_LIST_TYPE_BUNDLE] flag, rather than the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770348(v=vs.85).aspx][D3D12_COMMAND_LIST_TYPE_DIRECT] flag. Bundles are a group of commands that are reused frequently. They are useful because most of the CPU work involved with the group of commands is done at bundle creation time. For the most part, *Bundles* are the same thing as *Command Lists*, except they can only be executed by a *Direct Command List*, while a *Direct Command List* can only be executed by a *Command Queue*.

Command Lists *can* be reused, but the GPU must be finished executing that command list before calling execute on that command list again. In practice, it is pretty unlikely you will reuse a command list, as the scene will change from frame to frame, which means the command list will change from frame to frame.

Nvidia has a nice article on Direct3D best practices (.[https://developer.nvidia.com/dx12-dos-and-donts][DX12 Do's And Don'ts]), and relaying their suggestion, a bundle should only have up to around 12 commands. If you add too many commands, the reusability of the bundle takes a hit, meaning you will not be able to reuse it as often. It's better to create many small bundles you can reuse often, rather than a couple of big bundles that you can't reuse often, as the whole point of bundles is to have reusable groups of commands.
*Bundles* cannot be executed directly from a *Command Queue*. You can execute a bundle on a *Command List* by calling .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903882(v=vs.85).aspx][ExecuteBundle()] from a *Direct Command List*. *Bundles* do not inherit the pipeline state set by a *Pipeline State Object* on the calling *Direct Command List*.

##Command Queues (CQ)##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899124(v=vs.85).aspx][MSDN Command Queues]

+[http://www.braynzarsoft.net/image/100207][Command Queues]

*Command Queues* are represented by the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788627(v=vs.85).aspx][ID3D12CommandQueue] interface, and created with the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788657(v=vs.85).aspx][CreateCommandQueue()] method of the device interface. We use the command queue to submit *Command Lists* to be executed by the GPU. *Command Queues* are also used to update resource tile mappings.

##Command Allocators (CA)##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899205(v=vs.85).aspx][MSDN Command Allocators]
*Command Allocators* are represented by the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770463(v=vs.85).aspx][ID3D12CommandAllocator] interface, and created with the .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788655(v=vs.85).aspx][CreateCommandAllocator()] method of the device interface. *Command Allocators* represent the GPU memory that commands from *Command Lists* and *Bundles* are stored in. Once a *Command List* has finished executing, you may call .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770464(v=vs.85).aspx][Reset()] on the *Command Allocator* to free memory.
Although *Reset()* may be called on a *Command List* immediately after a *Command Queue* calls execute with it, the *Command List* associated with the *Command Allocator* **must** be completely finished executing on the GPU before we call *Reset()* on the *Command Allocator*, otherwise the call will fail. This is because the GPU may be executing commands stored in the memory represented by the *Command Allocator*. This is where our application must use *Fences* to synchronize the CPU and GPU. Before we call *Reset()* on a *Command Allocator*, we must check the fence to make sure the *Command List* associated with the *Command Allocator* has finished executing.

Only one *Command List* associated with a *Command Allocator* can be in the recording state at any time. This means that for each thread populating command lists, you will want *at least* one *Command Allocator* and *at least* one *Command List*.

##Resources##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899206(v=vs.85).aspx][MSDN Resource Binding]
Resources contain the data used to build your scene. They are chunks of memory that store geometry, textures, and shader data, where the graphics pipeline can access them. **Resource Types** are the type of data that the resource contains.
**Resource Types**
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/ff471517(v=vs.85).aspx][Texture1D]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/ff471518(v=vs.85).aspx][Texture1DArray]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/ff471525(v=vs.85).aspx][Texture2D]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/ff471526(v=vs.85).aspx][Texture2DArray]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/ff471540(v=vs.85).aspx][Texture2DMS]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/ff471541(v=vs.85).aspx][Texture2DMSArray]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/ff471562(v=vs.85).aspx][Texture3D]
- .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788709(v=vs.85).aspx][Buffers (ID3D12Resource)]

**Resource References/Views**
- Constant buffer view (CBV)
- Unordered access view (UAV)
- Shader resource view (SRV)
- Samplers
- Render Target View (RTV)
- Depth Stencil View (DSV)
- Index Buffer View (IBV)
- Vertex Buffer View (VBV)
- Stream Output View (SOV)

##Descriptors (Resource Views)##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899109(v=vs.85).aspx][MSDN Descriptors]

+[http://www.braynzarsoft.net/image/100208][Descriptors]

*Descriptors* are a structure which tells shaders where to find the resource, and how to interpret the data in the resource. You can look at descriptors in D3D12 as you looked at resource views in D3D11. You might create multiple descriptors for the same resource because different stages of the pipeline may use it differently. For example, say we create a Texture2D resource. We create a Render Target View (RTV) so that we can use that resource as the output buffer of the pipeline (bind the resource to the Output Merger (OM) stage as the RTV).
We can also create a Shader Resource View (SRV) for that same resource, which we can use to texture our geometry (you might do this, for example, if there is a security camera in the scene somewhere: we render the scene the camera sees onto the resource (RTV), then we render that resource (SRV) onto a TV in a security room). Descriptors can ONLY be placed in *Descriptor Heaps*. There is no other way to store descriptors in memory (except for some root descriptors, which can only be CBVs, and raw or structured UAV or SRV buffers. Complex types like a Texture2D SRV cannot be used as a root descriptor).

##Descriptor Tables (DT)##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899113(v=vs.85).aspx][MSDN Descriptor Tables]

+[http://www.braynzarsoft.net/image/100209][Descriptor tables]

*Descriptor Tables* are an array of descriptors inside a descriptor heap. All a descriptor table is, is an offset and length into a descriptor heap. Shaders can access descriptors in a descriptor heap through the *Root Signature's* descriptor tables by index. So to access a descriptor in a shader, you will index into the root signature's descriptor tables. CBVs, UAVs, SRVs and Samplers are stored in descriptor heaps and can be referenced by shaders through descriptor tables. RTVs, DSVs, IBVs, VBVs and SOVs are not referenced through descriptor tables, but are instead bound directly to the pipeline. The MSDN docs are a little bit confusing on a part of this. MSDN says that these are not stored in descriptor heaps, but that's not completely true for RTVs and DSVs, since you need to create a descriptor heap and descriptors for them. IBVs, VBVs and SOVs, on the other hand, are small structures that are passed directly to command list methods.
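The "offset and length" idea can be sketched with a toy model in plain C++. Nothing here is the Direct3D 12 API: `ToyDescriptor`, `ToyDescriptorHeap`, `ToyDescriptorTable` and `Lookup` are hypothetical names, and a real heap lives in driver-managed memory rather than in a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy descriptor: which kind of view it is and which resource it points at.
struct ToyDescriptor {
    std::string kind;      // e.g. "CBV", "SRV", "UAV"
    std::string resource;  // hypothetical resource name
};

// Toy descriptor heap: just a contiguous array of descriptors.
using ToyDescriptorHeap = std::vector<ToyDescriptor>;

// A descriptor table is only an offset and a length into a heap.
struct ToyDescriptorTable {
    std::size_t offset;
    std::size_t count;
};

// Looks up entry `index` of `table`. Returns nullptr if the index falls
// outside the table or the table runs past the end of the heap.
const ToyDescriptor* Lookup(const ToyDescriptorHeap& heap,
                            const ToyDescriptorTable& table,
                            std::size_t index) {
    if (index >= table.count) return nullptr;
    if (table.offset >= heap.size()) return nullptr;
    if (index >= heap.size() - table.offset) return nullptr;
    return &heap[table.offset + index];
}
```

With a heap holding four descriptors, a table of `{1, 2}` covers the second and third entries, so index 0 of that table is heap slot 1. The table itself stores no descriptors, which is why several tables can overlap or share the same heap.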
##Descriptor Heaps (DH)##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899110(v=vs.85).aspx][MSDN Descriptor Heaps]

*Descriptor Heaps* are represented by the interface ****ID3D12DescriptorHeap**** and are created with the method ****ID3D12Device::CreateDescriptorHeap()****.

Descriptor Heaps are a list of descriptors. They are a chunk of memory where the descriptors are stored.

Samplers cannot go into the same descriptor heaps as resources.

Descriptor heaps can also be **Shader Visible** or **Non-Shader Visible**. Shader Visible descriptor heaps are heaps that contain descriptors that shaders can access. These types of heaps may include CBV, UAV, SRV, and Sampler descriptors. Non-Shader Visible descriptor heaps are heaps that the shaders cannot reference. These types of heaps include RTV, DSV, IBV, VBV, and SOV resource types.

A normal app might have three descriptor heaps, one for samplers, one for shader visible resources, and one for non shader visible resources. This tutorial will only have one descriptor heap, which stores the descriptors for the render target views. The next tutorial will have two, one for the render target views, and one for the vertex buffer view (we will be drawing a triangle in the next tutorial).

Only one shader visible heap and one sampler heap can be bound to the pipeline at any given time. You want the descriptor heap to have the correct descriptors for the largest amount of time possible (according to MSDN, "ideally an entire frame of rendering or more"). The descriptor heap must have enough space to define descriptor tables on the fly for every set of state needed. To do this, you can reuse descriptor space when the state of the pipeline changes (For example, you are rendering a tree, and you have an SRV of the tree bark texture in the descriptor heap while you are drawing the base of the tree. The pipeline state changes when you need to draw the leaves of the tree, so you reuse that descriptor space by replacing the SRV of the tree bark texture with an SRV of the leaf texture).

D3D12 allows you to change the descriptor heap multiple times in a command list. This is useful because older and low power GPUs have only 65k of storage space for descriptor heaps. Changing a descriptor heap causes the GPU to "flush" the current descriptor heap, which is an expensive operation, so you want to do this as infrequently as possible, and while the GPU is not doing a lot of work, such as at the beginning of a command list.

Bundles are only allowed to call SetDescriptorHeaps once, and the descriptor heap that is being set by this command MUST exactly match the descriptor heap that the command list which called the bundle has set.

There are a couple of ways to manage descriptor heaps. Here are two (these are mentioned in the MSDN docs):

**Basic Method** (inefficient, but very easy to implement)
The first and most basic way is, right before the draw call, to add all the descriptors you need for the draw to the descriptor heap, then set a descriptor table in the root signature to point to the new descriptors. This way is nice because there is no need to keep track of all the descriptors in the descriptor heap. However, since we are adding all the descriptors needed for a draw call to the heap, we will have a lot of repeating descriptors in the heap, making this method quite inefficient, especially if similar objects or scenes are being rendered. The reason we must add new descriptors to free space in the descriptor heap, rather than overwriting descriptors that are already in the descriptor heap from previous draws, is that the GPU can actually do multiple draw calls at the same time, meaning that the descriptors already in the descriptor heap may be in use by the time we start overwriting them for the current draw.
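The basic method can be sketched as an append-only allocator. The following is a CPU-only model in plain C++, not Direct3D 12 code: `LinearDescriptorHeapModel`, `appendForDraw` and `basicMethodDemo` are hypothetical names made up for this example, and each descriptor is just a string. It shows the two properties described above: earlier slots are never overwritten, and identical descriptors end up stored more than once.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a descriptor heap managed with the "basic method".
struct LinearDescriptorHeapModel {
    std::size_t capacity;           // total descriptor slots in the heap
    std::vector<std::string> slots; // used slots; a string stands in for a descriptor

    explicit LinearDescriptorHeapModel(std::size_t cap) : capacity(cap) {}

    // Append every descriptor one draw call needs into free space and report
    // where that draw's descriptor table starts. Earlier slots are never
    // overwritten, because an earlier draw may still be reading them.
    // Returns false (and leaves tableOffset alone) when the heap is full.
    bool appendForDraw(const std::vector<std::string>& needed, std::size_t& tableOffset) {
        if (slots.size() + needed.size() > capacity) {
            return false;
        }
        tableOffset = slots.size();
        slots.insert(slots.end(), needed.begin(), needed.end());
        return true;
    }
};

// Two draws of the same tree: trunk first, then leaves.
// Returns the number of heap slots used afterwards.
inline std::size_t basicMethodDemo() {
    LinearDescriptorHeapModel heap(8);
    std::size_t trunkTable = 0;
    std::size_t leavesTable = 0;

    bool ok = heap.appendForDraw({"CBV tree", "SRV bark"}, trunkTable);
    ok = ok && heap.appendForDraw({"CBV tree", "SRV leaves"}, leavesTable);

    assert(ok);
    assert(trunkTable == 0);  // first draw starts at the beginning of the heap
    assert(leavesTable == 2); // second draw starts right after the first
    return heap.slots.size(); // "CBV tree" is now stored twice
}
```

The duplicated "CBV tree" entry is the inefficiency mentioned above, and the `false` return is the point at which a real application would have to switch to a second heap or wait for the GPU.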
With this method, you may want an additional descriptor heap for one or both of the following reasons: the scenes are large and complex, and you run out of descriptor space; or there may be synchronization issues, so you have one descriptor heap that the GPU reads from, and another for the CPU to fill out while the GPU is executing a command list, and you swap these two heaps every frame.

**Second Method** (much more efficient, more difficult to implement)
Another method is to keep track of the index of each descriptor in a descriptor heap. This way is efficient because you can reuse descriptors for similar objects and scenes, so there will be very little to no repetition of descriptors in the descriptor heap. The downside to this method is that it is a bit more complex to implement.

If you have a small enough scene, and you don't have resources that change throughout the scene, you could actually create one giant descriptor table and just flush the thing when the scene ends and you need to load new resources. This would work if the only things that change throughout the scene are root constants and root descriptors. The descriptor table (defined in the root signature) would stay the same throughout the scene.

A couple of other optimizations are possible. One is to have two descriptor tables in your descriptor heap: one descriptor table has resources that do not change throughout your scene, while the other has resources that change frequently. Another is to make sure root constants and root descriptors contain the constants and descriptors that change most frequently.

##Root Signatures (RS)##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899208(v=vs.85).aspx][MSDN Root Signatures]

+[http://www.braynzarsoft.net/image/100210][Root signatures]

*Root Signatures* define the data (resources) that shaders access.
Root signatures are like a parameter list for a function, where the function is the shaders, and the parameter list is the type of data the shaders access. Root signatures contain **Root Constants**, **Root Descriptors**, and **Descriptor Tables**. A **Root Parameter** is one entry in the root signature, being either a root constant, root descriptor, or descriptor table. The actual data of root parameters, which the application can change, are called **Root Arguments**.

The maximum size of a root signature is always **64 DWORDS**.

**Root Constants**
*Root Constants* are inline 32-bit values (they cost **1 DWORD** each). These values are stored directly inside the root signature. Because memory is limited for root signatures, you want to store only the most often changed constant values shaders access here. These values show up as a constant buffer to shaders. There is no cost to access these variables from shaders (no indirection), so accessing them is very fast.

**Root Descriptors**
*Root Descriptors* are inlined descriptors that are accessed most often by the shaders. These are 64-bit virtual addresses (2 DWORDs). These descriptors are limited to CBV's and raw or structured SRV's and UAV's. Complex types like Texture2D SRV's cannot be used. There is a cost of one indirection when referencing Root Descriptors from shaders. Another thing to note about Root Descriptors is that they are only a pointer to the resource. They do not include a size of the data, which means there can be no out of bounds checking when accessing resources from root descriptors, unlike descriptors stored in a descriptor heap, which do include a size, and where out of bounds checking can be done.

**Descriptor Tables**
As talked about above, *Descriptor Tables* are an offset and a length into a descriptor heap. Descriptor tables are only 32 bits (1 DWORD).
There is no limit to how many descriptors are inside a descriptor table (except indirectly the number of descriptors that can fit in the maximum allowed descriptor heap size). There is a cost of two indirections when accessing resources from a descriptor table. The first indirection is from the descriptor table pointer to the descriptor stored in the heap, the second from that descriptor to the actual resource.

##Resource Barriers##
.[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899226(v=vs.85).aspx][MSDN Resource Barriers]

*Resource Barriers* are used to change the state or usage of a resource or subresources. Direct3D 12 introduces Resource Barriers as part of its multi-threaded friendly API. Resource Barriers are used to help synchronize the use of resources between multiple threads.

There are three types of resource barriers: *Transition Barrier*, *Aliasing Barrier*, and *Unordered Access View (UAV) Barrier*.

**Transition Barrier**
A *Transition Barrier* is used when you want to transition the state of a resource or subresources from one state to another. An example of when you would change the state of a resource is when you change the resource from a render target state to a present state before flipping the swapchain.

**Aliasing Barrier**
*Aliasing Barriers* are used with .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn786477(v=vs.85).aspx][Tiled Resources]. These barriers are used to change the usages of two different resources that have mappings into the same tile pool (from MSDN). At this time, I do not have a thorough understanding of tiled resources, so I won't try to explain it here.

**Unordered Access View (UAV) Barrier**
*UAV Barriers* are used to make sure that all reads and writes to a UAV are finished before the work after the barrier begins. For example, if a UAV is being written to and then there is a draw call, the barrier makes sure that the writing to the UAV is finished before the draw call is executed.
There is no need to create a UAV barrier between two draw or dispatch calls that only read from the UAV. It is also not needed if the same UAV is written to by two different draw or dispatch calls as long as the application knows for certain that one is completely finished before the other begins. You might use a UAV Barrier on a UAV if you are drawing to a texture, then using that texture to draw on a model. The UAV Barrier will make sure that the calls that draw to the UAV are finished before you use it as a texture on a model.

##Fences and Fence Events##
Part of DirectX 12 being "closer to the metal" is the fact that we can send a command queue to the GPU to start execution, and can then immediately start work again on the CPU. To make sure we don't modify or delete content that the GPU is currently using, we use fences. Fences and Fence Events will let us know where the GPU is in its execution of the command queue.

In this app, what we will do is tell the GPU to execute the command queue, update our game logic, check/wait for the GPU to finish executing the command queue, update the pipeline by queueing more commands into the command list, then again execute the command queue.

This works by first populating the command list with commands, executing the command list on the command queue, then signaling the command queue to set the fence to a specified value once it gets that far. Later we check whether the fence value is the value that we told the command queue to set it to. If it is, we know the command list has completed its list of commands, and we can reset the command allocator and command list and repopulate the command list. If the fence value is still not what we signaled it to be, we ask the fence to trigger a fence event when it reaches that value (****ID3D12Fence::SetEventOnCompletion()****) and wait for the GPU to signal that event.

Fences are represented by the ****ID3D12Fence**** interface, while a fence event is a handle, ****HANDLE****. Fences are created by the device using the method ****ID3D12Device::CreateFence()****, and fence events are created with the ****CreateEvent()**** function.
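The signal, check, wait sequence can be modelled without a GPU. The sketch below is a single-threaded simulation in plain C++, not Direct3D 12 code: `FenceModel`, `QueueModel`, `gpuStep`, `waitForFence` and `fenceDemo` are hypothetical names invented here, and the "GPU" only makes progress when `gpuStep` is called. The `signal` member loosely plays the role of signaling the command queue, reading `completed` plays the role of checking the fence value, and the loop in `waitForFence` stands in for blocking on the fence event.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical model: a fence is a counter that only the "GPU" advances.
struct FenceModel {
    std::uint64_t completed = 0;
};

// Hypothetical model of a command queue: it remembers the fence values that
// were requested but that the "GPU" has not reached yet.
struct QueueModel {
    std::deque<std::uint64_t> pendingSignals;

    // CPU side: ask for `value` to be written to the fence once all work
    // submitted before this call has finished.
    void signal(std::uint64_t value) { pendingSignals.push_back(value); }

    // "GPU" side: finish the oldest submitted work and write its fence value.
    void gpuStep(FenceModel& fence) {
        if (!pendingSignals.empty()) {
            fence.completed = pendingSignals.front();
            pendingSignals.pop_front();
        }
    }
};

// CPU side: make sure the fence has reached `value` before reusing anything
// the GPU might still be reading. Returns how many GPU steps we waited for.
inline int waitForFence(QueueModel& queue, FenceModel& fence, std::uint64_t value) {
    int stepsWaited = 0;
    // Only wait when the fence is behind; stop if nothing is left to run.
    while (fence.completed < value && !queue.pendingSignals.empty()) {
        queue.gpuStep(fence); // stands in for blocking until the event fires
        ++stepsWaited;
    }
    return stepsWaited;
}

// Submit three frames, then wait on them in different ways.
inline int fenceDemo() {
    QueueModel queue;
    FenceModel fence;
    queue.signal(1); // after frame 1's commands
    queue.signal(2); // after frame 2's commands
    queue.signal(3); // after frame 3's commands

    int waited = 0;
    waited += waitForFence(queue, fence, 1); // GPU has not started: waits 1 step
    waited += waitForFence(queue, fence, 1); // already reached: no wait at all
    waited += waitForFence(queue, fence, 3); // frames 2 and 3 remain: 2 steps
    return waited;
}
```

The second call is the interesting one: when the fence has already passed the value, the CPU carries on without waiting, which is the reason for checking the fence value before waiting on the event.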
##Overview of Application Flow Control for Direct3D 12##
+[http://www.braynzarsoft.net/image/100211][Direct3D 12 Lifecycle]

Of course there are many ways to do things, but this is a typical outline of a Direct3D application:

1. Initialize Application
2. Start Main Loop
3. Setup Scene (if new scene)
4. Update game logic
5. Load/Release resources if needed
6. Wait for GPU
7. Reset command allocator and command list
8. Populate command list(s)
9. Wait for command list threads if multi-threaded
10. Execute command list(s)
11. Go to 3

Now to explain each phase in a little more detail.

**1. Initialize Application**
This phase may include things like:
- Loading settings from a file or database
- Making sure this is the only instance of the application
- Checking for updates
- Checking memory requirements
- Checking the license (such as trial version, demo version, or even if it were a pirated version)
- Creating a window
- Initializing the scripting engine
- Setting up the resource manager
- Setting up the audio system
- Setting up networking
- Setting up controllers
- Initializing Direct3D, which includes the following:
  - Setting up descriptor heaps (Descriptor heap manager if you've got one)
  - Setting up command lists
  - Setting up RTV's
  - Setting up command allocators
  - Setting up all pipeline state objects (you will have many of these)
  - Setting up all Root signatures (generally you'll only need one)

**2. Start Main Loop**
This is where you pretty much just start your main loop, checking for windows messages and, if there are none, continuing to update your game.

**3. Setup Scene**
This is inside the main loop of course because you may have many scenes in your game. (You can do whatever you want of course, but usually you do not have to exit the main loop when the scene changes.) This phase includes things like:
- Load in resources that are needed throughout the scene (things that won't be released until the scene is changed or the player quits). This includes textures, geometry, text, etc.
- Load in initial resources (these are resources that you will need immediately in the scene. For example, if you start inside a room, you will load in the room's texture and any models inside that room. These resources may be released once you leave the room). These may be the same as the item above if your scene is small enough to load in every resource needed in the entire scene.
- Set up the camera, along with the initial viewport, and view and projection matrices.
- Set up Command Bundles that may be needed throughout the scene

**4. Update game logic**
This is the heart of the game really. This is where you will do things like update A.I., check for input from the network or user, and update objects in the scene, such as position and animation. You know, just updating the game logic. (I'm leaving updating the other systems out for now, such as audio, A.I., network, controller, animation, etc.)

**5. Load/Release resources if needed**
This is where your resource manager comes in. If an object has entered the scene that has a texture that you do not have loaded, you can load that here. If an object has left the scene, you can also unload that here. You could put the resource manager on a separate thread if you'd like. One way you could do this is, if an object enters the scene, the game logic lets the resource manager know. The resource manager will start loading the textures needed while the main loop continues. If the texture has not been loaded by the time the object is being drawn, the resource manager provides a temporary or default texture. This is useful for games that have open worlds. The same goes for when an object leaves the scene: rather than the main loop waiting for the resource manager to release the resources before continuing, it lets the resource manager (on a separate thread) know which resources are not needed (usually by a reference count reaching zero), and continues while the resource manager releases the resources.

**6. Wait for GPU**
Most likely you will be double or triple buffering, which means you will have AT LEAST 2 - 3 command allocators. The reason for this is that command allocators cannot be reset while a command list associated with them is being executed (command lists, on the other hand, can be reset as soon as you execute them using the command queue). This means that for each frame, at this point, right before you reset the command allocator, you will check that the GPU has finished executing the command list associated with this command allocator. You will use fences and fence events for this. You will also need a fence for each frame, and a fence event for each thread. When you execute the command list on frame "f", the next step is to wait for the GPU to finish with frame "f+1". For triple buffering, it will look like this:
- wait for GPU to finish with frame 1
- render frame 1
- wait for GPU to finish with frame 2
- render frame 2
- wait for GPU to finish with frame 3
- render frame 3
- wait for GPU to finish with frame 1
- render frame 1

**7. Reset command allocator and command list**
After you have finished waiting for the GPU to finish with the command allocator you are about to use, you reset it, along with resetting the command list. You don't ALWAYS have to reset the command list if absolutely nothing has changed from the previous frame, but this is almost NEVER the case. If there are sequences of commands that you know get repeated often, you put them in a bundle, then execute the bundle on the command list.

**8. Populate command list(s)**
This includes the majority of things you want the GPU to do, such as binding resources like vertex and index buffers, textures, creating descriptors, setting pipeline state, using resource barriers, setting fence values, etc.

**9. Wait for command list threads if multi-threaded**
If you have a multi-threaded application, you may want to populate command lists on separate threads.
Only one thread can access a command list at a time, so each thread will need its own command list, along with its own fences, fence event, and command allocators. You can execute an array of command lists with the command queue, so the main thread will wait until the command list threads have finished populating their command lists. It will then put the command lists in an array if they are not already, and execute them with the command queue. Multi-threading is discussed below.

**10. Execute Command Lists**
This is where you call ****ExecuteCommandLists()**** on the command queue to render your scene.

##Multithreading in Direct3D 12##
I felt I needed to have a quick word on the structure of an application that takes advantage of multi-threading, as that's where the real power of Direct3D 12 comes from.

It's actually pretty simple how it works out. It goes kind of like this:

1. Initialize Application (including d3d and everything else)
2. Start Main Loop
3. Update Game Logic
4. Spawn multiple threads
5. Each thread waits for GPU to finish executing previous frame's command list
6. Each thread resets previous frame's command allocator
7. Each thread resets its command list
8. Each thread fills out its command list
9. Main thread waits for command list threads to finish filling out their command lists
10. Execute command queue with an array of the finished command lists
11. Go to 3.

*Each thread gets its own command list*
This is where it gets interesting for a multi-threaded application. You need to somehow logically split your scene so that each thread can populate a command list for part of the scene. There are a couple of ways you can do this, but keep in mind that when executing the command lists, they are executed in the order that you provide them in the array. One thing you always want to do to get the most performance out of your application is group entities by pipeline state.
So if two objects in the scene use a specific PSO, you would want to try to draw them together, so that you only need to change the PSO once for them. If they are not drawn together, you may have to change the PSO twice, once for each object. This is not always the best way to group commands though. Due to certain things like transparency, you will almost always need to draw your scene from far away to close to the camera. If those two objects sharing the same PSO were windows, and one window was far away, and one was in front of the camera, and you were to group those two objects together because they shared the same PSO, the scene would be rendered wrong. If you drew them before anything else, nothing would appear behind the front window. The first thing you want to group by is most likely distance from the camera. You might have a command list that draws the far away objects, then a command list that draws the close up objects, then a command list to draw the far away background landscape and sky, then a command list to draw the user interface such as health status, and another command list for post-processing the frame.

*Command Allocators in multithreading*
Command Allocators can only have one command list recording at any time. This means that for each thread, you must have a separate command allocator. You cannot reset a command allocator while a command list allocated from it is being executed, which means you will need command allocators for each frame buffer (for double buffering you need 2, for triple buffering you need 3).

Because of the above, the number of command allocators in your program needs to be:

**NumberOfThreads * NumberOfFrameBuffers**

If your application has 2 threads to fill out command lists, and you are using triple buffering, you will need 2 * 3 = 6 command allocators.
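The allocator count is easy to get wrong once threads are added, so here is the arithmetic as a small plain C++ sketch. It is not Direct3D 12 code: `requiredCommandAllocators`, `allocatorIndex` and `allSlotsUnique` are hypothetical helpers invented for this example, and the frame-major layout of the flat array is just one assumed way to store the allocators.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One command allocator per (thread, frame buffer) pair.
constexpr std::size_t requiredCommandAllocators(std::size_t threads,
                                                std::size_t frameBuffers) {
    return threads * frameBuffers;
}

// Position of the allocator for a given thread and frame in a flat array.
// Assumed layout: all allocators of frame 0 first, then frame 1, and so on.
constexpr std::size_t allocatorIndex(std::size_t thread, std::size_t frame,
                                     std::size_t threads) {
    return frame * threads + thread;
}

// The example from the text: 2 threads with triple buffering.
static_assert(requiredCommandAllocators(2, 3) == 6, "2 threads * 3 frame buffers");
// This tutorial: a single thread with triple buffering.
static_assert(requiredCommandAllocators(1, 3) == 3, "1 thread * 3 frame buffers");

// Checks that every (thread, frame) pair gets its own slot, so no two
// command lists would ever record into the same allocator.
inline bool allSlotsUnique(std::size_t threads, std::size_t frameBuffers) {
    std::vector<bool> used(requiredCommandAllocators(threads, frameBuffers), false);
    for (std::size_t frame = 0; frame < frameBuffers; ++frame) {
        for (std::size_t thread = 0; thread < threads; ++thread) {
            const std::size_t slot = allocatorIndex(thread, frame, threads);
            if (slot >= used.size() || used[slot]) {
                return false; // out of range, or two pairs share a slot
            }
            used[slot] = true;
        }
    }
    return true;
}
```

With this layout a thread picks its allocator for the current frame with `allocatorIndex(threadId, frameIndex, threadCount)`, and it only needs to wait on the fence for that frame before resetting it.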
*Descriptor Heap Management*

####Initializing Direct3D 12####
In this tutorial, we are only setting up Direct3D 12, and just so we can see something, we use the command list to clear the render target. Because of this, we do not need anything other than the default pipeline state, so pipeline state objects (PSOs) and Root Signatures are not used in this tutorial. In the next tutorial, we will be drawing a triangle, where we will need to set up a PSO and Root Signature.

**Declarations**
The first new things we have since the last tutorial are a bunch of declarations. The first chunk of these are interfaces and variables we need to interact with the GPU, while the second part are new functions. We will talk more about most of these when we start using them.

We are going to use triple buffering (3 frame buffers), and that is what I suggest you use in your app. There is really no need to even give players an option between double and triple buffering; you might as well just give them triple buffering automatically.

The number of some of these objects depends on how many frame buffers and threads you have.
These objects include:
- Render Targets: Number of frame buffers
- Command Allocators: Number of frame buffers * number of threads
- Fences: Number of threads
- Fence Values: Number of threads
- Fence Events: Number of threads
- Command Lists: Number of threads

// direct3d stuff
const int frameBufferCount = 3; // number of buffers we want, 2 for double buffering, 3 for triple buffering

ID3D12Device* device; // direct3d device

IDXGISwapChain3* swapChain; // swapchain used to switch between render targets

ID3D12CommandQueue* commandQueue; // container for command lists

ID3D12DescriptorHeap* rtvDescriptorHeap; // a descriptor heap to hold resources like the render targets

ID3D12Resource* renderTargets[frameBufferCount]; // number of render targets equal to buffer count

ID3D12CommandAllocator* commandAllocator[frameBufferCount]; // we want enough allocators for each buffer * number of threads (we only have one thread)

ID3D12GraphicsCommandList* commandList; // a command list we can record commands into, then execute them to render the frame

ID3D12Fence* fence[frameBufferCount]; // an object that is locked while our command list is being executed by the gpu. We need as many
                                      // as we have allocators (more if we want to know when the gpu is finished with an asset)

HANDLE fenceEvent; // a handle to an event when our fence is unlocked by the gpu

UINT64 fenceValue[frameBufferCount]; // this value is incremented each frame. each fence will have its own value

int frameIndex; // current rtv we are on

int rtvDescriptorSize; // size of the rtv descriptor on the device (all front and back buffers will be the same size)

// function declarations
bool InitD3D(); // initializes direct3d 12

void Update(); // update the game logic

void UpdatePipeline(); // update the direct3d pipeline (update command lists)

void Render(); // execute the command list

void Cleanup(); // release com objects and clean up memory

void WaitForPreviousFrame(); // wait until gpu is finished with command list

##WinMain()##
In our main function, we need to initialize Direct3D. If the initialization fails, we will present a message and close our application.

After our main loop exits (Running is false), we need to wait for the GPU to finish up with whatever it's doing (WaitForPreviousFrame()) before we release our resources and COM objects. We also need to close our fence event handle.

...
// initialize direct3d
if (!InitD3D())
{
    MessageBox(0, L"Failed to initialize direct3d 12", L"Error", MB_OK);
    Cleanup();
    return 1;
}
...
// we want to wait for the gpu to finish executing the command list before we start releasing everything
WaitForPreviousFrame();

// close the fence event
CloseHandle(fenceEvent);
...

##InitD3D()##
Here's the big part of the tutorial. This is where we will initialize Direct3D 12.

bool InitD3D()
{
    HRESULT hr;

##Creating the Direct3D Device##
The first thing we need to do to initialize Direct3D 12 is create the device. We may have more than one compatible device, so we will just choose the first device that is compatible with feature level 11 (directx 12) and that is NOT a software device. After we find the adapter (the actual device), we create the direct3d 12 device by calling the function ****D3D12CreateDevice()****.
HRESULT WINAPI .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770336(v=vs.85).aspx][D3D12CreateDevice](
    _In_opt_ IUnknown *pAdapter,
    D3D_FEATURE_LEVEL MinimumFeatureLevel,
    _In_ REFIID riid,
    _Out_opt_ void **ppDevice
);

D3D12CreateDevice() has 4 parameters:
- **pAdapter** - *the first parameter is a pointer to the adapter (GPU) we would like our Direct3D 12 device to use*
- **MinimumFeatureLevel** - *the second parameter is the feature level we would like the device to use*
- **riid** - *the third parameter is the type id of the interface we want to store our device in*
- **ppDevice** - *this is a pointer to a pointer to a device interface. By giving our device (reference to our device cast to a void pointer to a pointer) here, once this function completes it will point our device interface to a block of memory that is (as far as we are concerned) the actual device*

You will notice that to find an adapter that is compatible, we call D3D12CreateDevice() with a NULL fourth parameter. This is so we do not create a device quite yet, because we want to make sure that this call succeeds before we create the device. If it succeeds, we know we have an adapter (GPU) that supports feature level 11.

If we do not find an adapter, our Direct3D initialization has failed, so we return false to let our main function know to close the application.
    // -- Create the Device -- //

    IDXGIFactory4* dxgiFactory;
    hr = CreateDXGIFactory1(IID_PPV_ARGS(&dxgiFactory));
    if (FAILED(hr))
    {
        return false;
    }

    IDXGIAdapter1* adapter; // adapters are the graphics card (this includes the embedded graphics on the motherboard)

    int adapterIndex = 0; // we'll start looking for directx 12 compatible graphics devices starting at index 0

    bool adapterFound = false; // set this to true when a good one was found

    // find first hardware gpu that supports d3d 12
    while (dxgiFactory->EnumAdapters1(adapterIndex, &adapter) != DXGI_ERROR_NOT_FOUND)
    {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);

        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
        {
            // we dont want a software device
            adapterIndex++; // add this line here. It's not currently in the downloadable project
            continue;
        }

        // we want a device that is compatible with direct3d 12 (feature level 11 or higher)
        hr = D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_11_0, _uuidof(ID3D12Device), nullptr);
        if (SUCCEEDED(hr))
        {
            adapterFound = true;
            break;
        }

        adapterIndex++;
    }

    if (!adapterFound)
    {
        return false;
    }

Once we have found a hardware adapter that is compatible with feature level 11.0 (Direct3D 12), we create our device.

You may be wondering why there are only 3 arguments here, and the third one has a reference to our device interface, but when you look at the D3D12CreateDevice() parameters, you see that the third parameter should be a REFIID, or the type of our interface. What is actually happening here is that a macro is used which provides two parameters in one, **IID_PPV_ARGS**. You will see this used throughout the tutorial code, and throughout MSDN code. This is just to make it easier, but in fact, we can do without it. First I will show you the macro:

#define IID_PPV_ARGS(ppType) __uuidof(**(ppType)), IID_PPV_ARGS_Helper(ppType)

You can see the first is the uuid of our device, which would be ID3D12Device.
The second is actually a template function, which is defined as:

template<typename T> _Post_equal_to_(pp) _Post_satisfies_(return == pp) void** IID_PPV_ARGS_Helper(T** pp)
{
#pragma prefast(suppress: 6269, "Tool issue with unused static_cast")
    static_cast<IUnknown*>(*pp); // make sure everyone derives from IUnknown
    return reinterpret_cast<void** >(pp);
}

Basically what this function is doing is making sure that the interface we have provided is derived from IUnknown.

What we can do instead of using the IID_PPV_ARGS macro is this:

D3D12CreateDevice(
    adapter,
    D3D_FEATURE_LEVEL_11_0,
    _uuidof(ID3D12Device),
    reinterpret_cast<void** >(&device)
);

But this tutorial will use the IID_PPV_ARGS macro for consistency and slightly less code (the above does not have the extra type safety that IID_PPV_ARGS provides). This is how we will create the device:

    // Create the device
    hr = D3D12CreateDevice(
        adapter,
        D3D_FEATURE_LEVEL_11_0,
        IID_PPV_ARGS(&device)
    );
    if (FAILED(hr))
    {
        return false;
    }

##Creating the Command Queue##
This is where we create a command queue for our device. We will use this command queue to execute command lists, which contain commands that tell the GPU what to do.

To create a command queue, we call the **CreateCommandQueue()** method of our device interface. This method looks like this:

HRESULT .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788657(v=vs.85).aspx][CreateCommandQueue](
    [in] const D3D12_COMMAND_QUEUE_DESC *pDesc,
    REFIID riid,
    [out] void **ppCommandQueue
);

- **pDesc** - *This is a pointer to a filled out D3D12_COMMAND_QUEUE_DESC structure, which describes the type of command queue*
- **riid** - *the type id of our command queue interface*
- **ppCommandQueue** - *a pointer to a pointer to our command queue interface.*

If our GPU runs out of memory, this function will return E_OUTOFMEMORY.
We have to fill out a D3D12_COMMAND_QUEUE_DESC that we can provide the CreateCommandQueue() method with. This structure looks like this:

typedef struct .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903796(v=vs.85).aspx][D3D12_COMMAND_QUEUE_DESC] {
    D3D12_COMMAND_LIST_TYPE Type;
    INT Priority;
    D3D12_COMMAND_QUEUE_FLAGS Flags;
    UINT NodeMask;
} D3D12_COMMAND_QUEUE_DESC;

- **Type** - *This is a D3D12_COMMAND_LIST_TYPE enumeration. There are 3 types of command queues, which I will talk about below. The default value is D3D12_COMMAND_LIST_TYPE_DIRECT*
- **Priority** - *This is a D3D12_COMMAND_QUEUE_PRIORITY enumeration. The default value is D3D12_COMMAND_QUEUE_PRIORITY_NORMAL. If you have multiple command queues, you can change this to D3D12_COMMAND_QUEUE_PRIORITY_HIGH if one queue needs priority.*
- **Flags** - *Another enumeration, but from the D3D12_COMMAND_QUEUE_FLAGS enumeration. The default value is D3D12_COMMAND_QUEUE_FLAG_NONE, but you can change to D3D12_COMMAND_QUEUE_FLAG_DISABLE_GPU_TIMEOUT if you do not want the GPU to timeout when executing the command queue. Unless you know an operation will take a very very long time, I would not suggest using D3D12_COMMAND_QUEUE_FLAG_DISABLE_GPU_TIMEOUT. If there is a problem on the GPU that causes it to hang, it will timeout and stop executing a command queue. Using D3D12_COMMAND_QUEUE_FLAG_DISABLE_GPU_TIMEOUT and having a queue that causes the GPU to hang could result in your computer freezing up and needing to reboot.*
- **NodeMask** - *This is a bit field that says which GPU node this command queue should execute on. By default this is set to 0 (zero), and should be if you are only using one GPU. If you have multiple GPU's, refer to .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn933253(v=vs.85).aspx][Multi-Adapter]*

There are 3 types of command queues, defined by the D3D12_COMMAND_LIST_TYPE provided to the D3D12_COMMAND_QUEUE_DESC structure.
They are: - **Direct Command Queue** - *Defined by the D3D12_COMMAND_LIST_TYPE_DIRECT enumeration. This is the default command queue. A Direct Command Queue is a queue which accepts all commands. This is the type we will be using* - **Compute Command Queue** - *Defined by the D3D12_COMMAND_LIST_TYPE_COMPUTE enumeration. Compute Command Queues only accept compute and copy commands* - **Copy Command Queue** - *Defined by the D3D12_COMMAND_LIST_TYPE_COPY enumeration. Copy Command Queues only accept copy commands* Notice the use of the IID_PPV_ARGS macro again. I explained it above when creating the device. // -- Create the Command Queue -- // D3D12_COMMAND_QUEUE_DESC cqDesc = {}; // we will be using all the default values hr = device->CreateCommandQueue(&cqDesc, IID_PPV_ARGS(&commandQueue)); // create the command queue if (FAILED(hr)) { return false; } ##Creating the Swap Chain## This is where we create a swap chain. The swap chain will be used to present the finished render target. We will use triple buffering, so we will also have to keep track of which render target we should be rendering onto. The DXGI factory will create an IDXGISwapChain, but we want an IDXGISwapChain3 in order to get the current back buffer index. The tutorial code static_casts the IDXGISwapChain to IDXGISwapChain3, which works here in practice; calling QueryInterface() is the more robust way to obtain the derived interface. 
We start by filling out a DXGI_SWAP_CHAIN_DESC structure, defined as follows: typedef struct .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb173075(v=vs.85).aspx][DXGI_SWAP_CHAIN_DESC] { DXGI_MODE_DESC BufferDesc; DXGI_SAMPLE_DESC SampleDesc; DXGI_USAGE BufferUsage; UINT BufferCount; HWND OutputWindow; BOOL Windowed; DXGI_SWAP_EFFECT SwapEffect; UINT Flags; } DXGI_SWAP_CHAIN_DESC; - **BufferDesc** - *This is a DXGI_MODE_DESC that describes the display mode, such as width, height and format* - **SampleDesc** - *This is a DXGI_SAMPLE_DESC which describes our multi-sampling.* - **BufferUsage** - *This is a DXGI_USAGE enumeration, which tells the swap chain if this is a render target or shader input. I'm not sure what the use of having this be a shader input is, so you'll have to find out on your own. We are using this as a render target, so we will use DXGI_USAGE_RENDER_TARGET_OUTPUT. The default for this parameter is DXGI_CPU_ACCESS_NONE.* - **BufferCount** - *This is the number of back buffers we want. We are using triple buffering in this tutorial, so we set this to 3 (or frameBufferCount). The default is 0.* - **OutputWindow** - *This is a handle to the window we will be displaying the back buffer on. The default value is a null pointer.* - **Windowed** - *This says whether we should display in full screen mode or windowed mode. There is actually quite a difference between the two. Watch .[https://www.youtube.com/watch?v=E3wTajGZOsA][this video] if you want to know about unlocked FPS. The present method will actually block while it waits for the refresh rate. In DirectX 12, there is a very specific combination of settings you must use in your app if you would like unlocked FPS. Unlocked FPS can cause tearing when presenting your render targets, so this is not something you would ever want to do in a release build; it is only for benchmarking. 
The present method will wait for your monitor to refresh before presenting the back buffer, which means you are locked at a multiple of the refresh rate of your monitor. Using double buffering, you can get 60 FPS if your refresh rate is 60 hertz. We use triple buffering, which means we are able to get 120 FPS. I have read somewhere that using more than 3 buffers can cause issues, such as the present call locking up. You'll have to do research and experimentation on this if you want to know more. Also, taken from MSDN: "We recommend that you create a windowed swap chain and allow the end user to change the swap chain to full screen through IDXGISwapChain::SetFullscreenState; that is, do not set this member to FALSE to force the swap chain to be full screen. However, if you create the swap chain as full screen, also provide the end user with a list of supported display modes through the BufferDesc member because a swap chain that is created with an unsupported display mode might cause the display to go black and prevent the end user from seeing anything."* - **SwapEffect** - *This is a .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb173077(v=vs.85).aspx][DXGI_SWAP_EFFECT] enumeration, which describes how the buffer is handled after it is presented. The default is DXGI_SWAP_EFFECT_DISCARD. We will be using DXGI_SWAP_EFFECT_FLIP_DISCARD* - **Flags** - *This is a .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb173076(v=vs.85).aspx][DXGI_SWAP_CHAIN_FLAG] enumeration, whose values you can combine with the | operator. 
The default is 0, and we will keep it that way for this tutorial.* Now we will take a look at the DXGI_MODE_DESC structure: typedef struct .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb173064(v=vs.85).aspx][DXGI_MODE_DESC] { UINT Width; UINT Height; DXGI_RATIONAL RefreshRate; DXGI_FORMAT Format; DXGI_MODE_SCANLINE_ORDER ScanlineOrdering; DXGI_MODE_SCALING Scaling; } DXGI_MODE_DESC; - **Width** - *This is the width resolution of our back buffer. The default value is 0. If 0 is specified when calling the CreateSwapChain() method of the DXGI factory, this value is set to the width of the window. You can then call GetDesc() on the swap chain interface to get the width of the back buffer. We set this to the width of our window manually, but in our tutorial code, we could just leave it at the default 0 and get the same result.* - **Height** - *Same as Width, but the height of our back buffer instead.* - **RefreshRate** - *A .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb173069(v=vs.85).aspx][DXGI_RATIONAL] structure defining the refresh rate in hertz of the swap chain. The default is a 0 numerator and 0 denominator. This structure represents a rational number. 0/0 (numerator/denominator) is valid and is interpreted as 0/1. 0/number results in 0 (meaning the default value of 0/0 is replaced by 0/1, which is then 0). Whole numbers are represented as number/1.* - **Format** - *This is the display format of our swap chain, described by the .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb173059(v=vs.85).aspx][DXGI_FORMAT] enumeration. The default format is DXGI_FORMAT_UNKNOWN, and will cause an error if you try to keep it this way. In our tutorial we set it to a 32 bit unsigned normalized integer rgba format, where r, g, b and a each have 8 bits.* - **ScanlineOrdering** - *This is a .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb173067(v=vs.85).aspx][DXGI_MODE_SCANLINE_ORDER] enumeration that describes the scanline drawing mode. 
The default is DXGI_MODE_SCANLINE_ORDER_UNSPECIFIED, which means the scanline mode is unspecified. We will leave it this way.* - **Scaling** - *A .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb173066(v=vs.85).aspx][DXGI_MODE_SCALING] enumeration. This structure defines if scaling is specified, if the buffer is centered or if the buffer image should be stretched. The default is DXGI_MODE_SCALING_UNSPECIFIED. By using unspecified scaling, a mode change is not triggered when the window is resized, unlike the other two enumerations of centered or stretched. * This is the DXGI_SAMPLE_DESC structure, used to describe multi-sampling. We are not using multisampling, so we set the sample count to 1. The reason we need to set the sample count to 1 is because we need to get at least one sample from the backbuffer. Multi-sampling is used so when the image is further away, or closer to the camera, we get less artifacts and a smoother appearance. typedef struct .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb173072(v=vs.85).aspx][DXGI_SAMPLE_DESC] { UINT Count; UINT Quality; } DXGI_SAMPLE_DESC; - **Count** - *The number of samples we will take of each pixel (at different resolutions). The default is 0, which will cause an error if we try to present it, because we will not have taken a sample from the backbuffer to show. Setting this to 1 will take one sample, and anything higher is called multi-sampling.* - **Quality** - *The quality of the sample taken. The default is 0, which we will leave for default sampling.* Finally, to create the swap chain, we call the **CreateSwapChain()** method of our DXGI factory. 
The function signature looks like this: HRESULT .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb174537(v=vs.85).aspx][CreateSwapChain]( [in] IUnknown *pDevice, [in] DXGI_SWAP_CHAIN_DESC *pDesc, [out] IDXGISwapChain **ppSwapChain ); - **pDevice** - *In Direct3D 11 and earlier this was a pointer to the Direct3D device that writes the images to the swap chain's back buffer. In Direct3D 12 it must be a pointer to a direct command queue instead, which is why we pass our commandQueue here.* - **pDesc** - *This is a pointer to the swap chain description we talked about above, which defines the swap chain and its back buffers.* - **ppSwapChain** - *This is a pointer to a pointer to a .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb174569(v=vs.85).aspx][IDXGISwapChain] interface. In our application we use a .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903673(v=vs.85).aspx][IDXGISwapChain3] so that we can get the current back buffer from the swap chain, but since this function returns a pointer to an IDXGISwapChain, we create a temporary IDXGISwapChain interface which we pass to this function, then static_cast it to the derived IDXGISwapChain3 and set our swapChain interface to point to the created swap chain.* After we create the swap chain, we get the current back buffer in the swap chain by calling **GetCurrentBackBufferIndex()**. This is why we need to use IDXGISwapChain3, since it provides this method. IDXGISwapChain3 is derived from IDXGISwapChain; the only difference is that the derived IDXGISwapChain3 provides a couple more methods, one of which is the GetCurrentBackBufferIndex() method which we use to get the current back buffer. // -- Create the Swap Chain (double/triple buffering) -- // DXGI_MODE_DESC backBufferDesc = {}; // this is to describe our display mode backBufferDesc.Width = Width; // buffer width backBufferDesc.Height = Height; // buffer height backBufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM; // format of the buffer (rgba 32 bits, 8 bits for each channel) // describe our multi-sampling. 
We are not multi-sampling, so we set the count to 1 (we need at least one sample of course) DXGI_SAMPLE_DESC sampleDesc = {}; sampleDesc.Count = 1; // multisample count (no multisampling, so we just put 1, since we still need 1 sample) // Describe and create the swap chain. DXGI_SWAP_CHAIN_DESC swapChainDesc = {}; swapChainDesc.BufferCount = frameBufferCount; // number of buffers we have swapChainDesc.BufferDesc = backBufferDesc; // our back buffer description swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT; // this says the pipeline will render to this swap chain swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD; // dxgi will discard the buffer (data) after we call present swapChainDesc.OutputWindow = hwnd; // handle to our window swapChainDesc.SampleDesc = sampleDesc; // our multi-sampling description swapChainDesc.Windowed = !FullScreen; // set to true, then if in fullscreen must call SetFullScreenState with true for full screen to get uncapped fps IDXGISwapChain* tempSwapChain; dxgiFactory->CreateSwapChain( commandQueue, // the queue will be flushed once the swap chain is created &swapChainDesc, // give it the swap chain description we created above &tempSwapChain // store the created swap chain in a temp IDXGISwapChain interface ); swapChain = static_cast<IDXGISwapChain3*>(tempSwapChain); frameIndex = swapChain->GetCurrentBackBufferIndex(); ##Creating the render target descriptor heap## This is where we create the descriptor heap to hold our render targets. 
We start by filling out a D3D12_DESCRIPTOR_HEAP_DESC structure to describe the descriptor heap we want to create: typedef struct .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770359%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][D3D12_DESCRIPTOR_HEAP_DESC] { D3D12_DESCRIPTOR_HEAP_TYPE Type; UINT NumDescriptors; D3D12_DESCRIPTOR_HEAP_FLAGS Flags; UINT NodeMask; } D3D12_DESCRIPTOR_HEAP_DESC; - **Type** - *A .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn859379(v=vs.85).aspx][D3D12_DESCRIPTOR_HEAP_TYPE] enumeration. There are four types of descriptor heaps: CBV/SRV/UAV, Sampler, RTV, and DSV. We are creating an RTV heap so we set this to D3D12_DESCRIPTOR_HEAP_TYPE_RTV* - **NumDescriptors** - *This is the number of descriptors we will store in this descriptor heap. We are doing triple buffering, so we need 3 back buffers, which means we have 3 descriptors* - **Flags** - *A .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn859378(v=vs.85).aspx][D3D12_DESCRIPTOR_HEAP_FLAGS] enumeration. The flags property defines whether this heap is shader visible or not. Shaders do not access RTVs, so we do not need this heap to be shader visible. We do this by setting this property to D3D12_DESCRIPTOR_HEAP_FLAG_NONE. Non-shader visible heaps are not stored on the GPU, so they are not constrained in size like shader visible descriptor heaps. .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899211%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][Shader Visible Descriptor Heaps], .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899199(v=vs.85).aspx][Non-Shader Visible Descriptor Heaps]* - **NodeMask** - *This is a bit field that determines which GPU this heap is stored on. The default value is 0.* Shaders can only access descriptors in CBV/SRV/UAV or Sampler heaps. Command lists can only populate these two types of descriptor heaps. 
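The rule about which heaps shaders can see can be written down as a tiny check. The following is only a sketch in plain C++ that does not touch Direct3D: the HeapType enum and the CanBeShaderVisible() function are hypothetical names made up for illustration, they are not part of the D3D12 headers.

```cpp
#include <cassert>

// Hypothetical mirror of the four descriptor heap types named above.
// This is NOT the real D3D12_DESCRIPTOR_HEAP_TYPE enumeration.
enum class HeapType { CbvSrvUav, Sampler, Rtv, Dsv };

// Only CBV/SRV/UAV and Sampler heaps can be made shader visible.
// RTV and DSV heaps always use the "none" flag.
constexpr bool CanBeShaderVisible(HeapType type) {
    return type == HeapType::CbvSrvUav || type == HeapType::Sampler;
}

// Compile-time checks of the rule for the heap we create in this tutorial.
static_assert(!CanBeShaderVisible(HeapType::Rtv), "RTV heaps are never shader visible");
static_assert(CanBeShaderVisible(HeapType::CbvSrvUav), "CBV/SRV/UAV heaps may be shader visible");
```

For our RTV heap the check returns false, which matches the D3D12_DESCRIPTOR_HEAP_FLAG_NONE value used in the tutorial code.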
To create the descriptor heap, we call the **CreateDescriptorHeap()** method of the device interface: HRESULT .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788662%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][CreateDescriptorHeap]( [in] const D3D12_DESCRIPTOR_HEAP_DESC *pDescriptorHeapDesc, REFIID riid, [out] void **ppvHeap ); - **pDescriptorHeapDesc** - *This is a pointer to the D3D12_DESCRIPTOR_HEAP_DESC structure we filled out, describing the heap we want to create.* - **riid** - *This is the type id of the descriptor heap interface we will create.* - **ppvHeap** - *This is a void pointer to a pointer to our RTV descriptor heap interface.* Once we create the RTV descriptor heap, we need to get the size of an RTV descriptor on the GPU. There is no guarantee that a descriptor type on one GPU is the same size as a descriptor on another GPU, which is why we need to ask the device for the size of a descriptor type. We need the size of the descriptor type so we can iterate over the descriptors in the descriptor heap. We do this by calling the **GetDescriptorHandleIncrementSize()** method of the device: UINT .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn899186%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][GetDescriptorHandleIncrementSize]( [in] D3D12_DESCRIPTOR_HEAP_TYPE DescriptorHeapType ); - **DescriptorHeapType** - *The only parameter for this function is the type of the descriptor we want to find the size of. This is a .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn859379(v=vs.85).aspx][D3D12_DESCRIPTOR_HEAP_TYPE] type enumeration.* Once we have the descriptor type size for RTV types, we want to get a handle to the descriptor in the heap. There are two types of descriptor handles, GPU and CPU. Our descriptor heap is not shader visible, which means it is stored on the CPU side, which also means we need a CPU handle to the descriptor. 
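To make the role of the increment size concrete, here is a minimal sketch in plain C++ of the arithmetic involved in walking a descriptor heap. It does not use Direct3D at all: FakeCpuHandle and OffsetHandle() are hypothetical stand-ins invented for this example, and the start address and increment size used with them are made-up numbers (the real size must come from the device).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a CPU descriptor handle: an opaque,
// address-like value. NOT the real D3D12_CPU_DESCRIPTOR_HANDLE.
struct FakeCpuHandle {
    std::size_t ptr;
};

// Returns the handle of the descriptor at 'index', given the handle to the
// start of the heap and the per-descriptor increment size for that heap type.
inline FakeCpuHandle OffsetHandle(FakeCpuHandle heapStart,
                                  std::uint32_t index,
                                  std::uint32_t incrementSize) {
    // A zero increment would make every handle identical to the first one.
    assert(incrementSize > 0);
    return FakeCpuHandle{ heapStart.ptr +
                          static_cast<std::size_t>(index) * incrementSize };
}
```

With a made-up heap start of 0x1000 and an increment of 32, descriptor 0 sits at 0x1000, descriptor 1 at 0x1020 and descriptor 2 at 0x1040. Because the increment is only known at run time, hard-coding it would break on hardware that reports a different size.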
A descriptor handle is basically a pointer, but we can not use them like a traditional pointer in C++. These pointers are for the Direct3D drivers to use to locate descriptors. We can get a handle to the first descriptor in the descriptor heap by calling the **GetCPUDescriptorHandleForHeapStart()** method of descriptor heap interface. The d3dx12.h helper file we added in the first tutorial on DirectX 12 provides some helper structures, which include the CD3DX12_CPU_DESCRIPTOR_HANDLE structure which we will use for the RTV descriptor handle. We can loop through the RTV descriptors in the heap by offsetting the current handle we have by the descriptor heap size we got from the GetDescriptorHandleIncrementSize() function. When we have our descriptor handle to the first RTV descriptor in the heap, point each RTV descriptor to the back buffers in our swap chain. We can get a pointer to the buffer in the swap chain by calling the **GetBuffer()** method of the swap chain interface. Using that method we can set our render target resources (ID3D12Resource) to the swap chain buffers. HRESULT .[https://msdn.microsoft.com/en-us/library/windows/desktop/bb174570(v=vs.85).aspx][GetBuffer]( UINT Buffer, [in] REFIID riid, [out] void **ppSurface ); - **Buffer** - *This is the index to the buffer we want to get* - **riid** - *This is the type id of the interface we will store the pointer in* - **ppSurface** - *This is a void pointer to a pointer to the interface we want to point to the buffer* Now that we have 3 resources that point to the swap chain buffers, we can "create" the RTVs using the device interfaces **CreateRenderTargetView()** method. This method will create a descriptor that points to the resource and store it in a descriptor handle. 
void .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788668(v=vs.85).aspx][CreateRenderTargetView]( [in, optional] ID3D12Resource *pResource, [in, optional] const D3D12_RENDER_TARGET_VIEW_DESC *pDesc, [in] D3D12_CPU_DESCRIPTOR_HANDLE DestDescriptor ); - **pResource** - *Pointer to the resource that is the render target buffer* - **pDesc** - *A pointer to a .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770389%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][D3D12_RENDER_TARGET_VIEW_DESC] structure. This is used if we are using subresources. We can pass a null pointer here.* - **DestDescriptor** - *This is a handle to a cpu descriptor in a descriptor heap that will point to the render target resource* To get to the next descriptor, we can offset the current descriptor by the descriptor type size, by calling the **Offset()** method of the helper structure CD3DX12_CPU_DESCRIPTOR_HANDLE. The first parameter is the number of descriptors we want to offset by (we want to go to the next one, so we use 1), and the second parameter is the size of the descriptor type. // -- Create the Back Buffers (render target views) Descriptor Heap -- // // describe an rtv descriptor heap and create D3D12_DESCRIPTOR_HEAP_DESC rtvHeapDesc = {}; rtvHeapDesc.NumDescriptors = frameBufferCount; // number of descriptors for this heap. rtvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV; // this heap is a render target view heap // This heap will not be directly referenced by the shaders (not shader visible), as this will store the output from the pipeline // otherwise we would set the heap's flag to D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE rtvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE; hr = device->CreateDescriptorHeap(&rtvHeapDesc, IID_PPV_ARGS(&rtvDescriptorHeap)); if (FAILED(hr)) { return false; } // get the size of a descriptor in this heap (this is a rtv heap, so only rtv descriptors should be stored in it. 
// descriptor sizes may vary from device to device, which is why there is no set size and we must ask the // device to give us the size. we will use this size to increment a descriptor handle offset rtvDescriptorSize = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_RTV); // get a handle to the first descriptor in the descriptor heap. a handle is basically a pointer, // but we cannot literally use it like a c++ pointer. CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(rtvDescriptorHeap->GetCPUDescriptorHandleForHeapStart()); // Create a RTV for each buffer (double buffering is two buffers, triple buffering is 3). for (int i = 0; i < frameBufferCount; i++) { // first we get the n'th buffer in the swap chain and store it in the n'th // position of our ID3D12Resource array hr = swapChain->GetBuffer(i, IID_PPV_ARGS(&renderTargets[i])); if (FAILED(hr)) { return false; } // then we "create" a render target view which binds the swap chain buffer (ID3D12Resource[n]) to the rtv handle device->CreateRenderTargetView(renderTargets[i], nullptr, rtvHandle); // we increment the rtv handle by the rtv descriptor size we got above rtvHandle.Offset(1, rtvDescriptorSize); } ##Creating the command allocators## The command allocator is used to allocate memory on the GPU for the commands we record into a command list. Those commands are executed when we call execute on the command queue and provide that command list. We are using triple buffering, so we need to create 3 direct command allocators. We need three because we cannot reset a command allocator while the GPU is executing a command list that was associated with it. To create a command allocator, we can call the **CreateCommandAllocator()** method of the device interface, providing the type of command allocator, the type id of the interface of the command allocator, and finally a pointer to a command allocator interface so that we can use it. 
HRESULT .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788655(v=vs.85).aspx][CreateCommandAllocator]( [in] D3D12_COMMAND_LIST_TYPE type, REFIID riid, [out] void **ppCommandAllocator ); - **type** - *A D3D12_COMMAND_LIST_TYPE type enumeration. The two kinds we care about here are a direct command allocator and a bundle command allocator (compute and copy allocators also exist). A direct command allocator can be associated with direct command lists, which are executed on the GPU by calling execute on a command queue with the command list. A bundle command allocator stores commands for bundles. Bundles are used multiple times for many frames, so we do not want bundles to be on the same command allocator as direct command lists because direct command allocators are usually reset every frame. We do not want to reset bundles, otherwise they would not be useful.* - **riid** - *The type id of the interface we will be using* - **ppCommandAllocator** - *Pointer to a pointer to a command allocator interface* // -- Create the Command Allocators -- // for (int i = 0; i < frameBufferCount; i++) { hr = device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&commandAllocator[i])); if (FAILED(hr)) { return false; } } ##Creating the command list## You will want as many command lists as you have threads recording commands. We are not making a multi-threaded app, so we only need one command list. While command allocators cannot be reset while the GPU is executing a command list associated with that allocator, command lists can be reset immediately after we call execute on a command queue with that command list. This is why we only need one command list, but 3 command allocators (for a triple buffered, single threaded app). 
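The reason for having one allocator per frame buffer can be shown with a toy model. The sketch below is plain C++ and does not use the Direct3D 12 API: ToyGpu, ToyAllocator and ToyRenderer are hypothetical types invented for this example, and it simplifies things by using a single increasing fence value rather than one fence per frame.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kFrameBufferCount = 3; // triple buffering, as in the tutorial

// Toy GPU: only tracks the highest fence value it has reached so far.
struct ToyGpu {
    std::uint64_t completedValue = 0;
};

// Toy allocator: remembers the fence value that marks the end of the last
// work recorded through it (0 means nothing has been submitted yet).
struct ToyAllocator {
    std::uint64_t lastSubmittedValue = 0;

    // Safe to reset only once the GPU has passed the last submission.
    bool CanReset(const ToyGpu& gpu) const {
        return gpu.completedValue >= lastSubmittedValue;
    }
};

struct ToyRenderer {
    ToyGpu gpu;
    std::array<ToyAllocator, kFrameBufferCount> allocators{};
    std::uint64_t nextFenceValue = 1;
    int frameIndex = 0;

    // "Records and submits" one frame. Returns false when this frame's
    // allocator is still in flight (real code would wait on the fence).
    bool SubmitFrame() {
        ToyAllocator& alloc = allocators[frameIndex];
        if (!alloc.CanReset(gpu)) {
            return false;
        }
        alloc.lastSubmittedValue = nextFenceValue++;
        frameIndex = (frameIndex + 1) % kFrameBufferCount;
        return true;
    }
};
```

In this model three frames can be submitted back to back without waiting, because each uses its own allocator. The fourth submission is refused until the toy GPU reports that it has finished the first frame. The single command list never appears in the model, since it can be reused as soon as it has been submitted.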
To create a command list, we can call the **CreateCommandList()** method of the device interface: HRESULT .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788656%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][CreateCommandList]( [in] UINT nodeMask, [in] D3D12_COMMAND_LIST_TYPE type, [in] ID3D12CommandAllocator *pCommandAllocator, [in, optional] ID3D12PipelineState *pInitialState, REFIID riid, [out] void **ppCommandList ); - **nodeMask** - *This is a bit field specifying which GPU to use. The default GPU is 0.* - **type** - *This is a .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770348%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][D3D12_COMMAND_LIST_TYPE], saying which type of command list we want to create.* - **pCommandAllocator** - *When creating a command list, you must specify a command allocator that will store the commands on the GPU made by the command list.* - **pInitialState** - *This is the default, or starting pipeline state object for the command list. It is a pointer to a ID3D12PipelineState interface. 
Specifying NULL will keep the pipeline state at it's default values (if you were drawing anything onto the screen, you need to AT LEAST specify a vertex shader, however in this tutorial we are only clearing the render target and do not need a pipeline state object, that will come next tutorial)* - **riid** - *The type id of the command list interface we are creating* - **ppCommandList** - *A pointer to a pointer to a command list interface* There are 4 different types of command lists, specified by using the D3D12_COMMAND_LIST_TYPE enumeration: typedef enum .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn770348%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][D3D12_COMMAND_LIST_TYPE] { D3D12_COMMAND_LIST_TYPE_DIRECT = 0, D3D12_COMMAND_LIST_TYPE_BUNDLE = 1, D3D12_COMMAND_LIST_TYPE_COMPUTE = 2, D3D12_COMMAND_LIST_TYPE_COPY = 3 } D3D12_COMMAND_LIST_TYPE; - **D3D12_COMMAND_LIST_TYPE_DIRECT** - *A **Direct Command List** is a command list where commands can be executed by the GPU. This is the command list we want to create.* - **D3D12_COMMAND_LIST_TYPE_BUNDLE** - *A **Bundle** is a command list that contains a group of commands that are used often. This type of command list cannot be executed directly by a command queue, instead, a direct command list must execute bundles. A bundle inherits all pipeline state except for the currently set PSO and primitive topology.* - **D3D12_COMMAND_LIST_TYPE_COMPUTE** - *A **Compute command list** is for the compute shader.* - **D3D12_COMMAND_LIST_TYPE_COPY** - *A copy command list* We need to create a direct command list so that we can execute our clear render target command. We do this by specifying D3D12_COMMAND_LIST_TYPE_DIRECT for the second parameter. Since we only need one command list, which is reset each frame where we specify a command allocator, we just create this command list with the first command allocator. When a command list is created, it is created in the "recording" state. 
We do not want to record to the command list yet, so we Close() the command list after we create it. // create the command list with the first allocator hr = device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, commandAllocator[0], NULL, IID_PPV_ARGS(&commandList)); if (FAILED(hr)) { return false; } // command lists are created in the recording state. our main loop will set it up for recording again so close it now commandList->Close(); ##Creating a fence & Fence event## The final part of our Direct3D initialization function is creating the fences and the fence event. We are only using a single thread, so we only need one fence event, but since we are triple buffering, we have three fences, one for each frame buffer. We also have 3 current fence values, represented by the fenceValue array, so that we can keep track of the actual fence value. The first thing we do here is create 3 fences by calling the **CreateFence()** function of the device interface (for each fence): HRESULT CreateFence( UINT64 InitialValue, D3D12_FENCE_FLAGS Flags, REFIID riid, [out] void **ppFence ); - **InitialValue** - *This is the initial value we want the fence to start with* - **Flags** - *A .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn986729(v=vs.85).aspx][D3D12_FENCE_FLAGS] type enumeration. This flag is for a shared fence. 
We are not sharing this fence with another GPU so we set this to D3D12_FENCE_FLAG_NONE* - **riid** - *The type id of the fence interface we want* - **ppFence** - *A pointer to a pointer to a fence interface* Once we create all three fences and initialize the fence value array, we create a fence event using the Windows **CreateEvent()** function: HANDLE WINAPI .[https://msdn.microsoft.com/en-us/library/windows/desktop/ms682396%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][CreateEvent]( _In_opt_ LPSECURITY_ATTRIBUTES lpEventAttributes, _In_ BOOL bManualReset, _In_ BOOL bInitialState, _In_opt_ LPCTSTR lpName ); - **lpEventAttributes** - *This is a pointer to a .[https://msdn.microsoft.com/en-us/library/windows/desktop/aa379560(v=vs.85).aspx][SECURITY_ATTRIBUTES] structure. Setting this to a null pointer will use a default security structure.* - **bManualReset** - *If this is set to true, we will have to manually reset the event to NOT TRIGGERED (by using the ResetEvent() function) after we wait for it to be set by the GPU. Setting this to false, which we do, will cause this event to be automatically reset to not triggered after we have waited for the fence event.* - **bInitialState** - *Setting this to true will cause the initial state of this event to be signaled. 
We don't want it to be signaled yet so we say false.* - **lpName** - *Setting this to a null pointer will cause the event to be created without a name.* // -- Create a Fence & Fence Event -- // // create the fences for (int i = 0; i < frameBufferCount; i++) { hr = device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence[i])); if (FAILED(hr)) { return false; } fenceValue[i] = 0; // set the initial fence value to 0 } // create a handle to a fence event fenceEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr); if (fenceEvent == nullptr) { return false; } return true; } ##Update() function## The update function does nothing right now, but later we will add logic to this function that can run while the gpu is executing the command queue. We could have changed the render target clear color here if we wanted it to change each frame. void Update() { // update app logic, such as moving the camera or figuring out what objects are in view } ##UpdatePipeline() function## This function is where we will add commands to the command list, which include changing the state of the render target, setting the root signature and clearing the render target. later we will be setting vertex buffers and calling draw in this function. void UpdatePipeline() { HRESULT hr; ##Resetting the Command Allocator and Command List## As mentioned before, Command Allocators cannot be reset while a GPU is executing commands from a command list associated with it. This is why we have fences and fence event. The first thing we do, before we reset this frames command allocator, is make sure the GPU is finished executing the command list that was associated with this command allocator. You will see in the render function, after we call execute on the command queue, we call **Signal()** on the command queue. This will basically insert a command after the command list we just executed that will increment this frames fence value. 
We call WaitForPreviousFrame() which will check the value of the fence and see if it has been incremented. If it has, we know that the command list that frame has been executed, and it is safe to reset the command allocator. After we have reset this frames command allocator, we want to reset the command list. Unlike a command allocator, once we call execute on a command queue, we can immediately reset that command list and reuse it. So we reset the command list here, giving it this frames command allocator and a null PSO (we are not drawing anything yet, and so do not need to set any kind of pipeline state). Resetting a command list puts it in the recording state. // We have to wait for the gpu to finish with the command allocator before we reset it WaitForPreviousFrame(); // we can only reset an allocator once the gpu is done with it // resetting an allocator frees the memory that the command list was stored in hr = commandAllocator[frameIndex]->Reset(); if (FAILED(hr)) { Running = false; } // reset the command list. by resetting the command list we are putting it into // a recording state so we can start recording commands into the command allocator. // the command allocator that we reference here may have multiple command lists // associated with it, but only one can be recording at any time. Make sure // that any other command lists associated to this command allocator are in // the closed state (not recording). // Here you will pass an initial pipeline state object as the second parameter, // but in this tutorial we are only clearing the rtv, and do not actually need // anything but an initial default pipeline, which is what we get by setting // the second parameter to NULL hr = commandList->Reset(commandAllocator[frameIndex], NULL); if (FAILED(hr)) { Running = false; } ##Recording commands with the command list## Now we get to the fun part of Direct3D 12, recording commands. 
For this tutorial, the only commands we record are transitioning the state of the current render target resource and clearing the render target to a certain color. Render target resources must be in the render target state for the Output Merger to write to them. We can change the state of a resource by using a resource barrier. This is done with the command list interface's **ResourceBarrier()** command. We need a transition barrier because we are transitioning the render target from the present state, which it needs to be in for the swap chain to present it (or you will get debug errors), to the render target state, which it needs to be in for the output merger to write to it.

void .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903898%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][ResourceBarrier](
    [in] UINT NumBarriers,
    [in] const D3D12_RESOURCE_BARRIER *pBarriers
);

- **NumBarriers** - *This is the number of barrier descriptions we are submitting (we are only submitting one here)*
- **pBarriers** - *This is a pointer to an array of .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn986740(v=vs.85).aspx][D3D12_RESOURCE_BARRIER] (resource barrier descriptions).*

This is where we use the d3dx12.h helper library again. We use CD3DX12_RESOURCE_BARRIER::Transition to create a transition resource barrier description. We pass in the render target resource, its current state, and the state we want to transition to. Here we are transitioning the current render target from the present state to the render target state, so that we can clear it to a color. After we are finished with our commands for this render target, we want to transition its state again, but this time from the render target state to the present state, so that the swap chain can present it.

We want to clear the render target, so what we do is get a handle to the render target.
We use the CD3DX12_CPU_DESCRIPTOR_HANDLE structure, and provide it with the first descriptor in the RTV descriptor heap, the index of the current frame, and the size of each RTV descriptor (basically we get a pointer to the beginning of the descriptor heap, then advance that pointer by frameIndex times rtvDescriptorSize).

Once we have the descriptor handle to the current render target, we need to set the render target to be the output of the Output Merger. We do this with the command **OMSetRenderTargets()**:

void .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn986884%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][OMSetRenderTargets](
    [in] UINT NumRenderTargetDescriptors,
    [in, optional] const D3D12_CPU_DESCRIPTOR_HANDLE *pRenderTargetDescriptors,
    [in] BOOL RTsSingleHandleToDescriptorRange,
    [in, optional] const D3D12_CPU_DESCRIPTOR_HANDLE *pDepthStencilDescriptor
);

- **NumRenderTargetDescriptors** - *The number of render target descriptor handles*
- **pRenderTargetDescriptors** - *A pointer to an array of render target descriptor handles*
- **RTsSingleHandleToDescriptorRange** - *If this is true, then pRenderTargetDescriptors is a pointer to the beginning of a contiguous chunk of descriptors in a descriptor heap. When getting the next descriptor, D3D offsets the current descriptor handle by the size of the descriptor type. When this is false, pRenderTargetDescriptors is a pointer to an array of render target descriptor handles. This is less efficient than setting it to true because, to get the next descriptor, D3D needs to dereference a handle in the array to get to the render target. Since we only have one render target, we set this to false because we are passing a reference to the only descriptor handle we are using.*
- **pDepthStencilDescriptor** - *A pointer to a depth/stencil descriptor handle.
We set this to null in this tutorial because we do not have a depth/stencil buffer yet*

Finally, to clear the render target, we use the **ClearRenderTargetView()** command:

void .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn903842%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][ClearRenderTargetView](
    [in] D3D12_CPU_DESCRIPTOR_HANDLE RenderTargetView,
    [in] const FLOAT ColorRGBA[4],
    [in] UINT NumRects,
    [in] const D3D12_RECT *pRects
);

- **RenderTargetView** - *A descriptor handle to the render target we want to clear*
- **ColorRGBA[4]** - *An array of 4 float values, representing Red, Green, Blue, and Alpha*
- **NumRects** - *The number of rectangles on the render target to clear. Set this to 0 to clear the entire render target*
- **pRects** - *This is a pointer to an array of D3D12_RECT structures representing the rectangles on the render target we want to clear. This is nice for when you do not want to clear the entire render target, but instead only one or more rectangles. If we set NumRects to 0, we pass a null pointer here*

Once we are finished recording our commands, we need to close the command list. If we do not close the command list before we try to execute it with the command queue, our application will break.

Another note on closing the command list: in Direct3D 12, if you do something illegal while recording the command list, your program will continue running until you call close, at which point close will fail. You must enable the debug layer in order to see exactly what failed when calling close.
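Stepping back to the render target handle for a moment: the arithmetic behind CD3DX12_CPU_DESCRIPTOR_HANDLE(heapStart, frameIndex, rtvDescriptorSize) can be sketched without any Direct3D headers. This is only a model of the idea described above. `CpuHandle` and `offsetHandle` are hypothetical names that do not exist in d3dx12.h, and in a real application the increment must come from GetDescriptorHandleIncrementSize().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for D3D12_CPU_DESCRIPTOR_HANDLE: an opaque, address-like value.
struct CpuHandle {
    std::size_t ptr;
};

// Start at the first descriptor in the heap and step forward `index` descriptors,
// each `incrementSize` bytes apart (an assumption of this model: descriptors of one
// heap type are laid out contiguously with a fixed stride).
inline CpuHandle offsetHandle(CpuHandle heapStart, std::uint32_t index, std::uint32_t incrementSize) {
    assert(incrementSize > 0 && "a descriptor increment of zero would make every handle identical");
    return CpuHandle{ heapStart.ptr + static_cast<std::size_t>(index) * incrementSize };
}
```

For example, with a heap that starts at 1000 and an (illustrative) increment of 32, `offsetHandle(CpuHandle{1000}, 2, 32).ptr` is 1064, which is where the third render target view would sit in this model.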
    // here we start recording commands into the commandList (all of the commands are stored in the commandAllocator)

    // transition the "frameIndex" render target from the present state to the render target state so the command list draws to it starting from here
    // (the barrier is stored in a local first, because taking the address of a temporary is not standard C++)
    CD3DX12_RESOURCE_BARRIER toRenderTarget = CD3DX12_RESOURCE_BARRIER::Transition(renderTargets[frameIndex], D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_RENDER_TARGET);
    commandList->ResourceBarrier(1, &toRenderTarget);

    // here we again get the handle to our current render target view so we can set it as the render target in the output merger stage of the pipeline
    CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(rtvDescriptorHeap->GetCPUDescriptorHandleForHeapStart(), frameIndex, rtvDescriptorSize);

    // set the render target for the output merger stage (the output of the pipeline)
    commandList->OMSetRenderTargets(1, &rtvHandle, FALSE, nullptr);

    // Clear the render target by using the ClearRenderTargetView command
    const float clearColor[] = { 0.0f, 0.2f, 0.4f, 1.0f };
    commandList->ClearRenderTargetView(rtvHandle, clearColor, 0, nullptr);

    // transition the "frameIndex" render target from the render target state to the present state. If the debug layer is enabled, you will receive a
    // warning if present is called on the render target when it's not in the present state
    CD3DX12_RESOURCE_BARRIER toPresent = CD3DX12_RESOURCE_BARRIER::Transition(renderTargets[frameIndex], D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_PRESENT);
    commandList->ResourceBarrier(1, &toPresent);

    hr = commandList->Close();
    if (FAILED(hr))
    {
        Running = false;
    }
}

##Render() function##
The first thing we do here is update our pipeline (record the command list) by calling our UpdatePipeline() function. Once our command list has been recorded, we create an array of our command lists. We only have one command list, but if we had multiple threads, we would have a command list for each thread. Here we would organize our command lists in the array in the order we want to execute them.
We can execute the command lists by calling **ExecuteCommandLists()** on the commandQueue, providing the number of command lists to execute and a pointer to the command lists array.

void .[https://msdn.microsoft.com/en-us/library/windows/desktop/dn788631%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396][ExecuteCommandLists](
    [in] UINT NumCommandLists,
    [in] ID3D12CommandList *const *ppCommandLists
);

- **NumCommandLists** - *The number of command lists to execute*
- **ppCommandLists** - *An array of command lists to execute. The command lists will be executed in the order they were put into the array.*

After we instruct the GPU to execute our command list, we want to insert a command into the command queue to set the fence for this frame. The Signal() method inserts another command that sets a fence to a specific value; once the fence reaches that value, any fence event we registered for it is triggered. We do this so that when we get back to this frame buffer, we can check whether the GPU has finished executing the command list. We will know when it has finished because the signal command will have been executed and the fence will have been set to the value we told it to use.

Finally we present the current back buffer by calling the Present() method of the swap chain.

void Render()
{
    HRESULT hr;

    UpdatePipeline(); // update the pipeline by sending commands to the commandqueue

    // create an array of command lists (only one command list here)
    ID3D12CommandList* ppCommandLists[] = { commandList };

    // execute the array of command lists
    commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);

    // this command goes in at the end of our command queue.
    // we will know when our command queue
    // has finished because the fence value will be set to "fenceValue" from the GPU since the command
    // queue is being executed on the GPU
    hr = commandQueue->Signal(fence[frameIndex], fenceValue[frameIndex]);
    if (FAILED(hr))
    {
        Running = false;
    }

    // present the current backbuffer
    hr = swapChain->Present(0, 0);
    if (FAILED(hr))
    {
        Running = false;
    }
}

##Cleanup() function##
This function releases the interface objects we created. Before we release anything, we make sure the GPU has finished with everything. We cannot simply call WaitForPreviousFrame() once per buffer here: without a Present() in between, GetCurrentBackBufferIndex() keeps returning the same index, so a second call would wait for a fence value that is never signaled. Instead, we signal one more value on each fence and block until the GPU reaches it. Passing a null event handle to SetEventOnCompletion() makes the call itself wait until the fence reaches the value.

void Cleanup()
{
    // wait for the gpu to finish all frames
    for (int i = 0; i < frameBufferCount; ++i)
    {
        if (commandQueue && fence[i])
        {
            fenceValue[i]++;
            if (SUCCEEDED(commandQueue->Signal(fence[i], fenceValue[i])))
            {
                // a null event handle makes this call block until the fence reaches the value
                fence[i]->SetEventOnCompletion(fenceValue[i], nullptr);
            }
        }
    }

    // get swapchain out of full screen before exiting
    // (GetFullscreenState returns an HRESULT, so we test it with SUCCEEDED)
    BOOL fs = FALSE;
    if (swapChain && SUCCEEDED(swapChain->GetFullscreenState(&fs, NULL)) && fs)
        swapChain->SetFullscreenState(FALSE, NULL);

    SAFE_RELEASE(device);
    SAFE_RELEASE(swapChain);
    SAFE_RELEASE(commandQueue);
    SAFE_RELEASE(rtvDescriptorHeap);
    SAFE_RELEASE(commandList);

    for (int i = 0; i < frameBufferCount; ++i)
    {
        SAFE_RELEASE(renderTargets[i]);
        SAFE_RELEASE(commandAllocator[i]);
        SAFE_RELEASE(fence[i]);
    }
}

##WaitForPreviousFrame() function##
Finally we have the wait for previous frame function. This function is where we need the fence and fence event. The first thing we do is check the current value of the current frame's fence. If the current value is less than the value we wanted it to be, we know the GPU is still executing commands for this frame, and we must enter the if block, where we set the fence event that will be triggered once the fence value equals what we wanted it to equal. We do this with the **SetEventOnCompletion()** method of the fence interface.

HRESULT SetEventOnCompletion(
    UINT64 Value,
    HANDLE hEvent
);

- **Value** - *This is the value we want the fence to equal*
- **hEvent** - *This is the event we want triggered when the fence equals Value*

After we set up the event, we wait for it to be triggered.
We do this with the Windows **WaitForSingleObject()** function.

DWORD WINAPI WaitForSingleObject(
    _In_ HANDLE hHandle,
    _In_ DWORD dwMilliseconds
);

- **hHandle** - *This is the fence event we want to wait to be triggered. (If the fence event HAPPENS to be triggered in the very small amount of time between setting the fence event and this function call, this function will return immediately.)*
- **dwMilliseconds** - *This is the number of milliseconds we want to wait for the fence event to be triggered. We can use the INFINITE macro, which means this function will block until the fence event is triggered, however long that takes.*

Once we see the GPU has finished executing this frame's command list, we increment our fence value for this frame and continue. The frame index was already set to the swap chain's current back buffer at the top of the function.

void WaitForPreviousFrame()
{
    HRESULT hr;

    // swap the current rtv buffer index so we draw on the correct buffer
    frameIndex = swapChain->GetCurrentBackBufferIndex();

    // if the current fence value is still less than "fenceValue", then we know the GPU has not finished executing
    // the command queue since it has not reached the "commandQueue->Signal(fence, fenceValue)" command
    if (fence[frameIndex]->GetCompletedValue() < fenceValue[frameIndex])
    {
        // we have the fence create an event which is signaled once the fence's current value is "fenceValue"
        hr = fence[frameIndex]->SetEventOnCompletion(fenceValue[frameIndex], fenceEvent);
        if (FAILED(hr))
        {
            Running = false;
        }

        // We will wait until the fence has triggered the event that its current value has reached "fenceValue". once its value
        // has reached "fenceValue", we know the command queue has finished executing
        WaitForSingleObject(fenceEvent, INFINITE);
    }

    // increment fenceValue for next frame
    fenceValue[frameIndex]++;
}

And that's it for initializing Direct3D 12! Let me know if you see any mistakes or what you think about the tutorial in the comments below!
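The signal-then-wait pattern can also be followed on paper with a small single-threaded model. The sketch below uses no Direct3D or Windows calls at all: `ToyGpuQueue` and `waitForFence` are hypothetical names invented for this illustration, and the "GPU" only makes progress when the waiting code steps it, which is a simplification of the real asynchronous behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Toy stand-in for a command queue plus one fence (NOT the real ID3D12 interfaces).
struct ToyGpuQueue {
    std::deque<std::uint64_t> pendingSignals; // Signal() commands the "GPU" has not reached yet
    std::uint64_t completed = 0;              // what GetCompletedValue() would report

    // Plays the role of commandQueue->Signal(fence, value): queued now, executed later.
    void signal(std::uint64_t value) {
        assert(value > completed && "this model expects fence values to increase");
        pendingSignals.push_back(value);
    }

    // Simulates the GPU working through the queue up to the next Signal() command.
    // Returns false when there is nothing left to execute.
    bool step() {
        if (pendingSignals.empty()) return false;
        completed = pendingSignals.front();
        pendingSignals.pop_front();
        return true;
    }
};

// Plays the role of the wait inside WaitForPreviousFrame(). Returns true once the fence
// has reached `target`, or false if the queue runs dry first. With a real fence event,
// that second case would be WaitForSingleObject(..., INFINITE) never returning.
inline bool waitForFence(ToyGpuQueue& queue, std::uint64_t target) {
    while (queue.completed < target) {
        if (!queue.step()) return false;
    }
    return true;
}
```

Waiting for value 0 on a fresh queue returns true straight away, which matches the first pass through each frame in this tutorial. After `signal(1)`, `waitForFence(queue, 1)` returns true, while `waitForFence(queue, 2)` returns false because no Signal() for 2 was ever queued. That last case is worth keeping in mind whenever a wait is issued without a matching Signal().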
Here is the final code for this tutorial:

##stdafx.h##

#pragma once

#ifndef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN    // Exclude rarely-used stuff from Windows headers.
#endif

#include <windows.h>
#include <d3d12.h>
#include <dxgi1_4.h>
#include <D3Dcompiler.h>
#include <DirectXMath.h>
#include "d3dx12.h"
#include <string>

// this will only call release if an object exists (prevents exceptions calling release on nonexistent objects)
#define SAFE_RELEASE(p) { if ( (p) ) { (p)->Release(); (p) = 0; } }

// Handle to the window
HWND hwnd = NULL;

// name of the window (not the title)
LPCTSTR WindowName = L"BzTutsApp";

// title of the window
LPCTSTR WindowTitle = L"Bz Window";

// width and height of the window
int Width = 800;
int Height = 600;

// is window full screen?
bool FullScreen = false;

// we will exit the program when this becomes false
bool Running = true;

// create a window
bool InitializeWindow(HINSTANCE hInstance,
    int ShowWnd,
    bool fullscreen);

// main application loop
void mainloop();

// callback function for windows messages
LRESULT CALLBACK WndProc(HWND hWnd,
    UINT msg,
    WPARAM wParam,
    LPARAM lParam);

// direct3d stuff
const int frameBufferCount = 3; // number of buffers we want, 2 for double buffering, 3 for triple buffering

ID3D12Device* device; // direct3d device

IDXGISwapChain3* swapChain; // swapchain used to switch between render targets

ID3D12CommandQueue* commandQueue; // container for command lists

ID3D12DescriptorHeap* rtvDescriptorHeap; // a descriptor heap to hold resources like the render targets

ID3D12Resource* renderTargets[frameBufferCount]; // number of render targets equal to buffer count

ID3D12CommandAllocator* commandAllocator[frameBufferCount]; // we want enough allocators for each buffer * number of threads (we only have one thread)

ID3D12GraphicsCommandList* commandList; // a command list we can record commands into, then execute them to render the frame

ID3D12Fence* fence[frameBufferCount]; // an object that is locked while
// our command list is being executed by the gpu. We need as many
// as we have allocators (more if we want to know when the gpu is finished with an asset)

HANDLE fenceEvent; // a handle to an event when our fence is unlocked by the gpu

UINT64 fenceValue[frameBufferCount]; // this value is incremented each frame. each fence will have its own value

int frameIndex; // current rtv we are on

int rtvDescriptorSize; // size of the rtv descriptor on the device (all front and back buffers will be the same size)

// function declarations
bool InitD3D(); // initializes direct3d 12

void Update(); // update the game logic

void UpdatePipeline(); // update the direct3d pipeline (update command lists)

void Render(); // execute the command list

void Cleanup(); // release com objects and clean up memory

void WaitForPreviousFrame(); // wait until gpu is finished with command list

##main.cpp##

#include "stdafx.h"

using namespace DirectX; // we will be using the directxmath library

int WINAPI WinMain(HINSTANCE hInstance, //Main windows function
    HINSTANCE hPrevInstance,
    LPSTR lpCmdLine,
    int nShowCmd)
{
    // create the window
    if (!InitializeWindow(hInstance, nShowCmd, FullScreen))
    {
        MessageBox(0, L"Window Initialization - Failed",
            L"Error", MB_OK);
        return 1;
    }

    // initialize direct3d
    if (!InitD3D())
    {
        MessageBox(0, L"Failed to initialize direct3d 12",
            L"Error", MB_OK);
        Cleanup();
        return 1;
    }

    // start the main loop
    mainloop();

    // we want to wait for the gpu to finish executing the command list before we start releasing everything
    WaitForPreviousFrame();

    // clean up everything (Cleanup() still waits on the gpu, so it runs before the fence event is closed)
    Cleanup();

    // close the fence event
    CloseHandle(fenceEvent);

    return 0;
}

// create and show the window
bool InitializeWindow(HINSTANCE hInstance,
    int ShowWnd,
    bool fullscreen)
{
    if (fullscreen)
    {
        HMONITOR hmon = MonitorFromWindow(hwnd,
            MONITOR_DEFAULTTONEAREST);
        MONITORINFO mi = { sizeof(mi) };
        GetMonitorInfo(hmon, &mi);

        Width = mi.rcMonitor.right - mi.rcMonitor.left;
        Height = mi.rcMonitor.bottom - mi.rcMonitor.top;
    }

    WNDCLASSEX wc;

    wc.cbSize = sizeof(WNDCLASSEX);
    wc.style = CS_HREDRAW | CS_VREDRAW;
    wc.lpfnWndProc = WndProc;
    wc.cbClsExtra = NULL;
    wc.cbWndExtra = NULL;
    wc.hInstance = hInstance;
    wc.hIcon = LoadIcon(NULL, IDI_APPLICATION);
    wc.hCursor = LoadCursor(NULL, IDC_ARROW);
    wc.hbrBackground = (HBRUSH)(COLOR_WINDOW + 2);
    wc.lpszMenuName = NULL;
    wc.lpszClassName = WindowName;
    wc.hIconSm = LoadIcon(NULL, IDI_APPLICATION);

    if (!RegisterClassEx(&wc))
    {
        MessageBox(NULL, L"Error registering class",
            L"Error", MB_OK | MB_ICONERROR);
        return false;
    }

    hwnd = CreateWindowEx(NULL,
        WindowName,
        WindowTitle,
        WS_OVERLAPPEDWINDOW,
        CW_USEDEFAULT, CW_USEDEFAULT,
        Width, Height,
        NULL,
        NULL,
        hInstance,
        NULL);

    if (!hwnd)
    {
        MessageBox(NULL, L"Error creating window",
            L"Error", MB_OK | MB_ICONERROR);
        return false;
    }

    if (fullscreen)
    {
        SetWindowLong(hwnd, GWL_STYLE, 0);
    }

    ShowWindow(hwnd, ShowWnd);
    UpdateWindow(hwnd);

    return true;
}

void mainloop()
{
    MSG msg;
    ZeroMemory(&msg, sizeof(MSG));

    while (Running)
    {
        if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
        {
            if (msg.message == WM_QUIT)
                break;

            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
        else
        {
            // run game code
            Update(); // update the game logic
            Render(); // execute the command queue (rendering the scene is the result of the gpu executing the command lists)
        }
    }
}

LRESULT CALLBACK WndProc(HWND hwnd,
    UINT msg,
    WPARAM wParam,
    LPARAM lParam)
{
    switch (msg)
    {
    case WM_KEYDOWN:
        if (wParam == VK_ESCAPE)
        {
            if (MessageBox(0, L"Are you sure you want to exit?",
                L"Really?", MB_YESNO | MB_ICONQUESTION) == IDYES)
            {
                Running = false;
                DestroyWindow(hwnd);
            }
        }
        return 0;

    case WM_DESTROY: // x button on top right corner of window was pressed
        Running = false;
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProc(hwnd,
        msg,
        wParam,
        lParam);
}

bool InitD3D()
{
    HRESULT hr;

    // -- Create the Device -- //

    IDXGIFactory4* dxgiFactory;
    hr = CreateDXGIFactory1(IID_PPV_ARGS(&dxgiFactory));
    if (FAILED(hr))
    {
        return false;
    }

    IDXGIAdapter1* adapter; // adapters are the graphics card (this
    // includes the embedded graphics on the motherboard)

    int adapterIndex = 0; // we'll start looking for directx 12 compatible graphics devices starting at index 0

    bool adapterFound = false; // set this to true when a good one was found

    // find first hardware gpu that supports d3d 12
    while (dxgiFactory->EnumAdapters1(adapterIndex, &adapter) != DXGI_ERROR_NOT_FOUND)
    {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);

        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
        {
            // we dont want a software device
            adapter->Release(); // EnumAdapters1 added a reference, so release adapters we skip
            adapterIndex++;
            continue;
        }

        // we want a device that is compatible with direct3d 12 (feature level 11 or higher)
        hr = D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_11_0, _uuidof(ID3D12Device), nullptr);
        if (SUCCEEDED(hr))
        {
            adapterFound = true;
            break;
        }

        adapter->Release(); // this adapter cannot create a d3d 12 device, so release it and try the next one
        adapterIndex++;
    }

    if (!adapterFound)
    {
        return false;
    }

    // Create the device
    hr = D3D12CreateDevice(
        adapter,
        D3D_FEATURE_LEVEL_11_0,
        IID_PPV_ARGS(&device)
        );
    if (FAILED(hr))
    {
        return false;
    }

    // -- Create a direct command queue -- //

    D3D12_COMMAND_QUEUE_DESC cqDesc = {};
    cqDesc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
    cqDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT; // direct means the gpu can directly execute this command queue

    hr = device->CreateCommandQueue(&cqDesc, IID_PPV_ARGS(&commandQueue)); // create the command queue
    if (FAILED(hr))
    {
        return false;
    }

    // -- Create the Swap Chain (double/triple buffering) -- //

    DXGI_MODE_DESC backBufferDesc = {}; // this is to describe our display mode
    backBufferDesc.Width = Width; // buffer width
    backBufferDesc.Height = Height; // buffer height
    backBufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM; // format of the buffer (rgba 32 bits, 8 bits for each channel)

    // describe our multi-sampling. We are not multi-sampling, so we set the count to 1 (we need at least one sample of course)
    DXGI_SAMPLE_DESC sampleDesc = {};
    sampleDesc.Count = 1; // multisample count (no multisampling, so we just put 1, since we still need 1 sample)

    // Describe and create the swap chain.
    DXGI_SWAP_CHAIN_DESC swapChainDesc = {};
    swapChainDesc.BufferCount = frameBufferCount; // number of buffers we have
    swapChainDesc.BufferDesc = backBufferDesc; // our back buffer description
    swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT; // this says the pipeline will render to this swap chain
    swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD; // dxgi will discard the buffer (data) after we call present
    swapChainDesc.OutputWindow = hwnd; // handle to our window
    swapChainDesc.SampleDesc = sampleDesc; // our multi-sampling description
    swapChainDesc.Windowed = !FullScreen; // set to true, then if in fullscreen must call SetFullScreenState with true for full screen to get uncapped fps

    IDXGISwapChain* tempSwapChain;

    dxgiFactory->CreateSwapChain(
        commandQueue, // the queue will be flushed once the swap chain is created
        &swapChainDesc, // give it the swap chain description we created above
        &tempSwapChain // store the created swap chain in a temp IDXGISwapChain interface
        );

    swapChain = static_cast<IDXGISwapChain3*>(tempSwapChain);

    frameIndex = swapChain->GetCurrentBackBufferIndex();

    // -- Create the Back Buffers (render target views) Descriptor Heap -- //

    // describe an rtv descriptor heap and create
    D3D12_DESCRIPTOR_HEAP_DESC rtvHeapDesc = {};
    rtvHeapDesc.NumDescriptors = frameBufferCount; // number of descriptors for this heap.
    rtvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV; // this heap is a render target view heap

    // This heap will not be directly referenced by the shaders (not shader visible), as this will store the output from the pipeline
    // otherwise we would set the heap's flag to D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE
    rtvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
    hr = device->CreateDescriptorHeap(&rtvHeapDesc, IID_PPV_ARGS(&rtvDescriptorHeap));
    if (FAILED(hr))
    {
        return false;
    }

    // get the size of a descriptor in this heap (this is a rtv heap, so only rtv descriptors should be stored in it.
    // descriptor sizes may vary from device to device, which is why there is no set size and we must ask the
    // device to give us the size. we will use this size to increment a descriptor handle offset
    rtvDescriptorSize = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_RTV);

    // get a handle to the first descriptor in the descriptor heap. a handle is basically a pointer,
    // but we cannot literally use it like a c++ pointer.
    CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(rtvDescriptorHeap->GetCPUDescriptorHandleForHeapStart());

    // Create a RTV for each buffer (double buffering is two buffers, triple buffering is 3).
    for (int i = 0; i < frameBufferCount; i++)
    {
        // first we get the n'th buffer in the swap chain and store it in the n'th
        // position of our ID3D12Resource array
        hr = swapChain->GetBuffer(i, IID_PPV_ARGS(&renderTargets[i]));
        if (FAILED(hr))
        {
            return false;
        }

        // then we "create" a render target view which binds the swap chain buffer (ID3D12Resource[n]) to the rtv handle
        device->CreateRenderTargetView(renderTargets[i], nullptr, rtvHandle);

        // we increment the rtv handle by the rtv descriptor size we got above
        rtvHandle.Offset(1, rtvDescriptorSize);
    }

    // -- Create the Command Allocators -- //

    for (int i = 0; i < frameBufferCount; i++)
    {
        hr = device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&commandAllocator[i]));
        if (FAILED(hr))
        {
            return false;
        }
    }

    // -- Create a Command List -- //

    // create the command list with the first allocator
    hr = device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, commandAllocator[0], NULL, IID_PPV_ARGS(&commandList));
    if (FAILED(hr))
    {
        return false;
    }

    // command lists are created in the recording state.
    // our main loop will set it up for recording again so close it now
    commandList->Close();

    // -- Create a Fence & Fence Event -- //

    // create the fences
    for (int i = 0; i < frameBufferCount; i++)
    {
        hr = device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence[i]));
        if (FAILED(hr))
        {
            return false;
        }
        fenceValue[i] = 0; // set the initial fence value to 0
    }

    // create a handle to a fence event
    fenceEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);
    if (fenceEvent == nullptr)
    {
        return false;
    }

    return true;
}

void Update()
{
    // update app logic, such as moving the camera or figuring out what objects are in view
}

void UpdatePipeline()
{
    HRESULT hr;

    // We have to wait for the gpu to finish with the command allocator before we reset it
    WaitForPreviousFrame();

    // we can only reset an allocator once the gpu is done with it
    // resetting an allocator frees the memory that the command list was stored in
    hr = commandAllocator[frameIndex]->Reset();
    if (FAILED(hr))
    {
        Running = false;
    }

    // reset the command list. by resetting the command list we are putting it into
    // a recording state so we can start recording commands into the command allocator.
    // the command allocator that we reference here may have multiple command lists
    // associated with it, but only one can be recording at any time. Make sure
    // that any other command lists associated to this command allocator are in
    // the closed state (not recording).
    // Here you will pass an initial pipeline state object as the second parameter,
    // but in this tutorial we are only clearing the rtv, and do not actually need
    // anything but an initial default pipeline, which is what we get by setting
    // the second parameter to NULL
    hr = commandList->Reset(commandAllocator[frameIndex], NULL);
    if (FAILED(hr))
    {
        Running = false;
    }

    // here we start recording commands into the commandList (all of the commands are stored in the commandAllocator)

    // transition the "frameIndex" render target from the present state to the render target state so the command list draws to it starting from here
    // (the barrier is stored in a local first, because taking the address of a temporary is not standard C++)
    CD3DX12_RESOURCE_BARRIER toRenderTarget = CD3DX12_RESOURCE_BARRIER::Transition(renderTargets[frameIndex], D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_RENDER_TARGET);
    commandList->ResourceBarrier(1, &toRenderTarget);

    // here we again get the handle to our current render target view so we can set it as the render target in the output merger stage of the pipeline
    CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(rtvDescriptorHeap->GetCPUDescriptorHandleForHeapStart(), frameIndex, rtvDescriptorSize);

    // set the render target for the output merger stage (the output of the pipeline)
    commandList->OMSetRenderTargets(1, &rtvHandle, FALSE, nullptr);

    // Clear the render target by using the ClearRenderTargetView command
    const float clearColor[] = { 0.0f, 0.2f, 0.4f, 1.0f };
    commandList->ClearRenderTargetView(rtvHandle, clearColor, 0, nullptr);

    // transition the "frameIndex" render target from the render target state to the present state.
    // If the debug layer is enabled, you will receive a
    // warning if present is called on the render target when it's not in the present state
    CD3DX12_RESOURCE_BARRIER toPresent = CD3DX12_RESOURCE_BARRIER::Transition(renderTargets[frameIndex], D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_PRESENT);
    commandList->ResourceBarrier(1, &toPresent);

    hr = commandList->Close();
    if (FAILED(hr))
    {
        Running = false;
    }
}

void Render()
{
    HRESULT hr;

    UpdatePipeline(); // update the pipeline by sending commands to the commandqueue

    // create an array of command lists (only one command list here)
    ID3D12CommandList* ppCommandLists[] = { commandList };

    // execute the array of command lists
    commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);

    // this command goes in at the end of our command queue. we will know when our command queue
    // has finished because the fence value will be set to "fenceValue" from the GPU since the command
    // queue is being executed on the GPU
    hr = commandQueue->Signal(fence[frameIndex], fenceValue[frameIndex]);
    if (FAILED(hr))
    {
        Running = false;
    }

    // present the current backbuffer
    hr = swapChain->Present(0, 0);
    if (FAILED(hr))
    {
        Running = false;
    }
}

void Cleanup()
{
    // wait for the gpu to finish all frames. calling WaitForPreviousFrame() in a loop would hang here,
    // because without a Present() the back buffer index never changes, so we signal one more value
    // on each fence and block until the gpu reaches it
    for (int i = 0; i < frameBufferCount; ++i)
    {
        if (commandQueue && fence[i])
        {
            fenceValue[i]++;
            if (SUCCEEDED(commandQueue->Signal(fence[i], fenceValue[i])))
            {
                // a null event handle makes this call block until the fence reaches the value
                fence[i]->SetEventOnCompletion(fenceValue[i], nullptr);
            }
        }
    }

    // get swapchain out of full screen before exiting
    // (GetFullscreenState returns an HRESULT, so we test it with SUCCEEDED)
    BOOL fs = FALSE;
    if (swapChain && SUCCEEDED(swapChain->GetFullscreenState(&fs, NULL)) && fs)
        swapChain->SetFullscreenState(FALSE, NULL);

    SAFE_RELEASE(device);
    SAFE_RELEASE(swapChain);
    SAFE_RELEASE(commandQueue);
    SAFE_RELEASE(rtvDescriptorHeap);
    SAFE_RELEASE(commandList);

    for (int i = 0; i < frameBufferCount; ++i)
    {
        SAFE_RELEASE(renderTargets[i]);
        SAFE_RELEASE(commandAllocator[i]);
        SAFE_RELEASE(fence[i]);
    }
}

void WaitForPreviousFrame()
{
    HRESULT hr;

    // swap the current rtv buffer index so we draw on the correct buffer
    frameIndex = swapChain->GetCurrentBackBufferIndex();

    // if the current fence value is still less than "fenceValue", then we know the GPU has not finished
    // executing the command queue since it has not reached the "commandQueue->Signal(fence, fenceValue)" command
    if (fence[frameIndex]->GetCompletedValue() < fenceValue[frameIndex])
    {
        // we have the fence create an event which is signaled once the fence's current value is "fenceValue"
        hr = fence[frameIndex]->SetEventOnCompletion(fenceValue[frameIndex], fenceEvent);
        if (FAILED(hr))
        {
            Running = false;
        }

        // We will wait until the fence has triggered the event that its current value has reached "fenceValue". once its value
        // has reached "fenceValue", we know the command queue has finished executing
        WaitForSingleObject(fenceEvent, INFINITE);
    }

    // increment fenceValue for next frame
    fenceValue[frameIndex]++;
}
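As a way to reason about the two ResourceBarrier() calls in UpdatePipeline(), here is a small header-free model of transition barriers. It is only a sketch under simplifying assumptions: `ToyState`, `ToyResource`, `transition` and `recordFrame` are hypothetical names, only the two states used in this tutorial are modeled, and a mismatched "before" state is simply rejected, whereas the real API relies on the debug layer to report it.

```cpp
#include <cassert>

// Hypothetical subset of D3D12_RESOURCE_STATES: just the two states this tutorial uses.
enum class ToyState { Present, RenderTarget };

struct ToyResource {
    ToyState state = ToyState::Present; // in this model a back buffer starts out presentable
};

// Models a transition barrier: it only applies when `before` matches the tracked state.
inline bool transition(ToyResource& resource, ToyState before, ToyState after) {
    if (resource.state != before) return false; // mismatch: the barrier is rejected in this model
    resource.state = after;
    return true;
}

// One frame in the same order UpdatePipeline() records it. Returns true when every step is legal.
inline bool recordFrame(ToyResource& backBuffer) {
    if (!transition(backBuffer, ToyState::Present, ToyState::RenderTarget)) return false;
    // ... OMSetRenderTargets() and ClearRenderTargetView() would be recorded here ...
    const bool ok = transition(backBuffer, ToyState::RenderTarget, ToyState::Present);
    assert(ok && backBuffer.state == ToyState::Present); // the frame must end presentable
    return ok;
}
```

Calling `recordFrame` on a fresh `ToyResource` succeeds and leaves it in the present state, ready for the next frame. Attempting the second transition on its own (render target to present) on a resource that is still in the present state returns false, which is the kind of mistake the paired barriers are there to avoid.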
Comments
you repeated the tutorial excluding the full source code
on Dec 13 `15
Caseofgames
which tutorial do you mean?
on Dec 13 `15
iedoc
oh haha, thats a mistake! thanks for noticing
on Dec 13 `15
iedoc
New to 3D, trying to understand all of this. Downloaded tutorial shows 464 errors in VS2015 Community Edition.
on Sep 16 `16
aidevelopment
Could you start a question in the questions section and post some or all of the errors?
on Sep 16 `16
iedoc
The downloadable project, when run on my machine, freezes my pc. The memory consumption keeps increasing and consumes up to 91% of memory (I have 4GB of RAM). Is it supposed to be so, or am I doing something wrong?
on Jan 19 `17
JorahMormont
I've got the same problem, memory dump and pc freezes up. Solution -> Main.cpp: line 180

while (dxgiFactory->EnumAdapters1(adapterIndex, &adapter) != DXGI_ERROR_NOT_FOUND)
{
    DXGI_ADAPTER_DESC1 desc;
    adapter->GetDesc1(&desc);

    if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
    {
        // we dont want a software device
        adapter->Release();
        adapterIndex++;
        continue;
    }

    // we want a device that is compatible with direct3d 12 (feature level 11 or higher)
    hr = D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_11_0, _uuidof(ID3D12Device), nullptr);
    if (SUCCEEDED(hr))
    {
        adapterFound = true;
        break;
    }

    adapter->Release();
    adapterIndex++;
}

---
I was looking forward to go into these tutorials on my holidays … but my NVidia GT540M is not supported for dx12 with the latest drivers.
on Jul 05 `17
isolator
Hey sorry to hear your gfx card doesn't support dx12. Are you on Windows 10? D3d12 is only supported on win10. You can acquire the software device to continue the tutorials, although it's obviously going to be much slower
on Jul 05 `17
iedoc
Thanks for the reply, I’ve also tried to allow a software device but there are none in the list. Same issues with the project you linked for d3dx12.h (GitHub from MS) Hello Triangle. I’ve learned that NVidia just added some Fermi family (same as my gpu) to support dx12 in win10 (driver v384.76). I presume next driver pack will allow my gpu to support dx12. I’ll try to resume next week on another pc.
on Jul 07 `17
isolator
Hi there, really nice tutorial :). However, there seems to be a bug in the cleanup phase. In cleanup you set the frameIndex variable manually, and then call WaitForPreviousFrame, where the frameIndex is overwritten by the call to swapChain->GetCurrentBackBufferIndex(). The problem is that when calling Cleanup, the value of swapChain->GetCurrentBackBufferIndex() will always be the same for every call, regardless of what you manually set in cleanup (due to this overwriting), because there is no swapChain->Present(). This will cause the app to hang on cleanup. I've fixed it by adding a boolean argument to WaitForPreviousFrame and only calling swapChain->GetCurrentBackBufferIndex() if it's true. Also, I think with this version of the app, the conditional break on WM_QUIT in mainLoop() is altogether unnecessary, since there is the Running variable. Anyway, the dx12 stuff is really well explained :). Sorry for nitpicking.
on Jul 10 `17
elaic
Thanks for finding that and fixing it elaic! Appreciate it!
on Jul 10 `17
iedoc