
Lesson 32: Direct3D 11 Instancing

Introduction

This lesson builds off the last lesson, the third person camera.

Here we will learn how to do a technique that you will most definitely want to learn, called instancing.

Instancing is a technique for drawing multiple copies of the same geometry with small changes per copy, such as position, orientation, color, animation, or scale (even different textures per copy). This technique is VERY fast because the geometry is submitted once and reused on the GPU for every copy, so you do not pay the CPU and driver cost of a separate draw call for each copy. All you have to do is say how many copies of the geometry you want to draw, then call the draw function only once, and the GPU runs the geometry through the shaders for every copy.

In this lesson, we will be drawing 400 trees, with 1000 leaves (quads) per tree. That gives us a total of 400,000 quads to draw! You probably won't be drawing trees this way, but it is a good example of what instancing can do for you. If you were to make a draw call for every single leaf, that's 400,000 draw calls per frame! Compare that to 1 draw call per frame using the instancing technique, and you'll understand why instancing is so powerful. Drawing this same scene without instancing will grind your computer to a halt, since you would have to make 400 draw calls for the trees (submitting the entire tree's geometry 400 times per frame), and 400,000 draw calls for the leaves (submitting the leaf's quad geometry 400,000 times per frame!).
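To make the numbers above concrete, here is a small sketch of the arithmetic. The struct and function names are hypothetical and only used for illustration; the counts themselves come from this lesson's scene.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: holds the totals discussed in the text.
struct DrawCallCounts
{
    std::uint64_t withoutInstancing; // one draw call per tree plus one per leaf
    std::uint64_t quadsDrawn;        // total number of leaf quads in the scene
};

// Counts the draw calls needed if every tree and every leaf were drawn on its own.
inline DrawCallCounts countDrawCalls(std::uint64_t numTrees, std::uint64_t leavesPerTree)
{
    assert(numTrees > 0 && leavesPerTree > 0);
    const std::uint64_t leaves = numTrees * leavesPerTree;
    return { numTrees + leaves, leaves };
}
```

With 400 trees and 1000 leaves per tree, countDrawCalls(400, 1000) reports 400,000 quads and 400,400 separate draw calls for the non-instanced version.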

Instancing

Instancing is much simpler to do than you may think. All you have to do is tell the GPU how many copies of the geometry you want to draw, and it will draw them all, looping through the graphics pipeline for each instance. This looping through the graphics pipeline per instance is important because it allows you to make the changes you want for each instance.

Instance Data (Instance Buffer/Constant Buffer/On-the-fly (inside shaders))
Along with telling the GPU how many copies to draw, you might also have to provide data for each instance, such as its position, orientation, color, texture, animation or scale. We can do this in a few different ways: using an instance buffer, using a constant buffer, computing the instance data directly in the shaders, or a combination of those three.

On-the-fly (inside shaders)
We'll start with the easiest: creating data per instance "on the fly". What I mean by on the fly is that the data for each instance is computed directly in the shaders. Sharing data between passes through a shader is not possible (for example, the vertex shader cannot get information about the previous vertex that passed through it), which leaves only a small range of ways to create unique data for each instance. I'm not talking about instance or constant buffers at this point, only about making unique data per instance in the shaders from what the GPU can provide. We will talk about instance and constant buffers next.

There are two things that may be unique between passes through the vertex shader. The first is the instance ID, obtained by using the system value semantic "SV_InstanceID" as input to the shader, which gives you the ID of the current instance (this is the ID for the entire copy, not per triangle or vertex). The second is a random number, which you can use to make each instance different. HLSL has no built-in random number function, so you would have to write one yourself, and a random number generated in the vertex shader is per vertex, so you might not want to use it for anything to do with the position. Most likely you will want to use more than just the instance ID and/or a random number to create uniqueness between instances, which is why there is an instance buffer and a constant buffer.

To get the instance ID, we use the SV_InstanceID semantic, which is a system value (notice the "SV_") that the GPU will provide for us, as input to a shader. Here is an example of a vertex shader that uses the SV_InstanceID semantic as input:


float4 VS(float4 inPos : POSITION, uint instanceID : SV_InstanceID) : SV_POSITION
{
	inPos.x += instanceID; // Moves each instance along the positive x axis by its instance ID

	return inPos;
}

Constant Buffer
I chose constant buffers next because we already know about them at this point, which makes them easier to understand than instance buffers (although instance buffers are almost identical to vertex buffers, which makes them easy to understand too). Another way to get unique data per instance is to use a constant buffer. We can provide the GPU with an array of variables, such as an array of matrices (which we do in this lesson). We send the array to the shader's constant buffer, and use the instance ID to get the associated element in the instance data array stored in the constant buffer. What I mean is we can do something like:


position = instancePositions[instanceID];

Where "instancePositions[]" is the array we sent to the constant buffer, and "instanceID" is given to us by the GPU when using the SV_InstanceID semantic as the shader's input.


Instance Buffer
The instance buffer is created the exact same way as a vertex buffer. First we create an instance structure (just like the vertex structure, which we store as an array inside a vertex buffer) that holds the data per instance, such as the instance's position, color, etc. We will create an array of these instance structure objects and store them in the instance buffer. To use the instance buffer, we have to update the input layout to take data from the instance buffer. This works the same way as setting up the input layout to take vertex data from the vertex buffer. Here is an example of an input layout:


D3D11_INPUT_ELEMENT_DESC layout[] =
{
	// Data from the vertex buffer
	{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },  
	{ "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },  
	{ "NORMAL",	 0, DXGI_FORMAT_R32G32B32_FLOAT,    0, 20, D3D11_INPUT_PER_VERTEX_DATA, 0},
    
    // Data from the instance buffer
	{ "INSTANCEPOS", 0, DXGI_FORMAT_R32G32B32_FLOAT,    1, 0, D3D11_INPUT_PER_INSTANCE_DATA, 1},
	{ "INSTANCECOLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT,    1, 12, D3D11_INPUT_PER_INSTANCE_DATA, 1}
};

Notice the differences between the vertex data and the instance data elements. We'll take a look at each parameter that changes between the vertex and instance elements. First notice the semantic names, "INSTANCEPOS" and "INSTANCECOLOR". These are custom semantics. We can name the semantic anything we want as long as there is a corresponding input in the vertex shader. This isn't something that is specific to instance data, but I wanted to be clear that it is just a custom semantic name, and is only used to "link" the input element to the corresponding input in the shaders.

Next, look at the fourth parameter. This is the input slot. To use an instance buffer, we will be binding it to the input assembler ALONG WITH the vertex buffer. To do this, we create an array of ID3D11Buffer pointers that holds our vertex and instance buffers. The input slot is set to 0 for the vertex data, because the vertices come from element 0 of the buffer array we pass to the input assembler, and to 1 for the instance data, which comes from element 1 (the second element, since arrays are zero-based). That means we have to create the buffer array so that the vertex buffer is at "buffers[0]" and the instance buffer is at "buffers[1]". We are allowed to bind up to 16 (0-15) different buffers per input layout.

Take a look at the fifth parameter. This is the offset from the beginning of the "buffer" that the element is stored at. As we know, we use separate buffers for the vertex and instance data, so our instance data's position element starts at the beginning of the instance buffer, while the vertex data position element starts at the beginning of the vertex buffer.

Next is the "data class" for the element. We have two options here, D3D11_INPUT_PER_VERTEX_DATA and D3D11_INPUT_PER_INSTANCE_DATA. These are pretty self explanatory. D3D11_INPUT_PER_VERTEX_DATA says the element is used "PER VERTEX", so that every vertex passed through the graphics pipeline gets its own data from the input, while D3D11_INPUT_PER_INSTANCE_DATA says that the element is used "PER INSTANCE", so that each instance of the geometry passed through the graphics pipeline gets its own data. For example, the input element "POSITION" is used PER VERTEX, so every vertex gets its own POSITION, while the input element "INSTANCEPOS" is used PER INSTANCE, so every instance gets its own "INSTANCEPOS". I hope that's clear enough.

Finally, the last parameter. This is the number of instances that need to be rendered BEFORE moving to the next element in the instance buffer. We will take advantage of this parameter when drawing the leaves for our trees. In this lesson, we have a separate input layout just for our leaves, where the only difference is this parameter. For the trees, we will keep this parameter at 1, because we only want to draw a single tree before moving to the next tree's position. Our instance structure contains only a position, which is the position of our tree. We will create 400 trees, so we will need an array of 400 instance structures that we store in the instance buffer. We then use "INSTANCEPOS" to get the position of the trees, one tree at a time. Now, for our leaves, we will be using the same instance buffer, because we want the leaves to be on each tree. We set this last parameter to "numLeavesPerTree", which is the number of leaves we want to draw on each tree (1000 in this lesson). What this will do is draw 1000 leaves BEFORE moving to the next INSTANCEPOS defined in the instance buffer, where each position in the instance buffer is the position of a tree. We will draw 1000 leaves on one tree, then move to the next tree's position and draw 1000 more, and do this until we have drawn onto all of the trees. Note that this parameter is only used for instance data, not for vertex data. We set this parameter to 0 for vertex data, as you can see above.
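The stepping described above can be modeled on the CPU as an integer division. This is only a sketch of how I understand the input assembler to pick the per-instance element, not code from the lesson. Both function names are hypothetical, and leafMatrixIndex assumes the shader selects the per-leaf matrix by wrapping the instance ID, which is one natural way to do it.

```cpp
#include <cassert>
#include <cstdint>

// Which element of the instance buffer is read for a given instance.
// stepRate plays the role of InstanceDataStepRate (1 for trees,
// numLeavesPerTree for leaves). The sketch only covers stepRate > 0.
inline std::uint32_t instanceElementIndex(std::uint32_t instanceID, std::uint32_t stepRate)
{
    assert(stepRate > 0);
    return instanceID / stepRate; // advances once every stepRate instances
}

// Assumption: the per-leaf matrix is chosen by wrapping the instance ID,
// so every tree reuses the same leavesPerTree matrices.
inline std::uint32_t leafMatrixIndex(std::uint32_t instanceID, std::uint32_t leavesPerTree)
{
    assert(leavesPerTree > 0);
    return instanceID % leavesPerTree;
}
```

With a step rate of 1000, instances 0 through 999 all read tree position 0, instance 1000 moves on to tree position 1, and the last leaf instance (399,999) reads tree position 399.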


Drawing The Instances

Now all that's left to explain is how to draw instanced geometry. We can draw instanced geometry using one of two methods from the device context, which are DrawInstanced(), and DrawIndexedInstanced(). DrawInstanced() will draw geometry directly from the vertex buffer, while DrawIndexedInstanced() will draw geometry using an index buffer. We will be using an index buffer in this lesson, so we will be calling DrawIndexedInstanced().

DrawIndexedInstanced() takes 5 arguments, which we will explain below:


void DrawIndexedInstanced(
  [in]  UINT IndexCountPerInstance,
  [in]  UINT InstanceCount,
  [in]  UINT StartIndexLocation,
  [in]  INT BaseVertexLocation,
  [in]  UINT StartInstanceLocation
);

IndexCountPerInstance is the number of indices to draw for each instance. This is the same as when we used DrawIndexed().

InstanceCount is the number of instances we want to draw. In this lesson, we will be drawing numLeavesPerTree * numTrees.

StartIndexLocation is the offset in the index buffer to start drawing from.

BaseVertexLocation is a value added to each index when reading from the vertex buffer. You might have a big vertex buffer with multiple objects in it, and then have separate index buffers for each object. You will want to set this as the position in the vertex buffer of the first vertex used for this current object. An example is you have two quads stored in a vertex buffer, giving you 8 vertices in the vertex buffer. Quad1 uses vertices 0-3, while quad2 uses vertices 4-7. Maybe you have a single index buffer, which uses vertices 0-3. We want to draw Quad2 using this index buffer, so we will set this parameter to 4, so we add 4 to each index value, which draws vertices 4-7 from the vertex buffer.

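StartInstanceLocation is a value added to each index before reading per-instance data from a vertex buffer. In other words, it is the offset into the instance buffer of the first instance element to use, which lets you start drawing from somewhere other than the first instance stored in the buffer.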
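The two-quad example for BaseVertexLocation can be worked through on the CPU. The function below is a hypothetical helper that only shows the arithmetic (each index plus the base vertex location); it is not part of Direct3D.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the vertex buffer positions an indexed draw would read when
// baseVertexLocation is added to every index.
inline std::vector<std::int64_t> resolveVertexIndices(const std::vector<std::uint32_t>& indices,
                                                      std::int32_t baseVertexLocation)
{
    std::vector<std::int64_t> resolved;
    resolved.reserve(indices.size());
    for (std::uint32_t index : indices)
        resolved.push_back(static_cast<std::int64_t>(index) + baseVertexLocation);
    return resolved;
}
```

Passing the quad indices {0, 1, 2, 0, 2, 3} with a base vertex location of 4 yields {4, 5, 6, 4, 6, 7}, which are the vertices of Quad2.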

I want to say one last thing that I noticed in the msdn documentation on the DrawIndexedInstanced() function. They say:


Indexing requires multiple vertex buffers: at least one for per-vertex data and a second buffer for per-instance data.

But that's not exactly right. They say it "requires" multiple buffers, but you can do it with a single vertex buffer and use a constant buffer for the instance data. In fact, there are times when using a constant buffer is a better fit than using an instance buffer (of course, that depends on what you're doing). Take a look at this lesson's code: we store 1000 matrices in a constant buffer that is only updated once per scene. This way, the matrices are stored on the GPU throughout the scene. We COULD put the matrices into an instance buffer instead and bind it along with the leaf's vertex buffer every time we draw the leaves. If that buffer were refilled every frame, it would mean pointless data transfer (sending 1000 matrices to the GPU every frame vs. only once per scene), although an instance buffer that is created once and never updated also stays on the GPU. In this lesson, we still bind an instance buffer along with the vertex buffer for the leaves, because we want to move the leaves to the tree positions. We don't have to do it this way though. We could send the tree positions as an array to the same constant buffer that is only updated once per scene, and not bind an instance buffer at all when drawing the leaves (or trees). Instance buffers are an important part of instancing though, so I wanted to make sure that we are using one in this lesson.


cbPerObject Constant Buffer

Alright, let's start at the top. The first new thing is a couple of boolean variables in our cbPerObject. These are used so we can use a single vertex shader. Usually you will want to have a separate vertex shader for instanced objects, but in this lesson we'll keep it simple by using only one. You will see how these two new variables are used when we get to the effects file.


struct cbPerObject
{
	XMMATRIX  WVP;
	XMMATRIX World;

	//These will be used for the pixel shader
	XMFLOAT4 difColor;
	BOOL hasTexture;
	//Because of HLSL structure packing, we will use windows BOOL
	//instead of bool because HLSL packs things into 4 bytes, and
	//bool is only one byte, where BOOL is 4 bytes
	BOOL hasNormMap;

	/************************************New Stuff****************************************************/
	// Usually you will want to create a separate vertex shader for instanced geometry. However,
	// to keep things simple, I use the same vertex shader we have been using and only
	// apply the instance calculations if isInstance is set to true, and the leaf calculations
	// if both isInstance and isLeaf are set to true
	BOOL isInstance;
	BOOL isLeaf;
	/*************************************************************************************************/
};
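The comment about BOOL versus bool can be checked with a little size arithmetic. The sketch below uses stand-in types so it compiles without the Windows or DirectXMath headers. Matrix4x4, Float4 and Bool32 are hypothetical names that I assume match the sizes of XMMATRIX (64 bytes), XMFLOAT4 (16 bytes) and the Windows BOOL typedef (a 4-byte int).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in types (assumptions, see the note above).
struct Matrix4x4 { float m[4][4]; };
struct Float4    { float x, y, z, w; };
using Bool32 = std::int32_t;

// Same member order as cbPerObject in this lesson.
struct cbPerObjectSketch
{
    Matrix4x4 WVP;
    Matrix4x4 World;
    Float4    difColor;
    Bool32    hasTexture;
    Bool32    hasNormMap;
    Bool32    isInstance;
    Bool32    isLeaf;
};

// Each flag fills exactly one 4-byte component, so four flags fill one 16-byte register.
static_assert(sizeof(Bool32) == 4, "flag must be 4 bytes");
// The ByteWidth of a constant buffer has to be a multiple of 16.
static_assert(sizeof(cbPerObjectSketch) % 16 == 0, "size must be a multiple of 16");

inline std::size_t cbPerObjectSketchSize() { return sizeof(cbPerObjectSketch); }
```

The four flags together take up 16 bytes, which brings the structure to 160 bytes, so adding the two new booleans does not require any extra padding.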

Some Globals

Here we have a couple of goodies! First, we have the number of trees we want to draw in our scene and the number of leaves per tree. On my computer, the lesson runs at about 30 fps, but if you have a slower machine, you might want to take these numbers down a notch. We'll learn how to do frustum culling on the CPU and scene management in a later lesson, so we can still have this many trees and leaves in the scene but cut down the numbers sent to the GPU, which will speed things up considerably!

Next we create a new constant buffer structure. Remember it is good practice to separate constant buffers depending on how often they are updated. This new constant buffer will only be updated once per scene, because the leaves will not move in this lesson. It would be pointless to send 1000 matrices to the GPU every frame when we can send them once at scene initialization, where they will be stored on the GPU throughout the scene. The leafOnTree matrix is the position, scale, and rotation of a leaf in "tree space", which just means that after we apply this transformation matrix to the leaf, we will "add" the tree's position to the leaf's position.

We will be creating a new input layout for this lesson, which will be used for the leaves, called "leafVertLayout". We'll talk about this new layout when we get to it.

We will be creating an "instance buffer" for this lesson, which is very similar to how we create and use a vertex buffer. I've explained the instance buffer above, so I hope I don't have to go through it all again here ;)

Finally we come to the leaf and tree model variables. We've covered all this in earlier lessons, but as you can see, the leaf is only a single quad with a texture of a leaf on it. The tree is an obj model we load in, so we need the same variables we used in the obj model loading lesson.


const int numLeavesPerTree = 1000;
const int numTrees = 400;

struct cbPerScene
{
	XMMATRIX leafOnTree[numLeavesPerTree];
};
cbPerScene cbPerInst;

ID3D11Buffer* cbPerInstanceBuffer;
ID3D11InputLayout* leafVertLayout;

struct InstanceData
{
	XMFLOAT3 pos;
};

// leaf data (leaves are drawn as quads)
ID3D11ShaderResourceView* leafTexture;
ID3D11Buffer *quadVertBuffer;
ID3D11Buffer *quadIndexBuffer;

// Tree data (loaded from an obj file)
ID3D11Buffer* treeInstanceBuff;
ID3D11Buffer* treeVertBuff;
ID3D11Buffer* treeIndexBuff;
int treeSubsets = 0;
std::vector<int> treeSubsetIndexStart;
std::vector<int> treeSubsetTexture;
XMMATRIX treeWorld;
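One thing worth checking when picking numLeavesPerTree is whether the matrix array still fits in a constant buffer. As far as I know, a Direct3D 11 shader can see at most 4096 four-component constants per constant buffer, which is 65,536 bytes. The helper names below are hypothetical and the sketch only does the size arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// A 4x4 float matrix takes 16 floats, which is 64 bytes.
constexpr std::size_t kBytesPerMatrix = 16 * sizeof(float);
// Assumed limit: 4096 constants of 16 bytes each per constant buffer.
constexpr std::size_t kMaxConstantBufferBytes = 4096 * 16;

// Size in bytes of an array like leafOnTree[leavesPerTree].
constexpr std::size_t leafMatrixArrayBytes(std::size_t leavesPerTree)
{
    return leavesPerTree * kBytesPerMatrix;
}

// True if the whole matrix array fits under the assumed limit.
constexpr bool fitsInConstantBuffer(std::size_t leavesPerTree)
{
    return leafMatrixArrayBytes(leavesPerTree) <= kMaxConstantBufferBytes;
}

static_assert(fitsInConstantBuffer(1000), "this lesson's 1000 leaves fit");
```

The 1000 matrices used here take 64,000 bytes, so they fit, but under this assumption there is only room for 24 more before the array would have to be split up or moved to another kind of buffer.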

Updated Input Layout

An input layout, as we learned in one of the earliest lessons, describes the data we are going to be sending to the Input Assembler. Before, we were only sending information PER VERTEX, such as the vertex position, tex coord, normal, etc. Now, however, we are going to send data per vertex AND PER INSTANCE. We are only storing the position of each tree in the instance buffer, so that is all we have to tell the input assembler we are sending for the instance data.


D3D11_INPUT_ELEMENT_DESC layout[] =
{
	{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },  
	{ "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },  
	{ "NORMAL",	 0, DXGI_FORMAT_R32G32B32_FLOAT,    0, 20, D3D11_INPUT_PER_VERTEX_DATA, 0},
	{ "TANGENT", 0, DXGI_FORMAT_R32G32B32_FLOAT,    0, 32, D3D11_INPUT_PER_VERTEX_DATA, 0},
	/************************************New Stuff****************************************************/
	// Instance elements
	// last parameter (InstanceDataStepRate) is one because we will "step" to the next instance element (INSTANCEPOS) after drawing 1 instance (tree)
	{ "INSTANCEPOS", 0, DXGI_FORMAT_R32G32B32_FLOAT,    1, 0, D3D11_INPUT_PER_INSTANCE_DATA, 1}
	/*************************************************************************************************/
};
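The byte offsets in the layout above (0, 12, 20, 32) follow directly from the order and size of the members in the vertex structure. Here is a sketch that checks them, assuming the Vertex structure from the earlier lessons stores position, texture coordinate, normal and tangent in that order. Float2, Float3, VertexSketch and InstanceDataSketch are stand-in names so the example compiles without DirectXMath.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for XMFLOAT2 and XMFLOAT3 (assumed to be tightly packed floats).
struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

// Assumed member order of the lesson's Vertex structure.
struct VertexSketch
{
    Float3 pos;      // "POSITION", offset 0
    Float2 texCoord; // "TEXCOORD", offset 12
    Float3 normal;   // "NORMAL",   offset 20
    Float3 tangent;  // "TANGENT",  offset 32
};

// Mirrors the lesson's InstanceData: only a position, read from input slot 1.
struct InstanceDataSketch
{
    Float3 pos;      // "INSTANCEPOS", offset 0 in its own buffer
};

// One stride per input slot: slot 0 is the vertex buffer, slot 1 the instance buffer.
constexpr unsigned int kStrides[2] = {
    static_cast<unsigned int>(sizeof(VertexSketch)),
    static_cast<unsigned int>(sizeof(InstanceDataSketch))
};
```

INSTANCEPOS starts again at offset 0 because it lives in a different buffer (input slot 1), and the two strides, 44 and 12 bytes, are what you would pass for the two buffers when binding them together.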

Leaf Input Layout

This is the new input layout for the leaf we will be drawing. It is almost identical to the input layout we have above, the difference though, is the last parameter of the "INSTANCEPOS" element. We set this last parameter to "numLeavesPerTree". We will be using the exact same instance buffer as we do for the trees. The instance buffer stores all the tree positions, and we will need to move the leaves to the tree positions. We set the last parameter of this element to "numLeavesPerTree" because we want to draw all 1000 leaves for a tree, before moving to the next tree position ("INSTANCEPOS").


D3D11_INPUT_ELEMENT_DESC leafLayout[] =
{
	{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },  
	{ "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },  
	{ "NORMAL",	 0, DXGI_FORMAT_R32G32B32_FLOAT,    0, 20, D3D11_INPUT_PER_VERTEX_DATA, 0},
	{ "TANGENT", 0, DXGI_FORMAT_R32G32B32_FLOAT,    0, 32, D3D11_INPUT_PER_VERTEX_DATA, 0},
	// Instance elements
	// last parameter (InstanceDataStepRate) is set to the number of leaves per tree. InstanceDataStepRate is the number
	// of instances to draw before moving on to the next element in the instance buffer, in this case, the next tree position.
	// We want to make sure that ALL the leaves are drawn for the current tree before moving to the next tree's position
	{ "INSTANCEPOS", 0, DXGI_FORMAT_R32G32B32_FLOAT,    1, 0, D3D11_INPUT_PER_INSTANCE_DATA, numLeavesPerTree}
};
UINT numLeafElements = ARRAYSIZE(leafLayout);

Loading the Tree Object Model and Computing Tree Positions

Luckily, we've already covered loading .obj models in a previous lesson, so we don't have to go through all that. We'll start with what's new for this lesson, which is creating the tree positions. In this lesson, we have 400 trees. We make a loop that loops 400 times, and gives the trees a random position between (-100, 0, -100) and (100, 0, 100). We then store that position in an InstanceData array called inst.

	
	// Load in our tree model
	if(!LoadObjModel(L"tree.obj", &treeVertBuff, &treeIndexBuff, treeSubsetIndexStart, treeSubsetTexture, material, treeSubsets, true, true))
		return false;

	// Set up the tree positions then instance buffer
	std::vector<InstanceData> inst(numTrees);
	XMVECTOR tempPos;
	srand(100);
	// We are just creating random positions for the trees, between the positions of (-100, 0, -100) to (100, 0, 100)
	// then storing the position in our instanceData array
	for(int i = 0; i < numTrees; i++)
	{
		float randX = ((float)(rand() % 2000) / 10) - 100;
		float randZ = ((float)(rand() % 2000) / 10) - 100;
		tempPos = XMVectorSet(randX, 0.0f, randZ, 0.0f);

		XMStoreFloat3(&inst[i].pos, tempPos);
	}
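If you want to convince yourself of the range these positions fall in, the same arithmetic can be run on its own. This is a stand-alone sketch: TreePosition is a stand-in for InstanceData and makeTreePositions is a hypothetical helper, so it does not need DirectXMath.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Stand-in for the lesson's InstanceData (just a position).
struct TreePosition { float x, y, z; };

// Generates numTrees positions on the xz plane using the lesson's formula.
inline std::vector<TreePosition> makeTreePositions(int numTrees, unsigned int seed)
{
    assert(numTrees >= 0);
    std::srand(seed);

    std::vector<TreePosition> positions;
    positions.reserve(static_cast<std::size_t>(numTrees));
    for (int i = 0; i < numTrees; ++i)
    {
        // rand() % 2000 is in [0, 1999]; dividing by 10 and subtracting 100
        // maps that to [-100.0, 99.9] in steps of 0.1
        const float x = static_cast<float>(std::rand() % 2000) / 10.0f - 100.0f;
        const float z = static_cast<float>(std::rand() % 2000) / 10.0f - 100.0f;
        positions.push_back({ x, 0.0f, z });
    }
    return positions;
}
```

Because the seed is fixed (the lesson uses srand(100)), the trees end up in the same places every time the program runs on the same machine.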

Creating the Instance Buffer

Now we'll create the instance buffer, which will hold our array of InstanceData objects. What's nice about this is that it's exactly the same as creating a vertex buffer, but instead of storing a Vertex array, we are going to store an InstanceData array. That's the only difference here.

The last thing we do for the tree initialization is create its world matrix. We will keep this as an identity matrix, meaning it does not change the trees in any way, and they will start at the point (0,0,0) in world space BEFORE they get translated to the positions defined in the instance buffer.


	// Create our trees instance buffer
	// Pretty much the same thing as a regular vertex buffer, except that this buffer's data
	// will be used per "instance" instead of per "vertex". Each instance of the geometry
	// gets its own InstanceData data, similar to how each vertex of the geometry gets its own
	// Vertex data
	D3D11_BUFFER_DESC instBuffDesc;	
	ZeroMemory( &instBuffDesc, sizeof(instBuffDesc) );

	instBuffDesc.Usage = D3D11_USAGE_DEFAULT;
	instBuffDesc.ByteWidth = sizeof( InstanceData ) * numTrees;
	instBuffDesc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
	instBuffDesc.CPUAccessFlags = 0;
	instBuffDesc.MiscFlags = 0;

	D3D11_SUBRESOURCE_DATA instData;
	ZeroMemory( &instData, sizeof(instData) );

	instData.pSysMem = &inst[0];
	hr = d3d11Device->CreateBuffer( &instBuffDesc, &instData, &treeInstanceBuff);

	// The tree's world matrix (We will keep it an identity matrix, but we could change their positions without
	// unrealistic effects, since remember that all transformations are done around the point (0,0,0), and we will
	// be applying this world matrix to our trees AFTER they have been individually positioned depending on the
	// instance buffer, which means they will not be centered at the point (0,0,0))
	treeWorld = XMMatrixIdentity();

Creating the Leaf

In this lesson, we are going to be drawing the leaf onto a quad. We already know how to create a quad and make a texture from previous lessons, so you should be able to see what this is all about.


	// Create Leaf geometry (quad)
	Vertex v[] =
	{
		// Front Face
		Vertex(-1.0f, -1.0f, -1.0f, 0.0f, 1.0f, 0.0f, 0.0f, -1.0f, 0.0f, 0.0f, 0.0f),
		Vertex(-1.0f,  1.0f, -1.0f, 0.0f, 0.0f, 0.0f, 0.0f, -1.0f, 0.0f, 0.0f, 0.0f),
		Vertex( 1.0f,  1.0f, -1.0f, 1.0f, 0.0f, 0.0f, 0.0f, -1.0f, 0.0f, 0.0f, 0.0f),
		Vertex( 1.0f, -1.0f, -1.0f, 1.0f, 1.0f, 0.0f, 0.0f, -1.0f, 0.0f, 0.0f, 0.0f),
	};

	DWORD indices[] = {
		// Front Face
		0,  1,  2,
		0,  2,  3,
	};

	D3D11_BUFFER_DESC indexBufferDesc;
	ZeroMemory( &indexBufferDesc, sizeof(indexBufferDesc) );

	indexBufferDesc.Usage = D3D11_USAGE_DEFAULT;
	indexBufferDesc.ByteWidth = sizeof(DWORD) * 2 * 3;
	indexBufferDesc.BindFlags = D3D11_BIND_INDEX_BUFFER;
	indexBufferDesc.CPUAccessFlags = 0;
	indexBufferDesc.MiscFlags = 0;

	D3D11_SUBRESOURCE_DATA iinitData;
	ZeroMemory( &iinitData, sizeof(iinitData) );

	iinitData.pSysMem = indices;
	hr = d3d11Device->CreateBuffer(&indexBufferDesc, &iinitData, &quadIndexBuffer);


	D3D11_BUFFER_DESC vertexBufferDesc;
	ZeroMemory( &vertexBufferDesc, sizeof(vertexBufferDesc) );

	vertexBufferDesc.Usage = D3D11_USAGE_DEFAULT;
	vertexBufferDesc.ByteWidth = sizeof( Vertex ) * 4;
	vertexBufferDesc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
	vertexBufferDesc.CPUAccessFlags = 0;
	vertexBufferDesc.MiscFlags = 0;

	D3D11_SUBRESOURCE_DATA vertexBufferData; 

	ZeroMemory( &vertexBufferData, sizeof(vertexBufferData) );
	vertexBufferData.pSysMem = v;
	hr = d3d11Device->CreateBuffer( &vertexBufferDesc, &vertexBufferData, &quadVertBuffer);

	// Now we load in the leaf texture
	hr = D3DX11CreateShaderResourceViewFromFile( d3d11Device, L"leaf.png",
		NULL, NULL, &leafTexture, NULL );

Creating the Leaf "Tree Space" Matrix Array

I've already talked about what I mean by "tree space", but I'll go over it again to make sure you're clear about what we're doing in this lesson. When the leaves are first sent to the graphics pipeline (vertex shader, etc.), they are positioned at the point (0,0,0) in world space. We then transform each leaf using this matrix array (specifically, the matrix in the array that corresponds to the current leaf). After the leaf has been transformed, we add the position of the tree we are currently rendering leaves for to the leaf's position. Finally, we apply the tree's world matrix, which won't do anything at all to the leaf because it's an identity matrix. So when I say "tree space" matrix, I mean the matrix that transforms the leaf RELATIVE to the tree's position. It's actually very simple, and if you already understood this idea but are confused after reading this, I'm really sorry ;)

Now, to create the leaf's tree space matrix, we will rotate the leaf, translate the leaf along the x axis, then rotate the leaf again. The first rotation makes the leaf "spin" around its own center. By then moving the leaf along the x axis (the distance we want from the center of the tree) and rotating again, we "orbit" the leaf around the center of the tree.

Because there is less space at the center of the leaf mass than there is on its outer edge, the leaves "bunch up" at the center. We don't want this, and would rather have the edge of the mass be more dense. When you look at how we are using the rand() function, you will see we first get a random number between 0 and 999. We then divide this by 250.0f, which gives us a float value between 0 and 4. We want a distance of 4 to be the max, because that is about how far the tree's branches extend from the center of the tree. We subtract this number from 6, so that the distance is now between 2 and 6. Now there is not really a bunch at the center of the mass (in fact, there are no leaves at all within two units of the center ;).

However, the leaf mass now extends 2 units past the tips of the branches, and if you rendered it like this, you would also see that the edge of the leaf mass is still very "weak": the leaves out there are too spaced out, and they are still more bunched toward the center. So, we check whether the distance for a leaf is greater than 4, and if it is, we set it to 4. All the leaves that were further than 4 units from the tree are now exactly 4 units from the tree, which gives us a higher density at the edge of our leaf mass, which is exactly what we wanted!

Now we apply the rotation to the leaf, which will give us the effect of a spherical mass of leaves!

We are not quite done yet though, because at this point we have a full sphere of leaves. We want more of a half sphere of leaves, so we check whether a leaf is below -1.0f on the y axis (slightly below 0, which gives us a bit more than half a sphere), and if it is, we negate its y value, so the leaf gets reflected up into the half-sphere mass of leaves.

Now we create the new position from the distance from the tree and the rotation applied to that distance vector. And then we apply all the transformations on a temporary matrix.

Lastly, we store that temporary matrix into our constant buffer structure array that gets updated a single time per scene (cbPerScene).


	// Here we create the leaf world matrices, which will be each leaf's
	// position and orientation on an individual tree. We will create an array of matrices
	// for the leaves that we will send to the shaders in the cbPerInstance constant buffer
	// This matrix array is used "per tree", so that each tree gets the exact same number of leaves,
	// with the same orientation, position, and scale as all of the other trees
	// Start by initializing the matrix array
	srand(100);
	XMFLOAT3 fTPos;
	XMMATRIX rotationMatrix;
	XMMATRIX tempMatrix;
	for(int i = 0; i < numLeavesPerTree; i++)
	{
		float rotX = (rand() % 2000) / 500.0f; // Value in the range [0, 4) radians (it is not multiplied by PI)
		float rotY = (rand() % 2000) / 500.0f;
		float rotZ = (rand() % 2000) / 500.0f;

		// A uniformly random distance would make the center of the leaf "mass" more dense with leaves than
		// the outside of the "sphere" of leaves we are making, because there is less space near the center.
		// We want the outside of the "sphere" of leaves to be more dense than the inside, so we get a
		// distance value between 0 and 4 and subtract it from 6, so that the very center does not have
		// any leaves. Below, we check whether the distance is greater than 4 (because the tree branches
		// extend approximately 4 units from the center of the tree). If the distance is greater than 4,
		// we set it to 4, which makes the edge of the "sphere" of leaves more densely populated than
		// the center of the leaf mass
		float distFromCenter = 6.0f - ((rand() % 1000) / 250.0f);	

		if(distFromCenter > 4.0f)
			distFromCenter = 4.0f;

		// Now we create a vector with the length of distFromCenter by simply setting its x component to distFromCenter.
		// We will now rotate the vector, which will give us the "sphere" of leaves after we have rotated all the leaves.
		// We do not want a perfect sphere, more like a half sphere to cover the branches, so we check to see if the y
		// value is less than -1.0f (giving us slightly more than half a sphere), and if it is, negate it so it is reflected
		// across the xz plane
		tempPos = XMVectorSet(distFromCenter, 0.0f, 0.0f, 0.0f);
		rotationMatrix = XMMatrixRotationRollPitchYaw(rotX, rotY, rotZ);
		tempPos = XMVector3TransformCoord(tempPos, rotationMatrix );

		if(XMVectorGetY(tempPos) < -1.0f)
			tempPos = XMVectorSetY(tempPos, -XMVectorGetY(tempPos));

		// Now we create our leaf's "tree" matrix (this is not the leaf's "world matrix", because we are not
		// defining the leaf's position, orientation, and scale in world space, but instead in "tree" space)
		XMStoreFloat3(&fTPos, tempPos);

		Scale = XMMatrixScaling( 0.25f, 0.25f, 0.25f );
		Translation = XMMatrixTranslation(fTPos.x, fTPos.y + 8.0f, fTPos.z );
		tempMatrix = Scale * rotationMatrix * Translation;

		// To make things simple, we just store the matrix directly into our cbPerInst structure
		cbPerInst.leafOnTree[i] = XMMatrixTranspose(tempMatrix);
	}
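
The effect of the distance clamp is easier to see in isolation. The following is a minimal sketch using only the standard library; leafDistance and countOnShell are hypothetical helper names that do not appear in the lesson's source, and the raw random value is passed in as a parameter so the result is reproducible.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not in the lesson's source): maps a raw value in
// [0, 999], such as rand() % 1000, to a leaf's distance from the tree center.
float leafDistance(int raw)
{
    assert(raw >= 0 && raw < 1000);
    // raw / 250.0f lies in [0, 3.996], so the difference lies in [2.004, 6].
    float dist = 6.0f - (raw / 250.0f);
    // Anything past the 4-unit branch radius is pulled back onto that shell.
    return std::min(dist, 4.0f);
}

// Counts how many of the 1000 possible raw values end up exactly on the shell.
int countOnShell()
{
    int onShell = 0;
    for (int raw = 0; raw < 1000; ++raw)
        if (leafDistance(raw) == 4.0f)
            ++onShell;
    return onShell;
}
```

About half of the possible raw values are clamped onto the outer shell, while the rest are spread between roughly 2 and 4 units, which is why the outside of the leaf "sphere" ends up denser than the inside.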

Creating the Leaf Input Layout

Here we create our leaf's input layout, which we will bind to the IA (Input Assembler) before we draw our leaf.


	hr = d3d11Device->CreateInputLayout( leafLayout, numLeafElements, VS_Buffer->GetBufferPointer(), 
		VS_Buffer->GetBufferSize(), &leafVertLayout );

Creating the Constant Buffer (cbPerScene)

We already know how to create constant buffers, so there's not much to say here.


	//Create the buffer that feeds the cbPerScene cbuffer in the effect file
	ZeroMemory(&cbbd, sizeof(D3D11_BUFFER_DESC));

	cbbd.Usage = D3D11_USAGE_DEFAULT;
	// We have already defined how many elements are in our leaf matrix array inside the cbPerScene structure,
	// so we only need the size of the entire structure here, because the number of leaves per tree will not
	// change throughout the scene.
	cbbd.ByteWidth = sizeof(cbPerScene);
	cbbd.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
	cbbd.CPUAccessFlags = 0;
	cbbd.MiscFlags = 0;

	hr = d3d11Device->CreateBuffer(&cbbd, NULL, &cbPerInstanceBuffer);
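
One detail worth keeping in mind: Direct3D 11 requires the ByteWidth of a constant buffer to be a multiple of 16 bytes. Assuming cbPerScene holds nothing but the 1000 leaf matrices (64 bytes each, 64,000 bytes in total), it already satisfies that rule. If you later add smaller members to the structure, a helper along these lines could round the size up. This is only a sketch, and roundUpTo16 is a hypothetical name, not something from the lesson's source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not in the lesson's source): rounds a structure size
// up to the next multiple of 16 bytes, as required for constant buffers.
std::uint32_t roundUpTo16(std::uint32_t byteSize)
{
    std::uint32_t rounded = (byteSize + 15u) & ~15u;
    // Sanity check: the result is aligned and never smaller than the input.
    assert(rounded % 16u == 0 && rounded >= byteSize);
    return rounded;
}
```

With a helper like this, the ByteWidth assignment could be written as roundUpTo16(sizeof(cbPerScene)), which leaves an already aligned size unchanged.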

Updating cbPerScene

This constant buffer is only updated a single time per scene, so we can do the update while initializing the scene. The data we upload stays on the GPU until we are through with the scene. We do this so that we do not update the buffer every time we draw our leaves, since our leaves are not going to change position. However, if you wanted to animate the leaves, you would have to update this buffer (or another buffer holding the animation matrices) more than once per scene, most likely every frame.


	d3d11DevCon->UpdateSubresource( cbPerInstanceBuffer, 0, NULL, &cbPerInst, 0, 0);

Drawing the Leaves

Most of what is new here was covered in the instancing overview above, so I won't spend a lot of time on it. All we are doing is drawing a quad and texturing it with the leaf texture. We can supply the pipeline with both the vertex and instance buffers in one of two ways. The first way, which we use here, is to create an array of buffers that stores the vertex buffer in element 0 (vertInstBuffers[0]) and the instance buffer in element 1 (vertInstBuffers[1]), and pass this array to the function that binds buffers to the IA (IASetVertexBuffers). The second way is to bind the buffers separately by calling IASetVertexBuffers twice, once for each buffer. If you do it this way, make sure they are bound to separate "slots" (the first parameter of IASetVertexBuffers()): bind the vertex buffer to slot 0 and the instance buffer to slot 1.

We set the input layout to the leaf's input layout before drawing the leaf, and set it back to the default input layout after calling the draw function.

We want to see both sides of the leaf, so we turn off backface culling.

To draw the leaf, we call DrawIndexedInstanced() and tell it we want "numLeavesPerTree * numTrees" instances.


	///***Draw INSTANCED Leaf Models***///
	// We are now binding two buffers to the input assembler, one for the vertex data,
	// and one for the instance data, so we will have to create a strides array, offsets array
	// and buffer array.
	UINT strides[2] = {sizeof( Vertex ), sizeof( InstanceData )};
	UINT offsets[2] = {0, 0};

	// Store the vertex and instance buffers into an array
	// The leaves will use the same instance buffer as the trees, because we need each leaf
	// to go to a certain tree
	ID3D11Buffer* vertInstBuffers[2] = {quadVertBuffer, treeInstanceBuff};

	// Set the leaf input layout. This is where we will set our special input layout for our leaves
	d3d11DevCon->IASetInputLayout( leafVertLayout );

	//Set the models index buffer (same as before)
	d3d11DevCon->IASetIndexBuffer(quadIndexBuffer, DXGI_FORMAT_R32_UINT, 0);

	//Set the model's vertex and instance buffers using the arrays created above
	d3d11DevCon->IASetVertexBuffers( 0, 2, vertInstBuffers, strides, offsets );

	//Set the WVP matrix and send it to the constant buffer in effect file
	WVP = treeWorld * camView * camProjection;
	cbPerObj.WVP = XMMatrixTranspose(WVP);	
	cbPerObj.World = XMMatrixTranspose(treeWorld);		
	cbPerObj.hasTexture = true;		// The leaf quad is always drawn with the leaf texture
	cbPerObj.hasNormMap = false;	// The leaf quad has no normal map
	cbPerObj.isInstance = true;		// Tell the shaders this is instanced geometry, so they know to use the instance data
	cbPerObj.isLeaf = true;		// Tell the shaders this is a leaf instance, so they know to use the leaf matrices in the second constant buffer
	d3d11DevCon->UpdateSubresource( cbPerObjectBuffer, 0, NULL, &cbPerObj, 0, 0 );	

	// We are sending two constant buffers to the vertex shader now, so we will create an array of them
	ID3D11Buffer* vsConstBuffers[2] = {cbPerObjectBuffer, cbPerInstanceBuffer};
	d3d11DevCon->VSSetConstantBuffers( 0, 2, vsConstBuffers );
	d3d11DevCon->PSSetConstantBuffers( 1, 1, &cbPerObjectBuffer );
	d3d11DevCon->PSSetShaderResources( 0, 1, &leafTexture );
	d3d11DevCon->PSSetSamplers( 0, 1, &CubesTexSamplerState );

	d3d11DevCon->RSSetState(RSCullNone);
	d3d11DevCon->DrawIndexedInstanced( 6, numLeavesPerTree * numTrees, 0, 0, 0 );

	// Reset the default Input Layout
	d3d11DevCon->IASetInputLayout( vertLayout );
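
The leaves can share the trees' instance buffer because of the instance data step rate in the input layout. The sketch below models, in plain C++, which element of the instance buffer the input assembler hands to a given instance. It assumes the leaf input layout (defined earlier in the lesson, not repeated here) sets the step rate of the INSTANCEPOS element to the number of leaves per tree; instanceElement is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of how the input assembler picks a per-instance element.
// With a step rate of N, the same element is reused for N consecutive
// instances before the assembler advances to the next one.
// (instanceElement is a hypothetical name used only for illustration.)
std::uint32_t instanceElement(std::uint32_t instanceID, std::uint32_t stepRate)
{
    assert(stepRate > 0);
    return instanceID / stepRate;
}
```

Under that assumption, leaf instances 0 to 999 all receive the position of tree 0, instances 1000 to 1999 receive the position of tree 1, and so on, while the trees themselves (step rate 1) advance one element per instance.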

Drawing the Tree

Now we draw our tree model, loaded from an obj file. If you don't remember how to draw a loaded obj model, you can go back to the lesson on loading obj models. The only difference between drawing the tree and drawing a regular obj model is that the tree is instanced, so we bind two buffers to the IA (the vertex and instance buffers) and call DrawIndexedInstanced() instead of DrawIndexed().


	/////Draw our tree instances/////
	for(int i = 0; i < treeSubsets; ++i)
	{
		// Store the vertex and instance buffers into an array
		ID3D11Buffer* vertInstBuffers[2] = {treeVertBuff, treeInstanceBuff};

		//Set the models index buffer (same as before)
		d3d11DevCon->IASetIndexBuffer(treeIndexBuff, DXGI_FORMAT_R32_UINT, 0);
		//Set the model's vertex and instance buffers
		d3d11DevCon->IASetVertexBuffers( 0, 2, vertInstBuffers, strides, offsets );

		//Set the WVP matrix and send it to the constant buffer in effect file
		WVP = treeWorld * camView * camProjection;
		cbPerObj.WVP = XMMatrixTranspose(WVP);	
		cbPerObj.World = XMMatrixTranspose(treeWorld);	
		cbPerObj.difColor = material[treeSubsetTexture[i]].difColor;
		cbPerObj.hasTexture = material[treeSubsetTexture[i]].hasTexture;
		cbPerObj.hasNormMap = material[treeSubsetTexture[i]].hasNormMap;
		cbPerObj.isInstance = true;		// Tell the shaders this is instanced geometry, so they know to use the instance data
		cbPerObj.isLeaf = false;		// Tell the shaders this is not a leaf instance, so they skip the leaf matrices
		d3d11DevCon->UpdateSubresource( cbPerObjectBuffer, 0, NULL, &cbPerObj, 0, 0 );
		d3d11DevCon->VSSetConstantBuffers( 0, 1, &cbPerObjectBuffer );
		d3d11DevCon->PSSetConstantBuffers( 1, 1, &cbPerObjectBuffer );
		if(material[treeSubsetTexture[i]].hasTexture)
			d3d11DevCon->PSSetShaderResources( 0, 1, &meshSRV[material[treeSubsetTexture[i]].texArrayIndex] );
		if(material[treeSubsetTexture[i]].hasNormMap)
			d3d11DevCon->PSSetShaderResources( 1, 1, &meshSRV[material[treeSubsetTexture[i]].normMapTexArrayIndex] );
		d3d11DevCon->PSSetSamplers( 0, 1, &CubesTexSamplerState );

		d3d11DevCon->RSSetState(RSCullNone);
		int indexStart = treeSubsetIndexStart[i];
		int indexDrawAmount =  treeSubsetIndexStart[i+1] - treeSubsetIndexStart[i];
		if(!material[treeSubsetTexture[i]].transparent)	// look up the tree's own subset material, as everywhere else in this loop
			d3d11DevCon->DrawIndexedInstanced( indexDrawAmount, numTrees, indexStart, 0, 0 );
	}
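
The index range arithmetic in the loop above can be sketched on its own. This sketch assumes, as the read of treeSubsetIndexStart[i+1] implies, that the array carries one trailing entry equal to the total index count; SubsetRange and buildSubsetRanges are hypothetical names, not part of the lesson's source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one subset's draw arguments.
struct SubsetRange
{
    int indexStart;   // first index of the subset in the index buffer
    int indexCount;   // number of indices to draw for the subset
};

// Builds the draw range for every subset. subsetIndexStart must hold one
// entry per subset plus a final entry equal to the total index count,
// mirroring how treeSubsetIndexStart[i + 1] is read in the loop above.
std::vector<SubsetRange> buildSubsetRanges(const std::vector<int>& subsetIndexStart)
{
    assert(!subsetIndexStart.empty());
    std::vector<SubsetRange> ranges;
    for (std::size_t i = 0; i + 1 < subsetIndexStart.size(); ++i)
    {
        assert(subsetIndexStart[i + 1] >= subsetIndexStart[i]);
        ranges.push_back({ subsetIndexStart[i],
                           subsetIndexStart[i + 1] - subsetIndexStart[i] });
    }
    return ranges;
}
```

Each range maps directly onto the first and third arguments of DrawIndexedInstanced(): the index count for the subset, and the index at which the subset starts.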

Effects File

We start our Effects file off by declaring a constant. This constant is used for two things. The first is to set the size of the leaf matrix array in the cbPerScene buffer to the number of leaves per tree, and the second is to find which tree we are currently drawing leaves for.


#define NUM_LEAVES_PER_TREE 1000

This is the updated cbPerObject buffer. We now have two more boolean variables, which are used in the vertex shader to decide whether we need to do instance work on the vertices or not.


cbuffer cbPerObject
{
	float4x4 WVP;
	float4x4 World;

	float4 difColor;
	bool hasTexture;
	bool hasNormMap;

	bool isInstance;
	bool isLeaf;
};

Our new cbPerScene buffer. This buffer holds an array of float4x4's (matrices). We size this array with the number of leaves per tree we defined at the top of the effects file.

It is not possible to create dynamic arrays in the shader file, so the way around this is to declare the array with the maximum number of elements you will use. On current DirectX 10/11 compatible devices, a constant buffer can hold at most 4096 float4 vectors. A float4x4 takes up four float4 vectors, so the limit for a matrix array is 1024 elements.


cbuffer cbPerScene
{
	float4x4 leafOnTree[NUM_LEAVES_PER_TREE];
};
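
The limit described above is simple arithmetic, sketched here in C++ so it can be checked at a glance. The constant and function names are hypothetical, and the 4096-vector limit is taken from the text above rather than queried from the device.

```cpp
#include <cassert>
#include <cstddef>

// Limit quoted in the text: 4096 float4 vectors per constant buffer.
constexpr std::size_t kMaxFloat4PerCBuffer = 4096;
constexpr std::size_t kFloat4PerMatrix     = 4;   // a float4x4 is four float4 rows
constexpr std::size_t kBytesPerFloat4      = 16;  // four 32-bit floats

// Largest float4x4 array that fits in a single constant buffer.
constexpr std::size_t maxMatricesPerCBuffer()
{
    return kMaxFloat4PerCBuffer / kFloat4PerMatrix;
}

// Bytes occupied by an array of `count` float4x4 matrices.
constexpr std::size_t matrixArrayBytes(std::size_t count)
{
    return count * kFloat4PerMatrix * kBytesPerFloat4;
}

// Returns true when `count` matrices fit in one constant buffer.
bool fitsInOneCBuffer(std::size_t count)
{
    // Sanity check: a full matrix array uses exactly the whole buffer.
    assert(matrixArrayBytes(maxMatricesPerCBuffer())
           == kMaxFloat4PerCBuffer * kBytesPerFloat4);
    return count <= maxMatricesPerCBuffer();
}
```

With 1000 leaves per tree the array takes 64,000 bytes, which fits under the 65,536-byte ceiling. A tree with more than 1024 leaves would need a different approach, such as splitting the matrices across more than one buffer.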

Here is our vertex shader. We have added two new inputs to it. The first is a custom input (INSTANCEPOS), while the other is a system value (SV_InstanceID). A system value is an input that the GPU provides for you; the full list of system-value semantics can be found in the HLSL documentation. We use SV_InstanceID to find the tree that the current leaf instance belongs to, then use that tree id to find the leaf's index within the tree (leaf 0 to 999), so we can get the leaf's matrix from the matrix array stored in cbPerScene.

We transform the leaf using the leaf matrix from the matrix array, which gives its position, orientation, and scale in the tree, then we add the current tree's position, taken from instancePos, which is the position vector stored in our instance buffer.


VS_OUTPUT VS(float4 inPos : POSITION, float2 inTexCoord : TEXCOORD, float3 normal : NORMAL, float3 tangent : TANGENT, float3 instancePos : INSTANCEPOS, uint instanceID : SV_InstanceID)
{
    VS_OUTPUT output;

	if(isInstance)
	{
		// get leaves position on tree, then add trees position
		if(isLeaf)
		{
			// We have 1000 leaves per tree, so we can find the current leaf (in the tree) we are on (so we can get its matrix from the matrix array stored in cbPerScene)
			// by first getting the current tree (instanceID / NUM_LEAVES_PER_TREE). We can then find the current leaf in the tree we are on by multiplying the current tree id
			// by the number of leaves per tree, then subtracting that total from the current instance id.
			uint currTree = (instanceID / NUM_LEAVES_PER_TREE);
			uint currLeafInTree = instanceID - (currTree * NUM_LEAVES_PER_TREE);
			inPos = mul(inPos, leafOnTree[currLeafInTree]);
		}

		// set position using instance data
		inPos += float4(instancePos, 0.0f);
	}

    output.Pos = mul(inPos, WVP);
	output.worldPos = mul(inPos, World);

	output.normal = mul(normal, World);

	output.tangent = mul(tangent, World);

    output.TexCoord = inTexCoord;

    return output;
}
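
The instance id arithmetic in the shader can be tried out on the CPU. The following is a small C++ sketch of the same two lines; LeafAddress and splitInstanceID are hypothetical names used only for illustration, and the constant mirrors NUM_LEAVES_PER_TREE from the effects file.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kNumLeavesPerTree = 1000; // mirrors NUM_LEAVES_PER_TREE

// Hypothetical type for illustration: which tree a leaf instance belongs
// to, and which of that tree's leaf matrices it should use.
struct LeafAddress
{
    std::uint32_t tree;
    std::uint32_t leafInTree;
};

// Same arithmetic as the vertex shader: integer division finds the tree,
// and subtracting the leaves of all earlier trees finds the leaf.
LeafAddress splitInstanceID(std::uint32_t instanceID)
{
    LeafAddress address;
    address.tree = instanceID / kNumLeavesPerTree;
    address.leafInTree = instanceID - address.tree * kNumLeavesPerTree;
    // The subtraction is equivalent to taking the remainder.
    assert(address.leafInTree == instanceID % kNumLeavesPerTree);
    return address;
}
```

For example, instance 2345 is leaf 345 of tree 2. Because the subtraction is just a remainder, the shader could equally compute the leaf index as instanceID % NUM_LEAVES_PER_TREE.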

That's all there is to it! Let me know if you find any mistakes in my explanations, or things I REALLY should have done differently in the code (I know there are many things I should do differently, but most of them are beside the point of the lesson ;)

Exercise:

1. Try animating the leaves, such as rotating them, or even having some of them fall down from the trees!

2. Just play with the whole instance idea, and post on the forum with any cool demos you might come up with!



