Running on Empty

The few things I know, I like to share.

XNA Framework GameEngine Development. (Part 19, Hardware Instancing PC Only)

Introduction

Welcome to Part19 of the XNA FrameWork GameEngine Development series.  In this article I will discuss the magic that is Hardware instancing.  Recent comments have expressed some concerns about poor performance in the engine.  One way to improve performance in the engine is to support instancing.

part19.jpg

Hardware instancing

This is by far the easiest method of instancing, I will discuss shader and vertex fetch instancing in later articles.  Essentially hardware instancing tells the graphics card that you want to draw an object multiple times in the same draw call.  Obviously, this will improve performance since one of the slowest calls you will ever make to a graphics device is draw.  We should always look to draw the most geometry in the fewest draw calls possible.

hardwareinstancing.png

Image courtesy MS XNA Team Site

Since shader model 3.0 we have at our disposal the concept of hardware instancing.  This method of instancing allows us to set up a vertex and index buffer to be drawn multiple times using an array of matrices.  This means pretty much any piece of complex geometry that uses the same effect settings can and should be drawn at the same time.

Starting the Magic

We need a way to identify a normal Model from an instanced model.  I do this by adding a few properties to the RoeModel class.  The ModelParts list will contain information about what else… model parts, and the Instanced proeprty will define if the model was created for instancing.  Finally, using the constructor parameter instanced will allow us to load the content and set up the model parts for instance rendering.

using System;
using System.Collections.Generic;
using System.Text;
using RoeEngine2.Interfaces;
using Microsoft.Xna.Framework.Graphics;
using RoeEngine2.Managers;

namespace RoeEngine2.Models
{
    public class RoeModel : IRoeModel
    {
        public List<RoeModelPart> ModelParts = new List<RoeModelPart>();

        private bool _instanced;
        public bool Instanced
        {
            get { return _instanced; }
        }
 
        private string _fileName;
        /// <summary>
        /// The file name of the asset.
        /// </summary>
        public string FileName
        {
            get { return _fileName; }
            set { _fileName = value; }
        }

        private Model _baseModel;
        public Model BaseModel
        {
            get { return _baseModel; }
        }

        private bool _readyToRender = false;
        ///<summary>
        ///Is the texture ready to be rendered.
        ///</summary>
        public bool ReadyToRender
        {
            get { return _readyToRender; }
        }

        public RoeModel(string fileName, bool instanced)
        {
            _instanced = instanced;
            _fileName = fileName;
        }

        /// <summary>
        /// Construct a new RoeModel.
        /// </summary>
        /// <param name="fileName">The asset file name.</param>
        public RoeModel(string fileName)
        {
            _fileName = fileName;
        }

        public void LoadContent()
        {
            if (!String.IsNullOrEmpty(_fileName))
            {
                _baseModel = EngineManager.ContentManager.Load<Model>(_fileName);
                if (_instanced)
                {
                    foreach (ModelMesh mesh in _baseModel.Meshes)
                    {
                        foreach (ModelMeshPart part in mesh.MeshParts)
                        {
                            ModelParts.Add(new RoeModelPart(part, mesh.VertexBuffer, mesh.IndexBuffer));
                        }
                    }
                }
                _readyToRender = true;
            }
        }
    }
}

Most of the work will happen in the model parts clss, but here you can see how a model is loaded as normal, then if it is instanced we do additional work to load the specific parts.  Please note, we do not have to use any custom processors to create the model parts.  For me this is a huge improvement over the example given on the team development site.

Backstage pass

This is were all the work takes place, here in the RoeModelPart class.  Each mesh part will be set up for prime GPU processing here, plus we will extend the vertex declaration to include the instancing magic.

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.Xna.Framework.Graphics;
using RoeEngine2.Managers;

namespace RoeEngine2.Models
{
    public class RoeModelPart
    {
        private int _primitiveCount;
        public int PrimitiveCount
        {
            get { return _primitiveCount; }
        }

        private int _vertexCount;
        public int VertexCount
        {
            get { return _vertexCount; }
        }

        private int _vertexStride;
        public int VertexStride
        {
            get { return _vertexStride; }
        }

        private VertexDeclaration _vertexDeclaration;
        public VertexDeclaration VertexDeclartion
        {
            get { return _vertexDeclaration; }
        }

        private VertexBuffer _vertexBuffer;
        public VertexBuffer VertexBuffer
        {
            get { return _vertexBuffer; }
        }

        private IndexBuffer _indexBuffer;
        public IndexBuffer IndexBuffer
        {
            get { return _indexBuffer; }
        }

        VertexElement[] originalVertexDeclaration;
 
        internal RoeModelPart(ModelMeshPart part, VertexBuffer vertexBuffer, IndexBuffer indexBuffer)
        {
            _primitiveCount = part.PrimitiveCount;
            _vertexCount = part.NumVertices;
            _vertexStride = part.VertexStride;
            _vertexDeclaration = part.VertexDeclaration;

            _vertexBuffer = vertexBuffer;
            _indexBuffer = indexBuffer;

            originalVertexDeclaration = part.VertexDeclaration.GetVertexElements();

            InitializeHardwareInstancing();
        }

        private void InitializeHardwareInstancing()
        {
            // When using hardware instancing, the instance transform matrix is
            // specified using a second vertex stream that provides 4x4 matrices
            // in texture coordinate channels 1 to 4. We must modify our vertex
            // declaration to include these channels.
            VertexElement[] extraElements = new VertexElement[4];

            short offset = 0;
            byte usageIndex = 1;
            short stream = 1;

            const int sizeOfVector4 = sizeof(float) * 4;

            for (int i = 0; i < extraElements.Length; i++)
            {
                extraElements&#91;i&#93; = new VertexElement(stream, offset,
                                                VertexElementFormat.Vector4,
                                                VertexElementMethod.Default,
                                                VertexElementUsage.TextureCoordinate,
                                                usageIndex);

                offset += sizeOfVector4;
                usageIndex++;
            }

            ExtendVertexDeclaration(extraElements);
        }

        private void ExtendVertexDeclaration(VertexElement&#91;&#93; extraElements)
        {
            // Get rid of the existing vertex declaration.
            _vertexDeclaration.Dispose();

            // Append the new elements to the original format.
            int length = originalVertexDeclaration.Length + extraElements.Length;

            VertexElement&#91;&#93; elements = new VertexElement&#91;length&#93;;

            originalVertexDeclaration.CopyTo(elements, 0);

            extraElements.CopyTo(elements, originalVertexDeclaration.Length);

            // Create a new vertex declaration.
            _vertexDeclaration = new VertexDeclaration(EngineManager.Device, elements);
        }
    }
}&#91;/sourcecode&#93;

<strong>The Beautiful Assistant</strong>

Now that we have a mesh broken up into its basic parts for fast GPU rendering, we must find a way to store matrices in a simple and easy to manage location.  I accomplish this using the SceneGraphManager.  This class will now support a dictionary keyed by string and support a list of matrices.  In addition, we also need a way to load up the dictionary and finally draw the instances.

        private static Dictionary<string, List<Matrix>> instanceMatrices;

        private static void DrawInstances(GameTime gameTime)
        {
            instancedshaderEffect effect = ShaderManager.GetShader("instance") as instancedshaderEffect;
            if (effect.ReadyToRender)
            {
                effect.View = CameraManager.ActiveCamera.View;
                effect.Projection = CameraManager.ActiveCamera.Projection;

                foreach (string key in instanceMatrices.Keys)
                {
                    RoeModel model = ModelManager.GetModel(key) as RoeModel;
                    if (model.ReadyToRender)
                    {
                        Model meshModel = model.BaseModel;
                        
                        foreach (RoeModelPart part in model.ModelParts)
                        {
                            EngineManager.Device.VertexDeclaration = part.VertexDeclartion;
                            EngineManager.Device.Vertices[0].SetSource(part.VertexBuffer, 0, part.VertexStride);
                            EngineManager.Device.Indices = part.IndexBuffer;
                            effect.BaseEffect.Begin();
                            foreach (EffectPass pass in effect.BaseEffect.CurrentTechnique.Passes)
                            {
                                pass.Begin();
                                DrawHardwareInstancing(instanceMatrices[key].ToArray(), part.VertexCount, part.PrimitiveCount);
                                pass.End();
                            }
                            effect.BaseEffect.End();
                        }
                    }
                }
            }
        }

        private static void DrawHardwareInstancing(Matrix[] matrix, int vertexCount, int primitiveCount)
        {
            const int sizeofMatrix = sizeof(float) * 16;
            int instanceDataSize = sizeofMatrix * matrix.Length;

            DynamicVertexBuffer instanceDataStream = new DynamicVertexBuffer(EngineManager.Device,
                                                                             instanceDataSize,
                                                                             BufferUsage.WriteOnly);

            instanceDataStream.SetData(matrix, 0, matrix.Length, SetDataOptions.Discard);

            VertexStreamCollection vertices = EngineManager.Device.Vertices;

            vertices[0].SetFrequencyOfIndexData(matrix.Length);

            vertices[1].SetSource(instanceDataStream, 0, sizeofMatrix);
            vertices[1].SetFrequencyOfInstanceData(1);

            EngineManager.Device.DrawIndexedPrimitives(PrimitiveType.TriangleList,
                                                       0, 0, vertexCount, 0, primitiveCount);

            // Reset the instancing streams.
            vertices[0].SetSource(null, 0, 0);
            vertices[1].SetSource(null, 0, 0);
        }

MisDirection

Now that we have a simple way to store instance objects and render them, we need to load up the dictionary.  This is done in the SceneObjectNode class.  As usual, we do not want to render objects that are culled.  In addition, if we are using instancing we need to bypass all of the occlusion work.  This is a tradeoff that we have to make, either we use occlusion or we use instancing.

        public override void DrawCulling(GameTime gameTime)
        {
            if (SceneObject is IRoeCullable)
            {
                ((IRoeCullable)SceneObject).Culled = false;
                if (CameraManager.ActiveCamera.Frustum.Contains(((IRoeCullable)SceneObject).GetBoundingBoxTransformed()) == ContainmentType.Disjoint)
                {
                    ((IRoeCullable)SceneObject).Culled = true;
                }
                else if (ModelManager.GetModel(SceneObject.ModelName).Instanced)
                {
                    SceneGraphManager.AddInstance(SceneObject.ModelName, SceneObject.World);
                }
                else
                {
                    SceneObject.DrawCulling(gameTime);
                }
            }
        }

        public override void Draw(GameTime gameTime)
        {
            if (SceneObject.ModelName != null && ModelManager.GetModel(SceneObject.ModelName).Instanced)
            {
                return;
            }
            else if (SceneObject is IRoeCullable && ((IRoeCullable)SceneObject).Culled)
            {
                SceneGraphManager.Culled++;
            }
            else if (SceneObject is IRoeOcclusion && ((IRoeOcclusion)SceneObject).Occluded)
            {
                SceneGraphManager.Occluded++;
            }
            else
            {
                SceneObject.Draw(gameTime);
            }
        }

The Grand Finale

Finally, our dictionary is loaded and we are ready to draw, time for some shader work.

#define MAX_SHADER_MATRICES 60

// Array of instance transforms used by the VFetch and ShaderInstancing techniques.
float4x4 instanceTransforms[MAX_SHADER_MATRICES];

// Camera settings.
float4x4 view;
float4x4 projection;

// This sample uses a simple Lambert lighting model.
float3 lightDirection = normalize(float3(-1, -1, -1));
float3 diffuseLight = 1.25;
float3 ambientLight = 0.25;

struct VS_INPUT
{
 float4 Position : POSITION0;
 float3 Normal : NORMAL;
 float2 TexCoord : TEXCOORD0;
};

struct VS_OUTPUT
{
 float4 Position     : POSITION;
 float4 Color  : COLOR0;
 float2 TexCoord : TEXCOORD0;
};

VS_OUTPUT VertexShaderCommon(VS_INPUT input, float4x4 instanceTransform)
{
    VS_OUTPUT output;

    // Apply the world and camera matrices to compute the output position.
    float4 worldPosition = mul(input.Position, instanceTransform);
    float4 viewPosition = mul(worldPosition, view);
    output.Position = mul(viewPosition, projection);

    // Compute lighting, using a simple Lambert model.
    float3 worldNormal = mul(input.Normal, instanceTransform);
    
    float diffuseAmount = max(-dot(worldNormal, lightDirection), 0);
    
    float3 lightingResult = saturate(diffuseAmount * diffuseLight + ambientLight);
    
    output.Color = float4(lightingResult, 1);

    // Copy across the input texture coordinate.
    output.TexCoord = input.TexCoord;

    return output;
};

// On Windows shader 3.0 cards, we can use hardware instancing, reading
// the per-instance world transform directly from a secondary vertex stream.
VS_OUTPUT HardwareInstancingVertexShader(VS_INPUT input,
                                         float4x4 instanceTransform : TEXCOORD1)
{
    return VertexShaderCommon(input, transpose(instanceTransform));
}

// All the different instancing techniques share this same pixel shader.
float4 PixelShaderFunction(VS_OUTPUT input) : COLOR0
{
    return input.Color;
}

// Windows instancing technique for shader 3.0 cards.
technique HardwareInstancing
{
    pass Pass1
    {
        VertexShader = compile vs_3_0 HardwareInstancingVertexShader();
        PixelShader = compile ps_3_0 PixelShaderFunction();
    }
}

 

March 17, 2008 Posted by | C#, XNA | 26 Comments