Running on Empty

The few things I know, I like to share.

XNA Framework GameEngine Development. (Part 19, Hardware Instancing PC Only)

Introduction

Welcome to Part 19 of the XNA Framework GameEngine Development series.  In this article I will discuss the magic that is hardware instancing.  Recent comments have expressed some concerns about poor performance in the engine, and one way to improve it is to support instancing.

part19.jpg

Hardware instancing

This is by far the easiest method of instancing; I will discuss shader and vertex fetch instancing in later articles.  Essentially, hardware instancing tells the graphics card that you want to draw an object multiple times in the same draw call.  This improves performance because a draw call is one of the slowest calls you will ever make to a graphics device.  We should always look to draw the most geometry in the fewest draw calls possible.

hardwareinstancing.png

Image courtesy MS XNA Team Site

Since shader model 3.0 we have at our disposal the concept of hardware instancing.  This method of instancing allows us to set up a vertex and index buffer to be drawn multiple times using an array of matrices.  This means pretty much any piece of complex geometry that uses the same effect settings can and should be drawn at the same time.

Starting the Magic

We need a way to tell a normal model from an instanced model.  I do this by adding a few members to the RoeModel class.  The ModelParts list will contain information about, what else, model parts, and the Instanced property will define whether the model was created for instancing.  Finally, the instanced constructor parameter lets us load the content and set up the model parts for instance rendering.

using System;
using System.Collections.Generic;
using System.Text;
using RoeEngine2.Interfaces;
using Microsoft.Xna.Framework.Graphics;
using RoeEngine2.Managers;

namespace RoeEngine2.Models
{
    public class RoeModel : IRoeModel
    {
        public List<RoeModelPart> ModelParts = new List<RoeModelPart>();

        private bool _instanced;
        public bool Instanced
        {
            get { return _instanced; }
        }
 
        private string _fileName;
        /// <summary>
        /// The file name of the asset.
        /// </summary>
        public string FileName
        {
            get { return _fileName; }
            set { _fileName = value; }
        }

        private Model _baseModel;
        public Model BaseModel
        {
            get { return _baseModel; }
        }

        private bool _readyToRender = false;
        ///<summary>
        ///Is the model ready to be rendered.
        ///</summary>
        public bool ReadyToRender
        {
            get { return _readyToRender; }
        }

        public RoeModel(string fileName, bool instanced)
        {
            _instanced = instanced;
            _fileName = fileName;
        }

        /// <summary>
        /// Construct a new RoeModel.
        /// </summary>
        /// <param name="fileName">The asset file name.</param>
        public RoeModel(string fileName)
        {
            _fileName = fileName;
        }

        public void LoadContent()
        {
            if (!String.IsNullOrEmpty(_fileName))
            {
                _baseModel = EngineManager.ContentManager.Load<Model>(_fileName);
                if (_instanced)
                {
                    foreach (ModelMesh mesh in _baseModel.Meshes)
                    {
                        foreach (ModelMeshPart part in mesh.MeshParts)
                        {
                            ModelParts.Add(new RoeModelPart(part, mesh.VertexBuffer, mesh.IndexBuffer));
                        }
                    }
                }
                _readyToRender = true;
            }
        }
    }
}

Most of the work will happen in the model part class, but here you can see how a model is loaded as normal; then, if it is instanced, we do additional work to load the specific parts.  Please note, we do not have to use any custom processors to create the model parts.  For me this is a huge improvement over the example given on the team development site.

Backstage pass

This is where all the work takes place, here in the RoeModelPart class.  Each mesh part will be set up for prime GPU processing here, plus we will extend the vertex declaration to include the instancing magic.

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.Xna.Framework.Graphics;
using RoeEngine2.Managers;

namespace RoeEngine2.Models
{
    public class RoeModelPart
    {
        private int _primitiveCount;
        public int PrimitiveCount
        {
            get { return _primitiveCount; }
        }

        private int _vertexCount;
        public int VertexCount
        {
            get { return _vertexCount; }
        }

        private int _vertexStride;
        public int VertexStride
        {
            get { return _vertexStride; }
        }

        private VertexDeclaration _vertexDeclaration;
        public VertexDeclaration VertexDeclartion
        {
            get { return _vertexDeclaration; }
        }

        private VertexBuffer _vertexBuffer;
        public VertexBuffer VertexBuffer
        {
            get { return _vertexBuffer; }
        }

        private IndexBuffer _indexBuffer;
        public IndexBuffer IndexBuffer
        {
            get { return _indexBuffer; }
        }

        VertexElement[] originalVertexDeclaration;
 
        internal RoeModelPart(ModelMeshPart part, VertexBuffer vertexBuffer, IndexBuffer indexBuffer)
        {
            _primitiveCount = part.PrimitiveCount;
            _vertexCount = part.NumVertices;
            _vertexStride = part.VertexStride;
            _vertexDeclaration = part.VertexDeclaration;

            _vertexBuffer = vertexBuffer;
            _indexBuffer = indexBuffer;

            originalVertexDeclaration = part.VertexDeclaration.GetVertexElements();

            InitializeHardwareInstancing();
        }

        private void InitializeHardwareInstancing()
        {
            // When using hardware instancing, the instance transform matrix is
            // specified using a second vertex stream that provides 4x4 matrices
            // in texture coordinate channels 1 to 4. We must modify our vertex
            // declaration to include these channels.
            VertexElement[] extraElements = new VertexElement[4];

            short offset = 0;
            byte usageIndex = 1;
            short stream = 1;

            const int sizeOfVector4 = sizeof(float) * 4;

            for (int i = 0; i < extraElements.Length; i++)
            {
                extraElements[i] = new VertexElement(stream, offset,
                                                VertexElementFormat.Vector4,
                                                VertexElementMethod.Default,
                                                VertexElementUsage.TextureCoordinate,
                                                usageIndex);

                offset += sizeOfVector4;
                usageIndex++;
            }

            ExtendVertexDeclaration(extraElements);
        }

        private void ExtendVertexDeclaration(VertexElement[] extraElements)
        {
            // Get rid of the existing vertex declaration.
            _vertexDeclaration.Dispose();

            // Append the new elements to the original format.
            int length = originalVertexDeclaration.Length + extraElements.Length;

            VertexElement[] elements = new VertexElement[length];

            originalVertexDeclaration.CopyTo(elements, 0);

            extraElements.CopyTo(elements, originalVertexDeclaration.Length);

            // Create a new vertex declaration.
            _vertexDeclaration = new VertexDeclaration(EngineManager.Device, elements);
        }
    }
}
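
To make the layout concrete, here is a minimal sketch of what the loop in InitializeHardwareInstancing produces, written without XNA so it can be run on its own. The InstanceElement struct and InstanceStreamLayout class are hypothetical stand-ins made up for this illustration (they are not XNA types); only the numbers mirror the code above.

```csharp
using System;

// Hypothetical stand-in for the VertexElement fields that matter here.
public struct InstanceElement
{
    public short Stream;      // vertex stream the element is read from
    public short Offset;      // byte offset inside one instance record
    public byte UsageIndex;   // TEXCOORDn index seen by the shader

    public InstanceElement(short stream, short offset, byte usageIndex)
    {
        Stream = stream;
        Offset = offset;
        UsageIndex = usageIndex;
    }
}

public static class InstanceStreamLayout
{
    public const int SizeOfVector4 = sizeof(float) * 4;   // 16 bytes per matrix row
    public const int SizeOfMatrix = SizeOfVector4 * 4;    // 64 bytes per instance

    // Mirrors the loop in InitializeHardwareInstancing: one Vector4 per
    // matrix row, all on stream 1, bound to TEXCOORD1 through TEXCOORD4.
    public static InstanceElement[] Build()
    {
        InstanceElement[] elements = new InstanceElement[4];
        short offset = 0;
        byte usageIndex = 1;

        for (int i = 0; i < elements.Length; i++)
        {
            elements[i] = new InstanceElement(1, offset, usageIndex);
            offset += SizeOfVector4;
            usageIndex++;
        }

        return elements;
    }
}
```

Build() yields offsets 0, 16, 32 and 48 with usage indices 1 to 4, which is why the instance stream is bound with a stride of sizeof(float) * 16 = 64 bytes when drawing, and why the shader can read the whole matrix from a single float4x4 input tagged TEXCOORD1.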

The Beautiful Assistant

Now that we have a mesh broken up into its basic parts for fast GPU rendering, we need a simple, easy-to-manage place to store the instance matrices.  I accomplish this using the SceneGraphManager, which now holds a dictionary keyed by model name, where each value is the list of world matrices for that model's instances.  In addition, we need a way to load up the dictionary and finally draw the instances.

        private static Dictionary<string, List<Matrix>> instanceMatrices;

        private static void DrawInstances(GameTime gameTime)
        {
            instancedshaderEffect effect = ShaderManager.GetShader("instance") as instancedshaderEffect;
            if (effect.ReadyToRender)
            {
                effect.View = CameraManager.ActiveCamera.View;
                effect.Projection = CameraManager.ActiveCamera.Projection;

                foreach (string key in instanceMatrices.Keys)
                {
                    RoeModel model = ModelManager.GetModel(key) as RoeModel;
                    if (model.ReadyToRender)
                    {
                        Model meshModel = model.BaseModel;
                        
                        foreach (RoeModelPart part in model.ModelParts)
                        {
                            EngineManager.Device.VertexDeclaration = part.VertexDeclartion;
                            EngineManager.Device.Vertices[0].SetSource(part.VertexBuffer, 0, part.VertexStride);
                            EngineManager.Device.Indices = part.IndexBuffer;
                            effect.BaseEffect.Begin();
                            foreach (EffectPass pass in effect.BaseEffect.CurrentTechnique.Passes)
                            {
                                pass.Begin();
                                DrawHardwareInstancing(instanceMatrices[key].ToArray(), part.VertexCount, part.PrimitiveCount);
                                pass.End();
                            }
                            effect.BaseEffect.End();
                        }
                    }
                }
            }
        }

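        // Reused between calls so a new vertex buffer is not allocated for every draw.
        private static DynamicVertexBuffer instanceDataStream;

        private static void DrawHardwareInstancing(Matrix[] matrix, int vertexCount, int primitiveCount)
        {
            // Nothing to draw, and a zero-sized vertex buffer cannot be created.
            if (matrix.Length == 0)
            {
                return;
            }

            const int sizeofMatrix = sizeof(float) * 16;
            int instanceDataSize = sizeofMatrix * matrix.Length;

            // Only (re)create the instance buffer when it is missing or too small.
            if (instanceDataStream == null || instanceDataStream.SizeInBytes < instanceDataSize)
            {
                if (instanceDataStream != null)
                {
                    instanceDataStream.Dispose();
                }

                instanceDataStream = new DynamicVertexBuffer(EngineManager.Device,
                                                             instanceDataSize,
                                                             BufferUsage.WriteOnly);
            }

            // Upload the per-instance world matrices.
            instanceDataStream.SetData(matrix, 0, matrix.Length, SetDataOptions.Discard);

            VertexStreamCollection vertices = EngineManager.Device.Vertices;

            // Stream 0 holds the model geometry, repeated once per instance.
            vertices[0].SetFrequencyOfIndexData(matrix.Length);

            // Stream 1 holds the matrices, advancing one matrix per instance.
            vertices[1].SetSource(instanceDataStream, 0, sizeofMatrix);
            vertices[1].SetFrequencyOfInstanceData(1);

            EngineManager.Device.DrawIndexedPrimitives(PrimitiveType.TriangleList,
                                                       0, 0, vertexCount, 0, primitiveCount);

            // Reset the instancing streams.
            vertices[0].SetSource(null, 0, 0);
            vertices[1].SetSource(null, 0, 0);
        }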

MisDirection

Now that we have a simple way to store instance objects and render them, we need to load up the dictionary.  This is done in the SceneObjectNode class.  As usual, we do not want to render objects that are culled.  In addition, if we are using instancing we need to bypass all of the occlusion work.  This is a tradeoff that we have to make: either we use occlusion or we use instancing.

        public override void DrawCulling(GameTime gameTime)
        {
            if (SceneObject is IRoeCullable)
            {
                ((IRoeCullable)SceneObject).Culled = false;
                if (CameraManager.ActiveCamera.Frustum.Contains(((IRoeCullable)SceneObject).GetBoundingBoxTransformed()) == ContainmentType.Disjoint)
                {
                    ((IRoeCullable)SceneObject).Culled = true;
                }
                else if (ModelManager.GetModel(SceneObject.ModelName).Instanced)
                {
                    SceneGraphManager.AddInstance(SceneObject.ModelName, SceneObject.World);
                }
                else
                {
                    SceneObject.DrawCulling(gameTime);
                }
            }
        }

        public override void Draw(GameTime gameTime)
        {
            if (SceneObject.ModelName != null && ModelManager.GetModel(SceneObject.ModelName).Instanced)
            {
                return;
            }
            else if (SceneObject is IRoeCullable && ((IRoeCullable)SceneObject).Culled)
            {
                SceneGraphManager.Culled++;
            }
            else if (SceneObject is IRoeOcclusion && ((IRoeOcclusion)SceneObject).Occluded)
            {
                SceneGraphManager.Occluded++;
            }
            else
            {
                SceneObject.Draw(gameTime);
            }
        }
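
The SceneGraphManager.AddInstance method called above is not listed in this article, so here is one way it might look. This is only a sketch under stated assumptions, not the engine's actual code: InstanceRegistry, CountFor and ClearInstances are hypothetical names, and System.Numerics.Matrix4x4 stands in for XNA's Matrix so the example compiles without XNA. In the engine these members would live in SceneGraphManager next to the instanceMatrices dictionary.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical helper; Matrix4x4 is a stand-in for XNA's Matrix.
public static class InstanceRegistry
{
    private static readonly Dictionary<string, List<Matrix4x4>> instanceMatrices =
        new Dictionary<string, List<Matrix4x4>>();

    // Queue one world transform for the named model during the culling pass.
    public static void AddInstance(string modelName, Matrix4x4 world)
    {
        if (string.IsNullOrEmpty(modelName))
        {
            return;
        }

        // Create the list the first time a model name is seen, then reuse it.
        List<Matrix4x4> matrices;
        if (!instanceMatrices.TryGetValue(modelName, out matrices))
        {
            matrices = new List<Matrix4x4>();
            instanceMatrices.Add(modelName, matrices);
        }

        matrices.Add(world);
    }

    // Number of transforms currently queued for a model (0 if none).
    public static int CountFor(string modelName)
    {
        List<Matrix4x4> matrices;
        return instanceMatrices.TryGetValue(modelName, out matrices) ? matrices.Count : 0;
    }

    // Drop every queued transform so the next frame starts empty. Clearing the
    // whole dictionary also means the draw loop never sees a model with zero instances.
    public static void ClearInstances()
    {
        instanceMatrices.Clear();
    }
}
```

Because DrawCulling adds a matrix for every visible instanced object on every frame, something like ClearInstances would need to run once per frame after the instances are drawn; otherwise the lists would keep growing.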

The Grand Finale

Finally, our dictionary is loaded and we are ready to draw.  Time for some shader work.

#define MAX_SHADER_MATRICES 60

// Array of instance transforms used by the VFetch and ShaderInstancing techniques.
float4x4 instanceTransforms[MAX_SHADER_MATRICES];

// Camera settings.
float4x4 view;
float4x4 projection;

// This sample uses a simple Lambert lighting model.
float3 lightDirection = normalize(float3(-1, -1, -1));
float3 diffuseLight = 1.25;
float3 ambientLight = 0.25;

struct VS_INPUT
{
 float4 Position : POSITION0;
 float3 Normal : NORMAL;
 float2 TexCoord : TEXCOORD0;
};

struct VS_OUTPUT
{
 float4 Position     : POSITION;
 float4 Color  : COLOR0;
 float2 TexCoord : TEXCOORD0;
};

VS_OUTPUT VertexShaderCommon(VS_INPUT input, float4x4 instanceTransform)
{
    VS_OUTPUT output;

    // Apply the world and camera matrices to compute the output position.
    float4 worldPosition = mul(input.Position, instanceTransform);
    float4 viewPosition = mul(worldPosition, view);
    output.Position = mul(viewPosition, projection);

    // Compute lighting, using a simple Lambert model.
    float3 worldNormal = mul(input.Normal, instanceTransform);
    
    float diffuseAmount = max(-dot(worldNormal, lightDirection), 0);
    
    float3 lightingResult = saturate(diffuseAmount * diffuseLight + ambientLight);
    
    output.Color = float4(lightingResult, 1);

    // Copy across the input texture coordinate.
    output.TexCoord = input.TexCoord;

    return output;
};

// On Windows shader 3.0 cards, we can use hardware instancing, reading
// the per-instance world transform directly from a secondary vertex stream.
VS_OUTPUT HardwareInstancingVertexShader(VS_INPUT input,
                                         float4x4 instanceTransform : TEXCOORD1)
{
    return VertexShaderCommon(input, transpose(instanceTransform));
}

// All the different instancing techniques share this same pixel shader.
float4 PixelShaderFunction(VS_OUTPUT input) : COLOR0
{
    return input.Color;
}

// Windows instancing technique for shader 3.0 cards.
technique HardwareInstancing
{
    pass Pass1
    {
        VertexShader = compile vs_3_0 HardwareInstancingVertexShader();
        PixelShader = compile ps_3_0 PixelShaderFunction();
    }
}
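
As a sanity check on the lighting math, here is a CPU-side sketch of the same Lambert term the vertex shader computes. It is an illustration only: LambertSketch and Shade are hypothetical names, and System.Numerics.Vector3 stands in for the shader's float3.

```csharp
using System;
using System.Numerics;

public static class LambertSketch
{
    // Same constants as the shader above.
    private static readonly Vector3 LightDirection =
        Vector3.Normalize(new Vector3(-1f, -1f, -1f));
    private const float DiffuseLight = 1.25f;
    private const float AmbientLight = 0.25f;

    // Returns the lighting value the shader writes to each of R, G and B.
    public static float Shade(Vector3 worldNormal)
    {
        // The light direction points away from the light, hence the negation.
        float diffuseAmount = Math.Max(-Vector3.Dot(worldNormal, LightDirection), 0f);
        float lit = diffuseAmount * DiffuseLight + AmbientLight;

        // Equivalent of HLSL saturate(): clamp to the range [0, 1].
        return Math.Min(Math.Max(lit, 0f), 1f);
    }
}
```

A normal facing straight into the light gives 1.25 + 0.25, which saturates to 1, while a normal facing away from it falls back to the ambient 0.25. Like the shader, the sketch assumes worldNormal is unit length, so an instance transform that includes scale would change the brightness unless the normal is renormalized.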

 

March 17, 2008 - Posted by | C#, XNA

26 Comments »

  1. I’m too experiencing slow framerates. Running on P4 2,8 gHz with NVidia 6800GT and receiving about 22 FPS in the physics demo. Haven’t tried the last with hardware instancing yet, but it’s “just” for SM3, would be nice to be able to play decently on lower-end systems as well. Has anyone had the time to do some profiling to see where the bottleneck is (XNA or JigLibX)? And is there more optimizations that can be done to improve framerates for all XNA approved graphic cards?

    Comment by Zenox | March 18, 2008 | Reply

  2. Zenox, I am planning to demonstrate lower end system instancing for the xbox 360 and sm 2.0 hardware. I have not done much profiling on the application yet. Performance tuning at this stage would only cause confusion. I plan to write some articles on profiling and graphics tuning. I am not completely sold on JigLibX yet.

    Also, it should be noted that the engine is using a very very basic scene graph at this point. There are better ways of handling scene management and I will be demonstrating those options soon.

    Comment by roecode | March 18, 2008 | Reply

  3. I think that your culling system also needs to be optimised. When I run your engine out of the box (part 18), I have 8 fps (without optimisations I’ve made prior). If I comment the culling system (SceneGraphManager.DrawCulling(gameTime);) in the gameplayscreen, I have 30fps+. Culling should improve performance, not decrease performances by a ratio 1/3 :-S. I’ll search for the bottleneck in it.

    Comment by Dracul86 | March 18, 2008 | Reply

  4. I think that using the .net “List” should be avoid. sort is a slow methods, and will become slower and slower when you’ll add models etc. On my computer, the sort method takes about a third of the time of one draw call. The recursive calls in the call Drawing takes an other third.

    Comment by Dracul86 | March 18, 2008 | Reply

  5. Dracul86 thank you man, always can count on people like yourself to help research some issues. Honestly, the only reason why I use the sort routine is to make use of occlusion query. Taking occlusion out of the picture is not a bad thing, was just investigating it for myself.

    Comment by roecode | March 18, 2008 | Reply

  6. Roe,

    I think you really need to spend time now tuning the engine so that it’s not running worse than Crysis on my 8600!

    Comment by MCDOWALL | March 21, 2008 | Reply

  7. MCDOWALL, yup that is exactly what I am doing right now. I am spending time tuning up the engine and refactoring a lot of the base classes.

    Comment by roecode | March 21, 2008 | Reply

  8. Excellent, great job so far – you’ve done some really good work pulling this all together!

    Comment by MCDOWALL | March 23, 2008 | Reply

  9. For performance you really want to avoid the following:
    Any IEnumerable specially if they are string based
    This means all foreach loops

    Any string operations, use int where possible. If you want to keep some kind of usefull debugger values use DebuggerDisplay attribute.

    Default sorting as this often implies HashCode and string operations, write your own IComparer

    Any string indexing, using int indexing.

    These are some of the most common and are usually really bad when moving over on the xbox.

    Another thing is that the phys lib you are using isn’t optimized and really drains, even when moved over on its own thread. It’s the only viable solution if you want to move to xbox, but if you stay on win platform consider using PhysX (or similar) lib.

    Comment by Sturm | April 4, 2008 | Reply

  10. Thank you for your suggestions Sturm. Trust me everything you have written here is not falling on deaf ears. I have taken off writing new articles this entire month to work on performance issues with the engine.

    Much of what you have said here I have done or will be doing soon. Again thank you for your comments, I very much appreciate them.

    Comment by roecode | April 4, 2008 | Reply

  11. Thanks for all the good work, really enjoy your articles. A bit sad that it took so loooooong to find them, but have so far read each one and played with all the code in the matter of a few hours 🙂

    Also appreciate the fact that you refactor and optimise the code a bit, it’s always good to see a well written framework implementing a good design.

    Comment by Q | April 29, 2008 | Reply

  12. What’s happening with this series?

    Comment by MCDOWALL | May 31, 2008 | Reply

  13. MCDOWALL,
    I am still going to continue this series. I’ve been writing a lot of code to get the engine up and running in VS 2008 and XNA 3.0. I really appreciate all of the comments from people like yourself.

    Comment by roecode | June 2, 2008 | Reply

  14. Hello there 😀
    I found an error in your fx. In line 7,8,29,56 there is an false character. It must be an x between float4x4. I think it happens while your copy n paste your code in here. I only found it because of the error reporting duo the fx composer!

    Comment by J2T | July 7, 2008 | Reply

  15. It’s me again!
    Did anybody get the same error: http://img78.imageshack.us/my.php?image=errorqg9.jpg
    Or does anybody know why it happens and can help me?

    And Roe can you say me if my AddInstance() is correct cuz you dont post it:

    public static void AddInstance(String modelName, Matrix matrix)
    {

    if (modelName != null && matrices != null)
    {
    matrices.Add(matrix);
    instanceMatrices.Add(modelName, matrices);
    }
    }

    Comment by J2T | July 7, 2008 | Reply

  16. thanks a lot for these tutorials, I am learning a lot with them

    Comment by samairu | July 26, 2008 | Reply

  17. Is there a special reason why you don’t release the source for this part? If not, would you please do so?!
    ty

    Comment by J2t | August 5, 2008 | Reply

  18. Hi,

    Brilliant tutorial, really interesting and very helpful, is there any chance you could do a mini tutorial on how to use the engine to its fully capacity for rendering, texturising, heightmaps and bump mapping for example. Just the basics for getting a nicely drawn world example.

    Thanks for all your hard work and for sharing it

    Comment by Jonathan Dixon | August 14, 2008 | Reply

  19. Yes, I am planning on putting together additional tutorials in this series. Just taking a break right now.

    Comment by roecode | August 14, 2008 | Reply

  20. Hi i just wanted to know i just love these tutorials and i’m hoping that you will continue soon with a new one.

    I’m also very interrested in more details on texturing things like terrain. I’ve been using programs like terragen and such to generate heightmaps and colormaps and such for use with the YATT plugin for nwn2 editor.

    Thanks for the great info so far.

    Comment by Galantir | September 21, 2008 | Reply

  21. Nice engine, is looking promising

    where is the link to download the source code of the engine? I can not found it. Thanks

    Comment by snoche | December 11, 2008 | Reply

  22. When will the call DrawInstances?

    Comment by TianYu | December 11, 2008 | Reply

  23. @J2T:
    My version of AddInstance is this:
    public static void AddInstance(String modelName, Matrix matrix)
    {
    if (modelName == null) return;
    List<Matrix> matrices = new List<Matrix>();
    matrices.Add(matrix);
    instanceMatrices.Add(modelName, matrices);
    }
    Remember that you are dealing with Dictionary<string, List<Matrix>>, meaning that you can’t just add a matrix. Not that this is the right answer either, but it does compile.

    @RoeCode:
    You were expecting us to use the ShaderManager for instancedshader.fx and didn’t warn us. That had me going for a while.

    For upcoming tutorials, especially for terrain, look up Reimer’s 4th series. What I really want is a Quadtree implementation with nicer skies. I can raytrace landscapes and sky textures of nearly unlimited size and can map skies on a sphere using a Gimp filter.

    I’ve already implemented his tutorial, but the performance from the result is abysmal (2-3 fps). Indeed I took it a step further and implemented slope maps using

    float angle = Vector3.Dot(vertices[i].Normal, Vector3.Up);
    vertices[i].Slope = MathHelper.Clamp((angle - 0.9f) * 5.0f, 0.0f, 1.0f);

    while generating the terrain data. Now that I have practice using the ShaderGenerator I want to throw a serious shader at it. I will look around online and see what I find.

    Comment by Armigus | February 17, 2009 | Reply

  24. Armigus, I actually have redone quite a bit of this in the new engine. Still debating how much of it I want to release just yet, but would very much like seeing your implementations if you succeed.

    Comment by roecode | February 18, 2009 | Reply

  25. Im not 100% sure if this is it because I cant actually get my instanced models to draw 🙂 but I think this works better albeit VERY UGLY 🙂

    public static void AddInstance(String modelName, Matrix matrix)
    {
    if (modelName == null) return;

    if (instanceMatrices.ContainsKey(modelName))
    {
    List<Matrix> update;
    instanceMatrices.TryGetValue(modelName, out update);
    instanceMatrices.Remove(modelName);
    instanceMatrices.Add(modelName, update);
    }
    else
    {
    var test = new List<Matrix>();
    test.Add(matrix);
    instanceMatrices.Add(modelName, test);
    }
    }

    Comment by sTeeL | March 6, 2009 | Reply

    • Sure sounds interesting, honestly instancing seems a bit hokey to me anyway. Currently, I am busy working on getting my game beta out, I am going to revisit large portions of the engine after that project is finished.

      Comment by roecode | March 6, 2009 | Reply

