I implemented Tom Forsyth's
Vertex Cache Optimization algorithm in
Open Scene Graph (OSG). The quick summary is that indexed lists of triangles in a mesh are more efficient than traditional representations like triangle strips. GPUs keep a cache of the post-transform values for the most recently used vertices in a mesh. This can avoid a whole lot of work, such as rerunning vertex shaders. Tristrips don't make good use of this cache because they introduce a new vertex for every triangle and only reuse the last two vertices. On the other hand, a random mesh order isn't good either; the triangles in a mesh need to be in a good order to take proper advantage of the cache. Tom describes an efficient algorithm for optimizing a mesh for the vertex cache. It produces an order that is "oblivious" to the actual cache size i.e., it will work well on a range of hardware.

The algorithm does work as advertised. I tried it on the
Happy Buddha model, which is a monster-size mesh with over a million triangles. On my machine the optimization reduces the draw time for this mesh by 38 percent, which is enormous. This is a special case because the mesh is so large and regular, and Your Milage May Vary, but it does seem like a worthwhile optimization for large meshes. I've submitted my code to the OSG project.