Posted: 16 Sep 2011 19:39
Tags: bvh embree sah
This post was supposed to deal with Catmull-Clark subdivision mesh texturing, but that work is not finished yet.
The "culprit" is Intel, which has recently released Embree, an open-source renderer. While it's a pretty decent renderer in its own right, do not expect fancy shaders, motion blur, or high quality texturing: its main purpose is to demonstrate how to efficiently build and traverse ray tracing acceleration structures in parallel on Intel architectures. In this area, it fares very well.
Embree features four different acceleration structures based on BVHs (bounding volume hierarchies): two object partitioning strategies (SAH, the surface area heuristic, and split BVH) times two branching factors, i.e. the number of children per node in the tree (2 and 4). To complete the comparison, I have added a median cut object partitioning, which makes a total of six acceleration structures. The benchmarks are unambiguous: median cut is slower than SAH, which in turn is slower than split BVH, and BVH4 is faster than BVH2 whatever the object partitioning.
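To make the difference between the two object partitionings concrete, here is a minimal Python sketch of how each one picks a split among primitives already sorted along one axis. It is not Embree's code: the `AABB` class and the `median_cut`, `sah_cost` and `sah_split` helpers are names made up for illustration, and a real builder would sweep all three axes (or use binning) instead of this quadratic scan.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class AABB:
    """Axis-aligned bounding box (hypothetical helper, not an Embree type)."""
    lo: Vec3
    hi: Vec3

    def union(self, other: "AABB") -> "AABB":
        return AABB(
            tuple(min(a, b) for a, b in zip(self.lo, other.lo)),
            tuple(max(a, b) for a, b in zip(self.hi, other.hi)),
        )

    def area(self) -> float:
        dx, dy, dz = (h - l for l, h in zip(self.lo, self.hi))
        return 2.0 * (dx * dy + dy * dz + dz * dx)


def bounds(boxes: List[AABB]) -> AABB:
    """Bounding box of a non-empty list of boxes."""
    result = boxes[0]
    for box in boxes[1:]:
        result = result.union(box)
    return result


def median_cut(boxes: List[AABB]) -> int:
    """Median cut: put half of the primitives on each side."""
    return len(boxes) // 2


def sah_cost(boxes: List[AABB], i: int) -> float:
    """Unnormalised SAH cost of splitting into boxes[:i] and boxes[i:]."""
    left, right = boxes[:i], boxes[i:]
    return bounds(left).area() * len(left) + bounds(right).area() * len(right)


def sah_split(boxes: List[AABB]) -> int:
    """SAH: try every split position and keep the cheapest one."""
    return min(range(1, len(boxes)), key=lambda i: sah_cost(boxes, i))


def unit_cube(x: float) -> AABB:
    return AABB((x, 0.0, 0.0), (x + 1.0, 1.0, 1.0))


# Six cubes packed together and two far away, already sorted along x.
boxes = [unit_cube(float(x)) for x in (0, 1, 2, 3, 4, 5, 100, 101)]
print("median cut index:", median_cut(boxes))
print("SAH index:", sah_split(boxes))
```

On this toy input, median cut splits the primitives 4/4 and leaves the right child spanning the large empty gap, whereas the SAH sweep isolates the two distant cubes. Smaller child boxes are hit by fewer rays, which is why SAH trees are normally expected to traverse faster.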
Therefore, I have decided to dust off my BVH accelerator (based on a median cut object partitioning) by reusing Embree code. The results are somewhat mixed: overall rendering times decrease (the larger the number of primitives, the larger the gain) and BVH4 is slightly faster than BVH2, but median cut is faster than SAH, which is faster than split BVH. That completely contradicts the available computer graphics literature and means there is something fishy.
To be honest, I am not that surprised: I had already tried my luck with my own SAH BVH implementation, with the same results. The difference is that, back then, I had concluded that I was to blame and that I had better wait for a reference implementation before drawing any conclusion. Now I can say the problem is not the implementation.
A possible explanation is that the traversal and intersection costs have the wrong values for my architecture, which leads to wrong decisions and a low quality hierarchy. Embree deals only with triangles, which it intersects four at a time using SSE. XRT handles a large array of primitive types (some of them really expensive to intersect) and intersects them one at a time using plain x86 code. However, I get the same result with triangle-only scenes, which makes me doubt this is the right reason. This requires further testing.
In the meantime, the tree traversal is much faster than before and this new BVH implementation will supersede the previous one.
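To illustrate how those two constants can steer the builder, here is a small Python sketch of the textbook SAH decision between splitting a node and turning it into a leaf. The function names and the cost values are assumptions chosen for the example; they are neither Embree's nor XRT's actual settings.

```python
def leaf_cost(n_prims: int, c_isect: float) -> float:
    """Expected cost of a leaf: every primitive is intersected."""
    return n_prims * c_isect


def split_cost(area_parent: float,
               area_left: float, n_left: int,
               area_right: float, n_right: int,
               c_trav: float, c_isect: float) -> float:
    """Expected cost of an inner node under the surface area heuristic.

    A child is visited with probability area_child / area_parent, and
    visiting it costs one intersection test per primitive it holds.
    """
    p_left = area_left / area_parent
    p_right = area_right / area_parent
    return c_trav + (p_left * n_left + p_right * n_right) * c_isect


def should_split(area_parent: float,
                 area_left: float, n_left: int,
                 area_right: float, n_right: int,
                 c_trav: float, c_isect: float) -> bool:
    """Split only if it is expected to be cheaper than a leaf."""
    return (split_cost(area_parent, area_left, n_left,
                       area_right, n_right, c_trav, c_isect)
            < leaf_cost(n_left + n_right, c_isect))


# Same geometry, two hypothetical cost models.
node = dict(area_parent=10.0, area_left=6.0, n_left=2,
            area_right=6.0, n_right=2)

# Scalar code: one intersection costs about as much as one traversal step.
print(should_split(**node, c_trav=1.0, c_isect=1.0))

# Four triangles per SSE test: intersection looks four times cheaper.
print(should_split(**node, c_trav=1.0, c_isect=0.25))
```

With the first cost model the node is split; with the second it becomes a leaf, even though the geometry is identical. Note that with a single uniform intersection cost the constants only change when the builder stops splitting, not where it places the split, so a wrong ratio alone would mostly show up as leaves that are too large or too small. Per-primitive intersection costs, as a renderer with many primitive types might use, would move the split position as well.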