Posted: 07 Mar 2014 21:21
This release is a hot fix for XRT 2.3.0. Sorry about that, but I left a few nasty bugs in the subdivision surface rendering code. As a bonus, XRT now computes tighter bounding boxes for cubic curves, which decreases rendering times by 10% on "hairy" scenes.
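To give an idea of where the gain comes from: the box around a curve's control points always contains the curve, but it is usually larger than necessary. Here is a minimal sketch of one way to tighten it, assuming the curve is a single cubic Bezier segment. This is not XRT's actual code, and the function names are made up for the example. The curve extrema on each axis are found by solving for the roots of the derivative.

```python
import math

def bezier_point(p0, p1, p2, p3, t):
    """Evaluate one coordinate of a cubic Bezier curve at parameter t."""
    s = 1.0 - t
    return s*s*s*p0 + 3*s*s*t*p1 + 3*s*t*t*p2 + t*t*t*p3

def tight_range(p0, p1, p2, p3, eps=1e-12):
    """Exact (min, max) of one coordinate of the curve for t in [0, 1]."""
    # The derivative divided by 3 is the quadratic a*t^2 + b*t + c.
    a = -p0 + 3*p1 - 3*p2 + p3
    b = 2*(p0 - 2*p1 + p2)
    c = p1 - p0
    roots = []
    if abs(a) < eps:
        # Degenerate case: the derivative is linear (or constant).
        if abs(b) > eps:
            roots.append(-c / b)
    else:
        disc = b*b - 4*a*c
        if disc >= 0:
            sq = math.sqrt(disc)
            roots += [(-b + sq) / (2*a), (-b - sq) / (2*a)]
    # Extrema are at the end points or at derivative roots inside (0, 1).
    values = [p0, p3]
    values += [bezier_point(p0, p1, p2, p3, t) for t in roots if 0 < t < 1]
    return min(values), max(values)

def hull_bbox(points):
    """Loose box: per-axis (min, max) of the four control points."""
    dims = range(len(points[0]))
    return [(min(p[k] for p in points), max(p[k] for p in points)) for k in dims]

def tight_bbox(points):
    """Tight box: per-axis exact (min, max) of the curve itself."""
    dims = range(len(points[0]))
    return [tight_range(*(p[k] for p in points)) for k in dims]

# An arch-shaped curve: the control points reach y = 1, the curve does not.
arch = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
loose = hull_bbox(arch)
tight = tight_bbox(arch)
```

For thin primitives like hair, every bit of empty space removed from the box means fewer rays that enter the box only to miss the curve.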
Posted: 02 Mar 2014 11:01
Tags: downloads subdivision surface
I can't get no satisfaction
I have never been satisfied with XRT's implementation of subdivision surfaces. It was missing important features and was very slow. This new version solves both issues.
Still the same
The general outline of the rendering algorithm (described in this post) has not changed much in this new implementation: a topological structure is built from the subdivision surface description and is refined to isolate extraordinary features and to obtain a control mesh made of quads only. Then, the mesh is split into individual faces, each stored with its 1-neighborhood (the minimal data needed to apply subdivision rules and compute the limit surface). During the intersection phase, a face is subdivided on the fly until the resulting patches look flat or small enough. Each patch can then be safely approximated by a bilinear patch, which is tested for intersection.
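The "subdivide until flat or small enough" loop can be sketched as follows. This is only an illustration under loud assumptions: a parametric function `surface(u, v)` stands in for the real subdivision rules applied to a face and its 1-neighborhood, the flatness test (distance between a 4x4 grid of samples and the bilinear patch through the corners) is one possible criterion and not necessarily XRT's, and all names are made up.

```python
import math

def lerp(a, b, t):
    """Linear interpolation between two points."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def bilinear(c00, c10, c01, c11, u, v):
    """Point on the bilinear patch spanned by four corners."""
    return lerp(lerp(c00, c10, u), lerp(c01, c11, u), v)

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_or_small(surface, u0, u1, v0, v1, flat_tol, min_size):
    """Return (done, corners) for the piece of surface over [u0,u1]x[v0,v1]."""
    corners = (surface(u0, v0), surface(u1, v0), surface(u0, v1), surface(u1, v1))
    # Small enough: both diagonals are shorter than min_size.
    if (distance(corners[0], corners[3]) < min_size
            and distance(corners[1], corners[2]) < min_size):
        return True, corners
    # Flat enough: every sample is close to the bilinear approximation.
    worst = 0.0
    for i in range(4):
        for j in range(4):
            s, t = i / 3.0, j / 3.0
            p = surface(u0 + (u1 - u0) * s, v0 + (v1 - v0) * t)
            worst = max(worst, distance(p, bilinear(*corners, s, t)))
    return worst <= flat_tol, corners

def refine(surface, u0, u1, v0, v1, flat_tol, min_size, out, depth=0, max_depth=12):
    """Collect bilinear patches (4 corners each) approximating the surface."""
    done, corners = flat_or_small(surface, u0, u1, v0, v1, flat_tol, min_size)
    if done or depth >= max_depth:
        out.append(corners)  # a ray would be intersected with this patch
        return
    um, vm = 0.5 * (u0 + u1), 0.5 * (v0 + v1)
    for a, b in ((u0, um), (um, u1)):
        for c, d in ((v0, vm), (vm, v1)):
            refine(surface, a, b, c, d, flat_tol, min_size, out, depth + 1, max_depth)

# A plane needs no refinement; a curved bowl does.
plane_patches, bowl_patches = [], []
refine(lambda u, v: (u, v, 0.0), 0.0, 1.0, 0.0, 1.0, 1e-6, 1e-3, plane_patches)
refine(lambda u, v: (u, v, u*u + v*v), 0.0, 1.0, 0.0, 1.0, 1e-2, 1e-3, bowl_patches)
```

In a real ray tracer the recursion would of course only descend into the pieces whose bounding box is hit by the ray, instead of collecting every patch.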
With a little help from my friends
Until Pixar came up with the OpenSubdiv project, developers were really on their own with subdivision surfaces. Quoting the project overview, "OpenSubdiv is a set of open source libraries that implement high performance subdivision surface (subdiv) evaluation on massively parallel CPU and GPU architectures. This codepath is optimized for drawing deforming subdivs with static topology at interactive framerates. The resulting limit surface matches Pixar's Renderman to numerical precision." At first sight, this looks suitable only for people doing real-time graphics or interactive editors but, actually, OpenSubdiv's architecture makes it reusable for many purposes.
OpenSubdiv is built from three layers:
- hbr (hierarchical boundary rep) is a topological structure designed to store edges, faces, and vertices of a subdivision surface. It also stores attributes of the surface such as corners and creases, facevarying data and various hints affecting the subdivision process such as hierarchical edits. Actually, it supports almost all features of SubdivisionMesh and HierarchicalSubdivisionMesh as defined by the RenderMan Interface specification.
- far (feature-adaptive rep) uses hbr to create and cache fast run time data structures for table driven subdivision of vertices and cubic patches for limit surface evaluation. Feature-adaptive refinement logic is used to adaptively refine coarse topology near features like extraordinary vertices and creases in order to make the topology amenable to cubic patch evaluation. It supports these subdivision schemes:
- osd (Open Subdiv) contains client-level code that uses far to create concrete instances of meshes. These meshes use precomputed tables from hbr to perform table-driven subdivision steps with a variety of massively parallel computational backend technologies. Osd supports both uniform subdivision and adaptive refinement with cubic patches. With uniform subdivision, the computational backend code performs Catmull-Clark splitting and averaging on each face. With adaptive subdivision, the Catmull-Clark steps are used to compute the control vertices of cubic patches, then the cubic patches are tessellated with GLSL or DirectX. This top-level layer is really dedicated to real-time performance.
As you probably already guessed, XRT now uses hbr to store all subdivision surface data and far to selectively refine the control mesh, solving the missing features problem.
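For readers who have never looked at the "splitting and averaging" step mentioned in the osd description, here is a minimal sketch of the standard Catmull-Clark point rules for a smooth interior vertex. It ignores boundaries, creases, corners and hierarchical edits, and it is neither OpenSubdiv's API nor XRT's code; the function names are made up for the example.

```python
def average(points):
    """Component-wise average of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def face_point(face_vertices):
    """New point inserted in a face: the average of its vertices."""
    return average(face_vertices)

def edge_point(v0, v1, face_point_a, face_point_b):
    """New point inserted on an interior edge: the average of its two end
    points and of the face points of the two faces sharing the edge."""
    return average([v0, v1, face_point_a, face_point_b])

def vertex_point(p, adjacent_face_points, edge_midpoints):
    """New position of an interior vertex p of valence n:
    (Q + 2R + (n - 3) P) / n, where Q is the average of the adjacent face
    points and R the average of the midpoints of the incident edges."""
    n = len(edge_midpoints)
    q = average(adjacent_face_points)
    r = average(edge_midpoints)
    return tuple((q[k] + 2 * r[k] + (n - 3) * p[k]) / n for k in range(3))

# A regular (valence 4) vertex lifted to z = 1 above a flat ring of neighbors.
p = (0.0, 0.0, 1.0)
edge_neighbors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
diagonals = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]

# Each face is bounded by p, two consecutive edge neighbors and one diagonal.
faces = [[p, edge_neighbors[i], diagonals[i], edge_neighbors[(i + 1) % 4]]
         for i in range(4)]
face_points = [face_point(f) for f in faces]
midpoints = [average([p, e]) for e in edge_neighbors]
new_p = vertex_point(p, face_points, midpoints)
```

The vertex is pulled toward its neighbors while staying above the origin, which is the averaging behaviour that makes the surface smoother at each level.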
The need for speed
As usual with algorithms, there is a trade-off between speed and memory. The rendering algorithm XRT uses was designed for GPUs, which are so fast at simple, parallel computations that they can afford not to cache anything in memory and still be very efficient. Of course, it turns out to be not that suitable for CPUs.
Therefore, I tried to use hbr to cache control mesh refinement. It failed because hbr structures are not multithreaded and because manipulating lots of pointers in a topological structure is slow (even slower than refining again and again with a linearized data structure).
The key observation is that, except around extraordinary features, the limit surface of a Catmull-Clark subdivision surface is a b-spline surface. In other words, once a subdivision surface has been refined to the point where a patch is regular (i.e. it does not contain any extraordinary feature: a vertex with valence other than 4, a crease or a corner), that patch can be replaced by a bicubic b-spline patch whose 16 control points are the vertices of the face and of its 1-neighborhood. Furthermore, when subdividing a patch containing one extraordinary feature (the case where it contains more than one has been dealt with in the refinement phase), only one (sometimes two) of the four resulting patches is extraordinary, the others being regular. Actually, once features have been isolated, the number of extraordinary patches stays almost the same at each refinement level. They just get smaller.
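To make the regular case concrete, here is a minimal sketch of direct evaluation of a uniform bicubic b-spline patch from its 4x4 control points, using the standard uniform cubic b-spline basis functions. The function names and the layout of the control grid (`ctrl[j][i]`, with i along u and j along v) are assumptions made for the example, not XRT's actual data layout.

```python
def bspline_basis(t):
    """The four uniform cubic b-spline basis functions at t in [0, 1]."""
    s = 1.0 - t
    return (s * s * s / 6.0,
            (3 * t**3 - 6 * t**2 + 4) / 6.0,
            (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0,
            t**3 / 6.0)

def eval_bspline_patch(ctrl, u, v):
    """Limit surface point of a regular patch at (u, v) in [0, 1] x [0, 1].

    ctrl is a 4x4 grid of 3D points: the four vertices of the face are
    ctrl[1][1], ctrl[1][2], ctrl[2][1] and ctrl[2][2]; the twelve others
    are its 1-neighborhood."""
    bu = bspline_basis(u)
    bv = bspline_basis(v)
    point = [0.0, 0.0, 0.0]
    for j in range(4):
        for i in range(4):
            w = bu[i] * bv[j]
            for k in range(3):
                point[k] += w * ctrl[j][i][k]
    return tuple(point)

# Control points on a flat unit grid: the patch covers [1, 2] x [1, 2].
grid = [[(float(i), float(j), 0.0) for i in range(4)] for j in range(4)]
corner = eval_bspline_patch(grid, 0.0, 0.0)
inside = eval_bspline_patch(grid, 0.25, 0.5)
```

No subdivision step is involved: the limit point is obtained in one shot, which is why regular patches are so much cheaper than refining again and again.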
The consequence is that subdivision on the fly is almost never needed and that raytracing a Catmull-Clark subdivision surface can be nearly as fast as raytracing a mesh of bicubic patches. Implementing this optimization made the algorithm 10 times faster.
Image: raw subdivision surface
Image: with additional bump mapping effect