Posted: 16 Jun 2012 22:14
Tags: downloads examples gallery
Reaching new heights
XRT 1.4.1 is out. Compared to the previous release, this one dramatically improves performance:
- on a single core, it is 30% faster in most cases and nearly 100% faster with scenes that do volume rendering
- with multiple cores, the speedup now exceeds 90% of linear scaling with the number of cores in all test cases (i.e. on a quad core, the speedup exceeds 3.6), whereas with the previous release, rendering times were nearly the same whether you had a dual or a quad core.
Of course, both speedups combine to make a much faster renderer.
There were two major sources of slowdown which illustrate quite well the pitfalls of multithreaded programming.
The first was an incorrect use of OpenImageIO ustrings (ustring stands for unique string): I was repeatedly calling ustring constructors instead of reusing the objects already constructed. This was the major limiting factor in single-threaded rendering. To make matters worse, the ustring constructor accesses a table protected by a mutex, which creates a bottleneck when multithreading is on.
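To make the ustring pitfall concrete, here is a minimal sketch of the two usage patterns. It does not use OpenImageIO: `Interned` is a hypothetical stand-in for an interned-string type whose constructor looks up a mutex-protected table, and `is_diffuse_slow`, `is_diffuse_fast` and the lookup counter are illustrative names only. The real ustring implementation may well differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for an interned string; NOT OpenImageIO's ustring.
class Interned {
public:
    explicit Interned(const std::string& s) : ptr_(intern(s)) {}
    const std::string& str() const { return *ptr_; }
    // Two interned strings are equal exactly when they share the table entry.
    bool operator==(const Interned& o) const { return ptr_ == o.ptr_; }
    // Number of locked table lookups performed so far (for illustration).
    static std::size_t lookups() { return counter().load(); }

private:
    static std::atomic<std::size_t>& counter() {
        static std::atomic<std::size_t> c{0};
        return c;
    }
    static const std::string* intern(const std::string& s) {
        static std::mutex m;
        static std::unordered_set<std::string> table;
        std::lock_guard<std::mutex> lock(m);  // every construction takes this lock
        ++counter();
        return &*table.insert(s).first;       // element addresses stay valid on rehash
    }
    const std::string* ptr_;
};

// Slow pattern: builds the interned string, and takes the lock, on every call.
bool is_diffuse_slow(const Interned& name) {
    return name == Interned("diffuse");
}

// Fast pattern: builds it once and reuses it; later calls only compare pointers.
bool is_diffuse_fast(const Interned& name) {
    static const Interned diffuse("diffuse");
    return name == diffuse;
}
```

In the slow version, every call from every thread goes through the same lock, which is the kind of contention described above. The fast version pays for a single lookup.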
The second problem is a bit more subtle and deals with shared pointers. To save memory, XRT shares shaders and transformations between primitives using reference counting and a copy-on-write policy. This is a very effective way to manage memory and to avoid memory leaks, but it is not without constraints. In a multi-threaded environment, shared pointers must rely on counters implemented using atomic primitives. Among the many solutions available (for a detailed discussion, see Implementing Scalable Atomic Locks for Multi-Core Intel® EM64T and IA32 Architectures), XRT uses vanilla atomic operations (compare/exchange and the like).
So far, so good, but on Intel processors, atomic operations lock the cache line they operate on. While rendering, shared pointers to transformation matrices are accessed zillions of times. The result is that the cache line holding a reference counter is very often locked by one core while the others are waiting.
Fortunately, while shared pointers are really handy for managing resources while parsing scene files, they are useless while rendering. There is never any need to keep pointing to a particular resource of the scene once a pixel has been rendered, and therefore there is no need to count references and dereferences: accessing the raw pointer directly is both safe and fast.
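The difference between the two access patterns can be sketched as follows. This is an illustration only: it uses `std::shared_ptr` as a stand-in for XRT's own reference-counted pointers, and `Transform`, `Primitive`, `trace_copying` and `trace_borrowing` are hypothetical names, not XRT code.

```cpp
#include <cassert>
#include <memory>

struct Transform { double m[16]; };  // hypothetical 4x4 matrix

struct Primitive {
    // Shared ownership, established while parsing the scene.
    std::shared_ptr<const Transform> xform;
};

// Contended pattern: copying the shared pointer costs one atomic increment
// now and one atomic decrement when the copy goes out of scope.
double trace_copying(const Primitive& p) {
    std::shared_ptr<const Transform> t = p.xform;
    return t->m[0];
}

// Render-time pattern: borrow the raw pointer. The scene outlives the
// render, so the reference count is never touched.
double trace_borrowing(const Primitive& p) {
    const Transform* t = p.xform.get();
    return t->m[0];
}
```

Both functions read the same matrix. Only the first one writes to the shared counter, and that write is what the cores end up fighting over.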
For now, as a kind of brute force solution, I have replaced reference counting on transformations with a global transformation cache. It is very efficient, but I know there are some corner cases left which may leak memory. I'll improve on that later.
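One way such a cache could look is sketched below. This is a guess at the general idea, not XRT's actual implementation: `TransformCache`, `Matrix` and `transform_cache()` are hypothetical names. The cache owns one copy of each distinct matrix and hands out plain pointers, so nothing is counted at render time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>

using Matrix = std::array<double, 16>;  // hypothetical 4x4 matrix storage

class TransformCache {
public:
    // Returns a stable pointer to the cached copy of m, inserting it if needed.
    // The lock is only taken here, i.e. while the scene is being built.
    const Matrix* get(const Matrix& m) {
        std::lock_guard<std::mutex> lock(mutex_);
        return &*entries_.insert(m).first;  // std::set nodes never move
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.size();
    }
    // Entries are only released here; all pointers handed out become invalid.
    void clear() {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.clear();
    }

private:
    mutable std::mutex mutex_;
    std::set<Matrix> entries_;  // one copy per distinct matrix
};

// Single global instance, created on first use.
TransformCache& transform_cache() {
    static TransformCache cache;
    return cache;
}
```

In this sketch an entry lives until `clear()` is called, even if no primitive uses it any more. That is the price of dropping the reference count.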
Other than that, this release restores the gamma correction feature lost with the OpenImageIO package integration.
One more piece of eye candy
Today's picture is a new procedural sample added to the XRT examples archive. It generates a million points arranged to form a well-known 3D fractal: the Sierpinski Gasket.
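For the curious, here is a minimal sketch of one common way to generate such a point cloud, the so-called chaos game. Whether the sample in the archive uses this method is an assumption on my part, and `Point` and `sierpinski_points` are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

struct Point { double x, y, z; };

// Chaos game: start on a vertex of a tetrahedron, then repeatedly jump
// halfway towards a randomly chosen vertex and record the new position.
std::vector<Point> sierpinski_points(std::size_t count, unsigned seed = 1) {
    // Four alternating corners of the cube [-1, 1]^3 form a regular tetrahedron.
    const std::array<Point, 4> v = {{
        {1.0, 1.0, 1.0}, {1.0, -1.0, -1.0}, {-1.0, 1.0, -1.0}, {-1.0, -1.0, 1.0}
    }};
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, 3);

    Point p = v[0];  // a vertex belongs to the fractal, so every point does too
    std::vector<Point> points;
    points.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const Point& t = v[pick(rng)];
        p = Point{(p.x + t.x) / 2.0, (p.y + t.y) / 2.0, (p.z + t.z) / 2.0};
        points.push_back(p);
    }
    return points;
}
```

Calling `sierpinski_points(1000000)` yields a million points that can then be handed to the renderer as a point primitive.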