About


XRT is a raytracing based programmable rendering system for photo-realistic image synthesis built around a plug-in architecture design.


Blog

XRT 1.5.0 released

Posted: 01 Sep 2012 16:26
Tags: downloads

Infuriated developer
Image Unavailable
Catmull-Clark hydra from Sitex Graphics's Air examples

Catmull-Clark subdivision surfaces …

The major feature in XRT 1.5.0 is a new geometric primitive: Catmull-Clark subdivision surfaces. Except for texturing, everything you would expect from it is supported: creases and corners, whether they are sharp or smooth, holes and boundaries. For a definition of these terms, please refer to my previous posts on the subject (basics, corners and creases, holes and boundaries).

You have certainly noticed that these posts are more than one year old and, indeed, the current implementation has been sitting on my hard drive from this time. I could pretend that I have been distracted by the implementation of other appealing features but the real reason is that I am not happy with the result: it's kinda slow (maybe my expectations were too high …), it suffers from tesselation artifacts and it's all my fault !

The intersection algorithm recursively refines the surface until the resulting patches are flat or small enough. The problem lies in the stopping criteria. It should be computed using derivative information which XRT current design is not able to provide1. So, it is stopped at an arbitrary subdivision level.

The result is that, depending on the zoom factor, a surface may be over-subdivided (which is bad for performance and sometimes leads to precision problems) or under-subdivided. The next picture is the horrifying result of a bad refinement.

rough_gnome.jpg

It was bad one year ago and it still is. So, why release it now ? For one thing, it does decent pictures most of the time and I believe it cannot be improved without changes that go far beyond what I planned originally for XRT 2.0. Therefore, it will have to wait for work on the next major version to start. I will go into further details in future posts.

… and the rest

  • XRT frontend has been completely rewritten for improved help message, versioning information, more control on threading, debugging output and statistics.
  • shadow bias is now supported.
  • a myriad of bugs has been fixed.
  • the complete list of changes is available in the ChangeLog.

This version and the updated documentation are available in the Downloads page.

Comments: 0, Rating: 0


XRT 1.4.1 released

Posted: 16 Jun 2012 22:14
Tags: downloads examples gallery

sierpinski.jpg

Reaching new heights

XRT 1.4.1 is out. Compared to the previous release, this one dramatically improves performances:

  • on a single core, it is 30% faster in most cases and nearly 100% faster with scenes that do volume rendering
  • with multiple cores, rendering times are now more than 90% linear with the number of cores in all test cases (ie on a quad core, the speedup exceeds 3.6) whereas, with the previous release, rendering times were nearly the same whether you had a dual or a quad core.

Of course, both acceleration factors combine for a much much faster renderer.

There were two major sources of slowdown which illustrate quite well the pitfalls of multithreaded programming.

The first was an incorrect usage of OpenImageIO ustrings (which stands for unique strings) where I was repeatedlly calling ustring constructors instead of reusing them. This was the major limiting factor in single thread rendering mode. To make the matter worse, the ustring constructor accesses a table protected by a mutex creating a bottleneck when multithreading is on.

The second problem is a bit more subtle and deals with shared pointers. To preserve memory, XRT shares shaders and transformations between primitives using reference counting and copy-on-write policy. This is a very effective way to manage memory and to avoid memory leaks but is not without constraints. In a multi-threaded environment, shared pointers must rely on counters implemented using atomic primitives. Amongst the many solutions available (for an accurate discussion, see Implementing Scalable Atomic Locks for Multi-Core Intel® EM64T and IA32 Architectures), XRT uses vanilla atomic operations (compare/exchange and the likes).

So far, so good, but on Intel processors, atomic operations lock the cache. While rendering, shared pointers to transformation matrices are accessed zillions of times. The result is that the cache is very often locked by one core while the others are waiting.

Fortunately, if shared pointers are really handy to manage ressources while parsing scene files, they are useless while rendering. There is never any need for keep pointing to a particular ressource of the scene once a pixel has been rendered and therefore there is no need to count references and dereferences: accessing directly the raw pointer is both safe and fast.

For now, as a kind of brute force solution, I have replaced reference counting on transformations by a global transformation cache. It is very efficient but I know there are some corner cases left which may leak memory. I'll improve on that later on.

Gamma correction

Other than that, this release restores the gamma correction feature lost with the OpenImageIO package integration.

One more eye candy

Today's picture is a new procedural sample added in the XRT examples archive. It generates a million points organized to build a well-known 3D fractal: the Sierpinki Gasket.

Comments: 0, Rating: 0


XRT 1.4.0 released

Posted: 27 May 2012 09:35
Tags: downloads

Full steam !!
Image Unavailable
The gratifying vision of a computer fully dedicated to raytracing !

This release is my first attempt at multithreaded rendering. Building upon work done for version 1.2, XRT now fires rays in a parallel fashion. Really, the algorithm is a no-brainer: the whole image is divided in small tiles stored in a work queue. While the queue is not empty, each rendering thread picks up a new tile and computes its pixels.

The tricky part is to make sure that the code is thread-safe. Namely, global ressources are evil things. Each thread must be granted exclusive write access while others are waiting for read access; otherwise, bad things happen.

There are only two solutions:

  • protect the global ressources against shared access using atomics, mutexes, … Just be aware that synchronisation primitives have an intrinsic run-time cost and that the more threads wait, the less efficient the program becomes.
  • make sure that each thread has its own copy of the data (re-entrancy is the buzzword here).

So, do not (yet) expect wonders. Synchronisation between threads on XRT is taking its toll and I have noticed that rendering times do not scale well with the number of processors. With 4 threads, I get only a 2.5x speed increase. I am looking at it.

As a side note, OIIO has also been upgraded to version 1.0.4. Except for a slight modification for XP, this is the genuine version.

The list of changes is fully detailed in the change log.

This version and the updated documentation are available in the Downloads page.

Comments: 0, Rating: 0


XRT 1.3.1 released

Posted: 10 Mar 2012 19:04
Tags: downloads

reflect.jpg

The major change in this release is the upgrade to OIIO 1.0. Be aware that the version bundled with XRT differs slightly from the genuine 1.0 version. It fixes a problem with the maketx utility (to be commited soon to the GitHub master) and compatibility with XP has been restored (yes, I still have an XP box!).

Environment mapping is now working as demonstrated on the right. Previously, a call to environment() in a shader would always fire rays. Now, depending on the argument, it will also sample a texture. The RIB file for this picture is included in the XRT examples archive. I have also updated a few examples to account for the change of behaviour and restore raytracing where needed.

I have made some changes to XRT C++ API. In the original Gelato specification, Input only accepts parameters through a single string which has to be parsed. This is not very flexible and akin to reinventing the wheel. Actually, using "user attributes" to pass parameters is much easier: the Generator calls GetAttribute to get values set before the Input call with Attribute. The only issue is that attributes are persistent and can possibly interfere with other Generators. You are safe if you use PushAttributes/PopAttributes to keep the "user attributes" stack clean. There has to be a better way.

Things get much simpler if Input behaves like Camera, Output, Shader, Light, or any geometric primitive: all calls to Parameter are saved into a “pending parameter” list. This list is passed to the Generator constructor and is cleared by XRT afterwards. This allows for a greater flexibility and a greater consistency.

Finally, a very annoying bug has been fixed: sometimes, rendering was freezing on startup.

As usual, this version and the updated documentation are available in the Downloads page.

For a complete list of changes, see the change log.

Comments: 0, Rating: 0


LuxRays

Posted: 03 Feb 2012 14:10
Tags: gallery

Some additions to the LuxRays gallery

Sponza Atrium
Image Unavailable
rendered with ambient occlusion

Loft
Image Unavailable
loft of area lights

LuxBall
Image Unavailable

Comments: 0, Rating: 0


XRT 1.3.0 released

Posted: 03 Feb 2012 13:55
Tags: downloads

oiio-logo-no-alpha.png

OpenImageIO or OIIO (an open source project managed by Larry Gritz, a living legend of the CGI industry) is a library for reading and writing images that is format agnostic — that is, a "client app" doesn't need to know the details about any particular image file formats. Specific formats are managed by DLL/DSO plugins. The list of plugins is already rather impressive and growing (TIFF, JPEG/JFIF, OpenEXR, PNG, HDR/RGBE, Targa, JPEG-2000, DPX, Cineon, FITS, BMP, ICO, RMan Zfile, Softimage PIC, DDS, SGI, PNM/PPM/PGM/PBM, Field3d, WebP). Additionally, a TextureSystem class provides filtered MIP-map texture lookups. Texture reads are handled through a cache mechanism which performs access to vast amounts of image data using only a tiny amount (tens of megabytes at most) of runtime memory.

These features make OIIO a very attractive component to include in any renderer, including mine. Actually, because its design is an improved version of the Gelato specification, it has been quite straightforward to replace XRT image I/O plugins and texturing system. The result is available with this new version in the Downloads page.

Frankly speaking, aside from a much wider access to image file formats, it does not improve XRT a lot for now. Further work is needed to fully take advantage of all enhancements OIIO provides. It has been also an opportunity to remove some dust in the interfaces, to clean up and to complete parts of the implementation. Once again, this looks like "invisible" work for the end user. Visible improvements are left for future releases.

For a complete list of changes, see the change log.

Comments: 0, Rating: 0


Surge or slump ?

Posted: 23 Dec 2011 12:42
Tags: examples

I have slightly expanded the XRT examples archive.

  • The source for the global illumination example seen here is now included.
  • The Menger Sponge procedural example has been updated.
  • For those puzzled by fractals, I offer this alternative:

Power charge
Image Unavailable

Pills
Image Unavailable

Comments: 0, Rating: 0


Two CGKit animations

Posted: 03 Dec 2011 09:34
Tags: animation cgkit

Last month, I have stumbled upon two animations from Roger Stuckey made with CGKit. CGKit is a really neat collection of Python modules dedicated to 3D graphics. Amongst a plethora of features, you can run rigid body dynamics simulations and export them to RenderMan RIB format. Here they are, rendered with 256 ambient occlusion samples (a rather high number required to avoid noise flickering across frames). I have tweaked a bit the original scripts mainly to reduce the size ot the generated RIBs (instead of a 500+ MB file, the teapots animation RIB is now less than 3 MB). If you want to pay a look, here are links to the files: pyode_render_ex3.py for the blocks animation and pyode_render_ex4.py for the teapots animation.

Falling Blocks

Teapots

Comments: 0, Rating: 0


It's a wonderful world ...

Posted: 04 Nov 2011 23:59
Tags: gallery hair

wStraight.jpg

While browsing the Internet for free models, the harvest has been very good and I wish to thank all these people who kindly made all this data available.

New eye candies

I have created new galleries for all of them:

  • The Utah 3D Animation Repository (here)
  • Deformation Transfer Data (here)
  • Hair Model Files (here)
  • McGuire GraphicsData (here)

Some of them are far from being complete (especially McGuire GraphicsData gallery). Properly setting up cameras and lights is awfully time consuming, and XRT runs out of memory when scenes contain more than 1.5 million primitives.

Hair raising (literaly)

I also spent quite some time on the Hair Models gallery. Rendering curves is not trivial but shading them so that they look like real hair is really challenging. I started with an old Renderman shader from the 1999 Siggraph course "Advanced renderman: Beyond the Companion" but it did not really cut it. A couple of Google searches later, I knew that my shader was based on a Kay-Kajiya hair model [1] which has been superseded by the Marschner hair model [2]. Not being discouraged by hairy formulas, I found a Marschner shader here itself based on an open source implementation from the Cortex project.

All pictures of the gallery are based on this shader, slightly extended: to account for hair self-shadowing, all shading calculations are modulated by ambient occlusion which I believe greatly improves realism. Nevertheless, it is still very far from what you can see in Disney's "Tangled" for instance. The state of the art seems to be Zinke hair model [3] from which Sadeghi derived the Renderman shader [4] used in that movie. I have not yet found enough courage to dig the formulas …

Bibliography
1. Kajiya J. et al., Rendering fur with three dimensional textures (1989)
2. Marschner S. et al., Light scattering from human hair fibers (2003)
3. Zinke A. et al., Dual Scattering Approximation for Fast Multiple Scattering in Hair (2008)
4. Sadeghi I. et al., Efficient Implementation of the Dual Scattering Model in RenderMan (2010)

Comments: 0, Rating: 0


XRT 1.2.0 released

Posted: 02 Oct 2011 13:22
Tags: downloads

sponza.topview.jpg

Apart from the improved BVH accelerator highlighted in my previous post, there are two other important changes in this version:

  • the code is now compiled with Visual C++ 2010. That affects the list of dependencies (listed in the Downloads page) required to run XRT.
  • a new OBJ generator imports geometry and materials from the Alias/Wavefront object format. There are still some rough edges (especially, I need to extend the list of supported illumination models) but I can already render nice scenes such as the Crytek Sponza Atrium or the Ajax model from Jotero. Given the wealth of OBJ models available on the Internet, the gallery will get even bigger!

For a complete list of changes, see the change log.

Post scriptum

The BVH accelerator code makes heavy use of SSE4 instructions. As a result, it will bomb on older processors that do not support this instruction set. If this is a problem for anyone, I can still provide an SSE4-less accelerator.

Comments: 0, Rating: 0


To SAH or not to SAH ?

Posted: 16 Sep 2011 19:39
Tags: bvh embree sah

ajax.jpg

This post should have dealt with Catmull-Clark subdivision mesh texturing but this development is not yet completed.

The "culprit" is Intel which has recently released Embree, an open-source renderer. While it's a pretty decent renderer in its own right, do not expect fancy shaders, motion blur, or high quality texturing: its main purpose is to demonstrate how to efficiently build and traverse ray tracing acceleration structures on Intel architectures in parallel fashion. In this area, it fares very well.

Embree features four different acceleration structures based on BVH (bounding volume hierarchies): two different object partitionings (SAH1 and split BVH2) times two branching factors (the number of children per node in the tree) for the BVH (2 and 4). To complete the comparison, I have added a median cut object partitioning which makes for a total of six different acceleration structures. The benchmarks are unambiguous: median cut is slower than SAH which in turn is slower than split BVH and BVH4 is faster than BVH2 whatever is the object partitioning.

Therefore, I have decided to remove the dust from my BVH accelerator (based on a median cut object partitionning) by reusing Embree software. The results are somewhat mixed: overall rendering times are decreased (the larger the numbers of primitives, the larger the gains), BVH4 is slightly faster than BVH2 but median cut is faster than SAH which is faster than split BVH. That completely contradicts available computer graphics litterature and means there is something fishy.

To be honest, I am not so surprised: I had already tried my luck with my own SAH BVH implementation with the same results. The difference is that I had concluded that I was to blame and that I had better to wait for a model implementation before drawing any conclusion. Now I can say the problem is not the implementation.

A possible explanation is that the traversal and intersection costs for my architecture have the wrong values which leads to wrong decisions and a low quality hierarchy. Embree deals only with triangles which it intersect four at a time using SSE. XRT handles a large array of primitives types (some of them are really expensive to intersect) and intersects them one at a time using plain x86 code. However, I get the same result with triangles only scenes which makes me doubt it is the right reason. This requires further testing.

In the mean time, the tree traversal is much faster than before and this new BVH implementation will supersede the previous.

Comments: 0, Rating: 0


Catmull-Clark subdivision surfaces: holes and boundaries

Posted: 12 Jun 2011 10:37
Tags: catmull clark subdivision surface

As promised, this post describes the remaining features in Catmull-Clark subdivision surfaces: holes and boundaries. This is going to be a lot more straightforward than the previous posts

Holes

A hole is a face or a group of faces that will not generate geometry to be drawn. You could say, why bother and specify data that will not be seen? Remember that a face depends on its 1-neighbourhood. Therefore, when a subdivision iteration occurs, a face next to a hole is influenced by it even if it is not to be seen.

Putting it all together
Image Unavailable
crease, hole and two corners

Boundaries

So far, so good but for faces abutting the boundary of an open mesh, their 1-neighbourhood is not complete and smooth subdivision rules as defined in this post do not apply. There are three possible interpolation strategies to handle this particular case:

  • no interpolation: boundary faces are treated like holes and will not be rendered
  • edge interpolation: boundary edges are tagged as infinitely sharp creases (which do not require a 1-neighbourhood for subdivision). All boundary vertices become implicitely crease vertices.
  • full interpolation: boundary edges are tagged as infinitely sharp creases and boundary vertices of valence 2 (in general, these correspond to geometric corners) are tagged as infinitely sharp corners. If you have abutting control meshes, you get abutting limit surfaces but they are only C0-continuous.

Let's have a quick example to illustrate all of these:

No interpolation
Image Unavailable

Edge interpolation
Image Unavailable

Full interpolation
Image Unavailable

Not yet ready …

This ends the series of posts on Catmull-Clark but it does not mean I am ready for delivery. Although I have made good progress with the implementation, there are still rough edges to be ironed out, the biggest one being the lack of texturing support for this primitive.

Comments: 2, Rating: 0


Gallery slideshow

Posted: 09 Jun 2011 21:14
Tags: gallery

As the gallery is growing, there are more and more pages to browse. If you just want to skim rapidly through all pictures, there is now a slideshow available in the main Gallery page. For good measure, a thumbnail version of it is also available on the Blog page to wet the appetite of any visitor. Kudos to Wikidot for making embedding gadgets so easy !

Comments: 0, Rating: 0


Catmull-Clark subdivision surfaces: corners and creases

Posted: 28 May 2011 12:41
Tags: catmull clark subdivision surface

This post delves a bit further into the realm of subdivision surfaces. Building upon the lore gathered in the previous post, it describes extensions to the Catmull-Clark scheme for modeling fillets and blends.

Sharp features

Because smooth only surfaces are of limited use, Pixar people have introduced sharp features into Catmull-Clark subdivision surfaces[1]: corners and creases. A few more rules (let's call them sharp rules) have been added to the smooth subdivision rules.

If you remember my previous post, the subdivision produces three types of vertices:

  • face vertex. The vertex uses always the smooth rule
  • edge vertex. For each edge tagged as sharp, the new vertex is the average of the edge's endpoints. The new sub-edges are also tagged as sharp. The other edges use the smooth rule.
  • control vertex. For each control vertex of the mesh, the vertex is moved to a new location that depends on the number of sharp edges incident at the vertex.
    • If this number is less than 2, the vertex uses the smooth rule.
    • If the number equals 2, the vertex uses the crease vertex rule: the new vertex is a weighted average of the old vertex location (3/4) and of the two other endpoints of the incident creases (1/8).
    • If the number is more than 2, the vertex uses the corner rule: the vertex does not move under subdivision.
    • Of course, if a vertex is tagged as sharp, it uses the corner rule whatever is the number of incident sharp edges.

Here is a small example of how a surface changes, just by tagging edges or vertices. The control mesh is the same for each picture.

Smooth surface
Image Unavailable
wireframe control mesh, where smooth edges are yellow

Creases
Image Unavailable
creases edges are red

Corner
Image Unavailable
the upper right vertex is a corner

In this example, four edges meet at the same vertex, implicitely making a corner out of it.

Implicit corner
Image Unavailable

Semi-sharp features

In the real world, surfaces are never infinitely sharp. Anything when viewed sufficiently closely is smooth. Another refinement of the subdivision rules is needed to obtain semi-sharp features. For that purpose, sharp edges and corners are weighted by a sharpness parameter. This parameter is decreased by 1 at each iteration of the subdivision process and controls which rules are applied to sharp features.
* If the sharpness is greater or equal to 1, sharp subdivsion rules are applied
* If the sharpness is between 0 and 1, the result is a linear interpolation of the smooth rule and of the sharp rule using the sharpness value.
* If the sharpness is lower or equal to 0, smooth subdivision rules are applied.

This is a very intuitive mechanism which behaves like levels of detail. At coarser levels of the subdivision, features are sharp and become smooth at finer levels.

Here is how the smooth surface of the first paragraph is affected by various sharpness values.

Image Unavailable
sharpness = 1.0

Image Unavailable
sharpness = 2.0

Image Unavailable
sharpness = 3.0

In the following animation, the sharpness varies between 0 and 10 with a 0.1 step.

Notice that a crease does not need to be not a closed contour. In the next example, only three of the four upper edges are tagged as creases with a sharpness of 4.0.

Open crease
Image Unavailable

More to come

We are not yet done with subdivision surfaces. The next post will describe how to deal with holes in the control mesh and boundaries for open meshes.

Bibliography
1. Tony Derose, Michael Kass, and Tien Truong. "Subdivision surfaces in character animation." In Proceedings of the SIGGRAPH 1999 annual conference on Computer graphics, pages 85-94, Los Angeles, CA, USA,August 1999.

Comments: 2, Rating: 0


Catmull-Clark subdivision surfaces: the basics

Posted: 04 May 2011 21:21
Tags: catmull clark subdivision surface

Introduction

Subdiv.jpg

The basic idea of subdivision surface is to construct a surface from an arbitrary polyhedron (the control mesh) by repeatedly subdividing each of the faces. If the subdivision is done appropriately, the limit of this subdivision process will be a smooth surface. Many subdivision schemes (or set of rules) have been defined over the years (Loop, Doo-Sabin, …) but Catmull-Clark scheme [1] and its extensions [2] are by far the most popular.

Why do modelers prefer subdivision surfaces over NURBS ?

A NURBS surface, like any other parametric surface, is limited to representing surfaces which are topologically equivalent to a sheet, a cylinder or a torus. Because the control mesh is rectangular, refining a NURBS affects the whole surface. By contrast, subdivision surfaces can be refined locally by adding points to the control mesh, giving more freedom to the modeler. Therefore, support for this primitive is now ubiquitous in all high-end renderers.

Catmull-Clark scheme

This subdivision produces three types of vertices:

  • face vertex. For each face, the new vertex is the average of the face's vertices
  • edge vertex. For each edge, the new vertex is the average of the edge's endpoints and the new face vertices of the two faces that share the edge.
  • control vertex. For each control vertex of the mesh, the vertex is moved to a new location that is (n-2)/n times the old vertex location, plus 1/n times the average of the n adjacent new edge vertex positions, plus 1/n times the average of the n adjacent new face vertex positions, where n is the valence of the vertex1.

For each face, new faces are created by connecting these new vertices as follows: each new face vertex is connected to the new edge vertices of the boundary of the face; each new control vertex is connected to the new edge vertices created for the edges incident with the old control vertex.

These rules are valid only for closed meshes (which have no boundaries). They must be extended for open meshes. This will be the subject for another post.

A few nice properties

This looks complicated and really is. Studying how these rules work, a number of properties can be observed:

  • For Catmull-Clark subdivision surfaces, each face has an associated limit surface patch that is completely defined by the face and its 1-neighborhood. That means a subdivision surface can be broken down into individual components that are rendered separately, just like a polygon mesh can be rendered as individual polygons.
  • Each new face is rectangular with two edge vertices, one control vertex and one face vertex.
  • The face vertices corresponding to the original non rectangular faces are extraordinary vertices2.
  • The new control vertices keep their valence.
  • Since there are only rectangular faces after the first iteration, the number of extraordinary vertices remain constant and there is at most one extraordinary vertex per face from the second iteration onward .
  • When subdividing an irregular patch3, the result is another irregular patch whose extraordinary vertex has the same valence and three regular patches.

Rendering

Rendering subdivision surfaces is difficult: it is not too hard to implement subdivision rules but the limit surface has no analytical expression and the control mesh can have any arbitrary topology. There are different schools of thought: some tesselate away the control mesh using subdivision rules until the resulting polygons are small enough at the expense of considerable memory usage (at each iteration, a Catmull-Clark scheme quadruples the number of polygons), others try to approximate the surface using bicubic patches.

XRT implements a third approach advocated in [3]. The subdivision is performed on the fly with the required precision which saves memory and avoids artifacts but increases rendering times.

Algorithm outline

The algorithm makes use of the Catmull-Clark scheme properties to simplify the topology of the control mesh. It performs one or two levels of subdivision to obtain a control mesh made of quads with at most one extraordinary vertex (this cuts down drastically the number of special cases) and breaks down the mesh into individual faces with their 1-neighborhood. In most cases, these are regular patches with 16 control points. During the intersection calculation, if the ray misses the patch bounding box, the patch is discarded, otherwise it is subdivided until it looks flat or small enough. The face can then be safely approximated as a bilinear patch which is checked for intersection.

Holger Dammertz's thesis is rather sketchy and only details regular patch subdivision. Nevertheless, the algorithm can be extended to handle open meshes with corners and creases.

Conclusion

This post has been rather long and technical. It was needed to setup the foundations. Next posts will describe the incremental changes required for a full fledged implementation of Catmull-Clark subdivision surfaces.

For now, XRT only supports closed meshes: open meshes require a set of additional rules which are being implemented. Many thanks to Kevin Beason for putting its subdivision software in the public domain. It has been a great starting point to build upon.

Bibliography
1. E. Catmull and J. Clark. "Recursively generated B-spline surfaces on arbitrary topological meshes." Computer Aided Design,10(6):350–355, 1978.
2. Tony Derose, Michael Kass, and Tien Truong. "Subdivision surfaces in character animation." In Proceedings of the SIGGRAPH 1999 annual conference on Computer graphics, pages 85-94, Los Angeles, CA, USA,August 1999.
3. Holger Dammertz, "Floating-Point Precision Ray Tracing of Free-Form Surfaces", Diplomarbeit, 2005

Comments: 0, Rating: 0


XRT 1.1.1 released

Posted: 13 Feb 2011 13:31
Tags: downloads gallery luxrays

classroom.jpg

There is nothing really striking in this new release, this is just a collection of bug fixes and improvements that I have brought to XRT while trying to render the classroom scene (for a larger rendering look into the LuxRays gallery) and a bunch of other scenes from Brigham Young University (BYU) CS655 Courseware (gallery here).

Here is a quick summary (for a complete list, see the change log):

  • improved sampling for area lights and distributed specular rays. My previous post explains in detail the ins and outs of proper sample generation.
  • improved ply generator. Depending on a user attribute (int user:plyGenerateNormals), it is now possible to average normals on vertices of a PLY mesh. This is mainly useful to give a rounded appearance to curved objects that have been tesselated (for instance, the chairs in the classroom).
  • deprecated LookAt call. It was not orthogonal with Camera and World calls and I have never able to come up with a clear and concise explanation on how to use it. These are genuine signs of a flawed design. When I finally figured out that, in most cases, I could do the same thing with Python, it was just about time to remove it. The StructureSynth exporter and a few other generators have been overhauled to take this change into account. There is now a tiny script in $(XRT_HOME)/inputs, lookat.py, that takes care of computing the viewing transformation. A nice side effect is that, now, the StructureSynth exporter is compatible with Gelato.

Comments: 0, Rating: 0


Bring da noise down

Posted: 08 Jan 2011 22:16
Tags: sampling

Computing a picture is basically sampling the world to reconstruct a signal (the color information). The question is: how many samples do you need ? Claude Shannon, in 1949, replied: "A signal can be reconstructed exactly if it is sampled, at least, at twice its maximum frequency". This minimum sampling frequency is called the Nyquist frequency. What happens if your sampling frequency is below that threshold ? You get geometrical, color or temporal aliasing (jaggies, Moiré patterns, wagon wheel effect) which is visually disturbing.

In the real world, geometric details range from kilometers to sub millimeters and light comes from all directions, there are abrupt geometric or color transitions which set the Nyquist frequency to very high values. Theoretically, you would need zillions of samples to remove aliasing. However, if you consider that your eye has even less sensors than the CCD of the average cellphone camera but still never ever aliases, it seems like Mother Nature has found a good solution. In 1983, Yellott had a decisive interview with a monkey [1].

Monkey eye cone distribution
Image Unavailable

At first sight, the cones only look like random dots. However, the frequency analysis of the spatial distribution is much more revealing.

Fourier transform
Image Unavailable

This is a blue noise spectrum (or Poisson disk distribution) which has two interesting properties:

  • there is no low frequency energy (the empty ring around the spike): there is always a minimum distance between samples.
  • there is an equal amount of energy in higher frequencies (the white ring): samples are noisy but uniformly distributed

Thanks to its cone distribution, our eye converts aliasing energy into high frequency noise for which our visual system is less sensitive. You cannot remove aliasing but you can mitigate its impact on the quality of the final image. Therefore, a good sample distribution should strive to mimic a Poisson disk distribution pattern.

However, generating a Poisson distribution in a efficient manner, with a accurate control of the number of samples is still an open problem. There are only approximating distributions available. Armed with these bits of theoretical knowledge, we can now explain the behaviour of various sample distributions1.

A uniform distribution satisfies the minimum distance requirement but fails to distribute evenly the aliasing energy. To make the matter worse, it concentrates it in a coherent manner producing repeating patterns. Conversely, a random distribution distributes evenly the aliasing energy but fails to respect the minimum distance requirement: this results in low frequency noise which is distracting for the eye.
A stratified jittered distribution does a better job with respect to the minimum distance requirement but does not completely fullfill it: there is no way to avoid that two samples from adjacent strata do not come too close. Most of low frequency noise is eliminated but not all. Finally, low-discrepancy2 distributions try to obtain stratified samples while avoiding clumps. They exhibit the best results.

There are many low-discrepancy distributions (Halton, Hammersley, Van Der Corput, Sobol', …). The next XRT release will use (0,2) sequence3 to sample things like area lights, soft shadows, glossiness or translucency.

As a concrete example, let's compare how an area light shadow is rendered when sampled with a random distribution and with a (0,2) sequence.

Random sampling
Image Unavailable
rendered with random sampling

Low discrepancy sampling
Image Unavailable
rendered with low discrepancy sampling

The area light is sampled with only 4 samples to highlight sampling efficiency problems. Even without zooming the pictures, it is obvious that the shadows that surround the dresser are noisier with the random sampler. Quality matters !

Bibliography
1. Yellott, John I. Jr., "Spectral Consequences of Photoreceptor Sampling in the Rhesus Retina." Science, volume 221, pp. 382-385, 1983.

Comments: 0, Rating: 0


Misc gallery updates

Posted: 24 Dec 2010 15:05
Tags: gallery

Sunflower.jpg

With the latest XRT release, not only ambient occlusion is a lot faster thanks to caching, but the image quality has improved due to the better statistical properties of the samples distribution. Visually speaking, a jittered stratified distribution requires approximately half the number of samples compared to a purely random distribution for the same result.

This is best examplified in the Structure Synth gallery which has been extended and updated.

Illum2%20-%201%20bounce.jpg

I have also started a new gallery to illustrate color bleeding. Have a look here.

The BMRT gallery layout have been revised and a few examples have been updated.

Comments: 0, Rating: 0


XRT 1.1.0 released

Posted: 08 Dec 2010 22:47
Tags: caching downloads irradiance

indirect.jpg

XRT 1.1.0 major new feature is the support of indirect lighting between diffuse surfaces (also called color bleeding). Indirect lighting is the phenomenon in which objects or surfaces are colored by reflection of light from nearby surfaces. For instance, a red carpet next to a white wall gives a pink tint to the wall.

Computing indirect lighting is really simple: sample the hemisphere centered on the intersection point and oriented according to the surface normal by firing rays, shade each intersection and average the results. Because sampling is a random process, it produces noise which can only be smoothed out by taking many samples: 256 samples at a single location is a common number. Even worse, shading the intersected surfaces may as well spawn a new batch of rays for reflection, refraction, shadows or … indirect lighting, and so on recursively. It only stops when a ray reaches a light source or a perfectly diffuse surface. One can easily figure out that the number of rays can grow out of control.

Fortunately, there are a number of techniques to limit this exponential growth. One of them, irradiance caching, is based on the fact that diffuse lighting varies much more slowly than specular lighting and is a prevalent effect for most scenes (yes, a swimming pool is a perfect counter example). Therefore, there is no need to spawn specular rays when shading a surface hit by a diffuse ray and it is not necessary to compute diffuse lighting at every hit point: interpolating between results from sparsely distributed locations is enough. For that purpose, you need a caching mechanism.

XRT implementation of irradiance caching is based on "Practical Global Illumination with Irradiance Caching" from Jaroslav Krivánek and Pascal Gautron, available here, with (nearly) all bells and whistles1. Some restrictions apply:

  • neighbour clamping is implemented but disabled because it is currently much too slow.
  • last query reuse is not implemented
  • multiple bounces caching is not implemented
  • motion blur is not handled

Finally, XRT is now linked against libtcmalloc, Google replacement for Microsoft memory allocator (link). Without changing a single line of code, I have seen an average 20% performance improvement on rendering times. Not too bad …

As usual, the goods are available in the Downloads section.

Comments: 0, Rating: 0


Geek gift or Greek gift?

Posted: 28 Aug 2010 14:06
Tags: implicit surface

chmutov97.jpg

Most renderers deal only with a very limited subset of primitives: points, curves, polygons, nurbs, subdivision surfaces. Beyond this tiny set, there exists an almost ignored mathematical realm: implicit surfaces.

An implicit surface in 3D is defined as the set of solutions of an equation $f(x,y,z) = 0$ where $f$ can be any algebraic or non algebraic expression. It is called implicit because you cannot compute explicitely points on the surface: you have to solve for the equation first. The fact that $f$ can be pretty much anything you want lets you guess the variety of shapes that can be explored. As you can see in the implicit surfaces section of the gallery, most of them are of little practical use but have real aesthetic value.

Intersection algorithm

XRT implicit surface plugin implementation is based on a publication by Knoll et al. [1] where the inclusion property of interval arithmetic is used to define an efficient rejection algorithm briefly summarized below1.

A fundamental theorem of interval arithmetic states that, for any function f defined by an arithmetical expression, the corresponding interval evaluation function F is an inclusion function of f. In other words, if you define X, Y, Z as three intervals containing respectively the bounding values for x, y, z and evaluate $f(X, Y, Z)$ using interval arithmetic, the resulting interval is guaranteed to contain all possible values of $f(x, y, z)$ for any combination of (x, y, z) within the bounding box defined by (X, Y, Z). Using bounding boxes defined by ray segments and a recursive bisection algorithm (a binary search), it is possible to compute an intersection:

  • if a interval evaluation does not contain 0, the input ray segment is discarded.
  • if it does, the input ray segment is subdivided in two halves which are in turn evaluated.

When an interval evaluation contains 0 and the desired precision is reached, an intersection is returned. If no interval evaluation contains 0, there is no intersection.

XRT implementation is using GOAL for interval arithmetic computations.

Usage

Shape( "implicit") instantiates the "implicit" shape plugin and creates a new implicit surface. Accepted parameters are:

  • "string name": the name of the surface. This parameter is required.
  • "string function": the arithmetic expression in C code that defines the implicit surface. Using this C code, the "implicit" plugin generates a function plugin that provides a floating point evaluation function and an interval evaluation function. This parameter is required.
  • "float[6] bounds": the domain [xmin, xmax, ymin, ymax, zmin, zmax] where the surface is defined. This parameter is required because most implicit surfaces have an infinite domain. It also serves as a starting point for the bisection algorithm.
  • "float precision": the bisection algorithm stops recursing when an interval containing 0 with this width is met. This parameter is optional. The default value is 0.001 and is small enough for most functions.

Pyg format example:

Shape("implicit", "string name", "dingdong", 
    "string function", "sqr(x) + sqr(y) + z * (1 - sqr(z))", 
    "float[6] bounds", (-3, 3, -3, 3, -3, 3))

defines a implicit surface "dingdong" whose equation is $x^2 + y^2 + z(1 -z^2) = 0$ over the domain [ -3, 3 -3, 3 -3, 3].

This surface (and many more) is available in the implicit surfaces section of the gallery.

Optimizing render times

There are two main rules to decrease render times:

  • provide tight bounding boxes. Interval evaluation is a costly operation. Computing these evaluations for void space wastes CPU cycles.
  • provide optimized code for function evaluation. Minimizing the number of operations when evaluating a function interval is a key point which will yield huge savings:
    1. Try to factorize expressions.
    2. Remember that pow() is a very expensive operation and that sqr() or * are not.
    3. Use Horner's rule to rewrite polynomial expressions.

Let's provide two significant examples with Chmutov surfaces:

  • The Chmutov surface of order 6 is defined by the equation $T{_6}(x)+T{_6}(y)+T{_6}(z)=0$ where $T{_6}(x) = 32x^6-48x^4+18x^2-1$. Using rule 1, this expression can be rewritten as $T{_6}(x) = 2x^2(3-4x^2)^2-1$.
  • The Chmutov surface of order 7 is defined by the equation $T{_7}(x)+T{_7}(y)+T{_7}(z)+1=0$ where $T{_7}(x) = 64x^7-112x^5 +56x^3-7x$. Combining rules 2 and 3, this expression can be rewritten as $T{_7}(x) = x(x^2 (x^2(64x^2-112)+56)-7)$.

In both cases, rendering times with these optimized expressions are decreased by an order of magnitude over "naive" expressions.

Requirements

To generate a function plugin, you need to have a C++ compiler installed (see Downloads section for more details). However, the implicit shape plugin will not rebuild an already generated function plugin.

Bibliography
1. A. Knoll, Y. Hijazi, C. D. Hansen, I. Wald, and H. Hagen. "Interactive ray tracing of arbitrary implicit functions". In Proceedings of the 2007 Eurographics/IEEE Symposium on Interactive Ray Tracing, 2007.
2. Jag Mohan Singh and P. J. Narayanan, "Real-Time Ray-Tracing of Implicit Surfaces on the GPU". Tech Rep. IIIT/TR/2007/72. July 2007

Rendering implicit surfaces is a bit addictive. Internet ressources are so numerous that you end up filling your hard drive with pictures, rising your CPU to insane temperatures and depriving you from sleep without ever paying attention.

"Abandon all hope, ye who enter here."

Comments: 0, Rating: 0