Shaders often need to store data persistently. Init shaders may construct tables, preprocess parameters, or prepare other data that is common to all future shader calls and can be done once, in advance, instead of repeating the computation for every shader call. This can improve performance significantly if a shader is called millions of times for rendering a frame. There are three classes of storage: constant data, per-instance data, and per-thread data.
If the data is constant, such as a random number lattice for Perlin noise functions, it can be stored in a static array that is built in the init shader and deleted, if necessary, in the exit shader.
static struct mystruct *mydata;

DLLEXPORT miBoolean myshader_init(       /* init shader */
    miState         *state,
    struct myshader *paras,
    miBoolean       *inst_req)
{
    mydata = mi_mem_allocate(...);
    ...initialize *mydata...
    return(miTRUE);
}

DLLEXPORT miBoolean myshader_exit(       /* exit shader */
    miState         *state,
    struct myshader *paras)
{
    mi_mem_release(mydata);
    return(miTRUE);
}

DLLEXPORT miBoolean myshader(            /* main shader */
    miState         *state,
    struct myshader *paras)
{
    ...read-only access to *mydata...
    return(miTRUE);
}
This form of data storage is very simple, but it can be used only if the data is constant for the current frame and does not depend in any way on shader parameters or the shader instance. In particular, the shader body must not modify the data, because the order in which shaders are called is unpredictable. Worse, the shader may be called in multiple threads simultaneously, so writes to the data could become interleaved in ways that are very hard to reproduce and debug. Locking is not a solution because it would force shaders to "wait in line" for access to the data, which seriously degrades parallel performance.
If the data is separate for each shader instance, it can be created in the init shader and attached to the shader instance itself. This is the recommended method for preprocessing shader parameters into a form that simplifies later shader calls during rendering. For example, if a shader parameter specifies an angle in radians or degrees, converting the angle to the cosine of the angle once in advance can save millions of expensive cosine computations in the main body of the shader. Here is an example:
DLLEXPORT miBoolean myshader_init(       /* init shader */
    miState         *state,
    struct myshader *paras,
    miBoolean       *inst_req)
{
    if (!paras)
        *inst_req = miTRUE;
    else {
        struct mystruct **mydata;
        mi_query(miQ_FUNC_USERPTR, state, 0, &mydata);
        *mydata = mi_mem_allocate(...);
        (*mydata)->cos_angle = cos(*mi_eval_scalar(&paras->angle));
        ...
    }
    return(miTRUE);
}

DLLEXPORT miBoolean myshader_exit(       /* exit shader */
    miState         *state,
    struct myshader *paras)
{
    if (paras) {
        struct mystruct **mydata;
        mi_query(miQ_FUNC_USERPTR, state, 0, &mydata);
        mi_mem_release(*mydata);
    }
    return(miTRUE);
}

DLLEXPORT miBoolean myshader(            /* main shader */
    miState         *state,
    struct myshader *paras)
{
    struct mystruct **mydata;
    mi_query(miQ_FUNC_USERPTR, state, 0, &mydata);
    ...use (*mydata)->cos_angle...
    return(miTRUE);
}
Note that the init shader first requests instance initializations by setting *inst_req to miTRUE, and then creates the instance data during each instance initialization, which is indicated by a nonzero paras pointer. (Refer to page for information on the difference between shader initialization and shader instance initialization.) Similarly, the exit shader deletes the data during instance exit.
Note that this relies on the angle shader parameter being constant. It is evaluated only once with mi_eval_scalar, so this parameter cannot be attached to another shader that computes a new angle for each shader call. Since that is often desirable, the example above is a bit contrived.
Sometimes shaders need to share data, but also need to modify it. Sharing alone would allow using the first method for constant data, but that does not permit changing the data during rendering because no consistent writes are possible: multiple instances of the shader in different threads may write to the same data simultaneously. This can corrupt even simple operations such as incrementing a variable, because the execution order is unpredictable. The sequence read_A write_A read_B write_B in threads A and B works, but there is no way to prevent the sequence read_A read_B write_A write_B. This problem is called a race condition, and it causes one increment to be lost, in rare and hard-to-debug cases. Locking would prevent that but may cause an unacceptable performance loss.
mental ray 2.1 allowed solving this problem by allocating an array with one member per thread in the init shader. The number of threads could be obtained by calling mi_par_nthreads, and it was guaranteed that no thread with a thread number state->thread outside that range would ever call a shader. However, mental ray 3.x no longer makes this guarantee; the number of threads may change at any time, so mi_par_nthreads is deprecated. It is still available but always returns 65, which may allow unported shaders to limp along on hosts with few (say, up to 16) CPUs. This means that shaders have to implement a hashing scheme to use the array method in mental ray 3.x. To simplify this, mental ray 3.1 introduces a standard mechanism for thread-local storage.
Thread-local storage avoids the race condition by providing one copy of the data to each thread. Multiple threads can execute simultaneously, but within any single thread execution is strictly sequential, so the read/write race condition cannot happen. Here is an example that counts shader calls:
DLLEXPORT miBoolean myshader(            /* main shader */
    miState         *state,
    struct myshader *paras)
{
    int *counter;
    mi_query(miQ_FUNC_TLS_GET, state, miNULLTAG, &counter);
    if (!counter) {
        counter = mi_mem_allocate(sizeof(int));
        mi_query(miQ_FUNC_TLS_SET, state, miNULLTAG, &counter);
        *counter = 0;
    }
    (*counter)++;
    ...
    return(miTRUE);
}

DLLEXPORT miBoolean myshader_init(       /* init shader */
    miState         *state,
    struct myshader *paras,
    miBoolean       *init_req)
{
    *init_req = miTRUE;
    return(miTRUE);
}

DLLEXPORT miBoolean myshader_exit(       /* exit shader */
    miState         *state,
    struct myshader *paras)
{
    int **counters;
    int num, i, total = 0;
    if (!paras)
        return(miTRUE);
    mi_query(miQ_FUNC_TLS_GETALL, state, miNULLTAG, &counters, &num);
    for (i=0; i < num; i++) {
        total += *counters[i];
        mi_mem_release(counters[i]);
    }
    mi_info("myshader was called %d times", total);
    return(miTRUE);
}
The thread-local data is a single integer that counts shader calls in this thread. Since init shaders are called once per shader or once per shader instance, but not once every time the shader is called in a new thread, the data cannot be installed and initialized in the init shader. Instead, it is created in the main body if it did not already exist. This is safe because no two threads will get the same pointer returned by miQ_FUNC_TLS_GET. (Note that setting *counter to zero is actually redundant because mi_mem_allocate always returns zeroed memory.)
The example exit shader collects the thread-local counters of all threads that installed one, and computes and prints the total. This is done during shader instance exit, not shader exit, by checking that paras is nonzero; it requires shader instance init/exit to be enabled in the init function by setting *init_req to miTRUE.
This will only work on a single host because each host exits its own shaders, and there is no way to communicate the counters between hosts. Moreover, slave hosts may come and go, and may call their exit shaders multiple times for a single frame.
Thread-local shader storage relies on three new mi_query modes:

    miQ_FUNC_TLS_GET      return the data pointer previously installed by the
                          calling thread, or a null pointer if none has been set.
    miQ_FUNC_TLS_SET      install a data pointer for the calling thread.
    miQ_FUNC_TLS_GETALL   return an array of all installed per-thread data
                          pointers, together with their number.

The second argument to mi_query must be the shader state, and the third must be a null tag. A mi_query call with these modes in mental ray 2.1 and 3.0 returns miFALSE.