True multithreading in ShiVa pt.3: Object threads

Welcome to the last part of our multithreading tutorial series. In part 3, we are going to have a look at the unique challenges in Object AI threads and think about scaling our system from a single consumer (User AI) to potentially hundreds of objects in a single scene.

Extra layers of complexity

Using threads in Object AIs introduces additional layers of complexity compared to User AIs. S3DX::application.getCurrentUser() is pretty much safe to call from your result vector loop for as long as the application runs. AIModels on the user are also unlikely to change, unless you explicitly manipulate AIs in the user stack at runtime. With Object AIs on the other hand, there are very few guarantees, and you have to keep the following things in mind:
– Objects may appear and disappear at any time (e.g. rocket explosion removes rocket model, scenes change, etc.).
– Objects, although instances of the same model, may carry different AIs.
– The initial values and current state of these AIs are most likely different from object to object.
– Since there is only one current user, you have a lot of control over how many threads there will be. The number of objects that may create new object threads, on the other hand, changes all the time.
– Since the number of calling objects is unknown and changing, your thread code has to scale well. Avoid using lots of shared data combined with mutex locks.
– States with global variables become impractical with a large number of objects calling threads, so the event driven system must be used.
– Naming threads by hand (identifier strings) is no longer feasible. An automatic naming mechanism must be implemented.
– ShiVa engine APIs are largely unavailable in threads.

States vs Events in Object AIs

In part 2 of this tutorial series, we used both state loops and event handlers to signal to the main game loop whenever our threads had finished their calculations. Our state implementation scales very poorly because of its use of global variables. States also introduce a lot of overhead through the requirement for every state to have its own busy waiting loop. If you want to use multithreading in Object AIs, it is very much recommended that you focus on event handlers only, which is also what this tutorial will do.
Since you only need a single busy loop in the event driven system, no matter the number of objects, the loop needs to live outside of the object AIs. Creating a single thread manager AI and putting it in the user stack is usually the best solution.
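To make the idea of a single manager with a single loop more concrete, here is a minimal sketch in plain C++. It does not use the ShiVa API at all: ThreadManager, ThreadResult, registerObject(), post(), poll() and runExample() are hypothetical names invented for this illustration, and the std::function callbacks merely stand in for the event handlers of the object AIs. Worker threads only ever touch the locked queue, while the one poll() call per frame routes each result to its object by ID and silently skips objects that no longer exist.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical result record: the ID of the object that asked for the work, plus a payload.
struct ThreadResult {
    std::string objectId;
    int payload;
};

class ThreadManager {
public:
    using Handler = std::function<void(int)>;

    // Main thread only: objects register a callback under their ID
    // (a stand-in for an AI event handler).
    void registerObject(const std::string& id, Handler handler) {
        handlers_[id] = std::move(handler);
    }

    // Called from worker threads: touches nothing but the locked queue.
    void post(ThreadResult result) {
        std::lock_guard<std::mutex> lock(mutex_);
        results_.push(std::move(result));
    }

    // Called once per frame from the main thread. Returns the number of dispatched results.
    int poll() {
        std::queue<ThreadResult> pending;
        {
            // hold the lock only long enough to grab everything that has arrived
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(pending, results_);
        }
        int dispatched = 0;
        while (!pending.empty()) {
            auto it = handlers_.find(pending.front().objectId);
            if (it != handlers_.end()) {   // skip objects that are gone by now
                it->second(pending.front().payload);
                ++dispatched;
            }
            pending.pop();
        }
        return dispatched;
    }

private:
    std::mutex mutex_;
    std::queue<ThreadResult> results_;
    std::unordered_map<std::string, Handler> handlers_;
};

// Usage sketch: three workers post results, one of them for an object that was never registered.
int runExample() {
    ThreadManager manager;
    int valueA = 0, valueB = 0;
    manager.registerObject("objA", [&valueA](int v) { valueA = v; });
    manager.registerObject("objB", [&valueB](int v) { valueB = v; });

    std::thread t1([&manager] { manager.post({"objA", 10}); });
    std::thread t2([&manager] { manager.post({"objB", 20}); });
    std::thread t3([&manager] { manager.post({"removedObj", 30}); });
    t1.join(); t2.join(); t3.join();   // joined here only to keep the example deterministic

    manager.poll();
    return valueA + valueB;
}
```

No matter how many objects register, there is still only one loop and one mutex, which is the property the thread manager AI is meant to provide.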

Identifiers, Object handles and threads

All threads we are going to create will use the detach() method to turn them into background operations. Once detached, these threads lose all connection to the functions and objects that spawned them. However, when the calculation inside the thread has finished, we need to notify the main game thread with some sort of identifier, so the result of the thread can be associated with the object that spawned it. In other words, the thread and the object need a shared identifier. Why don’t we use the unique object handle, for instance through this.getObject()? While it is possible to use object handles (this.getObject() etc.) as parameters in plugin function calls, they stop working as soon as the called function returns, which is exactly what happens when that function detaches a thread and exits. The handles/references to the objects become invalid. To say it with actual code, this will work:

    S3DX::AIVariable hObj   = ( iInputCount < _iInCount ) ? _pIn[iInputCount++] : S3DX::AIVariable ( ) ;
    S3DX::AIVariable sAI    = ( iInputCount < _iInCount ) ? _pIn[iInputCount++] : S3DX::AIVariable ( ) ;
    S3DX::AIVariable sEvent = ( iInputCount < _iInCount ) ? _pIn[iInputCount++] : S3DX::AIVariable ( ) ;
    S3DX::object.sendEvent(hObj.GetHandleValue(), sAI.GetStringValue(), sEvent.GetStringValue(), 0,0,0);

But the next snippet will not work, as the hObj reference becomes invalid as soon as we detach and return:

	std::thread Thr(objectThreadFunction, hObj, sAI.GetStringValue(), sEvent.GetStringValue());
	Thr.detach();

There are three ways around this issue, however. First, we could use manual identifier strings like we did with User AIs:

    if not thr.launchNamedPrimeThread ( "ShiVaThread1", 130000 ) then log.warning ( "T1 launch failure" ) end
    if not thr.launchNamedPrimeThread ( "ShiVaThread2", 140000 ) then log.warning ( "T2 launch failure" ) end
    if not thr.launchNamedPrimeThread ( "ShiVaThread3", 150000 ) then log.warning ( "T3 launch failure" ) end

This becomes impractical really fast however, since these IDs would have to be unique, in essence requiring you to design a specific AI for every single object you have in the scene. Alternatively, you could write a unique name generator function that creates these names for you based on unique properties of the object itself. Luckily, you don't have to, because as it turns out, ShiVa does provide this feature natively:

// transform ShiVa object handle into a unique string
std::string hObjHash(S3DX::object.getHashCode(hObj).GetStringValue());
// now you can use this string in a thread
// once you are done with the thread, use the string to notify the correct object:
// get object handle from hash
auto hObj2 = S3DX::scene.getObjectFromHashCode ( S3DX::application.getCurrentUserScene ( ), hObjHash.c_str()) ;
// send event notification to object - don't forget AIModel name, event name and payload
S3DX::object.sendEvent(hObj2, sAI.c_str(), sEvent.c_str(), some_data);

In ShiVa, every object handle can be converted into a unique hash code through the API function object.getHashCode(hObj). If you call S3DX::object.getHashCode(hObj).GetStringValue(), you can pass the resulting const char* / string to your thread. The function returns an ID that is guaranteed to be unique in the current scene: no other object in that scene will have the same ID, even if objects are destroyed and others are created. The ID is not portable, however. The same object will have a different hash code on another computer, so you cannot compute all IDs once ahead of time and then use a lookup table for your objects.
You can make your code even simpler and faster by using yet another technique: static handles. Turning an object handle into a static handle prevents it from becoming invalid. Static handles persist across engine/game frames in structs and threads.

// example: acquiring static handle
auto hStaticObjectHandle = S3DX::object.getStaticHandle ( hRuntimeObjectHandle ) ;
// example: retrieving
auto hRuntimeObjectHandle = S3DX::object.fromStaticHandle ( hStaticObjectHandle ) ;
if ( hRuntimeObjectHandle ) {
    // Do something with hRuntimeObjectHandle, such as S3DX::object.setVisible ( hRuntimeObjectHandle , true )
}
// example: release
S3DX::object.releaseStaticHandle ( hStaticObjectHandle ) ;

It is very important that every call to S3DX::object.getStaticHandle has a corresponding S3DX::object.releaseStaticHandle call by the time the game ends, otherwise the objects will never be released and you have created a memory leak. Also, static handles are reference counted. Acquiring two static handles for the same object requires you to release them two times to actually free the object, so you cannot accidentally free a handle that is still used by another thread/function. Therein also lies the trade-off with static handles: They are very fast and do not require extra hash calculations and lookups, but if you crash, return early from a function, throw an exception or just forget to release your handles, you will leak memory.
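One common C++ way to make the "return early / throw an exception / forget" cases less dangerous is a small scope guard that runs the release call in its destructor. The sketch below is an assumption on my part and not part of the ShiVa SDK: ScopedRelease and doWork() are made-up names, and the guard is written against a generic callable so it can be tried without the engine. In a plugin, the callable would wrap the release call shown above.

```cpp
#include <functional>
#include <utility>

// Move-only guard (hypothetical helper) that runs a release action exactly once,
// when the guard that currently owns the action goes out of scope.
class ScopedRelease {
public:
    explicit ScopedRelease(std::function<void()> release)
        : release_(std::move(release)) {}

    // Moving transfers the responsibility; the moved-from guard does nothing on destruction.
    ScopedRelease(ScopedRelease&& other) : release_(std::move(other.release_)) {
        other.release_ = nullptr;
    }

    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;
    ScopedRelease& operator=(ScopedRelease&&) = delete;

    ~ScopedRelease() {
        if (release_) release_();
    }

private:
    std::function<void()> release_;
};

// Demonstration without the engine: the counter stands in for "handles released so far".
// In a plugin the action could look like this (API call taken from the snippet above):
//   ScopedRelease guard([h] { S3DX::object.releaseStaticHandle(h); });
bool doWork(int& releasedCount, bool bailOutEarly) {
    ScopedRelease guard([&releasedCount] { ++releasedCount; });
    if (bailOutEarly) {
        return false;   // the guard still fires on this path
    }
    return true;        // ... and on this one
}
```

Because the guard is movable, it can travel together with the handle into a struct or a thread, so the release happens wherever the handle finally goes out of scope. It does not help if the process crashes, of course.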

Example

Let's have a look at a working example next. In our test scene, we have a couple of instances of a cube model, all sharing the same shape, AIModels and so on. Our goal will be to override the material colors in subset 0. Every object launches a thread from its "mtobj" AIModel:

--------------------------------------------------------------------------------
function mtobj.onLaunchColorThread (  )
--------------------------------------------------------------------------------
    thr.objectThread_colors ( this.getObject ( ), "mtobj", "onGetColorThread" )
--------------------------------------------------------------------------------
end
--------------------------------------------------------------------------------

This call sends the (runtime) object handle and information about the target AIModel and handler name to the plugin. The plugin function then takes the handle, transforms it into a static handle, and hands it off to a thread, along with the AI and event names. A quick note: I am using a list container here instead of a vector, because I want to be able to easily call pop_front() later on in the thread loop function, which vector does not support. List is generally a much slower container though, so unless you want to pop single structs off the front like I do, vector is the better choice, as shown in the previous two parts of this tutorial series.

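#include <list>
#include <mutex>
#include <string>
#include <thread>
typedef struct {
	S3DX::AIVariable static_hObj;
	std::string sAI;
	std::string sEvent;
	float r;
	float g;
	float b;
} objColor;
std::list<objColor> lColor; // finished results, waiting to be dispatched
std::mutex mu_objList;      // guards lColor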
int Callback_thr_objectThread_colors ( int _iInCount, const S3DX::AIVariable *_pIn, S3DX::AIVariable *_pOut )
{
    S3DX_API_PROFILING_START( "thr.objectThread_colors" ) ;
    // Input Parameters
    int iInputCount = 0 ;
    S3DX::AIVariable hObj   = ( iInputCount < _iInCount ) ? _pIn[iInputCount++] : S3DX::AIVariable ( ) ;
    S3DX::AIVariable sAI    = ( iInputCount < _iInCount ) ? _pIn[iInputCount++] : S3DX::AIVariable ( ) ;
    S3DX::AIVariable sEvent = ( iInputCount < _iInCount ) ? _pIn[iInputCount++] : S3DX::AIVariable ( ) ;
    // Output Parameters
    S3DX::AIVariable bOK ;
	if (hObj.IsNil()) {
		bOK.SetBooleanValue(false);
	}
	else {
		// launch thread
		std::thread _(changeColor, S3DX::object.getStaticHandle(hObj), std::string(sAI.GetStringValue()), std::string(sEvent.GetStringValue()));
		_.detach();
		bOK.SetBooleanValue(true);
	}
    // Return output Parameters
    int iReturnCount = 0 ;
    _pOut[iReturnCount++] = bOK ;
    S3DX_API_PROFILING_STOP( ) ;
    return iReturnCount;
}
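On the container choice mentioned before the listing: if the only reason for picking list over vector is pop_front(), the standard library's std::deque may be worth a look as well, since it offers both push_back() and pop_front() in constant time without allocating a node per element. The following sketch is a suggestion rather than what the tutorial code uses. ColorResult is a simplified stand-in for the objColor struct (a plain string ID instead of the static handle), and popOldest() is a hypothetical helper.

```cpp
#include <deque>
#include <string>
#include <utility>

// Simplified stand-in for objColor: a string ID replaces the ShiVa static handle.
struct ColorResult {
    std::string id;
    float r, g, b;
};

// Takes the oldest result off the front of the queue.
// Returns false and leaves `out` untouched if the queue is empty.
bool popOldest(std::deque<ColorResult>& queue, ColorResult& out) {
    if (queue.empty()) {
        return false;
    }
    out = std::move(queue.front());
    queue.pop_front();
    return true;
}
```

Whether this is measurably faster than list in a real plugin depends on the workload, so profile before switching.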

The thread function itself holds no surprises:

void changeColor(S3DX::AIVariable && staticHandle, std::string && sAI, std::string && sEvent) {
	auto r = S3DX::math.random(0, 255);
	auto g = S3DX::math.random(0, 255);
	auto b = S3DX::math.random(0, 255);
	// pretend this function takes a long time to compute
	using namespace std::chrono_literals;
	std::this_thread::sleep_for(3s);
	// lock write access
	std::lock_guard<std::mutex> _(mu_objList);
	lColor.push_back({std::move(staticHandle), std::move(sAI), std::move(sEvent), std::move(r), std::move(g), std::move(b) });
	return;
}
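One detail of the launch call is worth spelling out: the AI and event names are wrapped in std::string before they are handed to std::thread. std::thread copies its arguments before the new thread starts, but for a const char* that copy is only the pointer, not the characters behind it. Whether the buffer behind GetStringValue() survives the end of the plugin call is not something this tutorial states, so making an owned copy is the defensive choice. The sketch below shows the pattern in plain C++; worker() and launch() are hypothetical names, and the promise/future pair exists only so the result can be observed.

```cpp
#include <future>
#include <string>
#include <thread>
#include <utility>

// Hypothetical worker: owns its copy of the name and reports back through the promise.
void worker(std::string name, std::promise<std::string> done) {
    done.set_value("finished: " + name);
}

// Launches a detached thread. The caller is free to change or destroy the buffer
// behind rawName as soon as this function has returned.
std::future<std::string> launch(const char* rawName) {
    std::promise<std::string> done;
    std::future<std::string> result = done.get_future();
    // std::string(rawName) copies the characters right here, in the calling thread;
    // std::thread then moves that owned copy over to the new thread.
    std::thread t(worker, std::string(rawName), std::move(done));
    t.detach();
    return result;
}
```

Passing rawName directly would compile just as well, but the conversion to std::string would then happen inside the new thread, possibly after the original buffer is gone.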

Next, we need a loop that iterates through the list every frame and dispatches the events to all object AIs. As mentioned before, this could be best achieved with a thread manager AI running in the user stack. The contents of the function could look like this:

int Callback_thr_threadEventHandlerLoop ( int _iInCount, const S3DX::AIVariable *_pIn, S3DX::AIVariable *_pOut )
{
    S3DX_API_PROFILING_START( "thr.threadEventHandlerLoop" ) ;
	{
		std::lock_guard<std::mutex> _(mu_objList);
		if (!lColor.empty()) {
			const auto & elem = lColor.front();
			// the object may have been removed while the thread was running, so check the runtime handle first
			// feel free to add checks for the validity of sAI/sEvent as well
			auto hObj = S3DX::object.fromStaticHandle(elem.static_hObj);
			if (hObj) {
				S3DX::object.sendEvent(hObj, elem.sAI.c_str(), elem.sEvent.c_str(), elem.r, elem.g, elem.b);
			}
			// every getStaticHandle needs a matching releaseStaticHandle, otherwise the object leaks
			S3DX::object.releaseStaticHandle(elem.static_hObj);
			lColor.pop_front();
		}
	}
	// [...] other thread result containers can be queried here, like the prime number calculation we did in part 2
	// so you will only ever have one busy loop for threads
    S3DX_API_PROFILING_STOP( ) ;
    return 0;
}

As you can see, this function is a little different from the last two result checking loops. Instead of taking the whole range of results and flushing the entire queue after we are done, we only take a single element off the front of the list and dispatch its event. The next element in the list will be processed in the next engine frame. Since object operations are usually heavier than merely outputting text like the result of a prime number calculation, I chose this way to distribute the load evenly over a number of frames. On the flip side, changes in objects appear one after the other at a rate that matches your FPS.
Which way is better? It depends on your workload and on how heavy the operations you perform on your objects are. If you only have light functions like color changes, feel free to use a vector and flush the entire range every frame. If your workload is heavier, try to distribute it.
To complete the example, we need to write a handler that takes the results and adjusts the object colors. Naturally, this could be done in Callback_thr_threadEventHandlerLoop if you wanted some extra performance, but for the sake of this tutorial, we are going to return the values to a proper event handler instead.

--------------------------------------------------------------------------------
function mtobj.onGetColorThread ( nR, nG, nB )
--------------------------------------------------------------------------------
    shape.overrideMeshSubsetMaterialAmbient ( this.getObject ( ), 0, nR/255, nG/255, nB/255, 2 )
    log.message ( "Material received " ..object.getHashCode ( this.getObject ( ) ) ..": " ..nR .." " ..nG .." " ..nB .." " )
--------------------------------------------------------------------------------
end
--------------------------------------------------------------------------------
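The one-result-per-frame loop and the flush-everything loop discussed above are two ends of a spectrum. A possible middle ground is a per-frame budget: take at most N finished results off the queue each frame. The following is a sketch under my own assumptions, not code from the plugin. ResultQueue and take() are hypothetical names, and the queue holds plain values instead of objColor structs so that it runs without the engine.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Thread-safe result queue with a per-frame budget (hypothetical helper).
template <typename T>
class ResultQueue {
public:
    // Called from worker threads when a result is ready.
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(value));
    }

    // Called once per frame from the main thread: removes up to `budget` of the
    // oldest results and returns them, so the expensive per-object work can be
    // done by the caller after the lock has been released.
    std::vector<T> take(std::size_t budget) {
        std::vector<T> batch;
        std::lock_guard<std::mutex> lock(mutex_);
        while (!items_.empty() && batch.size() < budget) {
            batch.push_back(std::move(items_.front()));
            items_.pop_front();
        }
        return batch;
    }

private:
    std::mutex mutex_;
    std::deque<T> items_;
};
```

A budget of 1 reproduces the behaviour of the loop in this tutorial, while a very large budget behaves like the flush-everything loops from the earlier parts. Anything in between lets you tune how much result handling a single frame has to absorb.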

Limited S3DX API access

Wouldn't it have been much easier if we simply called sendEvent() from the thread? Unfortunately, our threads have no idea that they are being spawned by a ShiVa application. They have no access to objects (unless you use static handles), scenes, or users, so the parts of the ShiVa API that depend on those cannot be called from threads. Any call to application.getCurrentUser() or getCurrentUserScene() will fail with the notice "no application is running". You can still use some ShiVa API calls in your threads, like we did above with S3DX::math.random(); just be aware that your choices are limited.

Summary

Threads can be spawned from objects too, not just users. In order to identify your objects, you either need to generate their hashcodes or get their static handles. If you take the fast-but-dangerous approach with static handles, double-check for memory leaks.
If you are performing intensive operations on your objects, it might be a good idea to spread the processing of the results out over multiple frames by evaluating only a single element of your result queue per frame and leaving the next element for the next frame. If you decide to do that, a vector is no longer your container of choice.