Tutorial 3.5: Tracking and Stabilizing

This tutorial covers the three tracking nodes, Transform - Tracker, Stabilize, and MatchMove, and has two main parts. The first part, consisting of several steps, creates some text and then randomly moves it around to generate a faux unstable element, since we didn't want to ship out another 8 MB of tutorial images. We then apply a Stabilize node to return it to a locked-down element. This is to show you the basic mechanics of the tracker and how to apply a Stabilize, so don't get too worked up that it isn't really a common production workflow to procedurally undo what you just made. It also introduces the 2.2 Time Bar.

The second part of the tutorial reads in a clip of a bus and an image of our first SIGGRAPH sign (ebay Item Number 8940204037), and slaps the sign on the bus side with a MatchMove node. This focuses on taking care of inevitable tracking corrections, and how to generate a matte for the sign using the QuickShape node.

For more detailed information on tracking, jump to About Tracking. For formatting examples of the three tracking nodes, jump to the Tracker, Stabilize or MatchMove reference pages.

Using the Time Bar

The purpose of a Stabilize function is to lock down a shot into a stable image with no apparent transformations on it. This is done either to correct for mishaps or limitations in the filming process (e.g., you left your vibrating pager on the camera, you forgot your SteadiCam rig when you were videotaping the street gang lifting your car hubcaps, etc.), or because you need the footage to remain stable for another process (extracting texture maps from a clip, adding paint touchups, etc.). Another technique, described in the About Tracking section, details how to remove high frequency noise from a camera move, but retain the move itself.

 

Since our example is really just to introduce you to the mechanics of tracking in Shake, we will do something simple: Create an Image - Text node, and then attach a Transform - CameraShake node. We now have an unstable element of infinite time length.

 

To view it, notice that the Time Bar goes from 1 to 100, with some highlighting to show us that our global timeRange is set to 1.

Go to the Globals tab, and enter a timeRange of 1-40 for a 40 frame script. The highlighting will change in the Time Bar to indicate the change:

 

 

Hit Home in the Time Bar, or hit the Home button on the right side of the Time Bar, or enter 40 into the Time Bar end frame textfield (left of the Current textfield). All of these will scale the Time Bar to fit a range of 1 to 40:

 

 

Hit the play button on the Time Bar, and Shake will play through the animation as fast as it can. It will also cache them away if there is enough cache space. No, there isn't any respect of fps at all. The frames it plays through reflect the range in the Time Bar, not what is in the timeRange global parameter. Notice the second time it plays through there is a slight speed boost as images are pulled out of cache rather than calculated on the fly.

If you Shift+click on Play, it will load them into memory and play much faster the second time. However, you get no caching benefit.

To stop the playback, click anywhere on the interface.

 

Anyway, this is a long-winded way of saying, "See how the CameraShake node moves our text around? How awful. We better stabilize that."

Tracking and Stabilizing the Text

Now to undo what the CameraShake is doing with a Stabilize node. Once again, the only purpose of this is to show you the basic mechanics of the tracking nodes. I can think of no other reason why somebody would apply a CameraShake node followed by a Stabilize node.

 

Make sure you are on frame 1.

Now apply a Transform - Stabilize node onto the CameraShake1 node.

You should see a red tracker appear down and to the left from the image center.

If you put the cursor over the tracker, more detail will be highlighted on the tracker.

The inside cross hair is the actual tracking point. This is where the X,Y pixel coordinate is read from. The inside box is the reference pattern. This is the pattern that Shake looks for to find matches between frames. It compares the pattern in that box on frame 2 (and frame 3, 4, etc) with the initial reference pattern taken on frame 1 to see if a match can be made. The outside box is the search region. Shake will look for good patterns only inside of the search region. It is generally a good idea to keep these patterns small to keep your tracking processing time down.
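To make the roles of the reference pattern and the search region concrete, here is a minimal sketch of pattern matching in plain Python. It is not Shake's actual tracker: the normalized cross-correlation score, the square patches, and every function name below are assumptions chosen for illustration. It slides the reference pattern over each candidate position inside the search region and keeps the best-scoring one.

```python
import math

def correlation(patch_a, patch_b):
    """Normalized cross-correlation of two equal-sized patches (lists of rows)."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A flat patch has no detail to match, so score it as 0.
    return num / den if den else 0.0

def crop(image, cx, cy, half):
    """Square patch of side 2*half+1 centered on pixel (cx, cy)."""
    return [row[cx - half:cx + half + 1]
            for row in image[cy - half:cy + half + 1]]

def track(reference, image, cx, cy, pattern_half, search_half):
    """Best match for `reference` inside the search region around (cx, cy).

    Returns (x, y, score). Candidates whose patch would fall off the
    image are skipped.
    """
    height, width = len(image), len(image[0])
    best = (cx, cy, -1.0)
    for y in range(cy - search_half, cy + search_half + 1):
        for x in range(cx - search_half, cx + search_half + 1):
            if (x - pattern_half < 0 or y - pattern_half < 0 or
                    x + pattern_half >= width or y + pattern_half >= height):
                continue
            score = correlation(reference, crop(image, x, y, pattern_half))
            if score > best[2]:
                best = (x, y, score)
    return best

def make_frame(px, py, size=12):
    """Hypothetical test frame: a 3x3 gradient pattern centered on (px, py)."""
    img = [[0.0] * size for _ in range(size)]
    for dy in range(3):
        for dx in range(3):
            img[py - 1 + dy][px - 1 + dx] = float(1 + dx + 3 * dy)
    return img
```

The nested loops visit every position in the search region, so the cost grows with the area of that box. That is the reason for keeping the boxes small. A search region too small to contain the moved pattern can only return a poor score.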

If you drag on a blank spot inside of the tracker or on the track point itself, you can move it. If you grab a corner of either the reference pattern or the search region, you resize that box. If you grab an edge of one of these boxes, you can scale the box non-uniformly. With this in mind, select a point to track on the text. In this case there is no detail per se, since it is white text on black, so look for a pattern that cannot easily be found elsewhere by shifting in the X or Y axis. A vertical stroke of a letter, for example, is a bad reference pattern, since the same pattern can easily be found in other places by sliding up and down on the Y axis.

Instead, I select the inside of the "e" for my tracking area, since it is relatively small and the pattern cannot be easily shifted in any direction.

Now, check out the Stabilize parameters and look for the trackRange parameter near the bottom. It should say 1-40, because you created a potentially infinite clip, the Text node, so Stabilize entered your global timeRange automatically. If you were tracking an image clip, it would set your trackRange to match the clip's in/out points. If you ever start tracking and it does something for one frame and then stops, check to see if your trackRange is describing a valid frame range. For example, we couldn't track frames 50-60 right now, because that is outside of our trackRange.

If you have confirmed that you are on frame 1, hit the track forward button, the right button of the Viewer's two track buttons. This will start the tracking process, which will continue until one of the following conditions is reached: the end of the trackRange is reached; your reference pattern can't be found within the search region, which will force a stop by default; or you hit the Esc key. This process ideally needs to be done only once, after which all of the tracking information is stored in the tracking node and can be reused by any other node in Shake - no reprocessing is done. I say "ideally", because of course you often have to make adjustments to the track to ensure accuracy.

You might have noticed that only the portion of the image within the search region is loaded each frame. This is due to the limitProcessing toggle (above the trackRange parameter), and it has no effect on the output image.

 

Because our motion is rather simple, the track should be fine, so we can now apply the stabilization transformation in the node by toggling on the first parameter, applyTransform. Before we do this, the node has merely held the tracking information. When applyTransform is on, the stabilization occurs based on the tracking information.
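As a rough illustration of what happens to the tracking data when it is applied, here is a small Python sketch for a one-point track. It assumes the stabilization is a pure translation relative to a reference frame; the function names and the dictionary layout are made up for this example and are not Shake's internals.

```python
def stabilize_offsets(track, reference_frame=1):
    """Per-frame (dx, dy) that moves the tracked point back to where it
    was on the reference frame. `track` maps frame -> (x, y)."""
    ref_x, ref_y = track[reference_frame]
    return {frame: (ref_x - x, ref_y - y) for frame, (x, y) in track.items()}

def matchmove_offsets(track, reference_frame=1):
    """The same data with the sign flipped: the offsets a second element
    would need in order to follow the tracked point."""
    ref_x, ref_y = track[reference_frame]
    return {frame: (x - ref_x, y - ref_y) for frame, (x, y) in track.items()}

# Hypothetical track of the inside of the "e" over three frames.
track_data = {1: (100.0, 50.0), 2: (103.0, 48.0), 3: (98.0, 55.0)}
print(stabilize_offsets(track_data))
```

The second function is only there to show that stabilizing and matchmoving use the same tracking data in opposite directions, which comes up again later in this tutorial.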

Now launch a RAM flipbook in the Viewer to see the newly stabilized plate.

MatchMoving

This is a more practical example of using tracking information to place one foreground image so it follows detail on a second background image by matching the background's movement. That's where we came up with the devilishly clever name MatchMove. In this example we read in a clip of a moving bus and then, naturally, our company name, and slap it on the side of the bus. The other point of this tutorial is to show you some solutions for dealing with tracking problems you will run across.

Go to File - New Script or hit Ctrl+N to clear away the old nodes.

Read in the bus2 clip (nope, there is no bus1) under the doc/pix/bus directory, consisting of 39 frames of jpegs, and read in doc/pix/nr_sign.jpg. The sign is 640 x 480 pixels, and the bus is 720 x 486 pixels.

The sign is what we used at our very first SIGGRAPH booth in '98. Hey, where's my crying towel?

Here is what we are trying to achieve:

The first step is always to examine the clip itself to find good search patterns. Luckily, the sign on the bus is visible during the entire clip, although there is some distortion due to perspective shifts.



Attach a Transform - MatchMove node to the nr_sign node. Unlike other Transform nodes, MatchMove allows you to do compositing within the node. The tracking will be done on the second input.

Although eventually we will be using four trackers to cornerpin the foreground into four corners created by the trackers, we will start with only one tracker for the sake of simplicity. We will be doing lots of intentional mistakes so that you will better recognize them when they occur with your own footage.

Put the tracker in the lower left corner of the sign on the bus side. Leave the tracker at the default size, but go ahead and precisely locate it on the corner. You can zoom in by using the +/- keys by the Backspace key and using the cursor as an aiming device. Hit Home to return to the normal size.

Make sure the trackRange says 1-40 and hit the track forward button on the Viewer.

Doh! It probably went about two frames and then stopped. If you look at the text associated with the tracker, you will see that the third value says "c=.2". This is the correlation, or accuracy of the track. A score of 1 means a perfect correlation with the reference pattern between frame 2 and frame 1 (frame 1 being the reference pattern). A score of 0 is amazingly bad. On my frame 2, I have a correlation of .2. This is bad.

The reason it failed so spectacularly is because our search region was not big enough to encompass the movement in the clip - the bus moved too far to the right. In this illustration, the desired area, marked in green, extends beyond the zone of the red search region.

To correct this, go back to frame 1, and grab the corners of the search region to scale it outwards. If you scale it too big, you will waste too much time processing. If you make it too small, you won't be able to find good patterns. I have increased the search region size, but I also notice that there is a black line along the bottom of the pattern.

There is a chance due to the perspective shift in later frames that the tracker will confuse the two lines. For this reason, I shrink the height of the pattern to make sure the lower black line is not in it. I also shrink the reference pattern to better fit the corner.

OK, so let's try this again. Hit the track forward button again. This will write over any previous keys you have created.

Gee, and everything was going fine, until around frame 34, right? At around that frame (results will of course vary), the tracker spitefully and evilly decided that the words on the sign were the best match to the original reference pattern at frame 1. This is probably due to the fact that we are constantly comparing back to frame 1. As the bus moves away, the sign gets smaller. At a certain point, the words match the original thickness of the sign border from frame 1 better than the border itself.

This matching behavior is controlled by the referenceBehavior setting in the tracker. By default, it is set to use start frame, which means that it will sample the image at the frame you start tracking. This is not always the same as the first or last frame in the trackRange, because you can stop tracking midway through a clip, and start it up again at that frame.

There are two settings of referenceBehavior which can probably help us here, since there are scaling changes in the image. They are update every frame, which uses the sample of the previous frame for the reference pattern, or update if below reference tolerance, which checks the correlation of every frame. If the correlation falls below the referenceTolerance value (right above referenceBehavior), then it uses the previous frame as the reference pattern. It continues to use that pattern until the correlation once again dips below the referenceTolerance. Generally, the second one (update if below...) is more accurate because you get inherent drift with update every frame as all of the tiny errors accumulate. Therefore, set it to update if below reference tolerance.
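The three behaviors can be compared with a small Python sketch that only records which frame supplies the reference pattern for each match. Everything here is an assumption for illustration: the `similarity` callback stands in for the tracker's correlation, the frames are reduced to a single "apparent size" number, and the tolerance value is arbitrary.

```python
def choose_references(frames, similarity, behavior, tolerance=0.88):
    """For each frame after the first, return the index of the frame
    whose pattern is used as the reference."""
    reference = 0  # index of the start frame
    used = []
    for i in range(1, len(frames)):
        if behavior == "update every frame":
            reference = i - 1
        elif behavior == "update if below reference tolerance":
            # Swap to the previous frame only when the current
            # reference no longer matches well enough.
            if similarity(frames[reference], frames[i]) < tolerance:
                reference = i - 1
        # "use start frame" never changes the reference.
        used.append(reference)
    return used

def size_similarity(a, b):
    """Toy stand-in for correlation: ratio of the smaller size to the larger."""
    return min(a, b) / max(a, b)

# Hypothetical apparent size of the sign border as the bus drives away.
sizes = [100, 95, 90, 85, 80, 75, 70]
for mode in ("use start frame",
             "update every frame",
             "update if below reference tolerance"):
    print(mode, choose_references(sizes, size_similarity, mode))
```

With these made-up numbers the third mode changes its reference three times over six matches, while update every frame changes it on every match after the first. Fewer reference changes means fewer chances for small errors to pile up, which is the drift argument made above.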

Although we can track from frame 33 (the frame before the tracker planted), it is better to go find a frame with a high correlation. Ideally, this is of course frame 1, but I found a correlation of .98 at frame 14, so I go to frame 14, and hit track forward again. You should get a good track out of this.

Well, that's one track. Now the other three trackers.

 

Go to the top of the MatchMove parameters and switch the trackType from 1 point to 4 point. This will turn on the extra three trackers, which are pre-positioned on the image. Each one is located in the relative corner where it should stay, i.e., the upper-left corner of the foreground image will be connected to track4 of the MatchMove. Keep in mind that Shake uses Cartesian coordinates for everything, so the trackers are ordered according to the following image.

Yes, we know this is different from some other prominent systems.

Anyway, on our image, you can see the four trackers. track1 we have already set, and you can see its key frames. Every time you hit the track button, all visible trackers will start to analyze the image. Therefore, we should turn off the visibility of track1.
At the bottom of the MatchMove parameters, you can see a list of all four trackers. You can change the name of any of them by clicking in the textfield (i.e., where it says "track1"). You can change the display color of a tracker by clicking on the color picker button. You can also toggle the visibility by clicking on the V button. In this example, track1 is now invisible, and therefore will not be overwritten when we track again.

Return to frame 1.

Following the same principles of the first tracker, position and shape each of the remaining trackers on the other corners of the sign:

 

 
Verifying that your referenceBehavior is still in update if below reference tolerance, and that track1 is still invisible, hit the track forward button to launch the tracking again. When it is finished, the three points should be reasonably accurate.

Applying the MatchMove

Now that we have four tracks plugged into the MatchMove, we can turn on the foreground element.

If you look at the outputType popup, you will see it is a list of several compositing operations. By default it is on Background, meaning that we only see the background image. If you want to pass the transformed foreground without doing a composite, you would use Foreground. Because we are lazy, we will do the Over command right here, so switch it over to Over compositing mode.

Once you do, you will notice that the image isn't exactly an ideal composite. If you are doing any Over operation and the image seems to be getting brighter, you should immediately suspect the absence of a mask in the foreground, which is indeed the case here since the sign image is a jpeg and jpegs can't support alpha channels. This is easily remedied. Go to nr_sign's FileIn parameters, and turn on autoAlpha. This option examines the image for an alpha channel. If it doesn't find one, it sets the entire alpha channel to 1.
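The brightening is easy to see with the usual Over formula for premultiplied images, sketched here in Python for a single pixel. This assumes Over follows that standard formula (foreground plus background scaled by one minus the foreground alpha); the function name and pixel values are invented for the example.

```python
def over(fg_rgb, fg_alpha, bg_rgb):
    """Standard premultiplied Over for one pixel:
    result = foreground + background * (1 - foreground alpha)."""
    return tuple(f + b * (1.0 - fg_alpha) for f, b in zip(fg_rgb, bg_rgb))

sign_pixel = (0.5, 0.5, 0.5)
bus_pixel = (0.4, 0.4, 0.4)

# No alpha channel (alpha = 0): the background is not held out at all,
# so the two images simply add together and the result gets brighter.
print(over(sign_pixel, 0.0, bus_pixel))

# Alpha forced to 1: the foreground fully covers the background.
print(over(sign_pixel, 1.0, bus_pixel))
```

With alpha at 0 nothing is subtracted from the background before the foreground is added, hence the washed-out look. With alpha at 1 everywhere the background is fully replaced, which is why the whole foreground frame shows up until a proper matte is made.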

Now of course the foreground image covers most of the screen. This is because we haven't applied the matchmove transformation yet. Go back to MatchMove1's parameters, and turn on the first parameter, applyTransform.


This is a little better. The four corners of the Foreground image are snapped to the four corners of the trackers. If the image is twisted around, it is because you have done something like put track3 in track4's corner. You can unravel that in the next step.
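For the curious, snapping four image corners to four tracked points amounts to solving for a perspective transform. The Python sketch below shows one textbook way to do that with an eight-unknown linear system. It is not taken from Shake; the function names, the corner ordering (lower-left, lower-right, upper-right, upper-left), and the tracked coordinates are all assumptions for illustration.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with
    partial pivoting. Assumes the system is not singular."""
    n = len(rhs)
    rows = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

def corner_pin(src, dst):
    """Return a function that maps (x, y) so each of the four `src`
    corners lands on the matching `dst` corner."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)

    def warp(x, y):
        w = h[6] * x + h[7] * y + 1.0
        return ((h[0] * x + h[1] * y + h[2]) / w,
                (h[3] * x + h[4] * y + h[5]) / w)
    return warp

# Corners of the 640 x 480 sign, origin at the lower left.
sign_corners = [(0, 0), (640, 0), (640, 480), (0, 480)]
# Hypothetical tracker positions on one frame of the bus clip.
tracked = [(120, 80), (400, 100), (390, 300), (110, 260)]
warp = corner_pin(sign_corners, tracked)
print(warp(320, 240))  # where the center of the sign image ends up
```

If two of the destination corners are swapped, the same math still produces a transform, just a twisted one, which is the bow-tie effect mentioned above.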

You may notice that the four corners of the sign don't match the tracking points. This can be adjusted by toggling the Viewer over to FG display. Go to frame 1 and then hit the BG/FG toggle button so you are on FG. This will toggle the display to the foreground image, and also display four corner markers. These four points correspond with what gets plugged into the four trackers. By default they are located in the four corners of the image.

Making sure you are at frame 1, grab them and place them on the four corners of the sign.

Now toggle the Viewer back to BG mode. Occasionally if you have done other operations before using the BG/FG toggle, you may have to switch your outputType back to Over or whatever other operation you were using.

Now the sign has been properly positioned (or so we think...) over the background, but we have to trim the image to get rid of the extra junk. We will use QuickShape to generate a matte.

Using QuickShape to Build a Mask

Since jpegs don't have an alpha channel, we used the FileIn to generate one with the autoAlpha option. However, this made it so we see the entire image, not just the sign itself. We will therefore use QuickShape to trace the shape of the sign. This section will only briefly go over the QuickShape node. For more information, jump to the QuickShape reference page.

Create an Image - QuickShape node. With a minimum of fuss, we want to look at the nr_sign node and enter parameters for the QuickShape in the Viewer. This isn't possible in the default setting, since Shake protects the user from modifying nodes in the Viewer that don't affect the resulting image. Normally, the solution is to connect the two nodes in question with a Layer node of some sort, but that wouldn't let me talk about the swell new Interactive Controls toggle we put in for 2.2. Find it on the Viewer and push it down. This means you can now see the onscreen controls for any node regardless of which node you are actually evaluating.

Therefore, load the parameters for the QuickShape, then view the nr_sign node, and you should see the QuickShape onscreen controls.

Because the sign has kinda straight corners, toggle the Smooth/Linear button so you are in Linear mode.



You will notice a large button on the Viewer that shows the mode you are in, which is either Build or Edit. The difference between the two is the behavior when you click on a blank spot. When you are in Edit mode, you are selecting points. In Build mode, you are inserting new points between the first point and the last point.

 
So, making sure you are on Build mode, click on the four corners of the sign (direction doesn't matter) - this will start drawing 3 segments, but won't draw the last segment.

To close the shape, toggle over to Edit mode. This closes the shape, and allows you to do multiple selections of points. To deselect all points, click on a blank spot.

 
As we look at the segments, we can probably see that they don't exactly line up with the sign edges. Therefore, click on the segments to insert new knots and match up the edges more closely. To delete a point, select the point and click on the delete knot button.

To switch the points over to Smooth mode, you can drag-select the points you want to change, and then toggle the Smooth/Linear button back to Smooth.

OK, enough fooling around.

To attach the QuickShape as the alpha mask for the nr_sign, click on nr_sign and insert a Layer - SwitchMatte node, plugging the QuickShape in as the second image input. For the moment, leave matteMult on.

 

Evaluate the MatchMove again, and launch a flipbook with a frame range of 1-39 to test our tracking.

The track should probably look solid in the first three corners, but start to drift in the later frames for track4 in the upper left corner.

This sucks.

To fix this, I am going to re-track track4, but do it backwards. The first step is to turn off the other three trackers with the visibility toggles.


The second step is to switch the outputType back to Background so we don't have to bother processing the image while we track.

Next, although this isn't necessary, we are going to clear out track4. Click on the track4 textfield with the right mouse button, and select Clear Track from the popup menu. There are other swank functions here that you will want to check out in the About Tracking section, but I won't go into them here.

 

This removes the animation from the tracker, and resets the search region and reference pattern. Jump to frame 40, which is our last frame, and reposition the tracker. This time, scale the search region to the left, since the bus will be moving backwards to the left.

Now, hit the track backwards button. You should get a decent track going backwards. Toggle the outputType to Over, and it should be pretty solid.


Final Touches

As a final touch, I need to color correct the sign a little bit.

First, I turn off the matteMult option in the SwitchMatte. This leaves me with a non-premultiplied image. I then insert a Color - Compress and select the white off of the bus and the black from under the wheel wells. This ensures that my whites and blacks are in a similar range to the background. I then place a Color - MMult to premultiply the image, and apply a very slight dropshadow with Other - AddShadow.
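The order of operations here matters: the color range is adjusted first, then the result is multiplied by the matte. A minimal per-pixel Python sketch follows. It assumes Compress is a linear squeeze of the 0..1 range into the picked low and high colors and that MMult multiplies RGB by alpha; both functions and all the sample values are stand-ins written for this example, not Shake's code.

```python
def compress(value, low, high):
    """Linearly squeeze a 0..1 channel value into the low..high range."""
    return low + value * (high - low)

def premultiply(rgb, alpha):
    """Multiply each color channel by the matte value."""
    return tuple(c * alpha for c in rgb)

# Hypothetical picks: black from under the wheel wells, white off the bus.
bus_black = 0.1
bus_white = 0.8

sign_rgb = (1.0, 0.5, 0.0)   # an unpremultiplied sign pixel
matte = 0.5                  # a soft edge pixel of the QuickShape

corrected = tuple(compress(c, bus_black, bus_white) for c in sign_rgb)
print(premultiply(corrected, matte))
```

In this sketch the lifted blacks get scaled by the matte after the correction, so they fall back to zero wherever the matte is zero. Doing the correction on an already premultiplied image would instead lift the area outside the sign as well.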

 

Stabilize as an Alternative to MatchMove

I have saved the above script in doc/pix as bus_track.shk.

If you go back and tune the QuickShape while looking at the MatchMove, you will notice that the on-screen controls are not being properly passed through - the image is tracked, but the QuickShape is not. Additionally, you may want to tune the corners of the sign while looking at the composite, which is something you can't do with the FG/BG controls in MatchMove.

The solution is to use the Stabilize node as an alternative to MatchMove. In Stabilize, there is a switch to declare if the data you are using is stabilization or matchmoving data. Just like MatchMove, you can generate 1, 2 or 4 tracks. We won't have to re-track. Instead, we will copy over our tracks we have already done.

Here is the general workflow for matchmoving with Stabilize:

This has several advantages:

To apply it to our tree, select AddShadow1 and Shift+click on Stabilize. This will send off a new branch. Normally, we would attach it to bus to start tracking, but we already have our tracks:

 

Now, we want to copy the tracks from MatchMove1 to Stabilize1. Right click on the trackNames in Stabilize1 and select Load Track:

A pop-up window will appear with a list of all tracks. On track1, we want to copy MatchMove1.track1. Select it and hit OK. You have now copied track1 from MatchMove1 to Stabilize1.

Repeat the process with tracks 2 through 4.

Once all four tracks are copied, activate the transformation with applyTransform and set inverseTransform to matchmove:

The next step is to composite it back over the background. Because we have copied the tracks, we can delete MatchMove1:

The sign follows the bus, but it isn't properly positioned, so we insert a Viewport between AddShadow1 and Stabilize1. Crop it as close as you can to the edges of the sign. You could alternatively use a CornerPin with inverseTransform turned on.

This has the effect of moving the foreground to the lower-left corner, but it also helps us with our next step, which is to use a CornerPin to push it into place:

You can see that the control is very intuitive. As a last point, load up the parameters for the QuickShape - notice how they follow the transformation accurately, so you can always intuitively control your masks as well. You get a double line, by the way, because you are also showing the effect of the mask on the AddShadow.