I’ve tried to keep everything here as simple as possible. Even so, it is still a long read, but if you’re serious about good stereoscopic footage, I guarantee you it will be worth the read!
In the VR video world, there’s a long-running debate over whether 360° video can claim the title of “VR” without stereoscopy that tricks the brain into perceiving depth. While I won’t delve into that discussion here, I will take the liberty of concluding that, given an ideal playback system, you would almost always want stereoscopic footage rather than monoscopic.
Granted, there are cases where monoscopic is more suitable, but if we were not limited by resolution, decoding efficiency and other technical issues, stereo would be the way to go. The difference in the quality, or “depth”, of the viewer’s immersion is like night and day.
If you have watched stereoscopic videos and disagree with this view, I will venture to say that you have either:
- Watched the content through a low-quality playback system. That is, the resolution was too low, or compression interfered with your brain’s ability to properly “decode” the stereoscopic imagery into a pleasant depth perception.
- Watched faulty stereo footage. Contrary to what I’ve heard quite a few people say, there absolutely is a “correct” way to produce stereo footage, and thousands of “wrong” ways. A surprisingly large share of the stereo footage found online today is faulty in one way or another.
- Or most probably, a combination of the two.
To understand why and how stereo footage can be correct or wrong, stick out your index finger at arm’s length in front of you. Close one eye and look at an object behind your finger, in the background. Now alternate which eye is open a couple of times. Your finger seems to “move” in relation to the background. This is parallax, and it is the cue your brain uses to infer depth. Parallax arises because your eyes are placed, on average, about 6 cm apart (although this interpupillary distance, or IPD, varies greatly from person to person, from as little as 5 cm to more than 7.5 cm). By looking at the world from two slightly different positions, the brain can construct a perception of depth and shape.
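To put rough numbers on the finger experiment, here is a minimal Python sketch. It assumes a simplified geometry: the object sits straight ahead on the centre line between the eyes. The helper name `parallax_angle_deg`, the 64 mm IPD and the distances are illustrative choices, not measured values.

```python
import math


def parallax_angle_deg(ipd_m: float, distance_m: float) -> float:
    """Angle in degrees between the two eyes' lines of sight to a point
    straight ahead at distance_m, for eyes that are ipd_m apart."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return math.degrees(2.0 * math.atan((ipd_m / 2.0) / distance_m))


if __name__ == "__main__":
    ipd = 0.064  # assumed IPD in metres (illustrative)
    examples = [("finger at arm's length", 0.6), ("tree", 10.0), ("hill", 100.0)]
    for label, distance in examples:
        angle = parallax_angle_deg(ipd, distance)
        print(f"{label:24s} {distance:6.1f} m -> {angle:.3f} deg")
```

With these assumed numbers the finger produces roughly six degrees of parallax, while something 100 m away produces only a few hundredths of a degree, which is why the finger appears to jump against a nearly static background.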
Simplifying things a great deal (ignoring lens distortion, for example), imagine for a moment that we take two cameras and place them next to each other, with the center points of the lenses separated by exactly your interpupillary distance. We then take a photo with each, and show each photo to the corresponding eye. That would produce a stereoscopic experience that is correct for you, and it would be comfortable and natural to watch. The problem is that you can’t look around: the scene is head-locked and static.
To create the same kind of stereoscopy in a 360° image, we would ideally need to mount an infinite number of cameras on the circumference of a circle, taking an infinite number of “slice photographs”. We could then, for any given viewing direction, select the two slices that match your exact IPD, and project “correct” stereoscopic imagery into your eyes.
Unfortunately, a camera with an infinite number of lenses is highly impractical, so we need to make some tradeoffs. Luckily, an acceptable level of “correctness” can be achieved with a much lower number of lenses, mounted at something close to the mean human IPD. Doing this, and using some clever math, we can still produce two 360° images, one for each eye, where the perspective is slightly shifted from one eye to the other. The stereoscopy produced by this method will not be perfect in an absolute sense, but it will provide enough depth cues that most people (those with a roughly average IPD) can reconstruct depth and shape in a way that is comfortable and natural.
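One way to picture the target of that math is the idealised model often called omni-directional stereo (ODS): every viewing direction gets its own pair of eye positions on a small circle whose diameter is the IPD. The sketch below only illustrates that model and is not a description of what any particular camera or stitcher does. The function name `ods_eye_origins`, the 64 mm default and the axis convention are assumptions made for the example.

```python
import math


def ods_eye_origins(yaw_deg: float, ipd_m: float = 0.064):
    """Return ((lx, ly), (rx, ry)): eye positions for one viewing direction.

    Convention (an assumption of this sketch): top-down view, angles are
    measured counter-clockwise from the +x axis, so the left eye sits
    90 degrees counter-clockwise from the viewing direction.
    """
    yaw = math.radians(yaw_deg)
    radius = ipd_m / 2.0
    # Unit vector perpendicular to the viewing direction, pointing left.
    left_dir = (-math.sin(yaw), math.cos(yaw))
    left = (radius * left_dir[0], radius * left_dir[1])
    right = (-left[0], -left[1])
    return left, right


if __name__ == "__main__":
    for yaw in (0, 90, 180, 270):
        left, right = ods_eye_origins(yaw)
        print(f"yaw {yaw:3d}: left=({left[0]:+.3f}, {left[1]:+.3f}) "
              f"right=({right[0]:+.3f}, {right[1]:+.3f})")
```

The two positions are always one IPD apart and always perpendicular to the viewing direction, which is the property the “infinite number of cameras” would give you directly. A real rig with a handful of lenses has to approximate the views in between.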
Knowing this, we can see that most professional-level VR cameras designed for stereoscopic recording actually fit that description, so why is comfortable and correct 3D still so hard to achieve? Or, asked another way, why is bad, uncomfortable stereoscopic footage so common?
I think the answer is a combination of a “software chasm” and lack of knowledge. The lack of knowledge is understandable. This is a pretty complex and very technical field that most “shooters” don’t want to deal with. Instead of spending days trying to grasp the intricacies of cylindrical twist in equirectangular projections and lens distortion models, professionals on a tight budget and with time-constrained projects will jump on software solutions that promise easier results. Unfortunately, none of the automatic stitching solutions provide anything near acceptable stereoscopic output, even though the camera that shot the source files might be capable of it. Nor will they be able to in the foreseeable future. If you want really good stereoscopic results, there’s currently only one way to do it, and I’ll get to that in a moment.
In theory, you should be able to feed your stitching software all the details of your camera system, such as how many lenses it has, how they are placed, how far apart they are, how they are constructed, and so on, and the stitching software would then use this information to spit out perfect, true-to-life stereo footage. But as you might have already experienced, it just doesn’t work like this. Even though the stitching software knows the general setup and construction of your camera, the result is often absolutely wonky and uncomfortable footage.
The reason is that even the smallest imprecisions and manufacturing defects will play a huge role when we are dealing with stereo. For monoscopic footage a small misalignment of a lens is probably not even noticeable, but for stereo, it is a very big deal. It might offset the image for one eye just a little bit, but since the brain is specifically looking for these small offsets and shifts to reconstruct depth, the result is a disaster. You suddenly have depth where it should not be, and the result is an uncomfortable experience for the viewer.
The problem is compounded by the fact that automatic stitchers will try to warp and shift things to make the image stitch. No stitcher is intelligent enough to correctly deduce all the small manufacturing defects and imprecisions in the camera system, so this “optimisation” process will most probably only make the problem worse.
Furthermore, all cameras are going to be a little different. Not just between models, or even batches of the same model, but every single camera system will have small variances. Even a misalignment of just 0.2° of a lens is enough to throw off the result. With such small tolerances, it’s really no wonder that no automatic software can produce workable results.
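To get a feel for how large a 0.2° error is, the following sketch converts it into pixels and into an equivalent viewing distance. It rests on assumptions that are illustrative only: a 3840-pixel-wide equirectangular frame, a 64 mm IPD, a point straight ahead near the horizon line, and the hypothetical helper names `degrees_to_pixels` and `distance_with_parallax`.

```python
import math


def degrees_to_pixels(angle_deg: float, equirect_width_px: int) -> float:
    """Horizontal pixels covered by angle_deg in an equirectangular frame
    whose full width spans 360 degrees."""
    return angle_deg * equirect_width_px / 360.0


def distance_with_parallax(angle_deg: float, ipd_m: float = 0.064) -> float:
    """Distance at which a point straight ahead shows angle_deg of parallax
    between two eyes that are ipd_m apart."""
    if angle_deg <= 0:
        raise ValueError("angle_deg must be positive")
    return (ipd_m / 2.0) / math.tan(math.radians(angle_deg) / 2.0)


if __name__ == "__main__":
    error_deg = 0.2
    width_px = 3840  # assumed frame width per eye
    print(f"{error_deg} deg is about "
          f"{degrees_to_pixels(error_deg, width_px):.2f} px at {width_px} px width")
    print(f"the same angle is the whole parallax of a point about "
          f"{distance_with_parallax(error_deg):.1f} m away")
```

Under these assumptions, 0.2° is only about two pixels, yet it matches the entire natural parallax of an object roughly 18 m away. An error of that size can therefore make a distant object appear at the wrong depth altogether.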
This sounds like some very bad news for stereo footage, but the good news is that these challenges can be overcome, although it requires a bit of work. The best news is that once this work has been done, you will have a much more solid foundation to produce stereo 360° experiences from, and you won’t have to endlessly fiddle with every scene to make it pass for acceptable, saving you an incredible amount of time in the long run.
I personally use MistikaVR for stitching stereo footage, since it allows a great deal of control over the stitching process and has enough options to let you correct any misalignments in your camera system. My primary camera system is an Insta360 Pro, and it has some serious misalignments of the lenses and sensors. The stereo footage produced by the stitching software supplied with the camera is mostly atrocious, with the odd bit of “almost passable”.
Even so, after I created a carefully calibrated preset for this specific camera in MistikaVR, it produces very good results in any situation. The best thing is that I can load up this preset, do only very minimal correction, like vertical balance and horizon correction, and then be ready to render the clip. To be honest, I did spend two full days calibrating the preset, but that time is quickly won back by the increase in workflow efficiency afterwards.
If you want to make a calibrated preset for your camera, here are my suggested steps:
- If you have access to a “calibration room”, use that.
- Most people, myself included, will not have one, so go out and find a couple of scenes with a clearly marked horizon in as much of the shot as possible, but also with regularly spaced foreground objects.
- Take extreme care to level your camera as perfectly as absolutely possible. Use bubble levels, lasers, RTK GPS, strings and pulleys, whatever you can! Don’t skimp on this step!
- Record a few different scenes.
- Download my supplied samples (link at the bottom of the article) and look at them in an anaglyph viewer to get a feel for how the anaglyph view looks when the profile is calibrated correctly.
- Load up your footage in MistikaVR and start by loading the preset supplied with MistikaVR for your camera.
- Don’t use “Improve offsets” or “Improve angles”. You’ll need to do this manually.
- Start by calibrating the optical centers of each lens.
- After the optical centers have been calibrated, switch back to the stitched view and display the image as “B&W anaglyph”.
- Keep in mind that everything beyond the camera’s perspective convergence point (usually around 15-20 meters) should display no parallax (no red/cyan shift in the anaglyph view).
- The closer an object is, the more parallax it should display.
- Parallax should only ever be in the horizontal plane. If there is variance vertically, something is wrong!
- The amount of parallax should probably be more subtle than you think! Even a small amount means a lot to how your brain “renders” the object. Take a look at my supplied samples to see what amount is reasonable for a given distance.
- You can now start tweaking the placement and pitch/roll/yaw of each individual lens.
- After you get the alignment as good as possible, you might want to tweak the distortion parameters.
- Repeat tweaking for each lens towards better and better alignment.
- The process can take a significant amount of time, and if you don’t feel like reading up on all the theory behind the parameters, you will definitely want to allow yourself some time to “get a feel” for whether you are going in the right direction.
It can be an arduous process, especially if you have multiple cameras, but the results are absolutely worth it!
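If you are unsure what the anaglyph view in the steps above is showing you, this small sketch builds a grayscale red/cyan anaglyph from a left and a right image. It uses a common recipe (red from the left eye, green and blue from the right eye, Rec. 601 luma weights). Whether MistikaVR’s “B&W anaglyph” mode uses exactly this recipe is an assumption, and `to_gray` and `bw_anaglyph` are hypothetical names.

```python
def to_gray(rgb):
    """Rec. 601 luma of an (r, g, b) pixel with 0-255 channels."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def bw_anaglyph(left_img, right_img):
    """Combine two same-sized images (lists of rows of (r, g, b) tuples)
    into a red/cyan anaglyph: red channel from the left eye, green and
    blue channels from the right eye."""
    if len(left_img) != len(right_img):
        raise ValueError("images must have the same height")
    out = []
    for left_row, right_row in zip(left_img, right_img):
        if len(left_row) != len(right_row):
            raise ValueError("images must have the same width")
        out.append([
            (to_gray(lp), to_gray(rp), to_gray(rp))
            for lp, rp in zip(left_row, right_row)
        ])
    return out


if __name__ == "__main__":
    white, black = (255, 255, 255), (0, 0, 0)
    # A white dot that sits one pixel further right in the left-eye image.
    left = [[black, black, white, black]]
    right = [[black, white, black, black]]
    print(bw_anaglyph(left, right))
```

Where both eyes see the same thing, the pixel comes out neutral grey. Where they differ, you get a red or cyan fringe, and the width of that fringe is the parallax you are judging. A fringe above or below an edge, rather than beside it, is the vertical error the list warns about.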
I have supplied three sample images to help you learn how correct stereo should look. They can be used for personal educational purposes, but any other use or publishing is strictly prohibited. If you’re cool with that, you can download them as a zip file here.
Please do keep in mind that these samples are still not 100% perfect. They have small errors here and there, but they are small enough that the brain can accept them comfortably when viewed.
Also of note is that the samples were produced simply by loading the source files into MistikaVR, loading my calibrated preset, and then exporting. No further adjustments were made in MistikaVR. Pretty easy workflow!