
https://ieeexplore.ieee.org/document/8954293
Background
Human beings with two functioning eyeballs have what researchers call stereoscopic (or binocular) vision. In other words, you see with both eyes. However, the brain delivers a cyclopean (one-eyed) perception of that binocular reality. Light entering each eye is focused onto the retina, the retina converts it into neural signals, and the visual cortex fuses those two streams into a single scene. Both inputs to human stereoscopic vision end up perceived as that one cyclopean view.
We humans do fairly well driving our cars while managing to miss one another. Most obstacles in our driving paths get avoided, at least when we are paying attention to the road, because our brains are good at perceiving relative motion, and relative motion is one of the cues behind depth perception. Basic kinematics says that every change in position takes place over some change in time, and we seem to be pretty good at computing those relative differences in a way that keeps most of us safe from harm. Lots of automobile accidents are caused by distracted driving, which is why self-driving cars that always pay attention hold real promise for driving safety.
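To make that "change in position over change in time" idea concrete, here is a minimal sketch (mine, not from any cited source) that turns two range measurements into a closing speed and a rough time-to-contact estimate. The numbers are invented for illustration.

```python
# A minimal sketch: closing speed and time-to-contact from two range samples,
# i.e. the "change in position over change in time" idea described above.

def time_to_contact(range_t0_m, range_t1_m, dt_s):
    """Return (closing_speed_m_s, time_to_contact_s) from two range measurements dt_s apart."""
    closing_speed = (range_t0_m - range_t1_m) / dt_s  # positive when the gap is shrinking
    if closing_speed <= 0:
        return closing_speed, float("inf")            # not closing; no collision predicted
    return closing_speed, range_t1_m / closing_speed

if __name__ == "__main__":
    speed, ttc = time_to_contact(range_t0_m=50.0, range_t1_m=48.0, dt_s=0.1)
    print(f"closing at {speed:.1f} m/s, roughly {ttc:.1f} s to contact")
```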
The Technology Options
Light Detection and Ranging (LiDAR) is a completely non-human way to map out the perceptual field that human vision takes in. A spinning emitter mounted on top of the vehicle fires laser pulses, and from the timing of the reflected pulses it computes the shapes, sizes, and distances of whatever is out there. It does a good job provided there is no fog, rain, snow, or other environmental interference to blind the laser. Beams of light are reflected and refracted by water droplets, which means LiDAR performance degrades sharply the moment the weather turns inclement. That makes LiDAR a poor choice as the primary sensor for self-driving cars, though it is very good at mapping roads.
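The ranging part is simple physics: a pulse goes out, reflects, and comes back, and half the round-trip time multiplied by the speed of light gives the distance. A minimal sketch (mine, not any vendor's firmware):

```python
# LiDAR ranging is time-of-flight: the pulse travels out and back,
# so the one-way distance is c * dt / 2.

SPEED_OF_LIGHT_M_S = 299_792_458.0

def lidar_range_m(round_trip_time_s):
    """Distance to the reflecting surface from a round-trip pulse time."""
    return SPEED_OF_LIGHT_M_S * round_trip_time_s / 2.0

if __name__ == "__main__":
    # A return 200 nanoseconds after emission corresponds to roughly 30 m.
    print(f"{lidar_range_m(200e-9):.1f} m")
```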
Tesla has been successful using cameras and Radio Detection and Ranging (RADAR) to produce a 3D representation of the surrounding space. Radio waves penetrate most things, including weather, and when radar is used in conjunction with cameras, you get to cross-check two very different kinds of measurement against each other. Neural networks “learn” the perceptual field and produce outputs that the self-driving car’s control systems subsequently use to adjust speed and direction. At least that’s one way to grossly oversimplify what’s going on “under the hood”.
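To illustrate why pairing the two sensors helps, here is a toy sketch (not Tesla's pipeline; every name and number in it is invented): radar contributes range and closing speed, cameras contribute object identity and bearing, and a fusion step pairs them into one object list for whatever runs downstream.

```python
# Toy camera + radar fusion: radar is good at range and range rate, cameras are
# good at identifying and localizing objects, so pair each camera detection
# with the nearest radar return in bearing. Illustrative values only.

camera_detections = [{"label": "car", "bearing_deg": -2.0},
                     {"label": "cyclist", "bearing_deg": 15.0}]
radar_returns = [{"bearing_deg": -1.6, "range_m": 42.0, "range_rate_m_s": -3.1},
                 {"bearing_deg": 14.2, "range_m": 18.0, "range_rate_m_s": 0.4}]

def fuse(cams, radars, max_bearing_gap_deg=2.5):
    """Attach range and range rate from the closest radar return to each camera detection."""
    fused = []
    for cam in cams:
        best = min(radars, key=lambda r: abs(r["bearing_deg"] - cam["bearing_deg"]))
        if abs(best["bearing_deg"] - cam["bearing_deg"]) <= max_bearing_gap_deg:
            fused.append({**cam, "range_m": best["range_m"],
                          "range_rate_m_s": best["range_rate_m_s"]})
    return fused

if __name__ == "__main__":
    print(fuse(camera_detections, radar_returns))
```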
A newer approach called pseudo-LiDAR mimics true LiDAR by converting “the estimated depth map from stereo or monocular imagery into a 3D point cloud” (Wang et al., 2019). It is essentially the same depth information, just represented the way a LiDAR would deliver it. The result is that detection accuracy jumps from 22% to 74% for objects within a 30-meter range. LiDAR units are very expensive, whereas consumer-grade optical cameras, such as those found in cell phones, are orders of magnitude cheaper. Pseudo-LiDAR keeps “LiDAR” in its name because the same detection algorithms built for LiDAR run unchanged on this alternate representation of the scene.
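The conversion itself is ordinary pinhole-camera geometry: each pixel with an estimated depth gets back-projected into 3D. Below is a minimal sketch of that step using placeholder intrinsics (fx, fy, cx, cy); it is not the authors' code, and the detection pipeline that consumes the point cloud is not reproduced here.

```python
# Pseudo-LiDAR core idea: back-project an (H, W) depth map into a 3D point
# cloud using the pinhole camera model. Intrinsics below are placeholders.
import numpy as np

def depth_map_to_point_cloud(depth_m, fx, fy, cx, cy):
    """Convert an (H, W) depth map in meters into an (N, 3) point cloud."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth_m
    x = (u - cx) * z / fx                           # right of the optical axis
    y = (v - cy) * z / fy                           # below the optical axis
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                 # drop pixels with no valid depth

if __name__ == "__main__":
    depth = np.full((4, 6), 10.0)                   # toy 4x6 depth map, everything 10 m away
    cloud = depth_map_to_point_cloud(depth, fx=700.0, fy=700.0, cx=3.0, cy=2.0)
    print(cloud.shape)                              # (24, 3)
```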
Pulling It All Together
Copying Nature tends to yield superior results. Human beings do not shoot lasers out of their eyes in order to see what’s around them. The eyeball refracts light through a lens and focuses it on retinal cells that generate electrical signals that the brain processes. That is where the cyclopean view of the perceptual field (your surroundings) comes from. The Weber–Fechner Law models mental scaling (Dehaene, 2003) and the intensity of sense perceptions on a logarithmic scale (Sun et al., 2012). In other words, the brain processes relative differences: equal ratios of stimulus intensity produce roughly equal steps in sensation, which is why such data are naturally plotted in powers of ten. In fact, symmetry obtains in the data associated with neural sensory perception when plotted logarithmically (Miller, 2003), and symmetry bodes well for measurement and prediction in Science. It’s also prettier.
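For the curious, the usual statement of the law is S = k ln(I / I0): perceived magnitude S grows with the logarithm of stimulus intensity I. A minimal sketch follows; the constant k and reference intensity I0 are arbitrary illustrative values.

```python
# Weber-Fechner relation: equal *ratios* of stimulus intensity
# produce roughly equal *steps* of sensation.
import math

def perceived_magnitude(intensity, k=1.0, i0=1.0):
    """Weber-Fechner: S = k * ln(I / I0)."""
    return k * math.log(intensity / i0)

if __name__ == "__main__":
    for intensity in (1, 10, 100, 1000):            # each step is a 10x increase in stimulus
        print(intensity, round(perceived_magnitude(intensity), 2))
    # The output climbs by a constant ~2.3 (= ln 10) per line, despite 10x jumps in input.
```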
The beauty of this research is that it calls upon multiple fields of inquiry. The Physics of motion must be accounted for in the Computer Science. The Neuroscience of how human vision works is also part of Cognitive Psychology, and all of it must be considered in the neural networks that computer scientists and engineers deploy in the service of building computer vision systems. The Biology of the human eye and the physiology of the human nervous system underlie how well we can engineer systems that approximate the wonder of the human system. No single branch of Science can pull this off in isolation. Some fraction of Introductory Physics students eventually become the folks who do this work, but only after learning something about motion, light, and computing.
MISC Video Resources
NSF promo: https://youtu.be/dlRbvlrS8AM
Tesla not using LiDAR: https://youtu.be/lowChJTkHLs
How Tesla trains neural networks to perceive depth: https://youtu.be/LR0bDLCElKg
Unsupervised Monocular Depth Estimation With Left-Right Consistency: https://youtu.be/jI1Qf7zMeIs
Disclaimer
I’m just a Physics Professor who thinks that pointing at the convergence of multiple fields of Science and Engineering is super duper cool. However, the more fields of inquiry you point at in an attempt to explain _________ at the introductory level, the sooner your glossing over of details is bound to become intolerably gross to an expert. I welcome constructive criticism from relevant experts.
Be nice.