Why I'm becoming more bullish on self-driving cars
Eighteen months ago, I viewed the near-term technological feasibility of self-driving cars as a largely unsolvable mystery. More recently, companies like Waymo, Cruise, and Zoox have given (direct or indirect) indications that their autonomous vehicles are either superhuman or fast approaching it in certain environments. This is one reason my confidence has been slowly growing.
Another reason is that there have been fundamental advances in AI for each of the three major subdomains of self-driving car AI: perception, prediction, and planning.
In perception, the main advance is self-supervised learning on unlabelled video. This means using part of a video to predict a probability distribution over another part of the same video. Neural networks trained this way automatically learn representations of objects that can later be fine-tuned with a labelled dataset. The combination of unlabelled and labelled data is called semi-supervised learning.
Self-supervised pre-training scales with data and compute. This means vastly more data can be made useful than would ever be economically possible under a purely hand-labelled paradigm.
Here are some other important ideas in perception.
Weak supervision: in a self-driving car context, using human driving input to label images of the surrounding scene.
Active learning: pulling data selectively from fleets of vehicles using neural networks running in the car that sift and sort through the incoming sensor streams.
Multi-task learning: sharing learned representations of objects across different perception tasks in order to create synergy.
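To make the active learning idea above a bit more concrete, here is a minimal sketch of one common selection rule: upload the frames the on-board network is least sure about. Everything here is an assumption for illustration (the `select_for_upload` name, the frame dictionaries, and the use of entropy as the uncertainty score); it is not a description of how any particular company's fleet actually does it.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_upload(frames, budget):
    """Return the ids of the `budget` frames with the most uncertain predictions."""
    ranked = sorted(frames, key=lambda f: entropy(f["probs"]), reverse=True)
    return [f["id"] for f in ranked[:budget]]


# Hypothetical class probabilities produced by a network running in the car.
frames = [
    {"id": "a", "probs": [0.98, 0.01, 0.01]},  # confident, little to learn from
    {"id": "b", "probs": [0.40, 0.35, 0.25]},  # very unsure, worth labelling
    {"id": "c", "probs": [0.70, 0.20, 0.10]},  # somewhat unsure
]

print(select_for_upload(frames, budget=2))  # ['b', 'c']
```

The point of the sketch is only that the sorting happens in the car, so bandwidth and labelling effort are spent on the examples most likely to improve the network.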
A wildcard that Tesla is apparently working on is 3D labelling. It isn’t 100% clear to me how 3D labelling is meant to work, but the speculation that makes the most sense to me is that Tesla is using classical photogrammetry to reconstruct the 3D scene. What human labellers will see is the 3D reconstruction, not the raw images. Elon Musk claims this will enable Tesla to leverage human labour orders of magnitude more efficiently, since labelling a single 3D object creates labels for many 2D video frames.
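If the speculation above is right, the efficiency gain comes from simple geometry: once an object has a position in the reconstructed 3D scene, that one label can be projected into every frame that saw it. The sketch below uses a textbook pinhole camera with no rotation and no lens distortion. The function name, camera intrinsics, and positions are all made up for illustration, and nothing here is claimed to be Tesla's actual pipeline.

```python
def project(point, camera_position, focal_length, cx, cy):
    """Pinhole projection of a world point into a camera looking down +z.

    Assumes no rotation and no lens distortion. Returns pixel coordinates,
    or None if the point is behind the camera.
    """
    x = point[0] - camera_position[0]
    y = point[1] - camera_position[1]
    z = point[2] - camera_position[2]
    if z <= 0:
        return None
    return (focal_length * x / z + cx, focal_length * y / z + cy)


# One hypothetical 3D label: an object 2 m to the right and 20 m ahead.
labelled_point = (2.0, 0.0, 20.0)

# Hypothetical camera positions as the car drives forward along z.
camera_positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.0, 0.0, 10.0)]

# A single 3D label yields one 2D label per frame.
pixel_labels = [
    project(labelled_point, cam, focal_length=1000.0, cx=640.0, cy=360.0)
    for cam in camera_positions
]
print(pixel_labels)
```

A human clicks once in 3D; the loop produces a 2D label for each frame, which is where the claimed orders-of-magnitude saving would come from.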
In prediction, the problem is that the fundamental work required to turn the prediction of road users' future behaviour into a deep learning problem is at an earlier stage than the equivalent work in computer vision. That's the troubling part. The hopeful part is that prediction seems to be getting more love now. With prediction, as opposed to computer vision, it is easier to make the process completely self-supervised: whatever a road user actually does next becomes the label for what the model should have predicted. The future itself provides unlimited ground truth labels. Labour is no bottleneck.
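Here is a minimal sketch of what "the future provides the labels" can look like in practice. A logged track of positions is sliced into (past, future) pairs, and the future half of each pair is the training target, so no human has to annotate anything. The function name, window sizes, and the toy track are assumptions for illustration only.

```python
def make_training_pairs(track, history, horizon):
    """Slice one logged track into (past, future) pairs.

    `past` is the model input; `future` is the ground truth label,
    taken directly from what happened next in the log.
    """
    pairs = []
    for t in range(history, len(track) - horizon + 1):
        past = track[t - history:t]
        future = track[t:t + horizon]
        pairs.append((past, future))
    return pairs


# Hypothetical (x, y) positions of one road user, one sample per time step.
track = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.3), (3.0, 0.6), (4.0, 1.0), (5.0, 1.5)]

pairs = make_training_pairs(track, history=3, horizon=2)
for past, future in pairs:
    print(past, "->", future)
```

Every extra mile of logged driving produces more pairs like these for free, which is why labour stops being the bottleneck.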
In planning, the advance that excites me is imitation learning. AlphaStar, an expert-level StarCraft II AI by DeepMind, is an astonishing proof of concept of imitation learning. The approach is simple: observe many, many, many instances of human behaviour and learn the correlations between the state of the environment and the actions humans take.
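In its simplest form (often called behavioural cloning), imitation learning is just supervised learning on state-action pairs. The toy sketch below fits a straight line from a one-dimensional state to a one-dimensional action by least squares. Real systems use deep networks and far richer states; the function name, the choice of a linear model, and the demonstration numbers are all assumptions for illustration.

```python
def fit_linear_policy(states, actions):
    """Least-squares fit of action = w * state + b from demonstration pairs."""
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    w = cov / var
    b = mean_a - w * mean_s
    return lambda state: w * state + b


# Hypothetical human demonstrations:
# state  = lateral offset from the lane centre (metres, positive = right)
# action = steering command (positive = steer right)
offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]
steering = [0.5, 0.25, 0.0, -0.25, -0.5]

policy = fit_linear_policy(offsets, steering)
print(policy(0.2))  # steers gently back toward the centre
```

Note that the policy has only captured a correlation in the demonstrations. It has no notion of why humans steer back toward the centre.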
Recent research hopes to take this further, with imitative agents that can learn causation, as opposed to just correlation. In the autonomous vehicle domain, hybrid systems can also be created that fall back on hand-coded planners when the present situation falls outside the scope of the imitative agent’s training dataset.
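The hybrid idea can be sketched in a few lines: trust the imitative agent only when the current state resembles something in its training data, and otherwise hand control to a hand-coded planner. The distance check used here (nearest training state within a threshold) is a deliberately crude stand-in for real out-of-distribution detection, and every name and number below is hypothetical.

```python
def hybrid_plan(state, learned_policy, fallback_planner, seen_states, max_distance):
    """Pick an action, falling back when `state` is far from the training data.

    Returns a (source, action) tuple so the caller can see which planner acted.
    """
    nearest = min(abs(state - s) for s in seen_states)
    if nearest <= max_distance:
        return "learned", learned_policy(state)
    return "fallback", fallback_planner(state)


# Hypothetical training states (lateral offset in metres) and toy planners.
seen_states = [-1.0, -0.5, 0.0, 0.5, 1.0]


def learned_policy(state):
    return -0.5 * state  # stand-in for an imitation-learned policy


def fallback_planner(state):
    return 0.0  # stand-in for a conservative hand-coded planner


print(hybrid_plan(0.3, learned_policy, fallback_planner, seen_states, 0.25))
print(hybrid_plan(3.0, learned_policy, fallback_planner, seen_states, 0.25))
```

The first call lands near a training state and uses the learned policy; the second is far outside anything the agent has seen, so the hand-coded planner takes over.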
Ideas like self-supervised video prediction, fully machine-learned prediction, and causal imitation learning look as though they may be powerful enough to push self-driving cars across the finish line, especially if existing prototypes are already beating human performance on some metrics in some environments.
Tesla will soon be bringing Level 2 semi-automated urban driving features to its fleet. Tesla is working on most if not all of the ideas mentioned in this newsletter and it also benefits from 100x more fleet data than any other company. Tesla’s only “disadvantage” relative to others is its eschewal of lidar. But, in principle, nothing is stopping Tesla from using lidar for dedicated robotaxis, regardless of what it does with its consumer vehicles. I continue to believe that the stock market will be blindsided by what happens in the next 1-2 years.
Disclosure: I’m long TSLA.
Disclaimer: This newsletter doesn’t provide financial advice. Invest at your own risk.