Tesla and the Unreasonable Effectiveness of Data

Why does the size of Tesla’s training fleet matter? It numbers well over 1 million cars, while the fleets of all competitors worldwide combined amount to well under 10,000. But is data really that important? Can you really get much more with 1 million+ cars than with 100 or 1,000? Yes. (Note: I am long shares of TSLA.)

A paper and accompanying blog post by Google AI called “Revisiting the Unreasonable Effectiveness of Data” emphasizes that neural networks continue to improve logarithmically as noisily labelled training datasets grow exponentially, up to at least the scale of 300 million examples, as long as the neural network has enough capacity (in terms of size/depth) to absorb the training signal.

From the blog post:

Our first observation is that large-scale data helps in representation learning which in-turn improves the performance on each vision task we study. Our findings suggest that a collective effort to build a large-scale dataset for visual pretraining is important. It also suggests a bright future for unsupervised and semi-supervised representation learning approaches. It seems the scale of data continues to overpower noise in the label space.

Another important excerpt:

It is important to highlight that the training regime, learning schedules and parameters we used are based on our understanding of training ConvNets with 1M images from ImageNet. Since we do not search for the optimal set of hyper-parameters in this work (which would have required considerable computational effort), it is highly likely that these results are not the best ones you can obtain when using this scale of data. Therefore, we consider the quantitative performance reported to be an underestimate of the actual impact of data for all reported image volumes.

Facebook AI later took this even further, using a dataset of 1 billion images.

In certain applications of deep learning – and my contention is that autonomous driving is one of them – the ceiling on neural network performance is imposed by the quantity of available data.

What about labelling? As many papers have shown, techniques such as weak/noisy labelling, self-supervised learning, and automatic curation through methods like active learning can leverage very large quantities of data for better neural network performance without a commensurate increase in hand annotation.


In the autonomous driving subdomain of planning, imitation learning and reinforcement learning can – at least in theory and somewhat in practice already – leverage real-world data to train neural networks without any human annotators in the loop (besides the drivers themselves).

As a general principle of deep learning, it’s not controversial to say performance scales with data and neural network size, with no known limit as of now. It has become less controversial over the last few years that techniques like imitation learning, reinforcement learning, and self-supervised learning are highly promising, and virtually all autonomous vehicle companies as far as I’m aware have started to at least experiment with one or more of them, if not deploy them to their fleet.

If we apply this general principle in the specific case of Tesla, the inference is clear: Tesla is working under a much higher ceiling than everyone else. Everyone else combined, in fact.

It is my strong belief that this advantage will become plain to see over the next few years, perhaps starting as early as this year. I wouldn’t be surprised if, in a few years from now, people said it was always obvious this would happen.

Common objections

What about lidar?

Before you bring up lidar, watch this video and then come back with an argument about why Levandowski is wrong.

Secondly, if lidar really is the secret sauce… Neural networks trained via Tesla’s production fleet can be deployed in any cars. What’s to stop Tesla from using lidar like everyone else, at the same scale as everyone else, with (non-lidar-related) neural networks that far surpass what everyone else has?

The proof is in the pudding!

Indeed, but this amounts to an argument against making predictions in general. Once everything has already happened, it’s too late to predict what will happen. Once the results are in, it’s too late to place a bet. Predicting the future inherently involves speculating about an uncertain unfolding of events.

Waymo already cracked it!

Really? Then why does Waymo not seem to think so? If they really believed they had solved autonomous driving, they would be focused on expanding, on scaling up. They don’t seem to be. This article provides some food for thought.

Another company has more data than Tesla

No, they don’t. Not the kind of data we’re talking about. For example, Uber has a massive amount of GPS data from cars, but that is useless for training an autonomous driving system. Other companies’ data collection operations are just not comparable at all. Mobileye, for instance, doesn’t have the ability to upload sensor data or push new firmware to the car.

Tesla: When Every Input is a Label

Weak supervision and automatic human labelling

The latest version of Tesla’s vehicle software enables Autopilot to automatically stop at stop signs and traffic lights. Currently, Autopilot stops at all traffic lights, including green lights. To continue through a green light, the user has to press on the accelerator or pull the stalk next to the steering wheel. Every time a user does this, they label a traffic light as green. (Elon Musk confirmed this on Tesla’s Q1 2020 earnings call.) In the machine learning world, this is known as weak supervision.

Weak supervision can also be applied to free space: space without obstacles in it. People typically drive through space where there are no obstacles. So, when a person drives somewhere, they label it as free space. Conversely, when a person brakes unexpectedly, that action could potentially be used to label an area as not free space.
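A hypothetical sketch of how such weak labels might be mined from fleet logs. The event names and log schema here are my own invention for illustration; Tesla’s actual telemetry format is not public.

```python
# Hypothetical weak-label extraction from driving logs. The event
# types and log schema are invented for illustration; they are not
# Tesla's actual telemetry format.

def weak_labels(events):
    """Convert raw driver actions into weak training labels."""
    labels = []
    for event in events:
        if event["type"] == "accel_tap_at_light":
            # Driver confirmed it was safe to proceed: a weak "green"
            # label for the camera frames around this timestamp.
            labels.append((event["frame"], "traffic_light_green"))
        elif event["type"] == "drove_through":
            # Space the car physically occupied was, by definition, free.
            labels.append((event["frame"], "free_space"))
        elif event["type"] == "hard_brake":
            # Unexpected braking weakly suggests an obstacle ahead.
            labels.append((event["frame"], "not_free_space"))
    return labels

log = [
    {"type": "accel_tap_at_light", "frame": 101},
    {"type": "drove_through", "frame": 102},
    {"type": "hard_brake", "frame": 103},
]
print(weak_labels(log))
# [(101, 'traffic_light_green'), (102, 'free_space'), (103, 'not_free_space')]
```

The point is that every label arrives for free as a by-product of ordinary driving; the noise in such labels is exactly what the “unreasonable effectiveness” result suggests scale can overpower.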

Tesla's competitive advantage isn't just that it has a fleet of over 800,000 vehicles. It also has a workforce of over 800,000 human labellers. 

Why I'm becoming more bullish on self-driving cars

Eighteen months ago, I viewed the near-term technological feasibility of self-driving cars as a largely unsolvable mystery. More recently, companies like Waymo, Cruise, and Zoox have given (direct or indirect) indications that their autonomous vehicles are either superhuman or fast approaching it in certain environments. This is one reason my confidence has been slowly growing.

Another reason is that there have been fundamental advances in AI for each of the three major subdomains of self-driving car AI: perception, prediction, and planning. 

In perception, the main advance is self-supervised learning on unlabelled video. This means using part of a video to predict a probability distribution for another part of a video. Neural networks trained this way automatically learn representations of objects that can later be fine-tuned with a labelled dataset. The combination of unlabelled and labelled data is called semi-supervised learning. 

Self-supervised pre-training scales with data and compute. This means vastly more data can be made useful than would ever be economically possible under a purely hand-labelled paradigm.
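Concretely, the pretraining objective needs no annotation because the training pairs are cut directly out of the video stream. A minimal sketch of that pair construction (the context length and the integer stand-ins for frames are arbitrary choices of mine):

```python
def make_pretraining_pairs(frames, context_len=4):
    """Slice an unlabelled video into (context, target) training pairs:
    the network sees `context_len` frames and must predict the next one.
    No human annotation is involved at any point."""
    pairs = []
    for t in range(len(frames) - context_len):
        context = frames[t:t + context_len]
        target = frames[t + context_len]  # ground truth comes from the video itself
        pairs.append((context, target))
    return pairs

video = list(range(10))  # stand-in for a 10-frame clip
pairs = make_pretraining_pairs(video)
print(len(pairs))  # 6 training pairs from a 10-frame clip
```

Every additional mile of recorded video yields more such pairs, which is why this objective scales with fleet size rather than with labelling budget.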

Here are some other important ideas in perception. 

Weak supervision: in a self-driving car context, using human driving input to label images of the surrounding scene. 

Active learning: pulling data selectively from fleets of vehicles using neural networks running in the car that sift and sort through the incoming sensor streams. 

Multi-task learning: sharing learned representations of objects across different perception tasks in order to create synergy.
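Of these, active learning is the most mechanical to sketch. One common form is uncertainty sampling: the on-board network scores each frame, and only the frames it is least sure about are uploaded for labelling. A generic sketch (the threshold, data format, and class counts are illustrative, not any company’s actual system):

```python
import math

def entropy(probs):
    """Shannon entropy of a class distribution: higher = less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_upload(frames, budget=2):
    """Keep only the frames the in-car model is most uncertain about."""
    ranked = sorted(frames, key=lambda f: entropy(f["probs"]), reverse=True)
    return ranked[:budget]

frames = [
    {"id": "a", "probs": [0.98, 0.01, 0.01]},  # confident: skip
    {"id": "b", "probs": [0.40, 0.35, 0.25]},  # very uncertain: upload
    {"id": "c", "probs": [0.55, 0.30, 0.15]},  # somewhat uncertain: upload
]
print([f["id"] for f in select_for_upload(frames)])  # ['b', 'c']
```

The larger the fleet, the more rare and confusing frames pass through this filter per day, which is how raw miles translate into curated training data.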

A wildcard that Tesla is apparently working on is 3D labelling. It isn’t 100% clear to me how 3D labelling is meant to work, but the speculation that makes the most sense to me is that Tesla is using classical photogrammetry to reconstruct the 3D scene. What human labellers will see is the 3D reconstruction, not the raw images. Elon Musk claims this will enable Tesla to leverage human labour orders of magnitude more efficiently, since labelling a single 3D object creates labels for many 2D video frames. 

In prediction, the problem is that the fundamental work of recasting the forecasting of road users’ future behaviour as a deep learning problem is at an earlier stage than in computer vision. That’s the troubling part. The hopeful part is that prediction seems to be getting more attention now. Unlike computer vision, prediction is easy to make completely self-supervised: the future itself provides unlimited ground-truth labels, so labour is no bottleneck.
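“The future itself provides unlimited ground truth” can be made concrete: a prediction dataset falls straight out of recorded trajectories, because the label for time t is simply what the road user actually did afterwards. A toy sketch (the window lengths and coordinates are arbitrary):

```python
def trajectory_dataset(track, history=3, horizon=2):
    """Build (past, future) supervised pairs from one observed trajectory.
    The 'label' is just what happened next -- no annotator needed."""
    examples = []
    for t in range(history, len(track) - horizon + 1):
        past = track[t - history:t]      # what the model would observe
        future = track[t:t + horizon]    # ground truth, courtesy of time
        examples.append((past, future))
    return examples

# Observed positions of another car, one (x, y) per timestep
track = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (5, 3)]
for past, future in trajectory_dataset(track):
    print(past, "->", future)
```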

In planning, the advance that excites me is imitation learning. AlphaStar, an expert-level StarCraft AI by DeepMind, is an astonishing proof of concept of imitation learning. Simply observe many, many, many instances of human behaviour and learn the correlations between the state of the environment and the actions humans take. 

Recent research hopes to take this further, with imitative agents that can learn causation, as opposed to just correlation. In the autonomous vehicle domain, hybrid systems can also be created that fall back on hand-coded planners when the present situation falls outside the scope of the imitative agent’s training dataset. 
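A deliberately tiny sketch of behavioural cloning, the simplest form of imitation learning, including the hand-coded fallback for out-of-distribution states that the hybrid systems above rely on. Nearest-neighbour lookup stands in for the neural network a real system would train, and all numbers are made up:

```python
def clone_policy(demonstrations, fallback, max_gap=15.0):
    """Imitate logged (state, action) pairs from human drivers.
    A real system would fit a neural network to these pairs; nearest
    neighbour stands in for it to keep the sketch tiny."""
    def policy(state):
        nearest = min(demonstrations, key=lambda d: abs(d[0] - state))
        if abs(nearest[0] - state) > max_gap:
            # Outside the demonstrations' coverage: hand control back
            # to a hand-coded planner, as hybrid systems do.
            return fallback(state)
        return nearest[1]
    return policy

# state = distance to the car ahead (metres), action = what the human did
demos = [(5.0, "brake"), (20.0, "coast"), (50.0, "accelerate")]
policy = clone_policy(demos, fallback=lambda s: "hand_coded_stop")

print(policy(7.0))    # near a demonstrated state: imitate the human
print(policy(500.0))  # far from any demonstration: hand-coded fallback
```

Note that the training data here is exactly what a large fleet produces as a by-product of normal operation: states paired with human actions.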

Ideas like self-supervised video prediction, fully machine learned prediction, and causal imitation learning appear like they may be powerful enough to push self-driving cars across the finish line, especially if existing prototypes are already beating human performance on some metrics in some environments. 

Tesla will soon be bringing Level 2 semi-automated urban driving features to its fleet. Tesla is working on most if not all of the ideas mentioned in this newsletter, and it also benefits from 100x more fleet data than any other company. Tesla’s only “disadvantage” relative to others is its eschewal of lidar. But, in principle, nothing is stopping Tesla from using lidar for dedicated robotaxis, regardless of what it does with its consumer vehicles. I continue to believe that the stock market will be blindsided by what happens in the next 1-2 years.

Disclosure: I’m long TSLA.

Disclaimer: This newsletter doesn’t provide financial advice. Invest at your own risk.

Are Zoox's self-driving vehicles already superhuman drivers?

In an interview released on Friday, Jesse Levinson, the Chief Technology Officer of the self-driving car startup Zoox, made a remarkable comment about the capabilities of Zoox’s autonomous vehicles:

...we also measure human driving. So, we’ve had humans drive a lot of those same really challenging routes and we measure when humans make mistakes. And what's pretty exciting is a few months ago we got to the point where our AI system is making fewer mistakes than people do on those routes.

Levinson also said that Zoox’s goal is to achieve a rate of at-fault crashes “about an order of magnitude lower than it is for humans.” The company aspires to deploy a driverless vehicle without a steering wheel or pedals by the end of 2021. This implies Zoox hopes to reach significantly superhuman safety by the end of next year.

It’s difficult for me to accept on trust Levinson’s comment about the error rate for human driving vs. AI driving. As a techno-optimist and robotaxi investor, I’m tempted to believe that Zoox is perhaps the second company to pass this major milestone. But I would be a lot more convinced if I knew the metrics Zoox is using to measure safety and the sample size of miles driven.

The only public data we have on self-driving cars is the rate of safety driver disengagements of the autonomous system. Cruise President and CTO Kyle Vogt published a blog post in January that convincingly argued that disengagements are not a good metric for safety or for apples-to-apples comparisons with human beings. For better insight into how close self-driving cars really are to human capability, we need companies to open up about their testing methodologies, the metrics they use internally, and, of course, the numbers they’re actually getting.

Disclosure: I am long TSLA.

Waymo is valued at $30 billion by its investors

And the fact of computer science that markets are missing

This week, Waymo raised $2.25 billion from outside investors led by the private equity firm Silver Lake, the Canada Pension Plan Investment Board, and Mubadala, a sovereign wealth fund of the United Arab Emirates. Additional investors included the venture capital firm Andreessen Horowitz, the car dealer AutoNation, the contract carmaker and auto parts maker Magna, and Waymo’s parent company Alphabet. 

According to reporter Richard Waters at The Financial Times, the investment round valued Waymo at $30 billion. By comparison, Cruise, a subsidiary of General Motors, is valued at $19 billion by its outside investors, which include SoftBank, Honda, and T. Rowe Price.

It seems increasingly likely to me that Waymo has internal metrics that show its driverless robotaxis in the Phoenix, Arizona metro area have superhuman safety. Hopefully, Waymo will be able to publicly confirm this hunch by the end of 2020.

Cruise, for its part, seems confident it can cross the human-level threshold within the next few years.

The last concrete information we got out of Cruise was an internal report that was leaked to the press in June. The report projected (it’s unclear on what basis) that Cruise would be at 5% to 11% of human-level safety by the end of 2019.

The upshot is that robotaxi companies that are perceived to be global leaders are getting valuations of ~$20 billion or $30 billion, even given the uncertainty, skepticism, and sense of risk that pervades today. Here are what I see as the next steps for robotaxi companies:

  1. Publicly release compelling data that indicates superhuman safety.

  2. Attain regulatory approval for commercial operation of driverless vehicles.

  3. Show a positive gross margin that demonstrates a path to long-term net profitability. 

  4. Devise a credible plan to rapidly scale up service to the level of nations and continents.

If those four criteria are satisfied, then I believe we’re looking at a scenario where the global robotaxi market is worth $1 trillion+ collectively, as the equity research firm ARK Invest models.

Autonomous cars are, of course, robots that use deep learning: to perceive the environment, to predict the future, and to plan actions. The performance of deep learning scales with data, sometimes in predictable, lawlike ways. Baidu conducted research that found, for image recognition, accuracy scales roughly 2x with each 10x increase in data. So, let’s use this knowledge to do a comparison between companies.

In October 2018, Waymo announced its fleet had driven 10 million miles cumulatively. Fourteen months later, in January 2020, it announced it had hit 20 million miles. That’s roughly 715,000 miles per month.

Tesla has a fleet of roughly 800,000 cars equipped with 360-degree cameras, a forward-facing radar, ultrasonics, and either a) the “Hardware 2” computer supplied by Nvidia or b) the ~10-20x more powerful Full Self-Driving Computer designed in-house by Tesla (also known as “Hardware 3”). My rough guess is that approximately 400,000 cars have the FSD Computer. Let’s assume these 400,000 cars drive 37 miles per day on average. That’s 440 million miles per month, or about 620x more than Waymo. Including all 800,000 cars, Tesla’s fleet drives over 1,200x as much as Waymo.

With the scaling rate discovered by Baidu, 620x more data would translate into about 6x better performance (if my math is correct). 1,200x more data would result in more than 8x better performance. In my opinion, because investors and analysts don’t appreciate this fact, Tesla is radically mispriced as a robotaxi company relative to Waymo and Cruise. I could be wrong, but as far as I know, investors and analysts broadly attribute almost $0 in value to Tesla as a robotaxi company. (Please email me if you think this might be incorrect.)
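Baidu’s rule of thumb (~2x accuracy per 10x of data) implies a power law, so the gain for an arbitrary data ratio is 2 raised to the log10 of that ratio. Checking the fleet ratios above (the rule is an empirical extrapolation from image recognition, not a guarantee it holds for driving):

```python
import math

def scaling_gain(data_ratio, gain_per_decade=2.0):
    """Performance multiplier implied by a 2x gain per 10x of data."""
    return gain_per_decade ** math.log10(data_ratio)

print(round(scaling_gain(620), 1))   # ~6.9x: slightly above the conservative ~6x above
print(round(scaling_gain(1242), 1))  # ~8.5x: the "more than 8x" figure
```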

It’s true that for what’s known as fully supervised deep learning of computer vision tasks, the bottleneck is manual labelling, rather than miles driven. However, this is not true for self-supervised, semi-supervised, or weakly supervised learning of computer vision tasks. It’s also not true for prediction tasks or planning tasks at all. Moreover, the quality of data used in fully supervised learning scales with miles driven. Companies employ a variety of techniques to automatically curate the most valuable data from their fleets. The more miles driven, the more value. (See an elaboration on all these concepts in my blog post here.)

For example, consider a rare type of wildlife like moose or bears. Or a rare vehicle type like an excavator or tanker truck. A fleet of cars that drives 620x more will encounter 620x more moose, bears, excavators, and tanker trucks. If these objects are rare enough that the bottleneck is finding enough new examples to label, then Tesla’s performance on object recognition for these rare objects will scale with its miles driven. 620x more examples will lead to 6x better performance.

As I see it, in their pricing of Waymo, Cruise, and Tesla, the markets are neglecting a fact of computer science. I think the narrative around Tesla and autonomy will profoundly shift once Tesla finishes its rewrite of Autopilot and ships it to customers. I expect that will most likely happen before the end of this year.  The underlying computer science principles, which remain unappreciated by market participants, will translate into visible progress in the production Autopilot system that is used by hundreds of thousands of customers. At that point, I suspect many of Wall Street’s sell-side analysts will scramble to update their views and begin citing Tesla’s data advantage.

Financial disclosure: I own shares of Tesla (TSLA).

Important disclaimer: This newsletter is not intended as financial advice. Invest at your own risk and please consult a professional investment advisor if that is appropriate to your situation.

👋 Want to support my writing and gain access to occasional premium-only newsletters? I would appreciate it so much if you became a paid subscriber. Since I’m just getting started, I’m offering a discount to anyone willing to put their faith in me.
