Why I'm becoming more bullish on self-driving cars

Eighteen months ago, I viewed the near-term technological feasibility of self-driving cars as a largely unsolved mystery. More recently, companies like Waymo, Cruise, and Zoox have given (direct or indirect) indications that their autonomous vehicles are either superhuman or fast approaching it in certain environments. This is one reason my confidence has been slowly growing.

Another reason is that there have been fundamental advances in AI for each of the three major subdomains of self-driving car AI: perception, prediction, and planning. 

In perception, the main advance is self-supervised learning on unlabelled video. This means using part of a video to predict a probability distribution for another part of a video. Neural networks trained this way automatically learn representations of objects that can later be fine-tuned with a labelled dataset. The combination of unlabelled and labelled data is called semi-supervised learning. 

Self-supervised pre-training scales with data and compute. This means vastly more data can be made useful than would ever be economically possible under a purely hand-labelled paradigm.
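
To make the idea concrete, here is a toy sketch of self-supervised pre-training followed by a semi-supervised fine-tuning step. The linear “encoder” and “predictor”, the array shapes, and the class count are all invented stand-ins for a real deep network, not anyone’s actual system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video" dataset: 100 clips of 8 frames, each frame flattened to 64 numbers.
video = rng.normal(size=(100, 8, 64))

# Linear stand-ins for a deep encoder and predictor.
W_enc = rng.normal(scale=0.1, size=(64, 16))   # frame -> representation
W_pred = rng.normal(scale=0.1, size=(16, 64))  # representation -> next frame

def self_supervised_loss(clip):
    """Predict frames 1..7 from frames 0..6 -- the video labels itself."""
    context, target = clip[:-1], clip[1:]
    predicted = (context @ W_enc) @ W_pred
    return np.mean((predicted - target) ** 2)

pretraining_loss = np.mean([self_supervised_loss(clip) for clip in video])

# Semi-supervised step: reuse the pre-trained encoder and fine-tune only a
# small head on a labelled dataset (here, 3 illustrative object classes).
W_head = rng.normal(scale=0.1, size=(16, 3))
class_scores = (video[0, 0] @ W_enc) @ W_head
```

The point of the sketch is that `self_supervised_loss` never touches a human label; only the small classification head at the end needs annotated data.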

Here are some other important ideas in perception. 

Weak supervision: in a self-driving car context, using human driving input to label images of the surrounding scene. 

Active learning: pulling data selectively from fleets of vehicles using neural networks running in the car that sift and sort through the incoming sensor streams. 

Multi-task learning: sharing learned representations of objects across different perception tasks, so that improvements in one task carry over to the others.
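
Active learning, in particular, lends itself to a short sketch. The logits, entropy threshold, and class count below are all made up for illustration; the idea is simply that the in-car network uploads only the frames it is uncertain about:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def entropy(p):
    """Uncertainty of a class distribution, in nats."""
    return -np.sum(p * np.log(p + 1e-12))

# Per-frame class logits from the in-car network (illustrative numbers):
# frame 0 is confidently one class, frame 1 is genuinely ambiguous,
# frame 2 is confident again.
frames = [
    np.array([9.0, 0.0, 0.0, 0.0, 0.0]),
    np.array([1.1, 1.0, 0.9, 1.0, 1.0]),
    np.array([0.0, 8.0, 0.0, 0.0, 0.0]),
]

# Only ambiguous frames are queued for upload and human labelling.
THRESHOLD = 1.0  # an arbitrary illustrative cutoff
uploaded = [i for i, f in enumerate(frames)
            if entropy(softmax(f)) > THRESHOLD]
print(uploaded)  # -> [1]
```

In a real fleet the filter would be far more sophisticated, but the economics are the same: bandwidth and labelling effort get spent on the rare, confusing cases rather than on millions of boring highway frames.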

A wildcard that Tesla is apparently working on is 3D labelling. It isn’t 100% clear to me how 3D labelling is meant to work, but the speculation that makes the most sense to me is that Tesla is using classical photogrammetry to reconstruct the 3D scene. What human labellers will see is the 3D reconstruction, not the raw images. Elon Musk claims this will enable Tesla to leverage human labour orders of magnitude more efficiently, since labelling a single 3D object creates labels for many 2D video frames. 
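
If the photogrammetry speculation is right, the leverage comes from simple projective geometry: one labelled point in the reconstructed 3D scene projects to a 2D label in every frame that sees it. A bare-bones pinhole-camera sketch, with all coordinates and camera parameters invented for illustration:

```python
import numpy as np

def project(point_3d, camera_pos, focal=1000.0, cx=640.0, cy=360.0):
    """Pinhole projection of a world point into a camera at camera_pos
    (camera axes aligned with world axes, looking down +z)."""
    x, y, z = point_3d - camera_pos
    return np.array([focal * x / z + cx, focal * y / z + cy])

# One object labelled once in the reconstructed 3D scene...
parked_truck = np.array([2.0, 0.5, 30.0])

# ...yields a 2D label in every frame as the car drives toward it.
ego_positions = [np.array([0.0, 0.0, z]) for z in np.arange(0.0, 20.0, 2.0)]
labels_2d = [project(parked_truck, pos) for pos in ego_positions]
print(len(labels_2d))  # one 3D label -> 10 per-frame 2D labels
```

One click by a human labeller in the 3D reconstruction fans out into a label for each of the frames, which is where the claimed order-of-magnitude gain in labelling efficiency would come from.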

In prediction, the fundamental work of framing the future behaviour of road users as a deep learning problem is at an earlier stage than the equivalent work in computer vision. That's the troubling part. The hopeful part is that prediction seems to be getting more love now. With prediction, as opposed to computer vision, it is easier to make the process completely self-supervised. The future itself provides unlimited ground truth labels. Labour is no bottleneck. 
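
A sketch of why labour is no bottleneck for prediction: slicing a single logged trajectory into (past, future) pairs yields many labelled training examples with no human in the loop. The trajectory and window sizes below are synthetic:

```python
import numpy as np

# A logged trajectory of another road user: (x, y) positions, 50 samples
# over 5 seconds (a gentle curve, made up for illustration).
t = np.linspace(0, 5, 50)
trajectory = np.stack([4.0 * t, 0.1 * t ** 2], axis=1)

def make_examples(traj, past=10, future=20):
    """Slice a logged trajectory into (past, future) training pairs.
    The future itself is the label -- no human annotation required."""
    examples = []
    for i in range(len(traj) - past - future + 1):
        examples.append((traj[i:i + past], traj[i + past:i + past + future]))
    return examples

pairs = make_examples(trajectory)
print(len(pairs))  # 50 - 10 - 20 + 1 = 21 labelled examples from one log
```

Every mile a fleet drives produces labelled prediction data at essentially zero marginal cost, which is why miles driven matter so much for this subdomain.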

In planning, the advance that excites me is imitation learning. AlphaStar, an expert-level StarCraft AI by DeepMind, is an astonishing proof of concept of imitation learning. Simply observe many, many, many instances of human behaviour and learn the correlations between the state of the environment and the actions humans take. 

Recent research hopes to take this further, with imitative agents that can learn causation, as opposed to just correlation. In the autonomous vehicle domain, hybrid systems can also be created that fall back on hand-coded planners when the present situation falls outside the scope of the imitative agent’s training dataset. 
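
In its simplest form, behavioural cloning, imitation learning is just supervised learning on logged human behaviour. A minimal sketch with a made-up linear “expert” steering policy (real planners are vastly more complex, but the learning setup is the same):

```python
import numpy as np

rng = np.random.default_rng(2)

# Logged human demonstrations: state = (lateral offset, heading error),
# action = steering command. The hidden "expert" corrects both errors.
states = rng.normal(size=(1000, 2))
actions = -0.5 * states[:, 0] - 1.2 * states[:, 1]

# Behavioural cloning: fit a policy that imitates the expert, here by
# least squares since the toy policy is linear.
policy, *_ = np.linalg.lstsq(states, actions, rcond=None)

# The cloned policy recovers the expert's steering behaviour.
print(np.round(policy, 2))  # -> [-0.5 -1.2]
```

This captures only the correlations in the demonstrations, which is exactly the limitation the causal-imitation research mentioned above is trying to address.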

Ideas like self-supervised video prediction, fully machine learned prediction, and causal imitation learning appear like they may be powerful enough to push self-driving cars across the finish line, especially if existing prototypes are already beating human performance on some metrics in some environments. 

Tesla will soon be bringing Level 2 semi-automated urban driving features to its fleet. Tesla is working on most, if not all, of the ideas mentioned in this newsletter, and it also benefits from 100x more fleet data than any other company. Tesla’s only “disadvantage” relative to others is its eschewal of lidar. But, in principle, nothing is stopping Tesla from using lidar for dedicated robotaxis, regardless of what it does with its consumer vehicles. I continue to believe that the stock market will be blindsided by what happens in the next 1-2 years.


Disclosure: I’m long TSLA.

Disclaimer: This newsletter doesn’t provide financial advice. Invest at your own risk.

Are Zoox's self-driving vehicles already superhuman drivers?

In an interview released on Friday, Jesse Levinson, the Chief Technology Officer of the self-driving car startup Zoox, made a remarkable comment about the capabilities of Zoox’s autonomous vehicles:

...we also measure human driving. So, we’ve had humans drive a lot of those same really challenging routes and we measure when humans make mistakes. And what's pretty exciting is a few months ago we got to the point where our AI system is making fewer mistakes than people do on those routes.

Levinson also said that Zoox’s goal is to achieve a rate of at-fault crashes “about an order of magnitude lower than it is for humans.” The company aspires to deploy a driverless vehicle without a steering wheel or pedals by the end of 2021. This implies Zoox hopes to reach significantly superhuman safety by the end of next year.

It’s difficult for me to accept on trust Levinson’s comment about the error rate for human driving vs. AI driving. As a techno-optimist and robotaxi investor, I’m tempted to believe that Zoox is perhaps the second company to pass this major milestone. But I would be a lot more convinced if I knew the metrics Zoox is using to measure safety and the sample size of miles driven.

The only public data we have on self-driving cars is the rate of safety driver disengagements of the autonomous system. Cruise President and CTO Kyle Vogt published a blog post in January that convincingly argued that disengagements are not a good metric for safety or for apples-to-apples comparisons with human beings. For better insight into how close self-driving cars really are to human capability, we need companies to open up about their testing methodologies and what metrics they’re using internally. And, of course, what numbers they're actually getting.


Disclosure: I am long TSLA.

Waymo is valued at $30 billion by its investors

And the fact of computer science that markets are missing

This week, Waymo raised $2.25 billion from outside investors led by the private equity firm Silver Lake, the Canada Pension Plan Investment Board, and Mubadala, a sovereign wealth fund of the United Arab Emirates. Additional investors included the venture capital firm Andreessen Horowitz, the car dealer AutoNation, the contract carmaker and auto parts maker Magna, and Waymo’s parent company Alphabet. 

According to reporter Richard Waters at The Financial Times, the investment round valued Waymo at $30 billion. By comparison, Cruise, a subsidiary of General Motors, is valued at $19 billion by its outside investors, which include SoftBank, Honda, and T. Rowe Price.

It seems increasingly likely to me that Waymo has internal metrics that show its driverless robotaxis in the Phoenix, Arizona metro area have superhuman safety. Hopefully, Waymo will be able to publicly confirm this hunch by the end of 2020.

Cruise, for its part, seems confident it can cross the human-level threshold within the next few years. 

The last concrete information we got out of Cruise was an internal report that was leaked to the press in June. The report projected (it’s unclear on what basis) that Cruise would be at 5% to 11% of human-level safety by the end of 2019.

The upshot is that robotaxi companies that are perceived to be global leaders are getting valuations of ~$20 billion or $30 billion, even given the uncertainty, skepticism, and sense of risk that pervades today. Here are what I see as the next steps for robotaxi companies:

  1. Publicly release compelling data that indicates superhuman safety.

  2. Attain regulatory approval for commercial operation of driverless vehicles.

  3. Show a positive gross margin that demonstrates a path to long-term net profitability. 

  4. Devise a credible plan to rapidly scale up service to the level of nations and continents.

If those four criteria are satisfied, then I believe we’re looking at a scenario where the global robotaxi market is worth $1 trillion+ collectively, as the equity research firm ARK Invest models.

Autonomous cars are, of course, robots that use deep learning: to perceive the environment, to predict the future, and to plan actions. The performance of deep learning scales with data, sometimes in predictable, lawlike ways. Baidu conducted research that found that, for image recognition, accuracy scales roughly 2x with each 10x increase in data. So, let’s use this knowledge to do a comparison between companies.

In October 2018, Waymo announced its fleet had driven 10 million miles cumulatively. Fourteen months later, in January 2020, it announced it hit 20 million miles. That’s 715,000 miles per month.

Tesla has a fleet of roughly 800,000 cars equipped with 360-degree cameras, a forward-facing radar, ultrasonics, and either a) the “Hardware 2” computer supplied by Nvidia or b) the ~10-20x more powerful Full Self-Driving Computer designed in-house by Tesla (also known as “Hardware 3”). My rough guess is that approximately 400,000 cars have the FSD Computer. Let’s assume these 400,000 cars drive 37 miles per day on average. That’s 440 million miles per month or about 620x more than Waymo. Including all 800,000 cars, Tesla’s fleet drives over 1,200x as much as Waymo’s.

With the scaling rate discovered by Baidu, 620x more data would translate into nearly 7x better performance (if my math is correct). 1,200x more data would result in more than 8x better performance. In my opinion, because investors and analysts don’t appreciate this fact, Tesla is radically mispriced as a robotaxi company relative to Waymo and Cruise. I could be wrong, but as far as I know, investors and analysts broadly attribute almost $0 in value to Tesla as a robotaxi company. (Please email me if you think this might be incorrect.)
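
Here is the arithmetic behind those figures, using the newsletter’s own assumptions (400,000 FSD-Computer cars, 37 miles per car per day, ~30 days per month, and performance scaling as data raised to the power log10(2), i.e. doubling per 10x data):

```python
import math

# Waymo: 10 million fleet miles over 14 months (Oct 2018 -> Jan 2020).
waymo_monthly = 10_000_000 / 14             # ~715,000 miles/month

# Tesla (rough guesses from the text above).
tesla_monthly = 400_000 * 37 * 30           # ~444,000,000 miles/month
data_ratio = tesla_monthly / waymo_monthly  # ~620x

# Baidu-style power law: performance doubles per 10x data,
# i.e. performance ~ data ** log10(2).
performance_ratio = data_ratio ** math.log10(2)
print(round(data_ratio), round(performance_ratio, 1))  # -> 622 6.9
```

Under this (assumed) power law, the ~620x data advantage works out to roughly 6.9x better performance, and the 1,200x figure to about 8.4x.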

It’s true that for what’s known as fully supervised deep learning of computer vision tasks, the bottleneck is manual labelling, rather than miles driven. However, this is not true for self-supervised, semi-supervised, or weakly supervised learning of computer vision tasks. It’s also not true for prediction tasks or planning tasks at all. Moreover, the quality of data used in fully supervised learning scales with miles driven. Companies employ a variety of techniques to automatically curate the most valuable data from their fleets. The more miles driven, the more value. (See an elaboration on all these concepts in my blog post here.)

For example, consider a rare type of wildlife like moose or bears. Or a rare vehicle type like an excavator or tanker truck. A fleet of cars that drives 620x more will encounter 620x more moose, bears, excavators, and tanker trucks. If these objects are rare enough that the bottleneck is finding enough new examples to label, then Tesla’s performance on object recognition for these rare objects will scale with its miles driven. 620x more examples should translate into roughly 7x better performance.

As I see it, in their pricing of Waymo, Cruise, and Tesla, the markets are neglecting a fact of computer science. I think the narrative around Tesla and autonomy will profoundly shift once Tesla finishes its rewrite of Autopilot and ships it to customers. I expect that will most likely happen before the end of this year. The underlying computer science principles, which remain unappreciated by market participants, will translate into visible progress in the production Autopilot system that is used by hundreds of thousands of customers. At that point, I suspect many of Wall Street’s sell-side analysts will scramble to update their views and begin citing Tesla’s data advantage.


Financial disclosure: I own shares of Tesla (TSLA).

Important disclaimer: This newsletter is not intended as financial advice. Invest at your own risk and please consult a professional investment advisor if that is appropriate to your situation.


👋 Want to support my writing and gain access to occasional premium-only newsletters? I would appreciate it so much if you became a paid subscriber. Since I’m just getting started, I’m offering a discount to anyone willing to put their faith in me:

Get 60% off for 1 year

Deep Learning in the Real World

A quick introduction to this newsletter

Wondrous robotic futures await us. Deep learning is coming alive in real robot bodies, whether they are self-driving cars or robot arms in warehouses. This newsletter will be my stream of consciousness as I explore what’s at the edge of possibility.
