Reflections from RSS: Three reasons DL fails at autonomy
Last week I had the pleasure of attending, and presenting at, the annual Robotics: Science and Systems (RSS) conference in Daegu, South Korea. RSS ranks amongst the most prestigious of the international robotics conferences, and brings together researchers from academia and industry to present recent advances, highlight ongoing challenges, and define future research directions. So was there an overarching theme to RSS ‘23? For me, it was the attempt to leverage deep-learning (DL) models designed for computer vision and natural language processing to solve problems in robotics. And despite some successes, I couldn’t shake the feeling that fundamental issues remain that prevent such approaches from addressing the hard problems in autonomy.
Deep-learning models have revolutionized, indeed cannibalized, entire research fields – namely computer vision (e.g. AlexNet outstripping the competition in the 2012 ImageNet Challenge), machine learning (DeepMind’s algorithms learning to play Atari games in 2013), and natural language processing (ChatGPT and its peers, launched from late 2022). Given sufficient data and computational resources, DL-based models have outperformed not only hand-engineered methods, but even humans themselves (recall AlphaGo beating Lee Sedol at Go in 2016). So the logic of bringing these models to robotics appears sound, and at RSS we were treated to some excellent examples of solutions progressing towards real-world deployment in use cases such as sorting waste in office buildings and picking in a logistics warehouse.
Yet, behind these (and other) impressive demonstrations lie three issues that, I believe, will keep DL models confined to the ‘soft’ problems in robotics (i.e. those in which the task or world is in some way constrained) rather than the ‘hard’ problems that must be solved if we are to create truly general-purpose robots:
- If you can’t be right, at least be sure when you’re wrong. At the industry-focused session, with a panel made up of researchers from Samsung, Amazon, Toyota and XYZ Robotics, it was broadly agreed that models reporting imperfect accuracy scores may still be good enough for industrial take-up, with the caveat that failures must (a) be safe, and (b) not require human intervention at every instance (e.g. putting hard-to-sort packages aside for a human to batch sort the next day would be fine). Given that failures will always happen, robots that have an accurate measure of their own confidence, and behave accordingly, represent the future of robotics (see the sketch after this list). This should worry those looking to transplant generative AI models, such as large language models (LLMs), into robotics, as they have become renowned for their hubris, even when wrong. Even more worryingly, LLMs have been found to completely fabricate results (hallucinating, in the industry lingo – see Gary Marcus’ blog). Proponents of the DL approach will argue that self-awareness can be tackled using the same models, but this will involve model redesign, retraining, and maybe even new datasets (see below).
- Near-term competitive advantage beats interpretability every time. Another argument from the industry panel was that economic realities mean any solution providing a ‘commercial win’ will be rapidly adopted, even if it is completely black-box and uninterpretable. Safety concerns could be addressed by removing humans from the workspace, controlling environmental conditions, etc., as is the current trend in logistics and warehouse robotics (Why online supermarket Ocado wants to take the human touch out of groceries). I suspect that in this case DL models will produce a suitable solution, but this approach falls squarely in the ‘narrow AI’ / ‘soft autonomy’ domain, where a solution performs well only in a limited range of predefined tasks in a predefined space, or under specific constraints. This focus on singular problem spaces is unlikely to deliver the great leap needed for general autonomy.
- All that is needed is more, and better curated, data. A common critique of the DL approach is that it requires vast amounts of data and computing resources to train models (training GPT-3 is estimated to have emitted 552 tonnes of CO2). Energy consumption will only increase in the robotics domain, where data is often captured from many high-dimensional sensors over long periods of time, and across multiple settings and use cases. Such a trove is needed to provide the breadth of data required for models to generalize. Indeed, entire tracks of the conference were dedicated to efforts to generate ever more data via simulation, automated long-term real-world data capture, and data augmentation, while reducing known issues in transferring models from simulated to real-world settings. But even if we could generate datasets representative of every use case a robot could face, key questions remain. Firstly, is a perfect real-world dataset all that stands between us and an AI that is general, and efficient enough to be deployed on a mobile robot? This assumption remains questionable due to known issues in the current DL process (see Opteran Chief Scientific Officer James Marshall’s recent article). Secondly, given the tendency of generative AI models to hallucinate (Warning, do not ask an AI image generator to do cycling), will the augmented datasets they simulate be representative of the real world, or will models end up contaminating their own training data? And finally, in an age of climate change and energy insecurity, is it feasible, or even ethical, to expend so much computing and energy resources training a model to sort our recycling correctly?
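To make the first bullet concrete, here is a minimal, hypothetical sketch of a confidence-thresholded sorting loop: the robot acts only when its model’s confidence clears a threshold, and otherwise defers the item for human batch sorting. The Prediction class, predict() stub, and CONFIDENCE_THRESHOLD value are illustrative assumptions, not any particular vendor’s API.

```python
# Hypothetical sketch: act on confident predictions, defer the rest safely.
import random
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str         # e.g. "cardboard", "plastic", "metal"
    confidence: float   # model's estimated probability of being correct, in [0, 1]


def predict(item_id: int) -> Prediction:
    """Stand-in for a perception model; returns a label with a confidence score."""
    labels = ["cardboard", "plastic", "metal"]
    return Prediction(label=random.choice(labels), confidence=random.random())


CONFIDENCE_THRESHOLD = 0.9  # illustrative value; set so acting on a prediction is acceptably safe


def sort_item(item_id: int) -> str:
    pred = predict(item_id)
    if pred.confidence >= CONFIDENCE_THRESHOLD:
        return f"item {item_id}: sorted as {pred.label} (confidence {pred.confidence:.2f})"
    # Low confidence: fail safely and cheaply by deferring, rather than guessing.
    return f"item {item_id}: set aside for human batch sorting (confidence {pred.confidence:.2f})"


if __name__ == "__main__":
    for i in range(5):
        print(sort_item(i))
```

The catch, of course, is that such a scheme only works if the confidence estimate is itself well calibrated; a model that reports high confidence while wrong defeats the whole point.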
To sum up, DL models have had industry-leading success in solving problems for which there is (a) a large data corpus, (b) vast computational and energy resources, and (c) a finite set of target outcomes (image classification, playing computer games). It is no surprise that these advances have coincided with the growth of the dominant data economy, with large tech companies capturing, hoarding and mining huge data stores in ever-larger data centers. Data has indeed become the modern commodity of choice (Data Is the New Oil of the Digital Economy). Yet this fixation on the power of data may also prove an Achilles heel, as the same approach is reapplied in areas such as robotics, for which it may be less suited. This presents enormous opportunities for disruptors like Opteran that solve the hard autonomy problem with entirely new solutions. That is, by reverse-engineering natural brain algorithms, Opteran solves hard autonomy without the constraints of deep-learning algorithms, and is reaping the rewards. See www.opteran.com for more information.
Mike Mangan
VP Research, Opteran