How brain biomimicry pioneers a ‘third way’ for engineering autonomy!
Guest blog: Gur Kimchi, ex-head of Amazon Prime Air, discusses the benefits of biomimicry – the safety, diversity, and adaptability necessary for effective autonomy at incredibly low SWaP-C, without massive data collection, data labeling, or model training.
I’ve been involved in the development of Robotic, Autonomy & Remote Sensing systems (and their enabling subsystems and underlying data) for many years – decades at this point! A few years ago at Amazon, my Prime Air team built one of the first FAA (and EASA)-certifiable aircraft autonomy systems – able to perform some of the same roles & responsibilities as a professional pilot.
While building Prime Air was incredibly complex, we were able to leverage an existing family of techniques from the “toolbox of AI”, including MDO (Multidisciplinary Design Optimization, allowing us to trade various design choices at scale), System Theory (enabling a robust Autopilot that can handle both nominal and emergency cases), Machine Learning (Supervised, Unsupervised, et al.), and Photogrammetry (both online and offline). The experience we share as an industry across multiple technology domains is what enables our emerging advanced Robotic applications to be built – from “self-flying” airplanes, to “self-driving” robotaxis, to industrial autonomy (for example, warehouse logistics), to home automation (for example, dog training).
One of the fundamental lessons I learned early in my engineering career (and which we deeply implemented at Prime Air) was how critical it is to integrate Safety Engineering*1 into every step of product development from “day one” (a trigger-word if ever there was one for current and past Amazonians); like Scalability or Security, one cannot easily add “Safety” to a product “later”. IMHO, a key complexity of safety engineering when building “open world” robotic systems (as opposed to systems that operate in constrained & deterministic environments) is deeply respecting the fact that the world is a “Continuous Black-Swan Scenario Generator”: as the use of our system in the real world scales up, our autonomous system will encounter unknown-unknown scenarios that happen exactly once (scenarios that have never been observed before, and will never be observed again) at an essentially ~constant rate.
As almost everyone knows at this point – if you read any (say) self-driving robotaxi project blog – Robotic systems are often “trained” to react correctly (in terms of functionality, performance, and safety) within their operating environment as defined by their Concept of Operations (also referred to as the “ConOps”). Training can be an incredibly expensive proposition, with companies publishing “millions of miles driven / simulated / labeled” metrics to show how they scale their systems to handle the rich complexities of our world.
As testing in the real world is quite expensive (and often dangerous when testing high-risk edge-cases), simulations are used by virtually all major programs to increase both scale and diversity, and these simulations are often built using technologies derived from game-engines. Another important tool is to modify real-world captured or simulated data (for example using techniques from the Image-Based Rendering toolbox) to create variations: changing the field-of-view, inserting artificial objects, adding motion blur, fog, rain, snow, dust, sensor issues (mud on cameras, sun-shining-into-lens artifacts, readout timing issues), et al., to ensure the system is able to react correctly (i.e. safely) to whatever the robot will encounter in the (essentially infinitely flexible) world. Philosophically, a training-based approach assumes that at some point we run out of “interesting” new scenarios to teach the system to handle, and can call the product “ready”.
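To make that “variations” idea concrete, here is a minimal sketch of synthetic augmentation using NumPy only – the function names, parameters, and random placeholder frame are purely illustrative, not any program’s actual tooling:

```python
import numpy as np

def add_fog(img, density=0.4, fog_gray=0.8):
    """Blend the image toward a uniform gray to mimic fog/haze."""
    fog = np.full_like(img, fog_gray)
    return (1.0 - density) * img + density * fog

def add_motion_blur(img, length=7):
    """Horizontal motion blur via a simple 1-D box-filter convolution."""
    kernel = np.ones(length) / length
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, img)

def add_sensor_noise(img, sigma=0.02, seed=0):
    """Additive Gaussian noise approximating sensor read-out noise."""
    rng = np.random.default_rng(seed)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

# Start from a captured or simulated frame (here: a random placeholder),
# then stack variations to grow the training/evaluation set.
frame = np.random.default_rng(1).random((120, 160))   # grayscale, values in [0, 1]
variants = [add_fog(frame), add_motion_blur(frame), add_sensor_noise(frame)]
```

In a real pipeline these transforms would be applied at scale to captured or rendered frames, and the augmented set fed back into training and evaluation.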
But here’s the problem – if our system has to react correctly to exactly-once (“one-off”) events that are (by definition!) not in the training data, the training-only approach is at least insufficient for many applications, and for some applications – where the consequence of failure can be deadly – the training-only approach may be entirely wrong. I usually think about this as a head vs. tail decomposition (following Zipf’s law): we can train for head events (they repeat, so training makes sense), but the deeper we go into the tail, the lower the chance of an event repeating, until we eventually reach the “longest tail” where events never repeat.
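To make the head/tail picture concrete, here is a tiny, illustrative simulation (not real fleet data, and the Zipf exponent is arbitrary): scenario IDs are drawn from a Zipf-like distribution, and the count of scenarios seen exactly once keeps growing roughly in proportion to exposure – more miles or flight hours never exhaust the longest tail.

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw "scenario IDs" from a Zipf-like distribution: a few head scenarios
# repeat constantly, while the longest tail is scenarios seen exactly once.
for n_events in (10_000, 100_000, 1_000_000):
    scenarios = rng.zipf(a=1.3, size=n_events)
    ids, counts = np.unique(scenarios, return_counts=True)
    exactly_once = np.sum(counts == 1)
    print(f"{n_events:>9} events observed -> {exactly_once:>7} scenarios seen exactly once")
```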
But what does it mean if we keep training our systems on new events that never repeat? In other words, more and more of the training data becomes useless (as we will never encounter these events again), and unless we’re very careful we risk lowering the system’s performance against head and “long-tail” events! This is often solved by creating multi-domain systems where different vertically-focused capabilities are invoked depending on the scenario – but that introduces a whole other issue: modality. One of the critical lessons of Safety Engineering is that modal systems, regardless of whether they leverage humans-in-the-loop or are completely autonomous, can easily suffer from “mode confusion”. The historical lesson is that mode-less systems are ~generally more robust than modal systems, unless one can deterministically determine which mode to be in.
My opinion is that for systems where over time the bulk of the scenarios fall into this “longest-tail” category, a training-only approach will ~never converge on a safe-at-any-scale solution.
So what can be done? At Prime Air we chose a diversity (vs. redundancy) multi-physics / multi-algorithm system design: (at least) two separate sensing (observation) and perception algorithms ran in parallel, and both had to “vote yes” that the mission could proceed – for example to take off, to keep flying, to approach the delivery zone, etc. As one example, one can equip the system with visible-spectrum and bolometric (heat) cameras (say Long-Wave IR), and feed the two sensor data streams in parallel to two algorithms: one based on training (from the toolbox of machine learning), the other based on the photogrammetry toolbox (calibration, robust features/correspondence, SLAM, et al.).
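A highly simplified sketch of that voting pattern is below – the channel names, and the decision to gate on a strict unanimous “yes”, are my illustration of the idea, not Prime Air’s actual implementation:

```python
from dataclasses import dataclass

@dataclass
class ChannelVote:
    """One perception channel's opinion on whether it is safe to proceed."""
    channel: str        # e.g. "ML on visible-spectrum camera"
    clear: bool         # does this channel see a safe path?

def mission_may_proceed(votes: list[ChannelVote]) -> bool:
    """Diversity-based gating: every independent channel must vote yes.

    A single dissenting channel is enough to hold the mission
    (take-off, continued flight, approach to the delivery zone, ...).
    """
    return len(votes) >= 2 and all(v.clear for v in votes)

# Example: two physically- and algorithmically-diverse channels.
votes = [
    ChannelVote("ML model on visible-spectrum camera", clear=True),
    ChannelVote("Photogrammetry (SLAM) on LWIR camera", clear=False),
]
print(mission_may_proceed(votes))   # False -> hold: the channels disagree
```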
By perceiving the world twice (in two different spectra) and running two different algorithms – in parallel – we introduce independent layers in our “Swiss cheese” (the chance of the “holes in the slices of cheese aligning” is lower), and the chance of the system “voting yes incorrectly” is lower. One can easily imagine adding other physically-diverse sensors (Radars, Sonars, LIDARs, et al.) to the system in parallel, further improving diversity.
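Under an independence assumption, the arithmetic behind the “holes aligning” intuition is simple – the numbers below are illustrative placeholders, not measured failure rates:

```python
# Illustrative numbers only (not measured failure rates).
p_ml_wrong_yes    = 1e-4   # ML-on-visible channel votes "safe" when it isn't
p_photo_wrong_yes = 1e-4   # photogrammetry-on-LWIR channel does the same

# If the two channels fail independently, both must be wrong at the same
# time for the combined system to proceed unsafely: the "holes" must align.
p_both_wrong_independent = p_ml_wrong_yes * p_photo_wrong_yes
print(p_both_wrong_independent)   # 1e-8 -- far lower than either channel alone
```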
But of course there’s a painful trade here – a fundamental trade that one has to deal with in any Mobile-robotic or Aerospace design: SWaP-C. Adding more sensors increases the Size, Weight, Power-consumption and Cost of the system. Running multiple algorithms requires additional compute and memory – and SWaP-C keeps growing. We want Diversity as it increases safety, but diversity is expensive – is there anything we can do?
As a side-note, consider that training-based systems are usually expensive on the data-capture/labeling, model-development & maintenance side, while their runtime (model deployment) is often quite efficient. Photogrammetric systems, on the other hand, are often the opposite: they can be relatively efficient (as it relates to SWaP-C) on the development side, but require massive online compute and memory capabilities on the robot. By deploying both in parallel we gain diversity and a safer system, but we end up with the worst of both worlds in terms of SWaP-C – a system that is expensive both in development and in operation.
A critical element to consider here (through the lens of safety engineering) is that focusing on just great performance for each independent subsystem is insufficient; one must make sure one builds sufficient dissimilarity in failure modes – if both parts of the system fail similarly, then they may not increase the overall safety (i.e. lower the rate of incorrect “it’s safe to proceed” votes). As one example, consider a camera + LIDAR system, where both sense in or near the visible spectrum: while the sensing approach is different between cameras and LIDARs (passive vs. active), both systems are photonic – so anything that disrupts photons (say a micro-droplet fog) will reduce the performance of both systems; in other words, they will fail similarly. As an alternative, consider a Camera and a Radar – they are physically dissimilar, yielding a more robust design.
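Extending the illustrative numbers from above, a simple common-cause term shows why shared physics erodes the benefit of a second channel – again, the rates are placeholders for the sake of the argument:

```python
# Extend the previous sketch with a common-cause term: some fraction of
# failures hit both channels at once (e.g. fog degrading two photonic sensors).
p_single_wrong_yes = 1e-4     # per-channel wrong "safe" vote (illustrative)

def p_both_wrong(p_common_cause: float) -> float:
    """P(both channels wrongly vote 'safe') with a shared failure mode."""
    independent_part = (1 - p_common_cause) * p_single_wrong_yes ** 2
    return p_common_cause + independent_part

print(p_both_wrong(0.0))     # ~1e-8: fully dissimilar channels (e.g. camera + radar)
print(p_both_wrong(1e-5))    # ~1e-5: a shared photonic failure mode dominates
```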
But things aren’t as easy on the algorithmic side of robotics – we really don’t have that many “diverse” options when it comes to perception/navigation algorithms. Most of our algorithms these days fall into either the machine-learning family (some type of training producing a runtime model) or the Photogrammetric family (classic computer vision, or more modern approaches like SLAM), and different approaches within each algorithmic family too often share failure modes.
IMHO, what we need as an industry is as many new algorithmic approaches as we can find – to enable more diverse system designs. Over the years I got to meet many people working on this problem, and it turns out that one such promising path is not new at all but actually very old – teaching us lessons that biological systems have learned over the last 100 million years. Of course “biologically inspired design” is not new – researchers have long been fascinated by how different animals perceive and navigate the world (consider Bats, or Fish navigating the bottom of the ocean); and some biological systems seem to be able to handle exactly-once scenarios extremely efficiently – or at least efficiently from a runtime SWaP-C perspective, if not in evolutionary time!
One such research team came out of Sheffield University, where they spent the last 8 years reverse-engineering the brains of insects (honey bees, ants & fruit flies) – remarkable animals from a perception/navigation perspective! Ants, for example, can navigate successfully in very sparse environments such as deserts, and honey bees are able to navigate from previously unknown locations back to their hive, including navigating across “domains” they have never experienced before – such as between urban and rural environments. As we know, many “AI” systems can be fragile in the face of changing domains – but somehow honey bees have learned to navigate successfully in and across multiple domains without prior knowledge.
Add to this that honey bees have extremely limited computational, memory and power capacity, and that a lot of their energy budget is allocated to flight – while land (and water) creatures have the option to slow down to reduce their energy needs, flying creatures (generally) don’t have as many energy-conservation options.
I won’t get into the details of how the Opteran researchers figured out what’s happening in the brains of insects, or exactly how these algorithms work (I’ll leave the fun “explainers” to the researchers) – suffice it to say that insects (and millions of years of evolution) figured out how to perceive & navigate the world *very* efficiently, in a way that requires neither massive computation (Photogrammetry) nor massive data systems (labeling/training/model-building, the main cost of an ML system) – they figured out a “third way”.
What’s special about this “third way” warrants repeating: no massive data collection, no labeling, no model training, fewer domain-transfer issues – the same simple system can handle multiple environments safely.
IMHO, there are two extremely interesting side-effects that come out of this new “third way”. The first is that it is now “cheap” to add diversity to any system (regardless of whether it’s Machine-Learning or Photogrammetry-based), as the SWaP-C requirements are minimal; the second is that it’s now possible to add advanced perception/navigation to incredibly small & compact systems, enabling a completely new category of robotics. I’m excited to be collaborating with two different organizations who are using the Opteran stack – one adding diversity, the other creating a completely new category of extremely small robots that can navigate almost any environment safely – including sense-and-avoid, GPS-denied navigation, and “place names” – the ability to describe a place and navigate back to it later on.
Gur Kimchi
Founder, Advisor, ex-head of Amazon Prime Air