Navigation is a filtering process. Like an insect whose compound eyes identify what is and isn’t a space it can fly through, sensors on robots are built not just to track the path ahead, but also to map out, in real time, the obstacles to avoid. Two studies published this year, one in April 2019 and one in June, outline how, exactly, robots can be trained to navigate the in-between spaces full of obstacles. Specifically, these are studies about moving through crowds of people and moving through forests.
“Autonomous Navigation of MAVs in Unknown Cluttered Environments,” published June 20, 2019, looks at flying through forests despite the trees. Researchers at CINVESTAV, Mexico’s Center for Research and Advanced Studies of the National Polytechnic Institute, and Intel Labs created a method for flying quickly through unfamiliar, object-rich space.
Using measurements from a depth sensor, the framework first maps distances to the surrounding obstacles. A second component then generates a path through that space while accounting for the limits of what the sensor can see. With the objects measured and a path plotted toward a set goal, a third part of the framework generates the actual plan for movement.
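For illustration only, here is a rough Python sketch of that three-stage structure. It is not the authors’ code; every function name and parameter below is a placeholder, and the planning step is reduced to a toy heuristic.

```python
import numpy as np

def build_disparity_map(depth_frame, max_range=10.0):
    """Stage 1 (illustrative): turn raw depth readings into a disparity-style map,
    marking readings that are missing or beyond sensor range as unknown (NaN)."""
    valid = (depth_frame > 0) & (depth_frame <= max_range)
    return np.where(valid, 1.0 / np.maximum(depth_frame, 0.1), np.nan)

def plan_path(disparity, goal, unknown_penalty=5.0):
    """Stage 2 (illustrative): pick a coarse route toward the goal, penalizing
    cells the sensor could not observe instead of assuming they are free."""
    cost = np.where(np.isnan(disparity), unknown_penalty, disparity)
    col = int(np.argmin(cost.mean(axis=0)))  # toy heuristic: head for the cheapest column
    return [(0, col), goal]

def generate_motion_plan(waypoints, max_speed=2.0):
    """Stage 3 (illustrative): turn waypoints into commands the vehicle can execute."""
    return [{"waypoint": wp, "speed": max_speed} for wp in waypoints]

if __name__ == "__main__":
    depth = np.random.uniform(0.5, 12.0, size=(48, 64))  # stand-in depth frame
    path = plan_path(build_disparity_map(depth), goal=(47, 32))
    print(generate_motion_plan(path))
```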
Besides testing in simulation, the researchers applied their framework to a robot that flew through a series of real-world challenges. These included a maze, an industrial warehouse, a lab with people working in it, and a forest. The mapping proved valuable not just for passing through the space without relying on GPS, but also for escaping dead ends.
“The ability to escape ‘pockets’ or getting out of dead ends is fundamental to complete the navigation task in general cluttered environments,” wrote the authors. “Beyond local collision avoidance, this requires maintaining and keeping an up to date map of the explored areas together with a strategy for handling unexplored regions. Based on this map the robot should be able to generate a motion plan from the current position to the goal.”
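A toy example of the idea in that quote: maintain a grid map of what has been explored and search it globally, so the vehicle can back out of a dead end rather than only dodge the nearest obstacle. The grid encoding and plain breadth-first search below are simplified stand-ins for the authors’ mapping and planning pipeline, not their method.

```python
from collections import deque

FREE, OBSTACLE, UNKNOWN = 0, 1, 2

def escape_route(grid, start, goal):
    """Breadth-first search over the explored map. Unknown cells are treated as
    traversable in this sketch; a real planner would penalize or defer them."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen \
                    and grid[nr][nc] != OBSTACLE:
                seen.add((nr, nc))
                queue.append(((nr, nc), path + [(nr, nc)]))
    return None  # goal unreachable with the current map

# A vehicle boxed in at the top-left corner finds a route around the wall.
grid = [
    [FREE, OBSTACLE, FREE],
    [FREE, OBSTACLE, FREE],
    [FREE, FREE,     FREE],
]
print(escape_route(grid, start=(0, 0), goal=(0, 2)))
```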
While a particular lab might be full of scientists willing to serve as the backdrop for a robotics test, it’s harder to find a wealth of situations where people freeze in place for training a navigation algorithm. Or it would have been, were it not for the variety and ubiquity of entries in the Mannequin Challenge, a viral video challenge that spread throughout social media in November 2016. The challenge, traditionally set to the song "Black Beatles" by Rae Sremmurd, involved people staying frozen in place while the person filming the challenge navigated around them. As a meme, it was a novelty, destined for the same memory hole as planking. As research data, the Mannequin Challenge turns out to have been a gold mine.
“Learning the Depths of Moving People by Watching Frozen People,” published April 25, 2019, and written by a Google Research team, used vast amounts of Mannequin Challenge video to train algorithms to infer depth from ordinary video shot with a single camera, without a special depth sensor.
“In this paper, we take a data-driven approach and learn human depth priors from a new source of data: thousands of Internet videos of people imitating mannequins, i.e., freezing in diverse, natural poses, while a hand-held camera tours the scene,” wrote the authors. “Because people are stationary, training data can be generated using multi-view stereo reconstruction.”
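The supervision trick, very roughly sketched below in Python, is to let depth maps recovered offline by multi-view stereo on the frozen scenes act as ground truth for a network that predicts depth from a single RGB frame. The tiny model and training loop here are placeholders under that assumption; the paper’s actual network is far larger and takes additional inputs.

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Stand-in monocular depth predictor, nowhere near the paper's architecture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, rgb):
        return self.net(rgb)

model = TinyDepthNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

rgb = torch.rand(2, 3, 64, 64)        # frames from frozen-people clips (placeholder data)
mvs_depth = torch.rand(2, 1, 64, 64)  # depth recovered offline by multi-view stereo (placeholder)

pred = model(rgb)
loss = nn.functional.l1_loss(pred, mvs_depth)  # supervise predictions with the MVS depth
loss.backward()
optimizer.step()
```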
The researchers claim their method, derived from training on the Mannequin Challenge data set, offers better and more accurate depth prediction than existing methods that try to infer depth from a single camera. At least as relevant as the specific method is the creation of a data set, culled from 2,000 individual uploaded videos, that can train algorithms to understand how cameras move around people in space.
“The researchers also released their data set to support future research, meaning that thousands of people who participated in the Mannequin Challenge will unknowingly continue to contribute to the advancement of computer vision and robotics research,” wrote Karen Hao of MIT Technology Review, speaking to this specific data set. “While that may come as an uncomfortable surprise to some, this is the rule in AI research rather than the exception.”
The potential of training navigation algorithms on video sets pulled from massively available, public-facing clips is tremendous, especially if it can yield programs that run on simple, commercial machines. Depth sensors are likely ideal for military robot navigation, but the ability to operate on existing cameras makes it easier for a wider range of robots to fly through crowds or woods. The ethical implications of repurposing personal videos, uploaded for ephemeral memes, into robotic navigation data are important to take into account, especially when workers at technology companies are already questioning how research intended for peaceful, civilian use gets adapted to military purposes.
That said, training on public domain video, or on video specifically owned by the Pentagon, could provide a way for contractors eager to adapt research like this, already out in the world, into military navigation tools. Recording battlefield information can be hard, but building training data sets from video of wargames and military exercises is likely an under-explored field, with potential benefits for the autonomous robots the military will want to field in the coming decades.
Kelsey Atherton blogs about military technology for C4ISRNET, Fifth Domain, Defense News, and Military Times. He previously wrote for Popular Science, and also created, solicited, and edited content for a group blog on political science fiction and international security.