From AlphaGo to universal language translation, 2016 was an awesome year for advancements in artificial intelligence and machine learning. Progress in the field seems to be accelerating, as seen with both the algorithms developed and the applications created.
Deep learning – which has been the hottest trend in machine learning for the past decade – continued to dominate the wow moments this past year. Every month, creative variations and novel machine learning architectures led to impressive demonstrations of what was newly possible for artificial intelligence. We’ll focus on a few of the more prominent themes throughout 2016.
The past few years have seen a resurgence in the prominence of AI systems that can sense, act on, and affect their environment, such as self-driving cars or chatbots. Particularly successful have been AI systems that combine deep learning with reinforcement learning.
The most prominent progress this year has been in how these AI systems combined high-level strategy with the computational equivalent of gut instinct. We saw this exemplified by Google DeepMind’s AlphaGo, which caused a huge splash when it bested human Go champion Lee Sedol. AlphaGo effectively “beat” the game of Go, which most AI researchers didn’t expect to happen for many more years.
AlphaGo’s breakthrough was two-fold. It trained using a policy network to narrow down which moves it would consider (strategy) and a value network to give it an idea of how successful a move might be (computational “gut instinct”). It developed these by reviewing a large number of games between humans, and then it improved its technique even more by playing against itself hundreds of thousands of times.
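To make the division of labor concrete, here is a minimal sketch of the "shortlist, then evaluate" idea. It is not AlphaGo's implementation: the two lookup tables are hypothetical numbers standing in for trained networks, and the names `choose_move`, `POLICY_PRIOR`, and `WIN_ESTIMATE` are invented for illustration.

```python
from typing import Callable, Dict, List

Move = str


def choose_move(
    legal_moves: List[Move],
    policy: Callable[[Move], float],
    value: Callable[[Move], float],
    top_k: int = 3,
) -> Move:
    """Shortlist the top_k moves by policy score, then pick the best by value."""
    # Strategy: the policy decides which moves are worth considering at all.
    shortlist = sorted(legal_moves, key=policy, reverse=True)[:top_k]
    # "Gut instinct": the value estimate ranks the shortlisted moves.
    return max(shortlist, key=value)


# Hypothetical numbers standing in for the outputs of trained networks.
POLICY_PRIOR: Dict[Move, float] = {"a": 0.05, "b": 0.40, "c": 0.30, "d": 0.20, "e": 0.05}
WIN_ESTIMATE: Dict[Move, float] = {"a": 0.90, "b": 0.45, "c": 0.60, "d": 0.55, "e": 0.10}

best = choose_move(
    list(POLICY_PRIOR),
    policy=lambda m: POLICY_PRIOR[m],
    value=lambda m: WIN_ESTIMATE[m],
)
print(best)
```

In this toy setup, move "a" has the highest value estimate but is never evaluated, because the policy leaves it off the shortlist. That is the trade-off the narrowing step makes: far fewer moves to evaluate, at the risk of overlooking a good one.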
Beating Montezuma’s Revenge was another significant step for DeepMind this past year. DeepMind had developed AI in the past that could beat simpler Atari games, but Montezuma’s Revenge is difficult because its rewards are few and far between – so the AI agent has little feedback to learn from while it’s playing. To encourage the agent to explore more, and to help it better learn the trade-offs between exploration and exploitation, DeepMind introduced “artificial curiosity.”
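One simple way to picture a curiosity signal is as a bonus reward that shrinks each time the agent revisits the same state. The sketch below uses a plain visit counter, which is an assumption made for illustration and not a description of DeepMind's actual method; the class name `CuriosityBonus` and the room labels are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable


class CuriosityBonus:
    """Adds an exploration bonus that decays with repeated visits to a state."""

    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale
        self.visits: Counter = Counter()

    def reward(self, state: Hashable, extrinsic: float) -> float:
        # Count this visit, then pay a bonus of scale / sqrt(visit count).
        self.visits[state] += 1
        bonus = self.scale / math.sqrt(self.visits[state])
        return extrinsic + bonus


curiosity = CuriosityBonus(scale=1.0)
first = curiosity.reward("room_1", extrinsic=0.0)   # first visit: full bonus
second = curiosity.reward("room_1", extrinsic=0.0)  # repeat visit: smaller bonus
fresh = curiosity.reward("room_2", extrinsic=0.0)   # new room: full bonus again
```

Even when the game itself pays nothing (`extrinsic=0.0`), the agent still gets a reason to walk into rooms it has not seen, which is the kind of feedback a sparse-reward game otherwise lacks.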
Today’s AI systems are still rather narrow in their scope – that is, they can only do one task, like play chess or drive a car, but they can’t play chess and drive a car. But one of the most important trends to keep an eye on is the movement toward greater generality.
AI researchers are turning to techniques like transfer learning and multimodal representation to increase the breadth and well-roundedness of individual systems. With transfer learning, AI systems take concepts or skills they learned in one context and apply them to others – if a system learns to hit a baseball with a bat in one video game, it could apply what it learned to hit a ping pong ball with a paddle in another. Multimodal representation allows an AI system to take data about how something looks, how it sounds, descriptions of it, etc., and recognize that all of this data can apply to one thing – rather than treating a photo of a woman laughing and hearing her laugh as two entirely separate data points, the AI can recognize that both represent the same basic idea of laughter.
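A common way to think about multimodal representation is as a shared vector space in which inputs from different senses land close together when they describe the same thing. The sketch below assumes such a space already exists; the three vectors are hypothetical hand-written embeddings, not the output of any real model.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: near 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical embeddings in a shared space (values invented for illustration).
photo_laughing = [0.9, 0.1, 0.2]  # from an image encoder
audio_laughing = [0.8, 0.2, 0.1]  # from an audio encoder
audio_doorbell = [0.1, 0.9, 0.3]  # from the same audio encoder

same_concept = cosine(photo_laughing, audio_laughing)
different_concept = cosine(photo_laughing, audio_doorbell)
print(same_concept > different_concept)
```

The photo and the sound of laughter score as far more similar to each other than the photo and the doorbell do, even though they come from different kinds of data. Training encoders that actually produce such a space is the hard part, and it is not shown here.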
With this aim of more generalized AI, a team from DeepMind found a way to get neural networks to learn new tasks while remembering previous tasks they were trained on. Before this, it was the norm for neural networks to forget the previous task they were trained on when being trained on a new one. Another DeepMind group created what they called progressive neural networks. These networks not only help an AI remember previous tasks but also help it form connections between the new tasks it’s learning and aspects of previously learned tasks.
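One way to express "learn the new task without forgetting the old one" is to add a penalty for moving the weights that mattered most to the old task. The sketch below illustrates that general idea only; the article does not describe the team's method in detail, so the function `penalized_loss`, its parameters, and the importance values are all assumptions.

```python
from typing import Sequence


def penalized_loss(
    new_task_loss: float,
    weights: Sequence[float],
    old_weights: Sequence[float],
    importance: Sequence[float],
    strength: float = 1.0,
) -> float:
    """New-task loss plus a quadratic penalty for drifting from old weights."""
    # Each weight is charged in proportion to how important it was before.
    penalty = sum(
        imp * (w - w_old) ** 2
        for w, w_old, imp in zip(weights, old_weights, importance)
    )
    return new_task_loss + 0.5 * strength * penalty


old_weights = [1.0, -2.0]
importance = [10.0, 0.1]  # hypothetical: the first weight mattered far more

# Moving the important weight by 0.5 is expensive...
cost_important = penalized_loss(0.0, [1.5, -2.0], old_weights, importance)
# ...while moving the unimportant weight by the same amount is cheap.
cost_unimportant = penalized_loss(0.0, [1.0, -1.5], old_weights, importance)
print(cost_important > cost_unimportant)
```

Under this kind of penalty, training on the new task is free to adjust weights the old task barely used, while the weights the old task depended on stay close to where they were.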
As AI systems become longer-lived, lifelong learning and the accumulation of skills become central. A team from Technion has made some intriguing progress in lifelong hierarchical reinforcement learning, in which skills are reused as the system spends more time playing the game Minecraft. A team from Cornell has even made multimodal representation assist with transfer learning in the context of robotic manipulation of novel objects. For example, a roboticist could use this work to train an AI robot to handle all of the different types of appliances in your kitchen; that is, the robot could learn the difference between your oven and your blender, and then realize on its own that using a never-before-seen food processor is rather like using the blender.
The visual world is rich with complexities and offers many learning opportunities for AI – both the systems and the scientists. The “simplest” of the AI visual tasks in 2016 was colorizing black and white photos. Another was to create new pieces of fine art that copy the styles of classical masters using only a simple template. An increasingly popular task among a number of teams was generating realistic images from equivalent pencil sketches, including of faces, bedrooms, and cars.
A team from Berkeley developed a technique that could be applied to several kinds of image-to-image translation tasks. Their system painted somewhat realistic shoes and handbags from extremely rough sketches, and it used the same underlying technique for coloring black and white photos, for generating realistic map terrain, and for turning daytime photos into nighttime photos.
Another way the year’s deep learning systems demonstrated an understanding of the visual world was through their improved ability to create images based on a text description of the scene. While similar feats have been done in prior years, the quality and resolution have improved substantially.
But the world is of course 3D, and it flows with time. To improve the AI’s ability to understand these dynamics, multiple groups have been working on the task of taking the first part of a video and training an AI to predict the next frame of the video or even many frames of a video.
AI can use these deep learning techniques to translate not only images but also text. Perhaps the 2016 development most indicative of the depth of understanding possible from these techniques was when Google Brain’s neural machine translation system translated between language pairs it had not seen before.
Another new language milestone was human parity in conversational speech recognition, achieved by Microsoft. Other impressive language feats from deep learning systems this past year include typing from dictation much faster than people type, lip reading better than human professionals, and lastly, detecting sarcasm, where humans are unlikely to give up their edge any time soon!