Multi-modal Language
Natural Language Navigator
Project Principal Investigator(s): G. Srinivasaraghavan
Natural language interfaces to computing systems have been attempted for decades: natural language command-line interfaces to operating systems, natural language querying of databases, expert systems with natural language queries and responses, and so on. Their success was limited, and most attempts until about a decade ago were little more than glorified templated, fill-in-the-blanks systems. The growth of deep learning over the last decade, along with our ability to construct semantically meaningful word embeddings, has accelerated natural language processing tremendously, leading to impressive work in image captioning, dialogue systems, document classification, information extraction and question answering, text summarization, and more. This project is broadly about converting natural language instructions into structured input for a computing system (structured instructions to an autonomous robot, SQL queries on databases, shell commands to an operating system, even API calls to a web service, …). The specific use case this project addresses is robot navigation through natural language instructions.
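As a rough illustration of what such "structured input" could look like across the back ends mentioned above, the short Python sketch below shows one hypothetical command representation. The class name, backend labels, and argument fields are illustrative assumptions, not part of the project's actual design.

# Illustrative only: a hypothetical target representation for the
# "structured input" a natural language interface might emit.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class StructuredCommand:
    backend: str                         # e.g. "sql", "shell", "robot"
    action: str                          # canonical operation to perform
    arguments: Dict[str, str] = field(default_factory=dict)

# The same kind of natural language request grounded into different backends:
examples: List[StructuredCommand] = [
    StructuredCommand("sql", "select",
                      {"table": "employees", "where": "dept = 'AI'"}),
    StructuredCommand("shell", "find",
                      {"path": "/home", "name": "*.pdf"}),
    StructuredCommand("robot", "goto",
                      {"landmark": "library entrance"}),
]

for cmd in examples:
    print(cmd.backend, cmd.action, cmd.arguments)

A learned natural language understanding model would then map a free-form utterance to one such structured command rather than to a fixed template.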
The project proposes to extend the simulator that was hosted at https://evalai.cloudcv.org/web/challenges/challenge-page/97/overview to include several environments (starting with the IIIT-B campus) and to extend the work on natural language understanding to generate instructions that let the robot navigate to a given destination. The algorithm is expected to take into account the scene observed during navigation, relate it to the parts of the natural language instructions that refer to the visual features seen, and then generate structured instructions for the robot to follow (a minimal sketch of this loop follows below).
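The Python sketch below is a minimal, illustrative version of such an instruction-conditioned navigation loop. The stub environment, observation fields, discrete action set, and random placeholder policy are all assumptions standing in for the actual simulator and the learned model that would fuse visual features with the instruction.

# A minimal sketch of the intended instruction-following loop, not the
# project's actual interface. Environment and action set are assumptions.
import random
from typing import Dict, List, Tuple

ACTIONS: List[str] = ["forward", "turn_left", "turn_right", "stop"]

class StubNavEnv:
    """Stand-in for a navigation simulator: returns a fake visual observation."""
    def reset(self, instruction: str) -> Dict:
        self.steps = 0
        return {"rgb": [[0.0] * 64 for _ in range(64)], "instruction": instruction}

    def step(self, action: str) -> Tuple[Dict, float, bool]:
        self.steps += 1
        obs = {"rgb": [[0.0] * 64 for _ in range(64)]}
        done = action == "stop" or self.steps >= 20
        reward = 1.0 if action == "stop" and self.steps > 5 else 0.0
        return obs, reward, done

def policy(obs: Dict, instruction: str) -> str:
    # Placeholder for the learned model that relates the instruction to the
    # current visual observation; here it simply picks a random action.
    return random.choice(ACTIONS)

env = StubNavEnv()
instruction = "Walk past the library and stop at the main gate."
obs = env.reset(instruction)
done = False
while not done:
    action = policy(obs, instruction)
    obs, reward, done = env.step(action)
    print(action, reward)

In the project itself, the random policy above would be replaced by a model trained (for example, with deep reinforcement learning or imitation) to ground the instruction in the visual scene at each step.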
The problem demands new methods and techniques related to deep reinforcement learning, natural language understanding, and visual semantics.