I feel that academic researchers from different communities (e.g. psychology, computer science, and philosophy) have different expectations and goals for building an intelligent machine, and sometimes we cannot even agree on a common definition of intelligence or on its connection to cognition and consciousness. Here I will briefly introduce my own idea of a cognitive roadmap towards intelligent machines.
Here, by an intelligent machine we mean an autonomous machine that can finish any task assigned to it, or assist people autonomously. This kind of machine is usually implemented with Artificial General Intelligence (AGI). One prerequisite for such a machine to function as intended is the acquisition of global understanding, which can be achieved through the following steps:
A. Information integration of multi-modal channels. Recent research [1,2] mostly focuses on multi-modal learning on sub-networks. The most interesting aspect of this research is that the learning mimics the brain's cross-modal function. The semantic information is not explicitly trained in the higher layers; instead, it emerges as pre-symbolic concepts in an unsupervised way. This work, to some extent, suggests that we may be able to reach level B.
B. Understanding the sensory information by integrating information that comes from different modalities and sensorimotor processes. Here, “understanding” (or grounding) means that the resulting (pre-)symbolic representation can be further processed by higher-level cognitive processes, e.g. reasoning and inference.
C. Such integrated information can be globally accessed by different sub-levels of tasks. Specifically, there are top-down processes in which the integrated information also enriches the sub-level cognitive processes, for example through body exploration (e.g. we control our body to touch a cup on the table to check its texture and so reduce the uncertainty of the integrated information) and anticipation.
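The cup example can be read as a simple case of uncertainty reduction through multi-modal integration. The following is only a minimal sketch, not the method of the cited work: it assumes that each modality reports an independent Gaussian estimate of a texture value with a known variance, and fuses them by inverse-variance weighting. The names `Estimate` and `fuse`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float      # estimated property, e.g. surface roughness (arbitrary units)
    variance: float  # uncertainty of the estimate, must be > 0


def fuse(a: Estimate, b: Estimate) -> Estimate:
    """Inverse-variance fusion of two independent Gaussian estimates."""
    precision = 1.0 / a.variance + 1.0 / b.variance
    mean = (a.mean / a.variance + b.mean / b.variance) / precision
    return Estimate(mean=mean, variance=1.0 / precision)


# Hypothetical readings: vision is vague about texture, touch is more precise.
vision = Estimate(mean=0.8, variance=0.5)
touch = Estimate(mean=0.4, variance=0.1)

fused = fuse(vision, touch)
print(fused)
```

Under these assumptions the fused variance is always smaller than either input variance, which is one way to express why touching the cup reduces the uncertainty of the integrated information.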
On top of the integrated information, and with similar top-down processes, there is another prerequisite: higher-level cognition (meta-cognition). This is a higher level of processing concerned with the understanding, deduction and management of the integrated information from the lower levels. For instance, in the previous example of touching the cup, the meta-cognitive level holds a confidence level about the perception of the texture.
Meta-cognition serves the following purposes:
To coordinate the sub-levels (intelligent modules or sub-networks) so that the machine can adaptively survive and finish its goals (e.g. to help its hosts).
To enable interaction, reasoning and transfer learning between sub-levels (probably based on the integrated information).
To evaluate the self-confidence of the sub-level processes and of steps A and B.
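The confidence evaluation described above can be sketched in a few lines. This is an illustrative assumption rather than an established design: each sub-level module is taken to output a probability distribution over labels, confidence is defined as one minus the normalized Shannon entropy, and modules below a threshold are flagged as needing more evidence (for example, through body exploration). The names `confidence` and `monitor`, the module names, and the threshold are hypothetical.

```python
import math


def confidence(probs):
    """Return 1 minus normalized Shannon entropy (1.0 = certain, 0.0 = uniform)."""
    n = len(probs)
    if n < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(n)


def monitor(module_outputs, threshold=0.5):
    """Score every module and list those whose confidence is below the threshold."""
    scores = {name: confidence(p) for name, p in module_outputs.items()}
    uncertain = [name for name, score in scores.items() if score < threshold]
    return scores, uncertain


# Hypothetical module outputs over two texture labels (smooth, rough).
outputs = {
    "vision_texture": [0.5, 0.5],    # vision cannot decide
    "touch_texture": [0.99, 0.01],   # touch is nearly certain
}

scores, uncertain = monitor(outputs)
print(scores)
print(uncertain)
```

In this sketch the meta-level only reads the outputs of the sub-levels and decides which one needs attention, so it stays separate from the perception modules themselves.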
We also hypothesize that the two prerequisites (global understanding and meta-cognition) stack hierarchically and form increasingly abstract intelligent processes, in ourselves as well as in intelligent machines. As the integrated information becomes more abstract, subjective experience is formed. As formal meta-cognition emerges, introspection is formed.
Another unavoidable question on the road toward building intelligent machines is whether we should build a conscious machine. In my opinion, current artificial intelligence methods (e.g. deep CNNs) are unconscious or semi-conscious (e.g. AlphaGo or machine translation). Such machines can finish one or several tasks autonomously and efficiently, but they have no shared information or meta-cognition with which to control those tasks.
Joni will be joining RE•WORK at the Machine Intelligence Summit in Hong Kong this June 6 - 7 where he will discuss cross-modal understanding and prediction for cognitive robots.