Lori Holt Carnegie Mellon University
Understanding how humans interpret the complexity of spoken language
Experience deeply shapes how human listeners perceive spoken language. We learn long-term phonetic representations and words that respect the sound structure of our native language, and yet we maintain enough flexibility to make sense of nonnative accents or imperfect computer-synthesized speech. There are rich behavioral-science literatures that speak to the many ways experience shapes speech perception. Yet, for the most part, contemporary neurobiological models of spoken language are oriented toward characterizing the system in a stable state. We are just beginning to understand the learning mechanisms that support successful human speech communication. I will describe how experience shapes speech perception at different time scales - from the influence of a single precursor sound, to distributions of sounds across seconds, to statistical regularities in acoustics experienced across multiple training sessions.
In Part I, I will describe current thinking on how human listeners discover functional units in speech, like phonemes and words, and how this learning fundamentally shapes perception. In Part II, I will describe more dynamic aspects of speech comprehension that depend on very rapid adaptation and learning at short timescales. In general, this research demonstrates that human speech recognition is a flexible, adaptive, experience-dependent skill that draws upon perceptual, cognitive, motor and linguistic systems. I will argue that human speech communication has much to offer machine listening and speech recognition and that - reciprocally - next-generation approaches to human speech processing will benefit a great deal from closer connection to machine systems.
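To make the idea of distributional learning concrete, below is a minimal sketch (not the speaker's model) of how a category boundary along a single acoustic cue could be inferred from the statistics of exposure. The cue (voice-onset time), the token counts, and the two-component Gaussian mixture are all illustrative assumptions.

```python
# Illustrative sketch (not the speaker's model): distributional learning of a
# phonetic category boundary from a one-dimensional acoustic cue.
# Assumption: the cue is voice-onset time (VOT) in ms, and exposure consists of
# a bimodal mixture of /b/-like and /p/-like tokens.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulated exposure: short-lag VOTs (/b/-like) and long-lag VOTs (/p/-like).
vot = np.concatenate([rng.normal(5, 5, 500),     # /b/-like tokens
                      rng.normal(50, 10, 500)])  # /p/-like tokens

# A listener tracking the cue distribution can be caricatured as fitting a
# two-component Gaussian mixture and placing the category boundary where the
# component posteriors cross.
gmm = GaussianMixture(n_components=2, random_state=0).fit(vot.reshape(-1, 1))

grid = np.linspace(vot.min(), vot.max(), 1000).reshape(-1, 1)
post = gmm.predict_proba(grid)
boundary = grid[np.argmin(np.abs(post[:, 0] - post[:, 1]))][0]
print(f"Inferred category boundary: {boundary:.1f} ms VOT")
```

The same machinery can, in principle, be refit as the input distribution shifts, which is one rough way to caricature adaptation to an unfamiliar accent.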
|
Mounya Elhilali Johns Hopkins University
Reverse-engineering auditory computations in the brain
The perceptual organization of sounds in the environment into coherent objects is a feat constantly facing the auditory system. It manifests itself in the everyday challenge, to humans and animals alike, of parsing complex acoustic information arising from multiple sound sources into separate auditory streams. While the task seems effortless, uncovering the neural mechanisms and computational principles underlying this remarkable ability remains a challenge facing both brain sciences and engineering systems.
In the first part of this talk, I will review the perceptual and neural underpinnings of processing complex soundscapes in the brain and discuss theoretical interpretations of biological processes in an effort to develop more robust sound-processing technologies. In the second part of the talk, I will focus on the adaptive capabilities of the auditory system, mediated by processes of attention and memory, that facilitate the perceptual mapping of our acoustic surroundings. The ability of the auditory system to adapt based on goals and context holds important lessons for developing truly intelligent audio processing systems.
|
Shantanu Chakrabartty Washington University in St. Louis
Neuromorphic Computing at a Crossroads &
Neuromorphic Sensing: ways to approach energy-efficiency limits
Talk 1: As an isolated signal-processing unit, a biological neuron is not optimized for energy efficiency. Constrained by the idiosyncrasies of ion-channel dynamics, a relatively large membrane capacitance and propagation artifacts along axonal pathways, a neuron typically dissipates an order of magnitude more energy than a highly optimized silicon neuron. In spite of this disparity, populations of biological neurons are marvels of energy-optimized systems. The biological basis for such energy-efficient and robust representation might lie in the nature of the spatiotemporal network dynamics, in the physics of noise exploitation, and in the use of neural oscillations. On the other hand, most synthetic, large-scale neuromorphic systems ignore these network dynamics, focusing instead on a single neuron and building the network bottom-up. From this approach, it is not evident how the shape, nature and dynamics of each individual spike relate to the overall system objective, or how a population of coupled neurons can optimize itself to produce an emergent spiking or population response, for instance spectral noise shaping or synchrony. Other well-established synthetic neural network formulations (for example, deep neural networks and support vector machines) follow a top-down synthesis approach, starting with a system objective function and then reducing the problem to a model of a neuron that inherently does not exhibit any spiking or complex dynamics. This talk will provide an overarching view of the discipline of neuromorphic computing and discuss new perspectives on how to combine machine-learning principles with biologically relevant neural dynamics.
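As a point of reference for the bottom-up approach discussed above, here is a minimal leaky integrate-and-fire (LIF) neuron simulated in isolation. All constants are illustrative rather than fits to biology, and nothing in this sketch encodes a population-level objective - which is precisely the gap the talk addresses.

```python
# Minimal sketch of the bottom-up starting point the talk critiques: a single
# leaky integrate-and-fire (LIF) neuron simulated in isolation, with no
# population-level objective. All parameters are illustrative.
import numpy as np

def simulate_lif(current, dt=1e-4, tau=0.02, v_rest=-0.065,
                 v_thresh=-0.050, v_reset=-0.065, r_m=1e7):
    """Return the membrane trace and spike times for an input current array (A)."""
    v = np.full(current.shape, v_rest)
    spikes = []
    for t in range(1, len(current)):
        dv = (-(v[t - 1] - v_rest) + r_m * current[t - 1]) * (dt / tau)
        v[t] = v[t - 1] + dv
        if v[t] >= v_thresh:          # threshold crossing -> emit a spike
            spikes.append(t * dt)
            v[t] = v_reset            # reset and keep integrating
    return v, spikes

# Constant 2 nA drive for 200 ms.
i_in = np.full(2000, 2e-9)
_, spike_times = simulate_lif(i_in)
print(f"{len(spike_times)} spikes in 200 ms")
```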
Talk 2: In this talk I will discuss four neurosensory paradigms that I believe can push the limits of energy efficiency for neuromorphic sensors such as the silicon cochlea. These are: (a) exploiting the physics of the device and its non-linearity for sensing and computation; (b) using learning and adaptation to compensate for mismatch; (c) exploiting noise for sensing; and (d) using parallel, high-density combinations of simple sensing elements. I will illustrate the first two paradigms with an implementation of a jump-resonance-based silicon cochlea. For the last two paradigms, I will discuss some of our recent work on variance-based signal representation, in which noise, instead of being suppressed, is exploited to achieve sensing at zero energy dissipation.
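Paradigm (c), exploiting noise for sensing, can be illustrated with a textbook stochastic-resonance toy (not the speaker's circuit or representation): a sinusoid that never crosses a hard detection threshold on its own becomes detectable once a moderate amount of noise is added. Noise levels and threshold below are arbitrary.

```python
# Toy illustration (not the speaker's system) of exploiting noise for sensing.
# A subthreshold sinusoid never crosses the detection threshold by itself;
# with moderate added noise, threshold crossings cluster around signal peaks.
import numpy as np

rng = np.random.default_rng(1)
fs, f_sig, thresh = 10_000, 10.0, 1.0
t = np.arange(0, 1.0, 1 / fs)
signal = 0.8 * np.sin(2 * np.pi * f_sig * t)   # peak 0.8 < threshold 1.0

for sigma in (0.0, 0.3, 1.5):
    detections = (signal + rng.normal(0, sigma, t.size)) > thresh
    # Correlate the binary detector output with the hidden signal.
    corr = 0.0 if detections.std() == 0 else np.corrcoef(detections, signal)[0, 1]
    print(f"noise sigma={sigma:>4}: detections={detections.sum():>5}, "
          f"correlation with signal={corr:.2f}")
```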
|
Barbara Shinn-Cunningham Carnegie Mellon University
Role of attention mechanisms in listening
Understanding speech in natural environments depends not just on decoding the speech signal, but on extracting the speech signal from a mixture of sounds. In order to achieve this, the listener must be able to 1) parse the scene, determining what sound energy belongs to the speech signal and what energy is from a competing source (perform auditory scene analysis), and 2) filter out the competing source energy and focus on the speech. Together, these processes allow a listener to focus attention on the speech and analyze its content in detail. In Part I of my presentation, I will illustrate these issues, including what acoustic features support auditory scene analysis and what features allow a listener to focus attention. In Part II, I will describe the different brain networks that control auditory attention, and how we measure the effects of attention on neural processing.
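A rough engineering analogue of the two steps described above (not a model of the listener) is time-frequency masking: decide which spectro-temporal energy belongs to the target, then suppress the rest. The sketch below uses synthetic tones and oracle access to the separate sources purely for illustration.

```python
# Engineering analogue (not a listener model) of the two steps above:
# (1) label which time-frequency energy belongs to the target ("scene analysis"),
# (2) suppress the rest ("attentional filtering"), via an ideal binary mask
# computed with oracle access to the separate sources. Signals are synthetic.
import numpy as np
from scipy.signal import stft, istft

fs = 16_000
t = np.arange(0, 1.0, 1 / fs)
target = np.sin(2 * np.pi * 440 * t)            # stand-in for the attended talker
masker = 0.8 * np.sin(2 * np.pi * 1200 * t)     # stand-in for a competing source
mixture = target + masker

_, _, T = stft(target, fs, nperseg=512)
_, _, M = stft(masker, fs, nperseg=512)
_, _, X = stft(mixture, fs, nperseg=512)

mask = np.abs(T) > np.abs(M)                    # step 1: target-dominated bins
_, enhanced = istft(X * mask, fs, nperseg=512)  # step 2: keep only those bins

def snr_db(sig, noise):
    return 10 * np.log10(np.sum(sig ** 2) / np.sum(noise ** 2))

print(f"mixture SNR: {snr_db(target, masker):.1f} dB")
print(f"after masking: {snr_db(target, enhanced[:t.size] - target):.1f} dB")
```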
|
Ying Xu Western Sydney University
A Digital Neuromorphic Auditory Pathway
This talk gives an overview of my work on the development of a digital binaural cochlear system, and its applications to a “where” pathway and a “what” pathway model. The binaural cochlear system models the basilar membrane, the outer hair cells, the inner hair cells and the spiral ganglion cells. The “where” pathway model uses a deep convolutional neural network to analyse correlograms from the binaural cochlear system to obtain sound source location. The “what” pathway model uses an event-based unsupervised feature extraction approach to investigate the acoustic characteristics embedded in auditory spike streams from the binaural cochlear system.
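For readers unfamiliar with correlograms, the sketch below shows the basic interaural cross-correlation computation from which an interaural time difference (ITD) can be read off. It is a plain signal-level stand-in, not the spiking cochlear front end or the convolutional network described in the talk, and the sampling rate and delay are arbitrary.

```python
# Signal-level stand-in for the cue the "where" pathway consumes: an interaural
# cross-correlogram whose peak indicates the interaural time difference (ITD).
import numpy as np

fs = 48_000
true_itd_samples = 12                        # ~250 us lead of the left ear
rng = np.random.default_rng(2)
source = rng.standard_normal(fs)             # 1 s of broadband noise

left = source
right = np.roll(source, true_itd_samples)    # right ear receives a delayed copy

max_lag = 40                                 # roughly +/-800 us at 48 kHz
lags = np.arange(-max_lag, max_lag + 1)
corr = [np.dot(left, np.roll(right, -lag)) for lag in lags]

print(f"estimated ITD: {lags[int(np.argmax(corr))]} samples "
      f"(true: {true_itd_samples})")
```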
|
Neeraj Sharma Carnegie Mellon University and Indian Institute of Science
Talker Change Detection: Humans, Machines, and the Gap
Studies of natural selection suggest that it is not the strongest of the species that survives, but the one most adaptable to change. A similar strategy may be at play when listening to a multi-talker conversation, in which several talkers speak in turns. On the listener's side, perceiving conversational speech demands rapid detection of, and adaptation to, talker changes in order to support communication. The underlying mechanism remains open for research, and understanding it will benefit the design of automatic systems for the flagship problem of conversational speech analysis. In this talk, I will present a study examining human talker change detection (TCD) in multi-party speech utterances using a behavioral paradigm in which listeners indicate the moment of perceived talker change. Modeling the behavioral data shows that human reaction time can be well estimated from the distance between acoustic features before and after the change instant. Further, the estimate improves when longer durations of speech prior to the talker change are incorporated. A performance comparison of humans with some state-of-the-art machine TCD systems indicates a gap that machines have yet to close.
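The following sketch illustrates the flavor of the modeling result, not the study's exact pipeline: summarize the speech on either side of a hypothesized change instant with short-term spectral features and take the distance between the two summaries. The features, window lengths, and synthetic "talkers" are all assumptions made for the example; the `context_s` parameter mirrors the idea that longer pre-change context can help.

```python
# Illustrative sketch (not the study's pipeline): distance between acoustic
# feature summaries before vs. after a hypothesized talker-change instant.
import numpy as np
from scipy.signal import spectrogram

def change_distance(audio, fs, change_time_s, context_s=0.5):
    """Euclidean distance between mean log-spectra before vs. after the change."""
    n, ctx = int(change_time_s * fs), int(context_s * fs)
    before, after = audio[max(0, n - ctx):n], audio[n:n + ctx]
    feats = []
    for seg in (before, after):
        _, _, s = spectrogram(seg, fs, nperseg=400, noverlap=200)
        feats.append(np.log(s + 1e-10).mean(axis=1))   # average log-spectrum
    return float(np.linalg.norm(feats[0] - feats[1]))

# Synthetic stand-in for two talkers: noise with different spectral tilts.
fs = 16_000
rng = np.random.default_rng(3)
talker_a = np.convolve(rng.standard_normal(fs), np.ones(8) / 8, mode="same")
talker_b = rng.standard_normal(fs)
audio = np.concatenate([talker_a, talker_b])

print(f"feature distance at the change: {change_distance(audio, fs, 1.0):.2f}")
```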
|
Shayan Garani Srinivasa Indian Institute of Science
Spatio-temporal Memories
Inspired by the functioning of the brain, content-addressable memories such as Kohonen self-organizing maps and their variants have been proposed and widely used in data-science applications. However, this paradigm is 'static' in the sense that the input signal's dynamics are not reflected within the memory of the neural network. Inspired by Alan Turing's seminal work on morphogenesis, which explains the formation of patterns in animals, we develop an analogous theoretical model for storing and recalling spatio-temporal patterns from first principles. The spatio-temporal memory is neurobiologically inspired, and the neurons exhibit a 'temporal plasticity' effect during recall. Future research directions and applications of this model will be highlighted towards the end of the talk.
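For contrast with the proposed spatio-temporal memory, here is a minimal version of the 'static' baseline named above, a Kohonen self-organizing map: each stored prototype is a fixed vector, so nothing about the input's temporal dynamics is retained. Map size, learning rate, and data are arbitrary choices for the sketch.

```python
# Minimal Kohonen self-organizing map (the "static" baseline): prototypes are
# fixed vectors, so the memory captures no input dynamics.
import numpy as np

rng = np.random.default_rng(4)
grid = 10                                  # 10 x 10 map
weights = rng.random((grid, grid, 2))      # one 2-D prototype per map node
data = rng.random((5000, 2))               # training points in the unit square

ii, jj = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
for t, x in enumerate(data):
    lr = 0.5 * np.exp(-t / 2000)           # decaying learning rate
    sigma = 3.0 * np.exp(-t / 2000)        # shrinking neighbourhood radius
    # Best-matching unit: node whose prototype is closest to the input.
    dist = np.linalg.norm(weights - x, axis=2)
    bi, bj = np.unravel_index(np.argmin(dist), dist.shape)
    # Pull the BMU and its grid neighbours toward the input.
    h = np.exp(-((ii - bi) ** 2 + (jj - bj) ** 2) / (2 * sigma ** 2))
    weights += lr * h[..., None] * (x - weights)

print("prototype spread after training:", np.ptp(weights.reshape(-1, 2), axis=0))
```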
|
Shihab Shamma University of Maryland
Cortical Mechanisms for Auditory Selective Attention and Decision Making
|
S. P. Arun Indian Institute of Science
Compositionality as the key to object perception
Compositionality refers to the premise that the whole can be understood in terms of its parts. This is a fundamental question in our quest to simplify neural representations. In the case of vision, it is widely believed that our brain has evolved highly specialized feature detectors whose response is "more than the sum of their parts", thereby violating compositionality. A classic example is the idea of a grandmother cell, which responds to any image containing your grandmother, whether small or large, rotated towards or away, and so on. Such feature processing, it is believed, is what makes our brain so good at vision compared with the best computers today. Identifying these highly specialized features then becomes extremely difficult, because a given image might contain a large number of features, and finding the right combination involves searching through a combinatorial explosion of possible feature subsets.
In my lab we are investigating these fundamental questions using a combination of experimental techniques. I will present a series of results from our lab that challenge these widely held beliefs about how higher-order visual processing works. Our key conceptual advance is that while identifying complex features is difficult, understanding how such features combine is in fact tractable. I will present results showing that visual object representations are highly compositional in nature at both the behavioral and neural levels. In particular, I will show that the response to a whole object is systematically related to its parts, but that the definition of the parts requires careful elaboration. Further, these systematic relationships can explain complex percepts such as symmetry and visual word processing. Thus, it may be more insightful to understand how features combine than to identify the features themselves.
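The kind of part-sum analysis implied above can be sketched as a simple regression: ask whether responses to whole objects are predicted by a linear combination of responses to their parts. The data below are synthetic (generated from a part sum, so the fit is good by construction); a real analysis would use recorded neural or behavioral responses to part/whole stimulus sets.

```python
# Hedged sketch of a part-sum analysis on synthetic data: fit whole-object
# responses as a weighted sum of part responses plus a bias term.
import numpy as np

rng = np.random.default_rng(5)
n_objects, n_parts = 200, 2

# Simulated single-unit responses to each part presented in isolation.
part_resp = rng.random((n_objects, n_parts))
# Whole-object responses generated as a weighted part sum plus noise.
true_w = np.array([0.7, 0.4])
whole_resp = part_resp @ true_w + 0.1 + 0.05 * rng.standard_normal(n_objects)

# Fit the part-sum model: whole ~ w1*part1 + w2*part2 + bias.
X = np.column_stack([part_resp, np.ones(n_objects)])
coef, *_ = np.linalg.lstsq(X, whole_resp, rcond=None)
pred = X @ coef
r = np.corrcoef(pred, whole_resp)[0, 1]
print(f"fitted weights: {coef[:2].round(2)}, bias: {coef[2]:.2f}, r = {r:.2f}")
```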
|