Fig. 1: Pitch model overview. a Schematic of model structure. DNNs were trained to estimate the F0 of speech and music sounds embedded in real-world background noise. Networks received simulated auditory nerve representations of acoustic stimuli as input. Green outlines depict the extent of example convolutional filter kernels in time and frequency (horizontal and vertical dimensions, respectively). b Simulated auditory nerve representation of a harmonic tone with a fundamental frequency (F0) of 200 Hz. The sound waveform and its power spectrum are also shown. The waveform is periodic in time, with a period of 5 ms. The spectrum is harmonic (i.e., it contains multiples of the fundamental frequency). Network inputs were arrays of instantaneous auditory nerve firing rates (depicted in greyscale, with lighter hues indicating higher firing rates). Each row plots the firing rate of a frequency-tuned auditory nerve fiber, arranged in order of their place along the cochlea (with low frequencies at the bottom). Individual fibers phase-lock to low-numbered harmonics in the stimulus (lower portion of the nerve representation) or to the combination of high-numbered harmonics (upper portion). Time-averaged responses on the right show the pattern of nerve fiber excitation across the cochlear frequency axis (the "excitation pattern"). Low-numbered harmonics produce distinct peaks in the excitation pattern. c Schematics of six example DNN architectures trained to estimate F0. Network architectures varied in the number of layers, the number of units per layer, the extent of pooling between layers, and the size and shape of convolutional filter kernels. d Summary of network architecture search. F0 classification performance on the validation set (noisy speech and instrument stimuli not seen during training) is shown as a function of training steps for all 400 networks trained. The highlighted curves correspond to the architectures depicted in a and c. The relatively low overall accuracy reflects the fine-grained F0 bins we used. e Histogram of accuracy, expressed as the median F0 error on the validation set, for all trained networks (F0 error in percent is more interpretable than the classification accuracy, the absolute value of which depends on the width of the F0 bins). f Confusion matrix for the best-performing network (depicted in a) tested on the validation set. Credit: DOI: 10.1038/s41467-021-27366-6
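The relationship in panel b between a 200 Hz fundamental and a 5 ms period can be checked with a few lines of Python. This is only an illustrative sketch, not the stimulus-generation code used in the study: the sampling rate, the number of harmonics, the equal harmonic amplitudes, and the helper name `harmonic_tone` are all assumptions chosen for the example.

```python
import math

F0_HZ = 200.0            # fundamental frequency quoted in the caption
SAMPLE_RATE_HZ = 32000   # assumed sampling rate (illustrative, not from the source)
NUM_HARMONICS = 10       # assumed number of equal-amplitude harmonics


def harmonic_tone(f0_hz, num_harmonics, sample_rate_hz, duration_s):
    """Hypothetical helper: sum of sine waves at integer multiples of f0_hz."""
    n_samples = int(round(duration_s * sample_rate_hz))
    return [
        sum(
            math.sin(2 * math.pi * k * f0_hz * n / sample_rate_hz)
            for k in range(1, num_harmonics + 1)
        )
        for n in range(n_samples)
    ]


# The period is the reciprocal of the fundamental: 1 / 200 Hz = 0.005 s = 5 ms.
period_s = 1.0 / F0_HZ
period_samples = int(round(period_s * SAMPLE_RATE_HZ))

tone = harmonic_tone(F0_HZ, NUM_HARMONICS, SAMPLE_RATE_HZ, duration_s=0.05)

# Periodicity check: the waveform shifted by one period should match itself
# up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(tone, tone[period_samples:]))

# A harmonic spectrum has energy only at integer multiples of F0.
harmonic_freqs_hz = [k * F0_HZ for k in range(1, NUM_HARMONICS + 1)]

print(f"period: {period_s * 1000:.1f} ms ({period_samples} samples)")
print(f"largest difference after a one-period shift: {max_diff:.2e}")
print(f"first harmonics (Hz): {harmonic_freqs_hz[:4]}")
```

Because every component completes a whole number of cycles in 5 ms, the summed waveform repeats with that period no matter how many harmonics are included.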
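"Cochlear implants can do a really nice job of helping people understand speech, especially if they're in a quiet environment. But they really don't reproduce the percept of pitch very well," says Mark Saddler, a graduate student and CBMM researcher who co-led the project and an inaugural graduate fellow of the K. Lisa Yang Integrative Computational Neuroscience Center. "One of the reasons it's important to understand the detailed basis of pitch perception in people with normal hearing is to try to get better insights into how we would reproduce that artificially in a prosthesis."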