| 类型 | 说明 |
|---|---|
| 论文信息 | Classifying environmental sounds using image recognition networks Venkatesh Boddapatia, Andrej Petefb, Jim Rasmussonb, Lars Lundberg |
| 会议期刊 | International Conference on Knowledge Based and Intelligent Information and Engineering Systems, KES2017, 6-8 September 2017, Marseille, France |
| 介绍 | 说明 |
|---|---|
| 简介 | 1.环境声音自动分类有助于智能家居、远程监控的应用 2.现有移动设备普遍有图像深度神经网络模型,本文研究用现有图像分类模型Alexnet、GoogLeNet模型进行语音分类 |
| 常用音频预处理方法 | 说明 |
|---|---|
| (1) Framing-based | 1) Framing-based where the audio signals are separated into frames using a Hamming window. Then the features are extracted from each frame and classified separately. |
| (2) Sub-framing | 2) Sub-framing based processing where the frames are further subdivided and each frame is classified based on the majority voting of the sub-frames. |
| (3) Sequential processing | 3) Sequential processing where the audio signals are divided into segments of typically 30 ms with 50% overlap. The classifier then classifies the features extracted from these segments. |
| 数据集 | 说明 |
|---|---|
| (1)ESC-10 | 400个5秒音频,10类,每类40个音频 |
| (2)ESC-50 | 2000个5秒音频,50类,每类40个音频 |
| (3)UrbanSound8K | 8732个<=4s的环境声音音频,共10类, {0: 1000, 1: 429, 2: 1000, 3: 1000, 4: 1000, 5: 1000, 6: 374, 7: 1000, 8: 929, 9: 1000} |
| 方法 | 说明 |
|---|---|
| (1)特征 | MFCC(Mel-Frequency Cepstral Coefficients)、Spectrogram、CRP(Cross Recurrence Plot)三类特征单通道图以及组合成的三通道图(256×256) )![]() |
| (2)模型 | AlexNet、GoogLeNet |
| (3)设置 | 五折交叉验证、32kHz采样率、帧长30ms、帧重叠率50%、50训练周期、学习率0.01、SGD优化器、学习率计划:指数(0.95)衰减、256×256 |
| 实验 | 说明 |
|---|---|
| (1)三类特征两种模型实验 | ![]() **** |
| (2)采样率实验 | ![]() 最优采样率分别用于后续实验 |
| (3)帧长实验 | )![]() |
| (4)UrbanSound8K实验以及CRNN实验 | ![]() CRNN并没有提高准确度 |
| 结论 | 说明 |
|---|---|
| ① | 在ESC-50、ESC-10和UrbanSound8K数据集上,GoogLeNet的分类准确率分别为73%、91%和93%。 |
| ② | 三通道组合图并没有提高分类准确率、CRNN也没有得到较高精度 |

)


)
