[1] S. Adavanne, A. Politis and T. Virtanen, Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network, Proc. Eur. Signal Process. Conf., 2018.
[2] M. Arjovsky, S. Chintala and L. Bottou, Wasserstein generative adversarial networks, Int. Conf. Machine Learn., 2017, pp. 214–223.
[3] E. Benetos, S. Dixon, Z. Duan and S. Ewert, Automatic music transcription: an overview, IEEE Signal Process. Mag. 36(1) (2019) 20–30.
[4] S. Chakrabarty and E.A.P. Habets, Multi-speaker localization using convolutional neural network trained with noise, Proc. Machine Learn. Audio Process. Workshop at NIPS, 2017.
[5] S.Y. Chang, B. Li, T.N. Sainath, G. Simko and C. Parada, Endpoint detection using grid long short-term memory networks for streaming speech recognition, Proc. Interspeech, 2017.
[6] J. Devlin, M. Chang, K. Lee and K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805, 2018.
[7] E.L. Ferguson, S.B. Williams and C.T. Jin, Sound source localization in a multipath environment using convolutional neural networks, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Process., 2018.
[8] T. Gan, Música colonial: 18th century music score meets 21st century digitalization technology, JCDL '05: Proc. 5th ACM/IEEE-CS Joint Conf. Digital Libraries, 2005, p. 379.
[9] M. Huzaifah, Comparison of time-frequency representations for environmental sound classification using convolutional neural networks, arXiv preprint arXiv:1706.07156v1 [cs.CV], 2017.
[10] N.P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers et al., In-datacenter performance analysis of a tensor processing unit, Proc. 44th Annual ACM/IEEE Int. Symp. Computer Architecture (ISCA), 2017, pp. 1–12.
[11] N. Kalchbrenner, E. Elsen, K. Simonyan, S. Noury, N. Casagrande, E. Lockhart, F. Stimberg, A. van den Oord, S. Dieleman and K. Kavukcuoglu, Efficient neural audio synthesis, arXiv preprint arXiv:1802.08435, 2018.
[12] T. Kawashima and K. Ichige, Automatic piano music transcription by Hadamard product of low-rank NMF and CNN/CDAE outputs, IEEJ Trans. Electron. Inf. Syst. 139(10) (2019) 1106–1112.
[13] M. Kolbæk, Z.H. Tan and J. Jensen, Monaural speech enhancement using deep neural networks by maximizing a short-time objective intelligibility measure, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Process., 2018.
[14] J. Lee, J. Park, K.L. Kim and J. Nam, Sample-level deep convolutional neural networks for music auto-tagging using raw waveforms, Proc. 14th Sound and Music Comput. Conf., Espoo, Finland, 2017, pp. 220–226.
[15] W. Li, L. Cao, D. Zhao, X. Cui and J. Yang, CRNN: integrating classification rules into neural network, Proc. Int. Joint Conf. Neural Networks, 2013, pp. 1–8.
[16] Q. Liu, Y. Xu, J.B. Jackson, W. Wang and P. Coleman, Iterative deep neural networks for speaker-independent binaural blind speech separation, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Process., 2018.
[17] S. Mishra, B.L. Sturm and S. Dixon, Local interpretable model-agnostic explanations for music content analysis, Proc. ISMIR, 2017.
[18] M. Müller, D.P.W. Ellis, A. Klapuri and G. Richard, Signal processing for music analysis, IEEE J. Selected Topics in Signal Process. 5(6) (2011) 1088–1110.
[19] Musical scales, [Online]. Available: https://heptagrama.com/musical-scales.htm.
[20] S.B. Puri and S.P. Mahajan, Optimum feature selection for harmonium note identification using ANN, Proc. 10th Int. Conf. Comput. Commun. Network. Technol., 2019.
[21] S.B. Puri and S.P. Mahajan, Review on automatic music transcription system, Int. Conf. Comput. Commun. Control and Automation, 2017.
[22] P. Raguraman, R. Mohan and M. Vijayan, LibROSA based assessment tool for music information retrieval systems, IEEE Conf. Multimedia Information Processing and Retrieval, 2019.
[23] M.A. Román, A. Pertusa and J. Calvo-Zaragoza, Data representations for audio-to-score monophonic music transcription, Expert Syst. Appl. 162 (2020).
[24] M. Schedl and S. Böck, Polyphonic piano note transcription with recurrent neural networks, IEEE Int. Conf. Acoustics, Speech, and Signal Process., 2012.
[25] J. Schlüter, Learning to pinpoint singing voice from weakly labeled examples, Proc. ISMIR, 2016.
[26] S. Sigtia, E. Benetos and S. Dixon, An end-to-end neural network for polyphonic piano music transcription, IEEE/ACM Trans. Audio, Speech, and Language Process. 24(5) (2016) 927–939.
[27] Y.C. Subakan and P. Smaragdis, Generative adversarial source separation, IEEE Int. Conf. Acoustics, Speech and Signal Process., 2018, pp. 26–30.
[28] D. Wang and J. Chen, Supervised speech separation based on deep learning: an overview, arXiv preprint arXiv:1708.07524, 2017.
[29] T. Weyde, S. Sigtia, S. Dixon, A.S. d'Avila Garcez, E. Benetos and N. Boulanger-Lewandowski, A hybrid recurrent neural network for music transcription, arXiv preprint arXiv:1411.1623, 2014.
[30] G.A. Wiggins, S. Liu, L. Guo and F. Cong, A parallel fusion approach to piano music transcription based on convolutional neural network, Proc. IEEE Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), 2018.
[31] J. Xu, B. Tang, H. Man and H. He, Semi-supervised feature selection based on relevance and redundancy criteria, IEEE Trans. Neural Networks Learn. Syst. 28(9) (2017).
[32] A. Ycart and E. Benetos, Polyphonic music sequence transduction with meter-constrained LSTM networks, Proc. IEEE Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), 2018.
[33] J. Zhang, X. Yu, W. Wan and J. Liu, An audio retrieval method based on chromagram and distance metrics, Int. Conf. Audio, Language and Image Process., 2010.
[34] A. Zhou, Chord detection using deep learning, Int. Conf. Music Inf. Retrieval, 2015.
[35] G. Zweig, C. Yu, J. Droppo and A. Stolcke, Advances in all-neural speech recognition, Proc. ICASSP, 2017.