
2023 | Book

Advances in Speech and Music Technology

Computational Aspects and Applications

Edited by: Anupam Biswas, Emile Wennekes, Alicja Wieczorkowska, Rabul Hussain Laskar

Publisher: Springer International Publishing

Book series: Signals and Communication Technology


About this book

This book presents advances in speech and music in the domain of audio signal processing. It begins with introductory chapters on the basics of speech and music and then proceeds to computational aspects of both, including music information retrieval and spoken language processing. The authors discuss the intersection of computer science, musicology, and speech analysis, and how the multifaceted nature of speech and music information processing requires unique algorithms and systems that use sophisticated signal processing and machine learning techniques to better extract useful information. They also discuss how a deep understanding of both speech and music in terms of perception, emotion, mood, gesture, and cognition is essential for successful applications. Also addressed is the overwhelming amount of audio data generated across the world, which requires efficient processing for maintenance, retrieval, indexing, and querying, tasks to which machine learning and artificial intelligence are well suited. The book provides both technological knowledge and a comprehensive treatment of essential topics in speech and music processing.

Table of Contents

Frontmatter

State-of-the-Art

Frontmatter
A Comprehensive Review on Speaker Recognition
Abstract
Speech is the most natural mode of human communication. In addition to the exchange of thoughts and ideas, speech is useful for extracting a great deal of other information, such as language identity, gender, age, emotion, and cognitive behavior. It is also known to contain speaker identity information. The task of recognizing the identity of an individual from the para-linguistic cues present in his or her speech signal is known as speaker recognition. It finds numerous applications across fields like biometrics, forensics, and access control systems. Research in this field has been carried out over several decades, focusing on aspects such as features, modeling techniques, and scoring. Significant recent advancements in machine learning and deep learning have generated renewed interest among researchers in this area. This chapter presents a comprehensive literature review on speaker recognition, with emphasis on the text-dependent case, wherein a predefined text is used for authentication purposes. It discusses feature extraction and modeling techniques from the earliest to the newest. It also surveys the different deep learning architectures that have resulted in state-of-the-art systems, having received impetus from the availability of increased data and high computational power.
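To make the scoring stage concrete, the sketch below shows speaker verification by cosine scoring. It assumes fixed-length speaker embeddings (e.g., i-vectors or x-vectors) have already been extracted by an upstream model; the dimensions, threshold, and random vectors are purely illustrative, not the chapter's system.

```python
import numpy as np

def cosine_score(enroll_emb: np.ndarray, test_emb: np.ndarray) -> float:
    """Cosine similarity between an enrollment and a test embedding."""
    return float(np.dot(enroll_emb, test_emb) /
                 (np.linalg.norm(enroll_emb) * np.linalg.norm(test_emb)))

def verify(enroll_emb, test_emb, threshold=0.5) -> bool:
    """Accept the claimed identity if the score exceeds a tuned threshold."""
    return cosine_score(enroll_emb, test_emb) >= threshold

# Toy usage with random 256-dim vectors standing in for real embeddings.
rng = np.random.default_rng(0)
enroll, test = rng.normal(size=256), rng.normal(size=256)
print(verify(enroll, test))
```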
Banala Saritha, Mohammad Azharuddin Laskar, Rabul Hussain Laskar
Music Composition with Deep Learning: A Review
Abstract
Generating a complex work of art such as a musical composition requires exhibiting a certain level of creativity, which depends on a variety of factors related to the hierarchy of musical language. Music generation has long been attempted with algorithmic methods and is now being approached with deep learning models originally developed for other fields, such as computer vision. In this chapter, we place into context the existing relationships between AI-based music composition models and human musical composition and creativity processes. First, we describe the music composition process; then we give an overview of recent deep learning models for music generation, classifying them according to their relationship with basic principles of music: melody, harmony, structure, or composition processes such as instrumentation and orchestration. Classifying music generation models into these categories helps us to measure and understand how deep learning models deal with the complexity and hierarchy of music. We try to answer some of the most relevant open questions for this task, including by analyzing the ability of current deep learning models to generate music with creativity and the similarity between AI and human composition processes.
Carlos Hernandez-Olivan, José R. Beltrán
Music Recommendation Systems: Overview and Challenges
Abstract
Recommendation systems are a backbone for promoting products and services on social networking and e-commerce websites. Major platforms such as Netflix, Amazon Prime, Spotify, and YouTube use recommendation systems to promote different commodities. Current systems are mainly based on metadata and collaborative techniques. Over the past few years, these systems have evolved with keyword filtering, item feature-based filtering, and finding aggregates and commonalities between users. Online streaming applications nowadays dominate music consumption. Popular websites such as YouTube and social media applications allow users to listen to songs and recommend songs based on user history. Thus, there is a need to provide users with a personalized, enhanced musical experience. This chapter covers an overview of recommendation systems and the challenges involved, including cold start issues, unavailability of relevant data, overspecialization, lack of freshness, data sparsity, and unreliable metadata. These issues apply to recommendation systems in general and must be addressed to serve user needs effectively.
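As a minimal illustration of the collaborative techniques mentioned above, the sketch below implements item-based collaborative filtering over a toy play-count matrix; the matrix and all values are invented for the example.

```python
import numpy as np

# Toy user-item play-count matrix: rows = users, columns = songs.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

# Item-item cosine similarity matrix.
norms = np.linalg.norm(R, axis=0, keepdims=True)
norms[norms == 0] = 1.0                      # guard against unplayed songs
S = (R.T @ R) / (norms.T @ norms)

def recommend(user_idx: int, top_k: int = 2):
    """Score unheard songs by their similarity to the user's played songs."""
    played = R[user_idx] > 0
    scores = S[:, played] @ R[user_idx, played]
    ranked = np.argsort(scores)[::-1]
    return [i for i in ranked if not played[i]][:top_k]

print(recommend(0))  # indices of songs to recommend to user 0
```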
Makarand Velankar, Parag Kulkarni
Music Recommender Systems: A Review Centered on Biases
Abstract
Although there have been significant developments in music recommender systems (MRS), artists interested in promoting their careers and listeners interested in exploring new musical content do not always feel satisfied with the results obtained. Songs composed by some artists have very little chance of being recommended, and listeners in most cases repeatedly encounter the same songs. This chapter presents a review of the literature on the analysis of recommendation strategies, studies of biases, and audio datasets. Based on this review, an in-depth discussion examines how and why biases appear in MRS, and a set of guidelines for handling biases is then proposed. Moreover, the specific case of non-superstar artists is analyzed from the point of view of datasets, given its relevance for understanding biases.
Yesid Ospitia-Medina, Sandra Baldassarri, Cecilia Sanz, José Ramón Beltrán
Computational Approaches for Indian Classical Music: A Comprehensive Review
Abstract
Technological advances have resulted in massive growth of digital music and created the need to efficiently analyze large volumes of it and extract the information needed to perform various musical tasks. Classical music of the Indian subcontinent, known as Indian classical music (ICM), has a long tradition and many followers. ICM has historically been under-researched, but this has changed in the past decade, with many researchers now focusing on ICM tasks. These analyses and approaches must be brought together and examined in depth to develop future research avenues. Therefore, this chapter critically reviews approaches to the fundamental tasks in ICM. The basic concepts of ICM are also described in detail to give a precise grasp of the musical ideas. Moreover, the signal processing methods used to draw out valuable characteristics for specific tasks are examined, along with their strengths and shortcomings in ICM. The chapter also highlights some broad research problems with present methodologies and potential solutions for improving their correctness and efficiency.
Yeshwant Singh, Anupam Biswas

Machine Learning

Frontmatter
A Study on Effectiveness of Deep Neural Networks for Speech Signal Enhancement in Comparison with Wiener Filtering Technique
Abstract
This chapter seeks the optimal method for speech signal enhancement between the Wiener filtering method and neural network methods. A speech signal is highly susceptible to various noises. Many denoising methods remove high-frequency components from the signal, but this also removes parts of the original signal, reducing its quality, which is highly undesirable. Our main objective is to denoise the signal while enhancing its quality. Two methods, namely fully connected and convolutional neural networks, are compared with the Wiener filtering method, and the most suitable technique is suggested. To compare output signal quality, we compute the signal-to-noise ratio (SNR) and peak signal-to-noise ratio (PSNR). A recent version of MATLAB with toolboxes such as the Deep Learning Toolbox, Audio Toolbox, and Signal Processing Toolbox is used for speech denoising and quality enhancement.
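The chapter's experiments use MATLAB; as a language-neutral illustration, here is a minimal NumPy sketch of the two quality metrics under one common set of definitions (conventions differ, e.g., in the peak reference used for PSNR).

```python
import numpy as np

def snr_db(clean: np.ndarray, estimate: np.ndarray) -> float:
    """SNR of an enhanced signal against the clean reference, in dB."""
    noise = clean - estimate
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def psnr_db(clean: np.ndarray, estimate: np.ndarray) -> float:
    """PSNR using the clean signal's peak amplitude as the reference."""
    mse = np.mean((clean - estimate) ** 2)
    return 10.0 * np.log10(np.max(np.abs(clean)) ** 2 / mse)

# Toy usage: a 440 Hz tone and a noisy copy standing in for a denoiser output.
t = np.linspace(0, 1, 8000)
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.1 * np.random.default_rng(0).normal(size=t.size)
print(f"SNR:  {snr_db(clean, noisy):.1f} dB")
print(f"PSNR: {psnr_db(clean, noisy):.1f} dB")
```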
Vijay Kumar Padarti, Gnana Sai Polavarapu, Madhurima Madiraju, V. V. Naga Sai Nuthalapati, Vinay Babu Thota, V. D. Subramanyam Veeravalli
Video Soundtrack Evaluation with Machine Learning: Data Availability, Feature Extraction, and Classification
Abstract
In this chapter, we attempt to combine multimodal data for video soundtrack evaluation, i.e., the task of evaluating whether the music chosen for a video is aesthetically suitable. We propose a method of collecting and combining relevant data from three modalities: audio, video, and symbolic representations of music. We describe and extract a comprehensive multimodal feature library. We construct a database by applying our method on a small set of available data from movie scenes. We implement and tune a classifier on our constructed database of features with adequate results. The proposed classification pipeline attempts to discriminate between real and fake examples of video soundtracks from movies. Finally, we describe some possible improvements to the proposed methods and point to directions for future attempts at this and adjacent tasks.
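The chapter's exact feature library and classifier are not reproduced here; the sketch below only illustrates the general early-fusion pattern of concatenating per-modality feature vectors and training a classifier on the result, with random stand-in features and invented dimensions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200                                     # video-music pairs

# Stand-ins for feature vectors extracted from each modality elsewhere.
audio_feats = rng.normal(size=(n, 34))      # e.g., spectral statistics
video_feats = rng.normal(size=(n, 20))      # e.g., colour/motion statistics
symbolic_feats = rng.normal(size=(n, 12))   # e.g., harmonic descriptors
labels = rng.integers(0, 2, size=n)         # 1 = real soundtrack, 0 = fake

# Early fusion: concatenate the three modalities into one feature vector.
X = np.hstack([audio_feats, video_feats, symbolic_feats])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=5).mean())  # ~0.5 on random data
```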
Georgios Touros, Theodoros Giannakopoulos
Deep Learning Approach to Joint Identification of Instrument Pitch and Raga for Indian Classical Music
Abstract
Indian classical music is broadly divided into two classifications: Carnatic and Hindustani music. The concepts of raga and shruthi are fundamental to both styles; hence, when analyzing them, the identification of shruthi and raga is of prime importance. However, identifying the pitch, raga, and instrument from Indian classical instrumental polyphonic audio proves to be quite a tough challenge. This chapter presents a comprehensive comparison among convolutional neural networks (CNN), recurrent neural networks (RNN), and XGBoost. Three distinguishing feature sets are created for each task: the first comprises 26 defining features extracted from the songs, the second consists of the 10 most significant of those 26 features, and the third is made up of 26 features extracted from source-separated audio files. The three models were created for the individual tasks, and their performance is evaluated on the three feature sets. The implemented CNN and XGBoost models achieved accuracies of around 70% and 90%, respectively. The RNN model, on the other hand, showed approximately 98% accuracy owing to its internal memory and unique features.
The three individual tasks are then combined into a single model with an RNN architecture. This combined model demonstrated an approximate accuracy of 97.2%, close to the accuracy obtained for the individual tasks.
For all three tasks, CNN, RNN, and XGBoost had favorable accuracy.
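As one concrete piece of the pipeline, the sketch below trains an XGBoost classifier on a 26-dimensional feature matrix of the kind described; the features, labels, and hyperparameters are random stand-ins, not the chapter's data, and it assumes the xgboost package.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 26))      # stand-in for the 26 extracted features
y = rng.integers(0, 4, size=500)    # stand-in raga labels (4 classes)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_tr, y_tr)               # the same pattern applies per task
print(accuracy_score(y_te, model.predict(X_te)))
```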
Ashwini Bhat, Karrthik Gopi Krishnan, Vishal Mahesh, Vijaya Krishna Ananthapadmanabha
Comparison of Convolutional Neural Networks and K-Nearest Neighbors for Music Instrument Recognition
Abstract
Music instrument recognition is one of the main tasks of music information retrieval. Identification of the instruments present in an audio track provides information about the composition of the music; in polyphonic music it is a challenging task. Existing approaches use temporal, spectral, and perceptual feature extraction techniques to perform music instrument recognition. In the proposed work, a convolutional neural network and a k-nearest neighbor classifier are implemented to identify the musical instrument present in a monophonic audio file, and the performance of the two models is compared. The models are trained on the London Philharmonic Orchestra dataset, which consists of six different classes of musical instruments. Mel spectrogram representations are used as features for the convolutional neural network, while Mel-frequency cepstral coefficient feature vectors are calculated for classification with k-nearest neighbors. This approach only works for monophonic music and cannot be used for polyphonic music. The models help label unlabelled audio files so that manual annotation can be avoided. Both performed well, with 99.17% accuracy for the convolutional neural network and 97% for the k-nearest neighbor classifier.
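For reference, the sketch below shows how the two feature types can be extracted with librosa: a Mel spectrogram as CNN input and pooled MFCC statistics as a fixed-length vector for k-NN. The file path and parameter values are illustrative, not the chapter's configuration.

```python
import librosa
import numpy as np

# Load a monophonic instrument recording (path is illustrative).
y, sr = librosa.load("cello_note.wav", sr=22050, mono=True)

# Mel spectrogram (in dB): the image-like input for a CNN.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Pooled MFCC statistics: a fixed-length vector for a k-NN classifier.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
knn_vector = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

print(mel_db.shape, knn_vector.shape)
```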
S. Dhivya, Prabu Mohandas
Emotion Recognition in Music Using Deep Neural Networks
Abstract
In this chapter, we investigate the task of music emotion recognition using deep neural networks, with adversarial architectures for music data augmentation to improve performance. For ground truth, we used Eerola/Vuoskoski's 360-set, a set of 360 soundtrack excerpts fully labeled by music experts. As a baseline, we used handcrafted audio features and machine learning classification algorithms such as SVMs. We then fed Mel spectrogram representations into several CNN architectures, employing two transfer learning techniques: freezing all layers except the classifier layer, and fine-tuning the entire model to update all weights. We demonstrate classification performance on the tasks of valence, energy, tension, and emotions. Furthermore, as shown by comparative tests using a 17K-track set of pop rock songs to train source models, transfer learning works well even when the sets come from totally different music domains. Finally, experiments showed that for all classification tasks, deep neural networks outperformed traditional ML methods.
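The two transfer learning techniques can be sketched in a few lines of PyTorch; the backbone below (an ImageNet ResNet-18) and the class count are illustrative assumptions, since the chapter's exact architectures are not reproduced here.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 4  # e.g., discrete emotion categories

# Technique 1: freeze all layers, train only a new classifier layer.
model = models.resnet18(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_classes)  # trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# Technique 2 (full fine-tuning) would instead leave requires_grad=True
# everywhere and pass model.parameters() to the optimizer, typically
# with a smaller learning rate.
```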
Angelos Geroulanos, Theodoros Giannakopoulos

Perception, Health and Emotion

Frontmatter
Music to Ears in Hearing Impaired: Signal Processing Advancements in Hearing Amplification Devices
Abstract
Music is not just a complex sound encompassing multiple tones but an amalgamation of the harmonicity of the spectral components and their temporal relationships. Music perception involves complex auditory processing, starting at the cochlea, wherein fundamental pitch, duration, and loudness are encoded in the tiny hair cells of the inner ear. Damage to the hair cells causes hearing loss, which in turn affects spectrotemporal resolution. The offshoot of hearing loss at the cochlear level translates physiologically into deficits in the extraction and coding of pitch at the brainstem level, which in turn impairs pitch and temporal perception at the cortical level. Thus, hearing loss adversely affects the perception of pitch, timing, and loudness, all of which compound into difficulty appreciating the normal aspects of music. The contemporary solutions for improving audibility and the perception of speech sounds in the hearing-impaired population are hearing aids and cochlear implants. Although these devices are primarily meant to amplify speech sounds, their utility is also advocated for music perception. Digital hearing aids employ sophisticated signal processing techniques to improve the perception of speech sounds; however, these techniques fall short in music processing and are not an alternative to the human cochlea. The multichannel amplitude compression used in hearing aids to improve the audible range of loudness levels can distort the temporal envelope of sounds, resulting in poor quality for music perception. Additionally, the fast-acting compression circuitry used in modern digital hearing aids causes more temporal smearing (compared to slow-acting compression), adversely affecting music perception. The limited input dynamic range and higher crest ratio of the AD converters in hearing aids fall short of processing live music. Unlike hearing aids, cochlear implants work on the principle of electrical stimulation. This auditory prosthesis processes the incoming sound and delivers its electrical output directly to the auditory nerve, bypassing the cochlea. Though more sophisticated and advanced than hearing aids, it was developed to improve speech perception rather than music perception. A cochlear implant uses a number of electrode channels situated along the human cochlea to deliver its output. Since the frequency-to-place coding along the human cochlea cannot be matched by a surgically implanted electrode array, cochlear implant users experience difficulties with pitch perception. The rate of stimulation used by the cochlear implant may not be high enough to deliver the higher harmonics of musical sounds. These physiological limitations of hearing loss and technological limitations of hearing amplification devices with respect to music perception are elaborated in this chapter, which also features a comprehensive discussion of the advancements in signal processing techniques available in hearing amplification devices (hearing aids and cochlear implants) that can address these shortcomings.
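To illustrate why fast-acting compression smears temporal envelopes more than slow-acting compression, here is a toy envelope-follower sketch (not a model of any actual hearing aid); all time constants are invented for the example.

```python
import numpy as np

def envelope(x: np.ndarray, sr: int, attack_ms: float, release_ms: float):
    """One-pole peak envelope follower with separate attack/release times."""
    a_att = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    env, level = np.zeros_like(x), 0.0
    for i, v in enumerate(np.abs(x)):
        coeff = a_att if v > level else a_rel
        level = coeff * level + (1.0 - coeff) * v
        env[i] = level
    return env

sr = 16000
t = np.arange(sr) / sr
# A 440 Hz tone with a 4 Hz amplitude modulation (a musical envelope).
x = np.sin(2 * np.pi * 440 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))

fast = envelope(x, sr, attack_ms=1, release_ms=50)    # tracks the modulation
slow = envelope(x, sr, attack_ms=10, release_ms=500)  # averages over it
# A compressor driven by the fast envelope flattens the 4 Hz modulation
# (temporal smearing); the slow envelope largely preserves it.
print(fast.std(), slow.std())
```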
Kavassery Venkateswaran Nisha, Neelamegarajan Devi, Sampath Sridhar
Music Therapy: A Best Way to Solve Anxiety and Depression in Diabetes Mellitus Patients
Abstract
Compared with healthy individuals, diabetic patients suffer from psychological and physiological disorders leading to severe anxiety and depression. Video-taped observational methods have been applied even to Alzheimer's disease but involve complex activities. To overcome such disorders, combinations of music and medication therapies play a vital role in finding a solution. The main objective of this research is to determine the effectiveness of music therapy in diabetes using the Beck Anxiety Inventory and Beck Depression Inventory, with a reliability of 0.67. The research methodology focuses on pre-evaluation, post-evaluation, and follow-up stages with diabetes patients. The reduced mean and covariance values demonstrate the effectiveness of music therapy.
Anchana P. Belmon, Jeraldin Auxillia
Music and Stress During COVID-19 Lockdown: Influence of Locus of Control and Coping Styles on Musical Preferences
Abstract
The imposition of a strict lockdown by the government of India during the first outbreak of COVID-19 had a remarkable impact on the well-being of citizens. Studies around the globe demonstrated that music was one of the effective strategies for enhancing well-being during the lockdown. However, the response to stressful events is modulated by individual characteristics, like coping styles and locus of control (internal/external dependence), which have received little attention. The present chapter examines the use of music to cope with the COVID-19 lockdown in light of these individual traits, together with musical preferences during this period. A factor analysis yielded four music dimensions preferred by the participants during the lockdown: intense and electronic; cultural, emotional, and melodious; Indian contemporary and popular; and devotional music. Among the music genres, new and old Bollywood music were the most preferred. Participants with a higher internal locus of control and emotion- or problem-focused coping styles demonstrated greater use of music in coping with stress. Problem-focused coping showed significant positive correlations with all the music dimensions, while emotion-focused coping correlated with intense and electronic music; cultural, emotional, and melodious music; and devotional music. Internals showed no correlation with the different music genres; externals showed a preference for intense and electronic and Indian contemporary and popular music. Listening to music had a significant positive effect on people high in emotion-focused and problem-focused coping styles and internal locus of control, but it was not necessarily effective for people endorsing a high external locus of control and avoidant coping. This implies that music can be used as a self-administered tool, and therapeutically, for people who engage in these coping styles and loci of control.
Junmoni Borgohain, Rashmi Ranjan Behera, Chirashree Srabani Rath, Priyadarshi Patnaik
Biophysics of Brain Plasticity and Its Correlation to Music Learning
Abstract
Brain plasticity is one of the hallmarks of learning and memory. Even within a lifetime, human brains can change both structurally and functionally, which is the basis of the brain's remarkable capacity to learn or unlearn and to memorize or forget. It has been established that the presence or absence of external cues can induce biological changes in the brain over an elongated time scale. The plasticity of the brain is manifested at the level of synapses, at networks, and even at single neurons, the last of which is termed intrinsic plasticity. Learning involves all of these mechanisms of brain plasticity, and music learning is likewise attributable to the plasticity of the brain. Music is known to require intensive brain activity in different regions, whether one is simply listening to a musical pattern, performing, or even imagining music. From a biophysical point of view, music perception and learning is a correlation between sound waves and the biological changes they induce in the brain. In this chapter, we highlight how brain plasticity relates to music learning by discussing the mechanisms and the experimental evidence.
Sandipan Talukdar, Subhendu Ghosh
Analyzing Emotional Speech and Text: A Special Focus on Bengali Language
Abstract
The present chapter describes an overall study of text-to-speech (TTS) systems, analyzing the special roles of emotions in speech and text. Speech forms the basic channel of human communication and is the key medium of human-computer interaction (HCI) in recent trends. Moreover, emotions add extra flavor to both speech and text. The scope of this chapter covers not only the development of a Bengali TTS system based on a state-of-the-art English TTS model but also how more features can be added to this TTS from the perspective of various emotions. The present model contains only convolutional neural networks to facilitate faster training through parallelization, unlike RNNs or LSTMs, which process sequences serially. In addition, we introduce a novel Bengali emotional dataset that was developed during this work and used for training our models. We also discuss the role of language independence and the scope for incorporating multilingual features, including our observations in developing a bilingual English-Bengali TTS model and the problems we faced in working with multiple languages. Lastly, we discuss the roles of emotions in speech: we present our experiments on including happy emotion in generated speech as a case study and show how this can be extended to other emotions. Finally, we propose a high-level architecture of a TTS system with multilingual and emotional features.
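A minimal sketch of the parallelism argument: a purely convolutional text encoder processes all timesteps at once, whereas an RNN must step through them serially. The module below is an illustrative toy, not the chapter's model; vocabulary size, channel width, and depth are invented.

```python
import torch
import torch.nn as nn

class ConvTextEncoder(nn.Module):
    """Residual stack of 1D convolutions over an embedded token sequence."""
    def __init__(self, vocab_size: int, channels: int = 256, layers: int = 5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, channels)
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=5, padding=2)
            for _ in range(layers)
        )

    def forward(self, tokens):                  # tokens: (batch, time)
        x = self.embed(tokens).transpose(1, 2)  # -> (batch, channels, time)
        for conv in self.convs:
            x = torch.relu(conv(x)) + x         # all timesteps in parallel
        return x

enc = ConvTextEncoder(vocab_size=80)
print(enc(torch.randint(0, 80, (2, 50))).shape)  # torch.Size([2, 256, 50])
```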
Krishanu Majumder, Dipankar Das

Case Studies

Frontmatter
Duplicate Detection for Digital Audio Archive Management: Two Case Studies
Abstract
This chapter focuses on the identification of duplicate audio material in large digital music archives. The music information retrieval (MIR) problem of efficiently finding duplicates in large collections is a solved problem; off-the-shelf systems are even available for it. The applications of this technology, however, remain too little known and underexploited.
This chapter describes duplicate detection and its many applications, which include meta-data quality verification, improving listening experiences, reuse of meta-data, informed noise cancellation, optimising storage space, and linking and merging archives. The applications of duplicate detection are illustrated with two case studies.
1. The first case study uses a collection of digitized shellac discs from the Belgian national public-service broadcaster. It shows a surprisingly high amount of duplicate material, around 38%. With some discs better preserved (and digitized) than others, linking duplicate material allows listeners to be redirected to higher-quality audio.
2. An archive of early electronic music is the focus of the second case study. The archive has been digitized twice. Segmentation timestamps and other meta-data originating from the first digitization campaign are reused to annotate the higher-fidelity digital audio from the second campaign.
The main contribution of this chapter is to highlight practical uses of duplicate detection. A secondary contribution is the set of findings detailed in the case studies. A third contribution is an evaluation of an updated fingerprinting system.
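The chapter's own fingerprinting system is not reproduced here; as a rough illustration of how acoustic fingerprinting enables duplicate detection, the sketch below hashes pairs of spectral peaks (a Shazam-style "constellation") and measures hash overlap between two signals. All parameters and the noise model are invented.

```python
import numpy as np
from scipy import signal
from scipy.ndimage import maximum_filter

def fingerprint(audio: np.ndarray, sr: int) -> set:
    """Hash pairs of spectral peaks into (f1, f2, delta-t) triples."""
    _, _, S = signal.stft(audio, fs=sr, nperseg=1024)
    mag = np.abs(S)
    # Local maxima above the mean magnitude form the 'constellation'.
    peaks = (mag == maximum_filter(mag, size=15)) & (mag > mag.mean())
    fi, ti = np.nonzero(peaks)
    order = np.argsort(ti)                        # pair peaks in time order
    fi, ti = fi[order], ti[order]
    hashes = set()
    for i in range(len(fi)):
        for j in range(i + 1, min(i + 6, len(fi))):
            dt = ti[j] - ti[i]
            if 0 < dt <= 30:                      # only pair nearby peaks
                hashes.add((fi[i], fi[j], dt))
    return hashes

# Duplicates share many hashes even after degradation / re-digitization.
rng = np.random.default_rng(0)
a = rng.normal(size=44100)
b = a + 0.05 * rng.normal(size=44100)             # degraded copy of 'a'
ha, hb = fingerprint(a, 44100), fingerprint(b, 44100)
print(len(ha & hb) / max(len(ha), 1))             # overlap ratio ~ similarity
```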
Joren Six, Federica Bressan, Koen Renders
How a Song’s Section Order Affects Both ‘Refrein’ Perception and the Song’s Perceived Meaning
Abstract
Digital technologies provide excellent possibilities to create various versions of musical stimuli without changing the performance. Thus, the effect of specific musical properties (such as pitch height, timing or section order) can be tested in a performance-neutral way. In a small listening experiment, the section order within a song is manipulated digitally to investigate several hypotheses. As expected, the perception of song sections as either verse, chorus or bridge turns out to be dependent not only on their musical and lyrical properties but also on their position within the song. The interpretation of a song’s meaning is also partly determined by section order. As evidenced by the author’s research, the participants’ interpretation of a song is mainly based on the song section that they perceive to be the ‘refrein’ (i.e. the chorus, or the leading refrain line).
Yke Paul Schotanus
Musical Influence on Visual Aesthetics: An Exploration on Intermediality from Psychological, Semiotic, and Fractal Approach
Abstract
Human experience involves the five senses and hence is multimodal. This includes aesthetic experiences and their memories as well. The auditory and visual senses, being the two most involved of the five, are often entangled with each other and dominate both our utilitarian and aesthetic emotions. The nature of the association between music and visuals, be it indifferent, compatible, or incompatible, can significantly influence the total emotional outcome for the audience. This work investigates the influence of music complementary and contradictory to visuals presented through abstract paintings. To quantitatively study this intermediality between music and visual arts, a three-way approach encompassing feature analysis, audience response, and nonlinear chaos-based fractal analysis was used. Results indicate the dominance of musical contributions over the visuals in the audience's total emotional experience, and a prominent difference between the impact of complementary and contradictory music integrated with abstract paintings.
Archi Banerjee, Pinaki Gayen, Shankha Sanyal, Sayan Nag, Junmoni Borgohain, Souparno Roy, Priyadarshi Patnaik, Dipak Ghosh
Influence of Musical Acoustics on Graphic Design: An Exploration with Indian Classical Music Album Cover Design
Abstract
Graphic designers often look for new strategies to create interesting and innovative designs for book covers, music album covers, advertisements, etc. In the case of Indian classical instrumental music, graphic designers designing album covers have to date mostly used iconic representations such as photographs or paintings of the vocalists, instrument players, or the specific music instrument(s). Sometimes we also find Indian ragamala paintings from famous Indian miniatures, ornamental designs with ornamental typography, and nature photography on album covers. These approaches to album cover design are very common and, to a great extent, conventional. To explore new pathways and ideas for designing music album covers, we conducted an experiment with a group of 30 design students, in which two preselected instrumental (sitar) Indian classical music clips of contrasting emotions (slow-tempo sad/calm music vs. fast-tempo joyful/exciting music) were played at a gap of 1 hour, and the designers were asked to create a suitable album cover for each of the music pieces as they listened to each clip playing continuously in a loop. We used semiotic analysis and fractal analysis (Detrended Fluctuation Analysis) of the chosen music pieces as well as the corresponding cover designs to explore the impact of different musical acoustical features on the graphic designs of the album covers and the nature of the intermediality between these two mediums. Findings revealed that while the designers represented their musical experiences through the album covers, they mostly did so in three different ways: (1) some designers directly represented the moods and emotions of the music clips in their designs, dominantly using symbolic representation followed by indexical representation; (2) some designers represented the visual imageries evoked by the music clips, with these covers showing a dominant pattern of indexical representation followed by iconic representation; and (3) some designers represented the musical features themselves, with these covers mostly falling under iconic representation. A comparative quantitative study of the symmetry scaling behavior (using fractal analysis) of the acoustical waveforms of the two music clips as well as the designers' images also indicated a clear correspondence between musical acoustical features and the depicted visual features in the album cover designs. Moreover, the findings of this study provide a new, innovative approach to music album cover design beyond the conventional approaches.
Pinaki Gayen, Archi Banerjee, Shankha Sanyal, Priyadarshi Patnaik, Dipak Ghosh
A Fractal Approach to Characterize Emotions in Audio and Visual Domain: A Study on Cross-Modal Interaction
Abstract
In this paper, we try to classify the emotional cues of sound and visual stimuli solely from their source characteristics, i.e., from the 1D time series generated from the audio signals and the two-dimensional matrix of pixels generated from the affective picture stimuli. The sample data consist of six audio signals of 15 s each and six affective pictures, three each of positive and negative valence. Detrended Fluctuation Analysis (DFA) has been used to calculate the long-range temporal correlations, or the Hurst exponent, of the audio signals. The 2D analogue of the DFA technique has been applied to the arrays of pixels of affective pictures of contrasting emotions. We obtain a single scaling exponent for each audio signal and three scaling exponents, corresponding to the red/green/blue (RGB) components, for each visual image. The Detrended Cross-correlation Analysis (DCCA) technique (both 1D and 2D) has been used to calculate the degree of nonlinear correlation between the sample audio and visual clips. To assess the proportion of cross-modal correlation in the emotional appraisal, the Pearson correlation coefficient was calculated using the DFA exponents of the two modalities. The results and findings have been corroborated with a human response study based on emotional Likert scale ratings. To conclude, we propose a novel algorithm with which emotional arousal can be classified in a cross-modal scenario using only the source audio and visual signals, while also attempting to assess the degree of correlation between them.
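For reference, a compact 1D DFA implementation is sketched below; the scale choices and test signal are illustrative, and the chapter's 2D variants are not reproduced.

```python
import numpy as np

def dfa_exponent(x: np.ndarray, scales=None) -> float:
    """DFA scaling (Hurst-like) exponent: slope of log F(s) vs log s."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())        # integrated, mean-centred series
    if scales is None:
        scales = np.unique(
            np.logspace(1.5, np.log10(len(x) // 4), 12).astype(int))
    F = []
    for s in scales:
        n_seg = len(profile) // s
        segs = profile[:n_seg * s].reshape(n_seg, s)
        tt = np.arange(s)
        # Detrend each window with a least-squares line, then take the RMS.
        res = [np.mean((seg - np.polyval(np.polyfit(tt, seg, 1), tt)) ** 2)
               for seg in segs]
        F.append(np.sqrt(np.mean(res)))
    slope, _ = np.polyfit(np.log(scales), np.log(F), 1)
    return slope  # ~0.5 for white noise; >0.5 for persistent correlations

rng = np.random.default_rng(0)
print(dfa_exponent(rng.normal(size=8000)))   # close to 0.5
```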
Shankha Sanyal, Archi Banerjee, Sayan Nag, Souparno Roy, Ranjan Sengupta, Dipak Ghosh
Inharmonic Frequency Analysis of Tabla Strokes in North Indian Classical Music
Abstract
The Tabla is the most widely used accompanying rhythm instrument in North Indian classical music (NICM). In contrast with Western rhythm instruments such as drums, the membranes of the Tabla are loaded with ink. Due to this loading, the membrane vibrates for a longer duration and creates various inharmonic frequencies (overtones) that contribute to the timbre of the sound. In this research, we have analysed nine basic Tabla strokes with respect to their modes of vibration and identified the prominent overtone frequencies and the corresponding musical notes that contribute to the homophonic texture of the Tabla sound. Though the overtones are short in duration, they still contribute to the timbral quality of the Tabla stroke. It is found that the membranes of both Tabla drums exhibit eight inharmonic overtones, each carrying a frequency corresponding to a musical note; all of these notes contribute to the timbral structure of the audio produced by each basic stroke. This research aims at standardizing the overtone frequencies of the Tabla and the corresponding musical notes that contribute to the homophonic texture of the Tabla stroke sound.
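As an illustration of the overtone-to-note mapping described, the sketch below picks prominent FFT peaks from a recorded stroke and maps each to the nearest equal-tempered note; the file path, peak-picking thresholds, and A4 = 440 Hz reference are assumptions, not the chapter's method.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.io import wavfile

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq: float) -> str:
    """Map a frequency to the nearest equal-tempered note (A4 = 440 Hz)."""
    midi = int(round(69 + 12 * np.log2(freq / 440.0)))
    return f"{NOTES[midi % 12]}{midi // 12 - 1}"

# Load one isolated stroke recording (path is illustrative).
sr, x = wavfile.read("tabla_na.wav")
x = x.astype(float)
if x.ndim > 1:                     # fold stereo down to mono if needed
    x = x.mean(axis=1)

spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)

# Prominent spectral peaks are candidate (inharmonic) overtones.
peaks, _ = find_peaks(spectrum, height=0.05 * spectrum.max(), distance=50)
for p in peaks[:8]:                # report the first eight overtones
    print(f"{freqs[p]:7.1f} Hz -> {freq_to_note(freqs[p])}")
```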
Shambhavi Shivraj Shete, Saurabh Harish Deshmukh
Backmatter
Metadata
Title
Advances in Speech and Music Technology
Edited by
Anupam Biswas
Emile Wennekes
Alicja Wieczorkowska
Rabul Hussain Laskar
Copyright year
2023
Electronic ISBN
978-3-031-18444-4
Print ISBN
978-3-031-18443-7
DOI
https://doi.org/10.1007/978-3-031-18444-4