The calls of frogs are a familiar sound, from backyards to the bush, but what is a frog call? From love songs to battle cries, frogs use vocal communication to find mates, fight over territories, and cry for help. Each frog species has a unique call, although that call can differ from place to place, much like human accents.
With advances in predictive modelling, several types of models can be used to predict the species of a frog (an anuran) from its call.
Aim: to build two effective predictive models that accurately classify anuran species based on their calls, and to compare their accuracy.
Dataset Description
This dataset has been used in several classification tasks related to the challenge of recognizing anuran (frog) species from their calls. It is a multilabel dataset with three label columns. It was created by segmenting 60 audio recordings belonging to 4 families, 8 genera, and 10 species, where each recording corresponds to one specimen (an individual frog). The record ID is included as an extra column. Segmentation yields 7165 syllables, which become the instances used to train and test the classifiers. Some species were recorded on the campus of the Federal University of Amazonas in Manaus, others in the Mata Atlântica, Brazil, and one in Córdoba, Argentina. The attributes are acoustic features extracted from the syllables of anuran calls, together with the family, genus, and species labels. The acoustic features are Mel-frequency cepstral coefficients (MFCCs), the coefficients that collectively make up a Mel-frequency cepstrum (MFC). Because the syllables differ in length, every row was normalized. There are 22 MFCC attributes in total.
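To make the row layout concrete, here is a minimal sketch of one instance (22 MFCC values, three labels, and a record ID) together with a split into training and test sets. The `Syllable` class, the `stratified_split` helper, the 30% test fraction, and the placeholder rows are all assumptions made for illustration. The project's actual column names, split ratio, and splitting method are not stated here.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Syllable:
    """One dataset row (hypothetical field names)."""
    mfccs: list      # 22 MFCC values for one syllable
    family: str
    genus: str
    species: str
    record_id: int


def stratified_split(rows, test_fraction=0.3, seed=0):
    """Split rows into train/test so each species keeps a similar share."""
    by_species = defaultdict(list)
    for row in rows:
        by_species[row.species].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for group in by_species.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Synthetic placeholder rows (NOT real data), only to exercise the split.
value_rng = random.Random(1)
species_names = ["species_a"] * 20 + ["species_b"] * 10
rows = [
    Syllable(
        mfccs=[value_rng.uniform(-1, 1) for _ in range(22)],
        family="family_x",
        genus="genus_x",
        species=name,
        record_id=i,
    )
    for i, name in enumerate(species_names)
]

train, test = stratified_split(rows)
print(len(train), len(test))
```

Splitting per species keeps rare species represented in both sets, which matters when some species contribute far fewer syllables than others.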
Data Analysis
Scatter Plot
Data Modelling
Two predictive models are built, one using a Decision Tree and one using the K-Nearest Neighbor (KNN) algorithm.
The classification results of each model are given below.
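As an illustration of how KNN classifies a call, the sketch below labels a query vector by majority vote among its nearest training vectors. It is not the project's implementation: the value `k=3`, the Euclidean distance, the `knn_predict` helper, and the toy 2-D points (standing in for 22-dimensional MFCC vectors) are all assumptions. The Decision Tree model is not shown.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every training vector.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy 2-D points standing in for MFCC vectors (synthetic, not real data).
train_X = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (0.9, 1.0), (1.0, 0.9), (0.8, 0.9)]
train_y = ["species_a"] * 3 + ["species_b"] * 3

print(knn_predict(train_X, train_y, (0.1, 0.1)))
print(knn_predict(train_X, train_y, (0.9, 0.9)))
```

KNN has no training step beyond storing the data, so prediction cost grows with the number of stored syllables.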
Decision Tree
K-Nearest Neighbor
Conclusion
- Based on the classification results, the average precision of the K-Nearest Neighbor model is higher than that of the Decision Tree model.
- K-Nearest Neighbor also has the higher average recall.
- Thus, in this case, K-Nearest Neighbor appears to be the better algorithm to use.
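The average precision and recall used in this comparison can be computed as in the sketch below. It assumes a macro average (every species weighted equally), because the averaging scheme is not stated here. The `macro_precision_recall` helper and the labels are made up for illustration and are not the project's results.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred):
    """Average per-class precision and recall with equal weight per class."""
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)   # TP + FP per class
    actual = Counter(y_true)      # TP + FN per class

    # A class that is never predicted (or never occurs) contributes 0.
    precisions = [true_pos[c] / predicted[c] if predicted[c] else 0.0 for c in labels]
    recalls = [true_pos[c] / actual[c] if actual[c] else 0.0 for c in labels]
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Made-up labels for illustration only, not the project's results.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]
precision, recall = macro_precision_recall(y_true, y_pred)
print(f"macro precision={precision:.3f}, macro recall={recall:.3f}")
```

A weighted average would instead scale each class by its number of true instances, which can give a different ranking when species are imbalanced.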
Project Part 2
Data Modelling
Two predictive models are built, one using a Neural Network and one using Extreme Gradient Boosting (XGBoost).
The classification results of each model are given below.
Neural Network
Extreme Gradient Boosting
Conclusion
- Based on the classification results, the average precision of Extreme Gradient Boosting is roughly the same as that of the Neural Network.
- Extreme Gradient Boosting has the greater average recall.
- Overall, the two algorithms score about the same on average accuracy, precision, recall, and F1-score, so in this case neither is clearly better than the other.
- Hence, both are good choices for this task.
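A side-by-side check of accuracy and average F1 for two models could look like the sketch below. It assumes a macro-averaged F1 and uses made-up predictions from two hypothetical models (`model_1`, `model_2`), not the project's Neural Network or XGBoost outputs.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, using F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up predictions from two hypothetical models (not the project's results).
y_true = ["a", "a", "b", "b", "c", "c"]
model_1 = ["a", "a", "b", "c", "c", "c"]
model_2 = ["a", "b", "b", "b", "c", "c"]

for name, y_pred in [("model_1", model_1), ("model_2", model_2)]:
    print(name, round(accuracy(y_true, y_pred), 3), round(macro_f1(y_true, y_pred), 3))
```

In this toy case the two models make different mistakes yet end up with identical accuracy and macro F1, which is why a per-class breakdown can still be worth inspecting when the averages tie.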