
CPC251 PROJECT
AZRI ZAMRUD BIN KIMIN (153507), HAZIQ BIN HIZUL (152770), MUHAMMAD ARMAND BIN MUHAMAD FAZLI (153857), SYED MUHAMMAD HAIKAL BIN SYED HUSNI (153086)

The calls of frogs are a familiar sound everywhere from backyards to the bush, but what is a frog call? From love songs to battle cries, frogs use vocal communication to find mates, fight over territories, and cry for help. Each frog species has a unique call, although that call can differ from place to place, much like human accents.

With the development of predictive modelling, several types of model can be used to predict the species of a frog (anuran) from its call.

Aim: to build two effective predictive models that accurately classify anuran species based on their calls, and to compare the accuracy of the two models.

Dataset Description

This dataset has been used in several classification tasks related to the challenge of recognizing anuran (frog) species through their calls. It is a multilabel dataset with three label columns. It was created by segmenting 60 audio recordings belonging to 4 different families, 8 genera, and 10 species, where each recording corresponds to one specimen (an individual frog). The record ID is included as an extra column. After segmentation there are 7165 syllables, which become the instances used to train and test the classifiers. Some species are from the campus of the Federal University of Amazonas, Manaus, others from Mata Atlântica, Brazil, and one from Córdoba, Argentina. The attributes are acoustic features extracted from the syllables of the anuran calls, together with the family, genus, and species labels. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up a mel-frequency cepstrum (MFC). Because the syllables differ in length, every row was normalized. There are 22 MFCC attributes in total.
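The report does not say how the syllables were divided into training and test instances. As a minimal sketch, assuming a stratified hold-out split (each species keeps roughly the same share in both sets), the idea could look like the following. The function name, the 30% test fraction, and the toy rows are all hypothetical and are not taken from the project.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_index, test_fraction=0.3, seed=0):
    """Split rows into (train, test) so each label keeps a similar share in both."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group the rows by their label (for example, the species column).
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_index]].append(row)

    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        # At least one test row per label; a label with a single row
        # therefore ends up only in the test set.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical toy rows: (feature_a, feature_b, species). Not real MFCC values.
rows = [(0.1 * i, -0.05 * i, "SpeciesA") for i in range(10)]
rows += [(0.2 * i, 0.03 * i, "SpeciesB") for i in range(20)]

train, test = stratified_split(rows, label_index=2, test_fraction=0.3)
print(len(train), len(test))
```

A stratified split is only one reasonable choice here. Because several syllables come from the same recording, splitting by record ID instead would avoid placing syllables of one specimen in both sets.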

Table 1 : Dataset of Anuran

Data Analysis

Scatter Plot
Figure 1 : MFCCs_1 against MFCCs_3
Figure 2 : MFCCs_3 against MFCCs_4
Figure 3 : MFCCs_5 against MFCCs_9
Figure 4 : Decision Tree Diagram of the Model

Data Modelling

Two predictive models are built using the Decision Tree and K-Nearest Neighbor (KNN) algorithms.
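The project code is not reproduced in this report, so the following is only a minimal sketch of how a KNN classifier reaches a decision: it finds the k training points closest to a query and returns the majority label among them. The function name, the choice of k = 3, the Euclidean distance, and the two-dimensional toy points are assumptions for illustration; the actual parameters are those listed in Table 2.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair every training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two features per syllable and two made-up classes.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (0.15, 0.15)))
print(knn_predict(train_X, train_y, (0.95, 1.0)))
```

With the real data, each point would carry the 22 MFCC attributes rather than two features. The Decision Tree model works differently: it learns a sequence of threshold tests on the attributes instead of comparing distances.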

Table 2 : Parameters of the predictive models.

The classification results of each predictive model are given below.

Decision Tree
Figure 5: Results of classification using Decision Tree model.
Figure 6: Confusion Matrix for Decision Tree model.
K-Nearest Neighbor
Figure 7: Results of classification using K-Nearest Neighbor model.
Figure 8: Confusion Matrix for K-Nearest Neighbor model.

Conclusion

  • Based on the classification results, we can see that the average precision of the K-Nearest Neighbor model is higher than the precision of the Decision Tree model.
  • K-Nearest Neighbor also has the greater average value of recall.
  • Thus, in this case, the K-Nearest Neighbor predictive model appears to be the better algorithm to use.
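
The comparison above rests on average precision and recall. The report does not state which averaging was used, so the sketch below assumes a macro average (the unweighted mean over classes) computed from a confusion matrix. The function name and the 3-class toy matrix are hypothetical and do not reproduce the results in Figures 5 to 8.

```python
def macro_precision_recall(confusion):
    """Macro-averaged (precision, recall) from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    precisions, recalls = [], []
    for c in range(n):
        true_pos = confusion[c][c]
        predicted_c = sum(confusion[r][c] for r in range(n))  # column sum
        actual_c = sum(confusion[c])                          # row sum
        # Guard against division by zero for classes never predicted or never present.
        precisions.append(true_pos / predicted_c if predicted_c else 0.0)
        recalls.append(true_pos / actual_c if actual_c else 0.0)
    return sum(precisions) / n, sum(recalls) / n


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
toy_confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
]

precision, recall = macro_precision_recall(toy_confusion)
print(f"macro precision = {precision:.3f}, macro recall = {recall:.3f}")
```

A weighted average, which scales each class by its number of samples, can give different values when the species are unevenly represented, so the averaging method matters when comparing two models.
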
Project Part 2

Data Modelling

Two predictive models are built using a Neural Network and Extreme Gradient Boosting (XGBoost).

The classification results of each predictive model are given below.

Neural Network
Figure 9 : Classification for Neural Network machine learning algorithm
Figure 10 : Confusion Matrix for Neural Network machine learning algorithm
Extreme Gradient Boosting
Figure 11 : Classification for XGBoost machine learning algorithm
Figure 12 : Confusion Matrix for XGBoost machine learning algorithm

Conclusion

  • Based on the classification results, we can see that the average precision of the Extreme Gradient Boosting machine learning algorithm is roughly the same as that of the Neural Network algorithm.
  • Extreme Gradient Boosting also has the greater average value of recall.
  • Thus, in this case, Extreme Gradient Boosting performs about as well as the Neural Network, because their average accuracy, precision, recall, and F1-score are roughly the same overall.
  • Hence, both are good ML algorithms for this task.