Measuring Behavior '98: Ikeda & Ishii

Characteristics of cattle voice and their application to recognition of individuals

Y. Ikeda and Y. Ishii

Division of Environmental Science and Technology, Graduate School of Agriculture, Kyoto University, Kyoto, Japan

The objective of this research is to analyze cattle vocalizations in the time and frequency domains and to use the resulting voice characteristics to recognize individual animals. The final goal is quality control of livestock during breeding through an understanding of the animals' intelligent behavior. The voice generation process is described by a linear prediction filter, and the parameters of this model are estimated with the maximum entropy method (MEM). From these parameters the spectral envelope can be computed, and the formant frequencies are read from its resonance peaks. The formant frequencies can also be computed by numerically solving the characteristic equation obtained from the all-pole model of the linear prediction filter [1]. In this paper, we mainly attempted to recognize individual animals using the filter parameters.
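The two routes to the formants described above can be illustrated with a minimal sketch. It assumes Burg's recursion as the MEM estimator, a sampling rate of 8000 Hz, and a synthetic single-resonance test signal; none of these are specified in the abstract, and the function names are hypothetical. The sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p is also an assumption.

```python
import numpy as np

def burg_lpc(x, order):
    """Burg (maximum entropy) estimate of A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    x = np.asarray(x, dtype=float)
    f, b = x[1:].copy(), x[:-1].copy()   # forward / backward prediction errors
    a = np.array([1.0])
    err = np.dot(x, x) / len(x)          # order-0 prediction error power
    for _ in range(order):
        # Reflection coefficient minimizing forward + backward error power
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        a = np.append(a, 0.0)
        a = a + k * a[::-1]              # Levinson-type coefficient update
        f, b = (f + k * b)[1:], (b + k * f)[:-1]
        err *= 1.0 - k * k
    return a, err

def spectral_envelope(a, err, fs, n_freq=512):
    """All-pole power spectrum err / |A(e^{jw})|^2 on a grid from 0 to fs/2."""
    freqs = np.linspace(0.0, fs / 2.0, n_freq)
    w = 2.0 * np.pi * freqs / fs
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a
    return freqs, err / np.abs(A) ** 2

def formants_from_roots(a, fs):
    """Pole angles of 1/A(z), converted to Hz (one per conjugate pair)."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))

# Synthetic test signal: white noise through one resonance (illustration only).
fs = 8000.0                  # assumed sampling rate, not given in the abstract
r, f0 = 0.97, 500.0          # assumed pole radius and resonance frequency
theta = 2.0 * np.pi * f0 / fs
rng = np.random.default_rng(0)
noise = rng.standard_normal(8000)
x = noise.copy()
for n in range(2, len(x)):
    x[n] = noise[n] + 2.0 * r * np.cos(theta) * x[n - 1] - r * r * x[n - 2]

a, err = burg_lpc(x, order=2)
root_formants = formants_from_roots(a, fs)    # route 2: characteristic equation
freqs, env = spectral_envelope(a, err, fs)
peak_formant = freqs[np.argmax(env)]          # route 1: peak of the envelope
```

For a recorded voice the order would be 15, as in the text, and the envelope would have several peaks, so the single `argmax` would have to be replaced by a peak-picking step.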

Voices were recorded from six cattle: three adult cows (78-133 months old), one female calf (13 months) and two male calves (9 months). All were of the Japanese Black breed. The voices were recorded with a precision microphone between 7 and 9 o'clock in the morning, before feeding. The total power of each voice was computed as the variance of the signal. Based on these variances, the voices could be classified into three groups: high, medium, and low level. Only the high- and medium-level groups were analyzed in this research. Based on final prediction error analysis, the number of linear prediction parameters was set to 15. The formants lay between 200 Hz and 1800 Hz and were almost constant within a single voice. The power spectra of one adult cow and one male calf showed an overtone structure in which the formants were three to six times the fundamental frequency. The fundamental frequencies for the cow and calf were 500 and 200 Hz, respectively.
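The power and model-order steps above can be sketched as follows. This is an illustration under assumptions: the error powers come from Burg's recursion, the criterion is one common form of Akaike's final prediction error, FPE(p) = E_p (N + p + 1) / (N - p - 1), and the signal is synthetic. The abstract does not state which form of the criterion was used, and the function names are hypothetical.

```python
import numpy as np

def burg_error_powers(x, max_order):
    """Prediction error power for orders 0..max_order from Burg's recursion."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    f, b = x[1:].copy(), x[:-1].copy()
    powers = [np.dot(x, x) / len(x)]     # order 0: equals the signal variance
    for _ in range(max_order):
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        f, b = (f + k * b)[1:], (b + k * f)[:-1]
        powers.append(powers[-1] * (1.0 - k * k))
    return np.array(powers)

def final_prediction_error(powers, n_samples):
    """Akaike's FPE for each order p (assumed form, see lead-in)."""
    p = np.arange(len(powers))
    return powers * (n_samples + p + 1) / (n_samples - p - 1)

# Synthetic second-order test signal (illustration only, not cattle data).
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
x = noise.copy()
for n in range(2, len(x)):
    x[n] = noise[n] + 1.79 * x[n - 1] - 0.94 * x[n - 2]

total_power = np.var(x)                  # total power as the variance
powers = burg_error_powers(x, max_order=20)
fpe = final_prediction_error(powers, len(x))
best_order = int(np.argmin(fpe))         # order with the smallest FPE
```

For this synthetic signal the criterion drops sharply once the order reaches 2, the true order of the generating filter. With recorded voices the minimum would be expected at a higher order, such as the 15 reported in the text.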

We attempted to recognize individual animals on a feature plane, on which a two-dimensional feature vector was defined by two filter parameters selected from the 15 parameters of the filter. For the high-level voices of two cows and one calf, recognition was possible with a simple hyperplane, that is, a straight line. For voices including both the high and medium levels, it was difficult to discriminate these three cattle with a discriminant function as simple as a straight line. To improve discrimination, it may be necessary to increase the dimension of the feature space and to use a more sophisticated discriminant technique. The formants estimated by numerically solving the characteristic polynomial of the all-pole model were significantly different from the resonance frequencies measured from the spectra calculated with the filter parameters. The cause of this discrepancy is unknown and should be explored in the future, taking into account the number of filter parameters and the mathematical model of the animal's vocal tract.
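Straight-line discrimination on a two-dimensional feature plane can be sketched as below. The abstract does not say how the line was obtained, so Fisher's linear discriminant is used here as an assumed stand-in. The feature values are synthetic clusters invented for illustration, not measured filter parameters, and all names are hypothetical.

```python
import numpy as np

def fisher_line(class0, class1):
    """Return (w, c) so that w . x + c > 0 assigns x to class 1."""
    m0, m1 = class0.mean(axis=0), class1.mean(axis=0)
    # Pooled within-class scatter matrix
    sw = (np.cov(class0, rowvar=False) * (len(class0) - 1)
          + np.cov(class1, rowvar=False) * (len(class1) - 1))
    w = np.linalg.solve(sw, m1 - m0)     # direction normal to the line
    c = -0.5 * np.dot(w, m0 + m1)        # line passes through the midpoint
    return w, c

def classify(points, w, c):
    """Label each 2-D feature vector as 0 or 1 by its side of the line."""
    return (points @ w + c > 0).astype(int)

# Synthetic feature vectors: two filter parameters per voice, two animals.
rng = np.random.default_rng(1)
animal_a = np.array([-1.2, 0.5]) + 0.05 * rng.standard_normal((20, 2))
animal_b = np.array([-0.6, 0.9]) + 0.05 * rng.standard_normal((20, 2))

w, c = fisher_line(animal_a, animal_b)
points = np.vstack([animal_a, animal_b])
labels = np.concatenate([np.zeros(20, dtype=int), np.ones(20, dtype=int)])
accuracy = np.mean(classify(points, w, c) == labels)
```

Separating three animals this way would need one line per pair of animals. When the clusters overlap, as reported for the mixed high- and medium-level voices, a single line of this kind would be expected to misclassify some voices.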

References

1. Deller, J.R.; Proakis, J.G.; Hansen, J.H.L. (1993). Discrete-Time Processing of Speech Signals. Macmillan.

Poster presented at Measuring Behavior '98, 2nd International Conference on Methods and Techniques in Behavioral Research, 18-21 August 1998, Groningen, The Netherlands