Because digital technology is now used in almost every field, we discuss digital representations here. Let s(n) be a frame of the speech signal, equal to zero when n is outside the range [0, N-1]. Because s(n) is approximately stationary over a short interval, we can compute its short-time auto-correlation, R(l): R(l) = sum[s(n)*s(n+l)], where n runs from 0 to N-l-1 for l >= 0, and l ranges from -(N-1) to N-1, with R(-l) = R(l). R(l) is thus an even function of l and reaches its maximum at l = 0. Since voiced sound is produced by the human vocal system excited by a quasi-periodic sequence, s(n) is periodic apart from variations in amplitude. The auto-correlation function therefore shows the same periodicity, with local peaks at l = N0, 2N0, and so on, where N0 is the pitch period. If s(n) is a frame of unvoiced sound, both s(n) and R(l) are random. This difference lets us discriminate between voiced and unvoiced speech and determine the pitch of voiced speech.
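As a rough illustration of the idea above, the following Python sketch computes R(l) for one frame and picks the lag of the highest peak as the pitch period. The function names, the lag search range, the voicing threshold of 0.3 and the synthetic test frame are all assumptions made for this example; they are not taken from any particular system.

```python
import math

def autocorrelation(frame, lag):
    """R(l) = sum of s(n)*s(n+l) for n = 0 .. N-l-1, with lag >= 0."""
    n_samples = len(frame)
    return sum(frame[n] * frame[n + lag] for n in range(n_samples - lag))

def estimate_pitch_period(frame, min_lag, max_lag, voicing_threshold=0.3):
    """Return the lag in [min_lag, max_lag] with the largest R(l).

    Returns None when the peak is weak relative to R(0), which this
    sketch treats as "unvoiced". The threshold value is an assumption.
    """
    energy = autocorrelation(frame, 0)  # R(0) is the frame energy
    if energy == 0.0:
        return None
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: autocorrelation(frame, lag))
    if autocorrelation(frame, best_lag) / energy < voicing_threshold:
        return None
    return best_lag

# Synthetic "voiced" frame (assumed test signal): period N0 = 40 samples.
N0 = 40
voiced_frame = [math.sin(2 * math.pi * n / N0) + 0.5 * math.sin(4 * math.pi * n / N0)
                for n in range(240)]
period = estimate_pitch_period(voiced_frame, min_lag=20, max_lag=100)

# A single impulse has no periodic structure: R(l) = 0 for every l > 0.
impulse_frame = [1.0] + [0.0] * 239
no_period = estimate_pitch_period(impulse_frame, min_lag=20, max_lag=100)
```

Because the sum runs over only N-l terms, the peaks of R(l) shrink as l grows, so in this sketch the peak at N0 is higher than the one at 2N0.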
Now let's turn to the three methods used for speech signal processing.
1) Short-time AMDF:
AMDF stands for Average Magnitude Difference Function, denoted r(l).
r(l) = sum[|s(n+l) - s(n)|], n = 0 to N-1.
r(l) behaves much like R(l), except that it reaches local minima, rather than peaks, at integer multiples of N0. r(l) is much cheaper to compute than R(l) because it needs only addition and subtraction, while R(l) requires multiplication. The price of the reduced computation is lower accuracy in the pitch determination.
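The AMDF approach can be sketched in Python as follows. Note two assumptions that differ from the formula as written: the sum is restricted to the samples where both s(n) and s(n+l) lie inside the frame, and it is divided by the number of terms, so that the zero values outside the frame do not bias the result. The function names, search range and test signal are likewise hypothetical.

```python
import math

def amdf(frame, lag):
    """Mean of |s(n+l) - s(n)| over n = 0 .. N-l-1 (overlapping samples only).

    Only subtraction, absolute value and addition are used inside the sum.
    """
    overlap = len(frame) - lag
    return sum(abs(frame[n + lag] - frame[n]) for n in range(overlap)) / overlap

def estimate_pitch_period_amdf(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] where the AMDF is smallest."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))

# Synthetic "voiced" frame (assumed test signal): period N0 = 40 samples.
N0 = 40
voiced_frame = [math.sin(2 * math.pi * n / N0) + 0.5 * math.sin(4 * math.pi * n / N0)
                for n in range(240)]

# The search range stops below 2*N0, because r(l) also dips at 2*N0, 3*N0, ...
period_amdf = estimate_pitch_period_amdf(voiced_frame, min_lag=20, max_lag=60)
```

Keeping the upper search limit below 2N0 is a design choice of this sketch: for a clean periodic frame the minima at N0 and 2N0 are nearly equal, so a wider range could return a multiple of the true period.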
Sections 2) and 3) discuss two widely used methods of de-convolution, whose purpose is to recover one of the useful signals that were convolved together, for example the input sequence or the impulse response of a system.
2) Homomorphic speech signal processing:
In this section, we explore a process that yields a transformation of the signal, other than the auto-correlation function, which also shows quasi-periodicity, so that we can determine the pitch from it. The model is shown as follows: