Davidson Gillespie (grilldryer9)
The purpose of this study was to assess the effect of downsampling the acoustic signal on the accuracy of linear-predictive (LPC) formant estimation. Based on speech produced by men, women, and children, the first four formant frequencies were estimated at sampling rates of 48, 16, and 10 kHz using different anti-alias filtering. With proper selection of the number of LPC coefficients, the anti-alias filter, and between-frame averaging, results suggest that accuracy is not improved by rates substantially below 48 kHz. Any downsampling should not go below 16 kHz with a filter cut-off centered at 8 kHz.

The goal of this study is to estimate vocal fold geometry, stiffness, position, and subglottal pressure from voice acoustics, toward clinical and other voice technology applications. Unlike previous voice inversion research that often uses lumped-element models of phonation, this study explores the feasibility of voice inversion using data generated from a three-dimensional voice production model. Neural networks are trained to estimate vocal fold properties and subglottal pressure from voice features extracted from the simulation data. Results show reasonably good estimation accuracy, particularly for vocal fold properties with a consistent global effect on voice production, and reasonable agreement with excised human larynx experiment.

Mode-matching based multizone reproduction has mainly focused on a purely two-dimensional (2D) theory, where infinitely long 2D secondary sources are assumed for 2D multizone reproduction. Its extension to the three-dimensional (3D) case requires more secondary sources and higher computational complexity. This work investigates a more practical setup that uses 3D sound sources as secondary sources for multizone reproduction in a 2D horizontal plane, i.e., 2.5D multizone reproduction. A weighted mode-matching approach is proposed to solve the dimensionality mismatch between the 2D desired sound field and the 3D reproduced sound field.
The weighting is based on an integral of Bessel-spherical harmonic modes over the entire control region. A detailed analysis of the weighting function is provided to show that the proposed method controls all the reproduction modes present on the 2D plane to minimize the reproduction error. The method is validated in both simulation-based and hardware-based experiments. The results demonstrate that, in comparison with the conventional sectorial mode-matching method, the proposed approach can achieve more accurate reproduction over a wide frequency range and a large control region.

The goal of this study is to determine potential intelligibility benefits from Lombard speech for cochlear implant (CI) listeners in speech-in-noise conditions. "Lombard effect" (LE) is the natural response of adjusting speech production via auditory feedback due to noise exposure within acoustic environments. To evaluate intelligibility performance of natural and artificially induced Lombard speech, a corpus was generated to create natural LE from large crowd noise (LCN) exposure at 70, 80, and 90 dB sound pressure level (SPL). Clean speech was mixed with LCN at 15 and 10 dB SNR and presented to five CI users. First, speech intelligibility was analyzed as a function of increasing LE and decreasing SNR. Results indicate significant improvements (p < 0.05) in Lombard speech intelligibility in noise for the 80 and 90 dB SPL conditions. Next, an offline perturbation strategy was formulated to modify/perturb neutral speech so as to mimic LE through amplification of highly intelligible segments, uniform time stretching, and spectral mismatch filtering. This process effectively introduces aspects of LE into the neutral speech, with the hypothesis that this would benefit intelligibility for CI users. Significant (p < 0.01) intelligibility improvements of 13 and 16 percentage points were observed for the 15 and 10 dB SNR conditions, respectively, for CI users.
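The abstract states only that clean speech was mixed with LCN at 15 and 10 dB SNR; it does not say how signal levels were measured. As a minimal sketch, assuming SNR is defined from whole-signal RMS levels, a mixture at a target SNR could be formed as follows. The function names `rms` and `mix_at_snr` are hypothetical and not taken from the study.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise RMS ratio equals `snr_db`, then add.

    Assumption: SNR is computed over the whole signal, not per frame
    and not only over speech-active regions.
    Returns (mixture, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        raise ValueError("noise is silent; SNR is undefined")
    # SNR_dB = 20 * log10(rms_speech / rms_noise), solved for the noise level
    target_noise_rms = rms(speech) / (10.0 ** (snr_db / 20.0))
    gain = target_noise_rms / noise_rms
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise
```

Lowering `snr_db` from 15 to 10 raises the noise gain by a factor of 10^(5/20), roughly 1.78, while the speech level stays fixed.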
The results indicate how LE and LE-inspi