There are a lot of FFT-based audio visualizations available, but they usually make the mistake of displaying a raw FFT-based bar graph. This leads to a display which flickers wildly and doesn’t appear to move in time with the audio. This problem can be fixed, though, with a few simple modifications.

Like all FFT-based visualizations, this one starts by dividing the incoming audio stream into fixed-size slices. These slices contain a power-of-two number of samples, and the output video frame rate is locked to the slice rate. In this implementation, the slice/frame rate is always between 32 and 64 Hz, depending on the incoming audio sample rate.

For each slice, a Hamming window is applied, and then the slice is transformed to give a spectrum. The spectrum is divided into a fixed number of frequency bins, and a vertical bar is drawn for each bin. So far, this is exactly how any other FFT visualization works. The key differences are explained in the following sections.

Gamma-corrected frequency range

First, the frequency ranges are not uniform. Human perception of frequency is logarithmic (high C sounds the same “distance” from middle C as middle C from low C, although the actual frequencies involved follow a geometric progression). Ideally, frequencies should be mapped to bin indexes logarithmically, with the number of frequencies covered by a bar always being a fixed proportion times the number covered by the previous bar. However, due to the size of typical slices, we don’t have enough frequencies to do this without introducing large gaps. As a compromise, we use the same function used to correct for perceived brightness on CRT monitors:

B_i = ((f_i / f_max) ** (1 / gamma)) * B_max

where f_i ranges over all frequencies obtained from the FFT, and B_i is the corresponding bin index. A gamma value of 1 produces a linear division. In this implementation, we use a gamma value of 2. Higher values tend to hide too much of the detail at the high end of the spectrum.

The obvious way to obtain an instantaneous bar height would be to plot the average power in each frequency range (where power is the square of the magnitude of the complex amplitude). Since human hearing sensitivity has such a wide dynamic range, it’s much better to take the logarithm of the power instead.

A further improvement is to take the logarithm of the maximum power in each range, instead of the average. Real music often contains narrow but perceptible peaks, which are swamped when averaged with neighbouring frequency powers.

The instantaneous bar heights vary quite wildly, and even after applying the two improvements above, they flicker if plotted directly. Instead, we blend the instantaneous values with the per-slice values from the last frame:

B_i' = B_(i-1)' * s' + B_i * (1 - s')

where B_i represents the instantaneous value, and B_i' represents the smoothed value. In this formula the index i counts successive slices for a given bar, not bins. s' is a smoothing constant which is adjusted for the actual slice rate so as to give approximately the same results over a range of different sample rates:

s' = s ** (1 / R)
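To make the three modifications concrete, here is a minimal Python sketch of the gamma-corrected bin mapping, the log-of-maximum bar height, and the smoothing step. It is an illustration under stated assumptions, not the implementation described here: the function names, the truncation of B_i to an integer bar index, the base-10 logarithm, the small `floor` value for empty bars, and the reading of R as a slice-rate factor are all assumptions that the text does not specify.

```python
import math


def bin_index(f_i, f_max, b_max, gamma=2.0):
    """Gamma-corrected mapping: B_i = ((f_i / f_max) ** (1 / gamma)) * B_max."""
    return ((f_i / f_max) ** (1.0 / gamma)) * b_max


def bar_heights(powers, num_bars, gamma=2.0, floor=1e-12):
    """Return the log of the maximum power among the frequencies in each bar.

    `powers` holds one power value (squared magnitude) per FFT frequency.
    The position in the list stands in for f_i; only the ratio f_i / f_max
    matters. `floor` is an assumed lower bound that keeps log10 defined
    for bars that receive no frequencies.
    """
    f_max = len(powers) - 1
    peaks = [floor] * num_bars
    for f_i, power in enumerate(powers):
        # Assumption: truncate to an integer and clamp the top frequency
        # into the last bar.
        bar = min(int(bin_index(f_i, f_max, num_bars, gamma)), num_bars - 1)
        peaks[bar] = max(peaks[bar], power)
    return [math.log10(p) for p in peaks]


def adjusted_smoothing(s, r):
    """s' = s ** (1 / R). The exact meaning of R is assumed, not given."""
    return s ** (1.0 / r)


def smooth(previous, current, s_prime):
    """Blend each bar: B' = B_prev' * s' + B * (1 - s')."""
    return [prev * s_prime + cur * (1.0 - s_prime)
            for prev, cur in zip(previous, current)]
```

A caller would keep the smoothed heights from the previous slice and pass them as `previous` on the next one, so that each bar decays gradually instead of jumping between instantaneous values.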