Sounds¶
This page documents the Sound type, which represents a sound file loaded in memory, and related functions. Sound is non-clonable.
Functions¶
- get_sounds()¶
Return a list of all the sounds in the current project.
- get_sound(path)¶
Return the Sound object from the current project whose path is path, or null if there is no such
sound. If the object exists but is not a sound, an error is thrown.
- get_current_sound()¶
Return the Sound object loaded in the current view, or null if the current view is neither an annotation view
nor a sound view.
- get_window_duration()¶
Return the duration of the visible window in the current annotation or sound view.
- get_selection_duration()¶
Return the duration of the selection in the current annotation or sound view, or 0 if there is no selection.
- get_visible_channels()¶
Return a list of the visible channel indices in the current annotation or sound view.
Structural transformations¶
These functions produce a new sound file on disk and return a fresh Sound handle (except
convert, which returns nothing). They do not modify the source and do not add the result to
the current project — call import_file(path) if you want the new file in the project. For
extract_sound_slice and concatenate_sounds, the on-disk format is inferred from the output
path’s extension (.wav, .aiff, .flac, .ogg, or .mp3). For convert, the
format is passed explicitly as a string so you can write to a path whose extension doesn’t match
the desired container, or write to an extensionless path. Data is streamed through libsndfile,
so even multi-hour files are processed in constant memory.
Support for .mp3 (and equivalently the "mp3" format string) depends on the libsndfile
build Phonometrica was linked against. If your platform’s libsndfile lacks MPEG support, any
attempt to write an MP3 raises a clear [I/O error] rather than silently producing a broken
file.
- extract_sound_slice(sound as Sound, t_start as Number, t_end as Number, path as String)¶
Extracts the samples in [t_start, t_end] (in seconds) from sound into a new sound file at
path and returns the resulting Sound. Times must satisfy 0 <= t_start < t_end <=
duration. Sample rate and channel count are preserved.
- concatenate_sounds(sources as List, path as String)¶
Concatenates the sounds in sources end-to-end into a new sound file at path and returns
it. All sources must share the same sample rate and channel count; any mismatch raises an error
identifying the first offending file. The output keeps the common rate and channel count, with
the format determined by path’s extension.
- convert(sound as Sound, path as String, format as String[, sample_rate as Number])¶
Writes sound to path in the given format, optionally resampling to sample_rate
(in Hz). When sample_rate is omitted, the source’s sample rate is preserved and the data
is streamed straight through libsndfile (a fast path that performs no resampling).
format is a case-insensitive string. The recognised names are "wav", "aiff" (also
"aif"), "flac", "ogg", and "mp3". A leading dot is allowed, so ".wav" and
"wav" behave identically. Unknown names raise an [Argument error]; names that are known
but unavailable in this libsndfile build raise an [I/O error].
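The name-matching rules above can be sketched in Python as follows. This is only an illustration of the documented behaviour, not Phonometrica's implementation: the helper name `normalise_format`, the `KNOWN_FORMATS` table, and the use of `ValueError` to stand in for an [Argument error] are all assumptions made for the example.

```python
# Hypothetical helper illustrating the documented rules for the `format` string.
# Maps every accepted spelling to one canonical name ("aif" is an alias of "aiff").
KNOWN_FORMATS = {
    "wav": "wav",
    "aiff": "aiff",
    "aif": "aiff",
    "flac": "flac",
    "ogg": "ogg",
    "mp3": "mp3",
}

def normalise_format(name: str) -> str:
    """Return the canonical format name, or raise ValueError if it is unknown."""
    key = name.lower()            # matching is case-insensitive
    if key.startswith("."):
        key = key[1:]             # a leading dot is allowed
    try:
        return KNOWN_FORMATS[key]
    except KeyError:
        # Stands in for the documented [Argument error].
        raise ValueError(f"unknown format: {name!r}") from None
```

Under these assumptions, `normalise_format(".WAV")` and `normalise_format("wav")` resolve to the same name, while an unrecognised string such as `"wma"` is rejected before any file is written.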
Channel count is always preserved. When sample_rate is given, each channel is resampled
independently using the r8brain CDSPResampler24, so stereo (and any higher channel count) is
handled correctly. The output bit depth follows the source where the target container allows it:
PCM_24 stays PCM_24 for WAV/AIFF/FLAC, FLOAT stays FLOAT on WAV, and everything else falls back
to PCM_16; OGG always writes Vorbis and MP3 always writes Layer III.
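The bit-depth rule can be read as a small decision table. The Python sketch below restates it under stated assumptions: the function name `output_subtype` and the string labels are hypothetical and chosen for the example, they are not identifiers from Phonometrica or libsndfile.

```python
def output_subtype(source_subtype: str, container: str) -> str:
    """Pick the output encoding from the source encoding and target container.

    Illustrative only: the labels are hypothetical names for this sketch.
    """
    container = container.lower()
    if container == "ogg":
        return "VORBIS"                 # OGG always writes Vorbis
    if container == "mp3":
        return "MPEG_LAYER_III"         # MP3 always writes Layer III
    if source_subtype == "PCM_24" and container in ("wav", "aiff", "flac"):
        return "PCM_24"                 # 24-bit PCM is kept where supported
    if source_subtype == "FLOAT" and container == "wav":
        return "FLOAT"                  # float is kept only on WAV
    return "PCM_16"                     # everything else falls back to 16-bit PCM
```

For instance, a float WAV converted to FLAC would come out as PCM_16 under this reading, because FLOAT is preserved only when the target is WAV.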
convert returns no value. If you want the new file to appear in the current project, call
import_file(path) after the conversion.
Examples:
```
let s = get_current_sound()
# Re-encode without changing the sample rate.
convert(s, "/tmp/copy.flac", "flac")
# Downsample to 16 kHz, write as a WAV.
convert(s, "/tmp/16k.wav", "wav", 16000)
# The format string takes precedence over the extension.
convert(s, "/tmp/take_001.audio", "flac")
```
Acoustic measurement¶
- get_intensity(sound as Sound, channel as Integer, time as Number)¶
Returns the intensity (in dB) at the given time on the specified channel.
- get_mean_intensity(sound as Sound, channel as Integer, t1 as Number, t2 as Number)¶
Returns the mean intensity (in dB) between t1 and t2 on the specified channel.
- get_pitch(sound as Sound, channel as Integer, time as Number[, options as Table])¶
Returns the F0 value (in Hz) at the given time on the specified channel, or undefined if the
sound is unvoiced at that time. When the options table is omitted, all tracker settings come
from your current pitch-tracking preferences.
The options table can be written as a literal ({ "min_pitch": 80, "max_pitch": 400 }) or
built up with the named-argument syntax
(min_pitch = 80, max_pitch = 400 as trailing arguments). Both forms are exactly equivalent.
Validation is strict: any unknown key raises an error rather than being silently ignored, so a
typo like "min_picth" does not leave you wondering why your override had no effect. Keys you
do not supply fall back to your global pitch-tracking settings.
Supported keys:
- method (string): pitch tracker to use (e.g. "reaper").
- min_pitch (number): lower bound on the candidate F0, in Hz.
- max_pitch (number): upper bound on the candidate F0, in Hz.
- threshold (number): voicing threshold used by the tracker.
- octave_jump_cost (number): penalty applied to large frame-to-frame F0 jumps.
- voicing_cost (number): penalty controlling the voiced/unvoiced decision.
- silence_threshold (number): amplitude below which frames are treated as silent.
- octave_cost (number): bias toward higher candidates within each frame.
- use_gaussian (boolean): if true, apply a Gaussian window to the analysis frames.
Example:
```
let snd = get_sounds()[1]
# Defaults from settings.
let f0 = get_pitch(snd, 1, 0.5)
# Override the search range. Named-argument form:
let f0b = get_pitch(snd, 1, 0.5, min_pitch = 80, max_pitch = 400)
# Same call, table-literal form:
let f0c = get_pitch(snd, 1, 0.5, { "min_pitch": 80, "max_pitch": 400 })
```
- get_mean_pitch(sound as Sound, channel as Integer, t1 as Number, t2 as Number[, options as Table])¶
Returns the mean F0 value (in Hz) between t1 and t2 on the specified channel, averaged over
the voiced frames in that interval. When the options table is omitted, all tracker settings
come from your current pitch-tracking preferences.
options behaves exactly as for get_pitch(), with the same strict validation and the same
two equivalent call forms (table literal or named arguments). All keys
listed for get_pitch are accepted, plus:
- time_step (number): frame step in seconds for the underlying pitch tracker.
Example:
```
let snd = get_sounds()[1]
let m = get_mean_pitch(snd, 1, 0.5, 1.2, min_pitch = 80, max_pitch = 400)
```
- get_formants(sound as Sound, channel as Integer, time as Number[, options as Table])¶
Returns an Array containing nformant rows and 2 columns. The first column contains formant
values (in Hertz), such that F1 is at index (1, 1), F2 is at index (2, 1), etc. The second column
contains the formants’ bandwidths: F1’s bandwidth is at index (1, 2), F2’s bandwidth is at (2, 2),
etc.
When the options table is omitted, all analysis parameters come from your current formant
settings. As for get_pitch(), options can be written as a literal
({ "nformant": 5, "lpc_order": 12 }) or with the
named-argument syntax
(nformant = 5, lpc_order = 12). Unknown keys raise an error.
Supported keys:
- nformant (integer): number of formants to return.
- nyquist (number): maximum frequency considered for the topmost formant, in Hz. A common choice is 5000 Hz for adult male voices and 5500 Hz for adult female voices.
- window_size (number): analysis window duration, in seconds.
- lpc_order (integer): order of the LPC analysis.
Example:
```
let snd = get_sounds()[1]
# Defaults from settings.
let f = get_formants(snd, 1, 0.5)
# Female-voice band, 5 formants:
let f2 = get_formants(snd, 1, 0.5, nformant = 5, nyquist = 5500, lpc_order = 12)
```
- get_voice_report(sound as Sound, channel as Integer, t1 as Number, t2 as Number[, options as Table])¶
Computes the full voice-quality battery (jitter, shimmer, harmonics-to-noise ratio, plus a pulse summary) over the half-open
time interval [t1, t2) on the specified channel, and returns the result as a Table. When channel is 0, the per-frame
mean across channels is analysed (the “average” view).
When the options table is omitted, F0 search bounds default to 75 Hz and 600 Hz, matching Praat’s voice-report defaults.
options can be written as a literal ({ "f0_min": 100, "f0_max": 500 }) or with the
named-argument syntax (f0_min = 100, f0_max = 500). Unknown keys raise an error.
Supported keys:
- f0_min (number): lower bound on REAPER’s periodicity search and the period filter, in Hz.
- f0_max (number): upper bound on REAPER’s periodicity search and the period filter, in Hz.
The returned table has 14 fields. num_pulses is the number of voiced glottal-closure instants detected by REAPER in the
selection. All other fields are Number values; each one is undefined (NaN) when there are not enough valid pulses or
voiced frames to compute the corresponding measure.
| Field | Description |
|---|---|
| num_pulses | Number of voiced pulses (integer). |
| | Mean period over in-range pulses, in seconds. |
| | |
| | Mean … |
| | Mean … |
| | 3-point relative average perturbation (dimensionless). |
| | 5-point period perturbation quotient (dimensionless). |
| | Difference of differences of periods, equal to … |
| | Relative shimmer (dimensionless). |
| | Mean … |
| | 3-point amplitude perturbation quotient (dimensionless). |
| | 5-point amplitude perturbation quotient (dimensionless). |
| | 11-point amplitude perturbation quotient (dimensionless). |
| hnr | Harmonics-to-noise ratio, mean over voiced frames, in decibels. |
The pulse times come from REAPER [TAL2014], restricted to voiced regions; HNR is derived from the normalised autocorrelation strength of the Praat-style pitch tracker [BOE1993] along its chosen Viterbi path. Jitter and shimmer aggregates apply the same period (1.3) and amplitude (1.6) ratio filters as Praat’s voice report. See Voice report in the sound view documentation for full definitions.
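To make the role of the period ratio filter concrete, here is a minimal Python sketch of a Praat-style relative local jitter computed from a list of consecutive glottal periods. It is a simplified illustration, not Phonometrica's code: the function name `local_jitter` is hypothetical, the f0_min/f0_max period bounds are left out, and the mean period is taken over all supplied periods.

```python
def local_jitter(periods, max_ratio=1.3):
    """Relative local jitter from consecutive periods (seconds, all > 0).

    A pair of neighbouring periods contributes only if the longer one is at
    most `max_ratio` times the shorter one (the period ratio filter).
    Returns NaN when no pair survives the filter.
    """
    total = 0.0
    count = 0
    for p1, p2 in zip(periods, periods[1:]):
        ratio = p1 / p2 if p1 > p2 else p2 / p1
        if ratio <= max_ratio:
            total += abs(p1 - p2)
            count += 1
    if count == 0:
        return float("nan")
    mean_abs_diff = total / count               # absolute jitter, in seconds
    mean_period = sum(periods) / len(periods)   # simplification: no filtering here
    return mean_abs_diff / mean_period          # dimensionless
```

With periods alternating between 10 ms and 11 ms, every pair passes the filter and the result is about 0.095, i.e. roughly 9.5 % once multiplied by 100.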
Example:
```
let snd = get_sounds()[1]
let r = get_voice_report(snd, 1, 0.5, 1.2)
print "Pulses found: " & r.num_pulses
print "Local jitter: " & (100 * r.jitter_local) & " %"
print "HNR: " & r.hnr & " dB"
```
When a measure is undefined (e.g. on an unvoiced selection), the corresponding field holds NaN and prints as nan (or
undefined when serialised through JSON). A NaN field can be tested with the standard x != x idiom, since NaN
compares unequal to itself.
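The same idiom works in any language with IEEE 754 floating-point numbers. The short Python sketch below shows it; the helper name `is_undefined` is hypothetical and chosen for the example.

```python
import math

def is_undefined(x: float) -> bool:
    """True when x is NaN, using the self-inequality idiom."""
    return x != x

hnr = float("nan")                 # stands in for an undefined measure
assert is_undefined(hnr)
assert not is_undefined(12.5)
# The idiom agrees with the standard library's dedicated NaN check.
assert is_undefined(hnr) == math.isnan(hnr)
```

Note that a direct equality test against NaN never succeeds, which is why the self-comparison (or a dedicated NaN check) is needed.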
Spectrum and spectral moments¶
- get_spectrum(sound as Sound, channel as Integer, t1 as Number, t2 as Number)¶
Computes an FFT spectrum from the sound between t1 and t2 on the specified channel and returns a Spectrum object.
The resulting spectrum can be queried for its properties (see Spectrum fields below).
Example:
```
let snd = get_sounds()[1]
let spec = get_spectrum(snd, 1, 0.5, 0.55)
print spec.bin_count
print spec.bandwidth
```
- get_spectral_moments(sound as Sound, channel as Integer, time as Number, window as Number, min_freq as Number, max_freq as Number)¶
Computes the four spectral moments at the given time on the specified channel. window is the analysis
window duration (in seconds), and min_freq/max_freq define the frequency range (in Hz).
Returns a Table with the following keys:
- cog: centre of gravity (1st moment), in Hz
- spread: standard deviation (2nd moment), in Hz
- skewness: skewness (3rd moment), dimensionless
- kurtosis: excess kurtosis (4th moment), dimensionless
Example:
```
let snd = get_sounds()[1]
let m = get_spectral_moments(snd, 1, 0.5, 0.025, 1000, 10000)
print "COG = " & m["cog"]
print "Skewness = " & m["skewness"]
```
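For readers who want to see how the four values relate to one another, the following Python sketch computes them from a list of bin frequencies and bin powers. It assumes the moments are weighted by the power spectrum and that bins outside [min_freq, max_freq] are simply discarded; the source does not state either point, and the function name `spectral_moments` is hypothetical.

```python
import math

def spectral_moments(freqs, power, min_freq, max_freq):
    """Power-weighted moments of the bins lying in [min_freq, max_freq]."""
    nan = float("nan")
    band = [(f, p) for f, p in zip(freqs, power) if min_freq <= f <= max_freq]
    total = sum(p for _, p in band)
    if total <= 0:
        return {"cog": nan, "spread": nan, "skewness": nan, "kurtosis": nan}

    # 1st moment: power-weighted mean frequency.
    cog = sum(f * p for f, p in band) / total

    def central(k):
        # k-th central moment around the centre of gravity.
        return sum((f - cog) ** k * p for f, p in band) / total

    variance = central(2)
    spread = math.sqrt(variance)
    if spread == 0:
        return {"cog": cog, "spread": 0.0, "skewness": nan, "kurtosis": nan}
    return {
        "cog": cog,
        "spread": spread,
        "skewness": central(3) / spread ** 3,
        "kurtosis": central(4) / variance ** 2 - 3.0,   # excess kurtosis
    }
```

A spectrum that is symmetric around its centre of gravity gives a skewness of 0; energy concentrated below the centre with a long tail toward high frequencies gives a positive skewness.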
Reporting functions¶
These convenience functions display acoustic measurements in the output panel for the sound loaded in the current view. They are typically used from the console or from scripts attached to keyboard shortcuts.
- report_intensity(time as Number)¶
Displays the intensity at the given time in the current view.
- report_mean_intensity(t1 as Number, t2 as Number)¶
Displays the mean intensity between t1 and t2 in the current view.
- report_pitch(time as Number)¶
Displays the pitch at the given time in the current view.
- report_mean_pitch(t1 as Number, t2 as Number)¶
Displays the mean pitch between t1 and t2 in the current view.
- report_formants(time as Number)¶
Displays the values of the visible formants at the given time in the current view.
- report_mean_formants(t1 as Number, t2 as Number)¶
Displays the mean formant values between t1 and t2 in the current view.
Frequency conversion¶
- hertz_to_bark(f)¶
Converts frequency f (in Hertz) to bark. See [TRA1990].
Note: if f is an Array, the conversion is applied to all the elements in the array.
- bark_to_hertz(z)¶
Converts frequency z (in bark) to Hertz. See [TRA1990].
Note: if z is an Array, the conversion is applied to all the elements in the array.
- hertz_to_erb(f)¶
Converts frequency f (in Hertz) to ERB units. See [GLA1990].
Note: if f is an Array, the conversion is applied to all the elements in the array.
- erb_to_hertz(e)¶
Converts frequency e (in ERB units) to Hertz. See [GLA1990].
Note: if e is an Array, the conversion is applied to all the elements in the array.
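As a point of reference, the Python sketch below implements the formulas commonly attributed to the two works cited above: Traunmüller's bark approximation and Glasberg and Moore's ERB-rate scale. Whether Phonometrica uses exactly these expressions is an assumption. In particular, the sketch omits the corrections Traunmüller proposes at the low and high ends of the bark scale.

```python
import math

def hertz_to_bark(f: float) -> float:
    """Bark value for f in Hz (Traunmueller-style approximation, no end corrections)."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hertz(z: float) -> float:
    """Algebraic inverse of hertz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def hertz_to_erb(f: float) -> float:
    """ERB-rate value for f in Hz (Glasberg and Moore style formula)."""
    return 21.4 * math.log10(0.00437 * f + 1.0)

def erb_to_hertz(e: float) -> float:
    """Algebraic inverse of hertz_to_erb."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437
```

Each pair is an exact algebraic inverse, so converting a value to bark or ERB units and back returns the original frequency up to floating-point rounding.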
- hertz_to_mel(f)¶
Converts frequency f (in Hertz) to mel.
Note: if f is an Array, the conversion is applied to all the elements in the array.
- mel_to_hertz(mel)¶
Converts frequency mel (in mel) to Hertz.
Note: if mel is an Array, the conversion is applied to all the elements in the array.
- hertz_to_semitones(f0[, ref])¶
Converts frequency f0 (in Hertz) to semitones, using ref as a reference frequency (in Hertz). If ref is omitted,
it defaults to 100 Hz.
Note: if f0 is an Array, the conversion is applied to all the elements in the array.
- semitones_to_hertz(st[, ref])¶
Converts the number of semitones st to Hertz, using ref as a reference frequency (in Hertz). If ref is omitted,
it defaults to 100 Hz.
Note: if st is an Array, the conversion is applied to all the elements in the array.
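The semitone conversions can be sketched in Python with the standard definition of 12 semitones per octave relative to a reference frequency. The source does not spell out the formula, so this is an assumption. The functions below are a Python illustration, not the built-in functions themselves.

```python
import math

def hertz_to_semitones(f0: float, ref: float = 100.0) -> float:
    """Distance of f0 from ref in semitones (12 per octave)."""
    return 12.0 * math.log2(f0 / ref)

def semitones_to_hertz(st: float, ref: float = 100.0) -> float:
    """Frequency lying st semitones above (or below, if negative) ref."""
    return ref * 2.0 ** (st / 12.0)
```

With the default reference of 100 Hz, 200 Hz is one octave up and therefore 12 semitones, and 50 Hz is one octave down at -12 semitones.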
Sound fields¶
- path¶
Returns the path of the sound file.
- duration¶
Returns the duration of the file in seconds.
- sample_rate¶
Returns the sample rate of the file in Hertz.
- nchannel¶
Returns the number of channels in the file.
Spectrum fields¶
- bin_count¶
Returns the number of frequency bins in the spectrum.
- sample_rate¶
Returns the sample rate (in Hz) of the sound from which the spectrum was computed.
- bandwidth¶
Returns the bandwidth (frequency resolution) of the spectrum in Hz.
- max_frequency¶
Returns the maximum frequency in the spectrum (in Hz).
- start_time¶
Returns the start time (in seconds) of the analysis window.
- end_time¶
Returns the end time (in seconds) of the analysis window.
- peak_dB¶
Returns the peak power level in dB.
- floor_dB¶
Returns the floor power level in dB.
- lpc_order¶
Returns the LPC order used for spectral envelope estimation, or 0 if no LPC was computed.
- has_lpc¶
Returns true if an LPC spectral envelope has been computed.