Pellegrino paper on the phonetic aspects of speech reversals

Preservation of the primary phonetic and acoustic cues of phonemes triggers their perceptual identification. Time reversal of speech both preserves and alters phonetic and acoustic features of the speech signal. Invariant features such as the power spectrum of a signal are usually maintained, whilst properties such as duration and the shape of the temporal envelope, as well as finer details of the acoustic spectrum, are altered (Grataloup, Hoen, Veuillet, Collet, Pellegrino and Meunier, 2009). Non-continuant speech sounds are more susceptible to altered perception in reversals because asymmetry typically occurs in the shape of their temporal envelope. This is the case in stop bursts, abrupt vowel onsets, and the ramping (smooth increase in amplitude) and damping (smooth decay) of signals (Pellegrino, Ferragne and Meunier, 2010). Time reversal of these features alters the characteristics of the speech signal, permitting perception of alternative phonemes, and even the addition of phonemes to the speech signal or the omission of phonemes from the forward speech.
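The contrast between what reversal keeps and what it changes can be illustrated with a rough numerical sketch. The signal below is not real speech: it is a hypothetical stand-in for a stop burst (an abrupt onset followed by smooth damping), and the sample rate, tone frequency, decay constant and helper function names are all assumptions chosen for illustration. The sketch compares the power spectrum and the position of the amplitude peak before and after reversal.

```python
import numpy as np


def power_spectrum(signal):
    """Return the power spectrum (squared magnitude of the real FFT)."""
    return np.abs(np.fft.rfft(signal)) ** 2


def peak_position(signal):
    """Return where the amplitude peak falls, as a fraction of the signal length."""
    return np.argmax(np.abs(signal)) / (len(signal) - 1)


# Hypothetical burst-like signal: abrupt onset, then smooth decay (damping).
sample_rate = 16000                        # assumed value, in Hz
t = np.arange(0, 0.05, 1 / sample_rate)    # 50 ms of samples
forward = np.exp(-t / 0.005) * np.sin(2 * np.pi * 1000 * t)

# Time reversal is simply the same samples played back to front.
reverse = forward[::-1]

# The power spectrum is unchanged by reversal (up to floating-point error).
spectra_match = np.allclose(power_spectrum(forward), power_spectrum(reverse))

# The temporal envelope is mirrored: the peak moves from the start to the end,
# so the damping of the forward signal becomes ramping in the reversed one.
forward_peak = peak_position(forward)
reverse_peak = peak_position(reverse)

print("Power spectra match:", spectra_match)
print("Forward peak position:", round(forward_peak, 3))
print("Reverse peak position:", round(reverse_peak, 3))
```

In this toy case the spectra are identical while the envelope is mirrored, which is the kind of asymmetry the paragraph above attributes to non-continuant sounds. Real speech is far more complex, so this should be read as an illustration of the principle only.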

One study has investigated the preservation of phonetic cues in time-reversed speech and the perception of reversed phonemes. Pellegrino, Ferragne and Meunier (2010) conducted an experiment in which four phoneticians listened to pseudowords that were recorded and played in reverse, and phonemically transcribed what they heard. The results showed that around 25% of the original segments from the forward speech were exactly retrieved in reverse. The experiment also demonstrated that certain phoneme types were more likely to be distinguished than others. Fricatives (e.g. /f, v/), liquids (e.g. /l/) and nasals (e.g. /n, m/) were identified at a rate above 90%, and vowels at close to 90%. The authors suggest that the high rate of identification likely reflects the invariance of continuant waveforms, which preserves a high level of perceptual cues. Rhotics (e.g. /r/) and voiced stops (e.g. /b, d, g/) were identified at an intermediate level (66.7% and 61.8% respectively). Listeners, however, were inaccurate with unvoiced stops (e.g. /p, k, t/), with a rate of only 9.4%, as well as with schwas (the mid central neutral vowel /ə/).

Unvoiced stops that were not correctly recognised were identified as phonemes having an alternative place and/or manner of articulation. 30% of unvoiced stops were transcribed as fricatives. 25% were identified as stops, which also included other stop types such as glottal stops or unreleased voiced stops. 28% were heard as a cluster; for example, a final /t/ in the natural speech was heard as an /sn/ cluster. The authors suggest that the /n/ arose from the ramping of the vowel in the time-reversed signal. 7% were transcribed as a sonorant (r, l, m, n, w, y), while 10% of the stop segments were not detected.
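As a quick check on the figures above, the percentages can be tallied to confirm that the five outcome categories account for every unvoiced stop response. The category labels below are shorthand of my own; the numbers are the ones quoted in the paragraph above.

```python
# Transcription outcomes for reversed unvoiced stops, in percent,
# using the figures quoted in the text. Labels are informal shorthand.
outcomes = {
    "fricative": 30,
    "stop (including glottal or unreleased)": 25,
    "cluster": 28,
    "sonorant": 7,
    "not detected": 10,
}

# Sum of all categories: should cover every response.
total = sum(outcomes.values())

# Share of unvoiced stops that were transcribed as some sound.
heard_as_something = total - outcomes["not detected"]

print("Total:", total)                                # 100
print("Heard as some sound:", heard_as_something)     # 90
```

So the categories sum to 100%, and nine in ten reversed unvoiced stops were transcribed as some sound, even though few were heard as the original phoneme.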

The findings of this study suggest that not only are speech sounds from the forward speech heard in reverse, but sounds that are not in the forward speech are also perceived as phonemes.

These perceptions are typical in Reverse Speech. Although many phonemes from the forward speech are perceived, others are heard as alternative sounds, and this is certainly the case with unvoiced stops. They can be perceived as a phoneme with a different place of articulation (e.g. /t/ → /k/) or a different manner of articulation (e.g. /t/ → /s/, /p/ → /f/). An alveolar stop and alveolar /l/ can convert into another alveolar consonant; for example, /t/ or /d/ may be perceived as /n/ or vice versa. Others may be heard as allophones (different variations of the same phoneme; e.g. /t/ → /ʔ/ or unreleased /t/), or as a similar phoneme such as an alveolar tap /ɾ/. Phoneme addition can occur, such as /t/ → /st/. Stop bursts can disappear when reversed, lost in the vowel sound that comes before them in the reversal, resulting in perception of an alternative phoneme, an unreleased allophone, or omission altogether. Omission of sounds from the forward speech is a common occurrence. Light articulation of consonants or the strong frication of vowels next to a consonant may result in non-recognition of the consonant.

Some sounds in time-reversed speech are highly ambiguous and may be heard differently by different listeners. Alteration of phonemic cues through reversal, or degradation of the sound through audio noise or poor audio quality, contributes to ambiguity. In this case, one’s grammatical and lexical knowledge comes into play in phoneme selection, projecting the desired phoneme to produce meaning.

Reverse Speech is very much about the perception of speech sounds and finding meaning through the building of strings of language that make some grammatical and syntactical sense. But of course, this is very much the case for normal speech as well. We turn the sounds uttered by another into coherent meaning. When listening to speech, we cannot actually perceive each individual speech sound. We assume that they are there. However, if we were to examine the individual segments of spontaneous forward speech, we would find that not all phonemes of the heard words are recognisable; they may sound different or be missing altogether. Yet enough of the speech signal remains to perceive a coherent string of words. The rest is projected into it.

So, we can now see that Reverse Speech is composed of perceivable phonemes and segments. Not covered by Pellegrino et al. is whether the segments produce lexical information. It can be easily proven that they indeed do. However, to perceive strings of language correctly, one needs to operate within linguistic possibilities and parameters. This entails examination of phonemes and segments of reversed speech as well as comparing them to the information in the forward speech. This means understanding linguistic processes. This also means knowing that some speech sounds in forward speech can be heard differently to the sounds which normally make up words. It is important to know what is wrong about the string of words just as it is important to know what is right. This helps to set reasonable linguistic parameters for what can be accepted as linguistically viable. There are innumerable examples out there in Reverse Speech World that are obviously not what they are claimed to say. There are also many that can sound like what they are attested to be, yet still lack the necessary evidence for it.

Yet, strings do occur that mirror acceptable language. Nevertheless, proving that they are anything but coincidental is another matter. Every day, there are perhaps trillions of strings of language produced by speakers around the world. Quite naturally, ‘words’ will appear that are purely coincidental, even if they form a grammatically acceptable string of two, three or four words composed of perhaps one or two content words and one or two particles. One can shake these in front of linguists all day and get a response like “that’s interesting, but no cigar!”, even if they did seem to have some meaning regarding the speaker and what he was saying. For attention to be garnered, linguistically viable strings that are much longer need to occur: a minimum of 7 words in length, with ample examples of ones that are more than 10 words and even as long as 15-20 words.

Funnily enough, they exist.


Grataloup, C., Hoen, M., Veuillet, E., Collet, L., Pellegrino, F., & Meunier, F. (2009). Speech restoration: An interactive process. Journal of Speech, Language, and Hearing Research, 52, 827-838.
Pellegrino, F., Ferragne, E., & Meunier, F. (2010). 2010, a speech oddity: Phonetic transcription of reversed speech. Interspeech 2010, 1221-1224.
