How Accurate Is Speaking Simulator?

from linguistics on 2020-05-19

Speaking Simulator is a video game where you speak with your mouth. This is different from most video games, where you stare blankly at dialogue options and press a button to speak.

Does this make Speaking Simulator better than The Witcher 3? Well, that’s what we’re going to find out.

Today we’re discussing how accurately Speaking Simulator portrays the art of speaking. But first I’m going to give you a phonology lesson, as we need to know how speaking works before we can judge how accurately the game simulates it.

The production of any sound involves the movement of air. Most speech sounds are produced by pushing lung air through the vocal cords, up the throat, into the mouth or nose, and finally out of the body.

Language textbooks often describe sounds by comparison with English, offering recipes for producing unusual sounds. One of my German textbooks has a dictionary filled with passages like this: ”kurios (koohr-ee-ohs): This word means strange, not curious. The German word for curious is neugierig (noy-geer-iH).” Here the word ”kurios” is shown to be pronounced as ”koohr-ee-ohs”.

Linguists instead use the IPA (International Phonetic Alphabet), a set of symbols with precise meanings. It utilises both ordinary letters and invented symbols. Many characters like [p] are pronounced much as they are in English (where [p] is the p in ”pill”), but some can be unfamiliar, like [ʃ], which is the sh in ”shill”. For comparison, ”kurios” is transcribed as /ku’ri̯o:s/.

I’m going to have to use IPA to transcribe some words, so as a convention to indicate that something is in IPA: single characters will be in brackets [] and words will be in slashes //.

The sounds of all languages fall into two classes: consonants and vowels. Consonants are produced with some restriction or closure in the vocal tract that impedes the flow of air from the lungs. Place of articulation describes where the obstruction occurs. By convention, we start at the lips and move inward. Here’s a handy diagram of the mouth I found on the Internet:


Name         Obstruction                                       IPA Examples
Bilabial     Both lips together                                [p], [b], [m]
Labiodental  Touching the bottom lip to the upper teeth        [f], [v]
Dental       Tongue against the teeth                          [θ], [ð]
Alveolar     Tongue behind the teeth                           [s], [z], [t], [d]
Palatal      Tongue touching the top of the palate             [ʃ], [ʒ]
Velar        Back of the tongue against the back of the mouth  [k], [g], [ŋ]
Uvular       Tongue compressing way back in the mouth          [ʀ]
Glottal      Constricting the throat                           [h], [ʔ]

Speech sounds also vary in the way the airstream is affected as it flows from the lungs up and out the mouth and nose. It may be blocked or partially blocked; the vocal cords may or may not vibrate. This is the manner of articulation and is what separates sounds in each class from one another. Here are some of the common ones:

Name                 Closure                                    IPA Examples
Stops (or Plosives)  Complete closure                           [p], [b], [d]
Fricatives           Impeded, enough to create a hissing sound  [ʃ], [ʒ], [x]
Affricates           A stop that releases into a fricative      [tʃ], [dʒ]
Approximants         Slightly impeded, no hissing sound         [r], [l], [j]

These can be further separated by voicing. Sounds are voiceless when the vocal cords are apart so that air flows freely through the glottis. Sounds are voiced when the vocal cords are together and the airstream forces its way through, causing them to vibrate. The voiced/voiceless distinction is very important in English and distinguishes the words in pairs like the following:

Voiceless       Voiced
rope (/rop/)    robe (/rob/)
fate (/fet/)    fade (/fed/)
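
If you prefer to see this as code, here is a minimal Python sketch of the idea that a consonant is just a bundle of place, manner and voicing. The names (Consonant, CONSONANTS, voicing_partner) are made up for illustration, and the table only covers a handful of the sounds mentioned so far, so treat it as a toy rather than a real phonology library.

```python
from typing import NamedTuple, Optional


class Consonant(NamedTuple):
    place: str    # where the obstruction happens
    manner: str   # how the airstream is affected
    voiced: bool  # whether the vocal cords vibrate


# A deliberately tiny, hand-written feature table (illustrative only).
CONSONANTS = {
    "p": Consonant("bilabial", "stop", False),
    "b": Consonant("bilabial", "stop", True),
    "t": Consonant("alveolar", "stop", False),
    "d": Consonant("alveolar", "stop", True),
    "k": Consonant("velar", "stop", False),
    "g": Consonant("velar", "stop", True),
    "f": Consonant("labiodental", "fricative", False),
    "v": Consonant("labiodental", "fricative", True),
    "s": Consonant("alveolar", "fricative", False),
    "z": Consonant("alveolar", "fricative", True),
}


def voicing_partner(symbol: str) -> Optional[str]:
    """Return the sound that differs from `symbol` only in voicing, if any."""
    target = CONSONANTS[symbol]
    for other, feats in CONSONANTS.items():
        same_place = feats.place == target.place
        same_manner = feats.manner == target.manner
        if same_place and same_manner and feats.voiced != target.voiced:
            return other
    return None


# rope/robe and fate/fade differ only in their final consonant:
print(voicing_partner("p"))  # b
print(voicing_partner("t"))  # d
```

The point is that [p] and [b] share a place and a manner, so voicing is the only thing separating rope from robe.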

Vowels are produced with little restriction of the airflow from the lungs out through the mouth and/or nose. The quality of a vowel depends on the shape of the vocal tract as the air passes through.

Vowel sounds carry pitch and loudness; you can sing vowels or shout vowels. They may be longer or shorter in duration. Vowels can stand alone - they can be produced without consonants before or after them. You can say the vowels of beat /bit/ and boot /but/ without the initial [b] or the final [t], but you cannot say a [b] or [t] alone without at least a little bit of vowel sound.

This long introduction is important in getting you to understand how much is happening when you make sounds (I haven’t covered close to all of it). But with our new knowledge, we should be ready to judge Speaking Simulator now. Let’s see what we need to do to speak in the game.


Oh dear, this is very unfortunate. On the left you can see the inside of the mouth, like in the picture above. There are 3 buttons inside it, and you press them by moving the tongue with the WASD keys. To the right you can see your character, and with the mouse you can drag their lips up/down and left/right. This is a total of 5 ways to create sounds. I haven’t counted how many there are in real life, but I’m pretty sure it’s more than 5.

It appears they have completely left out the manner of articulation and how air flows from the lungs out through the mouth, while also leaving out many things about the place of articulation. I don’t really have to go any further, do I? You can’t make very many distinct sounds with just 5 ways to move your mouth.

But for fun let’s go through a simple example. Here’s my guy trying to say ”Knock. Knock.”:


Oh deary me! They’ve got it completely wrong. All he did was move his jaw up and down! The transcription for ”Knock. Knock” is /nɒk nɒk/. Now let’s go through each character:

  • [n] is a voiced alveolar (the first sound in knock; the k is silent).
  • [ɒ] is an open back vowel, and I’m just realising that I haven’t told you what ”open” or ”back” means, so we’re going to ignore it.
  • [k] is a voiceless velar (the last sound in knock).

With these, we see that our lips shouldn’t close at all and our tongue should start somewhere behind the teeth and end at the back of the mouth.
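
The same reasoning can be sketched in a few lines of Python. This is only an illustration: the SOUNDS table, lips_close and tongue_path are names I made up, and the table covers nothing but the three sounds in /nɒk/.

```python
# Hypothetical mini feature table: just the sounds in /nɒk/.
SOUNDS = {
    "n": {"kind": "consonant", "place": "alveolar"},
    "ɒ": {"kind": "vowel", "place": None},
    "k": {"kind": "consonant", "place": "velar"},
}


def lips_close(word: str) -> bool:
    """True if any sound in the word needs both lips pressed together."""
    return any(SOUNDS[ch]["place"] == "bilabial" for ch in word)


def tongue_path(word: str) -> list:
    """Places of articulation the tongue visits, consonants only, in order."""
    return [SOUNDS[ch]["place"] for ch in word if SOUNDS[ch]["kind"] == "consonant"]


print(lips_close("nɒk"))   # False
print(tongue_path("nɒk"))  # ['alveolar', 'velar']
```

No bilabials means no lip closure, and the tongue goes from behind the teeth to the back of the mouth, which is not what the game showed.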

This is a huge disappointment. I came to Speaking Simulator to see some high quality mouth movement, not this low quality mouth movement.

But Speaking Simulator gets a 9/10 because any game that starts like this is a masterpiece: