THE PHONETIC APPROACH
Phonetics is the systematic study of the sounds of
human speech. It provides a means of describing and classifying
virtually all the sounds that can be produced by the human voice.
This study is based on phonemes, the smallest units of human speech.
All utterances made anywhere in the world have been
catalogued within a range of about 400 phonemes. The majority of languages
use around 40 phonemes. Searches using phoneme pattern
matching can be executed on:
- blended words
- proper names, slang, code words, brands, etc.
- non-standard grammar patterns
- ad-hoc use of different languages
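The idea of phoneme pattern matching can be illustrated with a minimal sketch. It assumes the audio has already been reduced to a sequence of phoneme symbols and looks for the best approximate occurrence of a query sequence, using a standard edit-distance search with a free starting point. The function name best_match, the ARPAbet-style symbols, and the hand-written transcriptions are all assumptions made for illustration; nothing here describes how any particular product works internally.

```python
def best_match(stream, query):
    """Return (cost, end) for the best approximate occurrence of query in stream.

    cost is the number of phoneme substitutions, insertions, or deletions
    needed; end is the count of stream symbols consumed where the match ends.
    """
    # Row for the empty query: a match may start anywhere at zero cost.
    prev = [0] * (len(stream) + 1)
    for i, q in enumerate(query, 1):
        cur = [i] + [0] * len(stream)
        for j, s in enumerate(stream, 1):
            cur[j] = min(
                prev[j - 1] + (q != s),  # match or substitute
                prev[j] + 1,             # query phoneme missing from the audio
                cur[j - 1] + 1,          # extra phoneme in the audio
            )
        prev = cur
    end = min(range(len(prev)), key=prev.__getitem__)
    return prev[end], end


# Hypothetical phoneme stream for "we booked a staycation in June".
STREAM = "W IY B UH K T AH S T EY K EY SH AH N IH N JH UW N".split()

# "staycation" is a blended word; no dictionary entry is consulted.
print(best_match(STREAM, "S T EY K EY SH AH N".split()))
# A slightly different pronunciation still matches, at a higher cost.
print(best_match(STREAM, "S T EY K EY ZH AH N".split()))
```

Because the comparison happens at the level of sounds, the same routine applies unchanged to names, slang, or words borrowed from another language, as long as a phoneme sequence can be written for the query.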
Phonetic Searching
As archives of digital audio expand, and people need
to find specific information within those archives, it becomes clear
that a highly efficient method of searching recorded media is required.
The metadata that currently tags audio information (such as title,
date of recording, subject, or person) is not sufficient for the
accurate and rapid retrieval of specifically requested data.
Nexidia’s Phonetic Search Engine (abbreviated
as PSE, trademark pending) is an open-vocabulary retrieval system
that greatly reduces the time and increases the accuracy of searches
against large collections of recorded speech. Searches can be conducted
at speeds over 548,000 times faster than real-time playback of the
recordings.
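To put that figure in perspective, the short calculation below converts the stated speed-up into search time for an archive of a given size. It is only arithmetic on the 548,000 figure quoted above; the function name search_seconds and the 1,000-hour archive are assumptions chosen for illustration, and real search times would depend on hardware and configuration.

```python
SPEEDUP = 548_000  # times faster than real-time playback, as quoted above


def search_seconds(audio_hours: float) -> float:
    """Seconds needed to search audio_hours of recordings at the quoted speed-up."""
    return audio_hours * 3600 / SPEEDUP


# Hypothetical archive of 1,000 hours of recorded speech.
print(f"{search_seconds(1000):.2f} seconds")
# Hours of audio covered by one second of searching.
print(f"{SPEEDUP / 3600:.1f} hours")
```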
The Advantages of Phonetic Searching
There are compelling reasons why using the Phonetic
Search Engine is preferable to using speech-to-text searches. The
PSE has a completely open vocabulary. No base lexicon is required.
In contrast, the speech-to-text method must map all words into lexicon
entries. For example, if a word is not in the dictionary, the speech-to-text
solution will not find it in the audio.
Another advantage of the PSE is that accuracy is not
compromised for speed. Speech-to-text must limit its search and
must make hard decisions about word bindings; otherwise, searches
are too slow and unpredictable. This is why speech-to-text lexicons
are never large enough and seldom contain enough key search terms,
which are often proper names or unusual phrases.
Some speech-to-text systems have tried to improve
their accuracy by introducing semantics-based constraints (i.e.,
the probability of word sequences). This process can itself produce
inaccuracies, and it extends processing time. In addition, such a model
will never be complete, owing to the inherent flexibility of human language.
Phonetic searching emphasizes how things sound, not what strict
grammar rules may imply they mean. This is especially evident when
searching for proper names. Exact spelling is not required.
For example, the PSE can find references to “Sudetenland”
spelled properly, or even as “Sue Dayton Land.” This
might be an extreme example, but the utility of this kind of searching
becomes obvious when you look at a name like “Qaddafi,”
which has been written with many different spellings, such as Khaddafi,
Quadafy, and Kaddafi. This name could be input into the PSE as KADOFFEE
and still be found.
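Why differently spelled queries can lead to the same result can be shown with a deliberately crude sketch. It reduces each spelling to a rough "consonant skeleton" so that spellings which sound alike collapse to one key. This is a text-only stand-in chosen for illustration, not the PSE's method, which searches the audio itself; the function phonetic_key and its handful of rewrite rules are assumptions and would be far too coarse for real use.

```python
import re


def phonetic_key(text: str) -> str:
    """Reduce a spelling to a rough consonant skeleton (illustrative only)."""
    s = re.sub(r"[^A-Z]", "", text.upper())   # keep letters, drop spaces
    # A few hand-picked rules for letter groups that sound alike.
    for src, dst in (("KH", "K"), ("QU", "K"), ("Q", "K"), ("PH", "F")):
        s = s.replace(src, dst)
    s = re.sub(r"[AEIOUY]", "", s)            # ignore vowel spelling entirely
    s = re.sub(r"(.)\1+", r"\1", s)           # collapse doubled consonants
    return s


for name in ("Qaddafi", "Khaddafi", "Quadafy", "Kaddafi", "KADOFFEE"):
    print(name, "->", phonetic_key(name))

print(phonetic_key("Sudetenland") == phonetic_key("Sue Dayton Land"))
```

A key this coarse will also merge many unrelated words, which is one reason a practical system compares sound sequences with scores rather than reducing them to exact keys.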