THE PHONETIC APPROACH

Phonetics is the systematic study of the sounds of human speech. It provides a means of describing and classifying virtually all the sounds that can be produced by the human voice. This study is based on phonemes, the smallest units of human speech.

The sounds of all the world's languages have been catalogued within a range of about 400 phonemes, and most individual languages use around 40. Because phoneme pattern matching works on sounds rather than on dictionary entries, searches can be executed on:

  • blended words
  • proper names, slang, code words, brands, etc.
  • non-standard grammar patterns
  • ad-hoc use of different languages


Phonetic Searching

As archives of digital audio expand, and people need to find specific information within those archives, it becomes clear that a highly efficient method of searching recorded media is required. The metadata that currently tags audio information (such as title, date of recording, subject, or person) is not sufficient for the accurate and rapid retrieval of specifically requested data.

Nexidia’s Phonetic Search Engine (abbreviated as PSE, trademark pending) is an open-vocabulary retrieval system that greatly reduces the time and increases the accuracy of searches against large collections of recorded speech. Searches can be conducted at speeds over 548,000 times faster than real-time playback of the recordings.
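To put the quoted speed in perspective, the short calculation below converts it into wall-clock search time. It is only arithmetic on the 548,000x figure stated above; the function name and the 1,000-hour archive size are illustrative assumptions, and real search times would depend on hardware and configuration.

```python
# Figure quoted in the text: search speed relative to real-time playback.
SPEEDUP = 548_000

def search_seconds(audio_hours: float, speedup: float = SPEEDUP) -> float:
    """Seconds needed to search `audio_hours` of audio at the given speedup."""
    return audio_hours * 3600 / speedup

# Hypothetical archive of 1,000 hours of recordings.
print(round(search_seconds(1000), 2))  # 6.57
```

At that rate, an archive that would take about six weeks to play back could be searched in under seven seconds.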


The Advantages of Phonetic Searching

There are compelling reasons to prefer the Phonetic Search Engine over speech-to-text searches. The PSE has a completely open vocabulary; no base lexicon is required. In contrast, the speech-to-text method must map every word to a lexicon entry. If a word is not in the dictionary, the speech-to-text solution will not find it in the audio.
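The difference between a lexicon-bound search and an open-vocabulary search can be sketched with a toy example. Everything here is hypothetical: the phoneme stream, the ARPAbet-style symbols, the two-word lexicon, and the function names are assumptions made for illustration, and exact subsequence matching is a deliberate simplification rather than a description of how the PSE works.

```python
# Toy illustration only: a hypothetical phoneme stream and lexicon.
AUDIO_PHONEMES = "DH AH S UW D EY T AH N L AE N D K R AY S AH S".split()

# A lexicon-bound system can only look for words it already knows.
LEXICON = {
    "the": ["DH", "AH"],
    "crisis": ["K", "R", "AY", "S", "AH", "S"],
}

def find_phonemes(stream, pattern):
    """Return every start index where `pattern` occurs contiguously in `stream`."""
    n = len(pattern)
    return [i for i in range(len(stream) - n + 1) if stream[i:i + n] == pattern]

def lexicon_search(word):
    """Word search limited to the lexicon: unknown words can never be found."""
    if word not in LEXICON:
        return []
    return find_phonemes(AUDIO_PHONEMES, LEXICON[word])

# "Sudetenland" under an assumed pronunciation; it is not in LEXICON.
query = "S UW D EY T AH N L AE N D".split()

print(lexicon_search("sudetenland"))         # [] (out of vocabulary)
print(find_phonemes(AUDIO_PHONEMES, query))  # [2]
```

The lexicon-bound search fails before it ever looks at the audio, while the phoneme search needs only a pronunciation of the query.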

Another advantage of the PSE is that accuracy is not compromised for speed. Speech-to-text must limit its search and make hard decisions about word bindings; otherwise, searches are too slow and unpredictable. This is why speech-to-text lexicons are never large enough and seldom contain enough key search terms, which are often proper names or unusual phrases.

Some speech-to-text systems have tried to improve their accuracy by introducing semantics-based constraints (i.e., the probability of word sequences), a process that can sometimes produce inaccuracies and extend processing time. In addition, such a model will never be complete due to the inherent flexibility of human language. Phonetic searching emphasizes how things sound, not what strict grammar rules may imply they mean. This is especially evident when searching for proper names: exact spelling is not required.

For example, the PSE can find references to “Sudetenland” spelled properly, or even as “Sue Dayton Land.” This might be an extreme example, but the utility of this kind of searching becomes obvious when you look at a name like “Qaddafi,” which has been written with many different spellings, such as Khaddafi, Quadafy, and Kaddafi. This name could be entered into the PSE as KADOFFEE and still be found.
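The idea that differently spelled queries can sound alike can be illustrated with a crude "consonant skeleton" key. This is a hypothetical sketch that operates on spellings only; the function name and the substitution rules are assumptions chosen for these examples, and this is not the matching method the PSE uses on audio.

```python
import re
from itertools import groupby

def sound_key(text: str) -> str:
    """Reduce a spelling to a rough consonant skeleton (illustrative only)."""
    s = re.sub(r"[^a-z]", "", text.lower())                   # keep letters only
    s = s.replace("kh", "k").replace("qu", "k").replace("q", "k")
    s = re.sub(r"[aeiouy]", "", s)                            # drop vowel letters
    return "".join(ch for ch, _ in groupby(s))                # collapse doubled letters

names = ["Qaddafi", "Khaddafi", "Quadafy", "Kaddafi", "KADOFFEE"]
print({sound_key(n) for n in names})                          # {'kdf'}
print(sound_key("Sudetenland") == sound_key("Sue Dayton Land"))  # True
```

A key this coarse would also merge many unrelated words, which is one reason a real phonetic search works on phoneme sequences rather than on letters.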




 
 

© COPYRIGHT 2008