|
|
Details of work done in the area of Speech Research and System Development
- Voice Project: The Voice Oriented Interactive Computing Environment
(VOICE) project was undertaken as part of the Knowledge Based
Computer System (KBCS) Project, Phase II, under the aegis of the then
Department of Electronics (DOE). I was the Co-Principal Investigator
(Co-PI) from the project's initiation in 1995 and the Principal
Investigator (PI) from 1998 until the end of the project in 2000.
The project aimed to build application prototypes of voice
input/output systems in Indian languages, using the technology
developed at TIFR during KBCS Project, Phase I. The systems developed
were: (1) An isolated-word, speaker-independent speech recognition
system with a 200-word vocabulary. It used the HMM methodology and
was trained by about 100 speakers, including men, women and
children. The queries were organized in a menu structure and were
used in a travel guide that accepts a query (a single menu word) by
voice. The recognition accuracy for this application was near 100%.
(2) A text-to-speech system for Hindi. It accepts Hindi text in an
ITRANS-like transliteration code and generates the corresponding
Hindi speech. The output was acceptably intelligible. (3) A
continuous Hindi speech database. It contains 800 sentences that
were designed to be phonetically balanced, with about 100 speakers.
The CD of the database was handed over to the sponsoring agency
(DOE). [1995-2000]
- Text-to-Speech Systems: Under the KBCS (Knowledge Based Computer
Systems) project, the development of a text-to-speech system for
Hindi was completed. Considerable progress was made in developing a
similar system for Indian English. A morphological parser was
designed to extract root words from the text for subsequent
dictionary look-up. Compilation of a phonetic dictionary for Indian
English was in progress. [1998-1999]
- Speech Recognition: As an application of the TIFR-developed,
HMM-based, isolated-word, speaker-independent speech recognizer, the
development of a speech-activated “travel guide” was undertaken
under the KBCS project. The basic blocks of the system were developed
and were working with selected travel information, which could be
augmented to make the system practically useful. The system prompted
the user to specify the information needed by “speaking out” any of
the menu words displayed. There were several routes that could be
travelled in this way. On recognizing the spoken word, the system
prompted for further clarification of the query with textual
captions, graphics (maps etc.) and voice. [1998-1999]
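The menu-driven interaction described above can be illustrated with a minimal sketch. The menu words and the `navigate` helper are hypothetical and are not taken from the original system; the recognizer is replaced here by a list of already-recognized words, so only the menu traversal is shown.

```python
# Hypothetical menu tree: each key is a menu word shown to the user,
# each value is the sub-menu reached after that word is recognized.
MENU = {
    "travel": {"train": {}, "bus": {}, "air": {}},
    "hotels": {"budget": {}, "luxury": {}},
}

def navigate(menu, spoken_words):
    """Follow a sequence of recognized menu words through the tree.

    Returns the path taken and the menu words the system would
    display next (empty when a leaf has been reached).
    """
    path = []
    node = menu
    for word in spoken_words:
        if word not in node:
            # Only the words of the currently displayed menu are valid.
            raise KeyError(f"'{word}' is not in the current menu: {sorted(node)}")
        path.append(word)
        node = node[word]
        if not node:  # leaf: nothing further to ask
            break
    return path, sorted(node)

path, next_words = navigate(MENU, ["hotels"])
print(path, next_words)
```

In a sketch like this, the set of valid words at each step is just the keys of the current node, which is one plausible way a single-word query per menu could be organized.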
- Information Access by Dialing: Work was carried out on speech
recognition and synthesis for automatic interaction with computers.
A system for accessing information from a database by voice, over
the telephone, was proposed.
The recognizer was based on the Hidden Markov Model (HMM) technique.
Specifically, a strict left-to-right model with continuous Gaussian
mixture densities was employed. State transitions were restricted to
forward transitions reaching at most the two immediately following
states. The system was initially trained by Indian speakers, and the
recognition accuracy was around 90%. It was then trained by a larger
number and wider variety of people. A “word spotting” technique was
pursued to speed up the information retrieval process.
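The transition constraint described above can be sketched as a transition matrix. This is an illustration only: the number of states, the uniform initial probabilities and the `allow_self_loop` option are assumptions, since the text does not say whether a state could also repeat itself.

```python
import numpy as np

def left_to_right_transitions(n_states, max_jump=2, allow_self_loop=True):
    """Initial transition matrix for a left-to-right HMM.

    From state i, only states i+1 .. i+max_jump are reachable
    (plus i itself when allow_self_loop is True, an assumption).
    Allowed transitions start out equally likely; training would
    re-estimate them but can never make a zero entry non-zero.
    """
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        first = i if allow_self_loop else i + 1
        last = min(i + max_jump, n_states - 1)
        if first > last:  # final state without self-loop: keep it absorbing
            first = last = i
        A[i, first:last + 1] = 1.0 / (last - first + 1)
    return A

A = left_to_right_transitions(5)
print(A)
```

Every entry below the diagonal stays zero, so the model can never move backwards, and entries more than two columns to the right of the diagonal stay zero, so at most one state can be skipped.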
The R&D work on speech synthesis was based on formant synthesis. A
“Text Analyzer” generated phoneme symbols and stress/punctuation
markers from the input text. A context-sensitive, language-specific
rule module then generated the formant and other features
corresponding to the utterance. The final stage was implemented as a
series of filters modeling the vocal tract, excited by a source that
was either a series of quasi-periodic pulses, simulating the
vibration of the vocal cords, or random noise, simulating the
generation of noisy sounds.
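The final stage described above, a source exciting a series of filters, can be sketched as follows. This is not the synthesizer developed in the project: it uses the standard second-order digital resonator found in cascade formant synthesizers, a strictly periodic pulse train in place of quasi-periodic pulses, and illustrative formant frequencies, bandwidths, pitch and sampling rate that are all assumptions.

```python
import numpy as np

def resonator(x, freq_hz, bw_hz, fs):
    """Second-order digital resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    T = 1.0 / fs
    c = -np.exp(-2.0 * np.pi * bw_hz * T)
    b = 2.0 * np.exp(-np.pi * bw_hz * T) * np.cos(2.0 * np.pi * freq_hz * T)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    y = np.zeros(len(x))
    y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = a * xn + b * y1 + c * y2
        y[n] = yn
        y2, y1 = y1, yn
    return y

def excitation(n_samples, fs, voiced, f0_hz=120.0, seed=0):
    """Pulse train for voiced sounds, white noise for unvoiced ones."""
    if voiced:
        src = np.zeros(n_samples)
        src[::int(round(fs / f0_hz))] = 1.0  # one pulse per pitch period
        return src
    return np.random.default_rng(seed).standard_normal(n_samples)

def synthesize(formants, fs=10000, dur_s=0.2, voiced=True):
    """Pass the excitation through one resonator per (frequency, bandwidth)."""
    signal = excitation(int(fs * dur_s), fs, voiced)
    for freq_hz, bw_hz in formants:
        signal = resonator(signal, freq_hz, bw_hz, fs)
    return signal

# Illustrative values only, roughly an open vowel.
vowel = synthesize([(700, 90), (1200, 110), (2600, 170)])
```

In a rule-driven synthesizer the formant list, pitch and voicing flag would change from frame to frame under control of the rule module; the sketch keeps them fixed for one short segment.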
The rule-based conversion of phonemes to acoustic-phonetic
parameters was the main focus of research. The control parameters of
this model, e.g. energies, pitch and formant frequencies, had to be
varied to produce the desired utterances, specified by the phoneme
strings. The rule base governing the control parameters was strongly
language dependent, depended on the phonetic context, and was quite
complex and extensive. The rules were developed for Hindi and Indian
English. [1997-1998]
- Real-time Synthesizer: The speech synthesizer software (developed
for the VOICE project) was ported to the SUN 10/51 and Pentium
platforms. Structural modifications were also made for faster program
execution. The result was real-time synthesis on both platforms:
less than a second was needed to synthesize a second of speech. This
is important for the following reasons: (a) With real-time
synthesis, it is possible to read text from a file, synthesize it
and play back the synthesized speech continuously, without any
break. (b) Fast synthesis drastically reduces the time needed for
synthesis experiments. [1995-1996]
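The real-time criterion stated above (less than a second of computation per second of speech) is commonly expressed as a real-time factor below 1. A minimal sketch of how it could be measured follows; `dummy_synthesizer` and the 16000 Hz sampling rate are placeholders, not part of the original system.

```python
import time

def real_time_factor(synthesize, text, sample_rate):
    """Computation time divided by the duration of the audio produced.

    A value below 1.0 means synthesis keeps up with playback.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds

def dummy_synthesizer(text):
    """Placeholder that returns one second of silence at 16000 Hz."""
    return [0.0] * 16000

rtf = real_time_factor(dummy_synthesizer, "namaste", 16000)
print(f"real-time factor: {rtf:.6f}")
```

With a factor below 1, each chunk of text can be synthesized while the previous chunk is still playing, which is what makes playback without breaks possible.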
|