mycroft-core/mycroft/tts/mimic_tts.py

201 lines
5.6 KiB
Python
Raw Normal View History

# Copyright 2016 Mycroft AI, Inc.
#
# This file is part of Mycroft Core.
#
# Mycroft Core is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Mycroft Core is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Mycroft Core. If not, see <http://www.gnu.org/licenses/>.
In the 1970s computer users had to understand the arcane syntax of the machines they used. They programed their computers using the machine's native language and hardly gave it a thought. The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses. We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech. This is an important development. Some might even call it revolutionary. Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart. Not anymore. Meet Mycroft – the worlds first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways. Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine. This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction. Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone. Joshua W Montgomery – May 17, 2016
2016-05-20 14:16:01 +00:00
import subprocess
import hashlib
import os
import os.path
from time import time, sleep
2016-09-16 18:12:43 +00:00
In the 1970s computer users had to understand the arcane syntax of the machines they used. They programed their computers using the machine's native language and hardly gave it a thought. The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses. We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech. This is an important development. Some might even call it revolutionary. Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart. Not anymore. Meet Mycroft – the worlds first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways. Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine. This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction. Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone. Joshua W Montgomery – May 17, 2016
2016-05-20 14:16:01 +00:00
from mycroft import MYCROFT_ROOT_PATH
from mycroft.configuration import ConfigurationManager
2016-09-17 02:08:53 +00:00
from mycroft.tts import TTS, TTSValidator
from mycroft.util.log import getLogger
LOGGER = getLogger(__name__)
In the 1970s computer users had to understand the arcane syntax of the machines they used. They programed their computers using the machine's native language and hardly gave it a thought. The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses. We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech. This is an important development. Some might even call it revolutionary. Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart. Not anymore. Meet Mycroft – the worlds first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways. Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine. This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction. Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone. Joshua W Montgomery – May 17, 2016
2016-05-20 14:16:01 +00:00
2016-09-17 02:08:53 +00:00
__author__ = 'jdorleans', 'spenrod'
In the 1970s computer users had to understand the arcane syntax of the machines they used. They programed their computers using the machine's native language and hardly gave it a thought. The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses. We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech. This is an important development. Some might even call it revolutionary. Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart. Not anymore. Meet Mycroft – the worlds first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways. Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine. This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction. Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone. Joshua W Montgomery – May 17, 2016
2016-05-20 14:16:01 +00:00
2016-09-17 02:08:53 +00:00
config = ConfigurationManager.get().get("tts").get("mimic")
In the 1970s computer users had to understand the arcane syntax of the machines they used. They programed their computers using the machine's native language and hardly gave it a thought. The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses. We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech. This is an important development. Some might even call it revolutionary. Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart. Not anymore. Meet Mycroft – the worlds first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways. Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine. This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction. Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone. Joshua W Montgomery – May 17, 2016
2016-05-20 14:16:01 +00:00
BIN = config.get("path", os.path.join(MYCROFT_ROOT_PATH, 'mimic', 'bin',
'mimic'))
In the 1970s computer users had to understand the arcane syntax of the machines they used. They programed their computers using the machine's native language and hardly gave it a thought. The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses. We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech. This is an important development. Some might even call it revolutionary. Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart. Not anymore. Meet Mycroft – the worlds first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways. Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine. This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction. Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone. Joshua W Montgomery – May 17, 2016
2016-05-20 14:16:01 +00:00
class Mimic(TTS):
def __init__(self, lang, voice):
2016-09-16 18:12:43 +00:00
super(Mimic, self).__init__(lang, voice, MimicValidator(self))
2016-09-17 02:08:53 +00:00
self.init_args()
def init_args(self):
self.args = [BIN, '-voice', self.voice, '-psdur']
stretch = config.get('duration_stretch', None)
if stretch:
self.args += ['--setf', 'duration_stretch=' + stretch]
In the 1970s computer users had to understand the arcane syntax of the machines they used. They programed their computers using the machine's native language and hardly gave it a thought. The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses. We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech. This is an important development. Some might even call it revolutionary. Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart. Not anymore. Meet Mycroft – the worlds first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways. Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine. This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction. Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone. Joshua W Montgomery – May 17, 2016
2016-05-20 14:16:01 +00:00
def get_tts(self, sentence):
key = str(hashlib.md5(sentence).hexdigest())
wav_file = os.path.join(mycroft.util.get_cache_directory("tts"),
key + ".wav")
if os.path.exists(wav_file):
phonemes = self.load_phonemes(key)
if phonemes:
# Using cached value
LOGGER.debug("TTS cache hit")
return wav_file, phonemes
# Generate WAV and phonemes
phonemes = subprocess.check_output(self.args + ['-o', wav_file,
'-t', sentence])
self.save_phonemes(key, phonemes)
return wav_file, phonemes
def save_phonemes(self, key, phonemes):
# Clean out the cache as needed
cache_dir = mycroft.util.get_cache_directory("tts")
mycroft.util.curate_cache(cache_dir)
pho_file = os.path.join(cache_dir, key+".pho")
try:
with open(pho_file, "w") as cachefile:
cachefile.write(phonemes)
except:
LOGGER.debug("Failed to write .PHO to cache")
pass
def load_phonemes(self, key):
pho_file = os.path.join(mycroft.util.get_cache_directory("tts"),
key+".pho")
if os.path.exists(pho_file):
try:
with open(pho_file, "r") as cachefile:
phonemes = cachefile.read().strip()
return phonemes
except:
LOGGER.debug("Failed to read .PHO from cache")
return None
2016-09-17 02:08:53 +00:00
def execute(self, sentence):
wav_file, phonemes = self.get_tts(sentence)
2016-09-17 02:08:53 +00:00
self.blink(0.5)
process = mycroft.util.play_wav(wav_file)
self.visime(phonemes)
process.communicate()
2016-09-17 02:08:53 +00:00
self.blink(0.2)
def visime(self, output):
start = time()
2016-09-17 02:08:53 +00:00
pairs = output.split(" ")
for pair in pairs:
if mycroft.util.check_for_signal('buttonPress'):
return
2016-09-17 02:08:53 +00:00
pho_dur = pair.split(":") # phoneme:duration
if len(pho_dur) == 2:
code = VISIMES.get(pho_dur[0], '4')
if self.enclosure:
self.enclosure.mouth_viseme(code)
duration = float(pho_dur[1])
delta = time() - start
if delta < duration:
sleep(duration - delta)
In the 1970s computer users had to understand the arcane syntax of the machines they used. They programed their computers using the machine's native language and hardly gave it a thought. The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses. We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech. This is an important development. Some might even call it revolutionary. Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart. Not anymore. Meet Mycroft – the worlds first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways. Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine. This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction. Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone. Joshua W Montgomery – May 17, 2016
2016-05-20 14:16:01 +00:00
class MimicValidator(TTSValidator):
2016-09-16 18:12:43 +00:00
def __init__(self, tts):
super(MimicValidator, self).__init__(tts)
In the 1970s computer users had to understand the arcane syntax of the machines they used. They programed their computers using the machine's native language and hardly gave it a thought. The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses. We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech. This is an important development. Some might even call it revolutionary. Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart. Not anymore. Meet Mycroft – the worlds first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways. Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine. This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction. Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone. Joshua W Montgomery – May 17, 2016
2016-05-20 14:16:01 +00:00
2016-09-16 18:12:43 +00:00
def validate_lang(self):
# TODO
In the 1970s computer users had to understand the arcane syntax of the machines they used. They programed their computers using the machine's native language and hardly gave it a thought. The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses. We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech. This is an important development. Some might even call it revolutionary. Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart. Not anymore. Meet Mycroft – the worlds first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways. Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine. This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction. Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone. Joshua W Montgomery – May 17, 2016
2016-05-20 14:16:01 +00:00
pass
2016-09-16 18:12:43 +00:00
def validate_connection(self):
In the 1970s computer users had to understand the arcane syntax of the machines they used. They programed their computers using the machine's native language and hardly gave it a thought. The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses. We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech. This is an important development. Some might even call it revolutionary. Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart. Not anymore. Meet Mycroft – the worlds first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways. Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine. This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction. Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone. Joshua W Montgomery – May 17, 2016
2016-05-20 14:16:01 +00:00
try:
subprocess.call([BIN, '--version'])
except:
2016-05-20 22:15:53 +00:00
raise Exception(
2016-09-16 18:12:43 +00:00
'Mimic is not installed. Run install-mimic.sh to install it.')
In the 1970s computer users had to understand the arcane syntax of the machines they used. They programed their computers using the machine's native language and hardly gave it a thought. The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses. We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech. This is an important development. Some might even call it revolutionary. Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart. Not anymore. Meet Mycroft – the worlds first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways. Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine. This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction. Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone. Joshua W Montgomery – May 17, 2016
2016-05-20 14:16:01 +00:00
2016-09-16 18:12:43 +00:00
def get_tts_class(self):
In the 1970s computer users had to understand the arcane syntax of the machines they used. They programed their computers using the machine's native language and hardly gave it a thought. The 1980s birthed a new form of interaction between computers and users. For the first time computers became capable of understanding the most basic form of human communication - pointing and grunting. The mouse and the GUI revolutionized computing and made computers accessible to the masses. We have now entered a third era. We are rapidly approaching a time when computer systems will understand human language and respond using the most natural form of human communication – speech. This is an important development. Some might even call it revolutionary. Despite its importance, however, the technologies that will underpin this new method of interaction are the property of major tech firms who don't necessarily have the public's best interests at heart. Not anymore. Meet Mycroft – the worlds first open source natural language platform. Mycroft understands human language and responds with speech. It is being designed to run on anything from a phone to an automobile and will change the way we interact with open source technologies in profound ways. Our goal here at Mycroft is to improve this technology to the point that when you interact with the software it is impossible to tell if you are talking to a human or a machine. This initial release of the Mycroft software represents a significant effort by the Mycroft community to give the open source world access to this important technology. We are all hoping that the software will be useful to the public and will help to usher in a new era of human machine interaction. Our community welcomes everyone to use Mycroft, improve the software and contribute back to the project. With your help and support we can truly make Mycroft an AI for everyone. Joshua W Montgomery – May 17, 2016
2016-05-20 14:16:01 +00:00
return Mimic
2016-09-17 02:08:53 +00:00
# Mapping based on Jeffers phoneme to viseme map, seen in table 1 from:
# http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.221.6377&rep=rep1&type=pdf
#
# Mycroft unit visemes based on images found at:
# http://www.web3.lu/wp-content/uploads/2014/09/visemes.jpg
#
# Mapping was created partially based on the "12 mouth shapes visuals seen at:
# https://wolfpaulus.com/journal/software/lipsynchronization/
VISIMES = {
# /A group
'v': '5',
'f': '5',
# /B group
'uh': '2',
'w': '2',
'uw': '2',
'er': '2',
'r': '2',
'ow': '2',
# /C group
'b': '4',
'p': '4',
'm': '4',
# /D group
'aw': '1',
# /E group
'th': '3',
'dh': '3',
# /F group
'zh': '3',
'ch': '3',
'sh': '3',
'jh': '3',
# /G group
'oy': '6',
'ao': '6',
# /Hgroup
'z': '3',
's': '3',
# /I group
'ae': '0',
'eh': '0',
'ey': '0',
'ah': '0',
'ih': '0',
'y': '0',
'iy': '0',
'aa': '0',
'ay': '0',
'ax': '0',
'hh': '0',
# /J group
'n': '3',
't': '3',
'd': '3',
'l': '3',
# /K group
'g': '3',
'ng': '3',
'k': '3',
# blank mouth
'pau': '4',
}