I’ve wanted to use speech detection in my personal projects for the longest time, but the Google API has gradually gotten more and more restrictive. To make sure my projects could work even without an internet connection, I looked for another speech recognition package that would preferably also be easier to use. I found CMU’s Sphinx voice recognition suite to be a really great speech-to-text package. However, documentation and sample code are nearly non-existent, so it took me forever to get anything done. Finally, I’ve figured it out! The example code is at the bottom of this post, but you can also download it directly from GitHub here.
Here are the steps to take to get this working:
- Download SphinxBase and follow the install instructions
- Download PocketSphinx and follow the install instructions
- Download PocketSphinx-python and follow the install instructions
- Run the code below
The main problem I had with setting up PocketSphinx was the myriad of libraries that the main site told me to download. After lots of trial and error, though, I realized that I really only needed three:
- SphinxBase is the base package that all of the other Sphinx programs use
- PocketSphinx is the lightweight recognizer, since I was okay with the program being a bit inaccurate if it meant I could decode phrases faster
- PocketSphinx-python is the wrapper to allow us to program in the best scripting language ever.
The code sets up the microphone and saves each detected phrase as a temporary .wav file, which the Sphinx decoder then translates into a list of strings representing the spoken words. A phrase is defined as a stretch of sound sandwiched between periods of silence. I stole most of the phrase detection code from someone else two years ago, though unfortunately I can’t remember who. If you’re reading this, thank you! 🙂
Anyhow, in the initialization of the run loop, we first define the minimum threshold for what counts as “silence”. Then we launch into an infinite loop that keeps listening to sounds over the microphone, calling the Sphinx decoder whenever a phrase has been saved. Phrase detection also uses a sliding window over the most recent intensity values: recording only stops once the whole window (SILENCE_LIMIT seconds of audio) has been quiet, so short pauses don’t chop a phrase in two. You can load different voice recognition models into the decoder config if you want this speech recognition code to work for different languages.
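The loop described above can be tried out without a microphone. Below is a minimal sketch, not the code from the download: it replaces live audio with a hypothetical list of per-chunk intensity values and returns chunk indices instead of audio bytes, but keeps the same structure (a sliding window that decides whether sound is still present, plus a small buffer of quiet chunks that gets prepended so the start of a phrase isn’t cut off). The names split_phrases, silence_chunks and prev_chunks are mine.

```python
from collections import deque


def split_phrases(intensities, threshold, silence_chunks, prev_chunks):
    """Group chunk indices into phrases, mimicking the run loop's logic.

    intensities    -- one loudness value per audio chunk
    threshold      -- a chunk louder than this counts as sound
    silence_chunks -- sliding-window length; a phrase ends once every
                      chunk in the window is at or below the threshold
    prev_chunks    -- how many quiet chunks from just before the phrase
                      to prepend, so its beginning isn't chopped off
    """
    slid_win = deque(maxlen=silence_chunks)
    prev = deque(maxlen=prev_chunks)
    phrases, current, started = [], [], False

    for i, level in enumerate(intensities):
        slid_win.append(level)
        if any(x > threshold for x in slid_win):
            # Sound somewhere in the window: start or continue a phrase.
            started = True
            current.append(i)
        elif started:
            # A full window of silence: the phrase is over.
            phrases.append(list(prev) + current)
            started, current = False, []
            slid_win.clear()
            prev.clear()
        else:
            # Still waiting: remember recent quiet chunks as pre-roll.
            prev.append(i)
    return phrases
```

Like the loop it imitates, this sketch drops a phrase that is still in progress when the input ends, and the trailing quiet chunks inside the window end up as part of the phrase.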
Now that I have this speech detection code in a neat little importable class, I’m really excited about future capabilities of my projects. So many ideas, so little time!
-Sophie
[Addendum] Thanks to Carl at jazzystring1@gmail.com for getting this code working with Python3!
Auslander
Years later, and your post just got me running on an offline, non-Google-API based speech transcription project that would have taken me a week to do manually. Thank you.
Sophie
Wow, that’s great news! If you’re interested in some more cutting-edge stuff, I know that TensorFlow has some public implementations of offline speech recognition based on convolutional neural nets. It’s probably way more accurate than PocketSphinx, but it would require some basic knowledge of machine learning. In case you’re interested, here’s a cool tutorial: https://github.com/tensorflow/docs/blob/master/site/en/r1/tutorials/sequences/audio_recognition.md
Otherwise, happy I was able to help!
Mainak Biswas
* Mic set up and listening.
Traceback (most recent call last):
File "test2.py", line 161, in <module>
sd.run()
File "test2.py", line 122, in run
slid_win = deque(maxlen=self.SILENCE_LIMIT * rel)
TypeError: an integer is required
I’m getting a type error. I tried to typecast, but the error is still there.
Sophie
Hmm, did you typecast to an int? What version of Python are you using?
Can you try:
slid_win = deque(maxlen=int(self.SILENCE_LIMIT * rel))
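For anyone else hitting this: under Python 3, / is true division, so rel = RATE / CHUNK = 16000 / 1024 comes out as 15.625, and deque() refuses a non-integer maxlen. Under Python 2 that division gives the integer 15, but PREV_AUDIO * rel = 0.5 * 15 is still a float, which would explain the same error on the prev_audio line. A minimal sketch of the cast, using the constants from the Python 3 listing in this post:

```python
from collections import deque

# Constants from the Python 3 listing in this post.
RATE = 16000        # samples per second
CHUNK = 1024        # samples read per stream.read() call
SILENCE_LIMIT = 1   # seconds of silence that end a phrase
PREV_AUDIO = 0.5    # seconds of audio to prepend to a phrase

rel = RATE / CHUNK  # chunks per second: 15.625 under Python 3

# deque() needs an integer maxlen, so truncate the products explicitly.
slid_win = deque(maxlen=int(SILENCE_LIMIT * rel))   # maxlen 15
prev_audio = deque(maxlen=int(PREV_AUDIO * rel))    # maxlen 7
```

Using RATE // CHUNK would make rel an integer, but the int(...) around the product is still needed because PREV_AUDIO is a fraction.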
Mainak Biswas
yep..I typecasted to ‘int’. I will try what u said. THANKS FOR THE REPLY 🙂
Sophie
Great! Let me know how it goes. 🙂
Zorawar Singh
Hi Sophie, can I use this code with an audio file in WAV format?
Sophie
Hi Zorawar, this already reads .wav file formats. Are you looking for a different encoding?
Mikener
Hi Sophie,
great work you have done. Maybe you can help me:
* Mic set up and listening.
Nothing happens after that.
Use: Python 2.7 , pyaudio 0.2.11, pi3 B,
Tried several numbers with self.INPUT_DEVICE_INDEX in Class SpeechDetector -> but nothing new.
Attached USB-Mic works perfectly with “pocketsphinx_continuous” -command.
Mikener
Okay, here is some debug info from a logging object:
INFO:TestLogger:INITIALIZED
INFO:TestLogger:Getting intensity values from mic.
INFO:TestLogger:r-value: 2181.74425632
INFO:TestLogger: Finished
INFO:TestLogger:cur_data:
INFO:TestLogger:x in slid_win: deque([0.0], maxlen=15)
INFO:TestLogger:cur_data:
INFO:TestLogger:x in slid_win: deque([0.0, 0.0], maxlen=15)
……
The r-value changes when I speak into the mic during setup_mic(),
but it seems there is nothing in cur_data.
Sophie
During the setup_mic() phase, the mic is active and listening to the default intensity values (sound level) of the room. It’ll then set that level as the trigger value for when it’ll start recording speech. So, during the setup phase, try to have the mic be in a fairly quiet room so it’ll trigger during the actual detection phase. Let me know if that helps! 🙂
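To make the role of the quiet room concrete, here is a standalone sketch of the averaging step in setup_mic(): sort the per-chunk intensities, average the loudest 20%, and use that plus some headroom as the trigger level. The function name calibrate_threshold and the margin parameter are hypothetical, and the exact way the original script turns this average into self.THRESHOLD may differ. If you talk during calibration, the loudest 20% will be your voice, so the trigger level can end up above normal speaking volume and recording never starts.

```python
def calibrate_threshold(intensities, top_fraction=0.2, margin=100):
    """Estimate a trigger level from intensities recorded in a quiet room.

    Averages the loudest top_fraction of the per-chunk values, as setup_mic()
    does, then adds a hypothetical margin of headroom.
    """
    values = sorted(intensities, reverse=True)
    n = max(1, int(len(values) * top_fraction))  # average at least one value
    room_level = sum(values[:n]) / n
    return room_level + margin
```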
gundu
Hi,
I am running my code on Windows. After a few corrections to my code, I am able to execute the Python script, and here is the output (for a few runs). I also want to print the grammar and the spoken words. How do I get them?
Mic set up and listening.
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘
‘, ‘[SPEECH]’]Listening …
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘
‘, ‘‘]Listening …
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘
‘, ‘ugh’, ‘‘]Listening …
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘
‘, ‘[SPEECH]’, ‘‘]Listening …
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘
‘, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘ugh’, ‘‘]Listening …
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘
‘, ‘[SPEECH]’, ”, ‘and’, ‘[SPEECH]’, ‘‘]Listening …
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘
‘, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ‘and’, ‘[SPEECH]’, ”, ‘that’, ”, ‘that’, ”, ‘bad’, ”, ‘‘]
gundu
how to print the recognition result?
Sophie
I’m not sure about grammar, but the [SPEECH] bracket means that the decoder couldn’t interpret the words that it was hearing. I’d suggest retraining the decoder, or listening to the recorded voice files to see if there’s an issue with the clarity of the sound file you’re passing to the decoder.
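On the “how do I print the result” question: the DETECTED lists mix real words with marker tokens. As far as I know, in the CMU en-us model <s> and </s> mark the start and end of an utterance, <sil> is silence, and bracketed entries like [SPEECH] or [NOISE] are fillers. A minimal sketch, assuming you already have the list of segment strings that the script prints, that keeps only the words (the function name words_only is hypothetical, and treating every <...> or [...] token as a marker is an assumption):

```python
import re

# Treat any token wrapped in <...> or [...] as a marker, not a word.
# This rule is an assumption; adjust it if your dictionary differs.
MARKER = re.compile(r"^(<[^>]*>|\[[^\]]*\])$")


def words_only(segments):
    """Return the recognized words from a list of segment strings."""
    words = []
    for token in segments:
        token = token.strip()
        if not token or MARKER.match(token):
            continue  # skip empty strings, <s>, </s>, <sil>, [SPEECH], ...
        # Alternate pronunciations may appear as e.g. "that(2)".
        words.append(re.sub(r"\(\d+\)$", "", token))
    return " ".join(words)
```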
RookieConverter
Hello Sophie,
As you said, I followed all the instructions you mentioned above:
I have downloaded and compiled both sphinxbase & pocketsphinx inside a folder.
Now when I try to run your program, it does not do anything. Sorry, I don’t know what I am doing wrong.
I’m using Windows and Python 3.5.4.
Also, I am not able to find the folder below:
DATADIR = “C:\Python34\Lib\site-packages\pocketsphinx\test\data”
I am totally new to Python and I am running your code using IDLE. There are no errors as such, but I guess the terminal should show a message like * Mic set up and listening., as you have specified in your run(self) function.
Please Help 🙂 !!
Carl David
To those who are encountering the “new_Decoder returned -1” error, fix the path location of your model in lines 40 and 41 🙂
Carl David
Thanks Sophie for your amazing post 🙂 It helps a lot. To those who are having a hard time running this code in Python 3+ (3.6 specifically) due to big changes in the language, here’s the code:
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *
import os
import pyaudio
import wave
import audioop
from collections import deque
import time
import math
"""
Written by Sophie Li, 2016
http://blog.justsophie.com/python-speech-to-text-with-pocketsphinx/
"""
class SpeechDetector:
    def __init__(self):
        # Microphone stream config.
        self.CHUNK = 1024  # Number of frames to read from the mic each time
        self.FORMAT = pyaudio.paInt16
        self.CHANNELS = 1
        self.RATE = 16000
        self.SILENCE_LIMIT = 1  # Silence limit in seconds. The max amount of seconds where
                                # only silence is recorded. When this time passes the
                                # recording finishes and the file is decoded
        self.PREV_AUDIO = 0.5  # Previous audio (in seconds) to prepend. When noise
                               # is detected, how much of previously recorded audio is
                               # prepended. This helps to prevent chopping the beginning
                               # of the phrase.
        self.THRESHOLD = 4500
        self.num_phrases = -1

        # These will need to be modified according to where the pocketsphinx folder is
        MODELDIR = "pocketsphinx/model"
        DATADIR = "pocketsphinx/test/data"

        # Create a decoder with certain model
        config = Decoder.default_config()
        config.set_string('-hmm', os.path.join(MODELDIR, 'en-us/en-us'))
        config.set_string('-lm', os.path.join(MODELDIR, 'en-us/en-us.lm.bin'))
        config.set_string('-dict', os.path.join(MODELDIR, 'en-us/cmudict-en-us.dict'))

        # Creates decoder object for streaming data.
        self.decoder = Decoder(config)

    def setup_mic(self, num_samples=50):
        """ Gets average audio intensity of your mic sound. You can use it to get
            average intensities while you're talking and/or silent. The average
            is the average of the largest 20% of the intensities recorded.
        """
        print("Getting intensity values from mic.")
        p = pyaudio.PyAudio()
        stream = p.open(format=self.FORMAT,
                        channels=self.CHANNELS,
                        rate=self.RATE,
                        input=True,
                        frames_per_buffer=self.CHUNK)

        values = [math.sqrt(abs(audioop.avg(stream.read(self.CHUNK), 4)))
                  for x in range(num_samples)]
        values = sorted(values, reverse=True)
        r = sum(values[:int(num_samples * 0.2)]) / int(num_samples * 0.2)
        print(" Finished ")
        print(" Average audio intensity is %s " % r)
        stream.close()
        p.terminate()
if r self.THRESHOLD for x in slid_win]) > 0:
                if not started:
                    print("Starting recording of phrase")
                    started = True
                audio2send.append(cur_data)
            elif started:
                print("Finished recording, decoding phrase")
                filename = self.save_speech(list(prev_audio) + audio2send, p)
                r = self.decode_phrase(filename)
                print("DETECTED: %s" % r)

                # Removes temp audio file
                os.remove(filename)
                # Reset all
                started = False
                slid_win = deque(maxlen=int(self.SILENCE_LIMIT * rel))
                prev_audio = deque(maxlen=int(self.PREV_AUDIO * rel))
                audio2send = []
                print("Listening ...")
            else:
                prev_audio.append(cur_data)

        print("* Done listening")
        stream.close()
        p.terminate()


if __name__ == "__main__":
    sd = SpeechDetector()
    sd.run()
Sophie
Hey Carl,
Thanks for getting the code working with Python 3! I’ll add your code as an addendum to my post if that’s ok.
Carl David
Sorry for late reply. Yes you could append the code 🙂 Thank you.
MCC
Hi Carl,
Thanks for sharing your fantastic input.
Can you please re-post your code? I’m running into a few error messages when executing it.
Sophie – Well done for your input as well.
Thanks,
Sophie
Hi MCC, apologies for the late reply. Have you worked through the issues in your code? I did add Carl’s python3.x implementation to the bottom of the post.
majo
Hi Sophie! I hope you know there are people all over the world trying to compile your code
I have a similar issue: when I do the cast that you suggested, another error comes up. I can’t believe that Python 2.7 and 3.6 differ this much.
* Mic set up and listening.
Starting recording of phrase
Finished recording, decoding phrase
Traceback (most recent call last):
File "sophie.py", line 167, in <module>
sd.run()
File "sophie.py", line 145, in run
filename = self.save_speech(list(prev_audio) + audio2send, p)
File "sophie.py", line 86, in save_speech
data = ''.join(data)
TypeError: sequence item 0: expected str instance, bytes found
Any suggestions to fix this one?
Sophie
Hi Majo,
I’m happy that my code has helped!
Try joining with a bytes separator instead, since under Python 3 the recorded chunks are bytes, not str. On line 86:
data = b''.join(data)
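The underlying problem is that in Python 3, stream.read() returns bytes, and bytes can only be joined with a bytes separator. A minimal sketch of a Python 3 save step, assuming 16 kHz, 16-bit, mono audio as in the listing; save_phrase is a hypothetical name, and it writes to a tempfile instead of using the script’s own filename scheme:

```python
import os
import tempfile
import wave


def save_phrase(chunks, rate=16000, sample_width=2, channels=1):
    """Write raw PCM chunks to a temporary .wav file and return its path.

    chunks are bytes objects (what stream.read() returns under Python 3),
    so they are joined with b"" rather than "". sample_width=2 means 16-bit
    samples, matching pyaudio.paInt16.
    """
    data = b"".join(chunks)
    fd, filename = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # wave.open() reopens the file by name
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(data)
    return filename
```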
.Robyn
Sorry to bother you again, but after switching to Anaconda (because I was told it was the best program for beginners such as myself), things are going a bit more smoothly, but I keep getting this error:
runfile('C:/Users/ccatx/Downloads/pystuff/Lib/site-packages/deathtrial1.py', wdir='C:/Users/ccatx/Downloads/pystuff/Lib/site-packages')
Traceback (most recent call last):
File "", line 1, in <module>
runfile('C:/Users/ccatx/Downloads/pystuff/Lib/site-packages/deathtrial1.py', wdir='C:/Users/ccatx/Downloads/pystuff/Lib/site-packages')
File "C:\Users\ccatx\Downloads\pystuff\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\Users\ccatx\Downloads\pystuff\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/ccatx/Downloads/pystuff/Lib/site-packages/deathtrial1.py", line 164, in <module>
sd = SpeechDetector()
File "C:/Users/ccatx/Downloads/pystuff/Lib/site-packages/deathtrial1.py", line 48, in __init__
self.decoder = Decoder(config)
File "C:\Users\ccatx\Downloads\pystuff\Lib\site-packages\pocketsphinx\pocketsphinx.py", line 275, in __init__
this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1
I have redirected the MODELDIR and DATADIR to what I believe are the right paths, and I have put sphinxbase in my pocketsphinx folder (I do not seem to have a stt.py file anywhere on my computer, so I have not added that), but neither has worked.
I doubt this affects anything, but just in case it does, I get a warning sign next to the first two lines that read:
‘from pocketsphinx import *’ used; unable to detect undefined names
‘from sphinxbase import *’ used; unable to detect undefined names
Thanks in advance!
Sophie
Hi Robyn,
Wow, this is a late reply–but better late than never?
The warnings at the bottom are because you’re using wildcard imports, which the Flake8 Python style checker doesn’t like. It’s a style issue, so that shouldn’t have anything to do with the errors you’re seeing.
One thing that catches my attention is the direction of the / and \ when referring to the file directories. Windows usually uses the backslash “\”, while Unix uses “/”, which is what I wrote this code in. You could try changing the direction of the slashes so they fit? I’m not actually sure, since I’ve never used a Windows computer before. If you do switch to backslashes, write the paths as raw strings (with the r prefix) so Python doesn’t read “\t” as a tab:
MODELDIR = r"..\..\tools\pocketsphinx\model"
DATADIR = r"..\..\tools\pocketsphinx\test\data"
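Building the paths from their parts with pathlib sidesteps hand-written backslashes (and accidental escapes like \t) entirely, and picks the right separator for the current OS. A minimal sketch; the ../../tools layout is just the example from this reply, so adjust it to wherever pocketsphinx actually lives on your machine:

```python
from pathlib import Path

# Build paths from parts instead of writing backslashes by hand.
# (The relative "../../tools" layout is only an example.)
MODELDIR = Path("..", "..", "tools", "pocketsphinx", "model")
DATADIR = Path("..", "..", "tools", "pocketsphinx", "test", "data")

# The listing passes plain strings to config.set_string(), hence str().
hmm = str(MODELDIR / "en-us" / "en-us")
lm = str(MODELDIR / "en-us" / "en-us.lm.bin")
dic = str(MODELDIR / "en-us" / "cmudict-en-us.dict")

# For comparison: in a normal literal, "\t" is one TAB character,
# while the raw string r"\t" is a backslash followed by "t".
assert len("\t") == 1 and len(r"\t") == 2
```

Windows file APIs generally accept forward slashes too, so the original "pocketsphinx/model" style of path may well work there as-is.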
Alternatively, I’d recommend doing further coding projects in macOS/Ubuntu, since a lot of coding projects are built for Unix/Linux systems, which will make things a lot easier for you during the learning stages.
Hope this helps! Or maybe is just informative if you’ve already figured it out. ^^’
-Sophie
Robyn
Dear Sophie,
I’m not sure if you’re even on this blog anymore, but I’ve been having a couple problems I can’t figure out:
File "C:\Users\Robyn\Downloads\yikes", line 164, in <module>
sd = SpeechDetector()
File "C:\Users\Robyn\Downloads\yikes", line 21, in __init__
self.FORMAT = pyaudio.paInt16
AttributeError: module 'pyaudio' has no attribute 'paInt16'
I’ve read your other answers on the SpeechDetector but I still couldn’t find a solution. I haven’t seen the paInt16 one, however, and I checked and there is indeed no such file in my pyaudio download.
*Note: In case you noticed, I didn’t name my file ‘yikes’ because of your code (which is actually very nice, by the way); it was just the word I thought of when naming the file.
Sophie
Hi Robyn,
Yep, still here!
A couple things I can think of:
– Wrong version of pyaudio or Python; for the record, I used Python 2.7 and pyaudio-0.2.11, though people have said this works with Python 3.x
– How did you install pyaudio? I did it through pip install pyaudio
– You’ve named another file pyaudio.py and it’s importing the wrong file: see https://stackoverflow.com/questions/13813164/python-import-random-error
– You’re on a Windows machine, and I’ve only tested this code on Ubuntu
In any case, you can get around the issue by replacing pyaudio.paInt16 with the integer 8, which should get you past that problem.
Hope this helps!
Robyn
I tried paInt8 to no avail, so I am looking into downloading an earlier version of pyaudio if possible. For whatever reason, I can’t use pip install (I am using Sublime Text, if that makes any difference), so I downloaded it to my computer normally and then imported it. The version of pyaudio I had was 8.0.2, and yes, I am on Windows 10.
Sophie
Cool, can you try replacing line 23 with self.FORMAT = 8?
Vinay
Hi Sophie,
We don’t want to use a microphone. We have a WAV file which needs to be converted into text. Can you please guide us on how to code this?
Thanks,
Vinay
Sophie
Hey Vinay,
The decode_phrase function on line 95 takes in a .wav file. Perhaps that’s what you’re looking for?
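For a file that’s already recorded, the main thing to check is the format: as far as I know, the default en-us model expects 16 kHz, 16-bit, mono audio. A minimal stdlib-only sketch that validates a WAV file and returns its raw PCM samples (read_pcm_for_decoder is a hypothetical name):

```python
import wave


def read_pcm_for_decoder(path, expected_rate=16000):
    """Return the raw PCM bytes of a WAV file after checking its format.

    Assumes the decoder wants 16-bit mono audio at expected_rate Hz, which
    is what the default en-us model is set up for.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected mono audio, got %d channels" % wf.getnchannels())
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples, got %d-byte samples" % wf.getsampwidth())
        if wf.getframerate() != expected_rate:
            raise ValueError("expected %d Hz, got %d Hz" % (expected_rate, wf.getframerate()))
        return wf.readframes(wf.getnframes())
```

In the pocketsphinx-python bindings, raw samples like these are usually fed to the decoder between decoder.start_utt() and decoder.end_utt() with decoder.process_raw(data, False, True); double-check those calls against the version you have installed.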
Fred
Hi,
I am also getting new_Decoder returned -1 error.
I made sure all the paths are setup correctly and followed the readme guides correctly (hopefully) for all the repositories. I am a Windows 7 user.
Any help would be appreciated!
Sophie
Hmm, that might be a problem because you’re using a different OS. I won’t be able to explicitly help you out, but you could try checking the CMU sphinx forums to see if someone else has successfully used the software…
Otherwise, installing ubuntu is a viable option. It’s free, has a lot of community support, and is linux based which will help if you want to do more coding projects in the future. 😉
Fred
Thanks for the reply!
I actually solved the issue. I had specified the wrong path for the dictionary. Make sure you point to the correct file, people!
Thanks to your help on initial setup, I am now finished training pocketsphinx to recognize what I need and started implementing my application.
Thank you so much Sophie!
Sophie
I’m glad I could help out! 🙂
Rania
Change the path of dirpath.
VINEETH KV
hi sophie,
Traceback (most recent call last):
File "/home/vineeth/pycharm-community-2017.3/helpers/pydev/pydev_run_in_console.py", line 37, in run_file
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/vineeth/PycharmProjects/main/new.py", line 166, in <module>
sd = SpeechDetector()
File "/home/vineeth/PycharmProjects/main/new.py", line 50, in __init__
self.decoder = Decoder(config)
File "/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py", line 324, in __init__
this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1
The above is the output I get.
How do I fix the code so it runs without errors?
Sophie
Hi Vineeth,
In the spirit of discovery, your problem seems similar to ones that others have already posted about in the past. Can you try their solutions?
Klaus
Hey Sophie (:
I installed Alexa access on my Raspberry Pi. Everything is working right; it reacts to “Alexa”. Unfortunately, it has to record the whole time to recognize “Alexa”. I would like to have an offline sst application that recognizes a special name (e.g. Dave) and THEN starts the Alexa application. Do you think that is possible with Sphinx?
Klaus
*offline stt application
Sophie
Hey Klaus,
Yep, entirely possible. CMU has implemented a short example script here: https://github.com/cmusphinx/pocketsphinx/blob/master/swig/python/test/kws_test.py
Note that you’re still gonna have to have the script running though. If you want keyword recognition without having any code running at all, you’re probably going to have to do something analog, like with matching audio signals or something.
deolu
Hi Sophie,
Do I have to use Python for this, or could I just run the command directly in a Linux terminal? That’s what I did.
I am trying to get pocketsphinx to index an audio file already on my machine and search for a keyword within it.
My command:
pocketsphinx_continuous -infile success.wav -hmm en-us -kws_threshold 1e-40 -keyphrase "success" -time yes
The error I got:
INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’live’, VARNORM=’no’, AGC=’none’
ERROR: “acmod.c”, line 79: Folder ‘en-us’ does not contain acoustic model definition ‘mdef’
I checked pocketsphinx, and I do have the mdef file.
Steve
So strange- I’ve been playing around with PocketSphinx myself, for much the same reasons. I happened across your blog looking for some assistance (found it here, BTW, thanks!).
Lo and behold- the “previous post” link is to EOM for double pendulums, some other random thing I just happen to be playing around with in the last few weeks.
Duly bookmarked and favorited!
Sophie
Haha, I’m glad that you liked my posts! I have a pretty eclectic set of interests and that’s probably reflected in this blog.
Maria Villalobos
Is the transcription really working for you, though? I tried an example and I am not getting good results, any ideas?
Sophie
Hmm, that’ll depend on a variety of factors. Inaccuracy could result from noise (either in environment or microphone quality) or in poor correlation of your speech against the data used to train the recognizer. If it’s not working, I would recommend checking the sound samples recorded by commenting out line 150, or retraining the recognizer.
sara
Thank u 🙂
Sophie
Glad I could help! ^_^
Jorge
What if I want to speech-to-text on a pre-recorded .wav file instead of on live audio?
Sophie
Yep, on line 146 replace ‘filename’ with the path of the file you want to load.
Alysa
Hello Sophie, i’m getting an error:
DETECTED: [‘
‘, ‘ah’, ”]Listening …
Traceback (most recent call last):
File "sophie.py", line 168, in <module>
sd.run()
File "sophie.py", line 135, in run
cur_data = stream.read(self.CHUNK)
File "/usr/local/lib/python2.7/site-packages/pyaudio.py", line 608, in read
return pa.read_stream(self._stream, num_frames, exception_on_overflow)
IOError: [Errno -9981] Input overflowed
Alysa
hello, actually in my folder en-us, there isn’t any file named en-us, which is used in line 45. What is the file en-us?
Chrishane
This is what I’m getting as output after compiling
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152609
INFO: ngram_search_fwdtree.c(333): Created 723 root, 152481 non-root channels, 53 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
Getting intensity values from mic.
ALSA lib pcm_dsnoop.c:606:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
Finished
Average audio intensity is 668.187156541
ALSA lib pcm_dsnoop.c:606:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
* Mic set up and listening.
Nothing happens after this. I can’t find the error. Can you help me?
Sophie
Have you verified that the mic is connected to the computer and can be accessed through PyAudio? What you’ve posted leads me to believe that everything is working, but no audio input is being recognized.
Chrishane
Yeah, I was trying to run this program using the laptop’s built-in mic, but after a while I figured out that I had to connect an external mic, and then it worked 🙂 Is there any way to run this program using the laptop’s default mic? What should I do to make that work?
Sophie
Lines 58 and 59 deal with accessing the microphone. You likely have to provide pyaudio the index of your internal microphone. This stackoverflow question might help: https://stackoverflow.com/questions/36894315/how-to-select-a-specific-input-device-with-pyaudio
Farrukh Saleem
Hi, I am creating a speech-to-text application and I want to keep the mic on. Could you please help me with this?
Abinaya
I’m getting an error saying that there is no module named pocketsphinx.pocketsphinx.
Why is that?
The code and pocketsphinx are in the same directory.
Sophie
Hi Abinaya,
Have you followed the installation instructions on the individual repositories for the code? If you’re having import issues, it’s probably because the pip install didn’t fully work, or the Sphinx packages don’t have the correct hierarchy. The physical layout of the folders must be such that:
.
├── pocketsphinx/
└── sphinxbase/
└── stt.py
Hope this helps!
Robyn
Abinaya,
I had the same problem, and simply getting rid of the second “.pocketsphinx” fixed it, so that the line looked like:
from pocketsphinx import *
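If it’s unclear which of these import paths your install actually provides, a small diagnostic can tell you without running the whole script. This sketch (check_modules is a hypothetical helper) uses importlib.util.find_spec, which reports whether Python can locate a module. For a dotted name such as pocketsphinx.pocketsphinx it has to import the parent package first and raises if that parent is missing, so the sketch catches that case and reports it as not found.

```python
import importlib.util


def check_modules(names=("pocketsphinx", "pocketsphinx.pocketsphinx",
                         "sphinxbase", "pyaudio")):
    """Map each module name to True if Python can locate it, else False."""
    found = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Dotted names import their parent package first; a missing
            # parent raises ModuleNotFoundError (a subclass of ImportError).
            found[name] = False
    return found
```

Printing sys.executable next to print(check_modules()) is also worth doing, since IDLE, Anaconda and the system Python may each be a different interpreter with its own installed packages.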
jim
Hi Sophie,
that’s a really nice program. Much shorter & tidier than i’d have expected to work with a monster like Sphinx.
Well done.
i’m on Ubuntu, with python 3+
(& had to change lines 128, 130 & 153 – cast to int.)
Here’s the (thankfully short) stack trace for an error i can’t get past:
line 167, in <module>
sd.run()
line 145, in run
filename = self.save_speech(list(prev_audio) + audio2send, p)
line 86, in save_speech
data = ''.join(data)
TypeError: sequence item 0: expected str instance, bytes found
Any inspiration?
Sophie
Hi Jim,
Sorry for the delay. Have you tried joining the data with a bytes separator on line 86, i.e. data = b''.join(data)? Under Python 3 the audio chunks are bytes, not strings. I haven’t tested this code on Python 3+ yet, so other type issues might come up.
Shishira Shastri H
Hi Sophie,
Thanks for the code.
When i run the file, it prints: * Mic set up and listening.
And after that nothing happens… I tried printing the value of slid_win variable, it prints while the while loop runs infinitely…. Could you please tell me when the recording will be stopped ?
Or is there a way to stop it ?
Shishira Shastri H
Also, I’m testing this on Windows machine and not Linux.
Thanks!!
Sophie
Hi Shishira,
Sorry it took so long to get back to you. If you’re still having issues, I can think of a couple of places where your code could be erroring:
1. Are you sure your microphone is connected to the computer and accessible by the program?
2. On line 52, the setup_mic function sets a threshold noise level for the mic. Are you letting the microphone sit in a quiet environment when the code is first run so the correct threshold can be set?
Hope this helps, good luck!
Gopi
Hey I am getting this error.
prev_audio = deque(maxlen=self.PREV_AUDIO * rel)
TypeError: an integer is required
I changed self.PREV_AUDIO = 1
instead of 0.5
Now no error, but having the above situation Shishira encountered.
Sophie
Hey Gopi,
You should be casting that entire expression to an integer, instead of changing only self.PREV_AUDIO:
prev_audio = deque(maxlen=int(self.PREV_AUDIO * rel))
Aji
hello Sophie
My model dir is located in /home/pocketsphinx/model,
and inside the en-us/en-us dir there is an mdef file,
but when the program runs, that mdef file is not detected.
Can you help me?
Many thanks!
BTW, I use Ubuntu 16.04 with Python 2.7.
Sophie
Hi Aji,
Have you made sure that the file path is correct in the program as well? That would be on lines 40 and 41 of the program.
Ambrose Douglas
Hi, so I got everything working fine. I’m just curious if anyone has had this work well enough for any practical use? If I could give my computer simple commands I would be very happy, but I can’t seem to get more words than simple ones like “you”, “it”, “are”, etc.
Do I need to find a different model?
any pointers would be awesome!
Sophie
Hey Ambrose,
What are you trying to get it to recognize? There are a couple ways to improve accuracy:
1. Reduce the size of the recognition dictionary, i.e., if you only need the STT engine to recognize a small set of words instead of the entire English language, you can increase accuracy by deleting the words you don’t need from the dictionary. The location of the dictionary is found on line 47 in the code.
2. Adapting the acoustic model to be more accurate to the sound of your voice. Instructions for that can be found here: http://cmusphinx.sourceforge.net/wiki/tutorialadapt
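For the first option, the dictionary is a plain-text file with one entry per line: the word, then its phonemes, with alternate pronunciations written as word(2), word(3), and so on. A minimal sketch of filtering it down to a small vocabulary (filter_dictionary is a hypothetical helper, and the word list is up to you):

```python
def filter_dictionary(lines, keep_words):
    """Yield the pronunciation-dictionary lines whose word is in keep_words.

    Each line looks something like "hello HH AH L OW"; alternate
    pronunciations are listed as "hello(2) ...", so the "(n)" suffix is
    dropped before the lookup.
    """
    keep = {w.lower() for w in keep_words}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0].split("(")[0].lower()
        if word in keep:
            yield line.rstrip("\n")
```

You’d then write the kept lines to a new file and point -dict at it. A language model or grammar restricted to the same words is likely needed too, since the default en-us.lm.bin still covers the full vocabulary.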
Hope this helps,
-Sophie
Josef
Sophie,
You might want to look at io.BytesIO, instead of saving to a temporary file. This will keep the array in memory, even better you can pass the entire buffer to the recognizer bypassing the need to save it altogether.
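Josef’s suggestion can be sketched with the standard library alone: wave.open() accepts a file-like object, so the WAV file can be built in an io.BytesIO buffer instead of on disk. This is a sketch under the same 16 kHz, 16-bit, mono assumption as the listing; phrase_to_wav_bytes is a hypothetical name. If the decoder is fed raw samples directly, b"".join(chunks) on its own is enough and no WAV header is needed at all.

```python
import io
import wave


def phrase_to_wav_bytes(chunks, rate=16000, sample_width=2, channels=1):
    """Return a complete WAV file (header + samples) built in memory.

    chunks is a list of raw PCM byte strings, e.g. from stream.read().
    """
    buf = io.BytesIO()
    # wave.open() accepts a file-like object; closing the writer finalizes
    # the header but leaves the BytesIO itself open.
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(chunks))
    return buf.getvalue()
```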
Sophie
Oh, very interesting! That does seem more efficient than saving to a temp file, I’ll keep it in mind for future iterations of this code.
Rahul Vansh
Could you please provide a little guidance on how to set MODELDIR and DATADIR?
Rahul Vansh
When I run this code, it shows the error below. Please give me a solution for this error.
INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
ERROR: “acmod.c”, line 83: Folder ‘pocketsphinx/model/en-us/en-us’ does not contain acoustic model definition ‘mdef’
Traceback (most recent call last):
File "Test.py", line 17, in <module>
decoder = pocketsphinx.Decoder(config)
File "/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py", line 266, in __init__
this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1
Abu
Hi, can you please give me some help? I ran into a similar problem.
john
INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’live’, VARNORM=’no’, AGC=’none’
ERROR: “acmod.c”, line 79: Folder ‘../../tools/pocketsphinx/model/en-us/en-us’ does not contain acoustic model definition ‘mdef’
Traceback (most recent call last):
File "stt.py", line 166, in <module>
sd = SpeechDetector()
File "stt.py", line 50, in __init__
self.decoder = Decoder(config)
File "/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py", line 332, in __init__
this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1
Hello,
I get this error. Any thoughts?
Sophie
You can kinda see the problem in the error message:
You need to change lines 40 and 41 so that MODELDIR and DATADIR refer to the actual location of the files.
Hope it helps!
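In this thread, new_Decoder returned -1 has mostly turned out to be a path problem, and the “does not contain acoustic model definition ‘mdef’” line points at the -hmm folder. A quick check before building the Decoder can make that obvious. This sketch (missing_model_files is a hypothetical helper) checks the same three paths that the listing passes to -hmm, -lm and -dict; it only looks for the mdef file inside the acoustic model folder, so it’s a sanity check rather than a full validation.

```python
import os


def missing_model_files(modeldir):
    """List the model files from the listing's config that don't exist.

    Only the mdef file is checked inside the acoustic model folder, so an
    empty result is a sanity check, not proof the model is complete.
    """
    expected = {
        "-hmm": os.path.join(modeldir, "en-us", "en-us", "mdef"),
        "-lm": os.path.join(modeldir, "en-us", "en-us.lm.bin"),
        "-dict": os.path.join(modeldir, "en-us", "cmudict-en-us.dict"),
    }
    return ["%s: %s" % (flag, path)
            for flag, path in expected.items()
            if not os.path.isfile(path)]
```

For example, call it with MODELDIR right before Decoder(config) and stop with a clear message if the returned list isn’t empty.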
Cookiecrunch
ERROR: “acmod.c”, line 83: Folder ‘C:\Python27\Lib\site-packages\pocketsphinx\model\en-us\en-us’ does not contain acoustic model definition ‘mdef’
Traceback (most recent call last):
File "sophierun.py", line 326, in <module>
sd = SpeechDetector()
File "sophierun.py", line 94, in __init__
self.decoder = Decoder(config)
File "C:\Python27\lib\site-packages\pocketsphinx\pocketsphinx.py", line 277, in __init__
this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1
I have changed MODELDIR and DATADIR so that they refer the actual path of the files. Still I am getting this error. How do I rectify this?
Sophie
Hmm, you’re running this code on a windows machine, so I can’t fully vouch that this code will work. I can think of two things.
1. The physical layout of the folders must be such that:
.
├── pocketsphinx/
├── sphinxbase/
└── stt.py
Have you verified that?
2. If you enter the C:\Python27\Lib\site-packages\pocketsphinx\model\en-us\en-us path in your file explorer, does it actually take you to the folder where the mdef file can be found?
Hope this helps, good luck!
Hector
I am working on a Mac, and although I installed pocketsphinx with pip install, it recognizes neither pocketsphinx nor sphinxbase. I don’t have any folder containing both, but if I run pip freeze I see pocketsphinx.
leoGalani
Worked like a charm 🙂
Thanks!
renato gallo
./tardis.py
Traceback (most recent call last):
File “./tardis.py”, line 166, in <module>
sd = SpeechDetector()
File “./tardis.py”, line 44, in __init__
config = Decoder.default_config()
AttributeError: type object ‘pocketsphinx.Decoder’ has no attribute ‘default_config’
Eisen
How do I install pocketsphinx-python on Raspbian Jessie?
Raj
sudo apt-get install python-pocketsphinx
Daryll
Wow, nice tut 🙂 Does this work in Python 3.4?
Sophie
Aside from a few changes to the print statements and such, the code should be Python 3.4 compatible. It’s currently written for Python 2.7, though.
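The print changes are the main thing: under Python 2.7, print("DETECTED: ", words) prints a tuple, which is why the DETECTED output further down in these comments appears wrapped in parentheses. A minimal sketch of one way to make the same call behave identically on 2.7 and 3.x, assuming you are willing to add a __future__ import at the very top of the file; report() is a hypothetical helper, not part of the script.

```python
# Must come before any other code in the file; it is a no-op on Python 3.
from __future__ import print_function

import sys


def report(words, stream=None):
    """Print detected words the same way on Python 2.7 and 3.x."""
    print("DETECTED:", words, file=stream or sys.stdout)


if __name__ == "__main__":
    report(["hello", "world"])  # prints: DETECTED: ['hello', 'world']
```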
Daryll
I had a problem following the installation process for Sphinx with Visual Studio. I followed the instructions and built it using Visual Studio 2015, but I got this error:
TRACKER : error TRK0005: Failed to locate: “CL.exe”. The system cannot find the file specified
Sophie
I wrote the above code for Ubuntu 14.04, so while it might work on Unix-based OSes like OS X or other Linux distros, I can’t say for sure how it would work on Windows.
There are probably some libraries missing during the installation phase that aren’t covered in my installation instructions. You could try following the Windows install directions from the CMU Sphinx website directly to see if it’ll help with that issue. Here: http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx#windows
Daryll
I’m having a hard time installing Sphinx on my 64-bit Windows machine. 🙁 I get this error:
Traceback (most recent call last):
File “C:\Python27\pocketsphnx.py”, line 1, in <module>
from pocketsphinx.pocketsphinx import *
File “C:\Python27\lib\site-packages\pocketsphinx\__init__.py”, line 35, in <module>
from sphinxbase import *
File “C:\Python27\lib\site-packages\sphinxbase\__init__.py”, line 32, in <module>
from .ad import *
File “C:\Python27\lib\site-packages\sphinxbase\ad.py”, line 35, in <module>
_ad = swig_import_helper()
File “C:\Python27\lib\site-packages\sphinxbase\ad.py”, line 34, in swig_import_helper
return importlib.import_module(‘_ad’)
File “C:\Python27\lib\importlib\__init__.py”, line 37, in import_module
__import__(name)
ImportError: No module named _ad
Sophie
Sorry, I’ve never done installations on Windows, so I won’t be able to help you much with that. 🙁 My suggestion would be to dual-boot Ubuntu 14.04/16.04 or run it in a virtual machine such as VirtualBox, so you’d be able to follow the instructions as is, or Google your error to see if other people have solved it before.
Daryll
I am running on Windows. I have followed the tutorial on how to install sphinxbase and pocketsphinx, and downloaded Visual Studio 2012 Express, but I still get this error: sphinx error; missing pocketsphinx module: ensure that pocketsphinx is set up correctly.
David
Sophie,
I am working on a voice recognition project and came across your code base. I got it up and running with no problems, but was wondering if you could provide some insight into the specifics of the INFO: outputs.
I also noticed it transitions pretty quickly from Listening… to Starting the recording… to Finishing the recording. Most of the time this seems to happen in the middle of testing speech recognition, and I have to time when to speak. I also notice that sometimes the output is just [SPEECH], other times just
even though I was speaking, and other times when there is no noise there is speech output being displayed. Below is some of the output.
Listening …
Starting recording of phrase
Finished recording, decoding phrase
INFO: cmn_live.c(88): Update from
INFO: cmn_live.c(105): Update to
INFO: cmn_live.c(88): Update from
INFO: cmn_live.c(105): Update to
INFO: cmn_live.c(120): Update from
INFO: cmn_live.c(138): Update to
INFO: ngram_search_fwdtree.c(1550): 24051 words recognized (32/fr)
INFO: ngram_search_fwdtree.c(1552): 2808025 senones evaluated (3784/fr)
INFO: ngram_search_fwdtree.c(1556): 19077058 channels searched (25710/fr), 489688 1st, 672158 last
INFO: ngram_search_fwdtree.c(1559): 37370 words for which last channels evaluated (50/fr)
INFO: ngram_search_fwdtree.c(1561): 1405708 candidate words for entering last phone (1894/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 6.07 CPU 0.818 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 6.09 wall 0.821 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 473 words
INFO: ngram_search_fwdflat.c(948): 16143 words recognized (22/fr)
INFO: ngram_search_fwdflat.c(950): 996651 senones evaluated (1343/fr)
INFO: ngram_search_fwdflat.c(952): 1704946 channels searched (2297/fr)
INFO: ngram_search_fwdflat.c(954): 83335 words searched (112/fr)
INFO: ngram_search_fwdflat.c(957): 45835 word transitions (61/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.57 CPU 0.077 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.57 wall 0.077 xRT
INFO: ngram_search.c(1250): lattice start node
.0 end node.669
INFO: ngram_search.c(1276): Eliminated 1 nodes before end node
INFO: ngram_search.c(1381): Lattice has 2546 nodes, 23747 links
INFO: ps_lattice.c(1380): Bestpath score: -24784
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:669:740) = -1342034
INFO: ps_lattice.c(1441): Joint P(O,S) = -1492405 P(S|O) = -150371
INFO: ngram_search.c(1027): bestpath 0.09 CPU 0.012 xRT
INFO: ngram_search.c(1030): bestpath 0.09 wall 0.012 xRT
('DETECTED: ', ['', '[SPEECH]', '', '', "what's(2)", 'this', 'and(2)', 'he', '', '', '[SPEECH]', '', 'was(2)', ''])
Listening …
Starting recording of phrase
Finished recording, decoding phrase
INFO: cmn_live.c(88): Update from
INFO: cmn_live.c(105): Update to
INFO: cmn_live.c(88): Update from
INFO: cmn_live.c(105): Update to
INFO: cmn_live.c(88): Update from
INFO: cmn_live.c(105): Update to
INFO: cmn_live.c(88): Update from
INFO: cmn_live.c(105): Update to
INFO: cmn_live.c(120): Update from
INFO: cmn_live.c(138): Update to
INFO: ngram_search_fwdtree.c(1550): 39722 words recognized (37/fr)
INFO: ngram_search_fwdtree.c(1552): 3494277 senones evaluated (3296/fr)
INFO: ngram_search_fwdtree.c(1556): 22075849 channels searched (20826/fr), 576804 1st, 1109420 last
INFO: ngram_search_fwdtree.c(1559): 60213 words for which last channels evaluated (56/fr)
INFO: ngram_search_fwdtree.c(1561): 1900155 candidate words for entering last phone (1792/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 6.89 CPU 0.650 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 6.89 wall 0.650 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 705 words
INFO: ngram_search_fwdflat.c(948): 24076 words recognized (23/fr)
INFO: ngram_search_fwdflat.c(950): 1527037 senones evaluated (1441/fr)
INFO: ngram_search_fwdflat.c(952): 2894866 channels searched (2731/fr)
INFO: ngram_search_fwdflat.c(954): 142217 words searched (134/fr)
INFO: ngram_search_fwdflat.c(957): 73064 word transitions (68/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.97 CPU 0.092 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.97 wall 0.092 xRT
INFO: ngram_search.c(1250): lattice start node
.0 end node.1055
INFO: ngram_search.c(1276): Eliminated 0 nodes before end node
INFO: ngram_search.c(1381): Lattice has 3021 nodes, 31799 links
INFO: ps_lattice.c(1380): Bestpath score: -40103
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:1055:1058) = -2135522
INFO: ps_lattice.c(1441): Joint P(O,S) = -2386068 P(S|O) = -250546
INFO: ngram_search.c(1027): bestpath 0.15 CPU 0.014 xRT
INFO: ngram_search.c(1030): bestpath 0.15 wall 0.014 xRT
('DETECTED: ', ['', 'i', 'have', 'a', 'somewhat(2)', 'is', 'the', 'weather', 'in', 'now', '', "it's", '', '', '', 'just', 'what(2)', ''])
Sophie
Hey David!
Glad you got the code working!
The short delay is likely due to the setup_mic function on lines 52-77. While this code is running, the mic records a sound sample for a while and sets the base threshold from the average amplitude of that sample. So, when the code first starts, you’d want the microphone to be as close to a “neutral” sound level as possible. You can alter the values in that function to tune the thresholding to better suit your setup.
Since [SPEECH] is a placeholder for a sound that the recognizer couldn’t classify, you might want to listen to the sound samples that are being recorded to see if the results make any sense. You can comment out line 150 if you want to do that. Hope this helps!
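If you want to keep the recordings around to listen to, the saving step can be sketched with the standard-library wave module. This is a sketch under assumptions, not the script’s own code: it assumes 16-bit mono audio at 16 kHz, and save_phrase and finish_phrase are hypothetical names. The keep flag plays the role of skipping the clean-up step.

```python
import os
import tempfile
import wave


def save_phrase(frames, rate=16000, sample_width=2, channels=1, directory=None):
    """Write raw PCM chunks to a uniquely named .wav file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".wav", dir=directory)
    os.close(fd)  # wave reopens the file by name below
    wf = wave.open(path, "wb")
    try:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # 2 bytes = 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    finally:
        wf.close()
    return path


def finish_phrase(path, keep=False):
    """Delete the temporary file unless keep=True, e.g. to listen to it later."""
    if not keep:
        os.remove(path)
    return path
```

Calling finish_phrase(path, keep=True) leaves the .wav on disk so you can open it in any audio player and compare what was recorded with what the decoder returned.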
David
Thanks. Can you explain what effect either increasing or decreasing the 0.2 avg value will have, along with the 3500 threshold value?
I haven’t changed any of the default values yet but notice I see the following quite a bit:
ERROR: “ngram_search.c”, line 1139: Couldn’t find
in first frame
I also see various tags when there is an output, such as
or what is the significance of them, and how can I prevent those tags from being displayed?
Thanks
Sophie
3500 is the minimum threshold value, so changing it will affect the minimum sound threshold during the mic setup method (i.e., if you’re recording in a really quiet environment, you still want some sort of threshold). The 0.2 constant is the fraction of the loudest recorded amplitudes that get averaged, so increasing it means the threshold is determined from an average over more of the samples. So if your mic is prone to random spikes in amplitude due to noise, it would be better to increase the constant.
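To make the effect of those two numbers concrete, here is a minimal sketch of that thresholding logic, written from the description above rather than copied from the script. The function name and the idea of passing in a ready-made list of intensities are assumptions; in the real setup_mic the values come from the microphone.

```python
def compute_threshold(intensities, fraction=0.2, min_threshold=3500):
    """Estimate a silence threshold from ambient-noise intensity readings.

    Averages the loudest `fraction` of the readings, then never lets the
    result drop below `min_threshold` (useful in very quiet rooms).
    """
    if not intensities:
        return min_threshold
    count = max(1, int(len(intensities) * fraction))
    loudest = sorted(intensities, reverse=True)[:count]
    average = sum(loudest) / float(count)
    return max(min_threshold, average)


if __name__ == "__main__":
    ambient = [100] * 8 + [5000, 6000]  # quiet room with two noise spikes
    print(compute_threshold(ambient))                # 5500.0 (top 2 readings)
    print(compute_threshold(ambient, fraction=0.5))  # 3500 (top 5, floor wins)
```

With 0.2, the two spikes dominate the average; averaging the loudest half instead dilutes them, and the 3500 floor takes over.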
As for the tags, a single word may have different pronunciations. So when you see something like was(2), it’s likely referring to pronunciation 2 in the word dictionary. You could manually strip these tags using a list comprehension.
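A minimal sketch of that clean-up, assuming the decoder hands back a plain list of word strings like the DETECTED output above. It pairs a regular expression with a list comprehension; treating anything that starts with < or [ as a filler tag is an assumption, so adjust the rule if your dictionary contains real words in brackets.

```python
import re

# A trailing pronunciation-variant marker such as the "(2)" in "was(2)".
VARIANT = re.compile(r"\(\d+\)$")


def clean_words(words):
    """Strip variant markers and drop empty tokens and filler tags."""
    stripped = [VARIANT.sub("", w) for w in words]
    return [w for w in stripped if w and w[0] not in "<["]


if __name__ == "__main__":
    raw = ["", "[SPEECH]", "what's(2)", "this", "and(2)", "<sil>", "was(2)"]
    print(clean_words(raw))  # ["what's", 'this', 'and', 'was']
```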
Daryll
Sir David,
Can you show us how you do the voice recognition? I am new to Python and I am planning on building my own AI. Hope you can help me, thanks 🙂
David
Daryll,
I apologize for my use of ‘voice recognition’; I meant speech recognition… there is a big difference.
I am not focusing on having the system differentiate between physical human speakers… my focus is on having the system correctly interpret and execute tasks based on human speech input.
Sorry for any confusion.
-David
David
Sophie,
I’m running into an error and am curious to know if there is a pocketsphinx-python version that will run with Python 3.0-3.5.
I have existing 3.x functionality, but when I attempt to incorporate pocketsphinx-python I get the following error:
ImportError: //pocketsphinx-python/sphinxbase/_ad.so: undefined symbol: PyInstance_Type
I’m an old Java programmer, and when I say old, I’m talking about JDK version 1.4; I’m not familiar with C. From what I’ve found through searching, I think the issue is the Python version the .so file was built against. If I switch to version 2.x it works, but then my existing code won’t.
Any input would be appreciated.
Harshit
Hey Sophie,
I am using Ubuntu 14.04 and Python 2.7 and have installed pocketsphinx using `sudo apt-get install python-pocketsphinx`, but I am getting the error `no module named pocketsphinx` on the third line.
Is there a way around this?
Sophie
Hmm, did you follow the instructions on the GitHub README in full? You might need to use pip to get the correct paths set.
There are several things that need to be installed for pocketsphinx to be imported correctly:
sudo apt-get install -y python python-dev python-pip build-essential swig git
sudo pip install pocketsphinx
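Before running the script, it can help to confirm which copies of the modules Python will actually import. A small Python 3.4+ sketch using importlib.util.find_spec; the module list is an assumption based on what the script imports, and check_modules is a hypothetical helper. find_spec only locates the package, so an import can still fail later if a shared library such as libpocketsphinx.so.3 is missing.

```python
import importlib.util


def check_modules(names=("pocketsphinx", "sphinxbase", "pyaudio")):
    """Map each top-level module name to the file it would load from, or None."""
    found = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        found[name] = spec.origin if spec is not None else None
    return found


if __name__ == "__main__":
    for name, origin in sorted(check_modules().items()):
        print("%-12s %s" % (name, origin or "NOT FOUND"))
```

If a module shows up under a different Python’s site-packages than the interpreter you run the script with, that mismatch is a likely cause of “no module named pocketsphinx”.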
Hasib
While executing “build-essential swig git”, the following error is shown:
build-essential: command not found
but build-essential and swig are both installed.
Rob
Hi Sophie, thanks so much for sharing! I have an issue, if you don’t mind taking a look:
runtimeerror: new decoder returned -1
Any ideas?
Many thanks
Sophie
Hi Rob,
On an initial guess it may be because your folders are not organized correctly or you didn’t correctly install all the modules. The physical layout of the folders must be such that:
.
├── pocketsphinx/
├── sphinxbase/
└── stt.py
What operating system are you running this code on? I’ve only tested it on Ubuntu 14.04 using Python 2.7.
John
hello,
I’m getting an invalid sample rate error when I run it with a rate of 16000. It works with my mic’s default sampling rate of 48000, but then it can’t recognize words. Could it be a PyAudio issue? How do I configure it to work with this script?
Traceback (most recent call last):
File “/home/pi/stt.py”, line 174, in <module>
sd.run()
File “/home/pi/stt.py”, line 121, in run
self.setup_mic()
File “/home/pi/stt.py”, line 70, in setup_mic
frames_per_buffer=self.CHUNK)
File “build/bdist.linux-armv7l/egg/pyaudio.py”, line 750, in open
stream = Stream(self, *args, **kwargs)
File “build/bdist.linux-armv7l/egg/pyaudio.py”, line 441, in __init__
self._stream = pa.open(**arguments)
IOError: [Errno -9997] Invalid sample rate
Sophie
Hey there,
You probably knew this, but Sphinx’s default acoustic model expects audio sampled at 16 kHz, which is why it won’t recognize words if you pass it audio sampled at 48 kHz. 16000 shouldn’t be an invalid sampling rate, so it probably has something to do with how your mic is set up. I did some looking around, and it seems like people on the Raspberry Pi forums have had the same issue:
https://www.raspberrypi.org/forums/viewtopic.php?t=63136&p=468103
https://www.raspberrypi.org/forums/viewtopic.php?f=37&t=97702
Alternatively, have you tried using a different microphone? It might give you better results. Good luck!
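If the microphone simply refuses to open at 16000 Hz, another option is to record at 48 kHz and convert to 16 kHz in software before decoding. This is a crude Python 3 sketch, assuming 16-bit mono samples in native byte order (what PyAudio’s paInt16 format delivers) and an exact 3:1 ratio; downsample_int16 is a hypothetical helper. It averages each group of three samples, which acts as a rough low-pass filter before decimating. A proper resampling filter will sound better; this only shows the idea.

```python
import array


def downsample_int16(raw, factor=3):
    """Downsample 16-bit mono PCM by an integer factor (e.g. 48 kHz -> 16 kHz).

    Each output sample is the mean of `factor` consecutive input samples,
    a simple moving-average filter that tames aliasing before decimation.
    Leftover samples that do not fill a whole group are dropped.
    """
    samples = array.array("h")
    samples.frombytes(raw)  # raw must contain an even number of bytes
    out = array.array("h")
    usable = len(samples) - len(samples) % factor
    for i in range(0, usable, factor):
        out.append(int(sum(samples[i:i + factor]) / factor))
    return out.tobytes()
```

In the script, that would mean opening the stream at 48000, passing each chunk through this function, and writing the .wav file at 16000.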
vishwas
How do I get a sampling rate of 16k, or is there another way to take input?
Anup
Hello,
I’m getting the following error when I try to run your script:
slid_win = deque(maxlen=self.SILENCE_LIMIT * rel)
TypeError: an integer is required
Could you please help?
Thanks,
Anup
Sophie
If you’re using Python 3, the / operator always does true division, so rel = self.RATE/self.CHUNK comes out as a float instead of an integer, and deque() only accepts an integer maxlen.
Replace that line with this:
slid_win = deque(maxlen=self.SILENCE_LIMIT * int(rel))
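For context, here is a minimal sketch of how such a sliding window can drive phrase detection, reconstructed from the description in the post rather than copied from the script. The RATE, CHUNK and SILENCE_LIMIT values, the function names, and feeding in a precomputed list of per-chunk intensities are all assumptions. The window length uses int() as in the fix above, so it comes out the same on Python 2 and 3.

```python
from collections import deque

# Assumed values for illustration; use whatever your copy of the script sets.
RATE = 16000         # samples per second
CHUNK = 1024         # samples read from the mic per buffer
SILENCE_LIMIT = 1    # seconds of quiet that end a phrase


def window_length(rate=RATE, chunk=CHUNK, silence_limit=SILENCE_LIMIT):
    """Number of chunks that cover SILENCE_LIMIT seconds of audio."""
    rel = rate / chunk  # 15.625 on Python 3, 15 on Python 2
    return silence_limit * int(rel)


def detect_phrases(intensities, threshold, maxlen):
    """Split per-chunk intensities into (start, end) chunk ranges of speech.

    A phrase starts when any value in the sliding window exceeds the
    threshold and ends once the whole window has gone quiet again.
    """
    slid_win = deque(maxlen=maxlen)
    phrases = []
    start = None
    for i, value in enumerate(intensities):
        slid_win.append(value)
        loud = any(v > threshold for v in slid_win)
        if loud and start is None:
            start = i
        elif not loud and start is not None:
            phrases.append((start, i))
            start = None
    if start is not None:  # input ended mid-phrase
        phrases.append((start, len(intensities)))
    return phrases
```

Using a window instead of a single chunk means one quiet buffer in the middle of a word does not end the phrase; only SILENCE_LIMIT seconds of continuous quiet does.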
Rodrigo
Hi Sophie, as you said, I tried installing the packages again.
Now, when I’m installing pocket…python, I get an error at the end of the output:
~/pocketsphinx-python $ sudo python setup.py install
running install
running bdist_egg
running egg_info
writing pocketsphinx.egg-info/PKG-INFO
writing top-level names to pocketsphinx.egg-info/top_level.txt
writing dependency_links to pocketsphinx.egg-info/dependency_links.txt
error: package directory ‘pocketsphinx/swig/python’ does not exist
“error: package directory ‘pocketsphinx/swig/python’ does not exist”, but this directory already exists. My directory structure is:
/home/user/pocketsphinx
/home/user/sphinxbase
/home/user/pocketsphinx-python
Is it wrong?
If I try to run the script, I get this:
$ python sample.py
Traceback (most recent call last):
File “sample.py”, line 3, in <module>
from pocketsphinx.pocketsphinx import *
File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/__init__.py”, line 37, in <module>
from pocketsphinx import *
File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 42, in <module>
_pocketsphinx = swig_import_helper()
File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 38, in swig_import_helper
_mod = imp.load_module(‘_pocketsphinx’, fp, pathname, description)
ImportError: libpocketsphinx.so.3: cannot open shared object file: No such file or directory
Sophie
For the first one: is there a reason you’re not using pip? Since it’s the package installer for Python, it might be easier to install pocketsphinx-python that way. Your directory structure is correct, though. The error is saying you don’t have swig installed; did you follow all of the install instructions?
sudo apt-get install -y python python-dev python-pip build-essential swig git
sudo pip install pocketsphinx
For the second: that’s the same issue you had previously, right? The one you solved by exporting LD_LIBRARY_PATH?
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
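If you’re not sure whether the export took effect in the shell you’re running Python from, a quick check is to look for the library file in the directories listed on LD_LIBRARY_PATH. This is a hypothetical helper, not part of the script, and only a heuristic: the real dynamic loader also consults the ld.so cache (refreshed by sudo ldconfig) and the system default directories, so “not found” here does not always mean the import will fail.

```python
import os


def find_shared_library(filename, extra_dirs=()):
    """Return the first path on LD_LIBRARY_PATH (then extra_dirs) holding filename."""
    search = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    search.extend(extra_dirs)
    for directory in search:
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    for lib in ("libsphinxbase.so.3", "libpocketsphinx.so.3"):
        print("%s -> %s" % (lib, find_shared_library(lib) or "not on LD_LIBRARY_PATH"))
```

Remember that export only affects the current shell session; a new terminal needs the export again (or a line in your shell profile).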
Rodrigo
I tried pip and git clone; swig is already installed.
I just forgot about the export =/
But even after doing this, I get errors:
Traceback (most recent call last):
File “sample.py”, line 162, in <module>
sd.run()
File “sample.py”, line 109, in run
self.setup_mic()
File “sample.py”, line 58, in setup_mic
frames_per_buffer=self.CHUNK)
File “/usr/local/lib/python2.7/dist-packages/pyaudio.py”, line 750, in open
stream = Stream(self, *args, **kwargs)
File “/usr/local/lib/python2.7/dist-packages/pyaudio.py”, line 441, in __init__
self._stream = pa.open(**arguments)
IOError: [Errno -9996] Invalid input device (no default output device)
Sophie
Okay, so that error is saying that it can’t find any of your microphones.
You can check to see if the microphone is working outside of the script by following these instructions.
If you have multiple input devices, you might have to modify line 117 so that PyAudio initializes with the correct microphone. You can take a look at the documentation here.
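To see which devices PyAudio can record from, along with their default sample rates, a sketch along these lines can help. get_device_count(), get_device_info_by_index() and the input_device_index argument of open() are part of PyAudio; the helper names are hypothetical, and the filtering is split out so it can be tried without a microphone attached.

```python
def pick_input_devices(device_infos):
    """Return (index, name, default rate) for every device that can record."""
    return [(info["index"], info["name"], int(info["defaultSampleRate"]))
            for info in device_infos
            if info.get("maxInputChannels", 0) > 0]


def list_pyaudio_inputs():
    """Query PortAudio through PyAudio for all recording-capable devices."""
    import pyaudio  # imported here so pick_input_devices works without PyAudio

    p = pyaudio.PyAudio()
    try:
        infos = [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
    finally:
        p.terminate()
    return pick_input_devices(infos)
```

Once you know your microphone’s index, you can pass it when opening the stream, e.g. p.open(..., input=True, input_device_index=index). An empty list from list_pyaudio_inputs() would match the “Invalid input device” error above.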
tushar
Hey, I am running CentOS 7 and having the same problem. Did you sort out yours?
Rodrigo
Is there a way to change the language to Portuguese?
Rodrigo
I’m getting this error:
$ python sample.py
Traceback (most recent call last):
File “sample.py”, line 3, in <module>
from pocketsphinx.pocketsphinx import *
File “sphinxbase.pxd”, line 150, in init pocketsphinx (pocketsphinx.c:7935)
ValueError: PyCapsule_GetPointer called with invalid PyCapsule object
Sophie
That appears to be an error internal to Python or Cython. I need a bit more information:
What version of python are you using?
Are you using Ubuntu? Or another operating system?
On a first pass, it appears that you’ll need to do a reinstall after ensuring that you’ve configured Cython correctly. Maybe these instructions can help?
Rodrigo
Hi, thanks for your answer!!
After trying your tip, I get this error:
$ python teste.py
Traceback (most recent call last):
File “teste.py”, line 3, in <module>
from pocketsphinx.pocketsphinx import *
File “sphinxbase.pxd”, line 150, in init pocketsphinx (pocketsphinx.c:7934)
File “/usr/local/lib/python2.7/dist-packages/sphinxbase/__init__.py”, line 37, in <module>
from sphinxbase import *
File “/usr/local/lib/python2.7/dist-packages/sphinxbase/sphinxbase.py”, line 42, in <module>
_sphinxbase = swig_import_helper()
File “/usr/local/lib/python2.7/dist-packages/sphinxbase/sphinxbase.py”, line 38, in swig_import_helper
_mod = imp.load_module(‘_sphinxbase’, fp, pathname, description)
ImportError: libsphinxbase.so.3: cannot open shared object file: No such file or directory
My Python version is:
$ python --version
Python 2.7.6
I’m using Linux Mint:
$ uname -a
Linux LinuxMint 3.19.0-32-generic #37~14.04.1-Ubuntu SMP Thu Oct 22 09:41:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Thank you so much.
Rodrigo
Ok, this last one I resolved with this:
$ export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
But now i’m getting this:
$ python teste.py
Traceback (most recent call last):
File “teste.py”, line 3, in <module>
from pocketsphinx.pocketsphinx import *
File “sphinxbase.pxd”, line 150, in init pocketsphinx (pocketsphinx.c:7934)
ValueError: sphinxbase.NGramModel has the wrong size, try recompiling
=/ this script does not like me
Sophie
That does seem to be a Cython issue, and that would be internal to the pocketsphinx library, not the script that I posted. (You can tell because it’s failing at the import step before getting to any of the actual code :P)
Have you tried uninstalling all the sphinx libraries and reinstalling? It’s kind of annoying, but it might fix your problem.
Sophie
You can change the language by changing the config values you set. Modify lines 45-47 to use the language files you’d like (in this case, Portuguese).
Of course, you’ll need Portuguese model files for this to work. Take a look at the information provided here for implementation details.
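As a rough sketch of what those lines need to end up doing: point -hmm, -lm and -dict at the other language’s acoustic model directory, language model and pronunciation dictionary. The helpers and all directory and file names are placeholders, not real download names. config is expected to be the object returned by Decoder.default_config(), and the sketch assumes its set_string method is how these options are set.

```python
import os


def model_settings(model_dir, acoustic, lm, dictionary):
    """Build the -hmm/-lm/-dict options for one language's model files."""
    return {
        "-hmm": os.path.join(model_dir, acoustic),
        "-lm": os.path.join(model_dir, lm),
        "-dict": os.path.join(model_dir, dictionary),
    }


def apply_settings(config, settings):
    """Check every path exists, then copy the settings onto a decoder config."""
    missing = [path for path in settings.values() if not os.path.exists(path)]
    if missing:
        raise OSError("model files not found: %s" % ", ".join(sorted(missing)))
    for key, path in settings.items():
        config.set_string(key, path)
    return config
```

With PocketSphinx installed, usage would look something like config = Decoder.default_config(), then apply_settings(config, model_settings("/path/to/portuguese", "acoustic-model", "language-model.lm", "dictionary.dic")), then decoder = Decoder(config), with the placeholder names replaced by the files you actually downloaded. Checking the paths first gives a clearer message than “new_Decoder returned -1”.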
Amitava
Hi,
I get the following error when I run the above code (F5):
Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:19:22) [MSC v.1500 32 bit (Intel)] on win32
Type “copyright”, “credits” or “license()” for more information.
>>>
=================== RESTART: C:\Sphinx\project\stt\stt.py ===================
Traceback (most recent call last):
File “C:\Sphinx\project\stt\stt.py”, line 166, in <module>
sd = SpeechDetector()
File “C:\Sphinx\project\stt\stt.py”, line 50, in __init__
self.decoder = Decoder(config)
File “C:\Python27\lib\site-packages\pocketsphinx\pocketsphinx.py”, line 277, in __init__
this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1
>>>
Sophie
I’ll need a bit more info, but on an initial guess it’s because your folders are not organized correctly or you didn’t correctly install all the modules. The physical layout of the folders must be such that:
.
├── pocketsphinx/
└── sphinxbase/
Also, you’re running this code on a Windows machine. I’ve only tested this code on Linux, so there’s no guarantee that it will work on a different operating system, because the drivers and system architecture are different.
Amitava
The folders are organized as
C:\Sphinx\project\stt
├── pocketsphinx/
├── sphinxbase/
└── stt.py
where stt.py is the above source file.
I manually copied all 7 files (pocketsphinx.dll, pocketsphinx.dll, .. etc.) from C:\Sphinx\pocketsphinx\bin\Release\Win32 to the above directory:
C:\Sphinx\project\stt\pocketsphinx
I did the same for sphinxbase.
I used “pip install pocketsphinx” to install pocketsphinx, and the installation was successful:
Collecting pocketsphinx
Downloading pocketsphinx-0.1.3-cp27-cp27m-win32.whl (29.0MB)
100% |################################| 29.0MB 47kB/s
Installing collected packages: pocketsphinx
Successfully installed pocketsphinx-0.1.3
The imports were also fine:
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *
Could it be a 64 bit vs 32 bit issue?
Sophie
I am tempted to believe that it is the 64-bit vs 32-bit issue. I’m not sure how well Sphinx works on a Windows computer, but you could install Ubuntu in a virtual machine and run this code on Linux that way.
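One quick way to narrow down the 32-bit/64-bit question is to check what the interpreter itself is, then compare that against the wheel and the DLLs you copied, which all need to match. A minimal sketch using only the standard library; interpreter_bits is a hypothetical helper name. Comparing its output with the details posted above (the interpreter banner and the wheel’s file name) is a reasonable first check.

```python
import platform
import struct


def interpreter_bits():
    """Return 32 or 64: the pointer size of the running Python, in bits."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python %s, %d-bit, on %s %s" % (
        platform.python_version(), interpreter_bits(),
        platform.system(), platform.machine()))
```

Note that a 32-bit Python on 64-bit Windows reports 32 here, since this measures the interpreter rather than the operating system.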
Amitava
Hi Sophie,
Thanks for your time and feedback. But you think my folder structure and the way I filled the pocketsphinx and sphinxbase folders above are correct, right?
Sophie
Yep, they look alright to me! If you have a chance, try the installation process via an Ubuntu install (either as a native OS or as a virtual OS).