Python speech to text with PocketSphinx

March 25, 2016 / 128 Comments

I’ve wanted to use speech detection in my personal projects for the longest time, but the Google API has gotten more and more restrictive over time. To make sure my projects could work even without an internet connection, I looked for another speech recognition package, preferably one that was easier to use. CMU’s Sphinx speech recognition suite turned out to be a really great speech-to-text package. However, documentation and sample code are practically non-existent, so it took me forever to get anything done. Finally, I’ve figured it out! The example code is at the bottom of this post, but you can also download it directly from GitHub here.

Here are the steps to take to get this working:

Download SphinxBase and follow the install instructions
Download PocketSphinx and follow the install instructions
Download PocketSphinx-python and follow the install instructions
Run the code below

The main problem I had with setting up PocketSphinx was the myriad of libraries that the main site told me to download. However, after lots of trial and error, I realized that I really only needed three:

SphinxBase is the base package that all of the other Sphinx programs use
PocketSphinx is the lightweight recognizer, since I was okay with the program being a bit inaccurate if it meant I could decode phrases faster
PocketSphinx-python is the wrapper to allow us to program in the best scripting language ever.

The code basically sets up the microphone and saves each detected phrase as a temporary .wav file, which the Sphinx decoder then translates into a list of strings representing the spoken words. A phrase is defined as a stretch of sound sandwiched between periods of silence. I stole most of the phrase detection code from someone else two years ago, though unfortunately, I can’t remember who. If you’re reading this, thank you! 🙂

Anyhow, in the initialization of the run loop, we first work out the minimum intensity threshold that separates sound from “silence”. Then we launch into an infinitely running loop that keeps listening to the microphone, calling the Sphinx decoder whenever a phrase has been saved. Phrase detection also uses a sliding window over the most recent intensity values, which makes things a bit more accurate. You can load different voice recognition models into the decoder config if you want this speech recognition code to work for different languages.
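To make the phrase detection idea concrete, here is a minimal sketch that works on a plain list of intensity numbers instead of live audio. The function name split_phrases and its parameters are made up for this illustration, and it returns chunk indices rather than audio data, so treat it as an outline of the approach rather than the exact code from the post.

```python
from collections import deque


def split_phrases(intensities, threshold, silence_chunks, prev_chunks):
    """Group chunk indices into phrases.

    A phrase starts when any value in the sliding window exceeds
    `threshold` and ends once every value in the window is at or below it.
    """
    window = deque(maxlen=silence_chunks)  # most recent intensity values
    lead_in = deque(maxlen=prev_chunks)    # chunks kept from just before speech
    phrases, current, started = [], [], False

    for i, level in enumerate(intensities):
        window.append(level)
        if any(x > threshold for x in window):
            # Something loud was heard recently: keep recording.
            started = True
            current.append(i)
        elif started:
            # The whole window is quiet again: the phrase is over.
            phrases.append(list(lead_in) + current)
            started, current = False, []
            window.clear()
            lead_in.clear()
        else:
            # Still silent: remember a little audio to prepend later.
            lead_in.append(i)
    return phrases
```

The window length plays the role of the silence limit: a longer window tolerates longer pauses inside a phrase, at the cost of a slower reaction once you stop talking.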

Now that I have this speech detection code in a neat little importable class, I’m really excited about future capabilities of my projects. So many ideas, so little time!
-Sophie

[Addendum] Thanks to Carl at jazzystring1@gmail.com for getting this code working with Python3!

128 Comments

Auslander

October 24, 2021 at 2:12 pm

Years later, and your post just got me running on an offline, non-Google-API based speech transcription project that would have taken me a week to do manually. Thank you.

Reply
- Sophie
  
  October 24, 2021 at 2:17 pm
  
  Wow, that’s great news! If you’re interested in some more cutting edge stuff, I know that TensorFlow has some public implementations of offline speech recognition based on convolutional neural nets. It’s probably way more accurate than PocketSphinx, but would require some basic knowledge of machine learning. In case you’re interested, here’s a cool tutorial: https://github.com/tensorflow/docs/blob/master/site/en/r1/tutorials/sequences/audio_recognition.md
  
  Otherwise, happy I was able to help!
  
  Reply
Mainak Biswas

June 16, 2018 at 1:52 am

* Mic set up and listening.
Traceback (most recent call last):
File “test2.py”, line 161, in <module>
sd.run()
File “test2.py”, line 122, in run
slid_win = deque(maxlen=self.SILENCE_LIMIT * rel)
TypeError: an integer is required

I’m getting a type error. I tried to typecast, but the error is still there.

Reply
- Sophie
  
  June 25, 2018 at 2:36 am
  
  Hmm, did you typecast to an int? What version of python are you using?
  Can you try:
  slid_win = deque(maxlen=int(self.SILENCE_LIMIT * rel))
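For anyone wondering why the cast is needed: deque only accepts an integer maxlen, and in Python 3 the division RATE / CHUNK yields a float. The snippet below is a small standalone sketch using the configuration values from the post; the variable names mirror the post, but it is not the full run loop.

```python
from collections import deque

RATE = 16000        # samples per second
CHUNK = 1024        # samples read per chunk
SILENCE_LIMIT = 1   # seconds
PREV_AUDIO = 0.5    # seconds

# True division in Python 3 gives 15.625 here (Python 2 would give 15).
rel = RATE / CHUNK

# Casting the whole product keeps maxlen an integer in both versions.
slid_win = deque(maxlen=int(SILENCE_LIMIT * rel))
prev_audio = deque(maxlen=int(PREV_AUDIO * rel))
```

Note that the cast goes around the entire expression. Changing only one of the constants to an integer does not help, because the product can still be a float.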
  
  Reply
  - Mainak Biswas
    
    June 25, 2018 at 2:39 am
    
    yep..I typecasted to ‘int’. I will try what u said. THANKS FOR THE REPLY 🙂
    
    Reply
    - Sophie
      
      June 25, 2018 at 2:49 am
      
      Great! Let me know how it goes. 🙂
      
      Reply
  - Zorawar Singh
    
    November 13, 2018 at 6:00 pm
    
    Hi Sophie, can I use this code with an audio file in .wav format?
    
    Reply
    - Sophie
      
      December 17, 2018 at 3:23 pm
      
      Hi Zorawar, this already reads .wav file formats. Are you looking for a different encoding?
      
      Reply
Mikener

June 12, 2018 at 5:09 am

Hi Sophie,

great work you have done. Maybe you can help me:

* Mic set up and listening.

Nothing happens after that.
Use: Python 2.7 , pyaudio 0.2.11, pi3 B,

Tried several numbers with self.INPUT_DEVICE_INDEX in Class SpeechDetector -> but nothing new.

Attached USB-Mic works perfectly with “pocketsphinx_continuous” -command.

Reply
- Mikener
  
  June 12, 2018 at 6:30 am
  
  Okay, some debug infos with logging-object:
  
  INFO:TestLogger:INITIALIZED
  INFO:TestLogger:Getting intensity values from mic.
  INFO:TestLogger:r-value: 2181.74425632
  INFO:TestLogger: Finished
  INFO:TestLogger:cur_data:
  INFO:TestLogger:x in slid_win: deque([0.0], maxlen=15)
  INFO:TestLogger:cur_data:
  INFO:TestLogger:x in slid_win: deque([0.0, 0.0], maxlen=15)
  ……
  
  r-value – changes when i speak to the mic during setup_mic()
  but it seems there is nothing in cur_data..
  
  Reply
  - Sophie
    
    June 25, 2018 at 2:45 am
    
    During the setup_mic() phase, the mic is active and listening to the default intensity values (sound level) of the room. It’ll then set that level as the trigger value for when it’ll start recording speech. So, during the setup phase, try to have the mic be in a fairly quiet room so it’ll trigger during the actual detection phase. Let me know if that helps! 🙂
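As a rough sketch of what the calibration step measures: the Python 3 listing on this page samples a number of chunks, sorts their intensities, and averages the loudest 20 percent. The helper below reproduces just that averaging on a plain list of numbers. The name ambient_level and the top_fraction parameter are invented for this example, and how the result is turned into the trigger threshold is left out.

```python
def ambient_level(intensities, top_fraction=0.2):
    """Average of the loudest `top_fraction` of the sampled intensities."""
    if not intensities:
        raise ValueError("need at least one intensity sample")
    # Always average at least one value, even for very short sample lists.
    k = max(1, int(len(intensities) * top_fraction))
    loudest = sorted(intensities, reverse=True)[:k]
    return sum(loudest) / k
```

If you talk during calibration, the loudest samples are your voice, so the measured level ends up close to speech level and later speech may never rise above it. That is one reason to stay quiet during setup.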
    
    Reply
gundu

June 4, 2018 at 1:08 am

Hi,
I am running my code on Windows. After a few corrections in my code, I am able to execute the python script, and here is the output (for a few runs). I want to print the grammar and the spoken words as well. How do I get them?
Mic set up and listening.
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘‘, ‘[SPEECH]’]
Listening …
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘~~‘, ‘~~‘]
Listening …
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘~~‘, ‘ugh’, ‘~~‘]
Listening …
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘~~‘, ‘[SPEECH]’, ‘~~‘]
Listening …
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘‘, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘ugh’, ‘‘]
Listening …
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘~~‘, ‘[SPEECH]’, ”, ‘and’, ‘[SPEECH]’, ‘~~‘]
Listening …
Starting recording of phrase
Finished recording, decoding phrase
DETECTED: [‘~~‘, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ‘and’, ‘[SPEECH]’, ”, ‘that’, ”, ‘that’, ”, ‘bad’, ”, ‘~~‘]

Reply
- gundu
  
  June 4, 2018 at 1:19 am
  
  how to print the recognition result?
  
  Reply
- Sophie
  
  June 25, 2018 at 2:42 am
  
  I’m not sure about grammar, but the [SPEECH] bracket means that the decoder couldn’t interpret the words that it was hearing. I’d suggest retraining the decoder, or listening to the recorded voice files to see if there’s an issue with the clarity of the sound file you’re passing to the decoder.
  
  Reply
RookieConverter

May 28, 2018 at 12:34 am

Hello Sophie,

Like you said i followed all the instructions you have mentioned above:
I have downloaded and compiled both sphinxbase & pocketsphinx inside a folder.
Now when i am trying to run your program it does not do anything. Sorry i dont know what i am doing wrong.

using Windows and Python 3.5.4

Also i am not able to find the below folders:
DATADIR = “C:\Python34\Lib\site-packages\pocketsphinx\test\data”

I am totally new to Python and i am running your code using IDLE. There are no errors as such but i guess the terminal would show a message as * Mic set up and listening. like you have specified in your run(self) function.

Please Help 🙂 !!

Reply
Carl David

May 8, 2018 at 2:44 pm

To those who are encountering the “new_Decoder returned -1” error, fix the path to your model in lines 40 and 41 🙂
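Since a wrong path is the most common cause reported in this thread, a quick check before creating the decoder can save some head-scratching. The sketch below only uses the standard library. The function name missing_model_files is made up for this example, and the check for an mdef file is based on the PocketSphinx error message quoted elsewhere in these comments.

```python
import os


def missing_model_files(hmm_dir, lm_path, dict_path):
    """Return a list of problems found; an empty list means the paths look fine."""
    problems = []
    if not os.path.isdir(hmm_dir):
        problems.append("acoustic model directory not found: %s" % hmm_dir)
    elif not os.path.isfile(os.path.join(hmm_dir, "mdef")):
        problems.append("no 'mdef' file inside: %s" % hmm_dir)
    if not os.path.isfile(lm_path):
        problems.append("language model not found: %s" % lm_path)
    if not os.path.isfile(dict_path):
        problems.append("dictionary not found: %s" % dict_path)
    return problems
```

You would call it with the same three paths that are passed to config.set_string for '-hmm', '-lm' and '-dict', and print whatever it returns before constructing the Decoder.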

Reply
Carl David

May 8, 2018 at 2:43 pm

Thanks Sophie for your amazing post 🙂 It helps a lot. To those who are having a hard time running this code in Python 3+ (3.6 specifically) due to big changes to its core, here’s the code

from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

import os
import pyaudio
import wave
import audioop
from collections import deque
import time
import math

"""
Written by Sophie Li, 2016
http://blog.justsophie.com/python-speech-to-text-with-pocketsphinx/
"""

class SpeechDetector:
    def __init__(self):
        # Microphone stream config.
        self.CHUNK = 1024  # CHUNKS of bytes to read each time from mic
        self.FORMAT = pyaudio.paInt16
        self.CHANNELS = 1
        self.RATE = 16000

        self.SILENCE_LIMIT = 1  # Silence limit in seconds. The max amount of seconds where
                                # only silence is recorded. When this time passes the
                                # recording finishes and the file is decoded

        self.PREV_AUDIO = 0.5  # Previous audio (in seconds) to prepend. When noise
                               # is detected, how much of previously recorded audio is
                               # prepended. This helps to prevent chopping the beginning
                               # of the phrase.

        self.THRESHOLD = 4500
        self.num_phrases = -1

        # These will need to be modified according to where the pocketsphinx folder is
        MODELDIR = "pocketsphinx/model"
        DATADIR = "pocketsphinx/test/data"

        # Create a decoder with certain model
        config = Decoder.default_config()
        config.set_string('-hmm', os.path.join(MODELDIR, 'en-us/en-us'))
        config.set_string('-lm', os.path.join(MODELDIR, 'en-us/en-us.lm.bin'))
        config.set_string('-dict', os.path.join(MODELDIR, 'en-us/cmudict-en-us.dict'))

        # Creates decoder object for streaming data.
        self.decoder = Decoder(config)

    def setup_mic(self, num_samples=50):
        """ Gets average audio intensity of your mic sound. You can use it to get
            average intensities while you're talking and/or silent. The average
            is the avg of the .2 of the largest intensities recorded.
        """
        print("Getting intensity values from mic.")
        p = pyaudio.PyAudio()
        stream = p.open(format=self.FORMAT,
                        channels=self.CHANNELS,
                        rate=self.RATE,
                        input=True,
                        frames_per_buffer=self.CHUNK)

        values = [math.sqrt(abs(audioop.avg(stream.read(self.CHUNK), 4)))
                  for x in range(num_samples)]
        values = sorted(values, reverse=True)
        r = sum(values[:int(num_samples * 0.2)]) / int(num_samples * 0.2)
        print(" Finished ")
        print(" Average audio intensity is %s " % r)
        stream.close()
        p.terminate()

            if r [...] self.THRESHOLD for x in slid_win]) > 0:
                if started == False:
                    print("Starting recording of phrase")
                    started = True
                audio2send.append(cur_data)

            elif started:
                print("Finished recording, decoding phrase")
                filename = self.save_speech(list(prev_audio) + audio2send, p)
                r = self.decode_phrase(filename)
                print("DETECTED: %s" % r)

                # Removes temp audio file
                os.remove(filename)
                # Reset all
                started = False
                slid_win = deque(maxlen=int(self.SILENCE_LIMIT * rel))
                prev_audio = deque(maxlen=int(0.5 * rel))
                audio2send = []
                print("Listening ...")

            else:
                prev_audio.append(cur_data)

        print("* Done listening")
        stream.close()
        p.terminate()

if __name__ == "__main__":
    sd = SpeechDetector()
    sd.run()

Reply
- Sophie
  
  June 25, 2018 at 2:46 am
  
  Hey Carl,
  
  Thanks for getting the code working with Python 3! I’ll add your code as an addendum to my post if that’s ok.
  
  Reply
  - Carl David
    
    July 4, 2018 at 11:25 pm
    
    Sorry for late reply. Yes you could append the code 🙂 Thank you.
    
    Reply
  - MCC
    
    July 9, 2018 at 3:14 am
    
    Hi Carl,
    
    Thanks for sharing your fantastic input.
    
    Can you please re-post your code as am running into few error messages when executing the code?
    
    Sophie – Well done for your input as well.
    
    Thanks,
    
    Reply
    - Sophie
      
      August 5, 2018 at 1:00 am
      
      Hi MCC, apologies for the late reply. Have you worked through the issues in your code? I did add Carl’s python3.x implementation to the bottom of the post.
      
      Reply
majo

April 30, 2018 at 12:33 am

Hi Sophie! I hope you know there are people all over the world trying to compile your code

I have a similar issue: when I do the cast that you suggest, another error comes up. I can’t believe that python 2.7 and 3.6 differ this much.

* Mic set up and listening.
Starting recording of phrase
Finished recording, decoding phrase
Traceback (most recent call last):
File “sophie.py”, line 167, in <module>
sd.run()
File “sophie.py”, line 145, in run
filename = self.save_speech(list(prev_audio) + audio2send, p)
File “sophie.py”, line 86, in save_speech
data = ”.join(data)
TypeError: sequence item 0: expected str instance, bytes found

Do you have any suggestions to fix this one?

Reply
- Sophie
  
  April 30, 2018 at 2:19 pm
  
  Hi Majo,
  
  I’m happy that my code has helped!
  In Python 3 the audio chunks are bytes, so try joining them with a bytes separator instead of a str one. On line 86, data = b"".join(data).
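For Python 3 specifically, the traceback above comes from joining bytes objects with a str separator. The short demonstration below uses two made-up chunks in place of real microphone data to show the failure and the bytes-based join that avoids it.

```python
# Stand-ins for what stream.read() returns in Python 3: bytes objects.
chunks = [b"\x00\x01", b"\x02\x03"]

try:
    "".join(chunks)      # the Python 2 idiom; a str separator cannot join bytes
except TypeError as err:
    print("str join fails:", err)

# A bytes separator concatenates the chunks without changing their contents.
data = b"".join(chunks)
```

Wrapping the list in str() would silence the error but produce the text representation of the list instead of the audio samples, so the resulting .wav file would not contain the recording.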
  
  Reply
Robyn

April 1, 2018 at 10:49 am

Sorry to bother you again, but after switching to Anaconda(because I was told it was the best program for beginners such as myself), things are going a bit more smoothly, but I keep getting this error:

runfile(‘C:/Users/ccatx/Downloads/pystuff/Lib/site-packages/deathtrial1.py’, wdir=’C:/Users/ccatx/Downloads/pystuff/Lib/site-packages’)
Traceback (most recent call last):

File “”, line 1, in
runfile(‘C:/Users/ccatx/Downloads/pystuff/Lib/site-packages/deathtrial1.py’, wdir=’C:/Users/ccatx/Downloads/pystuff/Lib/site-packages’)

File “C:\Users\ccatx\Downloads\pystuff\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 705, in runfile
execfile(filename, namespace)

File “C:\Users\ccatx\Downloads\pystuff\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 102, in execfile
exec(compile(f.read(), filename, ‘exec’), namespace)

File “C:/Users/ccatx/Downloads/pystuff/Lib/site-packages/deathtrial1.py”, line 164, in <module>
sd = SpeechDetector()

File “C:/Users/ccatx/Downloads/pystuff/Lib/site-packages/deathtrial1.py”, line 48, in __init__
self.decoder = Decoder(config)

File “C:\Users\ccatx\Downloads\pystuff\Lib\site-packages\pocketsphinx\pocketsphinx.py”, line 275, in __init__
this = _pocketsphinx.new_Decoder(*args)

RuntimeError: new_Decoder returned -1

I have redirected the MODELDIR and DATADIR to what I believe are the right pathways, and I have put sphinxbase in my pocketsphinx folder (I do not seem to have a stt.py file anywhere on my computer, so I have not added that), but neither has worked.

I doubt this affects anything, but just in case it does, I get a warning sign next to the first two lines that read:

‘from pocketsphinx import *’ used; unable to detect undefined names
‘from sphinxbase import *’ used; unable to detect undefined names

Thanks in advance!

Reply
- Sophie
  
  April 30, 2018 at 2:36 pm
  
  Hi Robyn,
  
  Wow, this is a late reply–but better late than never?
  
  The error at the bottom is because you’re using wildcard imports, which the Flake8 Python style checker doesn’t like. It’s a style issue, so that shouldn’t have anything to do with the errors you’re seeing.
  
  One thing that catches my attention is the direction of the / and \ when referring to the file directories. Windows usually uses backwards-slash “\” and Unix uses “/” which is what I wrote this code in. You could try changing the direction of the slashes so they fit? I’m not actually sure, since I’ve never used a windows computer before.
  
  MODELDIR = r"..\..\tools\pocketsphinx\model"
  DATADIR = r"..\..\tools\pocketsphinx\test\data"
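One caveat when writing Windows paths in Python: inside an ordinary string literal, a backslash followed by t is a tab character, so a path containing \tools or \test gets silently mangled. The sketch below shows two ways around that, using the same illustrative directory names as above. The paths themselves are placeholders, so substitute wherever pocketsphinx actually lives on your machine.

```python
import os

# A raw string (r prefix) keeps every backslash literal.
MODELDIR = r"..\..\tools\pocketsphinx\model"

# os.path.join inserts the correct separator for whatever platform runs the code,
# so the same line works on Windows, macOS and Linux.
DATADIR = os.path.join("..", "..", "tools", "pocketsphinx", "test", "data")
```

Forward slashes also work in Python paths on Windows, so the direction of the slashes alone is unlikely to cause a decoder error.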
  
  Alternatively, I’d recommend doing further coding projects on macOS/Ubuntu, since it’ll make things a lot easier for you during the learning stages: a lot of coding projects are built for Unix/Linux systems.
  
  Hope this helps! Or maybe is just informative if you’ve already figured it out. ^^’
  
  -Sophie
  
  Reply
Robyn

March 16, 2018 at 12:19 pm

Dear Sophie,

I’m not sure if you’re even on this blog anymore, but I’ve been having a couple problems I can’t figure out:

File “C:\Users\Robyn\Downloads\yikes”, line 164, in <module>
sd = SpeechDetector()
File “C:\Users\Robyn\Downloads\yikes”, line 21, in __init__
self.FORMAT = pyaudio.paInt16
AttributeError: module ‘pyaudio’ has no attribute ‘paInt16’

I’ve read your other answers on the SpeechDetector but I still couldn’t find a solution. I haven’t seen the paint16 one, however, and I checked and there is indeed no such file in my pyaudio download.

*Note: In case you noticed, I didn’t name my file ‘yikes’ because of your code(which is actually very nice by the way), it was just the word I thought of when naming the file.

Reply
- Sophie
  
  March 16, 2018 at 12:28 pm
  
  Hi Robyn,
  
  Yep, still here!
  
  A couple things I can think of:
  – Wrong version of pyaudio or python. For the record, I used Python2.7 and pyaudio-0.2.11, though people have said this works with Python3.x
  – How did you install pyaudio? I did it through pip install pyaudio
  – You’ve named another file pyaudio.py and it’s importing the wrong file: see https://stackoverflow.com/questions/13813164/python-import-random-error
  – You’re on a Windows machine, and I’ve only tested this code on Ubuntu
  
  In any case, you can get around the issue by replacing pyaudio.paInt16 with the integer 8 and it should get you past that problem.
  
  Hope this helps!
  
  Reply
  - Robyn
    
    March 17, 2018 at 8:54 am
    
    I tried paInt8 to no avail, so I am looking into downloading an earlier version of pydio if possible. For whatever reason, I can’t use pip install (I am using Sublime Text, if that makes any difference), so I downloaded it to my computer regularly and then imported it. The version of pydio I had was 8.0.2, and yes, I am on Windows 10.
    
    Reply
    - Sophie
      
      March 17, 2018 at 1:59 pm
      
      Cool, can you try replacing line 23 with self.FORMAT = 8?
      
      Reply
Vinay

March 8, 2018 at 10:42 pm

Hi Sophie,

We don’t want to use the microphone. We have a wav file which needs to be converted into text. Can you please guide us on how to code it?

Thanks,
Vinay

Reply
- Sophie
  
  March 16, 2018 at 1:00 am
  
  Hey Vinay,
  
  The decode_phrase function on line 95 takes in a .wav file. Perhaps that’s what you’re looking for?
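For reference, decoding a file without touching the microphone can be sketched roughly as below. The function name decode_wav is made up for this example. The decoder methods it calls (start_utt, process_raw, end_utt, seg) are the ones exposed by the pocketsphinx-python Decoder that this post was written against, and newer releases may differ. It also assumes the file matches what the model expects, typically 16 kHz, 16-bit, mono audio.

```python
def decode_wav(decoder, wav_path, chunk_size=1024):
    """Feed a .wav file to a PocketSphinx-style decoder and return the words."""
    decoder.start_utt()
    with open(wav_path, "rb") as stream:
        while True:
            buf = stream.read(chunk_size)
            if not buf:
                break
            # Arguments are (data, no_search, full_utt): decode incrementally.
            decoder.process_raw(buf, False, False)
    decoder.end_utt()
    return [seg.word for seg in decoder.seg()]
```

With a configured decoder in hand, such as self.decoder in the SpeechDetector class, calling decode_wav(decoder, "my_recording.wav") would return the list of recognized words. The file name here is only a placeholder.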
  
  Reply
Fred

January 7, 2018 at 2:46 pm

Hi,

I am also getting new_Decoder returned -1 error.
I made sure all the paths are setup correctly and followed the readme guides correctly (hopefully) for all the repositories. I am a Windows 7 user.

Any help would be appreciated!

Reply
- Sophie
  
  January 20, 2018 at 10:09 am
  
  Hmm, that might be a problem because you’re using a different OS. I won’t be able to explicitly help you out, but you could try checking the CMU sphinx forums to see if someone else has successfully used the software…
  
  Otherwise, installing ubuntu is a viable option. It’s free, has a lot of community support, and is linux based which will help if you want to do more coding projects in the future. 😉
  
  Reply
  - Fred
    
    January 26, 2018 at 2:54 pm
    
    Thanks for the reply!
    
    I actually solved the issue. I specified the path wrong for the dictionary.. Make sure you point to the correct file people!
    
    Thanks to your help on initial setup, I am now finished training pocketsphinx to recognize what I need and started implementing my application.
    
    Thank you so much Sophie!
    
    Reply
    - Sophie
      
      February 9, 2018 at 10:52 am
      
      I’m glad I could help out! 🙂
      
      Reply
- Rania
  
  March 28, 2018 at 5:22 am
  
  Change the path of dirpath
  
  Reply
VINEETH KV

December 10, 2017 at 6:29 am

hi sophie,
Traceback (most recent call last):
File “/home/vineeth/pycharm-community-2017.3/helpers/pydev/pydev_run_in_console.py”, line 37, in run_file
pydev_imports.execfile(file, globals, locals) # execute the script
File “/home/vineeth/PycharmProjects/main/new.py”, line 166, in <module>
sd = SpeechDetector()
File “/home/vineeth/PycharmProjects/main/new.py”, line 50, in __init__
self.decoder = Decoder(config)
File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 324, in __init__
this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1

The above is the result of running the code.
How can I correct the code so that it runs without error?

Reply
- Sophie
  
  December 16, 2017 at 3:13 pm
  
  Hi Vineeth,
  
  In the spirit of discovery, your problem seems similar to ones that others have already posted about in the past. Can you try their solutions?
  
  Reply
Klaus

November 15, 2017 at 12:55 am

Hey Sophie (:

I installed Alexa access on my Raspberry Pi. Everything is working right, it’s reacting to “Alexa”. Unfortunately it has to record the whole time to recognize “Alexa”. I would like to have an offline sst application, which recognizes a special name (e.g. Dave) and THEN starts the Alexa application. Do you think it is possible with sphinx?

Reply
- Klaus
  
  November 15, 2017 at 12:57 am
  
  *offline stt application
  
  Reply
- Sophie
  
  November 17, 2017 at 8:50 am
  
  Hey Klaus,
  
  Yep, entirely possible. CMU has implemented a short example script here: https://github.com/cmusphinx/pocketsphinx/blob/master/swig/python/test/kws_test.py
  
  Note that you’re still gonna have to have the script running though. If you want keyword recognition without having any code running at all, you’re probably going to have to do something analog, like with matching audio signals or something.
  
  Reply
deolu

November 10, 2017 at 3:36 pm

Hi Sophie,

Do i have to use python for this or I could just run the code directly on linux terminal because that’s what i did.

I am trying to get pocketsphinx to index an audio file already on my machine and search for keyword within it.

my code ;
pocketsphinx_continuous -infile success.wav -hmm en-us -kws_threshold 1e-40 -keyphrase “success” -time yes

error i got;
INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’live’, VARNORM=’no’, AGC=’none’
ERROR: “acmod.c”, line 79: Folder ‘en-us’ does not contain acoustic model definition ‘mdef’

i checked pocketsphinx and I do have the mdef file.

Reply
Steve

October 22, 2017 at 5:35 am

So strange- I’ve been playing around with PocketSphinx myself, for much the same reasons. I happened across your blog looking for some assistance (found it here, BTW, thanks!).

Lo and behold- the “previous post” link is to EOM for double pendulums, some other random thing I just happen to be playing around with in the last few weeks.

Duly bookmarked and favorited!

Reply
- Sophie
  
  October 23, 2017 at 7:31 am
  
  Haha, I’m glad that you liked my posts! I have a pretty eclectic set of interests and that’s probably reflected in this blog.
  
  Reply
Maria Villalobos

October 12, 2017 at 9:24 am

Is the transcription really working for you, though? I tried an example and I am not getting good results, any ideas?

Reply
- Sophie
  
  October 18, 2017 at 1:38 am
  
  Hmm, that’ll depend on a variety of factors. Inaccuracy could result from noise (either in environment or microphone quality) or in poor correlation of your speech against the data used to train the recognizer. If it’s not working, I would recommend checking the sound samples recorded by commenting out line 150, or retraining the recognizer.
  
  Reply
sara

October 10, 2017 at 6:21 am

Thank u 🙂

Reply
- Sophie
  
  October 18, 2017 at 1:38 am
  
  Glad I could help! ^_^
  
  Reply
Jorge

September 25, 2017 at 9:45 pm

What if I want to speech-to-text on a pre-recorded .wav file instead of on live audio?

Reply
- Sophie
  
  October 18, 2017 at 1:47 am
  
  Yep, on line 146 replace ‘filename’ with the path of the file you want to load.
  
  Reply
Alysa

August 11, 2017 at 3:35 am

Hello Sophie, i’m getting an error:
DETECTED: [‘‘, ‘ah’, ”]
Listening …
Traceback (most recent call last):
File “sophie.py”, line 168, in <module>
sd.run()
File “sophie.py”, line 135, in run
cur_data = stream.read(self.CHUNK)
File “/usr/local/lib/python2.7/site-packages/pyaudio.py”, line 608, in read
return pa.read_stream(self._stream, num_frames, exception_on_overflow)
IOError: [Errno -9981] Input overflowed
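This overflow error tends to show up when the loop falls behind the microphone, for example while a phrase is being decoded. One possible workaround, sketched below, is to ask PyAudio not to raise on overflow. The exception_on_overflow argument is the one visible in the traceback above. The helper name read_chunk is invented for this example, and the trade-off is that some audio may be dropped silently.

```python
def read_chunk(stream, chunk):
    """Read one chunk from a PyAudio-style stream, tolerating input overflow."""
    # With exception_on_overflow=False, an overflowed buffer no longer raises
    # IOError; the samples that did not fit are simply lost.
    return stream.read(chunk, exception_on_overflow=False)
```

In the run loop this would stand in for the plain stream.read(self.CHUNK) call.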

Reply
Alysa

August 11, 2017 at 2:30 am

hello, actually in my folder en-us, there isn’t any file named en-us, which is used in line 45. What is the file en-us?

Reply
Chrishane

July 6, 2017 at 6:20 am

This is what I’m getting as output after compiling

INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152609
INFO: ngram_search_fwdtree.c(333): Created 723 root, 152481 non-root channels, 53 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
Getting intensity values from mic.
ALSA lib pcm_dsnoop.c:606:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
Finished
Average audio intensity is 668.187156541
ALSA lib pcm_dsnoop.c:606:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
* Mic set up and listening.

Nothing happens after this. I can’t figure out the error. Can you help me?

Reply
- Sophie
  
  August 5, 2017 at 3:20 pm
  
  Have you verified that the mic is connected to the computer and can be accessed through PyAudio? What you’ve posted leads me to believe that everything is working, but no audio input is being recognized.
  
  Reply
  - Chrishane
    
    August 5, 2017 at 6:18 pm
    
    Yeah, I was trying to run this program using the laptop’s built-in mic, but after a while I figured out that I had to connect an external mic, and then it worked 🙂 So is there any way to run this program using the laptop’s default mic? What should I do to make that work?
    
    Reply
    - Sophie
      
      August 8, 2017 at 3:05 pm
      
      Lines 58 and 59 deal with accessing the microphone. You likely have to provide pyaudio the index of your internal microphone. This stackoverflow question might help: https://stackoverflow.com/questions/36894315/how-to-select-a-specific-input-device-with-pyaudio
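To find the right index, it helps to list the devices PyAudio can see and keep only those with input channels. The sketch below splits that into a pure filtering function and a thin PyAudio wrapper. Both function names are made up for this example, while get_device_count, get_device_info_by_index and the maxInputChannels field come from PyAudio itself.

```python
def input_devices(device_infos):
    """Return (index, name) pairs for devices that can record audio."""
    return [(info["index"], info["name"])
            for info in device_infos
            if info.get("maxInputChannels", 0) > 0]


def list_pyaudio_inputs():
    """Query PyAudio for its devices and keep the ones with input channels."""
    import pyaudio  # imported here so input_devices() works without it installed
    p = pyaudio.PyAudio()
    try:
        infos = [p.get_device_info_by_index(i)
                 for i in range(p.get_device_count())]
    finally:
        p.terminate()
    return input_devices(infos)
```

Once you know the index of the internal microphone, it can be passed to p.open(...) through the input_device_index argument.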
      
      Reply
  - Farrukh Saleem
    
    August 9, 2017 at 7:04 pm
    
    Hi i am creating speech to text application and i want to keep mic on will you please help me for this?
    
    Reply
Abinaya

June 12, 2017 at 7:41 pm

I’m getting an error saying that there is no module named pocketsphinx.pocketsphinx

Why is that so ?

the code and pocketsphinx are in the same directory only

Reply
- Sophie
  
  August 5, 2017 at 3:18 pm
  
  Hi Abinaya,
  
  Have you followed the installation instructions on the individual repositories for the code? If you’re having import issues, it’s probably because the pip install didn’t fully work, or the sphinx packages don’t have the correct hierarchy. The physical layout of the folders must be such that:
  .
  ├── pocketsphinx/
  ├── sphinxbase/
  └── stt.py
  
  Hope this helps!
  
  Reply
- Robyn
  
  March 16, 2018 at 12:03 pm
  
  Abinaya,
  
  I had the same problem, and fixed it by simply getting rid of the second “.pocketsphinx”, so that it looked like:
  
  from pocketsphinx import *
  
  Reply
jim

May 30, 2017 at 6:20 pm

Hi Sophie,
that’s a really nice program. Much shorter & tidier than i’d have expected to work with a monster like Sphinx.
Well done.

i’m on Ubuntu, with python 3+
(& had to change lines 128, 130 & 153 – cast to int.)

Here’s the (thankfully short) stack trace for an error i can’t get past:
line 167, in <module>
sd.run()
line 145, in run
filename = self.save_speech(list(prev_audio) + audio2send, p)
line 86, in save_speech
data = ”.join(data)
TypeError: sequence item 0: expected str instance, bytes found

Any inspiration?

Reply
- Sophie
  
  August 5, 2017 at 3:14 pm
  
  Hi Jim,
  
  Sorry for the delay. Have you tried casting data to a string before line 86? Haven’t tested this code on python 3+ yet, so the type casting might be weird.
  
  Reply
Shishira Shastri H

May 29, 2017 at 3:16 am

Hi Sophie,

Thanks for the code.
When i run the file, it prints: * Mic set up and listening.
And after that nothing happens… I tried printing the value of slid_win variable, it prints while the while loop runs infinitely…. Could you please tell me when the recording will be stopped ?
Or is there a way to stop it ?

Reply
- Shishira Shastri H
  
  May 29, 2017 at 3:20 am
  
  Also, I’m testing this on Windows machine and not Linux.
  Thanks!!
  
  Reply
- Sophie
  
  August 5, 2017 at 3:09 pm
  
  Hi Shishira,
  
  Sorry it took so long to get back to you. If you’re still having issues, I can think of a couple of places where your code could be erroring:
  
  1. Are you sure your microphone is connected to the computer and accessible by the program?
  2. On line 52, the setup_mic function sets a threshold noise level for the mic. Are you letting the microphone sit in a quiet environment when the code is first run so the correct threshold can be set?
  
  Hope this helps, good luck!
  
  Reply
  - Gopi
    
    November 20, 2017 at 9:57 pm
    
    Hey I am getting this error.
    
    prev_audio = deque(maxlen=self.PREV_AUDIO * rel)
    TypeError: an integer is required
    
    I changed self.PREV_AUDIO = 1
    instead of 0.5
    Now no error, but having the above situation Shishira encountered.
    
    Reply
    - Sophie
      
      December 16, 2017 at 3:13 pm
      
      Hey Gopi,
      
      You should be casting that entire expression to an integer, instead of changing self.PREV_AUDIO only. prev_audio = deque(maxlen=int(self.PREV_AUDIO * rel))
      
      Reply
Aji

May 10, 2017 at 6:47 pm

hello Sophie
my model dir located in /home/pocketsphinx/model
and inside en-us/en-us dir there is an mdef file.
but when the program running, that mdef file is not detected.
can you help me ?
many thanks for you

btw i use ubuntu 16.04 with python 2.7

Reply
- Sophie
  
  May 17, 2017 at 4:41 am
  
  Hi Aji,
  
  Have you made sure that the file path is correct in the program as well? That would be on lines 40 and 41 of the program.
  
  Reply
Ambrose Douglas

April 29, 2017 at 3:03 pm

Hi, so I got everything working fine. I’m just curious if anyone has had this work well enough for any practical use? If I could give my computer simple commands I would be very happy, but I can’t seem to get more words than simple ones like “you”, “it”, “are”, etc.

Do I need to find a different model?

any pointers would be awesome!

Reply
- Sophie
  
  April 30, 2017 at 5:38 pm
  
  Hey Ambrose,
  
  What are you trying to get it to recognize? There are a couple ways to improve accuracy:
  
  1. Reduce the size of the recognition dictionary. For example, if you only need the STT engine to recognize a small set of words instead of the entire English language, you can increase accuracy by deleting the words you don’t need from the dictionary. The location of the dictionary is set on line 47 of the code.
  
  2. Adapt the acoustic model to the sound of your voice. Instructions for that can be found here: http://cmusphinx.sourceforge.net/wiki/tutorialadapt
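
  A minimal sketch of option 1, assuming the dictionary uses the CMU layout (one entry per line, the word first and its phonemes after it, with alternate pronunciations written as word(2)). The function name and the sample entries are only for illustration; in practice you would read the lines from a copy of your dictionary file and point the decoder at the trimmed copy.

```python
import re


def trim_dictionary(dict_lines, keep_words):
    """Keep only the entries whose base word is in keep_words."""
    keep = {word.lower() for word in keep_words}
    kept = []
    for line in dict_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # "hello(2)" is an alternate pronunciation of "hello"
        base = re.sub(r"\(\d+\)$", "", parts[0]).lower()
        if base in keep:
            kept.append(line.rstrip("\n"))
    return kept


# Illustrative entries only, not read from a real dictionary file.
sample = [
    "hello HH AH L OW",
    "hello(2) HH EH L OW",
    "weather W EH DH ER",
    "zebra Z IY B R AH",
]
small = trim_dictionary(sample, ["hello", "weather"])
print("\n".join(small))
```

  Working on a copy keeps the full dictionary around in case you need to add words back later.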
  
  Hope this helps,
  -Sophie
  
  Reply
Josef

April 20, 2017 at 3:32 pm

Sophie,
You might want to look at io.BytesIO instead of saving to a temporary file. This will keep the array in memory. Even better, you can pass the entire buffer to the recognizer, bypassing the need to save it altogether.
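
A rough sketch of the in-memory idea, using only the standard library (Python 3). It wraps raw PCM chunks in a WAV container held in an io.BytesIO buffer instead of a temp file. The function name and the default rate, channel count and sample width are assumptions for illustration, not values taken from the script in the post.

```python
import io
import wave


def pcm_to_wav_bytes(frames, rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM chunks in a WAV container held entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    # wave does not close a file object it did not open, so buf is still usable
    return buf.getvalue()


# Two fake chunks of 1024 16-bit samples each, standing in for microphone reads.
chunks = [b"\x00\x00" * 1024, b"\x01\x00" * 1024]
wav_bytes = pcm_to_wav_bytes(chunks)
print(len(wav_bytes))
```

If the recognizer accepts raw PCM directly, the b"".join(frames) part alone is the buffer you would hand over, and the WAV header is not needed at all.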

Reply
- Sophie
  
  April 30, 2017 at 5:35 pm
  
  Oh, very interesting! That does seem more efficient than saving to a temp file. I’ll keep it in mind for future iterations of this code.
  
  Reply
Rahul Vansh

April 14, 2017 at 4:17 am

Could you please provide a little guidance on how to set MODELDIR and DATADIR?

Reply
Rahul Vansh

April 14, 2017 at 4:13 am

When I run this code, it shows the error below. Please give me a solution for this error…

INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
ERROR: “acmod.c”, line 83: Folder ‘pocketsphinx/model/en-us/en-us’ does not contain acoustic model definition ‘mdef’
Traceback (most recent call last):
File “Test.py”, line 17, in <module>
decoder = pocketsphinx.Decoder(config)
File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 266, in __init__
this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1

Reply
- Abu
  
  July 27, 2017 at 8:35 pm
  
  hi can you please give me some help. I ran into similar problem..
  
  Reply
john

April 5, 2017 at 7:48 pm

INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’live’, VARNORM=’no’, AGC=’none’
ERROR: “acmod.c”, line 79: Folder ‘../../tools/pocketsphinx/model/en-us/en-us’ does not contain acoustic model definition ‘mdef’
Traceback (most recent call last):
File “stt.py”, line 166, in <module>
sd = SpeechDetector()
File “stt.py”, line 50, in __init__
self.decoder = Decoder(config)
File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 332, in __init__
this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1

Hello,
I get this error. Any thoughts?

Reply
- Sophie
  
  April 6, 2017 at 12:29 am
  
  You can kinda see the problem in the error message:
  
  ERROR: “acmod.c”, line 79: Folder ‘../../tools/pocketsphinx/model/en-us/en-us’ does not contain acoustic model definition ‘mdef’
  
  You need to change lines 40 and 41 so that MODELDIR and DATADIR refer to the actual location of the files.
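
  One way to confirm the path before the decoder is created is a quick check like the sketch below. It assumes the acoustic model lives in a lang/lang subfolder of MODELDIR, as the error message suggests; the function name and the MODELDIR value shown are placeholders, so substitute the value from your own script.

```python
import os


def check_acoustic_model(modeldir, lang="en-us"):
    """Return (ok, message) for the acoustic model folder under modeldir."""
    hmm_dir = os.path.join(modeldir, lang, lang)
    if not os.path.isdir(hmm_dir):
        return False, "folder not found: " + os.path.abspath(hmm_dir)
    if not os.path.isfile(os.path.join(hmm_dir, "mdef")):
        return False, "no 'mdef' file in: " + os.path.abspath(hmm_dir)
    return True, "acoustic model found in: " + os.path.abspath(hmm_dir)


# Placeholder path: use the MODELDIR from your own copy of the script.
MODELDIR = "pocketsphinx/model"
ok, message = check_acoustic_model(MODELDIR)
print(message)
```

  Printing the absolute path makes it easier to spot a relative MODELDIR that resolves somewhere unexpected when the script is started from a different working directory.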
  
  Hope it helps!
  
  Reply
  - Cookiecrunch
    
    July 12, 2017 at 4:38 pm
    
    ERROR: “acmod.c”, line 83: Folder ‘C:\Python27\Lib\site-packages\pocketsphinx\model\en-us\en-us’ does not contain acoustic model definition ‘mdef’
    Traceback (most recent call last):
    File “sophierun.py”, line 326, in <module>
    sd = SpeechDetector()
    File “sophierun.py”, line 94, in __init__
    self.decoder = Decoder(config)
    File “C:\Python27\lib\site-packages\pocketsphinx\pocketsphinx.py”, line 277, in __init__
    this = _pocketsphinx.new_Decoder(*args)
    RuntimeError: new_Decoder returned -1
    
    I have changed MODELDIR and DATADIR so that they refer to the actual path of the files. I am still getting this error. How do I fix it?
    
    Reply
    - Sophie
      
      August 5, 2017 at 3:23 pm
      
      Hmm, you’re running this code on a windows machine, so I can’t fully vouch that this code will work. I can think of two things.
      
      1. The physical layout of the folders must be such that:
      .
      ├── pocketsphinx/
      ├── sphinxbase/
      └── stt.py
      Have you verified that?
      
      2. If you enter the C:\Python27\Lib\site-packages\pocketsphinx\model\en-us\en-us path in your file explorer, does it actually take you to the folder where the mdef file can be found?
      
      Hope this helps, good luck!
      
      Reply
Hector

March 14, 2017 at 7:11 am

I am working on a Mac, and although I installed pocketsphinx with pip install, it does not recognize either pocketsphinx or sphinxbase. I do not have a folder for either of them, but if I run pip freeze I do see pocketsphinx.

Reply
leoGalani

March 9, 2017 at 2:17 am

Worked like a charm 🙂

Thanks!

Reply
renato gallo

January 15, 2017 at 1:59 am

./tardis.py
Traceback (most recent call last):
File “./tardis.py”, line 166, in <module>
sd = SpeechDetector()
File “./tardis.py”, line 44, in __init__
config = Decoder.default_config()
AttributeError: type object ‘pocketsphinx.Decoder’ has no attribute ‘default_config’

Reply
Eisen

January 2, 2017 at 7:17 am

How to install pocketsphinx-python in raspbian jessie?

Reply
- Raj
  
  January 24, 2017 at 10:17 pm
  
  sudo apt-get install python-pocketsphinx
  
  Reply
Daryll

December 28, 2016 at 12:28 pm

Wow nice tut 🙂 is this working in python 3.4 ?

Reply
- Sophie
  
  December 28, 2016 at 5:11 pm
  
  Aside from a few changes to the print statements and such, the code should be python 3.4 compatible. It’s currently written for python 2.7 though.
  
  Reply
  - Daryll
    
    December 28, 2016 at 5:37 pm
    
    I had a problem following the installation process of Sphinx using Visual Studio. I followed the instructions and built it using Visual Studio 2015, but I got this error:
    TRACKER : error TRK0005: Failed to locate: “CL.exe”. The system cannot find the file specified
    
    Reply
    - Sophie
      
      December 29, 2016 at 7:08 am
      
      I wrote the above code for Ubuntu 14.04, so while it might work for Unix-based OSes like OS X or other Linux distros, I can’t say for sure how it would work with Windows.
      
      There are probably some libraries missing during the installation phase that aren’t covered in my installation instructions. You could try following the Windows install directions from the CMU Sphinx website directly to see if it’ll help with that issue. Here: http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx#windows
      
      Reply
      - Daryll
        
        December 30, 2016 at 2:41 pm
        
        I am having a hard time installing Sphinx on my 64-bit Windows machine. 🙁 And I get this error:
        
        Traceback (most recent call last):
        File “C:\Python27\pocketsphnx.py”, line 1, in <module>
        from pocketsphinx.pocketsphinx import *
        File “C:\Python27\lib\site-packages\pocketsphinx\__init__.py”, line 35, in <module>
        from sphinxbase import *
        File “C:\Python27\lib\site-packages\sphinxbase\__init__.py”, line 32, in <module>
        from .ad import *
        File “C:\Python27\lib\site-packages\sphinxbase\ad.py”, line 35, in <module>
        _ad = swig_import_helper()
        File “C:\Python27\lib\site-packages\sphinxbase\ad.py”, line 34, in swig_import_helper
        return importlib.import_module(‘_ad’)
        File “C:\Python27\lib\importlib\__init__.py”, line 37, in import_module
        __import__(name)
        ImportError: No module named _ad
      - Sophie
        
        December 30, 2016 at 3:56 pm
        
        Sorry, I’ve never done installations on Windows, so I won’t be able to help you much on that. 🙁 My suggestion would be to dual-boot or run Ubuntu 14.04/16.04 on a virtual box so you’d be able to follow the instructions as is, or Google your error to see if other people have solved it before.
      - Daryll
        
        December 30, 2016 at 5:49 pm
        
        I am running on Windows. I have followed the tutorial on how to install sphinxbase and pocketsphinx. I downloaded Visual Studio 2012 Express but still got this error: sphinx error; missing pocketsphinx module: ensure that pocketsphinx is set up correctly.
David

December 25, 2016 at 2:50 pm

Sophie,
I am working on a voice recognition project and came across your code base. I got it up and running with no problems, but was wondering if you could provide some insight into the specifics of the INFO: outputs.

I also noticed it transitions pretty quickly from Listening… to Starting the recording… to Finishing the recording. Most of the time this seems to happen in the middle of testing speech recognition, and I have to time when to speak. I also notice sometimes the output is just [SPEECH], other times just even though I was speaking, and other times when there is no noise there is speech output being displayed.

Below is some of the output.

Listening …
Starting recording of phrase
Finished recording, decoding phrase
INFO: cmn_live.c(88): Update from
INFO: cmn_live.c(105): Update to
INFO: cmn_live.c(88): Update from
INFO: cmn_live.c(105): Update to
INFO: cmn_live.c(120): Update from
INFO: cmn_live.c(138): Update to
INFO: ngram_search_fwdtree.c(1550): 24051 words recognized (32/fr)
INFO: ngram_search_fwdtree.c(1552): 2808025 senones evaluated (3784/fr)
INFO: ngram_search_fwdtree.c(1556): 19077058 channels searched (25710/fr), 489688 1st, 672158 last
INFO: ngram_search_fwdtree.c(1559): 37370 words for which last channels evaluated (50/fr)
INFO: ngram_search_fwdtree.c(1561): 1405708 candidate words for entering last phone (1894/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 6.07 CPU 0.818 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 6.09 wall 0.821 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 473 words
INFO: ngram_search_fwdflat.c(948): 16143 words recognized (22/fr)
INFO: ngram_search_fwdflat.c(950): 996651 senones evaluated (1343/fr)
INFO: ngram_search_fwdflat.c(952): 1704946 channels searched (2297/fr)
INFO: ngram_search_fwdflat.c(954): 83335 words searched (112/fr)
INFO: ngram_search_fwdflat.c(957): 45835 word transitions (61/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.57 CPU 0.077 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.57 wall 0.077 xRT
INFO: ngram_search.c(1250): lattice start node <s>.0 end node </s>.669
INFO: ngram_search.c(1276): Eliminated 1 nodes before end node
INFO: ngram_search.c(1381): Lattice has 2546 nodes, 23747 links
INFO: ps_lattice.c(1380): Bestpath score: -24784
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:669:740) = -1342034
INFO: ps_lattice.c(1441): Joint P(O,S) = -1492405 P(S|O) = -150371
INFO: ngram_search.c(1027): bestpath 0.09 CPU 0.012 xRT
INFO: ngram_search.c(1030): bestpath 0.09 wall 0.012 xRT
(‘DETECTED: ‘, [‘<s>’, ‘[SPEECH]’, ”, ”, “what’s(2)”, ‘this’, ‘and(2)’, ‘he’, ”, ”, ‘[SPEECH]’, ”, ‘was(2)’, ‘</s>’])
Listening …
Starting recording of phrase
Finished recording, decoding phrase
INFO: cmn_live.c(88): Update from
INFO: cmn_live.c(105): Update to
INFO: cmn_live.c(88): Update from
INFO: cmn_live.c(105): Update to
INFO: cmn_live.c(88): Update from
INFO: cmn_live.c(105): Update to
INFO: cmn_live.c(88): Update from
INFO: cmn_live.c(105): Update to
INFO: cmn_live.c(120): Update from
INFO: cmn_live.c(138): Update to
INFO: ngram_search_fwdtree.c(1550): 39722 words recognized (37/fr)
INFO: ngram_search_fwdtree.c(1552): 3494277 senones evaluated (3296/fr)
INFO: ngram_search_fwdtree.c(1556): 22075849 channels searched (20826/fr), 576804 1st, 1109420 last
INFO: ngram_search_fwdtree.c(1559): 60213 words for which last channels evaluated (56/fr)
INFO: ngram_search_fwdtree.c(1561): 1900155 candidate words for entering last phone (1792/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 6.89 CPU 0.650 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 6.89 wall 0.650 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 705 words
INFO: ngram_search_fwdflat.c(948): 24076 words recognized (23/fr)
INFO: ngram_search_fwdflat.c(950): 1527037 senones evaluated (1441/fr)
INFO: ngram_search_fwdflat.c(952): 2894866 channels searched (2731/fr)
INFO: ngram_search_fwdflat.c(954): 142217 words searched (134/fr)
INFO: ngram_search_fwdflat.c(957): 73064 word transitions (68/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.97 CPU 0.092 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.97 wall 0.092 xRT
INFO: ngram_search.c(1250): lattice start node <s>.0 end node </s>.1055
INFO: ngram_search.c(1276): Eliminated 0 nodes before end node
INFO: ngram_search.c(1381): Lattice has 3021 nodes, 31799 links
INFO: ps_lattice.c(1380): Bestpath score: -40103
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:1055:1058) = -2135522
INFO: ps_lattice.c(1441): Joint P(O,S) = -2386068 P(S|O) = -250546
INFO: ngram_search.c(1027): bestpath 0.15 CPU 0.014 xRT
INFO: ngram_search.c(1030): bestpath 0.15 wall 0.014 xRT
(‘DETECTED: ‘, [‘<s>’, ‘i’, ‘have’, ‘a’, ‘somewhat(2)’, ‘is’, ‘the’, ‘weather’, ‘in’, ‘now’, ”, “it’s”, ”, ”, ”, ‘just’, ‘what(2)’, ‘</s>’])

Reply
- Sophie
  
  December 25, 2016 at 5:36 pm
  
  Hey David!
  
  Glad you got the code working!
  
  The short delay is likely due to the setup_mic function on lines 52-77. While this code is being run, the mic records a sound sample for a while and sets the base threshold as the average amplitude of the sound sample. So, when the code is first started, you’d want the microphone to be at as close to a “neutral” sound level as possible. You can alter the values in that function to tune the thresholding to be better suited to your setup.
  
  Since [SPEECH] is a placeholder for a sound that the recognizer couldn’t classify, you might want to listen to the sound samples that are being recorded to see if the results make any sense. You can comment out line 150 if you want to do that. Hope this helps!
  
  Reply
  - David
    
    December 26, 2016 at 6:26 am
    
    Thanks. Can you explain what effect either increasing or decreasing the 0.2 avg value will have, along with the 3500 threshold value?
    
    I haven’t changed any of the default values yet but notice I see the following quite a bit:
    ERROR: “ngram_search.c”, line 1139: Couldn’t find <s> in first frame
    
    I also see various tags when there is an output… such as <s> or </s>. What is the significance of them, and how can I prevent those tags from being displayed?
    
    Thanks
    
    Reply
    - Sophie
      
      December 26, 2016 at 3:02 pm
      
      3500 is the minimum threshold value, so changing it will affect the lowest threshold that the mic setup method can choose (i.e., even if you’re recording in a really quiet environment, you still want some sort of threshold). If you increase the 0.2 constant, the threshold will be determined from an average over a larger share of the recorded amplitudes. So if your mic is prone to random spikes in amplitude due to noise, it would be better to increase the constant.
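
      To make the two knobs concrete, here is a simplified sketch of that kind of calculation. It assumes the average is taken over the loudest fraction of the recorded intensities and that 3500 acts as a floor; the function name is made up for illustration and the real setup_mic may differ in its details.

```python
def pick_threshold(intensities, top_fraction=0.2, minimum=3500):
    """Average the loudest top_fraction of samples, never going below minimum."""
    if not intensities:
        return minimum
    loudest = sorted(intensities, reverse=True)
    count = max(1, int(len(loudest) * top_fraction))
    average = sum(loudest[:count]) / float(count)
    return max(minimum, average)


# Made-up intensity readings standing in for what the mic would report.
quiet_room = [100, 120, 90, 110, 95, 105, 115, 98, 102, 108]
noisy_room = [4000, 4200, 3900, 5000, 4100, 4300, 6000, 4050, 3950, 4150]

print(pick_threshold(quiet_room))                    # floor applies
print(pick_threshold(noisy_room))                    # average of the 2 loudest
print(pick_threshold(noisy_room, top_fraction=0.5))  # average of the 5 loudest
```

      With the larger fraction, the single 6000 spike has less pull on the result, which is the effect described above.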
      
      As for the tags, a single word may have different pronunciations. So when you see something like was(2), it’s likely referring to pronunciation 2 in the word dictionary. You could manually strip these tags using a list comprehension.
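
      Something along these lines should do it. This is only a sketch: the function name is made up, and it assumes the marker tokens you want to drop are empty strings or start with < or [ (as in the DETECTED output above).

```python
import re


def clean_words(words):
    """Drop empty or bracketed marker tokens and strip (n) pronunciation suffixes."""
    return [
        re.sub(r"\(\d+\)$", "", word)
        for word in words
        if word and word[0] not in "<["
    ]


detected = ["<s>", "[SPEECH]", "", "what's(2)", "this", "and(2)", "he", "was(2)", "</s>"]
print(clean_words(detected))
```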
      
      Reply
- Daryll
  
  December 28, 2016 at 12:30 pm
  
  Sir David,
  Can you show us how you do the voice recognition? I am new to python and I am planning on building my own AI. Hope you could help me thanks 🙂
  
  Reply
  - David
    
    December 29, 2016 at 3:09 am
    
    Daryll,
    
    I apologize for my use of ‘voice recognition’; I meant speech recognition… there is a big difference.
    
    I am not focusing on having the system differentiate between physical human speakers… my focus is on having the system correctly interpret and execute tasks based on human speech input.
    
    Sorry for any confusion.
    -David
    
    Reply
David

December 18, 2016 at 4:02 am

Sophie,
I’m running into an error and am curious to know if there is a pocketsphinx-python version that will run with python 3.0-3.5.

I have existing 3.x functionality but when I attempt to incorporate pocketsphinx-python I get the following error:
ImportError: //pocketsphinx-python/sphinxbase/_ad.so: undefined symbol: PyInstance_Type

I’m an old Java programmer and when I say old, I’m talking about JDK version 1.4 and I’m not familiar with C. From what I’ve found based on searches I think the issue is with the python version the .so file was created against. If I change to version 2.x it will work but my existing code won’t.

Any input would be appreciated.

Reply
Harshit

December 16, 2016 at 1:27 am

Hey Sophie,

I am using ubuntu14.04 and python 2.7 and have installed pocketsphinx using `sudo apt-get install python-pocketsphinx` but i am getting the error: `no module named pocketsphinx` in the third line.

Is there a way out?

Reply
- Sophie
  
  December 25, 2016 at 5:30 pm
  
  Hmm, did you follow the instructions in full on the github readme? You might need to use pip to get the correct paths set.
  
  There are several things that need to be installed for pocketsphinx to be imported correctly:
  sudo apt-get install -y python python-dev python-pip build-essential swig git
  sudo pip install pocketsphinx
  
  Reply
  - Hasib
    
    January 31, 2018 at 5:22 pm
    
    While executing “build-essential swig git” the following error is shown
    build-essential: command not found
    but build-essential and swig all are installed
    
    Reply
Rob

December 5, 2016 at 9:03 pm

Hi Sophie, thanks so much for the share! I have some issues if you do not mind taking a look at:

runtimeerror: new decoder returned -1

Any ideas?

Many thanks

Reply
- Sophie
  
  December 6, 2016 at 5:07 pm
  
  Hi Rob,
  
  On an initial guess it may be because your folders are not organized correctly or you didn’t correctly install all the modules. The physical layout of the folders must be such that:
  .
  ├── pocketsphinx/
  ├── sphinxbase/
  └── stt.py
  
  What operating system are you running this code on? I’ve only tested it on Ubuntu 14.04 using Python 2.7
  
  Reply
John

November 14, 2016 at 6:53 am

hello,

I’m getting an invalid sample rate error when I run it with a rate of 16000. It works with my mic’s default sampling rate of 48000 but cannot recognize words. Could it be a pyaudio issue? How do I configure it to work with this script?

Traceback (most recent call last):
File “/home/pi/stt.py”, line 174, in <module>
sd.run()
File “/home/pi/stt.py”, line 121, in run
self.setup_mic()
File “/home/pi/stt.py”, line 70, in setup_mic
frames_per_buffer=self.CHUNK)
File “build/bdist.linux-armv7l/egg/pyaudio.py”, line 750, in open
stream = Stream(self, *args, **kwargs)
File “build/bdist.linux-armv7l/egg/pyaudio.py”, line 441, in __init__
self._stream = pa.open(**arguments)
IOError: [Errno -9997] Invalid sample rate

Reply
- Sophie
  
  November 26, 2016 at 4:25 am
  
  Hey there,
  
  You probably knew this, but the default Sphinx acoustic model expects audio sampled at 16kHz, which is why it can’t recognize words when you record at 48kHz. 16000 shouldn’t be an invalid sampling rate, so it probably has something to do with how your mic is set up. I did some looking around, and it seems like people at the RaspberryPi forums have had the same issues:
  
  https://www.raspberrypi.org/forums/viewtopic.php?t=63136&p=468103
  https://www.raspberrypi.org/forums/viewtopic.php?f=37&t=97702
  
  Alternatively, have you tried using a different microphone? It might give you better results. Good luck!
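
  If the mic really only supports 48000, one possible workaround is to record at 48kHz and convert the audio down to 16kHz before decoding. Below is a rough Python 3 sketch, assuming 16-bit mono samples in the machine’s native byte order. The function name is made up for illustration, and averaging each group of three samples is a crude filter, not a proper resampler, so treat it as a starting point only.

```python
import array


def downsample_by_3(pcm_bytes):
    """Turn 16-bit mono PCM at 48 kHz into 16 kHz by averaging groups of 3 samples."""
    samples = array.array("h")  # "h" = signed 16-bit integers
    samples.frombytes(pcm_bytes)
    usable = len(samples) - len(samples) % 3  # drop a trailing partial group
    out = array.array("h", (
        (samples[i] + samples[i + 1] + samples[i + 2]) // 3
        for i in range(0, usable, 3)
    ))
    return out.tobytes()


# Ten made-up samples: three full groups plus one leftover sample.
source = array.array("h", [0, 3, 6, 30, 30, 30, -3, -3, -3, 7]).tobytes()
converted = array.array("h")
converted.frombytes(downsample_by_3(source))
print(list(converted))
```

  Each chunk read from the stream would go through this function before being buffered for the decoder, so the decoder still sees 16kHz audio.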
  
  Reply
  - vishwas
    
    May 18, 2018 at 6:55 am
    
    How should I get a sampling rate of 16k, or is there another method to take input?
    
    Reply
Anup

November 3, 2016 at 9:49 am

Hello,
I’m getting the following error when i’m trying to run your script.

slid_win = deque(maxlen=self.SILENCE_LIMIT * rel)
TypeError: an integer is required

Could you please help

Thanks,
Anup

Reply
- Sophie
  
  November 3, 2016 at 2:43 pm
  
  If you’re using Python 3, the / operator performs true division, so rel is a float rather than an integer when self.RATE/self.CHUNK is calculated, and deque requires an integer maxlen.
  
  Replace that line with this:
  slid_win = deque(maxlen=self.SILENCE_LIMIT * int(rel))
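
  A small standalone illustration of the same fix (Python 3). The constant values are assumed for the example rather than copied from the script. Casting the whole product also covers a fractional multiplier such as a PREV_AUDIO of 0.5.

```python
from collections import deque

# Assumed values, for illustration only.
RATE = 16000
CHUNK = 1024
SILENCE_LIMIT = 1
PREV_AUDIO = 0.5

rel = RATE / CHUNK  # true division in Python 3, so this is a float

# deque(maxlen=rel) would raise TypeError because maxlen must be an integer,
# so cast the full expression before handing it to deque.
slid_win = deque(maxlen=int(SILENCE_LIMIT * rel))
prev_audio = deque(maxlen=int(PREV_AUDIO * rel))

print(rel, slid_win.maxlen, prev_audio.maxlen)
```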
  
  Reply
Rodrigo

October 10, 2016 at 8:55 am

Hi Sophie, as you said, I tried installing the packages again.
Now, when I’m installing pocket…python, I get an error at the end of the output:

~/pocketsphinx-python $ sudo python setup.py install
running install
running bdist_egg
running egg_info
writing pocketsphinx.egg-info/PKG-INFO
writing top-level names to pocketsphinx.egg-info/top_level.txt
writing dependency_links to pocketsphinx.egg-info/dependency_links.txt
error: package directory ‘pocketsphinx/swig/python’ does not exist

“error: package directory ‘pocketsphinx/swig/python’ does not exist” But this directory already exists. My directory structure is:

/home/user/pocketsphinx
/home/user/sphinxbase
/home/user/pocketsphinx-python

Is it wrong?

If I try to run the script, I get this:

$ python sample.py
Traceback (most recent call last):
File “sample.py”, line 3, in <module>
from pocketsphinx.pocketsphinx import *
File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/__init__.py”, line 37, in <module>
from pocketsphinx import *
File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 42, in <module>
_pocketsphinx = swig_import_helper()
File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 38, in swig_import_helper
_mod = imp.load_module(‘_pocketsphinx’, fp, pathname, description)
ImportError: libpocketsphinx.so.3: cannot open shared object file: No such file or directory

Reply
- Sophie
  
  October 10, 2016 at 4:43 pm
  
  For the first one: Is there a reason you’re not using pip? Since it’s the package installer for python, it might be easier to install pocketsphinx-python that way. Your directory structure is correct though. The error is saying you don’t have swig installed, did you follow all of the install instructions?
  
  sudo apt-get install -y python python-dev python-pip build-essential swig git
  sudo pip install pocketsphinx
  
  For the second: That’s the same issue you had previously right? That you solved by exporting the LD_LIBRARY_PATH?
  export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
  
  Reply
  - Rodrigo
    
    October 11, 2016 at 1:58 am
    
    I tried pip and git clone; swig is already installed.
    I just forgot about the export =/
    
    But even after doing this, I get errors:
    
    Traceback (most recent call last):
    File “sample.py”, line 162, in <module>
    sd.run()
    File “sample.py”, line 109, in run
    self.setup_mic()
    File “sample.py”, line 58, in setup_mic
    frames_per_buffer=self.CHUNK)
    File “/usr/local/lib/python2.7/dist-packages/pyaudio.py”, line 750, in open
    stream = Stream(self, *args, **kwargs)
    File “/usr/local/lib/python2.7/dist-packages/pyaudio.py”, line 441, in __init__
    self._stream = pa.open(**arguments)
    IOError: [Errno -9996] Invalid input device (no default output device)
    
    Reply
    - Sophie
      
      October 11, 2016 at 4:35 am
      
      Okay, so that error is saying that it can’t find any of your microphones.
      You can check to see if the microphone is working outside of the script by following these instructions.
      
      If you have multiple input devices, you might have to modify line 117 so that PyAudio initializes with the correct microphone. You can take a look at the documentation here.
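
      To see which device index to use, a sketch like the one below can list the devices that are able to record. The two helper names are made up for illustration; the PyAudio calls (get_device_count, get_device_info_by_index, terminate) are the library’s own. The demo at the bottom runs on fake device entries so it works without a microphone attached.

```python
def input_devices(device_infos):
    """Return (index, name) pairs for devices that can record audio."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def query_pyaudio_devices():
    """Collect PyAudio's device info dicts (requires PyAudio to be installed)."""
    import pyaudio  # imported here so the filter above works without PyAudio

    p = pyaudio.PyAudio()
    try:
        return [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
    finally:
        p.terminate()


# Fake entries for demonstration; on a real machine, call
# input_devices(query_pyaudio_devices()) instead.
fake_devices = [
    {"index": 0, "name": "HDMI output", "maxInputChannels": 0},
    {"index": 1, "name": "USB microphone", "maxInputChannels": 1},
]
print(input_devices(fake_devices))
```

      The chosen index can then be passed as the input_device_index argument when the PyAudio stream is opened.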
      
      Reply
    - tushar
      
      June 1, 2017 at 2:07 pm
      
      Hey, I am running CentOS 7 and having the same problem. Did you sort yours out?
      
      Reply
Rodrigo

October 7, 2016 at 5:26 pm

Is there a way to change the language to Portuguese?

Reply
- Rodrigo
  
  October 7, 2016 at 5:31 pm
  
  I’m getting this error:
  
  $ python sample.py
  Traceback (most recent call last):
  File “sample.py”, line 3, in <module>
  from pocketsphinx.pocketsphinx import *
  File “sphinxbase.pxd”, line 150, in init pocketsphinx (pocketsphinx.c:7935)
  ValueError: PyCapsule_GetPointer called with invalid PyCapsule object
  
  Reply
  - Sophie
    
    October 8, 2016 at 2:13 am
    
    That appears to be an error internal to Python or Cython. I need a bit more information:
    
    What version of python are you using?
    Are you using Ubuntu? Or another operating system?
    
    On a first pass, it appears that you’ll need to do a reinstall after ensuring that you’ve configured Cython correctly. Maybe these instructions can help?
    
    Reply
    - Rodrigo
      
      October 8, 2016 at 5:59 am
      
      Hi, thanks for your answer!!
      After trying your tip, I get this error:
      
      $ python teste.py
      Traceback (most recent call last):
      File “teste.py”, line 3, in <module>
      from pocketsphinx.pocketsphinx import *
      File “sphinxbase.pxd”, line 150, in init pocketsphinx (pocketsphinx.c:7934)
      File “/usr/local/lib/python2.7/dist-packages/sphinxbase/__init__.py”, line 37, in <module>
      from sphinxbase import *
      File “/usr/local/lib/python2.7/dist-packages/sphinxbase/sphinxbase.py”, line 42, in <module>
      _sphinxbase = swig_import_helper()
      File “/usr/local/lib/python2.7/dist-packages/sphinxbase/sphinxbase.py”, line 38, in swig_import_helper
      _mod = imp.load_module(‘_sphinxbase’, fp, pathname, description)
      ImportError: libsphinxbase.so.3: cannot open shared object file: No such file or directory
      
      My Python version is:
      $ python --version
      Python 2.7.6
      
      I’m using Linux Mint:
      $ uname -a
      Linux LinuxMint 3.19.0-32-generic #37~14.04.1-Ubuntu SMP Thu Oct 22 09:41:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
      
      Thank you so much.
      
      Reply
      - Rodrigo
        
        October 8, 2016 at 6:08 am
        
        Ok, this last one I resolved with this:
        $ export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
        
        But now i’m getting this:
        
        $ python teste.py
        Traceback (most recent call last):
        File “teste.py”, line 3, in <module>
        from pocketsphinx.pocketsphinx import *
        File “sphinxbase.pxd”, line 150, in init pocketsphinx (pocketsphinx.c:7934)
        ValueError: sphinxbase.NGramModel has the wrong size, try recompiling
        
        =/ this script does not like me
      - Sophie
        
        October 9, 2016 at 5:19 am
        
        That does seem to be a Cython issue, and that would be internal to the pocketsphinx library–not the script that I posted. (You can tell because it’s failing at the import step before getting to any of the actual code :P)
        
        Have you tried uninstalling all the sphinx libraries and reinstalling? It’s kind of annoying, but it might fix your problem.
- Sophie
  
  October 8, 2016 at 2:11 am
  
  You can change the language settings by changing the configs you set. Modify lines 45-47 to use the language files that you’d like. (In this case, Portuguese).
  
  Of course, you’ll need to have a language model for that language for this to work. Take a look at the information provided here for implementation details.
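
  As a rough sketch of what that change amounts to: the decoder needs an acoustic model folder, a language model file and a dictionary file, which PocketSphinx takes through its -hmm, -lm and -dict options. The helper name and the pt-br file names below are hypothetical placeholders; use whatever names your downloaded model actually ships with.

```python
import os


def language_config(modeldir, hmm, lm, dictionary):
    """Map the three decoder options to paths under modeldir."""
    return {
        "-hmm": os.path.join(modeldir, hmm),          # acoustic model folder
        "-lm": os.path.join(modeldir, lm),            # language model file
        "-dict": os.path.join(modeldir, dictionary),  # pronunciation dictionary
    }


# Hypothetical names for a Portuguese model, for illustration only.
options = language_config(
    "pocketsphinx/model",
    "pt-br/pt-br",
    "pt-br/pt-br.lm.bin",
    "pt-br/pt-br.dict",
)
for option, path in sorted(options.items()):
    print(option, path, "(found)" if os.path.exists(path) else "(missing)")
```

  Each option and path pair would then be applied to the decoder config in place of the English ones. Checking for missing paths first avoids the “does not contain acoustic model definition” error that shows up elsewhere in these comments.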
  
  Reply
Amitava

September 12, 2016 at 6:26 am

Hi,
I get the following error when run the above code (F5)

Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:19:22) [MSC v.1500 32 bit (Intel)] on win32
Type “copyright”, “credits” or “license()” for more information.
>>>
=================== RESTART: C:\Sphinx\project\stt\stt.py ===================

Traceback (most recent call last):
File “C:\Sphinx\project\stt\stt.py”, line 166, in <module>
sd = SpeechDetector()
File “C:\Sphinx\project\stt\stt.py”, line 50, in __init__
self.decoder = Decoder(config)
File “C:\Python27\lib\site-packages\pocketsphinx\pocketsphinx.py”, line 277, in __init__
this = _pocketsphinx.new_Decoder(*args)
RuntimeError: new_Decoder returned -1
>>>

Reply
- Sophie
  
  September 12, 2016 at 4:23 pm
  
  I’ll need a bit more info, but on an initial guess it’s because your folders are not organized correctly or you didn’t correctly install all the modules. The physical layout of the folders must be such that:
  .
  ├── pocketsphinx/
  └── sphinxbase/
  
  Also, you’re running this code on a Windows machine. I’ve only tested this code on Linux, so there are no guarantees that it will work on a different operating system, because the drivers and system architecture are different.
  
  Reply
  - Amitava
    
    September 13, 2016 at 5:16 am
    
    The folders are organized as
    C:\Sphinx\project\stt
    ├── pocketsphinx/
    ├── sphinxbase/
    └── stt.py
    
    where stt.py is the above source file.
    I manually copied all 7 files (pocketsphinx.dll, pocketsphinx.dll, .. etc.) from C:\Sphinx\pocketsphinx\bin\Release\Win32 to the above directory:
    C:\Sphinx\project\stt\pocketsphinx
    
    Did similar thing for sphinxbase also.
    
    I used “pip install pocketsphinx” to install pocketsphinx, and the Installation was successful:
    Collecting pocketsphinx
    Downloading pocketsphinx-0.1.3-cp27-cp27m-win32.whl (29.0MB)
    100% |################################| 29.0MB 47kB/s
    Installing collected packages: pocketsphinx
    Successfully installed pocketsphinx-0.1.3
    
    The imports were also fine
    from pocketsphinx.pocketsphinx import *
    from sphinxbase.sphinxbase import *
    
    Could it be a 64 bit vs 32 bit issue?
    
    Reply
    - Sophie
      
      September 13, 2016 at 7:24 am
      
      I am tempted to believe that it is the 64bit vs 32bit — I’m not sure how well Sphinx works on a Windows computer, but you could install Ubuntu on a virtual machine and run this code on Linux that way.
      
      Reply
      - Amitava
        
        September 13, 2016 at 1:18 pm
        
        Hi Sophie,
        
        Thanks for your time and feedback. But you think my folder structure and the way I filled the pocketsphinx and sphinxbase folders above are correct, right?
      - Sophie
        
        September 16, 2016 at 12:43 pm
        
        Yep, they look alright to me! If you have a chance, try the installation process via an Ubuntu install (either as a native OS or as a virtual OS).

sophie's blog

Let's not use sledgehammers to turn screws.
