Python speech to text with PocketSphinx

I’ve wanted to use speech detection in my personal projects for the longest time, but the Google API has gotten more and more restrictive over time. To make sure my projects could work even without an internet connection, I looked for another speech recognition package, preferably one that was easier to use. I found CMU’s Sphinx speech recognition suite to be a really great speech-to-text package. However, documentation and sample code are nearly non-existent, so it took me forever to get anything done. Finally, I’ve figured it out! The example code is at the bottom of this post, but you can also download it directly from GitHub.

Here are the steps to take to get this working:

  1. Download SphinxBase and follow the install instructions
  2. Download PocketSphinx and follow the install instructions
  3. Download PocketSphinx-python and follow the install instructions
  4. Run the code below


The main problem I had with setting up PocketSphinx was the myriad of libraries that the main site told me to download. However, after lots of trial and error, I realized that I really only need three:

  • SphinxBase is the base package that all of the other Sphinx programs use
  • PocketSphinx is the lightweight recognizer. I picked it because I was okay with the program being a bit inaccurate if it meant I could decode phrases faster
  • PocketSphinx-python is the wrapper to allow us to program in the best scripting language ever.

The code basically sets up the microphone and saves each detected phrase as a temporary .wav file, which the Sphinx decoder then translates into a list of strings representing the spoken words. A phrase is defined as a stretch of sound sandwiched between periods of silence. I stole most of the phrase detection code from someone else two years ago, though unfortunately, I can’t remember who. If you’re reading this, thank you! 🙂

Anyhow, in the initialization of the run loop, we first measure the ambient noise to decide the minimum intensity threshold that separates sound from “silence”. Then we launch into an infinite loop that keeps listening to the microphone and calls the Sphinx decoder whenever a phrase has been saved. Phrase detection looks at a sliding window of recent intensity values rather than a single reading, which makes it a bit more robust. You can load different voice recognition models into the decoder config if you want this speech recognition code to work for different languages.
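
To make the phrase detection loop concrete, here is a minimal sketch of the idea that runs on a plain list of per-chunk loudness values instead of a live microphone. It is an illustration under assumptions, not the code from this post: segment_phrases and its parameter names are made up for the example.

```python
from collections import deque


def segment_phrases(intensities, threshold, silence_chunks, prev_chunks):
    """Group chunk indices into phrases separated by silence.

    intensities    -- one loudness value per audio chunk
    threshold      -- values above this count as sound
    silence_chunks -- size of the sliding window; a phrase ends once every
                      value in the window is at or below the threshold
    prev_chunks    -- how many quiet chunks to keep in front of a phrase
    """
    window = deque(maxlen=int(silence_chunks))
    prev = deque(maxlen=int(prev_chunks))
    current, phrases, started = [], [], False

    for index, level in enumerate(intensities):
        window.append(level)
        if any(x > threshold for x in window):
            # Sound somewhere in the window: keep recording.
            started = True
            current.append(index)
        elif started:
            # The window is fully quiet again: close the phrase.
            phrases.append(list(prev) + current)
            started = False
            window.clear()
            prev.clear()
            current = []
        else:
            # Still silent: remember a little audio to prepend later.
            prev.append(index)

    return phrases
```

With a threshold of 5, a two-chunk silence window and one chunk of lead-in audio, segment_phrases([0, 0, 0, 9, 9, 0, 0, 0, 0], 5, 2, 1) groups chunks 2 through 5 into a single phrase. The chunk of lead-in audio is what keeps the start of a word from being chopped off.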

Now that I have this speech detection code in a neat little importable class, I’m really excited about future capabilities of my projects. So many ideas, so little time!
-Sophie

[Addendum] Thanks to Carl at jazzystring1@gmail.com for getting this code working with Python3!

128 Comments

  1. Auslander

    Years later, and your post just got me running on an offline, non-Google-API based speech transcription project that would have taken me a week to do manually. Thank you.

  2. Mainak Biswas

    * Mic set up and listening.
    Traceback (most recent call last):
    File “test2.py”, line 161, in <module>
    sd.run()
    File “test2.py”, line 122, in run
    slid_win = deque(maxlen=self.SILENCE_LIMIT * rel)
    TypeError: an integer is required

    I’m getting a type error. I tried to typecast, but the error is still there.

  3. Mikener

    Hi Sophie,

    great work you have done. Maybe you can help me:

    * Mic set up and listening.

    Nothing happens after that.
    Setup: Python 2.7, pyaudio 0.2.11, Raspberry Pi 3 B.

    I tried several values for self.INPUT_DEVICE_INDEX in the SpeechDetector class, but nothing changed.

    Attached USB-Mic works perfectly with “pocketsphinx_continuous” -command.

    • Mikener

      Okay, some debug infos with logging-object:

      INFO:TestLogger:INITIALIZED
      INFO:TestLogger:Getting intensity values from mic.
      INFO:TestLogger:r-value: 2181.74425632
      INFO:TestLogger: Finished
      INFO:TestLogger:cur_data:
      INFO:TestLogger:x in slid_win: deque([0.0], maxlen=15)
      INFO:TestLogger:cur_data:
      INFO:TestLogger:x in slid_win: deque([0.0, 0.0], maxlen=15)
      ……

      The r-value changes when I speak into the mic during setup_mic(),
      but it seems there is nothing in cur_data.

      • Sophie

        During the setup_mic() phase, the mic is active and listening to the default intensity values (sound level) of the room. It’ll then set that level as the trigger value for when it’ll start recording speech. So, during the setup phase, try to have the mic be in a fairly quiet room so it’ll trigger during the actual detection phase. Let me know if that helps! 🙂
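
The calibration step can be sketched on its own. The docstring of setup_mic says the ambient level is the average of the loudest 0.2 of the sampled intensities; the sketch below does that on a plain list of numbers. The function names, the margin and the floor are assumptions for illustration, not values taken from the post's code.

```python
def ambient_level(samples, top_fraction=0.2):
    """Average the loudest `top_fraction` of the sampled intensities."""
    count = max(1, int(len(samples) * top_fraction))
    loudest = sorted(samples, reverse=True)[:count]
    return sum(loudest) / count


def pick_threshold(samples, margin=100.0, floor=0.0):
    """Trigger level: ambient level plus a margin, never below `floor`.

    `margin` and `floor` are illustrative values, not tuned constants.
    """
    return max(floor, ambient_level(samples) + margin)
```

If someone talks during calibration, the loudest samples are speech, the threshold lands above normal speaking volume, and the detection loop never triggers. That is why a quiet room matters during setup.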

  4. gundu

    Hi,
    I am running my code on Windows. After a few corrections in my code, I am able to execute the Python script, and here is the output (for a few runs). I want to print the grammar and the spoken words as well. How do I get them?
    Mic set up and listening.
    Starting recording of phrase
    Finished recording, decoding phrase
    DETECTED: [‘‘, ‘[SPEECH]’]
    Listening …
    Starting recording of phrase
    Finished recording, decoding phrase
    DETECTED: [‘‘, ‘‘]
    Listening …
    Starting recording of phrase
    Finished recording, decoding phrase
    DETECTED: [‘‘, ‘ugh’, ‘‘]
    Listening …
    Starting recording of phrase
    Finished recording, decoding phrase
    DETECTED: [‘‘, ‘[SPEECH]’, ‘‘]
    Listening …
    Starting recording of phrase
    Finished recording, decoding phrase
    DETECTED: [‘‘, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘ugh’, ‘‘]
    Listening …
    Starting recording of phrase
    Finished recording, decoding phrase
    DETECTED: [‘‘, ‘[SPEECH]’, ”, ‘and’, ‘[SPEECH]’, ‘‘]
    Listening …
    Starting recording of phrase
    Finished recording, decoding phrase
    DETECTED: [‘‘, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ”, ‘[SPEECH]’, ‘and’, ‘[SPEECH]’, ”, ‘that’, ”, ‘that’, ”, ‘bad’, ”, ‘‘]

    • Sophie

      I’m not sure about grammar, but the [SPEECH] bracket means that the decoder couldn’t interpret the words that it was hearing. I’d suggest retraining the decoder, or listening to the recorded voice files to see if there’s an issue with the clarity of the sound file you’re passing to the decoder.
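
If only the recognized words are wanted, the marker tokens and empty strings can be filtered out of the decoder's list after the fact. This is a small sketch, assuming that markers are always wrapped in square or angle brackets (like '[SPEECH]' above); spoken_words is a hypothetical helper name.

```python
def spoken_words(tokens):
    """Drop empty strings and marker tokens such as '[SPEECH]' or '<sil>'."""
    words = []
    for token in tokens:
        token = token.strip()
        if not token:
            continue
        if token[0] in "[<" and token[-1] in "]>":
            # Bracketed tokens are treated as markers, not as words.
            continue
        words.append(token)
    return words
```

For the output above, spoken_words(['', '[SPEECH]', '', 'and', '[SPEECH]', '']) keeps only 'and'.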

  5. RookieConverter

    Hello Sophie,

    Like you said, I followed all the instructions you mentioned above:
    I have downloaded and compiled both sphinxbase & pocketsphinx inside a folder.
    Now when I try to run your program, it does not do anything. Sorry, I don’t know what I am doing wrong.

    using Windows and Python 3.5.4

    Also i am not able to find the below folders:
    DATADIR = “C:\Python34\Lib\site-packages\pocketsphinx\test\data”

    I am totally new to Python and I am running your code using IDLE. There are no errors as such, but I expected the terminal to show a message like “* Mic set up and listening.”, as specified in your run(self) function.

    Please Help 🙂 !!

  6. Carl David

    To those who are encountering the “new Decoder returned -1 error”, fix the path location of your model in line 40 and 41 🙂

  7. Carl David

    Thanks Sophie for your amazing post 🙂 It helps a lot. To those who are having a hard time running this code in Python 3+ (3.6 specifically) due to big changes to its core, here’s the code

    from pocketsphinx.pocketsphinx import *
    from sphinxbase.sphinxbase import *

    import os
    import pyaudio
    import wave
    import audioop
    from collections import deque
    import time
    import math

    """
    Written by Sophie Li, 2016
    http://blog.justsophie.com/python-speech-to-text-with-pocketsphinx/
    """

    class SpeechDetector:
        def __init__(self):
            # Microphone stream config.
            self.CHUNK = 1024  # CHUNKS of bytes to read each time from mic
            self.FORMAT = pyaudio.paInt16
            self.CHANNELS = 1
            self.RATE = 16000

            # Silence limit in seconds. The max amount of seconds where
            # only silence is recorded. When this time passes the
            # recording finishes and the file is decoded.
            self.SILENCE_LIMIT = 1

            # Previous audio (in seconds) to prepend. When noise
            # is detected, how much of previously recorded audio is
            # prepended. This helps to prevent chopping the beginning
            # of the phrase.
            self.PREV_AUDIO = 0.5

            self.THRESHOLD = 4500
            self.num_phrases = -1

            # These will need to be modified according to where the pocketsphinx folder is
            MODELDIR = "pocketsphinx/model"
            DATADIR = "pocketsphinx/test/data"

            # Create a decoder with certain model
            config = Decoder.default_config()
            config.set_string('-hmm', os.path.join(MODELDIR, 'en-us/en-us'))
            config.set_string('-lm', os.path.join(MODELDIR, 'en-us/en-us.lm.bin'))
            config.set_string('-dict', os.path.join(MODELDIR, 'en-us/cmudict-en-us.dict'))

            # Creates decoder object for streaming data.
            self.decoder = Decoder(config)

        def setup_mic(self, num_samples=50):
            """ Gets average audio intensity of your mic sound. You can use it to get
                average intensities while you're talking and/or silent. The average
                is the avg of the .2 of the largest intensities recorded.
            """
            print("Getting intensity values from mic.")
            p = pyaudio.PyAudio()
            stream = p.open(format=self.FORMAT,
                            channels=self.CHANNELS,
                            rate=self.RATE,
                            input=True,
                            frames_per_buffer=self.CHUNK)

            values = [math.sqrt(abs(audioop.avg(stream.read(self.CHUNK), 4)))
                      for x in range(num_samples)]
            values = sorted(values, reverse=True)
            r = sum(values[:int(num_samples * 0.2)]) / int(num_samples * 0.2)
            print(" Finished ")
            print(" Average audio intensity is %s " % r)
            stream.close()
            p.terminate()

    if r self.THRESHOLD for x in slid_win]) > 0:
                    if started == False:
                        print("Starting recording of phrase")
                        started = True
                    audio2send.append(cur_data)

                elif started:
                    print("Finished recording, decoding phrase")
                    filename = self.save_speech(list(prev_audio) + audio2send, p)
                    r = self.decode_phrase(filename)
                    print("DETECTED: %s" % r)

                    # Removes temp audio file
                    os.remove(filename)
                    # Reset all
                    started = False
                    slid_win = deque(maxlen=int(self.SILENCE_LIMIT * rel))
                    prev_audio = deque(maxlen=int(0.5 * rel))
                    audio2send = []
                    print("Listening ...")

                else:
                    prev_audio.append(cur_data)

            print("* Done listening")
            stream.close()
            p.terminate()

    if __name__ == "__main__":
        sd = SpeechDetector()
        sd.run()

    • Sophie

      Hey Carl,

      Thanks for getting the code working with Python 3! I’ll add your code as an addendum to my post if that’s ok.

      • MCC

        Hi Carl,

        Thanks for sharing your fantastic input.

        Can you please re-post your code, as I am running into a few error messages when executing it?

        Sophie – Well done for your input as well.

        Thanks,

        • Sophie

          Hi MCC, apologies for the late reply. Have you worked through the issues in your code? I did add Carl’s python3.x implementation to the bottom of the post.

  8. majo

    Hi Sophie! I hope you know there are people all over the world trying to run your code.

    I have a similar issue: when I do the cast that you suggest, another error comes up. I can’t believe that Python changed this much between 2.7 and 3.6.

    * Mic set up and listening.
    Starting recording of phrase
    Finished recording, decoding phrase
    Traceback (most recent call last):
    File “sophie.py”, line 167, in <module>
    sd.run()
    File “sophie.py”, line 145, in run
    filename = self.save_speech(list(prev_audio) + audio2send, p)
    File “sophie.py”, line 86, in save_speech
    data = ''.join(data)
    TypeError: sequence item 0: expected str instance, bytes found

    Do you have any suggestion to fix this one?

    • Sophie

      Hi Majo,

      I’m happy that my code has helped!
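      Try joining the chunks as bytes instead. In Python 3 the audio data read from the stream is bytes, so on line 86 the separator has to be bytes too: data = b"".join(data).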
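
A short sketch of why the separator type matters in Python 3. The chunk values are made up, and join_chunks is a hypothetical stand-in for the join on line 86, not the post's save_speech function.

```python
def join_chunks(chunks):
    """Concatenate raw audio chunks read from the microphone stream."""
    # A bytes separator keeps the result as bytes, ready to write to a .wav.
    return b"".join(chunks)


chunks = [b"\x01\x00", b"\x02\x00"]
audio = join_chunks(chunks)

# Calling str() on bytes gives the printable representation, not the audio:
as_text = str(chunks[0])
```

Here as_text is the text of the literal (it starts with the letter b and a quote), so writing it to a file would store characters describing the data instead of the samples themselves.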

  9. Robyn

    Sorry to bother you again, but after switching to Anaconda(because I was told it was the best program for beginners such as myself), things are going a bit more smoothly, but I keep getting this error:

    runfile(‘C:/Users/ccatx/Downloads/pystuff/Lib/site-packages/deathtrial1.py’, wdir=’C:/Users/ccatx/Downloads/pystuff/Lib/site-packages’)
    Traceback (most recent call last):

    File “”, line 1, in
    runfile(‘C:/Users/ccatx/Downloads/pystuff/Lib/site-packages/deathtrial1.py’, wdir=’C:/Users/ccatx/Downloads/pystuff/Lib/site-packages’)

    File “C:\Users\ccatx\Downloads\pystuff\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 705, in runfile
    execfile(filename, namespace)

    File “C:\Users\ccatx\Downloads\pystuff\lib\site-packages\spyder\utils\site\sitecustomize.py”, line 102, in execfile
    exec(compile(f.read(), filename, ‘exec’), namespace)

    File “C:/Users/ccatx/Downloads/pystuff/Lib/site-packages/deathtrial1.py”, line 164, in <module>
    sd = SpeechDetector()

    File “C:/Users/ccatx/Downloads/pystuff/Lib/site-packages/deathtrial1.py”, line 48, in __init__
    self.decoder = Decoder(config)

    File “C:\Users\ccatx\Downloads\pystuff\Lib\site-packages\pocketsphinx\pocketsphinx.py”, line 275, in __init__
    this = _pocketsphinx.new_Decoder(*args)

    RuntimeError: new_Decoder returned -1

    I have redirected the MODELDIR and DATADIR to what I believe are the right paths, and I have put sphinxbase in my pocketsphinx folder (I do not seem to have a stt.py file anywhere on my computer, so I have not added that), but neither has worked.

    I doubt this affects anything, but just in case it does, I get a warning sign next to the first two lines that read:

    ‘from pocketsphinx import *’ used; unable to detect undefined names
    ‘from sphinxbase import *’ used; unable to detect undefined names

    Thanks in advance!

    • Sophie

      Hi Robyn,

      Wow, this is a late reply–but better late than never?

      The error at the bottom is because you’re using wildcard imports, which the Flake8 Python style checker doesn’t like. It’s a style issue, so that shouldn’t have anything to do with the errors you’re seeing.

      One thing that catches my attention is the direction of the / and \ when referring to the file directories. Windows usually uses the backslash “\” and Unix uses the forward slash “/”, which is what I wrote this code with. You could try changing the direction of the slashes so they fit? If you do, write the paths as raw strings, since Python would otherwise read the “\t” in “\tools” and “\test” as a tab character. I’m not actually sure this is the cause, since I’ve never used a Windows computer before.

      MODELDIR = r"..\..\tools\pocketsphinx\model"
      DATADIR = r"..\..\tools\pocketsphinx\test\data"
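
Backslashes in Windows paths need some care inside Python string literals. The sketch below shows the pitfall and two ways around it; the directory names are the ones from the post, and the variable names accidental_tab and safe are only for illustration.

```python
import os

accidental_tab = "pocketsphinx\test"   # "\t" becomes a tab character
safe = r"pocketsphinx\test"            # a raw string keeps the backslash

# os.path.join builds the path from its parts, using the separator of
# whatever operating system the script is running on.
MODELDIR = os.path.join("..", "..", "tools", "pocketsphinx", "model")
DATADIR = os.path.join("..", "..", "tools", "pocketsphinx", "test", "data")
```

Building the paths with os.path.join avoids the question of slash direction entirely, which is handy when the same script has to run on both Windows and Linux.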

      Alternatively, I’d recommend doing further coding projects on macOS/Ubuntu, since it’ll make things a lot easier for you during the learning stages; a lot of coding projects are built for Unix/Linux systems.

      Hope this helps! Or maybe is just informative if you’ve already figured it out. ^^’

      -Sophie

  10. Robyn

    Dear Sophie,

    I’m not sure if you’re even on this blog anymore, but I’ve been having a couple problems I can’t figure out:

    File “C:\Users\Robyn\Downloads\yikes”, line 164, in <module>
    sd = SpeechDetector()
    File “C:\Users\Robyn\Downloads\yikes”, line 21, in __init__
    self.FORMAT = pyaudio.paInt16
    AttributeError: module ‘pyaudio’ has no attribute ‘paInt16’

    I’ve read your other answers on the SpeechDetector but I still couldn’t find a solution. I haven’t seen the paint16 one, however, and I checked and there is indeed no such file in my pyaudio download.

    *Note: In case you noticed, I didn’t name my file ‘yikes’ because of your code(which is actually very nice by the way), it was just the word I thought of when naming the file.

    • Sophie

      Hi Robyn,

      Yep, still here!

      A couple things I can think of:
      – Wrong version of pyaudio or python. For the record, I used Python 2.7 and pyaudio-0.2.11, though people have said this works with Python 3.x
      – How did you install pyaudio? I did it through pip install pyaudio
      – You’ve named another file pyaudio.py and it’s importing the wrong file: see https://stackoverflow.com/questions/13813164/python-import-random-error
      – You’re on a Windows machine, and I’ve only tested this code on Ubuntu

      In any case, you can get around the issue by replacing pyaudio.paInt16 with the integer 8 and it should get you past that problem.

      Hope this helps!

      • Robyn

        I tried paInt8 to no avail, so I am looking into downloading an earlier version of pydio if possible. For whatever reason, I can’t use pip install (I am using Sublime Text, if that makes any difference), so I downloaded it to my computer regularly and then imported it. The version of pydio I had was 8.0.2, and yes, I am on Windows 10.

  11. Vinay

    Hi Sophie,

    We don’t want to use a microphone. We have a wav file which needs to be converted into text. Can you please guide us on how to code this?

    Thanks,
    Vinay

    • Sophie

      Hey Vinay,

      The decode_phrase function on line 95 takes in a .wav file. Perhaps that’s what you’re looking for?
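
For decoding an existing recording without the microphone loop, something along these lines may be enough. It is a sketch, not the post's decode_phrase: decode_wav is a hypothetical name, and it assumes the file is 16 kHz, 16-bit mono (the format the microphone config above uses) and that the decoder object offers start_utt, process_raw, end_utt and seg, as the Decoder class in pocketsphinx-python does.

```python
import wave


def decode_wav(decoder, wav_path, chunk_frames=1024):
    """Feed a 16 kHz, 16-bit mono .wav file to a PocketSphinx-style decoder.

    `decoder` is expected to offer start_utt(), process_raw(), end_utt()
    and seg(), as the Decoder class in pocketsphinx-python does.
    """
    with wave.open(wav_path, "rb") as wav:
        decoder.start_utt()
        while True:
            frames = wav.readframes(chunk_frames)
            if not frames:
                break
            # no_search=False, full_utt=False: decode as the data arrives.
            decoder.process_raw(frames, False, False)
        decoder.end_utt()
    # Each segment carries one recognized word (or marker token).
    return [segment.word for segment in decoder.seg()]
```

Usage would look like decode_wav(Decoder(config), "recording.wav"), with config set up as in the SpeechDetector constructor. Reading through the wave module skips the file header, so only the audio samples reach the decoder.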

  12. Fred

    Hi,

    I am also getting new_Decoder returned -1 error.
    I made sure all the paths are setup correctly and followed the readme guides correctly (hopefully) for all the repositories. I am a Windows 7 user.

    Any help would be appreciated!

    • Sophie

      Hmm, that might be a problem because you’re using a different OS. I won’t be able to explicitly help you out, but you could try checking the CMU sphinx forums to see if someone else has successfully used the software…

      Otherwise, installing ubuntu is a viable option. It’s free, has a lot of community support, and is linux based which will help if you want to do more coding projects in the future. 😉

      • Fred

        Thanks for the reply!

        I actually solved the issue. I specified the path wrong for the dictionary.. Make sure you point to the correct file people!

        Thanks to your help on initial setup, I am now finished training pocketsphinx to recognize what I need and started implementing my application.

        Thank you so much Sophie!

  13. VINEETH KV

    hi sophie,
    Traceback (most recent call last):
    File “/home/vineeth/pycharm-community-2017.3/helpers/pydev/pydev_run_in_console.py”, line 37, in run_file
    pydev_imports.execfile(file, globals, locals) # execute the script
    File “/home/vineeth/PycharmProjects/main/new.py”, line 166, in <module>
    sd = SpeechDetector()
    File “/home/vineeth/PycharmProjects/main/new.py”, line 50, in __init__
    self.decoder = Decoder(config)
    File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 324, in __init__
    this = _pocketsphinx.new_Decoder(*args)
    RuntimeError: new_Decoder returned -1

    The above is the result of running the code.
    How can I correct the code so it runs without errors?

    • Sophie

      Hi Vineeth,

      In the spirit of discovery, your problem seems similar to ones that others have already posted about in the past. Can you try their solutions?

  14. Klaus

    Hey Sophie (:

    I installed Alexa access on my Raspberry Pi. Everything is working right; it reacts to “Alexa”. Unfortunately, it has to record the whole time to recognize “Alexa”. I would like to have an offline STT application which recognizes a special name (e.g. Dave) and THEN starts the Alexa application. Do you think that is possible with Sphinx?

  15. Hi Sophie,

    Do I have to use Python for this, or can I just run the commands directly in a Linux terminal? That’s what I did.

    I am trying to get pocketsphinx to index an audio file already on my machine and search for a keyword within it.

    my code ;
    pocketsphinx_continuous -infile success.wav -hmm en-us -kws_threshold 1e-40 -keyphrase “success” -time yes

    error i got;
    INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’live’, VARNORM=’no’, AGC=’none’
    ERROR: “acmod.c”, line 79: Folder ‘en-us’ does not contain acoustic model definition ‘mdef’

    i checked pocketsphinx and I do have the mdef file.

  16. Steve

    So strange- I’ve been playing around with PocketSphinx myself, for much the same reasons. I happened across your blog looking for some assistance (found it here, BTW, thanks!).

    Lo and behold- the “previous post” link is to EOM for double pendulums, some other random thing I just happen to be playing around with in the last few weeks.

    Duly bookmarked and favorited!

    • Sophie

      Haha, I’m glad that you liked my posts! I have a pretty eclectic set of interests and that’s probably reflected in this blog.

  17. Maria Villalobos

    Is the transcription really working for you, though? I tried an example and I am not getting good results, any ideas?

    • Sophie

      Hmm, that’ll depend on a variety of factors. Inaccuracy could result from noise (either in environment or microphone quality) or in poor correlation of your speech against the data used to train the recognizer. If it’s not working, I would recommend checking the sound samples recorded by commenting out line 150, or retraining the recognizer.

  18. Alysa

    Hello Sophie, i’m getting an error:
    DETECTED: [‘‘, ‘ah’, ”]
    Listening …
    Traceback (most recent call last):
    File “sophie.py”, line 168, in <module>
    sd.run()
    File “sophie.py”, line 135, in run
    cur_data = stream.read(self.CHUNK)
    File “/usr/local/lib/python2.7/site-packages/pyaudio.py”, line 608, in read
    return pa.read_stream(self._stream, num_frames, exception_on_overflow)
    IOError: [Errno -9981] Input overflowed
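
The traceback above shows that PyAudio's read passes along an exception_on_overflow argument. One possible workaround, assuming a PyAudio version whose Stream.read accepts that keyword (as the one in the traceback does), is to switch the exception off for the reads in the listening loop. The helper name read_chunk is made up for this sketch.

```python
def read_chunk(stream, chunk_size):
    """Read one chunk, dropping overflowed input instead of raising."""
    # `stream` is a PyAudio input stream, e.g. the one opened in run().
    return stream.read(chunk_size, exception_on_overflow=False)
```

The trade-off is that audio which arrived while the decoder was busy is silently discarded, so the beginning of the next phrase may be clipped. A larger CHUNK is another knob that is sometimes tried for the same symptom.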

  19. Alysa

    hello, actually in my folder en-us, there isn’t any file named en-us, which is used in line 45. What is the file en-us?

  20. Chrishane

    This is what I’m getting as output after compiling

    INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
    INFO: ngram_search_fwdtree.c(74): Initializing search tree
    INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
    INFO: ngram_search_fwdtree.c(186): Creating search channels
    INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152609
    INFO: ngram_search_fwdtree.c(333): Created 723 root, 152481 non-root channels, 53 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    Getting intensity values from mic.
    ALSA lib pcm_dsnoop.c:606:(snd_pcm_dsnoop_open) unable to open slave
    ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
    ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
    Cannot connect to server socket err = No such file or directory
    Cannot connect to server request channel
    jack server is not running or cannot be started
    JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
    JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
    Finished
    Average audio intensity is 668.187156541
    ALSA lib pcm_dsnoop.c:606:(snd_pcm_dsnoop_open) unable to open slave
    ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
    ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
    Cannot connect to server socket err = No such file or directory
    Cannot connect to server request channel
    jack server is not running or cannot be started
    JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
    JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
    * Mic set up and listening.

    Nothing happens after this. I can’t find the error. Can you help me?

  21. Abinaya

    I’m getting an error saying that there is no module named pocketsphinx.pocketsphinx

    Why is that so ?

    the code and pocketsphinx are in the same directory only

    • Sophie

      Hi Abinaya,

      Have you followed the installation instructions on the individual repositories for the code? If you’re having import issues, it’s probably because the pip install didn’t fully work, or the sphinx packages don’t have the correct hierarchy. The physical layout of the folders must be such that:
      .
      ├── pocketsphinx/
      ├── sphinxbase/
      └── stt.py

      Hope this helps!

    • Robyn

      Abinaya,

      I had the same problem and fixed it by simply getting rid of the second “.pocketsphinx”, so that it looked like:

      from pocketsphinx import *

  22. jim

    Hi Sophie,
    that’s a really nice program. Much shorter & tidier than i’d have expected to work with a monster like Sphinx.
    Well done.

    i’m on Ubuntu, with python 3+
    (& had to change lines 128, 130 & 153 – cast to int.)

    Here’s the (thankfully short) stack trace for an error i can’t get past:
    line 167, in <module>
    sd.run()
    line 145, in run
    filename = self.save_speech(list(prev_audio) + audio2send, p)
    line 86, in save_speech
    data = ''.join(data)
    TypeError: sequence item 0: expected str instance, bytes found

    Any inspiration?

    • Sophie

      Hi Jim,

      Sorry for the delay. Have you tried casting data to a string before line 86? Haven’t tested this code on python 3+ yet, so the type casting might be weird.

  23. Shishira Shastri H

    Hi Sophie,

    Thanks for the code.
    When i run the file, it prints: * Mic set up and listening.
    And after that nothing happens… I tried printing the value of slid_win variable, it prints while the while loop runs infinitely…. Could you please tell me when the recording will be stopped ?
    Or is there a way to stop it ?

    • Sophie

      Hi Shishira,

      Sorry it took so long to get back to you. If you’re still having issues, I can think of a couple of places where your code could be erroring:

      1. Are you sure your microphone is connected to the computer and accessible by the program?
      2. On line 52, the setup_mic function sets a threshold noise level for the mic. Are you letting the microphone sit in a quiet environment when the code is first run so the correct threshold can be set?

      Hope this helps, good luck!

      • Gopi

        Hey I am getting this error.

        prev_audio = deque(maxlen=self.PREV_AUDIO * rel)
        TypeError: an integer is required

        I changed self.PREV_AUDIO = 1
        instead of 0.5
        Now no error, but having the above situation Shishira encountered.

        • Sophie

          Hey Gopi,

          You should be casting that entire expression to an integer, instead of changing self.PREV_AUDIO only. prev_audio = deque(maxlen=int(self.PREV_AUDIO * rel))
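
The reason for the cast, as a small standalone sketch. It assumes rel is the number of chunks per second (RATE divided by CHUNK), which is consistent with the maxlen=15 seen in the debug output further up; the constant names mirror the ones in the class.

```python
from collections import deque

CHUNK = 1024
RATE = 16000
SILENCE_LIMIT = 1   # seconds
PREV_AUDIO = 0.5    # seconds

# In Python 3, / always returns a float: 16000 / 1024 is 15.625.
rel = RATE / CHUNK

# deque needs an integer maxlen, so cast the whole product.
slid_win = deque(maxlen=int(SILENCE_LIMIT * rel))
prev_audio = deque(maxlen=int(PREV_AUDIO * rel))
```

Changing PREV_AUDIO to 1 only hides the problem for that one line; the product is still a float in Python 3, so the cast is needed either way.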

  24. Aji

    hello Sophie
    my model dir located in /home/pocketsphinx/model
    and inside en-us/en-us dir there is an mdef file.
    but when the program running, that mdef file is not detected.
    can you help me ?
    many thanks for you

    btw i use ubuntu 16.04 with python 2.7

    • Sophie

      Hi Aji,

      Have you made sure that the file path is correct in the program as well? That would be on lines 40 and 41 of the program.

  25. Ambrose Douglas

    Hi, so I got everything working fine. I’m just curious if anyone has had this work well enough for any practical use? If I could give my computer simple commands I would be very happy, but I can’t seem to get more words than simple ones like “you”, “it”, “are”, etc.

    Do I need to find a different model?

    any pointers would be awesome!

    • Sophie

      Hey Ambrose,

      What are you trying to get it to recognize? There are a couple ways to improve accuracy:

      1. Reduce the size of the recognition dictionary. If you only need the STT engine to recognize a small set of words instead of the entire English language, you can increase accuracy by deleting the words you don’t need from the dictionary. The location of the dictionary is set on line 47 in the code.

      2. Adapt the acoustic model to be more accurate to the sound of your voice. Instructions for that can be found here: http://cmusphinx.sourceforge.net/wiki/tutorialadapt

      Hope this helps,
      -Sophie
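
Trimming the dictionary by hand gets tedious, so here is a rough sketch of doing it in code. It assumes the dictionary is a text file with one entry per line in the form "word PH PH PH", with alternate pronunciations written as "word(2)"; filter_dictionary is a hypothetical helper, not part of Sphinx.

```python
def filter_dictionary(lines, keep_words):
    """Keep only dictionary entries whose word is in `keep_words`.

    Each line is expected to look like 'hello HH AH L OW'; alternate
    pronunciations such as 'hello(2) HH EH L OW' are matched on 'hello'.
    """
    keep = {word.lower() for word in keep_words}
    kept = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        # Strip a trailing '(2)', '(3)', ... before comparing.
        word = parts[0].split("(")[0].lower()
        if word in keep:
            kept.append(line.rstrip("\n"))
    return kept
```

It is probably safer to write the result to a new file and point the '-dict' setting at it than to edit the original dictionary in place.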

  26. Josef

    Sophie,
    You might want to look at io.BytesIO, instead of saving to a temporary file. This will keep the array in memory, even better you can pass the entire buffer to the recognizer bypassing the need to save it altogether.

    • Sophie

      Oh, very interesting! That does seem more efficient than saving to a temp file, I’ll keep it in mind for future iterations of this code.
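
A minimal sketch of the in-memory idea using only the standard library. The function name wav_bytes and its defaults are assumptions chosen to match the microphone config in the post (16 kHz, 16-bit, mono).

```python
import io
import wave


def wav_bytes(chunks, rate=16000, sample_width=2, channels=1):
    """Build a complete .wav file in memory from raw audio chunks."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"".join(chunks))
    # The wave module does not close a file object it did not open itself,
    # so the buffer is still readable here.
    return buffer.getvalue()
```

If the decoder is fed raw samples directly, even the .wav header is unnecessary and b"".join(chunks) could be passed along as is. The version above is for cases where the recording should still be a valid .wav, for example to save it later for debugging.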

  27. Rahul Vansh

    When I’m running this code, it shows below error please give me solution for this error…

    INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    ERROR: “acmod.c”, line 83: Folder ‘pocketsphinx/model/en-us/en-us’ does not contain acoustic model definition ‘mdef’
    Traceback (most recent call last):
    File “Test.py”, line 17, in <module>
    decoder = pocketsphinx.Decoder(config)
    File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 266, in __init__
    this = _pocketsphinx.new_Decoder(*args)
    RuntimeError: new_Decoder returned -1

  28. john

    INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’live’, VARNORM=’no’, AGC=’none’
    ERROR: “acmod.c”, line 79: Folder ‘../../tools/pocketsphinx/model/en-us/en-us’ does not contain acoustic model definition ‘mdef’
    Traceback (most recent call last):
    File “stt.py”, line 166, in <module>
    sd = SpeechDetector()
    File “stt.py”, line 50, in __init__
    self.decoder = Decoder(config)
    File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 332, in __init__
    this = _pocketsphinx.new_Decoder(*args)
    RuntimeError: new_Decoder returned -1

    Hello,
    I get this error. Any thoughts?

    • Sophie

      You can kinda see the problem in the error message:

      ERROR: “acmod.c”, line 79: Folder ‘../../tools/pocketsphinx/model/en-us/en-us’ does not contain acoustic model definition ‘mdef’

      You need to change lines 40 and 41 so that MODELDIR and DATADIR refer to the actual location of the files.

      Hope it helps!
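
Since a wrong path is behind most of the "new_Decoder returned -1" reports in this thread, a quick check before creating the decoder can save some head-scratching. This sketch looks for the three files the decoder config in the post points at; missing_model_files is a hypothetical helper and the default "en-us" layout is taken from that config.

```python
import os


def missing_model_files(modeldir, lang="en-us"):
    """Return the model paths from the decoder config that do not exist."""
    expected = [
        os.path.join(modeldir, lang, lang, "mdef"),                 # acoustic model
        os.path.join(modeldir, lang, lang + ".lm.bin"),             # language model
        os.path.join(modeldir, lang, "cmudict-" + lang + ".dict"),  # dictionary
    ]
    return [path for path in expected if not os.path.isfile(path)]
```

Calling missing_model_files(MODELDIR) right before Decoder(config) and printing the result shows exactly which path is off. Relative paths are resolved against the directory the script is run from, which is an easy thing to trip over.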

      • Cookiecrunch

        ERROR: “acmod.c”, line 83: Folder ‘C:\Python27\Lib\site-packages\pocketsphinx\model\en-us\en-us’ does not contain acoustic model definition ‘mdef’
        Traceback (most recent call last):
        File “sophierun.py”, line 326, in <module>
        sd = SpeechDetector()
        File “sophierun.py”, line 94, in __init__
        self.decoder = Decoder(config)
        File “C:\Python27\lib\site-packages\pocketsphinx\pocketsphinx.py”, line 277, in __init__
        this = _pocketsphinx.new_Decoder(*args)
        RuntimeError: new_Decoder returned -1

        I have changed MODELDIR and DATADIR so that they refer the actual path of the files. Still I am getting this error. How do I rectify this?

        • Sophie

          Hmm, you’re running this code on a windows machine, so I can’t fully vouch that this code will work. I can think of two things.

          1. The physical layout of the folders must be such that:
          .
          ├── pocketsphinx/
          ├── sphinxbase/
          └── stt.py
          Have you verified that?

          2. If you enter the C:\Python27\Lib\site-packages\pocketsphinx\model\en-us\en-us path in your file explorer, does it actually take you to the folder where the mdef file can be found?

          Hope this helps, good luck!

  29. Hector

    I am working on a Mac, and although I installed pocketsphinx with pip install, it does not recognize either pocketsphinx or sphinxbase. I do not have a folder for either of them, but if I run pip freeze I see pocketsphinx.

  30. renato gallo

    ./tardis.py
    Traceback (most recent call last):
    File “./tardis.py”, line 166, in
    sd = SpeechDetector()
    File “./tardis.py”, line 44, in __init__
    config = Decoder.default_config()
    AttributeError: type object ‘pocketsphinx.Decoder’ has no attribute ‘default_config’

    • Sophie

      Aside from a few changes to the print statements and such, the code should be python 3.4 compatible. It’s currently written for python 2.7 though.

      • Daryll

        I had a problem following the installation process for Sphinx using Visual Studio. I followed the instructions and built it with Visual Studio 2015, but I got this error:
        TRACKER : error TRK0005: Failed to locate: “CL.exe”. The system cannot find the file specified

        • Sophie

          I wrote the above code for Ubuntu 14.04, so while it might work on UNIX-based operating systems like OS X or other Linux distros, I can’t say for sure how it would work on Windows.

          There are probably some libraries missing during the installation phase that aren’t covered in my installation instructions. You could try following the Windows install directions from the CMU Sphinx website directly to see if it’ll help with that issue. Here: http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx#windows

          • Daryll

            I am having a hard time installing Sphinx on my 64-bit Windows machine. 🙁 I get this error:

            Traceback (most recent call last):
            File “C:\Python27\pocketsphnx.py”, line 1, in
            from pocketsphinx.pocketsphinx import *
            File “C:\Python27\lib\site-packages\pocketsphinx\__init__.py”, line 35, in
            from sphinxbase import *
            File “C:\Python27\lib\site-packages\sphinxbase\__init__.py”, line 32, in
            from .ad import *
            File “C:\Python27\lib\site-packages\sphinxbase\ad.py”, line 35, in
            _ad = swig_import_helper()
            File “C:\Python27\lib\site-packages\sphinxbase\ad.py”, line 34, in swig_import_helper
            return importlib.import_module(‘_ad’)
            File “C:\Python27\lib\importlib\__init__.py”, line 37, in import_module
            __import__(name)
            ImportError: No module named _ad

          • Sophie

            Sorry, I’ve never done installations on Windows, so I won’t be able to help you much on that. 🙁 My suggestion would be to dual-boot or run Ubuntu 14.04/16.04 on a virtual box so you’d be able to follow the instructions as is, or Google your error to see if other people have solved it before.

          • Daryll

            I am running on Windows. I have followed the tutorial on how to install sphinxbase and pocketsphinx. I downloaded Visual Studio 2012 Express but still got this error: sphinx error; missing pocketsphinx module: ensure that pocketsphinx is set up correctly.

  31. David

    Sophie,
    I am working on a voice recognition project and came across your code base. Got it up and running with no problems but was wondering if you could provide some insight to the specifics of the INFO: outputs.

    I also noticed it transitions pretty quickly from Listening… to Starting the recording… to Finishing the recording. Most of the time this seems to happen in the middle of testing speech recognition and I have to time when to speak. I also notice sometimes the output is just [SPEECH], other times just even though I was speaking and other times when there is no noise there is speech output being displayed.

    Below is some of the output.

    Listening …
    Starting recording of phrase
    Finished recording, decoding phrase
    INFO: cmn_live.c(88): Update from
    INFO: cmn_live.c(105): Update to
    INFO: cmn_live.c(88): Update from
    INFO: cmn_live.c(105): Update to
    INFO: cmn_live.c(120): Update from
    INFO: cmn_live.c(138): Update to
    INFO: ngram_search_fwdtree.c(1550): 24051 words recognized (32/fr)
    INFO: ngram_search_fwdtree.c(1552): 2808025 senones evaluated (3784/fr)
    INFO: ngram_search_fwdtree.c(1556): 19077058 channels searched (25710/fr), 489688 1st, 672158 last
    INFO: ngram_search_fwdtree.c(1559): 37370 words for which last channels evaluated (50/fr)
    INFO: ngram_search_fwdtree.c(1561): 1405708 candidate words for entering last phone (1894/fr)
    INFO: ngram_search_fwdtree.c(1564): fwdtree 6.07 CPU 0.818 xRT
    INFO: ngram_search_fwdtree.c(1567): fwdtree 6.09 wall 0.821 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 473 words
    INFO: ngram_search_fwdflat.c(948): 16143 words recognized (22/fr)
    INFO: ngram_search_fwdflat.c(950): 996651 senones evaluated (1343/fr)
    INFO: ngram_search_fwdflat.c(952): 1704946 channels searched (2297/fr)
    INFO: ngram_search_fwdflat.c(954): 83335 words searched (112/fr)
    INFO: ngram_search_fwdflat.c(957): 45835 word transitions (61/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.57 CPU 0.077 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.57 wall 0.077 xRT
    INFO: ngram_search.c(1250): lattice start node .0 end node .669
    INFO: ngram_search.c(1276): Eliminated 1 nodes before end node
    INFO: ngram_search.c(1381): Lattice has 2546 nodes, 23747 links
    INFO: ps_lattice.c(1380): Bestpath score: -24784
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:669:740) = -1342034
    INFO: ps_lattice.c(1441): Joint P(O,S) = -1492405 P(S|O) = -150371
    INFO: ngram_search.c(1027): bestpath 0.09 CPU 0.012 xRT
    INFO: ngram_search.c(1030): bestpath 0.09 wall 0.012 xRT
    (‘DETECTED: ‘, [‘‘, ‘[SPEECH]’, ”, ”, “what’s(2)”, ‘this’, ‘and(2)’, ‘he’, ”, ”, ‘[SPEECH]’, ”, ‘was(2)’, ‘‘])
    Listening …
    Starting recording of phrase
    Finished recording, decoding phrase
    INFO: cmn_live.c(88): Update from
    INFO: cmn_live.c(105): Update to
    INFO: cmn_live.c(88): Update from
    INFO: cmn_live.c(105): Update to
    INFO: cmn_live.c(88): Update from
    INFO: cmn_live.c(105): Update to
    INFO: cmn_live.c(88): Update from
    INFO: cmn_live.c(105): Update to
    INFO: cmn_live.c(120): Update from
    INFO: cmn_live.c(138): Update to
    INFO: ngram_search_fwdtree.c(1550): 39722 words recognized (37/fr)
    INFO: ngram_search_fwdtree.c(1552): 3494277 senones evaluated (3296/fr)
    INFO: ngram_search_fwdtree.c(1556): 22075849 channels searched (20826/fr), 576804 1st, 1109420 last
    INFO: ngram_search_fwdtree.c(1559): 60213 words for which last channels evaluated (56/fr)
    INFO: ngram_search_fwdtree.c(1561): 1900155 candidate words for entering last phone (1792/fr)
    INFO: ngram_search_fwdtree.c(1564): fwdtree 6.89 CPU 0.650 xRT
    INFO: ngram_search_fwdtree.c(1567): fwdtree 6.89 wall 0.650 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 705 words
    INFO: ngram_search_fwdflat.c(948): 24076 words recognized (23/fr)
    INFO: ngram_search_fwdflat.c(950): 1527037 senones evaluated (1441/fr)
    INFO: ngram_search_fwdflat.c(952): 2894866 channels searched (2731/fr)
    INFO: ngram_search_fwdflat.c(954): 142217 words searched (134/fr)
    INFO: ngram_search_fwdflat.c(957): 73064 word transitions (68/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.97 CPU 0.092 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.97 wall 0.092 xRT
    INFO: ngram_search.c(1250): lattice start node .0 end node .1055
    INFO: ngram_search.c(1276): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1381): Lattice has 3021 nodes, 31799 links
    INFO: ps_lattice.c(1380): Bestpath score: -40103
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:1055:1058) = -2135522
    INFO: ps_lattice.c(1441): Joint P(O,S) = -2386068 P(S|O) = -250546
    INFO: ngram_search.c(1027): bestpath 0.15 CPU 0.014 xRT
    INFO: ngram_search.c(1030): bestpath 0.15 wall 0.014 xRT
    (‘DETECTED: ‘, [‘‘, ‘i’, ‘have’, ‘a’, ‘somewhat(2)’, ‘is’, ‘the’, ‘weather’, ‘in’, ‘now’, ”, “it’s”, ”, ”, ”, ‘just’, ‘what(2)’, ‘‘])

    • Sophie

      Hey David!

      Glad you got the code working!

      The short delay is likely due to the setup_mic function on lines 52-77. While this code is being run, the mic records a sound sample for a while and sets the base threshold as the average amplitude of the sound sample. So, when the code is first being initiated, you’d want to have the microphone be in as close to “neutral” sound level as possible. You can alter the values in that function to tune the thresholding to be better suited to your methods.

      Since [SPEECH] is a placeholder for a sound that the recognizer couldn’t classify, you might want to listen to the sound samples that are being recorded to see if the results make any sense. You can comment out line 150 if you want to do that. Hope this helps!

      • David

        Thanks. Can you explain what effect increasing or decreasing the 0.2 avg value will have, along with the 3500 threshold value?

        I haven’t changed any of the default values yet but notice I see the following quite a bit:
        ERROR: “ngram_search.c”, line 1139: Couldn’t find in first frame

        I also see various tags when there is an output… such as or what is the significance of them and how can I prevent those tags from being displayed?

        Thanks

        • Sophie

          3500 is the minimum threshold value, so changing it will affect the lowest threshold that the mic setup method can settle on (i.e., even if you’re recording in a really quiet environment, you still want some sort of threshold). If you change the 0.2 constant, the threshold will be determined from a larger average of amplitudes. So if your mic is prone to random spikes in amplitude due to noise, it would be better to increase the constant.
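To make the two constants concrete, here is a minimal sketch of that kind of thresholding. It is not the exact code from the script: `pick_threshold` and its parameter names are made up for illustration, and it assumes the intensities have already been measured from the mic.

```python
def pick_threshold(intensities, top_fraction=0.2, floor=3500):
    """Average the loudest `top_fraction` of samples, never dropping below `floor`."""
    if not intensities:
        return floor
    ranked = sorted(intensities, reverse=True)
    # Always average at least one sample, even for very short recordings.
    count = max(1, int(len(ranked) * top_fraction))
    average = sum(ranked[:count]) / float(count)
    return max(floor, average)


# A quiet room: every sample is far below the floor, so the floor wins.
quiet = pick_threshold([100] * 50)

# Ten loud spikes among forty silent samples.
spiky = pick_threshold([5000] * 10 + [0] * 40)                         # top 20% are all spikes
smoothed = pick_threshold([5000] * 10 + [0] * 40, top_fraction=0.4)    # spikes diluted
```

With `top_fraction=0.2` the spikes alone decide the threshold, while `0.4` averages them with quieter samples and falls back to the floor, which is the smoothing effect described above.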

          As for the tags, a single word may have different pronunciations. So when you see something like was(2), it’s likely referring to pronunciation 2 in the word dictionary. You could manually strip these tags with some string processing in a list comprehension.
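A rough sketch of that clean-up, assuming the decoder hands back a plain list of strings like the DETECTED output earlier in this thread. `clean_words` is a hypothetical helper, and treating anything that starts with `[` or `<` as a filler token is an assumption.

```python
import re

# Matches a trailing alternate-pronunciation marker such as "(2)".
_PRON_TAG = re.compile(r"\(\d+\)$")


def clean_words(words):
    """Drop filler tokens and strip pronunciation markers from decoded words."""
    cleaned = []
    for word in words:
        # Skip empty strings and bracketed fillers such as "[SPEECH]".
        if not word or word[0] in "[<":
            continue
        cleaned.append(_PRON_TAG.sub("", word))
    return cleaned


sample = ["", "[SPEECH]", "what's(2)", "this", "and(2)", "he", "", "was(2)"]
print(clean_words(sample))
```

The regular expression only removes a parenthesised number at the very end of a word, so ordinary words are left untouched.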

    • Daryll

      Sir David,
      Can you show us how you do the voice recognition? I am new to python and I am planning on building my own AI. Hope you could help me thanks 🙂

      • David

        Daryll,

        I apologize for my use of ‘voice recognition’ I meant speech recognition… there is a big difference.

        I am not focusing on having the system differentiate between physical human speakers… my focus is on having the system correctly interpret and execute tasks based on human speech input.

        Sorry for any confusion.
        -David

  32. David

    Sophie,
    I’m running into an error and am curious to know if there is a pocketsphinx-python version that will run with python 3.0-3.5.

    I have existing 3.x functionality but when I attempt to incorporate pocketsphinx-python I get the following error:
    ImportError: //pocketsphinx-python/sphinxbase/_ad.so: undefined symbol: PyInstance_Type

    I’m an old Java programmer and when I say old, I’m talking about JDK version 1.4 and I’m not familiar with C. From what I’ve found based on searches I think the issue is with the python version the .so file was created against. If I change to version 2.x it will work but my existing code won’t.

    Any input would be appreciated.

  33. Harshit

    Hey Sophie,

    I am using Ubuntu 14.04 and Python 2.7 and have installed pocketsphinx using `sudo apt-get install python-pocketsphinx`, but I am getting the error `no module named pocketsphinx` on the third line.

    Is there a way out?

    • Sophie

      Hmm, did you follow the instructions in full on the github readme? You might need to use pip to get the correct paths set.

      There are several things that need to be installed for pocketsphinx to be imported correctly:
      sudo apt-get install -y python python-dev python-pip build-essential swig git
      sudo pip install pocketsphinx

      • Hasib

        While executing “build-essential swig git”, the following error is shown:
        build-essential: command not found
        But build-essential and swig are both installed.

  34. Rob

    Hi Sophie, thanks so much for the share! I have some issues if you do not mind taking a look at:

    runtimeerror: new decoder returned -1

    Any ideas?

    Many thanks

    • Sophie

      Hi Rob,

      On an initial guess it may be because your folders are not organized correctly or you didn’t correctly install all the modules. The physical layout of the folders must be such that:
      .
      ├── pocketsphinx/
      ├── sphinxbase/
      └── stt.py

      What operating system are you running this code on? I’ve only tested it on Ubuntu 14.04 using Python 2.7

  35. John

    hello,

    I’m getting an invalid sample rate error when I run it with a rate of 16000. It works with my mic’s default sampling rate of 48000, but then it cannot recognize words. Is this a PyAudio issue? How do I configure it to work with this script?

    Traceback (most recent call last):
    File “/home/pi/stt.py”, line 174, in
    sd.run()
    File “/home/pi/stt.py”, line 121, in run
    self.setup_mic()
    File “/home/pi/stt.py”, line 70, in setup_mic
    frames_per_buffer=self.CHUNK)
    File “build/bdist.linux-armv7l/egg/pyaudio.py”, line 750, in open
    stream = Stream(self, *args, **kwargs)
    File “build/bdist.linux-armv7l/egg/pyaudio.py”, line 441, in __init__
    self._stream = pa.open(**arguments)
    IOError: [Errno -9997] Invalid sample rate

  36. Anup

    Hello,
    I’m getting the following error when I try to run your script.

    slid_win = deque(maxlen=self.SILENCE_LIMIT * rel)
    TypeError: an integer is required

    Could you please help

    Thanks,
    Anup

    • Sophie

      If you’re using Python 3, the / operator always returns a float, so rel is no longer an integer when self.RATE/self.CHUNK is calculated.

      Replace that line with this:
      slid_win = deque(maxlen=self.SILENCE_LIMIT * int(rel))
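Here is a small standalone sketch of the same fix. The RATE, CHUNK and SILENCE_LIMIT values are assumptions for illustration, and `window_length` is a hypothetical helper. Wrapping the whole product in int() also covers the case where the number of seconds is itself a fraction.

```python
from collections import deque

RATE = 16000        # samples per second (assumed value)
CHUNK = 1024        # samples per buffer read (assumed value)
SILENCE_LIMIT = 1   # seconds of silence that end a phrase (assumed value)

rel = RATE / CHUNK  # a float under Python 3


def window_length(seconds, chunks_per_second):
    """Number of whole chunks that cover `seconds` of audio."""
    return int(seconds * chunks_per_second)


# deque requires an integer maxlen, so convert before constructing it.
slid_win = deque(maxlen=window_length(SILENCE_LIMIT, rel))
```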

  37. Rodrigo

    Hi Sophie, as you suggested, I tried installing the packages again.
    Now, when I install pocket…python, I get an error at the end of the output:

    ~/pocketsphinx-python $ sudo python setup.py install
    running install
    running bdist_egg
    running egg_info
    writing pocketsphinx.egg-info/PKG-INFO
    writing top-level names to pocketsphinx.egg-info/top_level.txt
    writing dependency_links to pocketsphinx.egg-info/dependency_links.txt
    error: package directory ‘pocketsphinx/swig/python’ does not exist

    “error: package directory ‘pocketsphinx/swig/python’ does not exist” But this directory already exists. My directory structure is:

    /home/user/pocketsphinx
    /home/user/sphinxbase
    /home/user/pocketsphinx-python

    Is it wrong?

    If I try to run the script, I get this:

    $ python sample.py
    Traceback (most recent call last):
    File “sample.py”, line 3, in
    from pocketsphinx.pocketsphinx import *
    File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/__init__.py”, line 37, in
    from pocketsphinx import *
    File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 42, in
    _pocketsphinx = swig_import_helper()
    File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 38, in swig_import_helper
    _mod = imp.load_module(‘_pocketsphinx’, fp, pathname, description)
    ImportError: libpocketsphinx.so.3: cannot open shared object file: No such file or directory

    • Sophie

      For the first one: Is there a reason you’re not using pip? Since it’s the package installer for python, it might be easier to install pocketsphinx-python that way. Your directory structure is correct though. The error is saying you don’t have swig installed, did you follow all of the install instructions?

      sudo apt-get install -y python python-dev python-pip build-essential swig git
      sudo pip install pocketsphinx

      For the second: That’s the same issue you had previously right? That you solved by exporting the LD_LIBRARY_PATH?
      export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

      • Rodrigo

        I tried pip and git clone; swig is already installed.
        I just forgot about the export =/

        But even after doing this I get errors:

        Traceback (most recent call last):
        File “sample.py”, line 162, in
        sd.run()
        File “sample.py”, line 109, in run
        self.setup_mic()
        File “sample.py”, line 58, in setup_mic
        frames_per_buffer=self.CHUNK)
        File “/usr/local/lib/python2.7/dist-packages/pyaudio.py”, line 750, in open
        stream = Stream(self, *args, **kwargs)
        File “/usr/local/lib/python2.7/dist-packages/pyaudio.py”, line 441, in __init__
        self._stream = pa.open(**arguments)
        IOError: [Errno -9996] Invalid input device (no default output device)

        • Sophie

          Okay, so that error is saying that it can’t find any of your microphones.
          You can check to see if the microphone is working outside of the script by following these instructions.

          If you have multiple input devices, you might have to modify line 117 so that PyAudio initializes with the correct microphone. You can take a look at the documentation here.
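As a rough sketch of how you might find the right index: PyAudio can list its devices with `get_device_count()` and `get_device_info_by_index()`, and each info dict reports `maxInputChannels`. The helper names below are made up for illustration, and picking the first device with any input channels is just one possible rule.

```python
def first_input_device(devices):
    """Return the index of the first device that has input channels, or None."""
    for info in devices:
        if info.get("maxInputChannels", 0) > 0:
            return info["index"]
    return None


def list_devices():
    """Collect PyAudio's device info dicts (requires PyAudio to be installed)."""
    import pyaudio  # imported here so the helper above works without PyAudio
    audio = pyaudio.PyAudio()
    try:
        return [audio.get_device_info_by_index(i)
                for i in range(audio.get_device_count())]
    finally:
        audio.terminate()
```

The index returned by `first_input_device(list_devices())` can then be passed to the stream's open() call as `input_device_index`.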

    • Rodrigo

      I’m getting this error:

      $ python sample.py
      Traceback (most recent call last):
      File “sample.py”, line 3, in
      from pocketsphinx.pocketsphinx import *
      File “sphinxbase.pxd”, line 150, in init pocketsphinx (pocketsphinx.c:7935)
      ValueError: PyCapsule_GetPointer called with invalid PyCapsule object

      • Sophie

        That appears to be an error internal to Python or Cython. I need a bit more information:

        What version of python are you using?
        Are you using Ubuntu? Or another operating system?

        On a first pass, it appears that you’ll need to do a reinstall after ensuring that you’ve configured Cython correctly. Maybe these instructions can help?

        • Rodrigo

          Hi, thanks for your answer!!
          After trying your tip, I get this error:

          $ python teste.py
          Traceback (most recent call last):
          File “teste.py”, line 3, in
          from pocketsphinx.pocketsphinx import *
          File “sphinxbase.pxd”, line 150, in init pocketsphinx (pocketsphinx.c:7934)
          File “/usr/local/lib/python2.7/dist-packages/sphinxbase/__init__.py”, line 37, in
          from sphinxbase import *
          File “/usr/local/lib/python2.7/dist-packages/sphinxbase/sphinxbase.py”, line 42, in
          _sphinxbase = swig_import_helper()
          File “/usr/local/lib/python2.7/dist-packages/sphinxbase/sphinxbase.py”, line 38, in swig_import_helper
          _mod = imp.load_module(‘_sphinxbase’, fp, pathname, description)
          ImportError: libsphinxbase.so.3: cannot open shared object file: No such file or directory

          My Python version is:
          $ python --version
          Python 2.7.6

          I’m using Linux Mint:
          $ uname -a
          Linux LinuxMint 3.19.0-32-generic #37~14.04.1-Ubuntu SMP Thu Oct 22 09:41:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

          Thank you so much.

          • Rodrigo

            Ok, this last one I resolved with this:
            $ export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

            But now I’m getting this:

            $ python teste.py
            Traceback (most recent call last):
            File “teste.py”, line 3, in
            from pocketsphinx.pocketsphinx import *
            File “sphinxbase.pxd”, line 150, in init pocketsphinx (pocketsphinx.c:7934)
            ValueError: sphinxbase.NGramModel has the wrong size, try recompiling

            =/ this script does not like me

          • Sophie

            That does seem to be a Cython issue, and that would be internal to the pocketsphinx library–not the script that I posted. (You can tell because it’s failing at the import step before getting to any of the actual code :P)

            Have you tried uninstalling all the sphinx libraries and reinstalling? It’s kind of annoying, but it might fix your problem.

  38. Amitava

    Hi,
    I get the following error when I run the above code (F5)

    Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:19:22) [MSC v.1500 32 bit (Intel)] on win32
    Type “copyright”, “credits” or “license()” for more information.
    >>>
    =================== RESTART: C:\Sphinx\project\stt\stt.py ===================

    Traceback (most recent call last):
    File “C:\Sphinx\project\stt\stt.py”, line 166, in
    sd = SpeechDetector()
    File “C:\Sphinx\project\stt\stt.py”, line 50, in __init__
    self.decoder = Decoder(config)
    File “C:\Python27\lib\site-packages\pocketsphinx\pocketsphinx.py”, line 277, in __init__
    this = _pocketsphinx.new_Decoder(*args)
    RuntimeError: new_Decoder returned -1
    >>>

    • Sophie

      I’ll need a bit more info, but on an initial guess it’s because your folders are not organized correctly or you didn’t correctly install all the modules. The physical layout of the folders must be such that:
      .
      ├── pocketsphinx/
      └── sphinxbase/

      Also, you’re running this code on a Windows machine. I’ve only tested this code on Linux, so there are no guarantees that it will work on a different operating system, because the drivers and system architecture are different.

      • Amitava

        The folders are organized as
        C:\Sphinx\project\stt
        ├── pocketsphinx/
        ├── sphinxbase/
        └── stt.py

        where stt.py is the above source file.
        I manually copied all 7 files (pocketsphinx.dll, pocketsphinx.dll, .. etc.) from C:\Sphinx\pocketsphinx\bin\Release\Win32 to the above directory:
        C:\Sphinx\project\stt\pocketsphinx

        Did similar thing for sphinxbase also.

        I used “pip install pocketsphinx” to install pocketsphinx, and the Installation was successful:
        Collecting pocketsphinx
        Downloading pocketsphinx-0.1.3-cp27-cp27m-win32.whl (29.0MB)
        100% |################################| 29.0MB 47kB/s
        Installing collected packages: pocketsphinx
        Successfully installed pocketsphinx-0.1.3

        The imports were also fine
        from pocketsphinx.pocketsphinx import *
        from sphinxbase.sphinxbase import *

        Could it be a 64 bit vs 32 bit issue?

        • Sophie

          I am tempted to believe that it is the 64bit vs 32bit — I’m not sure how well Sphinx works on a Windows computer, but you could install Ubuntu on a virtual machine and run this code on Linux that way.

          • Amitava

            Hi Sophie,

            Thanks for your time and feedback. But you think my folder structure and the way I filled the pocketsphinx and sphinxbase folders above are correct, right?

          • Sophie

            Yep, they look alright to me! If you have a chance, try the installation process via an Ubuntu install (either as a native OS or as a virtual OS).
