Python speech to text with PocketSphinx

I’ve wanted to use speech detection in my personal projects for the longest time, but the Google API has become more and more restrictive over time. To make sure my projects could work even without an internet connection, I looked for another speech recognition package, preferably one that was easier to use. CMU’s Sphinx speech recognition suite turned out to be a really great speech-to-text package. However, documentation and sample code are nearly non-existent, so it took me forever to get anything done. Finally, I’ve figured it out! The example code is at the bottom of this post, but you can also download it directly from GitHub here.

Here are the steps to take to get this working:

  1. Download SphinxBase and follow the install instructions
  2. Download PocketSphinx and follow the install instructions
  3. Download PocketSphinx-python and follow the install instructions
  4. Run the code below


The main problem I had with setting up PocketSphinx was the myriad of libraries that the main site told me to download. However, after lots of trial and error, I realized that I really only need three.

  • SphinxBase is the base package that all of the other Sphinx programs use
  • PocketSphinx is the lightweight recognizer; I chose it because I was okay with the program being a bit inaccurate if it meant I could decode phrases faster
  • PocketSphinx-python is the wrapper to allow us to program in the best scripting language ever.

The code basically sets up the microphone and saves each detected phrase as a temporary .wav file, which the Sphinx decoder then translates into a list of strings representing the spoken words. A phrase is defined as a stretch of sound sandwiched between periods of silence. I stole most of the phrase detection code from someone else two years ago, though unfortunately, I can’t remember who. If you’re reading this, thank you! 🙂

Anyhow, in the initialization of the run loop, we first define what the minimum threshold should be in defining “silence”. Then we launch into an infinitely running loop that will continue to listen to sounds over the microphone, calling the Sphinx decoder whenever a phrase has been saved. A sliding average is used as well during phrase detection, to make things a bit more accurate. You can load different voice recognition models into the decoder config if you want this speech recognition code to work for different languages.
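To make the phrase detection idea concrete, here is a minimal sketch of the sliding-window logic described above. It is not the code from the post: the function name `detect_phrases` and its parameters are made up for illustration, and it works on a plain list of per-chunk loudness values instead of a live microphone stream.

```python
from collections import deque


def detect_phrases(intensities, threshold, silence_chunks=3, prev_chunks=1):
    """Group per-chunk loudness readings into phrases.

    A phrase starts when any reading in the sliding window exceeds
    `threshold` and ends once the whole window has dropped back below it.
    Returns a list of (start_index, end_index) pairs, end exclusive.
    All names here are hypothetical, chosen only for this sketch.
    """
    window = deque(maxlen=silence_chunks)  # most recent readings
    phrases = []
    start = None
    for i, level in enumerate(intensities):
        window.append(level)
        loud = any(x > threshold for x in window)
        if loud and start is None:
            # Keep a little audio from before the trigger so the
            # first syllable is not clipped.
            start = max(0, i - prev_chunks)
        elif not loud and start is not None:
            phrases.append((start, i))
            start = None
    if start is not None:  # input ended in the middle of a phrase
        phrases.append((start, len(intensities)))
    return phrases
```

In a real run loop, each reading would come from a chunk read off the microphone, and each finished phrase would be handed to the decoder.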

Now that I have this speech detection code in a neat little importable class, I’m really excited about future capabilities of my projects. So many ideas, so little time!
-Sophie

86 Comments

  1. Klaus

    Hey Sopie (:

    I installed Alexa access on my Raspberry Pi. Everything is working right; it’s reacting to “Alexa”. Unfortunately, it has to record the whole time to recognize “Alexa”. I would like to have an offline STT application which recognizes a special name (e.g. Dave) and THEN starts the Alexa application. Do you think that is possible with Sphinx?

  2. Hi Sophie,

    Do i have to use python for this or I could just run the code directly on linux terminal because that’s what i did.

    I am trying to get pocketsphinx to index an audio file already on my machine and search for keyword within it.

    my code ;
    pocketsphinx_continuous -infile success.wav -hmm en-us -kws_threshold 1e-40 -keyphrase “success” -time yes

    error i got;
    INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’live’, VARNORM=’no’, AGC=’none’
    ERROR: “acmod.c”, line 79: Folder ‘en-us’ does not contain acoustic model definition ‘mdef’

    i checked pocketsphinx and I do have the mdef file.

  3. Steve

    So strange- I’ve been playing around with PocketSphinx myself, for much the same reasons. I happened across your blog looking for some assistance (found it here, BTW, thanks!).

    Lo and behold- the “previous post” link is to EOM for double pendulums, some other random thing I just happen to be playing around with in the last few weeks.

    Duly bookmarked and favorited!

    • Sophie

      Haha, I’m glad that you liked my posts! I have a pretty eclectic set of interests and that’s probably reflected in this blog.

  4. Maria Villalobos

    Is the transcription really working for you, though? I tried an example and I am not getting good results, any ideas?

    • Sophie

      Hmm, that’ll depend on a variety of factors. Inaccuracy could result from noise (either in environment or microphone quality) or in poor correlation of your speech against the data used to train the recognizer. If it’s not working, I would recommend checking the sound samples recorded by commenting out line 150, or retraining the recognizer.

  5. Alysa

    Hello Sophie, i’m getting an error:
    DETECTED: [‘‘, ‘ah’, ”]
    Listening …
    Traceback (most recent call last):
    File “sophie.py”, line 168, in
    sd.run()
    File “sophie.py”, line 135, in run
    cur_data = stream.read(self.CHUNK)
    File “/usr/local/lib/python2.7/site-packages/pyaudio.py”, line 608, in read
    return pa.read_stream(self._stream, num_frames, exception_on_overflow)
    IOError: [Errno -9981] Input overflowed

  6. Alysa

    hello, actually in my folder en-us, there isn’t any file named en-us, which is used in line 45. What is the file en-us?

  7. Chrishane

    This is what I’m getting as output after compiling

    INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
    INFO: ngram_search_fwdtree.c(74): Initializing search tree
    INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
    INFO: ngram_search_fwdtree.c(186): Creating search channels
    INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152609
    INFO: ngram_search_fwdtree.c(333): Created 723 root, 152481 non-root channels, 53 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    Getting intensity values from mic.
    ALSA lib pcm_dsnoop.c:606:(snd_pcm_dsnoop_open) unable to open slave
    ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
    ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
    Cannot connect to server socket err = No such file or directory
    Cannot connect to server request channel
    jack server is not running or cannot be started
    JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
    JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
    Finished
    Average audio intensity is 668.187156541
    ALSA lib pcm_dsnoop.c:606:(snd_pcm_dsnoop_open) unable to open slave
    ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
    ALSA lib pcm_dmix.c:1029:(snd_pcm_dmix_open) unable to open slave
    Cannot connect to server socket err = No such file or directory
    Cannot connect to server request channel
    jack server is not running or cannot be started
    JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
    JackShmReadWritePtr::~JackShmReadWritePtr – Init not done for 4294967295, skipping unlock
    * Mic set up and listening.

    Nothing happens after this.. I can find out the error..can you help me?

  8. Abinaya

    I’m getting an error saying that there is no module named pocketsphinx.pocketsphinx

    Why is that so ?

    the code and pocketsphinx are in the same directory only

    • Sophie

      Hi Abinaya,

      Have you followed the correct installation instructions on the individual repositories for the code? If you’re having import issues, it’s probably because the pip install didn’t fully work, or the sphinx packages don’t have the correct hierarchy. The physical layout of the folders must be such that:
      .
      ├── pocketsphinx/
      ├── sphinxbase/
      └── stt.py

      Hope this helps!

  9. jim

    Hi Sophie,
    that’s a really nice program. Much shorter & tidier than i’d have expected to work with a monster like Sphinx.
    Well done.

    i’m on Ubuntu, with python 3+
    (& had to change lines 128, 130 & 153 – cast to int.)

    Here’s the (thankfully short) stack trace for an error i can’t get past:
    line 167, in
    sd.run()
    line 145, in run
    filename = self.save_speech(list(prev_audio) + audio2send, p)
    line 86, in save_speech
    data = ”.join(data)
    TypeError: sequence item 0: expected str instance, bytes found

    Any inspiration?

    • Sophie

      Hi Jim,

      Sorry for the delay. Have you tried casting data to a string before line 86? Haven’t tested this code on python 3+ yet, so the type casting might be weird.
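For readers hitting the same Python 3 error as Jim: the traceback says the chunks are `bytes`, so a likely fix is to join them with a bytes separator instead of casting to a string. This is a small sketch under that assumption; `join_audio_chunks` is a hypothetical helper, not a function from the post's code.

```python
def join_audio_chunks(chunks):
    """Concatenate raw audio chunks into one bytes object.

    On Python 3 the chunks are bytes, so the separator must be bytes too.
    On Python 2, b"" is an ordinary str, so the same line works there.
    """
    return b"".join(chunks)
```

Converting each chunk with `str()` on Python 3 would produce text like `"b'\\x00...'"` rather than audio data, so joining as bytes is probably the safer route.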

  10. Shishira Shastri H

    Hi Sophie,

    Thanks for the code.
    When i run the file, it prints: * Mic set up and listening.
    And after that nothing happens… I tried printing the value of slid_win variable, it prints while the while loop runs infinitely…. Could you please tell me when the recording will be stopped ?
    Or is there a way to stop it ?

    • Sophie

      Hi Shishira,

      Sorry it took so long to get back to you. If you’re still having issues, I can think of a couple of places where your code could be erroring:

      1. Are you sure your microphone is connected to the computer and accessible by the program?
      2. On line 52, the setup_mic function sets a threshold noise level for the mic. Are you letting the microphone sit in a quiet environment when the code is first run so the correct threshold can be set?

      Hope this helps, good luck!

      • Gopi

        Hey I am getting this error.

        prev_audio = deque(maxlen=self.PREV_AUDIO * rel)
        TypeError: an integer is required

        I changed self.PREV_AUDIO = 1
        instead of 0.5
        Now no error, but having the above situation Shishira encountered.

  11. Aji

    hello Sophie
    my model dir located in /home/pocketsphinx/model
    and inside en-us/en-us dir there is an mdef file.
    but when the program running, that mdef file is not detected.
    can you help me ?
    many thanks for you

    btw i use ubuntu 16.04 with python 2.7

    • Sophie

      Hi Aji,

      Have you made sure that the file path is correct in the program as well? That would be on lines 40 and 41 of the program.

  12. Ambrose Douglas

    Hi, so I got everything working fine. I’m just curious if anyone has had this work well enough for any practical use? If I could give my computer simple commands I would be very happy, but I can’t seem to get more words than simple ones like “you”, “it”, “are”, etc.

    Do I need to find a different model?

    any pointers would be awesome!

    • Sophie

      Hey Ambrose,

      What are you trying to get it to recognize? There are a couple ways to improve accuracy:

      1. Reduce the size of the recognition dictionary. That is, if you only need the STT engine to recognize a small set of words instead of the entire English language, you can increase accuracy by deleting the words you don’t need from the dictionary. The location of the dictionary is found on line 47 in the code.

      2. Adapt the acoustic model to be more accurate to the sound of your voice. Instructions for that can be found here: http://cmusphinx.sourceforge.net/wiki/tutorialadapt

      Hope this helps,
      -Sophie

  13. Josef

    Sophie,
    You might want to look at io.BytesIO, instead of saving to a temporary file. This will keep the array in memory, even better you can pass the entire buffer to the recognizer bypassing the need to save it altogether.

    • Sophie

      Oh, very interesting! That does seem more efficient than saving to a temp file, I’ll keep it in mind for future iterations of this code.
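Josef's suggestion could look roughly like the sketch below, which builds the WAV data in memory with the standard `wave` and `io` modules instead of writing a temporary file. The function name `wav_bytes` and the default rate, sample width, and channel count are assumptions for illustration, not values taken from the post's code.

```python
import io
import wave


def wav_bytes(chunks, rate=16000, sample_width=2, channels=1):
    """Pack raw PCM chunks into an in-memory WAV file and return its bytes.

    The defaults (16 kHz, 16-bit, mono) are assumed for this sketch.
    """
    buf = io.BytesIO()
    wf = wave.open(buf, "wb")      # wave accepts any seekable file-like object
    wf.setnchannels(channels)
    wf.setsampwidth(sample_width)
    wf.setframerate(rate)
    wf.writeframes(b"".join(chunks))
    wf.close()                     # finalizes the header; buf stays open
    return buf.getvalue()
```

If the decoder accepts raw PCM directly, even the WAV header may be unnecessary and the joined chunks could be passed along as they are; whether that works depends on the decoder API in the installed version.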

  14. Rahul Vansh

    When I’m running this code, it shows below error please give me solution for this error…

    INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’current’, VARNORM=’no’, AGC=’none’
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    ERROR: “acmod.c”, line 83: Folder ‘pocketsphinx/model/en-us/en-us’ does not contain acoustic model definition ‘mdef’
    Traceback (most recent call last):
    File “Test.py”, line 17, in
    decoder = pocketsphinx.Decoder(config)
    File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 266, in init
    this = _pocketsphinx.new_Decoder(*args)
    RuntimeError: new_Decoder returned -1

  15. john

    INFO: feat.c(715): Initializing feature stream to type: ‘1s_c_d_dd’, ceplen=13, CMN=’live’, VARNORM=’no’, AGC=’none’
    ERROR: “acmod.c”, line 79: Folder ‘../../tools/pocketsphinx/model/en-us/en-us’ does not contain acoustic model definition ‘mdef’
    Traceback (most recent call last):
    File “stt.py”, line 166, in
    sd = SpeechDetector()
    File “stt.py”, line 50, in __init__
    self.decoder = Decoder(config)
    File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 332, in __init__
    this = _pocketsphinx.new_Decoder(*args)
    RuntimeError: new_Decoder returned -1

    Hello,
    I get this error. Any thoughts?

    • Sophie

      You can kinda see the problem in the error message:

      ERROR: “acmod.c”, line 79: Folder ‘../../tools/pocketsphinx/model/en-us/en-us’ does not contain acoustic model definition ‘mdef’

      You need to change lines 40 and 41 so that MODELDIR and DATADIR refer to the actual location of the files.

      Hope it helps!
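Since the missing ‘mdef’ error comes up several times in this thread, a quick check before creating the decoder can show which folder is actually being looked at. This is only a sketch; `check_model_dir` is a hypothetical helper. Note that a relative path such as ‘../../tools/...’ is resolved against the directory the script is run from, not the directory the script lives in.

```python
import os


def check_model_dir(hmm_dir):
    """Return a list of problems found with an acoustic model folder.

    An empty list means the folder exists and contains an 'mdef' file.
    """
    problems = []
    full_path = os.path.abspath(hmm_dir)  # shows where a relative path really points
    if not os.path.isdir(full_path):
        problems.append("not a directory: " + full_path)
    elif not os.path.isfile(os.path.join(full_path, "mdef")):
        problems.append("no 'mdef' file in: " + full_path)
    return problems
```

Printing the result for the folder passed to the decoder config should make a wrong MODELDIR obvious.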

      • Cookiecrunch

        ERROR: “acmod.c”, line 83: Folder ‘C:\Python27\Lib\site-packages\pocketsphinx\model\en-us\en-us’ does not contain acoustic model definition ‘mdef’
        Traceback (most recent call last):
        File “sophierun.py”, line 326, in
        sd = SpeechDetector()
        File “sophierun.py”, line 94, in __init__
        self.decoder = Decoder(config)
        File “C:\Python27\lib\site-packages\pocketsphinx\pocketsphinx.py”, line 277, in __init__
        this = _pocketsphinx.new_Decoder(*args)
        RuntimeError: new_Decoder returned -1

        I have changed MODELDIR and DATADIR so that they refer the actual path of the files. Still I am getting this error. How do I rectify this?

        • Sophie

          Hmm, you’re running this code on a windows machine, so I can’t fully vouch that this code will work. I can think of two things.

          1. The physical layout of the folders must be such that:
          .
          ├── pocketsphinx/
          ├── sphinxbase/
          └── stt.py
          Have you verified that?

          2. If you enter the C:\Python27\Lib\site-packages\pocketsphinx\model\en-us\en-us path in your file explorer, does it actually take you to the folder where the mdef file can be found?

          Hope this helps, good luck!

  16. Hector

    I am working in Mac and although I install pocketsphinx by pip install it does not recognize me either pocketsphinx and sphinxbase. I do not have any folder with both but if I do pip freeze I see pocketsphinx

  17. renato gallo

    ./tardis.py
    Traceback (most recent call last):
    File “./tardis.py”, line 166, in
    sd = SpeechDetector()
    File “./tardis.py”, line 44, in __init__
    config = Decoder.default_config()
    AttributeError: type object ‘pocketsphinx.Decoder’ has no attribute ‘default_config’

    • Sophie

      Aside from a few changes to the print statements and such, the code should be python 3.4 compatible. It’s currently written for python 2.7 though.

      • Daryll

        I had a problem following the installation process of sphinx using Visual Studio I follow the instructions build it using Visual Studio 2015 but i got this error:
        TRACKER : error TRK0005: Failed to locate: “CL.exe”. The system cannot find the file specified

        • Sophie

          I wrote the above code for Ubuntu 14.04, so while it might work on Unix-based OSes like OS X or other Linux distros, I can’t say for sure how it would work with Windows.

          There are probably some libraries missing during the installation phase that aren’t covered in my installation instructions. You could try following the Windows install directions from the CMU Sphinx website directly to see if it’ll help with that issue. Here: http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx#windows

          • Daryll

            I have a hard time installing Sphinx on my windows 64 bit. 🙁 And i get this error:

            Traceback (most recent call last):
            File “C:\Python27\pocketsphnx.py”, line 1, in
            from pocketsphinx.pocketsphinx import *
            File “C:\Python27\lib\site-packages\pocketsphinx\__init__.py”, line 35, in
            from sphinxbase import *
            File “C:\Python27\lib\site-packages\sphinxbase\__init__.py”, line 32, in
            from .ad import *
            File “C:\Python27\lib\site-packages\sphinxbase\ad.py”, line 35, in
            _ad = swig_import_helper()
            File “C:\Python27\lib\site-packages\sphinxbase\ad.py”, line 34, in swig_import_helper
            return importlib.import_module(‘_ad’)
            File “C:\Python27\lib\importlib\__init__.py”, line 37, in import_module
            __import__(name)
            ImportError: No module named _ad

          • Sophie

            Sorry, I’ve never done installations on Windows, so I won’t be able to help you much on that. 🙁 My suggestion would be to dual-boot or run Ubuntu 14.04/16.04 on a virtual box so you’d be able to follow the instructions as is, or Google your error to see if other people have solved it before.

          • Daryll

            I am running on windows. I have followed the tutorial on how to install sphinxbase and pocketsphinx. Downloaded Visual Studio 2012 express but still got this error: sphinx error; missing pocketsphinx module: ensure that pocketsphinx is set up correctly.

  18. David

    Sophie,
    I am working on a voice recognition project and came across your code base. Got it up and running with no problems but was wondering if you could provide some insight to the specifics of the INFO: outputs.

    I also noticed it transitions pretty quick from Listening… to Starting the recording… to Finishing the recording. Most of the time this seems to happen in the middle of testing speech recognition and I have to time when to speak. I also notice sometimes the output is just [SPEECH], other times just even though I was speaking and other times when there is no noise there is speech output being displayed.

    Below is some of the output.

    Listening …
    Starting recording of phrase
    Finished recording, decoding phrase
    INFO: cmn_live.c(88): Update from
    INFO: cmn_live.c(105): Update to
    INFO: cmn_live.c(88): Update from
    INFO: cmn_live.c(105): Update to
    INFO: cmn_live.c(120): Update from
    INFO: cmn_live.c(138): Update to
    INFO: ngram_search_fwdtree.c(1550): 24051 words recognized (32/fr)
    INFO: ngram_search_fwdtree.c(1552): 2808025 senones evaluated (3784/fr)
    INFO: ngram_search_fwdtree.c(1556): 19077058 channels searched (25710/fr), 489688 1st, 672158 last
    INFO: ngram_search_fwdtree.c(1559): 37370 words for which last channels evaluated (50/fr)
    INFO: ngram_search_fwdtree.c(1561): 1405708 candidate words for entering last phone (1894/fr)
    INFO: ngram_search_fwdtree.c(1564): fwdtree 6.07 CPU 0.818 xRT
    INFO: ngram_search_fwdtree.c(1567): fwdtree 6.09 wall 0.821 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 473 words
    INFO: ngram_search_fwdflat.c(948): 16143 words recognized (22/fr)
    INFO: ngram_search_fwdflat.c(950): 996651 senones evaluated (1343/fr)
    INFO: ngram_search_fwdflat.c(952): 1704946 channels searched (2297/fr)
    INFO: ngram_search_fwdflat.c(954): 83335 words searched (112/fr)
    INFO: ngram_search_fwdflat.c(957): 45835 word transitions (61/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.57 CPU 0.077 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.57 wall 0.077 xRT
    INFO: ngram_search.c(1250): lattice start node .0 end node .669
    INFO: ngram_search.c(1276): Eliminated 1 nodes before end node
    INFO: ngram_search.c(1381): Lattice has 2546 nodes, 23747 links
    INFO: ps_lattice.c(1380): Bestpath score: -24784
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:669:740) = -1342034
    INFO: ps_lattice.c(1441): Joint P(O,S) = -1492405 P(S|O) = -150371
    INFO: ngram_search.c(1027): bestpath 0.09 CPU 0.012 xRT
    INFO: ngram_search.c(1030): bestpath 0.09 wall 0.012 xRT
    (‘DETECTED: ‘, [‘‘, ‘[SPEECH]’, ”, ”, “what’s(2)”, ‘this’, ‘and(2)’, ‘he’, ”, ”, ‘[SPEECH]’, ”, ‘was(2)’, ‘‘])
    Listening …
    Starting recording of phrase
    Finished recording, decoding phrase
    INFO: cmn_live.c(88): Update from
    INFO: cmn_live.c(105): Update to
    INFO: cmn_live.c(88): Update from
    INFO: cmn_live.c(105): Update to
    INFO: cmn_live.c(88): Update from
    INFO: cmn_live.c(105): Update to
    INFO: cmn_live.c(88): Update from
    INFO: cmn_live.c(105): Update to
    INFO: cmn_live.c(120): Update from
    INFO: cmn_live.c(138): Update to
    INFO: ngram_search_fwdtree.c(1550): 39722 words recognized (37/fr)
    INFO: ngram_search_fwdtree.c(1552): 3494277 senones evaluated (3296/fr)
    INFO: ngram_search_fwdtree.c(1556): 22075849 channels searched (20826/fr), 576804 1st, 1109420 last
    INFO: ngram_search_fwdtree.c(1559): 60213 words for which last channels evaluated (56/fr)
    INFO: ngram_search_fwdtree.c(1561): 1900155 candidate words for entering last phone (1792/fr)
    INFO: ngram_search_fwdtree.c(1564): fwdtree 6.89 CPU 0.650 xRT
    INFO: ngram_search_fwdtree.c(1567): fwdtree 6.89 wall 0.650 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 705 words
    INFO: ngram_search_fwdflat.c(948): 24076 words recognized (23/fr)
    INFO: ngram_search_fwdflat.c(950): 1527037 senones evaluated (1441/fr)
    INFO: ngram_search_fwdflat.c(952): 2894866 channels searched (2731/fr)
    INFO: ngram_search_fwdflat.c(954): 142217 words searched (134/fr)
    INFO: ngram_search_fwdflat.c(957): 73064 word transitions (68/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.97 CPU 0.092 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.97 wall 0.092 xRT
    INFO: ngram_search.c(1250): lattice start node .0 end node .1055
    INFO: ngram_search.c(1276): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1381): Lattice has 3021 nodes, 31799 links
    INFO: ps_lattice.c(1380): Bestpath score: -40103
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:1055:1058) = -2135522
    INFO: ps_lattice.c(1441): Joint P(O,S) = -2386068 P(S|O) = -250546
    INFO: ngram_search.c(1027): bestpath 0.15 CPU 0.014 xRT
    INFO: ngram_search.c(1030): bestpath 0.15 wall 0.014 xRT
    (‘DETECTED: ‘, [‘‘, ‘i’, ‘have’, ‘a’, ‘somewhat(2)’, ‘is’, ‘the’, ‘weather’, ‘in’, ‘now’, ”, “it’s”, ”, ”, ”, ‘just’, ‘what(2)’, ‘‘])

    • Sophie

      Hey David!

      Glad you got the code working!

      The short delay is likely due to the setup_mic function on lines 52-77. While this code is being run, the mic records a sound sample for a while and sets the base threshold as the average amplitude of the sound sample. So, when the code is first being initiated, you’d want to have the microphone be in as close to “neutral” sound level as possible. You can alter the values in that function to tune the thresholding to be better suited to your methods.

      Since [SPEECH] is a placeholder for a sound that the recognizer couldn’t classify, you might want to listen to the sound samples that are being recorded to see if the results make any sense. You can comment out line 150 if you want to do that. Hope this helps!

      • David

        Thanks. Can you explain what effect either increasing or decreasing the 0.2 avg value will have along with the 3500 threshold value?

        I haven’t changed any of the default values yet but notice I see the following quite a bit:
        ERROR: “ngram_search.c”, line 1139: Couldn’t find in first frame

        I also see various tags when there is an output… such as or what is the significance of them and how can I prevent those tags from being displayed?

        Thanks

        • Sophie

          3500 is the minimum threshold value, so changing it will affect the minimum sound thresholds during the mic setup method (i.e., if you’re recording in a really quiet environment, you still want some sort of threshold). If you increase the 0.2 constant, the threshold will be determined from a larger average of amplitudes. So if your mic is prone to random spikes in amplitude due to noise, it would be better to increase the constant.

          As for the tags, a single word may have different pronunciations. So when you see something like was(2), it’s likely referring to pronunciation 2 in the word dictionary. You could manually strip these tags using a list comprehension.
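One way to strip those tags is sketched below. It assumes that pronunciation variants always appear as a trailing number in parentheses, and that filler tokens are wrapped in square or angle brackets (as with [SPEECH]); `clean_words` is a hypothetical helper name.

```python
import re


def clean_words(words):
    """Drop empty and bracketed filler tokens and strip '(N)' suffixes."""
    cleaned = []
    for word in words:
        word = re.sub(r"\(\d+\)$", "", word)  # "was(2)" becomes "was"
        # Skip empty strings and bracketed fillers such as "[SPEECH]".
        if word and not word.startswith(("[", "<")):
            cleaned.append(word)
    return cleaned
```

Running the detected word list through this before printing should leave only the plain words.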

    • Daryll

      Sir David,
      Can you show us how you do the voice recognition? I am new to python and I am planning on building my own AI. Hope you could help me thanks 🙂

      • David

        Daryll,

        I apologize for my use of ‘voice recognition’ I meant speech recognition… there is a big difference.

        I am not focusing on having the system differentiate between physical human speakers… my focus is on having the system correctly interpret and execute tasks based on human speech input.

        Sorry for any confusion.
        -David

  19. David

    Sophie,
    I’m running into an error and am curious to know if there is a pocketsphinx-python version that will run with python 3.0-3.5.

    I have existing 3.x functionality but when I attempt to incorporate pocketsphinx-python I get the following error:
    ImportError: //pocketsphinx-python/sphinxbase/_ad.so: undefined symbol: PyInstance_Type

    I’m an old Java programmer and when I say old, I’m talking about JDK version 1.4 and I’m not familiar with C. From what I’ve found based on searches I think the issue is with the python version the .so file was created against. If I change to version 2.x it will work but my existing code won’t.

    Any input would be appreciated.

  20. Harshit

    Hey Sophie,

    I am using ubuntu14.04 and python 2.7 and have installed pocketsphinx using `sudo apt-get install python-pocketsphinx` but i am getting the error: `no module named pocketsphinx` in the third line.

    Is there way out?

    • Sophie

      Hmm, did you follow the instructions in full on the github readme? You might need to use pip to get the correct paths set.

      There are several things that need to be installed for pocketsphinx to be imported correctly:
      sudo apt-get install -y python python-dev python-pip build-essential swig git
      sudo pip install pocketsphinx

  21. Rob

    Hi Sophie, thanks so much for the share! I have some issues if you do not mind taking a look at:

    runtimeerror: new decoder returned -1

    Any ideas?

    Many thanks

    • Sophie

      Hi Rob,

      On an initial guess it may be because your folders are not organized correctly or you didn’t correctly install all the modules. The physical layout of the folders must be such that:
      .
      ├── pocketsphinx/
      ├── sphinxbase/
      └── stt.py

      What operating system are you running this code on? I’ve only tested it on Ubuntu 14.04 using Python 2.7

  22. John

    hello,

    Im getting an invalid sample rate error when i run it with rate of 16000. It works with the default sampling rate of my mic 48000 but cannot recognize words. must be a pyaudio issue?. how do i configure it to work with this script?

    Traceback (most recent call last):
    File “/home/pi/stt.py”, line 174, in
    sd.run()
    File “/home/pi/stt.py”, line 121, in run
    self.setup_mic()
    File “/home/pi/stt.py”, line 70, in setup_mic
    frames_per_buffer=self.CHUNK)
    File “build/bdist.linux-armv7l/egg/pyaudio.py”, line 750, in open
    stream = Stream(self, *args, **kwargs)
    File “build/bdist.linux-armv7l/egg/pyaudio.py”, line 441, in __init__
    self._stream = pa.open(**arguments)
    IOError: [Errno -9997] Invalid sample rate

  23. Anup

    Hello,
    I’m getting the following error when i’m trying to run your script.

    slid_win = deque(maxlen=self.SILENCE_LIMIT * rel)
    TypeError: an integer is required

    Could you please help

    Thanks,
    Anup

    • Sophie

      If you’re using Python 3.0, rel isn’t automatically converted to an integer when self.RATE/self.CHUNK is calculated.

      Replace that line with this:
      slid_win = deque(maxlen=self.SILENCE_LIMIT * int(rel))
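The same TypeError shows up for the `prev_audio` line mentioned earlier in the thread, because `PREV_AUDIO` is 0.5 and the product is a float even when `rel` is an integer. A sketch that handles both cases is to cast the whole product. Only `PREV_AUDIO = 0.5` comes from the comments above; the other constant values are assumed here for illustration.

```python
from collections import deque

RATE = 16000        # samples per second (assumed value)
CHUNK = 1024        # samples per read (assumed value)
SILENCE_LIMIT = 1   # seconds of silence that end a phrase (assumed value)
PREV_AUDIO = 0.5    # seconds of audio kept from before a phrase starts

rel = RATE / CHUNK  # a float on Python 3 (15.625 with these values)

# deque requires an integer maxlen, so cast the whole product.
slid_win = deque(maxlen=int(SILENCE_LIMIT * rel))
prev_audio = deque(maxlen=int(PREV_AUDIO * rel))
```

Casting the product rather than `rel` alone keeps `PREV_AUDIO` at 0.5, so there is no need to change it to 1.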

  24. Rodrigo

    Hi Sophie, as you said, I did try install again the packages.
    So, now, when i’m installing pocket…python, I have an error on final of the output:

    ~/pocketsphinx-python $ sudo python setup.py install
    running install
    running bdist_egg
    running egg_info
    writing pocketsphinx.egg-info/PKG-INFO
    writing top-level names to pocketsphinx.egg-info/top_level.txt
    writing dependency_links to pocketsphinx.egg-info/dependency_links.txt
    error: package directory ‘pocketsphinx/swig/python’ does not exist

    “error: package directory ‘pocketsphinx/swig/python’ does not exist” But this directory already exist, my directory structure is:

    /home/user/pocketsphinx
    /home/user/sphinxbase
    /home/user/pocketsphinx-python

    Is it wrong?

    If I try run the script I’m get this:

    $ python sample.py
    Traceback (most recent call last):
    File “sample.py”, line 3, in
    from pocketsphinx.pocketsphinx import *
    File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/__init__.py”, line 37, in
    from pocketsphinx import *
    File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 42, in
    _pocketsphinx = swig_import_helper()
    File “/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py”, line 38, in swig_import_helper
    _mod = imp.load_module(‘_pocketsphinx’, fp, pathname, description)
    ImportError: libpocketsphinx.so.3: cannot open shared object file: No such file or directory

    • Sophie

      For the first one: Is there a reason you’re not using pip? Since it’s the package installer for python, it might be easier to install pocketsphinx-python that way. Your directory structure is correct though. The error is saying you don’t have swig installed, did you follow all of the install instructions?

      sudo apt-get install -y python python-dev python-pip build-essential swig git
      sudo pip install pocketsphinx

      For the second: That’s the same issue you had previously right? That you solved by exporting the LD_LIBRARY_PATH?
      export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

      • Rodrigo

        I tried pip and git clone..swig is already installed.
        I just forgot about export =/

        But, even after doing this I get errors:

        Traceback (most recent call last):
          File "sample.py", line 162, in <module>
            sd.run()
          File "sample.py", line 109, in run
            self.setup_mic()
          File "sample.py", line 58, in setup_mic
            frames_per_buffer=self.CHUNK)
          File "/usr/local/lib/python2.7/dist-packages/pyaudio.py", line 750, in open
            stream = Stream(self, *args, **kwargs)
          File "/usr/local/lib/python2.7/dist-packages/pyaudio.py", line 441, in __init__
            self._stream = pa.open(**arguments)
        IOError: [Errno -9996] Invalid input device (no default output device)

        • Sophie

          Okay, so that error is saying that it can’t find any of your microphones.
          You can check to see if the microphone is working outside of the script by following these instructions.

          If you have multiple input devices, you might have to modify line 117 so that PyAudio initializes with the correct microphone. You can take a look at the documentation here.
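As a rough sketch of how to find the right microphone, the snippet below lists the devices PyAudio reports and keeps only those with at least one input channel. The helper names `input_devices` and `list_pyaudio_devices` are hypothetical; `get_device_count()` and `get_device_info_by_index()` are PyAudio calls, and the dictionary keys used here (`index`, `name`, `maxInputChannels`) are the ones PyAudio returns for each device.

```python
def input_devices(device_infos):
    """Keep only devices that can record, as (index, name) pairs."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def list_pyaudio_devices():
    """Ask PyAudio for the info dictionary of every device it knows about."""
    import pyaudio  # imported lazily so input_devices() works without it

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i)
                for i in range(pa.get_device_count())]
    finally:
        pa.terminate()


# Typical use on a machine with PyAudio installed:
#     for index, name in input_devices(list_pyaudio_devices()):
#         print("%d: %s" % (index, name))
```

Once you know the index of the microphone you want, it can be passed as the `input_device_index` argument when the script opens its PyAudio stream. If the list comes back empty, the problem is with the system audio setup rather than the script.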

    • Rodrigo

      I’m getting this error:

      $ python sample.py
      Traceback (most recent call last):
        File "sample.py", line 3, in <module>
          from pocketsphinx.pocketsphinx import *
        File "sphinxbase.pxd", line 150, in init pocketsphinx (pocketsphinx.c:7935)
      ValueError: PyCapsule_GetPointer called with invalid PyCapsule object

      • Sophie

        That appears to be an error internal to Python or Cython. I need a bit more information:

        What version of python are you using?
        Are you using Ubuntu? Or another operating system?

        On a first pass, it appears that you’ll need to do a reinstall after ensuring that you’ve configured Cython correctly. Maybe these instructions can help?

        • Rodrigo

          Hi, thanks for your answer!
          After trying your tip, I get this error:

          $ python teste.py
          Traceback (most recent call last):
            File "teste.py", line 3, in <module>
              from pocketsphinx.pocketsphinx import *
            File "sphinxbase.pxd", line 150, in init pocketsphinx (pocketsphinx.c:7934)
            File "/usr/local/lib/python2.7/dist-packages/sphinxbase/__init__.py", line 37, in <module>
              from sphinxbase import *
            File "/usr/local/lib/python2.7/dist-packages/sphinxbase/sphinxbase.py", line 42, in <module>
              _sphinxbase = swig_import_helper()
            File "/usr/local/lib/python2.7/dist-packages/sphinxbase/sphinxbase.py", line 38, in swig_import_helper
              _mod = imp.load_module('_sphinxbase', fp, pathname, description)
          ImportError: libsphinxbase.so.3: cannot open shared object file: No such file or directory

          My Python version is:
          $ python --version
          Python 2.7.6

          I’m using Linux Mint:
          $ uname -a
          Linux LinuxMint 3.19.0-32-generic #37~14.04.1-Ubuntu SMP Thu Oct 22 09:41:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

          Thank you so much.

          • Rodrigo

            Ok, this last one I resolved with this:
            $ export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

            But now I’m getting this:

            $ python teste.py
            Traceback (most recent call last):
              File "teste.py", line 3, in <module>
                from pocketsphinx.pocketsphinx import *
              File "sphinxbase.pxd", line 150, in init pocketsphinx (pocketsphinx.c:7934)
            ValueError: sphinxbase.NGramModel has the wrong size, try recompiling

            =/ This script does not like me.

          • Sophie

            That does seem to be a Cython issue, and that would be internal to the pocketsphinx library, not the script that I posted. (You can tell because it’s failing at the import step before getting to any of the actual code :P)

            Have you tried uninstalling all the sphinx libraries and reinstalling? It’s kind of annoying, but it might fix your problem.

  25. Amitava

    Hi,
    I get the following error when I run the above code (F5):

    Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:19:22) [MSC v.1500 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license()" for more information.
    >>>
    =================== RESTART: C:\Sphinx\project\stt\stt.py ===================

    Traceback (most recent call last):
      File "C:\Sphinx\project\stt\stt.py", line 166, in <module>
        sd = SpeechDetector()
      File "C:\Sphinx\project\stt\stt.py", line 50, in __init__
        self.decoder = Decoder(config)
      File "C:\Python27\lib\site-packages\pocketsphinx\pocketsphinx.py", line 277, in __init__
        this = _pocketsphinx.new_Decoder(*args)
    RuntimeError: new_Decoder returned -1
    >>>

    • Sophie

      I’ll need a bit more info, but my initial guess is that your folders are not organized correctly or you didn’t correctly install all the modules. The physical layout of the folders must be such that:
      .
      ├── pocketsphinx/
      └── sphinxbase/

      Also, you’re running this code on a Windows machine. I’ve only tested this code on Linux, so there are no guarantees that it will work on a different operating system, because the drivers and system architecture are different.

      • Amitava

        The folders are organized as
        C:\Sphinx\project\stt
        ├── pocketsphinx/
        ├── sphinxbase/
        └── stt.py

        where stt.py is the above source file.
        I manually copied all 7 files (pocketsphinx.dll, etc.) from C:\Sphinx\pocketsphinx\bin\Release\Win32 to the above directory:
        C:\Sphinx\project\stt\pocketsphinx

        I did the same for sphinxbase.

        I used "pip install pocketsphinx" to install pocketsphinx, and the installation was successful:
        Collecting pocketsphinx
        Downloading pocketsphinx-0.1.3-cp27-cp27m-win32.whl (29.0MB)
        100% |################################| 29.0MB 47kB/s
        Installing collected packages: pocketsphinx
        Successfully installed pocketsphinx-0.1.3

        The imports were also fine
        from pocketsphinx.pocketsphinx import *
        from sphinxbase.sphinxbase import *

        Could it be a 64 bit vs 32 bit issue?

        • Sophie

          I am tempted to believe that it is a 64-bit vs 32-bit issue. I’m not sure how well Sphinx works on a Windows computer, but you could install Ubuntu on a virtual machine and run this code on Linux that way.
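For anyone wanting to check the bitness question from inside Python, here is a small sketch using only the standard library. The function name `interpreter_bits` is hypothetical. It reports whether the running interpreter is a 32-bit or 64-bit build, which should match the wheel pip installed (the log above shows a `win32` wheel, and the interpreter banner says `32 bit`). This is a quick sanity check, not a diagnosis of the `new_Decoder` error.

```python
import struct
import sys


def interpreter_bits():
    """Return 32 or 64 depending on the pointer size of this Python build."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8


print("Python %s is a %d-bit build" % (sys.version.split()[0], interpreter_bits()))
```

If the interpreter and the compiled Sphinx libraries were built for different architectures, the compiled extension would normally fail to load at import time, so a mismatch is worth ruling out early.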

          • Amitava

            Hi Sophie,

            Thanks for your time and feedback. But you think my folder structure and the way I filled the pocketsphinx and sphinxbase folders above are correct, right?

          • Sophie

            Yep, they look alright to me! If you have a chance, try the installation process via an Ubuntu install (either as a native OS or as a virtual OS).
