2025, Oct 24 07:00
Resolve the ALSpeechRecognition 'modifiable_grammar' Error on Pepper with a Language Reinit
Seeing "A grammar named 'modifiable_grammar' already exists" on Pepper (NAOqi)? Fix it: swap languages, call ALSpeechRecognition setLanguage, then setVocabulary.
When building speech-driven interactions for Pepper on NAOqi, a deceptively simple call can derail the whole flow: setting a fresh vocabulary on ALSpeechRecognition. On repeated runs, some developers hit a runtime crash that reads: A grammar named "modifiable_grammar" already exists. Below is a concise walkthrough of what triggers it, how to reproduce it, and a practical fix that has proven effective in real-world use.
Reproducing the issue with a minimal interaction flow
The following example drives a Pepper interaction loop in Python 2.7 (the version the classic NAOqi Python SDK targets). It configures ALSpeechRecognition, dynamically prepares a vocabulary, and subscribes to capture the user’s response. Names are illustrative, but the behavior matches the typical setup that triggers the error.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
from naoqi import ALProxy
import time
LEXICON_READY_KEY = "MyApp/VocabInitialized"
ALT_LEXICONS = []
class PepperDialogAgent:
    def __init__(self, host="localhost", port=9559):
        self.host = host
        self.port = port
        self.tts_srv = None
        self.asr_srv = None
        self.mem_srv = None
        self._connect_services()
    def _connect_services(self):
        try:
            self.tts_srv = ALProxy("ALTextToSpeech", self.host, self.port)
            self.asr_srv = ALProxy("ALSpeechRecognition", self.host, self.port)
            self.mem_srv = ALProxy("ALMemory", self.host, self.port)
            self.asr_srv.setLanguage("Spanish")
            print("Robot services ready")
        except Exception as e:
            print("Failed to connect: {}".format(e))
            print("Make sure the robot is on, IP is correct, and you are on the same network.")
            raise
    def prompt_user(self, prompt_text, choices_map, wait_limit=15, conf_min=0.5):
        if not self.tts_srv or not self.asr_srv or not self.mem_srv:
            print("Initialization error")
            return (None, None)
        try:
            candidate_terms = list(choices_map.keys())
            self._configure_lexicon(candidate_terms)
            self.tts_srv.say(prompt_text)
            self.asr_srv.subscribe("DynASRSession")
            print("Listening...")
            print("Expected choices: {}".format(", ".join(candidate_terms)))
            self.mem_srv.insertData("WordRecognized", [])
            detected = self._await_input(wait_limit, conf_min)
            if detected:
                return self._map_result(detected, choices_map)
            else:
                self.tts_srv.say("No pude escuchar tu respuesta. Intenta hablar más claro.")
                return (None, None)
        except Exception as e:
            print("Interaction error: {}".format(e))
            return (None, None)
        finally:
            try:
                self.asr_srv.unsubscribe("DynASRSession")
            except Exception:
                pass  # already unsubscribed, or subscribe never succeeded
    def _configure_lexicon(self, responses_list):
        # This is the step where a second run can fail with:
        #   A grammar named "modifiable_grammar" already exists
        try:
            self.asr_srv.pause(True)
            terms = []
            for r in responses_list:
                terms.append(r.lower())
                terms.append(r.upper())
                terms.append(r.capitalize())
            terms = list(set(terms))
            self.asr_srv.setVocabulary(terms, False)
            self.asr_srv.pause(False)
            print("Vocabulary loaded: {}".format(terms))
        except Exception as e:
            print("Vocabulary setup failed")
            print(e)
            raise  # re-raise without losing the original traceback
    def _await_input(self, wait_limit, conf_min):
        picked = False
        t0 = time.time()
        time.sleep(1.0)
        while not picked and (time.time() - t0) < wait_limit:
            wr = self.mem_srv.getData("WordRecognized")
            if wr and len(wr) > 1:
                heard = wr[0]
                conf = wr[1]
                print("Heard '{}' with confidence {:.2f}".format(heard, conf))
                if conf > conf_min:
                    print("Accepted: {}".format(heard))
                    self.mem_srv.insertData("WordRecognized", [])
                    return heard
                else:
                    print("Confidence too low ({:.2f}), continuing...".format(conf))
                    self.mem_srv.insertData("WordRecognized", [])
            time.sleep(0.1)
        return None
    def _map_result(self, heard_word, choices_map):
        lowered = heard_word.lower()
        for key, payload in choices_map.items():
            if lowered == key.lower():
                msg = payload.get("text", "")
                val = payload.get("value", 0)
                print("User said '{}' - returning: ('{}', {})".format(heard_word, msg, val))
                return (msg, val)
        print("No mapping for: '{}'".format(heard_word))
        return (None, None)
def prompt_pepper_user(prompt_text, choices_map, host="localhost", port=9559, wait_limit=15):
    agent = PepperDialogAgent(host, port)
    return agent.prompt_user(prompt_text, choices_map, wait_limit)
if __name__ == "__main__":
    print("\n=== Example: Multiple options ===")
    q2 = "¿Qué te gustaría hacer? Puedes decir: bailar, cantar o hablar."
    options2 = {
        "bailar": {"text": "¡Perfecto! Vamos a bailar juntos.", "value": 1},
        "cantar": {"text": "¡Qué divertido! Me encanta cantar.", "value": 2},
        "hablar": {"text": "Excelente, podemos tener una buena conversación.", "value": 3}
    }
    txt, val = prompt_pepper_user(q2, options2)
    print("Result: ('{}', {})".format(txt, val))
    if val == 1:
        print("Activating dance mode...")
    elif val == 2:
        print("Activating singing mode...")
    elif val == 3:
        print("Activating conversation mode...")
    else:
        print("No answer recognized")
    print("\n=== Example: Yes/No ===")
    q1 = "¿Tienes alguna otra pregunta?"
    options1 = {
        "si": {"text": "¡Vamos!", "value": 1},
        "no": {"text": "Entiendo. Ha sido un placer. No dudes en volver a consultarme.", "value": 0}
    }
    txt, val = prompt_pepper_user(q1, options1)
    print("Result: ('{}', {})".format(txt, val))
    if txt:
        helper = PepperDialogAgent()  # fresh proxy purely to speak the mapped reply
        helper.tts_srv.say(txt)
What is happening under the hood
The reported failure is explicit: the speech stack throws A grammar named "modifiable_grammar" already exists. In practice it tends to show up on the second run of an interaction, right when a new vocabulary is pushed: the platform signals that a grammar resource with that name is still present, and registering another one under the same identifier fails. There is also an important constraint in the official API reference that is easy to miss: setLanguage must not be called concurrently with any other ALSpeechRecognition or ALDialog method. Keeping that in mind clarifies two things: language changes should be sequenced away from any active recognition, and resetting the language context can help ALSpeechRecognition reinitialize its internal grammar state.
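To confirm that a failure is this specific collision rather than some other speech-stack error, the setVocabulary call can be guarded and the message inspected. This is a minimal sketch, not part of the original flow: asr_proxy stands for any ALSpeechRecognition proxy, and the matched substring comes from the error text quoted above.

```python
def set_vocabulary_checked(asr_proxy, terms):
    """Try to push a vocabulary; report the grammar collision explicitly.

    asr_proxy: an ALSpeechRecognition proxy (or anything exposing the same
    pause/setVocabulary calls). Returns True on success, False when a stale
    'modifiable_grammar' resource blocked the update.
    """
    try:
        asr_proxy.pause(True)   # stop recognition before touching the vocabulary
        asr_proxy.setVocabulary(terms, False)
        asr_proxy.pause(False)
        return True
    except RuntimeError as e:
        if "modifiable_grammar" in str(e):
            # A grammar from a previous run is still registered;
            # a language reinitialization clears it.
            return False
        raise  # unrelated error: propagate
```

Returning a flag instead of swallowing the exception lets the caller decide whether to trigger the language-reinit recovery or abort.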
The practical fix: swap languages, then set your target language and vocabulary
A reliable way to recover from the modifiable_grammar conflict is to force a clean language reinitialization before you set a new vocabulary and start recognition. The pattern is straightforward. Temporarily switch between installed languages, then set the language you intend to use, then load the vocabulary and subscribe. The example below follows this order and keeps all calls strictly before recognition starts.
def init_asr_pipeline(asr_proxy, target_language, keyword_list):
    asr_proxy.pause(True)                  # make sure recognition is stopped first
    asr_proxy.setLanguage("German")        # bounce through two installed languages
    asr_proxy.setLanguage("English")       # (swap these for packs on your robot)
    asr_proxy.setLanguage(target_language) # set the language you actually want
    asr_proxy.setVocabulary(keyword_list, False)
    asr_proxy.setAudioExpression(True)     # audible cue when recognition starts/stops
    asr_proxy.setVisualExpression(True)    # eye LEDs reflect recognition state
    asr_proxy.pause(False)                 # resume, then subscribe last
    asr_proxy.subscribe("speech_recognition")
Applied to the earlier interaction flow, the important part is to perform the language swap and final setLanguage right before you load the vocabulary and subscribe. If you are pausing the recognizer while altering the vocabulary, keep that pause sequence, then resume and only then subscribe. This keeps the API usage in line with the documentation note about not mixing setLanguage with other speech calls at the same time.
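Folded into the earlier _configure_lexicon helper, the full sequence could look like the sketch below. The bounce languages (German and English, as in the snippet above) and the Spanish default are assumptions mirroring this article's examples: use language packs actually installed on your Pepper.

```python
def configure_lexicon_with_reinit(asr_proxy, terms, target_language="Spanish",
                                  bounce_languages=("German", "English")):
    """Reinitialize the language context, then load a fresh vocabulary.

    target_language and every entry of bounce_languages must be installed
    on the robot; the defaults here mirror the examples in this article.
    """
    asr_proxy.pause(True)                   # no active recognition during setLanguage
    for lang in bounce_languages:           # force the engine to rebuild its
        asr_proxy.setLanguage(lang)         # internal grammar state
    asr_proxy.setLanguage(target_language)  # end on the language you will use
    asr_proxy.setVocabulary(sorted(set(terms)), False)
    asr_proxy.pause(False)                  # resume; subscribe afterwards, elsewhere
```

Keeping the subscribe call outside this helper preserves the original separation between vocabulary setup and session management.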
Why this detail matters
Voice-driven experiences tend to be retried multiple times in a session, especially during iterative testing or demo loops. If the speech stack holds on to a previously created grammar object and you blindly push a new setVocabulary, you risk hitting the same collision unpredictably. Explicitly reinitializing the language context gives you a deterministic reset point and makes repeated runs stable. It also reduces the time spent chasing non-deterministic behavior that looks like a random failure but is actually the engine protecting an existing grammar resource.
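One way to make that reset point explicit in code is to retry the vocabulary push exactly once after a language bounce. This is a sketch under the same assumptions as before (German and English packs installed, Spanish as the hypothetical target): normal runs stay on the fast path, and the reinit cost is only paid when the collision actually occurs.

```python
def set_vocabulary_with_retry(asr_proxy, terms, target_language="Spanish"):
    """Push a vocabulary; on a grammar collision, bounce languages and retry once."""
    for attempt in (1, 2):
        try:
            asr_proxy.pause(True)
            asr_proxy.setVocabulary(terms, False)
            asr_proxy.pause(False)
            return
        except RuntimeError as e:
            if attempt == 2 or "modifiable_grammar" not in str(e):
                raise  # second failure, or an unrelated error
            # Stale grammar detected: reinitialize the language context,
            # then loop back for the single retry.
            asr_proxy.setLanguage("German")
            asr_proxy.setLanguage("English")
            asr_proxy.setLanguage(target_language)
```

Capping the retry at one attempt keeps the failure mode visible: if the collision persists after a reinit, something else is wrong and the exception should surface.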
Wrapping up
If you see A grammar named "modifiable_grammar" already exists when calling ALSpeechRecognition.setVocabulary, reframe the setup sequence. Ensure no active recognition is running while you manipulate language and vocabulary, leverage a brief language swap to refresh the context, set the desired language last, then apply your keywords and subscribe. Following this order tends to make Pepper’s speech recognition behave consistently across repeated runs without touching the rest of your interaction logic.