General Information About Alexa Development
This page largely restates the guide at https://moduscreate.com/blog/build-an-alexa-skill-with-python-and-aws-lambda/, with some additional clarification. It traces the path of execution as a user interacts with our skill.
- The user issues a voice command to the Echo by saying "Alexa" followed by a skill name and an intent. The intent may have parameters.
In this case, something like: "Alexa, ask Boston Info when is trash day?"
skill name: Boston Info
intent: find trash days
parameters: 1 Main Street apartment 2
We will give this intent a name, TrashDayIntent. You can see this in the intent schema.
- The Echo sends the request to the Alexa Service Platform.
The platform handles the speech recognition and translates the voice command above into a JSON document containing the intent and any parameters.
This JSON is sent to the skill (Boston Info in this example).
intent: trashday
parameter: "1 Main Street apartment 2"
- The skill receives the JSON.
We're implementing the skill as an AWS Lambda, so the JSON will be sent to the Lambda function at the ARN associated with the skill name.
- The Lambda function contains custom code that parses the JSON to identify the intent and its arguments (in this example, the address).
The code then gathers data for the response. In this case that means a call to data.boston.gov to get the string of trash days associated with the provided address. In other cases it might instead mean accessing a database or session information.
This response data is serialized in a JSON response, which is returned to the Alexa Service Platform. It contains the response both as text for Alexa to say and as text/images for the smartphone app to display.
- The Alexa Service Platform receives the response and conveys it to the user using text-to-speech or the app display.
This communication paradigm is shown below.
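As a rough illustration of the response described in step 4, the sketch below assembles a response containing text for Alexa to speak and a simple card for the app. The helper name `build_response` and the trash-day text are made up for illustration; the field names follow Amazon's request and response JSON reference, and this is not necessarily how Boston Info builds its responses.

```python
import json


def build_response(speech_text, card_title, card_text, should_end_session=True):
    """Assemble a response dict in the shape the Alexa service expects.

    build_response is a hypothetical helper, not part of any Amazon library.
    """
    return {
        "version": "1.0",
        "sessionAttributes": {},
        "response": {
            # Text that Alexa reads aloud.
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            # Text shown as a card in the smartphone app.
            "card": {"type": "Simple", "title": card_title, "content": card_text},
            "shouldEndSession": should_end_session,
        },
    }


# Placeholder values; real trash days would come from data.boston.gov.
response = build_response(
    "Trash is picked up on Monday and Thursday.",
    "Trash Day",
    "1 Main Street apartment 2: Monday, Thursday",
)
print(json.dumps(response, indent=2))
```

The Lambda function returns this dict, and it is serialized to JSON on its way back to the Alexa Service Platform.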
Note: See Deploying Your Skill to learn how we currently push code to our development environments.
Because the Python code in Boston Info's Lambda function uses external libraries, it must be uploaded as a .zip file.
To generate this .zip file, we must install all of the required Python packages in a directory that contains our code. Amazon provides instructions on how to do so: https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html
Once all the requisite libraries are installed, compress the contents of the directory. The instructions note:
Important: Zip the directory content, not the directory. The contents of the Zip file are available as the current working directory of the Lambda function.
Recall that in Part 2 of the installation instructions we set the Handler to lambda_function.lambda_handler. This specifies the function that is executed when a voice command is issued to the Alexa device: the lambda_handler function in the file lambda_function.py. If we compress the containing directory instead of its contents, lambda_function.py ends up inside a subfolder of the archive and Lambda cannot find the handler.
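One way to zip the directory's contents rather than the directory itself is sketched below using Python's standard `zipfile` module. The function names, the `package` directory, and the `skill.zip` file name are placeholders chosen for this example, and the demo builds a throwaway directory rather than the real project.

```python
import os
import tempfile
import zipfile


def zip_directory_contents(source_dir, zip_path):
    """Zip the files inside source_dir so they sit at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store each file relative to source_dir, so the directory
                # name itself never appears in the archive.
                archive.write(full_path, os.path.relpath(full_path, source_dir))


def handler_module_at_root(zip_path, module_file="lambda_function.py"):
    """Return True if the handler module is at the top level of the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return module_file in archive.namelist()


# Demo with a temporary directory standing in for the real package directory.
with tempfile.TemporaryDirectory() as workspace:
    package_dir = os.path.join(workspace, "package")
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "lambda_function.py"), "w") as f:
        f.write("def lambda_handler(event, context):\n    return 'Hello from Lambda'\n")

    zip_path = os.path.join(workspace, "skill.zip")  # kept outside package_dir
    zip_directory_contents(package_dir, zip_path)
    root_ok = handler_module_at_root(zip_path)

print(root_ok)
```

Writing the archive outside the directory being zipped avoids the archive trying to include itself.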
Below we've listed some basic vocabulary that will be useful to know when working on an Alexa skill.
From Amazon's documentation:
Amazon Resource Names (ARNs) uniquely identify AWS resources. We require an ARN when you need to specify a resource unambiguously across all of AWS, such as in IAM policies, Amazon Relational Database Service (Amazon RDS) tags, and API calls. https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html#arn-syntax-lambda
Our skill will be stored in an AWS Lambda function, which we can identify by its ARN.
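For orientation, a Lambda function ARN has the form arn:partition:lambda:region:account-id:function:function-name. The sketch below splits one into its parts; `parse_lambda_arn` is a hypothetical helper, and the account ID and function name in the example are invented.

```python
def parse_lambda_arn(arn):
    """Split a Lambda function ARN into its named components."""
    parts = arn.split(":")
    if len(parts) < 7 or parts[0] != "arn" or parts[2] != "lambda" or parts[5] != "function":
        raise ValueError("Not a Lambda function ARN: " + arn)
    return {
        "partition": parts[1],
        "region": parts[3],
        "account_id": parts[4],
        "function_name": parts[6],
    }


# Invented example values, not the real Boston Info ARN.
example_arn = "arn:aws:lambda:us-east-1:123456789012:function:boston-info"
arn_parts = parse_lambda_arn(example_arn)
print(arn_parts["region"], arn_parts["function_name"])
```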
Slots are used for intents that require parameters. Each slot must have:
- a name (a string describing the slot)
- a type (this can be a type preconfigured by Amazon or a custom type)
The preconfigured slot type for a street address is described here.
Information on defining a custom slot type is available here.
Alexa needs a list of phrases that correspond to each of our skill's intents.
We provide this in the settings for our skill in the developer console (see Part 1 of installation).
The format for the list of sample utterances is
[intent] [phrase]
The phrase may contain a reference to a slot if there is one associated with the intent it invokes. The format for this is
{slot_name}
Example of a sample utterance:
SetAddressIntent my address is {Address}
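To make the format concrete, the sketch below splits a sample utterance line into its intent, its phrase, and the slot names it references. This is only an illustration of the `[intent] [phrase]` layout; `parse_sample_utterance` is a hypothetical helper and not something the developer console requires.

```python
import re

# Matches slot references such as {Address}.
SLOT_PATTERN = re.compile(r"\{(\w+)\}")


def parse_sample_utterance(line):
    """Split '[intent] [phrase]' and collect any {slot_name} references."""
    # The first whitespace-separated token is the intent; the rest is the phrase.
    intent, phrase = line.strip().split(None, 1)
    return {
        "intent": intent,
        "phrase": phrase,
        "slots": SLOT_PATTERN.findall(phrase),
    }


parsed = parse_sample_utterance("SetAddressIntent my address is {Address}")
print(parsed)
```

A line with no phrase after the intent raises a ValueError here, which seems a reasonable way to flag a malformed entry.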
Our skill will invoke an AWS Lambda function. This is where the code that produces a response to the Alexa voice command resides.
There are several language options for this code, including JavaScript (Node.js), Java, C#, and Python (2.7 or 3.6).
Selecting Python, we are provided with the following template:
def lambda_handler(event, context):
    # TODO implement
    return 'Hello from Lambda'
The event argument is the JSON received from the Alexa platform. It contains the intent and slot information from the voice command.
Structure of the event object:
session:
    sessionId: [session id]
    application:
        applicationId: [application id]
    attributes: {}
    user:
        userId: [user id]
    new: true
request:
    type: [request type, e.g., IntentRequest]
    requestId: [request id]
    timestamp: [timestamp]
    intent:
        name: [name of the invoked intent]
        slots:
            [slot name]:
                name: [name of the slot]
                value: [value of the slot]
    locale: "en-US"
version: "1.0"
The elements of this event object are discussed in detail at: https://developer.amazon.com/docs/custom-skills/request-and-response-json-reference.html.
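Given that structure, a handler can read the request type, the intent name, and the slot values directly from the event dict. The sketch below is a minimal illustration and not the skill's actual code: the slot name `Address` on TrashDayIntent, the helper `get_slot_values`, and the sample event values are assumptions. It returns plain strings like the template does; a real skill would return a full response JSON.

```python
def get_slot_values(event):
    """Return a {slot name: slot value} dict for the invoked intent."""
    intent = event["request"].get("intent", {})
    slots = intent.get("slots") or {}
    return {name: slot.get("value") for name, slot in slots.items()}


def lambda_handler(event, context):
    request = event["request"]
    if request["type"] != "IntentRequest":
        return "Unhandled request type: " + request["type"]

    intent_name = request["intent"]["name"]
    slots = get_slot_values(event)
    if intent_name == "TrashDayIntent":
        # "Address" is an assumed slot name for this intent.
        return "Looking up trash days for " + str(slots.get("Address"))
    return "Unknown intent: " + intent_name


# Trimmed sample event with placeholder values.
sample_event = {
    "version": "1.0",
    "session": {"new": True, "attributes": {}},
    "request": {
        "type": "IntentRequest",
        "locale": "en-US",
        "intent": {
            "name": "TrashDayIntent",
            "slots": {
                "Address": {"name": "Address", "value": "1 Main Street apartment 2"}
            },
        },
    },
}

result = lambda_handler(sample_event, None)
print(result)
```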