7. The Amazon Alexa Service
Supported by two powerful frameworks that leverage open APIs
Lives In the Cloud
Automated Speech Recognition (ASR)
Natural Language Understanding (NLU)
Always Improving
Alexa Skills Kit
(ASK)
Create Great Content
ASK is how you connect
to your consumer
Alexa Voice Service
(AVS)
Unparalleled Distribution
AVS allows your content
to be everywhere
8. Skills built using ASK
Tools that make it fast & easy for you to build skills
31. Alexa, open space facts
open, begin, start, launch, ask, tell
Wake Word | Starting Phrase | Skill Invocation Name
32. Alexa, ask space facts for trivia
Wake Word | Starting Phrase | Skill Invocation Name | Utterance
33. Alexa, ask space facts for trivia
tell me something
give me information
a fact
give me trivia
Wake Word | Starting Phrase | Skill Invocation Name | Utterance
53. Built-in Intents
A library of intents for
common actions.
Amazon provides training
data, but they can be
augmented.
AMAZON.CancelIntent
AMAZON.HelpIntent
AMAZON.StopIntent
AMAZON.NextIntent
AMAZON.NoIntent
AMAZON.RepeatIntent
AMAZON.StartOverIntent
AMAZON.ShuffleOnIntent
AMAZON.YesIntent
REQUIRED FOR CERTIFICATION
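In code, built-in intents are wired up just like custom intents. A minimal sketch (the response strings are made up for illustration):

```javascript
// Hypothetical handler map for some of the built-in intents listed above;
// the response strings are illustrative, not from the deck.
const builtInHandlers = {
  'AMAZON.HelpIntent': () => 'You can say, tell me a space fact.',
  'AMAZON.StopIntent': () => 'Goodbye!',
  'AMAZON.CancelIntent': () => 'Goodbye!'
};

console.log(builtInHandlers['AMAZON.HelpIntent']()); // "You can say, tell me a space fact."
```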
54. Communicating with the endpoint
Your endpoint needs to receive and react to a JSON object
55. The Endpoint
Must be Internet-accessible
Adhere to ASK service interface
- JSON
Web service or AWS Lambda
Uses HTTP over SSL/TLS
- port 443
56. Communicating with the Endpoint
Request body:
• session: Information about the
current conversation
• request: Describes the user input
57. Communicating with the Endpoint
Response body:
• outputSpeech: Alexa’s response
• card: (optional) graphical response
• reprompt: (optional) reminder
• shouldEndSession: used to end or
keep session open
66. Remember to Check-In
• Ask the instructor for the link
• You’ll get a confirmation email with details to
earn free perks from Amazon
Editor's Notes
Hello, and welcome to Alexa Workshop.
Alexa Family
Echo
The Echo is the first and best-known endpoint for Alexa.
Amazon launched the Amazon Echo in 2014 to let customers engage with Alexa and control their homes by voice. Echo is really a hands-free speaker with far-field voice recognition, which means you can just talk to it from across the room. Alexa and the Echo device were built to make life easier and more enjoyable.
Echo Dot:
A hands-free, voice-controlled device that uses the same far-field voice recognition as Amazon Echo. Dot has a small built-in speaker; it can also connect to your speakers over Bluetooth or with the included audio cable.
The Echo and the Echo Dot are what we call far-field Alexa devices. You interact with them completely hands-free from anywhere in the room, even if that room is noisy.
The difference between Echo and Echo Dot is simple: Echo has a powerful built-in speaker that provides room-filling sound.
Echo Dot is smaller, contains a less powerful speaker, and works great when connected to another audio system. Both include the same seven-microphone array with advanced beam-forming and noise-cancelling technology and are otherwise functionally identical.
Amazon Tap: Alexa is also available on other Amazon devices, including Tap, our portable, battery-powered speaker.
Other
Alexa is available on Amazon’s Fire Tablets, and Amazon shopping apps on mobile.
Alexa is also available on Fire TV via the push-to talk remote control that comes with it.
What is Alexa
It is a cloud-based service that handles all the speech recognition, machine learning, and natural language understanding for all Alexa-enabled devices.
Because it lives in the cloud, Alexa is always getting smarter: it is constantly improving and learning. The more you use it, the more it adapts to your speech patterns, vocabulary, and personal preferences.
And because Alexa takes all her intelligence from the cloud, new updates and features are delivered automatically.
We’re now interacting with technology in the most natural way possible – by talking.
http://alexa.design/video
There are so many possibilities when it comes to Alexa, and we are really excited about it. With Alexa, we are building a cloud-based voice service that’s free to all developers, companies, and hobbyists. Best of all, you don’t need a background in NLU or speech recognition to build great voice experiences for your customers.
Alexa is supported by two sets of APIs & SDKs -
Alexa Skills Kit (ASK) is an SDK that allows you to build custom skills that customers can voice enable on all Amazon Alexa products. Many of our customers who build their own smart home products with Alexa also create complementary skills that can be accessed in the Skills Storefront.
Alexa Voice Service (AVS) is a set of APIs and developer tools that you can use to build Alexa into your product, whether you’re in the automotive, smart home, or home audio industry.
--
On one side we have ASK (Alexa Skills Kit), an API that allows you as a developer to add more capabilities to Alexa. When we released Alexa, she didn’t have the capability to order an Uber or a pizza from Domino’s. What we did was open up the API so these companies could build skills that create rich voice experiences for their customers. We now have over 12,000 published skills, and we expect to see a lot more in the future.
All You Have to Do Is ASK (What is the Alexa Skills Kit?)
ASK is our SDK; in other words, it is our way of making the voice experience via Alexa possible.
ASK gives you the ability to create new voice-driven capabilities (also known as skills; think apps) for Alexa.
You can connect existing services to Alexa in minutes with just a few lines of code.
You can also build entirely new voice-powered experiences in a matter of hours, even if you know nothing about speech recognition or natural language processing.
On the other side is AVS (Alexa Voice Service), a set of APIs that allow you to integrate Alexa into your devices and apps. Think cars, microwaves, refrigerators, speakers, and the like. As long as your device has a microphone, a speaker, and an internet connection, you can integrate Alexa. In fact, we recently released a Raspberry Pi version of Alexa using the AVS APIs.
AVS: Serving a Platform Agnostic Voice Experience
It’s through the Alexa Voice Service that device makers and hardware manufacturers can incorporate an Alexa-driven voice experience into their devices.
Any device that has a speaker, a microphone, and an Internet connection can integrate Alexa.
Just imagine what that means. You can picture everything from a car to a microwave to a pen, and more...all enabled to deliver an experience by voice
Both ASK and AVS are completely free to use. Here’s a rule of thumb to understanding what feature set makes sense for your use case:
You can add your product to Alexa through the Alexa Skills Kit (ASK)
Or, you can add Alexa to your product with the Alexa Voice Service (AVS),
Let’s switch gears now and talk a bit about skills, which are really the capabilities that Alexa has.
What is a Skill?
Skills are how you, as a developer, make Alexa smarter. They give customers new experiences. They’re the voice-first apps for Alexa. When we launched Echo, Alexa could do the basics: weather, music, reading the news. Now you can order a Lyft, order from Domino’s, and more.
There are two kinds of skills
built in skills (like playing music, weather forecast, general knowledge questions) and
custom skills that you as developers can build.
Building skills using Alexa Skills Kit (ASK)
The way you build skills is by using the Alexa Skills Kit.
The Alexa Skills Kit is a collection of self-service APIs, tools, documentation and code samples that make it fast and easy for you to add skills to Alexa. Thousands of developers are building skills to expand Alexa’s capabilities.
We launched the Alexa Skills Kit so anyone can develop Skills for Alexa, at no cost.
Very similar to Apps on your phone, except that nothing gets installed on the device.
What can you do with ASK
You can connect existing services to Alexa
You can also build entirely new voice-powered experiences in a matter of hours, even if you know nothing about speech recognition or natural language processing.
When Alexa launched in the US, it had dozens of capabilities, or skills; it now has thousands.
You can now say “Alexa, ask Lyft for a Lyft Line to work”
Or
Alexa, ask Capital One what did I spend?
Or
Alexa, tell Starbucks to start my order
The free Amazon Alexa App is a companion to your Alexa device for setup, remote control, and enhanced features. Alexa is always ready to play your favorite music, provide weather and news updates, answer questions, create lists, and much more.
You can also visit amazon.com/skill to view the complete catalog of skills.
Let’s see the Fact Skill in action before we start building it. Talk to Alexa – quick demo of the fact skill.
“Alexa, open space facts”
You can also say –
Alexa, tell me a fact, or
Alexa, give me a space fact
Wake Word - A command that the user says to tell Alexa that they want to talk to her. Example: “Alexa, ask History Buff what happened on December seventh.” Here, “Alexa” is the wake word. Alexa users can select from a defined set of wake words.
Starting Phrase – open, ask, begin, play, start, talk, tell etc.
Invocation Name: A name that represents the custom skill the user wants to use. Users say a skill’s invocation name to begin an interaction with a particular custom skill.
For example, if the invocation name is “Daily Horoscopes”, users can say: “Alexa, ask Daily Horoscopes for the horoscope for Gemini.”
You must say the name of the skill as part of the user utterance. That’s the way Alexa can map it to the appropriate skill. It’s like launching a mobile app. You have to open the app to use the specific functionality.
Much like web and mobile apps, there are two pieces to building an Alexa skill.
Alexa skills have two parts:
Configuration data in the Amazon Developer Portal (frontend): done at developer.amazon.com
Hosted service responding to user requests (backend): we’ll be using AWS Lambda as our backend, so we’ll do this at aws.amazon.com
Create VUI Interaction Model (Front End)
Skill Info + Interaction Model
Create AWS Lambda Function: Your code or your hosted service backend
Connect VUI to the Lambda Function
Testing
Customization: Make it your own
Publish it
GitHub templates
A typical Alexa project on GitHub has the following structure:
/SpeechAssets
Provides the VUI or the Front End for the skill
Meant to go inside your skill at developer.amazon.com
/src
Provides the code for the skill
Meant to go into your Lambda Function at aws.amazon.com
About this Skill
This sample covers the basics of skill building. It delivers random facts or quotes and serves as a very simple example.
You can also customize your fact skill with your favorite topic.
Concepts you will learn with this skill
Intents and Intent Schema
Sample Utterance
Generating a randomized response from Alexa
We’ll be switching between these as we build our skill
Visit echosim.io, and login using Amazon
As a developer you are never asked to work with audio or raw text coming from the user.
You receive a JSON object that was generated by the Alexa Service. Here is how it works.
This is a bird’s-eye view of a user interacting with a custom skill through an Echo.
We will go into further detail later in this presentation, but it’s important to remember that Alexa and all skill code live in the cloud.
In order to understand what a user says, we first have to turn sounds into words.
This process is called speech recognition.
In this example we have the phonetic spelling for three sounds.
Let’s see what words these could form.
Forty times? Maybe the user wants to multiply something by forty
For tea times? Is the user searching for good times to have some tea?
For Tee Times?
Does the user want to play golf?
Or does the user want to play a lot of golf?
Having a Natural Language Understanding Engine on top of speech recognition allows us to go from words to meanings.
We can also train this engine using utterances and slots to map user input with high accuracy.
The way we train the NLU engine is by using sample utterances that we associate with an intent.
Each intent defines a specific behavior your skill can take. Like buttons on a web page, intents take user input and execute some code based on it.
Let’s take a step-by-step look at how user input, in the form of spoken word (audio), is turned into a JSON object that our code can read and respond to.
- The first thing that has to happen is for the device to “wake up” when it hears the correct word.
- Once the device is awake it’ll stream all the audio to the Alexa Service hosted by AWS in the cloud.
- Alexa devices like the Amazon Echo or the Echo Dot feature microphone arrays; these allow us to capture high-quality audio by using beam forming and canceling background noise.
Once the audio reaches the Alexa Service, it is converted into a JSON object, based on the meaning of the words the user spoke.
This JSON object is easy to read from any programing language and contains enough information to allow us to respond to the user’s input.
Your code just has to return a (properly formatted) JSON object to the Alexa Service and the service will take care of turning it into audio and routing it to the correct user and device.
Your response can contain plain text, Speech Synthesis Markup Language (SSML), and references to audio files to be played.
Intents are the behaviors your skill can take.
Sample Utterances are training data used to map user input to each behavior.
The name of an intent is what connects everything together.
Here we can see the intent schema and sample utterances side by side.
As you can see, the thing they have in common is the name of the intent.
Here is an example of a JSON object that would get sent to your code.
Here we can see that there is an intent component that has a name and it exactly matches what we had in our intent schema
Since we are using the alexa-sdk in our code, we define a handler for an event that matches the intent name.
The intent name connects everything together, the intent schema, the training data, the JSON object and the code.
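That connection can be sketched in plain JavaScript (the intent name, utterances, and fact text are hypothetical stand-ins following the deck’s space-facts example):

```javascript
// The intent name is the single key shared by the schema, the training
// data, and the code (all values here are illustrative).

// Intent schema, as configured at developer.amazon.com
const intentSchema = { intents: [{ intent: 'GetNewFactIntent' }] };

// Sample utterances: each line begins with the intent name it trains
const sampleUtterances = [
  'GetNewFactIntent tell me something',
  'GetNewFactIntent give me a fact'
];

// Code: a handler keyed by the same intent name
const handlers = {
  GetNewFactIntent: () => 'A year on Mercury is just 88 days long.'
};

// The name inside the incoming JSON object selects the handler
function route(requestBody) {
  return handlers[requestBody.request.intent.name]();
}

console.log(route({ request: { intent: { name: 'GetNewFactIntent' } } }));
```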
Along with custom intents, Amazon provides a series of “built-in” intents you can leverage; these intents don’t require any training data.
The three highlighted intents are required for skill publishing.
We use the term endpoint to describe your code along with where it’s hosted.
You can leverage any programming language and hosting technology to build your endpoint.
The only requirement is that you securely receive and send JSON in the correct format.
The easiest way to host your endpoint is using AWS Lambda
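A minimal Lambda-style endpoint might look like the following sketch; it hand-builds the response JSON without any SDK (the function name and speech text are illustrative):

```javascript
// Minimal endpoint sketch in the Node.js Lambda callback style.
// In a real Lambda this function would be exported as exports.handler.
function handler(event, context, callback) {
  const isLaunch = event.request.type === 'LaunchRequest';
  const speech = isLaunch
    ? 'Welcome to Space Facts. Say, tell me a fact.'
    : 'A year on Mercury is just 88 days long.';
  callback(null, {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: speech },
      // Keep the session open after launch so the user can reply
      shouldEndSession: !isLaunch
    }
  });
}
```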
This is an example of a JSON object generated by the Alexa Service based on user input.
The request body has two main components: session and request.
The session object has information about the current conversation, including which user and skill made the request.
The request object contains the payload; its type is one of LaunchRequest, IntentRequest, or SessionEndedRequest.
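A trimmed example of such a request body, written as a JavaScript object (the IDs are placeholders; real requests carry additional fields such as version and context):

```javascript
// Trimmed, hypothetical IntentRequest body; all IDs are placeholders.
const requestBody = {
  session: {
    new: false,
    sessionId: 'amzn1.echo-api.session.example',
    application: { applicationId: 'amzn1.ask.skill.example' },
    user: { userId: 'amzn1.ask.account.example' }
  },
  request: {
    type: 'IntentRequest', // one of the three request types
    requestId: 'amzn1.echo-api.request.example',
    intent: { name: 'GetNewFactIntent', slots: {} }
  }
};

console.log(requestBody.request.type);        // "IntentRequest"
console.log(requestBody.request.intent.name); // "GetNewFactIntent"
```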
This is what a JSON object generated by your endpoint should look like.
It can be broken down into the following components:
outputSpeech: This is the message users will hear as interpreted by Alexa.
card: Optional graphical component that will be rendered and stored in the Alexa mobile app and alexa.amazon.com.
reprompt: Optional message to remind the user that we are waiting for input, used if the timeout is reached.
shouldEndSession: Indicates whether the service should keep the session open and wait for user input, or end the session.
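Putting those four components together, a minimal response body might look like this sketch (the speech and card text are illustrative; an SSML response would use type "SSML" instead of "PlainText"):

```javascript
// Hand-built response body showing all four components described above.
const responseBody = {
  version: '1.0',
  response: {
    outputSpeech: { type: 'PlainText', text: 'A year on Mercury is just 88 days long.' },
    card: {
      type: 'Simple',
      title: 'Space Facts',
      content: 'A year on Mercury is just 88 days long.'
    },
    reprompt: {
      outputSpeech: { type: 'PlainText', text: 'Would you like another fact?' }
    },
    shouldEndSession: false // keep the session open and wait for input
  }
};

console.log(responseBody.response.card.type); // "Simple"
```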
There are three main types of requests; the Alexa Service will generate the appropriate type based on the user’s input.
The sentence at the top is turned into a LaunchRequest. This is analogous to opening an app or website: we are just launching third-party functionality.
This type of request is sent back so you can do any necessary cleanup and store data.
This example showcases how a single command from the user can wake up a device, launch a custom skill and trigger functionality within it.
The JSON object for this command would have the type listed as an IntentRequest and the intent name for this example would be GetNewFactIntent
The alexa-sdk gives us a series of tools that make working with JSON objects a lot easier. Although it is not required for developing skills, it makes a huge difference.
It is packaged as a Node.js module distributed through npm.
The SDK works as an event emitter and provides easy ways for us to declare and attach handlers for events.
It also allows us to quickly create responses by emitting an event that contains :tell or :ask, removing the need for us to craft JSON by hand.
Besides emitting an event that gets turned into a response, we can also use the emitter and handlers to control the code flow.
We can emit any event, and as long as we have a handler for it, we can trigger any of our code's functionality.
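That emitter pattern can be sketched in plain JavaScript (this is a simplified mock, not the real alexa-sdk module; the handler names and speech follow the deck’s space-facts example):

```javascript
// Simplified mock of the alexa-sdk event-emitter pattern (not the real
// module). ':tell' closes the session, ':ask' keeps it open, and any
// other event name is routed to the matching handler.
const responses = [];

const handlers = {
  LaunchRequest: function () {
    // Control flow: delegate to another handler by emitting its name
    this.emit('GetNewFactIntent');
  },
  GetNewFactIntent: function () {
    this.emit(':tell', 'A year on Mercury is just 88 days long.');
  },
  'AMAZON.HelpIntent': function () {
    this.emit(':ask', 'You can say, tell me a fact.', 'What can I help with?');
  }
};

const emitter = {
  emit(event, ...args) {
    if (event === ':tell') {
      responses.push({ speech: args[0], shouldEndSession: true });
    } else if (event === ':ask') {
      responses.push({ speech: args[0], reprompt: args[1], shouldEndSession: false });
    } else {
      handlers[event].call(this); // trigger another handler's functionality
    }
  }
};

emitter.emit('LaunchRequest');
console.log(responses[0].shouldEndSession); // true
```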