7. The Amazon Alexa Service
Supported by two powerful frameworks that leverage open APIs
Lives In the Cloud
Automated Speech Recognition (ASR)
Natural Language Understanding (NLU)
Always Improving
Alexa Skills Kit
(ASK)
Create Great Content
ASK is how you connect
to your consumer
Alexa Voice Service
(AVS)
Unparalleled Distribution
AVS allows your content
to be everywhere
8. Skills built using ASK
Tools that make it fast & easy for you to build skills
31. Alexa, open space facts
open, begin, start, launch, ask, tell
Wake Word | Starting Phrase | Skill Invocation Name
32. Alexa, ask space facts for trivia
Wake Word | Starting Phrase | Skill Invocation Name | Utterance
33. Alexa, ask space facts for trivia
tell me something
give me information
a fact
give me trivia
Wake Word | Starting Phrase | Skill Invocation Name | Utterance
53. Built-in Intents
A library of intents for
common actions.
Amazon provides training
data, but they can be
augmented.
AMAZON.CancelIntent
AMAZON.HelpIntent
AMAZON.StopIntent
AMAZON.NextIntent
AMAZON.NoIntent
AMAZON.RepeatIntent
AMAZON.StartOverIntent
AMAZON.ShuffleOnIntent
AMAZON.YesIntent
REQUIRED FOR CERTIFICATION
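In code, built-in intents are wired up just like custom intents. A minimal sketch (the response strings are made up for illustration):

```javascript
// Hypothetical handler map for some of the built-in intents listed above;
// the response strings are illustrative, not from the deck.
const builtInHandlers = {
  'AMAZON.HelpIntent': () => 'You can say, tell me a space fact.',
  'AMAZON.StopIntent': () => 'Goodbye!',
  'AMAZON.CancelIntent': () => 'Goodbye!'
};

console.log(builtInHandlers['AMAZON.HelpIntent']()); // "You can say, tell me a space fact."
```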
54. Communicating with the endpoint
Your endpoint needs to receive and react to a JSON object
55. The Endpoint
Must be Internet-accessible
Adhere to ASK service interface
- JSON
Web service or AWS Lambda
Uses HTTP over SSL/TLS
- port 443
56. Communicating with the Endpoint
Request body:
• session: Information about the
current conversation
• request: Describes the user input
57. Communicating with the Endpoint
Response body:
• outputSpeech: Alexa’s response
• card: (optional) graphical response
• reprompt: (optional) reminder
• shouldEndSession: used to end or
keep session open
66. Remember to Check-In
• Ask the instructor for the link
• You’ll get a confirmation email with details to
earn free perks from Amazon
Editor's Notes
Hello, and welcome to Alexa Workshop.
Alexa Family
Echo
The Echo is the first and best-known endpoint for Alexa.
Amazon launched the Amazon Echo in 2014 to let customers engage with Alexa and control their homes by voice. Echo is really a hands-free speaker with far-field voice recognition, which means you can just talk to it from across the room. Alexa and the Echo device were built to make life easier and more enjoyable.
Echo Dot:
A hands-free, voice-controlled device that uses the same far-field voice recognition as Amazon Echo. Dot has a small built-in speaker; it can also connect to your speakers over Bluetooth or with the included audio cable.
The Echo and the Echo Dot are what we call far-field Alexa devices. You interact with them completely hands-free from anywhere in the room, even if that room is noisy.
The difference between Echo and Echo Dot is simple: Echo has a powerful built-in speaker that provides room-filling sound.
Echo Dot is smaller, contains a less powerful speaker, and works great when connected to another audio system. Both include the same seven-microphone array with advanced beam-forming and noise-cancelling technology and are otherwise functionally identical.
Amazon Tap: Alexa is also available on other Amazon devices, including Tap, our portable, battery-powered speaker.
Other
Alexa is available on Amazon’s Fire Tablets, and Amazon shopping apps on mobile.
Alexa is also available on Fire TV via the push-to talk remote control that comes with it.
What is Alexa
It is a cloud-based service that handles all the speech recognition, machine learning, and natural language understanding for all Alexa-enabled devices.
Because it lives in the cloud, Alexa is always getting smarter: it is constantly improving and learning. The more you use it, the more it adapts to your speech patterns, vocabulary, and personal preferences.
And because Alexa takes all her intelligence from the cloud, new updates and features are delivered automatically.
We’re now interacting with technology in the most natural way possible – by talking.
http://alexa.design/video
There are so many possibilities when it comes to Alexa, and we are really excited about it. With Alexa, we are building a cloud-based voice service that’s free to all developers, companies, and hobbyists. Best of all, you don’t need a background in NLU or speech recognition to build great voice experiences for your customers.
Alexa is supported by two sets of APIs & SDKs -
Alexa Skills Kit (ASK) is an SDK that allows you to build custom skills that customers can voice enable on all Amazon Alexa products. Many of our customers who build their own smart home products with Alexa also create complementary skills that can be accessed in the Skills Storefront.
Alexa Voice Service (AVS) is a set of APIs and developer tools that you can use to build Alexa into your product, whether you’re in the automotive, smart home, or home audio industry.
--
On one side we have ASK (Alexa Skills Kit), an API that allows you as a developer to add more capabilities to Alexa. When we released Alexa, she didn’t have the capability to order an Uber or a pizza from Domino’s. What we did was open up the API so these companies could build skills that create rich voice experiences for their customers. We now have over 12,000 published skills, and we expect to see a lot more in the future.
All You Have to Do Is ASK (What is the Alexa Skills Kit?)
ASK is our SDK; in other words, it is our way of making the voice experience via Alexa possible.
ASK gives you the ability to create new voice-driven capabilities (also known as skills; think apps) for Alexa.
You can connect existing services to Alexa in minutes with just a few lines of code.
You can also build entirely new voice-powered experiences in a matter of hours, even if you know nothing about speech recognition or natural language processing.
On the other side is AVS (Alexa Voice Service), a set of APIs that allow you to integrate Alexa into your devices and apps. Think cars, microwaves, refrigerators, speakers, and the like. As long as your device has a microphone, a speaker, and an internet connection, you can integrate Alexa. In fact, we recently released a Raspberry Pi version of Alexa using the AVS APIs.
AVS: Serving a Platform Agnostic Voice Experience
It’s through the Alexa Voice Service that device makers and hardware manufacturers can incorporate an Alexa-driven voice experience into their devices.
Any device that has a speaker, a microphone, and an Internet connection can integrate Alexa.
Just imagine what that means. You can picture everything from a car to a microwave to a pen, and more...all enabled to deliver an experience by voice
Both ASK and AVS are completely free to use. Here’s a rule of thumb to understanding what feature set makes sense for your use case:
You can add your product to Alexa through the Alexa Skills Kit (ASK)
Or, you can add Alexa to your product with the Alexa Voice Service (AVS),
Let’s switch gears now and talk a bit about skills, which are really the capabilities that Alexa has.
What is a Skill?
Skills are how you, as a developer, make Alexa smarter. They give customers new experiences. They’re the voice-first apps for Alexa. When we launched Echo, Alexa could do the basics: weather, music, reading the news. Now you can order a Lyft, order from Domino’s, and more.
There are two kinds of skills
built in skills (like playing music, weather forecast, general knowledge questions) and
custom skills that you as developers can build.
Building skills using Alexa Skills Kit (ASK)
The way you build skills is by using the Alexa Skills Kit.
The Alexa Skills Kit is a collection of self-service APIs, tools, documentation and code samples that make it fast and easy for you to add skills to Alexa. Thousands of developers are building skills to expand Alexa’s capabilities.
We launched the Alexa Skills Kit so anyone can develop Skills for Alexa, at no cost.
Very similar to Apps on your phone, except that nothing gets installed on the device.
What can you do with ASK
You can connect existing services to Alexa
You can also build entirely new voice-powered experiences in a matter of hours, even if you know nothing about speech recognition or natural language processing.
When Alexa launched in the US, it had dozens of capabilities, or skills; it now has thousands.
You can now say “Alexa, ask Lyft for a Lyft Line to work”
Or
Alexa, ask Capital One what did I spend?
Or
Alexa, tell Starbucks to start my order
The free Amazon Alexa App is a companion to your Alexa device for setup, remote control, and enhanced features. Alexa is always ready to play your favorite music, provide weather and news updates, answer questions, create lists, and much more.
You can also visit amazon.com/skill to view the complete catalog of skills.
Let’s see the Fact Skill in action before we start building it. Talk to Alexa – quick demo of the fact skill.
“Alexa, open space facts”
You can also say –
Alexa, tell me a fact, or
Alexa, give me a space fact
Wake Word - A command that the user says to tell Alexa that they want to talk to her. Example: “Alexa, ask History Buff what happened on December seventh.” Here, “Alexa” is the wake word. Alexa users can select from a defined set of wake words.
Starting Phrase – open, ask, begin, play, start, talk, tell etc.
Invocation Name: A name that represents the custom skill the user wants to use. Users say a skill’s invocation name to begin an interaction with a particular custom skill.
For example, if the invocation name is “Daily Horoscopes”, users can say: “Alexa, ask Daily Horoscopes for the horoscope for Gemini.”
You must say the name of the skill as part of the user utterance. That’s the way Alexa can map it to the appropriate skill. It’s like launching a mobile app. You have to open the app to use the specific functionality.
Much like web and mobile apps, there are two pieces to building an Alexa skill.
Alexa skills have two parts:
Configuration data in the Amazon Developer Portal (frontend): done at developer.amazon.com
Hosted service responding to user requests (backend): we’ll be using AWS Lambda as our backend, so we’ll do this at aws.amazon.com
Create VUI Interaction Model (Front End)
Skill Info + Interaction Model
Create AWS Lambda Function: Your code or your hosted service backend
Connect VUI to the Lambda Function
Testing
Customization: Make it your own
Publish it
GitHub templates
A typical Alexa project on GitHub has the following structure:
/SpeechAssets
Provides the VUI or the Front End for the skill
Meant to go inside your skill at developer.amazon.com
/src
Provides the code for the skill
Meant to go into your Lambda Function at aws.amazon.com
About this Skill
This sample covers the basics of skill building. It delivers random facts or quotes and serves as a very simple example.
You can also customize your fact skill with your favorite topic.
Concepts you will learn with this skill
Intents and Intent Schema
Sample Utterance
Generating a randomized response from Alexa
We’ll be switching between these as we build our skill
Visit echosim.io, and login using Amazon
As a developer you are never asked to work with audio or raw text coming from the user.
You receive a JSON object that was generated by the Alexa Service. Here is how it works.
This is a bird’s-eye view of a user interacting with a custom skill through an Echo.
We will go into further detail later in this presentation, but it’s important to remember that Alexa and all skill code live in the cloud.
In order to understand what a user says, we first have to turn sounds into words.
This process is called speech recognition.
In this example we have the phonetic spelling for three sounds.
Let’s see what words these could form.
Forty times? Maybe the user wants to multiply something by forty
For tea times? Is the user searching for good times to have some tea?
For Tee Times?
Does the user want to play golf?
Or does the user want to play a lot of golf?
Having a Natural Language Understanding Engine on top of speech recognition allows us to go from words to meanings.
We can also train this engine using utterances and slots to map user input with high accuracy.
The way we train the NLU engine is by using sample utterances that we associate with an intent.
Each intent defines a specific behavior your skill can take. Like buttons on a web page, intents take user input and execute some code based on it.
Let’s take a step-by-step look at how user input, in the form of spoken word (audio), is turned into a JSON object that our code can read and respond to.
- The first thing that has to happen is for the device to “wake up” when it hears the correct word.
- Once the device is awake it’ll stream all the audio to the Alexa Service hosted by AWS in the cloud.
- Alexa devices like the Amazon Echo or the Echo Dot feature microphone arrays; these allow us to capture high-quality audio by using beam forming and canceling background noise.
Once the audio reaches the Alexa Service, it is converted into a JSON object, based on the meaning of the words the user spoke.
This JSON object is easy to read from any programing language and contains enough information to allow us to respond to the user’s input.
Your code just has to return a (properly formatted) JSON object to the Alexa Service and the service will take care of turning it into audio and routing it to the correct user and device.
Your response can contain plain text, Speech Synthesis Markup Language (SSML), and references to audio files to be played.
Intents are the behaviors your skill can take.
Sample Utterances are training data used to map user input to each behavior.
The name of an intent is what connects everything together.
Here we can see the intent schema and sample utterances side by side.
As you can see, the thing they have in common is the name of the intent.
Here is an example of a JSON object that would get sent to your code.
Here we can see that there is an intent component that has a name and it exactly matches what we had in our intent schema
Since we are using the alexa-sdk in our code, we define a handler for an event that matches the intent name.
The intent name connects everything together, the intent schema, the training data, the JSON object and the code.
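That connection can be sketched in plain JavaScript (the intent name, utterances, and fact text are hypothetical stand-ins following the deck’s space-facts example):

```javascript
// The intent name is the single key shared by the schema, the training
// data, and the code (all values here are illustrative).

// Intent schema, as configured at developer.amazon.com
const intentSchema = { intents: [{ intent: 'GetNewFactIntent' }] };

// Sample utterances: each line begins with the intent name it trains
const sampleUtterances = [
  'GetNewFactIntent tell me something',
  'GetNewFactIntent give me a fact'
];

// Code: a handler keyed by the same intent name
const handlers = {
  GetNewFactIntent: () => 'A year on Mercury is just 88 days long.'
};

// The name inside the incoming JSON object selects the handler
function route(requestBody) {
  return handlers[requestBody.request.intent.name]();
}

console.log(route({ request: { intent: { name: 'GetNewFactIntent' } } }));
```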
Along with custom intents, Amazon provides a series of “built-in” intents you can leverage; these intents don’t require any training data.
The three highlighted intents are required for skill publishing.
We use the term endpoint to describe your code along with where it’s hosted.
You can leverage any programming language and hosting technology to build your endpoint.
The only requirement is that you securely receive and send JSON in the correct format.
The easiest way to host your endpoint is using AWS Lambda
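A minimal Lambda-style endpoint might look like the following sketch; it hand-builds the response JSON without any SDK (the function name and speech text are illustrative):

```javascript
// Minimal endpoint sketch in the Node.js Lambda callback style.
// In a real Lambda this function would be exported as exports.handler.
function handler(event, context, callback) {
  const isLaunch = event.request.type === 'LaunchRequest';
  const speech = isLaunch
    ? 'Welcome to Space Facts. Say, tell me a fact.'
    : 'A year on Mercury is just 88 days long.';
  callback(null, {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: speech },
      // Keep the session open after launch so the user can reply
      shouldEndSession: !isLaunch
    }
  });
}
```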
This is an example of a JSON object generated by the Alexa Service based on user input.
The request body has two main components: session and request.
The session object has information about the current conversation, including which user and skill made the request.
The request object contains the payload; its type is one of LaunchRequest, IntentRequest, or SessionEndedRequest.
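A trimmed example of such a request body, written as a JavaScript object (the IDs are placeholders; real requests carry additional fields such as version and context):

```javascript
// Trimmed, hypothetical IntentRequest body; all IDs are placeholders.
const requestBody = {
  session: {
    new: false,
    sessionId: 'amzn1.echo-api.session.example',
    application: { applicationId: 'amzn1.ask.skill.example' },
    user: { userId: 'amzn1.ask.account.example' }
  },
  request: {
    type: 'IntentRequest', // one of the three request types
    requestId: 'amzn1.echo-api.request.example',
    intent: { name: 'GetNewFactIntent', slots: {} }
  }
};

console.log(requestBody.request.type);        // "IntentRequest"
console.log(requestBody.request.intent.name); // "GetNewFactIntent"
```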
This is what a JSON object generated by your endpoint should look like.
It can be broken down into the following components:
outputSpeech: This is the message users will hear as interpreted by Alexa.
card: Optional graphical component that will be rendered and stored in the Alexa mobile app and alexa.amazon.com.
reprompt: Optional message to remind the user that we are waiting for input, used if the timeout is reached.
shouldEndSession: Indicates whether the service should keep the session open and wait for user input, or end the session.
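Putting those four components together, a minimal response body might look like this sketch (the speech and card text are illustrative; an SSML response would use type "SSML" instead of "PlainText"):

```javascript
// Hand-built response body showing all four components described above.
const responseBody = {
  version: '1.0',
  response: {
    outputSpeech: { type: 'PlainText', text: 'A year on Mercury is just 88 days long.' },
    card: {
      type: 'Simple',
      title: 'Space Facts',
      content: 'A year on Mercury is just 88 days long.'
    },
    reprompt: {
      outputSpeech: { type: 'PlainText', text: 'Would you like another fact?' }
    },
    shouldEndSession: false // keep the session open and wait for input
  }
};

console.log(responseBody.response.card.type); // "Simple"
```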
There are three main types of requests; the Alexa Service will generate the appropriate type based on the user’s input.
The sentence at the top is turned into a LaunchRequest. This is analogous to opening an app or website: we are just launching third-party functionality.
This type of request is sent back so you can do any necessary cleanup and store data.
This example showcases how a single command from the user can wake up a device, launch a custom skill and trigger functionality within it.
The JSON object for this command would have the type listed as an IntentRequest and the intent name for this example would be GetNewFactIntent
The alexa-sdk gives us a series of tools that make working with JSON objects a lot easier. Although it is not required for developing skills, it makes a huge difference.
It is packaged as a Node.js module distributed through npm.
The SDK works as an event emitter and provides easy ways for us to declare and attach handlers for events.
It also allows us to quickly create responses by emitting an event that contains :tell or :ask, removing the need for us to craft JSON by hand.
Besides emitting an event that gets turned into a response, we can also use the emitter and handlers to control the code flow.
We can emit any event, and as long as we have a handler for it, we can trigger any of our code's functionality.
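That emitter pattern can be sketched in plain JavaScript (this is a simplified mock, not the real alexa-sdk module; the handler names and speech follow the deck’s space-facts example):

```javascript
// Simplified mock of the alexa-sdk event-emitter pattern (not the real
// module). ':tell' closes the session, ':ask' keeps it open, and any
// other event name is routed to the matching handler.
const responses = [];

const handlers = {
  LaunchRequest: function () {
    // Control flow: delegate to another handler by emitting its name
    this.emit('GetNewFactIntent');
  },
  GetNewFactIntent: function () {
    this.emit(':tell', 'A year on Mercury is just 88 days long.');
  },
  'AMAZON.HelpIntent': function () {
    this.emit(':ask', 'You can say, tell me a fact.', 'What can I help with?');
  }
};

const emitter = {
  emit(event, ...args) {
    if (event === ':tell') {
      responses.push({ speech: args[0], shouldEndSession: true });
    } else if (event === ':ask') {
      responses.push({ speech: args[0], reprompt: args[1], shouldEndSession: false });
    } else {
      handlers[event].call(this); // trigger another handler's functionality
    }
  }
};

emitter.emit('LaunchRequest');
console.log(responses[0].shouldEndSession); // true
```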