In a blog post Thursday, Amazon announced that the company has moved cloud processing for its Alexa personal assistant from Nvidia GPUs to Inferentia, its own Application Specific Integrated Circuit (ASIC). Amazon dev Sébastien Stormacq describes Inferentia's hardware design as follows:
AWS Inferentia is a custom chip, built by AWS, to accelerate machine learning inference workloads and optimize their cost. Each AWS Inferentia chip contains four NeuronCores. Each NeuronCore implements a high-performance systolic array matrix multiply engine, which massively speeds up typical deep learning operations such as convolution and transformers. NeuronCores are also equipped with a large on-chip cache, which helps cut down on external memory accesses, dramatically reducing latency and increasing throughput.
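To make the systolic array idea concrete, here is a toy, pure-Python model of the dataflow Stormacq describes. A real NeuronCore implements this as a hardware grid of multiply-accumulate cells that operands stream through; this sketch only mimics that schedule in software and is not based on any published NeuronCore internals.

```python
# Toy model of a systolic-array matrix multiply. Cell (i, j) of an
# n x m grid of processing elements accumulates a[i][t] * b[t][j] as
# row slices of A and column slices of B stream past on each "clock"
# step t. In hardware all cells update in parallel; here we just loop.
def systolic_matmul(a, b):
    n, k, m = len(a), len(a[0]), len(b[0])
    acc = [[0] * m for _ in range(n)]  # one accumulator per cell
    for t in range(k):                 # one slice of operands per step
        for i in range(n):
            for j in range(m):
                acc[i][j] += a[i][t] * b[t][j]
    return acc

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

The point of the hardware layout is that each operand is loaded once and reused by a whole row or column of cells, which is why the quote pairs the engine with a large on-chip cache to cut external memory accesses.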
When an Amazon customer, typically the owner of an Echo or Echo Dot, uses the Alexa personal assistant, very little of the processing takes place on the device itself. The workload for a simple Alexa request looks like this:
- A human speaks to an Amazon Echo: "Alexa, what's so special about Earl Grey tea?"
- The Echo uses its own on-board processing to detect the wake word, "Alexa"
- The Echo transmits the request to an Amazon data center
- In the Amazon data center, the voice stream is converted to phonemes (inference AI workload)
- Still in the data center, phonemes are converted to words (inference AI workload)
- Words are assembled into phrases (inference AI workload)
- Phrases are distilled into intent (inference AI workload)
- The intent is routed to an appropriate fulfillment service, which provides the response as a JSON document
- The JSON document is parsed, including the text of Alexa's answer
- The text form of Alexa’s answer is converted to natural-sounding speech (inference AI workload)
- Natural-sounding speech is transmitted back to the Echo device for audio playback: "It's bergamot orange oil."
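The steps above can be sketched as a chain of stages. Every function name and return value here is hypothetical (the actual Alexa services are proprietary); the point is only to show which stages the article describes as neural-network inference and where the conventional JSON fulfillment step sits in the flow.

```python
# Illustrative sketch of the Alexa request flow. Stages commented
# "inference" are the ones the article identifies as AI workloads;
# the fulfillment step is an ordinary service call returning JSON.
import json

def speech_to_phonemes(audio):      # inference: audio -> phonemes
    return ["ER", "L", "G", "R", "EY"]

def phonemes_to_words(phonemes):    # inference: phonemes -> words
    return ["earl", "grey", "tea"]

def words_to_phrase(words):         # inference: words -> phrase
    return " ".join(words)

def phrase_to_intent(phrase):       # inference: phrase -> intent
    return {"intent": "QuestionAnswer", "topic": phrase}

def fulfill(intent):                # conventional fulfillment service
    return json.dumps({"answer_text": "It's bergamot orange oil."})

def text_to_speech(text):           # inference: text -> audio
    return f"<audio:{text}>"        # stand-in for synthesized speech

def handle_request(audio):
    phrase = words_to_phrase(phonemes_to_words(speech_to_phonemes(audio)))
    intent = phrase_to_intent(phrase)
    answer = json.loads(fulfill(intent))["answer_text"]
    return text_to_speech(answer)

print(handle_request(b"..."))
# <audio:It's bergamot orange oil.>
```

Everything between `speech_to_phonemes` and `text_to_speech` runs in the data center, which is why the choice of inference hardware there matters so much.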
As you can see, nearly all the actual work done in fulfilling Alexa's request takes place in the cloud, not on the Echo or Echo Dot device itself. And much of that cloud work is performed not by conventional if-then logic but by inference, the answer-providing side of neural network processing.
According to Stormacq, shifting this inference workload from Nvidia GPU hardware to Amazon's own Inferentia chips cut costs by 30 percent and improved end-to-end latency on Alexa's text-to-speech workloads by 25 percent. Amazon is not the only company using the Inferentia processor; the chip powers Amazon AWS Inf1 instances, which are available to the general public and compete with Amazon's GPU-powered G4 instances.
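For a sense of scale, here is the arithmetic behind those percentages, applied to made-up baseline numbers (the article gives only the relative improvements, not absolute costs or latencies):

```python
# Hypothetical baselines; only the 30% / 25% figures come from the article.
baseline_cost = 100_000    # made-up monthly GPU cost, in dollars
baseline_latency_ms = 200  # made-up end-to-end latency, in milliseconds

inferentia_cost = baseline_cost * (1 - 0.30)          # 30% cost reduction
inferentia_latency_ms = baseline_latency_ms * (1 - 0.25)  # 25% latency cut

print(inferentia_cost, inferentia_latency_ms)
# 70000.0 150.0
```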
Amazon's AWS Neuron software development kit allows machine-learning developers to use Inferentia as a target for popular frameworks, including TensorFlow, PyTorch, and MXNet.
Listing image by Amazon