Buzzwords like Artificial Intelligence, Machine Learning and Deep Learning have gained a lot of attention in recent years. This is in large part thanks to the large, internet-based corporations that use them for tasks like image and speech recognition, natural language processing and more, and most of us use such systems on a daily basis.
When thinking about Machine Learning, the first thing that comes to mind is the cloud, hosted in enormous data centers with thousands of servers. For many applications this is the standard and has been for many years. However, with the recent increase in hardware availability and performance, driven by the advent of the Internet of Things, and the decrease in cost, a vast range of use cases is moving from the cloud directly to the edge. In this paradigm shift, node devices become more autonomous as the intelligence moves away from the cloud and closer to the field, where the events actually take place. This has already enabled very interesting applications, such as autonomous drones, ADAS systems in automotive and smart mobile robots, and this is certainly just the beginning. In the following article, we provide an overview of what a system designer must consider when working on an artificial intelligence at the edge solution. The typical flow comes down to understanding the task to be solved, choosing the algorithm, training a model and deploying it for inference.
The Goal: defining the problem
When working on any solution, probably the most important, and surprisingly often neglected, step is to identify the problem that we are trying to solve. Truly understanding the issue is crucial to choosing the right angle and technology to tackle it. Is what you are trying to address a complex classification task that requires a deep neural network with many hidden layers?
Machine learning is not limited to deep learning and neural networks. There are plenty of so-called "classical" machine learning algorithms, for example k-means, support vector machines or statistical models, that are often less resource-intensive and may in fact be the optimal solution. It is therefore important to experiment and to be able to fail quickly in order to move forward with a more appropriate approach. That said, deep learning is what has recently been the main driving force in popularizing artificial intelligence.
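As a simple point of reference, the sketch below fits a classical support vector machine with scikit-learn on a synthetic dataset. The library, dataset shape and features are illustrative assumptions, not a prescription for any particular edge application; the point is how little code and compute a classical approach can require.

```python
# A minimal sketch: a "classical" SVM classifier on synthetic data, no GPU required.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Generate a small synthetic dataset (e.g. features extracted from sensor readings).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM with an RBF kernel and check its accuracy on held-out data.
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```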
Architecture: Choosing the correct tools
The application requirements and constraints are what drive the specification of the final product that incorporates an Artificial Intelligence-related algorithm. These relate to robustness, inference time, hardware resources and quality of service. This is especially true when considering edge deployment and choosing an appropriate embedded platform. Robustness is the accuracy of the model's output and its ability to generalize, i.e. the likelihood of computing a correct output while avoiding overfitting. Typically, the more complex the model (deeper, with more layers) and the richer the dataset, the more robust the model tends to be.
Defining a desired inference time is entirely dependent on the application. In some cases, for example in automotive, it is crucial for safety reasons to get a response from a machine vision system in under a millisecond. This is not the case for a sensor fusion system with slow-changing measurements, where one could infer only every minute or so. Inference speed depends on the model complexity: more layers mean more computations, which results in longer inference time. This can be offset by selecting more powerful compute resources, e.g. embedded GPUs, DSPs or neural accelerators with OpenCL kernels, to fully utilize the available hardware.
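Whatever the target platform, the latency budget should be verified empirically on the device. The sketch below is a minimal, framework-agnostic timing loop; the placeholder `predict` function and the input shape are assumptions standing in for the real inference engine call (e.g. a TensorRT engine or a CMSIS-NN invocation).

```python
# A minimal sketch for measuring inference latency with warm-up and percentiles.
import time
import numpy as np

# Placeholder "model": a single matrix multiply standing in for a real forward pass.
rng = np.random.default_rng(0)
weights = rng.standard_normal((224 * 224 * 3, 10)).astype(np.float32)

def predict(x):
    return x @ weights

x = rng.standard_normal((1, 224 * 224 * 3)).astype(np.float32)  # e.g. a flattened camera frame

predict(x)  # warm-up run (caches, lazy initialization)
latencies_ms = []
for _ in range(100):
    start = time.perf_counter()
    predict(x)
    latencies_ms.append((time.perf_counter() - start) * 1000.0)

print(f"mean {np.mean(latencies_ms):.2f} ms, p99 {np.percentile(latencies_ms, 99):.2f} ms")
```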
In addition, the model's memory footprint grows with the number of neurons and weights. Each weight is a number that must be stored in memory. To reduce the size of the model, and often to match hardware specifics, one can convert the weights from floats or doubles to integers.
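As a rough illustration, the sketch below quantizes one layer's float32 weights to int8 with a simple symmetric scale, cutting storage by a factor of four. Production toolchains (TensorRT, TensorFlow Lite and similar) apply more sophisticated per-layer or per-channel schemes, so treat this as a conceptual example only.

```python
# A minimal sketch of post-training weight quantization from float32 to int8.
import numpy as np

weights_fp32 = np.random.randn(1000).astype(np.float32)  # weights of one layer

# Symmetric quantization: map the float range onto the int8 range [-127, 127].
scale = np.max(np.abs(weights_fp32)) / 127.0
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)

# Dequantize when (or if) floating point values are needed again.
weights_restored = weights_int8.astype(np.float32) * scale

print("size in bytes:", weights_fp32.nbytes, "->", weights_int8.nbytes)  # 4x smaller
print("max absolute error:", np.max(np.abs(weights_fp32 - weights_restored)))
```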
Quality of service and reliability of the system depend on the deployment model. In a cloud-based approach, the need for a connection can make the system unreliable. What happens if the server is unreachable? A decision must still be made. In such cases, the edge may be the only viable option, e.g. in autonomous cars or isolated environments. It is also essential to understand that Machine Learning-based algorithms are inherently probabilistic: the output is a likelihood that carries a certain dose of uncertainty. However, for many use cases, the accuracy or reliability of predictions made by AI systems already exceeds that of humans. Whether the system designer should consider a 90% or a 99% probability to be high enough depends on the application and its requirements.
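One common way to handle this uncertainty is to act on a prediction only when its confidence clears a threshold. The sketch below is a minimal example; the logits, the number of classes and the 0.99 threshold are illustrative assumptions, and the right threshold and fallback behavior depend entirely on the application's requirements.

```python
# A minimal sketch of confidence thresholding on a probabilistic model output.
import numpy as np

def softmax(logits):
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

logits = np.array([2.1, 0.3, 4.7])          # raw outputs of a hypothetical 3-class model
probabilities = softmax(logits)
best = int(np.argmax(probabilities))

CONFIDENCE_THRESHOLD = 0.99                  # application-specific requirement
if probabilities[best] >= CONFIDENCE_THRESHOLD:
    print(f"act on class {best} (p={probabilities[best]:.3f})")
else:
    print("confidence too low: defer, fall back or request another sample")
```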
Finally, when considering appropriate hardware and software, a designer should realize that the difficulty of development and the scalability of certain solutions may differ.
AI is not new to Arrow Electronics, but we believe that now is the time to drive this technology bottom-up, meaning that we need to address all available options and fit them to the customer's demands and requirements.
In September 2018, Arrow Electronics and NVIDIA signed a global agreement to bring the NVIDIA® Jetson™ Xavier™, a first-of-its-kind computer designed for AI, robotics and edge computing, to companies worldwide to create next-generation autonomous machines.
Jetson Xavier — available as a developer kit that customers can use to prototype designs — is supported by comprehensive software for building AI applications.
This includes the NVIDIA JetPack™ and DeepStream SDKs, as well as CUDA®, cuDNN and TensorRT™ software libraries. At its heart is the new NVIDIA Xavier processor, which provides more computing capability than a powerful workstation and comes in three energy-efficient operating modes.
Key specifications of the Jetson AGX Xavier deployment module: a 512-core Volta GPU with Tensor Cores; an 8-core ARM v8.2 64-bit CPU with 8MB L2 and 4MB L3 cache; 16GB of 256-bit LPDDR4x memory at 137GB/s; 32GB of eMMC 5.1 storage; two NVDLA deep learning accelerator engines; a 7-way VLIW vision processor; 2x 4Kp60 HEVC encode and 2x 4Kp60 decode with 12-bit support; all in a 105 mm x 105 mm form factor.
Data & Training: get the right answer
Data is the true currency of Artificial Intelligence. By collecting, processing and analyzing data, companies can gain important and meaningful insights into business processes and human behavior, or recognize patterns. No wonder many internet-based companies like Google or Amazon invest so heavily in storing and processing the data they have access to. In deep learning, datasets are used to train neural networks. In general, the larger the dataset, the better the accuracy and the more robust the model. To make it even less susceptible to environmental factors (sunlight, dirt on lenses, noise, vibration, etc.), the data is typically augmented, for instance by rotating or cropping images or adding artificial noise.
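The sketch below shows what such augmentation can look like for a single image represented as a NumPy array. The specific transforms and parameters are illustrative assumptions; frameworks such as torchvision or Keras provide equivalent, more complete pipelines.

```python
# A minimal sketch of dataset augmentation: flips, rotations and artificial noise.
import numpy as np

def augment(image, rng):
    """Return a randomly perturbed copy of an HxWxC image with values in [0, 1]."""
    out = image.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)                       # horizontal flip
    out = np.rot90(out, k=rng.integers(0, 4))      # random 90-degree rotation
    out = out + rng.normal(0.0, 0.02, out.shape)   # mild sensor-like noise
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))                        # stand-in for a camera frame
augmented = [augment(image, rng) for _ in range(8)]    # 8 extra training samples from one image
```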
There are different approaches to training a model; briefly, these are supervised, unsupervised and reinforcement learning. In the first, the dataset is labeled and, for image classification, consists of pairs of images and labels. The image is forward propagated through the model's layers, each layer adding a bit more abstraction, to finally produce the classification value. The output is compared to the label, and the error is then backpropagated from the end to the start to update the weights. In unsupervised learning, the dataset is unlabeled and the model finds patterns on its own. Reinforcement learning is best explained with the example of a video game: the goal is to maximize the score by taking a sequence of actions and responding to feedback from the environment, for instance performing a series of consecutive control decisions to move from one place to another.
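The supervised loop described above can be condensed into a few lines. The sketch below uses PyTorch with a tiny fully connected model and random data as placeholders; a real image classifier would use a convolutional network and a labeled dataset, but the forward pass, loss and backpropagation steps are the same.

```python
# A minimal sketch of supervised training: forward pass, loss against the labels,
# backpropagation and a weight update.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(32, 784)           # a batch of flattened 28x28 "images"
labels = torch.randint(0, 10, (32,))    # their class labels

for epoch in range(5):
    logits = model(images)              # forward propagation through the layers
    loss = criterion(logits, labels)    # compare the output to the labels
    optimizer.zero_grad()
    loss.backward()                     # backpropagate the error
    optimizer.step()                    # update the weights
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```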
Deployment and Inference: the unsolved challenge
Most of the training of deep neural networks typically takes place on large GPUs. When it comes to inference, i.e. forward propagation through the neural network to obtain a prediction or classification for a single sample, there are various platforms that can be used. Depending on the requirements, it is possible to deploy and run models on devices like Cortex-M, Cortex-A with GPUs or neural accelerators, FPGAs or specialized ASICs. These obviously vary in processing power, energy consumption and cost. The tricky part is how to deploy a model efficiently and easily. Models are typically trained using deep learning frameworks like TensorFlow or Caffe. They must then be converted to a format that can be run by the inference engine on the edge device, for example using the Open Neural Network Exchange format (ONNX) or a plain file with weights for ARM CMSIS-NN on Cortex-M. To optimize further, the weights may be pruned (removing values close to zero), quantized (moving from float32 to integers) or compressed.
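For the ONNX route, the conversion step can be as small as the sketch below, which exports a PyTorch model to an interchange file an edge inference engine can consume. The model, the input shape and the file name are assumptions for illustration only.

```python
# A minimal sketch of exporting a trained PyTorch model to the ONNX format.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

dummy_input = torch.randn(1, 784)  # one sample with the expected input shape
torch.onnx.export(
    model,
    dummy_input,
    "classifier.onnx",             # hypothetical output file
    input_names=["input"],
    output_names=["logits"],
)
```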
Finally, the heavy lifting on the device is done by an inference engine. It is mainly up to vendors to provide support for the target processors and for components such as OpenCL or OpenCV. Unfortunately, the market is currently very fragmented: we see various proprietary SDKs and tools, and no single standard for how to deploy and infer on the edge. What is promising is that, with formats like ONNX, there is increasing interest in the industry in standardization.
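To close the loop, the sketch below runs inference with ONNX Runtime as the engine, assuming the "classifier.onnx" file from the export example above. Vendor-specific SDKs (TensorRT, OpenVINO and others) follow a broadly similar load-then-run pattern, but the details differ per platform.

```python
# A minimal sketch of edge inference with ONNX Runtime on the CPU.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("classifier.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

sample = np.random.randn(1, 784).astype(np.float32)   # one preprocessed input sample
logits = session.run(None, {input_name: sample})[0]   # forward propagation only
print("predicted class:", int(np.argmax(logits)))
```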
Conclusion: the Edge is getting smarter
Artificial Intelligence has been one of the biggest technology trends of recent years. For edge devices, the key obstacles to adoption are a lack of understanding and the difficulty of deploying and running models. As suppliers compete to attract customers and establish their solution as the go-to standard, Arrow is uniquely positioned to understand the different approaches from our partners and to recognize where each platform may be most useful for our customers. We are using our expertise in Artificial Intelligence to aid customers and demystify edge computing.