Dec 14, 2020
There's been a lot of hype over the past few years about both drones and Artificial Intelligence or AI. In this article we're going to look at what exactly AI means for drones. We'll talk about what's happening now and take a peek at what may happen in the near future.
Innovations help the human race by automating manual tasks that were often laborious and inefficient. From the printing press to the electric light to the Lotus 1-2-3 spreadsheet, when new technologies are introduced, we see a massive jump in productivity. Drones are the equivalent of the 80's IBM PC and Apple Mac II. And mobile applications for drones are today's Lotus 1-2-3. When drones are combined with new AI applications, we’re going to see a similar jump in efficiency with many more manual tasks being automated.
These automated drones will be used in many different industries, including Agriculture, Construction, Shipping, Railways, Warehouses, Healthcare and Energy, to name just a few. These industries and more are going to see dramatic changes in their day-to-day operations over the coming decade. Drones together with AI are good at counting, measuring, identifying things and reading text, whether or not what you're looking at is moving. If you want an automated security guard, want to count crops or cattle, identify defects in a solar panel, read barcodes off boxes in a warehouse or read the text on the side of a shipping container, you can already do that and more from the comfort of your office.
The different types of AI drones.
What exactly do we mean when we say AI in this domain? Are we talking about obstacle avoidance, or follow-me functionality or are we talking about computer vision? In most cases, it’s a combination of Machine Learning (ML) together with Computer Vision (CV). ML allows us to use image classification and object detection to identify someone or something. And CV allows us to split the video into images and then apply our ML models against the video that the drone camera is capturing and display the results in real time.
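As a rough illustration of that pipeline, the sketch below treats a video as a sequence of frames and runs a detector on each one in turn. Everything in it is hypothetical: `fake_detector` merely stands in for a trained ML model, and the frames are tiny grids of brightness values rather than real camera images.

```python
from typing import Callable, Iterable, List, Tuple

# A frame is a 2D grid of pixel brightness values (0-255).
Frame = List[List[int]]
# A detection is (label, confidence score between 0 and 1).
Detection = Tuple[str, float]


def fake_detector(frame: Frame) -> List[Detection]:
    """Hypothetical stand-in for a trained model.

    It reports a 'cow' whenever the frame contains a very bright pixel.
    A real model would be far more sophisticated.
    """
    brightest = max(max(row) for row in frame)
    if brightest > 200:
        return [("cow", brightest / 255)]
    return []


def process_video(
    frames: Iterable[Frame],
    detect: Callable[[Frame], List[Detection]],
) -> List[List[Detection]]:
    """Apply the detector to every frame, in order."""
    return [detect(frame) for frame in frames]


video = [
    [[10, 20], [30, 40]],   # dark frame: nothing detected
    [[10, 255], [30, 40]],  # one very bright pixel: one detection
]

results = process_video(video, fake_detector)
for index, detections in enumerate(results):
    print(f"frame {index}: {detections}")
```

In a real app the frames would come from the drone's video stream and the detector would be the trained model, but the loop has the same shape.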
Traditional programming is analogous to a recipe. You create a list of things to do in a specific order and assuming you make no mistakes out pops a cake every time you run the program. Of course, there are different styles and languages in traditional programming, but they're just variations on a theme. The order and how you group the ingredients and instructions may change but it's still just a recipe. Machine Learning is a completely different type of programming. Instead of a procedural approach we teach or train the computer to pattern match. Thousands of pictures are used to tell the computer to recognize a cow. So later on, when the drone camera flies over a cow, the computer will say hold on, I've seen that before, that's a cow. It's also not exactly the same each time. The same cake does not pop out every time you run the program. It will be slightly different, a little bit smaller or a little bit bigger.
Putting theory into practice.
Machine Learning uses neural networks which are multiple layers of nodes. We tell the model what the inputs are and what the outputs are. Over a period of time, the network or model learns what outputs to expect depending on the inputs. Or to put it another way, it learns to predict the outputs based on what inputs it’s given. We present a trained model with an image, and it will output what it thinks it sees based on the previous training.
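To make the idea of learning outputs from inputs concrete, here is a deliberately tiny sketch. It is not a neural network: it has a single adjustable weight and made-up training data that follows the rule "output is twice the input". The names and numbers are illustrative assumptions, but the loop of predict, measure the error, and adjust is the same basic idea that training a real network relies on.

```python
# Training data: inputs and the outputs we want the model to learn.
# The hidden rule here is simply output = 2 * input.
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]

weight = 0.0          # the model's single adjustable parameter
learning_rate = 0.01  # how big a correction to make at each step

for epoch in range(200):
    for x, y in zip(inputs, targets):
        prediction = weight * x
        error = prediction - y
        # Nudge the weight in the direction that reduces the squared error.
        weight -= learning_rate * 2 * error * x

print(f"learned weight: {weight:.3f}")
print(f"prediction for an unseen input of 5.0: {weight * 5.0:.2f}")
```

We never tell the program that the rule is "multiply by two". The weight should settle very close to 2 purely from the examples.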
To show you what the TensorFlow code looks like, the following is a simple TensorFlow model from a Google Codelab that takes the MNIST database of handwritten digits as its input and will then be able to predict which digit, from 0 to 9, it is later presented with.
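import tensorflow as tf

# Small classifier for 28x28 grayscale images with 10 output classes.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),  # 28x28 image -> 784 values
    tf.keras.layers.Dense(128, activation='relu'),  # 128-node hidden layer
    tf.keras.layers.Dropout(0.2),                   # drop 20% of values while training
    tf.keras.layers.Dense(10)                       # one raw score per class
])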
We define the sequential model as taking an input image of 28x28 pixels. It uses a multi-layer neural network, with a 128-node first layer and a 10-node output layer, one node per output class. The rectified linear activation function, or relu, outputs its input directly if it is positive and outputs zero otherwise. The dropout layer randomly zeroes 20% of the values passing through it during training, which helps keep the model from overfitting, that is, memorizing the training images instead of learning patterns that generalize. I show this not to complicate things, but to illustrate that programming in TensorFlow is all about making sure the inputs are the correct shape, choosing the model and layers, and then setting parameters to make the model as efficient as possible. We are not building the neural network machinery from scratch.
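For readers who want to see what relu and dropout actually do to numbers, here is a small plain-Python sketch. It is a simplified illustration and not TensorFlow's implementation; the function names and sample values are my own. The dropout here follows the common "inverted dropout" approach, where the surviving values are scaled up so their expected total stays the same.

```python
import random


def relu(values):
    """Pass positive values through unchanged; replace negatives with zero."""
    return [max(0.0, v) for v in values]


def dropout(values, rate, rng):
    """Training-time dropout.

    Each value is zeroed with probability `rate`. Survivors are scaled by
    1 / (1 - rate) so the expected sum of the outputs is unchanged.
    """
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


activations = relu([-2.0, -0.5, 0.0, 1.5, 3.0])
print("after relu:   ", activations)

rng = random.Random(0)  # fixed seed so the run is repeatable
print("after dropout:", dropout(activations, rate=0.2, rng=rng))
```

Dropout is only applied while training. When the trained model is used to make predictions, all values pass through unchanged.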
There are four stages in Machine Learning: Labeling, Training, Testing and Deploying. As an example, it might help to talk about a real application we created to count cattle. We used 5,000 images to train the model or neural network. These images came from the drone's camera flying over many different herds of cattle. We collected images of cows in Ireland, Romania, South Africa, Australia and several states here in the U.S.
The weather conditions ranged from very hot to rainy, and some images were even taken in light snow. We also flew the drone at different times of day to capture many lighting conditions. A box is then drawn around each of the cows in each of the images. This is what we call labeling. It's important to get as many different conditions, as many different breeds and as many different images as possible to train your network. It's also very important to make sure the boxes are drawn as accurately as possible, and that there are no mistakes such as boxes around any other types of animals or humans.
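Labeling mistakes are easier to catch with a quick automated check before training. The sketch below is one hypothetical way to do that. The `Box` structure, the pixel-coordinate convention and the sample values are assumptions for illustration, not the format of any particular labeling tool.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Box:
    """One labeled object: a class name plus the pixel corners of its box."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def find_label_problems(
    boxes: List[Box], width: int, height: int, allowed: Set[str]
) -> List[str]:
    """Return a human-readable list of problems found in one image's labels."""
    problems = []
    for i, box in enumerate(boxes):
        if box.label not in allowed:
            problems.append(f"box {i}: unexpected label '{box.label}'")
        if box.x_min >= box.x_max or box.y_min >= box.y_max:
            problems.append(f"box {i}: zero or negative size")
        if (box.x_min < 0 or box.y_min < 0
                or box.x_max > width or box.y_max > height):
            problems.append(f"box {i}: extends outside the image")
    return problems


labels = [
    Box("cow", 10, 20, 110, 90),      # fine
    Box("sheep", 200, 40, 260, 100),  # wrong animal for a cow counter
    Box("cow", 600, 300, 700, 420),   # runs past the edge of a 640x400 image
]

problems = find_label_problems(labels, width=640, height=400, allowed={"cow"})
for problem in problems:
    print(problem)
```

A check like this cannot tell whether a box is drawn tightly around the animal, so it complements a human review rather than replacing it.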
To train the model, we use a framework from Google called TensorFlow. There are also other options like PyTorch from Facebook. These frameworks mean we don't have to create our own neural network; we can simply use someone else's. The act of training then becomes a matter of choosing the framework and a pre-built model. For models that detect objects, compare TensorFlow vs. PyTorch and the available pre-trained detectors, such as YOLOv5 or models pre-trained on the COCO dataset, then figure out how to add the labeled images. We feed the images and the coordinates of the boxes to train our pre-existing model. Training can take hours or sometimes even days depending on the number of images and the speed of your computer. It's best to do all the number crunching in the cloud on Colab or Google Cloud Platform, where you can get access to large numbers of GPUs or TPUs to speed up the process.
Note that 70% of our images are used to train the model and the remaining 30% are used to test how good our model is at recognizing, in this case, cows. The test images will give us a good idea of how well the model will work in the real world. Machine Learning is also an iterative process, so we need to train, test and repeat. We do this until we get the highest percentage of cows recognized in our tests.
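The 70/30 split is simple to sketch in plain Python. The file names below are placeholders and the detection counts are made-up numbers used only to show the arithmetic. Shuffling before splitting matters, so that the test set isn't, say, all images from the last flight.

```python
import random


def split_dataset(items, train_fraction=0.7, seed=42):
    """Shuffle a copy of the items and split it into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed: same split every run
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def detection_rate(found, actual):
    """Fraction of the labeled objects that the model managed to find."""
    return found / actual if actual else 0.0


# Placeholder file names standing in for 5,000 labeled images.
image_names = [f"image_{i:04d}.jpg" for i in range(5000)]
train, test = split_dataset(image_names)
print(f"{len(train)} training images, {len(test)} test images")

# Hypothetical test result: the model found 138 of 150 labeled cows.
rate = detection_rate(found=138, actual=150)
print(f"detection rate: {rate:.0%}")
```

The test images must never be used for training. Otherwise the score measures how well the model memorized them, not how well it will do in the field.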
Finally, we need to deploy it. Since this is a drone app, we need to export our optimized trained model in a format for Android or iOS which we can use together with the drone. Fortunately, recent changes in TensorFlow make this very straightforward, and we simply export the trained model in what is known as the TensorFlow Lite format. We can then import the TensorFlow Lite model into the Android Studio IDE, which will auto-generate all the methods and functions you need to detect objects in the drone's video stream.
This is known as Edge AI, or on-device inference, where the mobile app identifies any objects for which it's been trained in real time on your phone. You may also want to publish your new mobile app on Google Play or the Apple App Store so that others can download the app and count their own cows, sheep, crops or whatever you trained it for. We also went one step further and allowed the user to create an automated flight, which took a picture at a number of intervals or waypoints. When the flight finished, the pictures were combined into one giant picture of the entire pasture or field, in what is known as stitching. We then used the same model to get a count of all the cows across the images. Typically, this is not done on the mobile phone but on a cloud server at a later point.
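Once a model returns detections, turning them into a count is mostly a matter of deciding how confident a detection must be before it is counted. The sketch below assumes a simple, hypothetical result format (a label plus a confidence score); real detection libraries return richer structures, and the 0.5 threshold is only an illustrative default.

```python
def count_objects(detections, label, min_score=0.5):
    """Count detections of `label` whose confidence is at least `min_score`."""
    return sum(
        1
        for d in detections
        if d["label"] == label and d["score"] >= min_score
    )


# Made-up detections for a single image.
detections = [
    {"label": "cow", "score": 0.91},
    {"label": "cow", "score": 0.62},
    {"label": "cow", "score": 0.30},     # too uncertain to count
    {"label": "person", "score": 0.88},  # confident, but not a cow
]

cow_count = count_objects(detections, "cow")
print(f"cows counted: {cow_count}")
```

Raising the threshold misses more real cows, while lowering it counts more false alarms, so the value is usually tuned against the test images.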
AI drone limitations.
It would be wrong to suggest there aren't obstacles to overcome with AI drones. There are still only a handful of companies and projects that combine AI and drones. Mostly, these are for transmission line or wind turbine inspections, where the drone looks for problems on the power line or cracks in the turbine blades. Data collection and labeling remain a big task. Sometimes, if you're lucky, someone already has a database of labeled images. A good place to start is TensorFlow Datasets, or TFDS, which is a repository of over 200 unique labeled datasets.
If you need your own dataset, it's going to take considerable effort to get a good enough set of images to create an accurate model. While we got a reasonable result with 5,000 images above, a real application is going to need 15,000 images or more for a better degree of accuracy. Most drones also won't fly in adverse weather conditions, so if your security system can't fly then you're going to need backup security guards. Finally, drones have a very limited flight time, and any automated flight will need multiple batteries, so they're not fully automated as yet.
AI and drones, in summary.
AI and drones are very complementary technologies. Although there are flight time limitations with the current drones on the market, battery technology is advancing rapidly. And, while there are also issues finding or creating a labeled dataset, recent advances in computer-generated synthetic datasets are very promising. In a few years, expect to see a large number of outdoor and indoor manual tasks being automated by drones. Most of the work they will replace consists of mundane, labor-intensive tasks better suited to computers. Ultimately, this will free up humans to do much more interesting work.