How to Acquire Large Satellite Image Datasets for Machine Learning Projects
Historically, only governments and large corporations have had access to quality satellite images. In recent years, satellite image datasets have become available to anyone with a computer and an internet connection. The quality, quantity, and precision of these datasets are continuously improving, and there are many free and commercial platforms at your disposal to acquire satellite images. On top of that, the price of acquiring images has fallen significantly, and the tools that let you analyze them for machine learning and data science projects have become cheaper and more widely available.
In this article, I hope to inspire you to start looking into the power and utility of the satellite image datasets that are publicly available today. I will give you a high-level overview of where these images come from, then I will dive deeper into the features you should think about when choosing the right data source. In a future article, I will give you an overview of the architecture that you need to have in place before you can start working with them on your local computer.
Let’s jump right into satellites. How is this kind of dataset unique? Why should you bother with satellite images?
Satellite Image Data at Your Fingertips
First of all, you can get complete coverage of the Earth, which means that you can select virtually any location on the planet and see what that place looks like. Further, the images are readily available. You can go to a website and easily download an image for any location that you want, because there are public space programs that offer free images to whoever wants them. So when you start your research, I strongly encourage you to take a look at the available free resources first. We’ve included a list of those resources at the bottom of this article. There are also plenty of commercial options that provide higher-quality images for specialized purposes. You can reach out to Appsilon directly for assistance with acquiring commercial satellite datasets.
One way to think about satellite image datasets is that they give you the ability to travel backward in time. When you think of satellite images you might think about Google Maps, which provides you with satellite images that give you a snapshot of the surface of the Earth. But with access to the right provider, you can go back in time and access images for any day that you want, going back years – in some cases back to the 1980s. This added temporal dimension gives you additional abilities when it comes to analyzing data. Imagine that you can take a look at one point on Earth, and then go back in time and see how this place has changed. You can then build predictive models to forecast what this place is going to look like in the future.
This visualization shows the scale of what is going on above our heads, outside the stratosphere. Right now there are more than 4,500 satellites orbiting our planet, and over 600 of them are constantly taking photos. More and more are preparing to launch, especially since this area of technology has been accelerating rapidly in recent years. This means that the quantity and quality of satellite image datasets are rapidly improving.
Currently, the best resolution that you can get from a satellite image is 25 cm per pixel. This means that if you zoom in very closely on a quality satellite image, one pixel represents approximately 25 cm of the Earth’s surface. If a satellite image shows a person, then that person will be represented by approximately three pixels. Three pixels is not much to go on, but if you combine this rough representation of a person with their shadow, then you can confirm that those three pixels are indeed a person.
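To make the resolution arithmetic concrete, here is a minimal sketch that estimates how many pixels an object spans along one axis at a given resolution. The object sizes are hypothetical values chosen only for illustration, and rounding up is a simplifying assumption (a real object can straddle pixel boundaries).

```python
import math

def pixels_across(object_size_m: float, resolution_m: float) -> int:
    """Rough count of pixels an object spans along one axis (rounded up)."""
    return math.ceil(object_size_m / resolution_m)

# Hypothetical object sizes in meters, for illustration only.
objects = {
    "person (about 0.6 m wide)": 0.6,
    "car (about 4.5 m long)": 4.5,
    "house (about 10 m side)": 10.0,
}

# Resolutions mentioned in this article: 25 cm, 10 m, and 30 m per pixel.
for name, size_m in objects.items():
    for resolution_m in (0.25, 10.0, 30.0):
        n = pixels_across(size_m, resolution_m)
        print(f"{name} at {resolution_m} m/pixel: about {n} pixel(s)")
```

At 25 cm per pixel the person spans about three pixels, while at 10 m or 30 m per pixel the same person, and even a car, fits inside a single pixel.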
How to Acquire Satellite Datasets
Now I would like to jump into more specific aspects of satellite imagery — what kind of dataset it is and how you can acquire these datasets.
There are two types of available satellite data. There are public datasets that are freely available with quality that is good enough for many use cases. And there are several commercial outfits that offer even better images with more potential uses.
The best-known public datasets are provided by Landsat and Sentinel. These are public programs rather than companies: Landsat is run by NASA and the USGS, and Sentinel is part of the European Copernicus program. You can Google them right now and find the right image for you. One image is going to be about 1GB of data. It’s not immediately obvious how you can work with these images, but later on, I’ll explain how to do it easily. You can also feel free to reach out to us for more information on working with larger commercial datasets.
There are plenty of commercial companies acquiring satellite images. Commercial datasets are primarily provided by Maxar, Planet Labs, Airbus Defence & Space, ImageSat, and SkyWatch. Recently, Planet Labs launched 150 satellites, each the size of a shoebox, so there is now a huge constellation of small satellites capturing images. Currently, you can get a new image every two days. Another interesting company to watch is SkyWatch. SkyWatch is a hub for satellite images: they don’t have their own satellites, but gather images from all of the other providers. SkyWatch is a good place to find decent prices for commercial satellite images.
I am often asked about image prices. The prices range from a few dollars for a single image to ~$1000 for the highest possible quality image. So if you want to identify people in a lot of images or you need a consistent and precise historical record for research, it is going to be quite expensive for you at the moment. However, given how fast the technology is progressing the prices should decline in the future. In a sense, we are at a moment where there is a wave of new satellite image technology coming. The wave hasn’t reached its peak yet. If you start researching right now, you will be on top of the wave when satellite images are cheap and available. Now is the perfect time to start playing with satellite image datasets.
Satellite Images: Spatial and Temporal Resolution
When selecting datasets, the first consideration is image resolution. The higher the resolution (that is, the smaller the area of ground covered by each pixel), the more details we’re able to see. But there are some tradeoffs, which we’ll discuss soon. In this plot, you can see how spatial resolution has changed over the years. We started with 100 m per pixel in 1970, and now we’re down to 25-30 cm.
Spatial resolution is not the only resolution we need to consider when designing solutions based on satellite imagery. Equally important, and sometimes even more important, is the temporal resolution. How often do we get a picture of a given area? What is the revisit time?
Landsat, one of the publicly available satellite image datasets, gives you 30-meter resolution, and each satellite revisits a given location every 16 days. Sentinel gives you 10m resolution every 5 to 7 days. Commercial providers go further: if you want to invest in your project, you have the option of much better resolution and image frequency.
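The two kinds of resolution can be compared with some simple arithmetic. The sketch below is only an illustration: the function names are my own, and the example values are the resolutions and revisit times quoted in this article.

```python
def pixels_per_km2(resolution_m: float) -> float:
    """How many pixels cover one square kilometer at a given spatial resolution."""
    return (1000.0 / resolution_m) ** 2

def acquisitions_per_year(revisit_days: float) -> int:
    """Upper bound on images of one location per year (ignores clouds and outages)."""
    return int(365 // revisit_days)

# Spatial resolution: 30 m and 10 m (public), 0.25 m (best commercial).
for resolution_m in (30.0, 10.0, 0.25):
    print(f"{resolution_m} m/pixel -> {pixels_per_km2(resolution_m):,.0f} pixels per km^2")

# Temporal resolution: revisit times of 7, 5, and 2 days.
for revisit_days in (7, 5, 2):
    print(f"revisit every {revisit_days} days -> up to {acquisitions_per_year(revisit_days)} images per year")
```

Going from 30 m to 10 m per pixel gives roughly nine times as many pixels for the same area, which is one reason storage and processing costs grow quickly as resolution improves.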
Satellite Images: Layers of Information
The way that sensors work in satellites is a really exciting topic. A satellite image is more than just a picture taken with a normal camera. Human eyes perceive red, green, and blue light, but a satellite can record much more of the electromagnetic spectrum than that. Some satellites record 12 spectral bands, which means that you get an image that has 12 layers of information.
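In code, such an image is naturally represented as a three-dimensional array with one layer per band. The sketch below uses NumPy with synthetic random values in place of a real scene; the band positions used for red, green, and blue are hypothetical, since the actual band order depends on the satellite and the data provider.

```python
import numpy as np

# Synthetic stand-in for a 12-band scene, shaped (bands, rows, cols).
rng = np.random.default_rng(seed=0)
scene = rng.integers(0, 10_000, size=(12, 256, 256), dtype=np.uint16)

# Hypothetical band positions; check your provider's documentation for real ones.
RED, GREEN, BLUE = 3, 2, 1

# A regular color picture uses only three of the twelve layers.
rgb = np.stack([scene[RED], scene[GREEN], scene[BLUE]], axis=-1)

print("full scene shape:", scene.shape)   # (bands, rows, cols)
print("RGB composite shape:", rgb.shape)  # (rows, cols, 3)
print("scene size in MB:", scene.nbytes / 1e6)
```

The remaining nine layers are what a standard photograph never contains, and they are where much of the analytical value lies.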
For us, an image is just a matrix that has values for red, green, and blue, but from the satellite, you get many more values that the human eye is not able to process. For example, a satellite image can have an infrared channel, which can be used to assess the health of vegetation. This is completely new data that one would not be able to detect with the naked eye. Healthy plants, which are rich in chlorophyll, reflect infrared light differently than stressed plants do, which allows for the detection of sick plants from space. There is plenty more you can do with these extra layers of information. For instance, you can also detect moisture levels on the surface of the planet, which cannot be done very easily with standard visual color information.
There is also radar technology. Many of you know LIDAR, a related active-sensing technique that uses laser pulses rather than radio waves and tells you the height of a given surface. It is important to note that clouds are a huge problem when it comes to satellite images. There are plenty of people working on algorithms to eliminate the cloud problem in satellite imaging, but there is no ideal solution yet. Radar technology allows you to look through clouds, but you don’t get all of the other layers of information that I mentioned above. You mainly get information about the structure and elevation of the surface.
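One widely used way to turn the infrared channel into a vegetation health signal is the Normalized Difference Vegetation Index (NDVI), defined as (NIR - Red) / (NIR + Red). This article does not name a specific index, so treat the following as an illustrative sketch with made-up reflectance values rather than a description of a particular workflow.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index, computed per pixel.

    Values close to 1 suggest dense, healthy vegetation; values near
    zero or below suggest bare soil, water, or stressed plants.
    """
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    out = np.zeros_like(denom)
    # Leave pixels with a zero denominator at 0 instead of dividing by zero.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Made-up reflectance values for a 2 x 2 pixel patch.
nir_band = [[0.5, 0.3], [0.2, 0.0]]
red_band = [[0.1, 0.3], [0.4, 0.0]]

print(ndvi(nir_band, red_band))
```

In this toy patch, the top-left pixel has the highest index value because it reflects far more near-infrared than red light, which is the typical signature of healthy vegetation.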
On the right, we have a visible band image of a certain area in Sumatra, Indonesia. As you can see our view is obstructed by clouds. Now, on the left, we see the same area in a radar image. We now have more and more radar images available, which is useful because radar can see through clouds. This can be crucial in many cases.
One thing to keep in mind when you choose a data source: depending on your project, you may not necessarily need the best possible resolution. You might want to experiment with the free images at a resolution of 10m or 30m. You should also investigate what the image provider actually gives you. Some platforms do some of the pre-processing work for you. For example, I mentioned that one image can include 1GB of data. You can actually ask some providers to clip the image to the area defined by a particular shapefile, and in return you’ll get just 1MB of data: a small image that consists only of the area that you wanted. This can be very helpful if you’re working with a large number of images.
I hope you now have a sense of the many current and developing options for satellite imagery. In the past, such datasets were accessible to only a select few. Now there are many free and commercial platforms at your disposal. You can leverage temporal resolution, spatial resolution, and a dozen bands of the electromagnetic spectrum to aid in your projects. On top of that, the price of acquiring satellite images has fallen significantly, and the tools that let you analyze these images have become cheaper and more widely available. In my next article on satellites, we will further explore how to use satellite images in practice, and I will explain why R is an excellent tool for analyzing satellite images. I will share our experiences of trial and error with satellite images to save you time and effort.
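If your provider does not clip images for you, the same idea can be sketched locally. The example below is a simplification under stated assumptions: it crops to a rectangular bounding box rather than an arbitrary shapefile polygon, it assumes a north-up image whose top-left corner coordinates and pixel size are known, and the function name and coordinates are hypothetical. A real project would typically use a GIS library for this.

```python
import math
import numpy as np

def clip_to_bbox(image, origin_x, origin_y, pixel_size, bbox):
    """Crop a north-up raster to a bounding box given in map coordinates.

    image      -- array shaped (rows, cols) or (bands, rows, cols)
    origin_x/y -- map coordinates of the top-left corner of the image
    pixel_size -- ground size of one pixel, in the same units as the coordinates
    bbox       -- (min_x, min_y, max_x, max_y)
    """
    min_x, min_y, max_x, max_y = bbox
    rows, cols = image.shape[-2:]
    # Columns grow eastward from origin_x; rows grow southward from origin_y.
    col_start = max(0, math.floor((min_x - origin_x) / pixel_size))
    col_stop = min(cols, math.ceil((max_x - origin_x) / pixel_size))
    row_start = max(0, math.floor((origin_y - max_y) / pixel_size))
    row_stop = min(rows, math.ceil((origin_y - min_y) / pixel_size))
    return image[..., row_start:row_stop, col_start:col_stop]

# Synthetic 12-band scene at 10 m per pixel covering 10 km x 10 km.
scene = np.zeros((12, 1000, 1000), dtype=np.uint16)

# Hypothetical 1 km x 1 km area of interest inside the scene.
area = (501_000, 5_995_000, 502_000, 5_996_000)
clipped = clip_to_bbox(scene, origin_x=500_000, origin_y=6_000_000,
                       pixel_size=10, bbox=area)

print("clipped shape:", clipped.shape)
print("size reduction factor:", scene.nbytes / clipped.nbytes)
```

Cropping first and analyzing second keeps memory use proportional to the area you care about rather than to the full scene.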
Public Data Sets
Commercial Data Sets