RS4OSINT/docs/search.json
Ollie Ballinger efbedd2afe recompiled
2023-01-09 16:51:41 +00:00


[
{
"objectID": "index.html",
"href": "index.html",
"title": "Google Earth Engine for OSINT",
"section": "",
"text": "Introduction\nThe analysis of satellite imagery is a foundational element of open source investigations. In the past decade, the quantity, quality, and availability thereof has increased dramatically. Capabilities and insights that were once only available to governments are now accessible to the general public. Satellite imagery is being used to collect evidence of genocide and other war crimes in Ukraine, Nigeria, Burundi, Cameroon, the DRC, South Sudan, Papua, and Venezuela. It has been used to monitor environmental degradation and hold extractive industries to account from Iraq to Guatemala. The ability to analyze satellite imagery is a critical skill for anyone interested in open source investigations.\nThough no-code platforms such as Sentinelhub have been invaluable in allowing the OSINT community to access and process satellite imagery, the analytical capabilities of these platforms are limited. Google Earth Engine (GEE) is a cloud-based platform that stores petabytes of satellite imagery from a variety of sources and allows users to perform advanced analyses on Google servers for free using a browser-based interface. This textbook is designed for investigators who want to perform more sophisticated analysis using geospatial data, and assumes no prior knowledge of coding or remote sensing (satellite imagery analysis). It is organized into two parts: an introduction to remote sensing and GEE, and a series of case studies that demonstrate how to use GEE for open source investigations."
},
{
"objectID": "index.html#table-of-contents",
"href": "index.html#table-of-contents",
"title": "Google Earth Engine for OSINT",
"section": "Table of Contents",
"text": "Table of Contents\n\nLearning\n\nRemote Sensing\nData Acquisition\nApplication Development\n\nCase Studies\n\nWar at Night\nRefinery Detection\nDeforestation\nShip Detection\nObject Detection Recently, a team of over 100 scientists came together to write a book called “Cloud-Based Remote Sensing with Google Earth Engine: Fundamentals and Applications”. Its a great resource for learning about remote sensing and Earth Engine. The material in this chapter is a subset of the book, edited to fit the scope of this guide. If youre interested in learning more, check out the full book."
},
{
"objectID": "index.html#what-is-google-earth-engine",
"href": "index.html#what-is-google-earth-engine",
"title": "Google Earth Engine for OSINT",
"section": "What is Google Earth Engine?",
"text": "What is Google Earth Engine?\nAs geospatial datasets—particularly satellite imagery collections—increase in size, researchers are increasingly relying on cloud computing platforms such as Google Earth Engine (GEE) to analyze vast quantities of data.\nGEE is free and allows users to write open-source code that can be run by others in one click, thereby yielding fully reproducible results. These features have put GEE on the cutting edge of scientific research. The following plot visualizes the number of journal articles conducted using different geospatial analysis software platforms:\n\nDespite only being released in 2015, the number of geospatial journal articles using Google Earth Engine (shown in red above) has outpaced every other major geospatial analysis software, including ArcGIS, Python, and R in just five years. GEE applications have been developed and used to present interactive geospatial data visualizations by NGOs, Universities, the United Nations, and the European Commission. By storing and running computations on google servers, GEE is far more accessible to those who dont have significant local computational resources; all you need is an internet connection."
},
{
"objectID": "object_detection.html",
"href": "object_detection.html",
"title": "Deep Learning",
"section": "",
"text": "Introduction\nThe Ship Detection tutorial explored a use case in which we might want to monitor the activity of ships in a particular location. That was a fairly straightforward task: the sea is very flat, and ships (especially large cargo and military vessels) protrude significantly. Using radar imagery, we could just set a threshold because if anything on the water is reflecting radio waves, its probably a ship.\nOne shortcoming of this approach is that it doesnt tell us what kind of ship weve detected. Sure, you could use the shape and size to distinguish between a fishing vessel and an aircraft carrier. But what about ships of similar sizes? Or what if you wanted to use satellite imagery to identify things other than ships, like airplanes, cars, or bridges? This sort of task called “object detection” is a bit more complicated.\nIn this tutorial, well be using a deep learning model called YOLOv5 to detect objects in satellite imagery. Well be training the model on a custom dataset, and then using it to dynamically identify objects in satellite imagery of different resolutions pulled from Google Earth Engine. The tutorial is broken up into three sections:\nUnlike previous tutorials which used the GEE JavaScript API, this one will utilize Python; this is because these sorts of deep learning models arent available in GEE natively yet. By the end, well be able to generate images such as the one below:"
},
{
"objectID": "object_detection.html#object-detection-in-satellite-imagery",
"href": "object_detection.html#object-detection-in-satellite-imagery",
"title": "Deep Learning",
"section": "Object Detection in Satellite Imagery",
"text": "Object Detection in Satellite Imagery\nObject detction in satellite imagery has a variety of useful applications.\nTheres the needle-in-a-haystack problem of needing to monitor a large area for a small number of objects. Immediately prior to the invasion of Ukraine, for example, a number of articles emerged showing Russian military vehicles and equipment popping up in small clearings in the forest near the border with Ukraine. Many of these deployments were spotted by painstakingly combing through high resolution satellite imagery, looking for things that look like trucks. One problem with this approach is that you need to know roughly where to look. The second, and more serious problem, is that you need to be on the lookout in the first place. Object detection, applied to satellite imagery, can automatically comb through vast areas and identify objects of interest. If planes and trucks start showing up in unexpected places, youll know about it.\nPerhaps youre not monitoring that large of an area, but you want frequent updates about whats going on. What sorts of objects (planes, trucks, cars, etc.) are present? How many of each? Where are they located? Instead of having to manually look through new imagery as it becomes available, you could have a model automatically analyze new collections and output a summary.\n\nYOLOv5\nObject detection is a fairly complicated task, and there are a number of different approaches to it. In this tutorial, well be using a model called YOLOv5. YOLO stands for You Only Look Once, and its a model that was developed by Joseph Redmon et. al., and the full paper detailing the model can be found here.\nThe YOLOv5 model is a convolutional neural network (CNN), which is a type of deep learning model. CNNs are very good at identifying patterns in images, particularly in small regions of images. 
This is important for object detection, because we want to be able to identify objects even if they're partially obscured by other objects.\nYOLO works by chopping an image up into a grid, and then predicting the location and size of objects in each grid cell:\n\nIt learns the locations of these objects by training on a dataset of images in which each object is indicated by a bounding box. Then, when it's shown a new image, it will attempt to predict bounding boxes around the objects in that image. The standard YOLO model is trained on the COCO dataset, which contains over 200,000 images of 80 different object classes ranging from people to cars to dogs. YOLO models pre-trained on this dataset work great out of the box to detect objects in videos, photographs, and live streams. But the objects we're interested in look quite different when seen from above in satellite imagery.\nLuckily, we can simply re-train the YOLOv5 model on datasets of labeled satellite imagery. The rest of this tutorial will walk through the process of training YOLOv5 on a custom dataset, and then using it to dynamically identify objects in satellite imagery pulled from Google Earth Engine."
},
{
"objectID": "object_detection.html#training",
"href": "object_detection.html#training",
"title": "Deep Learning",
"section": "Training",
"text": "Training\nThe process of re-training the YOLOv5 model on satellite imagery is fairly straightforward and can be accomplished in just three steps; first, were going to clone the YOLOv5 repository which contains the model code and the training scripts. Then, well download a dataset of satellite imagery and labels from Roboflow, and finally, well train the model on that dataset.\nLets start by cloning the YOLOv5 repository. Note: well be using a fork of the original repository that Ive modified to include some pre-trained models that well be using later on.\n!git clone https://github.com/oballinger/yolov5_RS # clone repo\n%cd yolov5_RS # change directory to repo\n%pip install -qr requirements.txt # install dependencies\n%pip install -q roboflow # install roboflow\n\nimport torch # install pytorch\nimport os # for os related operations\nfrom IPython.display import Image, clear_output # to display images\nOnce weve downloaded the YOLOv5 repository, well need to download a dataset of labelled satellite imagery. For this example, were going to stick with ship detection as our use case, but expand upon it. We want to be able to distinguish between different types of ships, and we want to use freely-available satellite imagery.\nTo that end, well be using this dataset, which contains 3400 labeled images taken from Sentinel-2 (10m/px) and PlanetScope (3m/px) satellites. Ships in these images are labeled by drawing an outline around them:\n\nThe image above shows three ships and what is known as an STS a “Ship-To-Ship” transfer which is when a ship is transferring cargo to another ship. 
There are a total of seven classes of ship in this dataset:\n\nThis dataset can be downloaded directly from Roboflow using the following code:\nfrom roboflow import Roboflow\nrf = Roboflow(api_key=\"<YOUR API KEY>\")\nproject = rf.workspace('ibl-huczk').project(\"ships-2fvbx\")\ndataset = project.version(\"1\").download(\"yolov5\")\nYou'll need to get your own API key from Roboflow, which you can do here, and insert it in the second line of code. Roboflow is a platform for managing and training deep learning models on custom datasets. It's free to use for up to 3 projects, and hosts a large number of datasets that you can use to train your models. To use a different dataset, you can simply change the project name and version number in the third and fourth lines of code.\nFinally, we can train our YOLOv5 model on the dataset we just downloaded in just one line of code:\n!python train.py --data {dataset.location}/data.yaml --batch 32 --cache\nThis should take about an hour.\n\nAccuracy Assessment\nUsing TensorBoard, we can log the performance of our model over the course of the training process:\n\n\n\n\nOne metric in particular, mAP 0.5, is a good indicator of how well our model is performing. We can see it increasing rapidly at first, and then leveling off after around 30 epochs of training. The rest of this subsection explains what exactly the mAP 0.5 value represents in this context, which is useful if you plan to train your own model at some point. If you're just interested in deploying a pre-trained model, you can skip ahead to the next subsection.\nIn the past when we've worked on machine learning projects (for example in the makeshift refinery identification tutorial), our training and validation data was a set of points (geographic coordinates) that we labeled as either being a refinery or not. 
Calculating the accuracy of that model was fairly straightforward, since predictions were either true positives, true negatives, false positives, or false negatives.\nThis is slightly more complicated for object detection. We're not going pixel-by-pixel and trying to say “this is a ship” or “this is not a ship.” Instead, we're looking at a larger image, and trying to draw boxes around the ships. The problem is that there are many ways to draw a box around a ship. The image below shows the labels used in our training data to indicate the location of ships.\n \nThe predicted bounding boxes are very close to the actual bounding boxes, but they're not exactly the same. The first step in evaluating the performance of our model is to determine how close the predicted boxes are to the actual boxes. We can do this by calculating the intersection over union (IoU) of the predicted and actual boxes. This is essentially a measure of how much overlap there is between the predicted and actual boxes:\n\n\n\nIntersection over Union\n\n\nThe IoU is a value between 0 and 1, where 0 means that the boxes don't overlap at all, and 1 means that the boxes overlap perfectly. We can then set a threshold value for the IoU, and say that if the IoU is greater than that threshold, we'll count that as a correct prediction. Now that we can classify a prediction as correct or incorrect, we can calculate two important metrics: \\[\\text{Precision} = \\frac{\\text{True Positives}}{\\text{True Positives} + \\text{False Positives}}\\]\nThis is the proportion of positive identifications that are actually correct. If my model detects 100 ships and 90 of them are actually ships, then my precision is 90%.\n\\[\\text{Recall} = \\frac{\\text{True Positives}}{\\text{True Positives} + \\text{False Negatives}}\\]\nThis is the proportion of actual positives that are identified correctly. 
If there are 100 ships in the image, and my model detects 90 of them, then my recall is 90%.\nThese two metrics trade off against each other; I could easily get 100% recall by drawing lots of boxes everywhere to increase my chances of detecting all the ships. Conversely, I could get 100% precision by being extremely conservative and just drawing one or two boxes around the ships I'm most confident about. The key is to maximize both: we want our model to be sensitive enough to detect as many ships as possible (high recall), but also precise enough to only draw boxes around the ships that are actually there (high precision). Researchers find this balance using a Precision-Recall curve (PR curve), which plots precision on the y-axis and recall on the x-axis. Below is the Precision-Recall curve for our final model, for each class:\n\n\n\nPrecision-Recall curve from the best\n\n\nStarting from the top left corner, we set a very high confidence threshold: precision is 1, meaning that every box we draw is a ship, but recall is near 0, meaning that we're detecting almost none of the ships. As we lower the confidence threshold, we start to detect more ships, but we also start to draw boxes around things that aren't ships. Towards the middle of the curve, we're detecting most of the ships, but we're also drawing boxes around a growing number of false positives. Towards the bottom right corner, we're detecting all the ships, but we're also generating lots of false positives.\nThe goal is to find the point on the curve where precision and recall are both high; the closer the peak of our curve is to the top right corner, the better. A perfect model would touch the top right corner: it would have precision of 1 and recall of 1, detecting all of the ships without producing any false positives. The area under this curve is called the Average Precision (AP), and is a measure of how close the curve is to the top right corner. 
The perfect model would have an AP of 1.\nSome classes have a very high AP: the value for the Aircraft Carrier class is 0.995 (though this could be down to the fact that we have a relatively small number of images with aircraft carriers in them). Ship-To-Ship (STS) transfer operations also have a high AP, at 0.951. However, other classes, notably the “Ship” class, have a low AP. This may be because the “Ship” class is a catch-all for any ship that doesn't fit into one of the other classes, so it encompasses lots of unusual-looking ships.\nFinally, the mean Average Precision (mAP) is the average of the AP for each class, shown as the thick blue line above. Remember, all of this is premised on using a 0.5 threshold in the overlap (IoU) between our predicted boxes and the labels, which is why the final metric is called mAP 0.5. The mAP 0.5 for our model is 0.775, which is pretty good.\nThis number is very useful for selecting the best-performing model when training several different models on the same dataset. It's not that useful for comparing models trained on different datasets, since the mAP 0.5 depends on the number of classes in the dataset and the nature of those classes. For example, in the next section we'll be using a different model trained on the DOTA dataset, which has an mAP 0.5 of around 0.68, largely because it has around twice as many classes and many of them are similar to each other."
},
{
"objectID": "object_detection.html#inference",
"href": "object_detection.html#inference",
"title": "Deep Learning",
"section": "Inference",
"text": "Inference\nThis image shows"
},
{
"objectID": "object_detection.html#getting-up",
"href": "object_detection.html#getting-up",
"title": "Habits",
"section": "Getting up",
"text": "Getting up\n\nTurn off alarm\nGet out of bed"
},
{
"objectID": "object_detection.html#going-to-sleep",
"href": "object_detection.html#going-to-sleep",
"title": "Habits",
"section": "Going to sleep",
"text": "Going to sleep\n\nGet in bed\nCount sheep\n\nInstead, we have to look at the entire image and say “this is a ship” or “this is not a ship.”\nUsing Tensorboard, we can log the performance of our model over many iterations:\n\n\n\n\nThere are four accuracy metrics that we can use to evaluate the performance of our model:\n\nmAP 0.5e is the mean average precision at a 0.5 IoU threshold.\nmAP 0.5:0.95e is the mean average precision at a 0.5 to 0.95 IoU threshold."
}
]