mirror of
https://github.com/bellingcat/RS4OSINT.git
synced 2026-06-10 20:48:36 +03:00
86 lines
33 KiB
JSON
[
{
"objectID": "index.html",
"href": "index.html",
"title": "Google Earth Engine for OSINT",
"section": "",
"text": "Introduction\nThe analysis of satellite imagery is a foundational element of open source investigations. In the past decade, the quantity, quality, and availability of this imagery have increased dramatically. Capabilities and insights that were once only available to governments are now accessible to the general public. Satellite imagery is being used to collect evidence of genocide and other war crimes in Ukraine, Nigeria, Burundi, Cameroon, the DRC, South Sudan, Papua, and Venezuela. It has been used to monitor environmental degradation and hold extractive industries to account from Iraq to Guatemala. The ability to analyze satellite imagery is a critical skill for anyone interested in open source investigations.\nThough no-code platforms such as Sentinelhub have been invaluable in allowing the OSINT community to access and process satellite imagery, the analytical capabilities of these platforms are limited. Google Earth Engine (GEE) is a cloud-based platform that stores petabytes of satellite imagery from a variety of sources and allows users to perform advanced analyses on Google servers for free using a browser-based interface. This textbook is designed for investigators who want to perform more sophisticated analysis using geospatial data, and assumes no prior knowledge of coding or remote sensing (satellite imagery analysis). It is organized into two parts: an introduction to remote sensing and GEE, and a series of case studies that demonstrate how to use GEE for open source investigations:"
},
{
"objectID": "index.html#table-of-contents",
"href": "index.html#table-of-contents",
"title": "Google Earth Engine for OSINT",
"section": "Table of Contents",
"text": "Table of Contents\n\nLearning\n\nRemote Sensing\nData Acquisition\nAlgorithms\nApplication Development\n\nCase Studies\n\nWar at Night\nRefinery Detection"
},
{
"objectID": "index.html#what-is-google-earth-engine",
"href": "index.html#what-is-google-earth-engine",
"title": "Google Earth Engine for OSINT",
"section": "What is Google Earth Engine?",
"text": "What is Google Earth Engine?\nAs geospatial datasets, particularly satellite imagery collections, increase in size, researchers are increasingly relying on cloud computing platforms such as Google Earth Engine (GEE) to analyze vast quantities of data.\nGEE is free and allows users to write open-source code that can be run by others in one click, thereby yielding fully reproducible results. These features have put GEE on the cutting edge of scientific research. The following plot visualizes the number of journal articles published using different geospatial analysis software platforms:\n\nDespite only being released in 2015, Google Earth Engine (shown in red above) has, in just five years, overtaken every other major geospatial analysis software, including ArcGIS, Python, and R, in the number of geospatial journal articles that use it. By storing and running computations on Google servers, GEE is far more accessible to those who don’t have significant local computational resources; all you need is an internet connection.\nGEE applications have been developed and used to present interactive geospatial data visualizations by NGOs, universities, the United Nations, and the European Commission."
},
{
"objectID": "ch3.html#getting-started",
"href": "ch3.html#getting-started",
"title": "3 Algorithms",
"section": "3.1 Getting Started",
"text": "3.1 Getting Started"
},
{
"objectID": "SyriaNTL.html#data",
"href": "SyriaNTL.html#data",
"title": "War at Night",
"section": "Data",
"text": "Data\nSatellite images of Syria taken at night capture a subtle trace left by human civilization: lights. Apartment buildings, street lights, highways, power plants: all are illuminated at night and can be seen from space. Researchers often use these nighttime lights signatures to track development; as cities grow, villages receive power, and infrastructure is built, areas emit more light. But this works both ways. As cities are demolished, villages burned, and highways cut off, they stop emitting light.\nThe timelapse below uses imagery from the Defense Meteorological Satellite Program (DMSP), a joint program run by the U.S. Department of Defense and the National Oceanic and Atmospheric Administration. One image is taken per year between 2005 and 2013:"
},
{
"objectID": "SyriaNTL.html#ukraine",
"href": "SyriaNTL.html#ukraine",
"title": "War at Night",
"section": "Ukraine",
"text": "Ukraine\n\nPre-Processing\n\n\nAnalysis"
},
{
"objectID": "SyriaNTL.html#iraq",
"href": "SyriaNTL.html#iraq",
"title": "War at Night",
"section": "Iraq",
"text": "Iraq\nA link to the GEE code for this section can be found here.\n\nPre-Processing\nFirst, let’s start by importing a few useful packages written by Gennadii Donchyts. We’ll use utils and text to annotate the date of each image on the timelapse. We’ll also define an Area of Interest (AOI), which is just a rectangle. You can do this manually by clicking the drawing tools in the top left. I’ve drawn an AOI over the area covering Mosul, Irbil, and Kirkuk in Northern Iraq.\nvar utils = require(\"users/gena/packages:utils\");\nvar text = require(\"users/gena/packages:text\");\n\n// define the Area of Interest (AOI)\nvar AOI = ee.Geometry.Polygon(\n [[[42.555362833405326, 36.62010778397765],\n [42.555362833405326, 35.18296243288332],\n [44.681217325592826, 35.18296243288332],\n [44.681217325592826, 36.62010778397765]]])\n\n// start and end dates for our gif \nvar startDate = '2013-01-01';\nvar endDate = '2018-01-01';\n\n// a filename for when we export the gif\nvar export_name='qayyarah_viirs'\n \n// A palette to visualize the VIIRS imagery. This one is similar to Matplotlib's \"Magma\" palette. \nvar viirs_palette = [\n \"#000004\",\n \"#320a5a\",\n \"#781b6c\",\n \"#bb3654\",\n \"#ec6824\",\n \"#fbb41a\",\n \"#fcffa4\",\n];\n\n// Visualisation parameters for the VIIRS imagery, defining a minimum and maximum value, and referencing the palette we just created\nvar VIIRSvis = { min: -0.1, max: 1.6, palette: viirs_palette };\nNext, we’ll load the VIIRS nighttime lights imagery. We want to select the avg_rad band of the image collection, and filter blank images. Sometimes, we get blank images over an area in VIIRS if our AOI is on the edge of the satellite’s imaging swath. 
We can filter these images, similarly to how we filter for cloudy images in Sentinel-2:\nvar VIIRS= ee.ImageCollection(\"NOAA/VIIRS/DNB/MONTHLY_V1/VCMCFG\") \n .select('avg_rad')\n // Calculate the sum of the 'avg_rad' band within the AOI\n .map(function(image) { \n var blank=image.reduceRegions({\n collection: AOI, \n reducer: ee.Reducer.sum(), \n scale: 10})\n .first()\n .get('sum')\n // For each image, define a property 'blank' that stores the sum of the 'avg_rad' band within the AOI. \n // We're also going to take a base 10 log of the image-- this will help us visualize the data by dampening extreme values \n return image.set('blank', blank).log10().unmask(0)\n })\n // Now, we can filter images which are fully or partially blank over our AOI\n .filter(ee.Filter.gt('blank', 10))\n // Finally, we filter the collection to the specified date range\n .filterDate(startDate, endDate)\n \nLet’s have a look at the first image in the collection to make sure everything’s looking right. We’ll set the basemap to satellite and center our AOI:\nMap.setOptions('HYBRID')\nMap.centerObject(AOI)\nMap.addLayer(VIIRS.first(),VIIRSvis,'Nighttime Lights')\n\nIf we decrease the opacity of the VIIRS layer, we can see the cities of Mosul, Erbil, and Kirkuk shining brightly at night. 
We can also see a string of bright lights between Kirkuk and Erbil: these are methane flares from oil wells.\n\n\nAnalysis\nHaving pre-processed the VIIRS imagery, we can now define a function gif that will take:\n\nAn image collection (col, in this case the nighttime lights imagery VIIRS)\nVisualization parameters (col_vis, in this case VIIRSvis)\nAn Area of Interest AOI\n\nThe function will then return a timelapse.\nvar gif = function (col, col_vis, AOI) {\n\n // Define the date annotations to be printed in the top left of the gif in white\n var annotations = [\n {\n textColor: \"white\",\n position: \"left\",\n offset: \"1%\",\n margin: \"1%\",\n property: \"label\",\n // Dynamically size the annotations according to the size of the AOI\n scale: AOI.area(100).sqrt().divide(200),\n },\n ];\n\n // Next, we want to map over the image collection,\n var rgbVis = col.map(function (image) {\n // Get the date of the image and format it\n var start = ee.Date(image.get(\"system:time_start\"));\n var label = start.format(\"YYYY-MM-dd\");\n // And visualize the image using the visualization parameters defined earlier.\n // We also want to set a property called \"label\" that stores the formatted date \n return image.visualize(col_vis).set({ label: label });\n });\n\n // Now we use the label property and the annotateImage function from @gena_d to annotate each image with the date. 
\n rgbVis = rgbVis.map(function (image) {\n return text.annotateImage(image, {}, AOI, annotations);\n });\n\n // Define GIF visualization parameters.\n var gifParams = {\n maxPixels: 27017280,\n region: AOI,\n crs: \"EPSG:3857\",\n dimensions: 640,\n framesPerSecond: 5,\n };\n\n // Export the gif to Google Drive\n Export.video.toDrive({\n collection: rgbVis,\n description: export_name,\n dimensions: 1080,\n framesPerSecond: 5,\n region: AOI,\n });\n // Print the GIF URL to the console.\n print(rgbVis.getVideoThumbURL(gifParams));\n\n // Render the GIF animation in the console.\n print(ui.Thumbnail(rgbVis, gifParams));\n};\nOk that was a pretty big chunk of code. But the good news is that we basically never have to touch it again, since we can just feed it different inputs. For example, if I want to generate a gif of nighttime lights over a different area, it’s as simple as dragging the AOI. If I want to look at a different time period, I can just edit the startDate and endDate variables. And if I want to visualize an entirely different type of satellite imagery– Sentinel-1, Sentinel-2, or anything else, all I have to do is change the image collection (col) and visualization parameters (col_vis) variables. Now, let’s look at some timelapses.\n\nThe Fall of Mosul\nThe function returns a timelapse of nighttime lights over Northern Iraq:\ngif(VIIRS, VIIRSvis, AOI);\n\n\n\nI’ve done a bit of post-processing to this gif, adding more annotations and blending between frames to make it a bit smoother. I typically use ffmpeg and ezgif for the finishing touches.\n\n\nThis timelapse gives a play-by-play of one of the most important campaigns in the war against the Islamic State. In the first few frames, Mosul is under the control of the Kurdistan Regional Government (KRG). In the summer of 2014, ISIS captures the city, and power is cut off. Mosul and many villages along the Tigris river are plunged into darkness. 
In 2015, the front line in the campaign to retake the city emerges around Mosul, advancing in 2016 and 2017. Mosul is eventually retaken by the KRG in 2017, after which it brightens once again as electricity is restored.\n\n\nThe Qayyarah Fires\nFarther south, there is an interesting detail. Above the “h” in “Qayyarah”, a bright set of lights emerges just before Mosul is recaptured, around December 2016. Fleeing Islamic State fighters set fire to the Qayyarah oilfields, which burned for months.\nUsing the VIIRS data we’ve already loaded, we can further analyze the effect of the conflict using a chart. First, let’s define two rectangles (again, you can draw these) over Mosul and Qayyarah:\nvar mosul = ee.Feature(\n ee.Geometry.Polygon(\n [[[43.054977780266675, 36.438274276521234],\n [43.054977780266675, 36.290642221212416],\n [43.24792516796199, 36.290642221212416],\n [43.24792516796199, 36.438274276521234]]], null, false),\n {\n \"label\": \"Mosul\",\n \"system:index\": \"0\"\n }),\n\n qayyarah = ee.Feature(\n ee.Geometry.Polygon(\n [[[43.08240275545117, 35.8925587996721],\n [43.08240275545117, 35.77899970860588],\n [43.26642375154492, 35.77899970860588],\n [43.26642375154492, 35.8925587996721]]], null, false),\n {\n \"label\": \"Qayyarah\",\n \"system:index\": \"0\"\n })\n\n// Let's put these together in a list \nvar regions=[qayyarah, mosul]\nOnce we’ve got the rectangles, we can make a chart that will take the mean value of the VIIRS images in each rectangle over time:\nvar chart =\n ui.Chart.image\n .seriesByRegion({\n imageCollection: VIIRS,\n regions: regions,\n reducer: ee.Reducer.mean(),\n seriesProperty:'label'\n }).setOptions({\n title: 'Nighttime Lights'\n });\n \nprint(chart)\n\nWe can clearly see Mosul (the red line) darkening in 2014 as the city is taken by ISIS. During this period the Qayyarah oilfields are, as we might expect, quite dark. 
All of a sudden in 2016 Qayyarah becomes brighter at night than the city of Mosul ever was, as the oilfields are set on fire. Then, almost exactly when the blaze in Qayyarah is extinguished and the area darkens (i.e. when the blue line falls back to near zero), Mosul brightens once again (i.e. the red line rises) as the city is liberated.\n\n\n\nThe Battle for Aleppo\nThe images below were taken between 2012 and 2014. Vast swaths of the city darken as neighbourhoods are razed by fighting.\n\nThough this is a trend that can be observed across the country, nowhere is the decline in nightlights more visible than in Aleppo. Below is a comparison of longitudinal trends in nightlights signatures between several cities:\n\nThe most salient trend is Aleppo plummeting over the course of 2012, and becoming steadily darker over the course of the next four years. Raqqa drops in 2012 as well, but remains in flux until 2017, when the battle to reclaim the city plunges it into near total darkness. Damascus also experiences a dip in 2012, but stabilizes relatively quickly. The Turkish city of Gaziantep, less than 100km from Aleppo and roughly 1/5th the size, stands in stark contrast to the Syrian cities, becoming progressively brighter over the entire period.\nAnother interesting pattern here is the difference in seasonal trends in nightlights. Under normal circumstances in this part of the world, cities become brighter at night during the summer months. Restaurants, bars, and markets stay open later and conduct business outdoors. Gaziantep, which still attracts scores of tourists every year, displays pronounced seasonality. Damascus, the most stable of the three Syrian cities, also maintains a seasonal trend throughout the war. 
In contrast, both Raqqa and Aleppo maintain extremely low and roughly constant levels of nightlights year-round during the periods following intense fighting.\nReliable economic data for Syria haven’t been available for nearly a decade, and assessing the country’s recovery is consequently difficult. But subtle indications of economic growth are visible above: all three Syrian cities have been on a steady upward trend since 2017, and are beginning to display seasonal variation once again.\n\n\nFighting for Oil\nThroughout the war, sudden massive spikes in nightlights signatures can be observed throughout the country. In the center of the map just west of Palmyra, some particularly large spikes occur in 2017:\nThese flashes of light show gas wells being set on fire, a common form of sabotage carried out by retreating Islamic State fighters. Modified Sentinel-2 imagery of the Hayyan gas field (indicated by the green box above) shows this in greater detail. Substituting the Red band in an RGB image with Near Infrared (NIR) highlights thermal signatures, showing fires burning brightly even during the day.\nThe large complex on the right is the Hayyan Gas Plant, which produced nearly 1/3 of Syria’s electricity. The plant and its associated wells changed hands several times throughout the war, but were under Islamic State control until February 2017. In the video below, Islamic State fighters can be seen rigging the plant with explosives and destroying it on January 8th:\nIn February, three Russian oil and gas companies (Zarubij Naft, Lukoil and Gazprom Neft) were given restoration, exploration, and production rights to the hydrocarbon deposits west of Palmyra. 
On January 12th, 2017, the Syrian Army’s 5th Legion and Russian special forces launched a counterattack known as the “Palmyra offensive”, with the aim of retaking several important hydrocarbon deposits including Hayyan.\nThe timing of well fires aligns closely with a detailed timeline of the campaign. The Near Infrared Sentinel-2 image below shows the layout of the Hayyan Gas Plant and the wells in the Hayyan gas field:\nThe Syrian Army took the Hayyan gas field on February 4th, and retreating ISIS fighters set fire to wells 1 and 3. However, ISIS managed to briefly retake the Hayyan field on February 7th, setting fire to wells 2 and 4. These moments in the Palmyra Offensive are captured in NIR signatures.\nInterestingly, despite the massive explosion caused by the bombing of the Hayyan Gas Plant, no prolonged thermal anomalies were detected over the area of the plant itself. The well fires, on the other hand, lasted for months. Below is an image of a well fire at the Hayyan field taken from this video; based on the nearby infrastructure and date (04/02/2017) of posting, it is likely Well-3."
},
{
"objectID": "RojavaRefineries.html",
"href": "RojavaRefineries.html",
"title": "Refinery Detection",
"section": "",
"text": "Machine Learning Workflow\nNow that we’ve got a model that can identify oil from multispectral satellite imagery fairly well, we can set about making our results accessible.\nOne of the things we’re particularly interested in is the distribution of small refineries. The way we’re currently visualizing the prediction (the raster output from the model where predicted oil is shown in red and everything else is transparent) makes it hard to see these small refineries when we zoom out:\nWe can convert our raster into a series of points using the reduceToVectors function. In essence, this takes homogeneous regions of an image (e.g., an area predicted to be oil surrounded by an area not predicted to be oil) and converts each one into a point:\nNow the distribution of small refineries is much more easily visible as blue dots:\nIf we zoom out even further, we can see clusters of points that correspond to areas of high oil production. Using geolocated photographs, we can roughly ground-truth the model output:"
},
{
"objectID": "RojavaRefineries.html#pre-processing",
"href": "RojavaRefineries.html#pre-processing",
"title": "Refinery Detection",
"section": "Pre-Processing",
"text": "Pre-Processing\nAs always, the first step in our project will be to load and pre-process satellite imagery. For this project, we’ll be using Sentinel-2 imagery. Let’s load imagery from 2020-2021, filter out cloudy images, and define visualization parameters:\nvar start='2020-04-01'\nvar end='2021-07-01'\n\nvar bands = ['B2', 'B3', 'B4','B5','B6','B7','B8', 'B8A','B11','B12']\n\nvar sentinel = ee.ImageCollection('COPERNICUS/S2_SR')\n .filter(ee.Filter.date(start, end))\n .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 10))\n .mean()\n .select(bands)\n\nvar s_rgb = {\n min: 0.0,\n max: 3000,\n bands:['B4', 'B3', 'B2'],\n opacity:1\n};\nWhen loading the Sentinel-2 imagery, I’ve also only selected the bands that we will ultimately use in our analysis. There are a number of other bands included in the data that we don’t need. I’ve omitted a few bands (B1, B9, B10) because they’re collected at a much lower spatial resolution (60 meters) compared to the other bands.\nA couple of types of landcover are so readily identifiable that we can remove them with thresholds. Water and vegetation both have spectral indices; we looked at NDVI above, but there’s a similar one for water called NDWI. These can be calculated from Sentinel-2 imagery as follows:\nvar ndvi=sentinel.normalizedDifference(['B8','B4'])\n .select(['nd'],['ndvi'])\n\nvar ndwi=sentinel.normalizedDifference(['B3','B8'])\n .select(['nd'],['ndwi'])\nWe use the normalizedDifference function and specify which bands we want to use for each index. NDVI uses the red and near infrared bands (B4 and B8), while NDWI uses the green and near infrared bands (B3 and B8). Finally, we want to rename the resulting band from ‘nd’ to the name of the spectral index.\nNow we can use these indices to filter out water and vegetation. We do this using the updateMask function, and specify that we want to remove areas that have an NDVI value higher than 0.2 and an NDWI value higher than 0.3. 
You can play around with these thresholds until you achieve the desired results.\n\nvar image=sentinel.updateMask(ndwi.lt(0.3))\n .updateMask(ndvi.lt(0.2))\n .addBands(ndvi)\n .select(bands)\nWe also want to only select bands that are relevant to our analysis; Sentinel\nFinally, let’s clip the image to our Area of Interest (AOI) and add it to the map using the visualization parameters we defined earlier.\nMap.addLayer(image.clip(AOI), s_rgb, 'Sentinel');\n\n\n\nWater and vegetation have been removed from this Sentinel-2 image. What remains is largely fallow agricultural land, urban areas, and oil spills.\n\n\nNow that we’ve loaded and preprocessed our satellite imagery, we can proceed with the rest of our task. Ultimately, we want to create a map of the study area which shows us different “landcovers” (materials). This can broadly be achieved in three steps:\n\nGenerate labeled landcover data\nTrain a model using labeled data\nValidate the model"
},
{
"objectID": "RojavaRefineries.html#generating-labeled-data",
"href": "RojavaRefineries.html#generating-labeled-data",
"title": "Refinery Detection",
"section": "1. Generating Labeled Data",
"text": "1. Generating Labeled Data\nA vital step in any machine learning workflow is the generation of labeled data, which we will use to train a model to differentiate between different types of land cover and later to test the model’s accuracy. By looking around the study area, we can get a sense of the different land cover classes that we might encounter:\n\nAgricultural Land\nUrban Areas\nOil Contamination\n\nNaturally we could subdivide each of these into sub-categories, and there are probably other classes we haven’t included that may be present in the study area. The choice of classes is partly informed by the nature of the task at hand. In theory, the most efficient number of classes for this task would be two: oil, and everything else. The problem is that the “everything else” category would be pretty noisy since it would include a wide range of materials, making it harder to distinguish this from oil. In practice, a visual inspection of major landcover classes in the study area is a quick-and-dirty way of getting at roughly the right number of classes. This is also an iterative process: you can start with a set of labeled data, look at the model results, and adjust your sampling accordingly. More on this later.\nThe main landcover class we’re interested in is, of course, oil. Some oil contamination is readily visible from the high resolution satellite basemap; rivers of oil flow from the leaking Ger Zero refinery. We can draw polygons around the oil contamination like so:\n\nThe same process is applied to agricultural land and urban areas. In general, you want to make sure that you’re sampling from all across the study area. I’ve generated between 4 and 10 polygons per landcover class in different places. We’re now left with a featureCollection composed of polygons for each class. I’ve named them oil, agriculture, and urban.\nHowever, I don’t just want to use all of the pixels contained in these polygons for training. There are several reasons for this. 
First, it would likely lead to overfitting. Second, there are probably over a million pixels between all of the polygons, which would slow things down unnecessarily. Third, I haven’t drawn the polygons to be equal sizes across classes, so I could end up with way more points from one class compared to another. It’s OK to have some imbalance between classes, but you don’t want it to be extreme.\nAs such, the next step involves taking random samples of points from within these polygons. I do so using the randomPoints function:\nvar oil_points=ee.FeatureCollection.randomPoints(oil, 3000).map(function(i){\n return i.set({'class': 0})})\n \nvar urban_points=ee.FeatureCollection.randomPoints(urban, 1000).map(function(i){\n return i.set({'class': 1})})\n \nvar agriculture_points=ee.FeatureCollection.randomPoints(agriculture, 2000).map(function(i){\n return i.set({'class': 2})})\nIn the first line, I create a new featureCollection called oil_points which contains 3000 points sampled from the polygons in the oil featureCollection. I then map through each of these points, and set a property called “class” equal to 0. I do the same for the urban and agricultural areas, setting the “class” property of these featureCollections to 1 and 2, respectively. Ultimately, our model will output a raster in which each pixel will contain one of these three values. A value of 0 in the output will represent the model predicting that that pixel is oil, based on the training data; a value of 1 would indicate predicted urban land cover, and 2 predicted agricultural landcover.\nNow we want to create one feature collection called “sample”, which will contain all three sets of points.\nvar sample=ee.FeatureCollection([oil_points,\n urban_points,\n agriculture_points\n ])\n .flatten()\n .randomColumn();\nWe’ve also assigned a property called “random” using the randomColumn function. This lets us split our featureCollection into two: one used for training the model, and one used for validation. 
We’ll use a 70-30 split.\nvar split=0.7\nvar training_sample = sample.filter(ee.Filter.lt('random', split));\nvar validation_sample = sample.filter(ee.Filter.gte('random', split));"
},
{
"objectID": "RojavaRefineries.html#training-a-model",
"href": "RojavaRefineries.html#training-a-model",
"title": "Refinery Detection",
"section": "2. Training a Model",
"text": "2. Training a Model\nHaving generated labeled training and testing data, we now want to teach an algorithm to associate the pixels in those areas (in particular, their spectral profiles) with a specific landcover class.\nEach of the points we generated in the previous step contains a label (0: oil, 1: urban, 2: agriculture). However, the points do not yet contain any information about the spectral profile of the Sentinel-2 image. The sampleRegions function lets us assign the band values from an image as properties to our feature collection. We do this for both the training sample and the validation sample.\nvar training = image.sampleRegions({\n collection: training_sample,\n properties: ['class'],\n scale: 10,\n});\n\nvar validation = image.sampleRegions({\n collection: validation_sample,\n properties: ['class'],\n scale: 10\n});\nEach point in the featureCollections above will contain a property denoting each Sentinel-2 band’s value at that location, as well as the property denoting the class label.\nNow we’re ready to train the model. We’ll be using a Random Forest classifier, which basically works by trying to separate your data into the specified classes by setting lots of thresholds in your input properties (in our case, Sentinel-2 band values). It’s a versatile and widely-used model.\nWe first call a random forest classifier with 500 trees. More trees usually yield higher accuracy, though there are diminishing returns. Too many trees will result in your computation timing out. We then train the model using the train function, which we supply with the training data as well as the name of the property that contains our class labels (“class”).\nvar model = ee.Classifier.smileRandomForest(500)\n .train(training, 'class');\nThe trained model now associates Sentinel-2 band values with one of three landcover classes. 
We can now feed the model pixels it has never seen before, and it will use what it now knows about the spectral profiles of the different classes to predict the class of the new pixel.\nvar prediction = image.classify(model)\nprediction is now a raster which contains one of three values (0: oil, 1: urban, 2: agriculture). We’re only interested in oil, so let’s isolate the regions in this raster that have a value of 0, and add them in red to the map:\nvar oil_prediction=prediction.updateMask(prediction.eq(0))\n\nMap.addLayer(oil_prediction, {palette:'red'}, 'Predicted Oil Contamination')"
},
{
"objectID": "RojavaRefineries.html#validation",
"href": "RojavaRefineries.html#validation",
"title": "Refinery Detection",
"section": "3. Validation",
"text": "3. Validation\nThe image above should look somewhat familiar. It’s Ger Zero, where we trained part of our model. We can see in red the areas which the model predicts to be oil pollution. These largely align with the areas that we can see as being contaminated based on the high resolution basemap. It’s not perfect, but it’s pretty good.\nLet’s scroll to another area, far from where the model was trained. This image shows two clusters of makeshift refineries which were identified by the model. This is good, though we can only get so far by visually inspecting the output from our model. To get a better sense of our model’s performance, we can use the validation data that we generated previously. Remember, these are labeled points which our model was not trained on, and has never seen before.\nWe’ll take the validation featureCollection containing our labeled points, and have our model classify it.\nvar validated = validation.classify(model);\nNow the validated variable is a featureCollection which contains both manual labels and predicted labels from our model. We can compare the manual labels to the predicted output to get a sense of how well our model is performing. This is called a Confusion Matrix (or an Error Matrix):\nvar testAccuracy = validated.errorMatrix('class', 'classification');\n\nprint('Confusion Matrix ', testAccuracy);\n\n\n\n\n\n\n\n\n\n\n\n\n\nLabels\n\n\n\n\n\n\n\nOil\nUrban\nAgriculture\n\n\n\nOil\n876\n1\n5\n\n\nPrediction\nUrban\n0\n168\n8\n\n\n\nAgriculture\n1\n4\n514\n\n\n\nNow, we can see that of the 877 points that were labeled “oil”, only one was falsely predicted to be agricultural land. The model also falsely predicted as oil one point that was labeled urban, and five points that were labeled agriculture. Not bad. 
We can get a sense of the model’s overall accuracy using the accuracy function on the confusion matrix:\nprint('Validation overall accuracy: ', testAccuracy.accuracy())\nThis tells us that the overall accuracy of our model is around 98%. However, we shouldn’t take this estimate at face value. There are a number of complicated reasons (spatial autocorrelation in the training data, for example) why this figure is probably inflated. If we were submitting this analysis to a peer-reviewed journal, we’d take great care in addressing this, but for our purposes we can use the accuracy statistics to guide our analysis and get a rough sense of how well the model is performing.\nThis model isn’t perfect; it often misclassifies the shorelines of lakes as oil, or certain parts of urban areas. As previously mentioned, training a model is often an iterative process. At this stage, if your accuracy is not as high as you’d like it to be, you can use the output to figure out how to tweak the model. For example, you may observe that your model is confusing urban areas with oil spills. You can draw a polygon over the erroneous area, label it urban landcover, and retrain the model, thereby hopefully improving accuracy. We could further refine our model in this way."
}
]