Google Earth Engine - Plotting Fire Data on a Bubble Map

www.pexels.com/photo/forest-tree-fire-hot-50700/

Winifred Chung

My name is Winifred, I am a second-year student studying Computer Science at UC Berkeley. I was first exposed to data science one year ago and quickly realized just how applicable it is to various fields of study. After joining the Cabinet of Curiosity team, I learned more about existing natural history databases and collaborated with other team members to dive into and study these databases. I became particularly interested in the Google Earth Engine database and the vast amount of geophysical, climate, and imagery data it contained. Data visualization was also something I wanted to learn more about, which led me to exploring California wildfire data from the Earth Engine database and using Plotly to create a map that represented this data. In this post I will be explaining my workflow for accessing data from the Google Earth Engine, how I cleaned and subsetted this data, and how I made interactive databases with (Plot.ly)[https://plot.ly].

Acquiring Data From Google Earth Engine

Google Earth Engine hosts satellite imagery and other geospatial data. This website also functions as a tool that allows for analysis of these datasets, which are made available for global-scale data mining. This data is amazing - images of Earth may go back more than 40 years and are ingested on a daily basis. Datasets include the USGS/NASA Landsat catalog, precipitation data, elevation data, and many more. Earth Engine also allows users to upload their own data for analysis.

Earth Engine has been used in many interesting case studies, such as the Map of Life, as well as analysis of global forest cover change.

Downloading Data

Browse the Data Catalog to find a dataset that interests you. I picked one that contains data about fires. Notice the link listed under the “Dataset Provider” section.

FIRMS provides “active fire data”, which consists of data from the past 24 hours, 48 hours, and 7 days. However, because FIRMS data dates back to November 2000, I wanted to know how to download data older than 7 days.

Once you’ve reached the FIRMS dataset provider link, scroll down and select “Archive Download.” Then, click on “Create New Request.”

png

I wanted to narrow down the data to Southern California wildfires, so in the Download Request box, I decided to get data on a custom region and drew a polygon that roughly enclosed California. For fire data source, I chose VIIRS, which is an instrument used to detect fires and gather data. My date range of choice was September 1st 2017 to September 1st 2018, and I decided to download it as a .csv file. Finally, you must enter your email so that FIRMS can send the data to you. You should receive an email within a few days.

png

Once you receive the email, just download the .csv file, and you’re ready to go!

Using Plotly for Python

Plotly has a Python library that allows you to create interactive data visualizations online, such as line plots, scatter maps, heatmaps, and 3D network graphs. In this tutorial, we will create a bubble map that visualizes wildfire data in California (based on the FIRMS data we acquired).

I knew I wanted to visualize the data using a map, and after looking at many different graphing libraries, I decided to use Plotly. Plotly seemed to be the most user-friendly because the website had very thorough map tutorials and easy-to-follow documentation. Additionally, Plotly hosts graphs, which can be embedded anywhere. These graphs are saved in your account.

First, install Plotly’s Python package using

pip install plotly

Next, create an account and obtain your API key. This will be used to set your credentials.

Now, you’re ready to fire up Python and set up your credentials:

import plotly plotly.tools.set_credentials_file(username='your_username', api_key='your_api_key')

Once that is done, let’s import some libraries that we will use in this tutorial:

import plotly.plotly as py
import pandas as pd
from datascience import *
import numpy as np

#included for distribution figures.
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats

Using the FIRMS dataset

Our FIRMS dataset is quite big, so let’s narrow it down to just a few points on the map. I’d like to plot 5 bright, 5 moderately bright, and 5 dim wildfires in the dataset.

First, let’s load the .csv file into a table:

data= pd.read_csv('../data/SoCal_fires.csv')  
data.head()

	latitude	longitude	brightness	scan	track	acq_date	acq_time	satellite	instrument	confidence	version	bright_t31	frp
0	36.234	-118.430	321.9	1.0	1.0	2017-10-07	1845	Terra	MODIS	44	6.1	302.5	14.5
1	36.258	-118.421	319.4	4.7	2.0	2017-10-15	1756	Terra	MODIS	72	6.1	294.9	138.7
2	36.259	-118.425	318.2	4.7	2.0	2017-10-15	1756	Terra	MODIS	44	6.1	294.4	118.3
3	36.256	-118.420	345.1	1.1	1.0	2017-10-15	2112	Aqua	MODIS	92	6.1	298.3	50.0
4	36.272	-118.429	326.3	1.1	1.0	2017-10-16	1839	Terra	MODIS	71	6.1	293.0	23.1

Filtering the data

Now, let’s select our columns of interest. For the purposes of this tutorial, we care about the latitude, longitude, brightness, and dates of our data.

# list(data) # list all the column names
py_data = data[['latitude', 'longitude', 'brightness', 'acq_date']] #pandas
py_data.head()

	latitude	longitude	brightness	acq_date
0	36.234	-118.430	321.9	2017-10-07
1	36.258	-118.421	319.4	2017-10-15
2	36.259	-118.425	318.2	2017-10-15
3	36.256	-118.420	345.1	2017-10-15
4	36.272	-118.429	326.3	2017-10-16

Subsetting the data

Now, let’s subset and look at the distribution of the data. We probably wouldn’t expect there to be an equal amount of fires throughout the varying degrees of brightness; if that were the case, we’d have just as many small, dim fires as large, intense fires.

To start and learn about the data, I wanted to take just a subset of the data so it is easier to handle for visualization. Let’s start by looking at the distribution of fire intensities found in this dataset.

## Let's look at the distribution of the brightness.
%matplotlib inline

sns.set(color_codes=True)
sns.distplot(py_data['brightness'])

png

## Split into three quanitles, to ensure I get a subset of the data represenative of the fire intensity distribution
py_data['quantile'] = pd.qcut(py_data['brightness'], 3, labels=["dim", "moderate", "bright"]) 
py_data.head()

	latitude	longitude	brightness	acq_date	quantile
0	36.234	-118.430	321.9	2017-10-07	moderate
1	36.258	-118.421	319.4	2017-10-15	moderate
2	36.259	-118.425	318.2	2017-10-15	moderate
3	36.256	-118.420	345.1	2017-10-15	bright
4	36.272	-118.429	326.3	2017-10-16	moderate

sns.boxplot(data=py_data, x='quantile', y='brightness')
#sns.distplot(py_data['brightness'], hue='quantile')

png

As we can see, the dataset contains a lot more dim fires. Because there is an uneven distribution of fires per degree of brightness, we will randomly sample from each quantile. I will first split up the table by quantile, and then sample from each table.

#filter out the tables 
dim_data = py_data[py_data["quantile"] == "dim"]
moderate_data = py_data[py_data["quantile"] == "moderate"]
bright_data = py_data[py_data["quantile"] == "bright"]

#take 5 samples from the dim fires data
dim_sample = dim_data.sample(n=5)
moderate_sample = moderate_data.sample(n=5)
bright_sample = bright_data.sample(n=5)

Now that we have sampled from each quantile, let’s create a new table with all of the sampled data.

#append the dataframes to each other and sort by increasing brightness
sorted_samples = dim_sample.append(moderate_sample).append(bright_sample).sort_values("brightness")
sorted_samples

	latitude	longitude	brightness	acq_date	quantile
5373	36.261	-118.471	303.4	2017-10-09	dim
3321	35.883	-120.896	308.0	2018-01-26	dim
1675	36.905	-121.433	310.4	2018-01-16	dim
5508	36.132	-118.624	310.9	2017-09-10	dim
1447	36.908	-120.176	311.2	2018-03-25	dim
1785	34.536	-119.156	316.1	2017-12-17	moderate
2959	34.622	-119.468	318.1	2017-12-21	moderate
4759	34.231	-118.340	318.6	2017-09-02	moderate
5458	36.122	-118.714	318.7	2017-09-02	moderate
2130	34.392	-119.031	325.3	2017-12-05	moderate
5139	35.561	-119.488	344.0	2018-03-27	bright
4937	37.233	-120.743	370.0	2017-11-06	bright
2248	34.421	-119.494	380.2	2017-12-10	bright
2127	34.434	-119.047	413.1	2017-12-05	bright
5642	36.115	-118.653	441.0	2017-09-04	bright

#fix the indices
sorted_samples.index = range(15)

Plotting the data

Here’s where things get a bit tricky. Ultimately, we would like to plot a figure with a specific set of data and a specific layout. Plotly has many attributes to help us accomplish our goal, but how do we know which ones to use? Luckily, Plotly has a figure reference!

Let’s initialize our data first. We want to have different-colored bubbles on our bubble map so that we can differentiate between the three levels of brightness. Let’s create a list of colors to choose from, using their RGB values. I will use yellow for dim fires, orange for moderately bright fires, and red for bright fires.

colors = ["rgb(255,255,0)", "rgb(255,128,0)", "rgb(255,0,0)"] #yellow, orange, red

Next, we’ll create a list called limits, which contains three elements. Each element contains the beginning index and ending index of a brightness level. This list will allow us to map certain attributes to a specific range of data. For example, if we assign the first element of [colors] to the first element in [limits], the 5 brightest fires will appear as red bubbles on the map.

#we are grouping by brightness, where first five (0,5) are brightest, second group of five (5,10) are moderately bright, etc.
limits=[(0,5),(5,10),(10,15)] 

Now, it’s time to create a list that contains all the information that we want to represent on the map. We’ll call this list fires. In the end, we want this list to hold three dictionaries (one for each brightness level).

Since we have three brightness levels, we want to assign specific attributes to each of them. To do this, we will iterate through each of the groups and create a dictionary of attributes for each. If you are unclear about these attributes, refer to the figure reference!.

fires=[] #the data that we want to represent on the map
    
for i in range(len(limits)): #we want to iterate 3 times to create 3 traces on the map 
    #for the purpose of scaling the size of each bubble, we will divide by a factor and also scale this factor so we get a bigger
    #variation in size 
    if i == 0: 
        scale = 17*1.2
    elif i == 2: 
        scale = 17*0.8
    else: 
        scale = 17
    group = sorted_samples.loc[np.r_[limits[i][0]:limits[i][1]]]
    fire=dict(
        type = 'scattergeo', #the type of figure we want to create 
        locationmode = 'USA-states', 
        lon = group['longitude'],
        lat = group['latitude'],
         #sets the properties of the bubbles 
        marker=dict(
                #scale the size of the bubble; our bubble size is based on the brightness 
                size = group["brightness"].at[limits[i][0]]/scale, #difference in size is too subtle if we just use one number for scale
                color=colors[i], #the color of the bubbles in this group 
                line = dict(width=0.5, color='rgb(40,40,40)'), 
                sizemode='diameter'
                ),
        name='{0}-{1}'.format(limits[i][0], limits[i][1])) #legend labels
    fires.append(fire)

You can see what our fires dictionary looks like by just calling the data fires.

Next, we will dictate what our figure layout should be. The first part of this code essentially sets the stage for your visualization. The last line does the plotting.

layout=dict(
        title='Range of Wildfire Brightnesses in California from 09/01/2017 to 09/01/2018',
        showlegend=True,
        geo = dict(
            projection=dict( type='albers usa'), #provides the gray USA map 
            center=dict(lon=-116.4194, lat=34.9592), #centers the map on the middle of SoCal when you first create the map
            zoom=6, #Zoom factor of map when you create it 
            showland = True,
            landcolor = 'rgb(217, 217, 217)',
            subunitwidth=1,
            countrywidth=1,
            subunitcolor="rgb(255, 255, 255)",
            countrycolor="rgb(255, 255, 255)"
        ),
    )
fig = dict(data=fires, layout=layout) #our figure with fires data and the layout we want 

py.iplot(fig, validate=False, filename='SoCal-FIRMS-bubble-map') #plot the data! 

Add more data points!

Here, I am just doing the same steps but getting a total of 300 data points in the end and tweaking the map.

brightest100=bright_data.sample(n=100).sort_values('brightness')
brightest100.head()

	latitude	longitude	brightness	acq_date	quantile
5957	34.464	-119.618	334.5	2017-12-15	bright
2697	34.368	-119.364	334.6	2017-12-08	bright
1412	37.955	-120.030	334.7	2017-11-07	bright
844	35.288	-120.452	335.3	2017-10-09	bright
1738	34.434	-119.087	335.3	2017-12-05	bright

dimmest100=dim_data.sample(n=100).sort_values('brightness')
moderate100= moderate_data.sample(n=100).sort_values('brightness')
sorted_300=dimmest100.append(moderate100).append(brightest100)

I also wanted to label each bubble with the brightness value and the date of acquisition, so I first created a new “label” column in nthe sorted_300 table.

#create a new "label" column that combines the brightness and acquisition date, separated by a comma
sorted_300["label"] = sorted_300['brightness'].astype(str) + ', ' + sorted_300['acq_date'].astype(str)
sorted_300.head()

	latitude	longitude	brightness	acq_date	quantile	label
420	36.364	-119.084	300.2	2018-01-26	dim	300.2, 2018-01-26
4235	37.363	-120.500	300.2	2017-12-04	dim	300.2, 2017-12-04
2212	37.211	-119.408	300.3	2017-12-06	dim	300.3, 2017-12-06
3196	37.339	-120.703	300.7	2017-12-04	dim	300.7, 2017-12-04
5202	36.663	-119.968	301.4	2017-12-22	dim	301.4, 2017-12-22

#example of a single label 
sorted_300["label"].iloc[0]

'300.2, 2018-01-26'

Then, I added a “text” attribute to the fire dictionary and set it equal to a specific splice of the “labels300” array. I also changed the trace names in the legend by creating a new list of labels called “legend labels” and editing the “name” attribute in the fire dictionary.

limits300=[(0,100),(100,200),(200,300)]
legendlabels=['Dim', 'Moderate','Bright'] #new legend labels 
fires300=[] 
for i in range(len(limits)): 
    if i == 0: 
        scale = 17*1.2
    elif i == 2: 
        scale = 17*0.8
    else: 
        scale = 17
    #group=filtered_data300.take(range(limits300[i][0], limits300[i][1])) 
    group300 = sorted_300.loc[np.r_[limits300[i][0]:limits300[i][1]]]
    fire300=dict(
        type = 'scattergeo',
        locationmode = 'USA-states', 
        lon = group300['longitude'],
        lat = group300['latitude'],
        #text= labels300[range(limits300[i][0], limits300[i][1])],
        text = sorted_300["label"].iloc[np.r_[limits300[i][0]:limits300[i][1]]],
        location=['California'], 
        marker=dict(
                size= group300["brightness"].at[limits300[i][0]]/scale, 
                color=colors[i], 
                line = dict(width=0.5, color='rgb(40,40,40)'), 
                sizemode='diameter',
                opacity=0.7 #make the bubbles see-through
                ), 
        name=legendlabels[i]) 
    fires300.append(fire300)

Now it’s time to visualize everything!

layout300=dict(
        title='Range of Wildfire Brightnesses in California from 09/01/2017 to 09/01/2018',
        showlegend=True, 
        geo = dict(
            projection=dict( type='albers usa'), 
            center=dict(lon=-119.4179, lat=36.7783), 
            showland = True,
            landcolor = 'rgb(64,64,64)', #changed color of map to dark gray 
            subunitwidth=1,
            countrywidth=1,
            subunitcolor="rgb(255, 255, 255)",
            countrycolor="rgb(255, 255, 255)"
        ),
    )
fig2 = dict(data= fires300, layout= layout300)
py.iplot(fig2, validate=False, filename='FIRMS300-bubble-map') 

Using the CA Oct 2017- Apr 2018 dataset

The ca1718.csv file contians FIRMS data for Calfornia’s 2017-2018 fire season, which lasts from October to April. I performed the same actions as above, taking a total of 300 data points for the visualization.

data1718=pd.read_csv('../data/ca1718.csv')[['latitude', 'longitude', 'bright_ti4', 'acq_date']]
data1718.head()

	latitude	longitude	bright_ti4	acq_date
0	40.73124	-124.00435	295.3	2017-11-30
1	40.64605	-123.88713	336.2	2018-04-25
2	39.94927	-120.95713	326.1	2017-11-06
3	39.65937	-121.01781	327.8	2017-11-28
4	39.69487	-121.00845	331.8	2017-11-28

Subsetting the data, again

In order to properly subset that data. I want to understand the data distribution, that way I make sure that the subset I am making reflect the real distribution of the data.

## Let's lool at the distribution of the brightness.
%matplotlib inline

sns.set(color_codes=True)
sns.distplot(data1718['bright_ti4'])

png

# I split into quantiles and subset by that. 
data1718['quantile'] = pd.qcut(data1718['bright_ti4'], 3, labels=["dim", "moderate", "bright"])
data1718= data1718.sort_values('bright_ti4')
data1718.head()

	latitude	longitude	bright_ti4	acq_date	quantile
18447	35.39388	-120.07444	208.0	2017-11-23	dim
25410	35.57818	-119.25189	208.0	2018-01-25	dim
21099	34.33866	-118.35213	208.0	2017-12-06	dim
14952	34.26500	-118.34959	208.0	2017-12-05	dim
9585	34.51419	-119.56939	208.0	2017-12-16	dim

dim1718 = data1718[data1718['quantile'] == 'dim'].sample(100).sort_values('bright_ti4')
moderate1718 = data1718[data1718['quantile'] == 'moderate'].sample(100).sort_values('bright_ti4')
bright1718 = data1718[data1718['quantile'] == 'bright'].sample(100).sort_values('bright_ti4')

sorted_1718 = dim1718.append(moderate1718).append(bright1718)
sorted_1718.index = range(300)
sorted_1718.head()

	latitude	longitude	bright_ti4	acq_date	quantile
0	39.36066	-123.21239	208.0	2017-10-09	dim
1	38.81009	-122.92694	295.1	2017-10-12	dim
2	36.22182	-119.07079	295.4	2018-03-30	dim
3	34.31768	-118.51015	295.5	2018-02-24	dim
4	34.03887	-117.82152	295.5	2018-01-21	dim

#creating the "label" column
sorted_1718["label"] = sorted_1718['bright_ti4'].astype(str) + ', ' + sorted_1718['acq_date'].astype(str)
sorted_1718.head()

	latitude	longitude	bright_ti4	acq_date	quantile	label
0	39.36066	-123.21239	208.0	2017-10-09	dim	208.0, 2017-10-09
1	38.81009	-122.92694	295.1	2017-10-12	dim	295.1, 2017-10-12
2	36.22182	-119.07079	295.4	2018-03-30	dim	295.4, 2018-03-30
3	34.31768	-118.51015	295.5	2018-02-24	dim	295.5, 2018-02-24
4	34.03887	-117.82152	295.5	2018-01-21	dim	295.5, 2018-01-21

limits1718=[(0,100),(100,200),(200,300)]
legendlabels=['Bright','Moderate', 'Dim']
fires1718=[] 
for i in range(len(limits1718)): 
    if i == 0: 
        scale = 17*1.2
    elif i == 2: 
        scale = 17*0.9
    else: 
        scale = 17
    group1718 = sorted_1718.loc[np.r_[limits1718[i][0]:limits1718[i][1]]]
    fire1718=dict(
        type = 'scattergeo',
        locationmode = 'USA-states', 
        lon = group1718['longitude'],
        lat = group1718['latitude'],
        text = sorted_1718["label"].iloc[np.r_[limits1718[0][0]:limits1718[0][1]]],
        marker=dict(
                size= group1718["bright_ti4"].at[limits1718[i][0]]/scale, 
                color=colors[i], 
                line = dict(width=0.5, color='rgb(40,40,40)'), #the outline of each bubble 
                sizemode='diameter',
                opacity = 0.7
                ), 
        name=legendlabels[i]) 
    fires1718.append(fire1718)

layout1718=dict(
        title='Range of Wildfire Brightnesses in California from 10/01/2017 to 04/30/2018',
        showlegend=True,
        opacity=0.7,
        geo = dict(
            projection=dict( type='albers usa'), 
            center=dict(lon=-119.4179, lat=36.7783), 
            showland = True,
            landcolor = 'rgb(64,64,64)',
            subunitwidth=1,
            countrywidth=1,
            subunitcolor="rgb(255, 255, 255)",
            countrycolor="rgb(255, 255, 255)"
        ),
    )
fig = dict(data=fires1718, layout=layout1718)
py.iplot(fig, validate=False, filename='FIRMS1718-bubble-map') 

Conclusion

It was quite an experience learning how to translate data that I found on Google Earth Engine into a visual representation using Plotly. Plotly was quite difficult to figure out at first, but I was able to make a few tweaks that improved the graph a bit. It was interesting to see how a majority of wildfires occurred around Central California. Additionally, the majority of the worst fires during the 2017-18 wildfire season occurred in Northern California.

There are still a few things I’d like to tweak in this project. I spent a while trying to figure out how to get a gradient scale in the legend to see just how much brightness varies among these fires, but was unable to find a way. There are also multiple data points that represent the same fires, except the data were recorded on different days. It would be helpful to clean the data a bit more to ensure there are no repetitive points. This would also clean up the visual aspect of the map.

Overall, I was glad to have learned more about data visualization and the Plotly Python Library. Familiarizing myself with all the methods will definitely be helpful for future data exploration. I was also surprised by the amount of fire data there are on the internet, and I’m glad that Google Earth Engine provides an easy way to discover such datasets.