Learn to ask about species interactions effectively using automation and network graph visualization
Yikang Li
What do birds prey on? What are the main parasites that host on humans? You can answer all these questions and more with the Global Biotic Interactions (GloBI) database.
To explore the GloBI database a bit further then last time, I decided to create some Python functions that make it easier for users to explore their favorite organism interactions. This post will be very useful if you are interested in exploring GloBI data efficiently using a Python Jupyter Notebook enviroment. I created a few functions that allow both visualizing how organisms interact with each other and by creating URL links to information on (species and groups of species) through Wikipedia directly in the Jupyter notebook.
One of the most efficient ways to view information like this is to use Network Graphs. Network graphs use information from both the link and node data sets to generate a graphical depiction of the network (1). Directed Network graphs give even more information by helping show how your data links direct different nodes. For GloBI data, the link is the interactions type, like “eats”, while the node is the species or organism you are interested in. This directed network graphs allow you to efficiently ask questions like “What are the top taxa that birds prey on?”
In this post I will:
- search for data by taxa name,
- find the top target taxa for which candidate organisms interact with,
- make your data output come alive with automation of URL links to Wikipedia
- create directed network visualizations
This tutorial assumes you know how to do basic to novice Python programming and know how to use Jupyter Notebook enviroments.
Import interaction data
We are going to skip ahead to loading the data, if you would like to know more about accessing GloBI’s data and learning a bit more about what the data is, please see the previous post.
If you would like to follow along on a Jupyter notebook, please download the notebook here: part2_globi_exploration.ipynb. To fully reproduce this analysis you will need the data file, which is 6.54GB. Download the interactions.tsv
file here: interactions.tsv.gz.
## Required Python packages
import pandas as pd
import pytaxize
import re
import matplotlib.pyplot as plt
# This takes a few minutes to load in.
data = pd.read_csv('../../interactions.tsv', delimiter='\t', encoding='utf-8')
len(data)
3729065
data.head()
sourceTaxonId | sourceTaxonIds | sourceTaxonName | sourceTaxonRank | sourceTaxonPathNames | sourceTaxonPathIds | sourceTaxonPathRankNames | sourceTaxonSpeciesName | sourceTaxonSpeciesId | sourceTaxonGenusName | ... | eventDateUnixEpoch | argumentTypeId | referenceCitation | referenceDoi | referenceUrl | sourceCitation | sourceNamespace | sourceArchiveURI | sourceDOI | sourceLastSeenAtUnixEpoch | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | EOL_V2:1056176 | EOL_V2:1056176 | WORMS:137208 | WD:Q301089 | O... | Lepidochelys kempii | species | Cheloniidae | Lepidochelys | Lepidochelys kempii | EOL:8123 | EOL:59582 | EOL_V2:1056176 | family | genus | species | Lepidochelys kempii | EOL_V2:1056176 | Lepidochelys | ... | 7.574076e+11 | https://en.wiktionary.org/wiki/support | Donna Shaver. 1998. Sea Turtle Strandings Alon... | NaN | NaN | http://gomexsi.tamucc.edu | GoMexSI/interaction-data | https://github.com/GoMexSI/interaction-data/ar... | NaN | 2019-02-12T23:05:38.038Z |
1 | EOL_V2:1056176 | EOL_V2:1056176 | WORMS:137208 | WD:Q301089 | O... | Lepidochelys kempii | species | Cheloniidae | Lepidochelys | Lepidochelys kempii | EOL:8123 | EOL:59582 | EOL_V2:1056176 | family | genus | species | Lepidochelys kempii | EOL_V2:1056176 | Lepidochelys | ... | 7.574076e+11 | https://en.wiktionary.org/wiki/support | Donna Shaver. 1998. Sea Turtle Strandings Alon... | NaN | NaN | http://gomexsi.tamucc.edu | GoMexSI/interaction-data | https://github.com/GoMexSI/interaction-data/ar... | NaN | 2019-02-12T23:05:38.038Z |
2 | EOL_V2:1056176 | EOL_V2:1056176 | WORMS:137208 | WD:Q301089 | O... | Lepidochelys kempii | species | Cheloniidae | Lepidochelys | Lepidochelys kempii | EOL:8123 | EOL:59582 | EOL_V2:1056176 | family | genus | species | Lepidochelys kempii | EOL_V2:1056176 | Lepidochelys | ... | 7.574076e+11 | https://en.wiktionary.org/wiki/support | Donna Shaver. 1998. Sea Turtle Strandings Alon... | NaN | NaN | http://gomexsi.tamucc.edu | GoMexSI/interaction-data | https://github.com/GoMexSI/interaction-data/ar... | NaN | 2019-02-12T23:05:38.038Z |
3 | EOL_V2:1056176 | EOL_V2:1056176 | WORMS:137208 | WD:Q301089 | O... | Lepidochelys kempii | species | Cheloniidae | Lepidochelys | Lepidochelys kempii | EOL:8123 | EOL:59582 | EOL_V2:1056176 | family | genus | species | Lepidochelys kempii | EOL_V2:1056176 | Lepidochelys | ... | 7.574076e+11 | https://en.wiktionary.org/wiki/support | Donna Shaver. 1998. Sea Turtle Strandings Alon... | NaN | NaN | http://gomexsi.tamucc.edu | GoMexSI/interaction-data | https://github.com/GoMexSI/interaction-data/ar... | NaN | 2019-02-12T23:05:38.038Z |
4 | EOL_V2:1056176 | EOL_V2:1056176 | WORMS:137208 | WD:Q301089 | O... | Lepidochelys kempii | species | Cheloniidae | Lepidochelys | Lepidochelys kempii | EOL:8123 | EOL:59582 | EOL_V2:1056176 | family | genus | species | Lepidochelys kempii | EOL_V2:1056176 | Lepidochelys | ... | 7.574076e+11 | https://en.wiktionary.org/wiki/support | Donna Shaver. 1998. Sea Turtle Strandings Alon... | NaN | NaN | http://gomexsi.tamucc.edu | GoMexSI/interaction-data | https://github.com/GoMexSI/interaction-data/ar... | NaN | 2019-02-12T23:05:38.038Z |
5 rows × 80 columns
# Checking out all the interaction types
data['interactionTypeName'].unique()
array(['eats', 'interactsWith', 'pollinates', 'parasiteOf', 'preysOn',
'pathogenOf', 'visitsFlowersOf', 'dispersalVectorOf', 'adjacentTo',
'endoparasitoidOf', 'symbiontOf', 'endoparasiteOf', 'hasVector',
'ectoParasiteOf', 'vectorOf', 'livesOn', 'livesNear',
'parasitoidOf', 'guestOf', 'livesInsideOf', 'farms',
'ectoParasitoid', 'inhabits', 'kills', 'hasDispersalVector',
'livesUnder', 'kleptoparasiteOf', 'hasHost', 'eatenBy',
'flowersVisitedBy', 'hasParasite', 'preyedUponBy', 'pollinatedBy',
'hostOf', 'visits', 'commensalistOf', 'hasPathogen'], dtype=object)
Drop duplicates
My goal is to look at the different types of interaction data in the dataset and build network visualizations from this information, therefore I am only really interested in unique cases of interaction. So in this next step let’s drop the data that isn’t unique in these three columns.
data.drop_duplicates(['sourceTaxonId', 'interactionTypeName', 'targetTaxonId'], inplace = True)
## We dropped from
len(data)
967624
Search for data by taxa name
For example, suppose we are interested in the interactions involving one of the weirder speices on this planet ‘Homo sapiens’.
# What are all the types of interactions involving Homo sapiens as sourceTaxon?
data[data['sourceTaxonName'] == 'Homo sapiens']['interactionTypeName'].unique()
array(['interactsWith', 'eats', 'hostOf'], dtype=object)
# Number of records of interactions involving Homo sapiens as sourceTaxon?
len(data[data['sourceTaxonName'] == 'Homo sapiens'])
666
Now let’s focus on a certain type of interaction involving the sourceTaxon “Homo sapiens”, for example, “eats”.
hs_eats_data = data[(data['sourceTaxonName'] == 'Homo sapiens') & (data['interactionTypeName'] == 'eats')]
hs_eats_data.head()
sourceTaxonId | sourceTaxonIds | sourceTaxonName | sourceTaxonRank | sourceTaxonPathNames | sourceTaxonPathIds | sourceTaxonPathRankNames | sourceTaxonSpeciesName | sourceTaxonSpeciesId | sourceTaxonGenusName | ... | eventDateUnixEpoch | argumentTypeId | referenceCitation | referenceDoi | referenceUrl | sourceCitation | sourceNamespace | sourceArchiveURI | sourceDOI | sourceLastSeenAtUnixEpoch | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
525804 | EOL:327955 | EOL:327955 | INAT_TAXON:43584 | NBN:NHMSYS0000... | Homo sapiens | species | Animalia | Chordata | Mammalia | Primates | Ho... | EOL:1 | EOL:694 | EOL:1642 | EOL:1645 | EOL:16... | kingdom | phylum | class | order | family | ge... | Homo sapiens | EOL:327955 | Homo | ... | NaN | https://en.wiktionary.org/wiki/support | Worthington, A. 1989. Adaptations for avian fr... | 10.1007/BF00379040. | NaN | F. Gabriel. Muñoz. 2017. Palm-Animal frugivore... | fgabriel1891/Plant-Frugivore-Interactions-Sout... | https://github.com/fgabriel1891/Plant-Frugivor... | NaN | 2019-02-12T23:08:35.599Z |
527097 | EOL:327955 | EOL:327955 | INAT_TAXON:43584 | NBN:NHMSYS0000... | Homo sapiens | species | Animalia | Chordata | Mammalia | Primates | Ho... | EOL:1 | EOL:694 | EOL:1642 | EOL:1645 | EOL:16... | kingdom | phylum | class | order | family | ge... | Homo sapiens | EOL:327955 | Homo | ... | NaN | https://en.wiktionary.org/wiki/support | Hazarika, T.k. Lalramchuana. Nautiyal. B.P. 20... | 10.1007/s10722-012-9799-5 | NaN | F. Gabriel. Muñoz. 2017. Palm-Animal frugivore... | fgabriel1891/Plant-Frugivore-Interactions-Sout... | https://github.com/fgabriel1891/Plant-Frugivor... | NaN | 2019-02-12T23:08:35.599Z |
527098 | EOL:327955 | EOL:327955 | INAT_TAXON:43584 | NBN:NHMSYS0000... | Homo sapiens | species | Animalia | Chordata | Mammalia | Primates | Ho... | EOL:1 | EOL:694 | EOL:1642 | EOL:1645 | EOL:16... | kingdom | phylum | class | order | family | ge... | Homo sapiens | EOL:327955 | Homo | ... | NaN | https://en.wiktionary.org/wiki/support | Hazarika, T.k. Lalramchuana. Nautiyal. B.P. 20... | 10.1007/s10722-012-9799-5 | NaN | F. Gabriel. Muñoz. 2017. Palm-Animal frugivore... | fgabriel1891/Plant-Frugivore-Interactions-Sout... | https://github.com/fgabriel1891/Plant-Frugivor... | NaN | 2019-02-12T23:08:35.599Z |
527099 | EOL:327955 | EOL:327955 | INAT_TAXON:43584 | NBN:NHMSYS0000... | Homo sapiens | species | Animalia | Chordata | Mammalia | Primates | Ho... | EOL:1 | EOL:694 | EOL:1642 | EOL:1645 | EOL:16... | kingdom | phylum | class | order | family | ge... | Homo sapiens | EOL:327955 | Homo | ... | NaN | https://en.wiktionary.org/wiki/support | Hazarika, T.k. Lalramchuana. Nautiyal. B.P. 20... | 10.1007/s10722-012-9799-5 | NaN | F. Gabriel. Muñoz. 2017. Palm-Animal frugivore... | fgabriel1891/Plant-Frugivore-Interactions-Sout... | https://github.com/fgabriel1891/Plant-Frugivor... | NaN | 2019-02-12T23:08:35.599Z |
527100 | EOL:327955 | EOL:327955 | INAT_TAXON:43584 | NBN:NHMSYS0000... | Homo sapiens | species | Animalia | Chordata | Mammalia | Primates | Ho... | EOL:1 | EOL:694 | EOL:1642 | EOL:1645 | EOL:16... | kingdom | phylum | class | order | family | ge... | Homo sapiens | EOL:327955 | Homo | ... | NaN | https://en.wiktionary.org/wiki/support | Hazarika, T.k. Lalramchuana. Nautiyal. B.P. 20... | 10.1007/s10722-012-9799-5 | NaN | F. Gabriel. Muñoz. 2017. Palm-Animal frugivore... | fgabriel1891/Plant-Frugivore-Interactions-Sout... | https://github.com/fgabriel1891/Plant-Frugivor... | NaN | 2019-02-12T23:08:35.599Z |
5 rows × 80 columns
len(hs_eats_data)
378
I’m going to clean up this table a bit. The code below is first selecting which columns I want to keep and droping data with missing values from ‘targetTaxonId’,’targetTaxonName’,’targetTaxonPathNames’,’targetTaxonPathIds’.
target_hs_eats = hs_eats_data[['targetTaxonId',
'targetTaxonName','targetTaxonPathNames',
'targetTaxonPathIds', 'targetTaxonPathRankNames',
'targetTaxonSpeciesName', 'targetTaxonSpeciesId',
'targetTaxonGenusName', 'targetTaxonGenusId', 'targetTaxonFamilyName',
'targetTaxonFamilyId', 'targetTaxonOrderName', 'targetTaxonOrderId',
'targetTaxonClassName', 'targetTaxonClassId', 'targetTaxonPhylumName',
'targetTaxonPhylumId', 'targetTaxonKingdomName', 'targetTaxonKingdomId']].dropna(subset=['targetTaxonId',
'targetTaxonName','targetTaxonPathNames','targetTaxonPathIds'])
target_hs_eats.head()
targetTaxonId | targetTaxonName | targetTaxonPathNames | targetTaxonPathIds | targetTaxonPathRankNames | targetTaxonSpeciesName | targetTaxonSpeciesId | targetTaxonGenusName | targetTaxonGenusId | targetTaxonFamilyName | targetTaxonFamilyId | targetTaxonOrderName | targetTaxonOrderId | targetTaxonClassName | targetTaxonClassId | targetTaxonPhylumName | targetTaxonPhylumId | targetTaxonKingdomName | targetTaxonKingdomId | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
525804 | EOL_V2:1142757 | Hyphaene petersiana | Plantae | Tracheophyta | Liliopsida | Arecales... | EOL_V2:281 | EOL:4077 | EOL_V2:4074 | EOL:8192... | kingdom | phylum | class | order | family | ge... | Hyphaene petersiana | EOL_V2:1142757 | Hyphaene | EOL:29186 | Arecaceae | EOL:8193 | Arecales | EOL:8192 | Liliopsida | EOL_V2:4074 | Tracheophyta | EOL:4077 | Plantae | EOL_V2:281 |
527098 | EOL:2508660 | Syzygium cumini | Plantae | Tracheophyta | Magnoliopsida | Myrta... | EOL_V2:281 | EOL:4077 | EOL:283 | EOL:4328 | E... | kingdom | phylum | class | order | family | ge... | Syzygium cumini | EOL:2508660 | Syzygium | EOL_V2:2508658 | Myrtaceae | EOL:8095 | Myrtales | EOL:4328 | Magnoliopsida | EOL:283 | Tracheophyta | EOL:4077 | Plantae | EOL_V2:281 |
527099 | EOL:4263 | Styracaceae | Plantae | Tracheophyta | Magnoliopsida | Erica... | EOL_V2:281 | EOL:4077 | EOL:283 | EOL:4186 | E... | kingdom | phylum | class | order | family | NaN | NaN | NaN | NaN | Styracaceae | EOL:4263 | Ericales | EOL:4186 | Magnoliopsida | EOL:283 | Tracheophyta | EOL:4077 | Plantae | EOL_V2:281 |
527100 | EOL_V2:2888768 | Spondias pinnata | Plantae | Tracheophyta | Magnoliopsida | Sapin... | EOL_V2:281 | EOL:4077 | EOL:283 | EOL:4311 | E... | kingdom | phylum | class | order | family | ge... | Spondias pinnata | EOL_V2:2888768 | Spondias | EOL:61097 | Anacardiaceae | EOL:4410 | Sapindales | EOL:4311 | Magnoliopsida | EOL:283 | Tracheophyta | EOL:4077 | Plantae | EOL_V2:281 |
527101 | EOL:1082661 | Smilax ovalifolia | Plantae | Tracheophyta | Liliopsida | Liliales... | EOL_V2:281 | EOL:4077 | EOL_V2:4074 | EOL:4173... | kingdom | phylum | class | order | family | ge... | Smilax ovalifolia | EOL:1082661 | Smilax | EOL_V2:107257 | Smilacaceae | EOL:8171 | Liliales | EOL:4173 | Liliopsida | EOL_V2:4074 | Tracheophyta | EOL:4077 | Plantae | EOL_V2:281 |
len(target_hs_eats)
309
To get a brief overview of what type of data we have for Homo sapiens, we look at all the taxa that are associated with humans’ eating habits.
target_hs_eats.groupby(target_hs_eats['targetTaxonClassName']).size().sort_values(ascending = False)
targetTaxonClassName
Mammalia 102
Actinopterygii 53
Magnoliopsida 52
Aves 25
Bivalvia 19
Liliopsida 8
Malacostraca 7
Gastropoda 5
Elasmobranchii 4
Reptilia 4
Ascidiacea 3
Insecta 3
Anthozoa 2
Holothuroidea 2
Cephalopoda 2
Anopla 1
Bangiophyceae 1
Ulvophyceae 1
Chondrichthyes 1
Chrysophyceae 1
Dothideomycetes 1
Teleostei 1
Phaeophyceae 1
Echinoidea 1
dtype: int64
Find the top target taxa for which candidate organisms interact with
Above all, we have found a list of top target classes of ‘Homo sapiens’ for the “eats” interaction type. But what if I wanted to look for the top target, not only in “eats”, but across any of the columns. For this I created a function, ‘find_top_target’, that could get a list of any rank for any source taxon and any interaction type.
def find_top_target(source, interaction_type, rank):
""" Function that takes inputs of interests and finds corresponding top targets.
Args:
source: the source taxon that we are interested in, can be in any level.
interaction_type: the interaction type that we are interested in,
should be consistent with the names of interaction types from tsv.file.
rank: the level of target taxon that we are interested in,
should be consistent with the column names of tsv.file, such as 'targetTaxonFamilyName', 'targetTaxonOrderName',
'targetTaxonClassName'...
Returns:
The top target taxons in certain rank for certain source taxon and certain interaction type,
in descending order of number of records.
"""
d = data[data['sourceTaxonName'] == source]
interacts_d = d[d['interactionTypeName'] == interaction_type]
interacts_d_cleaned = interacts_d[['targetTaxonId',
'targetTaxonName','targetTaxonPathNames',
'targetTaxonPathIds', 'targetTaxonPathRankNames',
'targetTaxonSpeciesName', 'targetTaxonSpeciesId',
'targetTaxonGenusName', 'targetTaxonGenusId', 'targetTaxonFamilyName',
'targetTaxonFamilyId', 'targetTaxonOrderName', 'targetTaxonOrderId',
'targetTaxonClassName', 'targetTaxonClassId', 'targetTaxonPhylumName',
'targetTaxonPhylumId', 'targetTaxonKingdomName', 'targetTaxonKingdomId']].dropna(subset=['targetTaxonId',
'targetTaxonName','targetTaxonPathNames','targetTaxonPathIds'])
return interacts_d_cleaned.groupby(interacts_d_cleaned[rank]).size().sort_values(ascending = False)
Here are a few examples of the function at work.
# Find top target taxons in Class for homo sapiens with interaction type 'eats'
find_top_target('Homo sapiens', 'interactsWith', 'targetTaxonClassName')
targetTaxonClassName
Mammalia 109
Actinopterygii 42
Insecta 20
Arachnida 12
Aves 12
Magnoliopsida 8
Liliopsida 8
Eurotiomycetes 7
Reptilia 4
Bivalvia 4
Cestoda 4
Elasmobranchii 3
Malacostraca 3
Tremellomycetes 2
Dothideomycetes 2
Cephalopoda 2
Agaricomycetes 2
Gastropoda 1
Echinoidea 1
Conoidasida 1
Coccidia 1
Chondrichthyes 1
Incertae 1
Polyplacophora 1
Secernentea 1
Zoomastigophora 1
dtype: int64
#Find top target taxons in Family for homo sapiens with interaction type 'hostOf'
find_top_target('Homo sapiens', 'hostOf', 'targetTaxonFamilyName')
targetTaxonFamilyName
Ixodidae 11
Diphyllobothriidae 4
Rhopalopsyllidae 3
Pulicidae 3
Trombiculidae 1
Taeniidae 1
Pediculidae 1
Oxyuridae 1
Echinorhynchidae 1
dtype: int64
Instead of inputting a source species, what if we input a source in other levels like class or family?
# Find top target taxons in Class for Actinopterygii with interaction type 'preysOn'
find_top_target('Actinopterygii', 'preysOn', 'targetTaxonClassName')
targetTaxonClassName
Actinopterygii 7
Cephalopoda 1
dtype: int64
Here, the source ‘Actinopterygii’ itself is in the class level. And we can see that the top target class of ‘Actinopterygii’ preys on is also ‘Actinopterygii’, which means the species under ‘Actinopterygii’ always preys on species under same the same class. But what is Actinopterygii?
Make your data output come alive with automation of URL links to Wikipedia
If you are like me you have been copying and pasting these species and taxon names and Googling them to find out what the hell they are. I learned that Actinopterygii is fish, which makes sense, especially because the lead contributor to GloBI is Fishbase which might skew these results a bit. Also, if you are like me, you have gotten sick of all the copying and pasting, so I created a tool that did that for me. The function below allows us to link the results of my top targets with their associated Wikipedia pages.
Warning: This function only works if you are using Jupyter Notebooks!
def make_clickable_both(val):
name, url = val.split('#')
return f'<a href="{url}">{name}</a>'
def top_targets_with_wiki(source, interaction_type, rank):
""" Function that takes inputs of interests and finds corresponding top targets linked to their Wikipedia pages.
Args:
source: the source taxon that we are interested in, can be in any level.
interaction_type: the interaction type that we are interested in,
should be consistent with the names of interaction types from tsv.file.
rank: the level of target taxon that we are interested in,
should be consistent with the column names of tsv.file, such as 'targetTaxonFamilyName', 'targetTaxonOrderName',
'targetTaxonClassName'...
Returns:
The top target taxons in certain rank with clickable Wikipedia links for certain source taxon and certain interaction type,
in descending order of number of records.
"""
top_targets = find_top_target(source, interaction_type, rank)
target_df = pd.DataFrame(top_targets)
target_df.columns = ['count']
urls = dict(name= list(target_df.index),
url= ['https://en.wikipedia.org/wiki/' + str(i) for i in list(target_df.index)])
target_df.index = [i + '#' + j for i,j in zip(urls['name'], urls['url'])]
index_list = list(target_df.index)
target_df.index =[make_clickable_both(i) for i in index_list]
df = target_df.style.format({'wiki': make_clickable_both})
return df
Examples Using the top_targets_with_links
Function
Dont’ Forget: For this function to work correctly you must be using Jupyter notebooks.
What do short tail bats eat?
This first example we are asking to give the results of all taxons that are eaten by ‘Carollia perspicillata’, the short tailed bat.
top_targets_with_wiki('Carollia perspicillata', 'eats', 'targetTaxonClassName')
count | |
---|---|
Magnoliopsida | 40 |
Liliopsida | 3 |
What are Humans the hosts of?
Use the top_targets_with_wiki()
function and click the result links to really creep yourself out!
top_targets_with_wiki('Homo sapiens', 'hostOf', 'targetTaxonFamilyName')
count | |
---|---|
Ixodidae | 11 |
Diphyllobothriidae | 4 |
Rhopalopsyllidae | 3 |
Pulicidae | 3 |
Trombiculidae | 1 |
Taeniidae | 1 |
Pediculidae | 1 |
Oxyuridae | 1 |
Echinorhynchidae | 1 |
What do fish prey on?
top_targets_with_wiki('Actinopterygii', 'preysOn', 'targetTaxonClassName')
count | |
---|---|
Actinopterygii | 7 |
Cephalopoda | 1 |
Using the top_targets_with_wiki()
function makes exploring GloBi data really fun! Try it on some species you are interested in!
Visualize GloBI Data by Building Directed Graphs
The most obvious way to look at this type of data is through network visualizations. For this I used the networkx Python package. Although there are many different ways in which you can visualize networks, I found this package the easiest to work with.
First I created a function that plots the results from the find_top_target
function I created earlier. This plot_interaction
function inputs the same arguments with one additional argument to allow specifying how many you would like to include in the network.
## you need the networx package
import networkx as nx
def plot_interaction(source, interaction_type, rank, n = None):
""" Function that plots directed graphs of results from 'find_top_target'.
Args:
source: the source taxon that we are interested in, can be in any level.
interaction_type: one interaction type or a list of interaction types that we are interested in,
should be consistent with the names of interaction types from tsv.file.
rank: the level of target taxon that we are interested in,
should be consistent with the column names of tsv.file, such as 'targetTaxonFamilyName', 'targetTaxonOrderName',
'targetTaxonClassName'...
n: select first n top targets to plot, default to plot all top targets.
Returns:
A directed graph containing information of the source and target taxons, interaction_type
"""
G = nx.DiGraph()
if not isinstance(interaction_type, list):
interaction_type = [interaction_type]
for interaction in interaction_type:
if n:
top_targets = find_top_target(source, interaction, rank)[: n]
else:
top_targets = find_top_target(source, interaction, rank)
for name in ([source]+ list(top_targets.index)):
G.add_node(name)
for target in top_targets.index:
G.add_edge(source, target, label = interaction)
plt.figure(figsize=(8,8))
edge_labels = nx.get_edge_attributes(G,'label')
pos = nx.spring_layout(G)
nx.draw_networkx_edge_labels(G,pos, edge_labels = edge_labels, font_size=15, font_color='orange')
nx.draw_networkx(G, pos, with_labels=True, node_size=1500, node_color="skyblue", alpha= 1, arrows=True,
linewidths=1, font_color="grey", font_size=15, style = 'dashed')
plt.axis('off')
plt.tight_layout()
plt.show()
#interaction plot of top 5 target classes that Homo sapiens eats:
plot_interaction('Homo sapiens', 'eats', 'targetTaxonClassName', 5)
We can see that Mammalia, Magnoliopsida, Actinopterygii, Aves and Bivalvia are top 5 target class that Homo sapiens eats. For me, it is surprising to see Magnoliopsida, which is a valid botanical name for a class of flowering plants.
#without indicating n, interaction plot of all target classes that Homo sapiens eats:
plot_interaction('Homo sapiens', 'eats', 'targetTaxonClassName')
#interaction plot of top 5 families that Homo sapiens are host of :
plot_interaction('Homo sapiens', 'hostOf', 'targetTaxonFamilyName', 5)
#interaction plot of top 5 classes that Aves (birds!) preys on :
plot_interaction('Aves', 'preysOn', 'targetTaxonClassName', 5)
It’s really fun to see what birds prey on! They have a wide range of groups of species that they prey on! Insects, mammals, fish, amphibians, and cephlapods. Feel free to use this function to explore more species within this data. I hope this makes it easier for other to explore this amazing data.
Below are examples of ways to map more than one interaction type. The first example shows all the top five families in which Humans are a host of, then below is the top five families of species with both the ‘host of’ and ‘eats’ interation types
#interaction plot of top 5 families that Homo sapiens eats and top 5 families that Homo sapiens are host of :
plot_interaction('Homo sapiens', ['hostOf'], 'targetTaxonFamilyName', 5)
#interaction plot of top 5 orders that Homo sapiens eats and top 5 orders that Homo sapiens are host of :
plot_interaction('Homo sapiens', ['eats', 'hostOf'], 'targetTaxonOrderName', 5)
Conclusion: What I learned while working with GloBi
I really enjoyed my time working with the data in GloBI. GloBI speciallization in species interactions makes it different from other databases I explored. It makes connections between different species rather than focusing on one species at a time, which allows exploring interesting characteristics such as species interaction networks. If I have time in the future, I would like to explore the map of interactions between species by connecting GloBI to other databases which contains information like location. I expect interesting patterns to be found when connecting species interaction types to a geographic map. How does location affect interactions between species? I would love to see someone look into this!
Through the overall data exploration, I learned how to make something into clickable URLs, which was incredibly helpful to understanding what my data meant. I also gained experience on drawing directed graph using “networkx” with Python, which I already see the application of using Networks for other types of data beyond GloBI. In the past, I had little background knowledge in ecology, but after exploring GloBI, I have gained (although superficial) an understanding of how one could explore ecology on this planet.
When working with the GloBI database, I got the oppurtunity to discuss the GloBI database architecture with one of the main contibutors of GloBi, Jorrit Poelen. During the discussion with Jorrit, we talked about where the limitations come from, how to keep track of different versions and how to effectively connect data sources(museums), database and users. What is more, I also learned about API queries, cloud storage, and version control behind the GloBI database.
Overall, it has been an enjoyable research, through which I have obtained a lot of new knowledge on database management and Ecology. It also enriched my experiences performing statistical analysis of research questions, data exploration, and visualization. Overall, I gained confidence in data analysis and am more comfortable with collaborative coding and creating reproducible analysis methods.
This project was performed during my last semester at UC Berkeley and I am excited to do more data science work with any interesting data I can find!