Biosecurity Alerts
GBIF Data Use Club Seminar 2024

Callum Waite
Erin Roger
Shandiya Balasubramaniam

We acknowledge the Traditional Owners of the lands on which we live and work, and pay our respects to Elders past and present. We recognise the spiritual and cultural significance of land, water, and all that is in the environment to Traditional Owners, and their continuing connection to Country.

The Atlas of Living Australia (ALA)

• One of several national research infrastructure facilities funded by the Australian Government

• Over 850 data providers & weekly ingest of datasets

• Citizen science is our fastest growing data source

Brief history of introduced species in Australia

© Michael Hains (CC) BY-NC

• 5,000-10,000 years ago

• Dingo (Canis familiaris ssp. dingo)

Colonisation of Australia: 1788-1800

Expansion & acclimatisation: 1800s

Trade & globalisation: 1900-today

Invasive species in the ALA

ALA hosts 2300+ introduced species & 1.9+ million occurrences of pests, weeds, and diseases


Red-eared slider
236 occurrences
© Laurent Lebois (CC) BY-NC some rights reserved


Lantana
36,729 occurrences
© Andrew Kinsela

{koel} facilitates the process of searching for taxa within spatial and temporal constraints, summarising this information in a table, and sending the table as an email

Workflow

1. Ingest & process lists
2. Search for occurrences
3. Filter & download occurrences
4. Compile into a table & send email

From a list…

correct_name | provided_name | synonyms | common_name | state | lga | shape
Solenopsis invicta | Solenopsis invicta | NA | Red Imported Fire Ant | AUS | NA | NA
Austropuccinia psidii | Austropuccinia psidii | Uredo rangelii | Myrtle Rust | QLD | NA | NA
Psittacula krameri | Psittacula krameri | NA | Indian ringneck parrot | VIC, TAS | NA | NA
Leucanthemum vulgare | Leucanthemum vulgare | Chrysanthemum leucanthemum | Ox-Eye Daisy | NA | Darwin Municipality | NA
Anoplophora | Anoplophora spp. | NA | Exotic Longhorn Beetles | NA | City of Marion, City of Holdfast Bay | NA
Rhinella marina | Rhinella marina (Linnaeus, 1758) | Bufo marinus, Rana marina | Cane Toad | NA | NA | QLD_Protected_areas
Erica lusitanica | Erica lusitanica | NA | Spanish Heath | VIC | Lithgow City Council | NA
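
Several cells in the list hold comma-separated values (for example `VIC, TAS`) or `NA`. As a minimal sketch, assuming the list has been read into a data frame with the column names shown, a helper like the one below turns one cell into a character vector. `split_field()` is a hypothetical name for illustration and is not taken from {koel}.

```r
# Hypothetical helper: split a comma-separated cell into a character vector.
# A missing cell (NA, or the literal string "NA") becomes an empty vector.
split_field <- function(cell) {
  if (is.na(cell) || cell == "NA") {
    return(character(0))
  }
  trimws(strsplit(cell, ",", fixed = TRUE)[[1]])
}

# Two rows copied from the list (only the columns needed here)
alerts <- data.frame(
  correct_name = c("Psittacula krameri", "Anoplophora"),
  state = c("VIC, TAS", NA),
  lga = c(NA, "City of Marion, City of Holdfast Bay"),
  stringsAsFactors = FALSE
)

# One character vector of jurisdictions per list row
states <- lapply(alerts$state, split_field)
lgas <- lapply(alerts$lga, split_field)
```

Keeping each jurisdiction as its own element makes it straightforward to compare a record's state or LGA against the list with `%in%`.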

… to an email

Complexities in coding

  • Taxonomic
  • Temporal
  • Spatial

Taxonomic challenges
Cleaning provided taxon names

library(stringr)  # provides str_squish()

# The native pipe |> has no `.` placeholder; use `_` with a named
# argument instead (requires R >= 4.2)
clean_names <- function(name) {
  cleaned_name <- name |>
    gsub("\u00A0", " ", x = _) |>      # replace non-breaking spaces (NBSP)
    gsub("\u200B", " ", x = _) |>      # replace zero-width spaces (ZWSP)
    gsub("\n", " ", x = _) |>          # replace line breaks with spaces
    gsub(";", ",", x = _) |>           # replace semi-colons with commas
    gsub(" ,", ",", x = _) |>          # remove spaces before commas
    gsub("\\s{2,}", " ", x = _) |>     # collapse multiple spaces
    gsub(",$", "", x = _) |>           # remove trailing commas
    gsub(" +$", "", x = _) |>          # remove trailing spaces
    gsub(",(\\w)", ", \\1", x = _) |>  # add spaces between commas and text
    gsub(" sp\\.", "", x = _) |>
    gsub(" spp\\.", "", x = _) |>      # remove spp. and sp. abbreviations
    str_squish()                       # trim and collapse remaining whitespace
  
  return(cleaned_name)
}
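
To see what these substitutions do on real input, here is a base-R sketch that applies the same patterns in the same order from a lookup table and runs them on two names from the example list. `cleaning_rules` and `clean_names_base()` are illustrative names, and `trimws()` with a whitespace collapse stands in for `str_squish()`, so this is an approximation rather than the function used in the talk.

```r
# Pattern/replacement pairs, applied top to bottom
cleaning_rules <- list(
  c("\u00A0", " "),      # non-breaking space
  c("\u200B", " "),      # zero-width space
  c("\n", " "),          # line breaks
  c(";", ","),           # semi-colons to commas
  c(" ,", ","),          # spaces before commas
  c("\\s{2,}", " "),     # runs of whitespace
  c(",$", ""),           # trailing commas
  c(" +$", ""),          # trailing spaces
  c(",(\\w)", ", \\1"),  # space after commas
  c(" sp\\.", ""),       # "sp." abbreviation
  c(" spp\\.", "")       # "spp." abbreviation
)

clean_names_base <- function(name) {
  for (rule in cleaning_rules) {
    name <- gsub(rule[1], rule[2], name)
  }
  # Stand-in for stringr::str_squish()
  trimws(gsub("\\s+", " ", name))
}

clean_names_base("Anoplophora spp.")          # "Anoplophora"
clean_names_base("Bufo marinus;Rana marina")  # "Bufo marinus, Rana marina"
```

Holding the rules in a table keeps each substitution on one line and makes it simple to add or reorder rules as new list formats turn up.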

Alerting on different ranks

library(galah)
library(dplyr)

# Taxonomic ranks (ALA fields) that a provided name may match on
fields <- c("genus", "species", "subspecies", "scientificName")

# `field` is one element of `fields`; the surrounding code that supplies it,
# `search_terms` and the date bounds is not shown on the slide
request_data() |>
  galah_filter(firstLoadedDate >= upload_date_start,
               firstLoadedDate <= upload_date_end,
               eventDate >= event_date_start,
               eventDate <= event_date_end,
               {{field}} == search_terms) |>
  galah_select(scientificName, vernacularName,
               genus, species, subspecies,
               decimalLatitude, decimalLongitude,
               cl22, cl10923, cl1048, cl966, cl21,
               firstLoadedDate, basisOfRecord,
               group = c("basic", "media")) |>
  collect() |>
  # record which rank matched and the value it matched on
  mutate(match = field,
         search_term = .data[[field]],
         across(-c(images, sounds, videos), as.character),
         across(c(images, sounds, videos), as.list))
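
The query above filters on a single `field`. One way to alert on several ranks is to repeat the search for each entry of `fields` and stack the results. The sketch below shows that pattern offline: a small made-up data frame stands in for the ALA response, and the hypothetical `match_on_field()` replaces the galah query, so nothing here calls the ALA.

```r
fields <- c("genus", "species", "subspecies", "scientificName")

# Made-up records standing in for downloaded occurrences
records <- data.frame(
  genus = c("Anoplophora", "Rhinella", "Solenopsis"),
  species = c("Anoplophora glabripennis", "Rhinella marina",
              "Solenopsis invicta"),
  subspecies = NA_character_,
  scientificName = c("Anoplophora glabripennis", "Rhinella marina",
                     "Solenopsis invicta"),
  stringsAsFactors = FALSE
)

# A genus-level name and a species-level name from an alert list
search_terms <- c("Anoplophora", "Rhinella marina")

# Hypothetical stand-in for one query: keep rows whose value at this
# rank is in the search terms, and record which rank matched
match_on_field <- function(field, records, search_terms) {
  hits <- records[records[[field]] %in% search_terms, , drop = FALSE]
  hits$match <- rep(field, nrow(hits))
  hits$search_term <- hits[[field]]
  hits
}

# Repeat for every rank and stack the results
matches <- do.call(
  rbind,
  lapply(fields, match_on_field,
         records = records, search_terms = search_terms)
)
```

In this toy data the cane toad record is returned twice, once via `species` and once via `scientificName`, so a step that removes duplicate records would likely be needed before building an alert table.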

Temporal challenges

request_data() |>
  galah_filter(
               # upload window: when the record was first loaded into the ALA
               firstLoadedDate >= upload_date_start,
               firstLoadedDate <= upload_date_end,
               # event window: when the observation itself was made
               eventDate >= event_date_start,
               eventDate <= event_date_end,
               {{field}} == search_terms) |>
  galah_select(scientificName, vernacularName,
               genus, species, subspecies,
               decimalLatitude, decimalLongitude,
               cl22, cl10923, cl1048, cl966, cl21,
               firstLoadedDate, basisOfRecord,
               group = c("basic", "media")) |>
  collect() |>
  mutate(match = field,
         search_term = .data[[field]],
         across(-c(images, sounds, videos), as.character),
         across(c(images, sounds, videos), as.list))
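
The four date bounds in the filter have to be computed before each run. A minimal base-R sketch of one way to build them is below. The window lengths (7 days for uploads, 28 days for events), the `alert_windows()` name and the ISO 8601 timestamp format are assumptions for illustration, not settings taken from {koel}.

```r
# Hypothetical helper: build upload and event date bounds for one run.
# Window lengths are illustrative defaults, not values from the talk.
alert_windows <- function(run_date, upload_days = 7, event_days = 28) {
  run_date <- as.Date(run_date)
  # Format a Date as a midnight UTC timestamp string
  stamp <- function(d) format(d, "%Y-%m-%dT00:00:00Z")
  list(
    upload_date_start = stamp(run_date - upload_days),
    upload_date_end = stamp(run_date),
    event_date_start = stamp(run_date - event_days),
    event_date_end = stamp(run_date)
  )
}

windows <- alert_windows("2024-05-15")
windows$upload_date_start  # "2024-05-08T00:00:00Z"
windows$event_date_start   # "2024-04-17T00:00:00Z"
```

Using two separate windows lets an alert pick up records that were uploaded recently while still requiring that the observation itself is reasonably fresh.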

Spatial challenges

correct_name | provided_name | synonyms | common_name | state | lga | shape
Solenopsis invicta | Solenopsis invicta | NA | Red Imported Fire Ant | AUS | NA | NA
Austropuccinia psidii | Austropuccinia psidii | Uredo rangelii | Myrtle Rust | QLD | NA | NA
Psittacula krameri | Psittacula krameri | NA | Indian ringneck parrot | VIC, TAS | NA | NA
Leucanthemum vulgare | Leucanthemum vulgare | Chrysanthemum leucanthemum | Ox-Eye Daisy | NA | Darwin Municipality | NA
Anoplophora | Anoplophora spp. | NA | Exotic Longhorn Beetles | NA | City of Marion, City of Holdfast Bay | NA
Rhinella marina | Rhinella marina (Linnaeus, 1758) | Bufo marinus, Rana marina | Cane Toad | NA | NA | QLD_Protected_areas
Erica lusitanica | Erica lusitanica | NA | Spanish Heath | VIC | Lithgow City Council | NA

Broader insights

Modular code

  • Supports changing functionality
  • Easy debugging
  • Consistent & neutral code design

occ_list <- species_records |>
  filter(!is.na(decimalLatitude) & !is.na(decimalLongitude)) |>
  identify_aus() |>
  identify_state() |>
  identify_shape(shapes_path = shapes_path) |>
  identify_lga() |>
  filter(state == "AUS" |
           (!is.na(state) & flagged_state) |
           (!is.na(lga) & flagged_lga) |
           (!is.na(shape) & flagged_shape)) |>
  select(-flagged_state, -flagged_lga, -flagged_shape) |>
  exclude_records() |>
  as_tibble()
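
The second `filter()` in this pipeline decides which records become alerts. As a base-R sketch of that condition on made-up rows: the column names follow the code above, but the data values and the reading of each `flagged_*` column as "this record falls inside an area named in the list" are assumptions.

```r
# Invented records as they might look after the identify_*() steps
records <- data.frame(
  species = c("Solenopsis invicta", "Austropuccinia psidii",
              "Austropuccinia psidii", "Rhinella marina"),
  state = c("AUS", "QLD", "QLD", NA),
  lga = NA_character_,
  shape = c(NA, NA, NA, "QLD_Protected_areas"),
  flagged_state = c(FALSE, TRUE, FALSE, FALSE),
  flagged_lga = FALSE,
  flagged_shape = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Same condition as the filter() call: keep Australia-wide lists, or
# records flagged for a listed state, LGA or shape
keep <- with(
  records,
  state == "AUS" |
    (!is.na(state) & flagged_state) |
    (!is.na(lga) & flagged_lga) |
    (!is.na(shape) & flagged_shape)
)

# dplyr::filter() drops rows where the condition is NA; which() does the
# same in base R
alerts <- records[which(keep), c("species", "state", "lga", "shape")]
```

Here the fire ant row is kept because its list is Australia-wide, the first myrtle rust row because it is flagged for QLD, and the cane toad row because it falls in a listed shape. The second myrtle rust row is dropped.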

Unit tests

  • Continually updating biological dataset
  • Choosing the right test data
  • Informs workflow design
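
Because the ALA ingests new data every week, a test that queries live occurrences can pass one day and fail the next. One common approach, sketched here with base R's `stopifnot()` rather than a testing framework, is to check each function against a small frozen fixture. `count_by_species()` and the fixture rows are invented for illustration and are not part of {koel}.

```r
# Hypothetical function under test: number of records per species
count_by_species <- function(records) {
  counts <- table(records$scientificName)
  data.frame(
    scientificName = names(counts),
    n = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Frozen fixture: fixed rows that never change, unlike a live query
fixture <- data.frame(
  scientificName = c("Rhinella marina", "Rhinella marina",
                     "Solenopsis invicta"),
  stringsAsFactors = FALSE
)

result <- count_by_species(fixture)

# Expectations hold on every run because the input is fixed
stopifnot(
  nrow(result) == 2,
  result$n[result$scientificName == "Rhinella marina"] == 2,
  result$n[result$scientificName == "Solenopsis invicta"] == 1
)
```

Writing functions so they accept a data frame, rather than fetching data themselves, is what makes this kind of fixture-based test possible.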


Toby Hudson, CC BY-SA 3.0

In 12 months…

  • From manual running of scripts to automation via GitHub Actions
  • In the process of transitioning to core ALA systems
  • From 2 lists & 2 users to 35 lists & 54 users
  • Demonstrated links to management and policy

There is no doubt that the biosecurity alert system has improved our statewide surveillance capability [in Queensland]. While we have only been using it for a short period we have already recorded several significant detections. I’ve been promoting the system at every opportunity.

- Steve Csurhes, Biosecurity Queensland


Slides: shandiya.quarto.pub/datauseclub2024
Code: github.com/shandiya/DataUseClub2024