Analysing massive open human mobility data using {spanishoddata}, {duckdb}, and flowmaps

July 3, 2025

What will we do today?

  • explore data

  • discuss insights, ideas and limitations

  • potentially co-author a short workshop-report paper with some visualisations and insights (strict deadline: less than 2 weeks)

What do we know about human mobility?

Survey-based human mobility

European Labour Force Surveys

German Mobility Survey

Census-based human mobility

UK Census

Register-based human mobility

Netherlands Register Data

IoT GPS Data

The Volkswagen leak

Mobile Apps + Mobile Network GPS Data

App usage by Orange customers in France

Mobile phone data

Flowminder

Mobile phone data for humanitarian and development efforts in low- and middle-income countries

Flowminder Software

FlowKit - privacy preserving mobile phone data aggregation

Spanish Open Mobility Big Data

~ 5 years of daily hourly flows

Data by Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024)

Based on 13 million customers of Orange Spain, expanded to full population of Spain

Data interface

3500+ zones across Spain and beyond

Academic research

Effects of human mobility on dispersal of disease-transmitting mosquitoes

Uses Labour Force Survey data
(Lucati et al. 2022)

Builds on previous research using mobile phone based data
(Kotov, Bartumeus, and Palmer 2024)

Multi-MNO Project

Mobility data aggregation using unified methodology

![](media/mobility-data-slides/tf-mno-position-paper.png)

How it works

Multi-MNO Software

https://github.com/eurostat/multimno

Multi-MNO Outputs

What to expect?

spanishoddata R package

spanishoddata

access Spanish Open Mobility Big Data from R

spanishoddata - access open human mobility data

A package with many companions

spanishoddata use cases

Split work and non-work trips

spanishoddata use cases

Split into different trip types

Get the data with {spanishoddata}

Get the data

Time-consuming options

Download one by one?

Write your own XML parser?
  • Custom code to download and import multiple days
  • Variable names in Spanish
  • No guarantee of consistent variable types
  • Limited by available memory
  • Slow data processing (raw CSV data)

The fastest way

Use {spanishoddata} package

library(spanishoddata)
spod_set_data_dir("data")

od_data <- spod_get(
  type = "origin-destination",
  zones = "districts",
  dates = c(
    start = "2022-03-01",
    end = "2022-03-07"
  )
)

Get the data

library(spanishoddata)
spod_set_data_dir("data")

od_data <- spod_get(
  type = "origin-destination",
  zones = "districts",
  dates = c(
    start = "2022-01-01",
    end = "2022-01-04"
  )
)
library(dplyr)
glimpse(od_data)

Rows: ??
Columns: 20          
Database: DuckDB v1.2.1 [root@Darwin 24.4.0:R 4.5.0/:memory:]
$ date                        <date> 2022-01-04, 2022-01-04, 2
$ hour                        <int> 0, 0, 0, 1, 1, 3, 4, 4, 5,…
$ id_origin                   <fct> 01001, 01001, 01001, 01001
$ id_destination              <fct> 01009_AM, 01009_AM, 01009_…
$ distance                    <fct> 2-10, 2-10, 2-10, 2-10, 2-
$ activity_origin             <fct> home, frequent_activity, w…
$ activity_destination        <fct> frequent_activity, home, h…
$ study_possible_origin       <lgl> FALSE, FALSE, FALSE, FALSE…
$ study_possible_destination  <lgl> FALSE, FALSE, FALSE, FALSE…
$ residence_province_ine_code <fct> 01, 01, 01, 01, 01, 01, 01
$ residence_province_name     <fct> "Araba/Álava", "Araba/Álav…
$ income                      <fct> 10-15, >15, >15, >15, >15,…
$ age                         <fct> NA, NA, NA, NA, NA, NA, NA…
$ sex                         <fct> NA, NA, NA, NA, NA, NA, NA…
$ n_trips                     <dbl> 4.894, 1.779, 1.094, 1.094…
$ trips_total_length_km       <dbl> 27.966, 5.997, 4.081, 4.16…
$ year                        <int> 2022, 2022, 2022, 2022, 20…
$ month                       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ day                         <int> 4, 4, 4, 4, 4, 4, 4, 4, 4,…
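
The `Rows: ??` and `Database: DuckDB` lines in the output above suggest that `od_data` is a lazy database table rather than an in-memory data frame: dplyr verbs are translated to SQL, and nothing is loaded into R until `collect()` is called. A minimal sketch of that behaviour follows. It uses a made-up `toy_flows` table in an in-memory DuckDB database, not the real data, and assumes the {DBI}, {duckdb}, {dplyr} and {dbplyr} packages are installed.

```r
library(DBI)
library(dplyr)

# Hypothetical toy table standing in for the real flows; values are invented
toy_flows <- data.frame(
  id_origin = c("A", "A", "B", "B"),
  id_destination = c("B", "C", "A", "C"),
  hour = c(7L, 8L, 7L, 18L),
  n_trips = c(10.5, 3.2, 8.0, 1.3)
)

# In-memory DuckDB database holding the toy table
con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "toy_flows", toy_flows)

# Lazy reference: no rows are pulled into R at this point
lazy_flows <- tbl(con, "toy_flows")

# Still lazy: this only builds a query
morning_by_origin <- lazy_flows |>
  filter(hour < 12) |>
  group_by(id_origin) |>
  summarise(n_trips = sum(n_trips, na.rm = TRUE))

# Print the SQL that DuckDB will run
show_query(morning_by_origin)

# collect() executes the query and returns an ordinary tibble
result <- morning_by_origin |>
  collect() |>
  arrange(id_origin)

dbDisconnect(con, shutdown = TRUE)
```

Keeping the pipeline lazy until the final `collect()` means only the small aggregated result has to fit in R's memory.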

Big Data on a Small Laptop

DuckDB in Action

Imagine a typical laptop

DuckDB in Action

Filter and summary

library(dplyr)

od_data |>
  filter(
    year == 2022,
    month %in% c(2, 3, 4)
  ) |>
  summarise(
    n_trips = mean(n_trips)
  ) |>
  collect()

DuckDB in Action

Summary on full data set

library(dplyr)

od_data |>
  summarise(
    n_trips = mean(n_trips)
  ) |>
  collect()

DuckDB in Action

Summary over multiple groups on full data set

library(dplyr)

od_data |>
  group_by(
    year,
    month,
    day,
    id_origin,
    id_destination
  ) |>
  summarise(
    n_trips = mean(n_trips)
  ) |>
  collect()
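
To make the grouped summary concrete, here is a small sketch on an invented in-memory table (the `toy_od` name and its values are hypothetical, not taken from the data set). It computes both the mean and the sum of `n_trips` per day and origin-destination pair, because the two answer different questions: the mean averages over the rows in each group, while the sum gives the total number of trips.

```r
library(dplyr)

# Hypothetical rows: several records can share the same day and OD pair
toy_od <- tibble(
  year = 2022L,
  month = 1L,
  day = c(4L, 4L, 4L, 5L),
  id_origin = c("A", "A", "A", "A"),
  id_destination = c("B", "B", "C", "B"),
  n_trips = c(2, 4, 1.5, 3)
)

toy_summary <- toy_od |>
  group_by(year, month, day, id_origin, id_destination) |>
  summarise(
    mean_trips = mean(n_trips),  # average over the rows in each group
    total_trips = sum(n_trips),  # total trips per day and OD pair
    .groups = "drop"             # return an ungrouped tibble
  )

print(toy_summary)
```

The same verbs can be applied to a lazy table such as `od_data`, followed by `collect()`.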

Flowmaps with {flowmapper} and {flowmapblue}

{flowmapblue} (Boyandin 2024) for interactive flowmaps

{flowmapper} (Mast 2024) for static flowmaps and producing spatial data
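
A rough sketch of how aggregated flows could be reshaped for a static flowmap. The zones, coordinates and trip counts below are invented. The input layout (an `od` table with columns `o`, `d`, `value` and a `nodes` table with columns `name`, `x`, `y`) and the `add_flowmap()` call reflect our reading of the {flowmapper} documentation and should be treated as assumptions to check against the installed version. The plotting step runs only if {flowmapper} and {ggplot2} are available.

```r
library(dplyr)

# Hypothetical aggregated flows between three made-up zones
toy_flows <- tibble(
  id_origin = c("A", "A", "B", "C"),
  id_destination = c("B", "C", "A", "A"),
  n_trips = c(120, 40, 95, 10)
)

# Hypothetical zone centroids in arbitrary planar coordinates
toy_centroids <- tibble(
  id = c("A", "B", "C"),
  x = c(0, 10, 5),
  y = c(0, 0, 8)
)

# Assumed {flowmapper} layout: flows as o, d, value; nodes as name, x, y
od <- toy_flows |>
  transmute(o = id_origin, d = id_destination, value = n_trips)
nodes <- toy_centroids |>
  rename(name = id)

# Draw the flowmap only when both plotting packages are installed
if (requireNamespace("flowmapper", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  flowmap_plot <- flowmapper::add_flowmap(
    ggplot2::ggplot(),
    od = od,
    nodes = nodes
  )
}
```

With real data, the centroids would come from the zone geometries rather than being typed in by hand.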

Get in touch

Egor Kotov

ekotov.pro

Tutorial website

https://www.ekotov.pro/agit-2025-spanishoddata/

References

Boyandin, Ilya. 2024. Flowmap.blue Widget for R. https://doi.org/10.32614/CRAN.package.flowmapblue.
Kotov, Egor, Frederic Bartumeus, and John Palmer. 2024. “Effects of Human Mobility on the Spread of Disease-Transmitting Mosquitoes in Spain: Insights from Mobile Phone Data.” Abstracts of the ICA 7: 78. https://ica-abs.copernicus.org/articles/7/78/2024/ica-abs-7-78-2024.pdf.
Kotov, Egor, Robin Lovelace, and Eugeni Vidal-Tortosa. 2024. Spanishoddata. https://doi.org/10.32614/CRAN.package.spanishoddata.
Lucati, Federica, Sarah Delacour, John R. B. Palmer, Jenny Caner, Aitana Oltra, Claudia Paredes-Esquivel, Simone Mariani, et al. 2022. “Multiple Invasions, Wolbachia and Human-Aided Transport Drive the Genetic Variability of Aedes Albopictus in the Iberian Peninsula.” Scientific Reports 12 (1): 20682. https://doi.org/10.1038/s41598-022-24963-3.
Martínez-Durive, Orlando E., Sachit Mishra, Cezary Ziemlicki, Stefania Rubrichi, Zbigniew Smoreda, and Marco Fiore. 2023. “The NetMob23 Dataset: A High-resolution Multi-region Service-level Mobile Data Traffic Cartography.” arXiv. https://doi.org/10.48550/arXiv.2305.06933.
Mast, Johannes. 2024. Flowmapper: Draw Flows (Migration, Goods, Money, Information) on 'ggplot2' Plots. https://github.com/JohMast/flowmapper.
Ministerio de Transportes y Movilidad Sostenible (MITMS). 2024. “Estudio de La Movilidad Con Big Data (Study of Mobility with Big Data).” https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data.
Mühleisen, Hannes, and Mark Raasveldt. 2024. Duckdb: DBI Package for the DuckDB Database Management System. https://doi.org/10.32614/CRAN.package.duckdb.
Raasveldt, Mark, and Hannes Muehleisen. 2018. “DuckDB.” https://github.com/duckdb/duckdb.