Mobility Flows and Accessibility Using R and Big Open Data

1. Getting the Spanish Open Human Mobility Data in a Reproducible Way using {spanishoddata} and aggregating it with {duckdb}

July 21, 2025

Workshop overview

13:30 — Getting the Open Human Mobility Data in a Reproducible Way using spanishoddata R package and aggregating the data using duckdb
Visualization of Mobility using flowmapper and flowmapblue
15:00 — 15:30 Official Coffee Break
— 17:00 Accessibility and its relation to actual mobility

Getting the Spanish Open Human Mobility Data in a Reproducible Way using {spanishoddata} and aggregating it with {duckdb}

Contents

Survey-based human mobility

European Labour Force Surveys

Survey-based human mobility

German Mobility Survey

Census-based human mobility

UK Census

Register-based human mobility

Netherlands Register Data

IoT GPS Data

The Volkswagen leak

Mobile Apps + Mobile Network GPS Data

App usage by Orange customers in France

Mobile phone data

Flowminder

Mobile phone data for humanitarian and development efforts in low- and middle-income countries

Flowminder Software

FlowKit - privacy preserving mobile phone data aggregation

Spanish Open Mobility Big Data

Spanish Open Mobility Big Data

~ 5 years of daily hourly flows

Data by Ministerio de Transportes y Movilidad Sostenible (MITMS) (2024)

Based on 13 million customers of Orange Spain, expanded to the full population of Spain

Data interface

Spanish Open Mobility Big Data

3500+ zones across Spain and beyond

Academic research

Multi-MNO Project

Mobility data aggregation using unified methodology

Multi-MNO Project

How it works

Multi-MNO Software

https://github.com/eurostat/multimno

Multi-MNO Outputs

What to expect?

spanishoddata R package

spanishoddata

access Spanish Open Mobility Big Data from R

spanishoddata - access open human mobility data


A package with many companions

spanishoddata use cases

Split work and non-work trips

spanishoddata use cases

Split into different trip types

spanishoddata use cases

Valencia flood (DANA) in October 2024

The share of people who have not spent the night at home.

spanishoddata use cases

Antennas were offline - Facebook shows a slightly different picture

Get the data with {spanishoddata}

Get the data

Time-consuming options

Download one by one?

Write your own XML parser?
  • Custom code to download and import multiple days
  • Variable names in Spanish
  • No guarantee of consistent variable types
  • Limited by available memory
  • Slow data processing (raw CSV data)

The fastest way

Use {spanishoddata} package

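library(spanishoddata)
# Folder where the downloaded files are stored and reused
spod_set_data_dir("data")

# One week of district-level origin-destination flows
od_data <- spod_get(
  type = "origin-destination",
  zones = "districts",
  dates = c(
    start = "2022-03-01",
    end = "2022-03-07"
  )
)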

Get the flows data

The fastest way

Use {spanishoddata} package

library(spanishoddata)
spod_set_data_dir("data")

od_data <- spod_get(
  type = "origin-destination",
  zones = "districts",
  dates = c(
    start = "2022-01-01",
    end = "2022-01-04"
  )
)
library(dplyr)
glimpse(od_data)

Rows: ??
Columns: 20          
Database: DuckDB v1.2.1 [root@Darwin 24.4.0:R 4.5.0/:memory:]
$ date                        <date> 2022-01-04, 2022-01-04, 2
$ hour                        <int> 0, 0, 0, 1, 1, 3, 4, 4, 5,…
$ id_origin                   <fct> 01001, 01001, 01001, 01001
$ id_destination              <fct> 01009_AM, 01009_AM, 01009_…
$ distance                    <fct> 2-10, 2-10, 2-10, 2-10, 2-
$ activity_origin             <fct> home, frequent_activity, w…
$ activity_destination        <fct> frequent_activity, home, h…
$ study_possible_origin       <lgl> FALSE, FALSE, FALSE, FALSE…
$ study_possible_destination  <lgl> FALSE, FALSE, FALSE, FALSE…
$ residence_province_ine_code <fct> 01, 01, 01, 01, 01, 01, 01
$ residence_province_name     <fct> "Araba/Álava", "Araba/Álav…
$ income                      <fct> 10-15, >15, >15, >15, >15,…
$ age                         <fct> NA, NA, NA, NA, NA, NA, NA…
$ sex                         <fct> NA, NA, NA, NA, NA, NA, NA…
$ n_trips                     <dbl> 4.894, 1.779, 1.094, 1.094…
$ trips_total_length_km       <dbl> 27.966, 5.997, 4.081, 4.16…
$ year                        <int> 2022, 2022, 2022, 2022, 20…
$ month                       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ day                         <int> 4, 4, 4, 4, 4, 4, 4, 4, 4,…
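The `Rows: ??` line in this output appears because `od_data` is a lazy database table rather than an in-memory data frame, so the row count is unknown until a query actually runs. A minimal sketch of the same behaviour on a toy in-memory DuckDB table follows. The table `toy_od` and its values are invented for illustration and are not part of the Spanish data; the sketch assumes the DBI, duckdb, dplyr and dbplyr packages are installed.

```r
library(DBI)
library(dplyr)

# Toy stand-in for the mobility table (values are invented)
toy_od <- data.frame(
  id_origin = c("A", "A", "B"),
  id_destination = c("B", "C", "A"),
  n_trips = c(4.5, 1.5, 3)
)

con <- dbConnect(duckdb::duckdb())   # in-memory DuckDB database
dbWriteTable(con, "toy_od", toy_od)
toy_tbl <- tbl(con, "toy_od")        # lazy reference, no rows pulled into R

# TRUE for database-backed tables created with tbl()
is_lazy <- inherits(toy_tbl, "tbl_lazy")

# The count is computed inside DuckDB; only a single number reaches R
n_rows <- toy_tbl |>
  count() |>
  pull(n)

dbDisconnect(con, shutdown = TRUE)
```

The same `count()` call should work on `od_data`, although on many days of data it may take a while because every file has to be scanned.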

Get the codebook in R or online

spod_codebook(ver = 2)

Get the boundaries data

zones <- spod_get_zones(
  zones = "districts",
  ver = 2 # for 2022 data onwards
)

zones |>
  sf::st_geometry() |>
  ggplot2::ggplot() +
  ggplot2::geom_sf()
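Once flows are aggregated, they are usually attached to the zone boundaries for mapping. The sketch below shows one way to do that on toy data. The objects `toy_zones` and `toy_flows` are hypothetical, and the assumption that the zone identifier column is called `id` should be checked against the real `zones` object before reusing the join. It assumes the sf and dplyr packages are installed.

```r
library(sf)
library(dplyr)

# Two toy zones as points (real zones are polygons)
toy_zones <- st_sf(
  id = c("A", "B"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 4326)
)

# Toy flows with an origin identifier, as in the flows table
toy_flows <- data.frame(
  id_origin = c("A", "A", "B"),
  n_trips = c(1, 2, 5)
)

# Total outgoing trips per origin zone
outgoing <- toy_flows |>
  group_by(id_origin) |>
  summarise(n_trips = sum(n_trips), .groups = "drop")

# Attach the totals to the geometries; the result is still an sf object
zones_with_flows <- toy_zones |>
  left_join(outgoing, by = c("id" = "id_origin"))
```

With real data, the aggregation step would run on the lazy table and be followed by `collect()` before the join, because the geometries live in R and not in the database.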

Cite the data and the package

spod_cite()
Plain text citations:
---------------------
To cite the spanishoddata package:
Kotov E, Lovelace R, Vidal-Tortosa E (2024). spanishoddata.
doi:10.32614/CRAN.package.spanishoddata
https://doi.org/10.32614/CRAN.package.spanishoddata,
https://github.com/rOpenSpain/spanishoddata. 

To cite the Ministry's mobility study website:
Ministerio de Transportes y Movilidad Sostenible (MITMS)
(2024). “Estudio de la movilidad con Big Data (Study of
mobility with Big Data).”
https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data. 

To cite the methodology for 2020-2021 data:
Ministerio de Transportes, Movilidad y Agenda Urbana (MITMA)
(2021). Análisis de la movilidad en España con tecnología Big
Data durante el estado de alarma para la gestión de la crisis
del COVID-19 (Analysis of mobility in Spain with Big Data
technology during the state of alarm for COVID-19 crisis
management).
https://cdn.mitma.gob.es/portal-web-drupal/covid-19/bigdata/mitma_-_estudio_movilidad_covid-19_informe_metodologico_v3.pdf. 

To cite the methodology for 2022 and onwards data:
Ministerio de Transportes y Movilidad Sostenible (MITMS)
(2024). Estudio de movilidad de viajeros de ámbito nacional
aplicando la tecnología Big Data. Informe metodológico (Study
of National Traveler mobility Using Big Data Technology.
Methodological Report).
https://www.transportes.gob.es/recursos_mfom/paginabasica/recursos/a3_informe_metodologico_estudio_movilidad_mitms_v8.pdf. 

Note: A more up-to-date methodology document may be available at https://www.transportes.gob.es/ministerio/proyectos-singulares/estudios-de-movilidad-con-big-data/metodologia-del-estudio-de-movilidad-con-bigdata


Markdown citations:
-------------------
**To cite the spanishoddata package:**
Kotov E, Lovelace R, Vidal-Tortosa E (2024). _spanishoddata_.
doi:10.32614/CRAN.package.spanishoddata
https://doi.org/10.32614/CRAN.package.spanishoddata,
https://github.com/rOpenSpain/spanishoddata. 

**To cite the Ministry's mobility study website:**
Ministerio de Transportes y Movilidad Sostenible (MITMS)
(2024). “Estudio de la movilidad con Big Data (Study of
mobility with Big Data).”
https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data. 

**To cite the methodology for 2020-2021 data:**
Ministerio de Transportes, Movilidad y Agenda Urbana (MITMA)
(2021). _Análisis de la movilidad en España con tecnología Big
Data durante el estado de alarma para la gestión de la crisis
del COVID-19 (Analysis of mobility in Spain with Big Data
technology during the state of alarm for COVID-19 crisis
management)_.
https://cdn.mitma.gob.es/portal-web-drupal/covid-19/bigdata/mitma_-_estudio_movilidad_covid-19_informe_metodologico_v3.pdf. 

**To cite the methodology for 2022 and onwards data:**
Ministerio de Transportes y Movilidad Sostenible (MITMS)
(2024). _Estudio de movilidad de viajeros de ámbito nacional
aplicando la tecnología Big Data. Informe metodológico (Study
of National Traveler mobility Using Big Data Technology.
Methodological Report)_.
https://www.transportes.gob.es/recursos_mfom/paginabasica/recursos/a3_informe_metodologico_estudio_movilidad_mitms_v8.pdf. 

> **Note:** A more up-to-date methodology document may be available at https://www.transportes.gob.es/ministerio/proyectos-singulares/estudios-de-movilidad-con-big-data/metodologia-del-estudio-de-movilidad-con-bigdata


BibTeX citations:
-----------------
%% To cite the spanishoddata package
@Manual{r-spanishoddata,
  title = {spanishoddata},
  author = {Egor Kotov and Robin Lovelace and Eugeni Vidal-Tortosa},
  year = {2024},
  url = {https://github.com/rOpenSpain/spanishoddata},
  doi = {10.32614/CRAN.package.spanishoddata},
}

%% To cite the Ministry's mobility study website
@Misc{mitms_mobility_web,
  title = {Estudio de la movilidad con Big Data (Study of mobility with Big Data)},
  author = {{Ministerio de Transportes y Movilidad Sostenible (MITMS)}},
  year = {2024},
  url = {https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data},
}

%% To cite the methodology for 2020-2021 data
@Manual{mitma_methodology_2020_v3,
  title = {Análisis de la movilidad en España con tecnología Big Data durante el estado de alarma para la gestión de la crisis del COVID-19 (Analysis of mobility in Spain with Big Data technology during the state of alarm for COVID-19 crisis management)},
  author = {{Ministerio de Transportes, Movilidad y Agenda Urbana (MITMA)}},
  year = {2021},
  url = {https://cdn.mitma.gob.es/portal-web-drupal/covid-19/bigdata/mitma_-_estudio_movilidad_covid-19_informe_metodologico_v3.pdf},
}

%% To cite the methodology for 2022 and onwards data
@Manual{mitms_methodology_2022_v8,
  title = {Estudio de movilidad de viajeros de ámbito nacional aplicando la tecnología Big Data. Informe metodológico (Study of National Traveler mobility Using Big Data Technology. Methodological Report)},
  author = {{Ministerio de Transportes y Movilidad Sostenible (MITMS)}},
  year = {2024},
  url = {https://www.transportes.gob.es/recursos_mfom/paginabasica/recursos/a3_informe_metodologico_estudio_movilidad_mitms_v8.pdf},
}

%% Note: A more up-to-date methodology document may be available at https://www.transportes.gob.es/ministerio/proyectos-singulares/estudios-de-movilidad-con-big-data/metodologia-del-estudio-de-movilidad-con-bigdata

Big Data on a Small Laptop

DuckDB Format Advantages
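One advantage of a DuckDB file over raw CSV is that the text is parsed only once: later sessions open the database file and query it without re-reading the CSV or loading the whole table into memory. The sketch below illustrates this with a tiny invented CSV. The file names and the table `toy_od` are hypothetical, and this is a generic DBI/duckdb illustration, not the conversion workflow of {spanishoddata} itself.

```r
library(DBI)

csv_path <- tempfile(fileext = ".csv")
db_path <- tempfile(fileext = ".duckdb")

# Toy raw data written as CSV (values are invented)
toy_od <- data.frame(
  id_origin = c("A", "A", "B"),
  id_destination = c("B", "C", "A"),
  n_trips = c(4.5, 1.5, 3)
)
write.csv(toy_od, csv_path, row.names = FALSE)

# One-off conversion: import the CSV into a DuckDB file on disk
con <- dbConnect(duckdb::duckdb(dbdir = db_path))
duckdb::duckdb_read_csv(con, "toy_od", csv_path)
dbDisconnect(con, shutdown = TRUE)

# Later sessions: open the file read-only and query it directly
con <- dbConnect(duckdb::duckdb(dbdir = db_path, read_only = TRUE))
total_trips <- dbGetQuery(
  con,
  "SELECT SUM(n_trips) AS total FROM toy_od"
)$total
dbDisconnect(con, shutdown = TRUE)
```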

DuckDB in Action

Imagine a typical laptop

DuckDB in Action

Filter and summary

library(dplyr)

od_data |>
  filter(
    year == 2022,
    month %in% c(2, 3, 4)
    ) |>
  summarise(
    n_trips = mean(n_trips)
  ) |>
  collect()
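A pipeline like this is not executed by R row by row. dplyr translates it to SQL, DuckDB runs the query, and `collect()` brings only the result into memory. The sketch below makes that visible on a toy table. `toy_od` and its values are invented, and the sketch assumes the DBI, duckdb, dplyr and dbplyr packages are installed.

```r
library(DBI)
library(dplyr)

# Toy stand-in for od_data (values are invented)
toy_od <- data.frame(
  year = c(2022L, 2022L, 2022L, 2022L),
  month = c(1L, 2L, 3L, 5L),
  n_trips = c(10, 20, 40, 80)
)

con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "toy_od", toy_od)

# Nothing is computed yet: this only builds a query
lazy_query <- tbl(con, "toy_od") |>
  filter(year == 2022, month %in% c(2, 3, 4)) |>
  summarise(n_trips = mean(n_trips, na.rm = TRUE))

show_query(lazy_query)          # prints the SQL that DuckDB will run
result <- collect(lazy_query)   # runs the query, returns a one-row tibble

dbDisconnect(con, shutdown = TRUE)
```

Only the February and March rows pass the filter here, so the result is the mean of 20 and 40.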

DuckDB in Action

Summary on full data set

library(dplyr)

od_data |>
  summarise(
    n_trips = mean(n_trips)
  ) |>
  collect()

DuckDB in Action

Summary over multiple groups on full data set

library(dplyr)

od_data |>
  group_by(
    year,
    month,
    day,
    id_origin,
    id_destination
  ) |>
  summarise(
    n_trips = mean(n_trips)
  ) |>
  collect()
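Each row of the flows table covers one hour and one combination of trip attributes, so `mean(n_trips)` gives the average size of such a row within each group. If the goal is the total number of trips per origin-destination pair per day (an assumption about the analysis, not something the slides state), a sum is probably the better aggregate. A minimal sketch on an invented toy table, assuming the DBI, duckdb, dplyr and dbplyr packages are installed:

```r
library(DBI)
library(dplyr)

# Toy hourly flows (values are invented)
toy_od <- data.frame(
  day = c(1L, 1L, 1L, 2L),
  hour = c(7L, 8L, 8L, 7L),
  id_origin = c("A", "A", "A", "A"),
  id_destination = c("B", "B", "C", "B"),
  n_trips = c(10, 20, 5, 40)
)

con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "toy_od", toy_od)

# Collapse hourly rows into daily totals per origin-destination pair
daily_flows <- tbl(con, "toy_od") |>
  group_by(day, id_origin, id_destination) |>
  summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = "drop") |>
  arrange(day, id_destination) |>
  collect()

dbDisconnect(con, shutdown = TRUE)
```

The grouping and the sum run inside DuckDB, so only the much smaller daily table is returned to R.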

References

Kotov, Egor, Robin Lovelace, and Eugeni Vidal-Tortosa. 2024. Spanishoddata. https://doi.org/10.32614/CRAN.package.spanishoddata.
Martínez-Durive, Orlando E., Sachit Mishra, Cezary Ziemlicki, Stefania Rubrichi, Zbigniew Smoreda, and Marco Fiore. 2023. “The NetMob23 Dataset: A High-resolution Multi-region Service-level Mobile Data Traffic Cartography.” arXiv. https://doi.org/10.48550/arXiv.2305.06933.
Ministerio de Transportes y Movilidad Sostenible (MITMS). 2024. “Estudio de La Movilidad Con Big Data (Study of Mobility with Big Data).” https://www.transportes.gob.es/ministerio/proyectos-singulares/estudio-de-movilidad-con-big-data.
Mühleisen, Hannes, and Mark Raasveldt. 2024. Duckdb: DBI Package for the DuckDB Database Management System. https://doi.org/10.32614/CRAN.package.duckdb.
Raasveldt, Mark, and Hannes Muehleisen. 2018. “DuckDB.” https://github.com/duckdb/duckdb.