This function reads spatial files and loads them into SedonaDB. Parquet files (local, HTTP, S3) are read natively by SedonaDB. Other formats (Shapefile, GeoJSON, GPKG) are read via DuckDB and streamed to SedonaDB via Arrow.
Usage
sx_read(
path,
data_reader = "auto",
query = NULL,
view_name = NULL,
target = "sedonadb",
options = NULL,
shp_encoding = NULL,
layer = NULL,
spatial_filter = NULL,
open_options = NULL,
allowed_drivers = NULL,
hive_partitioning = NULL,
union_by_name = NULL,
max_batch_size = NULL,
verbosity = NULL,
...
)
Arguments
- path
Character string. Path to the file (local, HTTP, or S3) to read. Can also be a table name when using the DuckDB reader with a custom conn.
- data_reader
Character string specifying which data reader to use. Options:
"auto" (default): auto-detect based on the file type
"sedonadb": native SedonaDB reader (Parquet only)
"duckdb": DuckDB spatial reader (shp, gpkg, geojson, etc.)
Most users should use "auto".
- query
Optional SQL query (DuckDB reader only). If NULL, reads all data. Use the %s placeholder for the path if needed.
- view_name
Character (optional). Name to register the result as a persistent view in the active backend. If NULL (default), returns the result directly without creating a view.
Not all backends support named views. Check backend-specific documentation for availability.
- target
Target engine for loading. Currently only "sedonadb" is supported.
- options
Named list of options for the SedonaDB Parquet reader (e.g., S3 credentials). Example: list("aws.region" = "us-west-2", "aws.skip_signature" = TRUE)
- shp_encoding
Character encoding for Shapefile attribute data (e.g., "UTF-8", "CP1252"). Only used when reading Shapefiles via DuckDB. Useful for non-ASCII characters.
- layer
Layer name to read (for multi-layer formats like GPKG).
- spatial_filter
WKT string or sf/sfc object defining a spatial filter bounding box. Only rows intersecting this box will be read.
- open_options
Character vector of driver-specific open options for GDAL (e.g., c("HEADERS=FORCE") for CSV). Passed to ST_Read.
- allowed_drivers
Character vector of GDAL driver names to restrict reading to.
- hive_partitioning
Logical. For partitioned Parquet directories, if TRUE, interprets the directory structure as Hive-style partitioning.
- union_by_name
Logical. For multi-file Parquet reads, if TRUE, unifies columns by name across files (handles schema variations).
- max_batch_size
Integer. Maximum batch size for GDAL reads via ST_Read.
- verbosity
Character or NULL. Controls message output for this function call.
"quiet": Suppress all informational messages."info": Show standard progress and status messages."debug": Show additional diagnostic messages for troubleshooting.
If NULL (the default), uses the global
sx.verbosityoption. Seesx_options()for persistent configuration.- ...
Additional arguments passed to the data reader (e.g.,
connfor DuckDB).
Examples
if (FALSE) { # \dontrun{
# Auto-detect: parquet uses SedonaDB, shp uses DuckDB
sdf <- sx_read("path/to/file.parquet")
sdf <- sx_read("path/to/file.shp")
# S3 parquet with options
sdf <- sx_read(
"s3://bucket/path/file.parquet",
options = list("aws.region" = "us-west-2")
)
# HTTP parquet
sdf <- sx_read("https://example.com/data.parquet")
# Read shapefile with explicit encoding
df <- sx_read("data.shp", shp_encoding = "CP1252")
# Read specific layer from GPKG
df <- sx_read("data.gpkg", layer = "counties")
# Read with driver-specific open options
df <- sx_read("data.csv", open_options = c("HEADERS=FORCE"))
# Read partitioned parquet with Hive partitioning
df <- sx_read("data_partitioned/", hive_partitioning = TRUE)
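# Control per-call message output with the verbosity argument
# (a sketch; overrides the global sx.verbosity option for this call only)
df <- sx_read("data.parquet", verbosity = "quiet")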
# Read from existing DuckDB table
df <- sx_read("my_table", conn = my_duckdb_conn)
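# Restrict a read to a bounding box and register the result as a named view
# (a sketch; the WKT polygon and view name below are illustrative values,
# and named views may not be supported by every backend)
df <- sx_read(
  "data.gpkg",
  spatial_filter = "POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))",
  view_name = "filtered_features"
)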
} # }
