This function reads spatial files and loads them into SedonaDB. Parquet files (local, HTTP, S3) are read natively by SedonaDB. Other formats (Shapefile, GeoJSON, GPKG) are read via DuckDB and streamed to SedonaDB via Arrow.
Usage
sx_read(
path,
data_reader = "auto",
query = NULL,
view_name = NULL,
target = "sedonadb",
options = NULL,
shp_encoding = NULL,
layer = NULL,
spatial_filter = NULL,
open_options = NULL,
allowed_drivers = NULL,
hive_partitioning = NULL,
union_by_name = NULL,
max_batch_size = NULL,
verbosity = NULL,
...
)
Arguments
- path
Character string. Path to the file (local, HTTP, or S3) to read. Can also be a table name when using the DuckDB reader with a custom conn.
- data_reader
Character string specifying which data reader to use. Options:
"auto" (default): auto-detect based on the file type
"sedonadb": native SedonaDB reader (Parquet only)
"duckdb": DuckDB spatial reader (shp, gpkg, geojson, etc.)
Most users should use "auto".
- query
Optional SQL query (DuckDB reader only). If NULL, reads all data. Use the %s placeholder for the path if needed.
- view_name
Character (optional). Name to register the result as a persistent view in the active backend. If NULL (default), returns the result directly without creating a view.
Not all backends support named views. Check backend-specific documentation for availability.
- target
Target engine for loading. Currently only "sedonadb" is supported.
- options
Named list of options for the SedonaDB Parquet reader (e.g., S3 credentials). Example: list("aws.region" = "us-west-2", "aws.skip_signature" = TRUE)
- shp_encoding
Character encoding for Shapefile attribute data (e.g., "UTF-8", "CP1252"). Only used when reading Shapefiles via DuckDB. Useful for non-ASCII characters.
- layer
Layer name to read (for multi-layer formats like GPKG).
- spatial_filter
WKT string or sf/sfc object defining a spatial filter bounding box. Only rows intersecting this box will be read.
- open_options
Character vector of driver-specific open options for GDAL (e.g., c("HEADERS=FORCE") for CSV). Passed to ST_Read.
- allowed_drivers
Character vector of GDAL driver names to restrict reading to.
- hive_partitioning
Logical. For partitioned Parquet directories, if TRUE, interprets the directory structure as Hive-style partitioning.
- union_by_name
Logical. For multi-file Parquet reads, if TRUE, unifies columns by name across files (handles schema variations).
- max_batch_size
Integer. Maximum batch size for GDAL reads via ST_Read.
- verbosity
Character or NULL. Controls message output for this function call.
"quiet": Suppress all informational messages."info": Show standard progress and status messages."debug": Show additional diagnostic messages for troubleshooting.
If NULL (the default), uses the global
sx.verbosityoption. Seesx_options()for persistent configuration.- ...
Additional arguments passed to the data reader (e.g.,
connfor DuckDB).
Examples
if (FALSE) { # \dontrun{
# Auto-detect: parquet uses SedonaDB, shp uses DuckDB
sdf <- sx_read("path/to/file.parquet")
sdf <- sx_read("path/to/file.shp")
# S3 parquet with options
sdf <- sx_read(
"s3://bucket/path/file.parquet",
options = list("aws.region" = "us-west-2")
)
# HTTP parquet
sdf <- sx_read("https://example.com/data.parquet")
# Read shapefile with explicit encoding
df <- sx_read("data.shp", shp_encoding = "CP1252")
# Read specific layer from GPKG
df <- sx_read("data.gpkg", layer = "counties")
# Read with driver-specific open options
df <- sx_read("data.csv", open_options = c("HEADERS=FORCE"))
# Read partitioned parquet with Hive partitioning
df <- sx_read("data_partitioned/", hive_partitioning = TRUE)
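# Control per-call message output with the verbosity argument
# (a sketch; overrides the global sx.verbosity option for this call only)
df <- sx_read("data.parquet", verbosity = "quiet")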
# Read from existing DuckDB table
df <- sx_read("my_table", conn = my_duckdb_conn)
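# Restrict a read to a bounding box and register the result as a named view
# (a sketch; the WKT polygon and view name below are illustrative values,
# and named views may not be supported by every backend)
df <- sx_read(
  "data.gpkg",
  spatial_filter = "POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))",
  view_name = "filtered_features"
)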
} # }
