spod_duckdb_od {spanishoddata} | R Documentation |
Creates a duckdb connection to origin-destination data
Description
This function creates a duckdb connection to the origin-destination data stored in CSV.gz files.
Usage
spod_duckdb_od(
con = DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:", read_only = FALSE),
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas"),
ver = NULL,
data_dir = spod_get_data_dir()
)
Arguments
con |
A duckdb connection object. If not specified, a new in-memory connection will be created. |
zones |
The zones for which to download the data. Can be |
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. |
data_dir |
The directory where the data is stored. Defaults to the value returned by |
Value
A duckdb
connection object with 2 views:
-
od_csv_raw
- a raw table view of all cached CSV files with the origin-destination data that has been previously cached in $SPANISH_OD_DATA_DIR -
od_csv_clean
- a cleaned-up table view ofod_csv_raw
with column names and values translated and mapped to English. This still includes all cached data.
The structure of the cleaned-up views od_csv_clean
is as follows:
- date
Date
. The full date of the trip, including year, month, and day.- id_origin
factor
. The identifier for the origin location of the trip, formatted as a code (e.g., '01001_AM').- id_destination
factor
. The identifier for the destination location of the trip, formatted as a code (e.g., '01001_AM').- activity_origin
factor
. The type of activity at the origin location (e.g., 'home', 'work'). Note: Only available for district level data.- activity_destination
factor
. The type of activity at the destination location (e.g., 'home', 'other'). Note: Only available for district level data.- residence_province_ine_code
factor
. The province of residence for the group of individual making the trip, encoded according to the INE classification. Note: Only available for district level data.- residence_province_name
factor
. The province of residence for the group of individuals making the trip (e.g., 'Cuenca', 'Girona'). Note: Only available for district level data.- time_slot
integer
. The time slot (the hour of the day) during which the trip started, represented as an integer (e.g., 0, 1, 2).- distance
factor
. The distance category of the trip, represented as a code (e.g., '002-005' for 2-5 km).- n_trips
double
. The number of trips taken within the specified time slot and distance.- trips_total_length_km
double
. The total length of all trips in kilometers for the specified time slot and distance.- year
double
. The year of the trip.- month
double
. The month of the trip.- day
double
. The day of the trip.
The structure of the original data in od_csv_raw
is as follows:
- fecha
Date
. The date of the trip, including year, month, and day.- origen
character
. The identifier for the origin location of the trip, formatted as a character string (e.g., '01001_AM').- destino
character
. The identifier for the destination location of the trip, formatted as a character string (e.g., '01001_AM').- actividad_origen
character
. The type of activity at the origin location (e.g., 'casa', 'trabajo').- actividad_destino
character
. The type of activity at the destination location (e.g., 'otros', 'trabajo').- residencia
character
. The code representing the residence of the individual making the trip (e.g., '01') according to the official INE classification.- edad
character
. The age of the individual making the trip. This data is actaully filled with 'NA' values, which is why this column is removed in the cleaned-up and translated view described above.- periodo
integer
. The time period during which the trip started, represented as an integer (e.g., 0, 1, 2).- distancia
character
. The distance category of the trip, represented as a character string (e.g., '002-005' for 2-5 km).- viajes
double
. The number of trips taken within the specified time period and distance.- viajes_km
double
. The total length of all trips in kilometers for the specified time period and distance.- day
double
. The day of the trip.- month
double
. The month of the trip.- year
double
. The year of the trip.