df_from_csv {duckplyr} | R Documentation |
df_from_csv
Description
These functions ingest data from a file using a table function. The results are transparently converted to a data frame, but the data is only read when the resulting data frame is actually accessed.
df_from_csv() reads a CSV file using the read_csv_auto() table function.
duckplyr_df_from_csv() is a thin wrapper around df_from_csv() that calls as_duckplyr_df() on the output.
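A minimal sketch of how the two CSV readers differ, assuming duckplyr is installed. The file and its column x are made up for illustration. Both calls return a data frame, but only the wrapper adds the duckplyr_df class.

```r
library(duckplyr)

# Write a small, made-up CSV file to a temporary location
path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3), path, row.names = FALSE)

# Plain data frame; rows are read from the file only when accessed
df <- df_from_csv(path)
is.data.frame(df)

# Same data, with the duckplyr_df class added via as_duckplyr_df()
ddf <- duckplyr_df_from_csv(path)
inherits(ddf, "duckplyr_df")

# Accessing a column materializes the data (before the file is removed)
x_values <- ddf$x

unlink(path)
```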
df_from_parquet() reads a Parquet file using the read_parquet() table function.
duckplyr_df_from_parquet() is a thin wrapper around df_from_parquet() that calls as_duckplyr_df() on the output.
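Because path also accepts wildcards, several Parquet files with the same columns can be read as one data frame. A minimal sketch, assuming duckplyr and duckdb >= 0.10.0 (needed by df_to_parquet()) are installed; the directory and the file names part1.parquet and part2.parquet are hypothetical.

```r
library(duckplyr)

# Hypothetical directory holding two Parquet files with the same columns
parts_dir <- tempfile("parquet_parts_")
dir.create(parts_dir)
df_to_parquet(data.frame(a = 1:2), file.path(parts_dir, "part1.parquet"))
df_to_parquet(data.frame(a = 3:4), file.path(parts_dir, "part2.parquet"))

# A wildcard pattern reads both files into a single duckplyr_df
combined <- duckplyr_df_from_parquet(file.path(parts_dir, "*.parquet"))

# Materialize the column before removing the files; sort because
# the row order across files is not guaranteed
a_values <- sort(combined$a)

unlink(parts_dir, recursive = TRUE)
```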
df_to_parquet() writes a data frame to a Parquet file via DuckDB. If the data frame is a duckplyr_df, the materialization occurs outside of R. An existing file will be overwritten. This function requires duckdb >= 0.10.0.
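A short sketch of the overwrite behavior, assuming duckplyr and duckdb >= 0.10.0 are installed; the data is made up. Writing twice to the same path leaves only the second data frame in the file.

```r
library(duckplyr)

path_parquet <- tempfile(fileext = ".parquet")

# The first write creates the file
df_to_parquet(data.frame(a = 1:3), path_parquet)

# A second write to the same path replaces the file instead of appending
df_to_parquet(data.frame(a = 10:11), path_parquet)

# Only the rows from the second write remain; read them before cleanup
a_values <- sort(df_from_parquet(path_parquet)$a)

unlink(path_parquet)
```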
df_from_file() uses arbitrary table functions to read data. See https://duckdb.org/docs/data/overview for documentation of the available functions and their options.
duckplyr_df_from_file() is a thin wrapper around df_from_file() that calls as_duckplyr_df() on the output.
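With df_from_file(), the table function is named explicitly and its options are passed as a named list. A minimal sketch, assuming duckplyr is installed; the semicolon-delimited file is made up, and the delim, header and auto_detect options are those of DuckDB's read_csv table function.

```r
library(duckplyr)

# A made-up semicolon-delimited file
path <- tempfile(fileext = ".csv")
writeLines(c("a;b", "1;x", "2;y"), path)

# Name the table function and pass its options as a named list
df <- df_from_file(
  path,
  "read_csv",
  options = list(delim = ";", header = TRUE, auto_detect = TRUE)
)

# Access the data before the file is removed
col_names <- names(df)
b_values <- df$b

unlink(path)
```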
Usage
df_from_csv(path, ..., options = list(), class = NULL)
duckplyr_df_from_csv(path, ..., options = list(), class = NULL)
df_from_parquet(path, ..., options = list(), class = NULL)
duckplyr_df_from_parquet(path, ..., options = list(), class = NULL)
df_to_parquet(data, path)
df_from_file(path, table_function, ..., options = list(), class = NULL)
duckplyr_df_from_file(
path,
table_function,
...,
options = list(),
class = NULL
)
Arguments
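path | Path to a file, a directory, or a set of filenames using wildcards.
... | These dots are for future extensions and must be empty.
options | Arguments to the DuckDB table function, as a named list: read_csv_auto() for df_from_csv(), read_parquet() for df_from_parquet(), or the function named by table_function for df_from_file().
class | An optional class to add to the data frame. The returned object will always be a data frame. Pass class(tibble()) to return a tibble, as in the Examples.
data | A data frame to be written to disk.
table_function | The name of a table-valued DuckDB function, such as "read_csv" (used in the Examples), "read_csv_auto", or "read_parquet".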
Value
A data frame for df_from_file()
, or a duckplyr_df
for
duckplyr_df_from_file()
, extended by the provided class
.
Examples
# Create simple CSV file
path <- tempfile("duckplyr_test_", fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[4:6]), path, row.names = FALSE)
# Reading is immediate
df <- df_from_csv(path)
# Materialization only upon access
names(df)
df$a
# Return as tibble:
df_from_file(
path,
"read_csv",
options = list(delim = ",", auto_detect = TRUE),
class = class(tibble())
)
# Read multiple files at once
path2 <- tempfile("duckplyr_test_", fileext = ".csv")
write.csv(data.frame(a = 4:6, b = letters[7:9]), path2, row.names = FALSE)
duckplyr_df_from_csv(file.path(tempdir(), "duckplyr_test_*.csv"))
unlink(c(path, path2))
# Write a Parquet file:
path_parquet <- tempfile(fileext = ".parquet")
df_to_parquet(df, path_parquet)
# With a duckplyr_df, the materialization occurs outside of R:
df %>%
as_duckplyr_df() %>%
mutate(b = a + 1) %>%
df_to_parquet(path_parquet)
duckplyr_df_from_parquet(path_parquet)
unlink(path_parquet)