arrange.data {gainML} | R Documentation |
Generates datasets that consist of the measurements from REF, CTR-b, and CTR-n turbines only. Filters the datasets by eliminating data points with a missing measurement and those with negative power output (optional). Generates training and test datasets for k-fold CV and splits the entire data into period 1 data and period 2 data.
arrange.data(df1, df2, df3, p1.beg, p1.end, p2.beg, p2.end, time.format = "%Y-%m-%d %H:%M:%S", k.fold = 5, col.time = 1, col.turb = 2, bootstrap = NULL, free.sec = NULL, neg.power = FALSE)
df1 |
A dataframe for reference turbine data. This dataframe must include five columns: timestamp, turbine id, wind direction, power output, and air density. |
df2 |
A dataframe for baseline control turbine data. This dataframe must include four columns: timestamp, turbine id, wind speed, and power output. |
df3 |
A dataframe for neutral control turbine data. This dataframe must
include four columns and have the same structure as df2. |
p1.beg |
A string specifying the beginning date of period 1. By default,
the value needs to be specified in %Y-%m-%d format, for example,
|
p1.end |
A string specifying the end date of period 1. For example, if
the value is |
p2.beg |
A string specifying the beginning date of period 2. |
p2.end |
A string specifying the end date of period 2. Defined similarly
as |
time.format |
A string describing the format of time stamps used in the
data to be analyzed. The default value is "%Y-%m-%d %H:%M:%S". |
k.fold |
An integer defining the number of data folds for the period 1
analysis and prediction. In the period 1 analysis, k-fold cross
validation (CV) will be applied to choose the optimal set of covariates
that results in the least prediction error. The value of |
col.time |
An integer specifying the column number of time stamps in wind turbine datasets. The default value is 1. |
col.turb |
An integer specifying the column number of turbine IDs in wind turbine datasets. The default value is 2. |
bootstrap |
An integer indicating the current replication (run) number
of bootstrap. If set to |
free.sec |
A list of vectors defining free sectors. Each vector in the
list has two scalars: one for starting direction and another for ending
direction, ordered clockwise. For example, a vector of |
neg.power |
Either |
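The clockwise ordering of a free sector means a sector such as c(310, 50) passes through north, so a plain "between" test is not enough. The following is a minimal sketch of one way such a test could be written in base R. The helpers in.sector and in.any.sector are hypothetical names used only for illustration; they are not part of gainML, and whether the package treats sector boundaries as inclusive is an assumption here.

```r
# Hypothetical helper (not part of gainML): TRUE where a wind direction in
# degrees lies inside the clockwise sector running from `from` to `to`.
# Boundaries are treated as inclusive, which is an assumption.
in.sector <- function(dir, from, to) {
  dir <- dir %% 360
  if (from <= to) {
    dir >= from & dir <= to
  } else {
    # The sector wraps through north, e.g. from 310 to 50.
    dir >= from | dir <= to
  }
}

# Hypothetical helper: TRUE where a direction falls in at least one sector
# of a non-empty list of c(from, to) vectors.
in.any.sector <- function(dir, sectors) {
  Reduce(`|`, lapply(sectors, function(s) in.sector(dir, s[1], s[2])))
}

free.sec <- list(c(310, 50), c(150, 260))
dirs <- c(0, 45, 100, 200, 300, 320)
in.free <- in.any.sector(dirs, free.sec)
print(in.free)
```

With the two sectors used in the Examples, directions of 100 and 300 degrees fall outside both sectors in this sketch, while the others fall inside one of them.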
The function returns a list of several datasets including the following.
train
A list containing k datasets that will be used to train the machine learning model.
test
A list containing k datasets that will be used to test the machine learning model.
per1
A dataframe containing the period 1 data.
per2
A dataframe containing the period 2 data.
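To make the shape of train and test concrete, the sketch below builds k training and k test sets from a toy data frame in base R. It only illustrates the general k-fold idea: the random fold assignment, the toy data, and the variable names are assumptions, and the way arrange.data actually forms its folds may differ.

```r
set.seed(1)  # only to make this illustration reproducible
toy <- data.frame(id = 1:23, power = runif(23))
k <- 5

# Assign each row to one of k folds at random, with near-equal fold sizes.
fold <- sample(rep(seq_len(k), length.out = nrow(toy)))

# Fold i is held out as the i-th test set; the other rows form the
# i-th training set.
test  <- lapply(seq_len(k), function(i) toy[fold == i, , drop = FALSE])
train <- lapply(seq_len(k), function(i) toy[fold != i, , drop = FALSE])

length(train)  # number of training sets
length(test)   # number of test sets
```

Each row appears in exactly one test set, and every train/test pair together covers the whole toy data frame.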
df.ref <- with(wtg, data.frame(time = time, turb.id = 1, wind.dir = D,
                               power = y, air.dens = rho))
df.ctrb <- with(wtg, data.frame(time = time, turb.id = 2, wind.spd = V,
                                power = y))
df.ctrn <- df.ctrb
df.ctrn$turb.id <- 3

# For Full Sector Analysis
data <- arrange.data(df.ref, df.ctrb, df.ctrn, p1.beg = '2014-10-24',
                     p1.end = '2014-10-27', p2.beg = '2014-10-27',
                     p2.end = '2014-10-30')

# For Free Sector Analysis
free.sec <- list(c(310, 50), c(150, 260))
data <- arrange.data(df.ref, df.ctrb, df.ctrn, p1.beg = '2014-10-24',
                     p1.end = '2014-10-27', p2.beg = '2014-10-27',
                     p2.end = '2014-10-30', free.sec = free.sec)

length(data$train) # This equals k.
length(data$test)  # This equals k.
head(data$per1)    # This shows the beginning of the period 1 dataset.
head(data$per2)    # This shows the beginning of the period 2 dataset.
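The filtering and period split described above can be pictured with a small base R sketch. This is an illustration under stated assumptions, not the package's implementation: the toy data, the drop.negative flag, and the half-open interval convention (beginning date included, end date excluded) are all hypothetical choices made here.

```r
time.format <- "%Y-%m-%d %H:%M:%S"
toy <- data.frame(
  time  = c("2014-10-24 00:00:00", "2014-10-25 12:00:00",
            "2014-10-27 00:00:00", "2014-10-28 06:00:00",
            "2014-10-29 18:00:00"),
  power = c(120, NA, -5, 300, 410),
  stringsAsFactors = FALSE
)

# Parse time stamps with the stated format; UTC is an arbitrary choice here.
stamp <- as.POSIXct(toy$time, format = time.format, tz = "UTC")

# Drop rows with a missing measurement.
keep <- complete.cases(toy)

# Hypothetical flag: also drop rows with negative power output.
drop.negative <- TRUE
if (drop.negative) keep <- keep & !is.na(toy$power) & toy$power >= 0

# Period boundaries given as dates; intervals assumed to be [beg, end).
as.day <- function(x) as.POSIXct(x, format = "%Y-%m-%d", tz = "UTC")
in.p1 <- stamp >= as.day("2014-10-24") & stamp < as.day("2014-10-27")
in.p2 <- stamp >= as.day("2014-10-27") & stamp < as.day("2014-10-30")

per1 <- toy[keep & in.p1, , drop = FALSE]
per2 <- toy[keep & in.p2, , drop = FALSE]
```

In this toy data, the row with a missing power value and the row with negative power are removed before the split, leaving one row in per1 and two rows in per2.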