predict_race {wru} | R Documentation |
Race prediction function.
Description
predict_race makes probabilistic estimates of individual-level race/ethnicity.
Usage
predict_race(
voter.file,
census.surname = TRUE,
surname.only = FALSE,
surname.year = 2010,
census.geo,
census.key,
census.data = NA,
age = FALSE,
sex = FALSE,
party,
retry = 0
)
Arguments
voter.file |
An object of class data.frame.
Must contain one row for each individual being predicted,
as well as a field named surname containing each individual's surname.
If using geolocation in predictions, voter.file must contain a field named
state, which contains the two-character abbreviation for each individual's
state of residence (e.g., "nj" for New Jersey).
If using Census geographic data in race/ethnicity predictions,
voter.file must also contain at least one of the following fields:
county, tract, block, and/or place.
These fields should contain character strings matching U.S. Census codes:
county is three characters (e.g., "031", not "31"), tract is six characters,
block is four characters, and place is five characters.
See below for other optional fields.
|
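Geographic codes that were read in as numbers lose their leading zeros, so they no longer match the fixed-width character codes described above. The following is a minimal sketch of restoring them in base R; the input rows and the helper name pad_code are made up for illustration and are not part of wru.

```r
# Hypothetical raw input: geographic codes stored as numbers, which drops
# leading zeros (e.g., county 021 becomes 21).
raw <- data.frame(
  surname = c("Khanna", "Imai"),
  state   = c("NJ", "NJ"),
  county  = c(21, 21),
  tract   = c(4000, 4501),
  block   = c(3001, 1025),
  stringsAsFactors = FALSE
)

# Hypothetical helper: left-pad an integer code with zeros to a fixed width
# and return it as a character string.
pad_code <- function(x, width) {
  formatC(as.integer(x), width = width, format = "d", flag = "0")
}

voter_file <- raw
voter_file$county <- pad_code(raw$county, 3)  # three characters
voter_file$tract  <- pad_code(raw$tract, 6)   # six characters
voter_file$block  <- pad_code(raw$block, 4)   # four characters
```

Padding is only safe when the numeric values are the original codes with zeros stripped; codes that were truncated or rounded on import cannot be recovered this way.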
census.surname |
A TRUE/FALSE object. If TRUE, the
function will call merge_surnames to merge in Pr(Race | Surname)
from the U.S. Census Surname List (2000 or 2010) and the Spanish Surname List.
If FALSE, the voter.file object must contain additional fields specifying
Pr(Race | Surname), named as follows: p_whi for Whites,
p_bla for Blacks, p_his for Hispanics/Latinos,
p_asi for Asians, and/or p_oth for Other.
Default is TRUE.
|
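When census.surname = FALSE, the caller supplies the Pr(Race | Surname) columns directly. The sketch below builds such a data frame and runs a few sanity checks before prediction. The probabilities are invented, and the checks (values in [0, 1], rows summing to 1) are a reasonable precaution rather than something this page states wru enforces.

```r
# Made-up surname probabilities for illustration only; not Census figures.
voter_file <- data.frame(
  surname = c("Smith", "Garcia"),
  p_whi = c(0.70, 0.05),
  p_bla = c(0.23, 0.01),
  p_his = c(0.02, 0.92),
  p_asi = c(0.01, 0.01),
  p_oth = c(0.04, 0.01),
  stringsAsFactors = FALSE
)

prob_cols <- c("p_whi", "p_bla", "p_his", "p_asi", "p_oth")

# All five columns present, values are valid probabilities, and each row
# sums to 1 up to floating-point tolerance.
stopifnot(all(prob_cols %in% names(voter_file)))
row_totals <- rowSums(voter_file[, prob_cols])
stopifnot(all(voter_file[, prob_cols] >= 0), all(voter_file[, prob_cols] <= 1))
stopifnot(all(abs(row_totals - 1) < 1e-8))

# With wru installed, the call would then presumably look like:
# predict_race(voter.file = voter_file, census.surname = FALSE, surname.only = TRUE)
```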
surname.only |
A TRUE/FALSE object. If TRUE, race predictions will
use only surname data and calculate Pr(Race | Surname). Default is FALSE.
|
surname.year |
A number specifying the year of the Census surname statistics.
These surname statistics are stored in the package data and are loaded automatically.
The default value is 2010, which means the surname statistics from the
2010 Census will be used. Currently, the only other available choice is 2000.
|
census.geo |
An optional character vector specifying what level of
geography to use to merge in U.S. Census 2010 geographic data. Currently
"county", "tract", "block", and "place" are supported.
Note: sufficient information must be in the user-defined voter.file object.
If census.geo = "county", then voter.file
must have a column named county.
If census.geo = "tract", then voter.file
must have columns named county and tract.
If census.geo = "block", then voter.file
must have columns named county, tract, and block.
If census.geo = "place", then voter.file
must have a column named place.
Specifying census.geo will call the census_helper function
to merge Census geographic data at the specified level of geography.
|
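The column requirements above can be checked before calling predict_race. This is a minimal sketch of such a pre-flight check; required_geo_cols and missing_geo_cols are hypothetical helpers written for this illustration, not wru functions.

```r
# Hypothetical helper (not part of wru): columns that voter.file needs for
# each census.geo level, as listed in the argument description.
required_geo_cols <- function(census.geo) {
  switch(
    census.geo,
    county = "county",
    tract  = c("county", "tract"),
    block  = c("county", "tract", "block"),
    place  = "place",
    stop("unsupported census.geo: ", census.geo)
  )
}

# Hypothetical helper: required columns (including state) that are absent
# from voter.file.
missing_geo_cols <- function(voter.file, census.geo) {
  setdiff(c("state", required_geo_cols(census.geo)), names(voter.file))
}

vf <- data.frame(surname = "Imai", state = "NJ",
                 county = "021", tract = "004000",
                 stringsAsFactors = FALSE)

missing_geo_cols(vf, "tract")  # nothing missing for tract-level prediction
missing_geo_cols(vf, "block")  # block-level prediction needs a block column
```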
census.key |
A character object specifying the user's Census API
key. Required if census.geo is specified, because
a valid Census API key is required to download Census geographic data.
|
census.data |
A list indexed by two-letter state abbreviations,
which contains pre-saved Census geographic data.
Can be generated using get_census_data function.
|
age |
An optional TRUE/FALSE object specifying whether to
condition race predictions on age (in addition to surname and geolocation).
Default is FALSE. Must match the age setting used to build the census.data object.
May only be set to TRUE if the census.geo option is specified.
If TRUE, voter.file should include a numerical variable age.
|
sex |
An optional TRUE/FALSE object specifying whether to
condition race predictions on sex (in addition to surname and geolocation).
Default is FALSE. Must match the sex setting used to build the census.data object.
May only be set to TRUE if the census.geo option is specified.
If TRUE, voter.file should include a numerical variable sex,
where sex is coded as 0 for males and 1 for females.
|
party |
An optional character object specifying the party registration field
in voter.file, e.g., party = "PartyReg".
If specified, race/ethnicity predictions will be conditioned
on each individual's party registration (in addition to geolocation).
Whatever the name of the party registration field in voter.file,
it should be coded as 1 for Democrat, 2 for Republican, and 0 for Other.
|
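Voter files rarely arrive with sex and party already in the numeric codings described above. The sketch below recodes both in base R. The raw labels ("F"/"M", "DEM"/"REP"/"UNA") and the column name sex_raw are assumptions about a hypothetical source file, not values defined by wru.

```r
# Hypothetical raw voter records with text labels.
raw <- data.frame(
  surname  = c("Khanna", "Imai", "Velasco"),
  sex_raw  = c("F", "M", "F"),
  PartyReg = c("DEM", "UNA", "REP"),
  stringsAsFactors = FALSE
)

voter_file <- raw["surname"]

# sex: 0 for males, 1 for females; any other label becomes NA rather than
# being silently assigned to one group.
voter_file$sex <- ifelse(raw$sex_raw == "F", 1,
                         ifelse(raw$sex_raw == "M", 0, NA))

# party: 1 for Democrat, 2 for Republican, 0 for everything else.
party_codes <- c(DEM = 1, REP = 2)
voter_file$PartyReg <- unname(party_codes[raw$PartyReg])
voter_file$PartyReg[is.na(voter_file$PartyReg)] <- 0
```

The recoded field keeps its original name, so it would be passed as party = "PartyReg".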
retry |
The number of times to retry a request to the Census website if a network interruption occurs. Default is 0.
|
Details
This function implements the Bayesian race prediction methods outlined in
Imai and Khanna (2015). The function produces probabilistic estimates of
individual-level race/ethnicity, based on surname, geolocation, and party.
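As a rough illustration of the kind of Bayesian update involved, suppose surname and geolocation are conditionally independent given race, so that Pr(Race | Surname, Geo) is proportional to Pr(Race | Surname) x Pr(Geo | Race). The sketch below applies that rule to one individual with made-up numbers. It is not wru's internal code, and the variable names are chosen only for this example.

```r
races <- c("whi", "bla", "his", "asi", "oth")

# Made-up inputs for one individual.
p_race_given_surname <- c(whi = 0.60, bla = 0.30, his = 0.05, asi = 0.02, oth = 0.03)
p_geo_given_race     <- c(whi = 0.001, bla = 0.004, his = 0.002, asi = 0.001, oth = 0.002)

# Bayes' rule up to a constant, then normalize so the probabilities sum to 1.
unnormalized <- p_race_given_surname[races] * p_geo_given_race[races]
posterior <- unnormalized / sum(unnormalized)

round(posterior, 3)
```

In this invented case the surname alone favors "whi", but the geographic term shifts most of the posterior mass to "bla", which is the behavior the geolocation step is meant to capture.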
Value
Output will be an object of class data.frame. It will
consist of the original user-input data with additional columns containing
predicted probabilities for each of the five major racial categories:
pred.whi for White,
pred.bla for Black,
pred.his for Hispanic/Latino,
pred.asi for Asian/Pacific Islander, and
pred.oth for Other/Mixed.
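A common follow-up is to label each row with its most probable category. The sketch below does this on a mock data frame that has the five pred.* columns. The values are made up, and the column name pred.race is a hypothetical addition, not something predict_race returns.

```r
# Mock output with the five pred.* columns; values are made up.
out <- data.frame(
  surname  = c("Khanna", "Imai", "Velasco"),
  pred.whi = c(0.05, 0.03, 0.07),
  pred.bla = c(0.01, 0.01, 0.01),
  pred.his = c(0.01, 0.02, 0.88),
  pred.asi = c(0.88, 0.91, 0.02),
  pred.oth = c(0.05, 0.03, 0.02),
  stringsAsFactors = FALSE
)

pred_cols <- c("pred.whi", "pred.bla", "pred.his", "pred.asi", "pred.oth")

# Index of the largest probability in each row, mapped back to a short label.
best <- max.col(as.matrix(out[, pred_cols]), ties.method = "first")
out$pred.race <- sub("^pred\\.", "", pred_cols[best])
```

Collapsing to a single label discards the uncertainty in the estimates, so aggregate analyses are usually better served by working with the probabilities directly.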
Examples
data(voters)
predict_race(voters, surname.only = TRUE)
predict_race(voter.file = voters, surname.only = TRUE)
## Not run: predict_race(voter.file = voters, census.geo = "tract", census.key = "...")
## Not run: predict_race(voter.file = voters, census.geo = "tract", census.key = "...", age = TRUE)
## Not run: predict_race(voter.file = voters, census.geo = "place", census.key = "...", sex = TRUE)
## Not run:
CensusObj <- get_census_data("...", state = c("NY", "DC", "NJ"))
predict_race(voter.file = voters, census.geo = "tract", census.data = CensusObj, party = "PID")
## End(Not run)
## Not run:
CensusObj2 <- get_census_data(key = "...", state = c("NY", "DC", "NJ"), age = TRUE, sex = TRUE)
predict_race(voter.file = voters, census.geo = "tract", census.data = CensusObj2, age = TRUE, sex = TRUE)
## End(Not run)
## Not run:
CensusObj3 <- get_census_data(key = "...", state = c("NY", "DC", "NJ"), census.geo = "place")
predict_race(voter.file = voters, census.geo = "place", census.data = CensusObj3)
## End(Not run)
[Package wru version 0.1-12 Index]