Handling missing/incomplete data in R--is there a function to mask but not remove NAs?

Posted by doug on Stack Overflow
Published on 2010-04-10T12:52:47Z Indexed on 2010/04/11 22:13 UTC

As you would expect from a DSL aimed at data analysis, R handles missing/incomplete data very well. For instance:

Many R functions have an 'na.rm' argument that you can set to 'TRUE' to ignore the NAs:

mean(c(5, 6, 12, 87, 9, NA, 43, 67), na.rm = TRUE)
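As a minimal sketch of what that flag does (the variable names here are only for illustration), note that 'na.rm' skips the NAs for that one call and leaves the vector itself untouched:

```r
# Same values as in the example above, stored in a vector
x <- c(5, 6, 12, 87, 9, NA, 43, 67)

mean_default <- mean(x)                # NA: a single NA propagates through the mean
mean_skipped <- mean(x, na.rm = TRUE)  # mean of the 7 non-missing values
sum_skipped  <- sum(x, na.rm = TRUE)   # other summary functions accept na.rm too

# The NA is still in x; it was only ignored inside each call
still_has_na <- any(is.na(x))          # TRUE
```

So for functions that support it, 'na.rm' already behaves like a per-call mask; the difficulty starts with functions that lack the argument.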

But if you want to deal with NAs before the function call, you need to do something like this:

to remove each 'NA' from a vector:

vx = vx[!is.na(vx)]

to replace each 'NA' in a vector with a '0':

vx = ifelse(is.na(vx), 0, vx)

to remove each entire row that contains an 'NA' from a data frame:

dfx = dfx[complete.cases(dfx),]

All of these idioms permanently remove the 'NA' values, or the rows with an 'NA' in them, from the object.

Sometimes this isn't quite what you want, though. Making an 'NA'-excised copy of the data frame might be necessary for the next step in the workflow, but in subsequent steps you often want those rows back (e.g., to calculate a column-wise statistic for a column that has no 'NA' values of its own but lost rows to a prior call to 'complete.cases').
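To make the problem concrete, here is a small sketch with a made-up data frame. The name 'dfx' matches the snippets above; the column names 'a' and 'b' and all values are invented for illustration. Column 'b' has no 'NA' values, yet its mean changes once 'complete.cases' has dropped the rows where 'a' is missing:

```r
# Hypothetical data: 'a' has two NAs, 'b' is complete
dfx <- data.frame(a = c(1, NA, 3, NA, 5),
                  b = c(10, 20, 30, 40, 100))

# NA-excised copy, as an intermediate step might require
dfx_complete <- dfx[complete.cases(dfx), ]

n_all      <- nrow(dfx)           # 5 rows in the original
n_complete <- nrow(dfx_complete)  # 3 rows survive (rows 1, 3 and 5)

# 'b' never had an NA, but two of its values went with the dropped rows
mean_b_all      <- mean(dfx$b)           # uses all five values of 'b'
mean_b_complete <- mean(dfx_complete$b)  # uses only the three surviving values
```

The two means differ (40 versus roughly 46.7), which is why a step that needs the full column has to go back to the original 'dfx' rather than the excised copy.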

To be as clear as possible about what I'm looking for: Python/NumPy has a class, 'masked array', with a 'mask' attribute, which lets you conceal, but not remove, NAs during a function call. Is there an analogous function in R?

© Stack Overflow or respective owner
