Behavior of <- NULL on lists versus data.frames for removing data

Posted by Ananda Mahto on Stack Overflow See other posts from Stack Overflow or by Ananda Mahto
Published on 2013-10-17T18:49:34Z Indexed on 2013/10/17 21:54 UTC
Read the original article Hit count: 228

Filed under:
|

Many R users eventually figure out lots of ways to remove elements from their data. One way is to use NULL, particularly when you want to do something like drop a column from a data.frame or drop an element from a list.

Eventually, a user comes across a situation where they want to drop several columns from a data.frame at once, and they hit upon <- list(NULL) as the solution (since using <- NULL will result in an error).

A data.frame is a special type of list, so it wouldn't be too tough to imagine that the approaches for removing items from a list should be the same as removing columns from a data.frame. However, they produce different results, as can be seen in the example below.

## Make some small data--two data.frames and two lists
cars1 <- cars2 <- head(mtcars)[1:4]
cars3 <- cars4 <- as.list(cars2)

## Demonstration that the `list(NULL)` approach works
cars1[c("mpg", "cyl")] <- list(NULL)
cars1
#                   disp  hp
# Mazda RX4          160 110
# Mazda RX4 Wag      160 110
# Datsun 710         108  93
# Hornet 4 Drive     258 110
# Hornet Sportabout  360 175
# Valiant            225 105

## Demonstration that simply using `NULL` does not work
cars2[c("mpg", "cyl")] <- NULL
# Error in `[<-.data.frame`(`*tmp*`, c("mpg", "cyl"), value = NULL) : 
#   replacement has 0 items, need 12

Switch to applying the same concept to a list, and compare the difference in behavior.

## Does not fully drop the items, but sets them to `NULL`
cars3[c("mpg", "cyl")] <- list(NULL)
# $mpg
# NULL
# 
# $cyl
# NULL
# 
# $disp
# [1] 160 160 108 258 360 225
# 
# $hp
# [1] 110 110  93 110 175 105

## *Does* drop the `list` items while this would
##   have produced an error with a `data.frame`
cars4[c("mpg", "cyl")] <- NULL
# $disp
# [1] 160 160 108 258 360 225
# 
# $hp
# [1] 110 110  93 110 175 105

The main questions I have are, if a data.frame is a list, why does it behave so differently in this scenario? Is there a foolproof way of knowing when an element will be dropped, when it will produce an error, and when it will simply be given a NULL value? Or do we depend on trial-and-error for this?

© Stack Overflow or respective owner

Related posts about r

    Related posts about data.frame