Partitioning a data set in R based on multiple classes of observations

Posted by Danny on Stack Overflow
Published on 2012-11-23T22:44:38Z

Filed under: r, partitioning, random-sample

I'm trying to partition a data set in R into 2/3 for training and 1/3 for testing. I have one classification variable and seven numerical variables. Each observation is classified as either A, B, C, or D.

For simplicity's sake, let's say that the classification variable, cl, is A for the first 100 observations, B for observations 101 to 200, C for observations 201 to 300, and D for observations 301 to 400. I'm trying to get a partition that has 2/3 of the observations for each of A, B, C, and D (as opposed to simply taking 2/3 of the observations from the entire data set, since the real data will likely not have equal numbers of each classification).

When I try to sample from a subset of the data, such as sample(subset(data, cl=='A')), the columns are reordered instead of the rows.
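The column reordering follows from how R stores data frames: a data frame is a list whose elements are its columns, so sample() permutes those elements. Below is a minimal sketch of the difference, assuming a hypothetical stand-in data frame named dat with made-up numeric columns x1 and x2 (the real data has seven numeric variables).

```r
# Hypothetical stand-in for the data: 400 rows, class column `cl`,
# and two made-up numeric columns (x1, x2).
set.seed(1)
dat <- data.frame(
  cl = rep(c("A", "B", "C", "D"), each = 100),
  x1 = rnorm(400),
  x2 = rnorm(400)
)

a_rows <- subset(dat, cl == "A")

# sample() on a data frame permutes its columns, because a data frame
# is a list whose elements are the columns.
shuffled_cols <- sample(a_rows)

# To shuffle rows instead, sample the row indices and subset with them.
shuffled_rows <- a_rows[sample(nrow(a_rows)), ]

# To draw 67 random rows of class A, sample 67 row indices.
train_a <- a_rows[sample(nrow(a_rows), 67), ]
```

Sampling indices rather than the data frame itself keeps every column in place and only changes which rows are selected.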

To summarize, my goal is to have 67 random observations from each of A, B, C, and D as my training data, and store the remaining 33 observations for each of A, B, C, and D as testing data. I have found a very similar question to mine, but it did not factor in multiple variables.
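One possible way to get such a per-class split in base R is to group the row indices by cl and sample 2/3 of each group. This is only a sketch under assumptions: dat and x1 are hypothetical names standing in for the real data, and round() is used to turn 2/3 of a class size into a whole number of rows.

```r
# Hypothetical stand-in for the data: 100 rows for each of A, B, C, D.
set.seed(42)
dat <- data.frame(
  cl = rep(c("A", "B", "C", "D"), each = 100),
  x1 = rnorm(400)
)

# Group the row numbers by class, then draw about 2/3 of each group.
# sample.int() is used on positions so that a class with a single row
# is not misread by sample() as "sample from 1:n".
rows_by_class <- split(seq_len(nrow(dat)), dat$cl)
train_idx <- unlist(
  lapply(rows_by_class, function(idx) {
    idx[sample.int(length(idx), round(2 / 3 * length(idx)))]
  }),
  use.names = FALSE
)

train <- dat[train_idx, ]   # rows drawn for training
test  <- dat[-train_idx, ]  # all remaining rows

table(train$cl)
table(test$cl)
```

With 100 rows per class, this yields 67 training rows and 33 testing rows for each of A, B, C, and D. Because the sampling is done within each class, it also works when the classes have unequal sizes.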

I feel silly asking this question because it seems so simple, but I'm stumped. Also, this is my first question on this site, so I apologize in advance for any faux pas on my part.

