Search Results

Search found 4 results on 1 pages for 'briandk'.

Page 1/1 | 1

How can I structure and recode messy categorical data in R?

- by briandk

I'm struggling with how to best structure categorical data that's messy, and comes from a dataset I'll need to clean. The Coding Scheme I'm analyzing data from a university science course exam. We're looking at patterns in student responses, and we developed a coding scheme to represent the kinds of things students are doing in their answers. A subset of the coding scheme is shown below. Note that within each major code (1, 2, 3) are nested non-unique sub-codes (a, b, ...). What the Raw Data Looks Like I've created an anonymized, raw subset of my actual data which you can view here. Part of my problem is that those who coded the data noticed that some students displayed multiple patterns. The coders' solution was to create enough columns (reason1, reason2, ...) to hold students with multiple patterns. That becomes important because the order (reason1, reason2) is arbitrary--two students (like student 41 and student 42 in my dataset) who correctly applied "dependency" should both register in an analysis, regardless of whether 3a appears in the reason column or the reason2 column. How Can I Best Structure Student Data? Part of my problem is that in the raw data, not all students display the same patterns, or the same number of them, in the same order. Some students may do just one thing, others may do several. So, an abstracted representation of example students might look like this: Note in the example above that student002 and student003 both are coded as "1b", although I've deliberately shown the order as different to reflect the reality of my data. My (Practical) Questions Should I concatenate reason1, reason2, ... into one column? How can I (re)code the reasons in R to reflect the multiplicity for some students? Thanks I realize this question is as much about good data conceptualization as it is about specific features of R, but I thought it would be appropriate to ask it here. If you feel it's inappropriate for me to ask the question, please let me know in the comments, and stackoverflow will automatically flood my inbox with sadface emoticons. If I haven't been specific enough, please let me know and I'll do my best to be clearer.

Read the article
How can I superimpose modified loess lines on a ggplot2 qplot?

- by briandk

Background Right now, I'm creating a multiple-predictor linear model and generating diagnostic plots to assess regression assumptions. (It's for a multiple regression analysis stats class that I'm loving at the moment :-) My textbook (Cohen, Cohen, West, and Aiken 2003) recommends plotting each predictor against the residuals to make sure that: The residuals don't systematically covary with the predictor The residuals are homoscedastic with respect to each predictor in the model On point (2), my textbook has this to say: Some statistical packages allow the analyst to plot lowess fit lines at the mean of the residuals (0-line), 1 standard deviation above the mean, and 1 standard deviation below the mean of the residuals....In the present case {their example}, the two lines {mean + 1sd and mean - 1sd} remain roughly parallel to the lowess {0} line, consistent with the interpretation that the variance of the residuals does not change as a function of X. (p. 131) How can I modify loess lines? I know how to generate a scatterplot with a "0-line,": # First, I'll make a simple linear model and get its diagnostic stats library(ggplot2) data(cars) mod <- fortify(lm(speed ~ dist, data = cars)) attach(mod) str(mod) # Now I want to make sure the residuals are homoscedastic qplot (x = dist, y = .resid, data = mod) + geom_smooth(se = FALSE) # "se = FALSE" Removes the standard error bands But does anyone know how I can use ggplot2 and qplot to generate plots where the 0-line, "mean + 1sd" AND "mean - 1sd" lines would be superimposed? Is that a weird/complex question to be asking?

Read the article
How can I manipulate the strip text of facet plots in ggplot2?

- by briandk

I'm wondering how I can manipulate the size of strip text in facetted plots. My question is similar to a question on plot titles, but I'm specifically concerned with manipulating not the plot title but the text that appears in facet titles (strip_h). As an example, consider the mpg dataset. library(ggplot2) qplot(hwy, cty, data = mpg) + facet_grid( . ~ manufacturer) The resulting output produces some facet titles that don't fit in the strip. I'm thinking there must be a way to use grid to deal with the strip text. But I'm still a novice and wasn't sure from the grid appendix in Hadley's book how, precisely, to do it. Also, I was afraid if I did it wrong it would break my washing machine, since I believe all technology is connected through The Force :-( Many thanks in advance.

Read the article
How can I neatly clean my R workspace while preserving certain objects?

- by briandk

Suppose I'm messing about with some data by binding vectors together, as I'm wont to do on a lazy sunday afternoon. x <- rnorm(25, mean = 65, sd = 10) y <- rnorm(25, mean = 75, sd = 7) z <- 1:25 dd <- data.frame(mscore = x, vscore = y, caseid = z) I've now got my new dataframe dd, which is wonderful. But there's also still the detritus from my prior slicings and dicings: > ls() [1] "dd" "x" "y" "z" What's a simple way to clean up my workspace if I no longer need my "source" columns, but I want to keep the dataframe? That is, now that I'm done manipulating data I'd like to just have dd and none of the smaller variables that might inadvertently mask further analysis: > ls() [1] "dd" I feel like the solution must be of the form rm(ls[ -(dd) ]) or something, but I can't quite figure out how to say "please clean up everything BUT the following objects."

Read the article

1