Extract information from conditional formula

Posted by Ken Williams on Stack Overflow
Published on 2010-03-11T17:41:29Z

I'd like to write an R function that accepts a formula as its first argument, similar to lm() or glm() and friends. In this case, it's a function that takes a data frame and writes out a file in SVMLight format, which has this general form:

<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. +1 | -1 | 0 | <float> 
<feature> .=. <integer> | "qid"
<value> .=. <float>
<info> .=. <string>

For example, the following data frame:

  result qid     f1     f2     f3     f4   f5     f6     f7     f8
1     -1   1 0.0000 0.1253 0.0000 0.1017 0.00 0.0000 0.0000 0.9999
2     -1   1 0.0098 0.0000 0.0000 0.0000 0.00 0.0316 0.0000 0.3661
3      1   1 0.0000 0.0000 0.1941 0.0000 0.00 0.0000 0.0509 0.0000
4     -1   2 0.0000 0.2863 0.0948 0.0000 0.34 0.0000 0.7428 0.0608
5      1   2 0.0000 0.0000 0.0000 0.4347 0.00 0.0000 0.9539 0.0000
6      1   2 0.0000 0.7282 0.9087 0.0000 0.00 0.0000 0.0000 0.0355

would be represented as follows:

-1 qid:1 2:0.1253 4:0.1017 8:0.9999
-1 qid:1 1:0.0098 6:0.0316 8:0.3661
1  qid:1 3:0.1941 7:0.0509
-1 qid:2 2:0.2863 3:0.0948 5:0.3400 7:0.7428 8:0.0608
1  qid:2 4:0.4347 7:0.9539
1  qid:2 2:0.7282 3:0.9087 8:0.0355
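
The mapping from rows to lines can be sketched in base R as below. This is only an illustration, not a finished implementation: `svmlight_lines` is a hypothetical helper name, the columns are passed in by name rather than derived from a formula, zero-valued features are skipped, and the four-decimal formatting simply mirrors the example output (with a single space after the target instead of the aligned padding).

```r
# Hypothetical helper: format each row of a data frame as one SVMLight line.
# Assumes all feature columns are numeric; feature ids are positions in `features`.
svmlight_lines <- function(data, target, features, qid = NULL, digits = 4) {
  vapply(seq_len(nrow(data)), function(i) {
    values <- unlist(data[i, features, drop = FALSE], use.names = FALSE)
    keep <- which(values != 0)  # sparse format: zero entries are omitted
    pairs <- if (length(keep) > 0) {
      paste0(keep, ":", formatC(values[keep], format = "f", digits = digits))
    } else {
      character(0)              # a row with no non-zero features
    }
    qid_part <- if (is.null(qid)) character(0) else paste0("qid:", data[[qid]][i])
    paste(c(data[[target]][i], qid_part, pairs), collapse = " ")
  }, character(1))
}

# The example data frame from above.
mydata <- read.table(header = TRUE, text = "
result qid f1 f2 f3 f4 f5 f6 f7 f8
-1 1 0.0000 0.1253 0.0000 0.1017 0.00 0.0000 0.0000 0.9999
-1 1 0.0098 0.0000 0.0000 0.0000 0.00 0.0316 0.0000 0.3661
1 1 0.0000 0.0000 0.1941 0.0000 0.00 0.0000 0.0509 0.0000
-1 2 0.0000 0.2863 0.0948 0.0000 0.34 0.0000 0.7428 0.0608
1 2 0.0000 0.0000 0.0000 0.4347 0.00 0.0000 0.9539 0.0000
1 2 0.0000 0.7282 0.9087 0.0000 0.00 0.0000 0.0000 0.0355
")

lines <- svmlight_lines(mydata, "result", paste0("f", 1:8), qid = "qid")
writeLines(lines)  # pass a file name or connection as `con` to write to disk
```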

The function I'd like to write would be called something like this:

write.svmlight(result ~ f1+f2+f3+f4+f5+f6+f7+f8 | qid, data=mydata, file="out.txt")

Or even

write.svmlight(result ~ . | qid, data=mydata, file="out.txt")

But I can't figure out how to use model.matrix() and/or model.frame() to determine which columns it's supposed to write. Are these the right things to be looking at?
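
One possible direction, offered as a rough sketch rather than a definitive answer: a formula is a call object, so the `|` on the right-hand side can be peeled off by hand, and `terms()` can then expand `.` against the data. The helper name `parse_conditional_formula` and the demo data frame are hypothetical, and the sketch assumes the right-hand side only lists plain columns (no interactions or transformations).

```r
# Hypothetical helper: split `target ~ features | group` into column names.
parse_conditional_formula <- function(formula, data) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3)
  rhs <- formula[[3]]
  group <- NULL
  # `y ~ a + b | g` parses as `~`(y, `|`(a + b, g)), so the conditioning
  # variable is the second argument of the `|` call.
  if (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    group <- all.vars(rhs[[3]])
    rhs <- rhs[[2]]
  }
  target <- all.vars(formula[[2]])
  # Rebuild `target ~ features` so terms() can expand `.` using the data.
  plain <- formula
  plain[[3]] <- rhs
  features <- attr(terms(plain, data = data), "term.labels")
  # `.` expands to every non-response column, so drop the grouping column.
  features <- setdiff(features, group)
  list(target = target, features = features, group = group)
}

# Small demo data frame (made up for illustration).
demo <- data.frame(result = c(-1, 1), qid = c(1, 1),
                   f1 = c(0, 0.5), f2 = c(0.1, 0))

parts_dot  <- parse_conditional_formula(result ~ . | qid, demo)
parts_list <- parse_conditional_formula(result ~ f1 + f2 | qid, demo)
str(parts_dot)
```

The returned names could then be used to pick the target, qid, and feature columns out of the data frame before writing each line.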

Any help much appreciated!
