Search Results

Search found 10 results on 1 page for 'rodbc'.

  • RODBC string getting truncated

    - by sayan dasgupta
    Hi all, I am fetching data from a MySQL server into R using RODBC. One column of the table holds character data, and

        SELECT MAX(CHAR_LENGTH(column)) FROM reqtable;

    returns 26566. Here is an example of how I run into the problem:

        library(RODBC)
        con <- odbcConnect("mysqlcon")
        rslts <- as.numeric(sqlQuery(con, "SELECT CHAR_LENGTH(column) FROM reqtable LIMIT 10", as.is=TRUE)[,1])

    returns

        > rslts
        [1] 62 31 17 103 30 741 28 73 25 357

    whereas

        rslts <- nchar(as.character(sqlQuery(con, "SELECT column FROM reqtable LIMIT 10", as.is=TRUE)[,1]))

    returns

        > rslts
        [1] 62 31 17 103 30 255 28 73 25 255

    So strings longer than 255 characters are being truncated at 255. Is there a way I can get the full string? Thanks

    Read the article

  • how to set charset for MySQL in RODBC

    - by lokheart
    I have data with Chinese characters as field names and values. I imported it from xls into Access 2007 and exported it to ODBC. Then I use RODBC to read it into R. The field names are OK, but in the data all of the Chinese characters are shown as ?. I have read the RODBC manual, which says: "If it is possible to set the DBMS or ODBC driver to communicate in the character set of the R session then this should be done. For example, MySQL can set the communication character set via SQL, e.g. SET NAMES 'utf8'." I guess this is the problem, but how can I send this command to MySQL via RODBC? Thanks!

    Read the article

  • RODBC sqlSave and column names

    - by waanders
    I have a question about using sqlSave. How does RODBC map the data in the data frame to the database table columns? If I have a table with columns X and Y and a data frame with columns X and Y, RODBC puts X into X and Y into Y (I found this out by trial and error). But can I explicitly tell R how to map data.frame columns to database table columns, e.g. put A in X and B in Y? I'm rather new to R and find the RODBC manual a bit cryptic, and I can't find an example on the internet.
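
One possible approach, sketched under the assumption (consistent with the poster's trial and error) that sqlSave matches columns by name: rename the data frame columns to the table's column names before saving. The names `col_map`, `to_save`, `con` and `mytable` are illustrative only, and the sqlSave() call is left commented out because it needs a live ODBC connection.

```r
# Example data frame with columns A and B (illustrative data)
df <- data.frame(A = 1:3, B = c("u", "v", "w"), stringsAsFactors = FALSE)

# Explicit mapping: data frame column name -> database table column name
col_map <- c(A = "X", B = "Y")

# Keep only the mapped columns, in mapping order, and rename them
to_save <- df[names(col_map)]
names(to_save) <- unname(col_map)

# With a live connection `con` (hypothetical), the renamed frame could then be appended:
# library(RODBC)
# sqlSave(con, to_save, tablename = "mytable", append = TRUE, rownames = FALSE)
```

Keeping the mapping in one named vector makes it easy to see which data frame column ends up in which table column.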

    Read the article

  • How to convert searchTwitter results (from library(twitteR)) into a data.frame?

    - by analyticsPierce
    I am working on saving Twitter search results into a database (SQL Server) and am getting an error when I pull the search results from twitteR. If I execute:

        library(twitteR)
        puppy <- as.data.frame(searchTwitter("puppy", session=getCurlHandle(),num=100))

    I get this error:

        Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class structure("status", package = "twitteR") into a data.frame

    This matters because sqlSave in RODBC needs a data.frame in order to add the results to a table. At least that's the error message I got:

        Error in sqlSave(localSQLServer, puppy, tablename = "puppy_staging", : should be a data frame

    So does anyone have any suggestions on how to coerce the list to a data.frame, or how I can load the list through RODBC?
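
The general technique for turning a list of records into a data.frame can be sketched in base R as below. The plain lists here are hypothetical stand-ins for twitteR status objects (the field names are chosen for illustration), so this shows the row-binding idea rather than the package's own conversion. The twitteR package also ships a helper, twListToDF(), which is intended for exactly this case.

```r
# Hypothetical stand-ins for the status objects returned by searchTwitter()
statuses <- list(
  list(screenName = "a_user", text = "puppy one", id = "1"),
  list(screenName = "b_user", text = "puppy two", id = "2")
)

# Build a one-row data.frame per element, then stack the rows
rows <- lapply(statuses, function(s) {
  data.frame(screenName = s$screenName, text = s$text, id = s$id,
             stringsAsFactors = FALSE)
})
puppy <- do.call(rbind, rows)

# With real twitteR results, the equivalent would be along the lines of:
# puppy <- twListToDF(searchTwitter("puppy"))
```

The resulting `puppy` object is an ordinary data.frame, which is what sqlSave expects.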

    Read the article

  • Copy of Access mdb database being updated by live database

    - by James
    I'm trying to compute statistics for data held in an Access .mdb database. In order to avoid interfering with the live database, I'm working from a copy which I made by simply using copy-paste in Windows Explorer. The copy resides in the same directory, but with a different name. I'm using R and RODBC to connect to the copy of the file. The strange thing is that new data that is being updated on the original live database is appearing in my queries. This is despite the file timestamps of the copy not changing at all. It is also causing some slowdown in the live database. My understanding is that the .mdb files are standalone, or is this not the case? Should I have copied the database in a different way?

    Read the article

  • Solving Big Problems with Oracle R Enterprise, Part I

    - by dbayard
    Abstract: This blog post will show how we used Oracle R Enterprise to tackle a customer’s big calculation problem across a big data set.

    Overview: Databases are great for managing large amounts of data in a central place with rigorous enterprise-level controls. R is great for doing advanced computations. Sometimes you need to do advanced computations on large amounts of data, subject to rigorous enterprise-level concerns. This blog post shows how Oracle R Enterprise, which combines R with the Oracle Database, enabled us to do some pretty sophisticated calculations across 1 million accounts (each with many detailed records) in minutes.

    The problem: A financial services customer of mine needs to calculate the historical internal rate of return (IRR) for its customers’ portfolios. This information is needed for customer statements and the online web application. In the past, they had solved this with a home-grown application that pulled trade and account data out of their data warehouse and ran the calculations. But this home-grown application was not fast enough, and it was a challenge for them to write and maintain the code that did the IRR calculation.

    IRR – a problem that R is good at solving: Internal Rate of Return is an interesting calculation in that in most real-world scenarios it is impractical to calculate exactly. Rather, IRR is a calculation where approximation techniques need to be used. In this blog post, we will discuss calculating the “money weighted rate of return”, but in the actual customer proof of concept we used R to calculate both money weighted and time weighted rates of return. You can learn more about the money weighted rate of return here: http://www.wikinvest.com/wiki/Money-weighted_return

    First Steps- Calculating IRR in R

    We will start with calculating the IRR in standalone/desktop R.
    In our second post, we will show how to take this desktop R function, deploy it to an Oracle Database, and make it work at real-world scale.

    The first step was to get some sample data. For a historical IRR calculation, you have balances and cash flows. In our case, the customer provided us with several accounts' worth of sample data in Microsoft Excel.

    [Figure: part of the spreadsheet of sample data.] The data provides balances and cash flows for a sample account (BMV=beginning market value. FLOW=cash flow in/out of account. EMV=ending market value).

    Once we had the sample spreadsheet, the next step was to read the Excel data into R. This is something that R does well. R offers multiple ways to work with spreadsheet data. For instance, one could save the spreadsheet as a .csv file. In our case, the customer provided a spreadsheet file containing multiple sheets, where each sheet provided data for a different sample account. To handle this easily, we took advantage of the RODBC package, which allowed us to read the Excel data sheet-by-sheet without having to create individual .csv files. We wrote ourselves a little helper function called getsheet() around the RODBC package. Then we loaded all of the sample accounts into a data.frame called SimpleMWRRData.

    Writing the IRR function

    At this point, it was time to write the money weighted rate of return (MWRR) function itself. The definition of MWRR is easily found on the internet or, if you are old school, in an investment performance text book. In the customer proof, we based our calculations on the ones defined in The Handbook of Investment Performance: A User’s Guide by David Spaulding, since this is the reference book used by the customer.
    (One of the nice things we found during the course of this proof-of-concept is that by using R to write our IRR functions we could easily incorporate the specific variations and business rules of the customer into the calculation.)

    The key thing with calculating IRR is the need to solve a complex equation with a numerical approximation technique. For IRR, you need to find the value of the rate of return (r) that sets the Net Present Value of all the flows in and out of the account to zero. With R, we solve this by defining our NPV function:

    where bmv is the beginning market value, cf is a vector of cash flows, t is a vector of time (relative to the beginning), emv is the ending market value, and tend is the ending time.

    Since solving for r is a one-dimensional optimization problem, we decided to take advantage of R’s optimize method (http://stat.ethz.ch/R-manual/R-patched/library/stats/html/optimize.html). The optimize method can be used to find a minimum or maximum; to find the value of r where our npv function is closest to zero, we wrapped our npv function inside the abs function and asked optimize to find the minimum. Here is an example of using optimize:

    where low and high are scalars that indicate the range to search for an answer.

    To test this out, we need to set values for bmv, cf, t, emv, tend, low, and high. We will set low and high to some reasonable defaults. For example, this account had a negative 2.2% money weighted rate of return.

    Enhancing and Packaging the IRR function

    With numerical approximation methods like optimize, sometimes you will not be able to find an answer with your initial set of inputs. To account for this, our approach was to first try to find an answer for r within a narrow range, then if we did not find an answer, try calling optimize() again with a broader range. See the R help page on optimize() for more details about the search range and its algorithm.
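
As a rough illustration of the NPV-plus-optimize() approach described above (not the author's original listings), here is a minimal sketch. The sign convention inside `npv`, the helper name `solve_mwrr`, the default search ranges and the retry threshold are all assumptions made for this example; only the argument names bmv, cf, t, emv, tend, low and high come from the post.

```r
# Net present value at rate r. Assumed convention: the beginning value and the
# cash flows count as money put in, the ending value as money taken out.
npv <- function(r, bmv, cf, t, emv, tend) {
  bmv + sum(cf / (1 + r)^t) - emv / (1 + r)^tend
}

# Find r where |NPV| is smallest. Try a narrow range first and fall back to a
# broader one if the first search does not get close to zero.
solve_mwrr <- function(bmv, cf, t, emv, tend,
                       low = -0.5, high = 0.5, tol = 1e-9) {
  run_search <- function(lo, hi) {
    optimize(function(r) abs(npv(r, bmv, cf, t, emv, tend)),
             interval = c(lo, hi), tol = tol)
  }
  fit <- run_search(low, high)
  if (fit$objective > 1e-4 * abs(bmv)) {   # assumed "no answer found" threshold
    fit <- run_search(-0.99, 10)           # assumed broader range
  }
  fit$minimum
}

# No intermediate cash flows: 100 grows to 110 over one period
r1 <- solve_mwrr(bmv = 100, cf = numeric(0), t = numeric(0), emv = 110, tend = 1)

# One contribution of 100 halfway through the period
r2 <- solve_mwrr(bmv = 1000, cf = 100, t = 0.5, emv = 1150, tend = 1)
```

Wrapping the objective in abs() turns root finding into minimisation, which is what lets optimize() be used here. A root finder such as uniroot() would be an alternative when the search range is known to bracket the answer.
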
    At this point, we can now write a simplified version of our MWRR function. (Our real-world version is more sophisticated in that it calculates rates of return for 5 different time periods [since inception, last quarter, year-to-date, last year, year before last year] in a single invocation. In our actual customer proof, we also defined time-weighted rate of return calculations. The beauty of R is that it was very easy to add these enhancements and additional calculations to our IRR package.)

    To simplify code deployment, we then created a new package of our IRR functions and sample data. For this blog post, we only need to include our SimpleMWRR function and our SimpleMWRRData sample data. We created the shell of the package by calling:

    To turn this package skeleton into something usable, at a minimum you need to edit the SimpleMWRR.Rd and SimpleMWRRData.Rd files in the \man subdirectory. In those files, you need to at least provide a value for the “title” section. Once that is done, you can change directory to the IRR directory and type at the command-line:

    The myIRR package for this blog post (which has both SimpleMWRR source and SimpleMWRRData sample data) is downloadable from here: myIRR package

    Testing the myIRR package

    Here is an example of testing our IRR function once it was converted to an installable package:

    Calculating IRR for All the Accounts

    So far, we have shown how to calculate IRR for a single account. The real-world issue is how do you calculate IRR for all of the accounts? This is the kind of situation where we can leverage the “Split-Apply-Combine” approach (see http://www.cscs.umich.edu/~crshalizi/weblog/815.html). Given that our sample data can fit in memory, one easy approach is to use R’s “by” function. (Other approaches to Split-Apply-Combine such as plyr can also be used. See http://4dpiecharts.com/2011/12/16/a-quick-primer-on-split-apply-combine-problems/).
    Here is an example showing the use of “by” to calculate the money weighted rate of return for each account in our sample data set.

    Recap and Next Steps

    At this point, you’ve seen the power of R being used to calculate IRR. There were several good things:

      • R could easily work with the spreadsheets of sample data we were given
      • R’s optimize() function provided a nice way to solve for IRR: it was both fast and allowed us to avoid having to code our own iterative approximation algorithm
      • R was a convenient language to express the customer-specific variations, business rules, and exceptions that often occur in real-world calculations; these could be easily added to our IRR functions
      • The Split-Apply-Combine technique can be used to perform calculations of IRR for multiple accounts at once.

    However, there are several challenges yet to be conquered at this point in our story:

      • The actual data that needs to be used lives in a database, not in a spreadsheet
      • The actual data is much, much bigger: too big to fit into the normal R memory space and too big to want to move across the network
      • The overall process needs to run fast, much faster than a single processor
      • The actual data needs to be kept secured, another reason to not want to move it from the database and across the network
      • And the process of calculating the IRR needs to be integrated together with other database ETL activities, so that IRRs can be calculated as part of the data warehouse refresh processes

    In our next blog post in this series, we will show you how Oracle R Enterprise solved these challenges.
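
The per-account “by” step mentioned in the post can be sketched as follows. This is an illustration only: the layout of the `accounts` data frame (one row per cash flow, with per-account bmv, emv and tend repeated on each row), the helper `mwrr_for_account` and the sample numbers are all assumptions, since the real SimpleMWRRData structure is not shown in this excerpt.

```r
# Net present value at rate r (assumed sign convention, as in a simple MWRR setup)
npv <- function(r, bmv, cf, t, emv, tend) {
  bmv + sum(cf / (1 + r)^t) - emv / (1 + r)^tend
}

# Rate of return for the rows belonging to one account
mwrr_for_account <- function(d) {
  optimize(function(r) abs(npv(r, d$bmv[1], d$flow, d$t, d$emv[1], d$tend[1])),
           interval = c(-0.5, 0.5), tol = 1e-9)$minimum
}

# Hypothetical sample data: one row per cash flow
accounts <- data.frame(
  account = c("A", "A", "B", "B"),
  bmv     = c(100, 100, 200, 200),
  emv     = c(112, 112, 190, 190),
  tend    = 1,
  t       = c(0.25, 0.75, 0.25, 0.75),
  flow    = c(10, -5, 0, 0)
)

# Split by account, apply the solver, combine the results
rates <- by(accounts, accounts$account, mwrr_for_account)
```

Printing `rates` shows one rate per account. The same function could be handed to plyr or to split() plus sapply() with little change.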

    Read the article

  • How to analyse Wikipedia article's data base with R?

    - by Tal Galili
    Hi all, This is a "big" question, that I don't know how to start, so I hope some of you can give me a direction. And if this is not a "good" question, I will close the thread with an apology. I wish to go through the database of Wikipedia (let's say the English one), and do statistics. For example, I am interested in how many active editors (which should be defined) Wikipedia had at each point of time (let's say in the last 2 years). I don't know how to build such a database, how to access it, how to know which types of data it has and so on. So my questions are: What tools do I need for this (besides basic R) ? MySQL on my computer? RODBC database connection? How do you start planning for such a project?

    Read the article

  • How to include multiple tables programmatically into a Sweave document using R

    - by PaulHurleyuk
    Hello, I want to have a Sweave document that includes a variable number of tables. I thought the example below would work, but it doesn't. I want to loop over the list foo and print each element as its own table.

        %
        \documentclass[a4paper]{article}
        \usepackage[OT1]{fontenc}
        \usepackage{longtable}
        \usepackage{geometry}
        \usepackage{Sweave}
        \geometry{left=1.25in, right=1.25in, top=1in, bottom=1in}
        \listfiles
        \begin{document}

        <<label=start, echo=FALSE, include=FALSE>>=
        startt<-proc.time()[3]
        library(RODBC)
        library(psych)
        library(xtable)
        library(plyr)
        library(ggplot2)
        options(width=80)
        #Produce some example data, here I'm creating some dummy dataframes and putting them in a list
        foo<-list()
        foo[[1]]<-data.frame(GRP=c(rep("AA",10), rep("Aa",10), rep("aa",10)), X1=rnorm(30), X2=rnorm(30,5,2))
        foo[[2]]<-data.frame(GRP=c(rep("BB",10), rep("bB",10), rep("BB",10)), X1=rnorm(30), X2=rnorm(30,5,2))
        foo[[3]]<-data.frame(GRP=c(rep("CC",12), rep("cc",18)), X1=rnorm(30), X2=rnorm(30,5,2))
        foo[[4]]<-data.frame(GRP=c(rep("DD",10), rep("Dd",10), rep("dd",10)), X1=rnorm(30), X2=rnorm(30,5,2))
        @

        \title{Document to test putting a variable number of tables into a sweave Document}
        \author{"Paul Hurley"}
        \maketitle
        \section{Text}
        This document was created on \today, with \Sexpr{print(version$version.string)} running on a \Sexpr{print(version$platform)} platform. It took approx \input{time} sec to process.

        <<label=test, echo=FALSE, results=tex>>=
        cat("Foo")
        @
        that was a test, so is this
        <<label=table1test, echo=FALSE, results=tex>>=
        print(xtable(foo[[1]]))
        @
        \newpage
        \subsection{Tables}
        <<label=Tables, echo=FALSE, results=tex>>=
        for(i in seq(foo)){
          cat("\n")
          cat(paste("Table_",i,sep=""))
          cat("\n")
          print(xtable(foo[[i]]))
          cat("\n")
        }
        #cat("<<label=endofTables>>= ")
        @
        <<label=bye, include=FALSE, echo=FALSE>>=
        endt<-proc.time()[3]
        elapsedtime<-as.numeric(endt-startt)
        @
        <<label=elapsed, include=FALSE, echo=FALSE>>=
        fileConn<-file("time.tex", "wt")
        writeLines(as.character(elapsedtime), fileConn)
        close(fileConn)
        @
        \end{document}

    Here, the table1test chunk works as expected and produces a table based on the dataframe in foo[[1]]. However, the loop only produces Table(underscore)1.... Any ideas what I'm doing wrong?

    Read the article

  • IN SQL operator in R-Shiny

    - by Piyush
    I am taking a multiple selection for the component, as in the code below.

        selectInput("cmpnt", "Choose Component:", choices = as.character(levels(Material_Data()$CMPNT_NM)),multiple = TRUE)

    But when I write a SQL statement as given below, it does not work, and it does not throw any error message either. When I was selecting one option at a time (without multiple = TRUE) it was working (since I was using the "=" operator). But after using "multiple=TRUE" I need to use the IN operator, which is not working.

        Input_Data2 <- fn$sqldf( paste0( "select * from Input_Data1 where MTRL_NBR = '$mtrl1' and CMPNT_NM in ('$cmpnt1')") )

    Thanks in advance for any help on this.

    Thanks jdharrison! Please find the detailed code:

        # server.R
        library(RODBC)
        library(shiny)
        library(sqldf)

        Input_Data <- readRDS("InputSource.rds")
        Mtrl <- factor(Input_Data$MTRL_NBR)
        Mtrl_List <- levels(Mtrl)

        shinyServer(function(input, output) {

          # First UI input (Service column) filter clientData
          output$Choose_Material <- renderUI({
            if (is.null(clientData())) return("No client selected")
            selectInput("mtrl", "Choose Material:",
                        choices = as.character(levels(clientData()$MTRL_NBR)),
                        selected = input$mtrl )
          })

          # Second UI input (Rounds column) filter service-filtered clientData
          output$Choose_Component <- renderUI({
            if(is.null(input$mtrl)) return()
            if (is.null(Material_Data())) return("No service selected")
            selectInput("cmpnt", "Choose Component:",
                        choices = as.character(levels(Material_Data()$CMPNT_NM)),multiple = TRUE)
          })

          # First data load (client data)
          clientData <- reactive({
            # get(input$Input_Data)
            return(Input_Data)
          })

          # Second data load (filter by service column)
          Material_Data <- reactive({
            dat <- clientData()
            if (is.null(dat)) return(NULL)
            if (!is.null(input$mtrl)) # !
              dat <- dat[dat$MTRL_NBR %in% input$mtrl,]
            dat <- droplevels(dat)
            return(dat)
          })

          output$Choose_Columns <- renderUI({
            if(is.null(input$mtrl)) return()
            if(is.null(input$cmpnt)) return()
            colnames <- names(Input_Data)
            checkboxGroupInput("columns", "Choose Columns To Display The Data:",
                               choices = colnames, selected = colnames)
          })

          output$text <- renderText({ print(input$cmpnt) })

          output$data_table <- renderTable({
            if(is.null(input$mtrl)) return()
            if (is.null(input$columns) || !(input$columns %in% names(Input_Data))) return()
            Input_Data1 <- Input_Data[, input$columns, drop = FALSE]
            cmpnt1 <- input$cmpnt
            mtrl1 <- input$mtrl
            Input_Data2 <- fn$sqldf( paste0( "select * from Input_Data1 where MTRL_NBR = '$mtrl1' and CMPNT_NM in ('$cmpnt1')") )
            head(Input_Data2, 10)
          })
        })
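
A likely cause is that a multiple selection makes `input$cmpnt` a character vector, so a single quoted `'$cmpnt1'` placeholder no longer describes a valid IN list. One way around this, sketched below in base R, is to build the comma-separated, individually quoted list before composing the query. The helper name `sql_in_list` and the sample values are hypothetical, and the sqldf call is commented out because it needs the poster's `Input_Data1`.

```r
# Turn a character vector into a quoted, comma-separated SQL list.
# Single quotes inside values are doubled, the usual SQL escaping.
sql_in_list <- function(values) {
  escaped <- gsub("'", "''", as.character(values), fixed = TRUE)
  paste0("'", escaped, "'", collapse = ", ")
}

# Stand-ins for input$cmpnt (multiple = TRUE) and input$mtrl
cmpnt1 <- c("Resin", "Dye", "O'Brien mix")
mtrl1  <- "M100"

query <- paste0(
  "select * from Input_Data1 where MTRL_NBR = '", mtrl1,
  "' and CMPNT_NM in (", sql_in_list(cmpnt1), ")"
)

# Inside the Shiny server the query string could then be passed on, e.g.:
# Input_Data2 <- sqldf(query)
```

Since the data is already in a data frame, plain subsetting with `%in%`, as the post's own `Material_Data` reactive does for `MTRL_NBR`, would avoid building SQL strings altogether.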

    Read the article

  • Generate lags R

    - by Btibert3
    Hi All, I hope this is basic; just need a nudge in the right direction. I have read in a database table from MS Access into a data frame using RODBC. Here is a basic structure of what I read in:

        PRODID PROD Year Week QTY SALES INVOICE

    Here is the structure:

        str(data)
        'data.frame': 8270 obs. of 7 variables:
         $ PRODID  : int 20001 20001 20001 100001 100001 100001 100001 100001 100001 100001 ...
         $ PROD    : Factor w/ 1239 levels "1% 20qt Box",..: 335 335 335 128 128 128 128 128 128 128 ...
         $ Year    : int 2010 2010 2010 2009 2009 2009 2009 2009 2009 2010 ...
         $ Week    : int 12 18 19 14 15 16 17 18 19 9 ...
         $ QTY     : num 1 1 0 135 300 270 300 270 315 315 ...
         $ SALES   : num 15.5 0 -13.9 243 540 ...
         $ INVOICES: num 1 1 2 5 11 11 10 11 11 12 ...

    Here are the top few rows:

        head(data, n=10)
           PRODID           PROD Year Week QTY  SALES INVOICES
        1   20001      Dolie 12" 2010   12   1  15.46        1
        2   20001      Dolie 12" 2010   18   1   0.00        1
        3   20001      Dolie 12" 2010   19   0 -13.88        2
        4  100001 Cage Free Eggs 2009   14 135 243.00        5
        5  100001 Cage Free Eggs 2009   15 300 540.00       11
        6  100001 Cage Free Eggs 2009   16 270 486.00       11
        7  100001 Cage Free Eggs 2009   17 300 540.00       10
        8  100001 Cage Free Eggs 2009   18 270 486.00       11
        9  100001 Cage Free Eggs 2009   19 315 567.00       11
        10 100001 Cage Free Eggs 2010    9 315 569.25       12

    I simply want to generate lags for QTY, SALES, INVOICE for each product, but I am not sure where to start. I know R is great with time series. I have two questions:

    1- I have the raw invoice data but have aggregated it for reporting purposes. Would it be easier if I didn't aggregate the data?

    2- Regardless of aggregation or not, what functions will I need to loop over each product and generate the lags as I need them?

    In short, I want to loop over a set of records, calculate lags for a product (if possible), append the lags (as they apply) to the current record for each product, and write the results back to a table in my database for my reporting software to use.

    Any help you can provide will be greatly appreciated! Many thanks in advance, Brock
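
One base-R way to get per-product lags, offered as a sketch rather than the definitive answer: sort by product and time, then use ave() to shift each column within its product group. The helper `lag1`, the `_lag1` column names and the small sample frame (a subset of the rows shown above) are illustrative. Note that this is a lag by row position, so gaps between weeks are not accounted for.

```r
# Small sample modelled on the rows shown in the post
sales <- data.frame(
  PRODID   = c(20001, 20001, 20001, 100001, 100001),
  Year     = c(2010, 2010, 2010, 2009, 2009),
  Week     = c(12, 18, 19, 14, 15),
  QTY      = c(1, 1, 0, 135, 300),
  SALES    = c(15.46, 0, -13.88, 243, 540),
  INVOICES = c(1, 1, 2, 5, 11)
)

# Order rows so that "previous row" means "previous period" within a product
sales <- sales[order(sales$PRODID, sales$Year, sales$Week), ]

# Shift a vector down by one position, padding the first slot with NA
lag1 <- function(x) c(NA, head(x, -1))

# Apply the shift separately within each product
for (col in c("QTY", "SALES", "INVOICES")) {
  sales[[paste0(col, "_lag1")]] <- ave(sales[[col]], sales$PRODID, FUN = lag1)
}
```

The first row of each product gets NA because it has no earlier record. The augmented data frame could then be written back with RODBC's sqlSave().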

    Read the article
