Search Results

Search found 265 results on 11 pages for 'regression'.


  • Running [R] on a Netbook

    - by Thomas
    I am interested in purchasing a netbook to do field research in another country. My hardware specifications for the netbook are fairly basic: be rugged enough to survive a bit of wear and tear; fairly fast processing (the ability to upgrade from 1GB of RAM to 2GB); a battery life of longer than 6 hours; at least a 10 inch screen; a decent camera for Skyping. However, I am mainly concerned about being able to do basic statistical analysis in conjunction with R: be able to run a spreadsheet program to do basic data input (like Excel or Open Office); use R to do basic data analysis (regression, some simulation (nothing crazy), data cleaning, and some of the functionality); word processing (Word or Open Office). Do you have any suggestions on which models or brands might fit my needs? Some of the models I am considering: Samsung NB-30, Toshiba NB 305, Asus Eee PC 1005HA, Lenovo S10-2. Does anyone use R on a netbook, and if so do you have any recommendations on how best to optimize it? This article from Lifehacker mentions some operating systems. Does anybody use these in conjunction with R? Any help would be much appreciated.

    Read the article

  • How to assess the risk of a java version upgrade?

    - by Roy Tang
    I'm being asked to assess whether we can safely upgrade the java version on one of our production-deployed webapps. The codebase is fairly large and we want to avoid having to regression test everything (no automated tests sadly), but we've already encountered at least one problem during some manual testing (XmlStringReader.getLocalName now throws an IllegalStateException when it just used to return null) and higher-ups are pretty nervous about the upgrade. The current suggested approach is to do a source compare of the JDK sources for each version and assess those changes to see which ones might have impact, but it seems there are a lot of changes to go through (and as mentioned the codebase is kinda large). Is it safe and easier to just review the java version changes for each version? Or is there an easier way to conduct this assessment? Edit: I forgot to mention the version upgrade being considered is a minor version upgrade, i.e. 1.6.10 to 1.6.33

    Read the article

  • How do you remove/clean-up code which is no longer used?

    - by clarke ching
    So, we have a project which had to be radically descoped in order to ship on time. It's got a lot of code left in it which is not actually used. I want to clean up the code, removing any dead-wood. I have the authority to do it and I can convince people that it's a commercially sensible thing to do. [I have a lot of automated unit tests, some automated acceptance tests and a team of testers who can manually regression test.] My problem: I'm a manager and I don't know technically how to go about it. Any help?

    Read the article

  • Proper Translation of equation to C#

    - by Shykin
    I am trying to replicate this equation: Slope(b) = (NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²) in C# but I'm getting the following issue: If I make the average of X = 1 + 2 + 3 + 4 + 5 and the average Y = 5 + 4 + 3 + 2 + 1 it gives me a positive slope even though it is clearly counting down. If I place the same numbers into this calculator: http://www.easycalculation.com/statistics/regression.php it gives me a negative slope. I'm trying to narrow down the reasons, so is the following a proper translation from equation to C# code: Slope(b) = (NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²) to Slope (m) = ((x * avgX * avgY) - (avgX * avgY)) / ((x * Math.Pow(avgX, 2)) - Math.Pow(avgX, 2));
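
    For comparison, here is a minimal sketch of the formula computed from running sums rather than averages. It is written in Python, not the asker's C#, and the function name `slope` is made up for illustration. The point it shows is that ΣXY and ΣX² are sums over the individual points and cannot be rebuilt from the two averages.

```python
def slope(xs, ys):
    """Least-squares slope b = (N*Σxy - Σx*Σy) / (N*Σx² - (Σx)²)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # Σxy: per-point products
    sum_x2 = sum(x * x for x in xs)              # Σx²: per-point squares

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; slope is undefined")
    return (n * sum_xy - sum_x * sum_y) / denominator


# The data from the question: x counts up, y counts down.
print(slope([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```

    With the question's data the sums are ΣX = 15, ΣY = 15, ΣXY = 35 and ΣX² = 55, so the slope is (175 - 225) / (275 - 225) = -1, which agrees in sign with the linked calculator.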

    Read the article

  • R statistics, change ranked tables to paired

    - by cousin_pete
    I have data for many tables like:

        event_id  player  finish
        1         a       1
        1         b       2
        1         c       3
        1         d       4
        2         b       1
        2         e       2
        2         f       3
        2         a       3
        2         g       5

    Many event_id's, each from 5 to 20 players, finish may be tied. In order to use conditional logistic regression in R I would like to reformat the tables to be like:

        event_id  player1  player2  result
        1         a        b        1
        1         a        c        1
        1         a        d        1
        1         b        c        1
        1         b        d        1
        1         c        d        1
        2         b        e        1
        2         b        f        1
        2         b        a        1
        2         b        g        1
        2         e        f        1
        2         e        a        1
        2         e        g        1
        2         f        a        0.5
        2         f        g        1
        2         a        g        1

    An event_id of 4 players will have 4*3/2 = 6 records in the new table, 5 players will have 5*4/2 = 10 records and so on. If player "a" has "finish" less than player "b" the "result" is 1. If "finish" is equal the "result" is 0.5. If player "a" has finish greater than player "b" then the "result" would be 0. Any help appreciated!
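
    The pairing rule described above can be sketched as follows. This is an illustration in Python rather than R, and the function name `to_pairs` and the tuple layout are assumptions made for the example. Each event's players are paired in their order of appearance, which gives n*(n-1)/2 rows per event.

```python
from collections import defaultdict
from itertools import combinations


def to_pairs(rows):
    """Turn (event_id, player, finish) rows into (event_id, p1, p2, result) rows."""
    by_event = defaultdict(list)
    for event_id, player, finish in rows:
        by_event[event_id].append((player, finish))

    pairs = []
    for event_id, entrants in by_event.items():
        # combinations() keeps the input order, so player1 is always the
        # entrant listed earlier in the original table.
        for (p1, f1), (p2, f2) in combinations(entrants, 2):
            if f1 < f2:
                result = 1.0   # player1 finished ahead
            elif f1 == f2:
                result = 0.5   # tie
            else:
                result = 0.0   # player2 finished ahead
            pairs.append((event_id, p1, p2, result))
    return pairs


rows = [
    (1, "a", 1), (1, "b", 2), (1, "c", 3), (1, "d", 4),
    (2, "b", 1), (2, "e", 2), (2, "f", 3), (2, "a", 3), (2, "g", 5),
]
for pair in to_pairs(rows):
    print(pair)
```

    On the sample data this yields 6 rows for event 1 and 10 rows for event 2, with the tied pair (f, a) in event 2 getting a result of 0.5.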

    Read the article

  • How do I Integrate Production Database Hot Fixes into Shared Database Development model?

    - by TetonSig
    We are using SQL Source Control 3, SQL Compare, SQL Data Compare from RedGate, Mercurial repositories, TeamCity and a set of 4 environments including production. I am working on getting us to a dedicated environment per developer, but for at least the next 6 months we are stuck with a shared model. To summarize our current system, we have a DEV SQL server where developers first make changes/additions. They commit their changes through SQL Source Control to a local hgdev repository. When they execute an hg push to the main repository, TeamCity listens for that and then (among other things) pushes the hgdev repository to hgrc. Another TeamCity process listens for that and does a pull from hgrc and deploys the latest to a QA SQL Server where regression and integration tests are run. When those are passed a push from hgrc to hgprod occurs. We do a compare of hgprod to our PREPROD SQL Server and generate deployment/rollback scripts for our production release. Separate from the above we have database Hot Fixes that will need to be applied in between releases. The process there is for our Operations team to make changes on the PREPROD database, and then after testing, to use SQL Source Control to commit their hot fix changes to hgprod from the PREPROD database, and then do a compare from hgprod to PRODUCTION, create deployment scripts and run them on PRODUCTION. If we were in a dedicated database per developer model, we could simply automatically push hgprod back to hgdev and merge in the hot fix change (through TeamCity monitoring for hgprod checkins) and then developers would pick it up and merge it to their local repository and database periodically. However, given that with a shared model the DEV database itself is the source of all changes, this won't work. Hotfixes pushed back to hgdev will show up in SQL Source Control as being different from the DEV SQL Server, and therefore we need to overwrite the repository with the "change" from the DEV SQL Server. 
My only workaround so far is to just have OPS assign a developer the hotfix ticket with a script attached and then we run their hotfixes against DEV ourselves to merge them back in. I'm not happy with that solution. Other than working faster to get to a dedicated environment, are there other ways to keep this loop going automatically?

    Read the article

  • BigData and Customer Experience: Happy Together

    - by Isabel F. Peñuelas
    The two big buzzwords of the year may lie closer together than it appears. The two concepts intersect at various points. BigData and Return on Investment of Marketing Campaigns: in a recent post, Big Data Is The Future Of Marketing, Jeff Dachis explains very clearly how “Big data analytics finally allows marketers to identify, measure, and manage what is positively impacting their Brand”. Regression analysis applied to big data volumes coming from social media will replace the failed attempts to justify marketing investments in social media in terms of followers and likes, he continues: “the measurement models applied by marketers on TV Campaigns don't work on social”. We need to study the data with fresh eyes, and maybe then we will start understanding and measuring brand engagement. Social CRM and BigData: the real value of Social CRM starts with analyzing the mass of big data from social media in order to apply social intelligence techniques that allow us to classify new customer niches and communities and define appropriate strategies to contact potential customers. Gartner Says that the Market for Social CRM is on pace to surpass $1 Billion in Revenue by Year-End 2012, but in the words of Zach Hofer-Shall, Analyst at Forrester Research, “Social customer relationship management is hard” (The Social CRM Arms Race Heats ). To succeed, brands need three things: investing in new social tools, investing in consultancy, and investing in infrastructure for massive data storage and analysis. Neither CeX nor BigData is an easy or cheap win. But what are the customer benefits of such investments? Big Data and Brand Engagement: time is the most valuable asset of today's consumers, who are tired of information overload, exhausted by the terabytes of offerings, and anxious because they do not get the same fast multichannel experience from their service providers or preferred goods suppliers as the one they find on their social media. Yes, I know you have read this before; me too. But it is real. 
The motto of the Customer Experience philosophy, providing a consistent experience across multiple touchpoints that makes the customer/brand relationship easier and more valuable, is based on understanding customers' preferences and context, for which BigData analysis is another imperative. In summary, I believe that using BigData analysis in combination with appropriate CeX strategies and technologies is a promising direction for achieving efficiency and marketing cost savings, growing the customer base, and increasing customer conversion and retention. In a word: The Direction of Future Marketing.

    Read the article

  • Why does Chrome video performance substantially degrade after waking from suspend in 10.10?

    - by Grant Heaslip
    Note: For some more details, some of which may not be true given what I've figured out, see this post. When I first boot my computer, video performance (both native H.264 HTML5 in YouTube and Vimeo, and in Flash) in Chrome is perfectly reasonable. CPU usage stays low, everything works correctly, and the video is silky-smooth. But for whatever reason, if I suspend my computer then wake it up, video performance plummets. Full-screen HTML5 video is choppy at best, and full-screen Flash video basically brings my computer to its knees (I'm talking less than a frame a second, and a 5 second lead time to leave full-screen after hitting Esc). Restarting Chrome doesn't fix this; I need to completely restart my machine before performance goes back to normal. Video performance in other applications, such as Movie Player, doesn't seem to be affected at all by the suspend cycle; it's only Chrome. I'm using a Lenovo X201, with an Intel GMA HD graphics chipset, and Intel components all around (I don't need any proprietary drivers). This didn't happen in 10.04, and I haven't done anything that I think would have caused this to happen. It's possible that a Chrome release could have caused this, but it seems less likely than a regression between 10.04 and 10.10. Any ideas? EDIT: In response to Georg's comment, logging in and out doesn't fix it. Restarting Compiz or switching to Metacity (at least by using "compiz/metacity --replace & disown"; am I doing it right?) doesn't help (actually, it seemed to help somewhat with Flash once, but I haven't been able to reproduce this). I'm not sure about GDM: when I use "sudo restart gdm" I get kicked back to the Linux shell (?), which I have no idea how to get out of. Also, I want to make very clear that this isn't just a case of Flash sucking (it does, but that's beside the point). I'm seeing the same general problem with HTML5 videos, and Flash is performing better on my Nexus One than it does on my Core i5 laptop. 
There's something screwy going on with Chrome and/or 10.10.

    Read the article

  • Is it possible to get a patch included in the current release? If so, how?

    - by Oli
    So a while back I reported a bug in Compiz's Place Window plugin. It's a fairly major regression for people affected by it: mainly those using Gnome-Fallback, judging by the reports. A patch surfaced a short time later. I created a PPA for testing and everybody involved so far is reporting the issues are fixed. It even fixes another bug. I've done testing with a standard Unity desktop and can say (for my testing) no adverse effects were visible. I want to get this pushed to Ubuntu right now for two main reasons: I'm selfish. I don't want to need to update my PPA every time a new version of Compiz is pushed to 12.04. I don't want Ubuntu users seeing their windows flying around because of a silly little bug. I want this patch pushed to Ubuntu's version of Compiz as soon as possible, so we can mark these bugs fixed and move on with our lives. Whose leg do I have to hump to get this pulled into Ubuntu right now? I don't maintain this project and it's an upstream thing but it's fairly integral to Ubuntu. I could go to Compiz but I imagine that if they accept the patch, it'll be months (at least a release) before it's anywhere near Ubuntu. And when I do find the right person, how can I make the process as slick as possible for them? I want them to see my request, go "Yup, that all looks great, done" and that be it. I don't want seventeen rounds of emails addressing aspects of the patch. More importantly, I don't want to waste their time either. And what do I have to provide them? My packaging skills are... lamentable. This was my first attempt at patching a package for redistribution so I've probably made every single packaging error known to man. Will they be happy with the original patch (so they can apply it themselves) or should I repackage things so the diff/changelog is a little cleaner (it took me a few goes and the versioning is all over the place). 
Note: This question is about Compiz but I'd prefer if answers could address other styles of package too so we have an authoritative and comprehensive thread of how to get things fixed.

    Read the article

  • How are software projects 'typically' managed/deployed

    - by rguilbault
    My company is evaluating adopting off-the-shelf ALM products to aid in our development lifecycle; we currently use our own homegrown solutions to manage requirements gathering, specification documentation, testing, etc. One of the issues I am having is that we have what we call a pipeline, which consists of particular stops: [Source] - [QC] - [Production] At the first stop, the developer works out a solution to some requested change and performs individual testing. When that process is complete (and peer review has been performed), our ALM system physically moves the affected programs from the [Source] runtime environment to the [QC] runtime environment. You can think of this as analogous to moving some web pages from the 'test' server to the 'live' server, where QC personnel can bang on the system and complain that the developer has it all wrong ;-) Once QC signs off that the changes are working, the system again moves the code along to the next stage, where additional testing is performed, etc. I have been searching the internet for a few days trying to find how the process is accomplished anywhere else -- I have read a bit about builds, automated testing, various ALM products, etc. but nowhere does any of this state how builds interact with initial change requests, what the triggers are, how dependencies are managed, how the various forms of testing are accommodated (e.g. unit testing, integration testing, regression testing), etc. Can anyone point me to any resources or attempt to explain (generically) how a change could/should be tracked and moved though the development lifecycle? I'd be very appreciative. To keep things consistent, let's say that we have a project called Calculator, which we want to add support for the basic trigonometric functions: sine, cosine and tangent. 
I'm open to reorganizing the company however we need to in order to accomplish due diligence testing, and we can suppose that any tools are available for use (if that helps to illustrate the process). To start things off, I think I understand this much:
      - we document the requirements, e.g.: support sine, cosine and tangent functions
      - we create some type of change request/work order to assign to programming
      - coding takes place, commits are made to version control
      - peer review commences
      - programmer marks the work order as completed?
      - ... now what? How does QC do their thing? Would they perform testing before closing the 'work order'?

    Read the article

  • Trying to find resources to learn how to test software [closed]

    - by Davek804
    First off, yes this is a general question, and I'd be perfectly happy to move this to another portion of SE, but I didn't see a more fitting sub. Basically, I am hoping a more experienced QA tester can come along and really fill in some basics for me. So far, websites seem to be sparse in terms of explaining languages involved, basic practices, etc. So, I'm sorry in advance if this is too general, but towards the end of this post I ask some specific questions if it's just absolutely unacceptable to speak in general terms. I just landed a position as Junior Systems and QA Engineer with a social media startup. Their QA and testing is almost nonexistent, so if I do a good job, I imagine I'll find a lot of bugs and have a secure role in the business. I'm pretty good with the systems aspect of my role, but I need to learn more about the QA and testing aspects. We run hardware that's touchscreen based - the user can use and interact with the devices. So, in terms of my QA role, in the short term, I need to build scripts to test the hardware/software as a 'user' to try to uncover bugs. First off, what language should these scripts be written in? Does anyone have some examples? What about the longer term 'automated testing'? I'm familiar with regression testing as the developer adds in new features, sure, but the 50,000 other types of testing, not so much. Most of our hardware runs dotnet/C# code, with some of the servers running Java - but I don't expect to need to run tests on the Java side at this point. I hope to meet with one developer today and try to get a good idea of the output from the hardware so that I can 'mock' this data that gets sent to servers, to try to bugtest. Eventually, we will be moving the hardware to be closer to where I live and work, so that I can test virtually and on real hardware. 
So a lot of the bugs we're dealing with now are like this: the local server, which the kiosks report their data to, gets updated from the kiosks, but the remote server does not. Or, vice versa: when the user registers on a kiosk, the remote server updates but the local server does not. But yeah, without much more detail, I imagine a lot of this info isn't helpful. I've bought a book, "How Google Tests Software", but it's really a book more about 'how their software testing is different from Microsoft'. It doesn't teach how to test so much as why their methods are better. Does anyone have a good book that I can buy? An ebook maybe? My local Barnes and Noble kinda had a terrible selection. I also figure a book from 2005 is not necessarily that good either.

    Read the article

  • What characters are illegal in Cisco IOS username secret passwords?

    - by Alain O'Dea
    I am using username secret to add users with encrypted passwords to our switches and firewall. I have been battling with the same switches and firewall for a couple of hours trying to get securely generated hard passwords for all admins. Sometimes, the passwords would go into config, but wouldn't work for login. According to the documentation for enable secret a password must not begin with a number and ? has to be entered as Ctrl-V then ? to escape it. I followed that and still got passwords I could not use sometimes. There was no error when I ran username, but the password would be rejected on login by some, but not all of the switches. They are all WS-C2960-48PST-L. The passwords it didn't like contained back ticks "`" (that character under tilde ~ under Esc). The "misbehaving" switches are running: Cisco IOS Software, C2960 Software (C2960-LANBASEK9-M), Version 12.2(50)SE5, RELEASE SOFTWARE (fc1) The "working" switches are running: Cisco IOS Software, C2960 Software (C2960-LANBASEK9-M), Version 12.2(46)SE, RELEASE SOFTWARE (fc2). The "misbehaving" switches are running a newer IOS, so this suggests a regression introduced somewhere between 12.2(46)SE and 12.2(50)SE5. I was unable to find any evidence of this being intentional in the release notes for 12.2(50)SE. I would like to avoid this next time the passwords are changed :) What characters are illegal in Cisco IOS username secret passwords? Thank you for your help :)

    Read the article

  • Is there any way to shut up my ATI HD 5770?

    - by slpsys
    So to preface, I basically built Jeff's machine; I already had some of the components, including (scarily enough) the exact same case [1]. I've been buying bits and pieces over the past few months, which coincided perfectly with his recent post about three monitors, though not being a gamer outright, I opted for the second-from-the-bottom option. After finally plopping all the pieces lovingly into the case this evening, I turn it on...and it sounds like four professional-grade hair dryers. Some quick regression analysis determined that with the video card out, the running machine sounded no louder than our house's vents. Basically, my last desktop build included a $45-at-the-time graphics card, and it's been Macbook Pros and workstations since then, so I have zero idea whether I'll just be able to tune the fan speed later on. Will I be able to get this thing to quiet down whenever I'm not playing Modern Warfare 2 at maximum framerate, or should I just send this thing back now, and get the quietest card in my price range? [1] One thing of note is that I do not have noise-absorbing foam in the case, as is pictured in the article. I'm only mentioning that because I suspect it could drop the overall output a few decibels, but obviously not that many.

    Read the article

  • OBIEE 11.1.1 - Introduction to OBIEE 11g Full Sample App

    - by user809526
    Isn't it nice to discover OBIEE 11g around a nice "How To" catalog of features? to observe OBI and Essbase relationships at work? to discover TimesTen? The OBIEE 11g Full Sample App (FSA) is a comprehensive collection of examples designed to demonstrate the latest Oracle BIEE 11g capabilities and design best practices: Enhanced visualizations as Geo-spacial maps and interactive dashboards, Action Framework,  BI Publisher, Scorecard and Strategy Management, Mobile style sheets, Semantic layer modeling, Multi-source federation, Integration with products such as Essbase, Oracle OLAP, ODM, TimesTen, ODI and more The FSA is intended to be comprehensive, it is big (see CAVEAT below). The FSA is not an Oracle product, it is a good will free deployment of OBIEE/Essbase designed to exemplify OBIEE features, infrastructure and security around the Fusion Middleware components. Its contents and code are distributed free for demonstrative purposes only. It is neither maintained nor supported by Oracle as a licensed product. The OBIEE Full Sample App is independent of the default Sample App that comes with the OBIEE product. BENEFITS The FSA helps as a demonstrator of OBIEE 11g best practices, a tutorial, an environment "Test & Scrap", a SR bench (regression, conflicts), a tuning bench, a quick ready made POC seed for projects, a security options environment, ... The FSA - Is organized around a catalog of functional features - Has been deployed over 1000 times, it should be stable RELEASE The Full Sample App (V107) is bound to OBIEE 11.1.1.5 and Essbase 11.1.2.1 (November 2011). The FSA release dates are independent of the Product GA date (OBIEE). In early December 2011, a new functional Patch (V110) is released. It is easily applied (in less than 15 mins) on top of OBIEE SampleApp 11.1.1.5 (V107). The patch (V110) includes additional functional examples:        1. 
Web Catalog Statistics Application: Provides detailed insight into your web catalog content, dormant catalog objects, webcat impact analysis for metadata changes and more        2. Data inflation Scripts: A set of simple SQL procedures to quickly inflate SampleApp Fact and Dimension data to millions of records in a few minutes        3. Public Content Extensions Framework: A patching framework for public examples and contributions leveraging SampleApp        4. Additional report examples (including bridge report, external chart integrations) and bug fixes DISTRIBUTION as VBox image (November 2011) The ready made VBox image is designed to run on Virtual Box. It can be converted to VMware (see another BLOG). 1/ http://www.oracle.com/technetwork/middleware/bi-foundation/obiee-samples-167534.html VBox Image Deployment Guide Sampleapp_v107_GA.ovf - VBox image key file The above http URL provides the user:password for the ftp URLs below. 2/ ftp://user:[email protected]/static/SampleAppV107/ 12 "7-zip" files Sampleapp_v107_GA_7_20.7z.001 -> .012 We recommend 7-zip file manager for unzipping (http://www.7-zip.org/). Select Unzip here option, it will create the contents under a directory named "SampleApp_10722". On Windows, it is important to download and save zip file under the root directory (e.g. C:\ or D:\) because of possible long pathnames. 3/ ftp://user:[email protected]/static/SampleAppV107/Unzipped_Version/ 4 files Sampleapp_v107_GA-disk[1234].vmdk Important note: Check the provided checksums (md5sum). Please do it! DISTRIBUTION as Installation files for existing OBI 11.1.1.5 (November 2011) http://www.oracle.com/technetwork/middleware/bi-foundation/obiee-samples-167534.html Install files Deployment Guide SampleApp_10722_1.zip - 198 MB CAVEAT Many computers have RAM chips problems that keep often silent ... until you manipulate big files. It is strongly advised you run some memory check program eg MEMTEST in GRUB boot manager. 
Running md5sum repeatedly onto the very same big file must be consistent [same result], else a hardware memory problem is suspected. For Virtual Box, you should most likely enable VT-X (Vanderpool) hardware virtualization in BIOS. A free disk space of 80 GB is required to perform safely the VBox image installation. A Virtual Machine of minimum 6 to 7 GB memory fits the needs of combining OBIEE and Essbase execution.
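
    The repeated-checksum check from the CAVEAT can be sketched in Python as below. This is only an illustration: the function names are made up, and the post itself refers to the md5sum command-line tool, not to this script. Note that repeated reads may be served from the operating system's file cache, so a stable result is weaker evidence than a full memory test.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_is_stable(path, runs=3):
    """Hash the same file several times; every run should give the same digest."""
    digests = {md5_of_file(path) for _ in range(runs)}
    return len(digests) == 1
```

    A call such as `checksum_is_stable("Sampleapp_v107_GA-disk1.vmdk")` would then be compared against the published checksum as well; the file name here is taken from the list above and is used only as an example.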

    Read the article

  • Databases in Source Control

    - by Grant Fritchey
    I’ve been working as a database professional for quite a long time. But originally, I was a developer. And I loved being a developer. There was this constant feedback loop of a job well done, your code compiled and it ran. Every time this happened successfully, you’d check it into source control. These days you have to add another step; the code passed all the tests, unit, line, regression, qa, whatever, then into source control it goes. As a matter of fact, when I first made the jump from developer to DBA/database developer/database professional, source control was the one thing I couldn’t believe was missing from the DBA toolbox. Come to find out, source control was only the beginning of what was missing from your standard DBAs set of skills. Don’t get me wrong. I’m not disrespecting the DBA. They’re focused where they should be, on your production data. But there has to be a method for developing applications that include databases and the database side of that development and deployment process has long been lacking. This lack of development and deployment methodologies is a part of what has given rise to some of the wackier implementations of Object Relational Mapping tools, the NoSQL movement, and some of the other foul cursing that is directed towards databases, DBAs, and database development by application developers. Some of that is well earned. A lot isn’t. But it is a fact that database professionals, in general, do not have as sophisticated a model for managing development and deployment as application developers do. We could charge out and start trying to come up with our own standards and methods. I’m sure people have done exactly that. However, I’m lazy, and not terribly bright. Rather than try to invent a whole new process, I’m going to look to my developer roots and choose instead to emulate the developers. 
They’re sitting over there across the hall from me working with SCRUM/Agile/Waterfall/Object Driven/Feature Driven/Test Driven development processes that they’ve been polishing for years. What if I just started working on database development the same way they work on code development? Win! Ah, but now I have to have a mechanism for treating my database like application code. First, I need a method for getting it into source control. That’s where Red Gate’s SQL Source Control comes into the picture. SQL Source Control works within SQL Server Management Studio to connect your database objects up to the source control system of your choice. Right out of the box SQL Source Control can link to TFS, SVN or Vault. With a little work you can connect it to Git or just about any other source control system. With the ability to get my database into source control, a lot of possibilities for more direct integration with the application development teams open up.

    Read the article

  • To sample or not to sample...

    - by [email protected]
    Ideally, we would know the exact answer to every question. How many people support presidential candidate A vs. B? How many people suffer from H1N1 in a given state? Does this batch of manufactured widgets have any defective parts? Knowing exact answers is expensive in terms of time and money and, in most cases, is impractical if not impossible. Consider asking every person in a region for their candidate preference, testing every person with flu symptoms for H1N1 (assuming every person reported when they had flu symptoms), or destructively testing widgets to determine if they are "good" (leaving no product to sell). Knowing exact answers, fortunately, isn't necessary or even useful in many situations. Understanding the direction of a trend or statistically significant results may be sufficient to answer the underlying question: who is likely to win the election, have we likely reached a critical threshold for flu, or is this batch of widgets good enough to ship? Statistics help us to answer these questions with a certain degree of confidence. This focuses on how we collect data. In data mining, we focus on the use of data, that is data that has already been collected. In some cases, we may have all the data (all purchases made by all customers), in others the data may have been collected using sampling (voters, their demographics and candidate choice). Building data mining models on all of your data can be expensive in terms of time and hardware resources. Consider a company with 40 million customers. Do we need to mine all 40 million customers to get useful data mining models? The quality of models built on all data may be no better than models built on a relatively small sample. Determining how much is a reasonable amount of data involves experimentation. 
When starting the model building process on large datasets, it is often more efficient to begin with a small sample, perhaps 1000 - 10,000 cases (records) depending on the algorithm, source data, and hardware. This allows you to see quickly what issues might arise with choice of algorithm, algorithm settings, data quality, and need for further data preparation. Instead of waiting for a model on a large dataset to build only to find that the results don't meet expectations, once you are satisfied with the results on the initial sample, you can  take a larger sample to see if model quality improves, and to get a sense of how the algorithm scales to the particular dataset. If model accuracy or quality continues to improve, consider increasing the sample size. Sampling in data mining is also used to produce a held-aside or test dataset for assessing classification and regression model accuracy. Here, we reserve some of the build data (data that includes known target values) to be used for an honest estimate of model error using data the model has not seen before. This sampling transformation is often called a split because the build data is split into two randomly selected sets, often with 60% of the records being used for model building and 40% for testing. Sampling must be performed with care, as it can adversely affect model quality and usability. Even a truly random sample doesn't guarantee that all values are represented in a given attribute. This is particularly troublesome when the attribute with omitted values is the target. A predictive model that has not seen any examples for a particular target value can never predict that target value! For other attributes, values may consist of a single value (a constant attribute) or all unique values (an identifier attribute), each of which may be excluded during mining. Values from categorical predictor attributes that didn't appear in the training data are not used when testing or scoring datasets. 
In subsequent posts, we'll talk about three sampling techniques using Oracle Database: simple random sampling without replacement, stratified sampling, and simple random sampling with replacement.
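The 60/40 split and the missing-target-value risk described above can be sketched in a few lines. This is a minimal illustration in plain Python rather than Oracle Database; the function names, the `target` field, and the toy records are all hypothetical and not part of the original post.

```python
import random


def split_build_data(records, build_fraction=0.6, seed=42):
    """Randomly split records into a build (training) set and a test set."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(records)           # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * build_fraction))
    return shuffled[:cut], shuffled[cut:]


def missing_target_values(all_records, build_set, target="target"):
    """Target values that occur in the full data but never in the build set."""
    seen_all = {r[target] for r in all_records}
    seen_build = {r[target] for r in build_set}
    return seen_all - seen_build


# Hypothetical data: 1,000 cases with a rare "fraud" target value.
records = [{"id": i, "target": "fraud" if i % 250 == 0 else "ok"}
           for i in range(1000)]

build, test = split_build_data(records)
print(len(build), len(test))
# A non-empty result means the model can never predict those target values.
print(missing_target_values(records, build))
```

If the check reports missing values, a stratified sample on the target is the usual remedy, since a simple random sample gives no such guarantee.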

    Read the article

  • How to convert from amateur web app developer to professional web apper?

    - by Nilesh
    This is more of a practical question on the web app development and deployment process. Here is some background information. I use PHP for server-side scripting and JavaScript for the client side. I use NetBeans and Notepad++. I use Firefox and Firebug for debugging and testing. The process I use is very amateurish: I code something in NetBeans, something in Notepad++, and since there is nothing to compile, I just refresh the Firefox browser and test it. This is convenient and faster compared to the Java development environment, where you would have to at least compile and deploy the JAR files before you could run them. I have been thinking of putting a formal process into my development and find it hard to put together. There are so many things to do before you can deploy your final web app. I keep hearing about JSLint, compression, unit testing (Selenium), Ant, YUI Compressor, etc., but I am now looking for some steps that I can take to make me more organized. For example, I use NetBeans but don't use any projects within it. I directly update the files. I don't use any source control but use my Iomega backup, which saves each save as a different version, and at the end of the day I back up the dev directory to my Amazon S3 account. For me the development environment is just a DEV directory, TEST is my intermediate stage, and PROD is the final directory that gets pushed out to the server. But all these directories are in the same Apache home. I have a few PHP scripts that just copy the needed files into the production directory. That's about it for my development approach. I know I am missing the following: - Regression testing (manual or automated ??) - automated testing (selenium ??) - automated deployment (ANT ??) - source control (svn ??) - quality control (jslint ??) Can someone explain what the missing steps are and how to go about filling them in order to have a more professional approach?
I am looking for tools with example tutorials for streamlining the whole development-to-deployment process. For me, just getting the hang of database, server-side and client-side development all in synchronization was itself a huge accomplishment. And now I feel there is a lot missing before you can produce a quality web application. For example, I see a lot of mention of automated testing, but how do I put it to use with JavaScript and PHP? How do I use Ant for deployment? Is this all too much for a one- or two-person development team? Is there a way to automate all of the above so that I just keep coding in NetBeans and then run a batch file that is configured once and run every time to produce the code in the production directory? A lot of this information is scattered on the web and here; if someone can guide me, I would be happy to consolidate it here. Thank you for your patience :)

    Read the article

  • Oracle's PeopleSoft Customer Advisory Boards Convene to Discuss Roadmap at Pleasanton Campus

    - by john.webb(at)oracle.com
    Last week we hosted all of the PeopleSoft CABs (Customer Advisory Boards) at our Pleasanton Development Center to review our detailed designs for future Feature Packs, PeopleSoft 9.2, and beyond. Over 150 customers from 79 companies attended representing a variety of industries, geographies, and company sizes. The PeopleSoft team relies heavily on this group to provide key input on our roadmap for applications as well as technology direction. A good product strategy is one part well thought out idea with many handfuls of customer validation, and very often our best ideas originate from these customer discussions. While the individual CABs have frequent interactions with our teams, it's always great to have all of them in one place and in person. Our attendance was up from last year which I attribute to two things: (1) More interest as a result of PeopleSoft 9.1 upgrade; (2) An improving economy allowing for more travel. Maybe we should index the second item meeting-to-meeting and use it as a market indicator - we'll see! We kicked off the day one session with an overview of the PeopleSoft Roadmap and I outlined our strategy around Feature Packs and PeopleSoft 9.2. Given the high adoption rate of PeopleSoft 9.1 (over 4x that of 9.0 given the same time lapse since the release date), there was a lot of interest around the 9.1 Feature Packs as a vehicle for continuous value. We provided examples of our 3 central design themes: Simplicity, Productivity, and lower TCO, including those already delivered via Feature Packs in 2010. A great example of this is the Company Directory feature in PeopleSoft HCM. The configuration capabilities and the new actionable links our CAB advised us on last Spring were made available to all customers late last year. We reviewed many more future Navigation changes that will fundamentally change the way users interact with PeopleSoft. 
Our old friend, the menu tree, is being relegated from center stage to a bit part, with new concepts like Activity Guides, Train Stops, Related Actions, Work Centers, Collaborative Workspaces, and Secure Enterprise Search bringing users what they need in a contextual, role-based manner with fewer clicks. Paco Aubrejuan, our PeopleSoft GM, and Steve Miranda, the SVP for Fusion Applications, then discussed our plans around Oracle's Application Investment Strategy. This included our continued investment in developing both PeopleSoft and Fusion as well as the co-existence strategy with new Fusion Apps integrating to PeopleSoft Apps. Should you want to view this presentation, a recording is available. Jeff Robbins, our lead PeopleTools Strategist, provided the roadmap for PeopleTools and discussed our continuing plan to deliver annual releases to further evolve the user experience. Numerous examples were highlighted with the Navigation techniques I mentioned previously. Jeff also provided a lot of food for thought around Lifecycle Management topics and how to remain current on releases with a lower cost of ownership. Dennis Mesler, from Boise, was the guest speaker in this slot and spoke about the new PeopleSoft Test Framework (PTF). Regression Testing is a key cost component when product updates are applied. This new tool (which is free to all PeopleSoft customers as part of PeopleTools 8.51) provides a metadata-driven approach to recording and executing test scripts. Coupled with what our Usage Monitor enables, PTF provides our customers a powerful tool to lower costs and manage product updates more efficiently and at the time of their choosing. Beyond the general session, we broke out into the individual CABs: HCM, Financials, ESA/ALM, SRM, SCM, CRM, and PeopleTools/Technology. A day and a half of very engaging discussions around our plans took place for each product pillar. More about that to follow in future posts.
We capped the first day with a reception sponsored by our partners: InfoSys, SmartERP (represented by Doris Wong), and Grey Sparling Solutions (represented by Chris Heller and Larry Grey). Great to see these old friends actively engaged in the very busy PeopleSoft ecosystem! [Photo: Jeff Robbins previews the roadmap for PeopleTools with the PeopleSoft CAB]

    Read the article

  • New R Interface to Oracle Data Mining Available for Download

    - by charlie.berger
    The R Interface to Oracle Data Mining (R-ODM) allows R users to access the power of Oracle Data Mining's in-database functions using the familiar R syntax. R-ODM provides a powerful environment for prototyping data analysis and data mining methodologies. R-ODM is especially useful for: quick prototyping of vertical or domain-based applications where the Oracle Database supports the application; scripting of "production" data mining methodologies; and customizing graphics of ODM data mining results (examples: classification, regression, anomaly detection). The R-ODM interface allows R users to mine data using Oracle Data Mining from the R programming environment. It consists of a set of function wrappers written in source R language that pass data and parameters from the R environment to the Oracle RDBMS Enterprise Edition as standard user PL/SQL queries via an ODBC interface. The R-ODM interface code is a thin layer of logic and SQL that calls through an ODBC interface. R-ODM does not use or expose any Oracle product code, as it is completely an external interface and not part of any Oracle product. R-ODM is similar to the example scripts (e.g., the PL/SQL demo code) that illustrate the use of Oracle Data Mining, for example, how to create Data Mining models, pass arguments, retrieve results, etc. R-ODM is packaged as a standard R source package and is distributed freely as part of the R environment's Comprehensive R Archive Network (CRAN). For information about the R environment, R packages and CRAN, see www.r-project.org. R-ODM is particularly intended for data analysts and statisticians familiar with R but not necessarily familiar with the Oracle database environment or PL/SQL. It is a convenient environment to rapidly experiment and prototype Data Mining models and applications. Data Mining models prototyped in the R environment can easily be deployed in their final form in the database environment, just like any other standard Oracle Data Mining model.
What is R? R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. The design of R has been heavily influenced by two existing languages: Becker, Chambers & Wilks' S and Sussman's Scheme. Whereas the resulting language is very similar in appearance to S, the underlying implementation and semantics are derived from Scheme. R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland in Auckland, New Zealand. Since mid-1997 there has been a core group (the "R Core Team") who can modify the R source code archive. Besides this core group, many R users have contributed application code, as represented in the nearly 1,500 publicly available packages in the CRAN archive (which has shown exponential growth since 2001; R News Volume 8/2, October 2008). Today the R community is a vibrant and growing group of tens of thousands of users worldwide. It is free software distributed under a GNU-style copyleft, and an official part of the GNU project ("GNU S"). Resources: R website / CRAN R-ODM

    Read the article

  • Acceptance tests done first...how can this be accomplished?

    - by Crazy Eddie
    The basic gist of most Agile methods is that a feature is not "done" until it's been developed, tested, and in many cases released. This is supposed to happen in quick turnaround chunks of time such as "Sprints" in the Scrum process. A common part of Agile is also TDD, which states that tests are done first. My team works on a GUI program that does a lot of specific drawing and such. In order to provide tests, the testing team needs to be able to work with something that at least attempts to perform the things they are trying to test. We've found no way around this problem. I can very much see where they are coming from because if I was trying to write software that targeted some basically mysterious interface I'd have a very hard time. Although we have behavior fairly well specified, the exact process of interacting with various UI elements when it comes to automation seems to be too unique to a feature to allow testers to write automated scripts to drive something that does not exist. Even if we could, a lot of things end up turning up later as having been missing from the specification. One thing we considered doing was having the testers write test "scripts" that are more like a set of steps that must be performed, as described from a use-case perspective, so that they can be "automated" by a human being. This can then be performed by the developer(s) writing the feature and/or verified by someone else. When the testers later get an opportunity they automate the "script" for regression purposes mainly. This didn't end up catching on in the team though. The testing part of the team is actually falling behind us by quite a margin. This is one reason why the apparently extra time of developing a "script" for a human being to perform just did not happen....they're under a crunch to keep up with us developers. If we waited for them, we'd get nothing done. 
It's not their fault really; they're a bottleneck, but they're doing what they should be and working as fast as possible. The process itself seems to be set up against them. Very often we end up having to go back a month or more in what we've done to fix bugs that the testers have finally gotten to checking. It's an ugly truth that I'd like to do something about. So what do other teams do to solve this fail cascade? How can we get testers ahead of us, and how can we make it so that there's actually time for them to write tests for the features we do in a sprint without making us sit and twiddle our thumbs in the meantime? As it's currently going, getting a feature "done", using agile definitions, would mean having developers work for 1 week, then testers work the second week, with developers hopefully being able to fix all the bugs they come up with in the last couple of days. That's just not going to happen, even if I agreed it was a reasonable solution. I need better ideas...

    Read the article

  • Does *every* project benefit from written specifications?

    - by nikie
    I know this is holy war territory, so please read the question to the end before answering. There are many cases where written specifications make a lot of sense. For example, if you're a contractor and you want to get paid, you need written specs. If you're working in a team with 20 persons, you need written specs. If you're writing a programming language compiler or interpreter (and it's not perl), you'll usually write a formal specification. I don't doubt that there are many more cases where written specifications are a really good idea. I just think that there are cases where there's so little benefit in written specs, that it doesn't outweigh the costs of writing and maintaining them. EDIT: The close votes say that "it is difficult to say what is asked here", so let me clarify: The usefulness of written, detailed specifications is often claimed like a dogma. (If you want examples, look at the comments.) But I don't see the use of them for the kind of development I'm doing. So what is asked here is: How would written specifications help me? Background information: I work for a small company that's developing vertical market software. If our product is easier to use and has better performance than the competition, it sells. If it's harder to use, even if it behaves 100% as the specification says, it doesn't sell. So there are no "external forces" for having written specs. The advantage would have to be somewhere in the development process. Now, I can see how frozen specifications would make a developer's life easier. But we'll never have frozen specs. If we see in the middle of development that feature X is not intuitive to use the way it's specified, then we can only choose between changing the specification or developing a product that won't sell. You'll probably ask by now: How do you know when you're done? Well, we're continually improving our product. The competition does the same. So (hopefully) we're never done. 
We keep improving the software, and when we reach a point when the benefits of the improvements we've added since the last release outweigh the costs of an update, we create a new release that is then tested, localized, documented and deployed. This also means that there's rarely any schedule pressure. Nobody has to do overtime to make a deadline. If the feature isn't done by the time we want to release the next version, it'll simply go into the next version. The next question might be: How do your developers know what they're supposed to implement? The answer is: They have a lot of domain knowledge. They know the customer's business well enough, so a high-level description of the feature (or even just the problem that the customer needs solved) is enough to implement it. If it's not clear, the developer creates a few fake screens to get feedback from marketing/management or customers, but this is nowhere near the level of detail of actual specifications. This might be inefficient for larger teams, but for a small team with low turnover it works quite well. It has the additional benefit that the developer in question often comes up with a better solution than the person writing the specs might have. This question is already getting very long, but let me address one last point: Testing. Like I said in the beginning, if our software behaves 100% like the spec says, it still can be crap. In fact, if it's so unintuitive that you need a spec to know how to test it, it probably is crap. It makes sense to have fixed, written tests for some core functionality and for regression bugs, but again, this is nowhere near a full written spec of how the software should behave when. The main test is: hand the software to a user who doesn't know it yet and tell them to use the new feature X. If they can figure out how to use it and it works, it works.

    Read the article

  • SQL Injection Protection for dynamic queries

    - by jbugeja
    The typical controls against SQL injection flaws are to use bind variables (cfqueryparam tag), validation of string data and to turn to stored procedures for the actual SQL layer. This is all fine and I agree, however what if the site is a legacy one and it features a lot of dynamic queries. Then, rewriting all the queries is a herculean task and it requires an extensive period of regression and performance testing. I was thinking of using a dynamic SQL filter and calling it prior to calling cfquery for the actual execution. I found one filter in CFLib.org (http://www.cflib.org/udf/sqlSafe): <cfscript> /** * Cleans string of potential sql injection. * * @param string String to modify. (Required) * @return Returns a string. * @author Bryan Murphy ([email protected]) * @version 1, May 26, 2005 */ function metaguardSQLSafe(string) { var sqlList = "-- ,'"; var replacementList = "#chr(38)##chr(35)##chr(52)##chr(53)##chr(59)##chr(38)##chr(35)##chr(52)##chr(53)##chr(59)# , #chr(38)##chr(35)##chr(51)##chr(57)##chr(59)#"; return trim(replaceList( string , sqlList , replacementList )); } </cfscript> This seems to be quite a simple filter and I would like to know if there are ways to improve it or to come up with a better solution?
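To make the contrast between string filtering and bind variables concrete, here is a minimal sketch. It uses Python's standard `sqlite3` module rather than ColdFusion, purely for illustration; the `users` table and both helper functions are hypothetical. The `?` placeholder plays the role that `cfqueryparam` plays inside `cfquery`.

```python
import sqlite3

# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "user")])


def find_users_unsafe(conn, name):
    """Dynamic SQL built by concatenation: the input becomes part of the query."""
    query = "SELECT name, role FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_users_bound(conn, name):
    """Bind variable: the input is passed as data and is never parsed as SQL."""
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"
print(len(find_users_unsafe(conn, payload)))  # every row comes back
print(find_users_bound(conn, payload))        # no row has that literal name
```

A replace-based filter has to anticipate every dangerous token, whereas a bound parameter leaves nothing for the input to escape from. For a legacy code base, converting the queries that take user input first may be a more tractable path than rewriting everything at once.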

    Read the article

  • Optional route parameters in ASP.NET 4 RTM no longer work as before

    - by Simon_Weaver
    I upgraded my project to ASP.NET 4 RTM with ASP.NET MVC 2.0 RTM today. I was previously using ASP.NET 3.5 with ASP.NET MVC 2.0 RTM. Some of my routes suddenly don't work and I don't know why. I'm not sure if something changed between 3.5 and 4.0 - or if this was a regression type issue in the 4.0 RTM. (I never previously tested my app with 4.0). I like to use Url.RouteUrl("route-name", routeParams) to avoid ambiguity when generating URLs. Here's my route definition for a gallery page. I want imageID to be optional (you get a thumbnail page if you don't specify it). // gallery id routes.MapRoute( "gallery-route", "gallery/{galleryID}/{imageID}/{title}", new { controller = "Gallery", action = "Index", galleryID = (string) null, imageID = (string) null, title = (string) null} ); In .NET 3.5 / ASP.NET MVC 2.0 RTM / IIS7 Url.RouteUrl("gallery-route", "cats") => /gallery/cats Url.RouteUrl("gallery-route", "cats", 4) => /gallery/cats/4 Url.RouteUrl("gallery-route", "cats", 4, "tiddles") => /gallery/cats/4/tiddles In .NET 4.0 RTM / ASP.NET MVC 2.0 RTM / IIS7 Url.RouteUrl("gallery-route", "cats") => null Url.RouteUrl("gallery-route", "cats", 4) => /gallery/cats/4 Url.RouteUrl("gallery-route", "cats", 4, "tiddles") => /gallery/cats/4/tiddles Previously I could supply only the galleryID and everything else would be ignored in the generated URL. But now it looks like I need to specify all the parameters up to title - or it gives up on determining the URL. Incoming URLs work fine for /gallery/cats, and that is correctly mapped through this rule with imageID and title both being assigned null in my controller. I also tested the INCOMING routes with http://haacked.com/archive/2008/03/13/url-routing-debugger.aspx and they all work fine.

    Read the article

  • Sweave/R - Automatically generating an appendix that contains all the model summaries/plots/data pro

    - by John Horton
    I like the idea of making research available at multiple levels of detail i.e., abstract for the casually curious, full text for the more interested, and finally the data and code for those working in the same area/trying to reproduce your results. In between the actual text and the data/code level, I'd like to insert another layer. Namely, I'd like to create a kind of automatically generated appendix that contains the full regression output, diagnostic plots, exploratory graphs data profiles etc. from the analysis, regardless of whether those plots/regressions etc. made it into the final paper. One idea I had was to write a script that would examine the .Rnw file and automatically: Profile all data sets that are loaded (sort of like the Hmisc(?) package) Summarize all regressions - i.e., run summary(model) for all models Present all plots (regardless of whether they made it in the final version) The idea is to make this kind of a low-effort, push-button sort of thing as opposed to a formal appendix written like the rest of a paper. What I'm looking for is some ideas on how to do this in R in a relatively simple way. My hunch is that there is some way of going through the namespace, figuring out what something is and then dumping into a PDF. Thoughts? Does something like this already exist?

    Read the article

  • Merging datasets based on 2 variables in SAS.

    - by John
    Hi guys, my question is the following: I'm working with different databases that all contain information about 1000+ companies. A company is identified by its ticker code (the short version of the name, e.g. Ford as F, usually seen on stock quotation boards). Aside from the ticker code, I also have to merge on time; I used month as a count variable throughout my time series. The final purpose is to run a regression of the form Y(jt) = c + X(jt) + X1(jt) etc., with j = company (ticker) and t = time (month). So imagine I have 2 databases: a base database with variables such as tickers, months, betas of a company (a risk measure) etc., and a second database which has an extra variable (let's say market capitalisation). What I want to do is merge these 2 databases based on the ticker and the month. Example: Base database: Ticker __ Month __ Betas AA __ 4 __ 1.2 BB __ 8 __ 1.18 Second database: Ticker __ Month __ MCAP AA __ 4 __ 8542 BB __ 6 __ 1245 Then after the merge I would like to have something like this: Ticker __ Month _ Betas ___ MCAP AA __ 4 _ 1.2 ___ 8542 So all observations that do not match on BOTH month and ticker have to be dropped. I'm sure this is possible, I just can't find the right code. Thanks! PS: I'm guessing the underscores have something to do with the font layout, but both the bold and the italic text are supposed to be normal :)
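The merge described above is an inner join on the pair (ticker, month). The sketch below shows the logic in plain Python rather than SAS, using the example rows from the question; the function name and the lower-case field names are hypothetical, and it assumes each (ticker, month) pair occurs at most once in the second dataset.

```python
def inner_merge(base, extra, keys=("ticker", "month")):
    """Keep only base rows whose key tuple also occurs in extra, combining columns."""
    # Index the second dataset by its (ticker, month) key for constant-time lookup.
    index = {tuple(row[k] for k in keys): row for row in extra}
    merged = []
    for row in base:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:              # rows without a match on BOTH keys are dropped
            merged.append({**row, **match})
    return merged


base = [
    {"ticker": "AA", "month": 4, "beta": 1.2},
    {"ticker": "BB", "month": 8, "beta": 1.18},
]
extra = [
    {"ticker": "AA", "month": 4, "mcap": 8542},
    {"ticker": "BB", "month": 6, "mcap": 1245},
]

print(inner_merge(base, extra))
```

Only the AA / month 4 row survives, because BB has month 8 in one dataset and month 6 in the other. In SAS itself the same effect is usually obtained by sorting both datasets by the two keys and merging with `IN=` flags, or with an inner join in PROC SQL.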

    Read the article

< Previous Page | 4 5 6 7 8 9 10 11  | Next Page >