data recovery

HDD dead forever???

- by Roberto

Yesterday I turned on my computer and it couldn't boot. I found out the HD (a 320GB SATA Seagate Momentus 7200.3 notebook drive) was broken: it couldn't be recognized by the BIOS. I have another drive of the same model, so I swapped the boards. With the broken drive's board, my good drive didn't work either, so that board clearly has a problem. But the broken drive doesn't work with the good board either: it is recognized, but when I insert a Windows installation DVD it reports the drive as 0GB. I put it in an external case and connected it to another computer via USB, but it doesn't show up in "My Computer". I used a file recovery program called "GetDataBack for NTFS"; it recognized the drive, but with the wrong size (2TB). When I try to make it read the drive, it gets an I/O error reading a sector. It tries to read, and the drive spins... So, since I'm using a good board on it, the problem seems to be internal. Is there anything anyone could do to recover the files from it?

Ubuntu loading stops ?

- by joxnas

I don't know why or how, but my Ubuntu's loading/booting stops right after the Ubuntu logo appears. An underscore (_) appears at the top right of the screen, then it disappears, leaving the whole screen black. The version is 9.10 and the kernel is .20. I have tried recovery mode and selected the "recover damaged packages" option, but it didn't do any good. I have very important files in Ubuntu that I need to copy to a pen drive or something today. The terminal mode is working, so I think I can do this there. My questions: How can I get my Ubuntu to load? Is it possible to copy the files I need via terminal mode? I don't know if it would work with one of the previous kernels, but I had configured GRUB (inside Ubuntu's GUI) to show only the latest kernel, and now I can't select any kernel other than .20 because I don't know a way to configure GRUB except via Ubuntu's GUI. My hardware: ATI Mobility Radeon HD 4650, P7450 2.13GHz Core 2 Duo, 4GB DDR2.

Uptime concerns in case of AWS outage

- by Aditya Patawari

I am running an Elastic Load Balancer backed by 2 instances in different Availability Zones in US East. I am using Multi-AZ RDS as well. Ideally this should ensure that if one AZ goes down, it does not affect the app, because everything is spread across multiple AZs. But the recent AWS outage took the app down for a long time, and I am not sure how this could happen. It would be great if someone could point out what went wrong. The main question I have is: how can I avoid this in the future? I can set up app servers across different regions or even providers and use DNS for load balancing, but what do I do with MySQL? Read Replicas will introduce some lag, which I want to avoid.

how to read/recover a udf partition

- by Toc

I need to recover the files inside a UDF partition on my VCR's external disk. Which program can I use, preferably on Windows or Linux?

How to efficiently restore Library folder partially deleted on OS X

- by flow

I am using OS X Lion, and while trying to delete some files I accidentally ran this (from my home directory): rm -fr Library. I realized it about 15 seconds later and ran killall rm. Of course, some folders inside "Library" have been deleted. The system seems to be OK for now, but I fear what will happen if I reboot. I have a Time Machine backup from 5 days ago. I wonder if it would be a good solution to just copy the whole "Library" folder of my home directory from the backup and replace the current one. Or what do you think would be the best approach? PS: In order to restore just the deleted directories inside "Library": in which order does "rm" delete directories? Alphabetically?
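
Rather than reconstructing the order in which rm walked the tree, one alternative is to compare the backup copy of Library with the live folder and list only what is missing. A minimal Python sketch, assuming the Time Machine copy can be browsed as an ordinary folder; `missing_paths` is a hypothetical helper, not part of any Apple tool:

```python
from pathlib import Path


def missing_paths(backup_root, current_root):
    """List paths present under backup_root but absent under current_root.

    Paths are returned relative to the roots. When a whole folder is gone, only
    the folder itself is reported, not every file inside it.
    """
    backup_root = Path(backup_root)
    current_root = Path(current_root)
    missing = []
    # Sorting by path components visits a parent before its children.
    for path in sorted(backup_root.rglob("*"), key=lambda p: p.parts):
        rel = path.relative_to(backup_root)
        # Skip entries inside a folder that was already reported as missing.
        if any(m in rel.parents for m in missing):
            continue
        if not (current_root / rel).exists():
            missing.append(rel)
    return missing
```

Called with the backup's Library path (a placeholder here, since the Time Machine location varies) and `Path.home() / "Library"`, the returned list names only the items that would need copying back, leaving everything that survived untouched.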

How do I recover JPEGs with no file size?

- by Jill

I downloaded some wedding photos to my external drive about a month ago. A total of 3 cards were downloaded into 3 different folders. The first folder lists all of the photos, about 600 images, but they are zero bytes. The other 2 folders are fine. I can't recover from the CompactFlash card because I have used it too many times since then. Is there any way to recover the images on my drive?
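
If any of the picture data actually reached the external drive before the entries were left at zero bytes (which is not certain), one general technique is signature-based carving: scanning raw bytes for JPEG start and end markers, which is roughly what recovery tools such as PhotoRec automate. A naive Python sketch, assuming the raw bytes have already been read from a disk image into memory; `carve_jpegs` is a hypothetical helper:

```python
def carve_jpegs(data: bytes) -> list:
    """Naively carve JPEG candidates from a raw byte buffer.

    Looks for the JPEG start-of-image marker (FF D8 FF) and cuts at the next
    end-of-image marker (FF D9). Photos often embed a thumbnail with its own
    FF D9, so candidates from this sketch may be truncated and need checking.
    """
    soi = b"\xff\xd8\xff"
    eoi = b"\xff\xd9"
    images = []
    start = data.find(soi)
    while start != -1:
        end = data.find(eoi, start + len(soi))
        if end == -1:
            break  # no end marker: the candidate runs off the end of the buffer
        images.append(data[start:end + len(eoi)])
        start = data.find(soi, end + len(eoi))
    return images
```

Because of the thumbnail issue noted in the docstring, real carving tools also parse the JPEG segment structure instead of stopping at the first end marker.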

Question about getting information off an old HD...

- by user37983

OK, so my old computer was a Sony Vaio. It's kind of old and had XP on it. My girlfriend trashed it: the CD drive went, and she tried fixing it by messing with the BIOS configuration, etc. The laptop itself is really fried (the keyboard doesn't work, etc.), but the hard drive is still intact. I tried putting the HD in my new computer (a Toshiba running Win7), and it looks like it's going to boot: it goes to the screen with the logo and the status bar starts to load. Then it flashes a blue screen for a split second and goes to the black screen where it says Windows did not shut down properly and gives options (start Windows normally, safe mode, safe mode with networking, safe mode with command prompt). I've tried every option, but it always comes back to this screen. I need to get into the HD because I have very important files on it. Is there any way?

Vostro 1520 crash during bios update

- by Deadmilkman

My Vostro 1520 (P8700 C2D) originally came with BIOS A02. This BIOS version has known problems with VT, so I had to upgrade to A05. During the upgrade, the notebook crashed. Needless to say, it won't boot anymore. The simple question is: is there a way to access the BIOS boot block to recover the bad flash with a crisis disk? I know that the Mini and Alienware models have this feature, but I can't find anything about it for the Vostro machines. Any help will be much appreciated. PS: Dell's solution is to replace the motherboard, but that will take several days, as the board isn't available in my country :( Regards, Deadmilkman

Windows 7 just deleted 4 days of work

- by Mat

Hey! I'm about to freak out. I just finished a project and rebooted my computer. It wouldn't boot anymore, so I had to use the Windows 7 system repair option. It ran for a minute and then the system booted up. Now most of my source code from the last 4 days of work is gone! Background: sometimes (most often after installing new software) my notebook won't boot anymore. It just shows the little Windows 7 flag and stops reading from the hard disk. If I hard-abort and reboot, it asks me whether to start Windows normally (which won't work) or to run "Windows Startup Repair". If I run it, it does some stuff for two or three minutes and then I can boot Windows again. Usually after this, .exe files I added to the computer during the previous days are gone, but other files have not been touched so far. But this time, a whole bunch of ".as" (ActionScript source) files from my project are gone! Does anyone know where they went and whether there's a way to recover them?

I lost the CSS code for my important website. Why?

- by Hooshkar

Something very weird happened. I had Notepad++ open and was working on the CSS code for my website when my little niece suddenly unplugged the computer. When I restarted the computer and opened the same CSS file in Notepad++ again, all I saw was "NULL NULL NULL NULL NULL NULL NULL"; there was no code, it was all lost. I opened the same CSS file in other editors and it is empty there too. Is there a way to fix it? It was a lot of hard work. And what could be the cause? Thank you.

How do I force an exchange database to become "active"?

- by makerofthings7

We had a catastrophic failure where all that remains is a single edb file. No backups. No log files. The database that remains is on the "passive" copy. The "active" copy is missing, but the server is active. The Exchange console reports that the edb file needs to be reseeded, however there is no source to reseed from. How do I make the "invalid" database file (missing logs) valid? How do I make exchange recognize this as a valid database to use as a primary?

Use `dd` linux program to save / recover a disk's MBR

- by Graduate

I have Ubuntu installed on my laptop. I want to install Windows 7 as well, on another disk partition (I will do it by restoring it from a special recovery partition on my laptop). After installing Windows, I want to restore my hard drive's MBR so that I can load Ubuntu. I plan to use the Linux dd program:
1) (Before installing, run this command in Linux) dd if=/dev/sda of=/home/user/mbr_backup bs=512 count=1
2) (After installing, boot the Ubuntu Live CD and run this) dd if=/home/user/mbr_backup of=/dev/sda bs=512 count=1
3) Boot Ubuntu and reconfigure GRUB2 so that it can also start Windows.
I need your advice; I want to be sure I won't damage the disk (its partition table).
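
It may help to see what those 512 bytes contain. On an MBR-partitioned disk, the first 446 bytes hold boot code, the next 64 bytes hold the four primary partition entries (16 bytes each), and the last two bytes are the 0x55AA signature. Writing back the full 512-byte backup therefore also restores the partition table as it was before Windows was installed; if the installation changes the partitions, that would overwrite the new table. Copying back only the first 446 bytes (for example with bs=446 count=1) restores the boot code while keeping the current table. The Python sketch below models this on byte strings only, not on a live disk; the function names are hypothetical:

```python
MBR_SIZE = 512
BOOT_CODE_SIZE = 446        # bootstrap code area
PARTITION_TABLE_SIZE = 64   # four 16-byte primary partition entries
SIGNATURE = b"\x55\xaa"     # bytes 510-511 of a valid MBR
TABLE_END = BOOT_CODE_SIZE + PARTITION_TABLE_SIZE  # 510


def split_mbr(mbr: bytes):
    """Split a 512-byte MBR into (boot code, partition table, signature)."""
    if len(mbr) != MBR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    if mbr[TABLE_END:] != SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    return mbr[:BOOT_CODE_SIZE], mbr[BOOT_CODE_SIZE:TABLE_END], mbr[TABLE_END:]


def restore_boot_code_only(backup: bytes, current: bytes) -> bytes:
    """Combine the backed-up boot code with the *current* partition table."""
    boot, _, _ = split_mbr(backup)
    _, table, signature = split_mbr(current)
    return boot + table + signature
```

The result keeps whatever partition layout the disk has now, which is the property the bs=446 variant of the dd command provides.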

How can I tell if my hard drive(s) have Battery Backed Write Cache?

- by Riedsio

How can I tell if my hard drives have a battery-backed write cache (BBWC)? How can I tell if it is enabled and/or configured correctly? I don't have physical access to my server. It's a GNU/Linux box. I can provide supplemental information/details incrementally as requested. My frame of reference is that of a DBA: I have access and privileges, but (usually) only tread where I know I am supposed to. :)

How to save your Linux state (suspend to disk) periodically to recover from crashes?

- by WoLpH

Once in a while my laptop crashes/dies because of a bad/empty battery, a crappy Wi-Fi driver, or whatever other reason. For a while I've wondered if it's possible to force Linux to periodically save its state to disk (like VMware snapshots) so you can restore from that with possibly slightly outdated work, but at least with all of your apps open in the same state you left them. I don't really see the point in having to boot everything from scratch constantly. Although KDE saves your state on logout, that doesn't happen periodically (by default) either. It would be much nicer to recover from crashes if your RAM was written to disk periodically. Does anyone know if there's a system call to do this without also shutting down the machine? Even a manual button to save the entire state would be nice.

Can I move files from a laptop hard drive with a corrupt sector to a USB hard drive?

- by Corey

I have a hard drive that is on its way out and won't boot to Windows 7. The Windows partition takes up the whole disk. I thought I would try to recover some recent files that hadn't been backed up. Assuming the files are recoverable, how can I explore the drive that has the corrupt sector and transfer files to a USB hard drive? If it helps, the laptop is able to see the USB drive when choosing a boot order. Some searching led me to WinPE 3.0, part of the Windows Automated Installation Kit. Is that a viable method?
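
Whatever environment is used to reach the drive (WinPE, a Linux live CD, or another machine), a common tactic with a failing disk is to copy in chunks and keep going past unreadable regions instead of aborting the whole copy; GNU ddrescue does this for entire drives. A minimal per-file Python sketch, assuming the OS reports a bad sector as an I/O error (OSError in Python); `salvage_copy` is a hypothetical helper, and repeated read attempts can add wear to a dying drive:

```python
def salvage_copy(src, dst, size, chunk=64 * 1024):
    """Copy `size` bytes from src to dst, zero-filling chunks that fail to read.

    src and dst are binary file objects; src must support seek().
    Returns the offsets of chunks that could not be read.
    """
    bad = []
    for offset in range(0, size, chunk):
        length = min(chunk, size - offset)
        try:
            src.seek(offset)
            data = src.read(length)
        except OSError:
            # Unreadable region: record it and move on instead of aborting.
            data = b""
            bad.append(offset)
        # Pad failed or short reads so later data stays at the right offset.
        dst.write(data.ljust(length, b"\0"))
    return bad
```

A smaller `chunk` loses less data around each bad sector at the cost of more read calls; the returned offsets show which parts of the copied file are zero-filled placeholders.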

S#arp architecture mapping many to many and ado.net data services: A single resource was expected f

- by Leg10n

Hi, I'm developing an application that reads data from a SQL Server database (migrated from a legacy DB) with NHibernate and S#arp Architecture through ADO.NET Data Services. I'm trying to map a many-to-many relationship. I have an Error class:

public class Error
{
    public virtual int ERROR_ID { get; set; }
    public virtual string ERROR_CODE { get; set; }
    public virtual string DESCRIPTION { get; set; }
    public virtual IList<ErrorGroup> GROUPS { get; protected set; }
}

And then I have the ErrorGroup class:

public class ErrorGroup
{
    public virtual int ERROR_GROUP_ID { get; set; }
    public virtual string ERROR_GROUP_NAME { get; set; }
    public virtual string DESCRIPTION { get; set; }
    public virtual IList<Error> ERRORS { get; protected set; }
}

And the overrides:

public class ErrorGroupOverride : IAutoMappingOverride<ErrorGroup>
{
    public void Override(AutoMapping<ErrorGroup> mapping)
    {
        mapping.Table("ERROR_GROUP");
        mapping.Id(x => x.ERROR_GROUP_ID, "ERROR_GROUP_ID");
        mapping.IgnoreProperty(x => x.Id);
        mapping.HasManyToMany<Error>(x => x.ERRORS)
            .Table("ERROR_GROUP_LINK")
            .ParentKeyColumn("ERROR_GROUP_ID")
            .ChildKeyColumn("ERROR_ID")
            .Inverse()
            .AsBag();
    }
}

public class ErrorOverride : IAutoMappingOverride<Error>
{
    public void Override(AutoMapping<Error> mapping)
    {
        mapping.Table("ERROR");
        mapping.Id(x => x.ERROR_ID, "ERROR_ID");
        mapping.IgnoreProperty(x => x.Id);
        mapping.HasManyToMany<ErrorGroup>(x => x.GROUPS)
            .Table("ERROR_GROUP_LINK")
            .ParentKeyColumn("ERROR_ID")
            .ChildKeyColumn("ERROR_GROUP_ID")
            .AsBag();
    }
}

When I view the data service in the browser, like http://localhost:1905/DataService.svc/Errors, it shows the list of errors with no problems, and using it like http://localhost:1905/DataService.svc/Errors(123) works too.
The problem: when I want to see the errors in a group, or the groups of an error, like "http://localhost:1905/DataService.svc/Errors(123)?$expand=GROUPS", I get the XML document, but the browser says:

The XML page cannot be displayed
Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later.
--------------------------------------------------------------------------------
Only one top level element is allowed in an XML document. Error processing resource 'http://localhost:1905/DataServic...
<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
-^

I viewed the source code and I get the data. However, it comes with an exception:

<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <code></code>
  <message xml:lang="en-US">An error occurred while processing this request.</message>
  <innererror xmlns="xmlns">
    <message>A single resource was expected for the result, but multiple resources were found.</message>
    <type>System.InvalidOperationException</type>
    <stacktrace> at System.Data.Services.Serializers.Serializer.WriteRequest(IEnumerator queryResults, Boolean hasMoved)
   at System.Data.Services.ResponseBodyWriter.Write(Stream stream)</stacktrace>
  </innererror>
</error>

Am I missing something? Where does this error come from?

Data Warehouse ETL slow - change primary key in dimension?

- by Jubbles

I have a working MySQL data warehouse that is organized as a star schema, and I am using Talend Open Studio for Data Integration 5.1 to create the ETL process. I would like this process to run once per day. I have estimated that one of the dimension tables (dimUser) will have approximately 2 million records and 23 columns. I created a small test ETL process in Talend that worked, but given the amount of data that may need to be updated daily, the current performance will not cut it. It takes the ETL process four minutes to UPDATE or INSERT 100 records to dimUser. If I assume a linear relationship between the number of records and the time to UPDATE or INSERT, updating all 2 million records would take about 80,000 minutes (more than 55 days), so there is no way the ETL could finish within one day, let alone in the 3-4 hours I was hoping for. Since I'm unfamiliar with Java, I wrote the ETL as a Python script and ran into the same problem. However, I did discover that if I did only INSERTs, the process went much faster. I am pretty sure that the bottleneck is caused by the UPDATE statements.

The primary key in dimUser is an auto-increment integer. My friend suggested that I scrap this primary key and replace it with a multi-field primary key (in my case, 2-3 fields). Before I rip the test data out of my warehouse and change the schema, can anyone provide suggestions or guidelines related to:
- the design of the data warehouse
- the ETL process
- how realistic it is to have an ETL process INSERT or UPDATE a few million records each day
- whether my friend's suggestion will significantly help

If you need any further information, just let me know and I'll post it.
UPDATE - additional information:

mysql> describe dimUser;
+-------------+---------------------+------+-----+---------------------+----------------+
| Field       | Type                | Null | Key | Default             | Extra          |
+-------------+---------------------+------+-----+---------------------+----------------+
| user_key    | int(10) unsigned    | NO   | PRI | NULL                | auto_increment |
| id_A        | int(10) unsigned    | NO   |     | NULL                |                |
| id_B        | int(10) unsigned    | NO   |     | NULL                |                |
| field_4     | tinyint(4) unsigned | NO   |     | 0                   |                |
| field_5     | varchar(50)         | YES  |     | NULL                |                |
| city        | varchar(50)         | YES  |     | NULL                |                |
| state       | varchar(2)          | YES  |     | NULL                |                |
| country     | varchar(50)         | YES  |     | NULL                |                |
| zip_code    | varchar(10)         | NO   |     | 99999               |                |
| field_10    | tinyint(1)          | NO   |     | 0                   |                |
| field_11    | tinyint(1)          | NO   |     | 0                   |                |
| field_12    | tinyint(1)          | NO   |     | 0                   |                |
| field_13    | tinyint(1)          | NO   |     | 1                   |                |
| field_14    | tinyint(1)          | NO   |     | 0                   |                |
| field_15    | tinyint(1)          | NO   |     | 0                   |                |
| field_16    | tinyint(1)          | NO   |     | 0                   |                |
| field_17    | tinyint(1)          | NO   |     | 1                   |                |
| field_18    | tinyint(1)          | NO   |     | 0                   |                |
| field_19    | tinyint(1)          | NO   |     | 0                   |                |
| field_20    | tinyint(1)          | NO   |     | 0                   |                |
| create_date | datetime            | NO   |     | 2012-01-01 00:00:00 |                |
| last_update | datetime            | NO   |     | 2012-01-01 00:00:00 |                |
| run_id      | int(10) unsigned    | NO   |     | 999                 |                |
+-------------+---------------------+------+-----+---------------------+----------------+

I used a surrogate key because I had read that it was good practice. From a business perspective, I want to stay aware of potential fraudulent activity (say, for 200 days a user is associated with state X and then the next day they are associated with state Y: they could have moved, or their account could have been compromised); that is why geographic data is kept. The field id_B may have a few distinct values of id_A associated with it, but I am interested in knowing distinct (id_A, id_B) tuples. In the context of this information, my friend suggested that something like (id_A, id_B, zip_code) be the primary key. For the large majority (80%) of daily ETL processes, I only expect the following fields to be updated for existing records: field_10 - field_14, last_update, and run_id (this field is a foreign key to my etlLog table and is used for ETL auditing purposes).
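
Since most daily runs only touch field_10 - field_14, last_update, and run_id, one common way to shrink the UPDATE workload, whichever column ends up as the primary key, is to compare incoming rows with the dimension's current values and send UPDATEs only for rows whose tracked fields actually changed. A minimal Python sketch of that classification step, assuming (id_A, id_B) is the natural key and that the current values were loaded once into a dict (for example with a single SELECT); `plan_changes` is a hypothetical helper and the database writes are left out:

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[int, int]  # assumed natural key: (id_A, id_B)

TRACKED = ("field_10", "field_11", "field_12", "field_13", "field_14")


def plan_changes(existing: Dict[Key, dict], incoming: Iterable[dict],
                 tracked=TRACKED):
    """Split incoming rows into inserts and updates against the dimension.

    Rows whose tracked fields are unchanged are skipped entirely, so the
    slow UPDATE path only sees rows that really changed.
    """
    inserts: List[dict] = []
    updates: List[dict] = []
    for row in incoming:
        key = (row["id_A"], row["id_B"])
        current = existing.get(key)
        if current is None:
            inserts.append(row)          # new (id_A, id_B) tuple
        elif any(current[f] != row[f] for f in tracked):
            updates.append(row)          # at least one tracked field changed
    return inserts, updates
```

last_update and run_id are deliberately left out of the comparison because they change on every run. The two lists could then be written in batches (for example with a DB-API driver's executemany) inside one transaction rather than row by row.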

Change Data Capture Webinar

I am going to be doing a webinar with our friends at Attunity on Change Data Capture. Attunity have a good story around this technology, and you can use it in your SSIS loads to great effect.

Join Attunity and Konesans/SQLIS for a Webinar on 17 September. Space is limited. Reserve your Webinar seat now at: https://www1.gotomeeting.com/register/693735512

Want increased efficiency and real-time speed when conducting ETL loads? Need lower implementation costs while minimizing system impact? Learn how change data capture (CDC) technologies can reduce ETL load times. Allan Mitchell, Principal Consultant at Konesans and SQL Server MVP specialising in ETL, will explain CDC concepts and benefits and how CDC can dramatically reduce ETL load times. Ian Archibald, Pre-Sales Director EMEA for Attunity, will present and demonstrate Attunity's award-winning Oracle-CDC for SSIS, a fully integrated SSIS solution for designing, deploying and managing Oracle CDC processes.

Title: Change Data Capture - Reducing ETL Load Times
Date: Thursday, September 17, 2009
Time: 10:00 AM - 11:00 AM BST

ABOUT THE SPEAKERS:
Allan Mitchell is the joint owner of Konesans Ltd, a UK-based consultancy specializing in SQL Server, and most importantly SQL Server Integration Services. Having worked with SQL Server since version 6.5, he has extensive experience in many aspects of SQL Server, but now focuses on the BI suite of tools. He is a SQL Server MVP, a frequent poster on the MS SSIS/DTS newsgroups, and runs the sqldts.com and sqlis.com resource sites.
Ian Archibald, Attunity Pre-Sales Director EMEA, has worked in Attunity’s UK Office for 17 years. An expert in Attunity solutions, Ian has extensive knowledge of Attunity’s products and data integration & CDC technologies.

After registering you will receive a confirmation email containing information about joining the Webinar.
System Requirements
PC-based attendees
Required: Windows® 2000, XP Home, XP Pro, 2003 Server, Vista
Macintosh®-based attendees
Required: Mac OS® X 10.4 (Tiger®) or newer

Do you need all that data?

- by BuckWoody

I read an amazing post over on Ars Technica (link: http://arstechnica.com/science/news/2010/03/the-software-brains-behind-the-particle-colliders.ars?utm_source=rss&utm_medium=rss&utm_campaign=rss) about the LHC and other "particle colliders". Beyond the pure scientific geek awesomeness, these instruments have the potential to collect more data than you can (or possibly should) store. Actually, this problem has a lot in common with a BI system. There's so much granular detail available in the source systems that a designer has to decide how, and how much, to roll up the data. Whenever you do that, you lose fidelity, but in many cases that's OK. Take, for example, your car's speedometer. You don't actually need to track each and every point of speed as it happens; you only need to know that you're hovering around the speed limit at a certain point in time. Since this is the way that humans perceive data, is there some lesson we should take for the design of data "flows", and what implications does this have for new technologies like StreamInsight?
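
To make the roll-up trade-off concrete, the sketch below averages hypothetical speedometer readings into fixed time buckets, keeping one value per bucket and discarding the individual points. It is a minimal Python illustration, not how StreamInsight or any particular BI tool implements aggregation; `roll_up` and the readings are made up for the example:

```python
from statistics import mean


def roll_up(samples, bucket_seconds):
    """Average (timestamp_seconds, speed) samples into fixed-width time buckets.

    Returns {bucket_start: mean_speed}. The individual readings inside each
    bucket are discarded: that is the loss of fidelity a roll-up accepts.
    """
    buckets = {}
    for ts, speed in samples:
        start = ts - ts % bucket_seconds  # align to the start of the bucket
        buckets.setdefault(start, []).append(speed)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}
```

With one-minute buckets, readings of 50, 54 and 58 in the first minute collapse to a single 54, which is all the "am I near the speed limit?" question needs; the bucket width is the designer's "how much to roll up" knob.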

Accessing Server-Side Data from Client Script: Accessing JSON Data From an ASP.NET Page Using jQuery

When building a web application, we must decide how and when the browser will communicate with the web server. The ASP.NET WebForms model greatly simplifies web development by providing a straightforward mechanism for exchanging data between the browser and the server. With WebForms, each ASP.NET page's rendered output includes a <form> element that performs a postback to the same page whenever a Button control within the form is clicked, or whenever the user modifies a control whose AutoPostBack property is set to True. On postback, the server sends the entire contents of the web page back to the browser, which then displays this new content. With WebForms we don't need to spend much time or effort thinking about how or when the browser will communicate with the server or how that returned information will be processed by the browser. It just works. While this approach certainly works and has its advantages, it's not without its drawbacks. The primary concern with postback forms is that they require a large amount of information to be exchanged between the browser and the server. Specifically, the browser sends back all of its form fields (including hidden ones, like view state, which may be quite large) and then the server sends back the entire contents of the web page. Granted, there are scenarios where this large quantity of data needs to be exchanged, but in many cases we can use techniques that exchange much less information. However, these techniques necessitate spending more time and effort thinking about how and when to have the browser communicate with the server and intelligently deciding on what information needs to be exchanged. This article, the first in a multi-part series, examines different techniques for accessing server-side data from a browser using client-side script. 
Throughout this series we will explore alternative ways to expose data on the server so that it can be accessed from the browser using script; we will also examine various tools for communicating with the server from JavaScript, including jQuery and the ASP.NET AJAX library. Read on to learn more!

SQL SERVER – Standards Support, Protocol, Data Portability – 3 Important SQL Server Documentations for Downloads

- by pinaldave

I have been working with SQL Server continuously for more than 8 years now, and I like to read a lot. Sometimes I read easy things, and sometimes I read material that is not so easy. Here are a few recently released documents that I have referred to and read. They are not easy reads, but they are very important if you like to read more advanced material.
SQL Server Standards Support Documentation: The SQL Server standards support documentation provides detailed support information for certain standards that are implemented in Microsoft SQL Server.
Microsoft SQL Server Protocol Documentation: The Microsoft SQL Server protocol documentation provides technical specifications for Microsoft proprietary protocols that are implemented and used in Microsoft SQL Server 2008.
Microsoft SQL Server Data Portability Documentation: The SQL Server data portability documentation explains various mechanisms by which user-created data in SQL Server can be extracted for use in other software products. These mechanisms include import/export functionality, documented APIs, industry standard formats, or documented data structures/file formats.
Reference: Pinal Dave (http://blog.sqlauthority.com)

WebCenter .NET Accelerator - Microsoft SharePoint Data via WSRP

- by john.brunswick

Platforms in the enterprise will never be homogeneous. As much as any vendor would enjoy having its single development or application technology adopted exclusively by customers, there is too much legacy, time, education, innovation and vertical business need for a single platform to be practical. Java and .NET are the two industry application platform heavyweights, and more often than not, business users are leveraging various systems in their day-to-day activities that incorporate applications developed on top of both platforms. BEA Systems acquired Plumtree Software to complete their "liquid" view of data, stressing that regardless of the particular source system, heterogeneous data could interoperate not only through layers that allowed for data aggregation, but also at the "glass" or UI layer. The technical components that allowed integration at the glass thrive today at Oracle, helping WebCenter provide a rich composite application framework. Oracle Ensemble and the Oracle .NET Application Accelerator allow WebCenter to consume and interact with the UI layers provided by .NET applications and a series of other technologies. The beauty of the .NET accelerator is that it can consume any .NET application and act as a Web Services for Remote Portlets (WSRP) producer. I recently had a chance to leverage the .NET accelerator to expose an ASP.NET 2.0 (C#) application in the WebCenter UI (pictured above) and wanted to share a few tips to help others get started with similar integrations. I was using two virtual machines for the exercise: one with Windows Server 2003 running SharePoint, and the other running WebCenter Spaces 11g. For my sample application data, I ended up using SharePoint 2007 lists and calendars (MOSS 2007) to supply results using a .NET API for SharePoint.

Filtering a Grid of Data in ASP.NET MVC

This article is the fourth installment in an ongoing series on displaying a grid of data in an ASP.NET MVC application. The previous two articles in this series - Sorting a Grid of Data in ASP.NET MVC and Displaying a Paged Grid of Data in ASP.NET MVC - showed how to sort and page data in a grid. This article explores how to present a filtering interface to the user and then show only those records that conform to the filtering criteria. In particular, the demo we examine in this installment presents an interface with three filtering criteria: the category, the minimum price, and whether to omit discontinued products. Using this interface, the user can apply one or more of these criteria, allowing a variety of filtered displays. For example, the user could opt to view: all products in the Condiments category; those products in the Confections category that cost $50.00 or more; all products that cost $25.00 or more and are not discontinued; or any other such combination. As with its predecessors, this article offers step-by-step instructions and includes a complete, working demo available for download at the end of the article. Read on to learn more!
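
The article's demo is ASP.NET MVC, but the core filtering logic, applying each of the three criteria only when the user supplied it, can be sketched independently of the framework. The Python below is an illustrative sketch, not the article's code; `Product` and `filter_products` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    name: str
    category: str
    price: float
    discontinued: bool


def filter_products(products, category: Optional[str] = None,
                    min_price: Optional[float] = None,
                    omit_discontinued: bool = False):
    """Apply whichever of the three optional criteria were supplied."""
    result = []
    for p in products:
        if category is not None and p.category != category:
            continue  # category filter set and not matched
        if min_price is not None and p.price < min_price:
            continue  # cheaper than the requested minimum
        if omit_discontinued and p.discontinued:
            continue  # user asked to hide discontinued products
        result.append(p)
    return result
```

Treating an unset criterion as "match everything" is what lets any combination of the three filters work without a separate code path per combination.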

Convert ddply {plyr} to Oracle R Enterprise, or use with Embedded R Execution

- by Mark Hornick

The plyr package contains a set of tools for partitioning a problem into smaller sub-problems that can be more easily processed. One function within {plyr} is ddply, which allows you to specify subsets of a data.frame and then apply a function to each subset. The result is gathered into a single data.frame. Such a capability is very convenient. The function ddply also has a parallel option that, if TRUE, will apply the function in parallel, using the backend provided by foreach. This type of functionality is available through Oracle R Enterprise using the ore.groupApply function. In this blog post, we show a few examples from Sean Anderson's "A quick introduction to plyr" to illustrate the corresponding functionality using ore.groupApply.

To get started, we'll create a demo data set and load the plyr package.

set.seed(1)
d <- data.frame(year = rep(2000:2014, each = 3),
                count = round(runif(45, 0, 20)))
dim(d)
library(plyr)

This first example takes the data frame, partitions it by year, and calculates the coefficient of variation of the count, returning a data frame.

# Example 1
res <- ddply(d, "year", function(x) {
  mean.count <- mean(x$count)
  sd.count <- sd(x$count)
  cv <- sd.count/mean.count
  data.frame(cv.count = cv)
})

To illustrate the equivalent functionality in Oracle R Enterprise, using embedded R execution, we use the ore.groupApply function on the same data, but pushed to the database, creating an ore.frame. The function ore.push creates a temporary table in the database, returning a proxy object, the ore.frame.

D <- ore.push(d)
res <- ore.groupApply(D, D$year, function(x) {
  mean.count <- mean(x$count)
  sd.count <- sd(x$count)
  cv <- sd.count/mean.count
  data.frame(year = x$year[1], cv.count = cv)
}, FUN.VALUE = data.frame(year = 1, cv.count = 1))

You'll notice the similarities in the first three arguments. With ore.groupApply, we augment the function to return the specific data.frame we want.
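
For readers coming from other languages, the split-apply-combine pattern behind Example 1 can be sketched in a few lines of plain Python. This is an illustrative sketch, not part of plyr or ORE; `cv_by_group` is a hypothetical name. Python's `statistics.stdev` computes the n - 1 sample standard deviation, matching R's sd:

```python
from collections import defaultdict
from statistics import mean, stdev


def cv_by_group(rows):
    """Coefficient of variation of `count` per `year`, from (year, count) rows."""
    groups = defaultdict(list)
    for year, count in rows:                 # split: bucket rows by year
        groups[year].append(count)
    return {year: stdev(c) / mean(c)         # apply: sd / mean per group
            for year, c in sorted(groups.items())}  # combine: one entry per year
```

Fed the 2000 and 2001 counts shown in this post's Example 3 output (5, 7, 11 and 18, 4, 18), it yields about 0.3985 and 0.6062, the same cv.count values the post reports for Example 1.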
We also specify the argument FUN.VALUE, which describes the resulting data.frame. From our previous blog posts, you may recall that by default, ore.groupApply returns an ore.list containing the results of each function invocation. To get a data.frame, we specify the structure of the result.

The results in both cases are the same; however, the ore.groupApply result is an ore.frame. In this case the data stays in the database until it's actually required. This can result in significant memory and time savings when data is large.

R> class(res)
[1] "ore.frame"
attr(,"package")
[1] "OREbase"
R> head(res)
  year  cv.count
1 2000 0.3984848
2 2001 0.6062178
3 2002 0.2309401
4 2003 0.5773503
5 2004 0.3069680
6 2005 0.3431743

To make ore.groupApply execute in parallel, you can specify the argument parallel with either TRUE, to use default database parallelism, or a specific number, which serves as a hint to the database as to how many parallel R engines should be used.

The next ddply example uses the summarise function, which creates a new data.frame. In ore.groupApply, the year column is passed in with the data. Since no automatic creation of columns takes place, we explicitly set the year column in the data.frame result to the value of the first row, since all rows received by the function have the same year.

# Example 2
ddply(d, "year", summarise, mean.count = mean(count))

res <- ore.groupApply(D, D$year, function(x) {
  mean.count <- mean(x$count)
  data.frame(year = x$year[1], mean.count = mean.count)
}, FUN.VALUE = data.frame(year = 1, mean.count = 1))

R> head(res)
  year mean.count
1 2000   7.666667
2 2001  13.333333
3 2002  15.000000
4 2003   3.000000
5 2004  12.333333
6 2005  14.666667

Example 3 uses the transform function with ddply, which modifies the existing data.frame. With ore.groupApply, we again construct the data.frame explicitly, which is returned as an ore.frame.

    # Example 3
    ddply(d, "year", transform, total.count = sum(count))

    res <- ore.groupApply(D, D$year, function(x) {
      total.count <- sum(x$count)
      data.frame(year = x$year[1], count = x$count, total.count = total.count)
    }, FUN.VALUE = data.frame(year = 1, count = 1, total.count = 1))

    R> head(res)
      year count total.count
    1 2000     5          23
    2 2000     7          23
    3 2000    11          23
    4 2001    18          40
    5 2001     4          40
    6 2001    18          40

In Example 4, the mutate function with ddply lets you define new columns that build on columns just defined. Since ore.groupApply constructs the data.frame explicitly, you always have complete control over when and how columns are used.

    # Example 4
    ddply(d, "year", mutate, mu = mean(count), sigma = sd(count), cv = sigma/mu)

    res <- ore.groupApply(D, D$year, function(x) {
      mu <- mean(x$count)
      sigma <- sd(x$count)
      cv <- sigma/mu
      data.frame(year = x$year[1], count = x$count, mu = mu, sigma = sigma, cv = cv)
    }, FUN.VALUE = data.frame(year = 1, count = 1, mu = 1, sigma = 1, cv = 1))

    R> head(res)
      year count        mu    sigma        cv
    1 2000     5  7.666667 3.055050 0.3984848
    2 2000     7  7.666667 3.055050 0.3984848
    3 2000    11  7.666667 3.055050 0.3984848
    4 2001    18 13.333333 8.082904 0.6062178
    5 2001     4 13.333333 8.082904 0.6062178
    6 2001    18 13.333333 8.082904 0.6062178

In Example 5, ddply partitions the data on multiple columns before constructing the result. To do this with ore.groupApply, we create an index column from the concatenation of the partitioning columns. This example also lets us illustrate using the ORE transparency layer to subset the data.

    # Example 5
    baseball.dat <- subset(baseball, year > 2000)  # data from the plyr package
    x <- ddply(baseball.dat, c("year", "team"), summarize, homeruns = sum(hr))

We first push the data set to the database to get an ore.frame. We then add the composite column and perform the subset using the transparency layer. Since results from database execution are unordered, we explicitly sort them and view the first 6 rows.

    BB.DAT <- ore.push(baseball)
    BB.DAT$index <- with(BB.DAT, paste(year, team, sep = "+"))
    BB.DAT2 <- subset(BB.DAT, year > 2000)
    X <- ore.groupApply(BB.DAT2, BB.DAT2$index, function(x) {
      data.frame(year = x$year[1], team = x$team[1], homeruns = sum(x$hr))
    }, FUN.VALUE = data.frame(year = 1, team = "A", homeruns = 1), parallel = FALSE)
    res <- ore.sort(X, by = c("year", "team"))

    R> head(res)
      year team homeruns
    1 2001  ANA        4
    2 2001  ARI      155
    3 2001  ATL       63
    4 2001  BAL       58
    5 2001  BOS       77
    6 2001  CHA       63

Our next example is derived from the ggplot function documentation and illustrates using ddply together with the ggplot2 package. We first create a data.frame with demo data and use ddply to compute some statistics for each group (gp). We then use ggplot to produce the graph. We can take this same code, push the data.frame df to the database, and invoke it on the database server. The graph will be returned to the client window, as depicted below.

    # Example 6 with ggplot2
    library(ggplot2)
    df <- data.frame(gp = factor(rep(letters[1:3], each = 10)), y = rnorm(30))

    # Compute sample mean and standard deviation in each group
    library(plyr)
    ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y))

    # Set up a skeleton ggplot object and add layers:
    ggplot() +
      geom_point(data = df, aes(x = gp, y = y)) +
      geom_point(data = ds, aes(x = gp, y = mean), colour = 'red', size = 3) +
      geom_errorbar(data = ds, aes(x = gp, y = mean, ymin = mean - sd, ymax = mean + sd),
                    colour = 'red', width = 0.4)

    DF <- ore.push(df)
    ore.tableApply(DF, function(df) {
      library(ggplot2)
      library(plyr)
      ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y))
      ggplot() +
        geom_point(data = df, aes(x = gp, y = y)) +
        geom_point(data = ds, aes(x = gp, y = mean), colour = 'red', size = 3) +
        geom_errorbar(data = ds, aes(x = gp, y = mean, ymin = mean - sd, ymax = mean + sd),
                      colour = 'red', width = 0.4)
    })

But let's take this one step further. Suppose we wanted to produce multiple graphs, partitioned on some index column.

We replicate the data three times and add some noise to the y values, just to make the graphs a little different. We also create an index column to form our three partitions. Note that we've also specified that this should be executed in parallel, allowing Oracle Database to control and manage the server-side R engines. The result of ore.groupApply is an ore.list that contains the three graphs. Each graph can be viewed by printing the list element.

    df2 <- rbind(df, df, df)
    df2$y <- df2$y + rnorm(nrow(df2))
    df2$index <- rep(1:3, each = nrow(df))  # one partition per copy of df
    DF2 <- ore.push(df2)
    res <- ore.groupApply(DF2, DF2$index, function(df) {
      df <- df[, 1:2]  # keep gp and y, drop the index column
      library(ggplot2)
      library(plyr)
      ds <- ddply(df, .(gp), summarise, mean = mean(y), sd = sd(y))
      ggplot() +
        geom_point(data = df, aes(x = gp, y = y)) +
        geom_point(data = ds, aes(x = gp, y = mean), colour = 'red', size = 3) +
        geom_errorbar(data = ds, aes(x = gp, y = mean, ymin = mean - sd, ymax = mean + sd),
                      colour = 'red', width = 0.4)
    }, parallel = TRUE)
    res[[1]]
    res[[2]]
    res[[3]]

To recap, we've illustrated how various uses of ddply from the plyr package can be realized with ore.groupApply, which gives the user explicit and straightforward control over the contents of the data.frame result. We've also highlighted how ddply can be used within an ore.groupApply call.
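Example 5 builds a composite year+team index so that ore.groupApply can partition on two columns. That idea can be checked locally in base R. This sketch rests on two assumptions. First, it uses a small hypothetical data frame, bb, in place of plyr's baseball data. Second, it runs entirely in the local R session, not in the database. It groups once on the paste()d key and once with aggregate() on both columns directly, then compares the two results.

```r
# Hypothetical toy data standing in for plyr's baseball data set
bb <- data.frame(year = c(2001, 2001, 2001, 2002, 2002),
                 team = c("ANA", "ANA", "ARI", "ANA", "ARI"),
                 hr   = c(3, 1, 10, 7, 2),
                 stringsAsFactors = FALSE)

# Composite key, as in Example 5: one partition per year/team pair
bb$index <- paste(bb$year, bb$team, sep = "+")

# Group on the key and build each result row explicitly
by_key <- do.call(rbind, lapply(split(bb, bb$index), function(x) {
  data.frame(year = x$year[1], team = x$team[1], homeruns = sum(x$hr),
             stringsAsFactors = FALSE)
}))

# Same grouping without a key: aggregate() accepts several grouping columns
by_cols <- aggregate(hr ~ year + team, data = bb, FUN = sum)
names(by_cols)[names(by_cols) == "hr"] <- "homeruns"

# Sort both by year, then team (Example 5 uses ore.sort for this step)
by_key <- by_key[order(by_key$year, by_key$team), ]
by_cols <- by_cols[order(by_cols$year, by_cols$team), ]
rownames(by_key) <- NULL
rownames(by_cols) <- NULL

print(by_key)
```

The explicit key is what ore.groupApply needs, because it partitions on a single INDEX. Locally, aggregate() can group on several columns without one.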
