Search Results

Search found 67262 results on 2691 pages for 'data driven testing'.

Page 92/2691 | < Previous Page | 88 89 90 91 92 93 94 95 96 97 98 99  | Next Page >

  • Gradual approaches to dependency injection

    - by JW01
    I'm working on making my classes unit-testable, using dependency injection. But some of these classes have a lot of clients, and I'm not ready to refactor all of them to start passing in the dependencies yet. So I'm trying to do it gradually; keeping the default dependencies for now, but allowing them to be overridden for testing. One approach I'm conisdering is just moving all the "new" calls into their own methods, e.g.: public MyObject createMyObject(args) { return new MyObject(args); } Then in my unit tests, I can just subclass this class, and override the create functions, so they create fake objects instead. Is this a good approach? Are there any disadvantages? More generally, is it okay to have hard-coded dependencies, as long as you can replace them for testing? I know the preferred approach is to explicitly require them in the constructor, and I'd like to get there eventually. But I'm wondering if this is a good first step.

    Read the article

  • My Take on Hadoop World 2011

    - by Jean-Pierre Dijcks
    I’m sure some of you have read pieces about Hadoop World and I did see some headlines which were somewhat, shall we say, interesting? I thought the keynote by Larry Feinsmith of JP Morgan Chase & Co was one of the highlights of the conference for me. The reason was very simple, he addressed some real use cases outside of internet and ad platforms. The following are my notes, since the keynote was recorded I presume you can go and look at Hadoopworld.com at some point… On the use cases that were mentioned: ETL – how can I do complex data transformation at scale Doing Basel III liquidity analysis Private banking – transaction filtering to feed [relational] data marts Common Data Platform – a place to keep data that is (or will be) valuable some day, to someone, somewhere 360 Degree view of customers – become pro-active and look at events across lines of business. For example make sure the mortgage folks know about direct deposits being stopped into an account and ensure the bank is pro-active to service the customer Treasury and Security – Global Payment Hub [I think this is really consolidation of data to cross reference activity across business and geographies] Data Mining Bypass data engineering [I interpret this as running a lot of a large data set rather than on samples] Fraud prevention – work on event triggers, say a number of failed log-ins to the website. When they occur grab web logs, firewall logs and rules and start to figure out who is trying to log in. Is this me, who forget his password, or is it someone in some other country trying to guess passwords Trade quality analysis – do a batch analysis or all trades done and run them through an analysis or comparison pipeline One of the key requests – if you can say it like that – was for vendors and entrepreneurs to make sure that new tools work with existing tools. JPMC has a large footprint of BI Tools and Big Data reporting and tools should work with those tools, rather than be separate. Security and Entitlement – how to protect data within a large cluster from unwanted snooping was another topic that came up. I thought his Elephant ears graph was interesting (couldn’t actually read the points on it, but the concept certainly made some sense) and it was interesting – when asked to show hands – how the audience did not (!) think that RDBMS and Hadoop technology would overlap completely within a few years. Another interesting session was the session from Disney discussing how Disney is building a DaaS (Data as a Service) platform and how Hadoop processing capabilities are mixed with Database technologies. I thought this one of the best sessions I have seen in a long time. It discussed real use case, where problems existed, how they were solved and how Disney planned some of it. The planning focused on three things/phases: Determine the Strategy – Design a platform and evangelize this within the organization Focus on the people – Hire key people, grow and train the staff (and do not overload what you have with new things on top of their day-to-day job), leverage a partner with experience Work on Execution of the strategy – Implement the platform Hadoop next to the other technologies and work toward the DaaS platform This kind of fitted with some of the Linked-In comments, best summarized in “Think Platform – Think Hadoop”. In other words [my interpretation], step back and engineer a platform (like DaaS in the Disney example), then layer the rest of the solutions on top of this platform. One general observation, I got the impression that we have knowledge gaps left and right. On the one hand are people looking for more information and details on the Hadoop tools and languages. On the other I got the impression that the capabilities of today’s relational databases are underestimated. Mostly in terms of data volumes and parallel processing capabilities or things like commodity hardware scale-out models. All in all I liked this conference, it was great to chat with a wide range of people on Oracle big data, on big data, on use cases and all sorts of other stuff. Just hope they get a set of bigger rooms next time… and yes, I hope I’m going to be back next year!

    Read the article

  • Should devs, testers and business users have one unified test script?

    - by Carlos Jaime C. De Leon
    In development, I would normally have my own test scripts that would document the data, scenarios and execution steps that I plan to test; this is my dev test plan. When the functionality has been deployed to Test, testers test it using their own test script that they wrote. In UAT, the business user then tests using their own test plan. In retrospect, it looks like this provides a better coverage, with dev tests having a mix of black and white box testing, while testers and business users focus on black box testing. But on the other hand, this brings up distinct test cases that only are executed per stage (ie. some cases which testers thought of are only executed on Test stage) and it would like the dev missed it, which makes it a finding/bug. Is it worth consolidating the test scripts from the start? Thus using one unified test script, or is it abit difficult to do this upfront?

    Read the article

  • TDD with limited resources

    - by bunglestink
    I work in a large company, but on a just two man team developing desktop LOB applications. I have been researching TDD for quite a while now, and although it is easy to realize its benefits for larger applications, I am having a hard time trying to justify the time to begin using TDD on the scale of our applications. I understand its advantages in automating testing, improving maintainability, etc., but on our scale, writing even basic unit tests for all of our components could easily double development time. Since we are already undermanned with extreme deadlines, I am not sure what direction to take. While other practices such as agile iterative development make perfect since, I am kind of torn over the productivity trade-offs of TDD on a small team. Are the advantages of TDD worth the extra development time on small teams with very tight schedules?

    Read the article

  • Questions to ask interviewer in an Interview

    - by chota
    Hello All, I have an SDET interview upcoming week. I have been preparing since long. It is a good company. I am working as SDET since two year. I wonder what questions should i ask to my interviewer regarding testing and other thing. I would appreciate your help if you give me some sample questions that i should ask to my interviewer during the interview. Some of them i thoughts are a) What type of testing methodologies do you use? Do you have triage meeting everyday? What percentage of code coverage is done by unit tests? I do not find these questions to be more effective, i would appreciate if somebody could help me out in coming out with better question?

    Read the article

  • Architecting multi-model multi-DB ASP.NET MVC solution

    - by A. Murray
    I have an ASP.NET MVC 4 solution that I'm putting together, leveraging IoC and the repository pattern using Entity Framework 5. I have a new requirement to be able to pull data from a second database (from another internal application) which I don't have control over. There is no API available unfortunately for the second application and the general pattern at my place of work is to go direct to the database. I want to maintain a consistent approach to modeling the domain and use entity framework to pull the data out, so thus far I have used Entity Framework's database first approach to generate a domain model and database context over the top of this. However, I've become a little stuck on how to include the second domain model in the application. I have a generic repository which I've now moved out to a common DataAccess project, but short of creating two distinct wrappers for the generic repository (so each can identify with a specific database context), I'm struggling to see how I can elegantly include multiple models?

    Read the article

  • Sort Data in Windows Phone using Collection View Source

    - by psheriff
    When you write a Windows Phone application you will most likely consume data from a web service somewhere. If that service returns data to you in a sort order that you do not want, you have an easy alternative to sort the data without writing any C# or VB code. You use the built-in CollectionViewSource object in XAML to perform the sorting for you. This assumes that you can get the data into a collection that implements the IEnumerable or IList interfaces.For this example, I will be using a simple Product class with two properties, and a list of Product objects using the Generic List class. Try this out by creating a Product class as shown in the following code:public class Product {  public Product(int id, string name)   {    ProductId = id;    ProductName = name;  }  public int ProductId { get; set; }  public string ProductName { get; set; }}Create a collection class that initializes a property called DataCollection with some sample data as shown in the code below:public class Products : List<Product>{  public Products()  {    InitCollection();  }  public List<Product> DataCollection { get; set; }  List<Product> InitCollection()  {    DataCollection = new List<Product>();    DataCollection.Add(new Product(3,        "PDSA .NET Productivity Framework"));    DataCollection.Add(new Product(1,        "Haystack Code Generator for .NET"));    DataCollection.Add(new Product(2,        "Fundamentals of .NET eBook"));    return DataCollection;  }}Notice that the data added to the collection is not in any particular order. Create a Windows Phone page and add two XML namespaces to the Page.xmlns:scm="clr-namespace:System.ComponentModel;assembly=System.Windows"xmlns:local="clr-namespace:WPSortData"The 'local' namespace is an alias to the name of the project that you created (in this case WPSortData). The 'scm' namespace references the System.Windows.dll and is needed for the SortDescription class that you will use for sorting the data. Create a phone:PhoneApplicationPage.Resources section in your Windows Phone page that looks like the following:<phone:PhoneApplicationPage.Resources>  <local:Products x:Key="products" />  <CollectionViewSource x:Key="prodCollection"      Source="{Binding Source={StaticResource products},                       Path=DataCollection}">    <CollectionViewSource.SortDescriptions>      <scm:SortDescription PropertyName="ProductName"                           Direction="Ascending" />    </CollectionViewSource.SortDescriptions>  </CollectionViewSource></phone:PhoneApplicationPage.Resources>The first line of code in the resources section creates an instance of your Products class. The constructor of the Products class calls the InitCollection method which creates three Product objects and adds them to the DataCollection property of the Products class. Once the Products object is instantiated you now add a CollectionViewSource object in XAML using the Products object as the source of the data to this collection. A CollectionViewSource has a SortDescriptions collection that allows you to specify a set of SortDescription objects. Each object can set a PropertyName and a Direction property. As you see in the above code you set the PropertyName equal to the ProductName property of the Product object and tell it to sort in an Ascending direction.All you have to do now is to create a ListBox control and set its ItemsSource property to the CollectionViewSource object. The ListBox displays the data in sorted order by ProductName and you did not have to write any LINQ queries or write other code to sort the data!<ListBox    ItemsSource="{Binding Source={StaticResource prodCollection}}"   DisplayMemberPath="ProductName" />SummaryIn this blog post you learned that you can sort any data without having to change the source code of where the data comes from. Simply feed the data into a CollectionViewSource in XAML and set some sort descriptions in XAML and the rest is done for you! This comes in very handy when you are consuming data from a source where the data is given to you and you do not have control over the sorting.NOTE: You can download this article and many samples like the one shown in this blog entry at my website. http://www.pdsa.com/downloads. Select “Tips and Tricks”, then “Sort Data in Windows Phone using Collection View Source” from the drop down list.Good Luck with your Coding,Paul Sheriff** SPECIAL OFFER FOR MY BLOG READERS **We frequently offer a FREE gift for readers of my blog. Visit http://www.pdsa.com/Event/Blog for your FREE gift!

    Read the article

  • Connect to QuickBooks from PowerBuilder using RSSBus ADO.NET Data Provider

    - by dataintegration
    The RSSBus ADO.NET providers are easy-to-use, standards based controls that can be used from any platform or development technology that supports Microsoft .NET, including Sybase PowerBuilder. In this article we show how to use the RSSBus ADO.NET Provider for QuickBooks in PowerBuilder. A similar approach can be used from PowerBuilder with other RSSBus ADO.NET Data Providers to access data from Salesforce, SharePoint, Dynamics CRM, Google, OData, etc. In this article we will show how to create a basic PowerBuilder application that performs CRUD operations using the RSSBus ADO.NET Provider for QuickBooks. Step 1: Open PowerBuilder and create a new WPF Window Application solution. Step 2: Add all the Visual Controls needed for the connection properties. Step 3: Add the DataGrid control from the .NET controls. Step 4:Configure the columns of the DataGrid control as shown below. The column bindings will depend on the table. <DataGrid AutoGenerateColumns="False" Margin="13,249,12,14" Name="datagrid1" TabIndex="70" ItemsSource="{Binding}"> <DataGrid.Columns> <DataGridTextColumn x:Name="idColumn" Binding="{Binding Path=ID}" Header="ID" Width="SizeToHeader" /> <DataGridTextColumn x:Name="nameColumn" Binding="{Binding Path=Name}" Header="Name" Width="SizeToHeader" /> ... </DataGrid.Columns> </DataGrid> Step 5:Add a reference to the RSSBus ADO.NET Provider for QuickBooks assembly. Step 6:Optional: Set the QBXML Version to 6. Some of the tables in QuickBooks require a later version of QuickBooks to support updates and deletes. Please check the help for details. Connect the DataGrid: Once the visual elements have been configured, developers can use standard ADO.NET objects like Connection, Command, and DataAdapter to populate a DataTable with the results of a SQL query: System.Data.RSSBus.QuickBooks.QuickBooksConnection conn conn = create System.Data.RSSBus.QuickBooks.QuickBooksConnection(connectionString) System.Data.RSSBus.QuickBooks.QuickBooksCommand comm comm = create System.Data.RSSBus.QuickBooks.QuickBooksCommand(command, conn) System.Data.DataTable table table = create System.Data.DataTable System.Data.RSSBus.QuickBooks.QuickBooksDataAdapter dataAdapter dataAdapter = create System.Data.RSSBus.QuickBooks.QuickBooksDataAdapter(comm) dataAdapter.Fill(table) datagrid1.ItemsSource=table.DefaultView The code above can be used to bind data from any query (set this in command), to the DataGrid. The DataGrid should have the same columns as those returned from the SELECT statement. PowerBuilder Sample Project The included sample project includes the steps outlined in this article. You will also need the QuickBooks ADO.NET Data Provider to make the connection. You can download a free trial here.

    Read the article

  • Installing, Configuring, and Testing WebLogic Server 12c Developer Zip Distribution in NetBeans

    - by JuergenKress
    This tutorial covers how to install Oracle WebLogic Server 12c (12.1.1) developer zip file distribution on Windows and configure it as a Java EE Application Server in NetBeans. Also covers how to test the WebLogic Server installation by deploying a Web application based on JSF and JPA entities. WebLogic Partner Community For regular information become a member in the WebLogic Partner Community please visit: http://www.oracle.com/partners/goto/wls-emea ( OPN account required). If you need support with your account please contact the Oracle Partner Business Center. BlogTwitterLinkedInMixForumWiki Technorati Tags: WebLogic installation,Netbeans,Developer zip,WebLogic Community,Oracle,OPN,Jürgen Kress

    Read the article

  • Swiss Re increases data warehouse performance and deploys in record time

    - by KLaker
    Great information on yet another data warehouse deployment on Exadata. A little background on Swiss Re: In 2002, Swiss Re established a data warehouse for its client markets and products to gather reinsurance information across all organizational units into an integrated structure. The data warehouse provided the basis for reporting at the group level with drill-down capability to individual contracts, while facilitating application integration and data exchange by using common data standards. Initially focusing on property and casualty reinsurance information only, it now includes life and health reinsurance, insurance, and nonlife insurance information. Key highlights of the benefits that Swiss Re achieved by using Exadata: Reduced the time to feed the data warehouse and generate data marts by 58% Reduced average runtime by 24% for standard reports comfortably loading two data warehouse refreshes per day with incremental feeds Freed up technical experts by significantly minimizing time spent on tuning activities Most importantly this was one of the fastest project deployments in Swiss Re's history. They went from installation to production in just four months! What is truly surprising is the that it only took two weeks between power-on to testing the machine with full data volumes! Business teams at Swiss Re are now able to fully exploit up-to-date analytics across property, casualty, life, health insurance, and reinsurance lines to identify successful products. These points are highlighted in the following quotes from Dr. Stephan Gutzwiller, Head of Data Warehouse Services at Swiss Re:  "We were operating a complete Oracle stack, including servers, storage area network, operating systems, and databases that was well optimized and delivered very good performance over an extended period of time. When a hardware replacement was scheduled for 2012, Oracle Exadata was a natural choice—and the performance increase was impressive. It enabled us to deliver analytics to our internal customers faster, without hiring more IT staff" “The high quality data that is readily available with Oracle Exadata gives us the insight and agility we need to cater to client needs. We also can continue re-engineering to keep up with the increasing demand without having to grow the organization. This combination creates excellent business value.” Our full press release is available here: http://www.oracle.com/us/corporate/customers/customersearch/swiss-re-1-exadata-ss-2050409.html. If you want more information about how Exadata can increase the performance of your data warehouse visit our home page: http://www.oracle.com/us/products/database/exadata-database-machine/overview/index.html

    Read the article

  • Should testers approve releases, or just report on tests?

    - by Ernest Friedman-Hill
    Does it make sense to give signoff authority to testers? Should a test team Just test features, issues, etc, and simply report on a pass/fail basis, leaving it up to others to act on those results, or Have authority to hold up releases themselves based on those results? In other words, should testers be required to actually sign off on releases? The testing team I'm working with feels that they do, and we're having an issue with this because of "testing scope creep" -- the refusal to approve releases is sometimes based on issues explicitly not addressed by the release in question.

    Read the article

  • How long for data highlighter mark up to appear in structured data tool?

    - by Max
    I used the data highlighter in webmaster tools over 3 weeks ago to mark up some local business data, but there is still no structured data being detected in webmaster tools. Does any body have any experience on approx how long it takes for Google Webmaster Tools to start reporting Structured Data that has been marked up with their data highlighter? I'm asking specifically about reporting on it in Web Master Tools Structured Data section, as opposed to actually appearing in the SERPs.

    Read the article

  • EBS Seed Data Comparison Reports Now Available

    - by Steven Chan (Oracle Development)
    Earlier this year we released a reporting tool that reports on the differences in E-Business Suite database objects between one release and another.  That's a very useful reference, but EBS defaults are delivered as seed data within the database objects themselves. What about the differences in this seed data between one release and another? I'm pleased to announce the availability of a new tool that provides comparison reports of E-Business Suite seed data between EBS 11.5.10.2, 12.0.4, 12.0.6, 12.1.1, and 12.1.3.  This new tool complements the information in the data model comparison tool.  You can download the new seed data comparison tool here: EBS ATG Seed Data Comparison Report (Note 1327399.1) The EBS ATG Seed Data Comparison Report provides report on the changes between different EBS releases based upon the seed data changes delivered by the product data loader files (.ldt extension) based on EBS ATG loader control (.lct extension) files.  You can use this new tool to report on the differences in the following types of seed data: Concurrent Program definitions Descriptive Flexfield entity definitions Application Object Library profile option definitions Application Object Library (AOL) key flexfield, function, lookups, value set definitions Application Object Library (AOL) menu and responsibility definitions Application Object Library messages Application Object Library request set definitions Application Object Library printer styles definitions Report Manager / WebADI component and integrator entity definitions Business Intelligence Publisher (BI Publisher) entity definitions BIS Request Set Generator entity definitions ... and more Your feedback is welcomeThis new tool was produced by our hard-working EBS Release Management team, and they're actively seeking your feedback.  Please feel free to share your experiences with it by posting a comment here.  You can also request enhancements to this tool via the distribution list address included in Note 1327399.1.Related Articles Oracle E-Business Suite Release 12.1.3 Now Available New Whitepaper: Upgrading EBS 11i Forms + OA Framework Personalizations to EBS 12 EBS 12.0 Minimum Requirements for Extended Support Finalized Five Key Resources for Upgrading to E-Business Suite Release 12 E-Business Suite Release 12.1.1 Consolidated Upgrade Patch 1 Now Available New Whitepaper: Planning Your E-Business Suite Upgrade from Release 11i to 12.1

    Read the article

  • Library to fake intermittent failures according to tester-defined policy?

    - by crosstalk
    I'm looking for a library that I can use to help mock a program component that works only intermittently - usually, it works fine, but sometimes it fails. For example, suppose I need to read data from a file, and my program has to avoid crashing or hanging when a read fails due to a disk head crash. I'd like to model that by having a mock data reader function that returns mock data 90% of the time, but hangs or returns garbage otherwise. Or, if I'm stress-testing my full program, I could turn on debugging code in my real data reader module to make it return real data 90% of the time and hang otherwise. Now, obviously, in this particular example I could just code up my mock manually to test against a random() routine. However, I was looking for a system that allows implementing any failure policy I want, including: Fail randomly 10% of the time Succeed 10 times, fail 4 times, repeat Fail semi-randomly, such that one failure tends to be followed by a burst of more failures Any policy the tester wants to define Furthermore, I'd like to be able to change the failure policy at runtime, using either code internal to the program under test, or external knobs or switches (though the latter can be implemented with the former). In pig-Java, I'd envision a FailureFaker interface like so: interface FailureFaker { /** Return true if and only if the mocked operation succeeded. Implementors should override this method with versions consistent with their failure policy. */ public boolean attempt(); } And each failure policy would be a class implementing FailureFaker; for example there would be a PatternFailureFaker that would succeed N times, then fail M times, then repeat, and a AlwaysFailFailureFaker that I'd use temporarily when I need to simulate, say, someone removing the external hard drive my data was on. The policy could then be used (and changed) in my mock object code like so: class MyMockComponent { FailureFaker faker; public void doSomething() { if (faker.attempt()) { // ... } else { throw new RuntimeException(); } } void setFailurePolicy (FailureFaker policy) { this.faker = policy; } } Now, this seems like something that would be part of a mocking library, so I wouldn't be surprised if it's been done before. (In fact, I got the idea from Steve Maguire's Writing Solid Code, where he discusses this exact idea on pages 228-231, saying that such facilities were common in Microsoft code of that early-90's era.) However, I'm only familiar with EasyMock and jMockit for Java, and neither AFAIK have this function, or something similar with different syntax. Hence, the question: Do such libraries as I've described above exist? If they do, where have you found them useful? If you haven't found them useful, why not?

    Read the article

  • MEF, IServiceProvider and Testing Visual Studio Extensions

    - by Daniel Cazzulino
    In the latest and greatest version of Visual Studio, MEF plays a critical role, one that makes extending VS much more fun than it ever was. So typically, you just [Export] something, and then someone [Import]s it and that's it. MEF in all its glory kicks in and gets all your dependencies satisfied. Cool, you say, so let's now import ITextTemplating and have some T4-based codegen going! Ah, if only it was that easy. Turns out by default, none of the VS built-in services are exposed to MEF, apparently because there wasn't enough time to analyze the lifetime, initialization, dependencies, etc. for each one before launch, which makes perfect sense. You don't want to blindly export everything now just in case. There's also the whole VS package initialization thing which in this version of VS is not so transparently integrated with the MEF publishing side (i.e. a MEF export from a package can get instantiated before its owning package, and in fact, the package can remain unloaded forever and the export will continue to be visible to anyone)....Read full article

    Read the article

  • Why IBM DB2 DBAs Love Load Testing

    A load test gives the database administrator quite a lot of valuable information and may make the difference between poor and acceptable application performance. Here are some proactive tips to make your IBM DB2 production implementation a success.

    Read the article

  • Returning JsonResult From ASP.NET MVC 2.0 Controller and Unit Testing

    This post will show how to return a simple Json result from an ASP.NET MVC 2.0 web project.  It will show how to test that result inside a unit test and essentially pick apart the Json, just like a JavaScript (or other client) would do.  It seems like it should be very simple (and indeed, [...]...Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

    Read the article

  • Returning JsonResult From ASP.NET MVC 2.0 Controller and Unit Testing

    This post will show how to return a simple Json result from an ASP.NET MVC 2.0 web project.  It will show how to test that result inside a unit test and essentially pick apart the Json, just... This site is a resource for asp.net web programming. It has examples by Peter Kellner of techniques for high performance programming...Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

    Read the article

  • MapRedux - PowerShell and Big Data

    - by Dittenhafer Solutions
    MapRedux – #PowerShell and #Big Data Have you been hearing about “big data”, “map reduce” and other large scale computing terms over the past couple of years and been curious to dig into more detail? Have you read some of the Apache Hadoop online documentation and unfortunately concluded that it wasn't feasible to setup a “test” hadoop environment on your machine? More recently, I have read about some of Microsoft’s work to enable Hadoop on the Azure cloud. Being a "Microsoft"-leaning technologist, I am more inclinded to be successful with experimentation when on the Windows platform. Of course, it is not that I am "religious" about one set of technologies other another, but rather more experienced. Anyway, within the past couple of weeks I have been thinking about PowerShell a bit more as the 2012 PowerShell Scripting Games approach and it occured to me that PowerShell's support for Windows Remote Management (WinRM), and some other inherent features of PowerShell might lend themselves particularly well to a simple implementation of the MapReduce framework. I fired up my PowerShell ISE and started writing just to see where it would take me. Quite simply, the ScriptBlock feature combined with the ability of Invoke-Command to create remote jobs on networked servers provides much of the plumbing of a distributed computing environment. There are some limiting factors of course. Microsoft provided some default settings which prevent PowerShell from taking over a network without administrative approval first. But even with just one adjustment, a given Windows-based machine can become a node in a MapReduce-style distributed computing environment. Ok, so enough introduction. Let's talk about the code. First, any machine that will participate as a remote "node" will need WinRM enabled for remote access, as shown below. This is not exactly practical for hundreds of intended nodes, but for one (or five) machines in a test environment it does just fine. C:> winrm quickconfig WinRM is not set up to receive requests on this machine. The following changes must be made: Set the WinRM service type to auto start. Start the WinRM service. Make these changes [y/n]? y Alternatively, you could take the approach described in the Remotely enable PSRemoting post from the TechNet forum and use PowerShell to create remote scheduled tasks that will call Enable-PSRemoting on each intended node. Invoke-MapRedux Moving on, now that you have one or more remote "nodes" enabled, you can consider the actual Map and Reduce algorithms. Consider the following snippet: $MyMrResults = Invoke-MapRedux -MapReduceItem $Mr -ComputerName $MyNodes -DataSet $dataset -Verbose Invoke-MapRedux takes an instance of a MapReduceItem which references the Map and Reduce scriptblocks, an array of computer names which are the remote nodes, and the initial data set to be processed. As simple as that, you can start working with concepts of big data and the MapReduce paradigm. Now, how did we get there? I have published the initial version of my PsMapRedux PowerShell Module on GitHub. The PsMapRedux module provides the Invoke-MapRedux function described above. Feel free to browse the underlying code and even contribute to the project! In a later post, I plan to show some of the inner workings of the module, but for now let's move on to how the Map and Reduce functions are defined. Map Both the Map and Reduce functions need to follow a prescribed prototype. The prototype for a Map function in the MapRedux module is as follows. A simple scriptblock that takes one PsObject parameter and returns a hashtable. It is important to note that the PsObject $dataset parameter is a MapRedux custom object that has a "Data" property which offers an array of data to be processed by the Map function. $aMap = { Param ( [PsObject] $dataset ) # Indicate the job is running on the remote node. Write-Host ($env:computername + "::Map"); # The hashtable to return $list = @{}; # ... Perform the mapping work and prepare the $list hashtable result with your custom PSObject... # ... The $dataset has a single 'Data' property which contains an array of data rows # which is a subset of the originally submitted data set. # Return the hashtable (Key, PSObject) Write-Output $list; } Reduce Likewise, with the Reduce function a simple prototype must be followed which takes a $key and a result $dataset from the MapRedux's partitioning function (which joins the Map results by key). Again, the $dataset is a MapRedux custom object that has a "Data" property as described in the Map section. $aReduce = { Param ( [object] $key, [PSObject] $dataset ) Write-Host ($env:computername + "::Reduce - Count: " + $dataset.Data.Count) # The hashtable to return $redux = @{}; # Return Write-Output $redux; } All Together Now When everything is put together in a short example script, you implement your Map and Reduce functions, query for some starting data, build the MapReduxItem via New-MapReduxItem and call Invoke-MapRedux to get the process started: # Import the MapRedux and SQL Server providers Import-Module "MapRedux" Import-Module “sqlps” -DisableNameChecking # Query the database for a dataset Set-Location SQLSERVER:\sql\dbserver1\default\databases\myDb $query = "SELECT MyKey, Date, Value1 FROM BigData ORDER BY MyKey"; Write-Host "Query: $query" $dataset = Invoke-SqlCmd -query $query # Build the Map function $MyMap = { Param ( [PsObject] $dataset ) Write-Host ($env:computername + "::Map"); $list = @{}; foreach($row in $dataset.Data) { # Write-Host ("Key: " + $row.MyKey.ToString()); if($list.ContainsKey($row.MyKey) -eq $true) { $s = $list.Item($row.MyKey); $s.Sum += $row.Value1; $s.Count++; } else { $s = New-Object PSObject; $s | Add-Member -Type NoteProperty -Name MyKey -Value $row.MyKey; $s | Add-Member -type NoteProperty -Name Sum -Value $row.Value1; $list.Add($row.MyKey, $s); } } Write-Output $list; } $MyReduce = { Param ( [object] $key, [PSObject] $dataset ) Write-Host ($env:computername + "::Reduce - Count: " + $dataset.Data.Count) $redux = @{}; $count = 0; foreach($s in $dataset.Data) { $sum += $s.Sum; $count += 1; } # Reduce $redux.Add($s.MyKey, $sum / $count); # Return Write-Output $redux; } # Create the item data $Mr = New-MapReduxItem "My Test MapReduce Job" $MyMap $MyReduce # Array of processing nodes... $MyNodes = ("node1", "node2", "node3", "node4", "localhost") # Run the Map Reduce routine... $MyMrResults = Invoke-MapRedux -MapReduceItem $Mr -ComputerName $MyNodes -DataSet $dataset -Verbose # Show the results Set-Location C:\ $MyMrResults | Out-GridView Conclusion I hope you have seen through this article that PowerShell has a significant infrastructure available for distributed computing. While it does take some code to expose a MapReduce-style framework, much of the work is already done and PowerShell could prove to be the the easiest platform to develop and run big data jobs in your corporate data center, potentially in the Azure cloud, or certainly as an academic excerise at home or school. Follow me on Twitter to stay up to date on the continuing progress of my Powershell MapRedux module, and thanks for reading! Daniel

    Read the article

< Previous Page | 88 89 90 91 92 93 94 95 96 97 98 99  | Next Page >