Search Results

Search found 79 results on 4 pages for 'dataflow'.

Page 3/4 | < Previous Page | 1 2 3 4  | Next Page >

  • SSISDB Analysis Script on Gist

    - by Davide Mauri
    I've created two simple, yet very useful, script to extract some useful data to quickly monitor SSIS packages execution in SQL Server 2012 and after.get-ssis-execution-status  get-ssis-data-pumped-rows  I've started to use gist since it comes very handy, for this "quick'n'dirty" scripts and snippets, and you can find the above scripts and others (hopefully the number will increase over time...I plan to use gist to store all the code snippet I used to store in a dedicated folder on my machine) there.Now, back to the aforementioned scripts. The first one ("get-ssis-execution-status") returns a list of all executed and executing packages along with latest successful and running executions (so that on can have an idea of the expected run time)error messageswarning messages related to duplicate rows found in lookupsthe second one ("get-ssis-data-pumped-rows") returns information on DataFlows status. Here there's something interesting, IMHO. Nothing exceptional, let it be clear, but nonetheless useful: the script extract information on destinations and row sent to destinations right from the messages produced by the DataFlow component. This helps to quickly understand how many rows as been sent and where...without having to increase the logging level.Enjoy! PSI haven't tested it with SQL Server 2014, but AFAIK they should work without problems. Of course any feedback on this is welcome. 

    Read the article

  • Modelling work-flow plus interaction with a database - quick and accessible options

    - by cjmUK
    I'm wanting to model a (proposed) manufacturing line, with specific emphasis on interaction with a traceability database. That is, various process engineers have already mapped the manufacturing process - I'm only interested in the various stations along the line that have to talk to the DB. The intended audience is a mixture of project managers, engineers and IT people - the purpose is to identify: points at which the line interacts with the DB (perhaps going so far as indicating the Store Procs called at each point, perhaps even which parameters are passed.) the communication source (PC/Handheld device/PLC) the communication medium (wireless/fibre/copper) control flow (if leak test fails, unit is diverted to repair station) Basically, the model will be used as a focus different groups on outstanding tasks; for example, I'm interested in the DB and any front-end app needed, process engineers need to be thinking about the workflow and liaising with the PLC suppliers, the other IT guys need to make sure we have the hardware and comms in place. Obviously I could just improvise in Visio, but I was wondering if there was a particular modelling technique that might particularly suit my needs or my audience. I'm thinking of a visual model with supporting documentation (as little as possible, as much as is necessary). Clearly, I don't want something that will take me ages to (effectively) learn, nor one that will alienate non-technical members of the project team. So far I've had brief looks at BPMN, EPC Diagrams, standard Flow Diagrams... and I've forgotten most of what I used to know about UML... And I'm not against picking and mixing... as long as it is quick, clear and effective. Conclusion: In the end, I opted for a quasi-workflow/dataflow diagram. I mapped out the parts of the manufacturing process that interact with the traceability DB, and indicated in a significantly-simplified form, the data flows and DB activity. Alongside which, I have a supporting document which outlines each process, the data being transacted for each process (a 'data dictionary' of sorts) and details of hardware and connectivity required. I can't decide whether is a product of genius or a crime against established software development practices, but I do think that is will hit the mark for this particular audience.

    Read the article

  • luntbuild + maven + findbugs = OutOfMemoryException

    - by Johannes
    Hi, I've been trying to get Luntbuild to generate and publish a project site for our project including a Findbugs report. All other reports (Cobertura, Surefire, JavaDoc, Dashboard) work fine, but Findbugs bails out with an OutOfMemoryException. Excluding findbugs from report generation fixes the build --- although obviously without a Findbugs report. The funny thing is that I first encountered this problem locally and solved it by setting MAVEN_OPTS=-Xmx512m. This does not seem to be enough in Luntbuild, however: setting that exact same option as an environment variable of my builder doesn't make a difference. I've found a couple of posts on the 'net stating you should also add -XX:MaxPermSize=512m to MAVEN_OPTS and/or pass -Dmaven.findbugs.jvmargs=-Xmx512m to mvn.bat. None of these (or their combination) seem to help though so any hints would be greatly appreciated! Cheers, Johannes Relevant information: Luntbuild is 1.5.6, Maven is 2.1.0, findbugs-maven-plugin is 2.0.1. This is the Findbugs section of the relevant pom.xml: <plugin> <groupId>org.codehaus.mojo</groupId> <artifactId>findbugs-maven-plugin</artifactId> <version>2.0.1</version> </plugin> This is the head of my build log: User "luntbuild" started the build Perform checkout operation for VCS setting: Vcs name: Subversion Repository url base: http://some.repository.com/repo/ Repository layout: multiple Directory for trunk: trunk Directory for branches: branches Directory for tags: tags Username: xxxx Password:xxxx Web interface: ViewVC URL to web interface: http://some.repository.com/repo/ Quiet period: modules: Source path: somepath, Branch: , Label: , Destination path: somewhere Source path: somepath, Branch: somewhere1.0.x, Label: , Destination path: somewhere-1.0.x Source path: somepath, Branch: somewhere1.1.x, Label: , Destination path: somewhere-1.1.x Update url: http://some.repository.com/repo//trunk Duration of the checkout operation: 0 minutes Perform build with builder setting: Builder name: default Builder type: Maven2 builder Command to run Maven2: "C:\maven\apache-maven-2.1.0\bin\mvn.bat" -e -f somewhere\pom.xml -P site -Dmaven.test.skip=false -DbuildDate="Tue Nov 24 11:13:24 CET 2009" -DbuildVersion="site-core138" -Dsvn.username=xxxx -Dsvn.password=xxxx -DstagingSiteURL=file:///C:/luntbuild/core-reports -Dmaven.findbugs.jvmargs=-Xmx512m Directory to run Maven2 in: Goals to build: site:stage site:stage-deploy Build properties: buildVersion="site-core138" artifactsDir="C:\\Program Files\\Luntbuild\\publish\\somewhere\\site-core\\site-core138\\artifacts" buildDate="Tue Nov 24 11:13:24 CET 2009" junitHtmlReportDir="" Environment variables: MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=512m" Build success condition: result==0 and builderLogContainsLine("INFO","BUILD SUCCESSFUL") Execute command: Executing 'C:\maven\apache-maven-2.1.0\bin\mvn.bat' with arguments: '-e' '-f' 'somewhere\pom.xml' '-P' 'site' '-Dmaven.test.skip=false' '-DbuildDate=Tue Nov 24 11:13:24 CET 2009' '-DbuildVersion=site-core138' '-Dsvn.username=xxxxxx' '-Dsvn.password=xxxxxx' '-DstagingSiteURL=file:///C:/luntbuild/reports' '-Dmaven.findbugs.jvmargs=-Xmx512m' '-DbuildVersion=site-core138' '-DartifactsDir=C:\\Program Files\\Luntbuild\\publish\\somewhere\\site-core\\site-core138\\artifacts' '-DbuildDate=Tue Nov 24 11:13:24 CET 2009' '-X' 'site:stage' 'site:stage-deploy' Tail of my build log: Analyzed: C:\luntbuild\somewhere-work\somewhere\...\SomeClass.class ... Analyzed: C:\luntbuild\somewhere-work\somewhere\...\target\classes Aux: C:\luntbuild\somewhere-work\somewhere\...\target\classes Aux: c:\maven\local-repo\...\somejar-1.1.1.1-SNAPSHOT.jar Aux: c:\maven\local-repo\commons-lang\commons-lang\2.3\commons-lang-2.3.jar .... Aux: c:\maven\local-repo\org\openoffice\ridl\3.1.0\ridl-3.1.0.jar Aux: c:\maven\local-repo\org\openoffice\unoil\3.1.0\unoil-3.1.0.jar [INFO] ------------------------------------------------------------------------ [ERROR] FATAL ERROR [INFO] ------------------------------------------------------------------------ [INFO] Java heap space [INFO] ------------------------------------------------------------------------ [DEBUG] Trace java.lang.OutOfMemoryError: Java heap space at java.util.HashMap.(HashMap.java:209) at edu.umd.cs.findbugs.ba.type.TypeAnalysis$CachedExceptionSet.(TypeAnalysis.java:114) at edu.umd.cs.findbugs.ba.type.TypeAnalysis.getCachedExceptionSet(TypeAnalysis.java:688) at edu.umd.cs.findbugs.ba.type.TypeAnalysis.computeThrownExceptionTypes(TypeAnalysis.java:439) at edu.umd.cs.findbugs.ba.type.TypeAnalysis.transfer(TypeAnalysis.java:411) at edu.umd.cs.findbugs.ba.type.TypeAnalysis.transfer(TypeAnalysis.java:89) at edu.umd.cs.findbugs.ba.Dataflow.execute(Dataflow.java:356) at edu.umd.cs.findbugs.classfile.engine.bcel.TypeDataflowFactory.analyze(TypeDataflowFactory.java:82) at edu.umd.cs.findbugs.classfile.engine.bcel.TypeDataflowFactory.analyze(TypeDataflowFactory.java:44) at edu.umd.cs.findbugs.classfile.impl.AnalysisCache.analyzeMethod(AnalysisCache.java:331) at edu.umd.cs.findbugs.classfile.impl.AnalysisCache.getMethodAnalysis(AnalysisCache.java:281) at edu.umd.cs.findbugs.classfile.engine.bcel.CFGFactory.analyze(CFGFactory.java:173) at edu.umd.cs.findbugs.classfile.engine.bcel.CFGFactory.analyze(CFGFactory.java:64) at edu.umd.cs.findbugs.classfile.impl.AnalysisCache.analyzeMethod(AnalysisCache.java:331) at edu.umd.cs.findbugs.classfile.impl.AnalysisCache.getMethodAnalysis(AnalysisCache.java:281) at edu.umd.cs.findbugs.ba.ClassContext.getMethodAnalysis(ClassContext.java:937) at edu.umd.cs.findbugs.ba.ClassContext.getMethodAnalysisNoDataflowAnalysisException(ClassContext.java:921) at edu.umd.cs.findbugs.ba.ClassContext.getCFG(ClassContext.java:326) at edu.umd.cs.findbugs.detect.BuildUnconditionalParamDerefDatabase.analyzeMethod(BuildUnconditionalParamDerefDatabase.java:103) at edu.umd.cs.findbugs.detect.BuildUnconditionalParamDerefDatabase.considerMethod(BuildUnconditionalParamDerefDatabase.java:93) at edu.umd.cs.findbugs.detect.BuildUnconditionalParamDerefDatabase.visitClassContext(BuildUnconditionalParamDerefDatabase.java:79) at edu.umd.cs.findbugs.DetectorToDetector2Adapter.visitClass(DetectorToDetector2Adapter.java:68) at edu.umd.cs.findbugs.FindBugs2.analyzeApplication(FindBugs2.java:971) at edu.umd.cs.findbugs.FindBugs2.execute(FindBugs2.java:222) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:86) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:230) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:912) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:756) [INFO] ------------------------------------------------------------------------ [INFO] Total time: 17 minutes 16 seconds [INFO] Finished at: Tue Nov 24 11:31:23 CET 2009 [INFO] Final Memory: 70M/127M [INFO] ------------------------------------------------------------------------ Maven2 builder failed: build success condition not met! Note that apparently maven only uses 70MB... but that probably doesn't mean anything since the Findbugs plugin forks its own process.

    Read the article

  • What is Silverlight's relationship -- if any -- to WPF?

    - by xarzu
    I was working with a WPF application and I decided that the controls and graphics I wanted to display on the grid might look better if it was a silverlight component. I thought this way because of all the cool silverlight controls that look very flash-like. But now that I have gottem my Visual Studio 2010 set up with SIlverlight, it seems that every silverlight app I can make are ASP.NET in nature. It seems that instead of a cool GUI control to make, Silverlight is telling me that it is primarely a dataflow sort of application for the web. What is the relationship, if any, between WPF and Silverlight. Can I or can I not put a silverlight control into my existing WPF application?

    Read the article

  • Convert date from access to SQL Server with SSIS

    - by Arne
    Hi, I want to convert a database from access to SQL Server using SSIS. I cannot convert the date/time columns of the access db. SSIS says something like: conversion between DT_Date and DT_DBTIMESTAMP is not supported. (Its translated from my German version, might be different in English version). In Access I have Date/Time column, in SQL Server I have datetime. In the dataflow chart of the SSIS I have a OLE DB source for the access db, an sql server target and a data conversion. In the data conversion I convert the columns to date[DT_DATE]. They are connected like this: AccessDB -> conversion -> SQL DB What am I doing wrong? How can I convert the Access date columns to SQL Server date columns?

    Read the article

  • SSIS - Limiting Concurrent Connections

    - by Bigtoe
    Hi Folks, I am using SSIS to connect to a legecy mainframe database and this allows only 5 concurrent connections at a time. I have a dataflow task with many tables to transfer and it kicks outs because of this limitation. I have split up the Data Flow task into seperate data flows and this is working for the moment, but it is not optiomal as they need to be sequenced and 1 large transfer in a flow is holding up subsequent transfers. Anyone any idea of how to limit the number of connections in a single data flow, I had a look at using the Engine Threads but this did not make any difference. Any help much appericated.

    Read the article

  • What frameworks exist for data subscription and update?

    - by Timothy Pratley
    There is one server with multiple clients. The clients are viewing subsets of the servers entire data. If the data that a client is viewing changes, the client should be informed of the changes so that it displays the current data. Example: Two clients are viewing a list of users in an administration screen. One client adds a new user to the list and modifies the permissions of another user. The other client sees the changes propagated to their view. In the client side code I would like the users list to be updated by the framework itself, raising changed events such that it will be redrawn - similar to 'cells' or dataflow. I am looking specifically for a .NET or java implementation.

    Read the article

  • How to delete rows based on comparison from Data Flow Task in an SSIS?

    - by vikasde
    I have a DataFlow task with two OLE DB Source objects. This is the SQL I want to achieve using SSIS: Insert into server2.db.dbo.[table2] (...) Select col1, col2, col3 ... from Server1.db.dbo.[table1] where [table1.col1] not in (Select col5 from server2.db.dbo.[table2] Where ...) I am pretty new to SSIS and not sure how to achieve this. I thought I could do this using the Data Flow task and populating the first source with the data from server1.db.dbo.table1 and the second source with server2.db.dbo.[table2] and then do the conditional check before inserting it into server2.db.dbo.[table2]. I am not sure how to do the conditional check though. Any help is appreciated.

    Read the article

  • Performance problems loading XML with SSIS, an alternative way!

    - by AtulThakor
    I recently needed to load several thousand XML files into a SQL database, I created an SSIS package which was created as followed: Using a foreach container to loop through a directory and load each file path into a variable, the “Import XML” dataflow would then load each XML file into a SQL table.       Running this, it took approximately 1 second to load each file which seemed a massive amount of time to parse the XML and load the data, speaking to my colleague Martin Croft, he suggested the use of T-SQL Bulk Insert and OpenRowset, so we adjusted the package as followed:     The same foreach container was used but instead the following SQL command was executed (this is an expression):     "INSERT INTO MyTable(FileDate) SELECT   CAST(bulkcolumn AS XML)     FROM OPENROWSET(         BULK         '" + @[User::CurrentFile]  + "',         SINGLE_BLOB ) AS x"     Using this method we managed to load approximately 20 records per second, much faster…for data loading! For what we wanted to achieve this was perfect but I’ll leave you with the following points when making your own decision on which solution you decide to choose!      Openrowset Method Much faster to get the data into SQL You’ll need to parse or create a view over the XML data to allow the data to be more usable(another post on this!) Not able to apply validation/transformation against the data when loading it The SQL Server service account will need permission to the file No schema validation when loading files SSIS Slower (in our case) Schema validation Allows you to apply transformations/joins to the data Permissions should be less of a problem Data can be loaded into the final form through the package When using a schema validation errors can fail the package (I’ll do another post on this)

    Read the article

  • Good approach for hundreds of comsumers and big files

    - by ????? ???????
    I have several files (nearly 1GB each) with data. Data is a string line. I need to process each of these files with several hundreds of consumers. Each of these consumers does some processing that differs from others. Consumers do not write anywhere concurrently. They only need input string. After processing they update their local buffers. Consumers can easily be executed in parallel. Important: With one specific file each consumer has to process all lines (without skipping) in correct order (as they appear in file). The order of processing different files doesn't matter. Processing of a single line by one consumer is comparably fast. I expect less than 50 microseconds on Corei5. So now I'm looking for the good approach to this problem. This is going to be be a part of a .NET project, so please let's stick with .NET only (C# is preferable). I know about TPL and DataFlow. I guess that the most relevant would be BroadcastBlock. But i think that the problem here is that with each line I'll have to wait for all consumers to finish in order to post the new one. I guess that it would be not very efficient. I think that ideally situation would be something like this: One thread reads from file and writes to the buffer. Each consumer, when it is ready, reads the line from the buffer concurrently and processes it. The entry from the buffer shouldn't be deleted as one consumer reads it. It can be deleted only when all consumers have processed it. TPL schedules consumer threads itself. If one consumer outperforms the others, it shouldn't wait and can read more recent entries from the buffer. Am i right with this kind of approach? Whether yes or not, how can i implement the good solution? A bit was already discussed on StackOverflow: link

    Read the article

  • Workflow 4.5 is Awesome, cant wait for 5.0!

    - by JoshReuben
    About 2 years ago I wrote a blog post describing what I would like to see in Workflow vnext: http://geekswithblogs.net/JoshReuben/archive/2010/08/25/workflow-4.0---not-there-yet.aspx At the time WF 4.0 was a little rough around the edges – the State Machine was on codeplex and people were simulating state machines with Flowcharts. Last year I built a near- realtime machine management system using WF 4.0.1 – its managing the internal operations of this device: http://landanano.com/products/commercial   Well WF 4.5 has come a long way – many of my gripes have been addressed: C# expressions - no more VB 'AndAlso' clauses state machine awesomeness - can query current state many designer improvements - Document Outline is so much more succinct than Designer! Separate WCF Service Contract interfaces and ability to generate activities from contract operations ability to rehydrate to updated flow definitions via DynamicUpdateMap and WorkflowIdentity you can read about the new features here: http://msdn.microsoft.com/en-us/library/hh305677(VS.110).aspx   2013 could be the year of Workflow evangelism for .NET, as it comes together as the DSL language. Eg on Azure it could be used to graphically orchestrate between WebRoles, WorkerRoles and AppFabric Queues and the ServiceBus – that would be grand.   Here’s a list of things I’d like to see in Workflow 5.0: Stronger Parallelism support for true multithreaded workflows . A Workflow executes on a single thread – wouldn’t it be great if we had the ability to model TPL DataFlow? Parallel is not really parallel, just allows AsyncCodeActivity.     support for recursion an ExpressionTree activity with an editor design surface a math activity pack return of application level protocol (3.51 WF services) – automatically expose a state machine as a WCF service with bookmark Receive activities generated from OperationContract automatically placed in state transition triggers. A new HTML5 ActivityDesigner control – support with different CSS3  skinnable hooks,  remote connectivity (had to roll my own) A data flow view – crucial to understanding the big picture Ability to refactor a Sequence to custom activity in a separate .xaml file – like Expression Blend does for UserControl state machine global error handling - if all states goto an error state, you quickly get visual spagetti. Now you could nest a state machine, but what if you want an application level protocol whereby each state exposes certain WCF ops. DSL RAD editing - Make the Document Outline into a DSL editor for adding activities  – For WF to really succeed as a higher level of abstraction, It needs to be more productive than raw coding - drag & drop on the designer is currently too slow compared to just typing code. Extensible Wizard API - for pluggable WF editor experience other execution models beyond Sequence, Flowchart & StateMachine: SSIS, Behavior Trees,  Wolfram Model tool – surprise us! improvements to Designer debugging API - SourceLocation is tied to XAML file line number and char position, and ModelService access seems convoluted - why not leverage WPF LogicalTreeHelper / VisualTreeHelper ? Workflow Team , keep on rocking!

    Read the article

  • Analysing SQLBits Feedback

    - by jamiet
    Earlier this week I received all the feedback that people offered on my session at SQLBits 7 in York – “SSIS Dataflow Performance Tuning” (the video is available online if you wish to see it). As you may have gathered from previous posts on this blog and my less-SQLy-focused Wordpress blog I am a big fan of collecting and tracking both personal and public data and session feedback lends itself very well to tracking because it is quantitative rather than qualitative; by that I mean attendees are invited to provide marks out of ten rather than (or, in the case of SQLBits, as well as) written comments. The SQLBits feedback is also useful because they use a consistent format – the same questions are asked each time – this means it is particularly easy to to track whether the scores that people give are trending up or down. I suspect that somewhere the SQLBits organisers have a big Analysis Services cube (ok, perhaps its an Excel pivot table) that allows them to analyse these scores per conference, speaker, track etc.… and there’s no reason that we as session speakers cannot do the same thing. To that end I have started to store my feedback in an Excel spreadsheet of my own which in the interests of transparency is available for public viewing (only a web browser required) on SkyDrive at http://cid-550f681dad532637.office.live.com/view.aspx/Public/Misc/Personal%20SQLBits%20Session%20Feedback.xlsx. I have used a pivot table to aggregate all that feedback and here is a screenshot: I am hereby making a public plea to the SQLBits organisers (on the off-chance that they are reading) to please continue to keep the feedback format consistent in the future and I encourage them to publish all of the feedback in an anonymised form. I would also encourage anyone doing conference speaking to track their conference feedback in the same way that I am doing so that you get an insight into whether or not you are improving over time. It is not difficult to setup and maintaining it as you do more sessions takes very little effort. Storing feedback data like this leads me to wider thoughts about well-known conventions and data format standardisation. Let’s imagine a utopia where there were a standard set of questions for capturing session feedback that were leveraged at every conference regardless of subject matter, location or culture; that would give rise to immense cross-conference and cross-discipline analysis – the data analyst in me goes giddy at the thought of it. It is scenarios like this that drive my interest both in data formats such as iCalendar, microformats and RDF, and in emerging movements such as the semantic web and linked data, all things which I have written about in the past. I don’t know whether we will ever reach the stage where every piece of data has structured, descriptive metadata associated with it but I live in hope. @Jamiet

    Read the article

  • Server nearly unusable when doing disk writes

    - by Wikser
    My question closely relates to my last question here on serverfault. I was copying about 5GB from a 10 year old desktop computer to the server. The copy was done in Windows Explorer. In this situation I would assume the server to be bored by the dataflow. But as usual with this server, it really slowed down. At least I could work with the remote session, even there was some serious latency. The copy took its time (20min?). In this time I went to a colleague and he tried to log in in the same server via remote desktop (for some other reason). It took about a minute to get to the login screen, a minute to open the control panel, a minute to open the performance monitor, ... Icons were loading maybe one per second. We saw the following (from memory): CPU: 2% Avg. Queue Length: 50 Pages/sec: 115 (?) There was no other considerable activity on the server. The server seldom serves some ASP.NET pages, which became also very slow in this time. The relevant configuration is as follows: Windows 2003 SEAGATE ST3500631NS (7200 rpm, 500 GB) LSI MegaRAID based RAID 5 4 disks, 1 hot spare Write Through No read-ahead Direct Cache Mode Harddisk-Cache-Mode: off Is this normal behaviour for such a configuration? What measurements could give further clues? Is it reasonable to reduce the priority of such copy I/O and favour other processes like the remote desktop? How would you do that? Many thanks!

    Read the article

  • Network using only switches

    - by mschultz
    So I'm not a network guy - but here's what I want to do - I have an existing network using wifi, which I like, and which is used to connect several computers to the internet. It is headed up by a router, which is in another part of the building. Three of those computers are in my office. All three have gigabit wired ethernet. I have a gigabit switch. Here's what I want to do: Build a 2nd network, out of just that switch, which allows all 3 computers to connect to each other (just to each other is fine, for this purpose, they need no internet). I have a distributed computing task (rendering high-quality fractal artwork, as it were), that requires the best connection speed to all 3 computers. I want them to be able to "talk to each other" as quickly as possible, with the fewest dropped packets (the dataflow over this network will be quite high). So how do I do this. I'm not a networking guy at all - I tried connecting them all, and nobody got an IP address (which I assume is because nobody is running a DNS server?). What all do I need to do to make this work? PS - two are running windows, one is running ubuntu.

    Read the article

  • Big Data – Buzz Words: What is MapReduce – Day 7 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned what is Hadoop. In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – MapReduce. What is MapReduce? MapReduce was designed by Google as a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. Though, MapReduce was originally Google proprietary technology, it has been quite a generalized term in the recent time. MapReduce comprises a Map() and Reduce() procedures. Procedure Map() performance filtering and sorting operation on data where as procedure Reduce() performs a summary operation of the data. This model is based on modified concepts of the map and reduce functions commonly available in functional programing. The library where procedure Map() and Reduce() belongs is written in many different languages. The most popular free implementation of MapReduce is Apache Hadoop which we will explore tomorrow. Advantages of MapReduce Procedures The MapReduce Framework usually contains distributed servers and it runs various tasks in parallel to each other. There are various components which manages the communications between various nodes of the data and provides the high availability and fault tolerance. Programs written in MapReduce functional styles are automatically parallelized and executed on commodity machines. The MapReduce Framework takes care of the details of partitioning the data and executing the processes on distributed server on run time. During this process if there is any disaster the framework provides high availability and other available modes take care of the responsibility of the failed node. As you can clearly see more this entire MapReduce Frameworks provides much more than just Map() and Reduce() procedures; it provides scalability and fault tolerance as well. A typical implementation of the MapReduce Framework processes many petabytes of data and thousands of the processing machines. How do MapReduce Framework Works? A typical MapReduce Framework contains petabytes of the data and thousands of the nodes. Here is the basic explanation of the MapReduce Procedures which uses this massive commodity of the servers. Map() Procedure There is always a master node in this infrastructure which takes an input. Right after taking input master node divides it into smaller sub-inputs or sub-problems. These sub-problems are distributed to worker nodes. A worker node later processes them and does necessary analysis. Once the worker node completes the process with this sub-problem it returns it back to master node. Reduce() Procedure All the worker nodes return the answer to the sub-problem assigned to them to master node. The master node collects the answer and once again aggregate that in the form of the answer to the original big problem which was assigned master node. The MapReduce Framework does the above Map () and Reduce () procedure in the parallel and independent to each other. All the Map() procedures can run parallel to each other and once each worker node had completed their task they can send it back to master code to compile it with a single answer. This particular procedure can be very effective when it is implemented on a very large amount of data (Big Data). The MapReduce Framework has five different steps: Preparing Map() Input Executing User Provided Map() Code Shuffle Map Output to Reduce Processor Executing User Provided Reduce Code Producing the Final Output Here is the Dataflow of MapReduce Framework: Input Reader Map Function Partition Function Compare Function Reduce Function Output Writer In a future blog post of this 31 day series we will explore various components of MapReduce in Detail. MapReduce in a Single Statement MapReduce is equivalent to SELECT and GROUP BY of a relational database for a very large database. Tomorrow In tomorrow’s blog post we will discuss Buzz Word – HDFS. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SSIS Debugging Tip: Using Data Viewers

    - by Jim Giercyk
    When you have an SSIS package error, it is often very helpful to see the data records that are causing the problem.  After all, if your input has 50,000 records and 1 of them has corrupt data, it can be a chore.  Your execution results will tell you which column contains the bad data, but not which record…..enter the Data Viewer. In this scenario I have created a truncation error.  The input length of [lastname] is 50, but the output table has a length of 15.  When it runs, at least one of the records causes the package to fail.     Now what?  We can tell from our execution results that there is a problem with [lastname], but we have no idea WHICH record?     Let’s identify the row that is actually causing the problem.  First, we grab the oft’ forgotten Row Count shape from our toolbar and connect it to the error output from our input query.  Remember that in order to intercept errors with the error output, you must redirect them.     The Row Count shape requires 1 integer variable.  For our purposes, we will not reference the variable, but it is still required in order for the package to run.  Typically we would use the variable to hold the number of rows in the table and refer back to it later in our process.  We are simply using the Row Count as a “Dead End” for errors.  I called my variable RowCounter.  To create a variable, with no shapes selected, right-click on the background and choose Variable.     Once we have setup the Row Count shape, we can right-click on the red line (error output) from the query, and select Data Viewers.  In the popup, we click the add button and we will see this:     There are other fancier options we can play with, but for now we just want to view the output in a grid.  WE select Grid, then click OK on all of the popup windows to shut them down.  We should now see a grid with a pair of glasses on the error output line.     So, we are ready to catch the error output in a grid and see that is causing the problem!  This time when we run the package, it does not fail because we directed the error to the Row Count.  We also get a popup window showing the error record in a grid.  If there were multiple errors we would see them all.     Indeed, the [lastname] column is longer than 15 characters.  Notice the last column in the grid, [Error Code – Description].  We knew this was a truncation error before we added the grid, but if you have worked with SSIS for any length of time, you know that some errors are much more obscure.  The description column can be very useful under those circumstances! Data viewers can be used any time we want to see the data that is actually in the pipeline;  they stop the package temporarily until we shut them.  Also remember that the Row Count shape can be used as a “Dead End”.  It is useful during development when we want to see the output from a dataflow, but don’t want to update a table or file with the data.  Data viewers are an invaluable tool for both development and debugging.  Just remember to REMOVE THEM before putting your package into production

    Read the article

  • Asynchrony in C# 5 (Part I)

    - by javarg
    I’ve been playing around with the new Async CTP preview available for download from Microsoft. It’s amazing how language trends are influencing the evolution of Microsoft’s developing platform. Much effort is being done at language level today than previous versions of .NET. In these post series I’ll review some major features contained in this release: Asynchronous functions TPL Dataflow Task based asynchronous Pattern Part I: Asynchronous Functions This is a mean of expressing asynchronous operations. This kind of functions must return void or Task/Task<> (functions returning void let us implement Fire & Forget asynchronous operations). The two new keywords introduced are async and await. async: marks a function as asynchronous, indicating that some part of its execution may take place some time later (after the method call has returned). Thus, all async functions must include some kind of asynchronous operations. This keyword on its own does not make a function asynchronous thought, its nature depends on its implementation. await: allows us to define operations inside a function that will be awaited for continuation (more on this later). Async function sample: Async/Await Sample async void ShowDateTimeAsync() {     while (true)     {         var client = new ServiceReference1.Service1Client();         var dt = await client.GetDateTimeTaskAsync();         Console.WriteLine("Current DateTime is: {0}", dt);         await TaskEx.Delay(1000);     } } The previous sample is a typical usage scenario for these new features. Suppose we query some external Web Service to get data (in this case the current DateTime) and we do so at regular intervals in order to refresh user’s UI. Note the async and await functions working together. The ShowDateTimeAsync method indicate its asynchronous nature to the caller using the keyword async (that it may complete after returning control to its caller). The await keyword indicates the flow control of the method will continue executing asynchronously after client.GetDateTimeTaskAsync returns. The latter is the most important thing to understand about the behavior of this method and how this actually works. The flow control of the method will be reconstructed after any asynchronous operation completes (specified with the keyword await). This reconstruction of flow control is the real magic behind the scene and it is done by C#/VB compilers. Note how we didn’t use any of the regular existing async patterns and we’ve defined the method very much like a synchronous one. Now, compare the following code snippet  in contrast to the previuous async/await: Traditional UI Async void ComplicatedShowDateTime() {     var client = new ServiceReference1.Service1Client();     client.GetDateTimeCompleted += (s, e) =>     {         Console.WriteLine("Current DateTime is: {0}", e.Result);         client.GetDateTimeAsync();     };     client.GetDateTimeAsync(); } The previous implementation is somehow similar to the first shown, but more complicated. Note how the while loop is implemented as a chained callback to the same method (client.GetDateTimeAsync) inside the event handler (please, do not do this in your own application, this is just an example).  How it works? Using an state workflow (or jump table actually), the compiler expands our code and create the necessary steps to execute it, resuming pending operations after any asynchronous one. The intention of the new Async/Await pattern is to let us think and code as we normally do when designing and algorithm. It also allows us to preserve the logical flow control of the program (without using any tricky coding patterns to accomplish this). The compiler will then create the necessary workflow to execute operations as the happen in time.

    Read the article

  • Scaling Literate Programming?

    - by Tetha
    Greetings. I have been looking at Literate Programming a bit now, and I do like the idea behind it: you basically write a little paper about your code and write down as much of the design decisions, the code probably surrounding the module, the inner workins of the module, assumptions and conclusions resulting from the design decisions, potential extension, all this can be written down in a nice way using tex. Granted, the first point: it is documentation. It must be kept up-to-date, but that should not be that bad, because your change should have a justification and you can write that down. However, how does Literate Programming Scale to a larger degree? Overall, Literate Programming is still just text. Very human readable text, of course, but still text, and thus, it is hard to follow large systems. For example, I reworked large parts of my compiler to use and some magic to chain compile steps together, because some "x.register_follower(y); y.register_follower(z); y.register_follower(a);..." got really unwieldy, and changing that to x y z a made it a bit better, even though this is at its breaking point, too. So, how does Literate Programming scale to larger systems? Does anyone try to do that? My thought would be to use LP to specify components that communicate with each other using event streams and chain all of these together using a subset of graphviz. This would be a fairly natural extension to LP, as you can extract a documentation -- a dataflow diagram -- from the net and also generate code from it really well. What do you think of it? -- Tetha.

    Read the article

  • What are best practices for managing related Cabal packages?

    - by Norman Ramsey
    I'm working on a dataflow-based optimization library written in Haskell. It now seems likely that the library is going to have to be split into two pieces: A core piece with minimal build dependencies; call it hoopl-core. A full piece, call it hoopl, which may have extra dependencies on packages like a prettyprinter, QuickCheck, and so on. The idea is that the Glasgow Haskell Compiler will depend only on hoopl-core, so that it won't be too difficult to bootstrap the compiler. Other compilers will get the extra goodies in hoopl. Package hoopl will depend on hoopl-core. The Debian package tools can build multiple packages from a single source tree. Unfortunately Cabal has not yet reached that level of sophistication. But there must be other library or application designers out there who have similar issues (e.g., one package for a core library, another for a command-line interface, another for a GUI interface). What are current best practices for building and managing multiple related Haskell packages using Cabal?

    Read the article

  • How to cope with null results in SQL Tasks that return single rows in SSIS 2005?

    - by JSacksteder
    In a dataflow task, I can slip a rowcount into the processing flow and place the count into a variable. I can later use that variable to conditionally perform some other work if the rowcount was 0. This works well for me, but I have no corresponding strategy for sql tasks expected to return a single row. In that event, I'm returning those values into variables. If the lookup produces no rows, the sql task fails when assigning values into those variables. I can branch on that component failing, but there's a side effect of that - if I'm running the job as a SQL server agent job step, the step returns DTSER_FAILURE, causing the step to fail. I can tell the sql agent to disregard the step failure, but then I won't know if I have a legitimate error in that step. This seems harder than it should be. The only strategy I can think of is to run the same query with a count(*) aggregate and test if that returns a number 0 and if so running the query again without the count. That's ugly because I have the same query in two places that I need to keep in sync. Is there a better way?

    Read the article

  • Uncommitted reads in SSIS

    - by OldBoy
    I'm trying to debug some legacy Integration Services code, and really want some confirmation on what I think the problem is: We have a very large data task inside a control flow container. This control flow container is set up with TransactionOption = supported - i.e. it will 'inherit' transactions from parent containers, but none are set up here. Inside the data flow there is a call to a stored proc that writes to a table with pseudo code something like: "If a record doesn't exist that matches these parameters then write it" Now, the issue is that there are three records being passed into this proc all with the same parameters, so logically the first record doesn't find a match and a record is created. The second record (with the same parameters) also doesn't find a match and another record is created. My understanding is that the first 'record' passed to the proc in the dataflow is uncommitted and therefore can't be 'read' by the second call. The upshot being that all three records create a row, when logically only the first should. In this scenario am I right in thinking that it is the uncommitted transaction that stops the second call from seeing the first? Even setting the isolation level on the container doesn't help because it's not being wrapped in a transaction anyway.... Hope that makes sense, and any advice gratefully received. Work-arounds confer god-like status on you.

    Read the article

  • FairScheduling Conventions in Hadoop

    - by dan.mcclary
    While scheduling and resource allocation control has been present in Hadoop since 0.20, a lot of people haven't discovered or utilized it in their initial investigations of the Hadoop ecosystem. We could chalk this up to many things: Organizations are still determining what their dataflow and analysis workloads will comprise Small deployments under tests aren't likely to show the signs of strains that would send someone looking for resource allocation options The default scheduling options -- the FairScheduler and the CapacityScheduler -- are not placed in the most prominent position within the Hadoop documentation. However, for production deployments, it's wise to start with at least the foundations of scheduling in place so that you can tune the cluster as workloads emerge. To do that, we have to ask ourselves something about what the off-the-rack scheduling options are. We have some choices: The FairScheduler, which will work to ensure resource allocations are enforced on a per-job basis. The CapacityScheduler, which will ensure resource allocations are enforced on a per-queue basis. Writing your own implementation of the abstract class org.apache.hadoop.mapred.job.TaskScheduler is an option, but usually overkill. If you're going to have several concurrent users and leverage the more interactive aspects of the Hadoop environment (e.g. Pig and Hive scripting), the FairScheduler is definitely the way to go. In particular, we can do user-specific pools so that default users get their fair share, and specific users are given the resources their workloads require. To enable fair scheduling, we're going to need to do a couple of things. First, we need to tell the JobTracker that we want to use scheduling and where we're going to be defining our allocations. We do this by adding the following to the mapred-site.xml file in HADOOP_HOME/conf: <property> <name>mapred.jobtracker.taskScheduler</name> <value>org.apache.hadoop.mapred.FairScheduler</value> </property> <property> <name>mapred.fairscheduler.allocation.file</name> <value>/path/to/allocations.xml</value> </property> <property> <name>mapred.fairscheduler.poolnameproperty</name> <value>pool.name</value> </property> <property> <name>pool.name</name> <value>${user.name}</name> </property> What we've done here is simply tell the JobTracker that we'd like to task scheduling to use the FairScheduler class rather than a single FIFO queue. Moreover, we're going to be defining our resource pools and allocations in a file called allocations.xml For reference, the allocation file is read every 15s or so, which allows for tuning allocations without having to take down the JobTracker. Our allocation file is now going to look a little like this <?xml version="1.0"?> <allocations> <pool name="dan"> <minMaps>5</minMaps> <minReduces>5</minReduces> <maxMaps>25</maxMaps> <maxReduces>25</maxReduces> <minSharePreemptionTimeout>300</minSharePreemptionTimeout> </pool> <mapreduce.job.user.name="dan"> <maxRunningJobs>6</maxRunningJobs> </user> <userMaxJobsDefault>3</userMaxJobsDefault> <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout> </allocations> In this case, I've explicitly set my username to have upper and lower bounds on the maps and reduces, and allotted myself double the number of running jobs. Now, if I run hive or pig jobs from either the console or via the Hue web interface, I'll be treated "fairly" by the JobTracker. There's a lot more tweaking that can be done to the allocations file, so it's best to dig down into the description and start trying out allocations that might fit your workload.

    Read the article

  • How can we implement change notification propagation for WPF and SL in the MVVM pattern?

    - by Firoso
    Here's an example scenario targetting MVVM WPF/SL development: View data binds to view model Property "Target" "Target" exposes a field of an object called "data" that exists in the local application model, called "Original" when "Original" changes, it should raise notification to the view model and then propogate that change notification to the View. Here are the solutions I've come up with, but I don't like any of them all that much. I'm looking for other ideas, by the time we come up with something rock solid I'm certain Microsoft will have released .NET 5 with WPF/SL extensions for better tools for MVVM development. For now the question is, "What have you done to solve this problem and how has it worked out for you?" Option 1. Proposal: Attach a handler to data's PropertyChanged event that watches for string values of properties it cares about that might have changed, and raises the appropriate notification. Why I don't like it: Changes don't bubble naturally, objects must be explicitly watched, if data changes to a new source, events must be un-registered/registered. Why I kind of like it: I get explicit control over propogation of changes, and I don't have to use any types that belong at a higher level of the application such as dependancy properties. Option 2. Proposal: Attach a handler to data's PropertyChanged event that re-raises the event across all properties using the name property name. Why I don't like it: This is essentially the same as option 1, but less intelligent, and forces me to never change my property names, as they have to be the same as the property names on data Why I kind of like it: It's very easy to set up and I don't have to think about it... Then again if I try to think, and change names to things that make sense, I shoot myself in the foot, and then I have to think about it! Option 3. Proposal: Inherit my view model from dependancy object, and notify binding sources of changes directly. Why I don't like it: I'm not even 100% sure dependancy properties/objects can DO this, it was just a thought to look into. Also I don't personally feel that WPF/SL types like Dep Obj belong at the view model level. Why I kind of like it: IF it has the capability that I'm seeking then it's a good answer! minus that pesky layering issue. Option 4. Proposal: Use a consistant agent messaging system based off of Task Parallels DataFlow Library to propogate everything through linked pipelining. Why I don't like it: I've never tried this, and somehow I think it will be lacking, plus it requires me to think about my code completely differently all the way around. Why I kind of like it: It has the possiblity of allowing me to do some VERY fun manipulations with a push based data model and using ActionBlocks as validation AND setters to then privately change view model properties and explicitly control PropertyChanged notifications.

    Read the article

  • Advice for Architecture Design Logic for software application

    - by Prasad
    Hi, I have a framework of basic to complex set of objects/classes (C++) running into around 500. With some rules and regulations - all these objects can communicate with each other and hence can cover most of the common queries in the domain. My Dream: I want to provide these objects as icons/glyphs (as I learnt recently) on a workspace. All these objects can be dragged/dropped into the workspace. They have to communicate only through their methods(interface) and in addition to few iterative and conditional statements. All these objects are arranged finally to execute a protocol/workflow/dataflow/process. After drawing the flow, the user clicks the Execute/run button. All the user interaction should be multi-touch enabled. The best way to show my dream is : Jeff Han's Multitouch Video. consider Jeff is playing with my objects instead of the google maps. :-) it should be like playing a jigsaw puzzle. Objective: how can I achieve the following while working on this final product: a) the development should be flexible to enable provision for web services b) the development should enable easy web application development c) The development should enable client-server architecture - d) further it should also enable mouse based drag/drop desktop application like Adobe programs etc. I mean to say: I want to economize on investments. Now I list my efforts till now in design : a) Created an Editor (VB) where the user writes (manually) the object / class code b) On Run/Execute, the code is copied into a main() function and passed to interpreter. c) Catch the output and show it in the console. The interpreter can be separated to become a server and the Editor can become the client. This needs lot of standard client-server architecture work. But some how I am not comfortable in the tightness of this system. Without interpreter is there much faster and better embeddable solution to this? - other than writing a special compiler for these objects. Recently learned about AXIS-C++ can help me - looks like - a friend suggested. Is that the way to go ? Here are my questions: (pl. consider me a self taught programmer and NOT my domain) a) From the stage of C++ objects to multi-touch product, how can I make sure I will develop the parallel product/service models as well.? What should be architecture aspects I should consider ? b) What technologies are best suited for this? c) If I am thinking of moving to Cloud Computing, how difficult/ how redundant / how unnecessary my efforts will be ? d) How much time in months would it take to get the first beta ? I take the liberty to ask if any of the experts here are interested in this project, please email me: [email protected] Thank you for any help. Looking forward.

    Read the article

  • 10 tape technology features that make you go hmm.

    - by Karoly Vegh
    A week ago an Oracle/StorageTek Tape Specialist, Christian Vanden Balck, visited Vienna, and agreed to visit customers to do techtalks and update them about the technology boom going around tape. I had the privilege to attend some of his sessions and noted the information and features that took the customers by surprise and made them think. Allow me to share the top 10: I. StorageTek as a brand: StorageTek is one of he strongest names in the Tape field. The brand itself was valued so much by customers that even after Sun Microsystems acquiring StorageTek and the Oracle acquiring Sun the brand lives on with all the Oracle tapelibraries are officially branded StorageTek.See http://www.oracle.com/us/products/servers-storage/storage/tape-storage/overview/index.html II. Disk information density limitations: Disk technology struggles with information density. You haven't seen the disk sizes exploding lately, have you? That's partly because there are physical limits on a disk platter. The size is given, the number of platters is limited, they just can't grow, and are running out of physical area to write to. Now, in a T10000C tape cartridge we have over 1000m long tape. There you go, you have got your physical space and don't need to stuff all that data crammed together. You can write in a reliable pattern, and have space to grow too. III. Oracle has a market share of 62% worldwide in recording head manufacturing. That's right. If you are running LTO drives, with a good chance you rely on StorageTek production. That's two out of three LTO recording heads produced worldwide.  IV. You can store 1 Exabyte data in a single tape library. Yes, an Exabyte. That is 1000 Petabytes. Or, a million Terabytes. A thousand million GigaBytes. You can store that in a stacked StorageTek SL8500 tapelibrary. In one SL8500 you can put 10.000 T10000C cartridges, that store 10TB data (compressed). You can stack 10 of these SL8500s together. Boom. 1000.000 TB.(n.b.: stacking means interconnecting the libraries. Yes, cartridges are moved between the stacked libraries automatically.)  V. EMC: 'Tape doesn't suck after all. We moved on.': Do you remember the infamous 'Tape sucks, move on' Datadomain slogan? Of course they had to put it that way, having only had disk products. But here's a fun fact: on the EMCWorld 2012 there was a major presence of a Tape-tech company - EMC, in a sudden burst of sanity is embracing tape again. VI. The miraculous T10000C: Oracle StorageTek has developed an enterprise-grade tapedrive and cartridge, the T10000C. With awesome numbers: The Cartridge: Native 5TB capacity, 10TB with compression Over a kilometer long tape within the cartridge. And it's locked when unmounted, no rattling of your data.  Replaced the metalparticles datalayer with BaFe (bariumferrite) - metalparticles lose around 7% of magnetism within 30 days. BaFe does not. Yes we employ solid-state physicists doing R&D on demagnetisation in our labs. Can be partitioned, storage tiering within the cartridge!  The Drive: 2GB Cache Encryption implemented in HW - no performance hit 252 MB/s native sustained data rate, beats disk technology by far. Not to mention peak throughput.  Leading the tape while never touching the data side of it, protecting your data physically too Data integritiy checking (CRC recalculation) on tape within the drive without having to read it back to the server reordering data from tape-order, delivering it back in application-order  writing 32 tracks at once, reading them back for CRC check at once VII. You only use 20% of your data on a regular basis. The rest 80% is just lying around for years. On continuously spinning disks. Doubly consuming energy (power+cooling), blocking diskstorage capacity. There is a solution called SAM (Storage Archive Manager) that provides you a filesystem unifying disk and tape, moving data on-demand and for clients transparently between the different storage tiers. You can share these filesystems with NFS or CIFS for clients, and enjoy the low TCO of tape. Tapes don't spin. They sit quietly in their slots, storing 10TB data, using no energy, producing no heat, automounted when a client accesses their data.See: http://www.oracle.com/us/products/servers-storage/storage/storage-software/storage-archive-manager/overview/index.html VIII. HW supported for three decades: Did you know that the original PowderHorn library was released in '87 and has been only discontinued in 2010? That is over two decades of supported operation. Tape libraries are - just like the data carrying on tapecartridges - built for longevity. Oh, and the T10000C cartridge has 30-year archival life for long-term retention.  IX. Tape is easy to manage: Have you heard of Tape Storage Analytics? It is a central graphical tool to summarize, monitor, analyze dataflow, health and performance of drives and libraries, see: http://www.oracle.com/us/products/servers-storage/storage/tape-storage/tape-analytics/overview/index.html X. The next generation: The T10000B drives were able to reuse the T10000A cartridges and write on them even more data. On the same cartridges. We call this investment protection, and this is very important for Oracle for the future too. We usually support two generations of cartridges together. The current drive is a T10000C. (...I know I promised to enlist 10, but I got still two more I really want to mention. Allow me to work around the problem: ) X++. The TallBots, the robots moving around the cartridges in the StorageTek library from tapeslots to the drives are cableless. Cables, belts, chains running to moving parts in a library cause maintenance downtimes. So StorageTek eliminated them. The TallBots get power, commands, even firmwareupgrades through the rails they are running on. Also, the TallBots don't just hook'n'pull the tapes out of their slots, they actually grip'n'lift them out. No friction, no scratches, no zillion little plastic particles floating around in the library, in the drives, on your data. (X++)++: Tape beats SSDs and Disks. In terms of throughput (252 MB/s), in terms of TCO: disks cause around 290x more power and cooling, in terms of capacity: 10TB on a single media and soon more.  So... do you need to store large amounts of data? Are you legally bound to archive it for dozens of years? Would you benefit from automatic storage tiering? Have you got large mediachunks to be streamed at times? Have you got power and cooling issues in the growing datacenters? Do you find EMC's 180° turn of tape attitude interesting, but appreciate it at the same time? With all that, you aren't alone. The most data on this planet is stored on tape. Tape is coming. Big time.

    Read the article

< Previous Page | 1 2 3 4  | Next Page >