Search Results

Search found 61393 results on 2456 pages for 'data integration'.

Page 3/2456 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Integration Testing an Entire *Existing* Application (w/ automatic execution of test suite)

    - by Ev
    Hi there, I have just joined a team working on an existing Java web app. I have been tasked with creating an automated integration test suite that should run when developers commit to our continuous integration server (TeamCity), which automatically deploys to our staging server - so really the tests will be run against our staging web app server. I have read a lot of stuff about automated integration testing with frameworks like Watir, Selenium and RWebSpec. I have created tests in all of these and while I prefer Watir, I am open to anything. The thing that hasn't become clear to me is how to create an entire test suite for an application, and how to have that suite execute in it's entirety upon execution of some script. I can happily create individual tests of varying complexity, but there is a gap in my knowledge about how to tie everything together into something useful. Does anyone have any advice on how to create a full test suite and have it execute automatically? Thanks!

    Read the article

  • Why Oracle Data Integrator for Big Data?

    - by Mala Narasimharajan
    Big Data is everywhere these days - but what exactly is it? It’s data that comes from a multitude of sources – not only structured data, but unstructured data as well.  The sheer volume of data is mindboggling – here are a few examples of big data: climate information collected from sensors, social media information, digital pictures, log files, online video files, medical records or online transaction records.  These are just a few examples of what constitutes big data.   Embedded in big data is tremendous value and being able to manipulate, load, transform and analyze big data is key to enhancing productivity and competitiveness.  The value of big data lies in its propensity for greater in-depth analysis and data segmentation -- in turn giving companies detailed information on product performance, customer preferences and inventory.  Furthermore, by being able to store and create more data in digital form, “big data can unlock significant value by making information transparent and usable at much higher frequency." (McKinsey Global Institute, May 2011) Oracle's flagship product for bulk data movement and transformation, Oracle Data Integrator, is a critical component of Oracle’s Big Data strategy. ODI provides automation, bulk loading, and validation and transformation capabilities for Big Data while minimizing the complexities of using Hadoop.  Specifically, the advantages of ODI in a Big Data scenario are due to pre-built Knowledge Modules that drive processing in Hadoop. This leverages the graphical UI to load and unload data from Hadoop, perform data validations and create mapping expressions for transformations.  The Knowledge Modules provide a key jump-start and eliminate a significant amount of Hadoop development.  Using Oracle Data Integrator together with Oracle Big Data Connectors, you can simplify the complexities of mapping, accessing, and loading big data (via NoSQL or HDFS) but also correlating your enterprise data – this correlation may require integrating across heterogeneous and standards-based environments, connecting to Oracle Exadata, or sourcing via a big data platform such as Oracle Big Data Appliance. To learn more about Oracle Data Integration and Big Data, download our resource kit to see the latest in whitepapers, webinars, downloads, and more… or go to our website on www.oracle.com/bigdata

    Read the article

  • Oracle E-Business Suite (WebADI) integration with Oracle Open Office

    - by Harald Behnke
    Another highlight of the new Oracle Open Office Release 3.3 enterprise features is the Oracle E-Business Suite Release 12.1 (WebADI) integration. The WebADI integration in Oracle Open Office for Windows allows you to bring your Oracle E-Business Suite data into an Oracle Open Office Calc spreadsheet, where familiar data entry and modeling techniques can be used to complete your E-Business Suite tasks. You can create formatted spreadsheets on your desktop that allow you to download, view, edit, and create Oracle E-Business Suite data. Use data entry shortcuts (such as copying and pasting or dragging and dropping ranges of cells), or Calc's Open Document Format (ODF V1.2) compliant spreadsheet formulas, to calculate amounts to save time. You can combine speed and accuracy by invoking lists of values for fields within the spreadsheet. After editing the spreadsheet, you can use WebADI's validation functionality to validate the data before uploading it to the Oracle E-Business Suite. Validation messages are returned to the spreadsheet, allowing you to identify and correct invalid. This video shows a hands-on demonstration of the Oracle E-Business Suite integration: Read more about the Oracle Open Office enterprise features.

    Read the article

  • Unit and Integration testing: How can it become a reflex

    - by LordOfThePigs
    All the programmers in my team are familiar with unit testing and integration testing. We have all worked with it. We have all written tests with it. Some of us even have felt an improved sense of trust in his/her own code. However, for some reason, writing unit/integration tests has not become a reflex for any of the members of the team. None of us actually feel bad when not writing unit tests at the same time as the actual code. As a result, our codebase is mostly uncovered by unit tests, and projects enter production untested. The problem with that, of course is that once your projects are in production and are already working well, it is virtually impossible to obtain time and/or budget to add unit/integration testing. The members of my team and myself are already familiar with the value of unit testing (1, 2) but it doesn't seem to help bringing unit testing into our natural workflow. In my experience making unit tests and/or a target coverage mandatory just results in poor quality tests and slows down team members simply because there is no self-generated motivation to produce these tests. Also as soon as pressure eases, unit tests are not written any more. My question is the following: Is there any methods that you have experimented with that helps build a dynamic/momentum inside the team, leading to people naturally wanting to create and maintain those tests?

    Read the article

  • unit/integration testing web service proxy client

    - by cori
    I'm rewriting a PHP client/proxy library that provides an interface to a SOAP-based .Net webservice, and in the process I want to add some unit and integration tests so future modifications are less risky. The work the library I'm working on performs is to marshall the calls to the web service and do a little reorganizing of the responses to present a slightly more -object-oriented interface to the underlying service. Since this library is little else than a thin layer on top of web service calls, my basic assumption is that I'll really be writing integration tests more than unit tests - for example, I don't see any reason to mock away the web service - the work that's performed by the code I'm working on is very light; it's almost passing the response from the service right back to its consumer. Most of the calls are basic CRUD operations: CreateRole(), CreateUser(), DeleteUser(), FindUser(), &ct. I'll be starting from a known database state - the system I'm using for these tests is isolated for testing purposes, so the results will be more or less predictable. My question is this: is it natural to use web service calls to confirm the results of operations within the tests and to reset the state of the application within the scope of each test? Here's an example: One test might be createUserReturnsValidUserId() and might go like this: public function createUserReturnsValidUserId() { // we're assuming a global connection to the service $newUserId = $client->CreateUser("user1"); assertNotNull($newUserId); assertNotNull($client->FindUser($newUserId); $client->deleteUser($newUserId); } So I'm creating a user, making sure I get an ID back and that it represents a user in the system, and then cleaning up after myself (so that later tests don't rely on the success or failure of this test w/r/t the number of users in the system, for example). However this still seems pretty fragile - lots of dependencies and opportunities for tests to fail and effect the results of later tests, which I definitely want to avoid. Am I missing some options of ways to decouple these tests from the system under test, or is this really the best I can do? I think this is a fairly general unit/integration testing question, but if it matters I'm using PHPUnit for the testing framework.

    Read the article

  • General Policies and Procedures for Maintaining the Value of Data Assets

    Here is a general list for policies and procedures regarding maintaining the value of data assets. Data Backup Policies and Procedures Backups are very important when dealing with data because there is always the chance of losing data due to faulty hardware or a user activity. So the need for a strategic backup system should be mandatory for all companies. This being said, in the real world some companies that I have worked for do not really have a good data backup plan. Typically when companies tend to take this kind of approach in data backups usually the data is not really recoverable.  Unfortunately when companies do not regularly test their backup plans they get a false sense of security because they think that they are covered. However, I can tell you from personal and professional experience that a backup plan/system is never fully implemented until it is regularly tested prior to the time when it actually needs to be used. Disaster Recovery Plan Expanding on Backup Policies and Procedures, a company needs to also have a disaster recovery plan in order to protect its data in case of a catastrophic disaster.  Disaster recovery plans typically encompass how to restore all of a company’s data and infrastructure back to a restored operational status.  Most Disaster recovery plans also include time estimates on how long each step of the disaster recovery plan should take to be executed.  It is important to note that disaster recovery plans are never fully implemented until they have been tested just like backup plans. Disaster recovery plans should be tested regularly so that the business can be confident in not losing any or minimal data due to a catastrophic disaster. Firewall Policies and Content Filters One way companies can protect their data is by using a firewall to separate their internal network from the outside. Firewalls allow for enabling or disabling network access as data passes through it by applying various defined restrictions. Furthermore firewalls can also be used to prevent access from the internal network to the outside by these same factors. Common Firewall Restrictions Destination/Sender IP Address Destination/Sender Host Names Domain Names Network Ports Companies can also desire to restrict what their network user’s view on the internet through things like content filters. Content filters allow a company to track what webpages a person has accessed and can also restrict user’s access based on established rules set up in the content filter. This device and/or software can block access to domains or specific URLs based on a few factors. Common Content Filter Criteria Known malicious sites Specific Page Content Page Content Theme  Anti-Virus/Mal-ware Polices Fortunately, most companies utilize antivirus programs on all computers and servers for good reason, virus have been known to do the following: Corrupt/Invalidate Data, Destroy Data, and Steal Data. Anti-Virus applications are a great way to prevent any malicious application from being able to gain access to a company’s data.  However, anti-virus programs must be constantly updated because new viruses are always being created, and the anti-virus vendors need to distribute updates to their applications so that they can catch and remove them. Data Validation Policies and Procedures Data validation is very important to ensure that only accurate information is stored. The existence of invalid data can cause major problems when businesses attempt to use data for knowledge based decisions and for performance reporting. Data Scrubbing Policies and Procedures Data scrubbing is valuable to companies in one of two ways. The first can be used to clean data prior to being analyzed for report generation. The second is that it allows companies to remove things like personally Identifiable information from its data prior to transmit it between multiple environments or if the information is sent to an external location. An example of this can be seen with medical records in regards to HIPPA laws that prohibit the storage of specific personal and medical information. Additionally, I have professionally run in to a scenario where the Canadian government does not allow any Canadian’s personal information to be stored on a server not located in Canada. Encryption Practices The use of encryption is very valuable when a company needs to any personal information. This allows users with the appropriated access levels to view or confirm the existence or accuracy of data within a system by either decrypting the information or encrypting a piece of data and comparing it to the stored version.  Additionally, if for some unforeseen reason the data got in to the wrong hands then they would have to first decrypt the data before they could even be able to read it. Encryption just adds and additional layer of protection around data itself. Standard Normalization Practices The use of standard data normalization practices is very important when dealing with data because it can prevent allot of potential issues by eliminating the potential for unnecessary data duplication. Issues caused by data duplication include excess use of data storage, increased chance for invalidated data, and over use of data processing. Network and Database Security/Access Policies Every company has some form of network/data access policy even if they have none. These policies help secure data from being seen by inappropriate users along with preventing the data from being updated or deleted by users. In addition, without a good security policy there is a large potential for data to be corrupted by unassuming users or even stolen. Data Storage Policies Data storage polices are very important depending on how they are implemented especially when a company is trying to utilize them in conjunction with other policies like Data Backups. I have worked at companies where all network user folders are constantly backed up, and if a user wanted to ensure the existence of a piece of data in the form of a file then they had to store that file in their network folder. Conversely, I have also worked in places where when a user logs on or off of the network there entire user profile is backed up. Training Policies One of the biggest ways to prevent data loss and ensure that data will remain a company asset is through training. The practice of properly train employees on how to work with in systems that access data is crucial when trying to ensure a company’s data will remain an asset. Users need to be trained on how to manipulate a company’s data in order to perform their tasks to reduce the chances of invalidating data.

    Read the article

  • Why Oracle Delivers More Value than IBM in Data Integration Solutions

    - by irem.radzik(at)oracle.com
    For data integration projects, IT organization look for a robust but an easy-to-use solution, which simplifies enterprise data architecture while providing exceptional value-- not one that adds complexity and costs. This is a major challenge today for customers who are using IBM InfoSphere products like DataStage or Change Data Capture. Whereas, Oracle consistently delivers higher level value with its data integration products such as Oracle Data Integrator, Oracle GoldenGate. There are many differentiators for Oracle's Data Integration offering in comparison to IBM. Here are the top five: Lower cost of ownership Higher performance in both real-time and bulk data movement Ease of use and flexibility Reliability Complete, Open, and Integrated Middleware Offering Architectural differences between products contribute a great deal to these differences. First of all, Oracle's ETL architecture does not require a middle-tier transformation server, something IBM does require. Not only it costs more to manage an additional transformation server including energy costs, but it adds a performance bottleneck as well. In addition, IBM's data integration products are complex and often require lengthy professional services engagements to integrate. This translates to higher costs and delayed time to market. Then there's the reliability factor. Our customers choose Oracle GoldenGate over IBM's InfoSphere Change Data Capture product because Oracle GoldenGate is designed for mission-critical systems that require guaranteed data delivery and automatic recovery in case of process interruptions. On Thursday we will discuss these key differentiators in detail and provide customer examples that chose Oracle over IBM in data integration projects. Join us on Thursday Feb 10th at 11am PT to learn how Oracle delivers more value than IBM in data integration solutions.

    Read the article

  • Importing Multiple Schemas to a Model in Oracle SQL Developer Data Modeler

    - by thatjeffsmith
    Your physical data model might stretch across multiple Oracle schemas. Or maybe you just want a single diagram containing tables, views, etc. spanning more than a single user in the database. The process for importing a data dictionary is the same, regardless if you want to suck in objects from one schema, or many schemas. Let’s take a quick look at how to get started with a data dictionary import. I’m using Oracle SQL Developer in this example. The process is nearly identical in Oracle SQL Developer Data Modeler – the only difference being you’ll use the ‘File’ menu to get started versus the ‘File – Data Modeler’ menu in SQL Developer. Remember, the functionality is exactly the same whether you use SQL Developer or SQL Developer Data Modeler when it comes to the data modeling features – you’ll just have a cleaner user interface in SQL Developer Data Modeler. Importing a Data Dictionary to a Model You’ll want to open or create your model first. You can import objects to an existing or new model. The easiest way to get started is to simply open the ‘Browser’ under the View menu. The Browser allows you to navigate your open designs/models You’ll see an ‘Untitled_1′ model by default. I’ve renamed mine to ‘hr_sh_scott_demo.’ Now go back to the File menu, and expand the ‘Data Modeler’ section, and select ‘Import – Data Dictionary.’ This is a fancy way of saying, ‘suck objects out of the database into my model’ Connect! If you haven’t already defined a connection to the database you want to reverse engineer, you’ll need to do that now. I’m going to assume you already have that connection – so select it, and hit the ‘Next’ button. Select the Schema(s) to be imported Select one or more schemas you want to import The schemas selected on this page of the wizard will dictate the lists of tables, views, synonyms, and everything else you can choose from in the next wizard step to import. For brevity, I have selected ALL tables, views, and synonyms from 3 different schemas: HR SCOTT SH Once I hit the ‘Finish’ button in the wizard, SQL Developer will interrogate the database and add the objects to our model. The Big Model and the 3 Little Models I can now see ALL of the objects I just imported in the ‘hr_sh_scott_demo’ relational model in my design tree, and in my relational diagram. Quick Tip: Oracle SQL Developer calls what most folks think of as a ‘Physical Model’ the ‘Relational Model.’ Same difference, mostly. In SQL Developer, a Physical model allows you to define partitioning schemes, advanced storage parameters, and add your PL/SQL code. You can have multiple physical models per relational models. For example I might have a 4 Node RAC in Production that uses partitioning, but in test/dev, only have a single instance with no partitioning. I can have models for both of those physical implementations. The list of tables in my relational model Wouldn’t it be nice if I could segregate the objects based on their schema? Good news, you can! And it’s done by default Several of you might already know where I’m going with this – SUBVIEWS. You can easily create a ‘SubView’ by selecting one or more objects in your model or diagram and add them to a new SubView. SubViews are just mini-models. They contain a subset of objects from the main model. This is very handy when you want to break your model into smaller, more digestible parts. The model information is identical across the model and subviews, so you don’t have to worry about making a change in one place and not having it propagate across your design. SubViews can be used as filters when you create reports and exports as well. So instead of generating a PDF for everything, just show me what’s in my ‘ABC’ subview. But, I don’t want to do any work! Remember, I’m really lazy. More good news – it’s already done by default! The schemas are automatically used to create default SubViews Auto-Navigate to the Object in the Diagram In the subview tree node, right-click on the object you want to navigate to. You can ask to be taken to the main model view or to the SubView location. If you haven’t already opened the SubView in the diagram, it will be automatically opened for you. The SubView diagram only contains the objects from that SubView Your SubView might still be pretty big, many dozens of objects, so don’t forget about the ‘Navigator‘ either! In summary, use the ‘Import’ feature to add existing database objects to your model. If you import from multiple schemas, take advantage of the default schema based SubViews to help you manage your models! Sometimes less is more!

    Read the article

  • How should I select continuous integration tool?

    - by DeveloperDon
    I found this cool comparison table for integration servers on Wikipedia, but I am a little uncertain how to rank the tools vs. my needs and interests. The chart itself seems to have a lot of boxes marked unknown, so if you are comfortable updating it on Wikipedia, that could be great too. Are there a few top performing products so I can quickly narrow down to four or five options? Which products seems to have the largest user communities and most ongoing enhancements and integration with new tools? Are the open source offerings best, or are there high quality tools that can be a great deal for a single user at home? Will use of multiple systems (primary desktop, local only home network server, personal and work notebooks, multiple virtual machines spread across all) create problems and how can they be managed?

    Read the article

  • Deploying Data Mining Models using Model Export and Import, Part 2

    - by [email protected]
    In my last post, Deploying Data Mining Models using Model Export and Import, we explored using DBMS_DATA_MINING.EXPORT_MODEL and DBMS_DATA_MINING.IMPORT_MODEL to enable moving a model from one system to another. In this post, we'll look at two distributed scenarios that make use of this capability and a tip for easily moving models from one machine to another using only Oracle Database, not an external file transport mechanism, such as FTP. The first scenario, consider a company with geographically distributed business units, each collecting and managing their data locally for the products they sell. Each business unit has in-house data analysts that build models to predict which products to recommend to customers in their space. A central telemarketing business unit also uses these models to score new customers locally using data collected over the phone. Since the models recommend different products, each customer is scored using each model. This is depicted in Figure 1.Figure 1: Target instance importing multiple remote models for local scoring In the second scenario, consider multiple hospitals that collect data on patients with certain types of cancer. The data collection is standardized, so each hospital collects the same patient demographic and other health / tumor data, along with the clinical diagnosis. Instead of each hospital building it's own models, the data is pooled at a central data analysis lab where a predictive model is built. Once completed, the model is distributed to hospitals, clinics, and doctor offices who can score patient data locally.Figure 2: Multiple target instances importing the same model from a source instance for local scoring Since this blog focuses on model export and import, we'll only discuss what is necessary to move a model from one database to another. Here, we use the package DBMS_FILE_TRANSFER, which can move files between Oracle databases. The script is fairly straightforward, but requires setting up a database link and directory objects. We saw how to create directory objects in the previous post. To create a database link to the source database from the target, we can use, for example: create database link SOURCE1_LINK connect to <schema> identified by <password> using 'SOURCE1'; Note that 'SOURCE1' refers to the service name of the remote database entry in your tnsnames.ora file. From SQL*Plus, first connect to the remote database and export the model. Note that the model_file_name does not include the .dmp extension. This is because export_model appends "01" to this name.  Next, connect to the local database and invoke DBMS_FILE_TRANSFER.GET_FILE and import the model. Note that "01" is eliminated in the target system file name.  connect <source_schema>/<password>@SOURCE1_LINK; BEGIN  DBMS_DATA_MINING.EXPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp',                                 'MY_SOURCE_DIR_OBJECT',                                 'name =''MY_MINING_MODEL'''); END; connect <target_schema>/<password>; BEGIN  DBMS_FILE_TRANSFER.GET_FILE ('MY_SOURCE_DIR_OBJECT',                               'EXPORT_FILE_NAME' || '01.dmp',                               'SOURCE1_LINK',                               'MY_TARGET_DIR_OBJECT',                               'EXPORT_FILE_NAME' || '.dmp' );  DBMS_DATA_MINING.IMPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp',                                 'MY_TARGET_DIR_OBJECT'); END; To clean up afterward, you may want to drop the exported .dmp file at the source and the transferred file at the target. For example, utl_file.fremove('&directory_name', '&model_file_name' || '.dmp');

    Read the article

  • Abstract Data Type and Data Structure

    - by mark075
    It's quite difficult for me to understand these terms. I searched on google and read a little on Wikipedia but I'm still not sure. I've determined so far that: Abstract Data Type is a definition of new type, describes its properties and operations. Data Structure is an implementation of ADT. Many ADT can be implemented as the same Data Structure. If I think right, array as ADT means a collection of elements and as Data Structure, how it's stored in a memory. Stack is ADT with push, pop operations, but can we say about stack data structure if I mean I used stack implemented as an array in my algorithm? And why heap isn't ADT? It can be implemented as tree or an array.

    Read the article

  • Separate Database for Integration Testing

    - by john doe
    I am performance integration testing where I fire up the ASPX pages using WatiN and fill the fields and insert into the database. There are couple of problems that I am facing. 1) Should I use a completely separate database for integration testing? I already gave db_test and db_dev. db_test is for unit testing and is cleared after each test. db_dev is for developers. 2) When I run WatiN test which are contained in a separate assembly (not separate from unit test assembly which should be better since WatiN test take so much time to run). So WatiN test fire up the WebApps project and uses their web.config which is pointing to the dev database. Is there anyway I can tell WatiN to use a separate web.config which contains a different database name?

    Read the article

  • Integration error in high velocity

    - by Elektito
    I've implemented a simple simulation of two planets (simple 2D disks really) in which the only force is gravity and there is also collision detection/response (collisions are completely elastic). I can launch one planet into orbit of the other just fine. The collision detection code though does not work so well. I noticed that when one planet hits the other in a free fall it speeds backward and goes much higher than its original position. Some poking around convinced me that the simplistic Euler integration is causing the error. Consider this case. One object has a mass of 1kg and the other has a mass equal to earth. Say the object is 10 meters above ground. Assume that our dt (delta t) is 1 second. The object goes to the height of 9 meters at the end of the first iteration, 7 at the end of the second, 4 at the end of the third and 0 at the end of the fourth iteration. At this points it hits the ground and bounces back with the speed of 10 meters per second. The problem is with dt=1, on the first iteration it bounces back to a height of 10. It takes several more steps to make the object change its course. So my question is, what integration method can I use which fixes this problem. Should I split dt to smaller pieces when velocity is high? Or should I use another method altogether? What method do you suggest? EDIT: You can see the source code here at github:https://github.com/elektito/diskworld/

    Read the article

  • Big Data – Various Learning Resources – How to Start with Big Data? – Day 20 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned how to become a Data Scientist for Big Data. In this article we will go over various learning resources related to Big Data. In this series we have covered many of the most essential details about Big Data. At the beginning of this series, I have encouraged readers to send me questions. One of the most popular questions is - “I want to learn more about Big Data. Where can I learn it?” This is indeed a great question as there are plenty of resources out to learn about Big Data and it is indeed difficult to select on one resource to learn Big Data. Hence I decided to write here a few of the very important resources which are related to Big Data. Learn from Pluralsight Pluralsight is a global leader in high-quality online training for hardcore developers.  It has fantastic Big Data Courses and I started to learn about Big Data with the help of Pluralsight. Here are few of the courses which are directly related to Big Data. Big Data: The Big Picture Big Data Analytics with Tableau NoSQL: The Big Picture Understanding NoSQL Data Analysis Fundamentals with Tableau I encourage all of you start with this video course as they are fantastic fundamentals to learn Big Data. Learn from Apache Resources at Apache are single point the most authentic learning resources. If you want to learn fundamentals and go deep about every aspect of the Big Data, I believe you must understand various concepts in Apache’s library. I am pretty impressed with the documentation and I am personally referencing it every single day when I work with Big Data. I strongly encourage all of you to bookmark following all the links for authentic big data learning. Haddop - The Apache Hadoop® project develops open-source software for reliable, scalable, distributed computing. Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which include support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heat maps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner. Avro: A data serialization system. Cassandra: A scalable multi-master database with no single points of failure. Chukwa: A data collection system for managing large distributed systems. HBase: A scalable, distributed database that supports structured data storage for large tables. Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout: A Scalable machine learning and data mining library. Pig: A high-level data-flow language and execution framework for parallel computation. ZooKeeper: A high-performance coordination service for distributed applications. Learn from Vendors One of the biggest issues with about learning Big Data is setting up the environment. Every Big Data vendor has different environment request and there are lots of things require to set up Big Data framework. Many of the users do not start with Big Data as they are afraid about the resources required to set up framework as well as a time commitment. Here Hortonworks have created fantastic learning environment. They have created Sandbox with everything one person needs to learn Big Data and also have provided excellent tutoring along with it. Sandbox comes with a dozen hands-on tutorial that will guide you through the basics of Hadoop as well it contains the Hortonworks Data Platform. I think Hortonworks did a fantastic job building this Sandbox and Tutorial. Though there are plenty of different Big Data Vendors I have decided to list only Hortonworks due to their unique setup. Please leave a comment if there are any other such platform to learn Big Data. I will include them over here as well. Learn from Books There are indeed few good books out there which one can refer to learn Big Data. Here are few good books which I have read. I will update the list as I will learn more. Ethics of Big Data Balancing Risk and Innovation Big Data for Dummies Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions If you search on Amazon there are millions of the books but I think above three books are a great set of books and it will give you great ideas about Big Data. Once you go through above books, you will have a clear idea about what is the next step you should follow in this series. You will be capable enough to make the right decision for yourself. Tomorrow In tomorrow’s blog post we will wrap up this series of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Data Storage Options

    - by Kenneth
    When I was working as a website designer/engineer I primarily used databases for storage of much of my dynamic data. It was very easy and convenient to use this method and seemed like a standard practice from my research on the matter. I'm now working on shifting away from websites and into desktop applications. What are the best practices for data storage for desktop applications? I ask because I have noticed that most programs I use on a personal level don't appear to use a database for data storage unless its embedded in the program. (I'm not thinking of an application like a word processor where it makes sense to have data stored in individual files as defined by the user. Rather I'm thinking of something more along the lines of a calendar application which would need to store dates and event info and such where accessing that information would be much easier if stored in a database... at least as far as my experience would indicate.) Thanks for the input!

    Read the article

  • What is a Data Warehouse?

    Typically Data Warehouses are considered to be non-volatile in comparison to traditional databasesdue to the fact that data within the warehouse does not change that often.  In addition, Data Warehouses typically represent data through the use of Multidimensional Conceptual Views that allow data to be extracted based on the view and the current position within the view. Common Data Warehouse Traits Relatively Non-volatile Data Supports Data Extraction and Analysis Optimized for Data Retrieval and Analysis Multidimensional Views of Data Flexible Reporting Multi User Support Generic Dimensionality Transparent Accessible Unlimited Dimensions of Data Unlimited Aggregation levels of Data Normally, Data Warehouses are much larger then there traditional database counterparts due to the fact that they store the basis data along with derived data via Multidimensional Conceptual Views. As companies store larger and larger amounts of data, they will need a way to effectively and accurately extract analysis information that can be used to aide in formulating current and future business decisions. This process can be done currently through data mining within a Data Warehouse. Data Warehouses provide access to data derived through complex analysis, knowledge discovery and decision making. Secondly, they support the demands for high performance in regards to analyzing an organization’s existing and current data. Data Warehouses provide support for an organization’s data and acquired business knowledge.  Within a Data Warehouse multiple types of operations/sub systems are supported. Common Data Warehouse Sub Systems Online Analytical Processing (OLAP) Decision –Support Systems (DSS) Online Transaction Processing (OLTP)

    Read the article

  • Oracle Coherence & Oracle Service Bus: REST API Integration

    - by Nino Guarnacci
    This post aims to highlight one of the features found in Oracle Coherence which allows it to be easily added and integrated inside a wider variety of projects.  The features in question are the REST API exposed by the Coherence nodes, with which you can interact in the wider mode in memory data grid.Oracle Coherence and Oracle Service Bus are natively integrated through a feature found in the Oracle Service Bus, which allows you to use the coherence grid cache during the configuration phase of a business service. This feature allows you to use an intermediate layer of cache to retrieve the answers from previous invocations of the same service, without necessarily having to invoke the real business service again. Directly from the web console of Oracle Service Bus, you can decide the policies of eviction of the objects / answers and define the discriminating parameters that identify their uniqueness.The coherence REST APIs, however, allow you to integrate both products for other necessities enabling realization of new architectures design.  Consider coherence’s node as a simple service which interoperates through the stardard services and in particular REST (with JSON and XML). Thinking of coherence as a company’s shared service, able to have an implementation of a centralized “map and reduce” which you can access  by a huge variety of protocols (transport and envelopes).An amazing step forward for those who still imagine connectors and code. This type of integration does not require writing custom code or complex implementation to be self-supported. The added value is made unique by the incredible value of both products independently, and still more out of their simple and robust integration.As already mentioned this scenario discovers a hidden new door behind the columns of these two products. The door leads to new ideas and perspectives for enterprise architectures that increasingly wink to next-generation applications: simple and dynamic, perhaps towards the mobile and web 2.0.Below, a small and simple demo useful to demonstrate how easily is to integrate these two products using the Coherence REST API. This demo is also intended to imagine new enterprise architectures using this approach.The idea is to create a centralized system of alerting, fed easily from any company’s application, regardless of the technology with which they were built . Then use a representation standard protocol: RSS, using a service exposed by the service bus; So you can browse and search only the alerts that you are interested on, by category, author, title, date, etc etc.. The steps needed to implement this system are very simple and very few. Here they are listed below and described to be easily replicated within your environment. I would remind you that the demo is only meant to demonstrate how easily is to integrate Oracle Coherence and the Oracle Service Bus, and stimulate your imagination to new technological approaches.1) Install the two products: In this demo used (if necessary, consult the installation guides of 2 products)  - Oracle Service Bus ver. 11.1.1.5.0 http://www.oracle.com/technetwork/middleware/service-bus/downloads/index.html - Oracle Coherence ver. 3.7.1 http://www.oracle.com/technetwork/middleware/coherence/downloads/index.html 2) Because you choose to create a centralized alerting system, we need to define a structure type containing some alerting attributes useful to preserve and organize the information of the various alerts sent by the different applications. Here, then it was built a java class named Alert containing the canonical properties of an alarm information:- Title- Description- System- Time- Severity 3) Therefore, we need to create two configuration files for the coherence node, in order to save the Alert objects within the grid, through the rest/http protocol (more than the native API for Java, C + +, C,. Net). Here are the two minimal configuration files for Coherence:coherence-rest-config.xml resty-server-config.xml This minimum configuration allows me to use a distributed cache named "alerts" that can  also be accessed via http - rest on the host "localhost" over port "8080", objects are of type “oracle.cohsb.Alert”. 4) Below  a simple Java class that represents the type of alert messages: 5) At this point we just need to startup our coherence node, able to listen on http protocol to manage the “alerts” cache, which will receive incoming XML or JSON objects of type Alert. Remember to include in the classpath of the coherence node, the Alert java class and the following coherence libraries and configuration files:  At this point, just run the coherence class node “com.tangosol.net.DefaultCacheServer”advising you to set the following parameters:-Dtangosol.coherence.log.level=9 -Dtangosol.coherence.log=stdout -Dtangosol.coherence.cacheconfig=[PATH_TO_THE_FILE]\resty-server-config.xml 6) Let's create a procedure to test our configuration of Coherence and in order to insert some custom alerts in our cache. The technology with which you want to achieve this functionality is fully not considerable: Javascript, Python, Ruby, Scala, C + +, Java.... Because the protocol to communicate with Coherence is simply HTTP / JSON or XML. For this little demo i choose Java: A method to send/put the alert to the cache: A method to query and view the content of the cache: Finally the main method that execute our methods:  No special library added in the classpath for our class (json struct static defined), when it will be executed, it asks some information such as title, description,... in order to compose and send an alert to the cache and then it will perform an inquiry, to the same cache. At this point, a good exercise at this point, may be to create the same procedure using other technologies, such as a simple html page containing some JavaScript code, and then using Python, Ruby, and so on.7) Now we are ready to start configuring the Oracle Service Bus in order to integrate the two products. First integrate the internal alerting system of Oracle Service Bus with our centralized alerting system based on coherence node. This ensures that by monitoring, or directly from within our Proxy Message Flow, we can throw alerts and save them directly into the Coherence node. To do this I choose to use the jms technology, natively present inside the Oracle Weblogic / Service Bus. Access to the Oracle WebLogic Administration console and create and configure a new JMS connection factory and a new jms destination (queue). Now we should create a new resource of type “alert destination” within our Oracle Service Bus project. The new “alert destination” resource should be configured using the newly created connection factory jms and jms destination. Finally, in order to withdraw the message alert enqueued in our JMS destination and send it to our coherence node, we just need to create a new business service and proxy service within our Oracle Service Bus project.Our business service is responsible for sending a message to our REST service Coherence using as a method action: PUT Finally our proxy service have to collect all messages enqueued on the destination, execute an xquery transformation on those messages  in order to translate them into valid XML / alert objects useful to be sent to our coherence service, through the newly created business service. The message flow pipeline containing the xquery transformation: Incredibly,  we just did a basic first integration between the native alerting system of Oracle Service Bus and our centralized alerting system by simply configuring our coherence node without developing anything.It's time to test it out. To do this I create a proxy service able to generate an alert using our "alert destination", whenever the proxy is invoked. After some invocation to our proxy that generates fake alerts, we could open an Internet browser and type the URL  http://localhost: 8080/alerts/  so we could see what has been inserted within the coherence node. 8) We are ready for the final step.  We would create a new message flow, that can be used to search and display the results in standard mode. To do this I choosen the standard representation of RSS, to display a formatted result on a huge variety of devices such as readers for the iPhone and Android. The inquiry may be defined already at the time of the request able to return only feed / items related to our needs. To do this we need to create a new business service, a new proxy service, and finally a new XQuery Transformation to take care of translating the collection of alerts that will be return from our coherence node in a nicely formatted RSS standard document.So we start right from this resource (xquery), which has the task of transforming a collection of alerts / xml returned from the node coherence in a type well-formatted feed RSS 2.0 our new business service that will search the alerts on our coherence node using the Rest API. And finally, our last resource, the proxy service that will be exposed as an RSS / feeds to various mobile devices and traditional web readers, in which we will intercept any search query, and transform the result returned by the business service in an RSS feed 2.0. The message flow with the transformation phase (Alert TO Feed Items): Finally some little tricks to follow during the routing to the business service, - check for any queries present in the url to require a subset of alerts  - the http header "Accept" to help get an answer XML instead of JSON: In our little demo we also static added some coherence parameters to the request:sort=time:desc;start=0;count=100I would like to get from Coherence that the results will be sorted by date, and starting from 1 up to a maximum of 100.Done!!Just incredible, our centralized alerting system is ready. Inheriting all the qualities and capabilities of the two products involved Oracle Coherence & Oracle Service Bus: - RASP (Reliability, Availability, Scalability, Performance)Now try to use your mobile device, or a normal Internet browser by accessing the RSS just published: Some urls you may test: Search for the last 100 alerts : http://localhost:7001/alarmsSearch for alerts that do not have time set to null (time is not null):http://localhost:7001/alarms?q=time+is+not+nullSearch for alerts that the system property is “Web Browser” (system = ‘Web Browser’):http://localhost:7001/alarms?q=system+%3D+%27Web+Browser%27Search for alerts that the system property is “Web Browser” and the severity property is “Fatal” and the title property contain the word “Javascript”  (system = ‘Web Broser’ and severity = ‘Fatal’ and title like ‘%Javascript%’)http://localhost:8080/alerts?q=system+%3D+%27Web+Browser%27+AND+severity+%3D+%27Fatal%27+AND+title+LIKE+%27%25Javascript%25%27 To compose more complex queries about your need I would suggest you to read the chapter in the coherence documentation inherent the Cohl language (Coherence Query Language) http://download.oracle.com/docs/cd/E24290_01/coh.371/e22837/api_cq.htm . Some useful links: - Oracle Coherence REST API Documentation http://download.oracle.com/docs/cd/E24290_01/coh.371/e22839/rest_intro.htm - Oracle Service Bus Documentation http://download.oracle.com/docs/cd/E21764_01/soa.htm#osb - REST explanation from Wikipedia http://en.wikipedia.org/wiki/Representational_state_transfer At this URL could be downloaded the whole materials of this demo http://blogs.oracle.com/slc/resource/cosb/coh-sb-demo.zip Author: Nino Guarnacci.

    Read the article

  • How Do You Actually Model Data?

    Since the 1970’s Developers, Analysts and DBAs have been able to represent concepts and relations in the form of data through the use of generic symbols.  But what is data modeling?  The first time I actually heard this term I could not understand why anyone would want to display a computer on a fashion show runway. Hey, what do you expect? At that time I was a freshman in community college, and obviously this was a long time ago.  I have since had the chance to learn what data modeling truly is through using it. Data modeling is a process of breaking down information and/or requirements in to common categories called objects. Once objects start being defined then relationships start to form based on dependencies found amongst other existing objects.  Currently, there are several tools on the market that help data designer actually map out objects and their relationships through the use of symbols and lines.  These diagrams allow for designs to be review from several perspectives so that designers can ensure that they have the optimal data design for their project and that the design is flexible enough to allow for potential changes and/or extension in the future. Additionally these basic models can always be further refined to show different levels of details depending on the target audience through the use of three different types of models. Conceptual Data Model(CDM)Conceptual Data Models include all key entities and relationships giving a viewer a high level understanding of attributes. Conceptual data model are created by gathering and analyzing information from various sources pertaining to a project during the typical planning phase of a project. Logical Data Model (LDM)Logical Data Models are conceptual data models that have been expanded to include implementation details pertaining to the data that it will store. Additionally, this model typically represents an origination’s business requirements and business rules by defining various attribute data types and relationships regarding each entity. This additional information can be directly translated to the Physical Data Model which reduces the actual time need to implement it. Physical Data Model(PDMs)Physical Data Model are transformed Logical Data Models that include the necessary tables, columns, relationships, database properties for the creation of a database. This model also allows for considerations regarding performance, indexing and denormalization that are applied through database rules, data integrity. Further expanding on why we actually use models in modern application/database development can be seen in the benefits that data modeling provides for data modelers and projects themselves, Benefits of Data Modeling according to Applied Information Science Abstraction that allows data designers remove concepts and ideas form hard facts in the form of data. This gives the data designers the ability to express general concepts and/or ideas in a generic form through the use of symbols to represent data items and the relationships between the items. Transparency through the use of data models allows complex ideas to be translated in to simple symbols so that the concept can be understood by all viewpoints and limits the amount of confusion and misunderstanding. Effectiveness in regards to tuning a model for acceptable performance while maintaining affordable operational costs. In addition it allows systems to be built on a solid foundation in terms of data. I shudder at the thought of a world without data modeling, think about it? Data is everywhere in our lives. Data modeling allows for optimizing a design for performance and the reduction of duplication. If one was to design a database without data modeling then I would think that the first things to get impacted would be database performance due to poorly designed database and there would be greater chances of unnecessary data duplication that would also play in to the excessive query times because unneeded records would need to be processed. You could say that a data designer designing a database is like a box of chocolates. You will never know what kind of database you will get until after it is built.

    Read the article

  • maintaining a growing, diverse codebase with continuous integration

    - by Nate
    I am in need of some help with philosophy and design of a continuous integration setup. Our current CI setup uses buildbot. When I started out designing it, I inherited (well, not strictly, as I was involved in its design a year earlier) a bespoke CI builder that was tailored to run the entire build at once, overnight. After a while, we decided that this was insufficient, and started exploring different CI frameworks, eventually choosing buildbot. One of my goals in transitioning to buildbot (besides getting to enjoy all the whiz-bang extras) was to overcome some of the inadequacies of our bespoke nightly builder. Humor me for a moment, and let me explain what I have inherited. The codebase for my company is almost 150 unique c++ Windows applications, each of which has dependencies on one or more of a dozen internal libraries (and many on 3rd party libraries as well). Some of these libraries are interdependent, and have depending applications that (while they have nothing to do with each other) have to be built with the same build of that library. Half of these applications and libraries are considered "legacy" and unportable, and must be built with several distinct configurations of the IBM compiler (for which I have written unique subclasses of Compile), and the other half are built with visual studio. The code for each compiler is stored in two separate Visual SourceSafe repositories (which I am simply handling using a bunch of ShellCommands, as there is no support for VSS). Our original nightly builder simply took down the source for everything, and built stuff in a certain order. There was no way to build only a single application, or pick a revision, or to group things. It would launched virtual machines to build a number of the applications. It wasn't very robust, it wasn't distributable. It wasn't terribly extensible. I wanted to be able to overcame all of these limitations in buildbot. The way I did this originally was to create entries for each of the applications we wanted to build (all 150ish of them), then create triggered schedulers that could build various applications as groups, and then subsume those groups under an overall nightly build scheduler. These could run on dedicated slaves (no more virtual machine chicanery), and if I wanted I could simply add new slaves. Now, if we want to do a full build out of schedule, it's one click, but we can also build just one application should we so desire. There are four weaknesses of this approach, however. One is our source tree's complex web of dependencies. In order to simplify config maintenace, all builders are generated from a large dictionary. The dependencies are retrieved and built in a not-terribly robust fashion (namely, keying off of certain things in my build-target dictionary). The second is that each build has between 15 and 21 build steps, which is hard to browse and look at in the web interface, and since there are around 150 columns, takes forever to load (think from 30 seconds to multiple minutes). Thirdly, we no longer have autodiscovery of build targets (although, as much as one of my coworkers harps on me about this, I don't see what it got us in the first place). Finally, aformentioned coworker likes to constantly bring up the fact that we can no longer perform a full build on our local machine (though I never saw what that got us, either, considering that it took three times as long as the distributed build; I think he is just paranoically phobic of ever breaking the build). Now, moving to new development, we are starting to use g++ and subversion (not porting the old repository, mind you - just for the new stuff). Also, we are starting to do more unit testing ("more" might give the wrong picture... it's more like any), and integration testing (using python). I'm having a hard time figuring out how to fit these into my existing configuration. So, where have I gone wrong philosophically here? How can I best proceed forward (with buildbot - it's the only piece of the puzzle I have license to work on) so that my configuration is actually maintainable? How do I address some of my design's weaknesses? What really works in terms of CI strategies for large, (possibly over-)complex codebases?

    Read the article

  • Data Structures usage and motivational aspects

    - by Aubergine
    For long student life I was always wondering why there are so many of them yet there seems to be lack of usage at all in many of them. The opinion didn't really change when I got a job. We have brilliant books on what they are and their complexities, but I never encounter resources which would actually give a good hint of practical usage. I perfectly understand that I have to look at problem , analyse required operations, look for data structure that does them efficiently. However in practice I never do that, not because of human laziness syndrome, but because when it comes to work I acknowledge time priority over self-development. Over time I thought that when I would be better developer I will automatically use more of them - that didn't happen at all or maybe I just didn't. Then I found that the colleagues usually in the same plate as me - knowing more or less some three of data structures and being totally happy about it and refusing to discuss this matter further with me, coming back to conversations about 'cool new languages' 'libraries that do jobs for you' and the joy to work under scrumban etc. I am stuck with ArrayLists, Arrays and SortedMap , which no matter what I do always suffice or either I tweak them to be capable of fulfilling my task. Yes, it might be inefficient but do we really have to care if Intel increases performance over years no matter if we improve our skills? Does new Xeon or IBM machines really care what we use? What if I like build things, but I am not particularly excited whether it is n log(n) or just n? Over twenty years the processing power increased enormously, which gives us freedom of not being critical about which one to use? On top of that new more optimized languages appear which support multiple cores more efficiently. To be more specific: I would like to find motivational material on complex real areas/cases of possible effective usages of data structures. I would be really grateful if you would provide relevant resources. There is similar question ,but in the end the links again mostly describe or do dumb example(vehicles, students or holy grail quest - yes, very relevant) them and people keep referring to the "scenario decides the data structure to use". I want to know these complex scenarios to be able to identify similarities to my scenario and then use them. The complex scenarios where it really matters and not necessarily of quantitive nature. It seems that data structures only concern is efficiency and nothing else? There seems to be no particular convenience for developer in use one over another. (only when I found scientific resources on why exactly simple carbohydrates are evil I stopped eating sugar and candies completely replacing it with less harmful fruits - I hope you can see the analogy)

    Read the article

  • Unleash AutoVue on Your Unmanaged Data

    - by [email protected]
    Over the years, I've spoken to hundreds of customers who use AutoVue to collaborate on their "managed" data stored in content management systems, product lifecycle management systems, etc. via our many integrations. Through these conversations I've also learned a harsh reality - we will never fully move away from unmanaged data (desktops, file servers, emails, etc). If you use AutoVue today you already know that even if your primary use is viewing content stored in a content management system, you can still open files stored locally on your computer. But did you know that AutoVue actually has - built-in - a great solution for viewing, printing and redlining your data stored on file servers? Using the 'Server protocol' you can point AutoVue directly to a top-level location on any networked file server and provide your users with a link or shortcut to access an interface similar to the sample page shown below. Many customers link to pages just like this one from their internal company intranets. Through this webpage, users can easily search and browse through file server data with a 'click-and-view' interface to find the specific image, document, drawing or model they're looking for. Any markups created on a document will be accessible to everyone else viewing that document and of course real-time collaboration is supported as well. Customers on maintenance can consult the AutoVue Admin guide or My Oracle Support Doc ID 753018.1 for an introduction to the server protocol. Contact your local AutoVue Solutions Consultant for help setting up the sample shown above.

    Read the article

  • How To Run integrational Tests

    - by Vladimir
    In our project we have a plenty of Unit Tests. They help to keep project rather well-tested. Besides them we have a set of tests which are unit tests, but depends on some kind of external resource. We call them external tests. They can access web-service sometimes or similar. While unit tests is easy to run the integrational tests couldn't pass sometimes - for example due to timeout error. Also these tests can take too much time to run. Currently we keep integration/external unit tests just to run them when developing corresponding functionality. For plain unit tests we use TeamCIty for continuous integration. How do you run the integration unit tests and when do you run them?

    Read the article

  • SQL SERVER – Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 – Introduction to DQS

    - by pinaldave
    Data Quality Services is a very important concept of SQL Server. I have recently started to explore the same and I am really learning some good concepts. Here are two very important blog posts which one should go over before continuing this blog post. Installing Data Quality Services (DQS) on SQL Server 2012 Connecting Error to Data Quality Services (DQS) on SQL Server 2012 This article is introduction to Data Quality Services for beginners. We will be using an Excel file Click on the image to enlarge the it. In the first article we learned to install DQS. In this article we will see how we can learn about building Knowledge Base and using it to help us identify the quality of the data as well help correct the bad quality of the data. Here are the two very important steps we will be learning in this tutorial. Building a New Knowledge Base  Creating a New Data Quality Project Let us start the building the Knowledge Base. Click on New Knowledge Base. In our project we will be using the Excel as a knowledge base. Here is the Excel which we will be using. There are two columns. One is Colors and another is Shade. They are independent columns and not related to each other. The point which I am trying to show is that in Column A there are unique data and in Column B there are duplicate records. Clicking on New Knowledge Base will bring up the following screen. Enter the name of the new knowledge base. Clicking NEXT will bring up following screen where it will allow to select the EXCE file and it will also let users select the source column. I have selected Colors and Shade both as a source column. Creating a domain is very important. Here you can create a unique domain or domain which is compositely build from Colors and Shade. As this is the first example, I will create unique domain – for Colors I will create domain Colors and for Shade I will create domain Shade. Here is the screen which will demonstrate how the screen will look after creating domains. Clicking NEXT it will bring you to following screen where you can do the data discovery. Clicking on the START will start the processing of the source data provided. Pre-processed data will show various information related to the source data. In our case it shows that Colors column have unique data whereas Shade have non-unique data and unique data rows are only two. In the next screen you can actually add more rows as well see the frequency of the data as the values are listed unique. Clicking next will publish the knowledge base which is just created. Now the knowledge base is created. We will try to take any random data and attempt to do DQS implementation over it. I am using another excel sheet here for simplicity purpose. In reality you can easily use SQL Server table for the same. Click on New Data Quality Project to see start DQS Project. In the next screen it will ask which knowledge base to use. We will be using our Colors knowledge base which we have recently created. In the Colors knowledge base we had two columns – 1) Colors and 2) Shade. In our case we will be using both of the mappings here. User can select one or multiple column mapping over here. Now the most important phase of the complete project. Click on Start and it will make the cleaning process and shows various results. In our case there were two columns to be processed and it completed the task with necessary information. It demonstrated that in Colors columns it has not corrected any value by itself but in Shade value there is a suggestion it has. We can train the DQS to correct values but let us keep that subject for future blog posts. Now click next and keep the domain Colors selected left side. It will demonstrate that there are two incorrect columns which it needs to be corrected. Here is the place where once corrected value will be auto-corrected in future. I manually corrected the value here and clicked on Approve radio buttons. As soon as I click on Approve buttons the rows will be disappeared from this tab and will move to Corrected Tab. If I had rejected tab it would have moved the rows to Invalid tab as well. In this screen you can see how the corrected 2 rows are demonstrated. You can click on Correct tab and see previously validated 6 rows which passed the DQS process. Now let us click on the Shade domain on the left side of the screen. This domain shows very interesting details as there DQS system guessed the correct answer as Dark with the confidence level of 77%. It is quite a high confidence level and manual observation also demonstrate that Dark is the correct answer. I clicked on Approve and the row moved to corrected tab. On the next screen DQS shows the summary of all the activities. It also demonstrates how the correction of the quality of the data was performed. The user can explore their data to a SQL Server Table, CSV file or Excel. The user also has an option to either explore data and all the associated cleansing info or data only. I will select Data only for demonstration purpose. Clicking explore will generate the files. Let us open the generated file. It will look as following and it looks pretty complete and corrected. Well, we have successfully completed DQS Process. The process is indeed very easy. I suggest you try this out yourself and you will find it very easy to learn. In future we will go over advanced concepts. Are you using this feature on your production server? If yes, would you please leave a comment with your environment and business need. It will be indeed interesting to see where it is implemented. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Business Intelligence, Data Warehousing, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Data Quality Services, DQS

    Read the article

  • Consolidate Data in Private Clouds, But Consider Security and Regulatory Issues

    - by Troy Kitch
    The January 13 webcast Security and Compliance for Private Cloud Consolidation will provide attendees with an overview of private cloud computing based on Oracle's Maximum Availability Architecture and how security and regulatory compliance affects implementations. Many organizations are taking advantage of Oracle's Maximum Availability Architecture to drive down the cost of IT by deploying private cloud computing environments that can support downtime and utilization spikes without idle redundancy. With two-thirds of sensitive and regulated data in organizations' databases private cloud database consolidation means organizations must be more concerned than ever about protecting their information and addressing new regulatory challenges. Join us for this webcast to learn about greater risks and increased threats to private cloud data and how Oracle Database Security Solutions can assist in securely consolidating data and meet compliance requirements. Register Now.

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >