Search Results

Search found 59554 results on 2383 pages for 'distributed data mining'.

Page 4/2383 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

Distributed transactions

- by javi

Hello! I've a question regarding distributed transactions. Let's assume I have 3 transaction programs: Transaction A begin a=read(A) b=read(B) c=a+b write(C,c) commit Transaction B begin a=read(A) a=a+1 write(A,a) commit Transaction C begin c=read(C) c=c*2 write(A,c) commit So there are 5 pairs of critical operations: C2-A5, A2-B4, B4-C4, B2-C4, A2-C4. I should ensure integrity and confidentiality, do you have any idea of how to achieve it? Thank you in advance!

Read the article
How to use data mining principles in this project?

- by Simon

I'm getting a Data Mining class this semester and we are free for the final project. For a few months I'm working on procedural planets rendering (something like this http://www.youtube.com/watch?v=rL8zDgTlXso). Do you have any idea of which data mining principles I could use to keep working this project ? Maybe I could try to generate interesting terrains from a set of real maps ? Any publications on that subject ? Any other ideas ?

Read the article
Abstract Data Type and Data Structure

- by mark075

It's quite difficult for me to understand these terms. I searched on google and read a little on Wikipedia but I'm still not sure. I've determined so far that: Abstract Data Type is a definition of new type, describes its properties and operations. Data Structure is an implementation of ADT. Many ADT can be implemented as the same Data Structure. If I think right, array as ADT means a collection of elements and as Data Structure, how it's stored in a memory. Stack is ADT with push, pop operations, but can we say about stack data structure if I mean I used stack implemented as an array in my algorithm? And why heap isn't ADT? It can be implemented as tree or an array.

Read the article
Advanced Analytics Oracle Data Mining - NEW 2-Day Training Course

- by Mike.Hallett(at)Oracle-BI&EPM

A NEW 2-Day Oracle University (OU) Instructor Led Course on Oracle Data Mining has been developed for partners and customers to learn more about data mining, predictive analytics and knowledge discovery inside the Oracle Database. Oracle Data Mining, provides data mining algorithms that run native for high performance in-database model building and model deployment. This OU course is a great way to learn the advantages and benefits of "big data analytics"; mining data, building and deploying "predictive analytics" all inside the Oracle Database and to work with OBI. To register for a class, click here, then click on View Schedule to see the latest scheduled classes and/or submit your information expressing interest in attending a class.

Read the article
First Look - Oracle Data Mining

- by kimberly.billings

In his blog, JT on EDM, James Taylor shares his analysis of Oracle Data Mining, including its new GUI and Exadata integration. While Oracle Data Mining has been available for a while, it is now easier to access and try via the Amazon Cloud. Using the Oracle 11gR2 Data Mining Amazon Machine Image (AMI), you can launch an Oracle Data Mining-enabled instance directly through Amazon Web Services (AWS) and connect to it using the Oracle Data Miner graphical user interface. The new Oracle Data Mining GUI, which will be available to beta customers soon, provides more graphics, the ability to define, save and share analytical "work flows" to solve business problems, and provides more automation and simplicity. Taylor comments that, "the UI looks to have a nice look and feel including graphical model development flows, easy access to the data, nice little micro graphs when browsing data records and more." On using Oracle Data Mining with Exadata, Taylor writes, "Oracle says that the use of the ODM routines in the Exadata kernel is faster than running a native ODM model in the database by a factor of 2 and that this increases as more joins are used. This could mean that ODM outperforms even third party in-database analytics." Taylor concludes his blog with a positive overall review, stating that "ODM is a nice product for Oracle database customers and well worth looking into. The new UI will only make it more so." Read the blog. var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); try { var pageTracker = _gat._getTracker("UA-13185312-1"); pageTracker._trackPageview(); } catch(err) {}

Read the article
Big Data – Various Learning Resources – How to Start with Big Data? – Day 20 of 21

- by Pinal Dave

In yesterday’s blog post we learned how to become a Data Scientist for Big Data. In this article we will go over various learning resources related to Big Data. In this series we have covered many of the most essential details about Big Data. At the beginning of this series, I have encouraged readers to send me questions. One of the most popular questions is - “I want to learn more about Big Data. Where can I learn it?” This is indeed a great question as there are plenty of resources out to learn about Big Data and it is indeed difficult to select on one resource to learn Big Data. Hence I decided to write here a few of the very important resources which are related to Big Data. Learn from Pluralsight Pluralsight is a global leader in high-quality online training for hardcore developers. It has fantastic Big Data Courses and I started to learn about Big Data with the help of Pluralsight. Here are few of the courses which are directly related to Big Data. Big Data: The Big Picture Big Data Analytics with Tableau NoSQL: The Big Picture Understanding NoSQL Data Analysis Fundamentals with Tableau I encourage all of you start with this video course as they are fantastic fundamentals to learn Big Data. Learn from Apache Resources at Apache are single point the most authentic learning resources. If you want to learn fundamentals and go deep about every aspect of the Big Data, I believe you must understand various concepts in Apache’s library. I am pretty impressed with the documentation and I am personally referencing it every single day when I work with Big Data. I strongly encourage all of you to bookmark following all the links for authentic big data learning. Haddop - The Apache Hadoop® project develops open-source software for reliable, scalable, distributed computing. Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which include support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heat maps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner. Avro: A data serialization system. Cassandra: A scalable multi-master database with no single points of failure. Chukwa: A data collection system for managing large distributed systems. HBase: A scalable, distributed database that supports structured data storage for large tables. Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout: A Scalable machine learning and data mining library. Pig: A high-level data-flow language and execution framework for parallel computation. ZooKeeper: A high-performance coordination service for distributed applications. Learn from Vendors One of the biggest issues with about learning Big Data is setting up the environment. Every Big Data vendor has different environment request and there are lots of things require to set up Big Data framework. Many of the users do not start with Big Data as they are afraid about the resources required to set up framework as well as a time commitment. Here Hortonworks have created fantastic learning environment. They have created Sandbox with everything one person needs to learn Big Data and also have provided excellent tutoring along with it. Sandbox comes with a dozen hands-on tutorial that will guide you through the basics of Hadoop as well it contains the Hortonworks Data Platform. I think Hortonworks did a fantastic job building this Sandbox and Tutorial. Though there are plenty of different Big Data Vendors I have decided to list only Hortonworks due to their unique setup. Please leave a comment if there are any other such platform to learn Big Data. I will include them over here as well. Learn from Books There are indeed few good books out there which one can refer to learn Big Data. Here are few good books which I have read. I will update the list as I will learn more. Ethics of Big Data Balancing Risk and Innovation Big Data for Dummies Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions If you search on Amazon there are millions of the books but I think above three books are a great set of books and it will give you great ideas about Big Data. Once you go through above books, you will have a clear idea about what is the next step you should follow in this series. You will be capable enough to make the right decision for yourself. Tomorrow In tomorrow’s blog post we will wrap up this series of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Data Storage Options

- by Kenneth

When I was working as a website designer/engineer I primarily used databases for storage of much of my dynamic data. It was very easy and convenient to use this method and seemed like a standard practice from my research on the matter. I'm now working on shifting away from websites and into desktop applications. What are the best practices for data storage for desktop applications? I ask because I have noticed that most programs I use on a personal level don't appear to use a database for data storage unless its embedded in the program. (I'm not thinking of an application like a word processor where it makes sense to have data stored in individual files as defined by the user. Rather I'm thinking of something more along the lines of a calendar application which would need to store dates and event info and such where accessing that information would be much easier if stored in a database... at least as far as my experience would indicate.) Thanks for the input!

Read the article
What is a Data Warehouse?

Typically Data Warehouses are considered to be non-volatile in comparison to traditional databasesdue to the fact that data within the warehouse does not change that often. In addition, Data Warehouses typically represent data through the use of Multidimensional Conceptual Views that allow data to be extracted based on the view and the current position within the view. Common Data Warehouse Traits Relatively Non-volatile Data Supports Data Extraction and Analysis Optimized for Data Retrieval and Analysis Multidimensional Views of Data Flexible Reporting Multi User Support Generic Dimensionality Transparent Accessible Unlimited Dimensions of Data Unlimited Aggregation levels of Data Normally, Data Warehouses are much larger then there traditional database counterparts due to the fact that they store the basis data along with derived data via Multidimensional Conceptual Views. As companies store larger and larger amounts of data, they will need a way to effectively and accurately extract analysis information that can be used to aide in formulating current and future business decisions. This process can be done currently through data mining within a Data Warehouse. Data Warehouses provide access to data derived through complex analysis, knowledge discovery and decision making. Secondly, they support the demands for high performance in regards to analyzing an organization’s existing and current data. Data Warehouses provide support for an organization’s data and acquired business knowledge. Within a Data Warehouse multiple types of operations/sub systems are supported. Common Data Warehouse Sub Systems Online Analytical Processing (OLAP) Decision –Support Systems (DSS) Online Transaction Processing (OLTP)

Read the article
Interconnect nodes in a Java distributed infrastructure for tweet processing

- by David Moreno García

I'm working in a new version of an old project that I used to download and process user statuses from Twitter. The main problem of that project was its infrastructure. I used multiple instances of a java application (trackers) to download from Twitter given an specific task (basically terms to search for), connected with a central node (a web application) that had to process all tweets once per day and generate a new task for each trackers once each 15 minutes. The central node also had to monitor all trackers and enable/disable them under user petition. This, as I said, was too slow because I had multiple bottlenecks, so in this new version I want to improve the infrastructure and isolate all functionalities in specific nodes. I also need a good notification system to receive notifications for any node. So, in the next diagram I show the components that I'll need in this new version: As you can see, there are more nodes. Here are some notes about them: Dashboard: Controls trackers statuses and send a single task to each of them (under user request). The trackers will use this task until replaced with a new one (if done, not each 15 minutes like before). Search engine: I need to store all the tweets. They are firstly stored in a local database for each tracker but after that I'm thinking on using something like Elasticsearch to be able to do fast searches. Tweet processor: Just and isolated component with its own database (maybe something like the search engine to have fast access to info generated by the module). In the future more could be added. Application UI: A web application with a shared database with the Dashboard (mainly to store users information and preferences). Indeed, both could be merged into a single web. The main difference with the previous version of the project is that now they will be isolated and they will only show information and send requests. I will not do any heavy task in them (like process tweets as I did before). So, having this components, my main headache is how to structure all to not have to rewrite a lot of code every time I need to access any new data. Another headache is how can I interconnect nodes. I could use sockets but that is a pain in the ass. Maybe a REST layer? And finally, if all the nodes are isolated, how could I generate notifications for each user which info is only in the database used by the Application UI? I'm programming this using Java and Spring (at least I used them in the last version) but I have no problems with changing the language if I can take advantage of a tool/library/engine to make my life easier and have a better platform. Any comment will be appreciated.

Read the article
Does data mining qualify as an abuse?

- by Hybryd

Hi all, today I had a strange experience with my ISP. They disabled my password for internet connection, and when I called them, they enabled it again, but they didn't say why it happened. In the last couple of days I was running a data mining that I made for one forum to get some useful info about business that I'm in. So I thought, maybe my ISP figured that 10,000 page requests in couple of hours to the same site may be some kind of attack. What do you think, does it qualify as an attack? Is it even ok to data mine in that way?

Read the article
How Do You Actually Model Data?

Since the 1970’s Developers, Analysts and DBAs have been able to represent concepts and relations in the form of data through the use of generic symbols. But what is data modeling? The first time I actually heard this term I could not understand why anyone would want to display a computer on a fashion show runway. Hey, what do you expect? At that time I was a freshman in community college, and obviously this was a long time ago. I have since had the chance to learn what data modeling truly is through using it. Data modeling is a process of breaking down information and/or requirements in to common categories called objects. Once objects start being defined then relationships start to form based on dependencies found amongst other existing objects. Currently, there are several tools on the market that help data designer actually map out objects and their relationships through the use of symbols and lines. These diagrams allow for designs to be review from several perspectives so that designers can ensure that they have the optimal data design for their project and that the design is flexible enough to allow for potential changes and/or extension in the future. Additionally these basic models can always be further refined to show different levels of details depending on the target audience through the use of three different types of models. Conceptual Data Model(CDM)Conceptual Data Models include all key entities and relationships giving a viewer a high level understanding of attributes. Conceptual data model are created by gathering and analyzing information from various sources pertaining to a project during the typical planning phase of a project. Logical Data Model (LDM)Logical Data Models are conceptual data models that have been expanded to include implementation details pertaining to the data that it will store. Additionally, this model typically represents an origination’s business requirements and business rules by defining various attribute data types and relationships regarding each entity. This additional information can be directly translated to the Physical Data Model which reduces the actual time need to implement it. Physical Data Model(PDMs)Physical Data Model are transformed Logical Data Models that include the necessary tables, columns, relationships, database properties for the creation of a database. This model also allows for considerations regarding performance, indexing and denormalization that are applied through database rules, data integrity. Further expanding on why we actually use models in modern application/database development can be seen in the benefits that data modeling provides for data modelers and projects themselves, Benefits of Data Modeling according to Applied Information Science Abstraction that allows data designers remove concepts and ideas form hard facts in the form of data. This gives the data designers the ability to express general concepts and/or ideas in a generic form through the use of symbols to represent data items and the relationships between the items. Transparency through the use of data models allows complex ideas to be translated in to simple symbols so that the concept can be understood by all viewpoints and limits the amount of confusion and misunderstanding. Effectiveness in regards to tuning a model for acceptable performance while maintaining affordable operational costs. In addition it allows systems to be built on a solid foundation in terms of data. I shudder at the thought of a world without data modeling, think about it? Data is everywhere in our lives. Data modeling allows for optimizing a design for performance and the reduction of duplication. If one was to design a database without data modeling then I would think that the first things to get impacted would be database performance due to poorly designed database and there would be greater chances of unnecessary data duplication that would also play in to the excessive query times because unneeded records would need to be processed. You could say that a data designer designing a database is like a box of chocolates. You will never know what kind of database you will get until after it is built.

Read the article
Optimistic work sharing on sparsely distributed systems

- by Asti

What would a system like BOINC look like if it were written today? At the time BOINC was written, databases were the primary choice for maintaining a shared state and concurrency among nodes. Since then, many approaches have been developed for tasking with optimistic concurrency (OT, partial synchronization primitives, shared iterators etc.) Is there an optimal paradigm for optimistically distributing units of work on sparsely distributing systems which communicate through message passing? Sorry if this is a bit vague. P.S. The concept of Tuple-spaces is great, but locking is inherent to its definition. Edit: I already have a federation system which works very well. I have a reactive OT system is implemented on top of it. I'm looking to extend it to get clients to do units of work.

Read the article
Data Structures usage and motivational aspects

- by Aubergine

For long student life I was always wondering why there are so many of them yet there seems to be lack of usage at all in many of them. The opinion didn't really change when I got a job. We have brilliant books on what they are and their complexities, but I never encounter resources which would actually give a good hint of practical usage. I perfectly understand that I have to look at problem , analyse required operations, look for data structure that does them efficiently. However in practice I never do that, not because of human laziness syndrome, but because when it comes to work I acknowledge time priority over self-development. Over time I thought that when I would be better developer I will automatically use more of them - that didn't happen at all or maybe I just didn't. Then I found that the colleagues usually in the same plate as me - knowing more or less some three of data structures and being totally happy about it and refusing to discuss this matter further with me, coming back to conversations about 'cool new languages' 'libraries that do jobs for you' and the joy to work under scrumban etc. I am stuck with ArrayLists, Arrays and SortedMap , which no matter what I do always suffice or either I tweak them to be capable of fulfilling my task. Yes, it might be inefficient but do we really have to care if Intel increases performance over years no matter if we improve our skills? Does new Xeon or IBM machines really care what we use? What if I like build things, but I am not particularly excited whether it is n log(n) or just n? Over twenty years the processing power increased enormously, which gives us freedom of not being critical about which one to use? On top of that new more optimized languages appear which support multiple cores more efficiently. To be more specific: I would like to find motivational material on complex real areas/cases of possible effective usages of data structures. I would be really grateful if you would provide relevant resources. There is similar question ,but in the end the links again mostly describe or do dumb example(vehicles, students or holy grail quest - yes, very relevant) them and people keep referring to the "scenario decides the data structure to use". I want to know these complex scenarios to be able to identify similarities to my scenario and then use them. The complex scenarios where it really matters and not necessarily of quantitive nature. It seems that data structures only concern is efficiency and nothing else? There seems to be no particular convenience for developer in use one over another. (only when I found scientific resources on why exactly simple carbohydrates are evil I stopped eating sugar and candies completely replacing it with less harmful fruits - I hope you can see the analogy)

Read the article
I've totally missed the point of distributed vcs [closed]

- by NimChimpsky

I thought the major benefit of it was that each developers code gets stored within each others repository. My impression was that each developer has their working directory, their own repository, and then a copy of the other developers repository. Removing the need for central server, as you have as many backups as you have developers/repositories Turns out this is nto the case, and your code is only backed up (somewhere other than locally) when you push, the same as a commit in subversions. I am bit disappointed ... hopefully I will be pleasantly surprised when it handles merges better and there are less conflicts ?

Read the article
Unleash AutoVue on Your Unmanaged Data

- by [email protected]

Over the years, I've spoken to hundreds of customers who use AutoVue to collaborate on their "managed" data stored in content management systems, product lifecycle management systems, etc. via our many integrations. Through these conversations I've also learned a harsh reality - we will never fully move away from unmanaged data (desktops, file servers, emails, etc). If you use AutoVue today you already know that even if your primary use is viewing content stored in a content management system, you can still open files stored locally on your computer. But did you know that AutoVue actually has - built-in - a great solution for viewing, printing and redlining your data stored on file servers? Using the 'Server protocol' you can point AutoVue directly to a top-level location on any networked file server and provide your users with a link or shortcut to access an interface similar to the sample page shown below. Many customers link to pages just like this one from their internal company intranets. Through this webpage, users can easily search and browse through file server data with a 'click-and-view' interface to find the specific image, document, drawing or model they're looking for. Any markups created on a document will be accessible to everyone else viewing that document and of course real-time collaboration is supported as well. Customers on maintenance can consult the AutoVue Admin guide or My Oracle Support Doc ID 753018.1 for an introduction to the server protocol. Contact your local AutoVue Solutions Consultant for help setting up the sample shown above.

Read the article
SQL SERVER – Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 – Introduction to DQS

- by pinaldave

Data Quality Services is a very important concept of SQL Server. I have recently started to explore the same and I am really learning some good concepts. Here are two very important blog posts which one should go over before continuing this blog post. Installing Data Quality Services (DQS) on SQL Server 2012 Connecting Error to Data Quality Services (DQS) on SQL Server 2012 This article is introduction to Data Quality Services for beginners. We will be using an Excel file Click on the image to enlarge the it. In the first article we learned to install DQS. In this article we will see how we can learn about building Knowledge Base and using it to help us identify the quality of the data as well help correct the bad quality of the data. Here are the two very important steps we will be learning in this tutorial. Building a New Knowledge Base Creating a New Data Quality Project Let us start the building the Knowledge Base. Click on New Knowledge Base. In our project we will be using the Excel as a knowledge base. Here is the Excel which we will be using. There are two columns. One is Colors and another is Shade. They are independent columns and not related to each other. The point which I am trying to show is that in Column A there are unique data and in Column B there are duplicate records. Clicking on New Knowledge Base will bring up the following screen. Enter the name of the new knowledge base. Clicking NEXT will bring up following screen where it will allow to select the EXCE file and it will also let users select the source column. I have selected Colors and Shade both as a source column. Creating a domain is very important. Here you can create a unique domain or domain which is compositely build from Colors and Shade. As this is the first example, I will create unique domain – for Colors I will create domain Colors and for Shade I will create domain Shade. Here is the screen which will demonstrate how the screen will look after creating domains. Clicking NEXT it will bring you to following screen where you can do the data discovery. Clicking on the START will start the processing of the source data provided. Pre-processed data will show various information related to the source data. In our case it shows that Colors column have unique data whereas Shade have non-unique data and unique data rows are only two. In the next screen you can actually add more rows as well see the frequency of the data as the values are listed unique. Clicking next will publish the knowledge base which is just created. Now the knowledge base is created. We will try to take any random data and attempt to do DQS implementation over it. I am using another excel sheet here for simplicity purpose. In reality you can easily use SQL Server table for the same. Click on New Data Quality Project to see start DQS Project. In the next screen it will ask which knowledge base to use. We will be using our Colors knowledge base which we have recently created. In the Colors knowledge base we had two columns – 1) Colors and 2) Shade. In our case we will be using both of the mappings here. User can select one or multiple column mapping over here. Now the most important phase of the complete project. Click on Start and it will make the cleaning process and shows various results. In our case there were two columns to be processed and it completed the task with necessary information. It demonstrated that in Colors columns it has not corrected any value by itself but in Shade value there is a suggestion it has. We can train the DQS to correct values but let us keep that subject for future blog posts. Now click next and keep the domain Colors selected left side. It will demonstrate that there are two incorrect columns which it needs to be corrected. Here is the place where once corrected value will be auto-corrected in future. I manually corrected the value here and clicked on Approve radio buttons. As soon as I click on Approve buttons the rows will be disappeared from this tab and will move to Corrected Tab. If I had rejected tab it would have moved the rows to Invalid tab as well. In this screen you can see how the corrected 2 rows are demonstrated. You can click on Correct tab and see previously validated 6 rows which passed the DQS process. Now let us click on the Shade domain on the left side of the screen. This domain shows very interesting details as there DQS system guessed the correct answer as Dark with the confidence level of 77%. It is quite a high confidence level and manual observation also demonstrate that Dark is the correct answer. I clicked on Approve and the row moved to corrected tab. On the next screen DQS shows the summary of all the activities. It also demonstrates how the correction of the quality of the data was performed. The user can explore their data to a SQL Server Table, CSV file or Excel. The user also has an option to either explore data and all the associated cleansing info or data only. I will select Data only for demonstration purpose. Clicking explore will generate the files. Let us open the generated file. It will look as following and it looks pretty complete and corrected. Well, we have successfully completed DQS Process. The process is indeed very easy. I suggest you try this out yourself and you will find it very easy to learn. In future we will go over advanced concepts. Are you using this feature on your production server? If yes, would you please leave a comment with your environment and business need. It will be indeed interesting to see where it is implemented. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Business Intelligence, Data Warehousing, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Data Quality Services, DQS

Read the article
Consolidate Data in Private Clouds, But Consider Security and Regulatory Issues

- by Troy Kitch

The January 13 webcast Security and Compliance for Private Cloud Consolidation will provide attendees with an overview of private cloud computing based on Oracle's Maximum Availability Architecture and how security and regulatory compliance affects implementations. Many organizations are taking advantage of Oracle's Maximum Availability Architecture to drive down the cost of IT by deploying private cloud computing environments that can support downtime and utilization spikes without idle redundancy. With two-thirds of sensitive and regulated data in organizations' databases private cloud database consolidation means organizations must be more concerned than ever about protecting their information and addressing new regulatory challenges. Join us for this webcast to learn about greater risks and increased threats to private cloud data and how Oracle Database Security Solutions can assist in securely consolidating data and meet compliance requirements. Register Now.

Read the article
Validating Data Using Data Annotation Attributes in ASP.NET MVC

- by bipinjoshi

The data entered by the end user in various form fields must be validated before it is saved in the database. Developers often use validation HTML helpers provided by ASP.NET MVC to perform the input validations. Additionally, you can also use data annotation attributes from the System.ComponentModel.DataAnnotations namespace to perform validations at the model level. Data annotation attributes are attached to the properties of the model class and enforce some validation criteria. They are capable of performing validation on the server side as well as on the client side. This article discusses the basics of using these attributes in an ASP.NET MVC application.http://www.bipinjoshi.net/articles/0a53f05f-b58c-47b1-a544-f032f5cfca58.aspx

Read the article
Cleaning a dataset of song data - what sort of problem is this?

- by Rob Lourens

I have a set of data about songs. Each entry is a line of text which includes the artist name, song title, and some extra text. Some entries are only "extra text". My goal is to resolve as many of these as possible to songs on Spotify using their web API. My strategy so far has been to search for the entry via the API - if there are no results, apply a transformation such as "remove all text between ( )" and search again. I have a list of heuristics and I've had reasonable success with this but as the code gets more and more convoluted I keep thinking there must be a more generic and consistent way. I don't know where to look - any suggestions for what to try, topics to study, buzzwords to google?

Read the article
Test Data in a Distributed System

- by Davin Tryon

A question that has been vexing me lately has been about how to effectively test (end-to-end) features in a distributed system. Particuarly, how to effectively manage (through time) test data for feature testing. The system in question is a typical SOA setup. The composition is done in JavaScript when call to several REST APIs. Each service is built as an independent block. Each service has some kind of persistent storage (SQL Server in most cases). The main issue at the moment is how to approach test data when testing end-to-end features. Functional end-to-end testing occurs through the UI, and it is therefore necessary for test data to be set up before the test run (this could be manual or automated testing). As is typical in a distributed system, identifiers from one service are used as a link in another service. So, some level of synchronization needs to be present in the data to effectively test. What is the best way to manage and set up this data after a successful deployment to a test environment? For example, is it better to manage this test data inside each service? Or package it together with the testing suite? Does that testing suite exist as a separate project? I'm interested in design guidance about how to store and manage this test data as the application features evolve.

Read the article
Web services, J2EE, Spring, DB integration project ideas - maybe data mining related?

- by saral jain

I am a graduate Computer Science student (Data Mining and Machine Learning) and have good exposure to core Java (3 years). I have read up on a bunch of stuff on the following topics: Design patterns, J2EE Web services (SOAP and REST), Spring, and Hibernate Java Concurrency - advanced features like Task and Executors. I would now like to do a project combining this stuff -- over my free time of course -- to get a better understanding of these things and to kind of make an end to end software (to learn the best design principles etc + SVN, maven). Any good project ideas would be really appreciated. I just want to build this stuff to learn, so I don't really mind re-inventing the wheel. Also, anything related to data mining would be an added bonus as it fits with my research but is absolutely not necessary since this project is more to learn to do large scale software development.

Read the article
Move data from others user accounts in my user account

- by user118136

I had problems with compiz setting and I make multiple accounts, now I want to transfer my information from all deleted users in my current account, some data I can not copy because I am not right to read, I type in terminal "sudo nautilus" and I get the permission for read, but the copied data is available only for superusers and I must charge the permissions for each file and each folder. How I can copy the information with out the superuser rights OR how I can charge the permissions for selected folder and all files and folders included in it?

Read the article
Web services, Java EE, Spring, DB integration project ideas - maybe data mining related?

- by saral jain

I am a graduate Computer Science student (Data Mining and Machine Learning) and have good exposure to core Java (3 years). I have read up on a bunch of stuff on the following topics: Design patterns, Java EE Web services (SOAP and REST), Spring, and Hibernate Java Concurrency - advanced features like Task and Executors. I would now like to do a project combining this stuff -- over my free time of course -- to get a better understanding of these things and to kind of make an end to end software (to learn the best design principles etc + SVN, maven). Any good project ideas would be really appreciated. I just want to build this stuff to learn, so I don't really mind re-inventing the wheel. Also, anything related to data mining would be an added bonus as it fits with my research but is absolutely not necessary since this project is more to learn to do large scale software development.

Read the article
web services, J2EE, spring, DB integration project ideas- maybe data mining related?

- by sj88

Hey guys, I am a graduate CS student (Data mining and machine learning) and have a good exposure to core JAVA (3 years). I have read up a bunch of stuff on Design patterns J2EE Web services( soap and rest) spring and hibernate Java Concurrency - advanced features like Task and Executors. I would now like to do a project combining this stuff (over my free time of corse) to get a better understanding of these things and to kind of make an end to end software (to learn the best design principles etc + svn, maven). Any good project ideas would be really appreciated. I just wanna build this stuff to learn so I dont really mind re-inventing the wheel. Also, anything related to data mining would be an added bonus (fits with my research) but absolutly not necesary (since this project is more to learn to do large scale software developement)

Read the article
Data Mining Software

- by Mark

I want to harvest some data like this http://www.newcardealers.ca/en/Dealers/List-A.aspx And insert the name, address, phone number, email, etc. into a database. Is there some software I can use that will take a webpage, let me specify some regexes or something, and then spit out all the matched data in a CSV or some format easily insertable into a DB?

Read the article

Search Results

Search found 59554 results on 2383 pages for 'distributed data mining'.

Page 4/2383 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by javi

- by Simon

- by mark075

- by Mike.Hallett(at)Oracle-BI&EPM

- by kimberly.billings

- by Pinal Dave

- by Kenneth

- by David Moreno García

- by Hybryd

- by Asti

- by Aubergine

- by NimChimpsky

- by [email protected]

- by pinaldave

- by Troy Kitch

- by bipinjoshi

- by Rob Lourens

- by Davin Tryon

- by saral jain

- by user118136

- by saral jain

- by sj88

- by Mark

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >