Search Results

Search found 324 results on 13 pages for 'mining'.


Use Access 2007 to Get Started in Data Mining

Learn how you can use Microsoft Access 2007 as a basic data mining tool for exploring your valuable data. This article illustrates how data filters, pivot graphs, queries in graphs and filters in reports can help this cause.

Data Mining Introduction

Many people who work with SQL Server for years never use its data mining features. This article aims to introduce them to this magical and exciting new world.

Introduction to the SQL Server Analysis Services Neural Network Data Mining Algorithm

In data mining and machine learning circles, the neural network is one of the most difficult algorithms to explain. Fortunately, SQL Server Analysis Services allows for a simple implementation of the algorithm for data analytics. Dallas Snider explains.

Information Extraction Toolkits

- by MathGladiator

I'm looking for information extraction libraries that can handle semi-structured information in which some data may be hidden or incomplete. I want to train some classifiers to pull out content based on the structure. I'm working on building a tool where I can select text in the browser, and it will generate (via some web service call) a classifier that can be used on other documents to pull out text. I'm primarily looking at how the structure of the document can be used to indicate what the content is.

Indexing and Searching Over Word Level Annotation Layers in Lucene

- by dmcer

I have a data set with multiple layers of annotation over the underlying text, such as part-of-speech tags, chunks from a shallow parser, named entities, and others from various natural language processing (NLP) tools. For a sentence like "The man went to the store", the annotations might look like:

    Word   POS  Chunk  NER
    ====   ===  =====  ========
    The    DT   NP     Person
    man    NN   NP     Person
    went   VBD  VP     -
    to     TO   PP     -
    the    DT   NP     Location
    store  NN   NP     Location

I'd like to index a bunch of documents with annotations like these using Lucene and then perform searches across the different layers. An example of a simple query would be to retrieve all documents where Washington is tagged as a person. While I'm not absolutely committed to the notation, syntactically end-users might enter the query as follows:

    Query: Word=Washington,NER=Person

I'd also like to do more complex queries involving the sequential order of annotations across different layers, e.g. find all the documents where there's a word tagged person, followed by the words "arrived at", followed by a word tagged location. Such a query might look like:

    Query: "NER=Person Word=arrived Word=at NER=Location"

What's a good way to go about approaching this with Lucene? Is there any way to index and search over document fields that contain structured tokens?
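One possible way to think about the question above is to store every annotation layer as an extra term at the same token position as the word it annotates. Lucene allows several tokens to share a position (a position increment of 0, the mechanism used for synonyms), so this idea should carry over, but the sketch below is plain Python and does not use the Lucene API. The function names and the sample sentence are hypothetical and exist only for illustration.

```python
from collections import defaultdict

def index_sentence(rows):
    """Build a positional index: term -> set of token positions.

    Each row is (word, pos_tag, chunk, ner). Every layer becomes a
    'Layer=value' term stored at the same position as the word.
    """
    index = defaultdict(set)
    for position, (word, pos, chunk, ner) in enumerate(rows):
        layers = (("Word", word), ("POS", pos), ("Chunk", chunk), ("NER", ner))
        for layer, value in layers:
            index[f"{layer}={value}"].add(position)
    return index

def match_all(index, terms):
    """Positions where all terms co-occur, e.g. Word=Washington,NER=Person."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()

def match_sequence(index, terms):
    """Start positions where the terms occur at consecutive positions."""
    if not terms:
        return set()
    starts = set(index.get(terms[0], set()))
    for offset, term in enumerate(terms[1:], start=1):
        positions = index.get(term, set())
        starts = {s for s in starts if s + offset in positions}
    return starts

# Hypothetical annotated sentence: "Washington arrived at Boston"
sentence = [
    ("Washington", "NNP", "NP", "Person"),
    ("arrived", "VBD", "VP", "-"),
    ("at", "IN", "PP", "-"),
    ("Boston", "NNP", "NP", "Location"),
]
idx = index_sentence(sentence)
same_token = match_all(idx, ["Word=Washington", "NER=Person"])
phrase = match_sequence(
    idx, ["NER=Person", "Word=arrived", "Word=at", "NER=Location"]
)
print(same_token, phrase)
```

Because all layers share one position space, the "same token" query is a set intersection and the cross-layer phrase query is a check on consecutive positions, which is the behaviour the two example queries ask for.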

choose the best class if 2 class have same P (c|d), naive bayes

- by ryandi

Hello, I have some questions about the naive Bayes classifier. In my project I have to classify a text into one of 4 available classes. In naive Bayes we have a formula like c_MAP = argmax_c P(d|c) P(c). I have standardized the number of training documents for each class, so I get the same P(c) value for every class (0.25). Here are my questions: What if a test document contains no token that belongs to any of those 4 classes (in the training documents)? As a result, all of the classes have the same value of P(d|c) P(c). Which class should I pick? And what if the tokens do exist, but 2 or more classes have the same value of P(d|c) P(c)? What should I do? Thank you.
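To make the tie situation concrete, here is a minimal sketch of a multinomial naive Bayes scorer with add-one (Laplace) smoothing that returns every class sharing the top score instead of silently picking one. It is an illustration under stated assumptions, not an answer from the original thread: the training data, the function names, and the choice to skip tokens never seen in training are all hypothetical.

```python
import math
from collections import Counter

def train(docs_by_class):
    """docs_by_class maps a class name to a list of token lists."""
    vocab = {t for docs in docs_by_class.values() for doc in docs for t in doc}
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    model = {}
    for cls, docs in docs_by_class.items():
        counts = Counter(t for doc in docs for t in doc)
        model[cls] = {
            "log_prior": math.log(len(docs) / total_docs),
            "counts": counts,
            "total": sum(counts.values()),
        }
    return model, vocab

def scores(model, vocab, tokens):
    """log P(c) + sum of log P(t|c), with add-one smoothing.

    Tokens that never appeared in training are skipped (an assumption),
    so a document made only of unknown tokens is scored by the prior alone.
    """
    result = {}
    for cls, m in model.items():
        score = m["log_prior"]
        for t in tokens:
            if t in vocab:
                score += math.log((m["counts"][t] + 1) / (m["total"] + len(vocab)))
        result[cls] = score
    return result

def best_classes(score_by_class, tol=1e-12):
    """Return all classes whose score is within tol of the maximum."""
    top = max(score_by_class.values())
    return sorted(c for c, s in score_by_class.items() if abs(s - top) <= tol)

# Hypothetical training set with the same number of documents per class.
training = {
    "sports": [["goal", "match"], ["team", "goal"]],
    "tech": [["code", "bug"], ["code", "server"]],
}
model, vocab = train(training)
known = best_classes(scores(model, vocab, ["goal"]))
unknown = best_classes(scores(model, vocab, ["zzz"]))
print(known, unknown)
```

With equal priors, a document containing only unseen tokens ties across every class, so the classifier has no evidence to prefer one. Returning the full tied set lets the caller decide on a policy, such as abstaining, choosing at random, or falling back to a fixed default class.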

Using The Data Mining Query Task in SSIS

SQL Server Integration Services (SSIS) is a Business Intelligence tool that database developers or administrators can use to perform Extract, Transform & Load (ETL) operations. In my previous article Using Analysis Services Processing Task & Analysis Services ...

Data Mining Email with Thunderbird

- by user554629

Oracle has many formal, searchable locations: Service Requests, BugIDs, Technical Documents. These contain the results of an investigation for a customer crash situation; they're created after the intense work of resolution is over, and typically contain the "root cause" of the failure ... but not the methods for identifying that cause.

Email is still the standby for interacting with quickly formed groups of specialists focusing on a particular incident: Customer BI, Network and System specialists; Oracle Tech Support, Development, Consultants; OEM Database, OS technical support. It is a chaotic, time-oriented set of configuration, call stacks, changes, and techniques to discover and repair the failure. I needed to organize that information into something cohesive to prepare the blog entry on Teradata. My corporate email client of choice is Thunderbird.

My original (flawed) search technique:

- R-click on Inbox in the Thunderbird left pane, and choose Search Messages
- Subject: [ teradata ]
- Results: a new window titled "Search Messages"
  - Single pane of selected messages
  - Column headings: Subject, From, Date, Location
  - No preview window for messages
- There are 673 email entries in the result ( too many )
- R-click the icon just above the vertical scroll bar on the right
  - Check [x] Tags
  - Click on the Tags header to sort by "Important"
- View contents of a message by double-clicking
  - Opens in the Thunderbird Main Window in a new Tab
  - Not what I was looking for, close the tab and try again.

There has to be a better way. ( and there is ) I need to be more productive, eliminating duplicate-chained messages, for example. Even the Tag "Important" that was added during the investigation phase is "not so much" for my current task.

- In the "Search Messages" window, click [ Save as Search Folder ]
- [ teradata ] appears as a new folder in my Inbox.
- Focus on that folder and the results appear with a list of messages like every other folder in the Inbox.
  - Only the results of the search are shown
  - A preview window is now available for each message
  - Sort, Select message, Cursor Down ... navigates quickly through the messages.

But wait, there's more ...

- Click Find ( Ctrl-F )
- Enter a search term for the message body, like [ LIBPATH ]
- The search is "sticky" ... each message you cycle through will focus ( and highlight ) the LIBPATH search term.

And still more ....

- Reset the "Important" Tag on a message.
  - Press "1" and the tag is removed
  - Press "4" and a new Tag "ToDo" is applied
  - After applying all of the tags, sort by Tag for a new message order
- Adjust the search criteria ...
  - R-click on the [ teradata ] search folder, and choose Properties
  - Add additional criteria to narrow the search
  - Some of the information I'm looking for did not contain "teradata" in the subject line.
  - + Body [ contains ] [ Best Practices ]

That's it. Much more efficient search. Thank you Thunderbird.

Fraud Detection with the SQL Server Suite Part 2

- by Dejan Sarka

This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic.

My Approach to Data Mining Projects

It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest starting with a proof-of-concept (POC) project. A POC takes between 5 and 10 working days, and involves personnel from the customer’s site, either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question and the meaning of the data at hand, while the IT expert should be familiar with the structure of the data and how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert, the most time-consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project.

If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach, as it establishes a common background for all people involved, an understanding of how the algorithms work and how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst) and the IT expert assigned to the project will learn how to prepare the data in an efficient manner.
Their knowledge and expertise, together with mine, allow us to focus immediately on the most interesting attributes and identify any additional, calculated ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data.

I favor the customer’s decision to assign additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems.

The data overview is performed simultaneously with the data preparation. The logic behind this task is the same: again I show the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles, as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. One specific deliverable of the data overview is of principal importance: a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of project without the customer’s SME.
After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. 
Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project:

- The customer learns the basic patterns of frauds and fraud detection
- The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems
- The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible
- The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before
- All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production
- The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution
  - It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise
- Typically, the project results in several important “side effects”
  - Improved data quality
  - Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization
  - Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns

After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly.

I usually expect to perform the following tasks:

- Establish the final infrastructure to measure the efficiency of the deployed models
- Deploy the models in additional scenarios
  - Through reports
  - By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings
  - Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project
  - Create smart ETL applications that divert suspicious data for immediate or later inspection
- I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well
- Of course, for the actual project, I would repeat the data and model preparation as needed

It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with the customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take an additional 5 to 10 days.

The approximate timeline for the POC project is as follows:

- 1-2 days of training
- 2-3 days for data preparation and data overview
- 2 days for creating and evaluating the models
- 1 day for initial preparation of the continuous learning infrastructure
- 1 day for presentation of the results and discussion of further actions

Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions.
However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.
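The data preparation work mentioned earlier in this post (detecting outliers and identifying relationships between pairs of input variables) can be illustrated with a short sketch. The post prefers Transact-SQL for this work; Python is used here only for illustration. The 1.5 x IQR outlier rule, the Pearson correlation, the function names, and the sample amounts are all assumptions chosen for the example and are not taken from the whitepaper.

```python
import math
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical transaction amounts with one suspicious value.
amounts = [10, 12, 11, 13, 12, 11, 500]
suspicious = iqr_outliers(amounts)

# Hypothetical pair of input variables that move together.
items_per_order = [1, 2, 3, 4]
order_value = [20, 40, 60, 80]
relationship = pearson(items_per_order, order_value)
print(suspicious, round(relationship, 3))
```

Checks like these are cheap to run over every numeric column and every pair of columns, which is one reason a team can cover tens of variables within the two or three days the timeline allows.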

New Communications Industry Data Model with "Factory Installed" Predictive Analytics using Oracle Da

- by charlie.berger

Oracle Introduces Oracle Communications Data Model to Provide Actionable Insight for Communications Service Providers

We've integrated pre-installed analytical methodologies with the new Oracle Communications Data Model to deliver automated, simple, yet powerful predictive analytics solutions for customers. Churn, sentiment analysis, identifying customer segments: all things that can be anticipated and hence preconceived and implemented inside an application. Read on for more information!

TM Forum Management World, Nice, France - 18 May 2010

News Facts

To help communications service providers (CSPs) manage and analyze rapidly growing data volumes cost effectively, Oracle today introduced the Oracle Communications Data Model.

With the Oracle Communications Data Model, CSPs can achieve rapid time to value by quickly implementing a standards-based enterprise data warehouse that features communications industry-specific reporting, analytics and data mining.

The combination of the Oracle Communications Data Model, Oracle Exadata and the Oracle Business Intelligence (BI) Foundation represents the most comprehensive data warehouse and BI solution for the communications industry.

Also announced today, Hong Kong Broadband Network enhanced their data warehouse system, going live on Oracle Communications Data Model in three months. The leading provider increased its subscriber base by 37 percent in six months and reduced customer churn to less than one percent.

Product Details

Oracle Communications Data Model provides industry-specific schema and embedded analytics that address key areas such as customer management, marketing segmentation, product development and network health.
CSPs can efficiently capture and monitor critical data and transform it into actionable information to support development and delivery of next-generation services using:

- More than 1,300 industry-specific measurements and key performance indicators (KPIs) such as network reliability statistics, provisioning metrics and customer churn propensity.
- Embedded OLAP cubes for extremely fast dimensional analysis of business information.
- Embedded data mining models for sophisticated trending and predictive analysis.
- Support for multiple lines of business, such as cable, mobile, wireline and Internet, which can be easily extended to support future requirements.

With Oracle Communications Data Model, CSPs can jump start the implementation of a communications data warehouse in line with communications-industry standards including the TM Forum Information Framework (SID), formerly known as the Shared Information Model.

Oracle Communications Data Model is optimized for any Oracle Database 11g platform, including Oracle Exadata, which can improve call data record query performance by 10x or more.

Supporting Quotes

"Oracle Communications Data Model covers a wide range of business areas that are relevant to modern communications service providers and is a comprehensive solution - with its data model and pre-packaged templates including BI dashboards, KPIs, OLAP cubes and mining models. It helps us save a great deal of time in building and implementing a customized data warehouse and enables us to leverage the advanced analytics quickly and more effectively," said Yasuki Hayashi, executive manager, NTT Comware Corporation.

"Data volumes will only continue to grow as communications service providers expand next-generation networks, deploy new services and adopt new business models. They will increasingly need efficient, reliable data warehouses to capture key insights on data such as customer value, network value and churn probability. With the Oracle Communications Data Model, Oracle has demonstrated its commitment to meeting these needs by delivering data warehouse tools designed to fill communications industry-specific needs," said Elisabeth Rainge, program director, Network Software, IDC.

"The TM Forum Conformance Mark provides reassurance to customers seeking standards-based, and therefore, cost-effective and flexible solutions. TM Forum is extremely pleased to work with Oracle to certify its Oracle Communications Data Model solution. Upon successful completion, this certification will represent the broadest and most complete implementation of the TM Forum Information Framework to date, with more than 130 aggregate business entities," said Keith Willetts, chairman and chief executive officer, TM Forum.

Supporting Resources

- Oracle Communications
- Oracle Communications Data Model Data Sheet
- Oracle Communications Data Model Podcast
- Oracle Data Warehousing
- Oracle Communications on YouTube
- Oracle Communications on Delicious
- Oracle Communications on Facebook
- Oracle Communications on Twitter
- Oracle Communications on LinkedIn
- Oracle Database on Twitter
- The Data Warehouse Insider Blog

Data Mining open source tools

- by Andriyev

Hi, I'm due to take up a data mining project. Before I jump in, I wanted to probe around for different data mining tools (preferably open source) that allow web-based reporting. In my scenario all the data would be provided to me, so I'm not supposed to crawl for it. In a nutshell, I am looking for a tool that does data analysis and web-based reporting, and provides some kind of dashboard and mining features. I have worked with Microsoft Analysis Services and BOXI, and of late I have been looking at Pentaho, which seems to be a good option. Please share your experiences with any such tool that you know of. Cheers

Social Analytics in your current data

- by Dan McGrath

By now everyone is aware of the massive boom in social networking (Twitter, Facebook, LinkedIn), and obviously a big part of its business model revolves around being able to mine this data to create information that can be used to make money for someone. Gartner has identified 'Social Analytics' as one of the top 10 strategic technologies for 2011. Has anyone looked at their existing data structures to determine if they could extract a social graph and then perform further data mining against it? How does it fit in with your other development strategies? What information are you trying to extract from the data? Take, for example, a bank. They could conceivably determine a social graph through account relationships and transactions. Obviously there would be open edges on the graph where funds enter or leave the institution, but that shouldn't detract from the usefulness of the data. I'm looking for actual examples with the answers, as well as why/how they did it. References to other sites will be greatly appreciated. Note: I'm not at all referring to mining data out of actual social networks.
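The bank example above can be sketched as follows: treat each account as a node, each transfer as a weighted edge, and collapse every counterparty outside the institution into one placeholder node so the "open edges" stay visible. This is a minimal sketch under stated assumptions; the account names, the `EXTERNAL` placeholder, and the function names are hypothetical, and real transaction data would need far more care.

```python
from collections import defaultdict

# Hypothetical placeholder node for funds entering or leaving the institution.
EXTERNAL = "EXTERNAL"

def build_graph(transactions, internal_accounts):
    """Aggregate (from_account, to_account, amount) rows into undirected edges.

    Returns {(node_a, node_b): total_amount}. Accounts that are not in
    internal_accounts are collapsed into the single EXTERNAL node.
    """
    edges = defaultdict(float)
    for src, dst, amount in transactions:
        a = src if src in internal_accounts else EXTERNAL
        b = dst if dst in internal_accounts else EXTERNAL
        if a == b:
            continue  # ignore self-loops, including external-to-external rows
        edges[tuple(sorted((a, b)))] += amount
    return dict(edges)

def neighbours(edges, account):
    """Internal accounts directly connected to the given account."""
    linked = set()
    for a, b in edges:
        if account in (a, b):
            other = b if a == account else a
            if other != EXTERNAL:
                linked.add(other)
    return sorted(linked)

# Hypothetical sample data.
internal = {"acc1", "acc2", "acc3"}
transfers = [
    ("acc1", "acc2", 100),
    ("acc2", "acc1", 50),
    ("acc2", "acc3", 25),
    ("acc3", "other-bank-9", 10),
]
graph = build_graph(transfers, internal)
print(graph)
print(neighbours(graph, "acc2"))
```

Once the edges exist, standard graph techniques (connected components, centrality, community detection) can be applied to the result, which is the "further data mining" step the question asks about.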

CG miner "configure: error: No mining configured in "

- by Jorma

I have an Nvidia GT 630 with CUDA 5.5, but CGminer is not running. The CUDA examples work fine. Should CGminer work, or are there limitations to it?

    sudo ./autogen.sh --disable-cpumining --enable-opencl && make

    Configuration Options Summary:

      libcurl(GBT+getwork).: Enabled: -lcurl
      curses.TUI...........: FOUND: -lncurses
      Avalon.ASICs.........: Disabled
      BlackArrow.ASICs.....: Disabled
      BFL.ASICs............: Disabled
      BitForce.FPGAs.......: Disabled
      BitFury.ASICs........: Disabled
      Hashfast.ASICs.......: Disabled
      Icarus.ASICs/FPGAs...: Disabled
      Klondike.ASICs.......: Disabled
      KnC.ASICs............: Disabled
      ModMiner.FPGAs.......: Disabled

    configure: error: No mining configured in

Why is GPU used for mining bitcoins?

- by starcorn

Something that I have not really grasped is the idea of bitcoins, especially since everybody can mine them using a powerful GPU. Why is a GPU used for this purpose? Is the work done by the GPU used by some huge organization, or is it just a wasted resource that goes into simulated mining? For example, SETI uses your GPU for the purpose of finding aliens, but from what I can see, bitcoin mining serves no actual purpose other than wasting resources.

'Similarity' in Data Mining

- by Shailesh Tainwala

In the field of Data Mining, is there a specific sub-discipline called 'Similarity'? If yes, what does it deal with? Any examples, links, or references will be helpful. Also, being new to the field, I would like the community's opinion on how closely related Data Mining and Artificial Intelligence are. Are they synonyms, or is one a subset of the other? Thanks in advance for sharing your knowledge.

Going For Gold: AngloGold Ashanti and Oracle Spatial 11g

- by stephen.garth

Save The Date: May 6 at 11:00am Pacific time. Attend this free Directions Media live webinar to find out how AngloGold Ashanti is using Oracle Database 11g with a unique geospatial infrastructure based on Oracle Spatial 11g to support worldwide gold exploration and mining operations. Terence Harbort, Exploration Systems Architect at AngloGold Ashanti, will discuss how the company is addressing challenges including management of large volumes of highly varied mapping and image data, 3D visualization, and geospatial analysis. Viewers can participate in a live Q&A session at the end of the webinar. Date: May 6, 2010. Time: 11:00am PDT. Register here

Oracle Database Third Party Model Import and Scoring - Poll

- by [email protected]

We in the Oracle Data Mining Technologies group are interested in your needs for third party model import. Are you interested in importing SAS, SPSS, R, etc. analytic models into Oracle for in-database scoring? Thank you for taking a minute to participate in this quick poll. Click here to take the poll.

Oracle Technology Forum event, Wednesday, May 5, 2010

- by Fekete Zoltán

Next Wednesday we are holding an Oracle Technology Forum day, where you can hear presentations and get answers to your questions in the database management and developer tracks. Register for the event. In the database track I will talk about the technical gems of the Sun Oracle Database Machine / Exadata solutions from the transactional (OLTP), data warehouse (DW) and database consolidation perspectives. I will also highlight what is new and interesting in Oracle Data Mining and OLAP, and I will mention the ways Oracle's Data Warehouse Reference Architecture can be applied.

July SQL Server UG Event in Manchester

I will be speaking at the SQL Server UK User Group event in Manchester on 16.07.2009. I am going to be talking about data mining again and how it isn’t all statistics and people with PhDs from Oxford. Come join me and the excellent Chris Testa-O’Neill. More details and registration can be found here

Explaining the State Transitions Viewer in Sequence Clustering

Here is an article I wrote for MSDN that helps explain the excellent viewer we get for Sequence Clustering models in SQL Server Data Mining. I show you how the numbers you see are derived and also explain what the icons you see in the viewer mean. Link to the article

MSDN Article for the GetClusterCharacteristics Stored Procedure

Here is an article I wrote for MSDN that introduces us to the GetClusterCharacteristics stored procedure in SQL Server Data Mining. It gives us an insight into how the sequences within clusters are derived when using the Sequence Clustering algorithm. Link to article

are there any useful datasets available on the web for data mining?

- by niko

Hi, does anyone know of a good resource where real example data can be downloaded for experimenting with statistics and machine learning techniques such as decision trees? I am currently studying machine learning techniques, and it would be very helpful to have real data for evaluating the accuracy of various tools. If anyone knows of a good resource (perhaps csv or xls files, or any other format), I would be very thankful for a suggestion.

Manchester UG Presentation Video

In July I was invited to speak at the UK SQL Server UG event in Manchester. I spoke about Excel being a good data mining client. I was a little rushed at the end, as Chris Testa-O’Neill told me I had only 5 minutes to go when I had only been talking for 10 minutes. Apparently I have a reputation for running over my time allocation. At the event we also had a product demo from SQL Sentry around their BI monitoring dashboard solution. This includes SSIS, but the main thrust was SSAS. Then came Chris with a look at Analysis Services. If you have never heard Chris talk then take the opportunity now; he is a top class presenter and I am often found sitting at the back of his classes. Here is the video link

Kicking yourself because you missed the Oracle OpenWorld and Oracle Develop Call for Papers?

- by charlie.berger

Here's a great opportunity! If you missed the Oracle OpenWorld and Oracle Develop Call for Papers, here is another opportunity to submit a paper to present. Submit a paper and ask your colleagues, Oracle Mix community, friends and anyone else you know to vote for your session. As applications of data mining and predictive analytics are always interesting, your chances of getting accepted by votes are higher. Note, only Oracle Mix members are allowed to vote. Voting is open from the end of May through June 20. For the most part, the top voted sessions will be selected for the program (although we may choose sessions in order to balance the content across the program). Please note that Oracle reserves the right to decline sessions that are not appropriate for the conference, such as subjects that are competitive in nature or sessions that cover outdated versions of products.

Oracle OpenWorld and Oracle Develop
Suggest-a-Session: https://mix.oracle.com/oow10/proposals
FAQ: https://mix.oracle.com/oow10/faq
