classification - Page 3

Visualize Classifier Error Weka

- by user1780592

Hye there i have a have datasets where this data i have test it on weka with J48 classifier It give me an output = 87.2611% Total of instances = 157 Correctly Instances = 137 Incorrectly instance = 20 Then i have do a visualize classifier error on my data. However my result have been decrease to: New result = 85.4015% Correctly Instances = 117 Incorrectly instances = 20 Total of instances = 137 Is there any reason for that? Should my result become much better after i do the visualize classifier error?

Read the article

strange chi-square result using scikit_learn with feature matrix

- by user963386

I am using scikit learn to calculate the basic chi-square statistics(sklearn.feature_selection.chi2(X, y)): def chi_square(feat,target): """ """ from sklearn.feature_selection import chi2 ch,pval = chi2(feat,target) return ch,pval chisq,p = chi_square(feat_mat,target_sc) print(chisq) print("**********************") print(p) I have 1500 samples,45 features,4 classes. The input is a feature matrix with 1500x45 and a target array with 1500 components. The feature matrix is not sparse. When I run the program and I print the arrray "chisq" with 45 components, I can see that the component 13 has a negative value and p = 1. How is it possible? Or what does it mean or what is the big mistake that I am doing? I am attaching the printouts of chisq and p: [ 9.17099260e-01 3.77439701e+00 5.35004211e+01 2.17843312e+03 4.27047184e+04 2.23204883e+01 6.49985540e-01 2.02132664e-01 1.57324454e-03 2.16322638e-01 1.85592258e+00 5.70455805e+00 1.34911126e-02 -1.71834753e+01 1.05112366e+00 3.07383691e-01 5.55694752e-02 7.52801686e-01 9.74807972e-01 9.30619466e-02 4.52669897e-02 1.08348058e-01 9.88146259e-03 2.26292358e-01 5.08579194e-02 4.46232554e-02 1.22740419e-02 6.84545170e-02 6.71339545e-03 1.33252061e-02 1.69296016e-02 3.81318236e-02 4.74945604e-02 1.59313146e-01 9.73037448e-03 9.95771327e-03 6.93777954e-02 3.87738690e-02 1.53693158e-01 9.24603716e-04 1.22473138e-01 2.73347277e-01 1.69060817e-02 1.10868365e-02 8.62029628e+00] ********************** [ 8.21299526e-01 2.86878266e-01 1.43400668e-11 0.00000000e+00 0.00000000e+00 5.59436980e-05 8.84899894e-01 9.77244281e-01 9.99983411e-01 9.74912223e-01 6.02841813e-01 1.26903019e-01 9.99584918e-01 1.00000000e+00 7.88884155e-01 9.58633878e-01 9.96573548e-01 8.60719653e-01 8.07347364e-01 9.92656816e-01 9.97473024e-01 9.90817144e-01 9.99739526e-01 9.73237195e-01 9.96995722e-01 9.97526259e-01 9.99639669e-01 9.95333185e-01 9.99853998e-01 9.99592531e-01 9.99417113e-01 9.98042114e-01 9.97286030e-01 9.83873717e-01 9.99745466e-01 9.99736512e-01 9.95239765e-01 9.97992843e-01 9.84693908e-01 9.99992525e-01 9.89010468e-01 9.64960636e-01 9.99418323e-01 9.99690553e-01 3.47893682e-02]

Read the article

How to engineer features for machine learning

- by Ivo Danihelka

Do you have some advices or reading how to engineer features for a machine learning task? Good input features are important even for a neural network. The chosen features will affect the needed number of hidden neurons and the needed number of training examples. The following is an example problem, but I'm interested in feature engineering in general. A motivation example: What would be a good input when looking at a puzzle (e.g., 15-puzzle or Sokoban)? Would it be possible to recognize which of two states is closer to the goal?

Read the article

Ideas for designing an automated content tagging system needed

- by Benjamin Smith

I am currently designing a website that amongst other is required to display and organise small amounts of text content (mainly quotes, article stubs, etc.). I currently have a database with 250,000+ items and need to come up with a method of tagging each item with relevant tags which will eventually allow for easy searching/browsing of the content for users. A very simplistic idea I have (and one that I believe is employed by some sites that I have been looking to for inspiration (http://www.brainyquote.com/quotes/topics.html)), is to simply search the database for certain words or phrases and use these words as tags for the content. This can easily be extended so that if for example a user wanted to show all items with a theme of love then I would just return a list of items with words and phrases relating to this theme. This would not be hard to implement but does not provide very good results. For example if I were to search for the month 'May' in the database with the aim of then classifying the items returned as realting to the topic of Spring then I would get back all occurrences of the word May, regardless of the semantic meaning. Another shortcoming of this method is that I believe it would be quite hard to automate the process to any large scale. What I really require is a library that can take an item, break it down and analyse the semantic meaning and also return a list of tags that would correctly classify the item. I know this is a lot to ask and I have a feeling I will end up reverting to the aforementioned method but I just thought I should ask if anyone knew of any pre-existing solution. I think that as the items in the database are short then it is probably quite a hard task to analyse any meaning from them however I may be mistaken. Another path to possibly go down would be to use something like amazon turk to outsource the task which may produce good results but would be expensive. Eventually I would like users to be able to (and want to!) tag content and to vote for the most relevant tags, possibly using a gameification mechanic as motivation however this is some way down the line. A temporary fix may be the best thing if this were the route I decided to go down as I could use the rough results I got as the starting point for a more in depth solution. If you've read this far, thanks for sticking with me, I know I'm spitballing but any input would be really helpful. Thanks.

Read the article

Detecting an online poker cheat

- by Tom Gullen

It recently emerged on a large poker site that some players were possibly able to see all opponents cards as they played through exploiting a security vulnerability that was discovered. A naïve cheater would win at an incredibly fast rate, and these cheats are caught very quickly usually, and if not caught quickly they are easy to detect through a quick scan through their hand histories. The more difficult problem occurs when the cheater exhibits intelligence, bluffing in spots they are bound to be called in, calling river bets with the worst hands, the basic premise is that they lose pots on purpose to disguise their ability to see other players cards, and they win at a reasonably realistic rate. Given: A data set of millions of verified and complete information hand histories Theoretical unlimited computer power Assume the game No Limit Hold'em, although suggestions on Omaha or limit poker may be beneficial How could we reasonably accurately classify these cheaters? The original 2+2 thread appeals for ideas, and I thought that the SO community might have some useful suggestions. It's an interesting problem also because it is current, and has real application in bettering the world if someone finds a creative solution, as there is a good chance genuine players will have funds refunded to them when identified cheaters are discovered.

Read the article

choose the best class if 2 class have same P (c|d), naive bayes

- by ryandi

Hello I have some question about naive bayes classifier . In my project I have to classify a text into a class from 4 available class. In naive bayes we have formula like cmap=argmax.P(d|c).P(c) I have standarize the amount of training document of each class, so I got a same P(c) value for each class (0.25). Here's my question: What if a testing document token doesn't have any token which belong to any of those 4 class(in document training)? Resulted to all of the class have same value of P(d|c).P(c). Which class should i pick? What if the token exist, and 2 class or more have same value of P(d|c).P(c) what should I do? Thank you..

Read the article

How does Stack Overflow's subjective question detector work? [closed]

- by csl

When writing a question in Stack Overflow, it will give you a red bar with a warning if the question looks subjective. Subjective or non-programming questions (e.g., What is your faviourite programming language?) seem to be easily recognized. How, exactly, isthis done? Heuristics, algorithms, machine learning? I'm amazed!

Read the article

Where would you start if you were trying to solve this PDF classification problem?

- by burtonic

We are crawling and downloading lots of companies' PDFs and trying to pick out the ones that are Annual Reports. Such reports can be downloaded from most companies' investor-relations pages. The PDFs are scanned and the database is populated with, among other things, the: Title Contents (full text) Page count Word count Orientation First line Using this data we are checking for the obvious phrases such as: Annual report Financial statement Quarterly report Interim report Then recording the frequency of these phrases and others. So far we have around 350,000 PDFs to scan and a training set of 4,000 documents that have been manually classified as either a report or not. We are experimenting with a number of different approaches including Bayesian classifiers and weighting the different factors available. We are building the classifier in Ruby. My question is: if you were thinking about this problem, where would you start?

Read the article

Reference Data Management and Master Data: Are Relation ?

- by Mala Narasimharajan

Submitted By: Rahul Kamath Oracle Data Relationship Management (DRM) has always been extremely powerful as an Enterprise Master Data Management (MDM) solution that can help manage changes to master data in a way that influences enterprise structure, whether it be mastering chart of accounts to enable financial transformation, or revamping organization structures to drive business transformation and operational efficiencies, or restructuring sales territories to enable equitable distribution of leads to sales teams following the acquisition of new products, or adding additional cost centers to enable fine grain control over expenses. Increasingly, DRM is also being utilized by Oracle customers for reference data management, an emerging solution space that deserves some explanation. What is reference data? How does it relate to Master Data? Reference data is a close cousin of master data. While master data is challenged with problems of unique identification, may be more rapidly changing, requires consensus building across stakeholders and lends structure to business transactions, reference data is simpler, more slowly changing, but has semantic content that is used to categorize or group other information assets – including master data – and gives them contextual value. In fact, the creation of a new master data element may require new reference data to be created. For example, when a European company acquires a US business, chances are that they will now need to adapt their product line taxonomy to include a new category to describe the newly acquired US product line. Further, the cross-border transaction will also result in a revised geo hierarchy. The addition of new products represents changes to master data while changes to product categories and geo hierarchy are examples of reference data changes.1 The following table contains an illustrative list of examples of reference data by type. Reference data types may include types and codes, business taxonomies, complex relationships & cross-domain mappings or standards. Types & Codes Taxonomies Relationships / Mappings Standards Transaction Codes Industry Classification Categories and Codes, e.g., North America Industry Classification System (NAICS) Product / Segment; Product / Geo Calendars (e.g., Gregorian, Fiscal, Manufacturing, Retail, ISO8601) Lookup Tables (e.g., Gender, Marital Status, etc.) Product Categories City à State à Postal Codes Currency Codes (e.g., ISO) Status Codes Sales Territories (e.g., Geo, Industry Verticals, Named Accounts, Federal/State/Local/Defense) Customer / Market Segment; Business Unit / Channel Country Codes (e.g., ISO 3166, UN) Role Codes Market Segments Country Codes / Currency Codes / Financial Accounts Date/Time, Time Zones (e.g., ISO 8601) Domain Values Universal Standard Products and Services Classification (UNSPSC), eCl@ss International Classification of Diseases (ICD) e.g., ICD9 à IC10 mappings Tax Rates Why manage reference data? Reference data carries contextual value and meaning and therefore its use can drive business logic that helps execute a business process, create a desired application behavior or provide meaningful segmentation to analyze transaction data. Further, mapping reference data often requires human judgment. Sample Use Cases of Reference Data Management Healthcare: Diagnostic Codes The reference data challenges in the healthcare industry offer a case in point. Part of being HIPAA compliant requires medical practitioners to transition diagnosis codes from ICD-9 to ICD-10, a medical coding scheme used to classify diseases, signs and symptoms, causes, etc. The transition to ICD-10 has a significant impact on business processes, procedures, contracts, and IT systems. Since both code sets ICD-9 and ICD-10 offer diagnosis codes of very different levels of granularity, human judgment is required to map ICD-9 codes to ICD-10. The process requires collaboration and consensus building among stakeholders much in the same way as does master data management. Moreover, to build reports to understand utilization, frequency and quality of diagnoses, medical practitioners may need to “cross-walk” mappings -- either forward to ICD-10 or backwards to ICD-9 depending upon the reporting time horizon. Spend Management: Product, Service & Supplier Codes Similarly, as an enterprise looks to rationalize suppliers and leverage their spend, conforming supplier codes, as well as product and service codes requires supporting multiple classification schemes that may include industry standards (e.g., UNSPSC, eCl@ss) or enterprise taxonomies. Aberdeen Group estimates that 90% of companies rely on spreadsheets and manual reviews to aggregate, classify and analyze spend data, and that data management activities account for 12-15% of the sourcing cycle and consume 30-50% of a commodity manager’s time. Creating a common map across the extended enterprise to rationalize codes across procurement, accounts payable, general ledger, credit card, procurement card (P-card) as well as ACH and bank systems can cut sourcing costs, improve compliance, lower inventory stock, and free up talent to focus on value added tasks. Change Management: Point of Sales Transaction Codes and Product Codes In the specialty finance industry, enterprises are confronted with usury laws – governed at the state and local level – that regulate financial product innovation as it relates to consumer loans, check cashing and pawn lending. To comply, it is important to demonstrate that transactions booked at the point of sale are posted against valid product codes that were on offer at the time of booking the sale. Since new products are being released at a steady stream, it is important to ensure timely and accurate mapping of point-of-sale transaction codes with the appropriate product and GL codes to comply with the changing regulations. Multi-National Companies: Industry Classification Schemes As companies grow and expand across geographies, a typical challenge they encounter with reference data represents reconciling various versions of industry classification schemes in use across nations. While the United States, Mexico and Canada conform to the North American Industry Classification System (NAICS) standard, European Union countries choose different variants of the NACE industry classification scheme. Multi-national companies must manage the individual national NACE schemes and reconcile the differences across countries. Enterprises must invest in a reference data change management application to address the challenge of distributing reference data changes to downstream applications and assess which applications were impacted by a given change. References 1 Master Data versus Reference Data, Malcolm Chisholm, April 1, 2006.

Read the article

Modeling many-to-one with constraints?

- by Greg Beech

I'm attempting to create a database model for movie classifications, where each movie could have a single classification from each of one of multiple rating systems (e.g. BBFC, MPAA). This is the current design, with all implied PKs and FKs: TABLE Movie ( MovieId INT ) TABLE ClassificationSystem ( ClassificationSystemId TINYINT ) TABLE Classification ( ClassificationId INT, ClassificationSystemId TINYINT ) TABLE MovieClassification ( MovieId INT, ClassificationId INT, Advice NVARCHAR(250) -- description of why the classification was given ) The problem is with the MovieClassification table whose constraints would allow multiple classifications from the same system, whereas it should ideally only permit either zero or one classifications from a given system. Is there any reasonable way to restructure this so that a movie having exactly zero or one classifications from any given system is enforced by database constraints, given the following requirements? Do not duplicate information that could be looked up (i.e. duplicating ClassificationSystemId in the MovieClassification table is not a good solution because this could get out of sync with the value in the Classification table) Remain extensible to multiple classification systems (i.e. a new classification system does not require any changes to the table structure)? Note also the Advice column - each mapping of a movie to a classification needs to have a textual description of why that classification was given to that movie. Any design would need to support this.

Read the article

Granular Clipboard Control in Oracle IRM

- by martin.abrahams

One of the main leak prevention controls that customers are looking for is clipboard control. After all, there is little point in controlling access to a document if authorised users can simply make unprotected copies by use of the cut and paste mechanism. Oddly, for such a fundamental requirement, many solutions only offer very simplistic clipboard control - and require the customer to make an awkward choice between usability and security. In many cases, clipboard control is simply an ON-OFF option. By turning the clipboard OFF, you disable one of the most valuable edit functions known to man. Try working for any length of time without copying and pasting, and you'll soon appreciate how valuable that function is. Worse, some solutions disable the clipboard completely - not just for the protected document but for all of the various applications you have open at the time. Normal service is only resumed when you close the protected document. In this way, policy enforcement bleeds out of the particular assets you need to protect and interferes with the entire user experience. On the other hand, turning the clipboard ON satisfies a fundamental usability requirement - but also makes it really easy for users to create unprotected copies of sensitive information, maliciously or otherwise. All they need to do is paste into another document. If creating unprotected copies is this simple, you have to question how much you are really gaining by applying protection at all. You may not be allowed to edit, forward, or print the protected asset, but all you need to do is create a copy and work with that instead. And that activity would not be tracked in any way. So, a simple ON-OFF control creates a real tension between usability and security. If you are only using IRM on a small scale, perhaps security can outweigh usability - the business can put up with the restriction if it only applies to a handful of important documents. But try extending protection to large numbers of documents and large user communities, and the restriction rapidly becomes really unwelcome. I am aware of one solution that takes a different tack. Rather than disable the clipboard, pasting is always permitted, but protection is automatically applied to any document that you paste into. At first glance, this sounds great - protection travels with the content. However, at any scale this model may not be so appealing once you've had to deal with support calls from users who have accidentally applied protection to documents that really don't need it - which would be all too easily done. This may help control leakage, but it also pollutes the system with documents that have policies applied with no obvious rhyme or reason, and it can seriously inconvenience the business by making non-sensitive documents difficult to access. And what policy applies if you paste some protected content into an already protected document? Which policy applies? There are no prizes for guessing that Oracle IRM takes a rather different approach. The Oracle IRM Approach Oracle IRM offers a spectrum of clipboard controls between the extremes of ON and OFF, and it leverages the classification-based rights model to give granular control that satisfies both security and usability needs. Firstly, we take it for granted that if you have EDIT rights, of course you can use the clipboard within a given document. Why would we force you to retype a piece of content that you want to move from HERE... to HERE...? If the pasted content remains in the same document, it is equally well protected whether it be at the beginning, middle, or end - or all three. So, the first point is that Oracle IRM always enables the clipboard if you have the right to edit the file. Secondly, whether we enable or disable the clipboard, we only affect the protected document. That is, you can continue to use the clipboard in the usual way for unprotected documents and applications regardless of whether the clipboard is enabled or disabled for the protected document(s). And if you have multiple protected documents open, each may have the clipboard enabled or disabled independently, according to whether you have Edit rights for each. So, even for the simplest cases - the ON-OFF cases - Oracle IRM adds value by containing the effect to the protected documents rather than to the whole desktop environment. Now to the granular options between ON and OFF. Thanks to our classification model, we can define rights that enable pasting between documents in the same classification - ie. between documents that are protected by the same policy. So, if you are working on this month's financial report and you want to pull some data from last month's report, you can simply cut and paste between the two documents. The two documents are classified the same way, subject to the same policy, so the content is equally safe in both documents. However, if you try to paste the same data into an unprotected document or a document in a different classification, you can be prevented. Thus, the control balances legitimate user requirements to allow pasting with legitimate information security concerns to keep data protected. We can take this further. You may have the right to paste between related classifications of document. So, the CFO might want to copy some financial data into a board document, where the two documents are sealed to different classifications. The CFO's rights may well allow this, as it is a reasonable thing for a CFO to want to do. But policy might prevent the CFO from copying the same data into a classification that is accessible to external parties. The above option, to copy between classifications, may be for specific classifications or open-ended. That is, your rights might enable you to go from A to B but not to C, or you might be allowed to paste to any classification subject to your EDIT rights. As for so many features of Oracle IRM, our classification-based rights model makes this type of granular control really easy to manage - you simply define that pasting is permitted between classifications A and B, but omit C. Or you might define that pasting is permitted between all classifications, but not to unprotected locations. The classification model enables millions of documents to be controlled by a few such rules. Finally, you MIGHT have the option to paste anywhere - such that unprotected copies may be created. This is rare, but a legitimate configuration for some users, some use cases, and some classifications - but not something that you have to permit simply because the alternative is too restrictive. As always, these rights are defined in user roles - so different users are subject to different clipboard controls as required in different classifications. So, where most solutions offer just two clipboard options - ON-OFF or ON-but-encrypt-everything-you-touch - Oracle IRM offers real granularity that leverages our classification model. Indeed, I believe it is the lack of a classification model that makes such granularity impractical for other IRM solutions, because the matrix of rules for controlling pasting would be impossible to manage - there are so many documents to consider, and more are being created all the time.

Read the article

Build a decision tree for classification of large amount data,using python?

- by kaushik

Hi,i am working for speech synthesis.In this i have a large number of pronunciation for each phone i.e alphabet and need to classify them according to few feature such as segment size(int) and alphabet itself(string) into a smaller set suitable for that particular context. For this purpose,i have decided to use decision tree for classification.the data to be parsed is in the S expression format.eg:((question)(LEFTNODE)(RIGHTNODE)). i hav idea for building decision tree for normal buit in type such as list..looking for suggestion for implementation for S expression.. kindly help.. Thanks in advance.. Note:this question may look similar to my prev post,srry if cant giv multiple post.already edited it many times so though of wirting new question instead of editing again

Read the article

What is the type/classification/term for an PC headset that does not go over the head or back of the

- by FMFF

How do I search in the leading auction site for the PC headset(earpiece + microphone) that does not have a band to go around or over your head - instead it is entirely like a cable, with the microphone attached in one of the wires leading from the ear piece? How is it called? I'm not talking about Bluetooth wireless headsets. I have one from Plantronics I got years ago, which they stopped making and I want another one, but don't know how to search. Simply searching for HeadSet or similar terms, brings mostly the newer ones which I don't want. Thank you.

Read the article

Protecting offline IRM rights and the error "Unable to Connect to Offline database"

- by Simon Thorpe

One of the most common problems I get asked about Oracle IRM is in relation to the error message "Unable to Connect to Offline database". This error message is a result of how Oracle IRM is protecting the cached rights on the local machine and if that cache has become invalid in anyway, this error is thrown. Offline rights and security First we need to understand how Oracle IRM handles offline use. The way it is implemented is one of the main reasons why Oracle IRM is the leading document security solution and demonstrates our methodology to ensure that solutions address both security and usability and puts the balance of these two in your control. Each classification has a set of predefined roles that the manager of the classification can assign to users. Each role has an offline period which determines the amount of time a user can access content without having to communicate with the IRM server. By default for the context model, which is the classification system that ships out of the box with Oracle IRM, the offline period for each role is 3 days. This is easily changed however and can be as low as under an hour to as long as years. It is also possible to switch off the ability to access content offline which can be useful when content is very sensitive and requires a tight leash. So when a user is online, transparently in the background, the Oracle IRM Desktop communicates with the server and updates the users rights and offline periods. This transparent synchronization period is determined by the server and communicated to all IRM Desktops and allows for users rights to be kept up to date without their intervention. This allows us to support some very important scenarios which are key to a successful IRM solution. A user doesn't have to make any decision when going offline, they simply unplug their laptop and they already have their offline periods synchronized to the maximum values. Any solution that requires a user to make a decision at the point of going offline isn't going to work because people forget to do this and will therefore be unable to legitimately access their content offline. If your rights change to REMOVE your access to content, this also happens in the background. This is very useful when someone has an offline duration of a week and they happen to make a connection to the internet 3 days into that offline period, the Oracle IRM Desktop detects this online state and automatically updates all rights for the user. This means the business risk is reduced when setting long offline periods, because of the daily transparent sync, you can reflect changes as soon as the user is online. Of course, if they choose not to come online at all during that week offline period, you cannot effect change, but you take that risk in giving the 7 day offline period in the first place. If you are added to a NEW classification during the day, this will automatically be synchronized without the user even having to open a piece of content secured against that classification. This is very important, consider the scenario where a senior executive downloads all their email but doesn't open any of it. Disconnects the laptop and then gets on a plane. During the flight they attempt to open a document attached to a downloaded email which has been secured against an IRM classification the user was not even aware they had access to. Because their new role in this classification was automatically synchronized their experience is a good one and the document opens. More information on how the Oracle IRM classification model works can be found in this article by Martin Abrahams. So what about problems accessing the offline rights database? So onto the core issue... when these rights are cached to your machine they are stored in an encrypted database. The encryption of this offline database is keyed to the instance of the installation of the IRM Desktop and the Windows user account. Why? Well what you do not want to happen is for someone to get their rights for content and then copy these files across hundreds of other machines, therefore getting access to sensitive content across many environments. The IRM server has a setting which controls how many times you can cache these rights on unique machines. This is because people typically access IRM content on more than one computer. Their work desktop, a laptop and often a home computer. So Oracle IRM allows for the usability of caching rights on more than one computer whilst retaining strong security over this cache. So what happens if these files are corrupted in someway? That's when you will see the error, Unable to Connect to Offline database. The most common instance of seeing this is when you are using virtual machines and copy them from one computer to the next. The virtual machine software, VMWare Workstation for example, makes changes to the unique information of that virtual machine and as such invalidates the offline database. How do you solve the problem? Resolution is however simple. You just delete all of the offline database files on the machine and they will be recreated with working encryption when the Oracle IRM Desktop next starts. However this does mean that the IRM server will think you have your rights cached to more than one computer and you will need to rerequest your rights, even though you are only going to be accessing them on one. Because it still thinks the old cache is valid. So be aware, it is good practice to increase the server limit from the default of 1 to say 3 or 4. This is done using the Enterprise Manager instance of IRM. So to delete these offline files I have a simple .bat file you can use; Download DeleteOfflineDBs.bat Note that this uses pskillto stop the irmBackground.exe from running. This is part of the IRM Desktop and holds open a lock to the offline database. Either kill this from task manager or use pskillas part of the script.

Read the article

Oracle IRM video demonstration of seperating duties of document security

- by Simon Thorpe

One thing an Information Rights Management technology should do well is separate out three main areas of responsibility.The business process of defining and controlling the classifications to which content is secured and the definition of the roles employees, customers, partners and contractors have when accessing secured content. Allow IT to manage the server and perform the role of authorizing the creation of new classifications to meet business needs but yet once the classification has been created and handed off to the business, IT no longer plays a role on the ongoing management. Empower the business to take ownership of classifications to which their own content is secured. For example an employee who is leading an acquisition project should be responsible for defining who has access to confidential project documents. This person should be able to manage the rights users have in the classification and also be the point of contact for those wishing to gain rights. Oracle IRM has since it's creation in the late 1990's had this core model at the heart of its design. Due in part to the important seperation of rights from the documents themselves, Oracle IRM places the right functionality within the right parts of the business. For example some IRM technologies allow the end user to make decisions about what users can print, edit or save a secured document. This in practice results in a wide variety of content secured with a plethora of options that don't conform to any policy. With Oracle IRM users choose from a list of classifications to which they have been given the ability to secure information against. Their role in the classification was given to them by the business owner of the classification, yet the definition of the role resides within the realm of corporate security who own the overall business classification policies. It is this type of design and philosophy in Oracle IRM that makes it an enterprise solution that works beyond a few users and a few secured documents to hundreds of thousands of users and millions of documents. This following video shows how Oracle IRM 11g, the market leading document security solution, lets the security organization manage and create classifications whilst the business owns and manages them. If you want to experience using Oracle IRM secured content and the effects of different roles users have, why not sign up for our free demonstration.

Read the article

Reference Data Management

- by rahulkamath

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} table.MsoTableColorfulListAccent2 {mso-style-name:"Colorful List - Accent 2"; mso-tstyle-rowband-size:1; mso-tstyle-colband-size:1; mso-style-priority:72; mso-style-unhide:no; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-tstyle-shading:#F8EDED; mso-tstyle-shading-themecolor:accent2; mso-tstyle-shading-themetint:25; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; color:black; mso-themecolor:text1;} table.MsoTableColorfulListAccent2FirstRow {mso-style-name:"Colorful List - Accent 2"; mso-table-condition:first-row; mso-style-priority:72; mso-style-unhide:no; mso-tstyle-shading:#9E3A38; mso-tstyle-shading-themecolor:accent2; mso-tstyle-shading-themeshade:204; mso-tstyle-border-bottom:1.5pt solid white; mso-tstyle-border-bottom-themecolor:background1; color:white; mso-themecolor:background1; mso-ansi-font-weight:bold; mso-bidi-font-weight:bold;} table.MsoTableColorfulListAccent2LastRow {mso-style-name:"Colorful List - Accent 2"; mso-table-condition:last-row; mso-style-priority:72; mso-style-unhide:no; mso-tstyle-shading:white; mso-tstyle-shading-themecolor:background1; mso-tstyle-border-top:1.5pt solid black; mso-tstyle-border-top-themecolor:text1; color:#9E3A38; mso-themecolor:accent2; mso-themeshade:204; mso-ansi-font-weight:bold; mso-bidi-font-weight:bold;} table.MsoTableColorfulListAccent2FirstCol {mso-style-name:"Colorful List - Accent 2"; mso-table-condition:first-column; mso-style-priority:72; mso-style-unhide:no; mso-ansi-font-weight:bold; mso-bidi-font-weight:bold;} table.MsoTableColorfulListAccent2LastCol {mso-style-name:"Colorful List - Accent 2"; mso-table-condition:last-column; mso-style-priority:72; mso-style-unhide:no; mso-ansi-font-weight:bold; mso-bidi-font-weight:bold;} table.MsoTableColorfulListAccent2OddColumn {mso-style-name:"Colorful List - Accent 2"; mso-table-condition:odd-column; mso-style-priority:72; mso-style-unhide:no; mso-tstyle-shading:#EFD3D2; mso-tstyle-shading-themecolor:accent2; mso-tstyle-shading-themetint:63; mso-tstyle-border-top:cell-none; mso-tstyle-border-left:cell-none; mso-tstyle-border-bottom:cell-none; mso-tstyle-border-right:cell-none; mso-tstyle-border-insideh:cell-none; mso-tstyle-border-insidev:cell-none;} table.MsoTableColorfulListAccent2OddRow {mso-style-name:"Colorful List - Accent 2"; mso-table-condition:odd-row; mso-style-priority:72; mso-style-unhide:no; mso-tstyle-shading:#F2DBDB; mso-tstyle-shading-themecolor:accent2; mso-tstyle-shading-themetint:51;} Reference Data Management Oracle Data Relationship Management (DRM) has always been extremely powerful as an Enterprise MDM solution that can help manage changes to master data in a way that influences enterprise structure, whether it be mastering chart of accounts to enable financial transformation, or revamping organization structures to drive business transformation and operational efficiencies, or mastering sales territories in light of rapid fire acquisitions that require frequent sales territory refinement, equitable distribution of leads and accounts to salespersons, and alignment of budget/forecast with results to optimize sales coverage. Increasingly, DRM is also being utilized by Oracle customers for reference data management, an emerging solution space that deserves some explanation. What is reference data? Reference data is a close cousin of master data. While master data may be more rapidly changing, requires consensus building across stakeholders and lends structure to business transactions, reference data is simpler, more slowly changing, but has semantic content that is used to categorize or group other information assets – including master data – and give them contextual value. The following table contains an illustrative list of examples of reference data by type. Reference data types may include types and codes, business taxonomies, complex relationships & cross-domain mappings or standards. Types & Codes Taxonomies Relationships / Mappings Standards Transaction Codes Industry Classification Categories and Codes, e.g., North America Industry Classification System (NAICS) Product / Segment; Product / Geo Calendars (e.g., Gregorian, Fiscal, Manufacturing, Retail, ISO8601) Lookup Tables (e.g., Gender, Marital Status, etc.) Product Categories City à State à Postal Codes Currency Codes (e.g., ISO) Status Codes Sales Territories (e.g., Geo, Industry Verticals, Named Accounts, Federal/State/Local/Defense) Customer / Market Segment; Business Unit / Channel Country Codes (e.g., ISO 3166, UN) Role Codes Market Segments Country Codes / Currency Codes / Financial Accounts Date/Time, Time Zones (e.g., ISO 8601) Domain Values Universal Standard Products and Services Classification (UNSPSC), eCl@ss International Classification of Diseases (ICD) e.g., ICD9 à IC10 mappings Tax Rates Why manage reference data? Reference data carries contextual value and meaning and therefore its use can drive business logic that helps execute a business process, create a desired application behavior or provide meaningful segmentation to analyze transaction data. Further, mapping reference data often requires human judgment. Sample Use Cases of Reference Data Management Healthcare: Diagnostic Codes The reference data challenges in the healthcare industry offer a case in point. Part of being HIPAA compliant requires medical practitioners to transition diagnosis codes from ICD-9 to ICD-10, a medical coding scheme used to classify diseases, signs and symptoms, causes, etc. The transition to ICD-10 has a significant impact on business processes, procedures, contracts, and IT systems. Since both code sets ICD-9 and ICD-10 offer diagnosis codes of very different levels of granularity, human judgment is required to map ICD-9 codes to ICD-10. The process requires collaboration and consensus building among stakeholders much in the same way as does master data management. Moreover, to build reports to understand utilization, frequency and quality of diagnoses, medical practitioners may need to “cross-walk” mappings -- either forward to ICD-10 or backwards to ICD-9 depending upon the reporting time horizon. Spend Management: Product, Service & Supplier Codes Similarly, as an enterprise looks to rationalize suppliers and leverage their spend, conforming supplier codes, as well as product and service codes requires supporting multiple classification schemes that may include industry standards (e.g., UNSPSC, eCl@ss) or enterprise taxonomies. Aberdeen Group estimates that 90% of companies rely on spreadsheets and manual reviews to aggregate, classify and analyze spend data, and that data management activities account for 12-15% of the sourcing cycle and consume 30-50% of a commodity manager’s time. Creating a common map across the extended enterprise to rationalize codes across procurement, accounts payable, general ledger, credit card, procurement card (P-card) as well as ACH and bank systems can cut sourcing costs, improve compliance, lower inventory stock, and free up talent to focus on value added tasks. Specialty Finance: Point of Sales Transaction Codes and Product Codes In the specialty finance industry, enterprises are confronted with usury laws – governed at the state and local level – that regulate financial product innovation as it relates to consumer loans, check cashing and pawn lending. To comply, it is important to demonstrate that transactions booked at the point of sale are posted against valid product codes that were on offer at the time of booking the sale. Since new products are being released at a steady stream, it is important to ensure timely and accurate mapping of point-of-sale transaction codes with the appropriate product and GL codes to comply with the changing regulations. Multi-National Companies: Industry Classification Schemes As companies grow and expand across geographies, a typical challenge they encounter with reference data represents reconciling various versions of industry classification schemes in use across nations. While the United States, Mexico and Canada conform to the North American Industry Classification System (NAICS) standard, European Union countries choose different variants of the NACE industry classification scheme. Multi-national companies must manage the individual national NACE schemes and reconcile the differences across countries. Enterprises must invest in a reference data change management application to address the challenge of distributing reference data changes to downstream applications and assess which applications were impacted by a given change.

Read the article

Naive Bayesian classification (spam filtering) - Doubt in one calculation? Which one is right? Plz c

- by Microkernel

Hi guys, I am implementing Naive Bayesian classifier for spam filtering. I have doubt on some calculation. Please clarify me what to do. Here is my question. In this method, you have to calculate P(S|W) - Probability that Message is spam given word W occurs in it. P(W|S) - Probability that word W occurs in a spam message. P(W|H) - Probability that word W occurs in a Ham message. So to calculate P(W|S), should I do (1) (Number of times W occuring in spam)/(total number of times W occurs in all the messages) OR (2) (Number of times word W occurs in Spam)/(Total number of words in the spam message) So, to calculate P(W|S), should I do (1) or (2)? (I thought it to be (2), but I am not sure, so plz clarify me) I am refering http://en.wikipedia.org/wiki/Bayesian_spam_filtering for the info by the way. I got to complete the implementation by this weekend :( Thanks and regards, MicroKernel :) @sth: Hmm... Shouldn't repeated occurrence of word 'W' increase a message's spam score? In the your approach it wouldn't, right?. Lets take a scenario and discuss... Lets say, we have 100 training messages, out of which 50 are spam and 50 are Ham. and say word_count of each message = 100. And lets say, in spam messages word W occurs 5 times in each message and word W occurs 1 time in Ham message. So total number of times W occuring in all the spam message = 5*50 = 250 times. And total number of times W occuring in all Ham messages = 1*50 = 50 times. Total occurance of W in all of the training messages = (250+50) = 300 times. So, in this scenario, how do u calculate P(W|S) and P(W|H) ? Naturally we should expect, P(W|S) P(W|H)??? right. Please share your thought...

Read the article

DB Design Pattern - Many to many classification / categorised tagging.

- by Robin Day

I have an existing database design that stores Job Vacancies. The "Vacancy" table has a number of fixed fields across all clients, such as "Title", "Description", "Salary range". There is an EAV design for "Custom" fields that the Clients can setup themselves, such as, "Manager Name", "Working Hours". The field names are stored in a "ClientText" table and the data stored in a "VacancyClientText" table with VacancyId, ClientTextId and Value. Lastly there is a many to many EAV design for custom tagging / categorising the vacancies with things such as Locations/Offices the vacancy is in, a list of skills required. This is stored as a "ClientCategory" table listing the types of tag, "Locations, Skills", a "ClientCategoryItem" table listing the valid values for each Category, e.g., "London,Paris,New York,Rome", "C#,VB,PHP,Python". Finally there is a "VacancyClientCategoryItem" table with VacancyId and ClientCategoryItemId for each of the selected items for the vacancy. There are no limits to the number of custom fields or custom categories that the client can add. I am now designing a new system that is very similar to the existing system, however, I have the ability to restrict the number of custom fields a Client can have and it's being built from scratch so I have no legacy issues to deal with. For the Custom Fields my solution is simple, I have 5 additional columns on the Vacancy Table called CustomField1-5. This removes one of the EAV designs. It is with the tagging / categorising design that I am struggling. If I limit a client to having 5 categories / types of tag. Should I create 5 tables listing the possible values "CustomCategoryItems1-5" and then an additional 5 many to many tables "VacancyCustomCategoryItem1-5" This would result in 10 tables performing the same storage as the three tables in the existing system. Also, should (heaven forbid) the requirements change in that I need 6 custom categories rather than 5 then this will result in a lot of code change. Therefore, can anyone suggest any DB Design Patterns that would be more suitable to storing such data. I'm happy to stick with the EAV approach, however, the existing system has come across all the usual performance issues and complex queries associated with such a design. Any advice / suggestions are much appreciated. The DBMS system used is SQL Server 2005, however, 2008 is an option if required for any particular pattern.

Read the article

how do I interpret rpart splits on factor variables when building classification trees in R?

- by artessa

ie, if the factor variable is Climate, with 4 possible values: Tropical, Arid, Temperate, Snow, and a node in my rpart tree is labeled as "Climate:ab", what is the split?

Read the article

Thematic map contd.

- by jsharma

The previous post (creating a thematic map) described the use of an advanced style (color ranged-bucket style). The bucket style definition object has an attribute ('classification') which specifies the data classification scheme to use. It's values can be one of {'equal', 'quantile', 'logarithmic', 'custom'}. We use logarithmic in the previous example. Here we'll describe how to use a custom algorithm for classification. Specifically the Jenks Natural Breaks algorithm. We'll use the Javascript implementation in geostats.js The sample code above needs a few changes which are listed below. Include the geostats.js file after or before including oraclemapsv2.js <script src="geostats.js"></script> Modify the bucket style definition to use custom classification Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} bucketStyleDef = { numClasses : colorSeries[colorName].classes, classification: 'custom', //'logarithmic', // use a logarithmic scale algorithm: jenksFromGeostats, styles: theStyles, gradient: useGradient? 'linear' : 'off' }; The function, which implements the custom classification scheme, is specified as the algorithm attribute value. It must accept two input parameters, an array of OM.feature and the name of the feature attribute (e.g. TOTPOP) to use in the classification, and must return an array of buckets (i.e. an array of or OM.style.Bucket or OM.style.RangedBucket in this case). However the algorithm also needs to know the number of classes (i.e. the number of buckets to create). So we use a global to pass that info in. (Note: This bug/oversight will be fixed and the custom algorithm will be passed 3 parameters: the features array, attribute name, and number of classes). So createBucketColorStyle() has the following changes Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} var numClasses ; function createBucketColorStyle( colorName, colorSeries, rangeName, useGradient) { var theBucketStyle; var bucketStyleDef; var theStyles = []; //var numClasses ; numClasses = colorSeries[colorName].classes; ... and the function jenksFromGeostats is defined as Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} function jenksFromGeostats(featureArray, columnName) { var items = [] ; // array of attribute values to be classified $.each(featureArray, function(i, feature) { items.push(parseFloat(feature.getAttributeValue(columnName))); }); // create the geostats object var theSeries = new geostats(items); // call getJenks which returns an array of bounds var theClasses = theSeries.getJenks(numClasses); if(theClasses) { theClasses[theClasses.length-1]=parseFloat(theClasses[theClasses.length-1])+1; } else { alert(' empty result from getJenks'); } var theBuckets = [], aBucket=null ; for(var k=0; k<numClasses; k++) { aBucket = new OM.style.RangedBucket( {low:parseFloat(theClasses[k]), high:parseFloat(theClasses[k+1]) }); theBuckets.push(aBucket); } return theBuckets; } A screenshot of the resulting map with 5 classes is shown below. It is also possible to simply create the buckets and supply them when defining the Bucket style instead of specifying the function (algorithm). In that case the bucket style definition object would be bucketStyleDef = { numClasses : colorSeries[colorName].classes, classification: 'custom', buckets: theBuckets, //since we are supplying all the buckets styles: theStyles, gradient: useGradient? 'linear' : 'off' };

Read the article

Database warehouse design: fact tables and dimension tables

- by morpheous

I am building a poor man's data warehouse using a RDBMS. I have identified the key 'attributes' to be recorded as: sex (true/false) demographic classification (A, B, C etc) place of birth date of birth weight (recorded daily): The fact that is being recorded My requirements are to be able to run 'OLAP' queries that allow me to: 'slice and dice' 'drill up/down' the data and generally, be able to view the data from different perspectives After reading up on this topic area, the general consensus seems to be that this is best implemented using dimension tables rather than normalized tables. Assuming that this assertion is true (i.e. the solution is best implemented using fact and dimension tables), I would like to seek some help in the design of these tables. 'Natural' (or obvious) dimensions are: Date dimension Geographical location Which have hierarchical attributes. However, I am struggling with how to model the following fields: sex (true/false) demographic classification (A, B, C etc) The reason I am struggling with these fields is that: They have no obvious hierarchical attributes which will aid aggregation (AFAIA) - which suggest they should be in a fact table They are mostly static or very rarely change - which suggests they should be in a dimension table. Maybe the heuristic I am using above is too crude? I will give some examples on the type of analysis I would like to carryout on the data warehouse - hopefully that will clarify things further. I would like to aggregate and analyze the data by sex and demographic classification - e.g. answer questions like: How does male and female weights compare across different demographic classifications? Which demographic classification (male AND female), show the most increase in weight this quarter. etc. Can anyone clarify whether sex and demographic classification are part of the fact table, or whether they are (as I suspect) dimension tables.? Also assuming they are dimension tables, could someone elaborate on the table structures (i.e. the fields)? The 'obvious' schema: CREATE TABLE sex_type (is_male int); CREATE TABLE demographic_category (id int, name varchar(4)); may not be the correct one.

Read the article

Database warehoue design: fact tables and dimension tables

- by morpheous

I am building a poor man's data warehouse using a RDBMS. I have identified the key 'attributes' to be recorded as: sex (true/false) demographic classification (A, B, C etc) place of birth date of birth weight (recorded daily): The fact that is being recorded My requirements are to be able to run 'OLAP' queries that allow me to: 'slice and dice' 'drill up/down' the data and generally, be able to view the data from different perspectives After reading up on this topic area, the general consensus seems to be that this is best implemented using dimension tables rather than normalized tables. Assuming that this assertion is true (i.e. the solution is best implemented using fact and dimension tables), I would like to see some help in the design of these tables. 'Natural' (or obvious) dimensions are: Date dimension Geographical location Which have hierarchical attributes. However, I am struggling with how to model the following fields: sex (true/false) demographic classification (A, B, C etc) The reason I am struggling with these fields is that: They have no obvious hierarchical attributes which will aid aggregation (AFAIA) - which suggest they should be in a fact table They are mostly static or very rarely change - which suggests they should be in a dimension table. Maybe the heuristic I am using above is too crude? I will give some examples on the type of analysis I would like to carryout on the data warehouse - hopefully that will clarify things further. I would like to aggregate and analyze the data by sex and demographic classification - e.g. answer questions like: How does male and female weights compare across different demographic classifications? Which demographic classification (male AND female), show the most increase in weight this quarter. etc. Can anyone clarify whether sex and demographic classification are part of the fact table, or whether they are (as I suspect) dimension tables.? Also assuming they are dimension tables, could someone elaborate on the table structures (i.e. the fields)? The 'obvious' schema: CREATE TABLE sex_type (is_male int); CREATE TABLE demographic_category (id int, name varchar(4)); may not be the correct one.

Read the article

Machine Learning Algorithm for Peer-to-Peer Nodes

- by FreshCode

I want to apply machine learning to a classification problem in a parallel environment. Several independent nodes, each with multiple on/off sensors, can communicate their sensor data with the goal of classifying an event as defined by a heuristic, training data or both. Each peer will be measuring the same data from their unique perspective and will attempt to classify the result while taking into account that any neighbouring node (or its sensors or just the connection to the node) could be faulty. Nodes should function as equal peers and determine the most likely classification by communicating their results. Ultimately each node should make a decision based on their own sensor data and their peers' data. If it matters, false positives are OK for certain classifications (albeit undesirable) but false negatives would be totally unacceptable. Given that each final classification will receive good or bad feedback, what would be an appropriate machine learning algorithm to approach this problem with if the nodes could communicate with each other to determine the most likely classification?

Read the article

Machine Learning Algorithm for Parallel Nodes

- by FreshCode

I want to apply machine learning to a classification problem in a parallel environment. Several independent nodes, each with multiple on/off sensors, can communicate their sensor data with the goal of classifying an event defined by a heuristic, training data or both. Each peer will be measuring the same data from their unique perspective and will attempt to classify the result while taking into account that any neighbouring node (or its sensors or just the connection to the node) could be faulty. Nodes should function as equal peers and determine the most likely classification by communicating their results. Ultimately each node should make a decision based on their own sensor data and their peers' data. If it matters, false positives are OK (albeit undesirable) but false negatives are totally unacceptable. Given that each final classification will receive good or bad feedback, what would be an appropriate machine learning algorithm to approach this problem with if the nodes could communicate with each other to determine the most likely classification?

Read the article

SQL Server 2008 - Keyword search using table Join

- by Aaron Wagner

Ok, I created a Stored Procedure that, among other things, is searching 5 columns for a particular keyword. To accomplish this, I have the keywords parameter being split out by a function and returned as a table. Then I do a Left Join on that table, using a LIKE constraint. So, I had this working beautifully, and then all of the sudden it stops working. Now it is returning every row, instead of just the rows it needs. The other caveat, is that if the keyword parameter is empty, it should ignore it. Given what's below, is there A) a glaring mistake, or B) a more efficient way to approach this? Here is what I have currently: ALTER PROCEDURE [dbo].[usp_getOppsPaged] @startRowIndex int, @maximumRows int, @city varchar(100) = NULL, @state char(2) = NULL, @zip varchar(10) = NULL, @classification varchar(15) = NULL, @startDateMin date = NULL, @startDateMax date = NULL, @endDateMin date = NULL, @endDateMax date = NULL, @keywords varchar(400) = NULL AS BEGIN SET NOCOUNT ON; ;WITH Results_CTE AS ( SELECT opportunities.*, organizations.*, departments.dept_name, departments.dept_address, departments.dept_building_name, departments.dept_suite_num, departments.dept_city, departments.dept_state, departments.dept_zip, departments.dept_international_address, departments.dept_phone, departments.dept_website, departments.dept_gen_list, ROW_NUMBER() OVER (ORDER BY opp_id) AS RowNum FROM opportunities JOIN departments ON opportunities.dept_id = departments.dept_id JOIN organizations ON departments.org_id=organizations.org_id LEFT JOIN Split(',',@keywords) AS kw ON (title LIKE '%'+kw.s+'%' OR [description] LIKE '%'+kw.s+'%' OR tasks LIKE '%'+kw.s+'%' OR requirements LIKE '%'+kw.s+'%' OR comments LIKE '%'+kw.s+'%') WHERE ( (@city IS NOT NULL AND (city LIKE '%'+@city+'%' OR dept_city LIKE '%'+@city+'%' OR org_city LIKE '%'+@city+'%')) OR (@state IS NOT NULL AND ([state] = @state OR dept_state = @state OR org_state = @state)) OR (@zip IS NOT NULL AND (zip = @zip OR dept_zip = @zip OR org_zip = @zip)) OR (@classification IS NOT NULL AND (classification LIKE '%'+@classification+'%')) OR ((@startDateMin IS NOT NULL AND @startDateMax IS NOT NULL) AND ([start_date] BETWEEN @startDateMin AND @startDateMax)) OR ((@endDateMin IS NOT NULL AND @endDateMax IS NOT NULL) AND ([end_date] BETWEEN @endDateMin AND @endDateMax)) OR ( (@city IS NULL AND @state IS NULL AND @zip IS NULL AND @classification IS NULL AND @startDateMin IS NULL AND @startDateMax IS NULL AND @endDateMin IS NULL AND @endDateMin IS NULL) ) ) ) SELECT * FROM Results_CTE WHERE RowNum >= @startRowIndex AND RowNum < @startRowIndex + @maximumRows; END

Search Results

Search found 227 results on 10 pages for 'classification'.

Page 3/10 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 | Next Page >

- by user1780592

- by user963386

- by Ivo Danihelka

- by Benjamin Smith

- by Tom Gullen

- by ryandi

- by csl

- by burtonic

- by Mala Narasimharajan

- by Greg Beech

- by martin.abrahams

- by kaushik

- by FMFF

- by Simon Thorpe

- by Simon Thorpe

- by rahulkamath

- by Microkernel

- by Robin Day

- by artessa

- by jsharma

- by morpheous

- by morpheous

- by FreshCode

- by FreshCode

- by Aaron Wagner

< Previous Page | 1 2 3 4 5 6 7 8 9 10 | Next Page >