Search Results

Search found 36 results on 2 pages for 'datamining'.

Page 2/2 | < Previous Page | 1 2 

  • Free NOSQL database for use with C# client [closed]

    - by Mitten
    I've never used NOSQL databases before, but so far it seems like the best data storage solution for my project. I am going to implement a datamining application. The data I would like to mine is thousands of documents which cannot be imported into datamining applications. To make to import easier and faster (than importing thousands of documents) I am planning to import these documents into a NOSQL database first and when import NOSQL database into datamining software. At the very least once I have all the data in NOSQL database I should be able to code simplest datamining logic myself. Am I correct that NOSQL databases allow to creates records of data, but they don't mandate all the records to adhere to the same data schema (same column names/types in a classic table oriended SQL databases)? I think for each document I would create a row/entry/object (not sure what is the correct term is in use in NOSQL world) which would be a string id, few (columns) with unstructured text data, and a dozens of columns mostly of datetime and integer types. From its name NOSQL does not support SQL query syntax, but it support locating the object(row/entry?) by its unique id. Does NOSQL support qyuering objects using property=value syntax? Unfortunately most of free NOSQL db only support Java/C++ clients, which free NOSQL db would you recommend for a C# programmer?

    Read the article

  • Extending Oracle CEP with Predictive Analytics

    - by vikram.shukla(at)oracle.com
    Introduction: OCEP is often used as a business rules engine to execute a set of business logic rules via CQL statements, and take decisions based on the outcome of those rules. There are times where configuring rules manually is sufficient because an application needs to deal with only a small and well-defined set of static rules. However, in many situations customers don't want to pre-define such rules for two reasons. First, they are dealing with events with lots of columns and manually crafting such rules for each column or a set of columns and combinations thereof is almost impossible. Second, they are content with probabilistic outcomes and do not care about 100% precision. The former is the case when a user is dealing with data with high dimensionality, the latter when an application can live with "false" positives as they can be discarded after further inspection, say by a Human Task component in a Business Process Management software. The primary goal of this blog post is to show how this can be achieved by combining OCEP with Oracle Data Mining® and leveraging the latter's rich set of algorithms and functionality to do predictive analytics in real time on streaming events. The secondary goal of this post is also to show how OCEP can be extended to invoke any arbitrary external computation in an RDBMS from within CEP. The extensible facility is known as the JDBC cartridge. The rest of the post describes the steps required to achieve this: We use the dataset available at http://blogs.oracle.com/datamining/2010/01/fraud_and_anomaly_detection_made_simple.html to showcase the capabilities. We use it to show how transaction anomalies or fraud can be detected. Building the model: Follow the self-explanatory steps described at the above URL to build the model.  It is very simple - it uses built-in Oracle Data Mining PL/SQL packages to cleanse, normalize and build the model out of the dataset.  You can also use graphical Oracle Data Miner®  to build the models. To summarize, it involves: Specifying which algorithms to use. In this case we use Support Vector Machines as we're trying to find anomalies in highly dimensional dataset.Build model on the data in the table for the algorithms specified. For this example, the table was populated in the scott/tiger schema with appropriate privileges. Configuring the Data Source: This is the first step in building CEP application using such an integration.  Our datasource looks as follows in the server config file.  It is advisable that you use the Visualizer to add it to the running server dynamically, rather than manually edit the file.    <data-source>         <name>DataMining</name>         <data-source-params>             <jndi-names>                 <element>DataMining</element>             </jndi-names>             <global-transactions-protocol>OnePhaseCommit</global-transactions-protocol>         </data-source-params>         <connection-pool-params>             <credential-mapping-enabled></credential-mapping-enabled>             <test-table-name>SQL SELECT 1 from DUAL</test-table-name>             <initial-capacity>1</initial-capacity>             <max-capacity>15</max-capacity>             <capacity-increment>1</capacity-increment>         </connection-pool-params>         <driver-params>             <use-xa-data-source-interface>true</use-xa-data-source-interface>             <driver-name>oracle.jdbc.OracleDriver</driver-name>             <url>jdbc:oracle:thin:@localhost:1522:orcl</url>             <properties>                 <element>                     <value>scott</value>                     <name>user</name>                 </element>                 <element>                     <value>{Salted-3DES}AzFE5dDbO2g=</value>                     <name>password</name>                 </element>                                 <element>                     <name>com.bea.core.datasource.serviceName</name>                     <value>oracle11.2g</value>                 </element>                 <element>                     <name>com.bea.core.datasource.serviceVersion</name>                     <value>11.2.0</value>                 </element>                 <element>                     <name>com.bea.core.datasource.serviceObjectClass</name>                     <value>java.sql.Driver</value>                 </element>             </properties>         </driver-params>     </data-source>   Designing the EPN: The EPN is very simple in this example. We briefly describe each of the components. The adapter ("DataMiningAdapter") reads data from a .csv file and sends it to the CQL processor downstream. The event payload here is same as that of the table in the database (refer to the attached project or do a "desc table-name" from a SQL*PLUS prompt). While this is for convenience in this example, it need not be the case. One can still omit fields in the streaming events, and need not match all columns in the table on which the model was built. Better yet, it does not even need to have the same name as columns in the table, as long as you alias them in the USING clause of the mining function. (Caveat: they still need to draw values from a similar universe or domain, otherwise it constitutes incorrect usage of the model). There are two things in the CQL processor ("DataMiningProc") that make scoring possible on streaming events. 1.      User defined cartridge function Please refer to the OCEP CQL reference manual to find more details about how to define such functions. We include the function below in its entirety for illustration. <?xml version="1.0" encoding="UTF-8"?> <jdbcctxconfig:config     xmlns:jdbcctxconfig="http://www.bea.com/ns/wlevs/config/application"     xmlns:jc="http://www.oracle.com/ns/ocep/config/jdbc">        <jc:jdbc-ctx>         <name>Oracle11gR2</name>         <data-source>DataMining</data-source>               <function name="prediction2">                                 <param name="CQLMONTH" type="char"/>                      <param name="WEEKOFMONTH" type="int"/>                      <param name="DAYOFWEEK" type="char" />                      <param name="MAKE" type="char" />                      <param name="ACCIDENTAREA"   type="char" />                      <param name="DAYOFWEEKCLAIMED"  type="char" />                      <param name="MONTHCLAIMED" type="char" />                      <param name="WEEKOFMONTHCLAIMED" type="int" />                      <param name="SEX" type="char" />                      <param name="MARITALSTATUS"   type="char" />                      <param name="AGE" type="int" />                      <param name="FAULT" type="char" />                      <param name="POLICYTYPE"   type="char" />                      <param name="VEHICLECATEGORY"  type="char" />                      <param name="VEHICLEPRICE" type="char" />                      <param name="FRAUDFOUND" type="int" />                      <param name="POLICYNUMBER" type="int" />                      <param name="REPNUMBER" type="int" />                      <param name="DEDUCTIBLE"   type="int" />                      <param name="DRIVERRATING"  type="int" />                      <param name="DAYSPOLICYACCIDENT"   type="char" />                      <param name="DAYSPOLICYCLAIM" type="char" />                      <param name="PASTNUMOFCLAIMS" type="char" />                      <param name="AGEOFVEHICLES" type="char" />                      <param name="AGEOFPOLICYHOLDER" type="char" />                      <param name="POLICEREPORTFILED" type="char" />                      <param name="WITNESSPRESNT" type="char" />                      <param name="AGENTTYPE" type="char" />                      <param name="NUMOFSUPP" type="char" />                      <param name="ADDRCHGCLAIM"   type="char" />                      <param name="NUMOFCARS" type="char" />                      <param name="CQLYEAR" type="int" />                      <param name="BASEPOLICY" type="char" />                                     <return-component-type>char</return-component-type>                                                      <sql><![CDATA[             SELECT to_char(PREDICTION_PROBABILITY(CLAIMSMODEL, '0' USING *))               AS probability             FROM (SELECT  :CQLMONTH AS MONTH,                                            :WEEKOFMONTH AS WEEKOFMONTH,                          :DAYOFWEEK AS DAYOFWEEK,                           :MAKE AS MAKE,                           :ACCIDENTAREA AS ACCIDENTAREA,                           :DAYOFWEEKCLAIMED AS DAYOFWEEKCLAIMED,                           :MONTHCLAIMED AS MONTHCLAIMED,                           :WEEKOFMONTHCLAIMED,                             :SEX AS SEX,                           :MARITALSTATUS AS MARITALSTATUS,                            :AGE AS AGE,                           :FAULT AS FAULT,                           :POLICYTYPE AS POLICYTYPE,                            :VEHICLECATEGORY AS VEHICLECATEGORY,                           :VEHICLEPRICE AS VEHICLEPRICE,                           :FRAUDFOUND AS FRAUDFOUND,                           :POLICYNUMBER AS POLICYNUMBER,                           :REPNUMBER AS REPNUMBER,                           :DEDUCTIBLE AS DEDUCTIBLE,                            :DRIVERRATING AS DRIVERRATING,                           :DAYSPOLICYACCIDENT AS DAYSPOLICYACCIDENT,                            :DAYSPOLICYCLAIM AS DAYSPOLICYCLAIM,                           :PASTNUMOFCLAIMS AS PASTNUMOFCLAIMS,                           :AGEOFVEHICLES AS AGEOFVEHICLES,                           :AGEOFPOLICYHOLDER AS AGEOFPOLICYHOLDER,                           :POLICEREPORTFILED AS POLICEREPORTFILED,                           :WITNESSPRESNT AS WITNESSPRESENT,                           :AGENTTYPE AS AGENTTYPE,                           :NUMOFSUPP AS NUMOFSUPP,                           :ADDRCHGCLAIM AS ADDRCHGCLAIM,                            :NUMOFCARS AS NUMOFCARS,                           :CQLYEAR AS YEAR,                           :BASEPOLICY AS BASEPOLICY                 FROM dual)                 ]]>         </sql>        </function>     </jc:jdbc-ctx> </jdbcctxconfig:config> 2.      Invoking the function for each event. Once this function is defined, you can invoke it from CQL as follows: <?xml version="1.0" encoding="UTF-8"?> <wlevs:config xmlns:wlevs="http://www.bea.com/ns/wlevs/config/application">   <processor>     <name>DataMiningProc</name>     <rules>        <query id="q1"><![CDATA[                     ISTREAM(SELECT S.CQLMONTH,                                   S.WEEKOFMONTH,                                   S.DAYOFWEEK, S.MAKE,                                   :                                         S.BASEPOLICY,                                    C.F AS probability                                                 FROM                                 StreamDataChannel [NOW] AS S,                                 TABLE(prediction2@Oracle11gR2(S.CQLMONTH,                                      S.WEEKOFMONTH,                                      S.DAYOFWEEK,                                       S.MAKE, ...,                                      S.BASEPOLICY) AS F of char) AS C)                       ]]></query>                 </rules>               </processor>           </wlevs:config>   Finally, the last stage in the EPN prints out the probability of the event being an anomaly. One can also define a threshold in CQL to filter out events that are normal, i.e., below a certain mark as defined by the analyst or designer. Sample Runs: Now let's see how this behaves when events are streamed through CEP. We use only two events for brevity, one normal and other one not. This is one of the "normal" looking events and the probability of it being anomalous is less than 60%. Event is: eventType=DataMiningOutEvent object=q1  time=2904821976256 S.CQLMONTH=Dec, S.WEEKOFMONTH=5, S.DAYOFWEEK=Wednesday, S.MAKE=Honda, S.ACCIDENTAREA=Urban, S.DAYOFWEEKCLAIMED=Tuesday, S.MONTHCLAIMED=Jan, S.WEEKOFMONTHCLAIMED=1, S.SEX=Female, S.MARITALSTATUS=Single, S.AGE=21, S.FAULT=Policy Holder, S.POLICYTYPE=Sport - Liability, S.VEHICLECATEGORY=Sport, S.VEHICLEPRICE=more than 69000, S.FRAUDFOUND=0, S.POLICYNUMBER=1, S.REPNUMBER=12, S.DEDUCTIBLE=300, S.DRIVERRATING=1, S.DAYSPOLICYACCIDENT=more than 30, S.DAYSPOLICYCLAIM=more than 30, S.PASTNUMOFCLAIMS=none, S.AGEOFVEHICLES=3 years, S.AGEOFPOLICYHOLDER=26 to 30, S.POLICEREPORTFILED=No, S.WITNESSPRESENT=No, S.AGENTTYPE=External, S.NUMOFSUPP=none, S.ADDRCHGCLAIM=1 year, S.NUMOFCARS=3 to 4, S.CQLYEAR=1994, S.BASEPOLICY=Liability, probability=.58931702982118561 isTotalOrderGuarantee=true\nAnamoly probability: .58931702982118561 However, the following event is scored as an anomaly with a very high probability of  89%. So there is likely to be something wrong with it. A close look reveals that the value of "deductible" field (10000) is not "normal". What exactly constitutes normal here?. If you run the query on the database to find ALL distinct values for the "deductible" field, it returns the following set: {300, 400, 500, 700} Event is: eventType=DataMiningOutEvent object=q1  time=2598483773496 S.CQLMONTH=Dec, S.WEEKOFMONTH=5, S.DAYOFWEEK=Wednesday, S.MAKE=Honda, S.ACCIDENTAREA=Urban, S.DAYOFWEEKCLAIMED=Tuesday, S.MONTHCLAIMED=Jan, S.WEEKOFMONTHCLAIMED=1, S.SEX=Female, S.MARITALSTATUS=Single, S.AGE=21, S.FAULT=Policy Holder, S.POLICYTYPE=Sport - Liability, S.VEHICLECATEGORY=Sport, S.VEHICLEPRICE=more than 69000, S.FRAUDFOUND=0, S.POLICYNUMBER=1, S.REPNUMBER=12, S.DEDUCTIBLE=10000, S.DRIVERRATING=1, S.DAYSPOLICYACCIDENT=more than 30, S.DAYSPOLICYCLAIM=more than 30, S.PASTNUMOFCLAIMS=none, S.AGEOFVEHICLES=3 years, S.AGEOFPOLICYHOLDER=26 to 30, S.POLICEREPORTFILED=No, S.WITNESSPRESENT=No, S.AGENTTYPE=External, S.NUMOFSUPP=none, S.ADDRCHGCLAIM=1 year, S.NUMOFCARS=3 to 4, S.CQLYEAR=1994, S.BASEPOLICY=Liability, probability=.89171554529576691 isTotalOrderGuarantee=true\nAnamoly probability: .89171554529576691 Conclusion: By way of this example, we show: real-time scoring of events as they flow through CEP leveraging Oracle Data Mining.how CEP applications can invoke complex arbitrary external computations (function shipping) in an RDBMS.

    Read the article

  • What framework & technology would you use to make a site like fancy.com [on hold]

    - by adriancdperu
    Im about to start a 3 month process to build something similar to fancy.com. What are your opinions about: best language + framework? (server - side) = I was thinking about LAMP and CakePHP important technological issues to consider when developing? MAU assumption = 1 million. Main features: Registration with Facebook Normal registration Search articles Like articles Post articles Comment on articles Suggest articles to friends via mail, Facebook, Twitter and about 3 or 4 more apis Ranking of articles as a cron job done every minute, many criterias, many rankings Follow users Datamining users to mail everyday with articles they have high probability of liking Operation tools for admins to add articles and close user accounts Mobile focus

    Read the article

  • Azure Futures - Distributed Computing and Number Crunching

    - by JoshReuben
    "the biggest Azure customers today are the ones using HPC on-premises at the current time" - http://www.zdnet.com/blog/microsoft/windows-azure-futures-turning-the-cloud-into-a-supercomputer/8592?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+zdnet%2Fmicrosoft+%28ZDNet+All+About+Microsoft%29&utm_content=Google+Reader   Orleans Framework for cloud computing - http://research.microsoft.com/en-us/projects/orleans     HPC on Azure - http://www.zdnet.com/blog/microsoft/microsoft-finalizes-its-latest-supercomputing-operating-system-release/7414   Dryad is Microsoft’s competitor to Google MapReduce and Apache Hadoop  - http://www.zdnet.com/blog/microsoft/microsoft-takes-a-step-toward-commercializing-its-dryad-distributed-computing-technologies/8255?tag=mantle_skin;content   SQL Server Analysis Services DataMining in the cloud - http://www.sqlmag.com/article/reporting2/azure-data-mining-in-the-cloud.aspx

    Read the article

  • SQLAuthority News – Feature Pack for Microsoft SQL Server 2005 SP4

    - by pinaldave
    If you are still using SQL Server 2005 – I suggest that you consider migrating to later version of the SQL Server 2008/2008 R2. Due to any reason, you wanted to continue using SQL Server 2005, I suggest that you take a look at the Feature Pack for Microsoft SQL Server 2005 SP4. There are many different tools and features available in pack, which can be very handy and can solve issues. Microsoft ADOMD.NET Microsoft Core XML Services (MSXML) 6.0 Microsoft OLEDB Provider for DB2 Microsoft SQL Server Management Pack for MOM 2005 Microsoft SQL Server 2000 PivotTable Services Microsoft SQL Server 2000 DTS Designer Components Microsoft SQL Server Native Client Microsoft SQL Server 2005 Analysis Services 9.0 OLE DB Provider Microsoft SQL Server 2005 Backward Compatibility Components Microsoft SQL Server 2005 Command Line Query Utility Microsoft SQL Server 2005 Datamining Viewer Controls Microsoft SQL Server 2005 JDBC Driver Microsoft SQL Server 2005 Management Objects Collection Microsoft SQL Server 2005 Compact Edition Microsoft SQL Server 2005 Notification Services Client Components Microsoft SQL Server 2005 Upgrade Advisor Microsoft .NET Data Provider for mySAP Business Suite, Preview Version Reporting Add-In for Microsoft Visual Web Developer 2005 Express Microsoft Exception Message Box Data Mining Managed Plug-in Algorithm API for SQL Server 2005 Microsoft SQL Server 2005 Reporting Services Add-in for Microsoft SharePoint Technologies Microsoft SQL Server 2005 Data Mining Add-ins for Microsoft Office 2007 SQL Server 2005 Performance Dashboard Reports SQL Server 2005 Best Practices Analyzer Download Feature Pack for Microsoft SQL Server 2005 SP4 Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: SQL, SQL Authority, SQL Documentation, SQL Download, SQL Query, SQL Server, SQL Service Pack, SQL Tips and Tricks, SQLAuthority News, T SQL, Technology

    Read the article

  • links for 2010-04-29

    - by Bob Rhubart
    AS11 Oracle B2B Sync Support - Series 1 (Oracle Fusion Middleware - B2B Team Blog) Sinkarbabu Kirubanithi with part 1 of a planned 3-part series on synchronous message support in Oracle B2B 11g. (tags: oracle otn fusionmiddleware b2b) Java 2 Go!: How to write a simple yet “bullet-proof” object cache "So, while we were thinking hard to come up with the most efficient, generic and elegant way of finally implementing our weak and soft caches, Mr. Eric Chan, who is one of the main architects in Oracle Beehive team, had a very interesting breakthrough. In short terms, he thought of a very nice way of combining both WeakReference and SoftReference in our weak and soft caches so that they would provide exactly the same functionality without having to deal with those reference queues at all. Basically, instead of using a plain HashMap as our backing storage, we used a java.util.WeakHashMap in both our cache implementations. The hat trick was what and how to store things in it." - Eduardo Rodrigues (tags: oracle java sun) @jamet123: First Look – Oracle Data Mining "[Oracle Data Mining] is a nice product for Oracle database customers and well worth looking into. The new UI will only make it more so." James Taylor (tags: oracle otn datamining database) Live Webcast: Social BPM: Integrating Enterprise 2.0 with Business Applications #oracle Peggy Chen and Dan Tortorici show you how to take your business to the next level with a unified solution that fosters process-based collaboration between employees, partners, and customers. Wednesday, May 12, 2010 at 11:00am PT / 2:00pm ET (tags: oracle otn enterprise2.0 webcast)

    Read the article

  • F# books question

    - by Michiel Borkent
    I am now reading Foundations of F# by Robert Pickering and parallelly the book in progress 'Real World Functional Programming' by Tomas Petricek. My question is, what is the added value I would get from buying and reading the following books: 1) Expert F# by Don Syme and others 2) F# for Scientists by John Harrop Are those books still up to date with the current CTP version. What are things to keep notice of with respect to the recent changes in the language? Will there be reprinted updated versions? Also I want to learn more about datamining techniques with F# as a tool for this. What are good books to read next on this topic?

    Read the article

  • New Perl user: using a hash of arrays

    - by Zach H
    I'm doing a little datamining project where a perl script grabs info from a SQL database and parses it. The data consists of several timestamps. I want to find how many of a particular type of timestamp exist on any particular day. Unfortunately, this is my first perl script, and the nature of perl when it comes to hashes and arrays is confusing me quite a bit. Code segment: my %values=();#A hash of the total values of each type of data of each day. #The key is the day, and each key stores an array of each of the values I need. my @proposal; #[drafted timestamp(0), submitted timestamp(1), attny approved timestamp(2),Organiziation approved timestamp(3), Other approval timestamp(4), Approved Timestamp(5)] while(@proposal=$sqlresults->fetchrow_array()){ #TODO: check to make sure proposal is valid #Increment the number of timestamps of each type on each particular date my $i; for($i=0;$i<=5;$i++) $values{$proposal[$i]}[$i]++; #Update rolling average of daily #TODO: To check total load, increment total load on all dates between attourney approve date and accepted date for($i=$proposal[1];$i<=$proposal[2];$i++) $values{$i}[6]++; } I keep getting syntax errors inside the for loops incrementing values. Also, considering that I'm using strict and warnings, will Perl auto-create arrays of the right values when I'm accessing them inside the hash, or will I get out-of bounds errors everywhere? Thanks for any help, Zach

    Read the article

  • Real time location tracking - windows program or browser based?

    - by mawg
    I want to track a few hundred, maybe a few thousand people in real time. Let's say that the hardware aspects are sorted out and I can get the data into a database. Now, I want to get it out and show it, in real-time. Weeeell ... "real-enough" time. Let's say that I want to draw a floorplan of a building and plot everyone every 1 to 5 seconds. (I might want to show only certain "kinds" of people at the click of a button; I will need datamining, etc, but let's stick with the worse case scenario). I am comfortable enough with PHP, though not this sort of thing. I personally would be happier with a windows app coded in Delphi, but the trend seems to be to make everything browser based. So, the question, I guess is whether a browser can handle this and whether there are compelling arguments for a windows-based or browser-based solution. If browser-based can handle this (displaying a few thousand data-points a second), and there are no overwhelming arguments for windows then I guess I will go for browser-based and learn a few new tricks. The obvious advantage being that I could also re-use a large part of my code for (vehicle) tracking on Google maps.

    Read the article

  • Using a "white list" for extracting terms for Text Mining

    - by [email protected]
    In Part 1 of my post on "Generating cluster names from a document clustering model" (part 1, part 2, part 3), I showed how to build a clustering model from text documents using Oracle Data Miner, which automates preparing data for text mining. In this process we specified a custom stoplist and lexer and relied on Oracle Text to identify important terms.  However, there is an alternative approach, the white list, which uses a thesaurus object with the Oracle Text CTXRULE index to allow you to specify the important terms. INTRODUCTIONA stoplist is used to exclude, i.e., black list, specific words in your documents from being indexed. For example, words like a, if, and, or, and but normally add no value when text mining. Other words can also be excluded if they do not help to differentiate documents, e.g., the word Oracle is ubiquitous in the Oracle product literature. One problem with stoplists is determining which words to specify. This usually requires inspecting the terms that are extracted, manually identifying which ones you don't want, and then re-indexing the documents to determine if you missed any. Since a corpus of documents could contain thousands of words, this could be a tedious exercise. Moreover, since every word is considered as an individual token, a term excluded in one context may be needed to help identify a term in another context. For example, in our Oracle product literature example, the words "Oracle Data Mining" taken individually are not particular helpful. The term "Oracle" may be found in nearly all documents, as with the term "Data." The term "Mining" is more unique, but could also refer to the Mining industry. If we exclude "Oracle" and "Data" by specifying them in the stoplist, we lose valuable information. But it we include them, they may introduce too much noise. Still, when you have a broad vocabulary or don't have a list of specific terms of interest, you rely on the text engine to identify important terms, often by computing the term frequency - inverse document frequency metric. (This is effectively a weight associated with each term indicating its relative importance in a document within a collection of documents. We'll revisit this later.) The results using this technique is often quite valuable. As noted above, an alternative to the subtractive nature of the stoplist is to specify a white list, or a list of terms--perhaps multi-word--that we want to extract and use for data mining. The obvious downside to this approach is the need to specify the set of terms of interest. However, this may not be as daunting a task as it seems. For example, in a given domain (Oracle product literature), there is often a recognized glossary, or a list of keywords and phrases (Oracle product names, industry names, product categories, etc.). Being able to identify multi-word terms, e.g., "Oracle Data Mining" or "Customer Relationship Management" as a single token can greatly increase the quality of the data mining results. The remainder of this post and subsequent posts will focus on how to produce a dataset that contains white list terms, suitable for mining. CREATING A WHITE LIST We'll leverage the thesaurus capability of Oracle Text. Using a thesaurus, we create a set of rules that are in effect our mapping from single and multi-word terms to the tokens used to represent those terms. For example, "Oracle Data Mining" becomes "ORACLEDATAMINING." First, we'll create and populate a mapping table called my_term_token_map. All text has been converted to upper case and values in the TERM column are intended to be mapped to the token in the TOKEN column. TERM                                TOKEN DATA MINING                         DATAMINING ORACLE DATA MINING                  ORACLEDATAMINING 11G                                 ORACLE11G JAVA                                JAVA CRM                                 CRM CUSTOMER RELATIONSHIP MANAGEMENT    CRM ... Next, we'll create a thesaurus object my_thesaurus and a rules table my_thesaurus_rules: CTX_THES.CREATE_THESAURUS('my_thesaurus', FALSE); CREATE TABLE my_thesaurus_rules (main_term     VARCHAR2(100),                                  query_string  VARCHAR2(400)); We next populate the thesaurus object and rules table using the term token map. A cursor is defined over my_term_token_map. As we iterate over  the rows, we insert a synonym relationship 'SYN' into the thesaurus. We also insert into the table my_thesaurus_rules the main term, and the corresponding query string, which specifies synonyms for the token in the thesaurus. DECLARE   cursor c2 is     select token, term     from my_term_token_map; BEGIN   for r_c2 in c2 loop     CTX_THES.CREATE_RELATION('my_thesaurus',r_c2.token,'SYN',r_c2.term);     EXECUTE IMMEDIATE 'insert into my_thesaurus_rules values                        (:1,''SYN(' || r_c2.token || ', my_thesaurus)'')'     using r_c2.token;   end loop; END; We are effectively inserting the token to return and the corresponding query that will look up synonyms in our thesaurus into the my_thesaurus_rules table, for example:     'ORACLEDATAMINING'        SYN ('ORACLEDATAMINING', my_thesaurus)At this point, we create a CTXRULE index on the my_thesaurus_rules table: create index my_thesaurus_rules_idx on        my_thesaurus_rules(query_string)        indextype is ctxsys.ctxrule; In my next post, this index will be used to extract the tokens that match each of the rules specified. We'll then compute the tf-idf weights for each of the terms and create a nested table suitable for mining.

    Read the article

  • CodePlex Daily Summary for Sunday, June 23, 2013

    CodePlex Daily Summary for Sunday, June 23, 2013Popular ReleasesDalmatian Build Script: Dalmatian Build 0.1.0.0: This is the initial release of Dalmatian BuildFluent Ribbon Control Suite: Fluent Ribbon Control Suite 2.1.0 - Prerelease d: Fluent Ribbon Control Suite 2.1.0 - Prerelease d(supports .NET 3.5, 4.0 and 4.5) Includes: Fluent.dll (with .pdb and .xml) Showcase Application Samples (not for .NET 3.5) Foundation (Tabs, Groups, Contextual Tabs, Quick Access Toolbar, Backstage) Resizing (ribbon reducing & enlarging principles) Galleries (Gallery in ContextMenu, InRibbonGallery) MVVM (shows how to use this library with Model-View-ViewModel pattern) KeyTips ScreenTips Toolbars ColorGallery *Walkthrough (do...Three-Dimensional Maneuver Gear for Minecraft: TDMG 1.1.0.0 for 1.5.2: CodePlex???(????????) ?????????(???1/4) ??????????? ?????????? ???????????(??????????) ??????????????????????? ↑????、?????????????????????(???????) ???、??????????、?????????????????????、????????1.5?????????? Shift+W(????)??????????????????10°、?10°(?????????)???Hyper-V Management Pack Extensions 2012: HyperVMPE2012 (v1.0.1.126): Hyper-V Management Pack Extensions 2012 Beta ReleaseOutlook 2013 Add-In: Email appointments: This new version includes the following changes: - Ability to drag emails to the calendar to create appointments. Will gather all the recipients from all the emails and create an appointment on the day you drop the emails, with the text and subject of the last selected email (if more than one selected). - Increased maximum of numbers to display appointments to 30. You will have to uninstall the previous version (add/remove programs) if you had installed it before. Before unzipping the file...Caliburn Micro: WPF, Silverlight, WP7 and WinRT/Metro made easy.: Caliburn.Micro v1.5.2: v1.5.2 - This is a service release. We've fixed a number of issues with Tasks and IoC. We've made some consistency improvements across platforms and fixed a number of minor bugs. See changes.txt for details. Packages Available on Nuget Caliburn.Micro – The full framework compiled into an assembly. Caliburn.Micro.Start - Includes Caliburn.Micro plus a starting bootstrapper, view model and view. Caliburn.Micro.Container – The Caliburn.Micro inversion of control container (IoC); source code...SQL Compact Query Analyzer: 1.0.1.1511: Beta build of SQL Compact Query Analyzer Bug fixes: - Resolved issue where the application crashes when loading a database that contains tables without a primary key Features: - Displays database information (database version, filename, size, creation date) - Displays schema summary (number of tables, columns, primary keys, identity fields, nullable fields) - Displays the information schema views - Displays column information (database type, clr type, max length, allows null, etc) - Support...CODE Framework: 4.0.30618.0: See change notes in the documentation section for details on what's new. Note: If you download the class reference help file with, you have to right-click the file, pick "Properties", and then unblock the file, as many browsers flag the file as blocked during download (for security reasons) and thus hides all content.Toolbox for Dynamics CRM 2011: XrmToolBox (v1.2013.6.18): XrmToolbox improvement Use new connection controls (use of Microsoft.Xrm.Client.dll) New display capabilities for tools (size, image and colors) Added prerequisites check Added Most Used Tools feature Tools improvementNew toolSolution Transfer Tool (v1.0.0.0) developed by DamSim Updated toolView Layout Replicator (v1.2013.6.17) Double click on source view to display its layoutXml All tools list Access Checker (v1.2013.6.17) Attribute Bulk Updater (v1.2013.6.18) FetchXml Tester (v1.2013.6.1...Media Companion: Media Companion MC3.570b: New* Movie - using XBMC TMDB - now renames movies if option selected. * Movie - using Xbmc Tmdb - Actor images saved from TMDb if option selected. Fixed* Movie - Checks for poster.jpg against missing poster filter * Movie - Fixed continual scraping of vob movie file (not DVD structure) * Both - Correctly display audio channels * Both - Correctly populate audio info in nfo's if multiple audio tracks. * Both - added icons and checked for DTS ES and Dolby TrueHD audio tracks. * Both - Stream d...LINQ Extensions Library: 1.0.4.2: New to release 1.0.4.2 Custom sorting extensions that perform up to 50% better than LINQ OrderBy, ThenBy extensions... Extensions allow for fine tuning of the sort by controlling the algorithm each sort uses.Document.Editor: 2013.24: What's new for Document.Editor 2013.24: Improved Video Editing support Improved Link Editing support Minor Bug Fix's, improvements and speed upsExtJS based ASP.NET Controls: FineUI v3.3.0: ??FineUI ?? ExtJS ??? ASP.NET ???。 FineUI??? ?? No JavaScript,No CSS,No UpdatePanel,No ViewState,No WebServices ???????。 ?????? IE 7.0、Firefox 3.6、Chrome 3.0、Opera 10.5、Safari 3.0+ ???? Apache License v2.0 ?:ExtJS ?? GPL v3 ?????(http://www.sencha.com/license)。 ???? ??:http://fineui.com/bbs/ ??:http://fineui.com/demo/ ??:http://fineui.com/doc/ ??:http://fineui.codeplex.com/ FineUI???? ExtJS ?????????,???? ExtJS ?。 ????? FineUI ? ExtJS ?:http://fineui.com/bbs/forum.php?mod=viewthrea...BarbaTunnel: BarbaTunnel 8.0: Check Version History for more information about this release.ExpressProfiler: ExpressProfiler v1.5: [+] added Start time, End time event columns [+] added SP:StmtStarting, SP:StmtCompleted events [*] fixed bug with Audit:Logout eventpatterns & practices: Data Access Guidance: Data Access Guidance Drop4 2013.06.17: Drop 4Microsoft Ajax Minifier: Microsoft Ajax Minifier 4.94: add dstLine and dstCol attributes to the -Analyze output in XML mode. un-combine leftover comma-separates expression statements after optimizations are complete so downstream tools don't stack-overflow on really deep comma trees. add support for using a single source map generator instance with multiple runs of MinifyJavaScript, assuming that the results are concatenated to the same output file.Kooboo CMS: Kooboo CMS 4.1.1: The stable release of Kooboo CMS 4.1.0 with fixed the following issues: https://github.com/Kooboo/CMS/issues/1 https://github.com/Kooboo/CMS/issues/11 https://github.com/Kooboo/CMS/issues/13 https://github.com/Kooboo/CMS/issues/15 https://github.com/Kooboo/CMS/issues/19 https://github.com/Kooboo/CMS/issues/20 https://github.com/Kooboo/CMS/issues/24 https://github.com/Kooboo/CMS/issues/43 https://github.com/Kooboo/CMS/issues/45 https://github.com/Kooboo/CMS/issues/46 https://github....VidCoder: 1.5.0 Beta: The betas have started up again! If you were previously on the beta track you will need to install this to get back on it. That's because you can now run both the Beta and Stable version of VidCoder side-by-side! Note that the OpenCL and Intel QuickSync changes being tested by HandBrake are not in the betas yet. They will appear when HandBrake integrates them into the main branch. Updated HandBrake core to SVN 5590. This adds a new FDK AAC encoder. The FAAC encoder has been removed and now...Wsus Package Publisher: Release v1.2.1306.16: Date/Time are displayed as Local Time. (Last Contact, Last Report and DeadLine) Wpp now remember the last used path for update publishing. (See 'Settings' Form for options) Add an option to allow users to publish an update even if the Framework has judged the certificate as invalid. (Attention : Using this option will NOT allow you to publish or revise an update if your certificate is really invalid). When publishing a new update, filter update files to ensure that there is not files wi...New ProjectsBrowserQuest for Windows Phone: Source code for the Windows Phone version of BrowserQuest.Content Factory: The Content Factory is an auxiliary MonoGame game development tool. It provides contents management functions.Csharp: Ask a question, and I'll see if I have the source code in my library, and if I do, I'll post it.CY_MVC: ??webform?????MVC??! MVC framework based on WebForm!EasyLogging++: Single header based, extremely light-weight high performance logging library for C++ applications Git repository: https://github.com/mkhan3189/EasyLoggingPPhxwebapp: I want to build up a framwork, so that when I have new project, I just do the things: design the database, then configurate the view models. that's all...ll-furniture: This project is for testMeraProject: Ask anythingPath Editor: Edit PATH environment on Windows convenientlyProtobuf GUI: GUI for Google ProtobufQios Devsuite: Qios DevSuite is an advanced .NET control library, that is fully integrated with Visual Studio.NET and can be used with all .NET languagesQuesTime: questimeRaspberryPi .NET Microframework (NETMF): This project intend to port the main .NET Microframework (NETMF) classes to the RaspberryPi using Mono and have access to the Microsoft.SPOT.Hardware classes.SharePoint 2013 My Sites Navigation Workaround: This project is a workaround for links not always appearing in the left navigation of a My Sites portal in SharePoint 2013.SkyNet Utility: A Skype Utility I'm making to use. It will be a general, all-around tool.SurveyManifestacoes: Surveytan's address book: tan's address bookTisonet.DataMining: Implementation of data mining algorithmsTscMon: Simple application for showing the number of running TypeScript compilers in the notification area.VB6: Helping Businesses Users and Other Developers with VB6 Ask a question, and I'll see if I have the source code in my library, and if I do, I'll post it.VeraCrypt: An open source disk encryption tool with strong security for the ParanoidWinform2GTA4: this tool convert winform built with visual studio designer to a netscripthook GTA Form. Converted forms are wrote in c# language.

    Read the article

< Previous Page | 1 2