Search Results

Search found 35102 results on 1405 pages for 'text mining'.


  • Using a "white list" for extracting terms for Text Mining

    - by [email protected]
    In Part 1 of my post on "Generating cluster names from a document clustering model" (part 1, part 2, part 3), I showed how to build a clustering model from text documents using Oracle Data Miner, which automates preparing data for text mining. In this process we specified a custom stoplist and lexer and relied on Oracle Text to identify important terms. However, there is an alternative approach, the white list, which uses a thesaurus object with the Oracle Text CTXRULE index to allow you to specify the important terms.

    INTRODUCTION

    A stoplist is used to exclude, i.e., black list, specific words in your documents from being indexed. For example, words like a, if, and, or, and but normally add no value when text mining. Other words can also be excluded if they do not help to differentiate documents, e.g., the word Oracle is ubiquitous in the Oracle product literature. One problem with stoplists is determining which words to specify. This usually requires inspecting the terms that are extracted, manually identifying which ones you don't want, and then re-indexing the documents to determine if you missed any. Since a corpus of documents could contain thousands of words, this could be a tedious exercise.

    Moreover, since every word is considered as an individual token, a term excluded in one context may be needed to help identify a term in another context. For example, in our Oracle product literature example, the words "Oracle Data Mining" taken individually are not particularly helpful. The term "Oracle" may be found in nearly all documents, as with the term "Data." The term "Mining" is more distinctive, but could also refer to the mining industry. If we exclude "Oracle" and "Data" by specifying them in the stoplist, we lose valuable information. But if we include them, they may introduce too much noise.
    Still, when you have a broad vocabulary or don't have a list of specific terms of interest, you rely on the text engine to identify important terms, often by computing the term frequency - inverse document frequency (TF-IDF) metric. (This is effectively a weight associated with each term indicating its relative importance in a document within a collection of documents. We'll revisit this later.) The results of this technique are often quite valuable.

    As noted above, an alternative to the subtractive nature of the stoplist is to specify a white list, or a list of terms--perhaps multi-word--that we want to extract and use for data mining. The obvious downside to this approach is the need to specify the set of terms of interest. However, this may not be as daunting a task as it seems. For example, in a given domain (Oracle product literature), there is often a recognized glossary, or a list of keywords and phrases (Oracle product names, industry names, product categories, etc.). Being able to identify multi-word terms, e.g., "Oracle Data Mining" or "Customer Relationship Management", as a single token can greatly increase the quality of the data mining results. The remainder of this post and subsequent posts will focus on how to produce a dataset that contains white list terms, suitable for mining.

    CREATING A WHITE LIST

    We'll leverage the thesaurus capability of Oracle Text. Using a thesaurus, we create a set of rules that are in effect our mapping from single and multi-word terms to the tokens used to represent those terms. For example, "Oracle Data Mining" becomes "ORACLEDATAMINING." First, we'll create and populate a mapping table called my_term_token_map. All text has been converted to upper case and values in the TERM column are intended to be mapped to the token in the TOKEN column.

        TERM                                TOKEN
        DATA MINING                         DATAMINING
        ORACLE DATA MINING                  ORACLEDATAMINING
        11G                                 ORACLE11G
        JAVA                                JAVA
        CRM                                 CRM
        CUSTOMER RELATIONSHIP MANAGEMENT    CRM
        ...

    Next, we'll create a thesaurus object my_thesaurus and a rules table my_thesaurus_rules:

        CTX_THES.CREATE_THESAURUS('my_thesaurus', FALSE);

        CREATE TABLE my_thesaurus_rules (main_term     VARCHAR2(100),
                                         query_string  VARCHAR2(400));

    We next populate the thesaurus object and rules table using the term token map. A cursor is defined over my_term_token_map. As we iterate over the rows, we insert a synonym relationship 'SYN' into the thesaurus. We also insert into the table my_thesaurus_rules the main term, and the corresponding query string, which specifies synonyms for the token in the thesaurus.

        DECLARE
          cursor c2 is
            select token, term
            from my_term_token_map;
        BEGIN
          for r_c2 in c2 loop
            CTX_THES.CREATE_RELATION('my_thesaurus',r_c2.token,'SYN',r_c2.term);
            EXECUTE IMMEDIATE 'insert into my_thesaurus_rules values
                               (:1,''SYN(' || r_c2.token || ', my_thesaurus)'')'
            using r_c2.token;
          end loop;
        END;

    We are effectively inserting the token to return and the corresponding query that will look up synonyms in our thesaurus into the my_thesaurus_rules table, for example:

        'ORACLEDATAMINING'        SYN ('ORACLEDATAMINING', my_thesaurus)

    At this point, we create a CTXRULE index on the my_thesaurus_rules table:

        create index my_thesaurus_rules_idx on
               my_thesaurus_rules(query_string)
               indextype is ctxsys.ctxrule;

    In my next post, this index will be used to extract the tokens that match each of the rules specified. We'll then compute the tf-idf weights for each of the terms and create a nested table suitable for mining.
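
    The term-to-token mapping can be tried outside the database. The following is a minimal Python sketch, not the Oracle Text mechanism itself: it assumes plain whole-word phrase matching on upper-cased text, and the names TERM_TOKEN_MAP and extract_tokens are hypothetical. The dictionary rows mirror the sample rows of my_term_token_map.

```python
import re

# Hypothetical in-memory copy of the sample rows of my_term_token_map.
TERM_TOKEN_MAP = {
    "DATA MINING": "DATAMINING",
    "ORACLE DATA MINING": "ORACLEDATAMINING",
    "11G": "ORACLE11G",
    "JAVA": "JAVA",
    "CRM": "CRM",
    "CUSTOMER RELATIONSHIP MANAGEMENT": "CRM",
}


def extract_tokens(text, term_token_map=TERM_TOKEN_MAP):
    """Return the set of white-list tokens whose terms occur in text.

    Matching is done on whole words of the upper-cased text, so "JAVA"
    does not match inside "JAVASCRIPT".
    """
    # Keep only alphanumeric words, upper-cased and separated by one space.
    words = re.findall(r"[A-Z0-9]+", text.upper())
    padded = " " + " ".join(words) + " "
    return {
        token
        for term, token in term_token_map.items()
        if " " + term + " " in padded
    }


if __name__ == "__main__":
    print(sorted(extract_tokens("Oracle Data Mining ships with Oracle Database 11g.")))
```

    In this sketch a document that mentions "Oracle Data Mining" yields both ORACLEDATAMINING and DATAMINING, because the shorter term is contained in the longer one. Whether that overlap is wanted depends on the mining task.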

    Read the article

  • Using a "white list" for extracting terms for Text Mining, Part 2

    - by [email protected]
    In my last post, we set the groundwork for extracting specific tokens from a white list using a CTXRULE index. In this post, we will populate a table with the extracted tokens and produce a case table suitable for clustering with Oracle Data Mining. Our corpus of documents will be stored in a database table that is defined as

        create table documents(id NUMBER, text VARCHAR2(4000));

    However, any suitable Oracle Text-accepted data type can be used for the text. We then create a table to contain the extracted tokens. The id column contains the unique identifier (or case id) of the document. The token column contains the extracted token. Note that a given document may have many tokens, so there will be one row per token for a given document.

        create table extracted_tokens (id NUMBER, token VARCHAR2(4000));

    The next step is to iterate over the documents and extract the matching tokens using the index and insert them into our token table. We use the MATCHES function for matching the query_string from my_thesaurus_rules with the text.

        DECLARE
            cursor c2 is
              select id, text
              from documents;
        BEGIN
            for r_c2 in c2 loop
               insert into extracted_tokens
                 select r_c2.id id, main_term token
                 from my_thesaurus_rules
                 where matches(query_string,
                               r_c2.text)>0;
            end loop;
        END;

    Now that we have the tokens, we can compute the term frequency - inverse document frequency (TF-IDF) for each token of each document.

        create table extracted_tokens_tfidf as
          with num_docs as (select count(distinct id) doc_cnt
                            from extracted_tokens),
               tf       as (select a.id, a.token,
                                   a.token_cnt/b.num_tokens token_freq
                            from
                              (select id, token, count(*) token_cnt
                               from extracted_tokens
                               group by id, token) a,
                              (select id, count(*) num_tokens
                               from extracted_tokens
                               group by id) b
                            where a.id=b.id),
               doc_freq as (select token, count(*) overall_token_cnt
                            from extracted_tokens
                            group by token)
          select tf.id, tf.token,
                 token_freq * ln(doc_cnt/df.overall_token_cnt) tf_idf
          from num_docs,
               tf,
               doc_freq df
          where df.token=tf.token;

    From the WITH clause, the num_docs query simply counts the number of documents in the corpus. The tf query computes the term (token) frequency by computing the number of times each token appears in a document and divides that by the number of tokens found in the document. The doc_freq query counts the number of times the token appears overall in the corpus. In the SELECT clause, we compute the tf_idf.

    Next, we create the nested table required to produce one record per case, where a case corresponds to an individual document. Here, we COLLECT all the tokens for a given document into the nested column extracted_tokens_tfidf_1.

        CREATE TABLE extracted_tokens_tfidf_nt
                     NESTED TABLE extracted_tokens_tfidf_1
                         STORE AS extracted_tokens_tfidf_tab AS
                     select id,
                            cast(collect(DM_NESTED_NUMERICAL(token,tf_idf)) as DM_NESTED_NUMERICALS) extracted_tokens_tfidf_1
                     from extracted_tokens_tfidf
                     group by id;

    To build the clustering model, we create a settings table and then insert the various settings.
    Most notable are the number of clusters (20), using cosine distance which is better for text, turning off auto data preparation since the values are ready for mining, the number of iterations (20) to get a better model, and the split criterion of size for clusters that are roughly balanced in number of cases assigned.

        CREATE TABLE km_settings (setting_name  VARCHAR2(30), setting_value VARCHAR2(30));

        BEGIN
          INSERT INTO km_settings (setting_name, setting_value)
            VALUES (dbms_data_mining.clus_num_clusters, 20);
          INSERT INTO km_settings (setting_name, setting_value)
            VALUES (dbms_data_mining.kmns_distance, dbms_data_mining.kmns_cosine);
          INSERT INTO km_settings (setting_name, setting_value)
            VALUES (dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_off);
          INSERT INTO km_settings (setting_name, setting_value)
            VALUES (dbms_data_mining.kmns_iterations, 20);
          INSERT INTO km_settings (setting_name, setting_value)
            VALUES (dbms_data_mining.kmns_split_criterion, dbms_data_mining.kmns_size);
          COMMIT;
        END;

    With this in place, we can now build the clustering model.

        BEGIN
            DBMS_DATA_MINING.CREATE_MODEL(
            model_name          => 'TEXT_CLUSTERING_MODEL',
            mining_function     => dbms_data_mining.clustering,
            data_table_name     => 'extracted_tokens_tfidf_nt',
            case_id_column_name => 'id',
            settings_table_name => 'km_settings');
        END;

    To generate cluster names from this model, check out my earlier post on that topic.
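
    The arithmetic of the extracted_tokens_tfidf query can be checked by hand on a few rows. The following is a minimal Python sketch of the same formula, assuming the input is a list of (id, token) rows like the extracted_tokens table. The function name tf_idf and the sample rows are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(extracted_tokens):
    """Compute weights for (doc_id, token) rows, one row per extracted token.

    Mirrors the query: token_freq * ln(doc_cnt / overall_token_cnt).
    """
    # num_docs: number of distinct documents that produced at least one token.
    doc_cnt = len({doc_id for doc_id, _ in extracted_tokens})
    # tf numerator: rows per (id, token); tf denominator: rows per id.
    token_cnt = Counter(extracted_tokens)
    num_tokens = Counter(doc_id for doc_id, _ in extracted_tokens)
    # doc_freq: rows per token across the whole table.
    overall_token_cnt = Counter(token for _, token in extracted_tokens)
    return {
        (doc_id, token): (cnt / num_tokens[doc_id])
        * math.log(doc_cnt / overall_token_cnt[token])
        for (doc_id, token), cnt in token_cnt.items()
    }


if __name__ == "__main__":
    rows = [(1, "CRM"), (1, "JAVA"), (2, "CRM"), (3, "DATAMINING")]
    for key, weight in sorted(tf_idf(rows).items()):
        print(key, round(weight, 4))
```

    One property of the formula as written: overall_token_cnt counts rows rather than distinct documents, so a token with more rows than there are documents would get a negative weight.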

    Read the article

  • Data Mining Resources

    - by Dejan Sarka
    There are many different types of analyses, each one with its own pros and cons. Relational reports have a predefined structure, and end users cannot change it. They are simple to use for end users. Reports can use real-time data and snapshots of data to show the state of a report at specific points in time. One of the drawbacks is that report authoring is limited to IT pros and advanced users. Any kind of dynamic restructuring is very limited. If real-time data is used for a report, the report has a negative impact on the performance of the source system. Processing of the reports might be slow because the data comes from relational database management systems, which are not optimized for reporting only. If you create a semantic model of your data, your end users can create ad-hoc report structures. However, the development is more complex because a developer is needed to create these semantic models. For OLAP, you typically use specialized database management systems. You get lightning speed of analyses. End users can use rich and thin clients to interactively change the structure of the report. Typically, they do it graphically. However, the development of an OLAP system is many times quite complex. It involves the preparation and maintenance of an enterprise data warehouse and OLAP cubes. In order to exploit the possibility of real-time restructuring of reports, the users must be both active and educated. The data is usually stale, as it is loaded into data warehouses and OLAP cubes with a scheduled process. With data mining, a structure is not selected in advance; it searches for the structure. As a result, data mining can give you the most valuable results because you can discover patterns you did not expect. A data mining model structure is limited only by the attributes that you use to train the model. One of the drawbacks is that a lot of knowledge is needed for a successful data mining project. End users have to understand the results. 
    Subject matter experts and IT professionals need to understand the business problem thoroughly. The development might sometimes be even more complex than the development of OLAP cubes.

    Each type of analysis has its own place in an enterprise system. SQL Server has tools for all kinds of analyses. However, data mining is the most advanced way of analyzing the data; this is the “I” in BI. In order to get the most out of it, you need to learn quite a lot. In this blog post, I am gathering together resources for learning, including forthcoming events.

    Books

      Multiple authors: SQL Server MVP Deep Dives – I wrote an introductory data mining chapter there.
      Erik Veerman, Teo Lachev and Dejan Sarka: MCTS Self-Paced Training Kit (Exam 70-448): Microsoft SQL Server 2008 - Business Intelligence Development and Maintenance – you can find a good overview of a complete BI solution, including data mining, in this book.
      Jamie MacLennan, ZhaoHui Tang, and Bogdan Crivat: Data Mining with Microsoft SQL Server 2008 – can’t miss this book if you want to mine your data with SQL Server tools.
      Michael Berry, Gordon Linoff: Mastering Data Mining: The Art and Science of Customer Relationship Management – data mining from both business and technical perspectives.
      Dorian Pyle: Data Preparation for Data Mining – an in-depth book about data preparation.
      Thomas and Ronald Wonnacott: Introductory Statistics – if you thought that you could get away without statistics, then you are not serious about data mining.
      Jiawei Han and Micheline Kamber: Data Mining: Concepts and Techniques – in-depth explanation of the most popular data mining algorithms.
      Michael Berry and Gordon Linoff: Data Mining Techniques – another book that explains data mining algorithms, more from a business perspective.
      Paolo Giudici: Applied Data Mining – very mathematical book, only if you enjoy statistics and mathematics in general.

    Forthcoming presentations

    I am presenting two data mining related sessions during the PASS Summit in Charlotte, NC:
      Wednesday, October 16th, 2013 - Fraud Detection: Notes from the Field – I am showing how to use data mining for a specific business problem. The presentation is based on real-life projects.
      Friday, October 18th: Excel 2013 Advanced Analytics – I am focusing on Excel Data Mining Add-ins, and how to use them together with Power Pivot and other add-ins. This is the most you can get out of Excel.

    Sinergija 2013, Belgrade, Serbia
      Tuesday, October 22nd: Excel 2013 Analytics to the Max – another presentation focusing on the most advanced analytics you can get in Excel.

    SQL Rally Amsterdam, Netherlands
      Thursday, November 7th: Advanced Analytics in Excel 2013 – and again I am presenting about data mining in Excel.

    Why three different titles for the same presentation? I don’t know, I guess I forgot the name I proposed every time right after I sent the proposal.

    Courses

    Data Mining with SQL Server 2012 – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver as a shorter seminar, you can contact your closest SolidQ subsidiary, or, of course, me directly at [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, who are welcome to contact me as well.

    OK, now you know: no more excuses, start learning data mining, get the most out of your data.

    Read the article

  • Deploying Data Mining Models using Model Export and Import

    - by [email protected]
    In this post, we'll take a look at how Oracle Data Mining facilitates model deployment. After building and testing models, a next step is often putting your data mining model into a production system -- referred to as model deployment. The ability to move data mining model(s) easily into a production system can greatly speed model deployment, and reduce the overall cost.

    Since Oracle Data Mining provides models as first class database objects, models can be manipulated using familiar database techniques and technology. For example, one or more models can be exported to a flat file, similar to a database table dump file (.dmp). This file can be moved to a different instance of Oracle Database EE, and then imported. All methods for exporting and importing models are based on Oracle Data Pump technology and found in the DBMS_DATA_MINING package.

    Before performing the actual export or import, a directory object must be created. A directory object is a logical name in the database for a physical directory on the host computer. Read/write access to a directory object is necessary to access the host computer file system from within Oracle Database.

    For our example, we'll work in the DMUSER schema. First, DMUSER requires the privilege to create any directory. This is often granted through the sysdba account.

        grant create any directory to dmuser;

    Now, DMUSER can create the directory object specifying the path where the exported model file (.dmp) should be placed. In this case, on a Linux machine, we have the directory /scratch/oracle.

        CREATE OR REPLACE DIRECTORY dmdir AS '/scratch/oracle';

    If you aren't sure of the exact name of the model or models to export, you can find the list of models using the following query:

        select model_name from user_mining_models;

    There are several options when exporting models.
    We can export a single model, multiple models, or all models in a schema using the following procedure calls:

        BEGIN
          DBMS_DATA_MINING.EXPORT_MODEL ('MY_MODEL.dmp','dmdir','name =''MY_DT_MODEL''');
        END;

        BEGIN
          DBMS_DATA_MINING.EXPORT_MODEL ('MY_MODELS.dmp','dmdir',
                     'name IN (''MY_DT_MODEL'',''MY_KM_MODEL'')');
        END;

        BEGIN
          DBMS_DATA_MINING.EXPORT_MODEL ('ALL_DMUSER_MODELS.dmp','dmdir');
        END;

    A .dmp file can be imported into another schema or database using the following procedure call, for example:

        BEGIN
          DBMS_DATA_MINING.IMPORT_MODEL('MY_MODELS.dmp', 'dmdir');
        END;

    As with models from any data mining tool, when moving a model from one environment to another, care needs to be taken to ensure the transformations that prepare the data for model building are matched (with appropriate parameters and statistics) in the system where the model is deployed. Oracle Data Mining provides automatic data preparation (ADP) and embedded data preparation (EDP) to reduce, or possibly eliminate, the need to explicitly transport transformations with the model. In the case of ADP, ODM automatically prepares the data and includes the necessary transformations in the model itself. In the case of EDP, users can associate their own transformations with attributes of a model. These transformations are automatically applied when applying the model to data, i.e., scoring. Exporting and importing a model with ADP or EDP results in these transformations being immediately available with the model in the production system.
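
    Why the preparation statistics have to travel with the model can be shown with a toy example. The following Python sketch is only an analogy and does not use or represent any Oracle Data Mining API: the class ScaledLinearModel and its fields are hypothetical. It stores the mean and standard deviation computed at build time inside the model object, so that scoring in another environment applies exactly the same transformation.

```python
from dataclasses import dataclass


@dataclass
class ScaledLinearModel:
    """Toy model that carries its own preparation statistics (hypothetical)."""

    mean: float    # build-time mean of the input attribute
    std: float     # build-time (population) standard deviation
    weight: float  # coefficient applied to the standardized value
    bias: float

    @classmethod
    def fit_stats(cls, values, weight, bias):
        """Capture the normalization statistics from the build data."""
        n = len(values)
        mean = sum(values) / n
        std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
        return cls(mean, std, weight, bias)

    def score(self, raw_value):
        """Standardize with the stored statistics, then apply the model."""
        z = (raw_value - self.mean) / self.std
        return self.weight * z + self.bias


if __name__ == "__main__":
    model = ScaledLinearModel.fit_stats([10.0, 20.0, 30.0], weight=2.0, bias=1.0)
    # Raw production data can be scored directly; no separate prep step to ship.
    print(model.score(20.0))
```

    If the statistics were recomputed on the production data instead, the same raw value could receive a different score, which is the mismatch the paragraph above warns about.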

    Read the article

  • Text mining on large database (data mining)

    - by yox
    Hello, I have a large database of resumes (CVs), and a table skills that groups all users' skills. Inside that table there's a field skill_text that describes the skill in full text. I'm looking for an algorithm/software/method to extract significant terms/phrases from that table in order to build a new table with standardized skills.

    Here are some example skills extracted from the DB:

      Sectoral and competitive analysis
      Business Development (incl. in international settings)
      Specific structure and road design software - Microstation, Macao, AutoCAD (basic knowledge)
      Creative work (Photoshop, In-Design, Illustrator)
      checking and reporting back on campaign progress
      organising and attending events and exhibitions
      Development : Aptana Studio, PHP, HTML, CSS, JavaScript, SQL, AJAX
      Discipline: One to one marketing, E-marketing (SEO & SEA, display, emailing, affiliate program)
      Mix marketing, Viral Marketing, Social network marketing.

    The output should be something like:

      Sectoral and competitive analysis
      Business Development
      Specific structure and road design software - Macao
      AutoCAD
      Photoshop
      In-Design
      Illustrator
      organising events
      Development
      Aptana Studio
      PHP
      HTML
      CSS
      JavaScript
      SQL
      AJAX
      Mix marketing
      Viral Marketing
      Social network marketing
      emailing
      SEO
      One to one marketing

    As you can see, only the skills remain, with no other descriptive text. I know this is possible using text mining techniques, but how do I do it? The database is really large, which is a good thing because we can calculate text frequency and decide if it's a real skill or just meaningless text. The big problem is: how to determine that "blablabla" is a skill? Thanks
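
    The frequency idea mentioned in the question can be sketched in a few lines. This is a minimal Python illustration and not a complete answer: it assumes that skills are separated by the delimiters visible in the sample rows (commas, colons, semicolons, parentheses and " - "), and the function names and the min_count threshold are arbitrary choices made for the example.

```python
import re
from collections import Counter


def candidate_phrases(skill_text):
    """Split one skill_text value into lower-cased candidate phrases."""
    # Delimiters seen in the sample rows; "In-Design" stays whole because
    # the hyphen is only a separator when surrounded by whitespace.
    parts = re.split(r"[,;:()]|\s-\s", skill_text)
    return [p.strip(" .").lower() for p in parts if p.strip(" .")]


def frequent_skills(rows, min_count=2):
    """Count in how many rows each phrase occurs and keep the frequent ones."""
    counts = Counter(
        phrase for row in rows for phrase in set(candidate_phrases(row))
    )
    return {phrase: n for phrase, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    rows = [
        "Development : Aptana Studio, PHP, HTML, CSS, JavaScript, SQL, AJAX",
        "Creative work (Photoshop, In-Design, Illustrator)",
        "Web development: PHP, SQL (basic knowledge)",
        "Photoshop (basic knowledge)",
    ]
    print(frequent_skills(rows))
```

    On these four rows the sketch keeps "php", "sql" and "photoshop", but also "basic knowledge". Frequency alone does not separate skills from filler, so a stop list of non-skill phrases or a manually reviewed white list would still be needed on top of it.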

    Read the article

  • Deploying Data Mining Models using Model Export and Import, Part 2

    - by [email protected]
    In my last post, Deploying Data Mining Models using Model Export and Import, we explored using DBMS_DATA_MINING.EXPORT_MODEL and DBMS_DATA_MINING.IMPORT_MODEL to enable moving a model from one system to another. In this post, we'll look at two distributed scenarios that make use of this capability and a tip for easily moving models from one machine to another using only Oracle Database, not an external file transport mechanism, such as FTP.

    In the first scenario, consider a company with geographically distributed business units, each collecting and managing their data locally for the products they sell. Each business unit has in-house data analysts that build models to predict which products to recommend to customers in their space. A central telemarketing business unit also uses these models to score new customers locally using data collected over the phone. Since the models recommend different products, each customer is scored using each model. This is depicted in Figure 1.

    Figure 1: Target instance importing multiple remote models for local scoring

    In the second scenario, consider multiple hospitals that collect data on patients with certain types of cancer. The data collection is standardized, so each hospital collects the same patient demographic and other health / tumor data, along with the clinical diagnosis. Instead of each hospital building its own models, the data is pooled at a central data analysis lab where a predictive model is built. Once completed, the model is distributed to hospitals, clinics, and doctor offices who can score patient data locally.

    Figure 2: Multiple target instances importing the same model from a source instance for local scoring

    Since this blog focuses on model export and import, we'll only discuss what is necessary to move a model from one database to another. Here, we use the package DBMS_FILE_TRANSFER, which can move files between Oracle databases.
    The script is fairly straightforward, but requires setting up a database link and directory objects. We saw how to create directory objects in the previous post. To create a database link to the source database from the target, we can use, for example:

        create database link SOURCE1_LINK connect to <schema> identified by <password> using 'SOURCE1';

    Note that 'SOURCE1' refers to the service name of the remote database entry in your tnsnames.ora file.

    From SQL*Plus, first connect to the remote database and export the model. Note that the model_file_name does not include the .dmp extension. This is because export_model appends "01" to this name. Next, connect to the local database and invoke DBMS_FILE_TRANSFER.GET_FILE and import the model. Note that "01" is eliminated in the target system file name.

        connect <source_schema>/<password>@SOURCE1_LINK;

        BEGIN
          DBMS_DATA_MINING.EXPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp',
                                         'MY_SOURCE_DIR_OBJECT',
                                         'name =''MY_MINING_MODEL''');
        END;

        connect <target_schema>/<password>;

        BEGIN
          DBMS_FILE_TRANSFER.GET_FILE ('MY_SOURCE_DIR_OBJECT',
                                       'EXPORT_FILE_NAME' || '01.dmp',
                                       'SOURCE1_LINK',
                                       'MY_TARGET_DIR_OBJECT',
                                       'EXPORT_FILE_NAME' || '.dmp' );
          DBMS_DATA_MINING.IMPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp',
                                         'MY_TARGET_DIR_OBJECT');
        END;

    To clean up afterward, you may want to drop the exported .dmp file at the source and the transferred file at the target. For example,

        utl_file.fremove('&directory_name', '&model_file_name' || '.dmp');

    Read the article

  • Error in running script [closed]

    - by SWEngineer
    I'm trying to run heathusf_v1.1.0.tar.gz found here. I installed tcsh to make build_heathusf work. But, when I run ./build_heathusf, I get the following (I'm running that on a Fedora Linux system from Terminal):

    $ ./build_heathusf
    Compiling programs to build a library of image processing functions.
    convexpolyscan.c: In function ‘cdelete’:
    convexpolyscan.c:346:5: warning: incompatible implicit declaration of built-in function ‘bcopy’ [enabled by default]
    myalloc.c: In function ‘mycalloc’:
    myalloc.c:68:16: error: invalid storage class for function ‘store_link’
    myalloc.c: In function ‘mymalloc’:
    myalloc.c:101:16: error: invalid storage class for function ‘store_link’
    myalloc.c: In function ‘myfree’:
    myalloc.c:129:27: error: invalid storage class for function ‘find_link’
    myalloc.c:131:12: warning: assignment makes pointer from integer without a cast [enabled by default]
    myalloc.c: At top level:
    myalloc.c:150:13: warning: conflicting types for ‘store_link’ [enabled by default]
    myalloc.c:150:13: error: static declaration of ‘store_link’ follows non-static declaration
    myalloc.c:91:4: note: previous implicit declaration of ‘store_link’ was here
    myalloc.c:164:24: error: conflicting types for ‘find_link’
    myalloc.c:131:14: note: previous implicit declaration of ‘find_link’ was here
    Building the mammogram resizing program.
    gcc -O2 -I.
-I../common mkimage.o -o mkimage -L../common -lmammo -lm ../common/libmammo.a(aggregate.o): In function `aggregate': aggregate.c:(.text+0x7fa): undefined reference to `mycalloc' aggregate.c:(.text+0x81c): undefined reference to `mycalloc' aggregate.c:(.text+0x868): undefined reference to `mycalloc' ../common/libmammo.a(aggregate.o): In function `aggregate_median': aggregate.c:(.text+0xbc5): undefined reference to `mymalloc' aggregate.c:(.text+0xbfb): undefined reference to `mycalloc' aggregate.c:(.text+0xc3c): undefined reference to `mycalloc' ../common/libmammo.a(aggregate.o): In function `aggregate': aggregate.c:(.text+0x9b5): undefined reference to `myfree' ../common/libmammo.a(aggregate.o): In function `aggregate_median': aggregate.c:(.text+0xd85): undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `linear_optical_density': optical_density.c:(.text+0x29e): undefined reference to `mymalloc' optical_density.c:(.text+0x342): undefined reference to `mycalloc' optical_density.c:(.text+0x383): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `log10_optical_density': optical_density.c:(.text+0x693): undefined reference to `mymalloc' optical_density.c:(.text+0x74f): undefined reference to `mycalloc' optical_density.c:(.text+0x790): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `map_with_ushort_lut': optical_density.c:(.text+0xb2e): undefined reference to `mymalloc' optical_density.c:(.text+0xb87): undefined reference to `mycalloc' optical_density.c:(.text+0xbc6): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `linear_optical_density': optical_density.c:(.text+0x4d9): undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `log10_optical_density': optical_density.c:(.text+0x8f1): undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `map_with_ushort_lut': 
optical_density.c:(.text+0xd0d): undefined reference to `myfree' ../common/libmammo.a(virtual_image.o): In function `deallocate_cached_image': virtual_image.c:(.text+0x3dc6): undefined reference to `myfree' virtual_image.c:(.text+0x3dd7): undefined reference to `myfree' ../common/libmammo.a(virtual_image.o):virtual_image.c:(.text+0x3de5): more undefined references to `myfree' follow ../common/libmammo.a(virtual_image.o): In function `allocate_cached_image': virtual_image.c:(.text+0x4233): undefined reference to `mycalloc' virtual_image.c:(.text+0x4253): undefined reference to `mymalloc' virtual_image.c:(.text+0x4275): undefined reference to `mycalloc' virtual_image.c:(.text+0x42e7): undefined reference to `mycalloc' virtual_image.c:(.text+0x44f9): undefined reference to `mycalloc' virtual_image.c:(.text+0x47a9): undefined reference to `mycalloc' virtual_image.c:(.text+0x4a45): undefined reference to `mycalloc' virtual_image.c:(.text+0x4af4): undefined reference to `myfree' collect2: error: ld returned 1 exit status make: *** [mkimage] Error 1 Building the breast segmentation program. gcc -O2 -I. 
-I../common breastsegment.o segment.o -o breastsegment -L../common -lmammo -lm breastsegment.o: In function `render_segmentation_sketch': breastsegment.c:(.text+0x43): undefined reference to `mycalloc' breastsegment.c:(.text+0x58): undefined reference to `mycalloc' breastsegment.c:(.text+0x12f): undefined reference to `mycalloc' breastsegment.c:(.text+0x1b9): undefined reference to `myfree' breastsegment.c:(.text+0x1c6): undefined reference to `myfree' breastsegment.c:(.text+0x1e1): undefined reference to `myfree' segment.o: In function `find_center': segment.c:(.text+0x53): undefined reference to `mycalloc' segment.c:(.text+0x71): undefined reference to `mycalloc' segment.c:(.text+0x387): undefined reference to `myfree' segment.o: In function `bordercode': segment.c:(.text+0x4ac): undefined reference to `mycalloc' segment.c:(.text+0x546): undefined reference to `mycalloc' segment.c:(.text+0x651): undefined reference to `mycalloc' segment.c:(.text+0x691): undefined reference to `myfree' segment.o: In function `estimate_tissue_image': segment.c:(.text+0x10d4): undefined reference to `mycalloc' segment.c:(.text+0x14da): undefined reference to `mycalloc' segment.c:(.text+0x1698): undefined reference to `mycalloc' segment.c:(.text+0x1834): undefined reference to `mycalloc' segment.c:(.text+0x1850): undefined reference to `mycalloc' segment.o:segment.c:(.text+0x186a): more undefined references to `mycalloc' follow segment.o: In function `estimate_tissue_image': segment.c:(.text+0x1bbc): undefined reference to `myfree' segment.c:(.text+0x1c4a): undefined reference to `mycalloc' segment.c:(.text+0x1c7c): undefined reference to `mycalloc' segment.c:(.text+0x1d8e): undefined reference to `myfree' segment.c:(.text+0x1d9b): undefined reference to `myfree' segment.c:(.text+0x1da8): undefined reference to `myfree' segment.c:(.text+0x1dba): undefined reference to `myfree' segment.c:(.text+0x1dc9): undefined reference to `myfree' segment.o:segment.c:(.text+0x1dd8): more undefined 
references to `myfree' follow segment.o: In function `estimate_tissue_image': segment.c:(.text+0x20bf): undefined reference to `mycalloc' segment.o: In function `segment_breast': segment.c:(.text+0x24cd): undefined reference to `mycalloc' segment.o: In function `find_center': segment.c:(.text+0x3a4): undefined reference to `myfree' segment.o: In function `bordercode': segment.c:(.text+0x6ac): undefined reference to `myfree' ../common/libmammo.a(aggregate.o): In function `aggregate': aggregate.c:(.text+0x7fa): undefined reference to `mycalloc' aggregate.c:(.text+0x81c): undefined reference to `mycalloc' aggregate.c:(.text+0x868): undefined reference to `mycalloc' ../common/libmammo.a(aggregate.o): In function `aggregate_median': aggregate.c:(.text+0xbc5): undefined reference to `mymalloc' aggregate.c:(.text+0xbfb): undefined reference to `mycalloc' aggregate.c:(.text+0xc3c): undefined reference to `mycalloc' ../common/libmammo.a(aggregate.o): In function `aggregate': aggregate.c:(.text+0x9b5): undefined reference to `myfree' ../common/libmammo.a(aggregate.o): In function `aggregate_median': aggregate.c:(.text+0xd85): undefined reference to `myfree' ../common/libmammo.a(cc_label.o): In function `cc_label': cc_label.c:(.text+0x20c): undefined reference to `mycalloc' cc_label.c:(.text+0x6c2): undefined reference to `mycalloc' cc_label.c:(.text+0xbaa): undefined reference to `myfree' ../common/libmammo.a(cc_label.o): In function `cc_label_0bkgd': cc_label.c:(.text+0xe17): undefined reference to `mycalloc' cc_label.c:(.text+0x12d7): undefined reference to `mycalloc' cc_label.c:(.text+0x17e7): undefined reference to `myfree' ../common/libmammo.a(cc_label.o): In function `cc_relabel_by_intensity': cc_label.c:(.text+0x18c5): undefined reference to `mycalloc' ../common/libmammo.a(cc_label.o): In function `cc_label_4connect': cc_label.c:(.text+0x1cf0): undefined reference to `mycalloc' cc_label.c:(.text+0x2195): undefined reference to `mycalloc' cc_label.c:(.text+0x26a4): 
undefined reference to `myfree' ../common/libmammo.a(cc_label.o): In function `cc_relabel_by_intensity': cc_label.c:(.text+0x1b06): undefined reference to `myfree' ../common/libmammo.a(convexpolyscan.o): In function `polyscan_coords': convexpolyscan.c:(.text+0x6f0): undefined reference to `mycalloc' convexpolyscan.c:(.text+0x75f): undefined reference to `mycalloc' convexpolyscan.c:(.text+0x7ab): undefined reference to `myfree' convexpolyscan.c:(.text+0x7b8): undefined reference to `myfree' ../common/libmammo.a(convexpolyscan.o): In function `polyscan_poly_cacheim': convexpolyscan.c:(.text+0x805): undefined reference to `mycalloc' convexpolyscan.c:(.text+0x894): undefined reference to `myfree' ../common/libmammo.a(mikesfileio.o): In function `read_segmentation_file': mikesfileio.c:(.text+0x1e9): undefined reference to `mycalloc' mikesfileio.c:(.text+0x205): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `linear_optical_density': optical_density.c:(.text+0x29e): undefined reference to `mymalloc' optical_density.c:(.text+0x342): undefined reference to `mycalloc' optical_density.c:(.text+0x383): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `log10_optical_density': optical_density.c:(.text+0x693): undefined reference to `mymalloc' optical_density.c:(.text+0x74f): undefined reference to `mycalloc' optical_density.c:(.text+0x790): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `map_with_ushort_lut': optical_density.c:(.text+0xb2e): undefined reference to `mymalloc' optical_density.c:(.text+0xb87): undefined reference to `mycalloc' optical_density.c:(.text+0xbc6): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `linear_optical_density': optical_density.c:(.text+0x4d9): undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `log10_optical_density': optical_density.c:(.text+0x8f1): 
undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `map_with_ushort_lut': optical_density.c:(.text+0xd0d): undefined reference to `myfree' ../common/libmammo.a(virtual_image.o): In function `deallocate_cached_image': virtual_image.c:(.text+0x3dc6): undefined reference to `myfree' virtual_image.c:(.text+0x3dd7): undefined reference to `myfree' ../common/libmammo.a(virtual_image.o):virtual_image.c:(.text+0x3de5): more undefined references to `myfree' follow ../common/libmammo.a(virtual_image.o): In function `allocate_cached_image': virtual_image.c:(.text+0x4233): undefined reference to `mycalloc' virtual_image.c:(.text+0x4253): undefined reference to `mymalloc' virtual_image.c:(.text+0x4275): undefined reference to `mycalloc' virtual_image.c:(.text+0x42e7): undefined reference to `mycalloc' virtual_image.c:(.text+0x44f9): undefined reference to `mycalloc' virtual_image.c:(.text+0x47a9): undefined reference to `mycalloc' virtual_image.c:(.text+0x4a45): undefined reference to `mycalloc' virtual_image.c:(.text+0x4af4): undefined reference to `myfree' collect2: error: ld returned 1 exit status make: *** [breastsegment] Error 1 Building the mass feature generation program. gcc -O2 -I. 
-I../common afumfeature.o -o afumfeature -L../common -lmammo -lm afumfeature.o: In function `afum_process': afumfeature.c:(.text+0xd80): undefined reference to `mycalloc' afumfeature.c:(.text+0xd9c): undefined reference to `mycalloc' afumfeature.c:(.text+0xe80): undefined reference to `mycalloc' afumfeature.c:(.text+0x11f8): undefined reference to `myfree' afumfeature.c:(.text+0x1207): undefined reference to `myfree' afumfeature.c:(.text+0x1214): undefined reference to `myfree' ../common/libmammo.a(aggregate.o): In function `aggregate': aggregate.c:(.text+0x7fa): undefined reference to `mycalloc' aggregate.c:(.text+0x81c): undefined reference to `mycalloc' aggregate.c:(.text+0x868): undefined reference to `mycalloc' ../common/libmammo.a(aggregate.o): In function `aggregate_median': aggregate.c:(.text+0xbc5): undefined reference to `mymalloc' aggregate.c:(.text+0xbfb): undefined reference to `mycalloc' aggregate.c:(.text+0xc3c): undefined reference to `mycalloc' ../common/libmammo.a(aggregate.o): In function `aggregate': aggregate.c:(.text+0x9b5): undefined reference to `myfree' ../common/libmammo.a(aggregate.o): In function `aggregate_median': aggregate.c:(.text+0xd85): undefined reference to `myfree' ../common/libmammo.a(convexpolyscan.o): In function `polyscan_coords': convexpolyscan.c:(.text+0x6f0): undefined reference to `mycalloc' convexpolyscan.c:(.text+0x75f): undefined reference to `mycalloc' convexpolyscan.c:(.text+0x7ab): undefined reference to `myfree' convexpolyscan.c:(.text+0x7b8): undefined reference to `myfree' ../common/libmammo.a(convexpolyscan.o): In function `polyscan_poly_cacheim': convexpolyscan.c:(.text+0x805): undefined reference to `mycalloc' convexpolyscan.c:(.text+0x894): undefined reference to `myfree' ../common/libmammo.a(mikesfileio.o): In function `read_segmentation_file': mikesfileio.c:(.text+0x1e9): undefined reference to `mycalloc' mikesfileio.c:(.text+0x205): undefined reference to `mycalloc' 
../common/libmammo.a(optical_density.o): In function `linear_optical_density': optical_density.c:(.text+0x29e): undefined reference to `mymalloc' optical_density.c:(.text+0x342): undefined reference to `mycalloc' optical_density.c:(.text+0x383): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `log10_optical_density': optical_density.c:(.text+0x693): undefined reference to `mymalloc' optical_density.c:(.text+0x74f): undefined reference to `mycalloc' optical_density.c:(.text+0x790): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `map_with_ushort_lut': optical_density.c:(.text+0xb2e): undefined reference to `mymalloc' optical_density.c:(.text+0xb87): undefined reference to `mycalloc' optical_density.c:(.text+0xbc6): undefined reference to `mycalloc' ../common/libmammo.a(optical_density.o): In function `linear_optical_density': optical_density.c:(.text+0x4d9): undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `log10_optical_density': optical_density.c:(.text+0x8f1): undefined reference to `myfree' ../common/libmammo.a(optical_density.o): In function `map_with_ushort_lut': optical_density.c:(.text+0xd0d): undefined reference to `myfree' ../common/libmammo.a(virtual_image.o): In function `deallocate_cached_image': virtual_image.c:(.text+0x3dc6): undefined reference to `myfree' virtual_image.c:(.text+0x3dd7): undefined reference to `myfree' ../common/libmammo.a(virtual_image.o):virtual_image.c:(.text+0x3de5): more undefined references to `myfree' follow ../common/libmammo.a(virtual_image.o): In function `allocate_cached_image': virtual_image.c:(.text+0x4233): undefined reference to `mycalloc' virtual_image.c:(.text+0x4253): undefined reference to `mymalloc' virtual_image.c:(.text+0x4275): undefined reference to `mycalloc' virtual_image.c:(.text+0x42e7): undefined reference to `mycalloc' virtual_image.c:(.text+0x44f9): undefined reference to `mycalloc' 
virtual_image.c:(.text+0x47a9): undefined reference to `mycalloc' virtual_image.c:(.text+0x4a45): undefined reference to `mycalloc' virtual_image.c:(.text+0x4af4): undefined reference to `myfree' collect2: error: ld returned 1 exit status make: *** [afumfeature] Error 1 Building the mass detection program. make: Nothing to be done for `all'. Building the performance evaluation program. gcc -O2 -I. -I../common DDSMeval.o polyscan.o -o DDSMeval -L../common -lmammo -lm ../common/libmammo.a(mikesfileio.o): In function `read_segmentation_file': mikesfileio.c:(.text+0x1e9): undefined reference to `mycalloc' mikesfileio.c:(.text+0x205): undefined reference to `mycalloc' collect2: error: ld returned 1 exit status make: *** [DDSMeval] Error 1 Building the template creation program. gcc -O2 -I. -I../common mktemplate.o polyscan.o -o mktemplate -L../common -lmammo -lm Building the drawimage program. gcc -O2 -I. -I../common drawimage.o -o drawimage -L../common -lmammo -lm ../common/libmammo.a(mikesfileio.o): In function `read_segmentation_file': mikesfileio.c:(.text+0x1e9): undefined reference to `mycalloc' mikesfileio.c:(.text+0x205): undefined reference to `mycalloc' collect2: error: ld returned 1 exit status make: *** [drawimage] Error 1 Building the compression/decompression program jpeg. 
gcc -O2 -DSYSV -DNOTRUNCATE -c lexer.c lexer.c:41:1: error: initializer element is not constant lexer.c:41:1: error: (near initialization for ‘yyin’) lexer.c:41:1: error: initializer element is not constant lexer.c:41:1: error: (near initialization for ‘yyout’) lexer.c: In function ‘initparser’: lexer.c:387:21: warning: incompatible implicit declaration of built-in function ‘strlen’ [enabled by default] lexer.c: In function ‘MakeLink’: lexer.c:443:16: warning: incompatible implicit declaration of built-in function ‘malloc’ [enabled by default] lexer.c:447:7: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:452:7: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:455:34: warning: incompatible implicit declaration of built-in function ‘calloc’ [enabled by default] lexer.c:458:7: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:460:3: warning: incompatible implicit declaration of built-in function ‘strcpy’ [enabled by default] lexer.c: In function ‘getstr’: lexer.c:548:26: warning: incompatible implicit declaration of built-in function ‘malloc’ [enabled by default] lexer.c:552:4: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:557:21: warning: incompatible implicit declaration of built-in function ‘calloc’ [enabled by default] lexer.c:557:28: warning: incompatible implicit declaration of built-in function ‘strlen’ [enabled by default] lexer.c:561:7: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c: In function ‘parser’: lexer.c:794:21: warning: incompatible implicit declaration of built-in function ‘calloc’ [enabled by default] lexer.c:798:8: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:1074:21: warning: incompatible implicit declaration of built-in function 
‘calloc’ [enabled by default] lexer.c:1078:8: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:1116:21: warning: incompatible implicit declaration of built-in function ‘calloc’ [enabled by default] lexer.c:1120:8: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:1154:25: warning: incompatible implicit declaration of built-in function ‘calloc’ [enabled by default] lexer.c:1158:5: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:1190:5: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:1247:25: warning: incompatible implicit declaration of built-in function ‘calloc’ [enabled by default] lexer.c:1251:5: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c:1283:5: warning: incompatible implicit declaration of built-in function ‘exit’ [enabled by default] lexer.c: In function ‘yylook’: lexer.c:1867:9: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] lexer.c:1867:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] lexer.c:1877:12: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] lexer.c:1877:23: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] make: *** [lexer.o] Error 1

    Read the article

  • Oracle Data Mining a Star Schema: Telco Churn Case Study

    - by charlie.berger
    There is a complete and detailed Telco Churn case study "How to" blog series just posted by Ari Mozes, ODM Dev. Manager.  In it, Ari provides detailed guidance on how to leverage various strengths of Oracle Data Mining, including the ability to: mine star schemas and join tables and views together to obtain a complete 360-degree view of a customer; combine transactional data, e.g., call detail record (CDR) data; and define complex data transformation, model build, and model deploy analytical methodologies inside the database.  His blog is posted as a multi-part series.  Below are some opening excerpts from the first 3 blog entries.  This is an excellent resource for any novice to skilled data miner who wants to gain competitive advantage by mining their data inside the Oracle Database.  Many thanks, Ari!
    Mining a Star Schema: Telco Churn Case Study (1 of 3)
    One of the strengths of Oracle Data Mining is the ability to mine star schemas with minimal effort.  Star schemas are commonly used in relational databases, and they often contain rich data with interesting patterns.  While dimension tables may contain interesting demographics, fact tables will often contain user behavior, such as phone usage or purchase patterns.  Both of these aspects - demographics and usage patterns - can provide insight into behavior.
    Churn is a critical problem in the telecommunications industry, and companies go to great lengths to reduce the churn of their customer base.  One case study [1] describes a telecommunications scenario involving understanding, and identification of, churn, where the underlying data is present in a star schema.  That case study is a good example for demonstrating just how natural it is for Oracle Data Mining to analyze a star schema, so it will be used as the basis for this series of posts......
    Mining a Star Schema: Telco Churn Case Study (2 of 3)
    This post will follow the transformation steps as described in the case study, but will use Oracle SQL as the means for preparing data.  Please see the previous post for background material, including links to the case study and to scripts that can be used to replicate the stages in these posts.
    1) Handling missing values for call data records
    The CDR_T table records the number of phone minutes used by a customer per month and per call type (tariff).  For example, the table may contain one record corresponding to the number of peak (call type) minutes in January for a specific customer, and another record associated with international calls in March for the same customer.  This table is likely to be fairly dense (most type-month combinations for a given customer will be present) due to the coarse level of aggregation, but there may be some missing values.  Missing entries may occur for a number of reasons: the customer made no calls of a particular type in a particular month, the customer switched providers during the timeframe, or perhaps there is a data entry problem.  In the first situation, the correct interpretation of a missing entry would be to assume that the number of minutes for the type-month combination is zero.  In the other situations, it is not appropriate to assume zero, but rather derive some representative value to replace the missing entries.  The referenced case study takes the latter approach.  The data is segmented by customer and call type, and within a given customer-call type combination, an average number of minutes is computed and used as a replacement value.
    In SQL, we need to generate additional rows for the missing entries and populate those rows with appropriate values.  To generate the missing rows, Oracle's partition outer join feature is a perfect fit.
        select cust_id, cdre.tariff, cdre.month, mins
        from cdr_t cdr partition by (cust_id) right outer join
            (select distinct tariff, month from cdr_t) cdre
            on (cdr.month = cdre.month and cdr.tariff = cdre.tariff);
    .......
    Mining a Star Schema: Telco Churn Case Study (3 of 3)
    Now that the "difficult" work is complete - preparing the data - we can move to building a predictive model to help identify and understand churn.
    The case study suggests that separate models be built for different customer segments (high, medium, low, and very low value customer groups).  To reduce the data to a single segment, a filter can be applied:
        create or replace view churn_data_high as
        select * from churn_prep where value_band = 'HIGH';
    It is simple to take a quick look at the predictive aspects of the data on a univariate basis.  While this does not capture the more complex multi-variate effects as would occur with the full-blown data mining algorithms, it can give a quick feel as to the predictive aspects of the data as well as validate the data preparation steps.  Oracle Data Mining includes a predictive analytics package which enables quick analysis.
        begin
          dbms_predictive_analytics.explain(
            'churn_data_high','churn_m6','expl_churn_tab');
        end;
        /
        select * from expl_churn_tab where rank <= 5 order by rank;
        ATTRIBUTE_NAME       ATTRIBUTE_SUBNAME EXPLANATORY_VALUE RANK
        -------------------- ----------------- ----------------- ----------
        LOS_BAND                                      .069167052          1
        MINS_PER_TARIFF_MON  PEAK-5                   .034881648          2
        REV_PER_MON          REV-5                    .034527798          3
        DROPPED_CALLS                                 .028110322          4
        MINS_PER_TARIFF_MON  PEAK-4                   .024698149          5
    From the above results, it is clear that some predictors do contain information to help identify churn (explanatory value > 0).
The strongest uni-variate predictor of churn appears to be the customer's (binned) length of service.  The second strongest churn indicator appears to be the number of peak minutes used in the most recent month.  The subname column contains the interior piece of the DM_NESTED_NUMERICALS column described in the previous post.  By using the object relational approach, many related predictors are included within a single top-level column. .....   NOTE:  These are just EXCERPTS.  Click here to start reading the Oracle Data Mining a Star Schema: Telco Churn Case Study from the beginning.    
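The missing-value step from part 2 of the excerpt can be illustrated outside the database. The Python sketch below is not the blog's code; it only mimics what the partition outer join plus average replacement is described as doing: every customer gets a row for every (tariff, month) combination seen anywhere in the data, and a missing entry is filled with that customer's average minutes for the same tariff. The function name `densify_cdr` and the tuple layout `(cust_id, tariff, month, mins)` are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def densify_cdr(rows):
    """Return one row per customer and (tariff, month) combination.

    rows: iterable of (cust_id, tariff, month, mins) tuples (hypothetical layout).
    Missing combinations are filled with the customer's average minutes for
    that tariff, or None when the customer has no history for the tariff.
    """
    rows = list(rows)
    # Equivalent of "select distinct tariff, month from cdr_t".
    combos = sorted({(tariff, month) for _, tariff, month, _ in rows})
    observed = {(c, t, m): mins for c, t, m, mins in rows}

    # Collect the known minutes per customer and tariff for the averages.
    history = defaultdict(list)
    for cust, tariff, _, mins in rows:
        history[(cust, tariff)].append(mins)

    dense = []
    for cust in sorted({c for c, _, _, _ in rows}):
        for tariff, month in combos:
            if (cust, tariff, month) in observed:
                mins = observed[(cust, tariff, month)]
            else:
                known = history.get((cust, tariff))
                mins = mean(known) if known else None
            dense.append((cust, tariff, month, mins))
    return dense
```

Leaving `None` where a customer has no history for a tariff is a choice of this sketch; the excerpt does not say how the case study treats that situation.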

    Read the article

  • How to split a text file into multiple text files

    - by Andrew
    I have a text file called entry.txt that contains the following: [ entry1 ] 1239 1240 1242 1391 1392 1394 1486 1487 1489 1600 1601 1603 1657 1658 1660 2075 2076 2078 2322 2323 2325 2740 2741 2743 3082 3083 3085 3291 3292 3294 3481 3482 3484 3633 3634 3636 3690 3691 3693 3766 3767 3769 4526 4527 4529 4583 4584 4586 4773 4774 4776 5153 5154 5156 5628 5629 5631 [ entry2 ] 1239 1240 1242 1391 1392 1394 1486 1487 1489 1600 1601 1603 1657 1658 1660 2075 2076 2078 2322 2323 2325 2740 2741 2743 3082 3083 3085 3291 3292 3294 3481 3482 3484 3690 3691 3693 3766 3767 3769 4526 4527 4529 4583 4584 4586 4773 4774 4776 5153 5154 5156 5628 5629 5631 [ entry3 ] 1239 1240 1242 1391 1392 1394 1486 1487 1489 1600 1601 1603 1657 1658 1660 2075 2076 2078 2322 2323 2325 2740 2741 2743 3082 3083 3085 3291 3292 3294 3481 3482 3484 3690 3691 3693 3766 3767 3769 4241 4242 4244 4526 4527 4529 4583 4584 4586 4773 4774 4776 5153 5154 5156 5495 5496 5498 5628 5629 5631 I would like to split it into three text files: entry1.txt, entry2.txt, entry3.txt. Their contents are as follows. 
entry1.txt: [ entry1 ] 1239 1240 1242 1391 1392 1394 1486 1487 1489 1600 1601 1603 1657 1658 1660 2075 2076 2078 2322 2323 2325 2740 2741 2743 3082 3083 3085 3291 3292 3294 3481 3482 3484 3633 3634 3636 3690 3691 3693 3766 3767 3769 4526 4527 4529 4583 4584 4586 4773 4774 4776 5153 5154 5156 5628 5629 5631 entry2.txt: [ entry2 ] 1239 1240 1242 1391 1392 1394 1486 1487 1489 1600 1601 1603 1657 1658 1660 2075 2076 2078 2322 2323 2325 2740 2741 2743 3082 3083 3085 3291 3292 3294 3481 3482 3484 3690 3691 3693 3766 3767 3769 4526 4527 4529 4583 4584 4586 4773 4774 4776 5153 5154 5156 5628 5629 5631 entry3.txt: [ entry3 ] 1239 1240 1242 1391 1392 1394 1486 1487 1489 1600 1601 1603 1657 1658 1660 2075 2076 2078 2322 2323 2325 2740 2741 2743 3082 3083 3085 3291 3292 3294 3481 3482 3484 3690 3691 3693 3766 3767 3769 4241 4242 4244 4526 4527 4529 4583 4584 4586 4773 4774 4776 5153 5154 5156 5495 5496 5498 5628 5629 5631 In other words, the [ character indicates a new file should begin. Is there any way I can accomplish automatic text file splitting? My eventual, actual input entry.txt actually contains 200,001 entries. Doing the text split in either Windows or Linux would be great. I do not have access to a Mac machine. Thanks!
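One possible approach that should work on both Windows and Linux is a short Python script. The sketch below assumes that each `[ name ]` header starts at the beginning of its own line and that the name inside the brackets is safe to use as a file name; the function name `split_entries` and the output directory argument are hypothetical. It reads the input line by line and keeps only one output file open at a time, so the number of entries should not matter much.

```python
import re
from pathlib import Path

# A header line looks like "[ entry1 ]"; the captured group is the entry name.
HEADER = re.compile(r"^\[\s*(\S+)\s*\]")


def split_entries(source, out_dir):
    """Write each "[ name ]" section of source to out_dir/name.txt.

    Returns the list of files written, in the order they were created.
    Lines that appear before the first header are ignored.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    try:
        with open(source, encoding="utf-8") as src:
            for line in src:
                match = HEADER.match(line)
                if match:
                    # A new header closes the previous file and opens the next.
                    if out is not None:
                        out.close()
                    path = out_dir / f"{match.group(1)}.txt"
                    out = open(path, "w", encoding="utf-8")
                    written.append(path)
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Calling `split_entries("entry.txt", "out")` would then create `out/entry1.txt`, `out/entry2.txt`, and so on, with the header line kept as the first line of each file.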

    Read the article

  • Integrating Data Mining into your BI Solution (Presentation)

    I recently gave a live meeting presentation to the UK User Group on Integrating Data Mining into your BI Solution.  In it I talk about and demo ways of using your data mining models inside Integration Services, Analysis Services and Reporting Services.  This is the first in a series of presentations I will be doing for the UG as I try to get the word out that Data Mining can be for the masses. You can download my deck and my live meeting recording from here.

    Read the article

  • A big move is coming for the Oracle data mining interface, Oracle Data Mining

    - by Fekete Zoltán
    As we already saw in the previews in last autumn's Oracle OpenWorld news and presentations, Oracle is preparing a big move on the data mining front (Oracle Data Mining), namely an extension of the graphical interface of its very usable data mining engine. If we look closely at this latter link, the existing graphical interface now goes by the name Oracle Data Miner Classic. How can Oracle Data Mining be used? - Oracle Data Miner (a GUI that can be downloaded free of charge from OTN) - From Java and PL/SQL, Oracle Data Mining JDeveloper and SQL Developer Extensions - From Excel, Oracle Spreadsheet Add-In for Predictive Analytics - ODM Connector for mySAP BW Oracle Data Mining technical information.

    Read the article

  • Text inside <p> shrinks on mobile devices while div does not [migrated]

    - by guisasso
    I asked this question on stack overflow, but didn't get any answers, so I'm trying here. Does anybody know whats happening here? I tested on opera, dolphin and the factory android browser. (although it seems now to be working on opera) The div doesn't change size, but the text somehow is shrunk to fit on part of a div. Anyway to prevent this? Just to be clear, I'm trying to achieve on the mobile browser the same look as the pc version. As the problem seems to be with the browsers, how can I force the text to take the full width of the div? I tried setting the p tag to 100% with no success. The div has to have that width and be aligned to the left of the page. On a Pc, as it should be: I shrunk the code as much as I could: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en-us"> <head> <meta content="text/html; charset=utf-8" http-equiv="Content-Type" /> <meta content="" name="keywords" /> <meta content="" name="description" /> <title></title> </head> <body> <div style="width:1000px; margin-left:auto; margin-right:auto;" > <div style="float:left; width:758px; background-color:aqua;"> <p> Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text .<br /> <br /> Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text Random text .<br /> <br /> Random 
text Random text Random text Random text <a href="http://www.a.com/a.html"> Random text </a> Random text Random text . </p> </div> </div> </body> </html> Thanks.

    Read the article

  • More Value From Data Using Data Mining Presentation

    Here is a presentation I gave at the SQLBits conference in September which was recorded by Microsoft.  Usually I speak about SSIS but on this particular event I thought people would like to hear something different from me. Microsoft are making a big play for making Data Mining more accessible to everyone and not just boffins.  In this presentation I give an overview of data mining and then do some demonstrations using the excellent Excel Add-Ins available from Microsoft SQL Server 2008 SQL Server 2005 I hope you enjoy this presentation http://go.microsoft.com/?linkid=9633764

    Read the article

  • Sublime Text 2 'text bubbling'?

    - by Alex Mcp
    In vim and Notepad++ I have an awesome feature, either mapped or built in, that I've seen called text bubbling. I know about the Sublime documentation for mapping my own, but wanted to make sure I wasn't duplicating functionality. Basically, when I have either a block of text selected or just a cursor on a line, I press Ctrl+Up/Down (or some other mapping) and the text is moved up or down as a block, while the rest of the text 'flows' around it. Is this a native feature in Sublime Text or should I script it in?

    Read the article

  • Tonight: webcast on what's new in Oracle Data Mining!

    - by Fekete Zoltán
    On Wednesday, May 12, 2010, at 6 p.m., we can connect with our browser and listen to the following very interesting presentation as part of Oracle BIWA: BIWA SIG TechCast Series - May 12 - Data Mining Made Easy, presented by Charlie Berger, Oracle's head of data mining. Data mining made easy! An introduction to the new "Work flow" graphical interface of Oracle Data Miner 11g Release 2. You can join Oracle BIWA free of charge at this link. Here is where to find out how to listen to this conference: www.oraclebiwa.org

    Read the article

  • To sample or not to sample...

    - by [email protected]
    Ideally, we would know the exact answer to every question. How many people support presidential candidate A vs. B? How many people suffer from H1N1 in a given state? Does this batch of manufactured widgets have any defective parts? Knowing exact answers is expensive in terms of time and money and, in most cases, is impractical if not impossible. Consider asking every person in a region for their candidate preference, testing every person with flu symptoms for H1N1 (assuming every person reported when they had flu symptoms), or destructively testing widgets to determine if they are "good" (leaving no product to sell). Knowing exact answers, fortunately, isn't necessary or even useful in many situations. Understanding the direction of a trend or statistically significant results may be sufficient to answer the underlying question: who is likely to win the election, have we likely reached a critical threshold for flu, or is this batch of widgets good enough to ship? Statistics help us to answer these questions with a certain degree of confidence. This focuses on how we collect data. In data mining, we focus on the use of data, that is data that has already been collected. In some cases, we may have all the data (all purchases made by all customers), in others the data may have been collected using sampling (voters, their demographics and candidate choice). Building data mining models on all of your data can be expensive in terms of time and hardware resources. Consider a company with 40 million customers. Do we need to mine all 40 million customers to get useful data mining models? The quality of models built on all data may be no better than models built on a relatively small sample. Determining how much is a reasonable amount of data involves experimentation. 
When starting the model building process on large datasets, it is often more efficient to begin with a small sample, perhaps 1000 - 10,000 cases (records) depending on the algorithm, source data, and hardware. This allows you to see quickly what issues might arise with choice of algorithm, algorithm settings, data quality, and need for further data preparation. Instead of waiting for a model on a large dataset to build only to find that the results don't meet expectations, once you are satisfied with the results on the initial sample, you can  take a larger sample to see if model quality improves, and to get a sense of how the algorithm scales to the particular dataset. If model accuracy or quality continues to improve, consider increasing the sample size. Sampling in data mining is also used to produce a held-aside or test dataset for assessing classification and regression model accuracy. Here, we reserve some of the build data (data that includes known target values) to be used for an honest estimate of model error using data the model has not seen before. This sampling transformation is often called a split because the build data is split into two randomly selected sets, often with 60% of the records being used for model building and 40% for testing. Sampling must be performed with care, as it can adversely affect model quality and usability. Even a truly random sample doesn't guarantee that all values are represented in a given attribute. This is particularly troublesome when the attribute with omitted values is the target. A predictive model that has not seen any examples for a particular target value can never predict that target value! For other attributes, values may consist of a single value (a constant attribute) or all unique values (an identifier attribute), each of which may be excluded during mining. Values from categorical predictor attributes that didn't appear in the training data are not used when testing or scoring datasets. 
In subsequent posts, we'll talk about three sampling techniques using Oracle Database: simple random sampling without replacement, stratified sampling, and simple random sampling with replacement.
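The build/test split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Oracle Database technique the post goes on to cover: the function names `split_build_test` and `missing_target_values`, the default 60% build fraction, and the fixed seed are all choices made for this sketch. The second helper reports target values that did not make it into the build set, which is the situation the post warns about.

```python
import random


def split_build_test(records, build_fraction=0.6, seed=42):
    """Randomly split records into (build, test) without replacement."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * build_fraction)
    return shuffled[:cut], shuffled[cut:]


def missing_target_values(all_records, build, target):
    """Return target values present in all_records but absent from build.

    target is a function that extracts the target value from a record.
    """
    return {target(r) for r in all_records} - {target(r) for r in build}
```

If `missing_target_values` returns a non-empty set, a model built on that sample has never seen those target values and cannot predict them, so a larger or stratified sample may be worth trying.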

    Read the article

  • EXCEL VBA STUDENTS DATABASE [on hold]

    - by BENTET
    I am developing an Excel database to record students' details. The headings of the table are Date, Year, Payment Slip No., Student Number, Name, Fees, Amount Paid, Balance and Previous Balance. I have been able to put together some code that works, but there are some setbacks I would like addressed. I developed a userform for each programme of the institution and assigned each one to a specific sheet, but whenever I add a record it goes to the active sheet instead of the assigned sheet. I also want to hide all sheets and work only on the userforms when the workbook is opened. Another problem I am facing is the update code: whenever I update a record on a specific row, it edits the record on the first row rather than the record I edited. This is the code I have built so far. I am virtually a novice in programming.

        Private Sub cmdAdd_Click()
            Dim lastrow As Long
            lastrow = Sheets("Sheet4").Range("A" & Rows.Count).End(xlUp).Row
            Cells(lastrow + 1, "A").Value = txtDate.Text
            Cells(lastrow + 1, "B").Value = ComBox1.Text
            Cells(lastrow + 1, "C").Value = txtSlipNo.Text
            Cells(lastrow + 1, "D").Value = txtStudentNum.Text
            Cells(lastrow + 1, "E").Value = txtName.Text
            Cells(lastrow + 1, "F").Value = txtFees.Text
            Cells(lastrow + 1, "G").Value = txtAmountPaid.Text
            txtDate.Text = ""
            ComBox1.Text = ""
            txtSlipNo.Text = ""
            txtStudentNum.Text = ""
            txtName.Text = ""
            txtFees.Text = ""
            txtAmountPaid.Text = ""
        End Sub

        Private Sub cmdClear_Click()
            txtDate.Text = ""
            ComBox1.Text = ""
            txtSlipNo.Text = ""
            txtStudentNum.Text = ""
            txtName.Text = ""
            txtFees.Text = ""
            txtAmountPaid.Text = ""
            txtBalance.Text = ""
        End Sub

        Private Sub cmdClearD_Click()
            txtDate.Text = ""
            ComBox1.Text = ""
            txtSlipNo.Text = ""
            txtStudentNum.Text = ""
            txtName.Text = ""
            txtFees.Text = ""
            txtAmountPaid.Text = ""
            txtBalance.Text = ""
        End Sub

        Private Sub cmdClose_Click()
            Unload Me
        End Sub

        Private Sub cmdDelete_Click()
            'declare the variables
            Dim findvalue As Range
            Dim cDelete As VbMsgBoxResult
            'check for values
            If txtStudentNum.Value = "" Or txtName.Value = "" Or txtDate.Text = "" Or ComBox1.Text = "" Or txtSlipNo.Text = "" Or txtFees.Text = "" Or txtAmountPaid.Text = "" Or txtBalance.Text = "" Then
                MsgBox "There is not data to delete"
                Exit Sub
            End If
            'give the user a chance to change their mind
            cDelete = MsgBox("Are you sure that you want to delete this student", vbYesNo + vbDefaultButton2, "Are you sure????")
            If cDelete = vbYes Then
                'delete the row
                Set findvalue = Sheet4.Range("D:D").Find(What:=txtStudentNum, LookIn:=xlValues)
                findvalue.EntireRow.Delete
            End If
            'clear the controls
            txtDate.Text = ""
            ComBox1.Text = ""
            txtSlipNo.Text = ""
            txtStudentNum.Text = ""
            txtName.Text = ""
            'txtFees.Text = ""
            txtAmountPaid.Text = ""
            txtBalance.Text = ""
        End Sub

        Private Sub cmdSearch_Click()
            Dim lastrow As Long
            Dim currentrow As Long
            Dim studentnum As String
            lastrow = Sheets("Sheet4").Range("A" & Rows.Count).End(xlUp).Row
            studentnum = txtStudentNum.Text
            For currentrow = 2 To lastrow
                If Cells(currentrow, 4).Text = studentnum Then
                    txtDate.Text = Cells(currentrow, 1)
                    ComBox1.Text = Cells(currentrow, 2)
                    txtSlipNo.Text = Cells(currentrow, 3)
                    txtStudentNum.Text = Cells(currentrow, 4).Text
                    txtName.Text = Cells(currentrow, 5)
                    txtFees.Text = Cells(currentrow, 6)
                    txtAmountPaid.Text = Cells(currentrow, 7)
                    txtBalance.Text = Cells(currentrow, 8)
                End If
            Next currentrow
            txtStudentNum.SetFocus
        End Sub

        Private Sub cmdSearchName_Click()
            Dim lastrow As Long
            Dim currentrow As Long
            Dim studentname As String
            lastrow = Sheets("Sheet4").Range("A" & Rows.Count).End(xlUp).Row
            studentname = txtName.Text
            For currentrow = 2 To lastrow
                If Cells(currentrow, 5).Text = studentname Then
                    txtDate.Text = Cells(currentrow, 1)
                    ComBox1.Text = Cells(currentrow, 2)
                    txtSlipNo.Text = Cells(currentrow, 3)
                    txtStudentNum.Text = Cells(currentrow, 4)
                    txtName.Text = Cells(currentrow, 5).Text
                    txtFees.Text = Cells(currentrow, 6)
                    txtAmountPaid.Text = Cells(currentrow, 7)
                    txtBalance.Text = Cells(currentrow, 8)
                End If
            Next currentrow
            txtName.SetFocus
        End Sub

        Private Sub cmdUpdate_Click()
            Dim tdate As String
            Dim tlevel As String
            Dim tslipno As String
            Dim tstudentnum As String
            Dim tname As String
            Dim tfees As String
            Dim tamountpaid As String
            Dim currentrow As Long
            Dim lastrow As Long
            'If Cells(currentrow, 5).Text = studentname Then
            'txtDate.Text = Cells(currentrow, 1)
            lastrow = Sheets("Sheet4").Range("A" & Columns.Count).End(xlUp).Offset(0, 1).Column
            For currentrow = 2 To lastrow
                tdate = txtDate.Text
                Cells(currentrow, 1).Value = tdate
                txtDate.Text = Cells(currentrow, 1)
                tlevel = ComBox1.Text
                Cells(currentrow, 2).Value = tlevel
                ComBox1.Text = Cells(currentrow, 2)
                tslipno = txtSlipNo.Text
                Cells(currentrow, 3).Value = tslipno
                txtSlipNo = Cells(currentrow, 3)
                tstudentnum = txtStudentNum.Text
                Cells(currentrow, 4).Value = tstudentnum
                txtStudentNum.Text = Cells(currentrow, 4)
                tname = txtName.Text
                Cells(currentrow, 5).Value = tname
                txtName.Text = Cells(currentrow, 5)
                tfees = txtFees.Text
                Cells(currentrow, 6).Value = tfees
                txtFees.Text = Cells(currentrow, 6)
                tamountpaid = txtAmountPaid.Text
                Cells(currentrow, 7).Value = tamountpaid
                txtAmountPaid.Text = Cells(currentrow, 7)
            Next currentrow
            txtDate.SetFocus
            ComBox1.SetFocus
            txtSlipNo.SetFocus
            txtStudentNum.SetFocus
            txtName.SetFocus
            txtFees.SetFocus
            txtAmountPaid.SetFocus
            txtBalance.SetFocus
        End Sub

    I was also wondering whether I could develop something that uses only one userform to send data to different sheets in the workbook.

    Read the article

  • How can I move towards the Business Intelligence/data mining fields from software developer

    - by user1758043
    I am working as a Python developer and I work with Django. I also do some web scraping and build spiders and bots. From there I want to move into Business Intelligence, and I want to know how I can get into that field. Since companies are not going to hire me in that field directly, I want to know how I can make the transition. I was thinking of first working as a database developer in SQL and then seeing where to go from there. I would like your advice so that I can start learning the right things and change jobs with that in mind. In my area there are plenty of jobs in all areas, but I need to know how to transition and what I should learn before making that transition. Jobs are plentiful here, so if I know my stuff, getting a job is a piece of cake because they don't have any people; the same jobs keep getting advertised for months and months.

    Read the article

  • How can I move towards the Business Intelligence/ data mining fields from software developer [closed]

    - by user1758043
    I am working as a Python developer and I work with Django. I also do some web scraping and building spiders and bots. Now from there I want to make my move to Business Intelligence. I just want to know how I can move into that field. Because companies are not going to hire me in that field directly, I just want to know how I can make the transition. I was thinking of first working as a database developer in SQL and then I can see further. But I want advice from you guys so that I can start learning that stuff so that I can change jobs keeping that in mind. Here in my area there are plenty of jobs in all areas but I need to know how to transition and what things I should learn before making that transition. Here jobs are plenty so if I know my stuff, getting a job is a piece of cake because they don't have any people. The same jobs keep getting advertised for months and months.

    Read the article

  • Video Presentation and Demo of Oracle Advanced Analytics & Data Mining

    - by Mike.Hallett(at)Oracle-BI&EPM
    For a video presentation and demonstration of Oracle Advanced Analytics & Data Mining click here. (This plays a large MP4 file in a browser: access is from Google Docs, and this works best with Google Chrome.) This one-hour session focuses primarily on the Oracle Data Mining component of the Oracle Advanced Analytics Option along with Oracle R Enterprise. It is tied to the Oracle SQL Developer Days virtual and onsite events and is presented by Oracle’s Director for Advanced Analytics, Charlie Berger, covering:
      - Big Data + Big Data Analytics
      - Competing on analytics & value proposition
      - What is data mining?
      - Typical use cases
      - Oracle Data Mining high performance in-database SQL based data mining functions
      - Exadata "smart scan" scoring
      - Oracle Data Miner GUI (an Extension that ships with SQL Developer)
      - Oracle Business Intelligence EE + Oracle Data Mining results/predictions in dashboards
      - Applications "powered by Oracle Data Mining" for factory installed predictive analytics methodologies
      - Oracle R Enterprise
    Please contact [email protected] should you have any questions.

    Read the article

  • Looking for speech-to-text tool (convert .wav to text)

    - by David
    I have the ability to get .wav files of voice mails emailed to me, but sometimes I'll be sitting in a meeting and I need to know the content of a message without playing it out loud. Are there any good (and, preferably, free) tools for converting .wav files to text? I know Google Voice has this capability, but I can't determine if it'll work on a file-by-file basis. I realize that this is a difficult research problem, but even an 80% solution might be workable.

    Read the article
