Daily Archives

Articles indexed Tuesday March 23 2010

Page 115/130 | < Previous Page | 111 112 113 114 115 116 117 118 119 120 121 122  | Next Page >

  • Using a "white list" for extracting terms for Text Mining

    - by [email protected]
    In Part 1 of my post on "Generating cluster names from a document clustering model" (part 1, part 2, part 3), I showed how to build a clustering model from text documents using Oracle Data Miner, which automates preparing data for text mining. In this process we specified a custom stoplist and lexer and relied on Oracle Text to identify important terms.  However, there is an alternative approach, the white list, which uses a thesaurus object with the Oracle Text CTXRULE index to allow you to specify the important terms. INTRODUCTIONA stoplist is used to exclude, i.e., black list, specific words in your documents from being indexed. For example, words like a, if, and, or, and but normally add no value when text mining. Other words can also be excluded if they do not help to differentiate documents, e.g., the word Oracle is ubiquitous in the Oracle product literature. One problem with stoplists is determining which words to specify. This usually requires inspecting the terms that are extracted, manually identifying which ones you don't want, and then re-indexing the documents to determine if you missed any. Since a corpus of documents could contain thousands of words, this could be a tedious exercise. Moreover, since every word is considered as an individual token, a term excluded in one context may be needed to help identify a term in another context. For example, in our Oracle product literature example, the words "Oracle Data Mining" taken individually are not particular helpful. The term "Oracle" may be found in nearly all documents, as with the term "Data." The term "Mining" is more unique, but could also refer to the Mining industry. If we exclude "Oracle" and "Data" by specifying them in the stoplist, we lose valuable information. But it we include them, they may introduce too much noise. Still, when you have a broad vocabulary or don't have a list of specific terms of interest, you rely on the text engine to identify important terms, often by computing the term frequency - inverse document frequency metric. (This is effectively a weight associated with each term indicating its relative importance in a document within a collection of documents. We'll revisit this later.) The results using this technique is often quite valuable. As noted above, an alternative to the subtractive nature of the stoplist is to specify a white list, or a list of terms--perhaps multi-word--that we want to extract and use for data mining. The obvious downside to this approach is the need to specify the set of terms of interest. However, this may not be as daunting a task as it seems. For example, in a given domain (Oracle product literature), there is often a recognized glossary, or a list of keywords and phrases (Oracle product names, industry names, product categories, etc.). Being able to identify multi-word terms, e.g., "Oracle Data Mining" or "Customer Relationship Management" as a single token can greatly increase the quality of the data mining results. The remainder of this post and subsequent posts will focus on how to produce a dataset that contains white list terms, suitable for mining. CREATING A WHITE LIST We'll leverage the thesaurus capability of Oracle Text. Using a thesaurus, we create a set of rules that are in effect our mapping from single and multi-word terms to the tokens used to represent those terms. For example, "Oracle Data Mining" becomes "ORACLEDATAMINING." First, we'll create and populate a mapping table called my_term_token_map. All text has been converted to upper case and values in the TERM column are intended to be mapped to the token in the TOKEN column. TERM                                TOKEN DATA MINING                         DATAMINING ORACLE DATA MINING                  ORACLEDATAMINING 11G                                 ORACLE11G JAVA                                JAVA CRM                                 CRM CUSTOMER RELATIONSHIP MANAGEMENT    CRM ... Next, we'll create a thesaurus object my_thesaurus and a rules table my_thesaurus_rules: CTX_THES.CREATE_THESAURUS('my_thesaurus', FALSE); CREATE TABLE my_thesaurus_rules (main_term     VARCHAR2(100),                                  query_string  VARCHAR2(400)); We next populate the thesaurus object and rules table using the term token map. A cursor is defined over my_term_token_map. As we iterate over  the rows, we insert a synonym relationship 'SYN' into the thesaurus. We also insert into the table my_thesaurus_rules the main term, and the corresponding query string, which specifies synonyms for the token in the thesaurus. DECLARE   cursor c2 is     select token, term     from my_term_token_map; BEGIN   for r_c2 in c2 loop     CTX_THES.CREATE_RELATION('my_thesaurus',r_c2.token,'SYN',r_c2.term);     EXECUTE IMMEDIATE 'insert into my_thesaurus_rules values                        (:1,''SYN(' || r_c2.token || ', my_thesaurus)'')'     using r_c2.token;   end loop; END; We are effectively inserting the token to return and the corresponding query that will look up synonyms in our thesaurus into the my_thesaurus_rules table, for example:     'ORACLEDATAMINING'        SYN ('ORACLEDATAMINING', my_thesaurus)At this point, we create a CTXRULE index on the my_thesaurus_rules table: create index my_thesaurus_rules_idx on        my_thesaurus_rules(query_string)        indextype is ctxsys.ctxrule; In my next post, this index will be used to extract the tokens that match each of the rules specified. We'll then compute the tf-idf weights for each of the terms and create a nested table suitable for mining.

    Read the article

  • How do determine what is *really* causing your compiler error

    - by ML
    Hi All, I am porting a very large code base and I am having more difficulty with old code. For example, this causes a compiler error: inline CP_M_ReferenceCounted * FrAssignRef(CP_M_ReferenceCounted * & to, CP_M_ReferenceCounted * from) { if (from) from->AddReference(); if (to) to->RemoveReference(); to = from; return to; } The error is: error: expected initializer before '*' token. How do I know what this is. I looked up inline member functions to be sure I understood and I dont think the inlining is the cause but I am not sure what is. Another example: template <class eachClass> eachClass FrReferenceIfClass(FxRC * ptr) { eachClass getObject = dynamic_cast<eachClass>(ptr); if (getObject) getObject->AddReference(); return getObject; } The error is: error: template declaration of 'eachClass FrReferenceIfClass' That is all. How do I decide what this is?. I am admittedly rusty with templates.

    Read the article

  • HTML: How to get a child element to show behind (lower z-index) than its parent

    - by dclowd9901
    I need for a certain dynamic element to always appear on top of another element, no matter what order in the DOM tree they are. Is this possible? I've tried z-index (with position: relative), and it doesn't seem to work. I hate to be vague, but this is the simplest way I can think of asking this question without explaining its purpose ad nauseum. So, to recap, I need <div class="a"> <div class="b"></div> </div> <div class="b"> <div class="a"></div> </div> To display exactly the same when rendered. And for flexibility purposes (I'm planning on distributing a plugin that needs this functionality), I'd really like to not have to resort to absolute or fixed positioning.

    Read the article

  • Error while installing Microsoft SharePoint 2010 on Windows 7 machine

    - by Brian Roisentul
    I've installed Microsoft SharePoint 2010 on my Windows 7 64bits machine. I've modified the config.xml file to accomplish this. Once it's installed I run the Configuration Wizard to create a new site and it throws me the following exception: An exception of type System.IO.FileNotFoundException was thrown. Additional exception information: Could not load file or assembly 'Microsoft.IdentityModel, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified. System.IO.FileNotFoundException: Could not load file or assembly 'Microsoft.IdentityModel, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified. File name: 'Microsoft.IdentityModel, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' at Microsoft.SharePoint.Administration.SPFarm.CurrentUserIsAdministrator(Boolean allowContentApplicationAccess) at Microsoft.SharePoint.Administration.SPConfigurationDatabase.Microsoft.SharePoint.Administration.ISPPersistedStoreProvider.DoesCurrentUserHaveWritePermission(SPPersistedObject persistedObject) at Microsoft.SharePoint.Administration.SPPersistedObject.BaseUpdate() at Microsoft.SharePoint.Administration.SPFarm.Update() at Microsoft.SharePoint.Administration.SPConfigurationDatabase.RegisterDefaultDatabaseServices(SqlConnectionStringBuilder connectionString) at Microsoft.SharePoint.Administration.SPConfigurationDatabase.Provision(SqlConnectionStringBuilder connectionString) at Microsoft.SharePoint.Administration.SPFarm.Create(SqlConnectionStringBuilder configurationDatabase, SqlConnectionStringBuilder administrationContentDatabase, IdentityType identityType, String farmUser, SecureString farmPassword, SecureString masterPassphrase) at Microsoft.SharePoint.PostSetupConfiguration.ConfigurationDatabaseTask.CreateOrConnectConfigDb() at Microsoft.SharePoint.PostSetupConfiguration.ConfigurationDatabaseTask.Run() at Microsoft.SharePoint.PostSetupConfiguration.TaskThread.ExecuteTask() Note: I don't have the Microsoft SQL Server 2008 64bits installed, but SharePoint 2010 seems that installed its necessary components.

    Read the article

  • SQL Server Management Studio Express 2005 has no Configuration Manager

    - by brohjoe
    Where is the configuration manager for SQL Express 2005? I need to configure SQL Server for TCP/IP but there is no configuration manager with the package. I see SQL Server Database Publishing Wizard, I see SQL Server Migration Assistant for Access, but no Configuration Manager. According to the MSDN, there should be one. I've even looked online for a download of the Configuration Manager for SQL Server 2005, but could not find one. Did I miss something in the download or should I just scrap SQL Server Express and download the full-blown SQL Server for Developers?

    Read the article

  • How can I disable keep-alive on ASP.NET Web Service client requests?

    - by Matthew Brindley
    I have a few web servers behind an Amazon EC2 load balancer. I'm using TCP balancing on port 80 (rather than HTTP balancing). I have a client polling a Web Service (running on all web servers) for new items every few seconds. However, the client seems to stay connected to one server and polls that same server each time. I've tried using ServicePointManager to disable KeepAlive, but that didn't change anything. The outgoing connection still had its "connection: keep-alive" HTTP header, and the server kept the TCP connection open. I've also tried adding an override of GetWebRequest to the proxy class created by VS, which inherits from SoapHttpClientProtocol, but I still see the keep-alive header. If I kill the client's process and restart, it'll connect to a new server via the load balancer, but it'll continue polling that new server forever. Is there a way to force it to connect to a random server each time? I want the load from the one client to be spread across all of the web servers. The client is written in C# (as is the server) and uses a Web Reference (not a Service Reference), which points to the load balancer.

    Read the article

  • Is there a way to formerly define a time interval for configuring a process?

    - by gshauger
    Horrible worded question...I know. I'm working on an application that processes data for the previous day. The problem is that I know the customer is going to eventually ask to it for every hour or some other arbitrary time interval. I know that languages such as Java or SQL have masks for defining dates. Well what about a way to define a time interval? Let me ask it this way. If someone asked you to create a configurable piece of software how would you allow the user to specify the time intervals?

    Read the article

  • Memory management, and async operations: when does an object become nil?

    - by Kenny Winker
    I have a view that will be displaying downloaded images and text. I'd like to handle all the downloading asynchronously using ASIHTTPRequest, but I'm not sure how to go about notifying the view when downloads are finished... If I pass my view controller as the delegate of the ASIHTTPRequest, and then my view is destroyed (user navigates away) will it fail gracefully when it tries to message my view controller because the delegate is now nil? i.e. if i do this: UIViewController *myvc = [[UIViewController alloc] init]; request.delegate = myvc; [myvc release]; Do myvc, and request.delegate now == a pointer to nil? This is the problem with being self-taught... I'm kinda fuzzy on some basic concepts. Other ideas of how to handle this are welcome.

    Read the article

  • ASP.NET MVC Model Binding

    - by Noel
    If i have a Controller Action that may recieve both HTTP GET and HTTP POST from a number of different sources with each source sending different data e.g. Source1 performs a form POST with two form items Item1 and Item2 Source2 performs a GET where the data is contained in the query string (?ItemX=2&ItemY=3) Is it possible to have a controller action that will cater for all these cases and perform binding automatically e.g. public ActionResult Test(Dictionary data) { // Do work ... return View(); } Is this possible with a custom binder or some other way? Dont want to work directly with HttpContext.Request if possible

    Read the article

  • UISegmentedControl with UITableVIew NSRangeException

    - by Vivas
    Hi, I am using one UIViewController as shown: @interface RssViewController : UIViewController <UITableViewDataSource,UITableViewDelegate,BlogRssParserDelegate> I am displaying an RSS feed in the UITableView (in RssViewController) depending on the segment selected on the UISegmentedControl. My app crashes when I scroll the tableview then select another segment of the UISegmentedControl. For example I have two RSS feeds by default I am displaying the RSS feed at segment 0. This feed has 36 rows. The RSS feed that I load at segment 1 has only 5 rows. When I scroll the RSS feed at segment 0 THEN before the scrolling stops I switch to the RSS feed at segment 1 I crash the app with the following error: * Terminating app due to uncaught exception 'NSRangeException', reason: '* -[NSCFArray objectAtIndex:]: index (36) beyond bounds (0)' If I wait till the scrolling on the RSS feed at segment 0 stops THEN select segment 1, everything works fine. How can I stop this crashing? I wanted to reuse the same tableview because only the data changes. I can see that it is crashing because of the row count - I went from 36 rows down to 5 rows BUT how can I fix this? Any help / suggestions would be appreciated.

    Read the article

  • How to connect from ruby to MS Sql Server

    - by apetrov
    Hi Crowd! I'm trying to connect to the sql server 2005 database from *NIX machine: I have the following configuration: Linux 64bit ruby -v ruby 1.8.6 (2007-09-24 patchlevel 111) [x86_64-linux] important gems: dbd-odbc (0.2.4) dbi (0.4.1) active record sql server adapter - as plugin ruby-odbc 0.9996 (installed without any options.) unixODBC is installed freeTDS is installed cat /etc/odbcinst.ini [FreeTDS] Description = TDS driver (Sybase/MS SQL) Driver = /usr/lib/libtdsodbc.so Setup = /usr/lib/odbc/libtdsS.so CPTimeout = CPReuse = FileUsage = 1 DSN: DRIVER=FreeTDS;TDS_Version=8.0;SERVER=XXXX;DATABASE=XXX;Port=1433;uid=XXX;pwd=XXXX;" or DRIVER=/usr/lib/libtdsodbc.so;TDS_Version=8.0;SERVER=XXXX;DATABASE=XXX;Port=1433;uid=XXX;pwd=XXXX;" I receive the following error: >>ActiveRecord::Base.sqlserver_connection({"mode"=>"ODBC", "adapter"=>"sqlserver", "dsn"=>my_dns) DBI::DatabaseError: IM002 (0) [unixODBC][Driver Manager]Data source name not found, and no default driver specified from /usr/lib/ruby/1.8/DBD/ODBC/ODBC.rb:95:in `connect' from /usr/lib/ruby/1.8/dbi.rb:424:in `connect' from /usr/lib/ruby/1.8/dbi.rb:215:in `connect' from /opt/ublip/rails/current/vendor/plugins/activerecord-sqlserver-adapter/lib/active_record/connection_adapters/sqlserver_adapter.rb:47:in `sqlserver_connection' It looks like ODBC unable to find appropriate ODBC driver, but I have no ideas why. I had a problem with /usr/lib/libtdsodbc.so which is empty in default debian package free-tds dev, but i solved it with remove broken package and installation from sources. Will appreciate any thought! Thanks & Regards Note: I'm albe to connect using the same steps on mac 10.5

    Read the article

  • Default type-parametrized function literal class parameter

    - by doom2.wad
    Is this an intended behavior or is it a bug? Consider the following trait (be it a class, doesn't matter): trait P[T] { class Inner(val f: T => Unit = _ => println("nope")) } This is what I would have expected: scala> val p = new P[Int] { | val inner = new Inner | } p: java.lang.Object with P[Int]{def inner: this.Inner} = $anon$1@12192a9 scala> p.inner.f(5) nope But this? scala> val p = new P[Int] { | val inner = new Inner() { | println("some primary constructor code in here") | } | } <console>:6: error: type mismatch; found : (T) => Unit required: (Int) => Unit val inner = new Inner() { ^

    Read the article

  • How to insert an Array/Objet into SQL (bestpractice)

    - by Jason
    I need to store three items as an array in a single column and be able to quickly/easily modify that data in later functions. [---YOU CAN SKIP THIS PART IF YOU TRUST ME--] To be clear, I love and use x_ref tables all the time but an x_ref doesn't work here because this is not a one-to-many relationship. I am making a project management tool that among other things, assigns a user to a project and assigns hours to that project on a weekly basis, per user, sometimes for weeks many weeks into the future. Of course there are many projects, a project can have many team members, a team member can be involved with many projects at one time BUT its not one-to-many because a team member can be working many weeks on the same project but have different hours for different weeks. In other words, each object really is unique. Also/finally, this data can be changed at any time by any team-member - hence it needs to be easily to manipulate. Basically, I need to handle three values (the team member, the week we're talking about, and how many hours) dropped into a project row in the projects table (under the column for project team members) and treated as one item - a team member - that will actually be part of a larger array of all the team members involved on the project. [--END SKIP, START READING HERE :) --] So assuming that the application's general schema and relation tables aren't total crap and that we are in fact up against a wall in this one case to use an array/object as a value for this column, is there a best practice for that? Like a particular SQL data-type? A particular object/array format? CSV? JSON? XML? Most of the app is in C# but (for very odd reasons that I won't explain) we could really use any environment if there is a particular one that handles this well. For the moment, I am thinking either (webservice + JS/JSON) or PHP unserialize/serialize (but I am bit sketched out by the PHP solution because it seems a bit cumbersome when using ajax?) Thoughts anyone?

    Read the article

  • Is it OK to use images of GPL'd code in a CC 3.0 BY video?

    - by marcusw
    I am making a video in which I would like to use pictures of some Linux Kernel code. I am looking to release the finished product under the CC 3.0 BY license, but the Kernel is released under the GPL, which would not allow this if the code is in text format. However, since it will be in low-resolution, incredibly incomplete, non-usable, non-compilable, non-editable (at least without lots of finagling) format, would this constitute fair use or find another loophole to slip through? Thanks for the help, I will understand if this is considered off topic.

    Read the article

  • Article Marketing SEO - How Many Times Should I Put the Main Keyword in the Article?

    In order to get the articles that you write and submit for your article marketing campaign to be viewed, you need to do some on page SEO. In other words, you need to do some strategic things while writing your articles that will cause the search engines to put priority on your articles and give them good rankings for the keywords that you are optimizing your articles for. Before you write your articles, you should have chosen a specific keyword or keyword phrase to target while you write your article.

    Read the article

  • Make a Website - Would You Do it Yourself?

    You need a website for your own business that you're starting up, for school project or just want to create your own blog page and feel good about it? Would you create the website yourself or would you outsource it? We look at some of the reasons why you would do it yourself.

    Read the article

  • MySQL Query exceptions

    - by Wayne
    In one page, it should show records that has the following selected month from the drop down menu and it is set in the ?month=March So the query will do this $sql = "SELECT * FROM schedule WHERE month = '" . Clean($_GET['month']) . "' AND finished='0' ORDER BY date ASC"; But it shows records that has a value of 2 in the finished column and I don't want the query to include this. I've tried $sql = "SELECT * FROM schedule WHERE month = '" . Clean($_GET['month']) . "' AND finished='0' OR finished = '1' OR finished = '3' ORDER BY date ASC"; But it shows records on different months when it shouldn't be. So basically I want the record to exclude the records that has the value of 2 in the record that will not be shown in the page.

    Read the article

  • get equation from XML, AS3

    - by VideoDnd
    There's an variable in my swf I want to receive XML. It's an integer value in the form of an equation. How do I receive the XML value for 'formatcount'? My Variable //Variable I want to grab XML<br> //formatcount=int('want xml value to go here'); formatcount=int(count*count/100); Path formatcount = myXML.FORMATCOUNT.text() My XML <?xml version="1.0" encoding="utf-8"?> <SESSION> <TIMER TITLE="speed">1000</TIMER> <COUNT TITLE="starting position">10000</COUNT> <FORMATCOUNT TITLE="ramp">count*count/1000</FORMATCOUNT> </SESSION>

    Read the article

< Previous Page | 111 112 113 114 115 116 117 118 119 120 121 122  | Next Page >