Search Results

Search found 4711 results on 189 pages for 'documents'.

Page 25/189 | < Previous Page | 21 22 23 24 25 26 27 28 29 30 31 32  | Next Page >

  • File store: CouchDB vs SQL Server + file system

    - by Andrey
    I'm exploring different ways of storing user-uploaded files (all are MS Office documents or alikes) on our high load web site. It's currently designed to store documents as files and have a SQL database store all metadata for those files. I'm concerned about growing out of the storage server and SQL server performance when number of documents reaches hundreds of millions. I was reading a lot of good information about CouchDB including its built-in scalability and performance, but I'm not sure how storing files as attachments in CouchDB would compare to storing files on a file system in terms of performance. Anybody used CouchDB clusters for storing LARGE amounts of documents and in high load environment?

    Read the article

  • Paid service that will bulk convert HTML and text to PDF?

    - by SCS
    We have a repository of documents in HTML and text that we're looking to get into PDF format, but my boss (another developer) feels that writing this portion of the application will detract from the main development that we're doing at the moment. Is anyone aware of any services that will be able to do bulk conversions of these documents to PDF? Any services that will also provide thumbnails for the documents is something we're looking for as well.

    Read the article

  • Is there a way to create a cmd shortcut for a specific folder on W7 or/and W8?

    - by Hinstein
    Let say i have 3 different folders that i want to access with CMD C:\Users\Henok\Documents\Visual Studio 2012\Projects\TestApp1\Debug C:\Users\Henok\Documents\Visual Studio 2012\Projects\TestApp2\Debug C:\Users\Henok\Documents\Visual Studio 2012\Projects\TestApp3\Debug I wonder if there is a way to create 3 different cmd shortcuts to access those directory (folders) individually without changing the default cmd directory location. Forgive me for my broken English, and thanks for your time.

    Read the article

  • windows 2003 - why can't serial port be accessed remotely?

    - by Danny Staple
    we have recently installed the updates on some of our servers, which have a bit of hardware attached via USB that presents itself as a serial (COM) port. The strange behaviour is that if I start a cmd shell on the server via VNC, I can open the serial port. If I run a service and start it from there (telnet, jenkins) then I receive a "not found" error for it. IE: C:\Documents and Settings\some_user>echo 1 >COM4: C:\Documents and Settings\some_user> on the local cmd will work, and on the remote telnet will give: C:\Documents and Settings\some_user>echo 1 >COM4: The system cannot find the file specified. C:\Documents and Settings\some_user> I cannot see any security settings on the Device manager settings panel for this device.

    Read the article

  • Alternative Windows Offline Files + Windows Backup + Previous Version Setup

    - by Herson
    Currently our documents are all hosted in a Windows 7 box. Users can access the files using Windows share and the documents are available offline (windows 7 feature). The documents are being backed up daily by Windows 7 backup and restore utility. Users can access previous versions of the file (from the backups) using Windows Explorer "previous versions" feature. This setup is currently working well, except for the following: We would prefer to have access to hourly versions of the file, not daily. The previous version mechanism is tied up to the backup mechanism. Windows 7 performs a full backup every week and incremental backup everyday. The previous versions of a file is actually what are the available in the backups. If you 20GB documents and want to maintain at least three(3) year history, you will use at minimum 3 years * 52 weeks * 20GB or about 3TB even if there are few changes in the documents. Its pretty inefficient use of space. Looking up previous versions of a file is very slow (tens of minutes). This is probably related to the previous issue - Windows has to traverse its all of its backups. I am considering using SVN + autocommit/autoupdate tortoisesvn. It will have the following advantages: Backups are easy and will also backup the whole history of each documents. (Just backup the repository). Creating previous versions can be frequent. I think svn commit / update can be done every two minutes or so. Users can sync over the net. However, I can see the following issues: More conflicts than the original setup because both multiple users can now edit the same file even both are online, i.e. can connect to the SVN repo. The users can off course lock the file first before editing, but that would mean they have to adjust. Delay on propagation of file changes. On windows 7 file sharing, changes made by one online user will be instantaneously available to other online users. With the SVN setup, changes will only be propagated when the users execute the svn add/commit/update sequence. Delay will be probably a few minutes. This workflow will no longer work: "Hi, I just edited document X, can you have a quick look?" I would like to ask the opinion of the community for alternative setups, or improvements on the above setups to work out the kinks.

    Read the article

  • Soft links between samba profile and profile.V2

    - by Alex Rose
    I am currently using a samba server to host windows sessions. For the moment, every machine runs windows XP, but in an upcoming hardware upgrade, I will have new Windows 7 machines installed. The problem is that the user directory changed from XP to 7 (for instance 'My Documents' became 'Documents'). To solve that, samba created a folder called profile.V2, which contains files and folders for Windows 7. Now I would like to link the two profiles folders ('profile' from windows XP, 'profile.V2' from windows 7) so that a user can logon from a XP or 7 machine and still have access to the same files. I tried to create softlinks between folder pairs ('Documents' -­­ 'My documents') and it seems to work. My question is : is it likely to create issues in the future?

    Read the article

  • Windows shortcut doesn't work: how to enclose filenames with spaces?

    - by Olivier Pons
    Here's my problem: I've made a shortcut on my Desktop (Windows XP (sigh)) like this: C:\WINDOWS\system32\cmd.exe /k "mysql -u root drupal-defaultadm < ^"C:\Documents and Settings\AAA\Mes documents\Downloads\01.drupal-defaultadm.sql^" && exit" When I double click on it, the DOS prompt is opened, but I get this error: File not found. C:\wamp\bin\mysql\mysql5.5.24\bin> So I'm trying to do the command "by hand" and only removing the ^: C:\wamp\bin\mysql\mysql5.5.24\bin>mysql -u root drupal-defaultadm < "C:\Documents and Settings\AAA\Mes documents\Downloads\01.drupal-defaultadm.sql" C:\wamp\bin\mysql\mysql5.5.24\bin> And gives no error. I'm pretty sure this has to do with the whitespaces enclosed with ". How shall I do to make it work?

    Read the article

  • Changing Word mail merge data source locations in bulk?

    - by Daft Viking
    I've just moved a number of Word mail merge files, and a number of Excel spreadsheets that are the data sources for the mail merges, from a Windows XP computer to a Windows 7 computer, and now all the paths for the merge sources are incorrect (used to be c:\documents and settings\user\my documents.... now c:\users\documents....). While I can correct the path of the data source in each file individually, I was hoping that there would be some way of updating the files in bulk, as there are a relatively large number of them. Word 2007 is what is being used, but the documents are all in the previous DOC format (not DOCX).

    Read the article

  • Question about my sorting algorithm in C++

    - by davit-datuashvili
    i have following code in c++ #include <iostream> using namespace std; void qsort5(int a[],int n){ int i; int j; if (n<=1) return; for (i=1;i<n;i++) j=0; if (a[i]<a[0]) swap(++j,i,a); swap(0,j,a); qsort5(a,j); qsort(a+j+1,n-j-1); } int main() { return 0; } void swap(int i,int j,int a[]) { int t=a[i]; a[i]=a[j]; a[j]=t; } i have problem 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(13) : error C2780: 'void std::swap(std::basic_string<_Elem,_Traits,_Alloc> &,std::basic_string<_Elem,_Traits,_Alloc> &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\xstring(2203) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(13) : error C2780: 'void std::swap(std::pair<_Ty1,_Ty2> &,std::pair<_Ty1,_Ty2> &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\utility(76) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(13) : error C2780: 'void std::swap(_Ty &,_Ty &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\utility(16) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(14) : error C2780: 'void std::swap(std::basic_string<_Elem,_Traits,_Alloc> &,std::basic_string<_Elem,_Traits,_Alloc> &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\xstring(2203) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(14) : error C2780: 'void std::swap(std::pair<_Ty1,_Ty2> &,std::pair<_Ty1,_Ty2> &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\utility(76) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(14) : error C2780: 'void std::swap(_Ty &,_Ty &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\utility(16) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(16) : error C2661: 'qsort' : no overloaded function takes 2 arguments 1>Build log was saved at "file://c:\Users\dato\Documents\Visual Studio 2008\Projects\qsort5\qsort5\Debug\BuildLog.htm" please help

    Read the article

  • mac os x, find all symbolic links that point to files on a different volume

    - by Eddified
    In my ~ dir, I have some symlinks that point to "/Volumes/Macintosh HD 2/..." and I want to find them all recursively. A look at the man page for 'find' says the '-lname' argument will search the symbolic link contents. It appears to work, but not recursively: $ pwd /Users/myusername $ sudo find . -lname '/Volumes*' $ cd Documents/ $ sudo find . -lname '/Volumes*' ./Documents on Win7 ./work.rtf What's going on? How can I make this work recursively? -- The 'find' program is supposed to always work recursively. I checked perms, they look ok, but as you can see I used "sudo" just to be sure... no dice. $ ls -ld Documents/ drwx------+ 14 myusername staff 476 Jan 12 16:32 Documents/

    Read the article

  • How do I escape spaces in command line in Windows without using quotation marks?

    - by David
    For example what is the alternative to this command without quotation marks: CD "c:\Documents and Settings" The full reason I don't want to use quotation marks is that this command DOES work: SVN add mypathname\*.* but this command DOES NOT work : SVN add "mypathname\*.*" The problem being when I change mypathname for a path with spaces in it I need to quote the whole thing. For example: SVN add "c:\Documents and Settings\username\svn\*.*" But when I try this I get the following error message: svn: warning: 'c:\Documents and Settings\username\svn\*.*' not found

    Read the article

  • Windows 7 showing subfiles when saving files from internet?

    - by mezamorphic
    Within the last week, whenever I go to save a file from the internet when I save to "Documents" it doesn't show just what is in My Documents at the higher level, it shows all the contents of every sub-folder. I haven't changed any settings- how can I get this back to normal (so that files of sub-folders are not shown when saving)? I tried with both Chrome and IE so it's not the browser doing it. The problem seems to only happen with My Documents, no other folders like Downloads etc.

    Read the article

  • Directories will list, but are not recognize by cd

    - by mdimond
    New to terminal and having problems out of the gate. Using Terminal 2.1.2 on a Mac running 10.6.8. Using the "ls Documents" will list the contents, but when I try to change directories, which I tried several different ways, I get the following results: new-host-2:~ MDimond$ cd. -bash: cd.: command not found new-host-2:~ MDimond$ cd./Users/MDimond/Documents -bash: cd./Users/MDimond/Documents: No such file or directory new-host-2:~ MDimond$ cd. /Documents -bash: cd.: command not found The /usr/bin has the cd command listed; the /bin does not. Any assistance would be greatly appreciated. Thanks, md

    Read the article

  • [Visual Studio Extension Of The Day] Test Scribe for Visual Studio Ultimate 2010 and Test Professional 2010

    - by Hosam Kamel
      Test Scribe is a documentation power tool designed to construct documents directly from the TFS for test plan and test run artifacts for the purpose of discussion, reporting etc... . Known Issues/Limitations Customizing the generated report by changing the template, adding comments, including attachments etc… is not supported While opening a test plan summary document in  Office 2007, if you get the warning: “The file Test Plan Summary cannot be opened because there are problems with the contents” (with Details: ‘The file is corrupt and cannot be opened’), click ‘OK’. Then, click ‘Yes’ to recover the contents of the document. This will then open the document in Office 2007. The same problem is not found in Office 2010. Generated documents are stored by default in the “My documents” folder. The output path of the generated report cannot be modified. Exporting word documents for individual test suites or test cases in a test plan is not supported. Download it from Visual Studio Extension Manager Originally posted at "Hosam Kamel| Developer & Platform Evangelist" http://blogs.msdn.com/hkamel

    Read the article

  • News about Oracle Documaker Enterprise Edition

    - by Susanne Hale
    Updates come from the Documaker front on two counts: Oracle Documaker Awarded XCelent Award for Best Functionality Celent has published a NEW report entitled Document Automation Solution Vendors for Insurers 2011. In the evaluation, Oracle received the XCelent award for Functionality, which recognizes solutions as the leader in this category of the evaluation. According to Celent, “Insurers need to address issues related to the creation and handling of all sorts of documents. Key issues in document creation are complexity and volume. Today, most document automation vendors provide an array of features to cope with the complexity and volume of documents insurers need to generate.” The report ranks ten solution providers on Technology, Functionality, Market Penetration, and Services. Each profile provides detailed information about the vendor and its document automation system, the professional services and support staff it offers, product features, insurance customers and reference feedback, its technology, implementation process, and pricing.  A summary of the report is available at Celent’s web site. Documaker User Group in Wisconsin Holds First Meeting Oracle Documaker users in Wisconsin made the first Documaker User Group meeting a great success, with representation from eight companies. On April 19, over 25 attendees got together to share information, best practices, experiences and concepts related to Documaker and enterprise document automation; they were also able to share feedback with Documaker product management. One insurer shared how they publish and deliver documents to both internal and external customers as quickly and cost effectively as possible, since providing point of sale documents to the sales force in real time is crucial to obtaining and maintaining the book of business. They outlined best practices that ensure consistent development and testing strategies processes are in place to maximize performance and reliability. And, they gave an overview of the supporting applications they developed to monitor and improve performance as well as monitor and track each transaction. Wisconsin User Group meeting photos are posted on the Oracle Insurance Facebook page http://www.facebook.com/OracleInsurance. The Wisconsin User Group will meet again on October 26. If you and other Documaker customers in your area are interested in setting up a user group in your area, please contact Susanne Hale ([email protected]), (703) 927-0863.

    Read the article

  • Show Notes: Bob Hensle on IT Strategies from Oracle

    - by Bob Rhubart
    The latest ArchBeat Podcast (RSS) features a conversation with Oracle Enterprise Architecture director Bob Hensle (LinkedIn). Bob talks about IT Strategies from Oracle, an extensive library of reference architectures, best practices, and other documents now available (it’s a freebie!) to registered Oracle Technology Network members. Listen to Part 1 Bob offers some background on the IT Strategies from Oracle project and an overview of the included documents. Listen to Part 2 (Feb 16) A discussion of how SOA and other issues are reflected in the IT Strategies documents. Share your feedback on any of the documents in the IT Strategies from Oracle Library: [email protected] For a nice complement to the IT Strategies from Oracle Library, check out Oracle Experiences in Enterprise Architecture, an ongoing series of short essays from members of the Oracle Enterprise Architecture team based on their field experience. In the Pipeline ArchBeat programs in the works include an interview with Dr. Frank Munz, the author of Middleware and Cloud Computing, excerpts from another architect virtual meet-up, and a conversation with Oracle ACE Director Debra Lilley about her insight into Fusion Applications. . Stayed tuned: RSS Technorati Tags: oracle,oracle technology network,software architecture,enterprise architecture,reference architecture del.icio.us Tags: oracle,oracle technology network,software architecture,enterprise architecture,reference architecture

    Read the article

  • Oracle IRM video demonstration of seperating duties of document security

    - by Simon Thorpe
    One thing an Information Rights Management technology should do well is separate out three main areas of responsibility.The business process of defining and controlling the classifications to which content is secured and the definition of the roles employees, customers, partners and contractors have when accessing secured content. Allow IT to manage the server and perform the role of authorizing the creation of new classifications to meet business needs but yet once the classification has been created and handed off to the business, IT no longer plays a role on the ongoing management. Empower the business to take ownership of classifications to which their own content is secured. For example an employee who is leading an acquisition project should be responsible for defining who has access to confidential project documents. This person should be able to manage the rights users have in the classification and also be the point of contact for those wishing to gain rights. Oracle IRM has since it's creation in the late 1990's had this core model at the heart of its design. Due in part to the important seperation of rights from the documents themselves, Oracle IRM places the right functionality within the right parts of the business. For example some IRM technologies allow the end user to make decisions about what users can print, edit or save a secured document. This in practice results in a wide variety of content secured with a plethora of options that don't conform to any policy. With Oracle IRM users choose from a list of classifications to which they have been given the ability to secure information against. Their role in the classification was given to them by the business owner of the classification, yet the definition of the role resides within the realm of corporate security who own the overall business classification policies. It is this type of design and philosophy in Oracle IRM that makes it an enterprise solution that works beyond a few users and a few secured documents to hundreds of thousands of users and millions of documents. This following video shows how Oracle IRM 11g, the market leading document security solution, lets the security organization manage and create classifications whilst the business owns and manages them. If you want to experience using Oracle IRM secured content and the effects of different roles users have, why not sign up for our free demonstration.

    Read the article

  • Designing a Content-Based ETL Process with .NET and SFDC

    - by Patrick
    As my firm makes the transition to using SFDC as our main operational system, we've spun together a couple of SFDC portals where we can post customer-specific documents to be viewed at will. As such, we've had the need for pseudo-ETL applications to be implemented that are able to extract metadata from the documents our analysts generate internally (most are industry-standard PDFs, XML, or MS Office formats) and place in networked "queue" folders. From there, our applications scoop of the queued documents and upload them to the appropriate SFDC CRM Content Library along with some select pieces of metadata. I've mostly used DbAmp to broker communication with SFDC (DbAmp is a Linked Server provider that allows you to use SQL conventions to interact with your SFDC Org data). I've been able to create [console] applications in C# that work pretty well, and they're usually structured something like this: static void Main() { // Load parameters from app.config. // Get documents from queue. var files = someInterface.GetFiles(someFilterOrRegexPattern); foreach (var file in files) { // Extract metadata from the file. // Validate some attributes of the file; add any validation errors to an in-memory // structure (e.g. List<ValidationErrors>). if (isValid) { var fileData = File.ReadAllBytes(file); // Upload using some wrapper for an ORM or DAL someInterface.Upload(fileData, meta.Param1, meta.Param2, ...); } else { // Bounce the file } } // Report any validation errors (via message bus or SMTP or some such). } And that's pretty much it. Most of the time I wrap all these operations in a "Worker" class that takes the needed interfaces as constructor parameters. This approach has worked reasonably well, but I just get this feeling in my gut that there's something awful about it and would love some feedback. Is writing an ETL process as a C# Console app a bad idea? I'm also wondering if there are some design patterns that would be useful in this scenario that I'm clearly overlooking. Thanks in advance!

    Read the article

  • Designing Content-Based ETL Process with .NET and SFDC

    - by Patrick
    As my firm makes the transition to using SFDC as our main operational system, we've spun together a couple of SFDC portals where we can post customer-specific documents to be viewed at will. As such, we've had the need for pseudo-ETL applications to be implemented that are able to extract metadata from the documents our analysts generate internally (most are industry-standard PDFs, XML, or MS Office formats) and place in networked "queue" folders. From there, our applications scoop of the queued documents and upload them to the appropriate SFDC CRM Content Library along with some select pieces of metadata. I've mostly used DbAmp to broker communication with SFDC (DbAmp is a Linked Server provider that allows you to use SQL conventions to interact with your SFDC Org data). I've been able to create [console] applications in C# that work pretty well, and they're usually structured something like this: static void Main() { // Load parameters from app.config. // Get documents from queue. var files = someInterface.GetFiles(someFilterOrRegexPattern); foreach (var file in files) { // Extract metadata from the file. // Validate some attributes of the file; add any validation errors to an in-memory // structure (e.g. List<ValidationErrors>). if (isValid) { // Upload using some wrapper for an ORM an someInterface.Upload(meta.Param1, meta.Param2, ...); } else { // Bounce the file } } // Report any validation errors (via message bus or SMTP or some such). } And that's pretty much it. Most of the time I wrap all these operations in a "Worker" class that takes the needed interfaces as constructor parameters. This approach has worked reasonably well, but I just get this feeling in my gut that there's something awful about it and would love some feedback. Is writing an ETL process as a C# Console app a bad idea? I'm also wondering if there are some design patterns that would be useful in this scenario that I'm clearly overlooking. Thanks in advance!

    Read the article

  • Oracle Payables Accounting Information Centres

    - by user793553
    Payables Accounting Information Centers Do you have error when trying to create accounting in Payables ? Do you have questions about Payables Accounting ? The following new Information centers include solutions to many of the issues and answers to your questions.          Overview  > Hot Topics > Resources Information Center: Oracle Payables Accounting R12 ( Doc ID 1476284.2) Under Hot Topics: Include details about: Recommended patches, new solution documents, GDF patches, Announcements, links to Communities. Under Resources: Popular Troubleshooting documents, Accounting Documentation, popular Knowledge documents, links to other info centers.  Use   Information Center: Using Oracle Payables Accounting (Doc ID 1478842.2) Include product documentation, Reconciliation, Upgrade, Performance, Undo Accounting, Trace and FND Debug, and Diagnostics. Troubleshoot Information Center: Troubleshooting Oracle Payables Accounting (Doc ID 1478863.2) Include Troubleshooting documents, Period close, GL Transfer, Trial Balance, Budget, Health Check. Do you have other questions..............? Post your question to the Oracle Payables Community 1. Log into My Oracle Support. 2. Click on the 'Community' link at the top of the page. 3. Click in 'Find a Community' field and enter Payables  4. Double click on Payables in the list. OR   Click Here

    Read the article

  • Advice needed on Process - Service Design

    - by user99314
    Need some advice from experts on designing a flow. Create a service that will read a csv file which may contain anywhere over 6000+ rows of individual ids as shown in the sample below. Need to read that file and go to oracle database and fetch a vname,vnumber,vid of each id in the csv and then go to document repository i.e. Oracle UCM and download all documents matching vname,vnumber,vid there can be =0 documents for each vname,vnumber,vid and save them on a file system. UCM exposes a webservice to dowload the documents. Finally create a new csv appending the filenames that are downloaded for each id. Need to keep track of any errors but need to make sure to go over the whole ids in the csv to download the documents and skip in case of errors. Need some advice on how to go about designing this as there may be over 6000+ rows in a csv file and looping it and hitting the database for each individual id and then hitting a UCM may be a bit expensive so open for any idea. How to go by designing this solution. Wondering if messaging can be helpful here or offloading process of getting the vname,vnumber,vid to pl/sql packages, creating staging tables etc. Initial csv that contains ids: **ID** 12345a 12s345 3456fr we9795 we9797 Final csv output: **ID Files Downloaded from UCM** 12345a a.pdf,b.doc,d.txt 12s345 a1.pdf,s2.pdf,f4.gif 3456fr b.xls we9795 we9797 x.doc Thanks

    Read the article

  • Using a "white list" for extracting terms for Text Mining

    - by [email protected]
    In Part 1 of my post on "Generating cluster names from a document clustering model" (part 1, part 2, part 3), I showed how to build a clustering model from text documents using Oracle Data Miner, which automates preparing data for text mining. In this process we specified a custom stoplist and lexer and relied on Oracle Text to identify important terms.  However, there is an alternative approach, the white list, which uses a thesaurus object with the Oracle Text CTXRULE index to allow you to specify the important terms. INTRODUCTIONA stoplist is used to exclude, i.e., black list, specific words in your documents from being indexed. For example, words like a, if, and, or, and but normally add no value when text mining. Other words can also be excluded if they do not help to differentiate documents, e.g., the word Oracle is ubiquitous in the Oracle product literature. One problem with stoplists is determining which words to specify. This usually requires inspecting the terms that are extracted, manually identifying which ones you don't want, and then re-indexing the documents to determine if you missed any. Since a corpus of documents could contain thousands of words, this could be a tedious exercise. Moreover, since every word is considered as an individual token, a term excluded in one context may be needed to help identify a term in another context. For example, in our Oracle product literature example, the words "Oracle Data Mining" taken individually are not particular helpful. The term "Oracle" may be found in nearly all documents, as with the term "Data." The term "Mining" is more unique, but could also refer to the Mining industry. If we exclude "Oracle" and "Data" by specifying them in the stoplist, we lose valuable information. But it we include them, they may introduce too much noise. Still, when you have a broad vocabulary or don't have a list of specific terms of interest, you rely on the text engine to identify important terms, often by computing the term frequency - inverse document frequency metric. (This is effectively a weight associated with each term indicating its relative importance in a document within a collection of documents. We'll revisit this later.) The results using this technique is often quite valuable. As noted above, an alternative to the subtractive nature of the stoplist is to specify a white list, or a list of terms--perhaps multi-word--that we want to extract and use for data mining. The obvious downside to this approach is the need to specify the set of terms of interest. However, this may not be as daunting a task as it seems. For example, in a given domain (Oracle product literature), there is often a recognized glossary, or a list of keywords and phrases (Oracle product names, industry names, product categories, etc.). Being able to identify multi-word terms, e.g., "Oracle Data Mining" or "Customer Relationship Management" as a single token can greatly increase the quality of the data mining results. The remainder of this post and subsequent posts will focus on how to produce a dataset that contains white list terms, suitable for mining. CREATING A WHITE LIST We'll leverage the thesaurus capability of Oracle Text. Using a thesaurus, we create a set of rules that are in effect our mapping from single and multi-word terms to the tokens used to represent those terms. For example, "Oracle Data Mining" becomes "ORACLEDATAMINING." First, we'll create and populate a mapping table called my_term_token_map. All text has been converted to upper case and values in the TERM column are intended to be mapped to the token in the TOKEN column. TERM                                TOKEN DATA MINING                         DATAMINING ORACLE DATA MINING                  ORACLEDATAMINING 11G                                 ORACLE11G JAVA                                JAVA CRM                                 CRM CUSTOMER RELATIONSHIP MANAGEMENT    CRM ... Next, we'll create a thesaurus object my_thesaurus and a rules table my_thesaurus_rules: CTX_THES.CREATE_THESAURUS('my_thesaurus', FALSE); CREATE TABLE my_thesaurus_rules (main_term     VARCHAR2(100),                                  query_string  VARCHAR2(400)); We next populate the thesaurus object and rules table using the term token map. A cursor is defined over my_term_token_map. As we iterate over  the rows, we insert a synonym relationship 'SYN' into the thesaurus. We also insert into the table my_thesaurus_rules the main term, and the corresponding query string, which specifies synonyms for the token in the thesaurus. DECLARE   cursor c2 is     select token, term     from my_term_token_map; BEGIN   for r_c2 in c2 loop     CTX_THES.CREATE_RELATION('my_thesaurus',r_c2.token,'SYN',r_c2.term);     EXECUTE IMMEDIATE 'insert into my_thesaurus_rules values                        (:1,''SYN(' || r_c2.token || ', my_thesaurus)'')'     using r_c2.token;   end loop; END; We are effectively inserting the token to return and the corresponding query that will look up synonyms in our thesaurus into the my_thesaurus_rules table, for example:     'ORACLEDATAMINING'        SYN ('ORACLEDATAMINING', my_thesaurus)At this point, we create a CTXRULE index on the my_thesaurus_rules table: create index my_thesaurus_rules_idx on        my_thesaurus_rules(query_string)        indextype is ctxsys.ctxrule; In my next post, this index will be used to extract the tokens that match each of the rules specified. We'll then compute the tf-idf weights for each of the terms and create a nested table suitable for mining.

    Read the article

  • Clean up after Visual Studio

    - by psheriff
    As programmer’s we know that if we create a temporary file during the running of our application we need to make sure it is removed when the application or process is complete. We do this, but why can’t Microsoft do it? Visual Studio leaves tons of temporary files all over your hard drive. This is why, over time, your computer loses hard disk space. This blog post will show you some of the most common places where these files are left and which ones you can safely delete..NET Left OversVisual Studio is a great development environment for creating applications quickly. However, it will leave a lot of miscellaneous files all over your hard drive. There are a few locations on your hard drive that you should be checking to see if there are left-over folders or files that you can delete. I have attempted to gather as much data as I can about the various versions of .NET and operating systems. Of course, your mileage may vary on the folders and files I list here. In fact, this problem is so prevalent that PDSA has created a Computer Cleaner specifically for the Visual Studio developer.  Instructions for downloading our PDSA Developer Utilities (of which Computer Cleaner is one) are at the end of this blog entry.Each version of Visual Studio will create “temporary” files in different folders. The problem is that the files created are not always “temporary”. Most of the time these files do not get cleaned up like they should. Let’s look at some of the folders that you should periodically review and delete files within these folders.Temporary ASP.NET FilesAs you create and run ASP.NET applications from Visual Studio temporary files are placed into the <sysdrive>:\Windows\Microsoft.NET\Framework[64]\<vernum>\Temporary ASP.NET Files folder. The folders and files under this folder can be removed with no harm to your development computer. Do not remove the "Temporary ASP.NET Files" folder itself, just the folders underneath this folder. If you use IIS for ASP.NET development, you may need to run the iisreset.exe utility from the command prompt prior to deleting any files/folder under this folder. IIS will sometimes keep files in use in this folder and iisreset will release the locks so the files/folders can be deleted.Website CacheThis folder is similar to the ASP.NET Temporary Files folder in that it contains files from ASP.NET applications run from Visual Studio. This folder is located in each users local settings folder. The location will be a little different on each operating system. For example on Windows Vista/Windows 7, the folder is located at <sysdrive>:\Users\<UserName>\AppData\Local\Microsoft\WebsiteCache. If you are running Windows XP this folder is located at <sysdrive>:\ Documents and Settings\<UserName>\Local Settings\Application Data\Microsoft\WebsiteCache. Check these locations periodically and delete all files and folders under this directory.Visual Studio BackupThis backup folder is used by Visual Studio to store temporary files while you develop in Visual Studio. This folder never gets cleaned out, so you should periodically delete all files and folders under this directory. On Windows XP, this folder is located at <sysdrive>:\Documents and Settings\<UserName>\My Documents\Visual Studio 200[5|8]\Backup Files. On Windows Vista/Windows 7 this folder is located at <sysdrive>:\Users\<UserName>\Documents\Visual Studio 200[5|8]\.Assembly CacheNo, this is not the global assembly cache (GAC). It appears that this cache is only created when doing WPF or Silverlight development with Visual Studio 2008 or Visual Studio 2010. This folder is located in <sysdrive>:\ Users\<UserName>\AppData\Local\assembly\dl3 on Windows Vista/Windows 7. On Windows XP this folder is located at <sysdrive>:\ Documents and Settings\<UserName>\Local Settings\Application Data\assembly. If you have not done any WPF or Silverlight development, you may not find this particular folder on your machine.Project AssembliesThis is yet another folder where Visual Studio stores temporary files. You will find a folder for each project you have opened and worked on. This folder is located at <sysdrive>:\Documents and Settings\<UserName>Local Settings\Application Data\Microsoft\Visual Studio\[8|9].0\ProjectAssemblies on Windows XP. On Microsoft Vista/Windows 7 you will find this folder at <sysdrive>:\Users\<UserName>\AppData\Local\Microsoft\Visual Studio\[8|9].0\ProjectAssemblies.Remember not all of these folders will appear on your particular machine. Which ones do show up will depend on what version of Visual Studio you are using, whether or not you are doing desktop or web development, and the operating system you are using.SummaryTaking the time to periodically clean up after Visual Studio will aid in keeping your computer running quickly and increase the space on your hard drive. Another place to make sure you are cleaning up is your TEMP folder. Check your OS settings for the location of your particular TEMP folder and be sure to delete any files in here that are not in use. I routinely clean up the files and folders described in this blog post and I find that I actually eliminate errors in Visual Studio and I increase my hard disk space.NEW! PDSA has just published a “pre-release” of our PDSA Developer Utilities at http://www.pdsa.com/DeveloperUtilities that contains a Computer Cleaner utility which will clean up the above-mentioned folders, as well as a lot of other miscellaneous folders that get Visual Studio build-up. You can download a free trial at http://www.pdsa.com/DeveloperUtilities. If you wish to purchase our utilities through the month of November, 2011 you can use the RSVP code: DUNOV11 to get them for only $39. This is $40 off the regular price.NOTE: You can download this article and many samples like the one shown in this blog entry at my website. http://www.pdsa.com/downloads. Select “Tips and Tricks”, then “Developer Machine Clean Up” from the drop down list.Good Luck with your Coding,Paul Sheriff** SPECIAL OFFER FOR MY BLOG READERS **We frequently offer a FREE gift for readers of my blog. Visit http://www.pdsa.com/Event/Blog for your FREE gift!

    Read the article

< Previous Page | 21 22 23 24 25 26 27 28 29 30 31 32  | Next Page >