Search Results

Search found 1124 results on 45 pages for 'indexing'.

  • Indexing table with duplicates MySQL/SQL Server with millions of records

    - by Tesnep
    I need help with indexing in MySQL. I have a table with the following columns: ID, Store_ID, Feature_ID, Order_ID, Viewed_Date, Deal_ID, IsTrial. The ID is auto-generated. Store_ID goes from 1 to 8 and Feature_ID from 1 to, let's say, 100. Viewed_Date is the date and time at which the row is inserted. IsTrial is either 0 or 1. You can ignore Order_ID and Deal_ID for this discussion. There are millions of rows in the table, and we have a reporting backend that needs the number of views in a certain period (or overall) where trial is 0, for a particular store id and a particular feature. The query takes the form:

        select count(viewed_date)
        from theTable
        where viewed_date between '2009-12-01' and '2010-12-31'
          and store_id = '2'
          and feature_id = '12'
          and Istrial = 0

    In SQL Server you can have a filtered index to use for Istrial. Is there anything similar to this in MySQL? Also, Store_ID and Feature_ID contain a lot of duplicate values. I created an index on Store_ID and Feature_ID, and although this seems to have decreased the search time, I need a bigger improvement than that. Right now I have more than 4 million rows, and to answer a query like the one above it looks at 3.5 million rows in order to give me a count of 500k rows. PS: I originally forgot the viewed_date filter in the query; I have now added it.
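
    As far as I know, MySQL has no direct equivalent of SQL Server's filtered indexes, but a composite index whose leading columns are the equality predicates (store_id, feature_id, Istrial) with the range column (viewed_date) last usually captures most of the benefit, since the whole WHERE clause and the count can then be answered from the index alone. The sketch below illustrates both ideas using Python's built-in sqlite3 module, which does support filtered ("partial") indexes; the index names are made up and the MySQL DDL would differ slightly.

        import sqlite3

        # Illustration only: SQLite stands in for MySQL/SQL Server here.
        con = sqlite3.connect(":memory:")
        con.executescript("""
            CREATE TABLE theTable (
                id          INTEGER PRIMARY KEY,
                store_id    INTEGER,
                feature_id  INTEGER,
                viewed_date TEXT,
                Istrial     INTEGER
            );

            -- Composite index: equality columns first, the range column last,
            -- so the query above can be resolved entirely from the index.
            CREATE INDEX ix_store_feature_trial_date
                ON theTable (store_id, feature_id, Istrial, viewed_date);

            -- SQLite's version of a filtered index (a "partial" index):
            -- only rows with Istrial = 0 are included.
            CREATE INDEX ix_trial0
                ON theTable (store_id, feature_id, viewed_date)
                WHERE Istrial = 0;
        """)

        plan = con.execute("""
            EXPLAIN QUERY PLAN
            SELECT count(viewed_date) FROM theTable
            WHERE viewed_date BETWEEN '2009-12-01' AND '2010-12-31'
              AND store_id = 2 AND feature_id = 12 AND Istrial = 0
        """).fetchall()
        for row in plan:
            print(row)   # should show a search using one of the two indexes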

    Read the article

  • Indexing with pointer C/C++

    - by Leavenotrace
    Hey, I'm trying to write a program to carry out Newton's method and find the roots of the equation exp(-x)-(x^2)+3. It works as far as finding the root goes, but I also want it to print out the root after each iteration, and I can't get that part to work. Could anyone point out my mistake? I think it's something to do with my indexing. Thanks a million :)

        #include <stdio.h>
        #include <math.h>
        #include <malloc.h>

        //Define Functions:
        double evalf(double x)
        {
            double answer = exp(-x) - (x*x) + 3;
            return(answer);
        }

        double evalfprime(double x)
        {
            double answer = -exp(-x) - 2*x;
            return(answer);
        }

        double *newton(double initialrt, double accuracy, double *data)
        {
            double root[102];
            data = root;
            int maxit = 0;
            root[0] = initialrt;
            for (int i = 1; i < 102; i++)
            {
                *(data+i) = *(data+i-1) - evalf(*(data+i-1)) / evalfprime(*(data+i-1));
                if (fabs(*(data+i) - *(data+i-1)) < accuracy)
                {
                    maxit = i;
                    break;
                }
                maxit = i;
            }
            if ((maxit+1 == 102) && (fabs(*(data+maxit) - *(data+maxit-1)) > accuracy))
            {
                printf("\nMax iteration reached, method terminated");
            }
            else
            {
                printf("\nMethod successful");
                printf("\nNumber of iterations: %d\nRoot Estimate: %lf\n", maxit+1, *(data+maxit));
            }
            return(data);
        }

        int main()
        {
            double root, accuracy;
            double *data = (double*)malloc(sizeof(double)*102);
            printf("NEWTONS METHOD PROGRAMME:\nEquation: f(x)=exp(-x)-x^2+3=0\nMax No iterations=100\n\nEnter initial root estimate\n>> ");
            scanf("%lf", &root);
            _flushall();
            printf("\nEnter accuracy required:\n>>");
            scanf("%lf", &accuracy);
            *data = *newton(root, accuracy, data);
            printf("Iteration Root Error\n ");
            printf("%d %lf \n", 0, *(data));
            for (int i = 1; i < 102; i++)
            {
                printf("%d %5.5lf %5.5lf\n", i, *(data+i), *(data+i)-*(data+i-1));
                if (*(data+i*sizeof(double)) - *(data+i*sizeof(double)-1) == 0)
                {
                    break;
                }
            }
            getchar();
            getchar();
            free(data);
            return(0);
        }

    Read the article

  • Explaining verity index and document search limits

    - by Ahmad
    At present we have a CF8 Standard Edition server, which has some limitations around Verity indexing. According to Adobe, Verity Server has the following document search limits (the limits apply to all collections registered to Verity Server):

    - 10,000 documents for ColdFusion Developer Edition
    - 125,000 documents for ColdFusion Standard Edition
    - 250,000 documents for ColdFusion Enterprise Edition

    We have now reached the point where the server-wide number of indexed documents exceeds 125k. However, the largest Verity collection consists of about 25k documents (and this is expected to grow), and only one collection is ever searched at a time. My understanding is that this means I can still search an entire collection with no restrictions. Is this correct? Or does it mean that only the documents that were indexed, across all collections, before the limit was reached are actually searchable? We are considering moving to CF9 Standard to solve this and using the Solr option, which has no such restrictions. The coldfusionjedi blog highlights some differences between Verity and Solr, but before we commit to an upgrade I am trying to gain a clearer understanding. Can someone explain what this limit means and how it actually affects Verity searching and indexing?

    Read the article

  • Couple o' quick questions on Apache Lucene

    - by Doug
    -- I don't want to start any religious wars, but a quick Google search indicates that Apache Lucene is the preferred open-source tool for indexing and searching. Are there others?
    -- What file format does Lucene use to store its index file(s)?
    Thanks in advance. Doug

    Read the article

  • How can I index HTML documents?

    - by Swami
    I am using Lucene.NET to do full-text searching. Until now I have been indexing PDF docs, but now I have a few web pages that I need to index. What's the best/easiest way to index HTML documents to add to my Lucene index? I am using .NET/C#.
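
    Whatever the language, the usual approach is to strip each page down to its visible text (plus perhaps the title) and add that text as a field to the index, just like the text extracted from the PDFs. As a rough sketch of the extraction step only, here it is with Python's built-in html.parser; a .NET port would normally use an HTML parsing library instead, and the class name here is made up.

        from html.parser import HTMLParser

        class TextExtractor(HTMLParser):
            """Collects the visible text of an HTML page, skipping script/style content."""
            def __init__(self):
                super().__init__()
                self.parts = []
                self._skip = 0

            def handle_starttag(self, tag, attrs):
                if tag in ("script", "style"):
                    self._skip += 1

            def handle_endtag(self, tag):
                if tag in ("script", "style") and self._skip:
                    self._skip -= 1

            def handle_data(self, data):
                if not self._skip and data.strip():
                    self.parts.append(data.strip())

            def text(self):
                return " ".join(self.parts)

        extractor = TextExtractor()
        extractor.feed("<html><head><title>Demo</title><style>p { color: red }</style></head>"
                       "<body><p>Some page text to index.</p></body></html>")
        print(extractor.text())   # "Demo Some page text to index." -> this is what goes into the index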

    Read the article

  • MongoDB query against geospatial index with maxDistance fails from node.js client

    - by user1735497
    I want to query against a geospatial index in MongoDB (set up following this tutorial: http://www.mongodb.org/display/DOCS/Geospatial+Indexing). When I execute this from the shell, everything works fine:

        db.sellingpoints.find({ location: { $near: [48.190120, 16.270895], $maxDistance: 7 / 111.2 } });

    but the same query from my node.js application (using mongoskin or mongoose) won't return any results until I set the distance value to a very high number (5690):

        db.collection('sellingpoints')
          .find({ location: { $near: [lat, lng], $maxDistance: distance / 111.2 } })
          .limit(limit)
          .toArray(callback);

    Does anyone have any idea how to fix that?
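
    Not an answer to the node.js issue itself, but for comparison, here is the same $near/$maxDistance query issued from Python with pymongo, assuming the same 2d index on location and the same coordinate order as in the stored documents; the connection string and database name are hypothetical. With a legacy 2d index, $maxDistance is expressed in the same units as the coordinates, which is why the question divides kilometres by 111.2 (roughly the number of kilometres per degree).

        from pymongo import MongoClient

        client = MongoClient("mongodb://localhost:27017")   # hypothetical connection string
        db = client["mydb"]                                  # hypothetical database name

        lat, lng = 48.190120, 16.270895
        radius_km = 7

        cursor = (
            db["sellingpoints"]
            .find({"location": {"$near": [lat, lng],                   # same order as in the documents
                                "$maxDistance": radius_km / 111.2}})   # kilometres -> degrees
            .limit(20)
        )
        for point in cursor:
            print(point["_id"])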

    Read the article

  • How to handle very frequent updates to a Lucene index

    - by fsm
    I am trying to prototype an indexing/search application which uses very volatile indexing data sources (forums, social networks etc.). Here are some of the performance requirements:

    1. Very fast turnaround time (by this I mean that any new data, such as a new message on a forum, should be available in the search results very soon, in less than a minute).
    2. I need to discard old documents on a fairly regular basis to ensure that the search results are not dated.
    3. Last but not least, the search application needs to be responsive (latency on the order of 100 milliseconds, and it should support at least 10 qps).

    All of the requirements I currently have could be met without using Lucene (and that would let me satisfy 1, 2 and 3), but I am anticipating other requirements in the future (like search relevance etc.) which Lucene makes easier to implement. However, since Lucene is designed for use cases far more complex than the one I'm currently working on, I'm having a hard time satisfying my performance requirements. Here are some questions:

    a. I have read that the optimize() method in the IndexWriter class is expensive and should not be used by applications that do frequent updates. What are the alternatives?

    b. In order to do incremental updates, I need to keep committing new data and also keep refreshing the index reader to make sure it has the new data available. These are going to affect requirements 1 and 3 above. Should I try duplicate indices? What are some common approaches to solving this problem?

    c. I know that Lucene provides a delete method which lets you delete all documents that match a certain query. In my case, I need to delete all documents older than a certain age; one option is to add a date field to every document and use that to delete documents later. Is it possible to do range queries on document ids (I can create my own id field, since I think the one created by Lucene keeps changing) in order to delete documents? Is that any faster than comparing dates represented as strings?

    I know these are very open questions, so I am not looking for a detailed answer; I will treat your answers as suggestions and use them to inform my design. Thanks! Please let me know if you need any other information.
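
    Not Lucene-specific, but as a way of thinking about requirements 1 and 2 together, here is a minimal Python sketch of the underlying pattern: append new documents to an inverted index the moment they arrive, and periodically purge anything older than a cutoff. All names are made up, and a real implementation would do this through Lucene's IndexWriter and a frequently reopened (near-real-time) reader rather than a dict.

        import time
        from collections import defaultdict

        class VolatileIndex:
            """Toy inverted index: new documents are searchable immediately, old ones are purged by age."""
            def __init__(self):
                self.postings = defaultdict(set)   # term -> set of doc ids
                self.docs = {}                     # doc id -> (timestamp, text)

            def add(self, doc_id, text, ts=None):
                ts = time.time() if ts is None else ts
                self.docs[doc_id] = (ts, text)
                for term in text.lower().split():
                    self.postings[term].add(doc_id)

            def search(self, term):
                return [self.docs[d][1] for d in self.postings.get(term.lower(), ())]

            def purge_older_than(self, cutoff_ts):
                stale = [d for d, (ts, _) in self.docs.items() if ts < cutoff_ts]
                for d in stale:
                    _, text = self.docs.pop(d)
                    for term in text.lower().split():
                        self.postings[term].discard(d)

        idx = VolatileIndex()
        idx.add(1, "old forum message about lucene", ts=time.time() - 3600)
        idx.add(2, "fresh social network post")
        print(idx.search("lucene"))             # doc 1 is visible immediately after being added
        idx.purge_older_than(time.time() - 60)  # discard anything older than a minute
        print(idx.search("lucene"))             # now empty: doc 1 was older than the cutoff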

    Read the article

  • How to force tracker to re-index a folder?

    - by ubuntudroid
    I'm using the tracker indexing tool to search for files and within files on my Ubuntu 10.10 amd64 system. Having recently added some files to a single folder, I wondered how to force tracker to re-index this folder so that I can run search queries against these files. Any ideas? A terminal command would be completely sufficient.

    Read the article

  • Does table (string, string) require index?

    - by abatishchev
    In my database, running on SQL Server 2008 R2, I have a special table for global variables:

        CREATE TABLE global_variables
        (
            name NVARCHAR(50),
            value NVARCHAR(50) NOT NULL
            CONSTRAINT PK_global_variables PRIMARY KEY CLUSTERED
            (
                name ASC
            ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
                    ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
        ) ON [PRIMARY]
        GO

    Does such a table require an index on value or not?
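
    Whether value needs its own index depends only on whether you ever look rows up by value; lookups by name are already served by the clustered primary key. A quick way to see the difference, sketched here with Python's built-in sqlite3 standing in for SQL Server (the query-plan output format differs, but the idea is the same):

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE global_variables (name TEXT PRIMARY KEY, value TEXT NOT NULL)")

        # Lookup by the key column: answered by the primary-key index.
        print(con.execute(
            "EXPLAIN QUERY PLAN SELECT value FROM global_variables WHERE name = 'x'").fetchall())

        # Lookup by value: a full table scan unless an index on value is added.
        print(con.execute(
            "EXPLAIN QUERY PLAN SELECT name FROM global_variables WHERE value = 'y'").fetchall())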

    Read the article

  • Unable to browse some pdfs and docs.

    - by JamesEggers
    I have a web site that uses Microsoft Indexing Service to index and query a directory that holds various documents of type pdf, rtf, mht, and doc. The indexing and querying works well for the most part; however, some files will load while others will not. This is a Windows Server 2003 box running the site under IIS 6. The indexed directory is a subdirectory of the site's root directory (i.e. http://my.domain.com/files/). The file paths in the URLs are accurate; however, I can only access some of the files of each file type. The files that I cannot access return a 404 File Not Found. I am able to open all of the files via Windows Explorer; however, attempting to open them via a browser over HTTP is hit and miss. Has anyone experienced this issue and knows how to resolve it? Does anyone have any idea why I can access some files but not others? Does anyone have any recommendations on what to look into (e.g. does the file owner matter, or something like that)?

    EDIT: Here are the request and response headers for a bad file:

        GET /files/file1.pdf HTTP/1.1
        Accept: image/gif, image/jpeg, image/pjpeg, image/pjpeg, application/x-shockwave-flash, application/xaml+xml, application/vnd.ms-xpsdocument, application/x-ms-xbap, application/x-ms-application, application/x-silverlight, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
        Accept-Language: en-us
        User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.590; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
        Accept-Encoding: gzip, deflate
        Proxy-Connection: Keep-Alive
        Host: my.domain.com

        HTTP/1.1 404 Not Found
        Content-Length: 1635
        Content-Type: text/html
        Server: Microsoft-IIS/6.0
        X-Powered-By: ASP.NET
        Date: Mon, 01 Jun 2009 15:38:54 GMT

        [typical 404 page markup excluded]

    And here are the request/response headers for a good file:

        GET /files/file2.pdf HTTP/1.1
        Accept: image/gif, image/jpeg, image/pjpeg, image/pjpeg, application/x-shockwave-flash, application/xaml+xml, application/vnd.ms-xpsdocument, application/x-ms-xbap, application/x-ms-application, application/x-silverlight, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
        Accept-Language: en-us
        User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.590; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
        Accept-Encoding: gzip, deflate
        Proxy-Connection: Keep-Alive
        Host: my.domain.com

        HTTP/1.1 200 OK
        Content-Length: 352464
        Content-Type: application/pdf
        Last-Modified: Tue, 13 Jan 2009 15:27:35 GMT
        Accept-Ranges: bytes
        ETag: "74ccc5759375c91:2a47"
        Server: Microsoft-IIS/6.0
        X-Powered-By: ASP.NET
        Date: Mon, 01 Jun 2009 15:50:33 GMT

    Read the article

  • Adding FK Index to existing table in Merge Replication Topology

    - by Refracted Paladin
    I have a table that has grown quite large and that we are replicating to about 120 subscribers. An FK on that table does not have an index, and when I ran an execution plan on a query that was causing issues it had this to say:

        /*
        Missing Index Details from CaseNotesTimeoutQuerys.sql - mylocal\sqlexpress.MATRIX (WWCARES\pschaller (54))
        The Query Processor estimates that implementing the following index could improve the query cost by 99.5556%.
        */

        /*
        USE [MATRIX]
        GO
        CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
        ON [dbo].[tblCaseNotes] ([PersonID])
        GO
        */

    I would like to add this, but I am afraid it will FORCE a reinitialization. Can anyone verify or validate my concerns? Does it even work that way, or would I need to run the script on each subscriber? Any insight would be appreciated.

    Read the article

  • Delay index build until SQL Server table load is complete with SSIS

    - by Mattew
    I have a large table that I am updating. Is it possible to disable index updates on the destination table until the load is complete? It seems like a waste for it to be constantly updating the index with each commit. I can just drop and recreate the index before and after the load, I just want to know if there is a quick way to configure that in the OLEDB or SQL Server destination. Server is Windows Server 2003 Datacenter Edition, running SQL Server 2008 Standard Edition with SSIS.
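
    One common pattern is to disable the destination table's nonclustered indexes just before the load (for example from an Execute SQL Task that runs ahead of the data flow) and rebuild them once it finishes, which avoids per-row index maintenance without dropping the index definitions. Below is a rough sketch of the idea driven from Python with pyodbc instead of SSIS; the connection string and table name are hypothetical. Note that the clustered index must not be disabled this way, since disabling it makes the table inaccessible.

        import pyodbc

        conn = pyodbc.connect(
            "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
            "DATABASE=mydb;Trusted_Connection=yes"          # hypothetical connection string
        )
        conn.autocommit = True
        cur = conn.cursor()

        # Disable every nonclustered index on the destination table before the bulk load.
        cur.execute("""
            DECLARE @sql nvarchar(max) = N'';
            SELECT @sql += N'ALTER INDEX ' + QUOTENAME(name) + N' ON dbo.BigTable DISABLE;'
            FROM sys.indexes
            WHERE object_id = OBJECT_ID('dbo.BigTable') AND type_desc = 'NONCLUSTERED';
            EXEC sp_executesql @sql;
        """)

        # ... run the data flow / bulk insert here ...

        # Rebuild everything once the load has finished.
        cur.execute("ALTER INDEX ALL ON dbo.BigTable REBUILD;")
        conn.close()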

    Read the article

  • How should I perform database maintenance on a 24x7 system

    - by solublefish
    I'm a software developer who inherited a part-time DBA role. I'm responsible for an application backed by a small, high-volume 24x7 database on SQL Server 2008. While there's other stuff in the DB, the critical piece is a 50GB, 7.5M row table that serves 100K requests/sec during peak load, and about half that at "night". This is 99%+ read traffic, but the writes are constant, and required. I need to be able to perform periodic maintenance without a maintenance window. Say an index rebuild, a job to purge old data, Windows Update, or hardware upgrade. Most of the advice I've seen is along the lines of "MAKE a maintenance window." While I appreciate the sentiment, I hope there's another way. If it will solve this problem, I do have the ability to purchase new hardware or modify the database, the clients (a set of web services servers), and much of the application code (ADO.NET + ASP.NET). I've been thinking along the lines of using the warm spare (or a 3rd server) to do the maintenance, and then "swap" it into production:

    1. Synchronize the spare by restoring backups, including a current transaction log.
    2. Perform the maintenance tasks.
    3. Reconfigure clients to connect to the spare server. Existing connections are finished within a minute or so.
    4. The spare server is now the production server.

    The problem remaining is that the new production server is now out of date by however long it took to perform maintenance. Is there some way that the original production server can be made to queue up changes and merge them to the spare between steps 2 and 3? Any other ideas?

    Read the article

  • OpenDB Alternative

    - by shaiss
    Although OpenDB is a good tool, it hasn't been updated since 12/2008. The source code shows some recent activity, but it is sparse at best. Does anyone know of good web-based cataloging alternatives? I'm looking to catalog the 500+ software CDs we have. As a side note, OpenDB will handle software, but you have to create a custom media type, which is a pain.

    Read the article

  • How do I catalog files on several external hard drives that I want to store off-line? OSX

    - by raudi
    My partner, an artist, has more than 10 external hard disks, both USB and FireWire, and every 2-3 months a new one has to be added (she works with videos and pictures). Currently it's 10 TB and growing, so too much for an affordable NAS. Right now the files are not indexed and, I think, cannot be searched with Spotlight because not all drives can be connected at the same time. So if she wants to search for a file, she has to guess which disk or disks hold it (based mostly on the date) and then search several drives. I'm looking for a solution to index/catalog the drives, something like GentibusCD, Cathy or Disclib (all of these are unfortunately Windows-only). Is there any software for OS X that will catalog all the hard drives, so she can search the catalog, find the files, and get the ID or name of the disk that has the content? Preferably something with a GUI so my partner can use it easily, and preferably with thumbnails for pictures/videos (but even an equivalent of "tree /F /A" would be better than nothing).
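
    If no off-the-shelf OS X tool fits, a do-it-yourself fallback in the spirit of the "tree /F /A" remark is to snapshot each drive's file listing into a small catalog database while the drive is connected, and then search that catalog offline. A rough Python sketch; the drive name and mount point are made up, and thumbnails would need extra work:

        import os
        import sqlite3

        catalog = sqlite3.connect(os.path.expanduser("~/drive_catalog.db"))
        catalog.execute("CREATE TABLE IF NOT EXISTS files (drive TEXT, path TEXT, size INTEGER)")

        def snapshot(drive_name, mount_point):
            """Record every file on a currently mounted drive."""
            rows = []
            for root, _dirs, names in os.walk(mount_point):
                for name in names:
                    full = os.path.join(root, name)
                    try:
                        rows.append((drive_name, os.path.relpath(full, mount_point),
                                     os.path.getsize(full)))
                    except OSError:
                        pass   # unreadable file; skip it
            with catalog:
                catalog.execute("DELETE FROM files WHERE drive = ?", (drive_name,))
                catalog.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)

        # snapshot("Video_Archive_07", "/Volumes/Video_Archive_07")   # hypothetical drive
        for drive, path, size in catalog.execute(
                "SELECT drive, path, size FROM files WHERE path LIKE ?", ("%interview%",)):
            print(drive, path, size)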

    Read the article

  • How to remove/de-index a page from Google?

    - by Jason
    On the results page when I Google "e-luminate", the 3rd and 4th links seem to point to a specific directory, deep within the folder structure, that stores the images. How can I get rid of these 2 results from Google's search results? How can I get Google to de-index them? I checked on the server and the folders do not seem different from other folders, but these 2 paths seem to get indexed by Google. Thank you.

    Read the article

  • mysql 5.0.23 vs 5.5 performance benefits and upgrade issues?

    - by WarDoGG
    I have been told that MySQL 5.5 has a significant performance boost compared to 5.0. Our server handles a lot of data (around 30 million records processed per 5-10 seconds) and needs every drop of performance we can give it. Will it be beneficial if we upgrade from 5.0.23 to MySQL 5.5? Also, we have lots of database indexes set up on the tables, and I've been told that sometimes indexes become corrupt after a version upgrade and have to be rebuilt. Is this true?

    Read the article

  • Mysql: create index on 1.4 billion records

    - by SiLent SoNG
    I have a table with 1.4 billion records. The table structure is as follows:

        CREATE TABLE text_page (
            text VARCHAR(255),
            page_id INT UNSIGNED
        ) ENGINE=MYISAM DEFAULT CHARSET=ascii

    The requirement is to create an index over the column text. The table size is about 34G. I tried to create the index with the following statement:

        ALTER TABLE text_page ADD KEY ix_text (text)

    After 10 hours of waiting I finally gave up on this approach. Is there any workable solution to this problem?

    UPDATE: the table is unlikely to be updated, inserted into or deleted from. The reason for creating an index on the column text is that this kind of SQL query would be executed frequently:

        SELECT page_id FROM text_page WHERE text = ?

    UPDATE: I have solved the problem by partitioning the table. The table was partitioned into 40 pieces on column text. Creating the index on the table then took about 1 hour to complete. It seems that MySQL index creation becomes very slow when the table gets very big, and partitioning reduces it to smaller chunks.

    Read the article

  • Same index for all (sub-)directories?

    - by whatisthis
    Hi. I was wondering if it is possible to write an .htaccess file that makes the server use an index.php file in a SINGLE directory as the index file for every directory/sub-directory on my server, rather than placing the exact same index.php in 200+ directories. If my description isn't clear, what I essentially mean is: /files/index.php should be used as the index for, for example, /files/morefiles, as well as the index for all directories and sub-directories within /files/, even though those directories do not have an index file themselves. Thanks to all in advance.

    Read the article

  • Share files and catalog them across a server

    - by Ultan
    Hi all, here is what I am looking for. I use a number of software packages when creating online learning courses. I have a huge number of courses, and they use text, audio files, images and video files. I am looking for something I can install on our server that lets me upload the content, index, tag and categorise it, and enter a description, so that when a screen print has to be changed, for example, I can find all the places where I used the image and change it without having to check every file I have created over the last year. An open-source solution would be great, or something that isn't too expensive. Thank you for any suggestions you can make.

    Read the article

  • Vectorization of index operations for a scipy.sparse matrix

    - by celil
    The following code runs too slowly even though everything seems to be vectorized.

        from numpy import *
        from scipy.sparse import *

        n = 100000
        i = xrange(n)
        j = xrange(n)
        data = ones(n)
        A = csr_matrix((data, (i, j)))
        x = A[i, j]

    The problem seems to be that the indexing operation is implemented as a Python function, and invoking A[i, j] results in the following profiling output:

        500033 function calls in 8.718 CPU seconds

        Ordered by: internal time

        ncalls   tottime   percall   cumtime   percall   filename:lineno(function)
        100000     7.933     0.000     8.156     0.000   csr.py:265(_get_single_element)
             1     0.271     0.271     8.705     8.705   csr.py:177(__getitem__)
        (...)

    Namely, the Python function _get_single_element gets called 100000 times, which is really inefficient. Why isn't this implemented in pure C? Does anybody know of a way of getting around this limitation and speeding up the above code? Should I be using a different sparse matrix type?
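
    In newer SciPy versions, indexing a CSR matrix with two integer index arrays is handled largely in compiled code, so the first thing to try is passing NumPy arrays instead of looping over Python ranges; and in this particular example the requested (i, i) elements are simply the diagonal, which has a dedicated method. A rough sketch (exact behaviour depends on the SciPy version):

        import numpy as np
        from scipy.sparse import csr_matrix

        n = 100_000
        rows = np.arange(n)
        cols = np.arange(n)
        A = csr_matrix((np.ones(n), (rows, cols)), shape=(n, n))

        # Special case: (i, i) pairs are just the diagonal.
        x_diag = A.diagonal()

        # General case: fancy indexing with index *arrays* (not Python ranges)
        # extracts all requested elements in one call.
        x_pairs = np.asarray(A[rows, cols]).ravel()

        print(x_diag[:3], x_pairs[:3])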

    Read the article

  • Mapping a child collection without indexing based on database primary key or using bag

    - by Colin Bowern
    I have an existing parent-child relationship I am trying to map in Fluent NHibernate: [RatingCollection] -- [Rating]

    Rating Collection has: ID (database-generated ID), Code, Name
    Rating has: ID (database-generated ID), Rating Collection ID, Code, Name

    I have been trying to figure out which permutation of HasMany makes sense here. What I have right now:

        HasMany<Rating>(x => x.Ratings)
            .WithTableName("Rating")
            .KeyColumnNames.Add("RatingCollectionId")
            .Component(c =>
            {
                c.Map(x => x.Code);
                c.Map(x => x.Name);
            });

    It works from a CRUD perspective, but because it's a bag it ends up deleting the rating contents any time I try to do a simple update or insert to the Ratings property. What I want is an indexed collection, but not one indexed by the database-generated ID (which is in the six-digit range right now). Any thoughts on how I could get a zero-based indexed collection (so I can write entity.Ratings[0].Name = "foo") that would allow me to modify the collection without deleting and reinserting it all when persisting?

    Read the article

  • For a primary key of an integral type, why is it important to avoid gaps?

    - by Jacques René Mesrine
    I am generating a surrogate key for a table and, because of my hi/lo algorithm, gaps may appear every time you reboot/restart the machine:

    T1: current hi = 10000000 (sequence being dished out: 1 to 100). Assume the current sequence value is 10000050.
    T2: the system is restarted.
    T3: the system gives out the next hi as 10000100 (the sequence being dished out now ranges from 101 to 200).
    T4: the next request for a key returns 100001001.

    From a primary-key and indexing-internals perspective, why is it important that there be no gaps in the sequence? I'm asking this to get a deeper understanding of MySQL specifically. Thanks.
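
    For reference, here is a minimal Python sketch of the hi/lo pattern described above; the class and the block size are illustrative, not tied to any particular ORM. Each consumer reserves a block of block_size keys by bumping a persistent next_hi counter, so a restart abandons whatever is left of the current block, which is exactly where the gaps come from.

        class HiLoKeyGenerator:
            """Toy hi/lo key generator (illustration only)."""
            def __init__(self, block_size=100):
                self.block_size = block_size
                self._persisted_next_hi = 0    # stands in for a value stored in the database
                self._hi = None
                self._lo = block_size          # force a new block on first use

            def _reserve_block(self):
                # In a real system this is a small transaction against a hi/lo table.
                self._hi = self._persisted_next_hi
                self._persisted_next_hi += 1
                self._lo = 0

            def next_key(self):
                if self._lo >= self.block_size:
                    self._reserve_block()
                self._lo += 1
                return self._hi * self.block_size + self._lo

        gen = HiLoKeyGenerator()
        print([gen.next_key() for _ in range(3)])   # [1, 2, 3]; block 0 covers keys 1..100

        # Simulate a restart: the in-memory block is lost, only next_hi survives.
        persisted = gen._persisted_next_hi
        gen = HiLoKeyGenerator()
        gen._persisted_next_hi = persisted
        print(gen.next_key())                       # 101 -- keys 4..100 are never issued, leaving a gap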

    Read the article

  • Creating and Indexing Email Database using SqlCe

    - by Anindya Chatterjee
    I am creating a simple email client program and am using MS SqlCe as the storage for emails. The database schema for storing a message is as follows:

        StorageId int IDENTITY NOT NULL PRIMARY KEY,
        FolderName nvarchar(255) NOT NULL,
        MessageId nvarchar(3999) NOT NULL,
        MessageDate datetime NOT NULL,
        StorageData ntext NULL

    In the StorageData field I am going to store the MIME message as a byte array. The problem arises when I come to implement search on the stored messages: I have no idea how I am going to index the messages on top of this schema. Can anyone suggest a good but simple schema that is effective in terms of both storage space and search friendliness? Regards, Anindya Chatterjee
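
    One common layout is to keep the raw MIME blob in one table and put only the extracted, searchable text (subject, sender, body) into a separate full-text indexed table keyed by the same id. The sketch below illustrates that split using SQLite's FTS5 through Python's standard sqlite3 module, purely because it is self-contained; as far as I know SqlCe has no built-in full-text indexing, so the same idea would need a different mechanism there. All table and column names are made up, and the code assumes an SQLite build with FTS5 enabled.

        import sqlite3

        con = sqlite3.connect(":memory:")   # assumes FTS5 is available in this SQLite build
        con.executescript("""
            -- Raw storage: one row per message, the MIME blob stays out of the search index.
            CREATE TABLE message_store (
                storage_id   INTEGER PRIMARY KEY,
                folder_name  TEXT NOT NULL,
                message_id   TEXT NOT NULL,
                message_date TEXT NOT NULL,
                storage_data BLOB
            );

            -- Searchable text only; filled from the parsed MIME message on insert.
            CREATE VIRTUAL TABLE message_text USING fts5(
                subject, sender, body, storage_id UNINDEXED
            );
        """)

        con.execute("INSERT INTO message_store VALUES (1, 'Inbox', '<abc@example>', '2010-05-01', x'00')")
        con.execute("INSERT INTO message_text VALUES ('Schema question', 'anindya', "
                    "'How should I index the stored messages?', 1)")

        hits = con.execute("""
            SELECT message_store.storage_id, message_store.folder_name
            FROM message_text
            JOIN message_store ON message_store.storage_id = message_text.storage_id
            WHERE message_text MATCH 'index'
        """).fetchall()
        print(hits)   # [(1, 'Inbox')]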

    Read the article
