Assistance with building an inverted-index

Posted by tipu on Stack Overflow See other posts from Stack Overflow or by tipu
Published on 2010-04-03T03:44:44Z Indexed on 2010/04/03 3:53 UTC
Read the original article Hit count: 510

It's part of an information retrieval thing I'm doing for school. The plan is to create a hashmap of words using the the first two letters of the word as a key and any words with the two letters saved as a string value. So,

hashmap["ba"] = "bad barley base"

Once I'm done tokenizing a line I take that hashmap, serialize it, and append it to the text file named after the key.

The idea is that if I take my data and spread it over hundreds of files I'll lessen the time it takes to fulfill a search by lessening the density of each file. The problem I am running into is when I'm making 100+ files in each run it happens to choke on creating a few files for whatever reason and so those entries are empty. Is there any way to make this more efficient? Is it worth continuing this, or should I abandon it?

I'd like to mention I'm using PHP. The two languages I know relatively intimately are PHP and Java. I chose PHP because the front end will be very simple to do and I will be able to add features like autocompletion/suggested search without a problem. I also see no benefit in using Java. Any help is appreciated, thanks.

© Stack Overflow or respective owner

Related posts about php

Related posts about inverted-index