Efficient data importing?

Posted by Kevin on Stack Overflow

We work with a lot of real estate, and while rearchitecting how the data is imported, I came across an interesting issue.

Firstly, the way our system works (loosely speaking) is we run a ColdFusion process once a day that picks up data an IDX vendor has pushed to us via FTP. Whatever they send us is what we get.

Over the years, this has proven to be rather unstable.

I am rearchitecting it in PHP on the RETS standard, which retrieves data through web-service requests (XML over HTTP) instead of FTP drops, and it has already proven to be much better than what we had.

When it comes to updating existing data, my initial thought was to query only for data that has changed. There is a 'Modified' field that tells you when a listing was last updated, and the code I have grabs any listing updated within the last 6 hours (giving myself a window in case something goes wrong).
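For context, here is a minimal sketch of that incremental pull using the open-source phRETS library. The login URL, credentials, the 'Property'/'RES' resource and class names, and the ModificationTimestamp field are assumptions; every MLS names these differently.

    <?php
    require_once 'phrets.php'; // open-source phRETS RETS client

    $rets = new phRETS;

    // Hypothetical login URL and credentials; each MLS issues its own.
    if ($rets->Connect('http://rets.example-mls.com/login', 'user', 'pass')) {

        // Listings modified in the last 6 hours. The trailing '+' in a
        // DMQL2 query means "on or after" the given value.
        $since  = gmdate('Y-m-d\TH:i:s', time() - 6 * 3600);
        $search = $rets->SearchQuery('Property', 'RES',
            '(ModificationTimestamp=' . $since . '+)');

        while ($listing = $rets->FetchRow($search)) {
            // Upsert $listing into the local database here.
        }

        $rets->FreeResult($search);
        $rets->Disconnect();
    }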

However, I see a lot of real estate developers suggest a constantly running 'batch' process that sweeps through all listings regardless of their updated status.
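For comparison, that full sweep would look roughly like the sketch below: page through every listing with Offset/Limit and reconcile each row against the local table. The same phRETS calls and field names are assumed as above.

    <?php
    require_once 'phrets.php';

    $rets = new phRETS;
    if ($rets->Connect('http://rets.example-mls.com/login', 'user', 'pass')) {

        $offset = 1; // RETS row offsets are 1-based
        do {
            // DMQL2 requires a query, so a common trick for "match
            // everything" is a timestamp filter with an ancient date.
            $search = $rets->SearchQuery('Property', 'RES',
                '(ModificationTimestamp=1970-01-01+)',
                array('Limit' => 500, 'Offset' => $offset));

            $rows = 0;
            while ($listing = $rets->FetchRow($search)) {
                // Compare $listing to the local copy; insert, update,
                // or flag deletions as needed.
                $rows++;
            }
            $rets->FreeResult($search);
            $offset += $rows;
        } while ($rows > 0);

        $rets->Disconnect();
    }

The trade-off is exactly what the question raises: the sweep touches every record, which also catches things an incremental pull can miss (silent deletions, updates dropped by a failed run), at the cost of far more processing per pass.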

Is this the better way to do it? Or am I fine with just grabbing the data I know I need? It doesn't make a lot of sense to me to do more processing than necessary. Thoughts?

