Cleaning a dataset of song data: what sort of problem is this?

Posted by Rob Lourens on Programmers
Published on 2013-11-05T06:28:04Z


I have a set of data about songs. Each entry is a line of text that includes the artist name, the song title, and some extra text; some entries consist only of extra text. My goal is to resolve as many of these entries as possible to songs on Spotify using the Spotify Web API.

My strategy so far has been to search for the entry via the API and, if there are no results, apply a transformation such as "remove all text between ( )" and search again. I have a list of such heuristics and have had reasonable success with them, but as the code grows more and more convoluted I keep thinking there must be a more generic and consistent approach. I don't know where to look: any suggestions for what to try, topics to study, or buzzwords to google?
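
One way to keep the "search, transform, search again" loop described above from turning into nested special cases is to store the heuristics as an ordered list of named string transforms and drive them from a single loop. The sketch below is only an illustration under stated assumptions: `search` is a hypothetical stand-in for a Spotify Web API search call (it just takes a query and returns a possibly empty list), and the three transforms (`strip_parentheses`, `strip_brackets`, `strip_featuring`) are example heuristics, not the poster's actual list.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical search signature: query string in, (possibly empty) list of
# matches out. A real version would wrap a Spotify Web API search request.
SearchFn = Callable[[str], List[str]]


def _squash(text: str) -> str:
    """Collapse runs of whitespace left behind by a removal."""
    return " ".join(text.split())


def strip_parentheses(text: str) -> str:
    """Remove all text between ( ), e.g. '(Radio Edit)'."""
    return _squash(re.sub(r"\([^)]*\)", " ", text))


def strip_brackets(text: str) -> str:
    """Remove all text between [ ], e.g. '[2013]'."""
    return _squash(re.sub(r"\[[^\]]*\]", " ", text))


def strip_featuring(text: str) -> str:
    """Drop a trailing 'feat. X' / 'ft. X' / 'featuring X' credit."""
    return _squash(re.sub(r"\b(?:feat\.?|ft\.?|featuring)\s.*$", "", text,
                          flags=re.IGNORECASE))


# Example heuristics, tried in order. Each is applied on top of the previous
# query, so later steps see the output of earlier ones.
HEURISTICS: List[Tuple[str, Callable[[str], str]]] = [
    ("strip_parentheses", strip_parentheses),
    ("strip_brackets", strip_brackets),
    ("strip_featuring", strip_featuring),
]


def resolve(entry: str, search: SearchFn) -> Optional[Tuple[str, str, List[str]]]:
    """Search for the raw entry, then after each heuristic, until one hits.

    Returns (name of the step that matched, query used, results), or None.
    """
    query = _squash(entry)
    results = search(query)
    if results:
        return ("original", query, results)
    for name, transform in HEURISTICS:
        new_query = transform(query)
        if not new_query or new_query == query:
            continue  # nothing changed, so searching again would be wasted
        query = new_query
        results = search(query)
        if results:
            return (name, query, results)
    return None


# Usage with a fake in-memory catalog standing in for the API.
catalog = {"daft punk get lucky": ["track-1"]}
fake_search: SearchFn = lambda q: catalog.get(q.lower(), [])
print(resolve("Daft Punk  Get Lucky (Radio Edit) [2013]", fake_search))
# ('strip_brackets', 'Daft Punk Get Lucky', ['track-1'])
```

With this shape, adding a heuristic means adding one entry to `HEURISTICS` rather than another branch, and because `resolve` reports which step produced the match, you can count hits per heuristic across the dataset to see which ones are earning their place.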

© Programmers or respective owner
