Text mining on a large database (data mining)

Posted by yox on Stack Overflow
Published on 2010-04-13T22:16:15Z Indexed on 2010/04/13 22:23 UTC

Hello,

I have a large database of resumes (CVs), with a table, skills, that groups all users' skills.

Inside that table there is a field, skill_text, that describes the skill in free text.

I'm looking for an algorithm, software, or method to extract significant terms and phrases from that table in order to build a new table of standardized skills.

Here are some example skills extracted from the DB:

  • Sectoral and competitive analysis
  • Business Development (incl. in international settings)
  • Specific structure and road design software - Microstation, Macao, AutoCAD (basic knowledge)
  • Creative work (Photoshop, In-Design, Illustrator)
  • checking and reporting back on campaign progress
  • organising and attending events and exhibitions
  • Development : Aptana Studio, PHP, HTML, CSS, JavaScript, SQL, AJAX
  • Discipline: One to one marketing, E-marketing (SEO & SEA, display, emailing, affiliate program) Mix marketing, Viral Marketing, Social network marketing.

The output should be something like:

  • Sectoral and competitive analysis
  • Business Development
  • Specific structure and road design software
  • Macao
  • AutoCAD
  • Photoshop
  • In-Design
  • Illustrator
  • organising events
  • Development
  • Aptana Studio
  • PHP
  • HTML
  • CSS
  • JavaScript
  • SQL
  • AJAX
  • Mix marketing
  • Viral Marketing
  • Social network marketing
  • emailing
  • SEO
  • One to one marketing

As you can see, only the skills remain, with none of the surrounding descriptive text.
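To make the target concrete, here is a minimal sketch of a first pass that splits one skill_text value into candidate terms. It assumes that commas, colons, parentheses, and a spaced hyphen act as separators, which holds for the examples above but may not hold for the whole table. The name split_candidates is hypothetical, and the sketch only produces candidates; it does not decide which ones are real skills.

```python
import re

# Assumed separators: comma, colon, parentheses, and a hyphen with spaces around it.
# A hyphen inside a word (e.g. "In-Design") is left alone.
_SEPARATORS = re.compile(r"[,:()]|\s-\s")

def split_candidates(skill_text):
    """Split one skill_text value into trimmed, non-empty candidate terms."""
    parts = (part.strip(" .") for part in _SEPARATORS.split(skill_text))
    return [part for part in parts if part]

print(split_candidates("Creative work (Photoshop, In-Design, Illustrator)"))
print(split_candidates("Development : Aptana Studio, PHP, HTML, CSS, JavaScript, SQL, AJAX"))
```

A split like this still yields noise such as "basic knowledge", so it would need a filtering step afterwards.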

I know this is possible using text mining techniques, but how do I do it? The database is really large, which is a good thing, because we can calculate term frequencies and use them to decide whether a term is a real skill or just meaningless text. The big problem is: how do I determine that "blablabla" is a skill?
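The frequency idea could be sketched roughly as follows. The sketch assumes each row has already been broken into candidate terms, counts in how many rows each term appears (case-insensitively), and keeps terms that reach a threshold. The function name frequent_terms, the min_count parameter, and the toy rows are all made up for illustration; the right threshold would depend on the real data.

```python
from collections import Counter

def frequent_terms(rows, min_count=2):
    """Return {term: row_count} for terms that appear in at least min_count rows.

    rows is an iterable of lists of candidate terms, one list per skill_text row.
    """
    counts = Counter()
    for terms in rows:
        # Use a set so a term repeated within one row is counted only once.
        counts.update({term.lower() for term in terms})
    return {term: n for term, n in counts.items() if n >= min_count}

# Toy rows, invented for this example (not real data from the table).
rows = [
    ["Development", "PHP", "HTML", "CSS"],
    ["Creative work", "Photoshop", "PHP"],
    ["checking and reporting back on campaign progress"],
    ["photoshop", "Illustrator"],
]
print(frequent_terms(rows))
```

Frequency alone does not answer the question of what a skill is: a common filler phrase such as "basic knowledge" would also pass the threshold, so some extra signal (a stop list, a seed list of known skills, or manual review of the top terms) would still be needed.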

Thanks

© Stack Overflow or respective owner
