Creating a music catalog and extracting the first 30 seconds from the moment the first words are sung
- by Rad
I have already read the question "Separation of singing voice from music". I don’t need audio processing that complex.
I only need a detection mechanism that tells me whether a voice is present while the music is playing (or not playing).
I need to extract the first 30 seconds from the point where a vocalist starts singing along with the full band. See question 1 below.
I want to create a music catalog, using ASP.NET MVC 2, Silverlight clients, and C# on .NET 4.0, that would serve as a storefront.
On the back end, I would also like to create a desktop WPF/Windows application that builds the music catalog from existing music files, most of which carry metadata:
ID3v1, ID3v2.3, ID3v2.4, iTunes MP4, WMA, Vorbis Comments, APE tags, etc.
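To make the metadata step concrete, here is a minimal sketch of reading the simplest of these formats, ID3v1, which is a fixed 128-byte block at the end of the file that starts with the bytes "TAG". It is written in Python purely for illustration rather than .NET, the function name parse_id3v1 is hypothetical, and it does not handle ID3v2, MP4, WMA, Vorbis, or APE tags, which would need a proper tagging library.

```python
def parse_id3v1(data: bytes):
    """Return the ID3v1 fields found at the end of `data`, or None if absent."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are fixed width, padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),      # 30 bytes
        "artist": text(tag[33:63]),    # 30 bytes
        "album": text(tag[63:93]),     # 30 bytes
        "year": text(tag[93:97]),      # 4 bytes
        "comment": text(tag[97:127]),  # 30 bytes
        "genre_id": tag[127],          # 1 byte index into the genre list
    }
```

In practice the last 128 bytes of the file would be read from disk and passed in; a missing tag simply yields None so the catalog manager can fill in the data by hand.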
I would possibly also like to create a web service that lets catalog contributors upload a zipped album, which would trigger extraction of the metadata and of the music segments described below. I would be happy to achieve just no. 1 below.
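As a rough sketch of the first step such a service might take, the snippet below lists the audio files inside an uploaded zip without extracting anything. It uses Python for illustration only; the function name list_audio_entries and the extension set are assumptions, not part of any existing service.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical set of extensions the catalog would accept.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wma", ".ogg", ".flac", ".ape"}

def list_audio_entries(zip_bytes: bytes):
    """Return the sorted names of audio files inside an uploaded zip archive."""
    found = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            # Skip entries that try to escape the upload folder.
            if path.is_absolute() or ".." in path.parts:
                continue
            if path.suffix.lower() in AUDIO_EXTENSIONS:
                found.append(info.filename)
    return sorted(found)
```

Each returned entry could then be handed to the metadata and segment extraction steps.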
Let's say I have thousands of songs in MP3 (or other formats) grouped in subfolders by some classification (genre, artist, album, composer, or other groupings).
I want to create database tables that organize the songs so they can be searched by different criteria (year, length, the classifications above, song title, description, etc.),
like the iTunes Store offers its customers.
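A minimal sketch of what such tables and a search might look like is shown here. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for SQL Server, and every table, column, and function name is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE album  (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                     artist_id INTEGER NOT NULL REFERENCES artist(id),
                     year INTEGER);
CREATE TABLE track  (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                     album_id INTEGER NOT NULL REFERENCES album(id),
                     genre TEXT, composer TEXT, length_seconds INTEGER,
                     description TEXT, file_path TEXT NOT NULL);
CREATE INDEX idx_track_genre ON track(genre);
CREATE INDEX idx_album_year  ON album(year);
"""

def open_catalog(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def add_track(conn, artist, album, year, title, genre, length_seconds, file_path):
    # Reuse the artist and album rows if they already exist.
    conn.execute("INSERT OR IGNORE INTO artist(name) VALUES (?)", (artist,))
    artist_id = conn.execute(
        "SELECT id FROM artist WHERE name = ?", (artist,)).fetchone()[0]
    row = conn.execute(
        "SELECT id FROM album WHERE title = ? AND artist_id = ?",
        (album, artist_id)).fetchone()
    if row is None:
        album_id = conn.execute(
            "INSERT INTO album(title, artist_id, year) VALUES (?, ?, ?)",
            (album, artist_id, year)).lastrowid
    else:
        album_id = row[0]
    conn.execute(
        "INSERT INTO track(title, album_id, genre, length_seconds, file_path) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, album_id, genre, length_seconds, file_path))

def search(conn, genre=None, year=None, max_length=None):
    """Return track titles matching every criterion that was supplied."""
    sql = ("SELECT track.title FROM track "
           "JOIN album ON album.id = track.album_id WHERE 1 = 1")
    params = []
    if genre is not None:
        sql += " AND track.genre = ?"
        params.append(genre)
    if year is not None:
        sql += " AND album.year = ?"
        params.append(year)
    if max_length is not None:
        sql += " AND track.length_seconds <= ?"
        params.append(max_length)
    sql += " ORDER BY track.title"
    return [r[0] for r in conn.execute(sql, params)]
```

Keeping artists and albums in their own tables means a correction made by the catalog manager applies to every track that references them.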
I want to extract metadata from various formats (I will try to get songs in MP3, but there may be other popular formats) and allow the
catalog manager to add missing data from either the desktop or the web application. He or other contributors can upload zipped music via an HTML, Silverlight, or WPF upload.
Can anybody suggest open source libraries, articles, or code snippets that can do this automatically using .NET and possibly a SQL Server database?
My main questions are these. This is an audio processing challenge. I want to extract two segments of music (questions 1 and 2):
1. How do I extract a music segment that starts 1-2 seconds before the vocalist starts singing and runs up to 30 seconds from that point in time?
2. Much more challenging: how do I find repeating segments, i.e. refrains? (People usually recognize songs, and remember their names, by these refrains.)
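To illustrate the two questions above, here is a rough Python sketch under two stated assumptions: for question 1, that some detector (not shown) has already produced the time at which the vocals start; for question 2, that the track has already been converted to one feature vector per frame (chroma vectors would be one option). The names clip_window and best_repeat are hypothetical, and the brute-force search is a simple self-similarity comparison rather than a production refrain detector.

```python
import math

def clip_window(vocal_onset_s, track_len_s, pre_roll_s=1.5, clip_len_s=30.0):
    """Question 1: start slightly before the vocal onset, end 30 s after it,
    clamped to the bounds of the track."""
    start = max(0.0, vocal_onset_s - pre_roll_s)
    end = min(track_len_s, vocal_onset_s + clip_len_s)
    return start, end

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_repeat(frames, window):
    """Question 2: find two non-overlapping stretches of `window` frames
    whose frame-by-frame cosine similarity is highest.
    Returns (first_start, second_start, mean_similarity)."""
    n = len(frames)
    best = (None, None, -1.0)
    for i in range(n - 2 * window + 1):
        for j in range(i + window, n - window + 1):
            score = sum(cosine(frames[i + k], frames[j + k])
                        for k in range(window)) / window
            if score > best[2]:
                best = (i, j, score)
    return best
```

The search costs on the order of n² × window comparisons, so for real tracks the frames would need to be coarse (for example one per beat) or the search restricted to a limited set of lags.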
How would I go about creating a list of songs that go well together, like Genius in iTunes does? Are there any characteristics of music that can be used to match songs?
The goal is for people to quickly scan and recognize songs, i.e. associate a melody and words with a title/album, so they can make informed decisions such as buying a song or putting together songs with a similar mood.