Regex to extract portions of file name

Posted by jakesankey on Stack Overflow
Published on 2010-05-30T06:02:24Z

I have text files formatted as such:

R156484COMP_004A7001_20100104_065119.txt

I need to consistently extract the R****COMP part, the 004A7001 number, and 20100104 (the date); I don't care about the 065119 number. The problem is that not all of the files being parsed follow that exact naming convention. Some look like this:

R168166CRIT_156B2075_SU2_20091223_123456.txt

or

R285476COMP_SU1_125A6025_20100407_123456.txt

So how can I use a regex instead of Split to make sure I always get the serial (e.g. 004A7001), the date (e.g. 20100104), and the R****COMP (or CRIT) part?

Here is what I do now, but it only handles files named like my first example:

if (file.Count(c => c == '_') != 3) continue;

and further down in the code I have:

string RNumber = Path.GetFileNameWithoutExtension(file);
string RNumberE = RNumber.Split('_')[0];
string RNumberD = RNumber.Split('_')[1];
string RNumberDate = RNumber.Split('_')[2];

DateTime dateTime = DateTime.ParseExact(RNumberDate, "yyyyMMdd", Thread.CurrentThread.CurrentCulture);
string cmmDate = dateTime.ToString("dd-MMM-yyyy");
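For comparison, here is a minimal sketch of how a single pattern with named groups might cover all three layouts shown above. It assumes the optional SU token is always `SU` plus one digit and sits directly before or after the serial, and that the prefix is `R`, six digits, then COMP or CRIT. The class and method names (`FileNameParts`, `TryParse`) are hypothetical, made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameParts // hypothetical helper, not from the original code
{
    // Assumed layout: prefix, optional SU token, serial, optional SU token, date, trailing number.
    private static readonly Regex Pattern = new Regex(
        @"^(?<id>R\d{6}(?:COMP|CRIT))" +   // e.g. R156484COMP
        @"(?:_SU\d)?" +                     // optional SU token before the serial
        @"_(?<serial>\d{3}[A-Z]\d{4})" +    // e.g. 004A7001
        @"(?:_SU\d)?" +                     // optional SU token after the serial
        @"_(?<date>\d{8})" +                // e.g. 20100104
        @"_\d+$",                           // trailing number, ignored
        RegexOptions.IgnoreCase);

    // name is the file name without its extension.
    public static bool TryParse(string name, out string id, out string serial, out string date)
    {
        Match m = Pattern.Match(name);
        if (!m.Success)
        {
            id = serial = date = null;
            return false;
        }
        id = m.Groups["id"].Value;
        serial = m.Groups["serial"].Value;
        date = m.Groups["date"].Value;
        return true;
    }
}
```

The input would be the result of `Path.GetFileNameWithoutExtension(file)`. Files that do not fit the assumed layout make `TryParse` return false, which could stand in for the underscore-count check.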

UPDATE: This is where I am now. I get an error when I try to parse RNumberDate into an actual date: "Cannot implicitly convert type 'RegularExpressions.Match' to 'string'".

string RNumber = Path.GetFileNameWithoutExtension(file);

Match RNumberE = Regex.Match(RNumber, @"^(R|L)\d{6}(COMP|CRIT|TEST|SU[1-9])(?=_)", RegexOptions.IgnoreCase);
Match RNumberD = Regex.Match(RNumber, @"(?<=_)\d{3}[A-Z]\d{4}(?=_)", RegexOptions.IgnoreCase);
Match RNumberDate = Regex.Match(RNumber, @"(?<=_)\d{8}(?=_)", RegexOptions.IgnoreCase);

DateTime dateTime = DateTime.ParseExact(RNumberDate, "yyyyMMdd", Thread.CurrentThread.CurrentCulture);
string cmmDate = dateTime.ToString("dd-MMM-yyyy");
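The error most likely comes from passing the `Match` object itself to `DateTime.ParseExact`, which expects a string; the matched text is exposed through the `Value` property. Below is a minimal sketch of that step, reusing the date pattern from the update. The class and method names (`CmmDateExample`, `GetCmmDate`) are hypothetical, and the use of `CultureInfo.InvariantCulture` and `TryParseExact` is my assumption, not part of the original code.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

public static class CmmDateExample // hypothetical name, for illustration only
{
    // Returns the date as dd-MMM-yyyy, or null when the name has no
    // 8-digit field between underscores or the digits are not a valid date.
    public static string GetCmmDate(string file)
    {
        string name = Path.GetFileNameWithoutExtension(file);

        // Same pattern as in the update: 8 digits between two underscores.
        Match dateMatch = Regex.Match(name, @"(?<=_)\d{8}(?=_)");
        if (!dateMatch.Success)
            return null;

        // dateMatch.Value is the matched string; the Match itself is not a string.
        DateTime dateTime;
        if (!DateTime.TryParseExact(dateMatch.Value, "yyyyMMdd",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out dateTime))
            return null;

        return dateTime.ToString("dd-MMM-yyyy", CultureInfo.InvariantCulture);
    }
}
```

Checking `Success` before reading `Value` matters here because a failed match yields an empty string, which would make `ParseExact` throw.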

© Stack Overflow or respective owner
