matchcollection - Developer IT

Efficiently Combine MatchCollections in .Net Regex

- by Laramie

In the simplified example, there are 2 Regular Expressions, one case sensitive, the other not. The idea would be to efficiently create an IEnumerable collection (see "combined" below) combining the results. string test = "abcABC"; string regex = "(?<grpa>a)|(?<grpb>b)|(?<grpc>c)]"; Regex regNoCase = new Regex(regex, RegexOptions.IgnoreCase); Regex regCase = new Regex(regex); MatchCollection matchNoCase = regNoCase.Matches(test); MatchCollection matchCase = regCase.Matches(test); //Combine matchNoCase and matchCase into an IEnumerable IEnumerable<Match> combined= null; foreach (Match match in combined) { //Use the Index and (successful) Groups properties //of the match in another operation } In practice, the MatchCollections might contain thousands of results and be run frequently using long dynamically created REGEXes, so I'd like to shy away from copying the results to arrays, etc. I am still learning LINQ and am fuzzy on how to go about combining these or what the performance hits to an already sluggish process will be.

Read the article

i need some help with my vb.net codes..plzz

- by akmalizhar

currently i need to develop an application that can exctract information from few website.. this is what i have done up until now.. Imports System Imports System.Text.RegularExpressions Imports System.IO Imports System.Net Imports System.Web Imports System.Data.SqlClient Imports System.Threading Imports System.Data.DataSet Imports System.Data.OleDb Module module1 Dim url As String Dim hotelName As String = "" Sub Main() Dim url As String = "" Console.Write("enter url: ") url = Console.ReadLine() extractor(url) End Sub Public Sub extractor(ByVal url As String) Dim strConn As String = "Data Source = localhost; Initial Catalog = knowledgeBase; Integrated Security = True; Connection Timeout = 0;" Dim conn As SqlConnection = New SqlConnection(strConn) conn.Open() Dim strSQL1 As String Dim matchStn1 As String = "" Dim matchstn2 As String = "" Dim matchstn3 As String = "" Dim matchstn4 As String = "" Dim matchstn5 As String = "" Dim matchstn6 As String = "" Dim matchstn7 As String = "" Dim matchstn8 As String = "" Dim matchstn9 As String = "" Dim matchstn10 As String = "" Dim objRequest As WebRequest = HttpWebRequest.Create(url) Dim objResponse As WebResponse = objRequest.GetResponse() Dim objStreamReader As New StreamReader(objResponse.GetResponseStream()) Dim strpage As String = objStreamReader.ReadToEnd Dim RegExStr As String = "<[^>]*>" Dim R As New Regex(RegExStr) Dim sourcestring As String = strpage Dim re As Regex = New Regex("<h2 class=""name hotel""[^>]*>[\s\S]+?</h2>") Dim mc As MatchCollection = re.Matches(sourcestring) Dim mIdx As Integer = 0 For Each m As Match In mc For groupIdx As Integer = 0 To m.Groups.Count - 1 matchStn1 = m.Groups(groupIdx).Value matchStn1 = R.Replace(matchStn1, " ") matchStn1 = matchStn1.Trim() Next mIdx = mIdx + 1 Next Dim re9 As Regex = New Regex("<li class=""cuisine""[^>]*>[^>]+</li>") Dim mc9 As MatchCollection = re9.Matches(sourcestring) Dim mIdx9 As Integer = 0 For Each m As Match In mc9 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn9 = m.Groups(groupIdx).Value matchstn9 = R.Replace(matchstn9, " ") matchstn9 = matchstn9.Trim() Next mIdx = mIdx + 1 Next Dim re2 As Regex = New Regex("<span class=""street-address""[^>]*>[^>]+</span>") Dim mc2 As MatchCollection = re2.Matches(sourcestring) Dim mIdx2 As Integer = 0 For Each m As Match In mc2 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn2 = m.Groups(groupIdx).Value matchstn2 = R.Replace(matchstn2, " ") matchstn2 = matchstn2.Trim() Next mIdx2 = mIdx2 + 1 Next Dim re3 As Regex = New Regex("<span class=""locality""[^>]*>[\s\S]+?</span>") Dim mc3 As MatchCollection = re3.Matches(sourcestring) Dim mIdx3 As Integer = 0 For Each m As Match In mc3 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn3 = m.Groups(groupIdx).Value matchstn3 = R.Replace(matchstn3, " ") matchstn3 = matchstn3.Trim() Next mIdx3 = mIdx3 + 1 Next Dim re4 As Regex = New Regex("<span property=""v:postal-code""[^>]*>[\s\S]+?</span>") Dim mc4 As MatchCollection = re4.Matches(sourcestring) Dim mIdx4 As Integer = 0 For Each m As Match In mc4 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn4 = m.Groups(groupIdx).Value matchstn4 = R.Replace(matchstn4, " ") matchstn4 = matchstn4.Trim() Next mIdx4 = mIdx4 + 1 Next Dim re5 As Regex = New Regex("<span class=""country-name""[^>]*>[\s\S]+?</span>") Dim mc5 As MatchCollection = re5.Matches(sourcestring) Dim mIdx5 As Integer = 0 For Each m As Match In mc5 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn5 = m.Groups(groupIdx).Value matchstn5 = R.Replace(matchstn5, " ") matchstn5 = matchstn5.Trim() Next mIdx5 = mIdx5 + 1 Next Dim re10 As Regex = New Regex("<address class=""adr""[^>]*>[\s\S]+?</address>") Dim mc10 As MatchCollection = re10.Matches(sourcestring) Dim mIdx10 As Integer = 0 For Each m As Match In mc10 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn10 = m.Groups(groupIdx).Value matchstn10 = R.Replace(matchstn10, " ") matchstn10 = matchstn10.Trim() strSQL1 = "insert into infoRestaurant (nameRestaurant, cuisine, streetAddress, locality, postalCode, countryName, addressFull, tel, attractionType) values (N" & _ FormatSqlParam(matchStn1) & ",N" & _ FormatSqlParam(matchstn9) & ",N" & _ FormatSqlParam(matchstn2) & ",N" & _ FormatSqlParam(matchstn3) & ",N" & _ FormatSqlParam(matchstn4) & ",N" & _ FormatSqlParam(matchstn5) & ",N" & _ FormatSqlParam(matchstn10) & ",N" & _ FormatSqlParam(matchstn6) & ",N" & _ FormatSqlParam(matchstn7) & ")" Dim objCommand1 As New SqlCommand(strSQL1, conn) objCommand1.ExecuteNonQuery() Next mIdx4 = mIdx4 + 1 Next Dim re6 As Regex = New Regex("<span class=""tel""[^>]*>[\s\S]+?</span>") Dim mc6 As MatchCollection = re6.Matches(sourcestring) Dim mIdx6 As Integer = 0 For Each m As Match In mc6 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn6 = m.Groups(groupIdx).Value matchstn6 = R.Replace(matchstn6, " ") matchstn6 = matchstn6.Trim() Next mIdx6 = mIdx6 + 1 Next Dim re7 As Regex = New Regex("<div><b>Attraction type:[^>]*>[\s\S]+?</div>") Dim mc7 As MatchCollection = re7.Matches(sourcestring) Dim mIdx7 As Integer = 0 For Each m As Match In mc7 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn7 = m.Groups(groupIdx).Value matchstn7 = R.Replace(matchstn7, " ") matchstn7 = matchstn7.Trim() Next mIdx7 = mIdx7 + 1 Next Dim re8 As Regex = New Regex("(?=<p id).*(?<=</p>)") Dim mc8 As MatchCollection = re8.Matches(sourcestring) Dim mIdx8 As Integer = 0 For Each m As Match In mc8 For groupIdx As Integer = 0 To m.Groups.Count - 1 matchstn8 = m.Groups(groupIdx).Value matchstn8 = R.Replace(matchstn8, " ") matchstn8 = matchstn8.Trim() Dim strSQL2 As String = "insert into feedBackRestaurant (feedBackView) values(N" + FormatSqlParam(matchstn8) + ")" Dim objCommand2 As New SqlCommand(strSQL2, conn) objCommand2.ExecuteNonQuery() Next mIdx8 = mIdx8 + 1 Next objStreamReader.Close() conn.Close() End Sub Public Function FormatSqlParam(ByVal strParam As String) As String Dim newParamFormat As String If strParam = String.Empty Then newParamFormat = "'" & "NA" & "'" Else newParamFormat = strParam.Trim() newParamFormat = "'" & newParamFormat.Replace("'", "''") & "'" End If Return newParamFormat End Function End Module ---problems-- problem that i face are 1. the database foreign key is not working here..someone told me that need some codes to be added..but i dunno how. 2. the data repeats as i run the application. i guest it require update database function.but i hv no idea how. 3. i have to add in multithreading function as well..and last, how to make my application is flexible eventhough the HTML code changes..can anyone help me??plzzz website that i need to extract is http://www.tripadvisor.com/Tourism-g293951-Malaysia-Vacations.html i need the information about hotel, restaurant and attraction place..plzz..i need some help here..

Read the article

regex matches with intersection in C#

- by StuffHappens

Hello. I wonder if it is possible to get MatchCollection with all matches even if there's intersection among them. string input = "a a a"; Regex regex = new Regex("a a"); MatchCollection matches = regex.Matches(input); Console.WriteLine(matches.Count); This code return 1, but I want it to return 2. How do achive it? Thank you for your help.

Read the article

Generic extension method returning IEnumerable<T> without using reflection

- by roosteronacid

Consider this snippet of code: public static class MatchCollectionExtensions { public static IEnumerable<T> AsEnumerable<T>(this MatchCollection mc) { return new T[mc.Count]; } } And this class: public class Ingredient { public String Name { get; set; } } Is there any way to magically transform a MatchCollection object to a collection of Ingredient? The use-case would look something like this: var matches = new Regex("([a-z])+,?").Matches("tomato,potato,carrot"); var ingredients = matches.AsEnumerable<Ingredient>();

Read the article

GetDate in a string in c#

- by Doncho

Hi, I'm having some trouble using regular expression to get date in a string. Example : string text = "75000+ Sept.-Oct. 2004"; MatchCollection dates = Regex.Matches(text, @"[0-9]{5,}[+][\s](jan|feb|fev|mar|apr|avr|may|mai|jun|jui|jul|jui|aug|aoû|sept|oct|nov|dec)[\.][\-](jan|feb|fev|mar|apr|avr|may|mai|jun|jui|jul|jui|aug|aoû|sept|oct|nov|dec)[\.]\s[0-9]{4}", RegexOptions.IgnoreCase); This code is matching with my current string but i would like to get in my matchcollection, "Sept 2004" and "Oct 2004" in order to parse it in 2 datetime. If anyone have any idea, thanks a lot.

Read the article

Change English numbers to Persian and vice versa in MVC (httpmodule)?

- by Mohammad

I wanna change all English numbers to Persian for showing to users. and change them to English numbers again for giving all requests (Postbacks) e.g: we have something like this in view IRQ170, I wanna show IRQ??? to users and give IRQ170 from users. I know, I have to use Httpmodule, But I don't know how ? Could you please guide me? Edit : Let me describe more : I've written the following http module : using System; using System.Collections.Specialized; using System.Diagnostics; using System.IO; using System.Text; using System.Text.RegularExpressions; using System.Web; using System.Web.UI; using Smartiz.Common; namespace Smartiz.UI.Classes { public class PersianNumberModule : IHttpModule { private StreamWatcher _watcher; #region Implementation of IHttpModule /// <summary> /// Initializes a module and prepares it to handle requests. /// </summary> /// <param name="context">An <see cref="T:System.Web.HttpApplication"/> that provides access to the methods, properties, and events common to all application objects within an ASP.NET application </param> public void Init(HttpApplication context) { context.BeginRequest += ContextBeginRequest; context.EndRequest += ContextEndRequest; } /// <summary> /// Disposes of the resources (other than memory) used by the module that implements <see cref="T:System.Web.IHttpModule"/>. /// </summary> public void Dispose() { } #endregion private void ContextBeginRequest(object sender, EventArgs e) { HttpApplication context = sender as HttpApplication; if (context == null) return; _watcher = new StreamWatcher(context.Response.Filter); context.Response.Filter = _watcher; } private void ContextEndRequest(object sender, EventArgs e) { HttpApplication context = sender as HttpApplication; if (context == null) return; _watcher = new StreamWatcher(context.Response.Filter); context.Response.Filter = _watcher; } } public class StreamWatcher : Stream { private readonly Stream _stream; private readonly MemoryStream _memoryStream = new MemoryStream(); public StreamWatcher(Stream stream) { _stream = stream; } public override void Flush() { _stream.Flush(); } public override int Read(byte[] buffer, int offset, int count) { int bytesRead = _stream.Read(buffer, offset, count); string orgContent = Encoding.UTF8.GetString(buffer, offset, bytesRead); string newContent = orgContent.ToEnglishNumber(); int newByteCountLength = Encoding.UTF8.GetByteCount(newContent); Encoding.UTF8.GetBytes(newContent, 0, Encoding.UTF8.GetByteCount(newContent), buffer, 0); return newByteCountLength; } public override void Write(byte[] buffer, int offset, int count) { string strBuffer = Encoding.UTF8.GetString(buffer, offset, count); MatchCollection htmlAttributes = Regex.Matches(strBuffer, @"(\S+)=[""']?((?:.(?![""']?\s+(?:\S+)=|[>""']))+.)[""']?", RegexOptions.IgnoreCase | RegexOptions.Multiline); foreach (Match match in htmlAttributes) { strBuffer = strBuffer.Replace(match.Value, match.Value.ToEnglishNumber()); } MatchCollection scripts = Regex.Matches(strBuffer, "<script[^>]*>(.*?)</script>", RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace); foreach (Match match in scripts) { MatchCollection values = Regex.Matches(match.Value, @"([""'])(?:(?=(\\?))\2.)*?\1", RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace); foreach (Match stringValue in values) { strBuffer = strBuffer.Replace(stringValue.Value, stringValue.Value.ToEnglishNumber()); } } MatchCollection styles = Regex.Matches(strBuffer, "<style[^>]*>(.*?)</style>", RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace); foreach (Match match in styles) { strBuffer = strBuffer.Replace(match.Value, match.Value.ToEnglishNumber()); } byte[] data = Encoding.UTF8.GetBytes(strBuffer); _memoryStream.Write(data, offset, count); _stream.Write(data, offset, count); } public override string ToString() { return Encoding.UTF8.GetString(_memoryStream.ToArray()); } #region Rest of the overrides public override bool CanRead { get { throw new NotImplementedException(); } } public override bool CanSeek { get { throw new NotImplementedException(); } } public override bool CanWrite { get { throw new NotImplementedException(); } } public override long Seek(long offset, SeekOrigin origin) { throw new NotImplementedException(); } public override void SetLength(long value) { throw new NotImplementedException(); } public override long Length { get { throw new NotImplementedException(); } } public override long Position { get { throw new NotImplementedException(); } set { throw new NotImplementedException(); } } #endregion } } It works well, but It converts all numbers in css and scripts files to Persian and it causes error.

Read the article

c# Truncate HTML safely for article summary

- by WickedW

Hi All, Does anyone have a c# variation of this? This is so I can take some html and display it without breaking as a summary lead in to an article? http://stackoverflow.com/questions/1193500/php-truncate-html-ignoring-tags Save me from reinventing the wheel! Thank you very much ---------- edit ------------------ Sorry, new here, and your right, should have phrased the question better, heres a bit more info I wish to take a html string and truncate it to a set number of words (or even char length) so I can then show the start of it as a summary (which then leads to the main article). I wish to preserve the html so I can show the links etc in preview. The main issue I have to solve is the fact that we may well end up with unclosed html tags if we truncate in the middle of 1 or more tags! The idea I have for solution is to a) truncate the html to N words (words better but chars ok) first (be sure not to stop in the middle of a tag and truncate a require attribute) b) work through the opened html tags in this truncated string (maybe stick them on stack as I go?) c) then work through the closing tags and ensure they match the ones on stack as I pop them off? d) if any open tags left on stack after this, then write them to end of truncated string and html should be good to go!!!! -- edit 12112009 Here is what I have bumbled together so far as a unittest file in VS2008, this 'may' help someone in future My hack attempts based on Jan code are at top for char version + word version (DISCLAIMER: this is dirty rough code!! on my part) I assume working with 'well-formed' HTML in all cases (but not necessarily a full document with a root node as per XML version) Abels XML version is at bottom, but not yet got round to fully getting tests to run on this yet (plus need to understand the code) ... I will update when I get chance to refine having trouble with posting code? is there no upload facility on stack? Thanks for all comments :) using System; using System.Collections.Generic; using System.Text.RegularExpressions; using System.Xml; using System.Xml.XPath; using Microsoft.VisualStudio.TestTools.UnitTesting; namespace PINET40TestProject { [TestClass] public class UtilityUnitTest { public static string TruncateHTMLSafeishChar(string text, int charCount) { bool inTag = false; int cntr = 0; int cntrContent = 0; // loop through html, counting only viewable content foreach (Char c in text) { if (cntrContent == charCount) break; cntr++; if (c == '<') { inTag = true; continue; } if (c == '>') { inTag = false; continue; } if (!inTag) cntrContent++; } string substr = text.Substring(0, cntr); //search for nonclosed tags MatchCollection openedTags = new Regex("<[^/](.|\n)*?>").Matches(substr); MatchCollection closedTags = new Regex("<[/](.|\n)*?>").Matches(substr); // create stack Stack<string> opentagsStack = new Stack<string>(); Stack<string> closedtagsStack = new Stack<string>(); // to be honest, this seemed like a good idea then I got lost along the way // so logic is probably hanging by a thread!! foreach (Match tag in openedTags) { string openedtag = tag.Value.Substring(1, tag.Value.Length - 2); // strip any attributes, sure we can use regex for this! if (openedtag.IndexOf(" ") >= 0) { openedtag = openedtag.Substring(0, openedtag.IndexOf(" ")); } // ignore brs as self-closed if (openedtag.Trim() != "br") { opentagsStack.Push(openedtag); } } foreach (Match tag in closedTags) { string closedtag = tag.Value.Substring(2, tag.Value.Length - 3); closedtagsStack.Push(closedtag); } if (closedtagsStack.Count < opentagsStack.Count) { while (opentagsStack.Count > 0) { string tagstr = opentagsStack.Pop(); if (closedtagsStack.Count == 0 || tagstr != closedtagsStack.Peek()) { substr += "</" + tagstr + ">"; } else { closedtagsStack.Pop(); } } } return substr; } public static string TruncateHTMLSafeishWord(string text, int wordCount) { bool inTag = false; int cntr = 0; int cntrWords = 0; Char lastc = ' '; // loop through html, counting only viewable content foreach (Char c in text) { if (cntrWords == wordCount) break; cntr++; if (c == '<') { inTag = true; continue; } if (c == '>') { inTag = false; continue; } if (!inTag) { // do not count double spaces, and a space not in a tag counts as a word if (c == 32 && lastc != 32) cntrWords++; } } string substr = text.Substring(0, cntr) + " ..."; //search for nonclosed tags MatchCollection openedTags = new Regex("<[^/](.|\n)*?>").Matches(substr); MatchCollection closedTags = new Regex("<[/](.|\n)*?>").Matches(substr); // create stack Stack<string> opentagsStack = new Stack<string>(); Stack<string> closedtagsStack = new Stack<string>(); foreach (Match tag in openedTags) { string openedtag = tag.Value.Substring(1, tag.Value.Length - 2); // strip any attributes, sure we can use regex for this! if (openedtag.IndexOf(" ") >= 0) { openedtag = openedtag.Substring(0, openedtag.IndexOf(" ")); } // ignore brs as self-closed if (openedtag.Trim() != "br") { opentagsStack.Push(openedtag); } } foreach (Match tag in closedTags) { string closedtag = tag.Value.Substring(2, tag.Value.Length - 3); closedtagsStack.Push(closedtag); } if (closedtagsStack.Count < opentagsStack.Count) { while (opentagsStack.Count > 0) { string tagstr = opentagsStack.Pop(); if (closedtagsStack.Count == 0 || tagstr != closedtagsStack.Peek()) { substr += "</" + tagstr + ">"; } else { closedtagsStack.Pop(); } } } return substr; } public static string TruncateHTMLSafeishCharXML(string text, int charCount) { // your data, probably comes from somewhere, or as params to a methodint XmlDocument xml = new XmlDocument(); xml.LoadXml(text); // create a navigator, this is our primary tool XPathNavigator navigator = xml.CreateNavigator(); XPathNavigator breakPoint = null; // find the text node we need: while (navigator.MoveToFollowing(XPathNodeType.Text)) { string lastText = navigator.Value.Substring(0, Math.Min(charCount, navigator.Value.Length)); charCount -= navigator.Value.Length; if (charCount <= 0) { // truncate the last text. Here goes your "search word boundary" code: navigator.SetValue(lastText); breakPoint = navigator.Clone(); break; } } // first remove text nodes, because Microsoft unfortunately merges them without asking while (navigator.MoveToFollowing(XPathNodeType.Text)) { if (navigator.ComparePosition(breakPoint) == XmlNodeOrder.After) { navigator.DeleteSelf(); } } // moves to parent, then move the rest navigator.MoveTo(breakPoint); while (navigator.MoveToFollowing(XPathNodeType.Element)) { if (navigator.ComparePosition(breakPoint) == XmlNodeOrder.After) { navigator.DeleteSelf(); } } // moves to parent // then remove *all* empty nodes to clean up (not necessary): // TODO, add empty elements like <br />, <img /> as exclusion navigator.MoveToRoot(); while (navigator.MoveToFollowing(XPathNodeType.Element)) { while (!navigator.HasChildren && (navigator.Value ?? "").Trim() == "") { navigator.DeleteSelf(); } } // moves to parent navigator.MoveToRoot(); return navigator.InnerXml; } [TestMethod] public void TestTruncateHTMLSafeish() { // Case where we just make it to start of HREF (so effectively an empty link) // 'simple' nested none attributed tags Assert.AreEqual(@"<h1>1234</h1><b><i>56789</i>012</b>", TruncateHTMLSafeishChar( @"<h1>1234</h1><b><i>56789</i>012345</b>", 12)); // In middle of a! Assert.AreEqual(@"<h1>1234</h1><a href=""testurl""><b>567</b></a>", TruncateHTMLSafeishChar( @"<h1>1234</h1><a href=""testurl""><b>5678</b></a><i><strong>some italic nested in string</strong></i>", 7)); // more Assert.AreEqual(@"<div><b><i><strong>1</strong></i></b></div>", TruncateHTMLSafeishChar( @"<div><b><i><strong>12</strong></i></b></div>", 1)); // br Assert.AreEqual(@"<h1>1 3 5</h1><br />6", TruncateHTMLSafeishChar( @"<h1>1 3 5</h1><br />678<br />", 6)); } [TestMethod] public void TestTruncateHTMLSafeishWord() { // zero case Assert.AreEqual(@" ...", TruncateHTMLSafeishWord( @"", 5)); // 'simple' nested none attributed tags Assert.AreEqual(@"<h1>one two <br /></h1><b><i>three ...</i></b>", TruncateHTMLSafeishWord( @"<h1>one two <br /></h1><b><i>three </i>four</b>", 3), "we have added ' ...' to end of summary"); // In middle of a! Assert.AreEqual(@"<h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four ...</b></a>", TruncateHTMLSafeishWord( @"<h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four five </b></a><i><strong>some italic nested in string</strong></i>", 4)); // start of h1 Assert.AreEqual(@"<h1>one two three ...</h1>", TruncateHTMLSafeishWord( @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 3)); // more than words available Assert.AreEqual(@"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i> ...", TruncateHTMLSafeishWord( @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 99)); } [TestMethod] public void TestTruncateHTMLSafeishWordXML() { // zero case Assert.AreEqual(@" ...", TruncateHTMLSafeishWord( @"", 5)); // 'simple' nested none attributed tags string output = TruncateHTMLSafeishCharXML( @"<body><h1>one two </h1><b><i>three </i>four</b></body>", 13); Assert.AreEqual(@"<body>\r\n <h1>one two </h1>\r\n <b>\r\n <i>three</i>\r\n </b>\r\n</body>", output, "XML version, no ... yet and addeds '\r\n + spaces?' to format document"); // In middle of a! Assert.AreEqual(@"<h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four ...</b></a>", TruncateHTMLSafeishCharXML( @"<body><h1>one two three </h1><a href=""testurl""><b class=""mrclass"">four five </b></a><i><strong>some italic nested in string</strong></i></body>", 4)); // start of h1 Assert.AreEqual(@"<h1>one two three ...</h1>", TruncateHTMLSafeishCharXML( @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 3)); // more than words available Assert.AreEqual(@"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i> ...", TruncateHTMLSafeishCharXML( @"<h1>one two three </h1><a href=""testurl""><b>four five </b></a><i><strong>some italic nested in string</strong></i>", 99)); } } }

Read the article

How to prevent regex from stopping at the first match of alternatives ?

- by miket2e

If I have the string hello world , how can I modify the regex world|wo|w so that it will match all of "world", "wo" and "w" rather than just the single first match of "world" that it comes to ? If this is not possible directly, is there a good workaround ? I'm using C# if it makes a difference: Regex testRegex = new Regex("world|wo|w"); MatchCollection theMatches = testRegex.Matches("hello world"); foreach (Match thisMatch in theMatches) { ... }

Read the article

A regular expression that will allow a string with only one Capital Letter

- by Phoenix

The string should be 6 - 20 characters in length. And it should contain 1 Capital letter. I can do this in code using C# string st = "SomeString" Regex rg = new Regex("[A-Z]"); MatchCollection mc = rg.Matches(st); Console.WriteLine("Total Capital Letters: " + mc.Count); if (mc.Count > 1) { return false; } But what i really want is a Regular expression that will match my string if it only contains one capital. The string can start with a common letter and should have only letters. Thanks In advance. (I did look at some of the other RegEx questions but they did not help).

Read the article

Text Decoding Problem

- by Jason Miesionczek

So given this input string: =?ISO-8859-1?Q?TEST=2C_This_Is_A_Test_of_Some_Encoding=AE?= And this function: private string DecodeSubject(string input) { StringBuilder sb = new StringBuilder(); MatchCollection matches = Regex.Matches(inputText.Text, @"=\?(?<encoding>[\S]+)\?.\?(?<data>[\S]+[=]*)\?="); foreach (Match m in matches) { string encoding = m.Groups["encoding"].Value; string data = m.Groups["data"].Value; Encoding enc = Encoding.GetEncoding(encoding.ToLower()); if (enc == Encoding.UTF8) { byte[] d = Convert.FromBase64String(data); sb.Append(Encoding.ASCII.GetString(d)); } else { byte[] bytes = Encoding.Default.GetBytes(data); string decoded = enc.GetString(bytes); sb.Append(decoded); } } return sb.ToString(); } The result is the same as the data extracted from the input string. What am i doing wrong that this text is not getting decoded properly?

Read the article

[C#, Regex] EOL Special Char not matching

- by Aurélien Ribon

Hello, I am trying to find every "a - b, c, d" pattern in an input string. The pattern I am using is the following : "^[ \t]*(\\w+)[ \t]*->[ \t]*(\\w+)(,[ \t]*\\w+)*$" The input is : a -> b b -> c c -> d The function is : private void ParseAndBuildGraph(String input) { MatchCollection mc = Regex.Matches(input, "^[ \t]*(\\w+)[ \t]*->[ \t]*(\\w+)(,[ \t]*\\w+)*$", RegexOptions.Multiline); foreach (Match m in mc) { Debug.WriteLine(m.Value); } } The output is : c -> d Actually, there is a problem with the line ending "$" special char. If I insert a "\r" before "$", it works, but I thought "$" would match any line termination (with the Multiline option), especially a \r\n in a Windows environment. Is it not the case ?

Read the article

Regex - replace only last part of an expression

I'm attempting to find the best methodology for finding a specific pattern and then replace the ending portion of the pattern. Here is a quick example (in C#): //Find any year value starting with a bracket or underscore string patternToFind = "[[_]2007"; Regex yearFind = new Regex(patternToFind); //I want to change any of these values to x2008 where x is the bracket or underscore originally in the text. I was trying to use Regex.Replace(), but cannot figure out if it can be applied. If all else fails, I can find Matches using the MatchCollection and then switch out the 2007 value with 2008; however, I'm hoping for something more elegant MatchCollections matches = yearFind.Matches(" 2007 [2007 _2007"); foreach (Match match in matches){ //use match to find and replace value }

Read the article

Regular Expression Help in .NET

- by Matt H.

I have a simple pattern I am trying to match, any characters captured between parenthesis at the end of an HTML paragraph. I am running into trouble any time there is additional parentheticals in that paragraph: i.e. If the input string is "..... (321)</p" i want to get the value (321) However, if the paragraph has this text: "... (123) (321)</p" my regex is returning "(123) (321)" (everything between the opening "(" and closing ")" I am using the regex pattern "\s(.+)</p" How can I grab the correct value (using VB.NET) This is what I'm doing so far: Dim reg As New Regex("\s$.+$</P>", RegexOptions.IgnoreCase) Dim matchC As MatchCollection = reg.Matches(su.Question) If matchC.Count > 0 Then Dim lastMatch As Match = matchC(matchC.Count - 1) Dim DesiredValue As String = lastMatch.Value End If

Read the article

Regex problem ...

- by carlos

hello i have the next regex Dim origen As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45 423 /z:65 /a:23 /m:39 /t:45rt " Dim str As String = "(^|\s)/p:""\w:(\\(\w+[\s]*\w+)+)+\\\w+.\w+""(\s|$)" Dim ar As Integer Dim getfile As New Regex(str) Dim mgetfile As MatchCollection = getfile.Matches(origen) ar = mgetfile.Count When i evaluate this it works, and gets the /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" that basically is the path to a file. But if I change the origen string to Dim origen As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45 423 /z:65 /a:23 /m:39 /t:45rt " Check that the end of the file is follow by "/cid:45" whitchs makes de patter invalid, but insted of getting a mgetfile.count = 0 the program is block, if i make a debug I got a property evaluation failed. Thanks for your comments !!!

Read the article

RegEx replace query to pick out wiki syntax

- by Jeremy Thake

I've got a string of HTML that I need to grab the "[Title|http://www.test.com]" pattern out of e.g. "dafasdfasdf, adfasd. [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad" I need to replace "[Title|http://www.test.com]" this with "Title". What is the best away to approach this? I was getting close with: string test = "dafasdfasdf adfasd [Test|http://www.test.com/] adf ddasfasdf [SDAF|http://www.madee.com/] assg ad "; string p18 = @"(\[.*?|.*?\])"; MatchCollection mc18 = Regex.Matches(test, p18, RegexOptions.Singleline | RegexOptions.IgnoreCase); foreach (Match m in mc18) { string value = m.Groups[1].Value; string fulltag = value.Substring(value.IndexOf("["), value.Length - value.IndexOf("[")); Console.WriteLine("text=" + fulltag); } There must be a cleaner way of getting the two values out e.g. the "Title" bit and the url itself. Any suggestions?

Read the article

Extract and replace named group regex

- by user557670

I was able to extract href value of anchors in an html string. Now, what I want to achieve is extract the href value and replace this value with a new GUID. I need to return both the replaced html string and list of extracted href value and it's corresponding GUID. Thanks in advance. My existing code is like: Dim sPattern As String = "<a[^>]*href\s*=\s*((\""(?<URL>[^\""]*)\"")|(\'(?<URL>[^\']*)\')|(?<URL>[^\s]* ))" Dim matches As MatchCollection = Regex.Matches(html, sPattern, RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace) If Not IsNothing(matches) AndAlso matches.Count > 0 Then Dim urls As List(Of String) = New List(Of String) For Each m As Match In matches urls.Add(m.Groups("URL").Value) Next End If Sample HTML string: <html><body><a title="http://www.google.com" href="http://www.google.com">http://www.google.com</a><br /><a href="http://www.yahoo.com">http://www.yahoo.com</a><br /><a title="http://www.apple.com" href="http://www.apple.com">Apple</a></body></html>

Read the article

Why is my regex so much slower compiled than interpreted ?

- by miket2e

I have a large and complex C# regex that runs OK when interpreted, but is a bit slow. I'm trying to speed this up by setting RegexOptions.Compiled, and this seems to take about 30 seconds for the first time and instantly after that. I'm trying to negate this by compiling the regex to an assembly first, so my app can be as fast as possible. My problem is when the compiling delay takes place: Regex myComplexRegex = new Regex(regexText, RegexOptions.Compiled); MatchCollection matches = myComplexRegex.Matches(searchText); foreach (Match match in matches) // <--- when the one-time long delay kicks in { } This is making compiling to an assembly basically useless, as I still get the delay on the first foreach call. What I want is for all the compiling delay to be done in advance when I compile to the assembly, not when the user runs the app. Where am I going wrong ? (The code I'm using to compile to an assembly is similar to http://www.dijksterhuis.org/regular-expressions-advanced/ , if that's relevant ).

Read the article

Problem with logicalOR ( regex ) not greedy

- by Marin Doric

This is the part of a string "21xy5". I want to insert " * " surrounded with whitespace between: digit and letter, letter and digit, letter and letter. I use this regex pattern "\d[a-z]|[a-z]\d|[a-z][a-z]" to find indexs where I gona insert string " * ". Problem is that when regex OR(|) in string 21xy5 trays to match 21-x|x-y|y-5, when first condition 21-x success, second x-y is not checked, and third success. So I have 21 * xy * 5 instead 21 * x * y * 5. If input string is like this xy21, then x-y success and then I have x * y21. Problem is that logical OR is not greedy. Regex reg = new Regex(@"\d[a-z]|[a-z]\d|[a-z][a-z]" ); MatchCollection matchC; matchC = reg.Matches(input); int ii = 1; foreach (Match element in matchC) { input = input.Insert(element.Index + ii, " * "); ii += 3; } return input;

Read the article

Regex: markdown-style link matching

- by The.Anti.9

I want to parse markdown style links, but I'm having some trouble matching the reference style ones. Like this one: [id]: http://example.com/ "Optional Title Here" My regex gets the id and the url, but not the title. Heres what I have: /\[([a-zA-Z0-9_-]+)\]: (\S+)\s?("".*?"")?/ I go through and add the references to a hashtable. the id as the key and the value is an instance of a class I made called LinkReference that just contains the url and the title. In case the problem is not my regex, and my code adding the matches to the hash table, Heres my code for that too: Regex rx = new Regex(@"\[([a-zA-Z0-9_-]+)\]: (\S+)\s?("".*?"")?"); MatchCollection matches = rx.Matches(InputText); foreach (Match match in matches) { GroupCollection groups = match.Groups; string title = null; try { title = groups[3].Value; } catch (Exception) { // keep title null } LinkReferences.Add(groups[1].Value, new LinkReference(groups[2].Value, title)); }

Read the article

How does MatchEvaluator works? ( C# regex replace)

- by Marin Doric

This is the input string 23x * y34x2. I want to insert " * " (star surrounded by whitespaces) after every number followed by letter, and after every letter followed by number. So my input string would look like this: 23 * x * y * 34 * x * 2. This is the regex that does the job: @"\d(?=[a-z])|a-z". This is the function that I wrote that inserts the " * ". Regex reg = new Regex(@"\d(?=[a-z])|[a-z](?=\d)"); MatchCollection matchC; matchC = reg.Matches(input); int ii = 1; foreach (Match element in matchC)//foreach match I will find the index of that match { input = input.Insert(element.Index + ii, " * ");//since I' am inserting " * " ( 3 characters ) ii += 3; //I must increment index by 3 } return input; //return modified input My question how to do same job using .net MatchEvaluator? I'am new to regex and don't understand good replacing with MatchEvaluator. This is the code that I tried to wrote: Regex reg = new Regex(@"\d(?=[a-z])|[a-z](?=\d)"); MatchEvaluator matchEval = new MatchEvaluator(ReplaceStar); input = reg.Replace(input, matchEval); return input; } public string ReplaceStar( Match match ) { //return What?? }

Read the article

EOL Special Char not matching

- by Aurélien Ribon

Hello, I am trying to find every "a - b, c, d" pattern in an input string. The pattern I am using is the following : "^[ \t]*(\\w+)[ \t]*->[ \t]*(\\w+)((?:,[ \t]*\\w+)*)$" This pattern is a C# pattern, the "\t" refers to a tabulation (its a single escaped litteral, intepreted by the .NET String API), the "\w" refers to the well know regex litteral predefined class, double escaped to be interpreted as a "\w" by the .NET STring API, and then as a "WORD CLASS" by the .NET Regex API. The input is : a -> b b -> c c -> d The function is : private void ParseAndBuildGraph(String input) { MatchCollection mc = Regex.Matches(input, "^[ \t]*(\\w+)[ \t]*->[ \t]*(\\w+)((?:,[ \t]*\\w+)*)$", RegexOptions.Multiline); foreach (Match m in mc) { Debug.WriteLine(m.Value); } } The output is : c -> d Actually, there is a problem with the line ending "$" special char. If I insert a "\r" before "$", it works, but I thought "$" would match any line termination (with the Multiline option), especially a \r\n in a Windows environment. Is it not the case ?

Read the article

C# Regex - Match and replace, Auto Increment

- by Marc Still

I have been toiling with a problem and any help would be appreciated. Problem: I have a paragraph and I want to replace a variable which appears several times (Variable = @Variable). This is the easy part, but the portion which I am having difficulty is trying to replace the variable with different values. I need for each occurrence to have a different value. For instance, I have a function that does a calculation for each variable. What I have thus far is below: private string SetVariables(string input, string pattern){ Regex rx = new Regex(pattern); MatchCollection matches = rx.Matches(input); int i = 1; if(matches.Count > 0) { foreach(Match match in matches) { rx.Replace(match.ToString(), getReplacementNumber(i)); i++ } } I am able to replace each variable that I need to with the number returned from getReplacementNumber(i) function, but how to I put it back into my original input with the replaced values, in the same order found in the match collection? Thanks in advance! Marcus

Read the article

How to remove words based on a word count

- by Chris

Here is what I'm trying to accomplish. I have an object coming back from the database with a string description. This description can be up to 1000 characters long, but we only want to display a short view of this. So I coded up the following, but I'm having trouble in actually removing the number of words after the regular expression finds the total count of words. Does anyone have good way of dispalying the words which are less than the Regex.Matches? Thanks! if (!string.IsNullOrEmpty(myObject.Description)) { string original = myObject.Description; MatchCollection wordColl = Regex.Matches(original, @"[\S]+"); if (wordColl.Count < 70) // 70 words? { uxDescriptionDisplay.Text = string.Format("<p>{0}</p>", myObject.Description); } else { string shortendText = original.Remove(200); // 200 characters? uxDescriptionDisplay.Text = string.Format("<p>{0}</p>", shortendText); } }

Read the article

C# : Regular Expression

- by Pramodh

I'm having a set of row data as follows List<String> l_lstRowData = new List<string> { "Data 1 32:01805043*0FFFFFFF", "Data 3, 20.0e-3", "Data 2, 1.0e-3 172:?:CRC" , "Data 6" }; and two List namely "KeyList" and "ValueList" like List<string> KeyList = new List<string>(); List<string> ValueList = new List<string>(); I need to fill the two List<String> from the data from l_lstRowData using Pattern Matching And here is my Pattern for this String l_strPattern = @"(?<KEY>(Data|data|DATA)\s[0-9]*[,]?[ ][0-9e.-]*)[ \t\r\n]*(?<Value>[0-9A-Za-z:?*!. \t\r\n\-]*)"; Regex CompiledPattern=new Regex(l_strPattern,RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace); So finally the two Lists will contain KeyList { "Data 1" } { "Data 3, 20.0e-3" } { "Data 2, 1.0e-3" } { "Data 6" } ValueList { "32:01805043*0FFFFFFF" } { "" } { "172:?:CRC" } { "" } Scenerio: The Group KEY in the Pattern Should match "The data followed by an integer value , and the if there exist a comma(,) then the next string i.e a double value The Group Value in the Pattern should match string after the whitespace.In the first string it should match 32:01805043*0FFFFFFF but in the 3rd 172:?:CRC. Here is my sample code for (int i = 0; i < l_lstRowData.Count; i++) { MatchCollection M = CompiledPattern.Matches(l_lstRowData[i], 0); KeyList.Add(M[0].Groups["KEY"].Value); ValueList.Add(M[0].Groups["Value"].Value); } But my Pattern is not working in this situation. Please help me to rewrite my Pattern.

Read the article

Simple line matching using Regex

- by Joan Venge

I have this string stream: "do=whoposted&t=1934067" rel=nofollow>61</A></TD><TD class=alt2 align=middle>5,286</TD></TR><TR><TD id=td_threadstatusicon_1911046 class=alt1><IMG id=thread_statusicon_1911046 border=0 alt="" src="http://url.com/forum/images/statusicon/thread_new.gif"> </TD><TD class=alt2><IMG title=Node border=0 alt=Node src="http://url.com/forum/images/icons/new.png"></TD><TD id=td_threadtitle_1911046 class=alt1 title="http://lulzimg.com/i14/7bd11b.jpg 
 
Complete name : cool-thread...."><DIV><A id=thread_gotonew_1911046 href="http://url.com/forum/f80/cool-topic-new/"><IMG class=inlineimg title="Go to first new post" border=0 alt="Go to first new post" src="http://url.com/forum/images/buttons/firstnew.gif"></A> [MULTI] <A style="FONT-WEIGHT: bold" id=thread_title_1911046 href="http://url.com/forum/f80/cool-topic-name-1911046/">Cool Topic Name</A> </DIV><DIV class=smallfont><SPAN style="CURSOR: pointer" onclick="window.open('http://url.com/forum/members/u2031889/', '_self')">m3no</SPAN> </DIV></TD><TD class=alt2 title="Replies: 11, Views: 1,554"><DIV style="TEXT-ALIGN: right; WHITE-SPACE: nowrap" class=smallfont>Today <SPAN class=time>08:04 AM</SPAN><BR>by <A href="http://url.com/forum/members/u1131830/" rel=nofollow>karetsos</A> <A " The lines I am interested are similar to this: <A style="FONT-WEIGHT: bold" id=thread_title_1911046 href="http://url.com/forum/f80/cool-topic-name-1911046/">Cool Topic Name</A> From here all I am trying to extract are: Thread id: 1911046 (could be from either location in the string) Thread name: "Cool Topic Name" Thread link: "http://url.com/forum/f80/cool-topic-name-1911046/" Currently I use this: Regex pattern = new Regex ( "<A\\s+href=\"([^\"]*)\">([^\\x00]*?)\\s+id=thread_title_(\\S+)</A>" ); MatchCollection matches = pattern.Matches ( doc.ToString ( ) ); foreach ( Match match in matches ) { int id = Convert.ToInt32 ( match.Groups [ 1 ].Value ); string name = match.Groups [ 3 ].Value; string link = match.Groups [ 2 ].Value; ... } I would appreciate if someone can help me fix the pattern to match it. This used to work but it returns 0 matches.

Search Results

Search found 27 results on 2 pages for 'matchcollection'.

Page 1/2 | 1 2 | Next Page >

- by Laramie

- by akmalizhar

- by StuffHappens

- by roosteronacid

- by Doncho

- by Mohammad

- by WickedW

- by miket2e

- by Phoenix

- by Jason Miesionczek

- by Aurélien Ribon

- by Matt H.

- by carlos

- by Jeremy Thake

- by user557670

- by miket2e

- by Marin Doric

- by The.Anti.9

- by Marin Doric

- by Aurélien Ribon

- by Marc Still

- by Chris

- by Pramodh

- by Joan Venge

1 2 | Next Page >