Search Results

Search found 4539 results on 182 pages for 'regex grouping'.

Page 173/182 | < Previous Page | 169 170 171 172 173 174 175 176 177 178 179 180  | Next Page >

  • PHP preg_replace - Don't match within h1 tags

    - by James
    Hi there. I am using preg_replace to add a link to keywords if they are found within a long HTML string. I don't want to add a link if the keyword is found within h1 tags or strong tags. The below regex nearly works and basically says (I think): If the keyword is not immediately wrapped by either a h1 tag or a strong tag then replace with the keyword that was matched, as a bolded link to google. $result = preg_replace('%(?!<h1>)(?!<strong>)\b(bobs widgets)\b(?!<\/strong>)(?!<\/h1>)%i','<a href="http://www.google.com"><strong>$1</strong></a>', $result, -1); (the reason I don't want to match if in strong tags is because I am recursing through a lot of keywords so don't want to link an already linked keyword on subsequent passes) the above works fine and won't match: <h1>bobs widgets</h1> It will however match the keyword in the following text, because the h1 tag isn't immediately either side of the keyword: <h1>Here are bobs widgets for sale</h1> I need to make the spaces either side optional and have tried adding \s* but that doesn't get me anywhere. I'd be very grateful for a push in the right direction here.

    Read the article

  • Read HTML file into email

    - by cast01
    Ive written a script to send html emails and all works well. I have the email stored in a separate HTML file which is then read in usin a while loop and fgets(). However, i want to be able to pass variables into the html. For example, in a html file i may have something like.. <body> Dear Name <br/> Thank you for your recent purhcase </body> and i read this into a string like so $file = fopen($filename, "r"); while(!feof($file)) { $html.= fgets($file, 4096); } fclose ($file); I want to be able to replace "Name" in the html file by a variable and im not entirely sure on the best way to do this. I could always make my own tag and then use regex to replace that with the name once ive read the file into the string, but im wondering if there is a better/easier method to do this. (On a side note, if anyone knows whether its better to use file_get_contents instead of multiple calls to fgets, id be interested to know)

    Read the article

  • Extract string between matching braces in Perl

    - by Srilesh
    My input file is as below : HEADER {ABC|*|DEF {GHI 0 1 0} {{Points {}}}} {ABC|*|DEF {GHI 0 2 0} {{Points {}}}} {ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}} {ABC|*|XYZ:ghi:jkl {JKL 0 372 0} {{Points {}}}} {ABC|*|XYZ:mno:pqr {GHI 0 34 0} {{Points {}}}} { ABC|*|XYZ:abc:pqr {GHI 0 68 0} {{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}} } TRAILER I want to extract the file into an array as below : $array[0] = "{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}" $array[1] = "{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}" $array[2] = "{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}" .. .. $array[5] = "{ ABC|*|XYZ:abc:pqr {GHI 0 68 0} {{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}} }" Which means, I need to match the first opening curly brace with its closing curly brace and extract the string in between. I have checked the below link, but this doesnt apply to my question. http://stackoverflow.com/questions/413071/regex-to-get-string-between-curly-braces-i-want-whats-between-the-curly-braces I am trying but would really help if someone can assist me with their expertise ... Thanks Sri ...

    Read the article

  • PHP and curl for fetching currency rate from Yahoo Finance

    - by gregj
    Hello, I wrote the following php snippet to fetch the currency conversion rate from Yahoo Finance.Im using curl to fetch the data. Suppose, i want to convert from US dollars (USD) to Indian National Rupee (INR),then the url is http://in.finance.yahoo.com/currency/convert?amt=1&from=USD&to=INR&submit= and the Indian Rupee value is shown as 45.225. However,if i run my code, the value im getting is 452.25. Why this discrepancy ? <?php $amount = $_GET['amount']; $from = $_GET['from']; $to = $_GET['to']; $url = "http://in.finance.yahoo.com/currency/convert?amt=".$amount."&from=".$from."&to=".$to; $handle = curl_init($url); curl_setopt ($handle, CURLOPT_RETURNTRANSFER, true); $data = curl_exec($handle); if(preg_match_all('/<td class="yfnc_tabledata1"><b>(?:[1-9]\d+|\d)(?:\.\d\d)?/',$data,$matches)) { print_r($matches[0][1]); } else { echo "Not found !"; } curl_close($handle); ?> Is there something wrong with my regex ? Please help. Thank You.

    Read the article

  • Generate a table of contents from HTML with Python

    - by Oli
    I'm trying to generate a table of contents from a block of HTML (not a complete file - just content) based on its <h2> and <h3> tags. My plan so far was to: Extract a list of headers using beautifulsoup Use a regex on the content to place anchor links before/inside the header tags (so the user can click on the table of contents) -- There might be a method for replacing inside beautifulsoup? Output a nested list of links to the headers in a predefined spot. It sounds easy when I say it like that, but it's proving to be a bit of a pain in the rear. Is there something out there that does all this for me in one go so I don't waste the next couple of hours reinventing the wheel? A example: <p>This is an introduction</p> <h2>This is a sub-header</h2> <p>...</p> <h3>This is a sub-sub-header</h3> <p>...</p> <h2>This is a sub-header</h2> <p>...</p>

    Read the article

  • How to prevent multiple registrations?

    - by GG.
    I develop a political survey website where anyone can vote once. Obviously I have to prevent multiple registrations for the survey remains relevant. Already I force every user to login with their Google, Facebook or Twitter account. But they can authenticate 3 times if they have an account on each, or authenticate with multiple accounts of the same platform (I have 3 accounts on Google). So I thought also store the IP address, but they can still go through a proxy... I thought also keep the HTTP User Agent with PHP's get_browser(), although they can still change browsers. I can extract the OS with a regex, to change OS is less easier than browsers. And there is also geolocation, for example with the Google Map API. So to summarize, several ideas: 1 / SSO Authentication (I keep the email) 2 / IP Address 3 / HTTP User Agent 4 / Geolocation with an API Have you any other ideas that I did not think? How to embed these tests? Execute in what order? Have you already deploy this kind of solution?

    Read the article

  • How to parse out base file name using Script-Fu

    - by ongle
    Using Gimp 2.6.6 for MAC OS X (under X11) as downloaded from gimp.org. I'm trying to automate a boring manual process with Script-Fu. I needed to parse the image file name to save off various layers as new files using a suffix on the original file name. My original attempts went like this but failed because (string-search ...) doesn't seem to be available under 2.6 (a change to the scripting engine?). (set! basefilename (substring filename 0 (string-search "." filename))) Then I tried to use this information to parse out the base file name using regex but (re-match-nth ...) is not recognized either. (if (re-match "^(.*)[.]([^.]+)$" filename buffer) (set! basefilename (re-match-nth orig-name buffer 1)) ) And while pulling the value out of the vector ran without error, the resulting value is not considered a string when it is passed into (string-append ...). (if (re-match "^(.*)[.]([^.]+)$" filename buffer) (set! basefilename (vector-ref buffer 1)) ) So I guess my question is, how would I parse out the base file name?

    Read the article

  • Build number increment not reflected in AssemblyVersion

    - by awshepard
    I've browsed through some of the discussion on auto-incrementing build numbers, but in the impatience of youth decided to roll my own and re-invent the wheel. I know there are probably better ways to go about this (which I'm definitely going to investigate), but my question centers more around the Assembly and/or Version classes. My approach was to write a separate exe (BuildIncrementer) that takes a command line parameter for file name, does a regex match on the contents to grab the [assembly: AssemblyVersion...] string, do the modifications that I want (increment the build number, etc.), then write the contents back to the file. This approach works as-is. The next thing I did was in the project that I wanted to use this on, I set up a pre-build command line that is simply the command to execute that BuildIncrementer.exe on this project's AssemblyInfo.cs file. This too works, updating the assembly info as desired. The problem comes when I run the project, it sends an email containing the current version, obtained with Assembly.GetExecutingAssembly().GetName().Version.ToString(). BUT, the version showing up is the previous version. When my AssemblyInfo.cs says [assembly: AssemblyVersion("1.0.2.49667")], I get sent 1.0.1.45660, which was the previous build. Anyone have any ideas why that might be?

    Read the article

  • What is the best way to deserialize generics written with a different version of a signed assembly?

    - by Rick Minerich
    In other cases it has been suggested that you simply add a SerializationBinder which removes the version from the assembly type. However, when using generic collections of a type found in a signed assembly, that type is strictly versioned based on its assembly. Here is what I've found works. internal class WeaklyNamedAssemblyBinder : SerializationBinder { public override Type BindToType(string assemblyName, string typeName) { ResolveEventHandler handler = new ResolveEventHandler(CurrentDomain_AssemblyResolve); AppDomain.CurrentDomain.AssemblyResolve += handler; Type returnedType; try { AssemblyName asmName = new AssemblyName(assemblyName); var assembly = Assembly.Load(asmName); returnedType = assembly.GetType(typeName); } catch { returnedType = null; } finally { AppDomain.CurrentDomain.AssemblyResolve -= handler; } return returnedType; } Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args) { string truncatedAssemblyName = args.Name.Split(',')[0]; Assembly assembly = Assembly.Load(truncatedAssemblyName); return assembly; } } However, causing the binding process to change globally seems rather dangerous to me. Strange things could happen if serialization was happening in multiple threads. Perhaps a better solution is to do some regex manipulation of the typeName? What do you think?

    Read the article

  • Unable to render php files in browser

    - by p1
    Hello, I am very new to php, and I am trying to develop a facebook application using php. I am using Joyent as my hosting platform. Currently, I am trying to do some simple scripts in php and then build on them. However I am unable to see any php files being rendered properly in my application. For eg: I have a simple script called phpinfo.php: If I execute this on terminal like php phpinfo.php , I can see all the configurations. However if I try to access the same file as http://xxxxxx.facebook.joyent.us/phpinfo.php, I get : Internal Server Error The server encountered an internal error or misconfiguration and was unable to complete your request. Even if I rename this file to index.php its still the same. However I am able to access other html files [index.html] on the same location . These are some of my php settings: These are some of the settings: [fbkusoni:~/web/public] aafhe7vh$ php phpinfo.php | grep On allow_url_fopen = On = On auto_globals_jit = On = On enable_dl = On = On file_uploads = On = On ignore_repeated_errors = On = On ignore_repeated_source = On = On implicit_flush = On = On log_errors = On = On register_argc_argv = On = On report_memleaks = On = On y2k_compliance = On = On Multibyte regex (oniguruma) backtrack check = On mysql.allow_persistent = On = On session.bug_compat_warn = On = On session.use_cookies = On = On suhosin.cookie.cryptdocroot = On = On suhosin.cookie.cryptua = On = On suhosin.mt_srand.ignore = On = On suhosin.protectkey = On = On suhosin.server.encode = On = On suhosin.server.strip = On = On suhosin.session.cryptdocroot = On = On suhosin.session.cryptua = On = On suhosin.session.encrypt = On = On suhosin.srand.ignore = On = On suhosin.stealth = On = On The answer might be very naive, but I am just trying to get started and looking for any suggestions regarding this and also using Joyent and cakephp to develop facebook applications. Thanks.

    Read the article

  • C++ Returning Multiple Items

    - by Travis Parks
    I am designing a class in C++ that extracts URLs from an HTML page. I am using Boost's Regex library to do the heavy lifting for me. I started designing a class and realized that I didn't want to tie down how the URLs are stored. One option would be to accept a std::vector<Url> by reference and just call push_back on it. I'd like to avoid forcing consumers of my class to use std::vector. So, I created a member template that took a destination iterator. It looks like this: template <typename TForwardIterator, typename TOutputIterator> TOutputIterator UrlExtractor::get_urls( TForwardIterator begin, TForwardIterator end, TOutputIterator dest); I feel like I am overcomplicating things. I like to write fairly generic code in C++, and I struggle to lock down my interfaces. But then I get into these predicaments where I am trying to templatize everything. At this point, someone reading the code doesn't realize that TForwardIterator is iterating over a std::string. In my particular situation, I am wondering if being this generic is a good thing. At what point do you start making code more explicit? Is there a standard approach to getting values out of a function generically?

    Read the article

  • How can I get all content within <td> tag using a HTML Agility Pack?

    - by Bob Dylan
    So I'm writing an application that will do a little screen scrapping. I'm using the HTML Agility Pack to load an entire HTML page into an instance of HtmlDocoument called doc. Now I want to parse that doc, looking for this: <table border="0" cellspacing="3"> <tr><td>First rows stuff</td></tr> <tr> <td> The data I want is in here <br /> and it's seperated by these annoying <br /> 's. No id's, classes, or even a single <p> tag. </p> Just a bunch of <br /> tags. </td> </tr> </table> So I just need to get the data within the 2nd row. How can I do this? Should I use a regex or something else?

    Read the article

  • Please take a stab at this VB.Net Oracle-related sample and help me with String.Format.

    - by Hamish Grubijan
    If the database is not Oracle, it is MS SQl 2008. My task: if Oracle, add two more parameters when calling a stored proc. Oracle and MSFT stored procs are generated; Oracle ones have 3 extra parameters: Vret_val out number, Vparam2 in out number, Vparam3 in out number, ... the rest (The are not actually named Vparam2 and Vparam3, but this should not matter). So, the code for a helper VB.Net class that calls a stored proc: Imports System.Data.Odbc Imports System.Configuration Dim objCon As OdbcConnection = Nothing Dim objAdapter As OdbcDataAdapter Dim cmdCommand As New OdbcCommand Dim objDataTable As DataTable Dim sconnection As String Try sconnection = mConnectionString objAdapter = New OdbcDataAdapter objCon = New OdbcConnection(sconnection) objCon.Open() objAdapter.SelectCommand = cmdCommand objAdapter.SelectCommand.Connection = objCon objAdapter.SelectCommand.CommandType = CommandType.StoredProcedure objAdapter.SelectCommand.CommandTimeout = Globals.mReportTimeOut If Not mIsOracle Then objAdapter.SelectCommand.CommandText = String.Format("{{call {0}}}", spName) Else Dim returnValue As New OdbcParameter returnValue.Direction = ParameterDirection.Output returnValue.ParameterName = "@Vret_val" returnValue.OdbcType = OdbcType.Numeric objAdapter.SelectCommand.Parameters.Add(returnValue) objAdapter.SelectCommand.CommandText = String.Format("{{call {0}(?)}}", spName) End If Try objDataTable = New DataTable(spName) objAdapter.Fill(objDataTable) Catch ex As Exception ... Question: I am puzzled as to what String.Format("{{call {0}(?)}}", spName) does, in particular the (?) part. My understanding of the String.Format is that it will simply replace {0} with spName. The {{, }}, and (?) do throw me off because { reminds me of formatting, (?) hints at some advanced regex use. Unfortunately I am getting little help from a key person who is on vacation without a leash [smart]phone. I am guessing that I simply add 5 more lines for each additional parameter, and change String.Format("{{call {0}(?)}}", spName) to String.Format("{{call {0}(?,?,?)}}", spName). I forgot to mention that I am coding this "blindly" - I have a compiler to help me, but no environment set up to test this. This will be over in a few days, but I need to do my best to try finishing it on time :) Thanks.

    Read the article

  • How to Use XSLT to Replace Coordinate Separator With List of Tuples?

    - by kuloch
    I have a space-separated list of coordinate tuples. Each tuple consists of a space-separated list of 2-dimensional coordinates. E.g. "1.1 2.8 1.2 2.9" represents a line from POINT(1.1 2.8) to POINT(1.2 2.9). I need this to instead be "1.1,2.8 1.2,2.9". How would I use XSLT to perform the replacement of space-to-comma between pairs of numbers? I have the "string(gml:LinearRing/gml:posList)". This is being used on a Java Web Service that spits out GML 3.1.1 features with geometries. The service supports optional KML output, by using XSLT to transform the GML document into a KML document (at least, the chunks deemed "important"). I am locked into XSLT 1.0, so regex from XSLT 2.0 is not an option. I am aware that GML uses lat/lon while KML uses lon/lat. That's being handled before XSLT, though it would be nice to have that also done with XSLT.

    Read the article

  • What can I do about ambigous wildcard patterns in Struts?

    - by Hanno Fietz
    I have a problem finding the right wildcard pattern to extract parts of my URL into action parameters in Struts. This is how I set up the action. The intent of the pattern is to capture the last two path elements and then everything that might precede them. <action name="**/*/*" class="com.example.ObjectAction"> <param name="filter">{1}</param> <param name="type">{2}</param> <param name="id">{3}</param> </action> Calling it with the URL channels/123/transmissions/456 I get the following result (the action just sets the input parameters on a POJO and returns that as XML): <result> <filter>channels/123/transmissions</filter> <id/> <type>456</type> </result> It should be: <result> <filter>channels/123</filter> <id>456</id> <type>transmissions</type> </result> Now, because ** matches all characters including the slash, I guess my pattern allows more than one way to match the URL, and Struts happens to pick one that leaves the id empty. Is the behaviour for multiple possible matches defined somewhere? Can I make the pattern less ambigous? Are there alternative ways of doing this? I'm running Struts 2.0.8. Upgrading to 2.1.9 would give me regex matching, but I got into trouble with Struts' dependencies and my OSGi environment when I went past 2.0.8, so I'd like to stick to that version for now.

    Read the article

  • Raising event in custom control.

    - by SeSSiZ
    Hi all, I'm writing a custom textblock control thats populate hyperlinks and raises event when clicked to hyperlink. I wrote this code but I got stucked. My code is : Imports System.Text.RegularExpressions Public Class CustomTextBlock Inherits TextBlock Public Event Klik As EventHandler(Of EventArgs) Public ReadOnly InlineCollectionProperty As DependencyProperty = DependencyProperty.Register("InlineCollection", GetType(String), GetType(CustomTextBlock), New PropertyMetadata(New PropertyChangedCallback(AddressOf CustomTextBlock.InlineChanged))) Private Shared Sub InlineChanged(ByVal sender As DependencyObject, ByVal e As DependencyPropertyChangedEventArgs) DirectCast(sender, CustomTextBlock).Inlines.Clear() Dim kelimeler = Split(e.NewValue, " ") For i = 0 To kelimeler.Length - 1 If Regex.Match(kelimeler(i), "(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&amp;:/~\+#]*[\w\-\@?^=%&amp;/~\+#])?").Success Then Dim x = New Hyperlink(New Run(kelimeler(i))) x.AddHandler(Hyperlink.ClickEvent, New RoutedEventHandler(AddressOf t_Click)) x.ToolTip = kelimeler(i) x.Tag = kelimeler(i) DirectCast(sender, CustomTextBlock).Inlines.Add(x) If Not i = kelimeler.Length Then DirectCast(sender, CustomTextBlock).Inlines.Add(" ") Else DirectCast(sender, CustomTextBlock).Inlines.Add(kelimeler(i)) If Not i = kelimeler.Length Then DirectCast(sender, CustomTextBlock).Inlines.Add(" ") End If ''//Console.WriteLine(kelime(i).ToString.StartsWith("@")) Next kelimeler = Nothing End Sub Public Property InlineCollection As String Get Return DirectCast(GetValue(InlineCollectionProperty), String) End Get Set(ByVal value As String) SetValue(InlineCollectionProperty, value) End Set End Property Private Shared Sub t_Click(ByVal sender As Hyperlink, ByVal e As System.Windows.RoutedEventArgs) e.Handled = True RaiseEvent Klik(sender, EventArgs.Empty) End Sub End Class This code gives error at RaiseEvent Klik(sender, EventArgs.Empty) Error is : Cannot refer to an instance member of a class from within a shared method or shared member initializer without an expliticit instance of the class. Thanks for your answers, Alper

    Read the article

  • PHP: Best solution for links breaking in a mod_rewrite app

    - by psil
    I'm using mod rewrite to redirect all requests targeting non-existent files/directories to index.php?url=* This is surely the most common thing you do with mod_rewrite yet I have a problem: Naturally, if the page url is "mydomain.com/blog/view/1", the browser will look for images, stylesheets and relative links in the "virtual" directory "mydomain.com/blog/view/". Problem 1: Is using the base tag the best solution? I see that none of the PHP frameworks out there use the base tag, though. I'm currently having a regex replace all the relative links to point to the right path before output. Is that "okay"? Problem 2: It is possible that the server doesn't support mod_rewrite. However, all public files like images, stylesheets and the requests collector index.php are located in the directory /myapp/public. Normally mod_rewrite points all request to /public so it seems as if public was actually the root directory too all users. However if there is no mod_rewrite, I then have to point the users to /public from the root directory with a header() call. That means, however that all links are broken again because suddenly all images, etc. have to be called via /public/myimage.jpg Additional info: When there is no mod_rewrite the above request would look like this: mydomain.com/public/index.php/blog/view/1 What would be the best solutions for both problems?

    Read the article

  • general learning methodology

    - by momo
    just wanted to hear on the different general learning paths people embark on when learning a new language/framework. the one i currently use, which is how i learned bash and am currently learning python, is: instant hacking tutorial (very short tutorial introducing the basic syntax, variable declaration, loops, data types, etc. and how they are generally used) in depth tutorial with good programming style and slightly topic-specific (e.g. Mark Pilgrim's Dive into Python), important topics for me personally are regex methods, file IO, and ways the different data types are utilized best (i wrote a very primitive bayesian spam filter using python's dictionaries to keep track of word occurrences) spaced-repition of syntax or short recipes (i use anki, with questions like 'create dictionary with filename and filesize metadata, human-readable' or simpler ones like 'match 0 - 3 occurences of the letter M in a string', or 'return/create an iterator from two sequences') the use of spaced-repitition has been invaluable, and i credit it with the ease that i can recall/create python algorithms. however, i've recently started looking into django, and i've found that spaced-repitition, at least in my case, doesn't work very well for learning a framework, it works best with short code recipes (either that or i should start looking into more basic django framework tutorials). the problem i'm encountering is that since framework programming is not only algorithms, but actually learning the API, which can be quite complex since you have to learn all the methods, modules, the places where they are stored, and the sequence of which things have to be done. for ex. in django to start a project that deals with polls (from the django tutorial), one has to create the project, edit the settings.py file, create the polls app, edit the models.py file (which requires knowing the classes that are present in the module models), edit the urls.py file, etc. i found that my spaced-repition method didn't work very well for this type of learning, so i wanted to ask you guys what method(s) you use for learning the different frameworks/APIs.

    Read the article

  • Project management: Implementing custom errors in VS compilation process

    - by David Lively
    Like many architects, I've developed coding standards through years of experience to which I expect my developers to adhere. This is especially a problem with the crowd that believes that three or four years of experience makes you a senior-level developer.Approaching this as a training and code review issue has generated limited success. So, I was thinking that it would be great to be able to add custom compile-time errors to the build process to more strictly enforce this and other guidelines. For instance, we use stored procedures for ALL database access, which provides procedure-level security, db encapsulation (table structure is hidden from the app), and other benefits. (Note: I am not interested in starting a debate about this.) Some developers prefer inline SQL or parametrized queries, and that's fine - on their own time and own projects. I'd like a way to add a compilation check that finds, say, anything that looks like string sql = "insert into some_table (col1,col2) values (@col1, @col2);" and generates an error or, in certain circumstances, a warning, with a message like Inline SQL and parametrized queries are not permitted. Or, if they use the var keyword var x = new MyClass(); Variable definitions must be explicitly typed. Do Visual Studio and MSBuild provide a way to add this functionality? I'm thinking that I could use a regular expression to find unacceptable code and generate the correct error, but I'm not sure what, from a performance standpoint, is the best way to to integrate this into the build process. We could add a pre- or post-build step to run a custom EXE, but how can I return line- and file-specifc errors? Also, I'd like this to run after compilation of each file, rather than post-link. Is a regex the best way to perform this type of pattern matching, or should I go crazy and run the code through a C# parser, which would allow node-level validation via the parse tree? I'd appreciate suggestions and tales of prior experience.

    Read the article

  • find word and score based on positions

    - by ryder1211212
    hey guys i have a textfile i have divided it into 4 parts. i want to search each part for the words that appear in each part and score that word exmaple welcome to the national basketball finals,the basketball teams here today have come a long way. without much delay lets play basketball. i will want to return national = 1 as it appears only in one part etc am working on determining text context using word position. am working with c# and not very good in text processing basically if a word appears in the 4 sections it scores 4 if a word appears in the 3 sections it scores 3 if a word appears in the 2 sections it scores 2 if a word appears in the 1 section it scores 1 thanks in advance so far i have this var s = "welcome to the national basketball finals,the basketball teams here today have come a long way. without much delay lets play basketball. "; var numberOfParts = 4; var eachPartLength = s.Length / numberOfParts; var parts = new List<string>(); var words = Regex.Split(s, @"\W").Where(w => w.Length > 0); // this splits all words, removes empty strings var wordsIndex = 0; for (int i = 0; i < numberOfParts; i++) { var sb = new StringBuilder(); while (sb.Length < eachPartLength && wordsIndex < words.Count()) { sb.AppendFormat("{0} ", words.ElementAt(wordsIndex)); wordsIndex++; } // here you have the part Response.Write("[{0}]"+ sb); parts.Add(sb.ToString()); var allwords = parts.SelectMany(p => p.Split(' ').Distinct()); var wordsInAllParts = allwords.Where(w => parts.All(p => p.Contains(w))).Distinct();

    Read the article

  • Best way to convert a Unicode URL to ASCII (UTF-8 percent-escaped) in Python?

    - by benhoyt
    I'm wondering what's the best way -- or if there's a simple way with the standard library -- to convert a URL with Unicode chars in the domain name and path to the equivalent ASCII URL, encoded with domain as IDNA and the path %-encoded, as per RFC 3986. I get from the user a URL in UTF-8. So if they've typed in http://?.ws/? I get 'http://\xe2\x9e\xa1.ws/\xe2\x99\xa5' in Python. And what I want out is the ASCII version: 'http://xn--hgi.ws/%E2%99%A5'. What I do at the moment is split the URL up into parts via a regex, and then manually IDNA-encode the domain, and separately encode the path and query string with different urllib.quote() calls. # url is UTF-8 here, eg: url = u'http://?.ws/?'.encode('utf-8') match = re.match(r'([a-z]{3,5})://(.+\.[a-z0-9]{1,6})' r'(:\d{1,5})?(/.*?)(\?.*)?$', url, flags=re.I) if not match: raise BadURLException(url) protocol, domain, port, path, query = match.groups() try: domain = unicode(domain, 'utf-8') except UnicodeDecodeError: return '' # bad UTF-8 chars in domain domain = domain.encode('idna') if port is None: port = '' path = urllib.quote(path) if query is None: query = '' else: query = urllib.quote(query, safe='=&?/') url = protocol + '://' + domain + port + path + query # url is ASCII here, eg: url = 'http://xn--hgi.ws/%E3%89%8C' Is this correct? Any better suggestions? Is there a simple standard-library function to do this?

    Read the article

  • PHP: What is an efficient way to parse a text file containing very long lines?

    - by Shaun
    I'm working on a parser in php which is designed to extract MySQL records out of a text file. A particular line might begin with a string corresponding to which table the records (rows) need to be inserted into, followed by the records themselves. The records are delimited by a backslash and the fields (columns) are separated by commas. For the sake of simplicity, let's assume that we have a table representing people in our database, with fields being First Name, Last Name, and Occupation. Thus, one line of the file might be as follows [People] = "\Han,Solo,Smuggler\Luke,Skywalker,Jedi..." Where the ellipses (...) could be additional people. One straightforward approach might be to use fgets() to extract a line from the file, and use preg_match() to extract the table name, records, and fields from that line. However, let's suppose that we have an awful lot of Star Wars characters to track. So many, in fact, that this line ends up being 200,000+ characters/bytes long. In such a case, taking the above approach to extract the database information seems a bit inefficient. You have to first read hundreds of thousands of characters into memory, then read back over those same characters to find regex matches. Is there a way, similar to the Java String next(String pattern) method of the Scanner class constructed using a file, that allows you to match patterns in-line while scanning through the file? The idea is that you don't have to scan through the same text twice (to read it from the file into a string, and then to match patterns) or store the text redundantly in memory (in both the file line string and the matched patterns). Would this even yield a significant increase in performance? It's hard to tell exactly what PHP or Java are doing behind the scenes.

    Read the article

  • Ruby Design Problem for SQL Bulk Inserter

    - by crunchyt
    This is a Ruby design problem. How can I make a reusable flat file parser that can perform different data scrubbing operations per call, return the emitted results from each scrubbing operation to the caller and perform bulk SQL insertions? Now, before anyone gets narky/concerned, I have written this code already in a very unDRY fashion. Which is why I am asking any Ruby rockstars our there for some assitance. Basically, everytime I want to perform this logic, I create two nested loops, with custom processing in between, buffer each processed line to an array, and output to the DB as a bulk insert when the buffer size limit is reached. Although I have written lots of helpers, the main pattern is being copy pasted everytime. Not very DRY! Here is a Ruby/Pseudo code example of what I am repeating. lines_from_file.each do |line| line.match(/some regex/).each do |sub_str| # Process substring into useful format # EG1: Simple gsub() call # EG2: Custom function call to do complex scrubbing # and matching, emitting results to array # EG3: Loop to match opening/closing/nested brackets # or other delimiters and emit results to array end # Add processed lines to a buffer as SQL insert statement @buffer << PREPARED INSERT STATEMENT # Flush buffer when "buffer size limit reached" or "end of file" if sql_buffer_full || last_line_reached @dbc.insert(SQL INSERTS FROM BUFFER) @buffer = nil end end I am familiar with Proc/Lambda functions. However, because I want to pass two separate procs to the one function, I am not sure how to proceed. I have some idea about how to solve this, but I would really like to see what the real Rubyists suggest? Over to you. Thanks in advance :D

    Read the article

  • Rewrite a URL that's already been redirected?

    - by Jack
    Hi guys, I'm running an Apache2 web server with a dynamic IP address. I bought exampledomain.net, and I use no-ip.com's domain-update service to redirect any visitors to my current ip address (endnote #1). For example, someone visits exampledomain.net and they get redirected to 73.181.57.34. It works like a charm. However, it isn't all that user-friendly. Can I rewrite the redirected, ip-address URL? I tried these rewrite rules in the root folder's .htaccess... RewriteEngine On RewriteCond %{HTTP_HOST} ^73\.181\.57\.34:88 RewriteRule ^(.*)$ http://www.exampledomain.net/$1 [L,NC] # I simplified the RewriteCond. I would use regex in a real situation. Of course, this creates an infinite loop. The user visits www.exampledomain.net. They're redirected to 73.181.57.34:88 by no-ip. Apache redirects them to www.exampledomain.net which redirects them back to 73.181.57.34:88... so on and so forth. I'm a noob when it comes to rewriting, but is there a way to rewrite a URL without redirecting? I tried these rewrite rules too (a shot in the dark)... RewriteEngine On RewriteCond %{HTTP_HOST} ^73\.181\.57\.34:88 RewriteRule ^(.*)$ my.exampledomain.net/$1 [L,NC] # I'd read that Apache replied with a redirect header when you include http Thanks! (1) No-IP works like this: You download and install their dynamic update client on your server. Every couple of minutes it polls your server for its current external ip address. If it's changed, it updates your server's ip address in no-ip's records.

    Read the article

  • php parsing speed optimization

    - by Arnaud
    I would like to add tooltip or generate link according to the element available in the database, for exemple if the html page printed is: to reboot your linux host in single-user mode you can ... I will use explode(" ", $row[page]) and the idea is now to lookup for every single word in the page to find out if they have a related referance in this exemple let's say i've got a table referance an one entry for reboot and one for linux reboot: restart a computeur linux: operating system now my output will look like (replaced < and by @) to @a href="ref/reboot"@reboot@/a@ your @a href="ref/linux"@linux@/a@ host in single-user mode you can ... Instead of have a static list generated when I saved the content, if I add more keyword in the future, then the text will become more interactive. My main concerne and question is how can I create a efficient enough process to do it ? Should I store all the db entry in an array and compare them ? Do an sql query for each word (seems to be crazy) Dump the table in a file and use a very long regex or a "grep -f pattern data" way of doing it? Or or or or I'm sure it must be a better way of doing it, just don't have a clue about it, or maybe this will be far too resource un-friendly and I should avoid doing such things. Cheers!

    Read the article

< Previous Page | 169 170 171 172 173 174 175 176 177 178 179 180  | Next Page >