Search Results

Search found 3176 results on 128 pages for 'parsing'.

Page 13/128 | < Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >

Haskell: Best tools to validate textual input?

- by Ana

In Haskell, there are a few different options to "parsing text". I know of Alex & Happy, Parsec and Attoparsec. Probably some others. I'd like to put together a library where the user can input pieces of a URL (scheme e.g. HTTP, hostname, username, port, path, query, etc.) I'd like to validate the pieces according to the ABNF specified in RFC 3986. In other words, I'd like to put together a set of functions such as: validateScheme :: String -> Bool validateUsername :: String -> Bool validatePassword :: String -> Bool validateAuthority :: String -> Bool validatePath :: String -> Bool validateQuery :: String -> Bool What is the most appropriate tool to use to write these functions? Alex's regexps is very concise, but it's a tokenizer and doesn't straightforwardly allow you to parse using specific rules, so it's not quite what I'm looking for, but perhaps it can be wrangled into doing this easily. I've written Parsec code that does some of the above, but it looks very different from the original ABNF and unnecessarily long. So, there must be an easier and/or more appropriate way. Recommendations?

Read the article
Sentence Tree v/s Words List

- by Rohit Jose

I was recently tasked with building a Name Entity Recognizer as part of a project. The objective was to parse a given sentence and come up with all the possible combinations of the entities. One approach that was suggested was to keep a lookup table for all the know connector words like articles and conjunctions, remove them from the words list after splitting the sentence on the basis of the spaces. This would leave out the Name Entities in the sentence. A lookup is then done for these identified entities on another lookup table that associates them to the entity type, for example if the sentence was: Remember the Titans was a movie directed by Boaz Yakin, the possible outputs would be: {Remember the Titans,Movie} was {a movie,Movie} directed by {Boaz Yakin,director} {Remember the Titans,Movie} was a movie directed by Boaz Yakin {Remember the Titans,Movie} was {a movie,Movie} directed by Boaz Yakin {Remember the Titans,Movie} was a movie directed by {Boaz Yakin,director} Remember the Titans was {a movie,Movie} directed by Boaz Yakin Remember the Titans was {a movie,Movie} directed by {Boaz Yakin,director} Remember the Titans was a movie directed by {Boaz Yakin,director} Remember the {the titans,Movie,Sports Team} was {a movie,Movie} directed by {Boaz Yakin,director} Remember the {the titans,Movie,Sports Team} was a movie directed by Boaz Yakin Remember the {the titans,Movie,Sports Team} was {a movie,Movie} directed by Boaz Yakin Remember the {the titans,Movie,Sports Team} was a movie directed by {Boaz Yakin,director} The entity lookup table here would contain the following data: Remember the Titans=Movie a movie=Movie Boaz Yakin=director the Titans=Movie the Titans=Sports Team Another alternative logic that was put forward was to build a crude sentence tree that would contain the connector words in the lookup table as parent nodes and do a lookup in the entity table for the leaf node that might contain the entities. The tree that was built for the sentence above would be: The question I am faced with is the benefits of the two approaches, should I be going for the tree approach to represent the sentence parsing, since it provides a more semantic structure? Is there a better approach I should be going for solving it?

Read the article
How do I parse a header with two different version [ID3] avoiding code duplication?

- by user66141

I really hope you can give me some interesting viewpoints for my situation, because I am not satisfied with my current approach. I am writing an MP3 parser, starting with an ID3v2 parser. Right now I`m working on the extended header parsing, my issue is that the optional header is defined differently in version 2.3 and 2.4 of the tag. The 2.3 version optional header is defined as follows: struct ID3_3_EXTENDED_HEADER{ DWORD dwExtHeaderSize; //Extended header size (either 6 or 8 bytes , excluded) WORD wExtFlags; //Extended header flags DWORD dwSizeOfPadding; //Size of padding (size of the tag excluding the frames and headers) }; While the 2.4 version is defined : struct ID3_4_EXTENDED_HEADER{ DWORD dwExtHeaderSize; //Extended header size (synchsafe int) BYTE bNumberOfFlagBytes; //Number of flag bytes BYTE bFlags; //Flags }; How could I parse the header while minimizing code duplication? Using two different functions to parse each version sounds less great, using a single function with a different flow for each occasion is similar, any good practices for this kind of issues ? Any tips for avoiding code duplication? Any help would be appreciated.

Read the article
Any good reason open files in text mode?

- by Tinctorius

(Almost-)POSIX-compliant operating systems and Windows are known to distinguish between 'binary mode' and 'text mode' file I/O. While the former mode doesn't transform any data between the actual file or stream and the application, the latter 'translates' the contents to some standard format in a platform-specific manner: line endings are transparently translated to '\n' in C, and some platforms (CP/M, DOS and Windows) cut off a file when a byte with value 0x1A is found. These transformations seem a little useless to me. People share files between computers with different operating systems. Text mode would cause some data to be handled differently across some platforms, so when this matters, one would probably use binary mode instead. As an example: while Windows uses the sequence CR LF to end a line in text mode, UNIX text mode will not treat CR as part of the line ending sequence. Applications would have to filter that noise themselves. Older Mac versions only use CR in text mode as line endings, so neither UNIX nor Windows would understand its files. If this matters, a portable application would probably implement the parsing by itself instead of using text mode. Implementing newline interpretation in the parser might also remove some overhead of using text mode, as buffers would need to be rewritten (and possibly resized) before returning to the application, while this may be less efficient than when it would happen in the application instead. So, my question is: is there any good reason to still rely on the host OS to translate line endings and file truncation?

Read the article
How exactly is an Abstract Syntax Tree created?

- by Howcan

I think I understand the goal of an AST, and I've build a couple of tree structures before, but never an AST. I'm mostly confused because the nodes are text and not number, so I can't think of a nice way to input a token/string as I'm parsing some code. For example, when I looked at diagrams of AST's, the variable and its value were leaf nodes to an equal sign. This makes perfect sense to me, but how would I go about implementing this? I guess I can do it case by case, so that when I stumble upon an "=" I use that as a node, and add the value parsed before the "=" as the leaf. It just seems wrong, because I'd probably have to make cases for tons and tons of things, depending on the syntax. And then I came upon another problem, how is the tree traversed? Do I go all the way down the height, and go back up a node when I hit the bottom, and do the same for it's neighbor? I've seen tons of diagrams on ASTs, but I couldn't find a fairly simple example of one in code, which would probably help.

Read the article
Extracting useful information from free text

- by insta

We filter and analyse seats for events. Apparently writing a domain query language for the floor people isn't an option. I'm using C# 4.0 & .NET 4.0, and have relatively free reign to use whatever open-source tools are available. </background-info> If a request comes in for "FLOOR B", the sales people want it to show up if they've entered "FLOOR A-FLOOR F" in a filter. The only problem I have is that there's absolutely no structure to the parsed parameters. I get the string already concatenated (it actually uses a tilde instead of dash). Examples I've seen so far with matches after each: 101WC-199WC (needs to match 150WC) AAA-ZZZ (needs to match AAA, BBB, ABC but not BB) LOGE15-LOGE20 (needs to match LOGE15 but not LOGE150) At first I wanted to try just stripping off the numeric part of the lower and upper, and then incrementing through that. The problem I have is that only some entries have numbers, sometimes the numbers AND letters increment, sometimes its all letters that increment. Since I can't impose any kind of grammar to use (I really wanted [..] expansion syntax), I'm stuck using these entries. Are there any suggestions for how to approach this parsing problem?

Read the article
Any good reason to open files in text mode?

- by Tinctorius

(Almost-)POSIX-compliant operating systems and Windows are known to distinguish between 'binary mode' and 'text mode' file I/O. While the former mode doesn't transform any data between the actual file or stream and the application, the latter 'translates' the contents to some standard format in a platform-specific manner: line endings are transparently translated to '\n' in C, and some platforms (CP/M, DOS and Windows) cut off a file when a byte with value 0x1A is found. These transformations seem a little useless to me. People share files between computers with different operating systems. Text mode would cause some data to be handled differently across some platforms, so when this matters, one would probably use binary mode instead. As an example: while Windows uses the sequence CR LF to end a line in text mode, UNIX text mode will not treat CR as part of the line ending sequence. Applications would have to filter that noise themselves. Older Mac versions only use CR in text mode as line endings, so neither UNIX nor Windows would understand its files. If this matters, a portable application would probably implement the parsing by itself instead of using text mode. Implementing newline interpretation in the parser might also remove some overhead of using text mode, as buffers would need to be rewritten (and possibly resized) before returning to the application, while this may be less efficient than when it would happen in the application instead. So, my question is: is there any good reason to still rely on the host OS to translate line endings and file truncation?

Read the article
High-level strategy for distinguishing a regular string from invalid JSON (ie. JSON-like string detection)

- by Jonline

Disclaimer On Absence of Code: I have no code to post because I haven't started writing; was looking for more theoretical guidance as I doubt I'll have trouble coding it but am pretty befuddled on what approach(es) would yield best results. I'm not seeking any code, either, though; just direction. Dilemma I'm toying with adding a "magic method"-style feature to a UI I'm building for a client, and it would require intelligently detecting whether or not a string was meant to be JSON as against a simple string. I had considered these general ideas: Look for a sort of arbitrarily-determined acceptable ratio of the frequency of JSON-like syntax (ie. regex to find strings separated by colons; look for colons between curly-braces, etc.) to the number of quote-encapsulated strings + nulls, bools and ints/floats. But the smaller the data set, the more fickle this would get look for key identifiers like opening and closing curly braces... not sure if there even are more easy identifiers, and this doesn't appeal anyway because it's so prescriptive about the kinds of mistakes it could find try incrementally parsing chunks, as those between curly braces, and seeing what proportion of these fractional statements turn out to be valid JSON; this seems like it would suffer less than (1) from smaller datasets, but would probably be much more processing-intensive, and very susceptible to a missing or inverted brace Just curious if the computational folks or algorithm pros out there had any approaches in mind that my semantics-oriented brain might have missed. PS: It occurs to me that natural language processing, about which I am totally ignorant, might be a cool approach; but, if NLP is a good strategy here, it sort of doesn't matter because I have zero experience with it and don't have time to learn & then implement/ this feature isn't worth it to the client.

Read the article
Scripting custom drawing in Delphi application with IF/THEN/ELSE statements?

- by Jerry Dodge

I'm building a Delphi application which displays a blueprint of a building, including doors, windows, wiring, lighting, outlets, switches, etc. I have implemented a very lightweight script of my own to call drawing commands to the canvas, which is loaded from a database. For example, one command is ELP 1110,1110,1290,1290,3,8388608 which draws an ellipse, parameters are 1110x1110 to 1290x1290 with pen width of 3 and the color 8388608 converted from an integer to a TColor. What I'm now doing is implementing objects with common drawing routines, and I'd like to use my scripting engine, but this calls for IF/THEN/ELSE statements and such. For example, when I'm drawing a light, if the light is turned on, I'd like to draw it yellow, but if it's off, I'd like to draw it gray. My current scripting engine has no recognition of such statements. It just accepts simple drawing commands which correspond with TCanvas methods. Here's the procedure I've developed (incomplete) for executing a drawing command on a canvas: function DrawCommand(const Cmd: String; var Canvas: TCanvas): Boolean; type TSingleArray = array of Single; var Br: TBrush; Pn: TPen; X: Integer; P: Integer; L: String; Inst: String; T: String; Nums: TSingleArray; begin Result:= False; Br:= Canvas.Brush; Pn:= Canvas.Pen; if Assigned(Canvas) then begin if Length(Cmd) > 5 then begin L:= UpperCase(Cmd); if Pos(' ', L)> 0 then begin Inst:= Copy(L, 1, Pos(' ', L) - 1); Delete(L, 1, Pos(' ', L)); L:= L + ','; SetLength(Nums, 0); X:= 0; while Pos(',', L) > 0 do begin P:= Pos(',', L); T:= Copy(L, 1, P - 1); Delete(L, 1, P); SetLength(Nums, X + 1); Nums[X]:= StrToFloatDef(T, 0); Inc(X); end; Br.Style:= bsClear; Pn.Style:= psSolid; Pn.Color:= clBlack; if Inst = 'LIN' then begin Pn.Width:= Trunc(Nums[4]); if Length(Nums) > 5 then begin Br.Style:= bsSolid; Br.Color:= Trunc(Nums[5]); end; Canvas.MoveTo(Trunc(Nums[0]), Trunc(Nums[1])); Canvas.LineTo(Trunc(Nums[2]), Trunc(Nums[3])); Result:= True; end else if Inst = 'ELP' then begin Pn.Width:= Trunc(Nums[4]); if Length(Nums) > 5 then begin Br.Style:= bsSolid; Br.Color:= Trunc(Nums[5]); end; Canvas.Ellipse(Trunc(Nums[0]),Trunc(Nums[1]),Trunc(Nums[2]),Trunc(Nums[3])); Result:= True; end else if Inst = 'ARC' then begin Pn.Width:= Trunc(Nums[8]); Canvas.Arc(Trunc(Nums[0]),Trunc(Nums[1]),Trunc(Nums[2]),Trunc(Nums[3]), Trunc(Nums[4]),Trunc(Nums[5]),Trunc(Nums[6]),Trunc(Nums[7])); Result:= True; end else if Inst = 'TXT' then begin Canvas.Font.Size:= Trunc(Nums[2]); Br.Style:= bsClear; Pn.Style:= psSolid; T:= Cmd; Delete(T, 1, Pos(' ', T)); Delete(T, 1, Pos(',', T)); Delete(T, 1, Pos(',', T)); Delete(T, 1, Pos(',', T)); Canvas.TextOut(Trunc(Nums[0]), Trunc(Nums[1]), T); Result:= True; end; end else begin //No space found, not a valid command end; end; end; end; What I'd like to know is what's a good lightweight third-party scripting engine I could use to accomplish this? I would hate to implement parsing of IF, THEN, ELSE, END, IFELSE, IFEND, and all those necessary commands. I need simply the ability to tell the scripting engine if certain properties meet certain conditions, it needs to draw the object a certain way. The light example above is only one scenario, but the same solution needs to also be applicable to other scenarios, such as a door being open or closed, locked or unlocked, and draw it a different way accordingly. This needs to be implemented in the object script drawing level. I can't hard-code any of these scripting/drawing rules, the drawing needs to be controlled based on the current state of the object, and I may also wish to draw a light a certain shade or darkness depending on how dimmed the light is.

Read the article
JavaCC: Please give me links to "real" examples.

- by java.is.for.desktop

Hello, everyone! I know that there are many examples of JavaCC parsers here, but they all do nothing. They just accept a string, or produce parsing errors. What I need is a few examples of real parsers, which actually do something during parsing. (Such as building a DOM tree during parsing an XML string). Please help! ;)

Read the article
Generalized Bottom up Parser Combinators in Haskell

- by Panini Sai

I am wondered why there is no generalized parser combinators for Bottom-up parsing in Haskell like a Parsec combinators for top down parsing. ( I could find some research work went during 2004 but nothing after https://haskell-functional-parsing.googlecode.com/files/Ljunglof-2002a.pdf http://www.di.ubi.pt/~jpf/Site/Publications_files/technicalReport.pdf ) Is there any specific reason for not achieving it?

Read the article
Idea of an algorithm to detect a website's navigation structure?

- by Uwe Keim

Currently I am in the process of developing an importer of any existing, arbitrary (static) HTML website into the upcoming release of our CMS. While the downloading the files is solved successfully, I'm pulling my hair off when it comes to detect a site structure (pages and subpages) purely from the HTML files, without the user specifying additional hints. Basically I want to get a tree like: + Root page 1 + Child page 1 + Child page 2 + Child child page1 + Child page 3 + Root page 2 + Child page 4 + Root page 3 + ... I.e. I want to be able to detect the menu structure from the links inside the pages. This has not to be 100% accurate, but at least I want to achieve more than just a flat list. I thought of looking at multiple pages to see similar areas and identify these as menu areas and parse the links there, but after all I'm not that satisfied with this idea. My question: Can you imagine any algorithm when it comes to detecting such a structure? Update 1: What I'm looking for is not a web spider, but an algorithm do create a logical tree of the relationship of the pages to be able to create pages and subpages inside my CMS when importing them. Update 2: As of Robert's suggestion I'll solve this by starting at the root page, and then simply parse links as you go and treat every link inside a page simply as a child page. Probably I'll recurse not in a deep-first manner but rather in a breadth-first manner to get a more balanced navigation structure.

Read the article
How should I implement a command processing application?

- by Nini Michaels

I want to make a simple, proof-of-concept application (REPL) that takes a number and then processes commands on that number. Example: I start with 1. Then I write "add 2", it gives me 3. Then I write "multiply 7", it gives me 21. Then I want to know if it is prime, so I write "is prime" (on the current number - 21), it gives me false. "is odd" would give me true. And so on. Now, for a simple application with few commands, even a simple switch would do for processing the commands. But if I want extensibility, how would I need to implement the functionality? Do I use the command pattern? Do I build a simple parser/interpreter for the language? What if I want more complex commands, like "multiply 5 until >200" ? What would be an easy way to extend it (add new commands) without recompiling? Edit: to clarify a few things, my end goal would not be to make something similar to WolframAlpha, but rather a list (of numbers) processor. But I want to start slowly at first (on single numbers). I'm having in mind something similar to the way one would use Haskell to process lists, but a very simple version. I'm wondering if something like the command pattern (or equivalent) would suffice, or if I have to make a new mini-language and a parser for it to achieve my goals?

Read the article
Persisting NLP parsed data

- by tjb1982

I've recently started experimenting with NLP using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (postgres supports this and I've found it works really well). Something like this: Component ( id, POS, parent_id ) Word ( id, raw, lemma, POS, NER ) CW_Map ( component_id, word_id, position int ) But I assume there are probably many standard ways to do this depending on what kind of analysis is being done that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?

Read the article
How can I best manage making open source code releases from my company's confidential research code?

- by DeveloperDon

My company (let's call them Acme Technology) has a library of approximately one thousand source files that originally came from its Acme Labs research group, incubated in a development group for a couple years, and has more recently been provided to a handful of customers under non-disclosure. Acme is getting ready to release perhaps 75% of the code to the open source community. The other 25% would be released later, but for now, is either not ready for customer use or contains code related to future innovations they need to keep out of the hands of competitors. The code is presently formatted with #ifdefs that permit the same code base to work with the pre-production platforms that will be available to university researchers and a much wider range of commercial customers once it goes to open source, while at the same time being available for experimentation and prototyping and forward compatibility testing with the future platform. Keeping a single code base is considered essential for the economics (and sanity) of my group who would have a tough time maintaining two copies in parallel. Files in our current base look something like this: > // Copyright 2012 (C) Acme Technology, All Rights Reserved. > // Very large, often varied and restrictive copyright license in English and French, > // sometimes also embedded in make files and shell scripts with varied > // comment styles. > > > ... Usual header stuff... > > void initTechnologyLibrary() { > nuiInterface(on); > #ifdef UNDER_RESEARCH > holographicVisualization(on); > #endif > } And we would like to convert them to something like: > // GPL Copyright (C) Acme Technology Labs 2012, Some rights reserved. > // Acme appreciates your interest in its technology, please contact [email protected] > // for technical support, and www.acme.com/emergingTech for updates and RSS feed. > > ... Usual header stuff... > > void initTechnologyLibrary() { > nuiInterface(on); > } Is there a tool, parse library, or popular script that can replace the copyright and strip out not just #ifdefs, but variations like #if defined(UNDER_RESEARCH), etc.? The code is presently in Git and would likely be hosted somewhere that uses Git. Would there be a way to safely link repositories together so we can efficiently reintegrate our improvements with the open source versions? Advice about other pitfalls is welcome.

Read the article
What is the simplest human readable configuration file format?

- by Juha

Current configuration file is as follows: mainwindow.title = 'test' mainwindow.position.x = 100 mainwindow.position.y = 200 mainwindow.button.label = 'apply' mainwindow.button.size.x = 100 mainwindow.button.size.y = 30 logger.datarate = 100 logger.enable = True logger.filename = './test.log' This is read with python to a nested dictionary: { 'mainwindow':{ 'button':{ 'label': {'value':'apply'}, ... }, 'logger':{ datarate: {'value': 100}, enable: {'value': True}, filename: {'value': './test.log'} }, ... } Is there a better way of doing this? The idea is to get XML type of behavior and avoid XML as long as possible. The end user is assumed almost totally computer illiterate and basically uses notepad and copy-paste. Thus the python standard "header + variables" type is considered too difficult. The dummy user edits the config file, able programmers handle the dictionaries. Nested dictionary is chosen for easy splitting (logger does not need or even cannot have/edit mainwindow parameters).

Read the article
What is this algorithm for converting strings into numbers called?

- by CodexArcanum

I've been doing some work in Parsec recently, and for my toy language I wanted multi-based fractional numbers to be expressible. After digging around in Parsec's source a bit, I found their implementation of a floating-point number parser, and copied it to make the needed modifications. So I understand what this code does, and vaguely why (I haven't worked out the math fully yet, but I think I get the gist). But where did it come from? This seems like a pretty clever way to turn strings into floats and ints, is there a name for this algorithm? Or is it just something basic that's a hole in my knowledge? Did the folks behind Parsec devise it? Here's the code, first for integers: number' :: Integer -> Parser Integer number' base = do { digits <- many1 ( oneOf ( sigilRange base )) ; let n = foldl (\x d -> base * x + toInteger (convertDigit base d)) 0 digits ; seq n (return n) } So the basic idea here is that digits contains the string representing the whole number part, ie "192". The foldl converts each digit individually into a number, then adds that to the running total multiplied by the base, which means that by the end each digit has been multiplied by the correct factor (in aggregate) to position it. The fractional part is even more interesting: fraction' :: Integer -> Parser Double fraction' base = do { digits <- many1 ( oneOf ( sigilRange base )) ; let base' = fromIntegral base ; let f = foldr (\d x -> (x + fromIntegral (convertDigit base d))/base') 0.0 digits ; seq f (return f) Same general idea, but now a foldr and using repeated division. I don't quite understand why you add first and then divide for the fraction, but multiply first then add for the whole. I know it works, just haven't sorted out why. Anyway, I feel dumb not working it out myself, it's very simple and clever looking at it. Is there a name for this algorithm? Maybe the imperative version using a loop would be more familiar?

Read the article
How to add precedence to LALR parser like in YACC?

- by greenoldman

Please note, I am asking about writing LALR parser, not writing rules for LALR parser. What I need is... ...to mimic YACC precedence definitions. I don't know how it is implemented, and below I describe what I've done and read so far. For now I have basic LALR parser written. Next step -- adding precedence, so 2+3*4 could be parsed as 2+(3*4). I've read about precedence parsers, however I don't see how to fit such model into LALR. I don't understand two points: how to compute when insert parenthesis generator how to compute how many parenthesis the generator should create I insert generators when the symbols is taken from input and put at the stack, right? So let's say I have something like this (| denotes boundary between stack and input): ID = 5 | + ..., at this point I add open, so it gives ID = < 5 | + ..., then I read more input ID = < 5 + | 5 ... and more ID = < 5 + 5 | ; ... and more ID = < 5 + 5 ; | ... At this point I should have several reduce moves in normal LALR, but the open parenthesis does not match so I continue reading more input. Which does not make sense. So this was when problem. And about count, let's say I have such data < 2 + < 3 * 4 >. As human I can see that the last generator should create 2 parenthesis, but how to compute this? After all there could be two scenarios: ( 2 + ( 3 *4 )) -- parenthesis is used to show the outcome of generator or (2 + (( 3 * 4 ) ^ 5) because there was more input Please note that in both cases before 3 was open generator, and after 4 there was close generator. However in both cases, after reading 4 I have to reduce, so I have to know what generator "creates".

Read the article
LL(8) and left-recursion

- by Peregring-lk

I want to understand the relation between LL/LR grammars and the left-recursion problem (for any question I know parcially the answer, but I ask them as I don't know nothing, because I am a little confused now, and prefer complete answers) I'm happy with sintetized or short and direct answers (or just links solving it unambiguously): What type of language isn't LL(8) languages? LL(K) and LL(8) have problems with left-recursion? Or only LL(k) parsers? LALR(1) parser have troubles with left or right recursion? What type of troubles? Only in terms of the LL/LALR comparision. What is better, Bison (LALR(1)) or Boost.Spirit (LL(8))? (Let's suppose other features of them are irrelevant in this question) Why GCC use a (hand-made) LL(8) parser? Only for the "handling-error" problem?

Read the article
Persisting natural language processing parsed data

- by tjb1982

I've recently started experimenting with natural language processing (NLP) using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might be interesting is to store the children as an adjacency list and make good use of recursive queries (Postgres supports this and I've found it works really well). But I assume there are probably many standard ways to do this depending on what kind of analysis is being done that have been adopted by people working in the field over the years. So what are the standard persistence strategies for NLP parsed data and how are they used?

Read the article
First and Follow Sets for a Grammar

- by Aimee Jones

I'm studying for a Compiler Construction module I'm doing and I have a sample question as follows: Calculate the FIRST and FOLLOW sets for the following grammar.. S -> uBDz B -> Bv B -> w D -> EF E -> y E -> e F -> x F -> e I have tried to figure it out so far but I'm a bit unsure if I'm correct. Could someone verify if I'm doing it right, and if not, what am I missing? My answer is below: FIRST | FOLLOW S | {u} | {$} B | {w} | {y,x,v,z} D | {y,e,x} | {z} E | {y,e} | {x,z} F | {x,e} | {z}

Read the article
Generating Wrappers for REST APIs

- by Kyle

Would it be feasible to generate wrappers for REST APIs? An earlier question asked about machine readable descriptions of RESTful services addressed how we could write (and then read) API specifications in a standardized way which would lend itself well to generated wrappers. Could a first pass parser generate a decent wrapper that human intervention could fix up? Perhaps the first pass wouldn't be consistent, but would remove a lot of the grunt work and make it easy to flesh out the rest of the API and types. What would need to be considered? What's stopping people from doing this? Has it already been done and my google fu is weak for the day?

Read the article
Getting data from a webpage in a stable and efficient way

- by Mike Heremans

Recently I've learned that using a regex to parse the HTML of a website to get the data you need isn't the best course of action. So my question is simple: What then, is the best / most efficient and a generally stable way to get this data? I should note that: There are no API's There is no other source where I can get the data from (no databases, feeds and such) There is no access to the source files. (Data from public websites) Let's say the data is normal text, displayed in a table in a html page I'm currently using python for my project but a language independent solution/tips would be nice. As a side question: How would you go about it when the webpage is constructed by Ajax calls?

Read the article
Template syntax for users - is there a right way to do it?

- by RickM

Ok, I'm in the middle of building a saas system, and as part of that, the hosted clients need to be able to edit certain layout templates, baqsically just html, css and javascript files. I'm obviously going to be wanting to use a template syntax here as it would be dumb to let people execute PHP code, so in this instance template syntax does need to be used. I know that in the grand scale of things, this is a very minor thing, but what template syntax do you use, and why? Is there one that's considered better than others? I've seen all sorts being used with no real consistency, for example: Smarty Style: {$someVar} {foreach from="foo" item="bar"} {$bar.food} {/foreach} ASP Style: {% someVar %} {% foreach foo as bar %} {% bar.food %} {% endforeach %} HTML Style: <someVar> <foreach from="foo" item="bar"> <bar:food> </foreach> PyroCMS/FuelPHP "LEX" Style: {{ someVar }} {{ foreach from="foo" item="bar" }} {{ bar:food }} {{ endforeach }} Obviously these arent 100% accurate (for example, LEX is used alongside PHP for loops), and are only to give you an example of what I mean. What, in your opinion would be the best one (if any) to go with. I ask this bearing in mind that people using this are likely to be novice users. I did look around at a bunch of hosted CMS and E-Commerce systems as these seem to make use of user-editable templates, and most seem to be using some form of their own syntax. I should note that whatever style I end up going with, it will be with a custom template handler due to the complexity of the system and how template files are stored. Plus I'd not want to touch the likes of Smarty with a barge pole!

Read the article
Basic questions while making a toy calculator

- by Jwan622

I am making a calculator to better understand how to program and I had a question about the following lines of code: I wanted to make my equals sign with this C# code: private void btnEquals_Click(object sender, EventArgs e) { if (plusButtonClicked == true) { total2 = total1 + Convert.ToDouble(txtDisplay.Text); //double.Parse(txtDisplay.Text); } else if (minusButtonClicked == { total2 = total1 - double.Parse(txtDisplay.Text) } } txtDisplay.Text = total2.ToString(); total1 = 0; However, my friend said this way of writing code was superior, with changes in the minus sign. private void btnEquals_Click(object sender, EventArgs e) { if (plusButtonClicked == true) { total2 = total1 + Convert.ToDouble(txtDisplay.Text); //double.Parse(txtDisplay.Text); } else if (minusButtonClicked == true) { double d1; if(double.TryParse(txtDisplay.Text, out d1)) { total2 = total1 - d1; } } txtDisplay.Text = total2.ToString(); total1 = 0; My questions: 1) What does the "out d1" section of this minus sign code mean? 2) My assumption here is that the "TryParse" code results in fewer systems crashes? If I just use "Double.Parse" and I don't put anything in the textbox, the program will crash sometimes right?

Read the article

Search Results

Search found 3176 results on 128 pages for 'parsing'.

Page 13/128 | < Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >

- by Ana

- by Rohit Jose

- by user66141

- by Tinctorius

- by Howcan

- by insta

- by Tinctorius

- by Jonline

- by Jerry Dodge

- by java.is.for.desktop

- by Panini Sai

- by Uwe Keim

- by Nini Michaels

- by tjb1982

- by DeveloperDon

- by Juha

- by CodexArcanum

- by greenoldman

- by Peregring-lk

- by tjb1982

- by Aimee Jones

- by Kyle

- by Mike Heremans

- by RickM

- by Jwan622

< Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20 | Next Page >