Search Results

Search found 16 results on 1 pages for 'pyparsing'.

Page 1/1 | 1 

  • Parsing names with pyparsing

    - by johnthexiii
    I have a file of names and ages, john 25 bob 30 john bob 35 Here is what I have so far from pyparsing import * data = ''' john 25 bob 30 john bob 35 ''' name = Word(alphas + Optional(' ') + alphas) rowData = Group(name + Suppress(White(" ")) + Word(nums)) table = ZeroOrMore(rowData) print table.parseString(data) the output I am expecting is [['john', 25], ['bob', 30], ['john bob', 35]] Here is the stacktrace Traceback (most recent call last): File "C:\Users\mccauley\Desktop\client.py", line 11, in <module> eventType = Word(alphas + Optional(' ') + alphas) File "C:\Python27\lib\site-packages\pyparsing.py", line 1657, in __init__ self.name = _ustr(self) File "C:\Python27\lib\site-packages\pyparsing.py", line 122, in _ustr return str(obj) File "C:\Python27\lib\site-packages\pyparsing.py", line 1743, in __str__ self.strRepr = "W:(%s)" % charsAsStr(self.initCharsOrig) File "C:\Python27\lib\site-packages\pyparsing.py", line 1735, in charsAsStr if len(s)>4: TypeError: object of type 'And' has no len()

    Read the article

  • PyParsing: Not all tokens passed to setParseAction()

    - by Rosarch
    I'm parsing sentences like "CS 2110 or INFO 3300". I would like to output a format like: [[("CS" 2110)], [("INFO", 3300)]] To do this, I thought I could use setParseAction(). However, the print statements in statementParse() suggest that only the last tokens are actually passed: >>> statement.parseString("CS 2110 or INFO 3300") Match [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}] at loc 7(1,8) string CS 2110 or INFO 3300 loc: 7 tokens: ['INFO', 3300] Matched [{Suppress:("or") Re:('[A-Z]{2,}') Re:('[0-9]{4}')}] -> ['INFO', 3300] (['CS', 2110, 'INFO', 3300], {'Course': [(2110, 1), (3300, 3)], 'DeptCode': [('CS', 0), ('INFO', 2)]}) I expected all the tokens to be passed, but it's only ['INFO', 3300]. Am I doing something wrong? Or is there another way that I can produce the desired output? Here is the pyparsing code: from pyparsing import * def statementParse(str, location, tokens): print "string %s" % str print "loc: %s " % location print "tokens: %s" % tokens DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode") COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber") OR_CONJ = Suppress("or") COURSE_NUMBER.setParseAction(lambda s, l, toks : int(toks[0])) course = DEPT_CODE + COURSE_NUMBER.setResultsName("Course") statement = course + Optional(OR_CONJ + course).setParseAction(statementParse).setDebug()

    Read the article

  • Help needed with pyparsing [closed]

    - by Zearin
    Overview So, I’m in the middle of refactoring a project, and I’m separating out a bunch of parsing code. The code I’m concerned with is pyparsing. I have a very poor understanding of pyparsing, even after spending a lot of time reading through the official documentation. I’m having trouble because (1) pyparsing takes a (deliberately) unorthodox approach to parsing, and (2) I’m working on code I didn’t write, with poor comments, and a non-elementary set of existing grammars. (I can’t get in touch with the original author, either.) Failing Test I’m using PyVows to test my code. One of my tests is as follows (I think this is clear even if you’re unfamiliar with PyVows; let me know if it isn’t): def test_multiline_command_ends(self, topic): output = parsed_input('multiline command ends\n\n',topic) expect(output).to_equal( r'''['multiline', 'command ends', '\n', '\n'] - args: command ends - multiline_command: multiline - statement: ['multiline', 'command ends', '\n', '\n'] - args: command ends - multiline_command: multiline - terminator: ['\n', '\n'] - terminator: ['\n', '\n']''') But when I run the test, I get the following in the terminal: Failed Test Results Expected topic("['multiline', 'command ends']\n- args: command ends\n- command: multiline\n- statement: ['multiline', 'command ends']\n - args: command ends\n - command: multiline") to equal "['multiline', 'command ends', '\\n', '\\n']\n- args: command ends\n- multiline_command: multiline\n- statement: ['multiline', 'command ends', '\\n', '\\n']\n - args: command ends\n - multiline_command: multiline\n - terminator: ['\\n', '\\n']\n- terminator: ['\\n', '\\n']" Note: Since the output is to a Terminal, the expected output (the second one) has extra backslashes. This is normal. The test ran without issue before this piece of refactoring began. Expected Behavior The first line of output should match the second, but it doesn’t. Specifically, it’s not including the two newline characters in that first list object. So I’m getting this: "['multiline', 'command ends']\n- args: command ends\n- command: multiline\n- statement: ['multiline', 'command ends']\n - args: command ends\n - command: multiline" When I should be getting this: "['multiline', 'command ends', '\\n', '\\n']\n- args: command ends\n- multiline_command: multiline\n- statement: ['multiline', 'command ends', '\\n', '\\n']\n - args: command ends\n - multiline_command: multiline\n - terminator: ['\\n', '\\n']\n- terminator: ['\\n', '\\n']" Earlier in the code, there is also this statement: pyparsing.ParserElement.setDefaultWhitespaceChars(' \t') …Which I think should prevent exactly this kind of error. But I’m not sure. Even if the problem can’t be identified with certainty, simply narrowing down where the problem is would be a HUGE help. Please let me know how I might take a step or two towards fixing this.

    Read the article

  • Python/PyParsing: Difficulty with setResultsName

    - by Rosarch
    I think I'm making a mistake in how I call setResultsName(): from pyparsing import * DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("Dept Code") COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("Course Number") COURSE_NUMBER.setParseAction(lambda s, l, toks : int(toks[0])) course = DEPT_CODE + COURSE_NUMBER course.setResultsName("course") statement = course From IDLE: >>> myparser import * >>> statement.parseString("CS 2110") (['CS', 2110], {'Dept Code': [('CS', 0)], 'Course Number': [(2110, 1)]}) The output I hope for: >>> myparser import * >>> statement.parseString("CS 2110") (['CS', 2110], {'Course': ['CS', 2110], 'Dept Code': [('CS', 0)], 'Course Number': [(2110, 1)]}) Does setResultsName() only work for terminals?

    Read the article

  • Pyparsing CSV string with random quotes

    - by gtfx
    Hey, I have a string like the following: <118date=2010-05-09,time=16:41:27,device_id=FE-2KA3F09000049,log_id=0400147717,log_part=00,type=statistics,subtype=n/a,pri=information,session_id=o49CedRc021772,from="[email protected]",mailer="mta",client_name="example.org,[194.177.17.24]",resolved=OK,to="[email protected]",direction="in",message_length=6832079,virus="",disposition="Accept",classifier="Not,Spam",subject="=?windows-1255?B?Rlc6IEZ3OiDg5fDp5fog+fno5fog7Pf46eHp7S3u4+Tp7SE=?=" I tried using CSV module and it didn't fit, cause i haven't found a way to ignore what's quoted. Pyparsing looked like a better answer but i haven't found a way to declare all the grammars. Currently, i am using my old Perl script to parse it, but i want this written in Python. if you need my Perl snippet i will be glad to provide it. Any help is appreciated.

    Read the article

  • Python - pyparsing unicode characters

    - by mgj
    Hi..:) I tried using w = Word(printables), but it isn't working. How should I give the spec for this. 'w' is meant to process Hindi characters (UTF-8) The code specifies the grammar and parses accordingly. 671.assess :: ????? ::2 x=number + "." + src + "::" + w + "::" + number + "." + number If there is only english characters it is working so the code is correct for the ascii format but the code is not working for the unicode format. I mean that the code works when we have something of the form 671.assess :: ahsaas ::2 i.e. it parses words in the english format, but I am not sure how to parse and then print characters in the unicode format. I need this for English Hindi word alignment for purpose. The python code looks like this: # -*- coding: utf-8 -*- from pyparsing import Literal, Word, Optional, nums, alphas, ZeroOrMore, printables , Group , alphas8bit , # grammar src = Word(printables) trans = Word(printables) number = Word(nums) x=number + "." + src + "::" + trans + "::" + number + "." + number #parsing for eng-dict efiledata = open('b1aop_or_not_word.txt').read() eresults = x.parseString(efiledata) edict1 = {} edict2 = {} counter=0 xx=list() for result in eresults: trans=""#translation string ew=""#english word xx=result[0] ew=xx[2] trans=xx[4] edict1 = { ew:trans } edict2.update(edict1) print len(edict2) #no of entries in the english dictionary print "edict2 has been created" print "english dictionary" , edict2 #parsing for hin-dict hfiledata = open('b1aop_or_not_word.txt').read() hresults = x.scanString(hfiledata) hdict1 = {} hdict2 = {} counter=0 for result in hresults: trans=""#translation string hw=""#hin word xx=result[0] hw=xx[2] trans=xx[4] #print trans hdict1 = { trans:hw } hdict2.update(hdict1) print len(hdict2) #no of entries in the hindi dictionary print"hdict2 has been created" print "hindi dictionary" , hdict2 ''' ####################################################################################################################### def translate(d, ow, hinlist): if ow in d.keys():#ow=old word d=dict print ow , "exists in the dictionary keys" transes = d[ow] transes = transes.split() print "possible transes for" , ow , " = ", transes for word in transes: if word in hinlist: print "trans for" , ow , " = ", word return word return None else: print ow , "absent" return None f = open('bidir','w') #lines = ["'\ #5# 10 # and better performance in business in turn benefits consumers . # 0 0 0 0 0 0 0 0 0 0 \ #5# 11 # vHyaapaar mEmn bEhtr kaam upbhOkHtaaomn kE lIe laabhpHrdd hOtaa hAI . # 0 0 0 0 0 0 0 0 0 0 0 \ #'"] data=open('bi_full_2','rb').read() lines = data.split('!@#$%') loc=0 for line in lines: eng, hin = [subline.split(' # ') for subline in line.strip('\n').split('\n')] for transdict, source, dest in [(edict2, eng, hin), (hdict2, hin, eng)]: sourcethings = source[2].split() for word in source[1].split(): tl = dest[1].split() otherword = translate(transdict, word, tl) loc = source[1].split().index(word) if otherword is not None: otherword = otherword.strip() print word, ' <-> ', otherword, 'meaning=good' if otherword in dest[1].split(): print word, ' <-> ', otherword, 'trans=good' sourcethings[loc] = str( dest[1].split().index(otherword) + 1) source[2] = ' '.join(sourcethings) eng = ' # '.join(eng) hin = ' # '.join(hin) f.write(eng+'\n'+hin+'\n\n\n') f.close() ''' if an example input sentence for the source file is: 1# 5 # modern markets : confident consumers # 0 0 0 0 0 1# 6 # AddhUnIk baajaar : AshHvsHt upbhOkHtaa . # 0 0 0 0 0 0 !@#$% the ouptut would look like this :- 1# 5 # modern markets : confident consumers # 1 2 3 4 5 1# 6 # AddhUnIk baajaar : AshHvsHt upbhOkHtaa . # 1 2 3 4 5 0 !@#$% Output Explanation:- This achieves bidirectional alignment. It means the first word of english 'modern' maps to the first word of hindi 'AddhUnIk' and vice versa. Here even characters are take as words as they also are an integral part of bidirectional mapping. Thus if you observe the hindi WORD '.' has a null alignment and it maps to nothing with respect to the English sentence as it doesn't have a full stop. The 3rd line int the output basically represents a delimiter when we are working for a number of sentences for which your trying to achieve bidirectional mapping. What modification should i make for it to work if the I have the hindi sentences in Unicode(UTF-8) format.

    Read the article

  • pyparsing ambiguity

    - by Claudiu
    I'm trying to parse some text using PyParser. The problem is that I have names that can contain white spaces. So my input might look like this: Joe Bob Jimmy Foo Joe decides to eat. Bob decides to not eat. Jimmy Foo decides to eat. How can I create a parser for the decides to eat line? If I create my name parser naively, meaning with alphabetic characters plus space characters, then it will match the entire line.

    Read the article

  • PyParsing: Is this correct use of setParseAction()?

    - by Rosarch
    I have strings like this: "MSE 2110, 3030, 4102" I would like to output: [("MSE", 2110), ("MSE", 3030), ("MSE", 4102)] This is my way of going about it, although I haven't quite gotten it yet: def makeCourseList(str, location, tokens): print "before: %s" % tokens for index, course_number in enumerate(tokens[1:]): tokens[index + 1] = (tokens[0][0], course_number) print "after: %s" % tokens course = Group(DEPT_CODE + COURSE_NUMBER) # .setResultsName("Course") course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)).setParseAction(makeCourseList) This outputs: >>> course.parseString("CS 2110") ([(['CS', 2110], {})], {}) >>> course_data.parseString("CS 2110, 4301, 2123, 1110") before: [['CS', 2110], 4301, 2123, 1110] after: [['CS', 2110], ('CS', 4301), ('CS', 2123), ('CS', 1110)] ([(['CS', 2110], {}), ('CS', 4301), ('CS', 2123), ('CS', 1110)], {}) Is this the right way to do it, or am I totally off? Also, the output of isn't quite correct - I want course_data to emit a list of course symbols that are in the same format as each other. Right now, the first course is different from the others. (It has a {}, whereas the others don't.)

    Read the article

  • Strip text except from the contents of a tag

    - by myle
    The opposite may be achieved using pyparsing as follows: from pyparsing import Suppress, replaceWith, makeHTMLTags, SkipTo #... removeText = replaceWith("") scriptOpen, scriptClose = makeHTMLTags("script") scriptBody = scriptOpen + SkipTo(scriptClose) + scriptClose scriptBody.setParseAction(removeText) data = (scriptBody).transformString(data) How could I keep the contents of the tag "table"?

    Read the article

  • Python: How best to parse a simple grammar?

    - by Rosarch
    Ok, so I've asked a bunch of smaller questions about this project, but I still don't have much confidence in the designs I'm coming up with, so I'm going to ask a question on a broader scale. I am parsing pre-requisite descriptions for a course catalog. The descriptions almost always follow a certain form, which makes me think I can parse most of them. From the text, I would like to generate a graph of course pre-requisite relationships. (That part will be easy, after I have parsed the data.) Some sample inputs and outputs: "CS 2110" => ("CS", 2110) # 0 "CS 2110 and INFO 3300" => [("CS", 2110), ("INFO", 3300)] # 1 "CS 2110, INFO 3300" => [("CS", 2110), ("INFO", 3300)] # 1 "CS 2110, 3300, 3140" => [("CS", 2110), ("CS", 3300), ("CS", 3140)] # 1 "CS 2110 or INFO 3300" => [[("CS", 2110)], [("INFO", 3300)]] # 2 "MATH 2210, 2230, 2310, or 2940" => [[("MATH", 2210), ("MATH", 2230), ("MATH", 2310)], [("MATH", 2940)]] # 3 If the entire description is just a course, it is output directly. If the courses are conjoined ("and"), they are all output in the same list If the course are disjoined ("or"), they are in separate lists Here, we have both "and" and "or". One caveat that makes it easier: it appears that the nesting of "and"/"or" phrases is never greater than as shown in example 3. What is the best way to do this? I started with PLY, but I couldn't figure out how to resolve the reduce/reduce conflicts. The advantage of PLY is that it's easy to manipulate what each parse rule generates: def p_course(p): 'course : DEPT_CODE COURSE_NUMBER' p[0] = (p[1], int(p[2])) With PyParse, it's less clear how to modify the output of parseString(). I was considering building upon @Alex Martelli's idea of keeping state in an object and building up the output from that, but I'm not sure exactly how that is best done. def addCourse(self, str, location, tokens): self.result.append((tokens[0][0], tokens[0][1])) def makeCourseList(self, str, location, tokens): dept = tokens[0][0] new_tokens = [(dept, tokens[0][1])] new_tokens.extend((dept, tok) for tok in tokens[1:]) self.result.append(new_tokens) For instance, to handle "or" cases: def __init__(self): self.result = [] # ... self.statement = (course_data + Optional(OR_CONJ + course_data)).setParseAction(self.disjunctionCourses) def disjunctionCourses(self, str, location, tokens): if len(tokens) == 1: return tokens print "disjunction tokens: %s" % tokens How does disjunctionCourses() know which smaller phrases to disjoin? All it gets is tokens, but what's been parsed so far is stored in result, so how can the function tell which data in result corresponds to which elements of token? I guess I could search through the tokens, then find an element of result with the same data, but that feel convoluted... What's a better way to approach this problem?

    Read the article

  • What makes Ometa special?

    - by Brian
    Ometa is "a new object-oriented language for pattern matching." I've encountered pattern matching in languages like Oz tools to parse grammars like Lexx/Yacc or Pyparsing before. Despite looking at example code, reading discussions, and talking to a friend, I still am not able to get a real understanding of what makes Ometa special (or at least, why some people think it is). Any explanation?

    Read the article

  • Parse an HTTP request Authorization header with Python

    - by Kris Walker
    I need to take a header like this: Authorization: Digest qop="chap", realm="[email protected]", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41" And parse it into this using Python: {'protocol':'Digest', 'qop':'chap', 'realm':'[email protected]', 'username':'Foobear', 'response':'6629fae49393a05397450978507c4ef1', 'cnonce':'5ccc069c403ebaf9f0171e9517f40e41'} Is there a library to do this, or something I could look at for inspiration? I'm doing this on Google App Engine, and I'm not sure if the Pyparsing library is available, but maybe I could include it with my app if it is the best solution. Currently I'm creating my own MyHeaderParser object and using it with reduce() on the header string. It's working, but very fragile. Brilliant solution by nadia below: import re reg = re.compile('(\w+)[=] ?"?(\w+)"?') s = """Digest realm="stackoverflow.com", username="kixx" """ print str(dict(reg.findall(s)))

    Read the article

  • math syntax checker written in python

    - by neurino
    All I need is to check, using python, if a string is a valid math expression or not. For simplicity let's say I just need + - * / operators (+ - as unary too) with numbers and nested parenthesis. I add also simple variable names for completeness. So I can test this way: test("-3 * (2 + 1)") #valid test("-3 * ") #NOT valid test("v1 + v2") #valid test("v2 - 2v") #NOT valid ("2v" not a valid variable name) I tried pyparsing but just trying the example: "simple algebraic expression parser, that performs +,-,*,/ and ^ arithmetic operations" I get passed invalid code and also trying to fix it I always get wrong syntaxes being parsed without raising Exceptions just try: >>>test('9', 9) 9 qwerty = 9.0 ['9'] => ['9'] >>>test('9 qwerty', 9) 9 qwerty = 9.0 ['9'] => ['9'] both test pass... o_O Any advice?

    Read the article

  • Package system broken - E: Sub-process /usr/bin/dpkg returned an error code (1)

    - by delha
    After installing some packages and libraries I have an error on Package Manager, I can't run any update because it says: "The package system is broken If you are using third party repositories then disable them, since they are a common source of problems. Now run the following command in a terminal: apt-get install -f " I've tried to do what it says and it returns me: jara@jara-Aspire-5738:~$ sudo apt-get install -f Reading package lists... Done Building dependency tree Reading state information... Done Correcting dependencies... Done The following packages were automatically installed and are no longer required: libcaca-dev libopencv2.3-bin nite-dev python-bluez ps-engine libslang2-dev python-sphinx ros-electric-geometry-tutorials ros-electric-geometry-visualization python-matplotlib libzzip-dev ros-electric-orocos-kinematics-dynamics ros-electric-physics-ode libbluetooth-dev libaudiofile-dev libassimp2 libnetpbm10-dev ros-electric-laser-pipeline python-epydoc ros-electric-geometry-experimental libasound2-dev evtest python-matplotlib-data libyaml-dev ros-electric-bullet ros-electric-executive-smach ros-electric-documentation libgl2ps0 libncurses5-dev ros-electric-robot-model texlive-fonts-recommended python-lxml libwxgtk2.8-dev daemontools libxxf86vm-dev libqhull-dev libavahi-client-dev ros-electric-geometry libgl2ps-dev libcurl4-openssl-dev assimp-dev libusb-1.0-0-dev libopencv2.3 ros-electric-diagnostics-monitors libsdl1.2-dev libjs-underscore libsdl-image1.2 tipa libusb-dev libtinfo-dev python-tz python-sip libfltk1.1 libesd0 libfreeimage-dev ros-electric-visualization x11proto-xf86vidmode-dev python-docutils libvtk5.6 ros-electric-assimp x11proto-scrnsaver-dev libnetcdf-dev libidn11-dev libeigen3-dev joystick libhdf5-serial-1.8.4 ros-electric-joystick-drivers texlive-fonts-recommended-doc esound-common libesd0-dev tcl8.5-dev ros-electric-multimaster-experimental ros-electric-rx libaudio-dev ros-electric-ros-tutorials libwxbase2.8-dev ros-electric-visualization-common python-sip-dev ros-electric-visualization-tutorials libfltk1.1-dev libpulse-dev libnetpbm10 python-markupsafe openni-dev tk8.5-dev wx2.8-headers freeglut3-dev libavahi-common-dev python-roman python-jinja2 ros-electric-robot-model-visualization libxss-dev libqhull5 libaa1-dev ros-electric-eigen freeglut3 ros-electric-executive-smach-visualization ros-electric-common-tutorials ros-electric-robot-model-tutorials libnetcdf6 libjs-sphinxdoc python-pyparsing libaudiofile0 Use 'apt-get autoremove' to remove them. The following extra packages will be installed: libcv-dev The following NEW packages will be installed libcv-dev 0 upgraded, 1 newly installed, 0 to remove and 4 not upgraded. 2 not fully installed or removed. Need to get 0 B/3,114 kB of archives. After this operation, 11.1 MB of additional disk space will be used. Do you want to continue [Y/n]? y (Reading database ... 261801 files and directories currently installed.) Unpacking libcv-dev (from .../libcv-dev_2.1.0-7build1_amd64.deb) ... dpkg: error processing /var/cache/apt/archives/libcv-dev_2.1.0-7build1_amd64.deb (-- unpack): trying to overwrite '/usr/bin/opencv_haartraining', which is also in package libopencv2.3-bin 2.3.1+svn6514+branch23-12~oneiric dpkg-deb: error: subprocess paste was killed by signal (Broken pipe) Errors were encountered while processing: /var/cache/apt/archives/libcv-dev_2.1.0-7build1_amd64.deb E: Sub-process /usr/bin/dpkg returned an error code (1) I've tried everything people recommend on internet like: sudo apt-get clean sudo apt-get autoremove sudo apt-get update sudo apt-get upgrade sudo apt-get -f install Also I've tried to install the synaptic manager but it doesn't let me install anything.. As you can see nothing works so I'm desperate! I'm using ubuntu 11.10, 64 bits Thanks!!

    Read the article

  • How do I process a nested list?

    - by ddbeck
    Suppose I have a bulleted list like this: * list item 1 * list item 2 (a parent) ** list item 3 (a child of list item 2) ** list item 4 (a child of list item 2 as well) *** list item 5 (a child of list item 4 and a grand-child of list item 2) * list item 6 I'd like to parse that into a nested list or some other data structure which makes the parent-child relationship between elements explicit (rather than depending on their contents and relative position). For example, here's a list of tuples containing an item and a list of its children (and so forth): [('list item 1',), ('list item 2', [('list item 3',), [('list item 4', [('list item 5'),]] ('list item 6',)] I've attempted to do this with plain Python and some experimentation with Pyparsing, but I'm not making progress. I'm left with two major questions: What's the strategy I need to employ to make this work? I know recursion is part of the solution, but I'm having a hard time making the connection between this and, say, a Fibonacci sequence. I'm certain I'm not the first person to have done this, but I don't know the terminology of the problem to make fruitful searches for more information on this topic. What problems are related to this so that I can learn more about solving these kinds of problems in general?

    Read the article

1