Search Results

Search found 48264 results on 1931 pages for 'text search'.

Page 381/1931 | < Previous Page | 377 378 379 380 381 382 383 384 385 386 387 388  | Next Page >

  • Searching for duplicate records within a text file where the duplicate is determined by only two fie

    - by plg
    First, Python Newbie; be patient/kind. Next, once a month I receive a large text file (think 7 Million records) to test for duplicate values. This is catalog information. I get 7 fields, but the two I'm interested in are a supplier code and a full orderable part number. To determine if the record is dupliacted, I compress all special characters from the part number (except . and #) and create a compressed part number. The test for duplicates becomes the supplier code and compressed part number combination. This part is fairly straight forward. Currently, I am just copying the original file with 2 new columns (compressed part and duplicate indicator). If the part is a duplicate, I put a "YES" in the last field. Now that this is done, I want to be able to go back (or better yet, at the same time) to get the previous record where there was a supplier code/compressed part number match. So far, my code looks like this: Compress Full Part to a Compressed Part and Check for Duplicates on Supplier Code and Compressed Part combination import sys import re import time ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ start=time.time() try: file1 = open("C:\Accounting\May Accounting\May.txt", "r") except IOError: print sys.stderr, "Cannot Open Read File" sys.exit(1) try: file2 = open(file1.name[0:len(file1.name)-4] + "_" + "COMPRESSPN.txt", "a") except IOError: print sys.stderr, "Cannot Open Write File" sys.exit(1) hdrList="CIGSUPPLIER|FULL_PART|PART_STATUS|ALIAS_FLAG|ACQUISITION_FLAG|COMPRESSED_PART|DUPLICATE_INDICATOR" file2.write(hdrList+chr(10)) lines_seen=set() affirm="YES" records = file1.readlines() for record in records: fields = record.split(chr(124)) if fields[0]=="CIGSupplier": continue #If incoming file has a header line, skip it file2.write(fields[0]+"|"), #Supplier Code file2.write(fields[1]+"|"), #Full_Part file2.write(fields[2]+"|"), #Part Status file2.write(fields[3]+"|"), #Alias Flag file2.write(re.sub("[$\r\n]", "", fields[4])+"|"), #Acquisition Flag file2.write(re.sub("[^0-9a-zA-Z.#]", "", fields[1])+"|"), #Compressed_Part dupechk=fields[0]+"|"+re.sub("[^0-9a-zA-Z.#]", "", fields[1]) if dupechk not in lines_seen: file2.write(chr(10)) lines_seen.add(dupechk) else: file2.write(affirm+chr(10)) print "it took", time.time() - start, "seconds." ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ file2.close() file1.close() It runs in less than 6 minutes, so I am happy with this part, even if it is not elegant. Right now, when I get my results, I import the results into Access and do a self join to locate the duplicates. Loading/querying/exporting results in Access a file this size takes around an hour, so I would like to be able to export the matched duplicates to another text file or an Excel file. Confusing enough? Thanks.

    Read the article

  • PHP text parsing and / or make your own language?

    - by AlexanderJohannesen
    Been Googling around without finding much at all, so does anyone know of a class or library that helps you parse any sort of language, like a Domain Specific Language (I'm creating one, so I'm flexible in what the syntax and format can be) into either PHP code or some helpful struct or a class hiearchy or ... ? Anything goes at this point. :) I want to experiment with parsing text files into tokens, building up a small grammar and syntax library to express things like Business Natural Languages.

    Read the article

  • Detecting if a file is binary or plain text?

    - by dr. evil
    How can I detect if a file is binary or a plain text? Basically my .NET app is processing batch files and extracting data however I don't want to process binary files. As a solution I'm thinking about analysing first X bytes of the file and if there are more unprintable characters than printable characters it should be binary. Is this the right way to do it? Is there nay better implementation for this task?

    Read the article

  • Routing optional parameters with dashes in MVC

    - by Géza
    I've made an routing definition like this: routes.MapRoute("ProductSearch", "Search-{MainGroup}-{SubGroup}-{ItemType}", new { controller = "Product", action = "Search", MainGroup = "", SubGroup = "", ItemWebType = ""}); It is not working if the parameters are empty. Actually it resolves the url, so Url.Action method resolves the path "Search-12--" but the link is not working, so the GET of the page is not working With slashes it is working the Url.Action method makes "Search/12" "Search/{MainGroup}/{SubGroup}/{ItemType}" is it somehow possible to correct it?

    Read the article

< Previous Page | 377 378 379 380 381 382 383 384 385 386 387 388  | Next Page >