Search Results

Search found 70655 results on 2827 pages for 'python time'.

Page 235/2827 | < Previous Page | 231 232 233 234 235 236 237 238 239 240 241 242  | Next Page >

  • Can I upload an object in memory to FTP using Python?

    - by fsckin
    Here's what I'm doing now: mysock = urllib.urlopen('http://localhost/image.jpg') fileToSave = mysock.read() oFile = open(r"C:\image.jpg",'wb') oFile.write(fileToSave) oFile.close f=file('image.jpg','rb') ftp.storbinary('STOR '+os.path.basename('image.jpg'),f) os.remove('image.jpg') Writing files to disk and then imediately deleting them seems like extra work on the system that should be avoided. Can I upload an object in memory to FTP using Python?

    Read the article

  • How should I grab pairs from an array in python?

    - by tomaski
    Say I have an array that looks like this: ['item1', 'item2', 'item3', 'item4', 'item5', 'item6', 'item7', 'item8', 'item9', 'item10'] Using Python, how would I grab pairs from it, where each item is included in a pair with both the item before and after it? ['item1', 'item2'] ['item2', 'item3'] ['item3', 'item4'] ['item4', 'item5'] ['item5', 'item6'] ['item6', 'item7'] ['item7', 'item8'] ['item8', 'item9'] ['item9', 'item10'] Seems like something i could hack together, but I'm wondering if someone has an elegant solution they've used before?

    Read the article

  • Is there a good Python library that can parse C++?

    - by csbrooks
    Google didn't turn up anything that seemed relevant. I have a bunch of existing, working C++ code, and I'd like to use python to crawl through it and figure out relationships between classes, etc. EDIT: Just wanted to point out: I don't think I need or want to parse every bit of C++; I just need something smart enough to pick up on class, function and member variable declarations, and to skip over function definitions.

    Read the article

  • html/javascript automaticly getting link from submit button (maybe automating with python?)

    - by user317955
    I have a website where I have to click a submit button on a form. This gives me a link. I know the link is made up with the paramter that is passed through a hidden value. I was wondering if I could make a python script or something else that would go to the website and click some buttons returning the link that the submit button generates, if so how could I pass the extra parameter that influences the creation of the link? thanks in advance.

    Read the article

  • Is there an easy way to "append()" two dictionaries together in Python?

    - by digitaldreamer
    If I have two dictionaries I'd like to combine in Python, i.e. a = {'1': 1, '2': 2} b = {'3': 3, '4': 4} If I run update on them it reorders the list: a.update(b) {'1': 1, '3': 3, '2': 2, '4': 4} when what I really want is attach "b" to the end of "a": {'1': 1, '2': 2, '3': 3, '4': 4} Is there an easy way to attach "b" to the end of "a" without having to manually combine them like so: for key in b: a[key]=b[key] Something like += or append() would be ideal, but of course neither works on dictionaries.

    Read the article

  • Adding dynamic business logic/business process checks to a system

    - by Jordan Reiter
    I'm wondering if there is a good extant pattern (language here is Python/Django but also interested on the more abstract level) for creating a business logic layer that can be created without coding. For example, suppose that a house rental should only be available during a specific time. A coder might create the following class: from bizlogic import rules, LogicRule from orders.models import Order class BeachHouseAvailable(LogicRule): def check(self, reservation): house = reservation.house_reserved if not (house.earliest_available < reservation.starts < house.latest_available ) raise RuleViolationWhen("Beach house is available only between %s and %s" % (house.earliest_available, house.latest_available)) return True rules.add(Order, BeachHouseAvailable, name="BeachHouse Available") This is fine, but I don't want to have to code something like this each time a new rule is needed. I'd like to create something dynamic, ideally something that can be stored in a database. The thing is, it would have to be flexible enough to encompass a wide variety of rules: avoiding duplicates/overlaps (to continue the example "You already have a reservation for this time/location") logic rules ("You can't rent a house to yourself", "This house is in a different place from your chosen destination") sanity tests ("You've set a rental price that's 10x the normal rate. Are you sure this is the right price?" Things like that. Before I recreate the wheel, I'm wondering if there are already methods out there for doing something like this.

    Read the article

  • Can I save & store a user's submission in a way that proves that the data has not been altered, and that the timestamp is accurate?

    - by jt0dd
    There are many situations where the validity of the timestamp attached to a certain post (submission of information) might be invaluable for the post owner's legal usage. I'm not looking for a service to achieve this, as requested in this great question, but rather a method for the achievement of such a service. For the legal (in most any law system) authentication of text content and its submission time, the owner of the content would need to prove: that the timestamp itself has not been altered and was accurate to begin with. that the text content linked to the timestamp had not been altered I'd like to know how to achieve this via programming (not a language-specific solution, but rather the methodology behind the solution). Can a timestamp be validated to being accurate to the time that the content was really submitted? Can data be stored in a form that it can be read, but not written to, in a proven way? In other words, can I save & store a user's submission in a way that proves that the data has not been altered, and that the timestamp is accurate? I can't think of any programming method that would make this possible, but I am not the most experienced programmer out there. Based on MidnightLightning's answer to the question I cited, this sort of thing is being done. Clarification: I'm looking for a method (hashing, encryption, etc) that would allow an average guy like me to achieve the desired effect through programming. I'm interested in this subject for the purpose of Defensive Publication. I'd like to learn a method that allows an every-day programmer to pick up his computer, write a program, pass information through it, and say: I created this text at this moment in time, and I can prove it. This means the information should be protected from the programmer who writes the code as well. Perhaps a 3rd party API would be required. I'm ok with that.

    Read the article

  • Questioning pythonic type checking

    - by Pace
    I've seen countless times the following approach suggested for "taking in a collection of objects and doing X if X is a Y and ignoring the object otherwise" def quackAllDucks(ducks): for duck in ducks: try: duck.quack("QUACK") except AttributeError: #Not a duck, can't quack, don't worry about it pass The alternative implementation below always gets flak for the performance hit caused by type checking def quackAllDucks(ducks): for duck in ducks: if hasattr(duck,"quack"): duck.quack("QUACK") However, it seems to me that in 99% of scenarios you would want to use the second solution because of the following: If the user gets the parameters wrong then they will not be treated like a duck and there will be no indication. A lot of time will be wasted debugging why there is no quacking going on until the user finally realizes his silly mistake. The second solution would throw a stack trace as soon the user tried to quack. If the user has any bugs in their quack() method which cause an AttributeError then those bugs will be silently swallowed. Once again time will be wasted digging for the bug when the second solution would simply give a stack trace showing the immediate issue. In fact, it seems to me that the only time you would ever want to use the first method is when: The block of code in question is in an extremely performance critical section of your application. Following the principal of "avoid premature optimization" you would only realize this of course, after you had implemented the safer approach and found it to be a bottleneck. There are many types of quacking objects out there and you are only interested in quacking objects that quack with a very specific set of arguments (this seems to be a very rare case to me). Given this, why is it that so many people prefer the first approach over the second approach? What is it that I am missing? Also, I realize there are other solutions (such as using abcs) but these are the two solutions I seem to see most often for the basic case.

    Read the article

  • how to choose a web framework and javascript library?

    - by Trylks
    I've been procrastinating learning some framework for web apps w/ some library for AJAX, something like django with prototype, or turbogears with mootools, or zeta components with dojo, grok, jquery, symfony... The point is to spend some of my spare time, have "fun" and create cool stuff that hopefully is some useful. I think maybe I wouldn't like something like GWT or pyjamas because I wouldn't like to "get married" with some technology, I want to keep my freedom to add another javascript library, and so on. I didn't decide even the language yet, but I think I'd prefer python. PHP could be fine if there is some framework that is nice enough. Besides that, I don't even know where to start. I don't feel like learning a framework to then realize there is something that I cannot comfortably do, switch to another framework then find that a third framework has something really cool, etc. And the same goes for javascript libraries. So, some guidance would be really appreciated. I don't really know why are so many options available and what do they aim for, I guess some of them focus on some aspects and some on others, but I just want to make cool and nice apps that I can easily maintain, without spending too much time on coding or learning and avoiding the "trapped in the framework" feeling, when doing something is awfully complicated (or even impossible) with compared with the rest of things or doing that same thing on a different framework. I guess in the end I'll go for django and jquery since they are the most widely used options, afaik, but if I was going for the most widely used options I guess I should choose Java or PHP (I don't really like Java for my spare time, but php is not so bad), so I preferred to ask first. I think the question has to consider both, framework and library, since sometimes they are coupled. I think this is the place to ask this kind of things, sorry if not, and thank you.

    Read the article

  • Are small amounts of functional programming understandable by non-FP people?

    - by kd35a
    Case: I'm working at a company, writing an application in Python that is handling a lot of data in arrays. I'm the only developer of this program at the moment, but it will probably be used/modified/extended in the future (1-3 years) by some other programmer, at this moment unknown to me. I will probably not be there directly to help then, but maybe give some support via email if I have time for it. So, as a developer who has learned functional programming (Haskell), I tend to solve, for example, filtering like this: filtered = filter(lambda item: included(item.time, dur), measures) The rest of the code is OO, it's just some small cases where I want to solve it like this, because it is much simpler and more beautiful according to me. Question: Is it OK today to write code like this? How does a developer that hasn't written/learned FP react to code like this? Is it readable? Modifiable? Should I write documentation like explaining to a child what the line does? # Filter out the items from measures for which included(item.time, dur) != True I have asked my boss, and he just says "FP is black magic, but if it works and is the most efficient solution, then it's OK to use it." What is your opinion on this? As a non-FP programmer, how do you react to the code? Is the code "googable" so you can understand what it does? I would love feedback on this :) Edit: I marked phant0m's post as answer, because he gives good advice on how to write the code in a more readable way, and still keep the advantages. But I would also like to recommend superM's post because of his viewpoint as a non-FP programmer.

    Read the article

  • can a python script know that another instance of the same script is running... and then talk to it?

    - by Justin Grant
    I'd like to prevent multiple instances of the same long-running python command-line script from running at the same time, and I'd like the new instance to be able to send data to the original insance before the new instance commits suicide. How can I do this in a cross-platform way? Specifically, I'd like to enable the following behavior: "foo.py" is launched from the command line, and it will stay running for a long time-- days or weeks until the machine is rebooted or the parent process kills it. every few minutes the same script is launched again, but with different command-line parameters when launched, the script should see if any other instances are running. if other instances are running, then instance #2 should send its command-line parameters to instance #1, and then instance #2 should exit. instance #1, if it receives command-line parameters from another script, should spin up a new thread and (using the command-line parameters sent in the step above) start performing the work that instance #2 was going to perform. So I'm looking for two things: how can a python program know another instance of itself is running, and then how can one python command-line program communicate with another? Making this more complicated, the same script needs to run on both Windows and Linux, so ideally the solution would use only the Python standard library and not any OS-specific calls. Although if I need to have a Windows codepath and an *nix codepath (and a big if statement in my code to choose one or the other), that's OK if a "same code" solution isn't possible. I realize I could probably work out a file-based approach (e.g. instance #1 watches a directory for changes and each instance drops a file into that directory when it wants to do work) but I'm a little concerned about cleaning up those files after a non-graceful machine shutdown. I'd ideally be able to use an in-memory solution. But again I'm flexible, if a persistent-file-based approach is the only way to do it, I'm open to that option. More details: I'm trying to do this because our servers are using a monitoring tool which supports running python scripts to collect monitoring data (e.g. results of a database query or web service call) which the monitoring tool then indexes for later use. Some of these scripts are very expensive to start up but cheap to run after startup (e.g. making a DB connection vs. running a query). So we've chosen to keep them running in an infinite loop until the parent process kills them. This works great, but on larger servers 100 instances of the same script may be running, even if they're only gathering data every 20 minutes each. This wreaks havoc with RAM, DB connection limits, etc. We want to switch from 100 processes with 1 thread to one process with 100 threads, each executing the work that, previously, one script was doing. But changing how the scripts are invoked by the monitoring tool is not possible. We need to keep invocation the same (launch a process with different command-line parameters) but but change the scripts to recognize that another one is active, and have the "new" script send its work instructions (from the command line params) over to the "old" script.

    Read the article

  • org-sort multi: date/time (?d ?t) | priority (?p) | title (?a)

    - by lawlist
    Is anyone aware of an org-sort function / modification that can refile / organize a group of TODO so that it sorts them by three (3) criteria: first sort by due date, second sort by priority, and third sort by by title of the task? EDIT: I believe that org-sort by deadline (?d) has a bug that cannot properly handle undated tasks. I am working on a workaround (i.e., moving the undated todo to a different heading before the deadline (?d) sort occurs), but perhaps the best thing to do would be to try and fix the original sorting function. Development of the workaround can be found in this thread (i.e., moving the undated tasks to a different heading in one fell swoop): How to automate org-refile for multiple todo EDIT: Apparently, the following code (ancient history) that I found on the internet was eventually modified and included as a part of org-sort-entries. Unfortunately, undated todo are not properly sorted when sorting by deadline -- i.e., they are mixed in with the dated todo. ;; multiple sort (defun org-sort-multi (&rest sort-types) "Multiple sorts on a certain level of an outline tree, or plain list items. SORT-TYPES is a list where each entry is either a character or a cons pair (BOOL . CHAR), where BOOL is whether or not to sort case-sensitively, and CHAR is one of the characters defined in `org-sort-entries-or-items'. Entries are applied in back to front order. Example: To sort first by TODO status, then by priority, then by date, then alphabetically (case-sensitive) use the following call: (org-sort-multi '(?d ?p ?t (t . ?a)))" (interactive) (dolist (x (nreverse sort-types)) (when (char-valid-p x) (setq x (cons nil x))) (condition-case nil (org-sort-entries (car x) (cdr x)) (error nil)))) ;; sort current level (defun lawlist-sort (&rest sort-types) "Sort the current org level. SORT-TYPES is a list where each entry is either a character or a cons pair (BOOL . CHAR), where BOOL is whether or not to sort case-sensitively, and CHAR is one of the characters defined in `org-sort-entries-or-items'. Entries are applied in back to front order. Defaults to \"?o ?p\" which is sorted by TODO status, then by priority" (interactive) (when (equal mode-name "Org") (let ((sort-types (or sort-types (if (or (org-entry-get nil "TODO") (org-entry-get nil "PRIORITY")) '(?d ?t ?p) ;; date, time, priority '((nil . ?a)))))) (save-excursion (outline-up-heading 1) (let ((start (point)) end) (while (and (not (bobp)) (not (eobp)) (<= (point) start)) (condition-case nil (outline-forward-same-level 1) (error (outline-up-heading 1)))) (unless (> (point) start) (goto-char (point-max))) (setq end (point)) (goto-char start) (apply 'org-sort-multi sort-types) (goto-char end) (when (eobp) (forward-line -1)) (when (looking-at "^\\s-*$") ;; (delete-line) ) (goto-char start) ;; (dotimes (x ) (org-cycle)) ))))) EDIT: Here is a more modern version of multi-sort, which is likely based upon further development of the above-code: (defun org-sort-all () (interactive) (save-excursion (goto-char (point-min)) (while (re-search-forward "^\* " nil t) (goto-char (match-beginning 0)) (condition-case err (progn (org-sort-entries t ?a) (org-sort-entries t ?p) (org-sort-entries t ?o) (forward-line)) (error nil))) (goto-char (point-min)) (while (re-search-forward "\* PROJECT " nil t) (goto-char (line-beginning-position)) (ignore-errors (org-sort-entries t ?a) (org-sort-entries t ?p) (org-sort-entries t ?o)) (forward-line)))) EDIT: The best option will be to fix sorting of deadlines (?d) so that undated todo are moved to the bottom of the outline, instead of mixed in with the dated todo. Here is an excerpt from the current org.el included within Emacs Trunk (as of July 1, 2013): (defun org-sort (with-case) "Call `org-sort-entries', `org-table-sort-lines' or `org-sort-list'. Optional argument WITH-CASE means sort case-sensitively." (interactive "P") (cond ((org-at-table-p) (org-call-with-arg 'org-table-sort-lines with-case)) ((org-at-item-p) (org-call-with-arg 'org-sort-list with-case)) (t (org-call-with-arg 'org-sort-entries with-case)))) (defun org-sort-remove-invisible (s) (remove-text-properties 0 (length s) org-rm-props s) (while (string-match org-bracket-link-regexp s) (setq s (replace-match (if (match-end 2) (match-string 3 s) (match-string 1 s)) t t s))) s) (defvar org-priority-regexp) ; defined later in the file (defvar org-after-sorting-entries-or-items-hook nil "Hook that is run after a bunch of entries or items have been sorted. When children are sorted, the cursor is in the parent line when this hook gets called. When a region or a plain list is sorted, the cursor will be in the first entry of the sorted region/list.") (defun org-sort-entries (&optional with-case sorting-type getkey-func compare-func property) "Sort entries on a certain level of an outline tree. If there is an active region, the entries in the region are sorted. Else, if the cursor is before the first entry, sort the top-level items. Else, the children of the entry at point are sorted. Sorting can be alphabetically, numerically, by date/time as given by a time stamp, by a property or by priority. The command prompts for the sorting type unless it has been given to the function through the SORTING-TYPE argument, which needs to be a character, \(?n ?N ?a ?A ?t ?T ?s ?S ?d ?D ?p ?P ?o ?O ?r ?R ?f ?F). Here is the precise meaning of each character: n Numerically, by converting the beginning of the entry/item to a number. a Alphabetically, ignoring the TODO keyword and the priority, if any. o By order of TODO keywords. t By date/time, either the first active time stamp in the entry, or, if none exist, by the first inactive one. s By the scheduled date/time. d By deadline date/time. c By creation time, which is assumed to be the first inactive time stamp at the beginning of a line. p By priority according to the cookie. r By the value of a property. Capital letters will reverse the sort order. If the SORTING-TYPE is ?f or ?F, then GETKEY-FUNC specifies a function to be called with point at the beginning of the record. It must return either a string or a number that should serve as the sorting key for that record. Comparing entries ignores case by default. However, with an optional argument WITH-CASE, the sorting considers case as well." (interactive "P") (let ((case-func (if with-case 'identity 'downcase)) (cmstr ;; The clock marker is lost when using `sort-subr', let's ;; store the clocking string. (when (equal (marker-buffer org-clock-marker) (current-buffer)) (save-excursion (goto-char org-clock-marker) (looking-back "^.*") (match-string-no-properties 0)))) start beg end stars re re2 txt what tmp) ;; Find beginning and end of region to sort (cond ((org-region-active-p) ;; we will sort the region (setq end (region-end) what "region") (goto-char (region-beginning)) (if (not (org-at-heading-p)) (outline-next-heading)) (setq start (point))) ((or (org-at-heading-p) (condition-case nil (progn (org-back-to-heading) t) (error nil))) ;; we will sort the children of the current headline (org-back-to-heading) (setq start (point) end (progn (org-end-of-subtree t t) (or (bolp) (insert "\n")) (org-back-over-empty-lines) (point)) what "children") (goto-char start) (show-subtree) (outline-next-heading)) (t ;; we will sort the top-level entries in this file (goto-char (point-min)) (or (org-at-heading-p) (outline-next-heading)) (setq start (point)) (goto-char (point-max)) (beginning-of-line 1) (when (looking-at ".*?\\S-") ;; File ends in a non-white line (end-of-line 1) (insert "\n")) (setq end (point-max)) (setq what "top-level") (goto-char start) (show-all))) (setq beg (point)) (if (>= beg end) (error "Nothing to sort")) (looking-at "\\(\\*+\\)") (setq stars (match-string 1) re (concat "^" (regexp-quote stars) " +") re2 (concat "^" (regexp-quote (substring stars 0 -1)) "[ \t\n]") txt (buffer-substring beg end)) (if (not (equal (substring txt -1) "\n")) (setq txt (concat txt "\n"))) (if (and (not (equal stars "*")) (string-match re2 txt)) (error "Region to sort contains a level above the first entry")) (unless sorting-type (message "Sort %s: [a]lpha [n]umeric [p]riority p[r]operty todo[o]rder [f]unc [t]ime [s]cheduled [d]eadline [c]reated A/N/P/R/O/F/T/S/D/C means reversed:" what) (setq sorting-type (read-char-exclusive)) (and (= (downcase sorting-type) ?f) (setq getkey-func (org-icompleting-read "Sort using function: " obarray 'fboundp t nil nil)) (setq getkey-func (intern getkey-func))) (and (= (downcase sorting-type) ?r) (setq property (org-icompleting-read "Property: " (mapcar 'list (org-buffer-property-keys t)) nil t)))) (message "Sorting entries...") (save-restriction (narrow-to-region start end) (let ((dcst (downcase sorting-type)) (case-fold-search nil) (now (current-time))) (sort-subr (/= dcst sorting-type) ;; This function moves to the beginning character of the "record" to ;; be sorted. (lambda nil (if (re-search-forward re nil t) (goto-char (match-beginning 0)) (goto-char (point-max)))) ;; This function moves to the last character of the "record" being ;; sorted. (lambda nil (save-match-data (condition-case nil (outline-forward-same-level 1) (error (goto-char (point-max)))))) ;; This function returns the value that gets sorted against. (lambda nil (cond ((= dcst ?n) (if (looking-at org-complex-heading-regexp) (string-to-number (match-string 4)) nil)) ((= dcst ?a) (if (looking-at org-complex-heading-regexp) (funcall case-func (match-string 4)) nil)) ((= dcst ?t) (let ((end (save-excursion (outline-next-heading) (point)))) (if (or (re-search-forward org-ts-regexp end t) (re-search-forward org-ts-regexp-both end t)) (org-time-string-to-seconds (match-string 0)) (org-float-time now)))) ((= dcst ?c) (let ((end (save-excursion (outline-next-heading) (point)))) (if (re-search-forward (concat "^[ \t]*\\[" org-ts-regexp1 "\\]") end t) (org-time-string-to-seconds (match-string 0)) (org-float-time now)))) ((= dcst ?s) (let ((end (save-excursion (outline-next-heading) (point)))) (if (re-search-forward org-scheduled-time-regexp end t) (org-time-string-to-seconds (match-string 1)) (org-float-time now)))) ((= dcst ?d) (let ((end (save-excursion (outline-next-heading) (point)))) (if (re-search-forward org-deadline-time-regexp end t) (org-time-string-to-seconds (match-string 1)) (org-float-time now)))) ((= dcst ?p) (if (re-search-forward org-priority-regexp (point-at-eol) t) (string-to-char (match-string 2)) org-default-priority)) ((= dcst ?r) (or (org-entry-get nil property) "")) ((= dcst ?o) (if (looking-at org-complex-heading-regexp) (- 9999 (length (member (match-string 2) org-todo-keywords-1))))) ((= dcst ?f) (if getkey-func (progn (setq tmp (funcall getkey-func)) (if (stringp tmp) (setq tmp (funcall case-func tmp))) tmp) (error "Invalid key function `%s'" getkey-func))) (t (error "Invalid sorting type `%c'" sorting-type)))) nil (cond ((= dcst ?a) 'string<) ((= dcst ?f) compare-func) ((member dcst '(?p ?t ?s ?d ?c)) '<))))) (run-hooks 'org-after-sorting-entries-or-items-hook) ;; Reset the clock marker if needed (when cmstr (save-excursion (goto-char start) (search-forward cmstr nil t) (move-marker org-clock-marker (point)))) (message "Sorting entries...done"))) (defun org-do-sort (table what &optional with-case sorting-type) "Sort TABLE of WHAT according to SORTING-TYPE. The user will be prompted for the SORTING-TYPE if the call to this function does not specify it. WHAT is only for the prompt, to indicate what is being sorted. The sorting key will be extracted from the car of the elements of the table. If WITH-CASE is non-nil, the sorting will be case-sensitive." (unless sorting-type (message "Sort %s: [a]lphabetic, [n]umeric, [t]ime. A/N/T means reversed:" what) (setq sorting-type (read-char-exclusive))) (let ((dcst (downcase sorting-type)) extractfun comparefun) ;; Define the appropriate functions (cond ((= dcst ?n) (setq extractfun 'string-to-number comparefun (if (= dcst sorting-type) '< '>))) ((= dcst ?a) (setq extractfun (if with-case (lambda(x) (org-sort-remove-invisible x)) (lambda(x) (downcase (org-sort-remove-invisible x)))) comparefun (if (= dcst sorting-type) 'string< (lambda (a b) (and (not (string< a b)) (not (string= a b))))))) ((= dcst ?t) (setq extractfun (lambda (x) (if (or (string-match org-ts-regexp x) (string-match org-ts-regexp-both x)) (org-float-time (org-time-string-to-time (match-string 0 x))) 0)) comparefun (if (= dcst sorting-type) '< '>))) (t (error "Invalid sorting type `%c'" sorting-type))) (sort (mapcar (lambda (x) (cons (funcall extractfun (car x)) (cdr x))) table) (lambda (a b) (funcall comparefun (car a) (car b))))))

    Read the article

  • What is the right way to process inconsistent data files?

    - by Tahabi
    I'm working at a company that uses Excel files to store product data, specifically, test results from products before they are shipped out. There are a few thousand spreadsheets with anywhere from 50-100 relevant data points per file. Over the years, the schema for the spreadsheets has changed significantly, but not unidirectionally - in the sense that, changes often get reverted and then re-added in the space of a few dozen to few hundred files. My project is to convert about 8000 of these spreadsheets into a database that can be queried. I'm using MongoDB to deal with the inconsistency in the data, and Python. My question is, what is the "right" or canonical way to deal with the huge variance in my source files? I've written a data structure which stores the data I want for the latest template, which will be the final template used going forward, but that only helps for a few hundred files historically. Brute-forcing a solution would mean writing similar data structures for each version/template - which means potentially writing hundreds of schemas with dozens of fields each. This seems very inefficient, especially when sometimes a change in the template is as little as moving a single line of data one row down or splitting what used to be one data field into two data fields. A slightly more elegant solution I have in mind would be writing schemas for all the variants I can find for pre-defined groups in the source files, and then writing a function to match a particular series of files with a series of variants that matches that set of files. This is because, more often that not, most of the file will remain consistent over a long period, only marred by one or two errant sections, but inside the period, which section is inconsistent, is inconsistent. For example, say a file has four sections with three data fields, which is represented by four Python dictionaries with three keys each. For files 7000-7250, sections 1-3 will be consistent, but section 4 will be shifted one row down. For files 7251-7500, 1-3 are consistent, section 4 is one row down, but a section five appears. For files 7501-7635, sections 1 and 3 will be consistent, but section 2 will have five data fields instead of three, section five disappears, and section 4 is still shifted down one row. For files 7636-7800, section 1 is consistent, section 4 gets shifted back up, section 2 returns to three cells, but section 3 is removed entirely. Files 7800-8000 have everything in order. The proposed function would take the file number and match it to a dictionary representing the data mappings for different variants of each section. For example, a section_four_variants dictionary might have two members, one for the shifted-down version, and one for the normal version, a section_two_variants might have three and five field members, etc. The script would then read the matchings, load the correct mapping, extract the data, and insert it into the database. Is this an accepted/right way to go about solving this problem? Should I structure things differently? I don't know what to search Google for either to see what other solutions might be, though I believe the problem lies in the domain of ETL processing. I also have no formal CS training aside from what I've taught myself over the years. If this is not the right forum for this question, please tell me where to move it, if at all. Any help is most appreciated. Thank you.

    Read the article

  • Is this over-abstraction? (And is there a name for it?)

    - by mwhite
    I work on a large Django application that uses CouchDB as a database and couchdbkit for mapping CouchDB documents to objects in Python, similar to Django's default ORM. It has dozens of model classes and a hundred or two CouchDB views. The application allows users to register a "domain", which gives them a unique URL containing the domain name that gives them access to a project whose data has no overlap with the data of other domains. Each document that is part of a domain has its domain property set to that domain's name. As far as relationships between the documents go, all domains are effectively mutually exclusive subsets of the data, except for a few edge cases (some users can be members of more than one domain, and there are some administrative reports that include all domains, etc.). The code is full of explicit references to the domain name, and I'm wondering if it would be worth the added complexity to abstract this out. I'd also like to know if there's a name for the sort of bound property approach I'm taking here. Basically, I have something like this in mind: Before in models.py class User(Document): domain = StringProperty() class Group(Document): domain = StringProperty() name = StringProperty() user_ids = StringListProperty() # method that returns related document set def users(self): return [User.get(id) for id in self.user_ids] # method that queries a couch view optimized for a specific lookup @classmethod def by_name(cls, domain, name): # the view method is provided by couchdbkit and handles # wrapping json CouchDB results as Python objects, and # can take various parameters modifying behavior return cls.view('groups/by_name', key=[domain, name]) # method that creates a related document def get_new_user(self): user = User(domain=self.domain) user.save() self.user_ids.append(user._id) return user in views.py: from models import User, Group # there are tons of views like this, (request, domain, ...) def create_new_user_in_group(request, domain, group_name): group = Group.by_name(domain, group_name)[0] user = User(domain=domain) user.save() group.user_ids.append(user._id) group.save() in group/by_name/map.js: function (doc) { if (doc.doc_type == "Group") { emit([doc.domain, doc.name], null); } } After models.py class DomainDocument(Document): domain = StringProperty() @classmethod def domain_view(cls, *args, **kwargs): kwargs['key'] = [cls.domain.default] + kwargs['key'] return super(DomainDocument, cls).view(*args, **kwargs) @classmethod def get(cls, *args, **kwargs, validate_domain=True): ret = super(DomainDocument, cls).get(*args, **kwargs) if validate_domain and ret.domain != cls.domain.default: raise Exception() return ret def models(self): # a mapping of all models in the application. accessing one returns the equivalent of class BoundUser(User): domain = StringProperty(default=self.domain) class User(DomainDocument): pass class Group(DomainDocument): name = StringProperty() user_ids = StringListProperty() def users(self): return [self.models.User.get(id) for id in self.user_ids] @classmethod def by_name(cls, name): return cls.domain_view('groups/by_name', key=[name]) def get_new_user(self): user = self.models.User() user.save() views.py @domain_view # decorator that sets request.models to the same sort of object that is returned by DomainDocument.models and removes the domain argument from the URL router def create_new_user_in_group(request, group_name): group = request.models.Group.by_name(group_name) user = request.models.User() user.save() group.user_ids.append(user._id) group.save() (Might be better to leave the abstraction leaky here in order to avoid having to deal with a couchapp-style //! include of a wrapper for emit that prepends doc.domain to the key or some other similar solution.) function (doc) { if (doc.doc_type == "Group") { emit([doc.name], null); } } Pros and Cons So what are the pros and cons of this? Pros: DRYer prevents you from creating related documents but forgetting to set the domain. prevents you from accidentally writing a django view - couch view execution path that leads to a security breach doesn't prevent you from accessing underlying self.domain and normal Document.view() method potentially gets rid of the need for a lot of sanity checks verifying whether two documents whose domains we expect to be equal are. Cons: adds some complexity hides what's really happening requires no model modules to have classes with the same name, or you would need to add sub-attributes to self.models for modules. However, requiring project-wide unique class names for models should actually be fine because they correspond to the doc_type property couchdbkit uses to decide which class to instantiate them as, which should be unique. removes explicit dependency documentation (from group.models import Group)

    Read the article

  • Reading graph inputs for a programming puzzle and then solving it

    - by Vrashabh
    I just took a programming competition question and I absolutely bombed it. I had trouble right at the beginning itself from reading the input set. The question was basically a variant of this puzzle http://codercharts.com/puzzle/evacuation-plan but also had an hour component in the first line(say 3 hours after start of evacuation). It reads like this This puzzle is a tribute to all the people who suffered from the earthquake in Japan. The goal of this puzzle is, given a network of road and locations, to determine the maximum number of people that can be evacuated. The people must be evacuated from evacuation points to rescue points. The list of road and the number of people they can carry per hour is provided. Input Specifications Your program must accept one and only one command line argument: the input file. The input file is formatted as follows: the first line contains 4 integers n r s t n is the number of locations (each location is given by a number from 0 to n-1) r is the number of roads s is the number of locations to be evacuated from (evacuation points) t is the number of locations where people must be evacuated to (rescue points) the second line contains s integers giving the locations of the evacuation points the third line contains t integers giving the locations of the rescue points the r following lines contain to the road definitions. Each road is defined by 3 integers l1 l2 width where l1 and l2 are the locations connected by the road (roads are one-way) and width is the number of people per hour that can fit on the road Now look at the sample input set 5 5 1 2 3 0 3 4 0 1 10 0 2 5 1 2 4 1 3 5 2 4 10 The 3 in the first line is the additional component and is defined as the number of hours since the resuce has started which is 3 in this case. Now my solution was to use Dijisktras algorithm to find the shortest path between each of the rescue and evac nodes. Now my problem started with how to read the input set. I read the first line in python and stored the values in variables. But then I did not know how to store the values of the distance between the nodes and what DS to use and how to input it to say a standard implementation of dijikstras algorithm. So my question is two fold 1.) How do I take the input of such problems? - I have faced this problem in quite a few competitions recently and I hope I can get a simple code snippet or an explanation in java or python to read the data input set in such a way that I can input it as a graph to graph algorithms like dijikstra and floyd/warshall. Also a solution to the above problem would also help. 2.) How to solve this puzzle? My algorithm was: Find shortest path between evac points (in the above example it is 14 from 0 to 3) Multiply it by number of hours to get maximal number of saves Also the answer given for the variant for the input set was 24 which I dont understand. Can someone explain that also. UPDATE: I get how the answer is 14 in the given problem link - it seems to be just the shortest path between node 0 and 3. But with the 3 hour component how is the answer 24 UPDATE I get how it is 24 - its a complete graph traversal at every hour and this is how I solve it Hour 1 Node 0 to Node 1 - 10 people Node 0 to Node 2- 5 people TotalRescueCount=0 Node 1=10 Node 2= 5 Hour 2 Node 1 to Node 3 = 5(Rescued) Node 2 to Node 4 = 5(Rescued) Node 0 to Node 1 = 10 Node 0 to Node 2 = 5 Node 1 to Node 2 = 4 TotalRescueCount = 10 Node 1 = 10 Node 2= 5+4 = 9 Hour 3 Node 1 to Node 3 = 5(Rescued) Node 2 to Node 4 = 5+4 = 9(Rescued) TotalRescueCount = 9+5+10 = 24 It hard enough for this case , for multiple evac and rescue points how in the world would I write a pgm for this ?

    Read the article

  • Algorithm design, "randomising" timetable schedule in Python although open to other languages.

    - by S1syphus
    Before I start I should add I am a musician and not a native programmer, this was undertook to make my life easier. Here is the situation, at work I'm given a new csv file each which contains a list of sound files, their length, and the minimum total amount of time they must be played. I create a playlist of exactly 60 minutes, from this excel file. Each sample played the by the minimum number of instances, but spread out from each other; so there will never be a period where for where one sound is played twice in a row or in close proximity to itself. Secondly, if the minimum instances of each song has been used, and there is still time with in the 60 min, it needs to fill the remaining time using sounds till 60 minutes is reached, while adhering to above. The smallest duration possible is 15 seconds, and then multiples of 15 seconds. Here is what I came up with in python and the problems I'm having with it, and as one user said its buggy due to the random library used in it. So I'm guessing a total rethink is on the table, here is where I need your help. Whats is the best way to solve the issue, I have had a brief look at things like knapsack and bin packing algorithms, while both are relevant neither are appropriate and maybe a bit beyond me.

    Read the article

< Previous Page | 231 232 233 234 235 236 237 238 239 240 241 242  | Next Page >