Search Results

Search found 15087 results on 604 pages for 'python multithreading'.

Page 216/604 | < Previous Page | 212 213 214 215 216 217 218 219 220 221 222 223  | Next Page >

  • What is the right way to process inconsistent data files?

    - by Tahabi
    I'm working at a company that uses Excel files to store product data, specifically, test results from products before they are shipped out. There are a few thousand spreadsheets with anywhere from 50-100 relevant data points per file. Over the years, the schema for the spreadsheets has changed significantly, but not unidirectionally - in the sense that, changes often get reverted and then re-added in the space of a few dozen to few hundred files. My project is to convert about 8000 of these spreadsheets into a database that can be queried. I'm using MongoDB to deal with the inconsistency in the data, and Python. My question is, what is the "right" or canonical way to deal with the huge variance in my source files? I've written a data structure which stores the data I want for the latest template, which will be the final template used going forward, but that only helps for a few hundred files historically. Brute-forcing a solution would mean writing similar data structures for each version/template - which means potentially writing hundreds of schemas with dozens of fields each. This seems very inefficient, especially when sometimes a change in the template is as little as moving a single line of data one row down or splitting what used to be one data field into two data fields. A slightly more elegant solution I have in mind would be writing schemas for all the variants I can find for pre-defined groups in the source files, and then writing a function to match a particular series of files with a series of variants that matches that set of files. This is because, more often that not, most of the file will remain consistent over a long period, only marred by one or two errant sections, but inside the period, which section is inconsistent, is inconsistent. For example, say a file has four sections with three data fields, which is represented by four Python dictionaries with three keys each. For files 7000-7250, sections 1-3 will be consistent, but section 4 will be shifted one row down. For files 7251-7500, 1-3 are consistent, section 4 is one row down, but a section five appears. For files 7501-7635, sections 1 and 3 will be consistent, but section 2 will have five data fields instead of three, section five disappears, and section 4 is still shifted down one row. For files 7636-7800, section 1 is consistent, section 4 gets shifted back up, section 2 returns to three cells, but section 3 is removed entirely. Files 7800-8000 have everything in order. The proposed function would take the file number and match it to a dictionary representing the data mappings for different variants of each section. For example, a section_four_variants dictionary might have two members, one for the shifted-down version, and one for the normal version, a section_two_variants might have three and five field members, etc. The script would then read the matchings, load the correct mapping, extract the data, and insert it into the database. Is this an accepted/right way to go about solving this problem? Should I structure things differently? I don't know what to search Google for either to see what other solutions might be, though I believe the problem lies in the domain of ETL processing. I also have no formal CS training aside from what I've taught myself over the years. If this is not the right forum for this question, please tell me where to move it, if at all. Any help is most appreciated. Thank you.

    Read the article

  • Is this over-abstraction? (And is there a name for it?)

    - by mwhite
    I work on a large Django application that uses CouchDB as a database and couchdbkit for mapping CouchDB documents to objects in Python, similar to Django's default ORM. It has dozens of model classes and a hundred or two CouchDB views. The application allows users to register a "domain", which gives them a unique URL containing the domain name that gives them access to a project whose data has no overlap with the data of other domains. Each document that is part of a domain has its domain property set to that domain's name. As far as relationships between the documents go, all domains are effectively mutually exclusive subsets of the data, except for a few edge cases (some users can be members of more than one domain, and there are some administrative reports that include all domains, etc.). The code is full of explicit references to the domain name, and I'm wondering if it would be worth the added complexity to abstract this out. I'd also like to know if there's a name for the sort of bound property approach I'm taking here. Basically, I have something like this in mind: Before in models.py class User(Document): domain = StringProperty() class Group(Document): domain = StringProperty() name = StringProperty() user_ids = StringListProperty() # method that returns related document set def users(self): return [User.get(id) for id in self.user_ids] # method that queries a couch view optimized for a specific lookup @classmethod def by_name(cls, domain, name): # the view method is provided by couchdbkit and handles # wrapping json CouchDB results as Python objects, and # can take various parameters modifying behavior return cls.view('groups/by_name', key=[domain, name]) # method that creates a related document def get_new_user(self): user = User(domain=self.domain) user.save() self.user_ids.append(user._id) return user in views.py: from models import User, Group # there are tons of views like this, (request, domain, ...) def create_new_user_in_group(request, domain, group_name): group = Group.by_name(domain, group_name)[0] user = User(domain=domain) user.save() group.user_ids.append(user._id) group.save() in group/by_name/map.js: function (doc) { if (doc.doc_type == "Group") { emit([doc.domain, doc.name], null); } } After models.py class DomainDocument(Document): domain = StringProperty() @classmethod def domain_view(cls, *args, **kwargs): kwargs['key'] = [cls.domain.default] + kwargs['key'] return super(DomainDocument, cls).view(*args, **kwargs) @classmethod def get(cls, *args, **kwargs, validate_domain=True): ret = super(DomainDocument, cls).get(*args, **kwargs) if validate_domain and ret.domain != cls.domain.default: raise Exception() return ret def models(self): # a mapping of all models in the application. accessing one returns the equivalent of class BoundUser(User): domain = StringProperty(default=self.domain) class User(DomainDocument): pass class Group(DomainDocument): name = StringProperty() user_ids = StringListProperty() def users(self): return [self.models.User.get(id) for id in self.user_ids] @classmethod def by_name(cls, name): return cls.domain_view('groups/by_name', key=[name]) def get_new_user(self): user = self.models.User() user.save() views.py @domain_view # decorator that sets request.models to the same sort of object that is returned by DomainDocument.models and removes the domain argument from the URL router def create_new_user_in_group(request, group_name): group = request.models.Group.by_name(group_name) user = request.models.User() user.save() group.user_ids.append(user._id) group.save() (Might be better to leave the abstraction leaky here in order to avoid having to deal with a couchapp-style //! include of a wrapper for emit that prepends doc.domain to the key or some other similar solution.) function (doc) { if (doc.doc_type == "Group") { emit([doc.name], null); } } Pros and Cons So what are the pros and cons of this? Pros: DRYer prevents you from creating related documents but forgetting to set the domain. prevents you from accidentally writing a django view - couch view execution path that leads to a security breach doesn't prevent you from accessing underlying self.domain and normal Document.view() method potentially gets rid of the need for a lot of sanity checks verifying whether two documents whose domains we expect to be equal are. Cons: adds some complexity hides what's really happening requires no model modules to have classes with the same name, or you would need to add sub-attributes to self.models for modules. However, requiring project-wide unique class names for models should actually be fine because they correspond to the doc_type property couchdbkit uses to decide which class to instantiate them as, which should be unique. removes explicit dependency documentation (from group.models import Group)

    Read the article

  • Reading graph inputs for a programming puzzle and then solving it

    - by Vrashabh
    I just took a programming competition question and I absolutely bombed it. I had trouble right at the beginning itself from reading the input set. The question was basically a variant of this puzzle http://codercharts.com/puzzle/evacuation-plan but also had an hour component in the first line(say 3 hours after start of evacuation). It reads like this This puzzle is a tribute to all the people who suffered from the earthquake in Japan. The goal of this puzzle is, given a network of road and locations, to determine the maximum number of people that can be evacuated. The people must be evacuated from evacuation points to rescue points. The list of road and the number of people they can carry per hour is provided. Input Specifications Your program must accept one and only one command line argument: the input file. The input file is formatted as follows: the first line contains 4 integers n r s t n is the number of locations (each location is given by a number from 0 to n-1) r is the number of roads s is the number of locations to be evacuated from (evacuation points) t is the number of locations where people must be evacuated to (rescue points) the second line contains s integers giving the locations of the evacuation points the third line contains t integers giving the locations of the rescue points the r following lines contain to the road definitions. Each road is defined by 3 integers l1 l2 width where l1 and l2 are the locations connected by the road (roads are one-way) and width is the number of people per hour that can fit on the road Now look at the sample input set 5 5 1 2 3 0 3 4 0 1 10 0 2 5 1 2 4 1 3 5 2 4 10 The 3 in the first line is the additional component and is defined as the number of hours since the resuce has started which is 3 in this case. Now my solution was to use Dijisktras algorithm to find the shortest path between each of the rescue and evac nodes. Now my problem started with how to read the input set. I read the first line in python and stored the values in variables. But then I did not know how to store the values of the distance between the nodes and what DS to use and how to input it to say a standard implementation of dijikstras algorithm. So my question is two fold 1.) How do I take the input of such problems? - I have faced this problem in quite a few competitions recently and I hope I can get a simple code snippet or an explanation in java or python to read the data input set in such a way that I can input it as a graph to graph algorithms like dijikstra and floyd/warshall. Also a solution to the above problem would also help. 2.) How to solve this puzzle? My algorithm was: Find shortest path between evac points (in the above example it is 14 from 0 to 3) Multiply it by number of hours to get maximal number of saves Also the answer given for the variant for the input set was 24 which I dont understand. Can someone explain that also. UPDATE: I get how the answer is 14 in the given problem link - it seems to be just the shortest path between node 0 and 3. But with the 3 hour component how is the answer 24 UPDATE I get how it is 24 - its a complete graph traversal at every hour and this is how I solve it Hour 1 Node 0 to Node 1 - 10 people Node 0 to Node 2- 5 people TotalRescueCount=0 Node 1=10 Node 2= 5 Hour 2 Node 1 to Node 3 = 5(Rescued) Node 2 to Node 4 = 5(Rescued) Node 0 to Node 1 = 10 Node 0 to Node 2 = 5 Node 1 to Node 2 = 4 TotalRescueCount = 10 Node 1 = 10 Node 2= 5+4 = 9 Hour 3 Node 1 to Node 3 = 5(Rescued) Node 2 to Node 4 = 5+4 = 9(Rescued) TotalRescueCount = 9+5+10 = 24 It hard enough for this case , for multiple evac and rescue points how in the world would I write a pgm for this ?

    Read the article

  • Is there a way to ‘join’ (block) in POSIX threads, without exiting the joinee?

    - by elliottcable
    I’m buried in multithreading / parallelism documents, trying to figure out how to implement a threading implementation in a programming language I’ve been designing. I’m trying to map a mental model to the pthreads.h library, but I’m having trouble with one thing: I need my interpreter instances to continue to exist after they complete interpretation of a routine (the language’s closure/function data type), because I want to later assign other routines to them for interpretation, thus saving me the thread and interpreter setup/teardown time. This would be fine, except that pthread_join(3) requires that I call pthread_exit(3) to ‘unblock’ the original thread. How can I block the original thread (when it needs the result of executing the routine), and then unblock it when interpretation of the child routine is complete?

    Read the article

  • [newbie]Webservice - Threads - close if no active threads, if any activen, then wait till its compl

    - by Raj
    Hi I have an ASP.Net webservice running. When the system date changes, i want to close all the threads and restart them again, if the thread/s aren't doing any work. If any of the thread/s is in the process of doing some work(active threads), then i need to wait till the current thread/s complete its process and then close. This will prevent from aborting any current process. Any ideas on how i can perform such a task? Thanks PS:I am new to ASP.NET and Multithreading.

    Read the article

  • In a multithreaded app, would a multi-core or multiprocessor arrangement be better?

    - by Michael
    I've read a lot on this topic already both here (e.g., stackoverflow.com/questions/1713554/threads-processes-vs-multithreading-multi-core-multiprocessor-how-they-are or http://stackoverflow.com/questions/680684/multi-cpu-multi-core-and-hyper-thread) and elsewhere (e.g., ixbtlabs.com/articles2/cpu/rmmt-l2-cache.html or software.intel.com/en-us/articles/multi-core-introduction/), but I still am not sure about a couple things that seem very straightforward. So I thought I'd just ask. (1) Is a multi-core processor in which each core has dedicated cache effectively the same as a multiprocessor system (balanced of course for processor speed, cache size, and so on)? (2) Let's say I have some images to analyze (i.e., computer vision), and I have these images loaded into RAM. My app spawns a thread for each image that needs to be analyzed. Will this app on a shared cache multi-core processor run slower than on a dedicated cache multi-core processor, and would the latter run at the same speed as on an equivalent single-core multiprocessor machine? Thank you for the help!

    Read the article

  • Singleton again, but with multi-thread and Objective-C

    - by Tonny Xu
    I know Singleton pattern has been discussed so much. But because I'm not fully understand the memory management mechanism in Objective-C, so when I combined Singleton implementation with multithreading, I got a big error and cost me a whole day to resolve it. I had a singleton object, let's call it ObjectX. I declared an object inside ObjectX that will detach a new thread, let's call that object objectWillSpawnNewThread, when I called [[ObjectX sharedInstance].objectWillSpawnNewThread startNewThread]; The new thread could not be executed correctly, and finally I found out that I should not declare the object objectWillSpawnNewThread inside the singleton class. Here are my questions: How does Objective-C allocate static object in the memory? Where does Objective-C allocate them(Main thread stack or somewhere else)? Why would it be failed if we spawn a new thread inside the singleton object? I had searched the Objective-C language [ObjC.pdf] and Objective-C memory management document, maybe I missed something, but I currently I could not find any helpful information.

    Read the article

  • How to deal with OpenMP thread pool contention

    - by dpe82
    I'm working on an application that uses both coarse and fine grained multi-threading. That is, we manage scheduling of large work units on a pool of threads manually, and then within those work units certain functions utilize OpenMP for finer grain multithreading. We have realized gains by selectively using OpenMP in our costliest loops, but are concerned about creating contention for the OpenMP worker pool as we add OpenMP blocks to cheaper loops. Is there a way to signal to OpenMP that a block of code should use the pool if it is available, and if not it should process the loop serially?

    Read the article

  • WinForms equivalent of performSelectorOnMainThread in Objective-C

    - by jamone
    I haven't done much multithreading before and now find the need to do some background work and keep the UI responsive. I have the following code. data.ImportProgressChanged += new DataAccess.ImportDelegate(data_ImportProgressChanged); Thread importThread = new Thread( new ThreadStart(data.ImportPeopleFromFAD)); importThread.IsBackground = true; importThread.Start(); void data_ImportProgressChanged(int progress) { toolStripProgressBar.Value = progress; } //In my data object I have public void ImportPeopleFromFAD() { ImportProgressChanged(someInt); } But the UI doesn't get updated since the ImportProgressChanged() call is made on the background thread. In objective C I know you can use performSelectorOnMainThread and pass it a method to call using the main thread. What is the equivalent way of calling ImportProgressChanged() from the main thread?

    Read the article

  • Threading in C#

    - by Lalit Dhake
    Hi, I have console application. In that i have some process that fetch the data from database through different layers ( business and Data access). and generate outputs. This process will execute many times say 10 times. Ok? I want to run simultaneously this process. not one will start, it will finish and then second will start. I want after starting 1'st process, just 2'nd , 3rd....10'th must be start. Means it should be multithreading. how can i achieve this ? is that will give me error while connection with data base open and close ?

    Read the article

  • C# equlivent of performSelectorOnMainThread

    - by jamone
    I haven't done much multithreading before and now find the need to do some background work and keep the UI responsive. I have the following code. data.ImportProgressChanged += new DataAccess.ImportDelegate(data_ImportProgressChanged); Thread importThread = new Thread( new ThreadStart(data.ImportPeopleFromFAD)); importThread.IsBackground = true; importThread.Start(); void data_ImportProgressChanged(int progress) { toolStripProgressBar.Value = progress; } //In my data object I have public void ImportPeopleFromFAD() { ImportProgressChanged(someInt); } But the UI doesn't get updated since the ImportProgressChanged() call is made on the background thread. In objective C I know you can use performSelectorOnMainThread and pass it a method to call using the main thread. What is the equivalent way of calling ImportProgressChanged() from the main thread?

    Read the article

  • Could Grand Central Dispatch (`libdispatch`) ever be made available on Windows?

    - by elliottcable
    I’m looking into multithreading, and GCD seems like a much better option than manually writing a solution using pthread.h and pthreads-win32. However, although it looks like libdispatch is either working on, or soon going to be working on, most newer POSIX-compatible systems… I have to ask, what about Windows? What are the chances of libdispatch being ported to Windows? What are the barriers preventing that from happening? If it came down to it, what would I need to do to preform that portage? Edit: Some things I already know, to get the discussion started: We need a blocks-compatible compiler that will compile on Windows, no? Will PLBlocks handle that? Can we use the LLVM blocks runtime? Can’t we replace all the pthread.h dependencies in userspace libdispatch with APR calls, for portability? Or, alternatively, use pthreads-win32 I suppose…

    Read the article

  • Could Grand Central Dispatch (`libdispatch`) ever be available on Windows?

    - by elliottcable
    I’m looking into multithreading, and GCD seems like a much better option than manually writing a solution using pthread.h and pthreads-win32. However, although it looks like libdispatch is either working on, or soon going to be working on, most newer POSIX-compatible systems… I have to ask, what about Windows? What are the chances of libdispatch being ported to Windows? What are the barriers preventing that from happening? If it came down to it, what would I need to do to preform that portage?

    Read the article

  • C# Importing Large Volume of Data from CSV to Database

    - by guazz
    What's the most efficient method to load large volumes of data from CSV (3 million + rows) to a database. The data needs to be formatted(e.g. name column needs to be split into first name and last name, etc.) I need to do this in a efficiently as possible i.e. time constraints I am siding with the option of reading, transforming and loading the data using a C# application row-by-row? Is this ideal, if not, what are my options? Should I use multithreading?

    Read the article

  • Buffering db inserts in multithreaded program

    - by Winter
    I have a system which breaks a large taks into small tasks using about 30 threads as a time. As each individual thread finishes it persists its calculated results to the database. What I want to achieve is to have each thread pass its results to a new persisance class that will perform a type of double buffering and data persistance while running in its own thread. For example, after 100 threads have moved their data to the buffer the persistance class then the persistance class swaps the buffers and persists all 100 entries to the database. This would allow utilization of prepared statements and thus cut way down on the I/O between the program and the database. Is there a pattern or good example of this type of multithreading double buffering?

    Read the article

  • java: communication between threads

    - by TRU7H
    I'm making a little java game in which I would have two threads (well as the FIRST step towards multithreading...), one for the logic and one for the drawing. So my question is: How can I make those two communicating which each other? Requirements: accessing variables and object from a another thread syncing them so they each complete a same number of "loops" in the same time. (the logic calculates and then the another one draws the results and the loop begins again...) So how is this achievable in java? Thanks in advance!

    Read the article

  • c++ simple start a function with its own thread

    - by user1397417
    i had once had a very simple one or two line code that would start a function with its own thread and continue running until application closed, c++ console app. lost the project it was in, and remember it was hard to find. cant find it online now. most example account for complicated multithreading situations. but i just need to open this one function in its own thread. hopefully someone knows what im talking about, or a similar solution. eg. start void abc in its own thread, no parameters

    Read the article

  • iphone dev - loading table content asynchronously

    - by Brian
    My app has a navigation controller which push and pop a series of views. One of the tableViews loads .xml file from URL and it takes 4-5 seconds. If I click the back button on the navigation bar, it will only respond after the content of the table finish loading. Is there an easy way to load the content asynchronously so that the app will still respond to my gesture on the navigation bar? p.s. I search this on the Internet and people are talking about multithreading. I don't know a lot about threads so please be more specific. Thanks in advanced =)

    Read the article

  • Do I need to syncronize thread access to an int

    - by Martin Harris
    I've just written a method that is called by multiple threads simultaneously and I need to keep track of when all the threads have completed, the code uses this pattern: private void RunReport() { _reportsRunning++; try { //code to run the report } finally { _reportsRunning--; } } This is the only place within the code that _reportsRunning's value is changed, and the method takes about a second to run. Occasionally when I have more than six or so threads running reports together the final result for _reportsRunning can get down to -1, if I wrap the calls to _runningReports++ and _runningReports-- in a lock then the behaviour appears to be correct and consistant. So, to the question: When I was learning multithreading in C++ I was taught that you didn't need to synchronize calls to increment and decrement operations because they were always one assembly instruction and therefore it was impossible for the thread to be switched out mid-call. Was I taught correctly, and if so how come that doesn't hold true for C#?

    Read the article

  • C# regularly return values from a different thread

    - by sdds
    Hello, I'm very new to multithreading and lack experience. I need to compute some data in a different thread so the UI doesn't hang up, and then send the data as it is processed to a table on the main form. So, basically, the user can work with the data that is already computed, while other data is still being processed. What is the best way to achieve this? I would also be very grateful for any examples. Thanks in advance.

    Read the article

  • Please recommend me intermediate-to-advanced Python books to buy.

    - by anonnoir
    I'm in the final year, final semester of my law degree, and will be graduating very soon. (April, to be specific.) But before I begin practice, I plan to take 2 two months off, purely for serious programming study. So I'm currently looking for some Python-related books, gauged intermediate to advanced, which are interesting (because of the subject matter itself) and possibly useful to my future line of work. I've identified 2 possible purchases at the moment: Natural Language Processing with Python. The law deals mostly with words, and I've quite a number of ideas as to where I might go with NLP. Data extraction, summaries, client management systems linked with document templates, etc. Programming Collective Intelligence. This book fascinates me, because I've always liked the idea of machine learning (and I'm currently studying it by the side too, for fun). I'd like to build/play around with Web 2.0 applications; and who knows if I can apply some of the things I learn to my legal work. (E.g. Playground experiments to determine how and under what circumstances judges might be biased, by forcing algorithms to pore through judgments and calculate similarities, etc.) Please feel free to criticize my current choices, but do at least offer or recommend other books that I should read in their place. My budget can deal with 4 books, max. These books will be used heavily throughout the 2 months; I will be reading them back to back, absorbing the explanations given, and hacking away at their code. Also, the books themselves should satisfy 2 main criteria: Application. The book must teach how to solve problems. I like reading theory, but I want to build things and solve problems first. Even playful applications are fine, because games and experiments always have real-world applications sooner or later. Readability. I like reading technical books, no matter how difficult they are. I enjoy the effort and the feeling that you're learning something. But the book shouldn't contain code or explanations that are too cryptic or erratic. Even if it's difficult, the book's content should be accessible with focused reading. Note: I realize that I am somewhat of a beginner to the whole programming thing, so please don't put me down. But from experience, I think it's better to aim up and leave my comfort zone when learning new things, rather than to just remain stagnant the way I am. (At least the difficulty gives me focus: i.e. if a programmer can be that good, perhaps if I sustain my own efforts I too can be as good as him someday.) If anything, I'm also a very determined person, so two months of day-to-night intensive programming study with nothing else on my mind should, I think, give me a bit of a fighting chance to push my programming skills to a much higher level.

    Read the article

  • How can I create blog post functionality without Wordpress or Drupal?

    - by Ali
    I'm currently learning Python (as a beginner in programming). I go through each chapter learning basics. I haven't gotten far enough to understand how CMS works. I eventually want a blog that doesn't depend on Wordpress or Drupal. I would like to develop it myself as my skills progress. My immediate curiosity is on blog posts. What is the component called that will allow me to make a daily post on my blog? There must be a technical term for this function. I would like to learn how to make one, but don't even know what to research. Everything I research points me to Wordpress or Drupal. I would like to create my own. Thanks in advance! Ali

    Read the article

< Previous Page | 212 213 214 215 216 217 218 219 220 221 222 223  | Next Page >