genetic algorithms - Page 18

When calculating how many days between 2 dates, should you include both dates in the count, or neither, or 1?

- by Andy

I hope this question is alright to ask here. I am trying to make an algorithm that counts how many days between 2 dates. For example, 3/1/2012 and 3/2/2012. Whats the correct answer, or the most popular choice, and should be the one I use? So in this case, if I don't include both dates I am comparing, its 0. If I include one of them (both the start date), its 1. Lastly, if I include both, its 2. Thanks.

Read the article

In Search Data Structure And Algorithm Project Title Based on Topic

- by Salehin Suhaimi

As the title says, my lecturer gave me a project that i needed to finish in 3 weeks before final semester exams. So i thought i will start now. The requirement is to "build a simple program that has GUI based on all the chapter that we've learned." But i got stuck on WHAT program should i build. Any idea a program that is related to this chapter i've learned? Any input will help. list, array list, linked list, vectors, stacks, Queues, ADT, Hashing, Binary Search Tree, AVL Tree, That's about all i can remember. Any idea where can i start looking?

Read the article

Algorithm for optimal combination of two variables

- by AlanChavez

I'm looking for an algorithm that would be able to determine the optimal combination of two variables, but I'm not sure where to start looking. For example, if I have 10,000 rows in a database and each row contains price, and square feet is there any algorithm out there that will be able to determine what combination of price and sq ft is optimal. I know this is vague, but I assume is along the lines of Fuzzy logic and fuzzy sets, but I'm not sure and I'd like to start digging in the right field to see if I can come up with something that solves my problem.

Read the article

Approach to Authenticate Clients to TCP Server

- by dab

I'm writing a Server/Client application where clients will connect to the server. What I want to do, is make sure that the client connecting to the server is actually using my protocol and I can "trust" the data being sent from the client to the server. What I thought about doing is creating a sort of hash on the client's machine that follows a particular algorithm. What I did in a previous version was took their IP address, the client version, and a few other attributes of the client and sent it as a calculated hash to the server, who then took their IP, and the version of the protocol the client claimed to be using, and calculated that number to see if they matched. This works ok until you get clients that connect from within a router environment where their internal IP is different from their external IP. My fix for this was to pass the client's internal IP used to calculate this hash with the authentication protocol. My fear is this approach is not secure enough. Since I'm passing the data used to create the "auth hash". Here's an example of what I'm talking about: Client IP: 192.168.1.10, Version: 2.4.5.2 hash = 2*4*5*1 * (1+9+2) * (1+6+8) * (1) * (1+0) Client Connects to Server client sends: auth hash ip version Server calculates that info, and accepts or denies the hash. Before I go and come up with another algorithm to prove a client can provide data a server (or use this existing algorithm), I was wondering if there are any existing, proven, and secure systems out there for generating a hash that both sides can generate with general knowledge. The server won't know about the client until the very first connection is established. The protocol's intent is to manage a network of clients who will be contributing data to the server periodically. New clients will be added simply by connecting the client to the server and "registering" with the server. So a client connects to the server for the first time, and registers their info (mac address or some other kind of unique computer identifier), then when they connect again, the server will recognize that client as a previous person and associate them with their data in the database.

Read the article

Given two sets of DNA, what does it take to computationally "grow" that person from a fertilised egg and see what they become? [closed]

- by Nicholas Hill

My question is essentially entirely in the title, but let me add some points to prevent some "why on earth would you want to do that" sort of answers: This is more of a mind experiment than an attempt to implement real software. For fun. Don't worry about computational speed or the number of available memory bytes. Computers get faster and better all of the time. Imagine we have two data files: Mother.dna and Father.dna. What else would be required? (Bonus point for someone who tells me approx how many GB each file will be, and if the size of the files are exactly the same number of bytes for everyone alive on Earth!) There would ideally need to be a way to see what the egg becomes as it becomes a human adult. If you fancy, feel free to outline the design. I am initially thinking that there'd need to be some sort of volumetric voxel-based 3D environment for simulation purposes.

Read the article

Possible applications of algorithm devised for differentiating between structured vs random text

- by rooznom

I have written a program that can rapidly (within 5 sec on a 2GB RAM desktop, 2.33 Ghz CPU) differentiate between structured text (e.g english text) and random alphanumeric strings. It can also provide a probability score for the prediction. Are there any practical applications/uses of such a program. Note that the program is based on entropy models and does not have any dictionary comparisons in its workflow. Thanks in advance for your responses

Read the article

How to identify a PDF classification problem?

- by burtonic

We are crawling and downloading lots of companies' PDFs and trying to pick out the ones that are Annual Reports. Such reports can be downloaded from most companies' investor-relations pages. The PDFs are scanned and the database is populated with, among other things, the: Title Contents (full text) Page count Word count Orientation First line Using this data we are checking for the obvious phrases such as: Annual report Financial statement Quarterly report Interim report Then recording the frequency of these phrases and others. So far we have around 350,000 PDFs to scan and a training set of 4,000 documents that have been manually classified as either a report or not. We are experimenting with a number of different approaches including Bayesian classifiers and weighting the different factors available. We are building the classifier in Ruby. My question is: if you were thinking about this problem, where would you start?

Read the article

Calculate pi to an accuracy of 5 decimal places?

- by pgras

In this message at point 18 I saw following programming question: Given that Pi can be estimated using the function 4 * (1 – 1/3 + 1/5 – 1/7 + …) with more terms giving greater accuracy, write a function that calculates Pi to an accuracy of 5 decimal places. So I know how to implement the given function and how to choose how "far" I should calculate, but how can I tell when I've reached the "accuracy of 5 decimal places" ?

Read the article

solve TOR edge node problem by using .onion proxy?

- by rd.

I would like to improve the TOR network, where the exit nodes are a vulnerability to concealing traffic. From my understanding, traffic to .onion sites are not decrypted by exit nodes, so therefore - in theory - a .onion site web proxy could be used to further anonymize traffic. Yes/no? perhaps you have insight into the coding and routing behind these concepts to elaborate on why this is a good/not good idea.

Read the article

Labeling algorithm for points

- by Qwertie

I need an algorithm to place horizontal text labels for multiple series of points on the screen (basically I need to show timestamps and other information for a history of moving objects on a map; in general there are multiple data points per object). The text labels should appear close to their points--above, below, or on the right side--but should not overlap other points or text labels. Does anyone know an algorithm/heuristic for this?

Read the article

Arranging the colors on the board in the most pleasing form

- by Shashwat

Given a rectangular board of height H and width W. N colors are given. ith color occupy Pi percentage of area on the board. Sum of Pis is 1. What can be algorithm to layout the colors on the board in the form of rectangles in the most pleasing form. By pleasing mean the aspect ratios (Width/Height) of rectangle of each color should be as close to 1 as possible. In an ideal case the board would be filled only with squares.

Read the article

What are the common techniques to handle user-generated HTML modified differently by different browsers?

- by Jakie

I am developing a website updater. The front end uses HTML, CSS and JavaScript, and the backend uses Python. The way it works is that <p/>, <b/> and some other HTML elements can be updated by the user. To enable this, I load the webpage and, with JQuery, convert all those elements to <textarea/> elements. Once they the content of the text area is changed, I apply the change to the original elements and send it to a Python script to store the new content. The problem is that I'm finding that different browsers change the original HTML. How do you get around this issue? What Python libraries do you use? What techniques or application designs do you use to avoid or overcome this issue? The problems I found are: IE removes the quotes around class and id attributes. For example, <img class='abc'/> becomes <img class=abc/>. Firefox removes the backslash from the line breaks: <br \> becomes <br>. Some websites have very specific display technicalities, so an insertion of a simple "\n"(which IE does) can affect the display of a website. Example: changing <img class='headingpic' /><div id="maincontent"> to <img class='headingpic'/>\n <div id="maincontent"> inserts a vertical gap in IE. The things I have unsuccessfully tried to overcome these issues: Using either JQuery or Python to remove all >\n< occurences, <br> etc. But this fails because I get different patterns in IE, sometimes a ·\n, sometimes a \n···. In a Python, parse the new HTML, extract the new text/content, insert it into the old HTML so the elements and format never change, just the content. This is very difficult and seems to be overkill.

Read the article

Is using build-in sorting considered cheating in practice tests?

- by user10326

I am using one of the practice online judges where a practice problem is asked and one submits the answer and gets back if it is accepted or not based on test inputs. My question is the following: In one of the practice tests, I needed to sort an array as part of the solution algorithm. If it matters the problem was: find 2 numbers in an array that add up to a specific target. As part of my algorithm I sorted the array, but to do that I used Java's quicksort and not implement sorting as part of the same method. To do that I had to do: java.util.Arrays.sort(array); Since I had to use the fully qualified name I am wondering if this is a kind of "cheating". (I mean perhaps an online judge does not expect this) Is it? In a formal interview (since these tests are practice for interview as I understand) would this be acceptable?

Read the article

Algorithm to generate N random numbers between A and B which sum up to X

- by Shaamaan

This problem seemed like something which should be solvable with but a few lines of code. Unfortunately, once I actually started to write the thing, I've realized it's not as simple as it sounds. What I need is a set of X random numbers, each of which is between A and B and they all add up to X. The exact variables for the problem I'm facing seem to be even simpler: I need 5 numbers, between -1 and 1 (note: these are decimal numbers), which add up to 1. My initial "few lines of code, should be easy" approach was to randomize 4 numbers between -1 and 1 (which is simple enough), and then make the last one 1-(sum of previous numbers). This quickly proved wrong, as the last number could just as well be larger than 1 or smaller than -1. What would be the best way to approach this problem? PS. Just for reference: I'm using C#, but I don't think it matters. I'm actually having trouble creating a good enough solution for the problem in my head.

Read the article

How to quickly search through a very large list of strings / records on a database

- by Giorgio

I have the following problem: I have a database containing more than 2 million records. Each record has a string field X and I want to display a list of records for which field X contains a certain string. Each record is about 500 bytes in size. To make it more concrete: in the GUI of my application I have a text field where I can enter a string. Above the text field I have a table displaying the (first N, e.g. 100) records that match the string in the text field. When I type or delete one character in the text field, the table content must be updated on the fly. I wonder if there is an efficient way of doing this using appropriate index structures and / or caching. As explained above, I only want to display the first N items that match the query. Therefore, for N small enough, it should not be a big issue loading the matching items from the database. Besides, caching items in main memory can make retrieval faster. I think the main problem is how to find the matching items quickly, given the pattern string. Can I rely on some DBMS facilities, or do I have to build some in-memory index myself? Any ideas? EDIT I have run a first experiment. I have split the records into different text files (at most 200 records per file) and put the files in different directories (I used the content of one data field to determine the directory tree). I end up with about 50000 files in about 40000 directories. I have then run Lucene to index the files. Searching for a string with the Lucene demo program is pretty fast. Splitting and indexing took a few minutes: this is totally acceptable for me because it is a static data set that I want to query. The next step is to integrate Lucene in the main program and use the hits returned by Lucene to load the relevant records into main memory.

Read the article

Tiling Problem Solutions for Various Size "Dominoes"

- by user67081

I've got an interesting tiling problem, I have a large square image (size 128k so 131072 squares) with dimensons 256x512... I want to fill this image with certain grain types (a 1x1 tile, a 1x2 strip, a 2x1 strip, and 2x2 square) and have no overlap, no holes, and no extension past the image boundary. Given some probability for each of these grain types, a list of the number required to be placed is generated for each. Obviously an iterative/brute force method doesn't work well here if we just randomly place the pieces, instead a certain algorithm is required. 1) all 2x2 square grains are randomly placed until exhaustion. 2) 1x2 and 2x1 grains are randomly placed alternatively until exhaustion 3) the remaining 1x1 tiles are placed to fill in all holes. It turns out this algorithm works pretty well for some cases and has no problem filling the entire image, however as you might guess, increasing the probability (and thus number) of 1x2 and 2x1 grains eventually causes the placement to stall (since there are too many holes created by the strips and not all them can be placed). My approach to this solution has been as follows: 1) Create a mini-image of size 8x8 or 16x16. 2) Fill this image randomly and following the algorithm specified above so that the desired probability of the entire image is realized in the mini-image. 3) Create N of these mini-images and then randomly successively place them in the large image. Unfortunately there are some downfalls to this simplification. 1) given the small size of the mini-images, nailing an exact probability for the entire image is not possible. Example if I want p(2x1)=P(1x2)=0.4, the mini image may only give 0.41 as the closes probability. 2) The mini-images create a pseudo boundary where no overlaps occur which isn't really descriptive of the model this is being used for. 3) There is only a fixed number of mini-images so i'm not sure how random this really is. I'm really just looking to brainstorm about possible solutions to this. My main concern is really to nail down closer probabilities, now one might suggest I just increase the mini-image size. Well I have, and it turns out that in certain cases(p(1x2)=p(2x1)=0.5) the mini-image 16x16 isn't even iteratively solvable.. So it's pretty obvious how difficult it is to randomly solve this for anything greater than 8x8 sizes.. So I'd love to hear some ideas. Thanks

Read the article

Where would you start if you were trying to solve this PDF classification problem?

- by burtonic

We are crawling and downloading lots of companies' PDFs and trying to pick out the ones that are Annual Reports. Such reports can be downloaded from most companies' investor-relations pages. The PDFs are scanned and the database is populated with, among other things, the: Title Contents (full text) Page count Word count Orientation First line Using this data we are checking for the obvious phrases such as: Annual report Financial statement Quarterly report Interim report Then recording the frequency of these phrases and others. So far we have around 350,000 PDFs to scan and a training set of 4,000 documents that have been manually classified as either a report or not. We are experimenting with a number of different approaches including Bayesian classifiers and weighting the different factors available. We are building the classifier in Ruby. My question is: if you were thinking about this problem, where would you start?

Read the article

Slides and Code from “Using C#’s Async Effectively”

- by Reed

The slides and code from my talk on the new async language features in C# and VB.Net are now available on https://github.com/ReedCopsey/Effective-Async This includes the complete slide deck, and all 4 projects, including: FakeService: Simple WCF service to run locally and simulate network service calls. AsyncService: Simple WCF service which wraps FakeService to demonstrate converting sync to async SimpleWPFExample: Simplest example of converting a method call to async from a synchronous version AsyncExamples: Windows Store application demonstrating main concepts, pitfalls, tips, and tricks from the slide deck

Read the article

Improving python code

- by cobie

I just answered the question on project euler about finding circular primes below 1 million using python. My solution is below. I was able to reduce the running time of the solution from 9 seconds to about 3 seconds. I would like to see what else can be done to the code to reduce its running time further. This is strictly for educational purposes and for fun. import math import time def getPrimes(n): """returns set of all primes below n""" non_primes = [j for j in range(4, n, 2)] # 2 covers all even numbers for i in range(3, n, 2): non_primes.extend([j for j in range(i*2, n, i)]) return set([i for i in range(2, n)]) - set(non_primes) def getCircularPrimes(n): primes = getPrimes(n) is_circ = [] for prime in primes: prime_str = str(prime) iter_count = len(prime_str) - 1 rotated_num = [] while iter_count > 0: prime_str = prime_str[1:] + prime_str[:1] rotated_num.append(int(prime_str)) iter_count -= 1 if primes >= set(rotated_num): is_circ.append(prime) return len(is_circ)

Read the article

Efficient Bus Loading

- by System Down

This is something I did for a bus travel company a long time ago, and I was never happy with the results. I was thinking about that old project recently and thought I'd revisit that problem. Problem: Bus travel company has several buses with different passenger capacities (e.g. 15 50-passenger buses, 25 30-passenger buses ... etc). They specialized in offering transportation to very large groups (300+ passengers per group). Since each group needs to travel together they needed to manage their fleet efficiently to reduce waste. For instance, 88 passengers are better served by three 30-passenger buses (2 empty seats) than by two 50-passenger buses (12 empty seats). Another example, 75 passengers would be better served by one 50-passenger bus and one 30-passenger bus, a mix of types. What's a good algorithm to do this?

Read the article

Conventions for search result scoring

- by DeaconDesperado

I assume this type of question is more on-topic here than on regular SO. I have been working on a search feature for my team's web application and have had a lot of success building a multithreaded, "divide and conquer" processing system to work through a large amount of fulltext. Our problem domain is pretty specific. Users of the app generate posts, and as a general rule, posts that are more recent are considered to be of greater relevance. Some of the data we are trying to extract from search is very specific (user's feelings about specific items or things) and we are using python nltk to do named-entity extraction to find interesting likely query terms. Essentially we look for descriptive adjective-noun pairs and generate a general picture of a user's expressed sentiment as a list of tokens. This search is intended as an internal tool for our team to draw out a local picture of sentiments like "soggy pizza." There's some machine learning in there too to do entity resolution on terms like "soggy" to all manner of adjectives expressing nastiness. My problem is I am at a loss for how to go about scoring these results. The text being searched is split up into tokens in a list, so my initial approach would be to normalize a float score between 0.0-1.0 generated off of how far into the list the terms appear and how often they are repeated (a later mention of the term being worth less, earlier more, greater frequency-greater score, etc.) A certain amount of weight could be given to the timestamp as well, though I am not certain how to calculate this. I am curious if anyone has had to solve a similar problem in a search relevance grading between appreciable metrics (frequency, term location/colocation, recency) and if there are and guidelines for how to weight each. I should mention as well that the final fallback procedure in the search is to pipe the query to Sphinx, which has its own scoring practices. Sphinx operates as the last resort in case our application specific processing can't find any eligible candidates.

Read the article

Is the Leptonica implementation of 'Modified Median Cut' not using the median at all?

- by TheCodeJunkie

I'm playing around a bit with image processing and decided to read up on how color quantization worked and after a bit of reading I found the Modified Median Cut Quantization algorithm. I've been reading the code of the C implementation in Leptonica library and came across something I thought was a bit odd. Now I want to stress that I am far from an expert in this area, not am I a math-head, so I am predicting that this all comes down to me not understanding all of it and not that the implementation of the algorithm is wrong at all. The algorithm states that the vbox should be split along the lagest axis and that it should be split using the following logic The largest axis is divided by locating the bin with the median pixel (by population), selecting the longer side, and dividing in the center of that side. We could have simply put the bin with the median pixel in the shorter side, but in the early stages of subdivision, this tends to put low density clusters (that are not considered in the subdivision) in the same vbox as part of a high density cluster that will outvote it in median vbox color, even with future median-based subdivisions. The algorithm used here is particularly important in early subdivisions, and 3is useful for giving visible but low population color clusters their own vbox. This has little effect on the subdivision of high density clusters, which ultimately will have roughly equal population in their vboxes. For the sake of the argument, let's assume that we have a vbox that we are in the process of splitting and that the red axis is the largest. In the Leptonica algorithm, on line 01297, the code appears to do the following Iterate over all the possible green and blue variations of the red color For each iteration it adds to the total number of pixels (population) it's found along the red axis For each red color it sum up the population of the current red and the previous ones, thus storing an accumulated value, for each red note: when I say 'red' I mean each point along the axis that is covered by the iteration, the actual color may not be red but contains a certain amount of red So for the sake of illustration, assume we have 9 "bins" along the red axis and that they have the following populations 4 8 20 16 1 9 12 8 8 After the iteration of all red bins, the partialsum array will contain the following count for the bins mentioned above 4 12 32 48 49 58 70 78 86 And total would have a value of 86 Once that's done it's time to perform the actual median cut and for the red axis this is performed on line 01346 It iterates over bins and check they accumulated sum. And here's the part that throws me of from the description of the algorithm. It looks for the first bin that has a value that is greater than total/2 Wouldn't total/2 mean that it is looking for a bin that has a value that is greater than the average value and not the median ? The median for the above bins would be 49 The use of 43 or 49 could potentially have a huge impact on how the boxes are split, even though the algorithm then proceeds by moving to the center of the larger side of where the matched value was.. Another thing that puzzles me a bit is that the paper specified that the bin with the median value should be located, but does not mention how to proceed if there are an even number of bins.. the median would be the result of (a+b)/2 and it's not guaranteed that any of the bins contains that population count. So this is what makes me thing that there are some approximations going on that are negligible because of how the split actually takes part at the center of the larger side of the selected bin. Sorry if it got a bit long winded, but I wanted to be as thoroughas I could because it's been driving me nuts for a couple of days now ;)

Read the article

Matching users based on a series of questions

- by SeanWM

I'm trying to figure out a way to match users based on specific personality traits. Each trait will have its own category. I figure in my user table I'll add a column for each category: id name cat1 cat2 cat3 1 Sean ? ? ? 2 Other ? ? ? Let's say I ask each user 3 questions in each category. For each question, you can answer one of the following: No, Maybe, Yes How would I calculate one number based off the answers in those 3 questions that would hold a value I can compare other users to? I was thinking having some sort of weight. Like: No -> 0 Maybe -> 1 Yes -> 2 Then doing some sort of meaningful calculation. I want to end up with something like this so I can query the users and find who matches close: id name cat1 cat2 cat3 1 Sean 4 5 1 2 Other 1 2 5 In the situation above, the users don't really match. I'd want to match with someone with a +1 or -1 of my score in each category. I'm not a math guy so I'm just looking for some ideas to get me started.

Read the article

Help with a formula for Google Adwords [closed]

- by XaviEsteve

Hi guys, This question is more about maths and algorythms in Google Adwords but guess it's the most appropriate community in SE to ask this. I am creating a spreadsheet to calculate Adword formulas and I am stuck in how to calculate the Monthly Net Income for each keyword. I have created a formula which calculates it but can't figure out how to limit the Monthly Budget. The formula I've created is this one: Monthly Net Income = ( DailyClicks x ConversionRate x SaleProfit) - ( CPC x DailyClicks ) There is an example of the formula in the file which is a Google Spreadsheet publicly available here: https://spreadsheets.google.com/ccc?key=0AnQMyM9XJ0EidDB6TUF0OTdaZ2dFb2ZGNmhQeE5lb0E&hl=en_GB#gid=2 (you can create your own copy going to File Make a copy...) I am releasing this set of tools as Public Domain so feel free to use it :) Any help is much appreciated!

Read the article

High-level strategy for distinguishing a regular string from invalid JSON (ie. JSON-like string detection)

- by Jonline

Disclaimer On Absence of Code: I have no code to post because I haven't started writing; was looking for more theoretical guidance as I doubt I'll have trouble coding it but am pretty befuddled on what approach(es) would yield best results. I'm not seeking any code, either, though; just direction. Dilemma I'm toying with adding a "magic method"-style feature to a UI I'm building for a client, and it would require intelligently detecting whether or not a string was meant to be JSON as against a simple string. I had considered these general ideas: Look for a sort of arbitrarily-determined acceptable ratio of the frequency of JSON-like syntax (ie. regex to find strings separated by colons; look for colons between curly-braces, etc.) to the number of quote-encapsulated strings + nulls, bools and ints/floats. But the smaller the data set, the more fickle this would get look for key identifiers like opening and closing curly braces... not sure if there even are more easy identifiers, and this doesn't appeal anyway because it's so prescriptive about the kinds of mistakes it could find try incrementally parsing chunks, as those between curly braces, and seeing what proportion of these fractional statements turn out to be valid JSON; this seems like it would suffer less than (1) from smaller datasets, but would probably be much more processing-intensive, and very susceptible to a missing or inverted brace Just curious if the computational folks or algorithm pros out there had any approaches in mind that my semantics-oriented brain might have missed. PS: It occurs to me that natural language processing, about which I am totally ignorant, might be a cool approach; but, if NLP is a good strategy here, it sort of doesn't matter because I have zero experience with it and don't have time to learn & then implement/ this feature isn't worth it to the client.

Search Results

Search found 1933 results on 78 pages for 'genetic algorithms'.

Page 18/78 | < Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25 | Next Page >

- by Andy

- by Salehin Suhaimi

- by AlanChavez

- by dab

- by Nicholas Hill

- by rooznom

- by burtonic

- by pgras

- by rd.

- by Qwertie

- by Shashwat

- by Jakie

- by user10326

- by Shaamaan

- by Giorgio

- by user67081

- by burtonic

- by Reed

- by cobie

- by System Down

- by DeaconDesperado

- by TheCodeJunkie

- by SeanWM

- by XaviEsteve

- by Jonline

< Previous Page | 14 15 16 17 18 19 20 21 22 23 24 25 | Next Page >