Search Results

Search found 114 results on 5 pages for 'median'.

Page 1/5 | 1 2 3 4 5  | Next Page >

  • Confused about definition of a 'median' when constructing a kd-Tree

    - by user352636
    Hi there. Im trying to build a kd-tree for searching through a set of points, but am getting confused about the use of 'median' in the wikipedia article. For ease of use, the wikipedia article states the pseudo-code of kd-tree construction as: function kdtree (list of points pointList, int depth) { if pointList is empty return nil; else { // Select axis based on depth so that axis cycles through all valid values var int axis := depth mod k; // Sort point list and choose median as pivot element select median by axis from pointList; // Create node and construct subtrees var tree_node node; node.location := median; node.leftChild := kdtree(points in pointList before median, depth+1); node.rightChild := kdtree(points in pointList after median, depth+1); return node; } } I'm getting confused about the "select median..." line, simply because I'm not quite sure what is the 'right' way to apply a median here. As far as I know, the median of an odd-sized (sorted) list of numbers is the middle element (aka, for a list of 5 things, element number 3, or index 2 in a standard zero-based array), and the median of an even-sized array is the sum of the two 'middle' elements divided by two (aka, for a list of 6 things, the median is the sum of elements 3 and 4 - or 2 and 3, if zero-indexed - divided by 2.). However, surely that definition does not work here as we are working with a distinct set of points? How then does one choose the correct median for an even-sized list of numbers, especially for a length 2 list? I appreciate any and all help, thanks! -Stephen

    Read the article

  • python how to find the median of a list

    - by user3450574
    I'm trying to write a function named median that takes a list as an input and returns the median value of the list. I'm working with Python 2.7.2 The list can be of any size and the numbers are not guaranteed to be in any particular order. If the list contains an even number of elements, the function should return the average of the middle two. This is the code I'm starting with: def median(list): print(median([7,12,3,1,6,9]))

    Read the article

  • Parallel computation of the median of a large array

    - by recipriversexclusion
    I got asked this question once and still haven't been able to figure it out: You have an array of N integers, where N is large, say, a billion. You want to calculate the median value of this array. Assume you have m+1 machines (m workers, one master) to distribute the job to. How would you go about doing this? Since the median is a nonlinear operator, you can't just find the median in each machine and then take the median of those values.

    Read the article

  • Is the Leptonica implementation of 'Modified Median Cut' not using the median at all?

    - by TheCodeJunkie
    I'm playing around a bit with image processing and decided to read up on how color quantization worked and after a bit of reading I found the Modified Median Cut Quantization algorithm. I've been reading the code of the C implementation in Leptonica library and came across something I thought was a bit odd. Now I want to stress that I am far from an expert in this area, not am I a math-head, so I am predicting that this all comes down to me not understanding all of it and not that the implementation of the algorithm is wrong at all. The algorithm states that the vbox should be split along the lagest axis and that it should be split using the following logic The largest axis is divided by locating the bin with the median pixel (by population), selecting the longer side, and dividing in the center of that side. We could have simply put the bin with the median pixel in the shorter side, but in the early stages of subdivision, this tends to put low density clusters (that are not considered in the subdivision) in the same vbox as part of a high density cluster that will outvote it in median vbox color, even with future median-based subdivisions. The algorithm used here is particularly important in early subdivisions, and 3is useful for giving visible but low population color clusters their own vbox. This has little effect on the subdivision of high density clusters, which ultimately will have roughly equal population in their vboxes. For the sake of the argument, let's assume that we have a vbox that we are in the process of splitting and that the red axis is the largest. In the Leptonica algorithm, on line 01297, the code appears to do the following Iterate over all the possible green and blue variations of the red color For each iteration it adds to the total number of pixels (population) it's found along the red axis For each red color it sum up the population of the current red and the previous ones, thus storing an accumulated value, for each red note: when I say 'red' I mean each point along the axis that is covered by the iteration, the actual color may not be red but contains a certain amount of red So for the sake of illustration, assume we have 9 "bins" along the red axis and that they have the following populations 4 8 20 16 1 9 12 8 8 After the iteration of all red bins, the partialsum array will contain the following count for the bins mentioned above 4 12 32 48 49 58 70 78 86 And total would have a value of 86 Once that's done it's time to perform the actual median cut and for the red axis this is performed on line 01346 It iterates over bins and check they accumulated sum. And here's the part that throws me of from the description of the algorithm. It looks for the first bin that has a value that is greater than total/2 Wouldn't total/2 mean that it is looking for a bin that has a value that is greater than the average value and not the median ? The median for the above bins would be 49 The use of 43 or 49 could potentially have a huge impact on how the boxes are split, even though the algorithm then proceeds by moving to the center of the larger side of where the matched value was.. Another thing that puzzles me a bit is that the paper specified that the bin with the median value should be located, but does not mention how to proceed if there are an even number of bins.. the median would be the result of (a+b)/2 and it's not guaranteed that any of the bins contains that population count. So this is what makes me thing that there are some approximations going on that are negligible because of how the split actually takes part at the center of the larger side of the selected bin. Sorry if it got a bit long winded, but I wanted to be as thoroughas I could because it's been driving me nuts for a couple of days now ;)

    Read the article

  • Function to Calculate Median in Sql Server

    - by Yaakov Ellis
    According to MSDN, Median is not available as an aggregate function in Transact-Sql. However, I would like to find out whether it is possible to create this functionality (using the Create Aggregate function, user defined function, or some other method). What would be the best way (if possible) to do this - allow for the calculation of a median value (assuming a numeric data type) in an aggregate query?

    Read the article

  • Script to calculate the Median value for SQL Server data

    The standard SQL language has a number of aggregate functions like: SUM, MIN, MAX, AVG, but a common statistics function that SQL Server does not have is a built-in aggregate function for median. The median is the value that falls in the middle of a sorted resultset with equal parts that are smaller and equal parts that are greater. Since there is no built-in implementation for the median, the following is a simple solution I put together to find the median. Get smart with SQL Backup ProPowerful centralised management, encryption and more.SQL Backup Pro was the smartest kid at school. Discover why.

    Read the article

  • sql median value of column

    - by Mike
    Is there a way in sql (or tsql) to select the median of a column of numbers? I know you can sort by the column and pick the middle value, but it would be nice if there was a built-in function

    Read the article

  • is Boost Library's weighted median broken?

    - by user624188
    I confess that I am no expert in C++. I am looking for a fast way to compute weighted median, which Boost seemed to have. But it seems I am not able to make it work. #include <iostream> #include <boost/accumulators/accumulators.hpp> #include <boost/accumulators/statistics/stats.hpp> #include <boost/accumulators/statistics/median.hpp> #include <boost/accumulators/statistics/weighted_median.hpp> using namespace boost::accumulators; int main() { // Define an accumulator set accumulator_set<double, stats<tag::median > > acc1; accumulator_set<double, stats<tag::median >, float> acc2; // push in some data ... acc1(0.1); acc1(0.2); acc1(0.3); acc1(0.4); acc1(0.5); acc1(0.6); acc2(0.1, weight=0.); acc2(0.2, weight=0.); acc2(0.3, weight=0.); acc2(0.4, weight=1.); acc2(0.5, weight=1.); acc2(0.6, weight=1.); // Display the results ... std::cout << " Median: " << median(acc1) << std::endl; std::cout << "Weighted Median: " << median(acc2) << std::endl; return 0; } produces the following output, which is clearly wrong. Median: 0.3 Weighted Median: 0.3 Am I doing something wrong? Any help will be greatly appreciated. * however, the weighted sum works correctly * @glowcoder: The weighted sum works perfectly fine like this. #include <iostream> #include <boost/accumulators/accumulators.hpp> #include <boost/accumulators/statistics/stats.hpp> #include <boost/accumulators/statistics/sum.hpp> #include <boost/accumulators/statistics/weighted_sum.hpp> using namespace boost::accumulators; int main() { // Define an accumulator set accumulator_set<double, stats<tag::sum > > acc1; accumulator_set<double, stats<tag::sum >, float> acc2; // accumulator_set<double, stats<tag::median >, float> acc2; // push in some data ... acc1(0.1); acc1(0.2); acc1(0.3); acc1(0.4); acc1(0.5); acc1(0.6); acc2(0.1, weight=0.); acc2(0.2, weight=0.); acc2(0.3, weight=0.); acc2(0.4, weight=1.); acc2(0.5, weight=1.); acc2(0.6, weight=1.); // Display the results ... std::cout << " Median: " << sum(acc1) << std::endl; std::cout << "Weighted Median: " << sum(acc2) << std::endl; return 0; } and the result is Sum: 2.1 Weighted Sum: 1.5

    Read the article

  • What is the right approach when using STL container for median calculation?

    - by sharkin
    Let's say I need to retrieve the median from a sequence of 1000000 random numeric values. If using anything but STL::list, I have no (built-in) way to sort sequence for median calculation. If using STL::list, I can't randomly access values to retrieve middle (median) of sorted sequence. Is it better to implement sorting myself and go with e.g. STL::vector, or is it better to use STL::list and use STL::list::iterator to for-loop-walk to the median value? The latter seems less overheadish, but also feels more ugly.. Or are there more and better alternatives for me?

    Read the article

  • How to calculate median of a Map<Int,Int>?

    - by Chris
    For a map where the key represents a number of a sequence and the value the count how often this number appeared in the squence, how would an implementation of an algorithm in java look like to calculate the median? For example: 1,1,2,2,2,2,3,3,3,4,5,6,6,6,7,7 in a map: Map<Int,Int> map = ... map.put(1,2) map.put(2,4) map.put(3,3) map.put(4,1) map.put(5,1) map.put(6,3) map.put(7,2) double median = calculateMedian(map); print(median); would result in: > print(median); 3 > So what i am looking for is a java implementation of calculateMedian.

    Read the article

  • Create a fast algorithm for a "weighted" median

    - by Hameer Abbasi
    Suppose we have a set S with k elements of 2-dimensional vectors, (x, n). What would be the most efficient algorithm to calculate the median of the weighted set? By "weighted set", I mean that the number x has a weight n. Here is an example (inefficient due to sorting) algorithm, where Sx is the x-part, and Sn is the n-part. Assume that all co-ordinate pairs are already arranged in Sx, with the respective changes also being done in Sn, and the sum of n is sumN: sum <= 0; i<= 0 while(sum < sumN) sum <= sum + Sn(i) ++i if(sum > sumN/2) return Sx(i) else return (Sx(i)*Sn(i) + Sx(i+1)*Sn(i+1))/(Sn(i) + Sn(i+1)) EDIT: Would this hold in two or more dimensions, if we were to calculate the median first in one dimension, then in another, with n being the sum along that dimension in the second pass?

    Read the article

  • Median calculation in ActionScript 3

    - by Gabor Kako
    I have a number array and I'd like to calculate the median. When the array is odd, the calculation is OK, when it's even strange number comes up. private var numbers:String = "2,5,3,4,6,1"; private var array:Array = numbers.split(","); private function getMedian(array:Array):Number { var sortnums:Array = array.sort(Array.NUMERIC); var length:Number = sortnums.length; var mid1:Number; var mid2:Number; var median:Number; if(length % 2 == 0){ mid1 = length / 2; trace("mid1: "+mid1); mid2= ((length - 1) / 2)-0.5; trace("mid2: "+mid2); trace ("mid1: "+sortnums[mid1]+", mid2: "+sortnums[mid2]); median = (sortnums[mid1] + sortnums[mid2]) / 2; }else{ mid1 = (length / 2)-0.5 median = sortnums[mid1] } trace (median); return median; } The result is 21.5, but should be 3.5 mid1 and mid2 are a position in the array. Could somebody help?

    Read the article

  • How can I calculate data for a boxplot (quartiles, median) in a Ralis app on Heroku? ( Heroku uses P

    - by hadees
    I'm trying to calculate the data needed to generate a box plot which means I need to figure out the 1st and 3rd Quartiles along with the median. I have found some solutions for doing it in Postgresql however they seem to depend on either PL/Python or PL/R which it seems like Heroku does not have either enabled for their postgresql databases. In fact I ran "select lanname from pg_language;" and only got back "internal". I also found some code to do it in pure ruby but that seems somewhat inefficient to me. I'm rather new to Box Plots, Postgresql, and Ruby on Rails so I'm open to suggestions on how I should handle this. There is a possibility to have a lot of data which is why I'm concerned with performance however if the solution ends up being too complex I may just do it in ruby and if my application gets big enough to warrant it get my own Postgresql I can host somewhere else. *note: since I was only able to post one link, cause I'm new, I decided to share a pastie with some relevant information

    Read the article

  • How can I calculate data for a boxplot (quartiles, median) in a Rails app on Heroku? (Heroku uses Po

    - by hadees
    I'm trying to calculate the data needed to generate a box plot which means I need to figure out the 1st and 3rd Quartiles along with the median. I have found some solutions for doing it in Postgresql however they seem to depend on either PL/Python or PL/R which it seems like Heroku does not have either enabled for their postgresql databases. In fact I ran "select lanname from pg_language;" and only got back "internal". I also found some code to do it in pure ruby but that seems somewhat inefficient to me. I'm rather new to Box Plots, Postgresql, and Ruby on Rails so I'm open to suggestions on how I should handle this. There is a possibility to have a lot of data which is why I'm concerned with performance however if the solution ends up being too complex I may just do it in ruby and if my application gets big enough to warrant it get my own Postgresql I can host somewhere else. *note: since I was only able to post one link, cause I'm new, I decided to share a pastie with some relevant information

    Read the article

  • Getting the median of 3 values using scheme's car & cdr

    - by kristian Roger
    The problem this time is to get the median of three values (easy) I did this: (define (med x y z) (car(cdr(x y z))) and it was accepted but when testing it: (med 3 4 5) I get this error: Error: attempt to call a non-procedure (2 3 4) And when entering letters instead of number i get: (md x y z) Error: undefined varia y (package user) Using something besides x y z I get: (md d l m) Error: undefined variable d (package user) the question was deleted dont know how anyway write a function that return the median of 3 values

    Read the article

  • Median Filter a bi-level image with JAI

    - by Mark
    I'd like to apply a Median Filter to a bi-level image and output a bi-level image. The JAI median filter seems to output an RGB image, which I'm having trouble downconverting back to bi-level. Currently I can't even get the image back into gray color-space, my code looks like this: BufferedImage src; // contains a bi-level image ParameterBlock pb = new ParameterBlock(); pb.addSource(src); pb.add(MedianFilterDescriptor.MEDIAN_MASK_SQUARE); pb.add(3); RenderedOp result = JAI.create("MedianFilter", pb); ParameterBlock pb2 = new ParameterBlock(); pb2.addSource(result); pb2.add(new double[][]{{0.33, 0.34, 0.33, 0}}); RenderedOp grayResult = JAI.create("BandCombine", pb2); BufferedImage foo = grayResult.getAsBufferedImage(); This code hangs on the grayResult line and appears not to return. I assume that I'll eventually need to call the "Binarize" operation in JAI. Edit: Actually, the code appears to be stalling once I call getAsBufferedImage(), but returns nearly instantly when the second operation ("BandCombine") is removed. Is there a better way to keep the Median Filtering in the source color domain? If not, how do I downconvert back to binary?

    Read the article

  • Getting the median of 3 values using scheme's

    - by kristian Roger
    The problem this time is to get the median of three values (easy) I did this: (define (med x y z) (car(cdr(x y z))) and it was accepted but when testing it: (med 3 4 5) I get this error: Error: attempt to call a non-procedure (2 3 4) And when entering letters instead of number i get: (md x y z) Error: undefined varia y (package user) Using something besides x y z I get: (md d l m) Error: undefined variable d (package user) the question was deleted dont know how anyway write a function that return the median of 3 values Sorry for editing the question I got that I should put the values in order first not just a sill car and cdr thing so I did so 33> (define (med x y z) (if(and( (<x y) (<y z) y if(and( (<y x) (<x z) x z))))) Warning: invalid expression (if (and< (<x y) (<y z) y if (and ((<y x) (<x z) x z)))) but as u see Im getting a warning so what is wronge ?

    Read the article

  • building median for each string in IList<IDictionary<string, double>>

    - by Oliver
    Actually i have a little brain bender and and just can't find the right direction to get it to work: Given is an IList<IDictionary<string, double>> and it is filled as followed: Name|Value ----+----- "x" | 3.8 "y" | 4.2 "z" | 1.5 ----+----- "x" | 7.2 "y" | 2.9 "z" | 1.3 ----+----- ... | ... To fill this up with some random data i used the following methods: var list = CreateRandomPoints(new string[] { "x", "y", "z" }, 20); This will work as followed: private IList<IDictionary<string, double>> CreateRandomPoints(string[] variableNames, int count) { var list = new List<IDictionary<string, double>>(count); list.AddRange(CreateRandomPoints(variableNames).Take(count)); return list; } private IEnumerable<IDictionary<string, double>> CreateRandomPoints(string[] variableNames) { while (true) yield return CreateRandomLine(variableNames); } private IDictionary<string, double> CreateRandomLine(string[] variableNames) { var dict = new Dictionary<string, double>(variableNames.Length); foreach (var variable in variableNames) { dict.Add(variable, _Rand.NextDouble() * 10); } return dict; } Also i can say that it is already ensured that every Dictionary within the list contains the same keys (but from list to list the names and count of the keys can change). So that's what i got. Now to the things i need: I'd like to get the median (or any other math aggregate operation) of each Key within all the dictionaries, so that my function to call would look something like: IDictionary<string, double> GetMedianOfRows(this IList<IDictionary<string, double>> list) The best would be to give some kind of aggregate operation as a parameter to the function to make it more generic (don't know if the func has the correct parameters, but should imagine what i'd like to do): private IDictionary<string, double> Aggregate(this IList<IDictionary<string, double>> list, Func<IEnumerable<double>, double> aggregator) Also my actual biggest problem is to do the job with a single iteration over the list, cause if within the list are 20 variables with 1000 values i don't like to iterate 20 times over the list. Instead i would go one time over the list and compute all twenty variables at once (The biggest advantage of doing it that way would be to use this also as IEnumerable<T> on any part of the list in a later step). So here is the code i already got: public static IDictionary<string, double> GetMedianOfRows(this IList<IDictionary<string, double>> list) { //Check of parameters is left out! //Get first item for initialization of result dictionary var firstItem = list[0]; //Create result dictionary and fill in variable names var dict = new Dictionary<string, double>(firstItem.Count); //Iterate over the whole list foreach (IDictionary<string, double> row in list) { //Iterate over each key/value pair within the list foreach (var kvp in row) { //How to determine median of all values? } } return dict; } Just to be sure here is a little explanation about the Median.

    Read the article

  • How to sort a boxplot by the median values in pandas

    - by Chris
    I've got a dataframe outcome2 that I generate a grouped boxplot with in the following manner: In [11]: outcome2.boxplot(column='Hospital 30-Day Death (Mortality) Rates from Heart Attack',by='State') plt.ylabel('30 Day Death Rate') plt.title('30 Day Death Rate by State') Out [11]: What I'd like to do is sort the plot by the median for each state, instead of alphabetically. Not sure how to go about doing so.

    Read the article

  • geting the median of 3 values using sheme car & cdr

    - by kristian Roger
    Hi still stuck with the ugly scheme the problem this time is to get the median of three values (easy) I did all these : (define (med x y z) (car(cdr(x y z))) and it was accepted but when testing it (med 3 4 5) I will get this error Error: attempt to call a non-procedure (2 3 4) and when entering letters inetead of number i got (md x y z) Error: undefined varia y (package user) using somthin else than x y z i got (md d l m) Error: undefined variable d (package user) so what is wronge ?!

    Read the article

  • Boost multi_index ordered_unique Median Value

    - by Autopulated
    I would like to quickly retrieve the median value from a boost multi_index container with an ordered_unique index, however the index iterators aren't random access (I don't understand why they can't be, though this is consistent with std::set...). Is there a faster/neater way to do this other than incrementing an iterator container.size() / 2 times?

    Read the article

  • Performance issues when using SSD for a developer notebook (WAMP/LAMP stack)?

    - by András Szepesházi
    I'm a web application developer using my notebook as a standalone development environment (WAMP stack). I just switched from a Core2-duo Vista 32 bit notebook with 2Gb RAM and SATA HDD, to an i5-2520M Win7 64 bit with 4Gb RAM and 128 GB SDD (Corsair P3 128). My initial experience was what I expected, fast boot, quick load of all the applications (Eclipse takes now 5 seconds as opposed to 30s on my old notebook), overall great experience. Then I started to build up my development stack, both as LAMP (using VirtualBox with a debian guest) and WAMP (windows native apache + mysql + php). I wanted to compare those two. This still all worked great out, then I started to pull in my projects to these stacks. And here came the nasty surprise, one of those projects produced a lot worse response times than on my old notebook (that was true for both the VirtualBox and WAMP stack). Apache, php and mysql configurations were practically identical in all environments. I started to do a lot of benchmarking and profiling, and here is what I've found: All general benchmarks (Performance Test 7.0, HDTune Pro, wPrime2 and some more) gave a big advantage to the new notebook. Nothing surprising here. Disc specific tests showed that read/write operations peaked around 380M/160M for the SSD, and all the different sized block operations also performed very well. Started apache performance benchmarking with Apache Benchmark for a small static html file (10 concurrent threads, 500 iterations). Old notebook: min 47ms, median 111ms, max 156ms New WAMP stack: min 71ms, median 135ms, max 296ms New LAMP stack (in VirtualBox): min 6ms, median 46ms, max 175ms Right here I don't get why the native WAMP stack performed so bad, but at least the LAMP environment brought the expected speed. Apache performance measurement for non-cached php content. The php runs a loop of 1000 and generates sha1(uniqid()) inisde. Again, 10 concurrent threads, 500 iterations were used for the benchmark. Old notebook: min 0ms, median 39ms, max 218ms New WAMP stack: min 20ms, median 61ms, max 186ms New LAMP stack (in VirtualBox): min 124ms, median 704ms, max 2463ms What the hell? The new LAMP performed miserably, and even the new native WAMP was outperformed by the old notebook. php + mysql test. The test consists of connecting to a database and reading a single record form a table using INNER JOIN on 3 more (indexed) tables, repeated 100 times within a loop. Databases were identical. 10 concurrent threads, 100 iterations were used for the benchmark. Old notebook: min 1201ms, median 1734ms, max 3728ms New WAMP stack: min 367ms, median 675ms, max 1893ms New LAMP stack (in VirtualBox): min 1410ms, median 3659ms, max 5045ms And the same test with concurrency set to 1 (instead of 10): Old notebook: min 1201ms, median 1261ms, max 1357ms New WAMP stack: min 399ms, median 483ms, max 539ms New LAMP stack (in VirtualBox): min 285ms, median 348ms, max 444ms Strictly for my purposes, as I'm using a self contained development environment (= low concurrency) I could be satisfied with the second test's result. Though I have no idea why the VirtualBox environment performed so bad with higher concurrency. Finally I performed a test of including many php files. The application that I mentioned at the beginning, the one that was performing so bad, has a heavy bootstrap, loads hundreds of small library and configuration files while initializing. So this test does nothing else just includes about 100 files. Concurrency set to 1, 100 iterations: Old notebook: min 140ms, median 168ms, max 406ms New WAMP stack: min 434ms, median 488ms, max 604ms New LAMP stack (in VirtualBox): min 413ms, median 1040ms, max 1921ms Even if I consider that VirtualBox reached those files via shared folders, and that slows things down a bit, I still don't see how could the old notebook outperform so heavily both new configurations. And I think this is the real root of the slow performance, as the application uses even more includes, and the whole bootstrap will occur several times within a page request (for each ajax call, for example). To sum it up, here I am with a brand new high-performance notebook that loads the same page in 20 seconds, that my old notebook can do in 5-7 seconds. Needless to say, I'm not a very happy person right now. Why do you think I experience these poor performance values? What are my options to remedy this situation?

    Read the article

  • What is the best way to keep track of the median?

    - by Steven Mou
    I read a question in one book: Numbers are randomly generated and stored into an (expanding) array, How would you keep track of the median? There are two data structures can solve the problem. One is the balanced binary tree, the other is two heaps which keep trace of the biggest half and the smallest half of the elements. I think these two solutions has the same running time as O(n lg n), but I am not sure of my judgement. In your opinions, What is the best way to keep track of the median?

    Read the article

1 2 3 4 5  | Next Page >