reservoir sampling

reservoir sampling problem: correctness of proof

- by eSKay

This MSDN article proves the correctness of Reservoir Sampling algorithm as follows: Base case is trivial. For the k+1st case, the probability a given element i with position <= k is in R is s/k. The probability i is replaced is the probability k+1st element is chosen multiplied by i being chosen to be replaced, which is: s/(k+1) * 1/s = 1/(k+1), and prob that i is not replaced is k/k+1. So any given element's probability of lasting after k+1 rounds is: (chosen in k steps, and not removed in k steps) = s/k * k/(k+1), which is s/(k+1). So, when k+1 = n, any element is present with probability s/n. about step 3: What are the k+1 rounds mentioned? What is chosen in k steps, and not removed in k steps? Why are we only calculating this probability for elements that were already in R after the first s steps?

Read the article

reservoir sampling problem

- by eSKay

This MSDN article proves the correctness of Reservoir Sampling algorithm as follows: Base case is trivial. For the k+1st case, the probability a given element i with position <= k is in R is s/k. The probability i is replaced is the probability k+1st element is chosen multiplied by i being chosen to be replaced, which is: s/(k+1) * 1/s = 1/(k+1), and prob that i is not replaced is k/k+1. So any given element's probability of lasting after k+1 rounds is: (chosen in k steps, and not removed in k steps) = s/k * k/(k+1), which is s/(k+1). So, when k+1 = n, any element is present with probability s/n. about step 3: What are the k+1 rounds mentioned? What is chosen in k steps, and not removed in k steps? Why are we only calculating this probability for elements that were already in R after the first s steps?

Read the article

- by Codenotguru

to retrieve k random numbers from an array of undetermined size we use a technique called reservoir sampling. Can anybody briefly highlight how it happens with a sample code??

Read the article

audio stream sampling rate in linux

- by farhan

Im trying read and store samples from an audio microphone in linux using C/C++. Using PCM ioctls i setup the device to have a certain sampling rate say 10Khz using the SOUND_PCM_WRITE_RATE ioctl etc. The device gets setup correctly and im able to read back from the device after setup using the "read". int got = read(itsFd, b.getDataPtr(), b.sizeBytes()); The problem i have is that after setting the appropriate sampling rate i have a thread that continuously reads from /dev/dsp1 and stores these samples, but the number of samples that i get for 1 second of recording are way off the sampling rate and always orders of magnitude more than the set sampling rate. Any ideas where to begin on figuring out what might be the problem?

Read the article

Structural and Sampling (JavaScript) Profiling in Google Chrome

Structural and Sampling (JavaScript) Profiling in Google Chrome Slow JavaScript code on your pages? Chrome provides both a sampling, and a structural profiler to help you track down, isolate, and fix the underlying problem. Tune in to learn how to use both profilers, and how to improve your own workflow to build better, faster browser applications! We'll talk about chrome://tracing, the built-in JS profiler in DevTools, and much more. From: GoogleDevelopers Views: 0 3 ratings Time: 01:00:00 More in Science & Technology

Read the article

Sampling Duplicates

- by user3640982

I have a dataset from which I need to sample. It is set up with an ID field and a year field. I want every record from the most current year and then I want the most current ID's but sampled from every 3rd year going back. The data is ordered by year. For example ID<-rep(1:3, 5) Year<-rep(c(1,2,3,4,5),each=3) df<-data.frame(ID,Year) ID Year 1 1 1 2 2 1 3 3 1 4 1 2 5 2 2 6 3 2 7 1 3 8 2 3 9 3 3 10 1 4 11 2 4 12 3 4 13 1 5 14 2 5 15 3 5 So from this example, I would want to return ID Year 1 1 1 2 2 1 3 3 1 4 1 4 5 2 4 6 3 4 I'm thinking that some combination of duplicated() and which() should get what I want, but the problem is duplicated() just tells if it has been repeated; it doesn't say which record is being repeated. which(duplicated(df$ID)) [1] 4 5 6 7 8 9 10 11 12 13 14 15 This a problem since not every ID exists in every year. Any help would be appreciated. Thanks, Eric

Read the article

Try a sample: Using the counter predicate for event sampling

- by extended_events

Extended Events offers a rich filtering mechanism, called predicates, that allows you to reduce the number of events you collect by specifying criteria that will be applied during event collection. (You can find more information about predicates in Using SQL Server 2008 Extended Events (by Jonathan Kehayias)) By evaluating predicates early in the event firing sequence we can reduce the performance impact of collecting events by stopping event collection when the criteria are not met. You can specify predicates on both event fields and on a special object called a predicate source. Predicate sources are similar to action in that they typically are related to some type of global information available from the server. You will find that many of the actions available in Extended Events have equivalent predicate sources, but actions and predicates sources are not the same thing. Applying predicates, whether on a field or predicate source, is very similar to what you are used to in T-SQL in terms of how they work; you pick some field/source and compare it to a value, for example, session_id = 52. There is one predicate source that merits special attention though, not just for its special use, but for how the order of predicate evaluation impacts the behavior you see. I’m referring to the counter predicate source. The counter predicate source gives you a way to sample a subset of events that otherwise meet the criteria of the predicate; for example you could collect every other event, or only every tenth event. Simple CountingThe counter predicate source works by creating an in memory counter that increments every time the predicate statement is evaluated. Here is a simple example with my favorite event, sql_statement_completed, that only collects the second statement that is run. (OK, that’s not much of a sample, but this is for demonstration purposes. Here is the session definition: CREATE EVENT SESSION counter_test ON SERVERADD EVENT sqlserver.sql_statement_completed (ACTION (sqlserver.sql_text) WHERE package0.counter = 2)ADD TARGET package0.ring_bufferWITH (MAX_DISPATCH_LATENCY = 1 SECONDS) You can find general information about the session DDL syntax in BOL and from Pedro’s post Introduction to Extended Events. The important part here is the WHERE statement that defines that I only what the event where package0.count = 2; in other words, only the second instance of the event. Notice that I need to provide the package name along with the predicate source. You don’t need to provide the package name if you’re using event fields, only for predicate sources. Let’s say I run the following test queries: -- Run three statements to test the sessionSELECT 'This is the first statement'GOSELECT 'This is the second statement'GOSELECT 'This is the third statement';GO Once you return the event data from the ring buffer and parse the XML (see my earlier post on reading event data) you should see something like this: event_name sql_text sql_statement_completed SELECT ‘This is the second statement’ You can see that only the second statement from the test was actually collected. (Feel free to try this yourself. Check out what happens if you remove the WHERE statement from your session. Go ahead, I’ll wait.) Percentage Sampling OK, so that wasn’t particularly interesting, but you can probably see that this could be interesting, for example, lets say I need a 25% sample of the statements executed on my server for some type of QA analysis, that might be more interesting than just the second statement. All comparisons of predicates are handled using an object called a predicate comparator; the simple comparisons such as equals, greater than, etc. are mapped to the common mathematical symbols you know and love (eg. = and >), but to do the less common comparisons you will need to use the predicate comparators directly. You would probably look to the MOD operation to do this type sampling; we would too, but we don’t call it MOD, we call it divides_by_uint64. This comparator evaluates whether one number is divisible by another with no remainder. The general syntax for using a predicate comparator is pred_comp(field, value), field is always first and value is always second. So lets take a look at how the session changes to answer our new question of 25% sampling: CREATE EVENT SESSION counter_test_25 ON SERVERADD EVENT sqlserver.sql_statement_completed (ACTION (sqlserver.sql_text) WHERE package0.divides_by_uint64(package0.counter,4))ADD TARGET package0.ring_bufferWITH (MAX_DISPATCH_LATENCY = 1 SECONDS)GO Here I’ve replaced the simple equivalency check with the divides_by_uint64 comparator to check if the counter is evenly divisible by 4, which gives us back every fourth record. I’ll leave it as an exercise for the reader to test this session. Why order matters I indicated at the start of this post that order matters when it comes to the counter predicate – it does. Like most other predicate systems, Extended Events evaluates the predicate statement from left to right; as soon as the predicate statement is proven false we abandon evaluation of the remainder of the statement. The counter predicate source is only incremented when it is evaluated so whether or not the counter is incremented will depend on where it is in the predicate statement and whether a previous criteria made the predicate false or not. Here is a generic example: Pred1: (WHERE statement_1 AND package0.counter = 2)Pred2: (WHERE package0.counter = 2 AND statement_1) Let’s say I cause a number of events as follows and examine what happens to the counter predicate source. Iteration Statement Pred1 Counter Pred2 Counter A Not statement_1 0 1 B statement_1 1 2 C Not statement_1 1 3 D statement_1 2 4 As you can see, in the case of Pred1, statement_1 is evaluated first, when it fails (A & C) predicate evaluation is stopped and the counter is not incremented. With Pred2 the counter is evaluated first, so it is incremented on every iteration of the event and the remaining parts of the predicate are then evaluated. In this example, Pred1 would return an event for D while Pred2 would return an event for B. But wait, there is an interesting side-effect here; consider Pred2 if I had run my statements in the following order: Not statement_1 Not statement_1 statement_1 statement_1 In this case I would never get an event back from the system because the point at which counter=2, the rest of the predicate evaluates as false so the event is not returned. If you’re using the counter target for sampling and you’re not getting the expected events, or any events, check the order of the predicate criteria. As a general rule I’d suggest that the counter criteria should be the last element of your predicate statement since that will assure that your sampling rate will apply to the set of event records defined by the rest of your predicate. Aside: I’m interested in hearing about uses for putting the counter predicate criteria earlier in the predicate statement. If you have one, post it in a comment to share with the class. - Mike Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article

Very slow direct3D texture sampling

- by __dominic

Hi, So I'm writing a small game using Direct3D 9 and I'm using multitexturing for the terrain. All I'm doing is sampling 3 textures and a blend map and getting the overall color from the three textures based on the color channels from the blend map. Anyway, I am getting a massive frame rate drop when I sample more than 1 texture, I'm going from 120+ fps to just under 50. This is the HLSL code responsible for the slow down: float3 ground = tex2D(GroundTex, multiTex).rgb; float3 stone = tex2D(StoneTex, multiTex).rgb; float3 grass = tex2D(GrassTex, multiTex).rgb; float3 blend = tex2D(BlendMapTex, blendMap).rgb; Am I doing it wrong ? If anyone has any info or tips about texture sampling or anything, that would be nice. Thanks.

Read the article

To sample or not to sample...

- by [email protected]

Ideally, we would know the exact answer to every question. How many people support presidential candidate A vs. B? How many people suffer from H1N1 in a given state? Does this batch of manufactured widgets have any defective parts? Knowing exact answers is expensive in terms of time and money and, in most cases, is impractical if not impossible. Consider asking every person in a region for their candidate preference, testing every person with flu symptoms for H1N1 (assuming every person reported when they had flu symptoms), or destructively testing widgets to determine if they are "good" (leaving no product to sell). Knowing exact answers, fortunately, isn't necessary or even useful in many situations. Understanding the direction of a trend or statistically significant results may be sufficient to answer the underlying question: who is likely to win the election, have we likely reached a critical threshold for flu, or is this batch of widgets good enough to ship? Statistics help us to answer these questions with a certain degree of confidence. This focuses on how we collect data. In data mining, we focus on the use of data, that is data that has already been collected. In some cases, we may have all the data (all purchases made by all customers), in others the data may have been collected using sampling (voters, their demographics and candidate choice). Building data mining models on all of your data can be expensive in terms of time and hardware resources. Consider a company with 40 million customers. Do we need to mine all 40 million customers to get useful data mining models? The quality of models built on all data may be no better than models built on a relatively small sample. Determining how much is a reasonable amount of data involves experimentation. When starting the model building process on large datasets, it is often more efficient to begin with a small sample, perhaps 1000 - 10,000 cases (records) depending on the algorithm, source data, and hardware. This allows you to see quickly what issues might arise with choice of algorithm, algorithm settings, data quality, and need for further data preparation. Instead of waiting for a model on a large dataset to build only to find that the results don't meet expectations, once you are satisfied with the results on the initial sample, you can take a larger sample to see if model quality improves, and to get a sense of how the algorithm scales to the particular dataset. If model accuracy or quality continues to improve, consider increasing the sample size. Sampling in data mining is also used to produce a held-aside or test dataset for assessing classification and regression model accuracy. Here, we reserve some of the build data (data that includes known target values) to be used for an honest estimate of model error using data the model has not seen before. This sampling transformation is often called a split because the build data is split into two randomly selected sets, often with 60% of the records being used for model building and 40% for testing. Sampling must be performed with care, as it can adversely affect model quality and usability. Even a truly random sample doesn't guarantee that all values are represented in a given attribute. This is particularly troublesome when the attribute with omitted values is the target. A predictive model that has not seen any examples for a particular target value can never predict that target value! For other attributes, values may consist of a single value (a constant attribute) or all unique values (an identifier attribute), each of which may be excluded during mining. Values from categorical predictor attributes that didn't appear in the training data are not used when testing or scoring datasets. In subsequent posts, we'll talk about three sampling techniques using Oracle Database: simple random sampling without replacement, stratified sampling, and simple random sampling with replacement.

Read the article

Statistical Sampling for Verifying Database Backups

A DBA's huge workload can start to threaten best practices for data backup and recovery, but ingenuity, and an eye for a good tactic, can usually find a way. For Tom, the revelation about a solution came from eating crabs. Statistical sampling can be brought to bear to minimise the risk of faliure of an emergency database restore.

Read the article

Statistical Sampling for Verifying Database Backups

A DBA's huge workload can start to threaten best practices for data backup and recovery, but ingenuity, and an eye for a good tactic, can usually find a way. For Tom, the revelation about a solution came from eating crabs. Statistical sampling can be brought to bear to minimize the risk of failure of an emergency database restore.

Read the article

Constructive criticsm on my linear sampling Gaussian blur

- by Aequitas

I've been attempting to implement a gaussian blur utilising linear sampling, I've come across a few articles presented on the web and a question posed here which dealt with the topic. I've now attempted to implement my own Gaussian function and pixel shader drawing reference from these articles. This is how I'm currently calculating my weights and offsets: int support = int(sigma * 3.0) weights.push_back(exp(-(0*0)/(2*sigma*sigma))/(sqrt(2*pi)*sigma)); total += weights.back(); offsets.push_back(0); for (int i = 1; i <= support; i++) { float w1 = exp(-(i*i)/(2*sigma*sigma))/(sqrt(2*pi)*sigma); float w2 = exp(-((i+1)*(i+1))/(2*sigma*sigma))/(sqrt(2*pi)*sigma); weights.push_back(w1 + w2); total += 2.0f * weights[i]; offsets.push_back(w1 / weights[i]); } for (int i = 0; i < support; i++) { weights[i] /= total; } Here is an example of my vertical pixel shader: vec3 acc = texture2D(tex_object, v_tex_coord.st).rgb*weights[0]; vec2 pixel_size = vec2(1.0 / tex_size.x, 1.0 / tex_size.y); for (int i = 1; i < NUM_SAMPLES; i++) { acc += texture2D(tex_object, (v_tex_coord.st+(vec2(0.0, offsets[i])*pixel_size))).rgb*weights[i]; acc += texture2D(tex_object, (v_tex_coord.st-(vec2(0.0, offsets[i])*pixel_size))).rgb*weights[i]; } gl_FragColor = vec4(acc, 1.0); Am I taking the correct route with this? Any criticism or potential tips to improving my method would be much appreciated.

Read the article

Sampling Heightmap Edges for Normal map

- by pl12

I use a Sobel filter to generate normal maps from procedural height maps. The heightmaps are 258x258 pixels. I scale my texture coordinates like so: texCoord = (texCoord * (256/258)) + (1/258) Yet even with this I am left with the following problem: As you can see the edges of the normal map still proves to be problematic. Putting the texture wrap mode to "clamp" also proved no help. EDIT: The Sobel Filter function by sampling the 8 surrounding pixels around a given pixel so that a derivative can be calculated in order to find the "normal" of the given pixel. The texture coordinates are instanced once per quad (for the quadtree that makes up the world) and are created as follows (it is quite possible that the problem results from the way I scale and offset the texCoords as seen above): Java: for(int i = 0; i<vertices.length; i++){ Vector2f coord = new Vector2f((vertices[i].x)/(worldSize), (vertices[i].z)/( worldSize)); texCoords[i] = coord; } the quad used for input here rests on the X0Z plane. 'worldSize' is the diameter of the planet. No negative texCoords are seen as the quad used for input for this method is not centered around the origin. Is there something I am missing here? Thanks.

Read the article

Sampling SQL server batch activity

- by extended_events

Recently I was troubleshooting a performance issue on an internal tracking workload and needed to collect some very low level events over a period of 3-4 hours. During analysis of the data I found that a common pattern I was using was to find a batch with a duration that was longer than average and follow all the events it produced. This pattern got me thinking that I was discarding a substantial amount of event data that had been collected, and that it would be great to be able to reduce the collection overhead on the server if I could still get all activity from some batches. In the past I’ve used a sampling technique based on the counter predicate to build a baseline of overall activity (see Mikes post here). This isn’t exactly what I want though as there would certainly be events from a particular batch that wouldn’t pass the predicate. What I need is a way to identify streams of work and select say one in ten of them to watch, and sql server provides just such a mechanism: session_id. Session_id is a server assigned integer that is bound to a connection at login and lasts until logout. So by combining the session_id predicate source and the divides_by_uint64 predicate comparator we can limit collection, and still get all the events in batches for investigation. CREATE EVENT SESSION session_10_percent ON SERVER ADD EVENT sqlserver.sql_statement_starting( WHERE (package0.divides_by_uint64(sqlserver.session_id,10))), ADD EVENT sqlos.wait_info ( WHERE (package0.divides_by_uint64(sqlserver.session_id,10))), ADD EVENT sqlos.wait_info_external ( WHERE (package0.divides_by_uint64(sqlserver.session_id,10))), ADD EVENT sqlserver.sql_statement_completed( WHERE (package0.divides_by_uint64(sqlserver.session_id,10))) ADD TARGET ring_buffer WITH (MAX_DISPATCH_LATENCY=30 SECONDS,TRACK_CAUSALITY=ON) GO There we go; event collection is reduced while still providing enough information to find the root of the problem. By the way the performance issue turned out to be an IO issue, and the session definition above was more than enough to show long waits on PAGEIOLATCH*.

Read the article

Ask Tom - "On Dynamic Sampling"

Our technologist samples dynamically, considers usage, and sets levels.

Read the article

how to change sampling interval in Mathcad

- by Bo Tian

I have plotted a x-y graph in Mathcad v14. The default sampling interval for the x axis is 1. How can I change that to 1/1000. Thank you.

Read the article

Random Sampling in Excel

- by bonsvr

I have an Excel sheet as follows: NO NAME AMOUNT 1 A 50 1 B 50 2 A 100 2 C 100 3 D 70 3 B 70 4 A 30 4 F 30 5 C 150 5 G 150 . . . . There are let's say 10,000 rows. I want to get a random sample from rows. There are 2 conditions: 1. Sampling must be based on "NO" column. 2. Size of the sample is determined by the user: it can be %5, %10 or %20. For example, one decides to randomly choose %20 of total rows in the above example: The result is like: NO NAME AMOUNT 2 A 100 2 C 100 90 Z 500 90 E 500 . . . . There should be 2,000 rows. I don't know whether my question is too specific. I am new to Excel VBA, and I faced a situation like this. Above process is about getting a random sample from an account ledger for auditing purposes.

Read the article

Why can't I record 16khz sampling audio using my laptop?

- by KayKay

I want to know why my laptop can't record 16khz sampling audio. The sampling rates I can have using my laptop are higher than 16khz. e.g, 44khz, 48khz, 192khz, and so on... I need to record 16khz sampling audio using my laptop. Sound card in my laptop is Conexant 20671 SmartAudio HD Although I can record 16khz sampling by Sound Forge 8.0, I am doubt whether the recorded audio is really 16khz sampling or not. Because the sound card can't record 16khz sampling, I think there may be some problems on the recording process. Could you give me any hint why the sound card can't record 16khz? and any method to identify whether the recorded audio by Sound Forge 8.0 is really 16khz? Thanks.

Read the article

sampling integers uniformly efficiently in python using numpy/scipy

- by user248237

I have a problem where depending on the result of a random coin flip, I have to sample a random starting position from a string. If the sampling of this random position is uniform over the string, I thought of two approaches to do it: one using multinomial from numpy.random, the other using the simple randint function of Python standard lib. I tested this as follows: from numpy import * from numpy.random import multinomial from random import randint import time def use_multinomial(length, num_points): probs = ones(length)/float(length) for n in range(num_points): result = multinomial(1, probs) def use_rand(length, num_points): for n in range(num_points): rand(1, length) def main(): length = 1700 num_points = 50000 t1 = time.time() use_multinomial(length, num_points) t2 = time.time() print "Multinomial took: %s seconds" %(t2 - t1) t1 = time.time() use_rand(length, num_points) t2 = time.time() print "Rand took: %s seconds" %(t2 - t1) if __name__ == '__main__': main() The output is: Multinomial took: 6.58072400093 seconds Rand took: 2.35189199448 seconds it seems like randint is faster, but it still seems very slow to me. Is there a vectorized way to get this to be much faster, using numpy or scipy? thanks.

Read the article

Oracle - timed sampling from v$session_longops

- by FrustratedWithFormsDesigner

I am trying to track performance on some procedures that run too slow (and seem to keep getting slower). I am using v$session_longops to track how much work has been done, and I have a query (sofar/((v$session_longops.LAST_UPDATE_TIME-v$session_longops.start_time)*24*60*60)) that tells me the rate at which work is being done. What I'd like to be able to do is capture the rate at which work is being done and how it changes over time. Right now, I just re-execute the query manually, and then copy/paste to Excel. Not very optimal, especially when the phone rings or something else happens to interrupt my sampling frequency. Is there a way to have script in SQL*Plus run a query evern n seconds, spool the results to a file, and then continue doing this until the job ends? (Oracle 10g)

Read the article

How would i down-sample a .wav file then reconstruct it using nyquist? - in MATLAB

- by Andrew

This is all done in MATLAB 2010 My objective is to show the results of: undersampling, nyquist rate/ oversampling First i need to downsample the .wav file to get an incomplete/ or impartial data stream that i can then reconstuct. Heres the flow chart of what im going to be doing So the flow is analog signal - sampling analog filter - ADC - resample down - resample up - DAC - reconstruction analog filter what needs to be achieved: F= Frequency F(Hz=1/s) E.x. 100Hz = 1000 (Cyc/sec) F(s)= 1/(2f) Example problem: 1000 hz = Highest frequency 1/2(1000hz) = 1/2000 = 5x10(-3) sec/cyc or a sampling rate of 5ms This is my first signal processing project using matlab. what i have so far. % Fs = frequency sampled (44100hz or the sampling frequency of a cd) [test,fs]=wavread('test.wav'); % loads the .wav file left=test(:,1); % Plot of the .wav signal time vs. strength time=(1/44100)*length(left); t=linspace(0,time,length(left)); plot(t,left) xlabel('time (sec)'); ylabel('relative signal strength') **%this is were i would need to sample it at the different frequecys (both above and below and at) nyquist frequency.*I think.*** soundsc(left,fs) % shows the resaultant audio file , which is the same as original ( only at or above nyquist frequency however) Can anyone tell me how to make it better, and how to do the sampling at verious frequencies?

Read the article

Python 3.1 - Memory Error during sampling of a large list

- by jimy

The input list can be more than 1 million numbers. When I run the following code with smaller 'repeats', its fine; def sample(x): length = 1000000 new_array = random.sample((list(x)),length) return (new_array) def repeat_sample(x): i = 0 repeats = 100 list_of_samples = [] for i in range(repeats): list_of_samples.append(sample(x)) return(list_of_samples) repeat_sample(large_array) However, using high repeats such as the 100 above, results in MemoryError. Traceback is as follows; Traceback (most recent call last): File "C:\Python31\rnd.py", line 221, in <module> STORED_REPEAT_SAMPLE = repeat_sample(STORED_ARRAY) File "C:\Python31\rnd.py", line 129, in repeat_sample list_of_samples.append(sample(x)) File "C:\Python31\rnd.py", line 121, in sample new_array = random.sample((list(x)),length) File "C:\Python31\lib\random.py", line 309, in sample result = [None] * k MemoryError I am assuming I'm running out of memory. I do not know how to get around this problem. Thank you for your time!

Read the article

Matlab fft function

- by CTZStef

The code below is from the Matlab 2011a help about fft function. I think there is a problem here : why do they multiply t(1:50) by Fs, and then say it's time in millisecond ? Certainly, it happens to be true in this very particular case, but change the value of Fs to, say, 2000, and it won't work anymore, obviously because of this factor of 2. Right ? Quite misleading, isn't it ? What do I miss ? Fs = 1000; % Sampling frequency T = 1/Fs; % Sample time L = 1000; % Length of signal t = (0:L-1)*T; % Time vector % Sum of a 50 Hz sinusoid and a 120 Hz sinusoid x = 0.7*sin(2*pi*50*t) + sin(2*pi*120*t); y = x + 2*randn(size(t)); % Sinusoids plus noise plot(Fs*t(1:50),y(1:50)) title('Signal Corrupted with Zero-Mean Random Noise') xlabel('time (milliseconds)') Clearer with this : fs = 2000; % Sampling frequency T = 1 / fs; % Sample time L = 1000; % Length of signal t2 = (0:L-1)*T; % Time vector f = 50; % signal frequency s2 = sin(2*pi*f*t2); figure, plot(fs*t2(1:50),s2(1:50)); % NOT good figure, plot(t2(1:50),s2(1:50)); % good

Read the article

Estimating the boundary of arbitrarily distributed data

- by Dave

I have two dimensional discrete spatial data. I would like to make an approximation of the spatial boundaries of this data so that I can produce a plot with another dataset on top of it. Ideally, this would be an ordered set of (x,y) points that matplotlib can plot with the plt.Polygon() patch. My initial attempt is very inelegant: I place a fine grid over the data, and where data is found in a cell, a square matplotlib patch is created of that cell. The resolution of the boundary thus depends on the sampling frequency of the grid. Here is an example, where the grey region are the cells containing data, black where no data exists. OK, problem solved - why am I still here? Well.... I'd like a more "elegant" solution, or at least one that is faster (ie. I don't want to get on with "real" work, I'd like to have some fun with this!). The best way I can think of is a ray-tracing approach - eg: from xmin to xmax, at y=ymin, check if data boundary crossed in intervals dx y=ymin+dy, do 1 do 1-2, but now sample in y An alternative is defining a centre, and sampling in r-theta space - ie radial spokes in dtheta increments. Both would produce a set of (x,y) points, but then how do I order/link neighbouring points them to create the boundary? A nearest neighbour approach is not appropriate as, for example (to borrow from Geography), an isthmus (think of Panama connecting N&S America) could then close off and isolate regions. This also might not deal very well with the holes seen in the data, which I would like to represent as a different plt.Polygon. The solution perhaps comes from solving an area maximisation problem. For a set of points defining the data limits, what is the maximum contiguous area contained within those points To form the enclosed area, what are the neighbouring points for the nth point? How will the holes be treated in this scheme - is this erring into topology now? Apologies, much of this is me thinking out loud. I'd be grateful for some hints, suggestions or solutions. I suspect this is an oft-studied problem with many solution techniques, but I'm looking for something simple to code and quick to run... I guess everyone is, really! Cheers, David

Search Results

Search found 210 results on 9 pages for 'reservoir sampling'.

Page 1/9 | 1 2 3 4 5 6 7 8 9 | Next Page >

- by eSKay

- by eSKay

- by Codenotguru

- by farhan

- by user3640982

- by extended_events

- by __dominic

- by [email protected]

- by Aequitas

- by pl12

- by extended_events

- by Bo Tian

- by bonsvr

- by KayKay

- by user248237

- by FrustratedWithFormsDesigner

- by Andrew

- by jimy

- by CTZStef

- by Dave

1 2 3 4 5 6 7 8 9 | Next Page >