Building Stored Procedure to group data into ranges with roughly equal results in each bucket

Posted by Len on Stack Overflow See other posts from Stack Overflow or by Len
Published on 2010-03-28T21:24:32Z Indexed on 2010/03/28 21:43 UTC
Read the original article Hit count: 262

I am trying to build one procedure to take a large amount of data and create 5 range buckets to display the data. the buckets ranges will have to be set according to the results.

Here is my existing SP

GO
/****** Object:  StoredProcedure [dbo].[sp_GetRangeCounts]    Script Date: 03/28/2010 19:50:45 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[sp_GetRangeCounts]
    @idMenu int

AS

declare 
@myMin decimal(19,2),
@myMax decimal(19,2),
@myDif decimal(19,2),
@range1 decimal(19,2),
@range2 decimal(19,2), 
@range3 decimal(19,2),
@range4 decimal(19,2),
@range5 decimal(19,2),
@range6 decimal(19,2)

SELECT @myMin=Min(modelpropvalue), @myMax=Max(modelpropvalue)
FROM xmodelpropertyvalues where modelPropUnitDescriptionID=@idMenu 

set @myDif=(@myMax-@myMin)/5
set @range1=@myMin
set @range2=@myMin+@myDif
set @range3=@range2+@myDif
set @range4=@range3+@myDif
set @range5=@range4+@myDif
set @range6=@range5+@myDif

select @myMin,@myMax,@myDif,@range1,@range2,@range3,@range4,@range5,@range6

select t.range as myRange, count(*) as myCount
from (
  select case  
    when modelpropvalue between @range1 and @range2 then 'range1'
    when modelpropvalue between @range2 and @range3 then 'range2'
    when modelpropvalue between @range3 and @range4 then 'range3'
    when modelpropvalue between @range4 and @range5 then 'range4'
    when modelpropvalue between @range5 and @range6 then 'range5'
     end as range
  from xmodelpropertyvalues where modelpropunitDescriptionID=@idmenu) t
group by t.range order by t.range

This calculates the min and max value from my table, works out the difference between the two and creates 5 buckets. The problem is that if there are a small amount of very high (or very low) values then the buckets will appear very distorted - as in these results...

range1  2806
range2  296
range3  75
range5  1

Basically I want to rebuild the SP so it creates buckets with equal amounts of results in each. I have played around with some of the following approaches without quite nailing it...

SELECT modelpropvalue, NTILE(5) OVER (ORDER BY modelpropvalue) FROM xmodelpropertyvalues - this creates a new column with either 1,2,3,4 or 5 in it

 ROW_NUMBER()OVER (ORDER BY modelpropvalue) between @range1 and @range2
 ROW_NUMBER()OVER (ORDER BY modelpropvalue) between @range2 and @range3
  • or maybe i could allocate every record a row number then divide into ranges from this?

© Stack Overflow or respective owner

Related posts about sql-server

Related posts about stored-procedures