Search Results

Search found 1354 results on 55 pages for 'compute scalar'.


  • Rackspace Cloud Sites: Compute Cycles exploding. Very expensive.

    - by Jaap
    Hi all, since last week my compute cycles (CC) have gone through the roof (Rackspace Cloud Sites). Normally I stay under 10,000 cycles per month, but this month I already have more than 75,000 compute cycles. I don't have more visitors and I did not change anything in the code. I looked in the raw log files, but that didn't help either. This explosion of CC has already cost me more than 750 USD, and it is still counting. Does anyone know what to do? I contacted Rackspace last week, but I still have no solution or answer. Looks like Rackspace likes the money! Help! Thanks.

    Read the article

  • Confused about nova-network

    - by neo0
    I'm sorry, because this question isn't related to Ubuntu. I asked in the OpenStack forum, but that forum is not very active, so I hope someone with experience of OpenStack Nova can help me with my problem. I've read some explanations of nova-network and how to configure it, like this one from the wiki (http://wiki.openstack.org/UnderstandingFlatNetworking). I'm confused about one detail. If all traffic from the instances must go through the nova controller node, why do we still need a public interface on the nova-compute node? Is it necessary? What happens when a request comes from outside to an instance? For example, I have a controller node and a nova-compute node. On the nova-compute node I run an instance with a WordPress website. Then someone connects to the public IP of this instance. Does the request go directly from the router to the nova-compute node, or from the router to the controller node and then to the nova-compute node? Thank you!

    Read the article

  • Juju instances in agent-state: down after turning them off (and back on) on EC2

    - by Tyler McAdams
    I turned my Juju instances off on EC2 for a while, and after bringing them back online they seem to be in an odd state:
    [code]
    claude-vm@claude-vm-fusion:~/Documents/Shell Scripts$ juju status
    2012-11-17 17:06:44,094 INFO Connecting to environment...
    2012-11-17 17:06:45,590 INFO Connected to environment.
    machines:
      0:
        agent-state: not-started
        dns-name: ec2-54-242-142-196.compute-1.amazonaws.com
        instance-id: i-b0996fcf
        instance-state: running
      1:
        agent-state: down
        dns-name: ec2-50-19-186-245.compute-1.amazonaws.com
        instance-id: i-8c8375f3
        instance-state: running
      2:
        agent-state: down
        dns-name: ec2-54-242-255-238.compute-1.amazonaws.com
        instance-id: i-56807629
        instance-state: running
    services:
      wordpress:
        charm: cs:precise/wordpress-9
        exposed: true
        relations:
          db:
          - wordpress-db
          loadbalancer:
          - wordpress
        units:
          wordpress/0:
            agent-state: down
            machine: 2
            open-ports:
            - 80/tcp
            public-address: ec2-54-242-227-57.compute-1.amazonaws.com
      wordpress-db:
        charm: cs:precise/mysql-10
        relations:
          db:
          - wordpress
        units:
          wordpress-db/0:
            agent-state: down
            machine: 1
            public-address: ec2-54-242-212-177.compute-1.amazonaws.com
    2012-11-17 17:06:47,274 INFO 'status' command finished successfully
    [/code]
    Can I not take my instances down for a while? Or is this something else?

    Read the article

  • Cannot call scalar-valued CLR UDF from select ... from table statement

    - by Henrik B
    I have created a scalar-valued CLR UDF (user-defined function). It takes a time zone ID and a datetime and returns the datetime converted to that time zone. I can call it from a simple SELECT without problems: "select dbo.udfConvert('Romance Standard Time', @datetime)" (@datetime is of course a valid datetime variable). But if I call it passing in a datetime from a table, it fails: "select dbo.udfConvert('Romance Standard Time', StartTime) from sometable" (StartTime is of course a column of type datetime). The error message is: "Cannot find either column "dbo" or the user-defined function or aggregate "dbo.udfConvert", or the name is ambiguous." This message is really meant for beginners who have misspelled something, but since it works in one case and not in the other, I don't think I have misspelled anything. Any ideas?

    Read the article

  • Why does Perl complain "Can't modify constant item in scalar assignment"?

    - by joe
    I have this Perl subroutine that is causing a problem:
        sub new {
            my $class = shift;
            my $ldap_obj = Net::LDAP->new( 'test.company.com' ) or die "$@";
            my $self = {
                _ldap = $ldap_obj,
                _dn ='dc=users,dc=ldap,dc=company,dc=com',
                _dn_login = 'dc=login,dc=ldap,dc=company,dc=com',
                _description ='company',
            };
            # Print all the values just for clarification.
            bless $self, $class;
            return $self;
        }
    What is wrong with this code? I get this error:
        Can't modify constant item in scalar assignment at Core.pm line 12, near "$ldap_obj,"

    Read the article

  • Will an optimizing compiler remove calls to a method whose result will be multiplied by zero?

    - by Tim R.
    Suppose you have a computationally expensive method, Compute(p), which returns some float, and another method, Falloff(p), which returns another float from zero to one. If you compute Falloff(p) * Compute(p), will Compute(p) still run when Falloff(p) returns zero? Or would you need to write a special case to prevent Compute(p) from running unnecessarily? Theoretically, an optimizing compiler could determine that omitting Compute when Falloff returns zero would have no effect on the program. However, this is kind of hard to test, since if you have Compute output some debug data to determine whether it is running, the compiler would know not to omit it because of that debug info, resulting in sort of a Schrodinger's cat situation. I know the safe solution to this problem is just to add the special case, but I'm just curious.

    Read the article

  • Cleaner way to replace a scalar hash value with an array ref?

    - by user275455
    I am building a hash where the keys, associated with scalars, are not necessarily unique. The desired behavior is that if the key is unique, the value is the scalar; if the key is not unique, the value should be an array reference of the scalars associated with the key. Since the hash is built up iteratively, I don't know ahead of time whether the key is unique. Right now, I am doing something like this:
        if (!defined($hash{$key})) {
            $hash{$key} = $val;
        }
        elsif (ref($hash{$key}) ne 'ARRAY') {
            my @a;
            push(@a, $hash{$key});
            push(@a, $val);
            $hash{$key} = \@a;
        }
        else {
            push(@{$hash{$key}}, $val);
        }
    Is there a simpler way to do this?

    Read the article
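The usual simplification is to stop mixing scalars and array refs and always store an array ref: in Perl, `push @{ $hash{$key} }, $val;` creates the array on first use through autovivification, and readers then never need a `ref` check. The sketch below is a Python translation rather than Perl, placing the asker's "scalar until a second value arrives" layout next to the always-a-list alternative; the sample key/value pairs are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def build_mixed(pairs: Iterable[Tuple[str, Any]]) -> Dict[str, Any]:
    """Asker's layout: a lone value stays a scalar; a repeated key becomes a list."""
    result: Dict[str, Any] = {}
    for key, val in pairs:
        if key not in result:
            result[key] = val                  # first value: store the scalar
        elif not isinstance(result[key], list):
            result[key] = [result[key], val]   # second value: promote to a list
        else:
            result[key].append(val)            # later values: append
    return result


def build_lists(pairs: Iterable[Tuple[str, Any]]) -> Dict[str, List[Any]]:
    """Simpler layout: every key maps to a list, even when it has one value."""
    result: Dict[str, List[Any]] = defaultdict(list)
    for key, val in pairs:
        result[key].append(val)
    return dict(result)


pairs = [("a", 1), ("b", 2), ("a", 3), ("a", 4)]
print(build_mixed(pairs))  # {'a': [1, 3, 4], 'b': 2}
print(build_lists(pairs))  # {'a': [1, 3, 4], 'b': [2]}
```

Any code that reads the mixed layout has to check the type of every value, which is usually the stronger argument for the always-a-list form.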

  • T-SQL User-Defined Functions: the good, the bad, and the ugly (part 2)

    - by Hugo Kornelis
    In a previous blog post, I demonstrated just how much you can hurt your performance by encapsulating expressions and computations in a user-defined function (UDF). I focused on scalar functions that didn’t include any data access. In this post, I will complete the discussion on scalar UDFs by covering the effect of data access in a scalar UDF. Note that, like the previous post, this all applies to T-SQL user-defined functions only. SQL Server also supports CLR user-defined functions (written in...

    Read the article

  • T-SQL User-Defined Functions: the good, the bad, and the ugly (part 3)

    - by Hugo Kornelis
    I showed why T-SQL scalar user-defined functions are bad for performance in two previous posts. In this post, I will show that CLR scalar user-defined functions are bad as well (though not always quite as bad as T-SQL scalar user-defined functions). I will admit that I had not really planned to cover CLR in this series. But shortly after publishing the first part, I received an email from Adam Machanic, which basically said that I should make clear that the information in that post does not apply...

    Read the article

  • How to call a stored procedure with Hibernate?

    - by user367097
    Hi, I have an Oracle stored procedure GET_VENDOR_STATUS_COUNT(DOCUMENT_ID IN NUMBER, NOT_INVITED OUT NUMBER, INVITE_WITHDRAWN OUT NUMBER, ...; all the remaining parameters are OUT parameters. In the hbm file I have written:
        <sql-query name="getVendorStatus" callable="true">
            <return-scalar column="NOT_INVITED" type="string"/>
            <return-scalar column="INVITE_WITHDRAWN" type="string"/>
            <return-scalar column="INVITED" type="string"/>
            <return-scalar column="DISQUALIFIED" type="string"/>
            <return-scalar column="RESPONSE_AWAITED" type="string"/>
            <return-scalar column="RESPONSE_IN_PROGRESS" type="string"/>
            <return-scalar column="RESPONSE_RECEIVED" type="string"/>
            { call GET_VENDOR_STATUS_COUNT(:DOCUMENT_ID, :NOT_INVITED, :INVITE_WITHDRAWN, :INVITED, :DISQUALIFIED, :RESPONSE_AWAITED, :RESPONSE_IN_PROGRESS, :RESPONSE_RECEIVED) }
        </sql-query>
    In Java I have written:
        session.getNamedQuery("getVendorStatus").setParameter("DOCUMENT_ID", "DOCUMENT_ID").setParameter("NOT_INVITED", "NOT_INVITED") ...
    continuing like this for all the parameters. I am getting this SQL exception:
        18:29:33,056 WARN [JDBCExceptionReporter] SQL Error: 1006, SQLState: 72000
        18:29:33,056 ERROR [JDBCExceptionReporter] ORA-01006: bind variable does not exist
    Please let me know the exact process for calling a stored procedure from Hibernate. I do not want to use a JDBC callable statement.

    Read the article

  • Azure eBook Update #1 – 16 authors so far!

    - by Eric Nelson
    I just wanted to share with folks where we are up to with the Windows Azure eBook (check out the original post for full details). I have had lots of great submissions from folks with some awesome stuff to share on Azure. Currently we have 16 authors and 25 proposed articles. There are still a couple of days left to submit your proposal if you would like to get involved (see the original post), and below are some topic suggestions for which we don't currently have authors. It is official – I'm excited! :-)
    Article | Area | Accepted
    Wikipedia Explorer: A case study of how we did it and why | CaseStudy | Optional
    Patterns for the Windows Azure Platform (picking up 1 or 2 patterns that seem to be evolving) | Architecture | Optional
    Azure and cost-oriented architecture | Architecture | Yes
    Code walkthrough of a comprehensive application submitted to newCloudApp contest | CaseStudy | Yes
    Principles of highly scalable apps on Azure | Compute | Optional
    Auto-Scaling Azure | Compute | Yes
    Implementing a distributed cache using memcached with worker roles | Interop | Yes
    Building a content-based router service to direct requests to internal HTTP endpoints | Compute | Optional
    How to debug an Azure app with a custom TraceListener & the AppFabric Service Bus | AppFabric | Yes
    How to host Java apps in Azure | Interop | Yes
    Bing Maps Tile Servers using Azure Blob Storage | Interop | Yes
    Tricks for storing time and date fields in Table Storage | Storage | Yes
    Service Runtime in Windows Azure | Compute | Yes
    Azure Drive | Storage | Optional
    Queries in Azure Table | Storage | Optional
    Getting RubyOnRails running on Azure | Interop | Yes
    Consuming Azure services within Windows Phone | Interop | Yes
    De-risking Your First Azure Project | Architecture | Yes
    Designing for failure | Architecture | Optional
    Connecting to SQL Azure In x Minutes | SQLAzure | Yes
    Using Azure Table Service as a NoSQL store via the REST API | Storage | Yes
    Azure Table Service REST API | Storage | Optional
    Threading, Scalability and Reliability in the Cloud | Compute | Yes
    Azure Diagnostics | Compute | Yes
    5 steps to getting started with Windows Azure | Introduction | Yes
    The best tools for working with Windows Azure | Tools | Author Needed
    Understanding how SQL Azure works | SQLAzure | Author Needed
    Getting started with AppFabric Control Services | AppFabric | Author Needed
    Using the Microsoft Sync Framework with SQL Azure | SQLAzure | Author Needed
    Dallas - just a TV show or something more? | Dallas | Author Needed
    Comparing Azure to other cloud offerings | Interop | Author Needed
    Hybrid solutions using Azure and on-premise | Interop | Author Needed

    Read the article

  • JavaScript: Machine Constants Applicable?

    - by DavidB2013
    I write numerical routines for students of science and engineering (although they are freely available for use by anybody else as well) and am wondering how to properly use machine constants in a JavaScript program, or whether they are even applicable. For example, say I am writing a program in C++ that numerically computes the roots of the following equation:
        exp(-0.7x) + sin(3x) - 1.2x + 0.3546 = 0
    A root-finding routine should be able to compute roots to within the machine epsilon. In C++, this value is specified by the language: DBL_EPSILON. C++ also specifies the smallest and largest values that can be held by a float or double variable. However, how does this carry over to JavaScript? Since a JavaScript program runs in a web browser, I don't know what kind of computer will run the program, and JavaScript does not have corresponding predefined values for these quantities, how can I implement my own versions of these constants so that my programs compute results to as much accuracy as the computer running the web browser allows? My first draft is to simply copy over the literal constants from C++:
        FLT_MIN: 1.17549435082229e-038
        FLT_MAX: 3.40282346638529e+038
        DBL_EPSILON: 2.2204460492503131e-16
    I am also willing to write small code blocks that could compute these values for each machine on which the program is run. That way, a supercomputer might compute results to a higher accuracy than an old, low-end PC. BUT, I don't know whether such a routine would actually reach the underlying hardware, in which case I would be wasting my time. Does anybody here know how to compute and use (in JavaScript) values that correspond to machine constants in a compiled language? Is it worth my time to write small programs in JavaScript that compute DBL_EPSILON, FLT_MIN, FLT_MAX, etc. for use in numerical routines? Or am I better off simply assigning literal constants that come straight from C++ on a standard Windows PC?

    Read the article
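A point that may settle the question: JavaScript has a single `Number` type, which the ECMAScript specification defines as an IEEE 754 double-precision value, so its epsilon and range do not depend on the machine running the browser; a faster computer does not give more precision. The `FLT_*` constants describe single precision and do not apply to `Number`; the double-precision ones do (epsilon about 2.220446049250313e-16, largest finite value about 1.7976931348623157e308, smallest normal value about 2.2250738585072014e-308). Newer engines also expose `Number.EPSILON`, and `Number.MAX_VALUE` has long been available; note that `Number.MIN_VALUE` is the smallest subnormal (about 5e-324), not the `DBL_MIN` equivalent. The classic halving loop for measuring epsilon is sketched here in Python, whose `float` is also an IEEE 754 double on mainstream platforms (an assumption about the runtime); the same loop transliterates directly to JavaScript.

```python
import sys


def machine_epsilon() -> float:
    """Gap between 1.0 and the next larger double, found by repeated halving.

    Returns the smallest power of two eps for which 1.0 + eps still differs
    from 1.0, assuming round-to-nearest IEEE 754 double arithmetic.
    """
    eps = 1.0
    while 1.0 + eps / 2 != 1.0:
        eps /= 2
    return eps


eps = machine_epsilon()
print(eps)                                    # 2.220446049250313e-16
print(eps == sys.float_info.epsilon)          # True
print(sys.float_info.max, sys.float_info.min) # 1.7976931348623157e+308 2.2250738585072014e-308
```

Because every conforming engine uses the same double format, computing the value at run time and hard-coding it should give identical results; the loop mainly serves as a check.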

  • How to update non-scalar entity properties in EF 4.0?

    - by Mike
    At first I was using this as an extension method to update my detached entities:
        Public Sub AttachUpdated(ByVal obj As ObjectContext, ByVal objectDetached As EntityObject)
            If objectDetached.EntityState = EntityState.Detached Then
                Dim original As Object = Nothing
                If obj.TryGetObjectByKey(objectDetached.EntityKey, original) Then
                    obj.ApplyCurrentValues(objectDetached.EntityKey.EntitySetName, objectDetached)
                Else
                    Throw New ObjectNotFoundException()
                End If
            End If
        End Sub
    Everything had been working great until I had to update non-scalar properties. Correct me if I am wrong, but that is because "ApplyCurrentValues" only supports scalars. To get around this I was just saving the FK_ID field instead of the entity object relation. Now I am faced with a many-to-many relationship, so it's not that simple. I would like to do something like this:
        Dim Resource = RelatedResource.GetByID(item.Value)
        Condition.RelatedResources.Add(Resource)
    But when I call SaveChanges, the added Resources aren't saved. I started to play around with self-tracking entities (not sure if they will help solve my problem), but it seems they cannot be serialized to ViewState, and this is a requirement for me. I guess one solution would be to add the xRef table as an entity and add the FKs myself, but I would rather it just work the way I expect it to. I am open to any suggestions on how to either save my many-to-many relationships or serialize self-tracking entities (if self-tracking would even solve my problem). Thanks!

    Read the article

  • How do I check to see if a scalar has a compiled regex in it with Perl?

    - by Robert P
    Let's say I have a subroutine/method that a user can call to test some data, which (as an example) might look like this:
        sub test_output {
            my ($self, $test) = @_;
            my $output = $self->long_process_to_get_data();
            if ($output =~ /\Q$test/) {
                $self->assert_something();
            }
            else {
                $self->do_something_else();
            }
        }
    Normally, $test is a string, which we're looking for anywhere in the output. This interface was put together to make calling it very easy. However, we've found that sometimes a straight string is problematic: for example, a large, possibly varying number of spaces... a pattern, if you will. Thus, I'd like to let them pass in a regex as an option. I could just do:
        $output =~ $test
    if I could assume that it's always a regex, but ah, the backwards compatibility! If they pass in a string, it still needs to be tested as a raw string. So in that case, I'll need to test whether $test is a regex. Is there any good facility for detecting whether or not a scalar has a compiled regex in it?

    Read the article
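In Perl, a `qr//` object is a reference whose `ref` is `Regexp`, so `ref($test) eq 'Regexp'` is the commonly suggested check, and the core `re` pragma also offers `re::is_regexp`; treat both as pointers to verify against the documentation for your Perl version (a reblessed `qr//` object would defeat the `ref` test). The same dispatch is sketched here in Python, where a compiled pattern is an instance of `re.Pattern` in recent Python 3 versions; the helper name `output_matches` is hypothetical.

```python
from __future__ import annotations

import re


def output_matches(output: str, test: str | re.Pattern[str]) -> bool:
    """Hypothetical helper: regex semantics for compiled patterns, literal for strings."""
    if isinstance(test, re.Pattern):
        # Compiled pattern: search anywhere, like `$output =~ $test`.
        return test.search(output) is not None
    # Plain string: escape it first, the equivalent of Perl's /\Q$test/.
    return re.search(re.escape(test), output) is not None


print(output_matches("a   b", re.compile(r"a\s+b")))  # True
print(output_matches("a   b", r"a\s+b"))              # False: the string is matched literally
```

Keeping the string path as a literal search preserves the old behavior for existing callers, while callers who want a pattern opt in by passing a compiled one.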

  • Better to build or buy a compute grid platform?

    - by James B
    I am looking to do some quite processor-intensive brute-force processing for string matching. I have run my prototype in a multithreaded environment and compared the performance to an implementation using GridGain with a couple of nodes (also multithreaded). What I observed was that my GridGain implementation was slower than my multithreaded implementation. There could be a flaw in my GridGain implementation, but it was only a prototype, and I thought the results were indicative. So my question is this: what are the advantages of learning and then building an implementation for a particular grid platform (Hadoop, GridGain, or EC2 if going hosted; other suggestions welcome), when one could fairly easily put together a lightweight compute grid platform with a much shallower learning curve? In other words, what do we get for free with these cloud/grid platforms that is worth having or tricky to implement? (Please note, I don't have any need for a data grid.) Cheers, -James (P.S. Happy to make this community wiki if need be.)

    Read the article

  • Java or Python distributed compute job (on a student budget)?

    - by midget_sadhu
    I have a large dataset (c. 40G) that I want to use for some NLP (largely embarrassingly parallel) over a couple of computers in the lab, to which I do not have root access, and only 1G of user space. I experimented with Hadoop, but of course this was dead in the water: the data is stored on an external USB hard drive, and I can't load it onto the DFS because of the 1G user space cap. I have been looking into a couple of Python-based options (as I'd rather use NLTK instead of Java's LingPipe if I can help it), and the distributed compute options seem to be IPython and DISCO. After my Hadoop experience, I am trying to make an informed choice; any help on what might be more appropriate would be greatly appreciated. Amazon's EC2 etc. is not really an option, as I have next to no budget.

    Read the article

  • How can I compute the average cost for this solution of the element uniqueness problem?

    - by Alceu Costa
    In the book Introduction to the Design & Analysis of Algorithms, the following solution is proposed to the element uniqueness problem:
        ALGORITHM UniqueElements(A[0..n-1])
        // Determines whether all the elements in a given array are distinct
        // Input: An array A[0..n-1]
        // Output: Returns "true" if all the elements in A are distinct
        //         and "false" otherwise
        for i := 0 to n - 2 do
            for j := i + 1 to n - 1 do
                if A[i] = A[j] return false
        return true
    How can I compute the average cost (i.e. the number of comparisons for a given n) for this algorithm? What is a reasonable assumption about the input?

    Read the article
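The average depends entirely on the input model, which is what the second question is getting at. The worst case needs no model: when all elements are distinct, or the only equal pair is the last one compared (A[n-2], A[n-1]), the algorithm performs all n(n-1)/2 comparisons. For an average, one has to choose an assumption; the sketch below assumes, purely for illustration, that each element is drawn independently and uniformly from m possible values, and computes the exact expected comparison count by enumerating all m^n inputs for small n. It is a brute-force check that a closed-form derivation could be tested against, not a derivation itself.

```python
from fractions import Fraction
from itertools import product
from typing import Sequence


def count_comparisons(a: Sequence[int]) -> int:
    """Run UniqueElements on a and return how many A[i] = A[j] tests it made."""
    n = len(a)
    comparisons = 0
    for i in range(n - 1):          # i := 0 to n - 2
        for j in range(i + 1, n):   # j := i + 1 to n - 1
            comparisons += 1
            if a[i] == a[j]:
                return comparisons  # "return false" right after this comparison
    return comparisons              # "return true": every pair was compared


def average_comparisons(n: int, m: int) -> Fraction:
    """Exact mean over all m**n arrays of length n with values in range(m).

    Assumes every such array is equally likely (elements i.i.d. uniform).
    """
    total = sum(count_comparisons(a) for a in product(range(m), repeat=n))
    return Fraction(total, m ** n)


print(count_comparisons([1, 2, 3, 4]))  # 6, i.e. n(n-1)/2 for n = 4
print(average_comparisons(3, 2))        # 7/4
```

For n = 3 and m = 2, half the inputs stop after one comparison, a quarter after two and a quarter after three, giving 7/4; as m grows relative to n, duplicates become rare and the average approaches the worst case.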

  • Why can't I assign a scalar value to a class using shorthand, but instead declare it first, then set

    - by ~delan-azabani
    I am writing a UTF-8 library for C++ as an exercise, as this is my first real-world C++ code. So far, I've implemented concatenation, character indexing, parsing and encoding UTF-8 in a class called "ustring". It looks like it's working, but two (seemingly equivalent) ways of declaring a new ustring behave differently. The first way:
        ustring a;
        a = "test";
    works, and the overloaded "=" operator parses the string into the class (which stores the Unicode strings as a dynamically allocated int pointer). However, the following does not work:
        ustring a = "test";
    because I get the following error:
        test.cpp:4: error: conversion from ‘const char [5]’ to non-scalar type ‘ustring’ requested
    Is there a way to work around this error? It probably is a problem with my code, though. The following is what I've written so far for the library:
        #include <cstdlib>
        #include <cstring>
        class ustring {
            int * values;
            long len;
        public:
            long length() {
                return len;
            }
            ustring * operator=(ustring input) {
                len = input.len;
                values = (int *) malloc(sizeof(int) * len);
                for (long i = 0; i < len; i++)
                    values[i] = input.values[i];
                return this;
            }
            ustring * operator=(char input[]) {
                len = sizeof(input);
                values = (int *) malloc(0);
                long s = 0; // s = number of parsed chars
                int a, b, c, d, contNeed = 0, cont = 0;
                for (long i = 0; i < sizeof(input); i++)
                    if (input[i] < 0x80) { // ASCII, direct copy (00-7f)
                        values = (int *) realloc(values, sizeof(int) * ++s);
                        values[s - 1] = input[i];
                    } else if (input[i] < 0xc0) { // this is a continuation (80-bf)
                        if (cont == contNeed) { // no need for continuation, use U+fffd
                            values = (int *) realloc(values, sizeof(int) * ++s);
                            values[s - 1] = 0xfffd;
                        }
                        cont = cont + 1;
                        values[s - 1] = values[s - 1] | ((input[i] & 0x3f) << ((contNeed - cont) * 6));
                        if (cont == contNeed) cont = contNeed = 0;
                    } else if (input[i] < 0xc2) { // invalid byte, use U+fffd (c0-c1)
                        values = (int *) realloc(values, sizeof(int) * ++s);
                        values[s - 1] = 0xfffd;
                    } else if (input[i] < 0xe0) { // start of 2-byte sequence (c2-df)
                        contNeed = 1;
                        values = (int *) realloc(values, sizeof(int) * ++s);
                        values[s - 1] = (input[i] & 0x1f) << 6;
                    } else if (input[i] < 0xf0) { // start of 3-byte sequence (e0-ef)
                        contNeed = 2;
                        values = (int *) realloc(values, sizeof(int) * ++s);
                        values[s - 1] = (input[i] & 0x0f) << 12;
                    } else if (input[i] < 0xf5) { // start of 4-byte sequence (f0-f4)
                        contNeed = 3;
                        values = (int *) realloc(values, sizeof(int) * ++s);
                        values[s - 1] = (input[i] & 0x07) << 18;
                    } else { // restricted or invalid (f5-ff)
                        values = (int *) realloc(values, sizeof(int) * ++s);
                        values[s - 1] = 0xfffd;
                    }
                return this;
            }
            ustring operator+(ustring input) {
                ustring result;
                result.len = len + input.len;
                result.values = (int *) malloc(sizeof(int) * result.len);
                for (long i = 0; i < len; i++)
                    result.values[i] = values[i];
                for (long i = 0; i < input.len; i++)
                    result.values[i + len] = input.values[i];
                return result;
            }
            ustring operator[](long index) {
                ustring result;
                result.len = 1;
                result.values = (int *) malloc(sizeof(int));
                result.values[0] = values[index];
                return result;
            }
            char * encode() {
                char * r = (char *) malloc(0);
                long s = 0;
                for (long i = 0; i < len; i++) {
                    if (values[i] < 0x80)
                        r = (char *) realloc(r, s + 1),
                        r[s + 0] = char(values[i]),
                        s += 1;
                    else if (values[i] < 0x800)
                        r = (char *) realloc(r, s + 2),
                        r[s + 0] = char(values[i] >> 6 | 0x60),
                        r[s + 1] = char(values[i] & 0x3f | 0x80),
                        s += 2;
                    else if (values[i] < 0x10000)
                        r = (char *) realloc(r, s + 3),
                        r[s + 0] = char(values[i] >> 12 | 0xe0),
                        r[s + 1] = char(values[i] >> 6 & 0x3f | 0x80),
                        r[s + 2] = char(values[i] & 0x3f | 0x80),
                        s += 3;
                    else
                        r = (char *) realloc(r, s + 4),
                        r[s + 0] = char(values[i] >> 18 | 0xf0),
                        r[s + 1] = char(values[i] >> 12 & 0x3f | 0x80),
                        r[s + 2] = char(values[i] >> 6 & 0x3f | 0x80),
                        r[s + 3] = char(values[i] & 0x3f | 0x80),
                        s += 4;
                }
                return r;
            }
        };

    Read the article

  • Fun with Aggregates

    - by Paul White
    There are interesting things to be learned from even the simplest queries.  For example, imagine you are given the task of writing a query to list AdventureWorks product names where the product has at least one entry in the transaction history table, but fewer than ten. One possible query to meet that specification is: SELECT p.Name FROM Production.Product AS p JOIN Production.TransactionHistory AS th ON p.ProductID = th.ProductID GROUP BY p.ProductID, p.Name HAVING COUNT_BIG(*) < 10; That query correctly returns 23 rows (execution plan and data sample shown below): The execution plan looks a bit different from the written form of the query: the base tables are accessed in reverse order, and the aggregation is performed before the join.  The general idea is to read all rows from the history table, compute the count of rows grouped by ProductID, merge join the results to the Product table on ProductID, and finally filter to only return rows where the count is less than ten. This ‘fully-optimized’ plan has an estimated cost of around 0.33 units.  The reason for the quote marks there is that this plan is not quite as optimal as it could be – surely it would make sense to push the Filter down past the join too?  To answer that, let’s look at some other ways to formulate this query.  This being SQL, there are any number of ways to write logically-equivalent query specifications, so we’ll just look at a couple of interesting ones.  The first query is an attempt to reverse-engineer T-SQL from the optimized query plan shown above.  
It joins the result of pre-aggregating the history table to the Product table before filtering: SELECT p.Name FROM ( SELECT th.ProductID, cnt = COUNT_BIG(*) FROM Production.TransactionHistory AS th GROUP BY th.ProductID ) AS q1 JOIN Production.Product AS p ON p.ProductID = q1.ProductID WHERE q1.cnt < 10; Perhaps a little surprisingly, we get a slightly different execution plan: The results are the same (23 rows) but this time the Filter is pushed below the join!  The optimizer chooses nested loops for the join, because the cardinality estimate for rows passing the Filter is a bit low (estimate 1 versus 23 actual), though you can force a merge join with a hint and the Filter still appears below the join.  In yet another variation, the < 10 predicate can be ‘manually pushed’ by specifying it in a HAVING clause in the “q1” sub-query instead of in the WHERE clause as written above. The reason this predicate can be pushed past the join in this query form, but not in the original formulation is simply an optimizer limitation – it does make efforts (primarily during the simplification phase) to encourage logically-equivalent query specifications to produce the same execution plan, but the implementation is not completely comprehensive. Moving on to a second example, the following query specification results from phrasing the requirement as “list the products where there exists fewer than ten correlated rows in the history table”: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) < 10 ); Unfortunately, this query produces an incorrect result (86 rows): The problem is that it lists products with no history rows, though the reasons are interesting.  The COUNT_BIG(*) in the EXISTS clause is a scalar aggregate (meaning there is no GROUP BY clause) and scalar aggregates always produce a value, even when the input is an empty set.  
In the case of the COUNT aggregate, the result of aggregating the empty set is zero (the other standard aggregates produce a NULL).  To make the point really clear, let’s look at product 709, which happens to be one for which no history rows exist: -- Scalar aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709;   -- Vector aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709 GROUP BY th.ProductID; The estimated execution plans for these two statements are almost identical: You might expect the Stream Aggregate to have a Group By for the second statement, but this is not the case.  The query includes an equality comparison to a constant value (709), so all qualified rows are guaranteed to have the same value for ProductID and the Group By is optimized away. In fact there are some minor differences between the two plans (the first is auto-parameterized and qualifies for trivial plan, whereas the second is not auto-parameterized and requires cost-based optimization), but there is nothing to indicate that one is a scalar aggregate and the other is a vector aggregate.  This is something I would like to see exposed in show plan so I suggested it on Connect.  Anyway, the results of running the two queries show the difference at runtime: The scalar aggregate (no GROUP BY) returns a result of zero, whereas the vector aggregate (with a GROUP BY clause) returns nothing at all.  Returning to our EXISTS query, we could ‘fix’ it by changing the HAVING clause to reject rows where the scalar aggregate returns zero: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) BETWEEN 1 AND 9 ); The query now returns the correct 23 rows: Unfortunately, the execution plan is less efficient now – it has an estimated cost of 0.78 compared to 0.33 for the earlier plans.  
Let’s try adding a redundant GROUP BY instead of changing the HAVING clause: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY th.ProductID HAVING COUNT_BIG(*) < 10 ); Not only do we now get correct results (23 rows), this is the execution plan: I like to compare that plan to quantum physics: if you don’t find it shocking, you haven’t understood it properly :)  The simple addition of a redundant GROUP BY has resulted in the EXISTS form of the query being transformed into exactly the same optimal plan we found earlier.  What’s more, in SQL Server 2008 and later, we can replace the odd-looking GROUP BY with an explicit GROUP BY on the empty set: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ); I offer that as an alternative because some people find it more intuitive (and it perhaps has more geek value too).  Whichever way you prefer, it’s rather satisfying to note that the result of the sub-query does not exist for a particular correlated value where a vector aggregate is used (the scalar COUNT aggregate always returns a value, even if zero, so it always ‘EXISTS’ regardless which ProductID is logically being evaluated). 
The following query forms also produce the optimal plan and correct results, so long as a vector aggregate is used (you can probably find more equivalent query forms):

WHERE Clause

    SELECT p.Name
    FROM Production.Product AS p
    WHERE
    (
        SELECT COUNT_BIG(*)
        FROM Production.TransactionHistory AS th
        WHERE th.ProductID = p.ProductID
        GROUP BY ()
    ) < 10;

APPLY

    SELECT p.Name
    FROM Production.Product AS p
    CROSS APPLY
    (
        SELECT NULL
        FROM Production.TransactionHistory AS th
        WHERE th.ProductID = p.ProductID
        GROUP BY ()
        HAVING COUNT_BIG(*) < 10
    ) AS ca (dummy);

FROM Clause

    SELECT q1.Name
    FROM
    (
        SELECT
            p.Name,
            cnt =
            (
                SELECT COUNT_BIG(*)
                FROM Production.TransactionHistory AS th
                WHERE th.ProductID = p.ProductID
                GROUP BY ()
            )
        FROM Production.Product AS p
    ) AS q1
    WHERE q1.cnt < 10;

This last example uses SUM(1) instead of COUNT and does not require a vector aggregate… you should be able to work out why :)

    SELECT q.Name
    FROM
    (
        SELECT
            p.Name,
            cnt =
            (
                SELECT SUM(1)
                FROM Production.TransactionHistory AS th
                WHERE th.ProductID = p.ProductID
            )
        FROM Production.Product AS p
    ) AS q
    WHERE q.cnt < 10;

The semantics of SQL aggregates are rather odd in places. It definitely pays to get to know the rules, and to check carefully whether your queries are using scalar or vector aggregates. As we have seen, query plans do not show in which ‘mode’ an aggregate is running, and getting it wrong can cause poor performance, wrong results, or both.

© 2012 Paul White
Twitter: @SQL_Kiwi
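The SUM(1) puzzle posed at the end of the article comes down to how NULL behaves in a comparison. This minimal sketch rests on several assumptions:

- SQLite, through Python's sqlite3, stands in for SQL Server.
- COUNT(*) replaces COUNT_BIG(*).
- GROUP BY th.ProductID takes the place of GROUP BY ().
- The data is hypothetical: ‘A’ has 3 history rows, ‘B’ has 12, and ‘C’ has none.
- The helper names_below_ten is likewise hypothetical, introduced only for this illustration.

Under these assumptions, a correlated sub-query that returns no row yields NULL, and so does SUM over an empty input. Because NULL < 10 is unknown rather than true, the outer WHERE clause rejects the row. A scalar COUNT, by contrast, yields 0, and 0 < 10 is true.

```python
import sqlite3

# Hypothetical stand-in data (not AdventureWorks):
# product 'A' has 3 history rows, 'B' has 12, 'C' has none.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE TransactionHistory (ProductID INTEGER)")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.executemany("INSERT INTO TransactionHistory VALUES (?)", [(1,)] * 3 + [(2,)] * 12)

CORRELATED = "FROM TransactionHistory AS th WHERE th.ProductID = p.ProductID"


def names_below_ten(subquery: str) -> list:
    """Hypothetical helper: names of products whose sub-query value is < 10."""
    sql = f"SELECT p.Name FROM Product AS p WHERE ({subquery}) < 10 ORDER BY p.Name"
    return [name for (name,) in conn.execute(sql)]


# Scalar COUNT: 'C' gets 0, and 0 < 10 is true, so 'C' is wrongly included.
scalar_count = names_below_ten(f"SELECT COUNT(*) {CORRELATED}")

# Vector COUNT: 'C' has no group, so the sub-query returns no row, its value
# is NULL, NULL < 10 is unknown, and 'C' is filtered out.
vector_count = names_below_ten(f"SELECT COUNT(*) {CORRELATED} GROUP BY th.ProductID")

# Scalar SUM(1): always one row, but its value is NULL for 'C', so 'C' is
# filtered out without needing a vector aggregate.
scalar_sum = names_below_ten(f"SELECT SUM(1) {CORRELATED}")

print(scalar_count)  # ['A', 'C']
print(vector_count)  # ['A']
print(scalar_sum)    # ['A']
```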

  • Fun with Aggregates

    - by Paul White
    There are interesting things to be learned from even the simplest queries. For example, imagine you are given the task of writing a query to list AdventureWorks product names where the product has at least one entry in the transaction history table, but fewer than ten. One possible query to meet that specification is:

    SELECT p.Name
    FROM Production.Product AS p
    JOIN Production.TransactionHistory AS th
        ON p.ProductID = th.ProductID
    GROUP BY p.ProductID, p.Name
    HAVING COUNT_BIG(*) < 10;

That query correctly returns 23 rows (execution plan and data sample shown below).

The execution plan looks a bit different from the written form of the query: the base tables are accessed in reverse order, and the aggregation is performed before the join. The general idea is to read all rows from the history table, compute the count of rows grouped by ProductID, merge join the results to the Product table on ProductID, and finally filter to return only rows where the count is less than ten.

This ‘fully-optimized’ plan has an estimated cost of around 0.33 units. The reason for the quote marks is that this plan is not quite as optimal as it could be: surely it would make sense to push the Filter down past the join too? To answer that, let’s look at some other ways to formulate this query. This being SQL, there are any number of ways to write logically equivalent query specifications, so we’ll just look at a couple of interesting ones. The first query is an attempt to reverse-engineer T-SQL from the optimized query plan shown above.
It joins the result of pre-aggregating the history table to the Product table before filtering:

    SELECT p.Name
    FROM
    (
        SELECT
            th.ProductID,
            cnt = COUNT_BIG(*)
        FROM Production.TransactionHistory AS th
        GROUP BY th.ProductID
    ) AS q1
    JOIN Production.Product AS p
        ON p.ProductID = q1.ProductID
    WHERE q1.cnt < 10;

Perhaps a little surprisingly, we get a slightly different execution plan. The results are the same (23 rows), but this time the Filter is pushed below the join! The optimizer chooses nested loops for the join because the cardinality estimate for rows passing the Filter is a bit low (estimate 1 versus 23 actual), though you can force a merge join with a hint and the Filter still appears below the join. In yet another variation, the < 10 predicate can be ‘manually pushed’ by specifying it in a HAVING clause in the “q1” sub-query instead of in the WHERE clause as written above.

The reason this predicate can be pushed past the join in this query form, but not in the original formulation, is simply an optimizer limitation. The optimizer does make efforts (primarily during the simplification phase) to encourage logically equivalent query specifications to produce the same execution plan, but the implementation is not completely comprehensive.

Moving on to a second example, the following query specification results from phrasing the requirement as “list the products where there exist fewer than ten correlated rows in the history table”:

    SELECT p.Name
    FROM Production.Product AS p
    WHERE EXISTS
    (
        SELECT *
        FROM Production.TransactionHistory AS th
        WHERE th.ProductID = p.ProductID
        HAVING COUNT_BIG(*) < 10
    );

Unfortunately, this query produces an incorrect result (86 rows). The problem is that it lists products with no history rows, though the reasons are interesting. The COUNT_BIG(*) in the EXISTS clause is a scalar aggregate (meaning there is no GROUP BY clause), and scalar aggregates always produce a value, even when the input is an empty set.
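Three formulations above return the correct rows:

- the join with GROUP BY and HAVING;
- the pre-aggregated derived table, filtered in the outer WHERE clause;
- the same derived table with the predicate ‘manually pushed’ into a HAVING clause.

They are logically equivalent, even though SQL Server plans them differently. The minimal check below confirms that all three return only ‘A’. It says nothing about plan shape or cost, and it rests on several assumptions:

- SQLite, through Python's sqlite3, stands in for SQL Server.
- COUNT(*) replaces COUNT_BIG(*).
- SQLite lacks T-SQL's cnt = COUNT_BIG(*) alias syntax, so COUNT(*) AS cnt is used.
- The data is hypothetical: ‘A’ has 3 history rows, ‘B’ has 12, and ‘C’ has none.

```python
import sqlite3

# Hypothetical stand-in data (not AdventureWorks):
# product 'A' has 3 history rows, 'B' has 12, 'C' has none.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE TransactionHistory (ProductID INTEGER)")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.executemany("INSERT INTO TransactionHistory VALUES (?)", [(1,)] * 3 + [(2,)] * 12)

# Original form: join, group, then filter with HAVING.
join_form = conn.execute("""
    SELECT p.Name
    FROM Product AS p
    JOIN TransactionHistory AS th ON p.ProductID = th.ProductID
    GROUP BY p.ProductID, p.Name
    HAVING COUNT(*) < 10
    ORDER BY p.Name
""").fetchall()

# Pre-aggregate the history table in a derived table; filter in the outer WHERE.
where_form = conn.execute("""
    SELECT p.Name
    FROM (
        SELECT th.ProductID, COUNT(*) AS cnt
        FROM TransactionHistory AS th
        GROUP BY th.ProductID
    ) AS q1
    JOIN Product AS p ON p.ProductID = q1.ProductID
    WHERE q1.cnt < 10
    ORDER BY p.Name
""").fetchall()

# Same derived table with the predicate 'manually pushed' into HAVING.
having_form = conn.execute("""
    SELECT p.Name
    FROM (
        SELECT th.ProductID
        FROM TransactionHistory AS th
        GROUP BY th.ProductID
        HAVING COUNT(*) < 10
    ) AS q1
    JOIN Product AS p ON p.ProductID = q1.ProductID
    ORDER BY p.Name
""").fetchall()

# 'C' (no rows) and 'B' (12 rows) are excluded by all three forms.
print(join_form, where_form, having_form)  # [('A',)] [('A',)] [('A',)]
```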
