Search Results

Search found 91 results on 4 pages for 'correlated'.

Page 1/4 | 1 2 3 4 | Next Page >

correlated query /subquery VS join query

- by trojanwarrior3000

can we always convert a usual subquery/correlated subquery to join type query?

Read the article
Oracle - correlated subquery problems

- by FrustratedWithFormsDesigner

I have this query: select acc_num from (select distinct ac_outer.acc_num, ac_outer.owner from ac_tab ac_outer where (ac_outer.owner = '1234567') and ac_outer.owner = (select sq.owner from (select a1.owner from ac_tab a1 where a1.acc_num = ac_outer.acc_num order by a1.a_date desc, a1.b_date desc, a1.c_date desc) sq where rownum = 1) order by dbms_random.value()) subq order by acc_num; The idea is to get all acc_nums (not a primary key) from ac_tab, that have an owner of 1234567. Since an acc_num in ac_tab could have changed owners over time, I am trying to use the inner correlated subqueries to ensure that an acc_num is returned ONLY if it's most recent owner is 12345678. Naturally, it doesn't work (or I wouldn't be posting here ;) ) Oracle gives me an error: ORA-000904 ac_outer.acc_num is an invalid identifier. I thought that ac_outer should be visible to the correlated subqueries, but for some reason it's not. Is there a way to fix the query, or do I have to resort to PL/SQL to solve this? (Oracle verison is 10g)

Read the article
MySql scoping problem with correlated subqueries

- by Rolf

Hi, I'm having this Mysql query, It works: SELECT nom ,prenom ,(SELECT GROUP_CONCAT(category_en) FROM (SELECT DISTINCT category_en FROM categories c WHERE id IN (SELECT DISTINCT category_id FROM m3allems_to_categories m2c WHERE m3allem_id = 37) ) cS ) categories ,(SELECT GROUP_CONCAT(area_en) FROM (SELECT DISTINCT area_en FROM areas c WHERE id IN (SELECT DISTINCT area_id FROM m3allems_to_areas m2a WHERE m3allem_id = 37) ) aSq ) areas FROM m3allems m WHERE m.id = 37 The result is: nom prenom categories areas Man Multi Carpentry,Paint,Walls Beirut,Baalbak,Saida It works correclty, but only when i hardcode into the query the id that I want (37). I want it to work for all entries in the m3allem table, so I try this: SELECT nom ,prenom ,(SELECT GROUP_CONCAT(category_en) FROM (SELECT DISTINCT category_en FROM categories c WHERE id IN (SELECT DISTINCT category_id FROM m3allems_to_categories m2c WHERE m3allem_id = m.id) ) cS ) categories ,(SELECT GROUP_CONCAT(area_en) FROM (SELECT DISTINCT area_en FROM areas c WHERE id IN (SELECT DISTINCT area_id FROM m3allems_to_areas m2a WHERE m3allem_id = m.id) ) aSq ) areas FROM m3allems m And I get an error: Unknown column 'm.id' in 'where clause' Why? From the MySql manual: 13.2.8.7. Correlated Subqueries [...] Scoping rule: MySQL evaluates from inside to outside. So... do this not work when the subquery is in a SELECT section? I did not read anything about that. Does anyone know? What should I do? It took me a long time to build this query... I know it's a monster query but it gets what I want in a single query, and I am so close to getting it to work! Can anyone help?

Read the article
MySQL/SQL: Update with correlated subquery from the updated table itself

- by Roee Adler

I have a generic question that I will try to explain using an example. Say I have a table with the fields: "id", "name", "category", "appearances" and "ratio" The idea is that I have several items, each related to a single category and "appears" several times. The ratio field should include the percentage of each item's appearances out of the total number of appearances of items in the category. In pseudo-code what I need is the following: For each category find the total sum of appearances for items related to it. For example it can be done with (select sum("appearances") from table group by category) For each item set the ratio value as the item's appearances divided by the sum found for the category above Now I'm trying to achieve this with a single update query, but can't seem to do it. What I thought I should do is: update Table T set T.ratio = T.appearances / ( select sum(S.appearances) from Table S where S.id = T.id ) But MySQL does not accept the alias T in the update column, and I did not find other ways of achieving this. Any ideas?

Read the article
Efficient way to get highly correlated pairs from large data set in Python or R

- by Akavall

I have a large data set (Let's say 10,000 variables with about 1000 elements each), we can think of it as 2D list, something like: [[variable_1], [variable_2], ............ [variable_n] ] I want to extract highly correlated variable pairs from that data. I want "highly correlated" to be a parameter that I can choose. I don't need all pairs to be extracted, and I don't necessarily want the most correlated pairs. As long as there is an efficient method that gets me highly correlated pairs I am happy. Also, it would be nice if a variable does not show up in more than one pair. Although this might not be crucial. Of course, there is a brute force way to finding such pairs, but it is too slow for me. I've googled around for a bit and found some theoretical work on this issue, but I wasn't able for find a package that could do what I am looking for. I mostly work in python, so a package in python would be most helpful, but if there exists a package in R that does what I am looking for it will be great. Does anyone know of a package that does the above in Python or R? Or any other ideas? Thank You in Advance

Read the article
How can I update a record using a correlated subquery?

- by froadie

I have a function that accepts one parameter and returns a table/resultset. I want to set a field in a table to the first result of that recordset, passing in one of the table's other fields as the parameter. If that's too complicated in words, the query looks something like this: UPDATE myTable SET myField = (SELECT TOP 1 myFunctionField FROM fn_doSomething(myOtherField) WHERE someCondition = 'something') WHERE someOtherCondition = 'somethingElse' In this example, myField and myOtherField are fields in myTable, and myFunctionField is a field return by fn_doSomething. This seems logical to me, but I'm getting the following strange error: 'myOtherField' is not a recognized OPTIMIZER LOCK HINTS option. Any idea what I'm doing wrong, and how I can accomplish this? *UPDATE: * Based on Anil Soman's answer, I realized that the function is expecting a string parameter and the field being passed is an integer. I'm not sure if this should be a problem as an explicit call to the function using an integer value works - e.g. fn_doSomething(12345) seems to automatically cast the number to an string. However, I tried to do an explicit cast: UPDATE myTable SET myField = (SELECT TOP 1 myFunctionField FROM fn_doSomething(CAST(myOtherField AS varchar(1000))) WHERE someCondition = 'something') WHERE someOtherCondition = 'somethingElse' Now I'm getting the following error: Line 5: Incorrect syntax near '('.

Read the article
Correlated SQL Join Query from multiple tables

- by SooDesuNe

I have two tables like the ones below. I need to find what exchangeRate was in effect at the dateOfPurchase. I've tried some correlated sub queries, but I'm having difficulty getting the correlated record to be used in the sub queries. I expect a solution will need to follow this basic outline: SELECT only the exchangeRates for the applicable countryCode From 1. SELECT the newest exchangeRate less than the dateOfPurchase Fill in the query table with all the fields from 2. and the purchasesTable. My Tables: purchasesTable: > dateOfPurchase | costOfPurchase | countryOfPurchase > 29-March-2010 | 20.00 | EUR > 29-March-2010 | 3000 | JPN > 30-March-2010 | 50.00 | EUR > 30-March-2010 | 3000 | JPN > 30-March-2010 | 2000 | JPN > 31-March-2010 | 100.00 | EUR > 31-March-2010 | 125.00 | EUR > 31-March-2010 | 2000 | JPN > 31-March-2010 | 2400 | JPN costOfPurchase is in whatever the local currency is for a given countryCode exchangeRateTable > effectiveDate | countryCode | exchangeRate > 29-March-2010 | JPN | 90 > 29-March-2010 | EUR | 1.75 > 30-March-2010 | JPN | 92 > 31-March-2010 | JPN | 91 The results of the query that I'm looking for: > dateOfPurchase | costOfPurchase | countryOfPurchase | exchangeRate > 29-March-2010 | 20.00 | EUR | 1.75 > 29-March-2010 | 3000 | JPN | 90 > 30-March-2010 | 50.00 | EUR | 1.75 > 30-March-2010 | 3000 | JPN | 92 > 30-March-2010 | 2000 | JPN | 92 > 31-March-2010 | 100.00 | EUR | 1.75 > 31-March-2010 | 125.00 | EUR | 1.75 > 31-March-2010 | 2000 | JPN | 91 > 31-March-2010 | 2400 | JPN | 91 So for example in the results, the exchange rate, in effect for EUR on 31-March was 1.75. I'm using Access, but a MySQL answer would be fine too. UPDATE: Modification to Allan's answer: SELECT dateOfPurchase, costOfPurchase, countryOfPurchase, exchangeRate FROM purchasesTable p LEFT OUTER JOIN (SELECT e1.exchangeRate, e1.countryCode, e1.effectiveDate, min(e2.effectiveDate) AS enddate FROM exchangeRateTable e1 LEFT OUTER JOIN exchangeRateTable e2 ON e1.effectiveDate < e2.effectiveDate AND e1.countryCode = e2.countryCode GROUP BY e1.exchangeRate, e1.countryCode, e1.effectiveDate) e ON p.dateOfPurchase >= e.effectiveDate AND (p.dateOfPurchase < e.enddate OR e.enddate is null) AND p.countryOfPurchase = e.countryCode I had to make a couple small changes.

Read the article
How to generate correlated binary variables

- by jonalm

Dear All I need to generate a series of N random binary variables with a given correlation function. Let x = {x_i} be a series of binary variables (taking the value 0 or 1, i running form 1 to N). The marginal probability is given Pr(x_i = 1) = p, and the values should be correlated in the following way E[ x_i x_j ] = const * |i-j|^-alfa where alfa is a positive number. Is it possible to generate a series like this? preferably in python.

Read the article
Impact of ordering of correlated subqueries within a projection

- by Michael Petito

I'm noticing something a bit unexpected with how SQL Server (SQL Server 2008 in this case) treats correlated subqueries within a select statement. My assumption was that a query plan should not be affected by the mere order in which subqueries (or columns, for that matter) are written within the projection clause of the select statement. However, this does not appear to be the case. Consider the following two queries, which are identical except for the ordering of the subqueries within the CTE: --query 1: subquery for Color is second WITH vw AS ( SELECT p.[ID], (SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName], (SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color] FROM Person p ) SELECT ID, Color, FirstName FROM vw WHERE Color = 'Gray'; --query 2: subquery for Color is first WITH vw AS ( SELECT p.[ID], (SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color], (SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName] FROM Person p ) SELECT ID, Color, FirstName FROM vw WHERE Color = 'Gray'; If you look at the two query plans, you'll see that an outer join is used for each subquery and that the order of the joins is the same as the order the subqueries are written. There is a filter applied to the result of the outer join for color, to filter out rows where the color is not 'Gray'. (It's odd to me that SQL would use an outer join for the color subquery since I have a non-null constraint on the result of the color subquery, but OK.) Most of the rows are removed by the color filter. The result is that query 2 is significantly cheaper than query 1 because fewer rows are involved with the second join. All reasons for constructing such a statement aside, is this an expected behavior? Shouldn't SQL server opt to move the filter as early as possible in the query plan, regardless of the order the subqueries are written?

Read the article
Can't use where clause on correlated columns.

- by Keyo

I want to add a where clause to make sure video_count is greater than zero. Only categories which are referenced once or more in video_category.video_id should be returned. Because video_count is not a field in any table I cannot do this. Here is the query. SELECT category . * , ( SELECT COUNT( * ) FROM video_category WHERE video_category.category_id = category.category_id ) AS 'video_count' FROM category WHERE category.status = 1 AND video_count > '0' AND publish_date < NOW() ORDER BY updated DESC; Thanks for the help.

Read the article
subquery factoring questions.

- by Sujee

Hi, Please explain. a) is "subquery factoring" used to replace a non-correlated subquery? What about correlated subquery? b) if it is true, are "subquery" and "subquery factoring" executed exactly once? c) "subquery" vs "subquery factoring" which one is better Thank you.

Read the article
Power management on Android -- is app CPU correlated to power usage? [closed]

- by foampile

2 questions: Is application CPU usage on Android correlated and how highly to battery usage? In other words, are apps that suck a lot of CPU also draining the battery or not necessarily? Is there a way to itemize and display the phone's power use by application, at any given point in time as well as within defined time buckets and maybe view charts and such? Sort of like a diagnostic monitor for power usage by application or system component? Thanks

Read the article
How can I measure the quality of a fax file?

- by debita

Hi! I'm trying to make a tool that can measure the quality of a fax file, comparing the received one with the one sent. I tried Phase_Correlation software, in order to see if the images are similar... but it's not enough. My purpose is to evaluate if the fax is legible after the transmission. Any ideas? Is there any way of comparing two tiffs? pdfs? or image files? Thanks a lot

Read the article
Using outer query result in a subquery in postgresql

- by brad

I have two tables points and contacts and I'm trying to get the average points.score per contact grouped on a monthly basis. Note that points and contacts aren't related, I just want the sum of points created in a month divided by the number of contacts that existed in that month. So, I need to sum points grouped by the created_at month, and I need to take the count of contacts FOR THAT MONTH ONLY. It's that last part that's tricking me up. I'm not sure how I can use a column from an outer query in the subquery. I tried something like this: SELECT SUM(score) AS points_sum, EXTRACT(month FROM created_at) AS month, date_trunc('MONTH', created_at) + INTERVAL '1 month' AS next_month, (SELECT COUNT(id) FROM contacts WHERE contacts.created_at <= next_month) as contact_count FROM points GROUP BY month, next_month ORDER BY month So, I'm extracting the actual month that my points are being summed, and at the same time, getting the beginning of the next_month so that I can say "Get me the count of contacts where their created at is < next_month" But it complains that column next_month doesn't exist This is understandable as the subquery knows nothing about the outer query. Qualifying with points.next_month doesn't work either. So can someone point me in the right direction of how to achieve this? Tables: Points score | created_at 10 | "2011-11-15 21:44:00.363423" 11 | "2011-10-15 21:44:00.69667" 12 | "2011-09-15 21:44:00.773289" 13 | "2011-08-15 21:44:00.848838" 14 | "2011-07-15 21:44:00.924152" Contacts id | created_at 6 | "2011-07-15 21:43:17.534777" 5 | "2011-08-15 21:43:17.520828" 4 | "2011-09-15 21:43:17.506452" 3 | "2011-10-15 21:43:17.491848" 1 | "2011-11-15 21:42:54.759225" sum, month and next_month (without the subselect) sum | month | next_month 14 | 7 | "2011-08-01 00:00:00" 13 | 8 | "2011-09-01 00:00:00" 12 | 9 | "2011-10-01 00:00:00" 11 | 10 | "2011-11-01 00:00:00" 10 | 11 | "2011-12-01 00:00:00"

Read the article
SQL Server uncorrelated subquery very slow

- by brianberns

I have a simple, uncorrelated subquery that performs very poorly on SQL Server. I'm not very experienced at reading execution plans, but it looks like the inner query is being executed once for every row in the outer query, even though the results are the same each time. What can I do to tell SQL Server to execute the inner query only once? The query looks like this: select * from Record record0_ where record0_.RecordTypeFK='c2a0ffa5-d23b-11db-9ea3-000e7f30d6a2' and ( record0_.EntityFK in ( select record1_.EntityFK from Record record1_ join RecordTextValue textvalues2_ on record1_.PK=textvalues2_.RecordFK and textvalues2_.FieldFK = '0d323c22-0ec2-11e0-a148-0018f3dde540' and (textvalues2_.Value like 'O%' escape '~') ) )

Read the article
MySQL multiple dependent subqueries, painfully slow

- by matt80

I have a working query that retrieves the data that I need, but unfortunately it is painfully slow (runs over 3 minutes). I have indexes in place, but I think the problem is the multiple dependent subqueries. I've been trying to rewrite the query using joins but I can't seem to get it to work. Any help would be greatly appreciated. The tables: Basically, I have 2 tables. The first (prices) holds the prices of items in a store. Each row is the price of an item that day, and new rows are added every day with an updated price. The second table (watches_US) holds the item information (name, description, etc). CREATE TABLE `prices` ( `prices_id` int(11) NOT NULL auto_increment, `prices_locale` enum('CA','DE','FR','JP','UK','US') NOT NULL default 'US', `prices_watches_ID` char(10) NOT NULL, `prices_date` datetime NOT NULL, `prices_am` varchar(10) default NULL, `prices_new` varchar(10) default NULL, `prices_used` varchar(10) default NULL, PRIMARY KEY (`prices_id`), KEY `prices_am` (`prices_am`), KEY `prices_locale` (`prices_locale`), KEY `prices_watches_ID` (`prices_watches_ID`), KEY `prices_date` (`prices_date`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=61764 ; CREATE TABLE `watches_US` ( `watches_ID` char(10) NOT NULL, `watches_date_added` datetime NOT NULL, `watches_last_update` datetime default NULL, `watches_title` varchar(255) default NULL, `watches_small_image_height` int(11) default NULL, `watches_small_image_width` int(11) default NULL, `watches_description` text, PRIMARY KEY (`watches_ID`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; The query retrieves the last 10 prices changes over a period of 30 hours, ordered by the size of the price change. So I have subqueries to get the newest price, the oldest price within 30 hours, and then to calculate the price change. Here's the query: SELECT watches_US.*, prices.*, watches_US.watches_ID as current_ID, ( SELECT prices_am FROM prices WHERE prices_watches_ID = current_ID AND prices_locale = 'US' ORDER BY prices_date DESC LIMIT 1 ) as new_price, ( SELECT prices_date FROM prices WHERE prices_watches_ID = current_ID AND prices_locale = 'US' ORDER BY prices_date DESC LIMIT 1 ) as new_price_date, ( SELECT prices_am FROM prices WHERE ( prices_watches_ID = current_ID AND prices_locale = 'US') AND ( prices_date >= DATE_SUB(new_price_date,INTERVAL 30 HOUR) ) ORDER BY prices_date ASC LIMIT 1 ) as old_price, ( SELECT ROUND(((new_price - old_price)/old_price)*100,2) ) as percent_change, ( SELECT (new_price - old_price) ) as absolute_change FROM watches_US LEFT OUTER JOIN prices ON prices.prices_watches_ID = watches_US.watches_ID WHERE ( prices_locale = 'US' ) AND ( prices_am IS NOT NULL ) AND ( prices_am != '' ) HAVING ( old_price IS NOT NULL ) AND ( old_price != 0 ) AND ( old_price != '' ) AND ( absolute_change < 0 ) AND ( prices.prices_date = new_price_date ) ORDER BY absolute_change ASC LIMIT 10 How would I rewrite this to use joins instead, or otherwise optimize this so it doesn't take over 3 minutes to get a result? Any help would be greatly appreciated! Thank you kindly.

Read the article
Combination of Operating Mode and Commit Strategy

- by Kevin Yang

If you want to populate a source into multiple targets, you may also want to ensure that every row from the source affects all targets uniformly (or separately). Let’s consider the Example Mapping below. If a row from SOURCE causes different changes in multiple targets (TARGET_1, TARGET_2 and TARGET_3), for example, it can be successfully inserted into TARGET_1 and TARGET_3, but failed to be inserted into TARGET_2, and the current Mapping Property TLO (target load order) is “TARGET_1 -> TARGET_2 -> TARGET_3”. What should Oracle Warehouse Builder do, in order to commit the appropriate data to all affected targets at the same time? If it doesn’t behave as you intended, the data could become inaccurate and possibly unusable. Example Mapping In OWB, we can use Mapping Configuration Commit Strategies and Operating Modes together to achieve this kind of requirements. Below we will explore the combination of these two features and how they affect the results in the target tables Before going to the example, let’s review some of the terms we will be using (Details can be found in white paper Oracle® Warehouse Builder Data Modeling, ETL, and Data Quality Guide11g Release 2): Operating Modes: Set-Based Mode: Warehouse Builder generates a single SQL statement that processes all data and performs all operations. Row-Based Mode: Warehouse Builder generates statements that process data row by row. The select statement is in a SQL cursor. All subsequent statements are PL/SQL. Row-Based (Target Only) Mode: Warehouse Builder generates a cursor select statement and attempts to include as many operations as possible in the cursor. For each target, Warehouse Builder inserts each row into the target separately. Commit Strategies: Automatic: Warehouse Builder loads and then automatically commits data based on the mapping design. If the mapping has multiple targets, Warehouse Builder commits and rolls back each target separately and independently of other targets. Use the automatic commit when the consequences of multiple targets being loaded unequally are not great or are irrelevant. Automatic correlated: It is a specialized type of automatic commit that applies to PL/SQL mappings with multiple targets only. Warehouse Builder considers all targets collectively and commits or rolls back data uniformly across all targets. Use the correlated commit when it is important to ensure that every row in the source affects all affected targets uniformly. Manual: select manual commit control for PL/SQL mappings when you want to interject complex business logic, perform validations, or run other mappings before committing data. Combination of the commit strategy and operating mode To understand the effects of each combination of operating mode and commit strategy, I’ll illustrate using the following example Mapping. Firstly we insert 100 rows into the SOURCE table and make sure that the 99th row and 100th row have the same ID value. And then we create a unique key constraint on ID column for TARGET_2 table. So while running the example mapping, OWB tries to load all 100 rows to each of the targets. But the mapping should fail to load the 100th row to TARGET_2, because it will violate the unique key constraint of table TARGET_2. With different combinations of Commit Strategy and Operating Mode, here are the results ¦ Set-based/ Correlated Commit: Configuration of Example mapping: Result: What’s happening: A single error anywhere in the mapping triggers the rollback of all data. OWB encounters the error inserting into Target_2, it reports an error for the table and does not load the row. OWB rolls back all the rows inserted into Target_1 and does not attempt to load rows to Target_3. No rows are added to any of the target tables. ¦ Row-based/ Correlated Commit: Configuration of Example mapping: Result: What’s happening: OWB evaluates each row separately and loads it to all three targets. Loading continues in this way until OWB encounters an error loading row 100th to Target_2. OWB reports the error and does not load the row. It rolls back the row 100th previously inserted into Target_1 and does not attempt to load row 100 to Target_3. Then, if there are remaining rows, OWB will continue loading them, resuming with loading rows to Target_1. The mapping completes with 99 rows inserted into each target. ¦ Set-based/ Automatic Commit: Configuration of Example mapping: Result: What’s happening: When OWB encounters the error inserting into Target_2, it does not load any rows and reports an error for the table. It does, however, continue to insert rows into Target_3 and does not roll back the rows previously inserted into Target_1. The mapping completes with one error message for Target_2, no rows inserted into Target_2, and 100 rows inserted into Target_1 and Target_3 separately. ¦ Row-based/Automatic Commit: Configuration of Example mapping: Result: What’s happening: OWB evaluates each row separately for loading into the targets. Loading continues in this way until OWB encounters an error loading row 100 to Target_2 and reports the error. OWB does not roll back row 100th from Target_1, does insert it into Target_3. If there are remaining rows, it will continue to load them. The mapping completes with 99 rows inserted into Target_2 and 100 rows inserted into each of the other targets. Note: Automatic Correlated commit is not applicable for row-based (target only). If you design a mapping with the row-based (target only) and correlated commit combination, OWB runs the mapping but does not perform the correlated commit. In set-based mode, correlated commit may impact the size of your rollback segments. Space for rollback segments may be a concern when you merge data (insert/update or update/insert). Correlated commit operates transparently with PL/SQL bulk processing code. The correlated commit strategy is not available for mappings run in any mode that are configured for Partition Exchange Loading or that include a Queue, Match Merge, or Table Function operator. If you want to practice in your own environment, you can follow the steps: 1. Import the MDL file: commit_operating_mode.mdl 2. Fix the location for oracle module ORCL and deploy all tables under it. 3. Insert sample records into SOURCE table, using below plsql code: begin for i in 1..99 loop insert into source values(i, 'col_'||i); end loop; insert into source values(99, 'col_99'); end; 4. Configure MAPPING_1 to any combinations of operating mode and commit strategy you want to test. And make sure feature TLO of mapping is open. 5. Deploy Mapping “MAPPING_1”. 6. Run the mapping and check the result.

Read the article
Database design advice needed.

- by user346271

Hi all, I'm a lone developer for a telecoms company, and am after some database design advice from anyone with a bit of time to answer. I am inserting into one table ~2 million rows each day, these tables then get archived and compressed on a monthly basis. Each monthly table contains ~15,000,000 rows. Although this is increasing month on month. For every insert I do above I am combining the data from rows which belong together and creating another "correlated" table. This table is currently not being archived, as I need to make sure I never miss an update to the correlated table. (Hope that makes sense) Although in general this information should remain fairly static after a couple of days of processing. All of the above is working perfectly. However my company now wishes to perform some stats against this data, and these tables are getting too large to provide the results in what would be deemed a reasonable time. Even with the appropriate indexes set. So I guess after all the above my question is quite simple. Should I write a script which groups the data from my correlated table into smaller tables. Or should I store the queries result sets in something like memcache? I'm already using mysqls cache, but due to having limited control over how long the data is stored for, it's not working ideally. The main advantages I can see of using something like memcache: No blocking on my correlated table after the query has been cashed. Greater flexibility of sharing the collected data between the backend collector and front end processor. (i.e custom reports could be written in the backend and the results of these stored in the cache under a key which then gets shared with anyone who would want to see the data of this report) Redundancy and scalability if we start sharing this data with a large amount of customers. The main disadvantages I can see of using something like memcache: Data is not persistent if machine is rebooted / cache is flushed. The main advantages of using MySql Persistent data. Less code changes (although adding something like memcache is trivial anyway) The main disadvantages of using MySql Have to define table templates every time I want to store provide a new set of grouped data. Have to write a program which loops through the correlated data and fills these new tables. Potentially will still grow slower as the data continues to be filled. Apologies for quite a long question. It's helped me to write down these thoughts here anyway, and any advice/help/experience with dealing with this sort of problem would be greatly appreciated. Many thanks. Alan

Read the article
How to get tens of millions of pages indexed by Google bot?

- by Chris Adragna

We are currently developing a site that currently has 8 million unique pages that will grow to about 20 million right away, and eventually to about 50 million or more. Before you criticize... Yes, it provides unique, useful content. We continually process raw data from public records and by doing some data scrubbing, entity rollups, and relationship mapping, we've been able to generate quality content, developing a site that's quite useful and also unique, in part due to the breadth of the data. It's PR is 0 (new domain, no links), and we're getting spidered at a rate of about 500 pages per day, putting us at about 30,000 pages indexed thus far. At this rate, it would take over 400 years to index all of our data. I have two questions: Is the rate of the indexing directly correlated to PR, and by that I mean is it correlated enough that by purchasing an old domain with good PR will get us to a workable indexing rate (in the neighborhood of 100,000 pages per day). Are there any SEO consultants who specialize in aiding the indexing process itself. We're otherwise doing very well with SEO, on-page especially, besides, the competition for our "long-tail" keyword phrases is pretty low, so our success hinges mostly on the number of pages indexed. Our main competitor has achieved approx 20MM pages indexed in just over one year's time, along with an Alexa 2000-ish ranking. Noteworthy qualities we have in place: page download speed is pretty good (250-500 ms) no errors (no 404 or 500 errors when getting spidered) we use Google webmaster tools and login daily friendly URLs in place I'm afraid to submit sitemaps. Some SEO community postings suggest a new site with millions of pages and no PR is suspicious. There is a Google video of Matt Cutts speaking of a staged on-boarding of large sites, too, in order to avoid increased scrutiny (at approx 2:30 in the video). Clickable site links deliver all pages, no more than four pages deep and typically no more than 250(-ish) internal links on a page. Anchor text for internal links is logical and adds relevance hierarchically to the data on the detail pages. We had previously set the crawl rate to the highest on webmaster tools (only about a page every two seconds, max). I recently turned it back to "let Google decide" which is what is advised.

Read the article
Essence of Anchor Text

It is significant to utilize anchor text in order to improve search engine ranking. Anchor text is directly correlated with inbound links. If you are leaving comments to blogs or submit articles with link, make use of anchor text and not the URL only.

Read the article
More SQL Smells

- by Nick Harrison

Let's continue exploring some of the SQL Smells from Phil's list. He has been putting together. Datatype mis-matches in predicates that rely on implicit conversion.(Plamen Ratchev) This is a great example poking holes in the whole theory of "If it works it's not broken" Queries will this probably will generally work and give the correct response. In fact, without careful analysis, you probably may be completely oblivious that there is even a problem. This subtle little problem will needlessly complicate queries and slow them down regardless of the indexes applied. Consider this example: CREATE TABLE [dbo].[Page]( [PageId] [int] IDENTITY(1,1) NOT NULL, [Title] [varchar](75) NOT NULL, [Sequence] [int] NOT NULL, [ThemeId] [int] NOT NULL, [CustomCss] [text] NOT NULL, [CustomScript] [text] NOT NULL, [PageGroupId] [int] NOT NULL; CREATE PROCEDURE PageSelectBySequence ( @sequenceMin smallint , @sequenceMax smallint ) AS BEGIN SELECT [PageId] , [Title] , [Sequence] , [ThemeId] , [CustomCss] , [CustomScript] , [PageGroupId] FROM [CMS].[dbo].[Page] WHERE Sequence BETWEEN @sequenceMin AND @SequenceMax END Note that the Sequence column is defined as int while the sequence parameter is defined as a small int. The problem is that the database may have to do a lot of type conversions to evaluate the query. In some cases, this may even negate the indexes that you have in place. Using Correlated subqueries instead of a join (Dave_Levy/ Plamen Ratchev) There are two main problems here. The first is a little subjective, since this is a non-standard way of expressing the query, it is harder to understand. The other problem is much more objective and potentially problematic. You are taking much of the control away from the optimizer. Written properly, such a query may well out perform a corresponding query written with traditional joins. More likely than not, performance will degrade. Whenever you assume that you know better than the optimizer, you will most likely be wrong. This is the fundmental problem with any hint. Consider a query like this: SELECT Page.Title , Page.Sequence , Page.ThemeId , Page.CustomCss , Page.CustomScript , PageEffectParams.Name , PageEffectParams.Value , ( SELECT EffectName FROM dbo.Effect WHERE EffectId = dbo.PageEffects.EffectId ) AS EffectName FROM Page INNER JOIN PageEffect ON Page.PageId = PageEffects.PageId INNER JOIN PageEffectParam ON PageEffects.PageEffectId = PageEffectParams.PageEffectId This can and should be written as: SELECT Page.Title , Page.Sequence , Page.ThemeId , Page.CustomCss , Page.CustomScript , PageEffectParams.Name , PageEffectParams.Value , EffectName FROM Page INNER JOIN PageEffect ON Page.PageId = PageEffects.PageId INNER JOIN PageEffectParam ON PageEffects.PageEffectId = PageEffectParams.PageEffectId INNER JOIN dbo.Effect ON dbo.Effects.EffectId = dbo.PageEffects.EffectId The correlated query may just as easily show up in the where clause. It's not a good idea in the select clause or the where clause. Few or No comments. This one is a bit more complicated and controversial. All comments are not created equal. Some comments are helpful and need to be included. Other comments are not necessary and may indicate a problem. I tend to follow the rule of thumb that comments that explain why are good. Comments that explain how are bad. Many people may be shocked to hear the idea of a bad comment, but hear me out. If a comment is needed to explain what is going on or how it works, the logic is too complex and needs to be simplified. Comments that explain why are good. Comments may explain why the sql is needed are good. Comments that explain where the sql is used are good. Comments that explain how tables are related should not be needed if the sql is well written. If they are needed, you need to consider reworking the sql or simplify your data model. Use of functions in a WHERE clause. (Anil Das) Calling a function in the where clause will often negate the indexing strategy. The function will be called for every record considered. This will often a force a full table scan on the tables affected. Calling a function will not guarantee that there is a full table scan, but there is a good chance that it will. If you find that you often need to write queries using a particular function, you may need to add a column to the table that has the function already applied.

Read the article
Looking for ideas for a simple pattern matching algorithm to run on a microcontroller

- by pic_audio

I'm working on a project to recognize simple audio patterns. I have two data sets, each made up of between 4 and 32 note/duration pairs. One set is predefined, the other is from an incoming data stream. The length of the two strongly correlated data sets is often different, but roughly the same "shape". My goal is to come up with some sort of ranking as to how well the two data sets correlate/match. I have converted the incoming frequencies to pitch and shifted the incoming data stream's pitch so that it's average pitch matches that of the predefined data set. I also stretch/compress the incoming data set's durations to match the overall duration of the predefined set. Here are two graphical examples of data that should be ranked as strongly correlated: http://s2.postimage.org/FVeG0-ee3c23ecc094a55b15e538c3a0d83dd5.gif (Sorry, as a new user I couldn't directly post images) I'm doing this on a 8-bit microcontroller so resources are minimal. Speed is less an issue, a second or two of processing isn't a deal breaker. It wouldn't surprise me if there is an obvious solution, I've just been staring at the problem too long. Any ideas? Thanks in advance...

Read the article
MySQL: Complex Join Statement involving two tables and a third correlation table

- by Stephen

I have two tables that were built for two disparate systems. I have records in one table (called "leads") that represent customers, and records in another table (called "manager") that are the exact same customers but "manager" uses different fields (For example, "leads" contains an email address, and "manager" contains two fields for two different emails--either of which might be the email from "leads"). So, I've created a correlation table that contains the lead_id and manager_id. currently this correlation table is empty. I'm trying to query the "leads" table to give me records that match either "manager" email field with the single "leads" email field, while at the same time ignoring fields that have already been added to the "correlated" table. (this way I can see how many leads that match have not yet been correlated.) Here's my current, invalid SQL attempt: SELECT leads.id, manager.id FROM leads, manager LEFT OUTER JOIN correlation ON correlation.lead_id = leads.id WHERE correlation.id IS NULL AND leads.project != "someproject" AND (manager.orig_email = leads.email OR manager.dest_email = leads.email) AND leads.created BETWEEN '1999-01-01 00:00:00' AND '2010-05-10 23:59:59' ORDER BY leads.created ASC; I get the error: Unknown column 'leads.id' in 'on clause' Before you wonder: there are records in the "leads" table where leads.project != "someproject" and leads.created falls between those dates. I've included those additional parameters for completeness.

Read the article
nginx: how do I track down a random 500 from nginx (not my application). Potentially has something to do with load?

- by kaleidomedallion

We recently had some 500's from nginx itself that somehow were not logged (we have screenshots, but nothing in the logs). That is weird in itself, because usually errors show up there. Regardless, I am wondering if there is something like a connection pool size that if maxed out would result in a 500? We have correlated it potentially to a recent spike in traffic, but it is not conclusive. Anyone have any ideas of how to begin to approach such an issue?

Read the article
Are CK Metrics still considered useful? Is there an open source tool to help?

- by DeveloperDon

Chidamber & Kemerer proposed several metrics for object oriented code. Among them, depth of inheritance tree, weighted number of methods, number of member functions, number of children, and coupling between objects. Using a base of code, they tried to correlated these metrics to the defect density and maintenance effort using covariant analysis. Are these metrics actionable in projects? Perhaps they can guide refactoring. For example weighted number of methods might show which God classes needed to be broken into more cohesive classes that address a single concern. Is there approach superseded by a better method, and is there a tool that can identify problem code, particularly in moderately large project being handed off to a new developer or team?

Read the article

1 2 3 4 | Next Page >