Search Results

Search found 313 results on 13 pages for 'logically'.

Page 12/13 | < Previous Page | 8 9 10 11 12 13 | Next Page >

Why won't C# accept a (seemingly) perfectly good Sql Server CE Query?

- by VoidKing

By perfectly good sql query, I mean to say that, inside WebMatrix, if I execute the following query, it works to perfection: SELECT page AS location, (len(page) - len(replace(UPPER(page), UPPER('o'), ''))) / len('o') AS occurences, 'pageSettings' AS tableName FROM PageSettings WHERE page LIKE '%o%' UNION SELECT pageTitle AS location, (len(pageTitle) - len(replace(UPPER(pageTitle), UPPER('o'), ''))) / len('o') AS occurences, 'ExternalSecondaryPages' AS tableName FROM ExternalSecondaryPages WHERE pageTitle LIKE '%o%' UNION SELECT eventTitle AS location, (len(eventTitle) - len(replace(UPPER(eventTitle), UPPER('o'), ''))) / len('o') AS occurences, 'MainStreetEvents' AS tableName FROM MainStreetEvents WHERE eventTitle LIKE '%o%' Here i am using 'o' as a static search string to search upon. No problem, but not exeactly very dynamic. Now, when I write this query as a string in C# and as I think it should be (and even as I have done before) I get a server-side error indicating that the string was not in the correct format. Here is a pic of that error: And (although I am only testing the output, should I get it to quit erring), here is the actual C# (i.e., the .cshtml) page that queries the database: @{ Layout = "~/Layouts/_secondaryMainLayout.cshtml"; var db = Database.Open("Content"); string searchText = Request.Unvalidated["searchText"]; string selectQueryString = "SELECT page AS location, (len(page) - len(replace(UPPER(page), UPPER(@0), ''))) / len(@0) AS occurences, 'pageSettings' AS tableName FROM PageSettings WHERE page LIKE '%' + @0 + '%' "; selectQueryString += "UNION "; selectQueryString += "SELECT pageTitle AS location, (len(pageTitle) - len(replace(UPPER(pageTitle), UPPER(@0), ''))) / len(@0) AS occurences, 'ExternalSecondaryPages' AS tableName FROM ExternalSecondaryPages WHERE pageTitle LIKE '%' + @0 + '%' "; selectQueryString += "UNION "; selectQueryString += "SELECT eventTitle AS location, (len(eventTitle) - len(replace(UPPER(eventTitle), UPPER(@0), ''))) / len(@0) AS occurences, 'MainStreetEvents' AS tableName FROM MainStreetEvents WHERE eventTitle LIKE '%' + @0 + '%'"; @:beginning <br/> foreach (var row in db.Query(selectQueryString, searchText)) { @:entry @:@row.location   @:@row.occurences   @:@row.tableName <br/> } } Since it is erring on the foreach (var row in db.Query(selectQueryString, searchText)) line, that heavily suggests that something is wrong with my query, however, everything seems right to me about the syntax here and it even executes to perfection if I query the database (mind you, un-parameterized) directly. Logically, I would assume that I have erred somewhere with the syntax involved in parameterizing this query, however, my double and triple checking (as well as, my past experience at doing this) insists that everything looks fine here. Have I messed up the syntax involved with parameterizing this query, or is something else at play here that I am overlooking? I know I can tell you, for sure, as it has been previously tested, that the value I am getting from the query string is, indeed, what I would expect it to be, but as there really isn't much else on the .cshtml page yet, that is about all I can tell you.

Read the article
How would you implement this "WorkerChain" functionality in .NET?

- by Dan Tao

Sorry for the vague question title -- not sure how to encapsulate what I'm asking below succinctly. (If someone with editing privileges can think of a more descriptive title, feel free to change it.) The behavior I need is this. I am envisioning a worker class that accepts a single delegate task in its constructor (for simplicity, I would make it immutable -- no more tasks can be added after instantiation). I'll call this task T. The class should have a simple method, something like GetToWork, that will exhibit this behavior: If the worker is not currently running T, then it will start doing so right now. If the worker is currently running T, then once it is finished, it will start T again immediately. GetToWork can be called any number of times while the worker is running T; the simple rule is that, during any execution of T, if GetToWork was called at least once, T will run again upon completion (and then if GetToWork is called while T is running that time, it will repeat itself again, etc.). Now, this is pretty straightforward with a boolean switch. But this class needs to be thread-safe, by which I mean, steps 1 and 2 above need to comprise atomic operations (at least I think they do). There is an added layer of complexity. I have need of a "worker chain" class that will consist of many of these workers linked together. As soon as the first worker completes, it essentially calls GetToWork on the worker after it; meanwhile, if its own GetToWork has been called, it restarts itself as well. Logically calling GetToWork on the chain is essentially the same as calling GetToWork on the first worker in the chain (I would fully intend that the chain's workers not be publicly accessible). One way to imagine how this hypothetical "worker chain" would behave is by comparing it to a team in a relay race. Suppose there are four runners, W1 through W4, and let the chain be called C. If I call C.StartWork(), what should happen is this: If W1 is at his starting point (i.e., doing nothing), he will start running towards W2. If W1 is already running towards W2 (i.e., executing his task), then once he reaches W2, he will signal to W2 to get started, immediately return to his starting point and, since StartWork has been called, start running towards W2 again. When W1 reaches W2's starting point, he'll immediately return to his own starting point. If W2 is just sitting around, he'll start running immediately towards W3. If W2 is already off running towards W3, then W2 will simply go again once he's reached W3 and returned to his starting point. The above is probably a little convoluted and written out poorly. But hopefully you get the basic idea. Obviously, these workers will be running on their own threads. Also, I guess it's possible this functionality already exists somewhere? If that's the case, definitely let me know!

Read the article
PHP array pointer craziness

- by JMan

I'm trying to create a "GetCurrentLevel" method that takes a point value as an input and returns what "Level" that corresponds to. I'm storing the Level = Points mapping in an array, but the array pointer is not moving logically when I use it a foreach loop. I've added echo statements for debugging. Here's my class definition: class Levels extends Model { protected $_map = array ( 'None' => 0, 'Bronze' => 50, 'Silver' => 200, 'Gold' => 500 ); public function __construct() { parent::__construct(); } public function GetCurrentLevel($points) { foreach ($this->_map as $name => $threshold) { echo "Level Name: $name<br/>"; echo "Level Threshold: $threshold<br/>"; echo "Current Level: " . key($this->_map) . "<br/>"; echo "Current Threshold: " . current($this->_map) . "<br/>"; if ($points < $threshold) /* Threshold is now above the points, so need to go back one level */ { $previousThreshold = prev($this->_map); echo "Previous Threshold: $previousThreshold<br/>"; echo "Final Level: " . key($this->_map) . "<br/>"; return key($this->_map); } echo "Go to next level <br/>"; } } And here is what I see when I call GetCurrentLevel(60): Level Name: None Level Threshold: 0 Current Level: Bronze //* Looks like foreach immediately moves the array pointer *// Current Threshold: 50 Go to next level Level Name: Bronze Level Threshold: 50 Current Level: Bronze //* WTF? Why hasn't the array pointer moved? *// Current Threshold: 50 Go to next level Level Name: Silver Level Threshold: 200 Current Level: Bronze //* WTF? Why hasn't the array pointer moved? *// Current Threshold: 50 Previous Threshold: 0 Final Level: None But the "Final Level" should be 'Bronze' since 60 points is above the 50 points needed for a Bronze medal, but below the 200 points needed for a Silver medal. Sorry for the long post. Thanks for your help!

Read the article
Refactoring Singleton Overuse

- by drharris

Today I had an epiphany, and it was that I was doing everything wrong. Some history: I inherited a C# application, which was really just a collection of static methods, a completely procedural mess of C# code. I refactored this the best I knew at the time, bringing in lots of post-college OOP knowledge. To make a long story short, many of the entities in code have turned out to be Singletons. Today I realized I needed 3 new classes, which would each follow the same Singleton pattern to match the rest of the software. If I keep tumbling down this slippery slope, eventually every class in my application will be Singleton, which will really be no logically different from the original group of static methods. I need help on rethinking this. I know about Dependency Injection, and that would generally be the strategy to use in breaking the Singleton curse. However, I have a few specific questions related to this refactoring, and all about best practices for doing so. How acceptable is the use of static variables to encapsulate configuration information? I have a brain block on using static, and I think it is due to an early OO class in college where the professor said static was bad. But, should I have to reconfigure the class every time I access it? When accessing hardware, is it ok to leave a static pointer to the addresses and variables needed, or should I continually perform Open() and Close() operations? Right now I have a single method acting as the controller. Specifically, I continually poll several external instruments (via hardware drivers) for data. Should this type of controller be the way to go, or should I spawn separate threads for each instrument at the program's startup? If the latter, how do I make this object oriented? Should I create classes called InstrumentAListener and InstrumentBListener? Or is there some standard way to approach this? Is there a better way to do global configuration? Right now I simply have Configuration.Instance.Foo sprinkled liberally throughout the code. Almost every class uses it, so perhaps keeping it as a Singleton makes sense. Any thoughts? A lot of my classes are things like SerialPortWriter or DataFileWriter, which must sit around waiting for this data to stream in. Since they are active the entire time, how should I arrange these in order to listen for the events generated when data comes in? Any other resources, books, or comments about how to get away from Singletons and other pattern overuse would be helpful.

Read the article
Diagnosing packet loss / high latency in Ubuntu

- by Sam Gammon

We have a Linux box (Ubuntu 12.04) running Nginx (1.5.2), which acts as a reverse proxy/load balancer to some Tornado and Apache hosts. The upstream servers are physically and logically close (same DC, sometimes same-rack) and show sub-millisecond latency between them: PING appserver (10.xx.xx.112) 56(84) bytes of data. 64 bytes from appserver (10.xx.xx.112): icmp_req=1 ttl=64 time=0.180 ms 64 bytes from appserver (10.xx.xx.112): icmp_req=2 ttl=64 time=0.165 ms 64 bytes from appserver (10.xx.xx.112): icmp_req=3 ttl=64 time=0.153 ms We receive a sustained load of about 500 requests per second, and are currently seeing regular packet loss / latency spikes from the Internet, even from basic pings: sam@AM-KEEN ~> ping -c 1000 loadbalancer PING 50.xx.xx.16 (50.xx.xx.16): 56 data bytes 64 bytes from loadbalancer: icmp_seq=0 ttl=56 time=11.624 ms 64 bytes from loadbalancer: icmp_seq=1 ttl=56 time=10.494 ms ... many packets later ... Request timeout for icmp_seq 2 64 bytes from loadbalancer: icmp_seq=2 ttl=56 time=1536.516 ms 64 bytes from loadbalancer: icmp_seq=3 ttl=56 time=536.907 ms 64 bytes from loadbalancer: icmp_seq=4 ttl=56 time=9.389 ms ... many packets later ... Request timeout for icmp_seq 919 64 bytes from loadbalancer: icmp_seq=918 ttl=56 time=2932.571 ms 64 bytes from loadbalancer: icmp_seq=919 ttl=56 time=1932.174 ms 64 bytes from loadbalancer: icmp_seq=920 ttl=56 time=932.018 ms 64 bytes from loadbalancer: icmp_seq=921 ttl=56 time=6.157 ms --- 50.xx.xx.16 ping statistics --- 1000 packets transmitted, 997 packets received, 0.3% packet loss round-trip min/avg/max/stddev = 5.119/52.712/2932.571/224.629 ms The pattern is always the same: things operate fine for a while (<20ms), then a ping drops completely, then three or four high-latency pings (1000ms), then it settles down again. Traffic comes in through a bonded public interface (we will call it bond0) configured as such: bond0 Link encap:Ethernet HWaddr 00:xx:xx:xx:xx:5d inet addr:50.xx.xx.16 Bcast:50.xx.xx.31 Mask:255.255.255.224 inet6 addr: <ipv6 address> Scope:Global inet6 addr: <ipv6 address> Scope:Link UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:527181270 errors:1 dropped:4 overruns:0 frame:1 TX packets:413335045 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:240016223540 (240.0 GB) TX bytes:104301759647 (104.3 GB) Requests are then submitted via HTTP to upstream servers on the private network (we can call it bond1), which is configured like so: bond1 Link encap:Ethernet HWaddr 00:xx:xx:xx:xx:5c inet addr:10.xx.xx.70 Bcast:10.xx.xx.127 Mask:255.255.255.192 inet6 addr: <ipv6 address> Scope:Link UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1 RX packets:430293342 errors:1 dropped:2 overruns:0 frame:1 TX packets:466983986 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:77714410892 (77.7 GB) TX bytes:227349392334 (227.3 GB) Output of uname -a: Linux <hostname> 3.5.0-42-generic #65~precise1-Ubuntu SMP Wed Oct 2 20:57:18 UTC 2013 x86_64 GNU/Linux We have customized sysctl.conf in an attempt to fix the problem, with no success. Output of /etc/sysctl.conf (with irrelevant configs omitted): # net: core net.core.netdev_max_backlog = 10000 # net: ipv4 stack net.ipv4.tcp_ecn = 2 net.ipv4.tcp_sack = 1 net.ipv4.tcp_fack = 1 net.ipv4.tcp_tw_reuse = 1 net.ipv4.tcp_tw_recycle = 0 net.ipv4.tcp_timestamps = 1 net.ipv4.tcp_window_scaling = 1 net.ipv4.tcp_no_metrics_save = 1 net.ipv4.tcp_max_syn_backlog = 10000 net.ipv4.tcp_congestion_control = cubic net.ipv4.ip_local_port_range = 8000 65535 net.ipv4.tcp_syncookies = 1 net.ipv4.tcp_synack_retries = 2 net.ipv4.tcp_thin_dupack = 1 net.ipv4.tcp_thin_linear_timeouts = 1 net.netfilter.nf_conntrack_max = 99999999 net.netfilter.nf_conntrack_tcp_timeout_established = 300 Output of dmesg -d, with non-ICMP UFW messages suppressed: [508315.349295 < 19.852453>] [UFW BLOCK] IN=bond1 OUT= MAC=<mac addresses> SRC=118.xx.xx.143 DST=50.xx.xx.16 LEN=68 TOS=0x00 PREC=0x00 TTL=51 ID=43221 PROTO=ICMP TYPE=3 CODE=1 [SRC=50.xx.xx.16 DST=118.xx.xx.143 LEN=40 TOS=0x00 PREC=0x00 TTL=249 ID=10220 DF PROTO=TCP SPT=80 DPT=53817 WINDOW=8190 RES=0x00 ACK FIN URGP=0 ] [517787.732242 < 0.443127>] Peer 190.xx.xx.131:59705/80 unexpectedly shrunk window 1155488866:1155489425 (repaired) How can I go about diagnosing the cause of this problem, on a Debian-family Linux box?

Read the article
ISP 5 Device Limit ... again

- by Tommo

Sorry for the delay in responding to the suggestions that were posted in my first question (ISP 5 Device Limit - double NAT the solution?). I've been travelling and have not been able to try anything. Below is what I've tried and where I have not been successful. Any more help gratefully appreciated. I figure I need to give a more comprehensive overview of what I've got and how it's set up. First of all - I am using all Apple products here. I am iMac, iPad, iPhone, Apple TV, Airport Express and Time Capsule. I used to like the way that it 'just worked'. Now I find that it requires a bit of encouragement before it 'just works'. So, as I stated in my original question; my ISP has a router in my building that is limiting me to 5 devices. I am hard wired into this router and I can neither access it physically nor logically (they won't let me access it). Also, I only appear to be able to connect to it through the LAN ports on my Time Capsule. Any device I connect appears to be on a rolling IP list with the following settings: Router 91.72.80.1 Devices then get assigned IPv4 addresses in the range (as far as I can see) from 91.72.80.2 onwards. SubNet Mask 255.255.255.0 DNS Servers 213.132.63.25, 80.227.2.4 I have my Time Capsule / Router in Bridge-Mode which means I am limited to the 5 devices and cannot use Guest Networks etc. What I've tried today. Static IPs: On all devices, I went from DHCP to Static and put in the same information when they had connected using DHCP. Somewhat surprisingly this did not work. None of the devices enjoyed any connection to the router and certainly no internet connection. Intentional Double-NAT - Time Capsule to 'DHCP and NAT': By selecting DHCP and NAT on my Router I was able to connect devices to my Time Capsule in the range 10.0.1.2 to 10.0.1.200. This offered no internet connectivity and didn't really help the situation. In this mode, however, I was able to force the devices - individually and laboriously - to look for the Router and previously listed DNSs by inputting the numbers from 'Bridge-mode' into the STATIC settings and then resetting the connection. The Router then appeared to assign a distinct IP address to the device and it worked on the network. I had this working for more than 5 devices. However, this is not a great solution because as soon as one of the mobile devices left the building it needed repointing to the Router. The connections were also not very stable. Especially when trying to hold onto a VPN. Spoofing a few MAC addresses: I'm afraid I don't really know what this would achieve, nor how to do it on an Apple device… So … I'm almost back at Square One. I have had to withdraw to the Bridge-Mode position again with the 5 device limit to see if there's a better course of action to follow. ANY help would be much appreciated. I am positive that I cannot be the only one suffering under this 5 device limit!

Read the article
ASP.NET and WIF: Showing custom profile username as User.Identity.Name

- by DigiMortal

I am building ASP.NET MVC application that uses external services to authenticate users. For ASP.NET users are fully authenticated when they are redirected back from external service. In system they are logically authenticated when they have created user profiles. In this posting I will show you how to force ASP.NET MVC controller actions to demand existence of custom user profiles. Using external authentication sources with AppFabric Suppose you want to be user-friendly and you don’t force users to keep in mind another username/password when they visit your site. You can accept logins from different popular sites like Windows Live, Facebook, Yahoo, Google and many more. If user has account in some of these services then he or she can use his or her account to log in to your site. If you have community site then you usually have support for user profiles too. Some of these providers give you some information about users and other don’t. So only thing in common you get from all those providers is some unique ID that identifies user in service uniquely. Image above shows you how new user joins your site. Existing users who already have profile are directed to users homepage after they are authenticated. You can read more about how to solve semi-authorized users problem from my blog posting ASP.NET MVC: Using ProfileRequiredAttribute to restrict access to pages. The other problem is related to usernames that we don’t get from all identity providers. Why is IIdentity.Name sometimes empty? The problem is described more specifically in my blog posting Identifying AppFabric Access Control Service users uniquely. Shortly the problem is that not all providers have claim called http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name. The following diagram illustrates what happens when user got token from AppFabric ACS and was redirected to your site. Now, when user was authenticated using Windows Live ID then we don’t have name claim in token and that’s why User.Identity.Name is empty. Okay, we can force nameidentifier to be used as name (we can do it in web.config file) but we have user profiles and we want username from profile to be shown when username is asked. Modifying name claim Now let’s force IClaimsIdentity to use username from our user profiles. You can read more about my profiles topic from my blog posting ASP.NET MVC: Using ProfileRequiredAttribute to restrict access to pages and you can find some useful extension methods for claims identity from my blog posting Identifying AppFabric Access Control Service users uniquely. Here is what we do to set User.Identity.Name: we will check if user has profile, if user has profile we will check if User.Identity.Name matches the name given by profile, if names does not match then probably identity provider returned some name for user, we will remove name claim and recreate it with correct username, we will add new name claim to claims collection. All this stuff happens in Application_AuthorizeRequest event of our web application. The code is here. protected void Application_AuthorizeRequest() { if (string.IsNullOrEmpty(User.Identity.Name)) { var identity = User.Identity; var profile = identity.GetProfile(); if (profile != null) { if (profile.UserName != identity.Name) { identity.RemoveName(); var claim = new Claim("http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name", profile.UserName); var claimsIdentity = (IClaimsIdentity)identity; claimsIdentity.Claims.Add(claim); } } } } RemoveName extension method is simple – it looks for name claims of IClaimsIdentity claims collection and removes them. public static void RemoveName(this IIdentity identity) { if (identity == null) return; var claimsIndentity = identity as ClaimsIdentity; if (claimsIndentity == null) return; for (var i = claimsIndentity.Claims.Count - 1; i >= 0; i--) { var claim = claimsIndentity.Claims[i]; if (claim.ClaimType == "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name") claimsIndentity.Claims.RemoveAt(i); } } And we are done. Now User.Identity.Name returns the username from user profile and you can use it to show username of current user everywhere in your site. Conclusion Mixing AppFabric Access Control Service and Windows Identity Foundation with custom authorization logic is not impossible but a little bit tricky. This posting finishes my little series about AppFabric ACS and WIF for this time and hopefully you found some useful tricks, tips, hacks and code pieces you can use in your own applications.

Read the article
SQL SERVER – Guest Post – Architecting Data Warehouse – Niraj Bhatt

- by pinaldave

Niraj Bhatt works as an Enterprise Architect for a Fortune 500 company and has an innate passion for building / studying software systems. He is a top rated speaker at various technical forums including Tech·Ed, MCT Summit, Developer Summit, and Virtual Tech Days, among others. Having run a successful startup for four years Niraj enjoys working on – IT innovations that can impact an enterprise bottom line, streamlining IT budgets through IT consolidation, architecture and integration of systems, performance tuning, and review of enterprise applications. He has received Microsoft MVP award for ASP.NET, Connected Systems and most recently on Windows Azure. When he is away from his laptop, you will find him taking deep dives in automobiles, pottery, rafting, photography, cooking and financial statements though not necessarily in that order. He is also a manager/speaker at BDOTNET, Asia’s largest .NET user group. Here is the guest post by Niraj Bhatt. As data in your applications grows it’s the database that usually becomes a bottleneck. It’s hard to scale a relational DB and the preferred approach for large scale applications is to create separate databases for writes and reads. These databases are referred as transactional database and reporting database. Though there are tools / techniques which can allow you to create snapshot of your transactional database for reporting purpose, sometimes they don’t quite fit the reporting requirements of an enterprise. These requirements typically are data analytics, effective schema (for an Information worker to self-service herself), historical data, better performance (flat data, no joins) etc. This is where a need for data warehouse or an OLAP system arises. A Key point to remember is a data warehouse is mostly a relational database. It’s built on top of same concepts like Tables, Rows, Columns, Primary keys, Foreign Keys, etc. Before we talk about how data warehouses are typically structured let’s understand key components that can create a data flow between OLTP systems and OLAP systems. There are 3 major areas to it: a) OLTP system should be capable of tracking its changes as all these changes should go back to data warehouse for historical recording. For e.g. if an OLTP transaction moves a customer from silver to gold category, OLTP system needs to ensure that this change is tracked and send to data warehouse for reporting purpose. A report in context could be how many customers divided by geographies moved from sliver to gold category. In data warehouse terminology this process is called Change Data Capture. There are quite a few systems that leverage database triggers to move these changes to corresponding tracking tables. There are also out of box features provided by some databases e.g. SQL Server 2008 offers Change Data Capture and Change Tracking for addressing such requirements. b) After we make the OLTP system capable of tracking its changes we need to provision a batch process that can run periodically and takes these changes from OLTP system and dump them into data warehouse. There are many tools out there that can help you fill this gap – SQL Server Integration Services happens to be one of them. c) So we have an OLTP system that knows how to track its changes, we have jobs that run periodically to move these changes to warehouse. The question though remains is how warehouse will record these changes? This structural change in data warehouse arena is often covered under something called Slowly Changing Dimension (SCD). While we will talk about dimensions in a while, SCD can be applied to pure relational tables too. SCD enables a database structure to capture historical data. This would create multiple records for a given entity in relational database and data warehouses prefer having their own primary key, often known as surrogate key. As I mentioned a data warehouse is just a relational database but industry often attributes a specific schema style to data warehouses. These styles are Star Schema or Snowflake Schema. The motivation behind these styles is to create a flat database structure (as opposed to normalized one), which is easy to understand / use, easy to query and easy to slice / dice. Star schema is a database structure made up of dimensions and facts. Facts are generally the numbers (sales, quantity, etc.) that you want to slice and dice. Fact tables have these numbers and have references (foreign keys) to set of tables that provide context around those facts. E.g. if you have recorded 10,000 USD as sales that number would go in a sales fact table and could have foreign keys attached to it that refers to the sales agent responsible for sale and to time table which contains the dates between which that sale was made. These agent and time tables are called dimensions which provide context to the numbers stored in fact tables. This schema structure of fact being at center surrounded by dimensions is called Star schema. A similar structure with difference of dimension tables being normalized is called a Snowflake schema. This relational structure of facts and dimensions serves as an input for another analysis structure called Cube. Though physically Cube is a special structure supported by commercial databases like SQL Server Analysis Services, logically it’s a multidimensional structure where dimensions define the sides of cube and facts define the content. Facts are often called as Measures inside a cube. Dimensions often tend to form a hierarchy. E.g. Product may be broken into categories and categories in turn to individual items. Category and Items are often referred as Levels and their constituents as Members with their overall structure called as Hierarchy. Measures are rolled up as per dimensional hierarchy. These rolled up measures are called Aggregates. Now this may seem like an overwhelming vocabulary to deal with but don’t worry it will sink in as you start working with Cubes and others. Let’s see few other terms that we would run into while talking about data warehouses. ODS or an Operational Data Store is a frequently misused term. There would be few users in your organization that want to report on most current data and can’t afford to miss a single transaction for their report. Then there is another set of users that typically don’t care how current the data is. Mostly senior level executives who are interesting in trending, mining, forecasting, strategizing, etc. don’t care for that one specific transaction. This is where an ODS can come in handy. ODS can use the same star schema and the OLAP cubes we saw earlier. The only difference is that the data inside an ODS would be short lived, i.e. for few months and ODS would sync with OLTP system every few minutes. Data warehouse can periodically sync with ODS either daily or weekly depending on business drivers. Data marts are another frequently talked about topic in data warehousing. They are subject-specific data warehouse. Data warehouses that try to span over an enterprise are normally too big to scope, build, manage, track, etc. Hence they are often scaled down to something called Data mart that supports a specific segment of business like sales, marketing, or support. Data marts too, are often designed using star schema model discussed earlier. Industry is divided when it comes to use of data marts. Some experts prefer having data marts along with a central data warehouse. Data warehouse here acts as information staging and distribution hub with spokes being data marts connected via data feeds serving summarized data. Others eliminate the need for a centralized data warehouse citing that most users want to report on detailed data. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Best Practices, Business Intelligence, Data Warehousing, Database, Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
How I do VCS

- by Wes McClure

After years of dabbling with different version control systems and techniques, I wanted to share some of what I like and dislike in a few blog posts. To start this out, I want to talk about how I use VCS in a team environment. These come in a series of tips or best practices that I try to follow. Note: This list is subject to change in the future. Always use some form of version control for all aspects of software development. Development is an evolution. Looking back at where we were is an invaluable asset in that process. This includes data schemas and documentation. Reverting / reapplying changes is absolutely critical for efficient development. The tools I use: Code: Hg (preferred), SVN Database: TSqlMigrations Documents: Sometimes in code repository, also SharePoint with versioning Always tag a commit (changeset) with comments This is a quick way to describe to someone else (or your future self) what the changeset entails. Be brief but courteous. One or two sentences about the task, not the actual changes. Use precommit hooks or setup the central repository to reject changes without comments. Link changesets to documentation If your project management system integrates with version control, or has a way to externally reference stories, tasks etc then leave a reference in the commit. This helps locate more information about the commit and/or related changesets. It’s best to have a precommit hook or system that requires this information, otherwise it’s easy to forget. Ability to work offline is required, including commits and history Yes this requires a DVCS locally but doesn’t require the central repository to be a DVCS. I prefer to use either Git or Hg but if it isn’t possible to migrate the central repository, it’s still possible for a developer to push / pull changes to that repository from a local Hg or Git repository. Never lock resources (files) in a central repository… Rude! We have merge tools for a reason, merging sucked a long time ago, it doesn’t anymore… stop locking files! This is unproductive, rude and annoying to other team members. Always review everything in your commit. Never ever commit a set of files without reviewing the changes in each. Never add a file without asking yourself, deep down inside, does this belong? If you leave to make changes during a review, start the review over when you come back. Never assume you didn’t touch a file, double check. This is another reason why you want to avoid large, infrequent commits. Requirements for tools Quickly show pending changes for the entire repository. Default action for a resource with pending changes is a diff. Pluggable diff & merge tool Produce a unified diff or a diff of all changes. This is helpful to bulk review changes instead of opening each file. The central repository is not your own personal dump yard. Breaking this rule is a sure fire way to get the F bomb dropped in front of your name, multiple times. If you turn on Visual Studio’s commit on closing studio option, I will personally break your fingers. By the way, the person(s) in charge of this feature should be fired and never be allowed near programming, ever again. Commit (integrate) to the central repository / branch frequently I try to do this before leaving each day, especially without a DVCS. One never knows when they might need to work from remote the following day. Never commit commented out code If it isn’t needed anymore, delete it! If you aren’t sure if it might be useful in the future, delete it! This is why we have history. If you don’t know why it’s commented out, figure it out and then either uncomment it or delete it. Don’t commit build artifacts, user preferences and temporary files. Build artifacts do not belong in VCS, everything in them is present in the code. (ie: bin\*, obj\*, *.dll, *.exe) User preferences are your settings, stop overriding my preferences files! (ie: *.suo and *.user files) Most tools allow you to ignore certain files and Hg/Git allow you to version this as an ignore file. Set this up as a first step when creating a new repository! Be polite when merging unresolved conflicts. Count to 10, cuss, grab a stress ball and realize it’s not a big deal. Actually, it’s an opportunity to let you know that someone else is working in the same area and you might want to communicate with them. Following the other rules, especially committing frequently, will reduce the likelihood of this. Suck it up, we all have to deal with this unintended consequence at times. Just be careful and GET FAMILIAR with your merge tool. It’s really not as scary as you think. I personally prefer KDiff3 as its merging capabilities rock. Don’t blindly merge and then blindly commit your changes, this is rude and unprofessional. Make sure you understand why the conflict occurred and which parts of the code you want to keep. Apply scrutiny when you commit a manual merge: review the diff! Make sure you test the changes (build and run automated tests) Become intimate with your version control system and the tools you use with it. Avoid trial and error as much as is possible, sit down and test the tool out, read some tutorials etc. Create test repositories and walk through common scenarios. Find the most efficient way to do your work. These tools will be used repetitively, so inefficiencies will add up. Sometimes this involves a mix of tools, both GUI and CLI. I like a combination of both Tortoise Hg and hg cli to get the job efficiently. Always tag releases Create a way to find a given release, whether this be in comments or an explicit tag / branch. This should be readily discoverable. Create release branches to patch bugs and then merge the changes back to other development branch(es). If using feature branches, strive for periodic integrations. Feature branches often cause forked code that becomes irreconcilable. Strive to re-integrate somewhat frequently with the branch this code will ultimately be merged into. This will avoid merge conflicts in the future. Feature branches are best when they are mutually exclusive of active development in other branches. Use and abuse local commits , at least one per task in a story. This builds a trail of changes in your local repository that can be pushed to a central repository when the story is complete. Never commit a broken build or failing tests to the central repository. It’s ok for a local commit to break the build and/or tests. In fact, I encourage this if it helps group the changes more logically. This is one of the main reasons I got excited about DVCS, when I wanted more than one changeset for a set of pending changes but some files could be grouped into both changesets (like solution file / project file changes). If you have more than a dozen outstanding changed resources, there should probably be more than one commit involved. Exceptions when maintaining code bases that require shotgun surgery, in this case, it’s a design smell :) Don’t version sensitive information Especially usernames / passwords There is one area I haven’t found a solution I like yet: versioning 3rd party libraries and/or code. I really dislike keeping any assemblies in the repository, but seems to be a common practice for external libraries. Please feel free to share your ideas about this below. -Wes

Read the article
Anatomy of a .NET Assembly - CLR metadata 1

- by Simon Cooper

Before we look at the bytes comprising the CLR-specific data inside an assembly, we first need to understand the logical format of the metadata (For this post I only be looking at simple pure-IL assemblies; mixed-mode assemblies & other things complicates things quite a bit). Metadata streams Most of the CLR-specific data inside an assembly is inside one of 5 streams, which are analogous to the sections in a PE file. The name of each section in a PE file starts with a ., and the name of each stream in the CLR metadata starts with a #. All but one of the streams are heaps, which store unstructured binary data. The predefined streams are: #~ Also called the metadata stream, this stream stores all the information on the types, methods, fields, properties and events in the assembly. Unlike the other streams, the metadata stream has predefined contents & structure. #Strings This heap is where all the namespace, type & member names are stored. It is referenced extensively from the #~ stream, as we'll be looking at later. #US Also known as the user string heap, this stream stores all the strings used in code directly. All the strings you embed in your source code end up in here. This stream is only referenced from method bodies. #GUID This heap exclusively stores GUIDs used throughout the assembly. #Blob This heap is for storing pure binary data - method signatures, generic instantiations, that sort of thing. Items inside the heaps (#Strings, #US, #GUID and #Blob) are indexed using a simple binary offset from the start of the heap. At that offset is a coded integer giving the length of that item, then the item's bytes immediately follow. The #GUID stream is slightly different, in that GUIDs are all 16 bytes long, so a length isn't required. Metadata tables The #~ stream contains all the assembly metadata. The metadata is organised into 45 tables, which are binary arrays of predefined structures containing information on various aspects of the metadata. Each entry in a table is called a row, and the rows are simply concatentated together in the file on disk. For example, each row in the TypeRef table contains: A reference to where the type is defined (most of the time, a row in the AssemblyRef table). An offset into the #Strings heap with the name of the type An offset into the #Strings heap with the namespace of the type. in that order. The important tables are (with their table number in hex): 0x2: TypeDef 0x4: FieldDef 0x6: MethodDef 0x14: EventDef 0x17: PropertyDef Contains basic information on all the types, fields, methods, events and properties defined in the assembly. 0x1: TypeRef The details of all the referenced types defined in other assemblies. 0xa: MemberRef The details of all the referenced members of types defined in other assemblies. 0x9: InterfaceImpl Links the types defined in the assembly with the interfaces that type implements. 0xc: CustomAttribute Contains information on all the attributes applied to elements in this assembly, from method parameters to the assembly itself. 0x18: MethodSemantics Links properties and events with the methods that comprise the get/set or add/remove methods of the property or method. 0x1b: TypeSpec 0x2b: MethodSpec These tables provide instantiations of generic types and methods for each usage within the assembly. There are several ways to reference a single row within a table. The simplest is to simply specify the 1-based row index (RID). The indexes are 1-based so a value of 0 can represent 'null'. In this case, which table the row index refers to is inferred from the context. If the table can't be determined from the context, then a particular row is specified using a token. This is a 4-byte value with the most significant byte specifying the table, and the other 3 specifying the 1-based RID within that table. This is generally how a metadata table row is referenced from the instruction stream in method bodies. The third way is to use a coded token, which we will look at in the next post. So, back to the bytes Now we've got a rough idea of how the metadata is logically arranged, we can now look at the bytes comprising the start of the CLR data within an assembly: The first 8 bytes of the .text section are used by the CLR loader stub. After that, the CLR-specific data starts with the CLI header. I've highlighted the important bytes in the diagram. In order, they are: The size of the header. As the header is a fixed size, this is always 0x48. The CLR major version. This is always 2, even for .NET 4 assemblies. The CLR minor version. This is always 5, even for .NET 4 assemblies, and seems to be ignored by the runtime. The RVA and size of the metadata header. In the diagram, the RVA 0x20e4 corresponds to the file offset 0x2e4 Various flags specifying if this assembly is pure-IL, whether it is strong name signed, and whether it should be run as 32-bit (this is how the CLR differentiates between x86 and AnyCPU assemblies). A token pointing to the entrypoint of the assembly. In this case, 06 (the last byte) refers to the MethodDef table, and 01 00 00 refers to to the first row in that table. (after a gap) RVA of the strong name signature hash, which comes straight after the CLI header. The RVA 0x2050 corresponds to file offset 0x250. The rest of the CLI header is mainly used in mixed-mode assemblies, and so is zeroed in this pure-IL assembly. After the CLI header comes the strong name hash, which is a SHA-1 hash of the assembly using the strong name key. After that comes the bodies of all the methods in the assembly concatentated together. Each method body starts off with a header, which I'll be looking at later. As you can see, this is a very small assembly with only 2 methods (an instance constructor and a Main method). After that, near the end of the .text section, comes the metadata, containing a metadata header and the 5 streams discussed above. We'll be looking at this in the next post. Conclusion The CLI header data doesn't have much to it, but we've covered some concepts that will be important in later posts - the logical structure of the CLR metadata and the overall layout of CLR data within the .text section. Next, I'll have a look at the contents of the #~ stream, and how the table data is arranged on disk.

Read the article
Master Data

- by david.butler(at)oracle.com

Let's take a deeper look at what we mean when we talk about 'Master' data. In its most general sense, master data is data that exists in more than one operational application. These are the applications that automate business processes. These applications require significant amounts of data to function correctly. This includes data about the objects that are involved in transactions, as well as the transaction data itself. For example, when a customer buys a product, the transaction is managed by a sales application. The objects of the transaction are the Customer and the Product. The transactional data is the time, place, price, discount, payment methods, etc. used at the point of sale. Many thousands of transactional data attributes are needed within the application. These important data elements are local to the applications and have no bearing on other applications. Harmonization and synchronization across applications is not necessary. The Customer and Product objects of the transaction also have a large number of attributes. Customer for example, includes hierarchies, hierarchical and matrixed relationships, contacts, classifications, preferences, accounts, identifiers, profiles, and addresses galore for 'ship to', 'mail to'; 'service at'; etc. Dozens of attributes exist for individuals, hundreds for organizations, and thousands for products. This data has meaning beyond any particular application. It exists in many applications and drives the vital cross application enterprise business processes. These are the processes that define and differentiate the organization. At every decision point, information about the objects of the process determines the direction of the process flow. This is the nature of the data that exists in more than one application, and this is why we call it 'master data'. Let me elaborate. Parties Oracle has developed a party schema to model all participants in your daily business operations. It models people, organizations, groups, customers, contacts, employees, and suppliers. It models their accounts, locations, classifications, and preferences. And most importantly, it models the vast array of hierarchical and matrixed relationships that exist between all the participants in your real world operations. The model logically separates people and organizations from their relationships and accounts. This separation creates flexibility unmatched in the industry and accounts for the fact that the Oracle schema for Customers, Suppliers, and Accounts is a true superset of the wide variety of commercial and homegrown customer models in existence. Sites Sites are places where business is conducted. They can be addresses, clusters such as retail malls, locations within a cluster, floors within a building, places where meters are located, rooms on floors, etc. Fully understanding all attributes of a site is key to many business processes. Attributes such as 'noise abatement policy' at a point of delivery, or the size of an oven in a business kitchen drive day-to-day activities such as delivery schedules or food promotions. Typically this kind of data is siloed in departments and scattered across applications and spreadsheets. This leads to conflicting information and poor operational efficiencies. Oracle's Global Single Schema can hold all site attributes in one place and enables a single version of authoritative site information across the enterprise. Products and Services The Oracle Global Single Schema also includes a number of entities that define the products and services a company creates and offers for sale. Key entities include Items organized into Catalogs and Price Lists. The Catalog structures provide for the ability to capture different views of a product such as engineering, manufacturing, and service which are based on a unified product model. As a result, designers, manufacturing engineers, purchasers and partners can work simultaneously on a common product definition. The Catalog schema allows for unlimited attributes, combines them into meaningful groups, and maps them to catalog categories to track these different types of information. The model also maps an unlimited number of functional structures for each item. For example, multiple Bills of Material (BOMs) can be constructed representing requirements BOM, features BOM, and packaging BOM for an item. The Catalog model also supports hierarchical information about each item and all standard Global Data Synchronization attributes. Business Processes Utilizing Linked Data Entities Each business entity codified into a centralized master data environment significantly improves the efficiency of the automated business processes that use the consolidated data. When all the key business entities used by an organization's process are so consolidated, the advantages are multiplied. The primary reason for business process breakdowns (i.e. data errors across application boundaries) is eliminated. All processes are positively impacted and business process automation is itself automated. I like to use the "Call to Resolution" business process as an example to help illustrate this important point. It involves call center applications, service applications, RMA applications, transportation applications, inventory applications, etc. Customer, Site, Product and Supplier master data must all be correct and consistent across these applications. What's more, the data relationships between customer and product, and product and suppliers must be right. This is the minimum quality needed to insure the business process flows without error. But that is not the end of the story. Critical master data attributes such as customer loyalty, profitability, credit worthiness, and propensity to buy can optimize the call center point of contact component of the process. Critical product information such as alternative parts or equivalent products can optimize the resolution selected by the process. A comprehensive understanding of the 'service at' location can help insure multiple trips are avoided in the process. Full supplier information on reliability, delivery delays, and potential alternates can prevent supplier exceptions and play a significant role in optimizing the process. In other words, these master data attributes enable the optimization of the "Call to Resolution" enterprise business process. Master data supports and guides business process flows. Thus the phrase 'Master Data' is indeed appropriate. MDM is the software that houses, manages, and governs the master data that resides in all applications and controls the enterprise business processes. A complete master data solution takes a data model that holds fully attributed master data entities and their inter-relationships. Oracle has this model. Oracle, with its deep understanding of application data is the logical choice for managing all your master data within the enterprise whether or not your organization actually runs any Oracle Applications.

Read the article
I see no LOBs!

- by Paul White

Is it possible to see LOB (large object) logical reads from STATISTICS IO output on a table with no LOB columns? I was asked this question today by someone who had spent a good fraction of their afternoon trying to work out why this was occurring – even going so far as to re-run DBCC CHECKDB to see if any corruption had taken place. The table in question wasn’t particularly pretty – it had grown somewhat organically over time, with new columns being added every so often as the need arose. Nevertheless, it remained a simple structure with no LOB columns – no TEXT or IMAGE, no XML, no MAX types – nothing aside from ordinary INT, MONEY, VARCHAR, and DATETIME types. To add to the air of mystery, not every query that ran against the table would report LOB logical reads – just sometimes – but when it did, the query often took much longer to execute. Ok, enough of the pre-amble. I can’t reproduce the exact structure here, but the following script creates a table that will serve to demonstrate the effect: IF OBJECT_ID(N'dbo.Test', N'U') IS NOT NULL DROP TABLE dbo.Test GO CREATE TABLE dbo.Test ( row_id NUMERIC IDENTITY NOT NULL, col01 NVARCHAR(450) NOT NULL, col02 NVARCHAR(450) NOT NULL, col03 NVARCHAR(450) NOT NULL, col04 NVARCHAR(450) NOT NULL, col05 NVARCHAR(450) NOT NULL, col06 NVARCHAR(450) NOT NULL, col07 NVARCHAR(450) NOT NULL, col08 NVARCHAR(450) NOT NULL, col09 NVARCHAR(450) NOT NULL, col10 NVARCHAR(450) NOT NULL, CONSTRAINT [PK dbo.Test row_id] PRIMARY KEY CLUSTERED (row_id) ) ; The next script loads the ten variable-length character columns with one-character strings in the first row, two-character strings in the second row, and so on down to the 450th row: WITH Numbers AS ( -- Generates numbers 1 - 450 inclusive SELECT TOP (450) n = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM master.sys.columns C1, master.sys.columns C2, master.sys.columns C3 ORDER BY n ASC ) INSERT dbo.Test WITH (TABLOCKX) SELECT REPLICATE(N'A', N.n), REPLICATE(N'B', N.n), REPLICATE(N'C', N.n), REPLICATE(N'D', N.n), REPLICATE(N'E', N.n), REPLICATE(N'F', N.n), REPLICATE(N'G', N.n), REPLICATE(N'H', N.n), REPLICATE(N'I', N.n), REPLICATE(N'J', N.n) FROM Numbers AS N ORDER BY N.n ASC ; Once those two scripts have run, the table contains 450 rows and 10 columns of data like this: Most of the time, when we query data from this table, we don’t see any LOB logical reads, for example: -- Find the maximum length of the data in -- column 5 for a range of rows SELECT result = MAX(DATALENGTH(T.col05)) FROM dbo.Test AS T WHERE row_id BETWEEN 50 AND 100 ; But with a different query… -- Read all the data in column 1 SELECT result = MAX(DATALENGTH(T.col01)) FROM dbo.Test AS T ; …suddenly we have 49 LOB logical reads, as well as the ‘normal’ logical reads we would expect. The Explanation If we had tried to create this table in SQL Server 2000, we would have received a warning message to say that future INSERT or UPDATE operations on the table might fail if the resulting row exceeded the in-row storage limit of 8060 bytes. If we needed to store more data than would fit in an 8060 byte row (including internal overhead) we had to use a LOB column – TEXT, NTEXT, or IMAGE. These special data types store the large data values in a separate structure, with just a small pointer left in the original row. Row Overflow SQL Server 2005 introduced a feature called row overflow, which allows one or more variable-length columns in a row to move to off-row storage if the data in a particular row would otherwise exceed 8060 bytes. You no longer receive a warning when creating (or altering) a table that might need more than 8060 bytes of in-row storage; if SQL Server finds that it can no longer fit a variable-length column in a particular row, it will silently move one or more of these columns off the row into a separate allocation unit. Only variable-length columns can be moved in this way (for example the (N)VARCHAR, VARBINARY, and SQL_VARIANT types). Fixed-length columns (like INTEGER and DATETIME for example) never move into ‘row overflow’ storage. The decision to move a column off-row is done on a row-by-row basis – so data in a particular column might be stored in-row for some table records, and off-row for others. In general, if SQL Server finds that it needs to move a column into row-overflow storage, it moves the largest variable-length column record for that row. Note that in the case of an UPDATE statement that results in the 8060 byte limit being exceeded, it might not be the column that grew that is moved! Sneaky LOBs Anyway, that’s all very interesting but I don’t want to get too carried away with the intricacies of row-overflow storage internals. The point is that it is now possible to define a table with non-LOB columns that will silently exceed the old row-size limit and result in ordinary variable-length columns being moved to off-row storage. Adding new columns to a table, expanding an existing column definition, or simply storing more data in a column than you used to – all these things can result in one or more variable-length columns being moved off the row. Note that row-overflow storage is logically quite different from old-style LOB and new-style MAX data type storage – individual variable-length columns are still limited to 8000 bytes each – you can just have more of them now. Having said that, the physical mechanisms involved are very similar to full LOB storage – a column moved to row-overflow leaves a 24-byte pointer record in the row, and the ‘separate storage’ I have been talking about is structured very similarly to both old-style LOBs and new-style MAX types. The disadvantages are also the same: when SQL Server needs a row-overflow column value it needs to follow the in-row pointer a navigate another chain of pages, just like retrieving a traditional LOB. And Finally… In the example script presented above, the rows with row_id values from 402 to 450 inclusive all exceed the total in-row storage limit of 8060 bytes. A SELECT that references a column in one of those rows that has moved to off-row storage will incur one or more lob logical reads as the storage engine locates the data. The results on your system might vary slightly depending on your settings, of course; but in my tests only column 1 in rows 402-450 moved off-row. You might like to play around with the script – updating columns, changing data type lengths, and so on – to see the effect on lob logical reads and which columns get moved when. You might even see row-overflow columns moving back in-row if they are updated to be smaller (hint: reduce the size of a column entry by at least 1000 bytes if you hope to see this). Be aware that SQL Server will not warn you when it moves ‘ordinary’ variable-length columns into overflow storage, and it can have dramatic effects on performance. It makes more sense than ever to choose column data types sensibly. If you make every column a VARCHAR(8000) or NVARCHAR(4000), and someone stores data that results in a row needing more than 8060 bytes, SQL Server might turn some of your column data into pseudo-LOBs – all without saying a word. Finally, some people make a distinction between ordinary LOBs (those that can hold up to 2GB of data) and the LOB-like structures created by row-overflow (where columns are still limited to 8000 bytes) by referring to row-overflow LOBs as SLOBs. I find that quite appealing, but the ‘S’ stands for ‘small’, which makes expanding the whole acronym a little daft-sounding…small large objects anyone? © Paul White 2011 email: [email protected] twitter: @SQL_Kiwi

Read the article
Translating Your Customizations

- by Richard Bingham

This blog post explains the basics of translating the customizations you can make to Fusion Applications products, with the inclusion of information for both composer-based customizations and the generic design-time customizations done via JDeveloper. Introduction Like most Oracle Applications, Fusion Applications installs on-premise with a US-English base language that is, in Release 7, supported by the option to add up to a total of 22 additional language packs (In Oracle Cloud production environments languages are pre-installed already). As such many organizations offer their users the option of working with their local language, and logically that should also apply for any customizations as well. Composer-based UI Customizations Customizations made in Page Composer take into consideration the session LOCALE, as set in the user preferences screen, during all customization work, and stores the customization in the MDS repository accordingly. As such the actual new or changed values used will only apply for the same language under which the customization was made, and text for any other languages requires a separate upload. See the Resource Bundles section below, which incidentally also applies to custom UI changes done in JDeveloper. You may have noticed this when you select the “Select Text Resource” menu option when editing the text on a page. Using this ensures that the resource bundles are used, whereas if you define a static value in Expression Builder it will never be available for translation. Notice in the screenshot below the “What’s New” custom value I have already defined using the ‘Select Text Resource’ feature is internally using the adfBundle groovy function to pull the custom value for my key (RT_S_1) from the ComposerOverrideBundle. Figure 1 – Page Composer showing the override bundle being used. Business Objects Customizing the Business Objects available in the Applications Composer tool for the CRM products, such as adding additional fields, also operates using the session language. Translating these additional values for these fields into other installed languages requires loading additional resource bundles, again as described below. Reports and Analytics Most customizations to Reports and BI Analytics are just essentially reorganizations and visualizations of existing number and text data from the system, and as such will use the appropriate values based on the users session language. Where a translated value or string exists for that session language, it will be used without the need for additional work. Extending through the addition of brand new reports and analytics requires another method of loading the translated strings, as part of what is known as ‘Localizing’ the BI Catalog and Metadata. This time it is via an export/import of XML data through the BI Administrators console, and is described in the OBIEE Admin Guide. Fusion Applications reports based on BI Publisher are already defined in template-per-locale, and in addition provide an extra process for getting the data for translation and reloading. This again uses the standard resource bundle format. Loading a custom report is illustrated in this video from our YouTube channel which shows the screen for both setting the template local and running an export for translation. Fusion Applications Menus Whilst the seeded Navigator and Global Menu values are fully translated when the additional language is installed, if they are customized then the change or new menu item will apply universally, not currently per language. This is set to change in a future release with the new UI Text Editor feature described below. More on Resource Bundles As mentioned above, to provide translations for most of your customizations you need to add values to a resource bundle. This is an industry open standard (OASIS) format XML file with the extension .xliff, and store translated values for the strings used by ADF at run-time. The general process is that these values are exported from the MDS repository, manually edited, and then imported back in again.This needs to be done by an administrator, via either WLST commands or through Enterprise Manager as per the screenshot below. This is detailed out in the Fusion Applications Extensibility Guide. For SaaS environments the Cloud Operations team can assist. Figure 2 – Enterprise Manager’s MDS export used getting resource bundles for manual translation and re-imported on the same screen. All customized strings are stored in an override bundle (xliff file) for each locale, suffixed with the language initials, with English ones being saved to the default. As such each language bundle can be easily identified and updated. Similarly if you used JDeveloper to create your own applications as extensions to Fusion Applications you would use the native support for resource bundles, and add them into the faces-config.xml file for inclusion in your application. An example is this ADF customization video from our YouTube channel. JDeveloper also supports automatic synchronization between your underlying resource bundles and any translatable strings you add – very handy. For more information see chapters on “Using Automatic Resource Bundle Integration in JDeveloper” and “Manually Defining Resource Bundles and Locales” in the Oracle Fusion Middleware Web User Interface Developer’s Guide for Oracle Application Development Framework. FND Messages and Look-ups FND Messages, as defined here, are not used for UI labels (they are known as ‘strings’), but are the responses back to users as a result of an action, such as from a page submit. Each ‘message’ is defined and stored in the related database table (FND_MESSAGES_B), with another (FND_MESSAGES_TL) holding any language-specific values. These come seeded with the additional language installs, however if you customize the messages via the “Manage Messages” task in Functional Setup Manager, or add new ones, then currently (in Release 7) you’ll need to repeat it for each language. Figure 3 – An FND Message defined in an English user session. Similarly Look-ups are stored in a translation table (FND_LOOKUP_VALUES_TL) where appropriate, and can be customized by setting the users session language and making the change in the Setup and Maintenance task entitled “Manage [Standard|Common] Look-ups”. Online Help Yes, in fact all the seeded help is applied as part of each language pack install as part of the post-install provisioning process. If you are editing or adding custom online help then the Create Help screen provides a drop-down of which language your help customization will apply to. This is shown in the video below from our YouTube channel, and obviously you’ll need to it for each language in use. What is Coming for Translations? Currently planned for Release 8 is something called the User Interface (UI) Text Editor. This tool will allow the editing of all the text shown on the pages and forms of Fusion Application. This will provide a search based on a particular term or word, say “Worker”, and will allow it to be adjusted, say to “Employee”, which then updates all the Resource Bundles that contain it. In the case of multi-language environments, it will use the users session language (locale) to know which Resource Bundles to apply the change to. This capability will also support customization sandboxes, to help ensure changes can be tested and approved. It is also interesting to note that the design currently allows any page-specific customizations done using Page Composer or Application Composer to over-write the global changes done via the UI Text Editor, allowing for special context-sensitive values to still be used. Further Reading and Resources The following short list provides the mains resources for digging into more detail on translation support for both Composer and JDeveloper customization projects. There is a dedicated chapter entitled “Translating Custom Text” in the Fusion Applications Extensibility Guide. This has good examples and steps for many tasks, especially administering resource bundles. Using localization formatting (numbers, dates etc) for design-time changes is well documented in the Fusion Applications Developer Guide. For more guidelines on general design-time globalization, see either the ‘Internationalizing and Localizing Pages’ chapter in the Oracle Fusion Middleware Web User Interface Developer’s Guide for Oracle Application Development Framework (Oracle Fusion Applications Edition) or the general Oracle Database Globalization Support Guide. The Oracle Architecture ‘A-Team’ provided a recent post on customizing the user session timeout popup, using design-time changes to resource bundles. It has detailed step-by-step examples which can be a useful illustration.

Read the article
Computer Networks UNISA - Chap 10 – In Depth TCP/IP Networking

- by MarkPearl

After reading this section you should be able to Understand methods of network design unique to TCP/IP networks, including subnetting, CIDR, and address translation Explain the differences between public and private TCP/IP networks Describe protocols used between mail clients and mail servers, including SMTP, POP3, and IMAP4 Employ multiple TCP/IP utilities for network discovery and troubleshooting Designing TCP/IP-Based Networks The following sections explain how network and host information in an IPv4 address can be manipulated to subdivide networks into smaller segments. Subnetting Subnetting separates a network into multiple logically defined segments, or subnets. Networks are commonly subnetted according to geographic locations, departmental boundaries, or technology types. A network administrator might separate traffic to accomplish the following… Enhance security Improve performance Simplify troubleshooting The challenges of Classful Addressing in IPv4 (No subnetting) The simplest type of IPv4 is known as classful addressing (which was the Class A, Class B & Class C network addresses). Classful addressing has the following limitations. Restriction in the number of usable IPv4 addresses (class C would be limited to 254 addresses) Difficult to separate traffic from various parts of a network Because of the above reasons, subnetting was introduced. IPv4 Subnet Masks Subnetting depends on the use of subnet masks to identify how a network is subdivided. A subnet mask indicates where network information is located in an IPv4 address. The 1 in a subnet mask indicates that corresponding bits in the IPv4 address contain network information (likewise 0 indicates the opposite) Each network class is associated with a default subnet mask… Class A = 255.0.0.0 Class B = 255.255.0.0 Class C = 255.255.255.0 An example of calculating the network ID for a particular device with a subnet mask is shown below.. IP Address = 199.34.89.127 Subnet Mask = 255.255.255.0 Resultant Network ID = 199.34.89.0 IPv4 Subnetting Techniques Subnetting breaks the rules of classful IPv4 addressing. Read page 490 for a detailed explanation Calculating IPv4 Subnets Read page 491 – 494 for an explanation Important… Subnetting only applies to the devices internal to your network. Everything external looks at the class of the IP address instead of the subnet network ID. This way, traffic directed to your network externally still knows where to go, and once it has entered your internal network it can then be prioritized and segmented. CIDR (classless Interdomain Routing) CIDR is also known as classless routing or supernetting. In CIDR conventional network class distinctions do not exist, a subnet boundary can move to the left, therefore generating more usable IP addresses on your network. A subnet created by moving the subnet boundary to the left is known as a supernet. With CIDR also came new shorthand for denoting the position of subnet boundaries known as CIDR notation or slash notation. CIDR notation takes the form of the network ID followed by a forward slash (/) followed by the number of bits that are used for the extended network prefix. To take advantage of classless routing, your networks routers must be able to interpret IP addresses that don;t adhere to conventional network class parameters. Routers that rely on older routing protocols (i.e. RIP) are not capable of interpreting classless IP addresses. Internet Gateways Gateways are a combination of software and hardware that enable two different network segments to exchange data. A gateway facilitates communication between different networks or subnets. Because on device cannot send data directly to a device on another subnet, a gateway must intercede and hand off the information. Every device on a TCP/IP based network has a default gateway (a gateway that first interprets its outbound requests to other subnets, and then interprets its inbound requests from other subnets). The internet contains a vast number of routers and gateways. If each gateway had to track addressing information for every other gateway on the Internet, it would be overtaxed. Instead, each handles only a relatively small amount of addressing information, which it uses to forward data to another gateway that knows more about the data’s destination. The gateways that make up the internet backbone are called core gateways. Address Translation An organizations default gateway can also be used to “hide” the organizations internal IP addresses and keep them from being recognized on a public network. A public network is one that any user may access with little or no restrictions. On private networks, hiding IP addresses allows network managers more flexibility in assigning addresses. Clients behind a gateway may use any IP addressing scheme, regardless of whether it is recognized as legitimate by the Internet authorities but as soon as those devices need to go on the internet, they must have legitimate IP addresses to exchange data. When a clients transmission reaches the default gateway, the gateway opens the IP datagram and replaces the client’s private IP address with an Internet recognized IP address. This process is known as NAT (Network Address Translation). TCP/IP Mail Services All Internet mail services rely on the same principles of mail delivery, storage, and pickup, though they may use different types of software to accomplish these functions. Email servers and clients communicate through special TCP/IP application layer protocols. These protocols, all of which operate on a variety of operating systems are discussed below… SMTP (Simple Mail transfer Protocol) The protocol responsible for moving messages from one mail server to another over TCP/IP based networks. SMTP belongs to the application layer of the ODI model and relies on TCP as its transport protocol. Operates from port 25 on the SMTP server Simple sub-protocol, incapable of doing anything more than transporting mail or holding it in a queue MIME (Multipurpose Internet Mail Extensions) The standard message format specified by SMTP allows for lines that contain no more than 1000 ascii characters meaning if you relied solely on SMTP you would have very short messages and nothing like pictures included in an email. MIME us a standard for encoding and interpreting binary files, images, video, and non-ascii character sets within an email message. MIME identifies each element of a mail message according to content type. MIME does not replace SMTP but works in conjunction with it. Most modern email clients and servers support MIME POP (Post Office Protocol) POP is an application layer protocol used to retrieve messages from a mail server POP3 relies on TCP and operates over port 110 With POP3 mail is delivered and stored on a mail server until it is downloaded by a user Disadvantage of POP3 is that it typically does not allow users to save their messages on the server because of this IMAP is sometimes used IMAP (Internet Message Access Protocol) IMAP is a retrieval protocol that was developed as a more sophisticated alternative to POP3 The single biggest advantage IMAP4 has over POP3 is that users can store messages on the mail server, rather than having to continually download them Users can retrieve all or only a portion of any mail message Users can review their messages and delete them while the messages remain on the server Users can create sophisticated methods of organizing messages on the server Users can share a mailbox in a central location Disadvantages of IMAP are typically related to the fact that it requires more storage space on the server. Additional TCP/IP Utilities Nearly all TCP/IP utilities can be accessed from the command prompt on any type of server or client running TCP/IP. The syntaxt may differ depending on the OS of the client. Below is a list of additional TCP/IP utilities – research their use on your own! Ipconfig (Windows) & Ifconfig (Linux) Netstat Nbtstat Hostname, Host & Nslookup Dig (Linux) Whois (Linux) Traceroute (Tracert) Mtr (my traceroute) Route

Read the article
Google Analytics on Android

- by pjv

There is a specific and official analytics SDK for native Android apps (note that I'm not talking about webpages in apps on a phone). This library basically sends pages and events to Google Analytics and you can view your analytics in exactly the same dashboard as for websites. Since my background is apps rather than websites, and since a lot of the Google Analytics terminology seems particularly inapplicable to a native app, I need some pointers. Please discuss my remarks, provide some clarification where you think I'm off-track, and above all share good experiences! 1. Page Views Pages mostly can match different Activities (and Dialogs) being displayed. Activities can be visible behind non-full-screen Activities however, though only the top-level Activity can be interacted. This sort-off clashes with a "(page) view". You'd also want at least one page view for each visit and therefore put one page view tracker in the Application class. However this does not constitute a window or sorts. Usually an Activity will open at the same time, so the time spent on that page will have been 0. This will influence your "time spent" statistics. How are these counted anyway? Moreover, there is a loose coupling between the Activities, by means of Intents. A user can, much like on any website, step in at any Activity, although usually this then concerns resuming the application where he left off. This makes that the hierarchy of Activities usually is very flat. And since there are no url's involved. What meaning would using slashes in page titles have, such as "/Home"? All pages would appear on an equal level in the reports, so no content drilldown. Non-unique page views seem to be counted as some kind of indicator of successfulness: how often does the visitor revisit the page. When the user rotates the screen however usually an Activity resumes again, thus making it a new page view. This happens a lot. Maybe a well-thought-through placement of the call might solve this, or placing several, I'm not sure. How to deal with Page Views? 2. Events I'd say there are two sorts: A user event Something that happened, usually as an indirect consequence of the above. The latter particularly is giving me headaches. First of all, many events aren't written in code any more, but pieced logically together by means of Intents. This means that there is no place to put the analytics call. You'd either have to give up this advantage and start doing it the old-fashioned way in favor of good analytics, or, just be missing some events. Secondly, as a developer you're not so much interested in when a user clicks a button, but if the action that should have been performed really was performed and what the result was. There seems to be no clear way to get resulting data into Google Analytics (what's up with the integers? I want to put in Strings!). The same that applies to the flat pages hierarchy, also goes for the event categories. You could do "vertical" categories (topically, that is), but some code is shared "horizontally" and the tracking will be equally shared. Just as with the Intents mechanism, inheritance makes it hard for you to put the tracking in the right places at all times. And I can't really imagine "horizontal" categories. Unless you start making really small categories, such as all the items form the same menu in one category, I have a hard time grasping the concept. Finally, how do you deal with cancelling? Usually you both have an explicit cancel mechanism by ways of a button, as well as the implicit cancel when the "back"-button is pressed to leave the activity and there were no changes. The latter also applies to "saves", when the back button is pressed and there ARE changes. How are you consequently going to catch all these if not by doing all the "back"-button work yourself? How to deal with events? 3. Goals For goal types I have choice of: URL Destination, Time on Site, and Pages/Visit. Most apps don't have a funnel that leads the user to some "registration done" or "order placed" page. Apps have either already been bought (in which case you want to stimulate the user to love your app, so that he might bring on new buyers) or are paid for by in-app ads. So URL Destination is not a very important goal. Time on Site also seems troublesome. First, I have some doubt on how this would be measured. Second, I don't necessarily want my user to spend a lot of time in my already paid app, just be active and content. Equivalently, why not mention how frequent a user uses your app? Regarding Pages/Visit I already mentioned how screen orientation changes blow up the page view numbers. In an app I'd be most interested in events/visit to measure the user's involvement/activity. If he's intensively using the app then he must be loving it right? Furthermore, I also have some small funnels (that do not lead to conversion though) that I want to see streamlined. In my mind those funnels would end in events rather than page views but that seems not to be possible. I could also measure clickthroughs on in-app ads, but then I'd need to track those as Page Views rather than Events, in view of "URL Destination". What are smart goals for apps and how can you fit them on top of Analytics? 4. Optimisation Is there a smart way to manually do what "Website Optimiser" does for websites? Most importantly, how would I track different landing page designs? 5. Traffic Sources Referrals deal with installation time referrals, if you're smart enough to get them included. But perhaps I'd also want to get some data which third-party app sends users to my app to perform some actions (this app interoperability is possible via Intents). Many of the terminologies related to "Traffic Sources" seem totally meaningless and there is no possibility of connecting in AdSense. What are smart uses of this data? 6. Visitors Of the "Browser capabilities", "Network Properties" and "Mobile" tabs, many things are pointless as they have no influence on / relation with my mostly offline app that won't use flash anyway. Only if you drill down far enough, can you get to OS versions, which do matter a lot. I even forgot where you could check what exact Android devices visited. What are smart uses of this data? How can you make the relevant info more prominent? 7. Other No in-page analytics. I have to register my app as a web-url (What!?)?

Read the article
use svcutil to map multiple namespaces for generating wcf service proxies

- by Pratik

I want to use svcutil to map multiple wsdl namespace to clr namespace when generating service proxies. I use strong versioning of namespaces and hence the generated clr namespaces are awkward and may mean many client side code changes if the wsdl/xsd namespace version changes. A code example would be better to show what I want. // Service code namespace TestService.StoreService { [DataContract(Namespace = "http://mydomain.com/xsd/Model/Store/2009/07/01")] public class Address { [DataMember(IsRequired = true, Order = 0)] public string street { get; set; } } [ServiceContract(Namespace = "http://mydomain.com/wsdl/StoreService-v1.0")] public interface IStoreService { [OperationContract] List<Customer> GetAllCustomersForStore(int storeId); [OperationContract] Address GetStoreAddress(int storeId); } public class StoreService : IStoreService { public List<Customer> GetAllCustomersForStore(int storeId) { throw new NotImplementedException(); } public Address GetStoreAddress(int storeId) { throw new NotImplementedException(); } } } namespace TestService.CustomerService { [DataContract(Namespace = "http://mydomain.com/xsd/Model/Customer/2009/07/01")] public class Address { [DataMember(IsRequired = true, Order = 0)] public string city { get; set; } } [ServiceContract(Namespace = "http://mydomain.com/wsdl/CustomerService-v1.0")] public interface ICustomerService { [OperationContract] Customer GetCustomer(int customerId); [OperationContract] Address GetStoreAddress(int customerId); } public class CustomerService : ICustomerService { public Customer GetCustomer(int customerId) { throw new NotImplementedException(); } public Address GetStoreAddress(int customerId) { throw new NotImplementedException(); } } } namespace TestService.Shared { [DataContract(Namespace = "http://mydomain.com/xsd/Model/Shared/2009/07/01")] public class Customer { [DataMember(IsRequired = true, Order = 0)] public int CustomerId { get; set; } [DataMember(IsRequired = true, Order = 1)] public string FirstName { get; set; } } } 1. svcutil - without namespace mapping svcutil.exe /t:metadata TestSvcUtil\bin\debug\TestService.CustomerService.dll TestSvcUtil\bin\debug\TestService.StoreService.dll svcutil.exe /t:code *.wsdl *.xsd /o:TestClient\WebServiceProxy.cs The generated proxy looks like namespace mydomain.com.xsd.Model.Shared._2009._07._011 { public partial class Customer{} } namespace mydomain.com.xsd.Model.Customer._2009._07._011 { public partial class Address{} } namespace mydomain.com.xsd.Model.Store._2009._07._011 { public partial class Address{} } The client classes are out of any namespaces. Any change to xsd namespace would imply changing all using statements in my client code all build will break. 2. svcutil - with wildcard namespace mapping svcutil.exe /t:metadata TestSvcUtil\bin\debug\TestService.CustomerService.dll TestSvcUtil\bin\debug\TestService.StoreService.dll svcutil.exe /t:code *.wsdl *.xsd /n:*,MyDomain.ServiceProxy /o:TestClient\WebServicesProxy2.cs The generated proxy looks like namespace MyDomain.ServiceProxy { public partial class Customer{} public partial class Address{} public partial class Address1{} public partial class CustomerServiceClient{} public partial class StoreServiceClient{} } Notice that svcutil has automatically changed one of the Address class to Address1. I don't like this. All client classes are also inside the same namespace. What I want Something like this: svcutil.exe /t:code *.wsdl *.xsd /n:"http://mydomain.com/xsd/Model/Shared/2009/07/01, MyDomain.Model.Shared;http://mydomain.com/xsd/Model/Customer/2009/07/01, MyDomain.Model.Customer;http://mydomain.com/wsdl/CustomerService-v1.0, MyDomain.CustomerServiceProxy;http://mydomain.com/xsd/Model/Store/2009/07/01, MyDomain.Model.Store;http://mydomain.com/wsdl/StoreService-v1.0, MyDomain.StoreServiceProxy" /o:TestClient\WebServiceProxy3.cs This way I can logically group the clr namespace and any change to wsdl/xsd namespace is handled in the proxy generation only without affecting the rest of the client side code. Now this is not possible. The svcutil allows to map only one or all namespaces, not a list of mappings. I can do one mapping as shown below but not multiple svcutil.exe /t:code *.wsdl *.xsd /n:"http://mydomain.com/xsd/Model/Store/2009/07/01, MyDomain.Model.Address" /o:TestClient\WebServiceProxy4.cs But is there any solution. Svcutil is not magic, it is written in .Net and programatically generating the proxies. Has anyone written an alternate to svcutil or point me to directions so that I can write one.

Read the article
ADO.NET (WCF) Data Services Query Interceptor Hangs IIS

- by PreMagination

I have an ADO.NET Data Service that's supposed to provide read-only access to a somewhat complex database. Logically I have table-per-type (TPT) inheritance in my data model but the EDM doesn't implement inheritance. (Limitation of EF and navigation properties on derived types. STILL not fixed in EF4!) I can query my EDM directly (using a separate project) using a copy of the query I'm trying to run against the web service, results are returned within 10 seconds. Disabling the query interceptors I'm able to make the same query against the web service, results are returned similarly quickly. I can enable some of the query interceptors and the results are returned slowly, up to a minute or so later. Alternatively, I can enable all the query interceptors, expand less of the properties on the main object I'm querying, and results are returned in a similar period of time. (I've increased some of the timeout periods) Up til this point Sql Profiler indicates the slow-down is the database. (That's a post for a different day) But when I enable all my query interceptors and expand all the properties I'd like to have the IIS worker process pegs the CPU for 20 minutes and a query is never even made against the database. This implies to me that yes, my implementation probably sucks but regardless the Data Services "tier" is having an issue it shouldn't. WCF tracing didn't reveal anything interesting to my untrained eye. Details: Data model: Agent-Person-Student Student has a collection of referrals Students and referrals are private, queries against the web service should only return "your" students and referrals. This means Person and Agent need to be filtered too. Other entities (Agent-Organization-School) can be accessed by anyone who has authenticated. The existing security model is poorly suited to perform this type of filtering for this type of data access, the query interceptors are complicated and cause EF to generate some entertaining sql queries. Sample Interceptor [QueryInterceptor("Agents")] public Expression<Func<Agent, Boolean>> OnQueryAgents() { //Agent is a Person(1), Educator(2), Student(3), or Other Person(13); allow if scope permissions exist return ag => (ag.AgentType.AgentTypeId == 1 || ag.AgentType.AgentTypeId == 2 || ag.AgentType.AgentTypeId == 3 || ag.AgentType.AgentTypeId == 13) && ag.Person.OrganizationPersons.Count<OrganizationPerson>(op => op.Organization.ScopePermissions.Any<ScopePermission> (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124) || op.Organization.HierarchyDescendents.Any<OrganizationsHierarchy>(oh => oh.AncestorOrganization.ScopePermissions.Any<ScopePermission> (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124))) > 0; } The query interceptors for Person, Student, Referral are all very similar, ie they traverse multiple same/similar tables to look for ScopePermissions as above. Sample Query var referrals = (from r in service.Referrals .Expand("Organization/ParentOrganization") .Expand("Educator/Person/Agent") .Expand("Student/Person/Agent") .Expand("Student") .Expand("Grade") .Expand("ProblemBehavior") .Expand("Location") .Expand("Motivation") .Expand("AdminDecision") .Expand("OthersInvolved") where r.DateCreated >= coupledays && r.DateDeleted == null select r); Any suggestions or tips would be greatly associated, for fixing my current implementation or in developing a new one, with the caveat that the database can't be changed and that ultimately I need to expose a large portion of the database via a web service that limits data access to the data authorized for, for the purpose of data integration with multiple outside parties. THANK YOU!!!

Read the article
Const references when dereferencing iterator on set, starting from Visual Studio 2010

- by Patrick

Starting from Visual Studio 2010, iterating over a set seems to return an iterator that dereferences the data as 'const data' instead of non-const. The following code is an example of something that does compile on Visual Studio 2005, but not on 2010 (this is an artificial example, but clearly illustrates the problem we found on our own code). In this example, I have a class that stores a position together with a temperature. I define comparison operators (not all them, just enough to illustrate the problem) that only use the position, not the temperature. The point is that for me two instances are identical if the position is identical; I don't care about the temperature. #include <set> class DataPoint { public: DataPoint (int x, int y) : m_x(x), m_y(y), m_temperature(0) {} void setTemperature(double t) {m_temperature = t;} bool operator<(const DataPoint& rhs) const { if (m_x==rhs.m_x) return m_y<rhs.m_y; else return m_x<rhs.m_x; } bool operator==(const DataPoint& rhs) const { if (m_x!=rhs.m_x) return false; if (m_y!=rhs.m_y) return false; return true; } private: int m_x; int m_y; double m_temperature; }; typedef std::set<DataPoint> DataPointCollection; void main(void) { DataPointCollection points; points.insert (DataPoint(1,1)); points.insert (DataPoint(1,1)); points.insert (DataPoint(1,2)); points.insert (DataPoint(1,3)); points.insert (DataPoint(1,1)); for (DataPointCollection::iterator it=points.begin();it!=points.end();++it) { DataPoint &point = *it; point.setTemperature(10); } } In the main routine I have a set to which I add some points. To check the correctness of the comparison operator, I add data points with the same position multiple times. When writing the contents of the set, I can clearly see there are only 3 points in the set. The for-loop loops over the set, and sets the temperature. Logically this is allowed, since the temperature is not used in the comparison operators. This code compiles correctly in Visual Studio 2005, but gives compilation errors in Visual Studio 2010 on the following line (in the for-loop): DataPoint &point = *it; The error given is that it can't assign a "const DataPoint" to a [non-const] "DataPoint &". It seems that you have no decent (= non-dirty) way of writing this code in VS2010 if you have a comparison operator that only compares parts of the data members. Possible solutions are: Adding a const-cast to the line where it gives an error Making temperature mutable and making setTemperature a const method But to me both solutions seem rather 'dirty'. It looks like the C++ standards committee overlooked this situation. Or not? What are clean solutions to solve this problem? Did some of you encounter this same problem and how did you solve it? Patrick

Read the article
Anyone willing to help out a javascript n00b? :-)

- by Splynx

Since I am asking for a lot, and know it, the following is a wall of text for those who might show some interest and want to know a little before offering their help to me. First a little about my level of programming skills, and a little about what I ask for. Where I'm at: I am not totally new to Javascript, and have dabbled a little with PHP earlier - well have dabbled a lot with PHP in fact, but never got good at it because I program alone. And I have until now never used forums to get help etc. other that searching to see if anyone else had my problem before and what the solution was. So I am not a intuitive or talented programmer, I'm more of a very maticulate programmer and you would be surprised how far you can get with if else... (ok that's a joke hehe). My solutions are usually (I am guessing here) not the best ones - and slow I take it, and the code is usually too long and I have to look up most of the stuff I use (really a lot of it is not done in "freehand"). I have a LOT of experience with HTML and CSS, and have always done well formed markup, as well as I am really into x-browsing and always require that my work validates when it's done. I also worry about optimizing a lot, and work with sprites for images, minimize the number of http requests etc, using H1,H2 etc. where it is logically correct, as well as use the correct elements and not just div span or p it... So because I am a workhorse and very maticulate I can actually pull off some quite "advanced" features, but it's always the basics that bite me in the end. Not fully understanding the syntax and so on usually gives me problems. Have recently discovered jQuery - wich is a lot of fun.... But I want to use it for the DOM node manipulation/handling only. As I mentioned I worry about optimizing, and jQuery used for everything seems... well not optimal, it strikes me as doing it yourself when possible is faster than accesing another script that may take a whole lot of other considerations into perspective when handling your variables and objects (and I am just guessing here since I as explained know nothing). So thats where I'm at... As mentioned I just started with javascript for "real" so I do not have much to show, but at the end of my WOT you can see two unfinisheded scripts I have made so you can see where I'm at roughly - just check out the URL without the /feedback.html for the second example (I am only allowed to post 1 link since I am also a SO n00b) (and for those rushing over to a validation service, remember I wrote "when it's done"...) What I ask for: I am figuring this... I have a piece of code I am working on at the moment, and this little project has taught me a whole lot already, and I have "grown" a lot as a javascript programmer. If I add a whole lot of comments to the script, and explain what it is intended to do, will you then show me where: I am writing incorrect code - making mistakes Where/how my code could be more optimal Where I am just simply being a muppet The code I want to use as the background for the tuition is the one here http://projects.1000monkeys.dk/feedback.html Use firebug and have a quick look see...

Read the article
How do I make these fields autopopulate from the database?

- by dmanexe

I have an array, which holds values like this: $items_pool = Array ( [0] => Array ( [id] => 1 [quantity] => 1 ) [1] => Array ( [id] => 2 [quantity] => 1 ) [2] => Array ( [id] => 72 [quantity] => 6 ) [3] => Array ( [id] => 4 [quantity] => 1 ) [4] => Array ( [id] => 5 [quantity] => 1 ) [5] => Array ( [id] => 7 [quantity] => 1 ) [6] => Array ( [id] => 8 [quantity] => 1 ) [7] => Array ( [id] => 9 [quantity] => 1 ) [8] => Array ( [id] => 19 [quantity] => 1 ) [9] => Array ( [id] => 20 [quantity] => 1 ) [10] => Array ( [id] => 22 [quantity] => 1 ) [11] => Array ( [id] => 29 [quantity] => 0 ) ) Next, I have a form that I am trying to populate. It loops through the item database, prints out all the possible items, and checks the ones that are already present in $items_pool. <?php foreach ($items['items_poolpackage']->result() as $item): ?> <input type="checkbox" name="measure[<?=$item->id?>][checkmark]" value="<?=$item->id?>"> <?php endforeach; ?> I know what logically I'm trying to accomplish here, but I can't figure out the programming. What I'm looking for, written loosely is something like this (not real code): <input type="checkbox" name="measure[<?=$item->id?>][checkmark]" value="<?=$item->id?>" <?php if ($items_pool['$item->id']) { echo "SELECTED"; } else { }?>> Specifically this conditional loop through the array, through all the key values (the ID) and if there's a match, the checkbox is selected. <?php if ($items_pool['$item->id']) { echo "SELECTED"; } else { }?> I understand from a loop structured like this that it may mean a lot of 'extra' processing. TL;DR - I need to echo within a loop if the item going through the loop exists within another array.

Read the article
trie reg exp parse step over char and continue

- by forest.peterson

Setup: 1) a string trie database formed from linked nodes and a vector array linking to the next node terminating in a leaf, 2) a recursive regular expression function that if A) char '*' continues down all paths until string length limit is reached, then continues down remaining string paths if valid, and B) char '?' continues down all paths for 1 char and then continues down remaining string paths if valid. 3) after reg expression the candidate strings are measured for edit distance against the 'try' string. Problem: the reg expression works fine for adding chars or swapping ? for a char but if the remaining string has an error then there is not a valid path to a terminating leaf; making the matching function redundant. I tried adding a 'step-over' ? char if the end of the node vector was reached and then followed every path of that node - allowing this step-over only once; resulted in a memory exception; I cannot find logically why it is accessing the vector out of range - bactracking? Questions: 1) how can the regular expression step over an invalid char and continue with the path? 2) why is swapping the 'sticking' char for '?' resulting in an overflow? Function: void Ontology::matchRegExpHelper(nodeT *w, string inWild, Set<string> &matchSet, string out, int level, int pos, int stepover) { if (inWild=="") { matchSet.add(out); } else { if (w->alpha.size() == pos) { int testLength = out.length() + inWild.length(); if (stepover == 0 && matchSet.size() == 0 && out.length() > 8 && testLength == tokenLength) {//candidate generator inWild[0] = '?'; matchRegExpHelper(w, inWild, matchSet, out, level, 0, stepover+1); } else return; //giveup on this path } if (inWild[0] == '?' || (inWild[0] == '*' && (out.length() + inWild.length() ) == level ) ) { //wild matchRegExpHelper(w->alpha[pos].next, inWild.substr(1), matchSet, out+w->alpha[pos].letter, level, 0, stepover);//follow path -> if ontology is full, treat '*' like a '?' } else if (inWild[0] == '*') matchRegExpHelper(w->alpha[pos].next, '*'+inWild.substr(1), matchSet, out+w->alpha[pos].letter, level, 0, stepover); //keep adding chars if (inWild[0] == w->alpha[pos].letter) //follow self matchRegExpHelper(w->alpha[pos].next, inWild.substr(1), matchSet, out+w->alpha[pos].letter, level, 0, stepover); //follow char matchRegExpHelper(w, inWild, matchSet, out, level, pos+1, stepover);//check next path } } Error Message: +str "Attempt to access index 1 in a vector of size 1." std::basic_string<char,std::char_traits<char>,std::allocator<char> > +err {msg="Attempt to access index 1 in a vector of size 1." } ErrorException Note: this function works fine for hundreds of test strings with '*' wilds if the extra stepover gate is not used Semi-Solved: I place a pos < w->alpha.size() condition on each path that calls w->alpha[pos]... - this prevented the backtrack calls from attempting to access the vector with an out of bounds index value. Still have other issues to work out - it loops infinitely adding the ? and backtracking to remove it, then repeat. But, moving forward now. Revised question: why during backtracking is the position index accumulating and/or not deincrementing - so at somepoint it calls w->alpha[pos]... with an invalid position that is either remaining from the next node or somehow incremented pos+1 when passing upward?

Read the article
Trying to reinvent the wheel of StackOverflow to have a good learning experience. Need some suggesti

- by Legend

I want to learn and am not able to do it unless I have a real "mission" to complete. SO is my favorite and I can't imagine a better experience than actually recreating it but not in ASP. I'd like to use PHP+MySQL+jQuery. So far, I have been a self-taught programmer but I would like to master one paradigm that forces you to adhere to the standards. For instance, recently, jQuery forced me to use some "rules". The plugins were supposed to be written in a particular way and that's it. When I started off, everything seemed like Greek and Latin but when I finished a very small plugin, I felt really happy because it forced me to program in a certain way. I am looking for something like this only in a larger project. I've heard a lot about MVC and all but I am confused about the various frameworks out there. Zend seems awesome but looks heavy at the same time and also requires you to have a lot more control over the web-server whereas CakePHP is a good and a fast framework that needs only little control. Do I use one of these or just write my own MVC? I have the following goals: Goals: Site should be fast - I know this depends on my coding skills but I will learn on my way. The framework itself should not slow me down) Setting up the site should not require you to use command-line - This requirement is ok during development. But some frameworks like Symphony require you to initialize certain things through command-line Should support pluggable modules - For instance, if I want to be able to use the FCK editor, I should be able to organize things in a good way. Should be possible to extend - For instance, SO is mainly a Q&A site but I should be able to logically extend it into an Idea Management System (optional but I'm curious). This goes more into code re-usability I guess. I am comfortable with MySQL so I should be done with database design etc. with some serious effort. As for PHP, I can write code on my own but haven't really used any frameworks that much. jQuery, I started off recently and love it. I would be glad if someone can guide me during these initial steps. Precisely, when designing something like SO, I have the following questions: Do I use a framework? If yes, should it be MVC? If MVC, which one is a good and a scalable one? (I'd love something like jQuery that will not die anytime soon) How do I balance the functionality? The same logic can sometimes be made server centric or client centric. (more Ajax?). Is it a good idea to make a heavy javascript site considering the recent advances on client-side JS processing? Just in case anyone is wondering, I am not interested in commercializing any of this. I need a reason to learn something :)

Read the article
SQL SERVER – Concurrency Basics – Guest Post by Vinod Kumar

- by pinaldave

This guest post is by Vinod Kumar. Vinod Kumar has worked with SQL Server extensively since joining the industry over a decade ago. Working on various versions from SQL Server 7.0, Oracle 7.3 and other database technologies – he now works with the Microsoft Technology Center (MTC) as a Technology Architect. Let us read the blog post in Vinod’s own voice. Learning is always fun when it comes to SQL Server and learning the basics again can be more fun. I did write about Transaction Logs and recovery over my blogs and the concept of simplifying the basics is a challenge. In the real world we always see checks and queues for a process – say railway reservation, banks, customer supports etc there is a process of line and queue to facilitate everyone. Shorter the queue higher is the efficiency of system (a.k.a higher is the concurrency). Every database does implement this using checks like locking, blocking mechanisms and they implement the standards in a way to facilitate higher concurrency. In this post, let us talk about the topic of Concurrency and what are the various aspects that one needs to know about concurrency inside SQL Server. Let us learn the concepts as one-liners: Concurrency can be defined as the ability of multiple processes to access or change shared data at the same time. The greater the number of concurrent user processes that can be active without interfering with each other, the greater the concurrency of the database system. Concurrency is reduced when a process that is changing data prevents other processes from reading that data or when a process that is reading data prevents other processes from changing that data. Concurrency is also affected when multiple processes are attempting to change the same data simultaneously. Two approaches to managing concurrent data access: Optimistic Concurrency Model Pessimistic Concurrency Model Concurrency Models Pessimistic Concurrency Default behavior: acquire locks to block access to data that another process is using. Assumes that enough data modification operations are in the system that any given read operation is likely affected by a data modification made by another user (assumes conflicts will occur). Avoids conflicts by acquiring a lock on data being read so no other processes can modify that data. Also acquires locks on data being modified so no other processes can access the data for either reading or modifying. Readers block writer, writers block readers and writers. Optimistic Concurrency Assumes that there are sufficiently few conflicting data modification operations in the system that any single transaction is unlikely to modify data that another transaction is modifying. Default behavior of optimistic concurrency is to use row versioning to allow data readers to see the state of the data before the modification occurs. Older versions of the data are saved so a process reading data can see the data as it was when the process started reading and not affected by any changes being made to that data. Processes modifying the data is unaffected by processes reading the data because the reader is accessing a saved version of the data rows. Readers do not block writers and writers do not block readers, but, writers can and will block writers. Transaction Processing A transaction is the basic unit of work in SQL Server. Transaction consists of SQL commands that read and update the database but the update is not considered final until a COMMIT command is issued (at least for an explicit transaction: marked with a BEGIN TRAN and the end is marked by a COMMIT TRAN or ROLLBACK TRAN). Transactions must exhibit all the ACID properties of a transaction. ACID Properties Transaction processing must guarantee the consistency and recoverability of SQL Server databases. Ensures all transactions are performed as a single unit of work regardless of hardware or system failure. A – Atomicity C – Consistency I – Isolation D- Durability Atomicity: Each transaction is treated as all or nothing – it either commits or aborts. Consistency: ensures that a transaction won’t allow the system to arrive at an incorrect logical state – the data must always be logically correct. Consistency is honored even in the event of a system failure. Isolation: separates concurrent transactions from the updates of other incomplete transactions. SQL Server accomplishes isolation among transactions by locking data or creating row versions. Durability: After a transaction commits, the durability property ensures that the effects of the transaction persist even if a system failure occurs. If a system failure occurs while a transaction is in progress, the transaction is completely undone, leaving no partial effects on data. Transaction Dependencies In addition to supporting all four ACID properties, a transaction might exhibit few other behaviors (known as dependency problems or consistency problems). Lost Updates: Occur when two processes read the same data and both manipulate the data, changing its value and then both try to update the original data to the new value. The second process might overwrite the first update completely. Dirty Reads: Occurs when a process reads uncommitted data. If one process has changed data but not yet committed the change, another process reading the data will read it in an inconsistent state. Non-repeatable Reads: A read is non-repeatable if a process might get different values when reading the same data in two reads within the same transaction. This can happen when another process changes the data in between the reads that the first process is doing. Phantoms: Occurs when membership in a set changes. It occurs if two SELECT operations using the same predicate in the same transaction return a different number of rows. Isolation Levels SQL Server supports 5 isolation levels that control the behavior of read operations. Read Uncommitted All behaviors except for lost updates are possible. Implemented by allowing the read operations to not take any locks, and because of this, it won’t be blocked by conflicting locks acquired by other processes. The process can read data that another process has modified but not yet committed. When using the read uncommitted isolation level and scanning an entire table, SQL Server can decide to do an allocation order scan (in page-number order) instead of a logical order scan (following page pointers). If another process doing concurrent operations changes data and move rows to a new location in the table, the allocation order scan can end up reading the same row twice. Also can happen if you have read a row before it is updated and then an update moves the row to a higher page number than your scan encounters later. Performing an allocation order scan under Read Uncommitted can cause you to miss a row completely – can happen when a row on a high page number that hasn’t been read yet is updated and moved to a lower page number that has already been read. Read Committed Two varieties of read committed isolation: optimistic and pessimistic (default). Ensures that a read never reads data that another application hasn’t committed. If another transaction is updating data and has exclusive locks on data, your transaction will have to wait for the locks to be released. Your transaction must put share locks on data that are visited, which means that data might be unavailable for others to use. A share lock doesn’t prevent others from reading but prevents them from updating. Read committed (snapshot) ensures that an operation never reads uncommitted data, but not by forcing other processes to wait. SQL Server generates a version of the changed row with its previous committed values. Data being changed is still locked but other processes can see the previous versions of the data as it was before the update operation began. Repeatable Read This is a Pessimistic isolation level. Ensures that if a transaction revisits data or a query is reissued the data doesn’t change. That is, issuing the same query twice within a transaction cannot pickup any changes to data values made by another user’s transaction because no changes can be made by other transactions. However, this does allow phantom rows to appear. Preventing non-repeatable read is a desirable safeguard but cost is that all shared locks in a transaction must be held until the completion of the transaction. Snapshot Snapshot Isolation (SI) is an optimistic isolation level. Allows for processes to read older versions of committed data if the current version is locked. Difference between snapshot and read committed has to do with how old the older versions have to be. It’s possible to have two transactions executing simultaneously that give us a result that is not possible in any serial execution. Serializable This is the strongest of the pessimistic isolation level. Adds to repeatable read isolation level by ensuring that if a query is reissued rows were not added in the interim, i.e, phantoms do not appear. Preventing phantoms is another desirable safeguard, but cost of this extra safeguard is similar to that of repeatable read – all shared locks in a transaction must be held until the transaction completes. In addition serializable isolation level requires that you lock data that has been read but also data that doesn’t exist. Ex: if a SELECT returned no rows, you want it to return no. rows when the query is reissued. This is implemented in SQL Server by a special kind of lock called the key-range lock. Key-range locks require that there be an index on the column that defines the range of values. If there is no index on the column, serializable isolation requires a table lock. Gets its name from the fact that running multiple serializable transactions at the same time is equivalent of running them one at a time. Now that we understand the basics of what concurrency is, the subsequent blog posts will try to bring out the basics around locking, blocking, deadlocks because they are the fundamental blocks that make concurrency possible. Now if you are with me – let us continue learning for SQL Server Locking Basics. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Concurrency

Read the article
quick look at: dm_db_index_physical_stats

- by fatherjack

A quick look at the key data from this dmv that can help a DBA keep databases performing well and systems online as the users need them. When the dynamic management views relating to index statistics became available in SQL Server 2005 there was much hype about how they can help a DBA keep their servers running in better health than ever before. This particular view gives an insight into the physical health of the indexes present in a database. Whether they are use or unused, complete or missing some columns is irrelevant, this is simply the physical stats of all indexes; disabled indexes are ignored however. In it’s simplest form this dmv can be executed as: The results from executing this contain a record for every index in every database but some of the columns will be NULL. The first parameter is there so that you can specify which database you want to gather index details on, rather than scan every database. Simply specifying DB_ID() in place of the first NULL achieves this. In order to avoid the NULLS, or more accurately, in order to choose when to have the NULLS you need to specify a value for the last parameter. It takes one of 4 values – DEFAULT, ‘SAMPLED’, ‘LIMITED’ or ‘DETAILED’. If you execute the dmv with each of these values you can see some interesting details in the times taken to complete each step. DECLARE @Start DATETIME DECLARE @First DATETIME DECLARE @Second DATETIME DECLARE @Third DATETIME DECLARE @Finish DATETIME SET @Start = GETDATE() SELECT * FROM [sys].[dm_db_index_physical_stats](DB_ID(), NULL, NULL, NULL, DEFAULT) AS ddips SET @First = GETDATE() SELECT * FROM [sys].[dm_db_index_physical_stats](DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ddips SET @Second = GETDATE() SELECT * FROM [sys].[dm_db_index_physical_stats](DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ddips SET @Third = GETDATE() SELECT * FROM [sys].[dm_db_index_physical_stats](DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ddips SET @Finish = GETDATE() SELECT DATEDIFF(ms, @Start, @First) AS [DEFAULT] , DATEDIFF(ms, @First, @Second) AS [SAMPLED] , DATEDIFF(ms, @Second, @Third) AS [LIMITED] , DATEDIFF(ms, @Third, @Finish) AS [DETAILED] Running this code will give you 4 result sets; DEFAULT will have 12 columns full of data and then NULLS in the remainder. SAMPLED will have 21 columns full of data. LIMITED will have 12 columns of data and the NULLS in the remainder. DETAILED will have 21 columns full of data. So, from this we can deduce that the DEFAULT value (the same one that is also applied when you query the view using a NULL parameter) is the same as using LIMITED. Viewing the final result set has some details that are worth noting: Running queries against this view takes significantly longer when using the SAMPLED and DETAILED values in the last parameter. The duration of the query is directly related to the size of the database you are working in so be careful running this on big databases unless you have tried it on a test server first. Let’s look at the data we get back with the DEFAULT value first of all and then progress to the extra information later. We know that the first parameter that we supply has to be a database id and for the purposes of this blog we will be providing that value with the DB_ID function. We could just as easily put a fixed value in there or a function such as DB_ID (‘AnyDatabaseName’). The first columns we get back are database_id and object_id. These are pretty explanatory and we can wrap those in some code to make things a little easier to read: SELECT DB_NAME([ddips].[database_id]) AS [DatabaseName] , OBJECT_NAME([ddips].[object_id]) AS [TableName] … FROM [sys].[dm_db_index_physical_stats](DB_ID(), NULL, NULL, NULL, NULL) AS ddips gives us SELECT DB_NAME([ddips].[database_id]) AS [DatabaseName] , OBJECT_NAME([ddips].[object_id]) AS [TableName], [i].[name] AS [IndexName] , ….. FROM [sys].[dm_db_index_physical_stats](DB_ID(), NULL, NULL, NULL, NULL) AS ddips INNER JOIN [sys].[indexes] AS i ON [ddips].[index_id] = [i].[index_id] AND [ddips].[object_id] = [i].[object_id] These handily tie in with the next parameters in the query on the dmv. If you specify an object_id and an index_id in these then you get results limited to either the table or the specific index. Once again we can place a function in here to make it easier to work with a specific table. eg. SELECT * FROM [sys].[dm_db_index_physical_stats] (DB_ID(), OBJECT_ID(‘AdventureWorks2008.Person.Address’) , 1, NULL, NULL) AS ddips Note: Despite me showing that functions can be placed directly in the parameters for this dmv, best practice recommends that functions are not used directly in the function as it is possible that they will fail to return a valid object ID. To be certain of not passing invalid values to this function, and therefore setting an automated process off on the wrong path, declare variables for the OBJECT_IDs and once they have been validated, use them in the function: DECLARE @db_id SMALLINT; DECLARE @object_id INT; SET @db_id = DB_ID(N’AdventureWorks_2008′); SET @object_id = OBJECT_ID(N’AdventureWorks_2008.Person.Address’); IF @db_id IS NULL BEGINPRINT N’Invalid database’; ENDELSE IF @object_id IS NULL BEGINPRINT N’Invalid object’; ENDELSE BEGINSELECT * FROM sys.dm_db_index_physical_stats (@db_id, @object_id, NULL, NULL , ‘LIMITED’); END; GO In cases where the results of querying this dmv don’t have any effect on other processes (i.e. simply viewing the results in the SSMS results area) then it will be noticed when the results are not consistent with the expected results and in the case of this blog this is the method I have used. So, now we can relate the values in these columns to something that we recognise in the database lets see what those other values in the dmv are all about. The next columns are: We’ll skip partition_number, index_type_desc, alloc_unit_type_desc, index_depth and index_level as this is a quick look at the dmv and they are pretty self explanatory. The final columns revealed by querying this view in the DEFAULT mode are avg_fragmentation_in_percent. This is the amount that the index is logically fragmented. It will show NULL when the dmv is queried in SAMPLED mode. fragment_count. The number of pieces that the index is broken into. It will show NULL when the dmv is queried in SAMPLED mode. avg_fragment_size_in_pages. The average size, in pages, of a single fragment in the leaf level of the IN_ROW_DATA allocation unit. It will show NULL when the dmv is queried in SAMPLED mode. page_count. Total number of index or data pages in use. OK, so what does this give us? Well, there is an obvious correlation between fragment_count, page_count and avg_fragment_size-in_pages. We see that an index that takes up 27 pages and is in 3 fragments has an average fragment size of 9 pages (27/3=9). This means that for this index there are 3 separate places on the hard disk that SQL Server needs to locate and access to gather the data when it is requested by a DML query. If this index was bigger than 72KB then having it’s data in 3 pieces might not be too big an issue as each piece would have a significant piece of data to read and the speed of access would not be too poor. If the number of fragments increases then obviously the amount of data in each piece decreases and that means the amount of work for the disks to do in order to retrieve the data to satisfy the query increases and this would start to decrease performance. This information can be useful to keep in mind when considering the value in the avg_fragmentation_in_percent column. This is arrived at by an internal algorithm that gives a value to the logical fragmentation of the index taking into account the multiple files, type of allocation unit and the previously mentioned characteristics if index size (page_count) and fragment_count. Seeing an index with a high avg_fragmentation_in_percent value will be a call to action for a DBA that is investigating performance issues. It is possible that tables will have indexes that suffer from rapid increases in fragmentation as part of normal daily business and that regular defragmentation work will be needed to keep it in good order. In other cases indexes will rarely become fragmented and therefore not need rebuilding from one end of the year to another. Keeping this in mind DBAs need to use an ‘intelligent’ process that assesses key characteristics of an index and decides on the best, if any, defragmentation method to apply should be used. There is a simple example of this in the sample code found in the Books OnLine content for this dmv, in example D. There are also a couple of very popular solutions created by SQL Server MVPs Michelle Ufford and Ola Hallengren which I would wholly recommend that you review for much further detail on how to care for your SQL Server indexes. Right, let’s get back on track then. Querying the dmv with the fifth parameter value as ‘DETAILED’ takes longer because it goes through the index and refreshes all data from every level of the index. As this blog is only a quick look a we are going to skate right past ghost_record_count and version_ghost_record_count and discuss avg_page_space_used_in_percent, record_count, min_record_size_in_bytes, max_record_size_in_bytes and avg_record_size_in_bytes. We can see from the details below that there is a correlation between the columns marked. Column 1 (Page_Count) is the number of 8KB pages used by the index, column 2 is how full each page is (how much of the 8KB has actual data written on it), column 3 is how many records are recorded in the index and column 4 is the average size of each record. This approximates to: ((Col1*8) * 1024*(Col2/100))/Col3 = Col4*. avg_page_space_used_in_percent is an important column to review as this indicates how much of the disk that has been given over to the storage of the index actually has data on it. This value is affected by the value given for the FILL_FACTOR parameter when creating an index. avg_record_size_in_bytes is important as you can use it to get an idea of how many records are in each page and therefore in each fragment, thus reinforcing how important it is to keep fragmentation under control. min_record_size_in_bytes and max_record_size_in_bytes are exactly as their names set them out to be. A detail of the smallest and largest records in the index. Purely offered as a guide to the DBA to better understand the storage practices taking place. So, keeping an eye on avg_fragmentation_in_percent will ensure that your indexes are helping data access processes take place as efficiently as possible. Where fragmentation recurs frequently then potentially the DBA should consider; the fill_factor of the index in order to leave space at the leaf level so that new records can be inserted without causing fragmentation so rapidly. the columns used in the index should be analysed to avoid new records needing to be inserted in the middle of the index but rather always be added to the end. * – it’s approximate as there are many factors associated with things like the type of data and other database settings that affect this slightly. Another great resource for working with SQL Server DMVs is Performance Tuning with SQL Server Dynamic Management Views by Louis Davidson and Tim Ford – a free ebook or paperback from Simple Talk. Disclaimer – Jonathan is a Friend of Red Gate and as such, whenever they are discussed, will have a generally positive disposition towards Red Gate tools. Other tools are often available and you should always try others before you come back and buy the Red Gate ones. All code in this blog is provided “as is” and no guarantee, warranty or accuracy is applicable or inferred, run the code on a test server and be sure to understand it before you run it on a server that means a lot to you or your manager.

Read the article
Taming Hopping Windows

- by Roman Schindlauer

At first glance, hopping windows seem fairly innocuous and obvious. They organize events into windows with a simple periodic definition: the windows have some duration d (e.g. a window covers 5 second time intervals), an interval or period p (e.g. a new window starts every 2 seconds) and an alignment a (e.g. one of those windows starts at 12:00 PM on March 15, 2012 UTC). var wins = xs .HoppingWindow(TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(2), new DateTime(2012, 3, 15, 12, 0, 0, DateTimeKind.Utc)); Logically, there is a window with start time a + np and end time a + np + d for every integer n. That’s a lot of windows. So why doesn’t the following query (always) blow up? var query = wins.Select(win => win.Count()); A few users have asked why StreamInsight doesn’t produce output for empty windows. Primarily it’s because there is an infinite number of empty windows! (Actually, StreamInsight uses DateTimeOffset.MaxValue to approximate “the end of time” and DateTimeOffset.MinValue to approximate “the beginning of time”, so the number of windows is lower in practice.) That was the good news. Now the bad news. Events also have duration. Consider the following simple input: var xs = this.Application .DefineEnumerable(() => new[] { EdgeEvent.CreateStart(DateTimeOffset.UtcNow, 0) }) .ToStreamable(AdvanceTimeSettings.IncreasingStartTime); Because the event has no explicit end edge, it lasts until the end of time. So there are lots of non-empty windows if we apply a hopping window to that single event! For this reason, we need to be careful with hopping window queries in StreamInsight. Or we can switch to a custom implementation of hopping windows that doesn’t suffer from this shortcoming. The alternate window implementation produces output only when the input changes. We start by breaking up the timeline into non-overlapping intervals assigned to each window. In figure 1, six hopping windows (“Windows”) are assigned to six intervals (“Assignments”) in the timeline. Next we take input events (“Events”) and alter their lifetimes (“Altered Events”) so that they cover the intervals of the windows they intersect. In figure 1, you can see that the first event e1 intersects windows w1 and w2 so it is adjusted to cover assignments a1 and a2. Finally, we can use snapshot windows (“Snapshots”) to produce output for the hopping windows. Notice however that instead of having six windows generating output, we have only four. The first and second snapshots correspond to the first and second hopping windows. The remaining snapshots however cover two hopping windows each! While in this example we saved only two events, the savings can be more significant when the ratio of event duration to window duration is higher. Figure 1: Timeline The implementation of this strategy is straightforward. We need to set the start times of events to the start time of the interval assigned to the earliest window including the start time. Similarly, we need to modify the end times of events to the end time of the interval assigned to the latest window including the end time. The following snap-to-boundary function that rounds a timestamp value t down to the nearest value t' <= t such that t' is a + np for some integer n will be useful. For convenience, we will represent both DateTime and TimeSpan values using long ticks: static long SnapToBoundary(long t, long a, long p) { return t - ((t - a) % p) - (t > a ? 0L : p); } How do we find the earliest window including the start time for an event? It’s the window following the last window that does not include the start time assuming that there are no gaps in the windows (i.e. duration < interval), and limitation of this solution. To find the end time of that antecedent window, we need to know the alignment of window ends: long e = a + (d % p); Using the window end alignment, we are finally ready to describe the start time selector: static long AdjustStartTime(long t, long e, long p) { return SnapToBoundary(t, e, p) + p; } To find the latest window including the end time for an event, we look for the last window start time (non-inclusive): public static long AdjustEndTime(long t, long a, long d, long p) { return SnapToBoundary(t - 1, a, p) + p + d; } Bringing it together, we can define the translation from events to ‘altered events’ as in Figure 1: public static IQStreamable<T> SnapToWindowIntervals<T>(IQStreamable<T> source, TimeSpan duration, TimeSpan interval, DateTime alignment) { if (source == null) throw new ArgumentNullException("source"); // reason about DateTime and TimeSpan in ticks long d = Math.Min(DateTime.MaxValue.Ticks, duration.Ticks); long p = Math.Min(DateTime.MaxValue.Ticks, Math.Abs(interval.Ticks)); // set alignment to earliest possible window var a = alignment.ToUniversalTime().Ticks % p; // verify constraints of this solution if (d <= 0L) { throw new ArgumentOutOfRangeException("duration"); } if (p == 0L || p > d) { throw new ArgumentOutOfRangeException("interval"); } // find the alignment of window ends long e = a + (d % p); return source.AlterEventLifetime( evt => ToDateTime(AdjustStartTime(evt.StartTime.ToUniversalTime().Ticks, e, p)), evt => ToDateTime(AdjustEndTime(evt.EndTime.ToUniversalTime().Ticks, a, d, p)) - ToDateTime(AdjustStartTime(evt.StartTime.ToUniversalTime().Ticks, e, p))); } public static DateTime ToDateTime(long ticks) { // just snap to min or max value rather than under/overflowing return ticks < DateTime.MinValue.Ticks ? new DateTime(DateTime.MinValue.Ticks, DateTimeKind.Utc) : ticks > DateTime.MaxValue.Ticks ? new DateTime(DateTime.MaxValue.Ticks, DateTimeKind.Utc) : new DateTime(ticks, DateTimeKind.Utc); } Finally, we can describe our custom hopping window operator: public static IQWindowedStreamable<T> HoppingWindow2<T>( IQStreamable<T> source, TimeSpan duration, TimeSpan interval, DateTime alignment) { if (source == null) { throw new ArgumentNullException("source"); } return SnapToWindowIntervals(source, duration, interval, alignment).SnapshotWindow(); } By switching from HoppingWindow to HoppingWindow2 in the following example, the query returns quickly rather than gobbling resources and ultimately failing! public void Main() { var start = new DateTimeOffset(new DateTime(2012, 6, 28), TimeSpan.Zero); var duration = TimeSpan.FromSeconds(5); var interval = TimeSpan.FromSeconds(2); var alignment = new DateTime(2012, 3, 15, 12, 0, 0, DateTimeKind.Utc); var events = this.Application.DefineEnumerable(() => new[] { EdgeEvent.CreateStart(start.AddSeconds(0), "e0"), EdgeEvent.CreateStart(start.AddSeconds(1), "e1"), EdgeEvent.CreateEnd(start.AddSeconds(1), start.AddSeconds(2), "e1"), EdgeEvent.CreateStart(start.AddSeconds(3), "e2"), EdgeEvent.CreateStart(start.AddSeconds(9), "e3"), EdgeEvent.CreateEnd(start.AddSeconds(3), start.AddSeconds(10), "e2"), EdgeEvent.CreateEnd(start.AddSeconds(9), start.AddSeconds(10), "e3"), }).ToStreamable(AdvanceTimeSettings.IncreasingStartTime); var adjustedEvents = SnapToWindowIntervals(events, duration, interval, alignment); var query = from win in HoppingWindow2(events, duration, interval, alignment) select win.Count(); DisplayResults(adjustedEvents, "Adjusted Events"); DisplayResults(query, "Query"); } As you can see, instead of producing a massive number of windows for the open start edge e0, a single window is emitted from 12:00:15 AM until the end of time: Adjusted Events StartTime EndTime Payload 6/28/2012 12:00:01 AM 12/31/9999 11:59:59 PM e0 6/28/2012 12:00:03 AM 6/28/2012 12:00:07 AM e1 6/28/2012 12:00:05 AM 6/28/2012 12:00:15 AM e2 6/28/2012 12:00:11 AM 6/28/2012 12:00:15 AM e3 Query StartTime EndTime Payload 6/28/2012 12:00:01 AM 6/28/2012 12:00:03 AM 1 6/28/2012 12:00:03 AM 6/28/2012 12:00:05 AM 2 6/28/2012 12:00:05 AM 6/28/2012 12:00:07 AM 3 6/28/2012 12:00:07 AM 6/28/2012 12:00:11 AM 2 6/28/2012 12:00:11 AM 6/28/2012 12:00:15 AM 3 6/28/2012 12:00:15 AM 12/31/9999 11:59:59 PM 1 Regards, The StreamInsight Team

Read the article

Search Results

Search found 313 results on 13 pages for 'logically'.

Page 12/13 | < Previous Page | 8 9 10 11 12 13 | Next Page >

- by VoidKing

- by Dan Tao

- by JMan

- by drharris

- by Sam Gammon

- by Tommo

- by DigiMortal

- by pinaldave

- by Wes McClure

- by Simon Cooper

- by david.butler(at)oracle.com

- by Paul White

- by Richard Bingham

- by MarkPearl

- by pjv

- by Pratik

- by PreMagination

- by Patrick

- by Splynx

- by dmanexe

- by forest.peterson

- by Legend

- by pinaldave

- by fatherjack

- by Roman Schindlauer

< Previous Page | 8 9 10 11 12 13 | Next Page >