Search Results

Search found 883 results on 36 pages for 'subset'.

Page 8/36 | < Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15  | Next Page >

  • Best way in Python to determine all possible intersections in a matrix?

    - by ssweens
    So if I have a matrix (list of lists) of unique words as my column headings, document ids as my row headings, and a 0 or 1 as the values if the word exists in that particular document. What I'd like to know is how to determine all the possible combinations of words and documents where more than one word is in common with more than one document. So something like: [[Docid_3, Docid_5], ['word1', 'word17', 'word23']], [[Docid_3, Docid_9, Docid_334], ['word2', 'word7', 'word23', 'word68', 'word982']], and so on for each possible combination. Would love a solution that provides the complete set of combinations and one that yields only the combinations that are not a subset of another, so from the example, not [[Docid_3, Docid_5], ['word1', 'word17']] since it's a complete subset of the first example. I feel like there is an elegant solution that just isn't coming to mind and the beer isn't helping. Thanks.

    Read the article

  • Alfresco: Restrict categories that can be selected

    - by vegemite4me
    I have a custom content type in Alfresco (3.3 Enterprise, if it matters) and I can assign one or more categories to that content. So far so good. But can I restrict the set of possible categories to only a subset of all categories? If, for example, categories looked like below, how can I restrict the user to only selecting a region subcategory (e.g. Europe, South America, etc). Categories + Software Document Classification <- I do not want these to be picked. | + Utilisation Documents | + Software Descriptions | + ... | + Regions <- I want to restrict the | + Latin America user to this subset of categories. | + Europe | + ... + ... Is this possible?

    Read the article

  • How to obtain a random sub-datatable from another data table

    - by developerit
    Introduction In this article, I’ll show how to get a random subset of data from a DataTable. This is useful when you already have queries that are filtered correctly but returns all the rows. Analysis I came across this situation when I wanted to display a random tag cloud. I already had the query to get the keywords ordered by number of clicks and I wanted to created a tag cloud. Tags that are the most popular should have more chance to get picked and should be displayed larger than less popular ones. Implementation In this code snippet, there is everything you need. ' Min size, in pixel for the tag Private Const MIN_FONT_SIZE As Integer = 9 ' Max size, in pixel for the tag Private Const MAX_FONT_SIZE As Integer = 14 ' Basic function that retreives Tags from a DataBase Public Shared Function GetTags() As MediasTagsDataTable ' Simple call to the TableAdapter, to get the Tags ordered by number of clicks Dim dt As MediasTagsDataTable = taMediasTags.GetDataValide ' If the query returned no result, return an empty DataTable If dt Is Nothing OrElse dt.Rows.Count < 1 Then Return New MediasTagsDataTable End If ' Set the font-size of the group of data ' We are dividing our results into sub set, according to their number of clicks ' Example: 10 results -> [0,2] will get font size 9, [3,5] will get font size 10, [6,8] wil get 11, ... ' This is the number of elements in one group Dim groupLenth As Integer = CType(Math.Floor(dt.Rows.Count / (MAX_FONT_SIZE - MIN_FONT_SIZE)), Integer) ' Counter of elements in the same group Dim counter As Integer = 0 ' Counter of groups Dim groupCounter As Integer = 0 ' Loop througt the list For Each row As MediasTagsRow In dt ' Set the font-size in a custom column row.c_FontSize = MIN_FONT_SIZE + groupCounter ' Increment the counter counter += 1 ' If the group counter is less than the counter If groupLenth <= counter Then ' Start a new group counter = 0 groupCounter += 1 End If Next ' Return the new DataTable with font-size Return dt End Function ' Function that generate the random sub set Public Shared Function GetRandomSampleTags(ByVal KeyCount As Integer) As MediasTagsDataTable ' Get the data Dim dt As MediasTagsDataTable = GetTags() ' Create a new DataTable that will contains the random set Dim rep As MediasTagsDataTable = New MediasTagsDataTable ' Count the number of row in the new DataTable Dim count As Integer = 0 ' Random number generator Dim rand As New Random() While count < KeyCount Randomize() ' Pick a random row Dim r As Integer = rand.Next(0, dt.Rows.Count - 1) Dim tmpRow As MediasTagsRow = dt(r) ' Import it into the new DataTable rep.ImportRow(tmpRow) ' Remove it from the old one, to be sure not to pick it again dt.Rows.RemoveAt(r) ' Increment the counter count += 1 End While ' Return the new sub set Return rep End Function Pro’s This method is good because it doesn’t require much work to get it work fast. It is a good concept when you are working with small tables, let says less than 100 records. Con’s If you have more than 100 records, out of memory exception may occur since we are coping and duplicating rows. I would consider using a stored procedure instead.

    Read the article

  • LLBLGen Pro feature highlights: model views

    - by FransBouma
    (This post is part of a series of posts about features of the LLBLGen Pro system) To be able to work with large(r) models, it's key you can view subsets of these models so you can have a better, more focused look at them. For example because you want to display how a subset of entities relate to one another in a different way than the list of entities. LLBLGen Pro offers this in the form of Model Views. Model Views are views on parts of the entity model of a project, and the subsets are displayed in a graphical way. Additionally, one can add documentation to a Model View. As Model Views are displaying parts of the model in a graphical way, they're easier to explain to people who aren't familiar with entity models, e.g. the stakeholders you're interviewing for your project. The documentation can then be used to communicate specifics of the elements on the model view to the developers who have to write the actual code. Below I've included an example. It's a model view on a subset of the entities of AdventureWorks. It displays several entities, their relationships (both relational and inheritance relationships) and also some specifics gathered from the interview with the stakeholder. As the information is inside the actual project the developer will work with, the information doesn't have to be converted back/from e.g .word documents or other intermediate formats, it's the same project. This makes sure there are less errors / misunderstandings. (of course you can hide the docked documentation pane or dock it to another corner). The Model View can contain entities which are placed in different groups. This makes it ideal to group entities together for close examination even though they're stored in different groups. The Model View is a first-class citizen of the code-generator. This means you can write templates which consume Model Views and generate code accordingly. E.g. you can write a template which generates a service per Model View and exposes the entities in the Model View as a single entity graph, fetched through a method. (This template isn't included in the LLBLGen Pro package, but it's easy to write it up yourself with the built-in template editor). Viewing an entity model in different ways is key to fully understand the entity model and Model Views help with that.

    Read the article

  • “Cloud Integration in Minutes” – True or False?

    - by Bruce Tierney
    The short answer is “yes”. Connecting on-premise and cloud applications “in minutes” is true…provided you only consider the connectivity subset of integration and have a small number of cloud integration touch points. At the recent Gartner AADI conference, 230 attendees filled up the Oracle session to get a more comprehensive answer to this question. During the session, titled “Simplifying Integration – The Cloud & Mobile Pre-requisite”, Oracle’s Tim Hall described cloud connectivity and then, equally importantly, the other essential and sometimes overlooked aspects of integration required to ensure a long term application and service integration strategy. To understand the challenges and opportunities faced by cloud integration, the session started off with a slide that describes how connectivity can quickly transition from simplicity to complexity as the number of applications and service vendor instances grows: Increased complexity puts increased demand on the integration platform As companies expand from on-premise applications into a hybrid on-premise/cloud infrastructure with support for mobile, cloud, and social, there is a new sense of urgency to implement a unified and comprehensive service integration platform. Without getting this unified platform in place, companies face increased complexity and cost managing a growing patchwork of niche integration toolsets as well as the disparate standards mandated by each SaaS vendor as shown in the image below: dddddddddddddddddddd Incomplete and overlapping offerings from a patchwork of niche vendors Also at Gartner AADI, Oracle SOA Suite customer Geeta Pyne, Director of Middleware at BMC presented their successful strategy on how BMC efficiently manages their cloud integration despite disparate requirements from each vendor. From one of Geeta’s slide: Interfaces are dictated by SaaS vendors; wide variety (SOAP, REST, Socket, HTTP/POX, SFTP); Flexibility of Oracle Service Bus/SOA Suite helps to support Every vendor has their way to handle Security; WS-Security, Custom Header; Support in Oracle Service Bus helps to adhere to disparate requirements At BMC, the flexibility of Oracle Service Bus and Oracle SOA Suite allowed them to support the wide variation in the functional requirements as mandated by their SaaS vendors. In contrast to the patchwork platform approach of escalating complexity from overlapping SaaS toolkits, Oracle’s strategy is to provide a unified platform to support disparate requirements from your SaaS vendors, on-premise apps, legacy apps, and more. Furthermore, Oracle SOA Suite includes the many aspects of comprehensive integration beyond basic connectivity including orchestration, analytics (BAM, events…), service virtualization and more in a single unified interface. Oracle SOA Suite – Unified and comprehensive To summarize, yes you can achieve “cloud integration in minutes” when considering the connectivity subset of integration but be sure to look for ways to simplify as you consider a more comprehensive view of integration beyond basic connectivity such as service virtualization, management, event processing and more. And finally, be sure your integration platform has the deep flexibility to handle the requirements of all your future SaaS applications…many of which are unknown to you now.

    Read the article

  • ANGLE wined3d in reverse

    <b>Wine-Reviews:</b> "Were happy to announce a new open source project called Almost Native Graphics Layer Engine, or ANGLE for short. The goal of ANGLE is to layer WebGLs subset of the OpenGL ES 2.0 API over DirectX 9.0c API calls."

    Read the article

  • JSR 360 and JSR 361: A Big Leap for Java ME 8

    - by terrencebarr
    It might have gone unnoticed to some, but Java ME took a big leap forward a couple of weeks ago with the filing of two new JSRs: JSR 360: “Connected Limited Device Configuration 8″ (aka CLDC 8) JSR 361: “Java ME Embedded Profile” (aka ME EP) Together, these two JSRs will significantly update, enhance, and modernize the Java ME platform, and specifically small embedded Java, with a host of new features and functionality. JSR 360 – Connected Limited Device Configuration 8 CLDC 8 is based on JSR 139 (CLDC 1.1) and updates the core Java ME VM, language support, libraries, and features to be aligned with Java SE 8. This will include: VM updated to comply with the JVM language specification version 2 Support for SE 7/8 language features like Generics, Assertions, Annotations, Try-with-Resources, and more New libraries such as Collections, NIO subset, Logging API subset A consolidated and enhanced Generic Connection Framework for multi-protocol I/O With CLDC 8, Java ME and Java SE are entering their next phase of alignment – making Java the only technology today that truly scales application development, code re-use, and tooling across the whole range of IT platforms, from small embedded to large enterprise. JSR 361 – Java ME Embedded Profile ME EP is based on JSR 228 (IMP-NG) and updates the specification in key areas to provide a powerful and flexible application environment for small embedded Java platforms, building on the features of CLDC 8:  A new, lightweight component and services model Shared libraries Multi-application concurrency, inter-application communication, and event system Application management API optionality, to address low-footprint use cases With ME EP, application developers will have a modern application environment which allows development and deployment of  modular, robust, sophisticated, and footprint-optimized solutions for a wide range of embedded use cases and devices. Summary While these JSRs are still under development, it’s clear that there are exciting new times ahead for Java ME – turning into a serious application platform while maintaining the focus on resource-constrained devices to address the expected explosion of small, smart, and connected embedded platforms. To learn more, click on the above links for JSR 360 and JSR 361. Or review the JavaOne 2012 online presentations on the topic: CON11300: Expanding the reach of the Java ME Platform CON5943: Java ME 8 Service Platform And stay tuned for more in this space! Cheers, – Terrence Filed under: Mobile & Embedded Tagged: "jsr 360", "jsr 361", "me 8", embedded, Embedded Java, JCP

    Read the article

  • JSR 360 and JSR 361: A Big Leap for Java ME 8

    - by terrencebarr
    It might have gone unnoticed to some, but Java ME took a big leap forward a couple of weeks ago with the filing of two new JSRs: JSR 360: “Connected Limited Device Configuration 8″ (aka CLDC 8) JSR 361: “Java ME Embedded Profile” (aka ME EP) Together, these two JSRs will significantly update, enhance, and modernize the Java ME platform, and specifically small embedded Java, with a host of new features and functionality. JSR 360 – Connected Limited Device Configuration 8 CLDC 8 is based on JSR 139 (CLDC 1.1) and updates the core Java ME VM, language support, libraries, and features to be aligned with Java SE 8. This will include: VM updated to comply with the JVM language specification version 2 Support for SE 7/8 language features like Generics, Assertions, Annotations, Try-with-Resources, and more New libraries such as Collections, NIO subset, Logging API subset A consolidated and enhanced Generic Connection Framework for multi-protocol I/O With CLDC 8, Java ME and Java SE are entering their next phase of alignment – making Java the only technology today that truly scales application development, code re-use, and tooling across the whole range of IT platforms, from small embedded to large enterprise. JSR 361 – Java ME Embedded Profile ME EP is based on JSR 228 (IMP-NG) and updates the specification in key areas to provide a powerful and flexible application environment for small embedded Java platforms, building on the features of CLDC 8:  A new, lightweight component and services model Shared libraries Multi-application concurrency, inter-application communication, and event system Application management API optionality, to address low-footprint use cases With ME EP, application developers will have a modern application environment which allows development and deployment of  modular, robust, sophisticated, and footprint-optimized solutions for a wide range of embedded use cases and devices. Summary While these JSRs are still under development, it’s clear that there are exciting new times ahead for Java ME – turning into a serious application platform while maintaining the focus on resource-constrained devices to address the expected explosion of small, smart, and connected embedded platforms. To learn more, click on the above links for JSR 360 and JSR 361. Or review the JavaOne 2012 online presentations on the topic: CON11300: Expanding the reach of the Java ME Platform CON5943: Java ME 8 Service Platform And stay tuned for more in this space! Cheers, – Terrence Filed under: Mobile & Embedded Tagged: "jsr 360", "jsr 361", "me 8", embedded, Embedded Java, JCP

    Read the article

  • Thread placement policies on NUMA systems - update

    - by Dave
    In a prior blog entry I noted that Solaris used a "maximum dispersal" placement policy to assign nascent threads to their initial processors. The general idea is that threads should be placed as far away from each other as possible in the resource topology in order to reduce resource contention between concurrently running threads. This policy assumes that resource contention -- pipelines, memory channel contention, destructive interference in the shared caches, etc -- will likely outweigh (a) any potential communication benefits we might achieve by packing our threads more densely onto a subset of the NUMA nodes, and (b) benefits of NUMA affinity between memory allocated by one thread and accessed by other threads. We want our threads spread widely over the system and not packed together. Conceptually, when placing a new thread, the kernel picks the least loaded node NUMA node (the node with lowest aggregate load average), and then the least loaded core on that node, etc. Furthermore, the kernel places threads onto resources -- sockets, cores, pipelines, etc -- without regard to the thread's process membership. That is, initial placement is process-agnostic. Keep reading, though. This description is incorrect. On Solaris 10 on a SPARC T5440 with 4 x T2+ NUMA nodes, if the system is otherwise unloaded and we launch a process that creates 20 compute-bound concurrent threads, then typically we'll see a perfect balance with 5 threads on each node. We see similar behavior on an 8-node x86 x4800 system, where each node has 8 cores and each core is 2-way hyperthreaded. So far so good; this behavior seems in agreement with the policy I described in the 1st paragraph. I recently tried the same experiment on a 4-node T4-4 running Solaris 11. Both the T5440 and T4-4 are 4-node systems that expose 256 logical thread contexts. To my surprise, all 20 threads were placed onto just one NUMA node while the other 3 nodes remained completely idle. I checked the usual suspects such as processor sets inadvertently left around by colleagues, processors left offline, and power management policies, but the system was configured normally. I then launched multiple concurrent instances of the process, and, interestingly, all the threads from the 1st process landed on one node, all the threads from the 2nd process landed on another node, and so on. This happened even if I interleaved thread creating between the processes, so I was relatively sure the effect didn't related to thread creation time, but rather that placement was a function of process membership. I this point I consulted the Solaris sources and talked with folks in the Solaris group. The new Solaris 11 behavior is intentional. The kernel is no longer using a simple maximum dispersal policy, and thread placement is process membership-aware. Now, even if other nodes are completely unloaded, the kernel will still try to pack new threads onto the home lgroup (socket) of the primordial thread until the load average of that node reaches 50%, after which it will pick the next least loaded node as the process's new favorite node for placement. On the T4-4 we have 64 logical thread contexts (strands) per socket (lgroup), so if we launch 48 concurrent threads we will find 32 placed on one node and 16 on some other node. If we launch 64 threads we'll find 32 and 32. That means we can end up with our threads clustered on a small subset of the nodes in a way that's quite different that what we've seen on Solaris 10. So we have a policy that allows process-aware packing but reverts to spreading threads onto other nodes if a node becomes too saturated. It turns out this policy was enabled in Solaris 10, but certain bugs suppressed the mixed packing/spreading behavior. There are configuration variables in /etc/system that allow us to dial the affinity between nascent threads and their primordial thread up and down: see lgrp_expand_proc_thresh, specifically. In the OpenSolaris source code the key routine is mpo_update_tunables(). This method reads the /etc/system variables and sets up some global variables that will subsequently be used by the dispatcher, which calls lgrp_choose() in lgrp.c to place nascent threads. Lgrp_expand_proc_thresh controls how loaded an lgroup must be before we'll consider homing a process's threads to another lgroup. Tune this value lower to have it spread your process's threads out more. To recap, the 'new' policy is as follows. Threads from the same process are packed onto a subset of the strands of a socket (50% for T-series). Once that socket reaches the 50% threshold the kernel then picks another preferred socket for that process. Threads from unrelated processes are spread across sockets. More precisely, different processes may have different preferred sockets (lgroups). Beware that I've simplified and elided details for the purposes of explication. The truth is in the code. Remarks: It's worth noting that initial thread placement is just that. If there's a gross imbalance between the load on different nodes then the kernel will migrate threads to achieve a better and more even distribution over the set of available nodes. Once a thread runs and gains some affinity for a node, however, it becomes "stickier" under the assumption that the thread has residual cache residency on that node, and that memory allocated by that thread resides on that node given the default "first-touch" page-level NUMA allocation policy. Exactly how the various policies interact and which have precedence under what circumstances could the topic of a future blog entry. The scheduler is work-conserving. The x4800 mentioned above is an interesting system. Each of the 8 sockets houses an Intel 7500-series processor. Each processor has 3 coherent QPI links and the system is arranged as a glueless 8-socket twisted ladder "mobius" topology. Nodes are either 1 or 2 hops distant over the QPI links. As an aside the mapping of logical CPUIDs to physical resources is rather interesting on Solaris/x4800. On SPARC/Solaris the CPUID layout is strictly geographic, with the highest order bits identifying the socket, the next lower bits identifying the core within that socket, following by the pipeline (if present) and finally the logical thread context ("strand") on the core. But on Solaris on the x4800 the CPUID layout is as follows. [6:6] identifies the hyperthread on a core; bits [5:3] identify the socket, or package in Intel terminology; bits [2:0] identify the core within a socket. Such low-level details should be of interest only if you're binding threads -- a bad idea, the kernel typically handles placement best -- or if you're writing NUMA-aware code that's aware of the ambient placement and makes decisions accordingly. Solaris introduced the so-called critical-threads mechanism, which is expressed by putting a thread into the FX scheduling class at priority 60. The critical-threads mechanism applies to placement on cores, not on sockets, however. That is, it's an intra-socket policy, not an inter-socket policy. Solaris 11 introduces the Power Aware Dispatcher (PAD) which packs threads instead of spreading them out in an attempt to be able to keep sockets or cores at lower power levels. Maximum dispersal may be good for performance but is anathema to power management. PAD is off by default, but power management polices constitute yet another confounding factor with respect to scheduling and dispatching. If your threads communicate heavily -- one thread reads cache lines last written by some other thread -- then the new dense packing policy may improve performance by reducing traffic on the coherent interconnect. On the other hand if your threads in your process communicate rarely, then it's possible the new packing policy might result on contention on shared computing resources. Unfortunately there's no simple litmus test that says whether packing or spreading is optimal in a given situation. The answer varies by system load, application, number of threads, and platform hardware characteristics. Currently we don't have the necessary tools and sensoria to decide at runtime, so we're reduced to an empirical approach where we run trials and try to decide on a placement policy. The situation is quite frustrating. Relatedly, it's often hard to determine just the right level of concurrency to optimize throughput. (Understanding constructive vs destructive interference in the shared caches would be a good start. We could augment the lines with a small tag field indicating which strand last installed or accessed a line. Given that, we could augment the CPU with performance counters for misses where a thread evicts a line it installed vs misses where a thread displaces a line installed by some other thread.)

    Read the article

  • Create Ubuntu repository on CentOS server with debmirror

    - by Wilco Groothand
    I want to create a UBUNTU repo mirror on my CentOS reposerver. I read about it and came to the conclusion that for our purpose debmirror was the correct solution, because I then could mirror a subset of the total repository. The problem is that with debmirror I run into the gpg key errors. Already solved in Ubuntu, but the apt-key solutions are not valid within CentOS. The command does not exists. I am totally stuck.

    Read the article

  • Does GNC mean the death of Internet Explorer?

    - by Monika Michael
    From the wikipedia - Google Native Client (NaCl) is a sandboxing technology for running a subset of Intel x86 or ARM native code using software-based fault isolation. It is proposed for safely running native code from a web browser, allowing web-based applications to run at near-native speeds. (Emphasis mine) (Source) Compiled C++ code running in a browser? Are other companies working on a similar offering? What would it mean for the browser landscape?

    Read the article

  • Building dynamic OLAP data marts on-the-fly

    - by DrJohn
    At the forthcoming SQLBits conference, I will be presenting a session on how to dynamically build an OLAP data mart on-the-fly. This blog entry is intended to clarify exactly what I mean by an OLAP data mart, why you may need to build them on-the-fly and finally outline the steps needed to build them dynamically. In subsequent blog entries, I will present exactly how to implement some of the techniques involved. What is an OLAP data mart? In data warehousing parlance, a data mart is a subset of the overall corporate data provided to business users to meet specific business needs. Of course, the term does not specify the technology involved, so I coined the term "OLAP data mart" to identify a subset of data which is delivered in the form of an OLAP cube which may be accompanied by the relational database upon which it was built. To clarify, the relational database is specifically create and loaded with the subset of data and then the OLAP cube is built and processed to make the data available to the end-users via standard OLAP client tools. Why build OLAP data marts? Market research companies sell data to their clients to make money. To gain competitive advantage, market research providers like to "add value" to their data by providing systems that enhance analytics, thereby allowing clients to make best use of the data. As such, OLAP cubes have become a standard way of delivering added value to clients. They can be built on-the-fly to hold specific data sets and meet particular needs and then hosted on a secure intranet site for remote access, or shipped to clients' own infrastructure for hosting. Even better, they support a wide range of different tools for analytical purposes, including the ever popular Microsoft Excel. Extension Attributes: The Challenge One of the key challenges in building multiple OLAP data marts based on the same 'template' is handling extension attributes. These are attributes that meet the client's specific reporting needs, but do not form part of the standard template. Now clearly, these extension attributes have to come into the system via additional files and ultimately be added to relational tables so they can end up in the OLAP cube. However, processing these files and filling dynamically altered tables with SSIS is a challenge as SSIS packages tend to break as soon as the database schema changes. There are two approaches to this: (1) dynamically build an SSIS package in memory to match the new database schema using C#, or (2) have the extension attributes provided as name/value pairs so the file's schema does not change and can easily be loaded using SSIS. The problem with the first approach is the complexity of writing an awful lot of complex C# code. The problem of the second approach is that name/value pairs are useless to an OLAP cube; so they have to be pivoted back into a proper relational table somewhere in the data load process WITHOUT breaking SSIS. How this can be done will be part of future blog entry. What is involved in building an OLAP data mart? There are a great many steps involved in building OLAP data marts on-the-fly. The key point is that all the steps must be automated to allow for the production of multiple OLAP data marts per day (i.e. many thousands, each with its own specific data set and attributes). Now most of these steps have a great deal in common with standard data warehouse practices. The key difference is that the databases are all built to order. The only permanent database is the metadata database (shown in orange) which holds all the metadata needed to build everything else (i.e. client orders, configuration information, connection strings, client specific requirements and attributes etc.). The staging database (shown in red) has a short life: it is built, populated and then ripped down as soon as the OLAP Data Mart has been populated. In the diagram below, the OLAP data mart comprises the two blue components: the Data Mart which is a relational database and the OLAP Cube which is an OLAP database implemented using Microsoft Analysis Services (SSAS). The client may receive just the OLAP cube or both components together depending on their reporting requirements.  So, in broad terms the steps required to fulfil a client order are as follows: Step 1: Prepare metadata Create a set of database names unique to the client's order Modify all package connection strings to be used by SSIS to point to new databases and file locations. Step 2: Create relational databases Create the staging and data mart relational databases using dynamic SQL and set the database recovery mode to SIMPLE as we do not need the overhead of logging anything Execute SQL scripts to build all database objects (tables, views, functions and stored procedures) in the two databases Step 3: Load staging database Use SSIS to load all data files into the staging database in a parallel operation Load extension files containing name/value pairs. These will provide client-specific attributes in the OLAP cube. Step 4: Load data mart relational database Load the data from staging into the data mart relational database, again in parallel where possible Allocate surrogate keys and use SSIS to perform surrogate key lookup during the load of fact tables Step 5: Load extension tables & attributes Pivot the extension attributes from their native name/value pairs into proper relational tables Add the extension attributes to the views used by OLAP cube Step 6: Deploy & Process OLAP cube Deploy the OLAP database directly to the server using a C# script task in SSIS Modify the connection string used by the OLAP cube to point to the data mart relational database Modify the cube structure to add the extension attributes to both the data source view and the relevant dimensions Remove any standard attributes that not required Process the OLAP cube Step 7: Backup and drop databases Drop staging database as it is no longer required Backup data mart relational and OLAP database and ship these to the client's infrastructure Drop data mart relational and OLAP database from the build server Mark order complete Start processing the next order, ad infinitum. So my future blog posts and my forthcoming session at the SQLBits conference will all focus on some of the more interesting aspects of building OLAP data marts on-the-fly such as handling the load of extension attributes and how to dynamically alter the structure of an OLAP cube using C#.

    Read the article

  • Stairway to XML: Level 4 - Querying XML Data

    You can extract a subset of data from an XML instance by using the query() method, and you can use the value() method to retrieve individual element and attribute values from an XML instance. SQL Monitor v3 is even more powerfulUse custom metrics to monitor and alert on data that's most important for your environment, easily imported from our custom metrics site. Find out more.

    Read the article

  • Nautilus and file command in 11.04 don't show metadata for WebM files

    - by Pili
    The file-name extension .webm is used for media files using the WebM multimedia format, which consists of the WebM container (a subset of the Matroska container) and audio and video streams with independet enconding and quality settings. Description of the issue: For files in the WebM format, the program file says that files are raw data, instead of determining and displaying the real file-format, which is WebM. Besides, Nautilus doesn't display the technical metadata of files in this format. Why is the file program not displaying the file format for WebM files?

    Read the article

  • Does NaCl mean the death of Internet Explorer? [closed]

    - by Monika Michael
    From the wikipedia - Google Native Client (NaCl) is a sandboxing technology for running a subset of Intel x86 or ARM native code using software-based fault isolation. It is proposed for safely running native code from a web browser, allowing web-based applications to run at near-native speeds. (Emphasis mine) (Source) Compiled C++ code running in a browser? Are other companies working on a similar offering? What would it mean for the browser landscape?

    Read the article

  • What are good design practices when working with Entity Framework

    - by AD
    This will apply mostly for an asp.net application where the data is not accessed via soa. Meaning that you get access to the objects loaded from the framework, not Transfer Objects, although some recommendation still apply. This is a community post, so please add to it as you see fit. Applies to: Entity Framework 1.0 shipped with Visual Studio 2008 sp1. Why pick EF in the first place? Considering it is a young technology with plenty of problems (see below), it may be a hard sell to get on the EF bandwagon for your project. However, it is the technology Microsoft is pushing (at the expense of Linq2Sql, which is a subset of EF). In addition, you may not be satisfied with NHibernate or other solutions out there. Whatever the reasons, there are people out there (including me) working with EF and life is not bad.make you think. EF and inheritance The first big subject is inheritance. EF does support mapping for inherited classes that are persisted in 2 ways: table per class and table the hierarchy. The modeling is easy and there are no programming issues with that part. (The following applies to table per class model as I don't have experience with table per hierarchy, which is, anyway, limited.) The real problem comes when you are trying to run queries that include one or many objects that are part of an inheritance tree: the generated sql is incredibly awful, takes a long time to get parsed by the EF and takes a long time to execute as well. This is a real show stopper. Enough that EF should probably not be used with inheritance or as little as possible. Here is an example of how bad it was. My EF model had ~30 classes, ~10 of which were part of an inheritance tree. On running a query to get one item from the Base class, something as simple as Base.Get(id), the generated SQL was over 50,000 characters. Then when you are trying to return some Associations, it degenerates even more, going as far as throwing SQL exceptions about not being able to query more than 256 tables at once. Ok, this is bad, EF concept is to allow you to create your object structure without (or with as little as possible) consideration on the actual database implementation of your table. It completely fails at this. So, recommendations? Avoid inheritance if you can, the performance will be so much better. Use it sparingly where you have to. In my opinion, this makes EF a glorified sql-generation tool for querying, but there are still advantages to using it. And ways to implement mechanism that are similar to inheritance. Bypassing inheritance with Interfaces First thing to know with trying to get some kind of inheritance going with EF is that you cannot assign a non-EF-modeled class a base class. Don't even try it, it will get overwritten by the modeler. So what to do? You can use interfaces to enforce that classes implement some functionality. For example here is a IEntity interface that allow you to define Associations between EF entities where you don't know at design time what the type of the entity would be. public enum EntityTypes{ Unknown = -1, Dog = 0, Cat } public interface IEntity { int EntityID { get; } string Name { get; } Type EntityType { get; } } public partial class Dog : IEntity { // implement EntityID and Name which could actually be fields // from your EF model Type EntityType{ get{ return EntityTypes.Dog; } } } Using this IEntity, you can then work with undefined associations in other classes // lets take a class that you defined in your model. // that class has a mapping to the columns: PetID, PetType public partial class Person { public IEntity GetPet() { return IEntityController.Get(PetID,PetType); } } which makes use of some extension functions: public class IEntityController { static public IEntity Get(int id, EntityTypes type) { switch (type) { case EntityTypes.Dog: return Dog.Get(id); case EntityTypes.Cat: return Cat.Get(id); default: throw new Exception("Invalid EntityType"); } } } Not as neat as having plain inheritance, particularly considering you have to store the PetType in an extra database field, but considering the performance gains, I would not look back. It also cannot model one-to-many, many-to-many relationship, but with creative uses of 'Union' it could be made to work. Finally, it creates the side effet of loading data in a property/function of the object, which you need to be careful about. Using a clear naming convention like GetXYZ() helps in that regards. Compiled Queries Entity Framework performance is not as good as direct database access with ADO (obviously) or Linq2SQL. There are ways to improve it however, one of which is compiling your queries. The performance of a compiled query is similar to Linq2Sql. What is a compiled query? It is simply a query for which you tell the framework to keep the parsed tree in memory so it doesn't need to be regenerated the next time you run it. So the next run, you will save the time it takes to parse the tree. Do not discount that as it is a very costly operation that gets even worse with more complex queries. There are 2 ways to compile a query: creating an ObjectQuery with EntitySQL and using CompiledQuery.Compile() function. (Note that by using an EntityDataSource in your page, you will in fact be using ObjectQuery with EntitySQL, so that gets compiled and cached). An aside here in case you don't know what EntitySQL is. It is a string-based way of writing queries against the EF. Here is an example: "select value dog from Entities.DogSet as dog where dog.ID = @ID". The syntax is pretty similar to SQL syntax. You can also do pretty complex object manipulation, which is well explained [here][1]. Ok, so here is how to do it using ObjectQuery< string query = "select value dog " + "from Entities.DogSet as dog " + "where dog.ID = @ID"; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)); oQuery.Parameters.Add(new ObjectParameter("ID", id)); oQuery.EnablePlanCaching = true; return oQuery.FirstOrDefault(); The first time you run this query, the framework will generate the expression tree and keep it in memory. So the next time it gets executed, you will save on that costly step. In that example EnablePlanCaching = true, which is unnecessary since that is the default option. The other way to compile a query for later use is the CompiledQuery.Compile method. This uses a delegate: static readonly Func<Entities, int, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, Dog>((ctx, id) => ctx.DogSet.FirstOrDefault(it => it.ID == id)); or using linq static readonly Func<Entities, int, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, Dog>((ctx, id) => (from dog in ctx.DogSet where dog.ID == id select dog).FirstOrDefault()); to call the query: query_GetDog.Invoke( YourContext, id ); The advantage of CompiledQuery is that the syntax of your query is checked at compile time, where as EntitySQL is not. However, there are other consideration... Includes Lets say you want to have the data for the dog owner to be returned by the query to avoid making 2 calls to the database. Easy to do, right? EntitySQL string query = "select value dog " + "from Entities.DogSet as dog " + "where dog.ID = @ID"; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)).Include("Owner"); oQuery.Parameters.Add(new ObjectParameter("ID", id)); oQuery.EnablePlanCaching = true; return oQuery.FirstOrDefault(); CompiledQuery static readonly Func<Entities, int, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, Dog>((ctx, id) => (from dog in ctx.DogSet.Include("Owner") where dog.ID == id select dog).FirstOrDefault()); Now, what if you want to have the Include parametrized? What I mean is that you want to have a single Get() function that is called from different pages that care about different relationships for the dog. One cares about the Owner, another about his FavoriteFood, another about his FavotireToy and so on. Basicly, you want to tell the query which associations to load. It is easy to do with EntitySQL public Dog Get(int id, string include) { string query = "select value dog " + "from Entities.DogSet as dog " + "where dog.ID = @ID"; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)) .IncludeMany(include); oQuery.Parameters.Add(new ObjectParameter("ID", id)); oQuery.EnablePlanCaching = true; return oQuery.FirstOrDefault(); } The include simply uses the passed string. Easy enough. Note that it is possible to improve on the Include(string) function (that accepts only a single path) with an IncludeMany(string) that will let you pass a string of comma-separated associations to load. Look further in the extension section for this function. If we try to do it with CompiledQuery however, we run into numerous problems: The obvious static readonly Func<Entities, int, string, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) => (from dog in ctx.DogSet.Include(include) where dog.ID == id select dog).FirstOrDefault()); will choke when called with: query_GetDog.Invoke( YourContext, id, "Owner,FavoriteFood" ); Because, as mentionned above, Include() only wants to see a single path in the string and here we are giving it 2: "Owner" and "FavoriteFood" (which is not to be confused with "Owner.FavoriteFood"!). Then, let's use IncludeMany(), which is an extension function static readonly Func<Entities, int, string, Dog> query_GetDog = CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) => (from dog in ctx.DogSet.IncludeMany(include) where dog.ID == id select dog).FirstOrDefault()); Wrong again, this time it is because the EF cannot parse IncludeMany because it is not part of the functions that is recognizes: it is an extension. Ok, so you want to pass an arbitrary number of paths to your function and Includes() only takes a single one. What to do? You could decide that you will never ever need more than, say 20 Includes, and pass each separated strings in a struct to CompiledQuery. But now the query looks like this: from dog in ctx.DogSet.Include(include1).Include(include2).Include(include3) .Include(include4).Include(include5).Include(include6) .[...].Include(include19).Include(include20) where dog.ID == id select dog which is awful as well. Ok, then, but wait a minute. Can't we return an ObjectQuery< with CompiledQuery? Then set the includes on that? Well, that what I would have thought so as well: static readonly Func<Entities, int, ObjectQuery<Dog>> query_GetDog = CompiledQuery.Compile<Entities, int, string, ObjectQuery<Dog>>((ctx, id) => (ObjectQuery<Dog>)(from dog in ctx.DogSet where dog.ID == id select dog)); public Dog GetDog( int id, string include ) { ObjectQuery<Dog> oQuery = query_GetDog(id); oQuery = oQuery.IncludeMany(include); return oQuery.FirstOrDefault; } That should have worked, except that when you call IncludeMany (or Include, Where, OrderBy...) you invalidate the cached compiled query because it is an entirely new one now! So, the expression tree needs to be reparsed and you get that performance hit again. So what is the solution? You simply cannot use CompiledQueries with parametrized Includes. Use EntitySQL instead. This doesn't mean that there aren't uses for CompiledQueries. It is great for localized queries that will always be called in the same context. Ideally CompiledQuery should always be used because the syntax is checked at compile time, but due to limitation, that's not possible. An example of use would be: you may want to have a page that queries which two dogs have the same favorite food, which is a bit narrow for a BusinessLayer function, so you put it in your page and know exactly what type of includes are required. Passing more than 3 parameters to a CompiledQuery Func is limited to 5 parameters, of which the last one is the return type and the first one is your Entities object from the model. So that leaves you with 3 parameters. A pitance, but it can be improved on very easily. public struct MyParams { public string param1; public int param2; public DateTime param3; } static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) => from dog in ctx.DogSet where dog.Age == myParams.param2 && dog.Name == myParams.param1 and dog.BirthDate > myParams.param3 select dog); public List<Dog> GetSomeDogs( int age, string Name, DateTime birthDate ) { MyParams myParams = new MyParams(); myParams.param1 = name; myParams.param2 = age; myParams.param3 = birthDate; return query_GetDog(YourContext,myParams).ToList(); } Return Types (this does not apply to EntitySQL queries as they aren't compiled at the same time during execution as the CompiledQuery method) Working with Linq, you usually don't force the execution of the query until the very last moment, in case some other functions downstream wants to change the query in some way: static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) => from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog); public IEnumerable<Dog> GetSomeDogs( int age, string name ) { return query_GetDog(YourContext,age,name); } public void DataBindStuff() { IEnumerable<Dog> dogs = GetSomeDogs(4,"Bud"); // but I want the dogs ordered by BirthDate gridView.DataSource = dogs.OrderBy( it => it.BirthDate ); } What is going to happen here? By still playing with the original ObjectQuery (that is the actual return type of the Linq statement, which implements IEnumerable), it will invalidate the compiled query and be force to re-parse. So, the rule of thumb is to return a List< of objects instead. static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) => from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog); public List<Dog> GetSomeDogs( int age, string name ) { return query_GetDog(YourContext,age,name).ToList(); //<== change here } public void DataBindStuff() { List<Dog> dogs = GetSomeDogs(4,"Bud"); // but I want the dogs ordered by BirthDate gridView.DataSource = dogs.OrderBy( it => it.BirthDate ); } When you call ToList(), the query gets executed as per the compiled query and then, later, the OrderBy is executed against the objects in memory. It may be a little bit slower, but I'm not even sure. One sure thing is that you have no worries about mis-handling the ObjectQuery and invalidating the compiled query plan. Once again, that is not a blanket statement. ToList() is a defensive programming trick, but if you have a valid reason not to use ToList(), go ahead. There are many cases in which you would want to refine the query before executing it. Performance What is the performance impact of compiling a query? It can actually be fairly large. A rule of thumb is that compiling and caching the query for reuse takes at least double the time of simply executing it without caching. For complex queries (read inherirante), I have seen upwards to 10 seconds. So, the first time a pre-compiled query gets called, you get a performance hit. After that first hit, performance is noticeably better than the same non-pre-compiled query. Practically the same as Linq2Sql When you load a page with pre-compiled queries the first time you will get a hit. It will load in maybe 5-15 seconds (obviously more than one pre-compiled queries will end up being called), while subsequent loads will take less than 300ms. Dramatic difference, and it is up to you to decide if it is ok for your first user to take a hit or you want a script to call your pages to force a compilation of the queries. Can this query be cached? { Dog dog = from dog in YourContext.DogSet where dog.ID == id select dog; } No, ad-hoc Linq queries are not cached and you will incur the cost of generating the tree every single time you call it. Parametrized Queries Most search capabilities involve heavily parametrized queries. There are even libraries available that will let you build a parametrized query out of lamba expressions. The problem is that you cannot use pre-compiled queries with those. One way around that is to map out all the possible criteria in the query and flag which one you want to use: public struct MyParams { public string name; public bool checkName; public int age; public bool checkAge; } static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog = CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) => from dog in ctx.DogSet where (myParams.checkAge == true && dog.Age == myParams.age) && (myParams.checkName == true && dog.Name == myParams.name ) select dog); protected List<Dog> GetSomeDogs() { MyParams myParams = new MyParams(); myParams.name = "Bud"; myParams.checkName = true; myParams.age = 0; myParams.checkAge = false; return query_GetDog(YourContext,myParams).ToList(); } The advantage here is that you get all the benifits of a pre-compiled quert. The disadvantages are that you most likely will end up with a where clause that is pretty difficult to maintain, that you will incur a bigger penalty for pre-compiling the query and that each query you run is not as efficient as it could be (particularly with joins thrown in). Another way is to build an EntitySQL query piece by piece, like we all did with SQL. protected List<Dod> GetSomeDogs( string name, int age) { string query = "select value dog from Entities.DogSet where 1 = 1 "; if( !String.IsNullOrEmpty(name) ) query = query + " and dog.Name == @Name "; if( age > 0 ) query = query + " and dog.Age == @Age "; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>( query, YourContext ); if( !String.IsNullOrEmpty(name) ) oQuery.Parameters.Add( new ObjectParameter( "Name", name ) ); if( age > 0 ) oQuery.Parameters.Add( new ObjectParameter( "Age", age ) ); return oQuery.ToList(); } Here the problems are: - there is no syntax checking during compilation - each different combination of parameters generate a different query which will need to be pre-compiled when it is first run. In this case, there are only 4 different possible queries (no params, age-only, name-only and both params), but you can see that there can be way more with a normal world search. - Noone likes to concatenate strings! Another option is to query a large subset of the data and then narrow it down in memory. This is particularly useful if you are working with a definite subset of the data, like all the dogs in a city. You know there are a lot but you also know there aren't that many... so your CityDog search page can load all the dogs for the city in memory, which is a single pre-compiled query and then refine the results protected List<Dod> GetSomeDogs( string name, int age, string city) { string query = "select value dog from Entities.DogSet where dog.Owner.Address.City == @City "; ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>( query, YourContext ); oQuery.Parameters.Add( new ObjectParameter( "City", city ) ); List<Dog> dogs = oQuery.ToList(); if( !String.IsNullOrEmpty(name) ) dogs = dogs.Where( it => it.Name == name ); if( age > 0 ) dogs = dogs.Where( it => it.Age == age ); return dogs; } It is particularly useful when you start displaying all the data then allow for filtering. Problems: - Could lead to serious data transfer if you are not careful about your subset. - You can only filter on the data that you returned. It means that if you don't return the Dog.Owner association, you will not be able to filter on the Dog.Owner.Name So what is the best solution? There isn't any. You need to pick the solution that works best for you and your problem: - Use lambda-based query building when you don't care about pre-compiling your queries. - Use fully-defined pre-compiled Linq query when your object structure is not too complex. - Use EntitySQL/string concatenation when the structure could be complex and when the possible number of different resulting queries are small (which means fewer pre-compilation hits). - Use in-memory filtering when you are working with a smallish subset of the data or when you had to fetch all of the data on the data at first anyway (if the performance is fine with all the data, then filtering in memory will not cause any time to be spent in the db). Singleton access The best way to deal with your context and entities accross all your pages is to use the singleton pattern: public sealed class YourContext { private const string instanceKey = "On3GoModelKey"; YourContext(){} public static YourEntities Instance { get { HttpContext context = HttpContext.Current; if( context == null ) return Nested.instance; if (context.Items[instanceKey] == null) { On3GoEntities entity = new On3GoEntities(); context.Items[instanceKey] = entity; } return (YourEntities)context.Items[instanceKey]; } } class Nested { // Explicit static constructor to tell C# compiler // not to mark type as beforefieldinit static Nested() { } internal static readonly YourEntities instance = new YourEntities(); } } NoTracking, is it worth it? When executing a query, you can tell the framework to track the objects it will return or not. What does it mean? With tracking enabled (the default option), the framework will track what is going on with the object (has it been modified? Created? Deleted?) and will also link objects together, when further queries are made from the database, which is what is of interest here. For example, lets assume that Dog with ID == 2 has an owner which ID == 10. Dog dog = (from dog in YourContext.DogSet where dog.ID == 2 select dog).FirstOrDefault(); //dog.OwnerReference.IsLoaded == false; Person owner = (from o in YourContext.PersonSet where o.ID == 10 select dog).FirstOrDefault(); //dog.OwnerReference.IsLoaded == true; If we were to do the same with no tracking, the result would be different. ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>) (from dog in YourContext.DogSet where dog.ID == 2 select dog); oDogQuery.MergeOption = MergeOption.NoTracking; Dog dog = oDogQuery.FirstOrDefault(); //dog.OwnerReference.IsLoaded == false; ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>) (from o in YourContext.PersonSet where o.ID == 10 select o); oPersonQuery.MergeOption = MergeOption.NoTracking; Owner owner = oPersonQuery.FirstOrDefault(); //dog.OwnerReference.IsLoaded == false; Tracking is very useful and in a perfect world without performance issue, it would always be on. But in this world, there is a price for it, in terms of performance. So, should you use NoTracking to speed things up? It depends on what you are planning to use the data for. Is there any chance that the data your query with NoTracking can be used to make update/insert/delete in the database? If so, don't use NoTracking because associations are not tracked and will causes exceptions to be thrown. In a page where there are absolutly no updates to the database, you can use NoTracking. Mixing tracking and NoTracking is possible, but it requires you to be extra careful with updates/inserts/deletes. The problem is that if you mix then you risk having the framework trying to Attach() a NoTracking object to the context where another copy of the same object exist with tracking on. Basicly, what I am saying is that Dog dog1 = (from dog in YourContext.DogSet where dog.ID == 2).FirstOrDefault(); ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>) (from dog in YourContext.DogSet where dog.ID == 2 select dog); oDogQuery.MergeOption = MergeOption.NoTracking; Dog dog2 = oDogQuery.FirstOrDefault(); dog1 and dog2 are 2 different objects, one tracked and one not. Using the detached object in an update/insert will force an Attach() that will say "Wait a minute, I do already have an object here with the same database key. Fail". And when you Attach() one object, all of its hierarchy gets attached as well, causing problems everywhere. Be extra careful. How much faster is it with NoTracking It depends on the queries. Some are much more succeptible to tracking than other. I don't have a fast an easy rule for it, but it helps. So I should use NoTracking everywhere then? Not exactly. There are some advantages to tracking object. The first one is that the object is cached, so subsequent call for that object will not hit the database. That cache is only valid for the lifetime of the YourEntities object, which, if you use the singleton code above, is the same as the page lifetime. One page request == one YourEntity object. So for multiple calls for the same object, it will load only once per page request. (Other caching mechanism could extend that). What happens when you are using NoTracking and try to load the same object multiple times? The database will be queried each time, so there is an impact there. How often do/should you call for the same object during a single page request? As little as possible of course, but it does happens. Also remember the piece above about having the associations connected automatically for your? You don't have that with NoTracking, so if you load your data in multiple batches, you will not have a link to between them: ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)(from dog in YourContext.DogSet select dog); oDogQuery.MergeOption = MergeOption.NoTracking; List<Dog> dogs = oDogQuery.ToList(); ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>)(from o in YourContext.PersonSet select o); oPersonQuery.MergeOption = MergeOption.NoTracking; List<Person> owners = oPersonQuery.ToList(); In this case, no dog will have its .Owner property set. Some things to keep in mind when you are trying to optimize the performance. No lazy loading, what am I to do? This can be seen as a blessing in disguise. Of course it is annoying to load everything manually. However, it decreases the number of calls to the db and forces you to think about when you should load data. The more you can load in one database call the better. That was always true, but it is enforced now with this 'feature' of EF. Of course, you can call if( !ObjectReference.IsLoaded ) ObjectReference.Load(); if you want to, but a better practice is to force the framework to load the objects you know you will need in one shot. This is where the discussion about parametrized Includes begins to make sense. Lets say you have you Dog object public class Dog { public Dog Get(int id) { return YourContext.DogSet.FirstOrDefault(it => it.ID == id ); } } This is the type of function you work with all the time. It gets called from all over the place and once you have that Dog object, you will do very different things to it in different functions. First, it should be pre-compiled, because you will call that very often. Second, each different pages will want to have access to a different subset of the Dog data. Some will want the Owner, some the FavoriteToy, etc. Of course, you could call Load() for each reference you need anytime you need one. But that will generate a call to the database each time. Bad idea. So instead, each page will ask for the data it wants to see when it first request for the Dog object: static public Dog Get(int id) { return GetDog(entity,"");} static public Dog Get(int id, string includePath) { string query = "select value o " + " from YourEntities.DogSet as o " +

    Read the article

  • A Look Inside JSR 360 - CLDC 8

    - by Roger Brinkley
    If you didn't notice during JavaOne the Java Micro Edition took a major step forward in its consolidation with Java Standard Edition when JSR 360 was proposed to the JCP community. Over the last couple of years there has been a focus to move Java ME back in line with it's big brother Java SE. We see evidence of this in JCP itself which just recently merged the ME and SE/EE Executive Committees into a single Java Executive Committee. But just before that occurred JSR 360 was proposed and approved for development on October 29. So let's take a look at what changes are now being proposed. In a way JSR 360 is returning back to the original roots of Java ME when it was first introduced. It was indeed a subset of the JDK 4 language, but as Java progressed many of the language changes were not implemented in the Java ME. Back then the tradeoff was still a functionality, footprint trade off but the major market was feature phones. Today the market has changed and CLDC, while it will still target feature phones, will have it primary emphasis on embedded devices like wireless modules, smart meters, health care monitoring and other M2M devices. The major changes will come in three areas: language feature changes, library changes, and consolidating the Generic Connection Framework.  There have been three Java SE versions that have been implemented since JavaME was first developed so the language feature changes can be divided into changes that came in JDK 5 and those in JDK 7, which mostly consist of the project Coin changes. There were no language changes in JDK 6 but the changes from JDK 5 are: Assertions - Assertions enable you to test your assumptions about your program. For example, if you write a method that calculates the speed of a particle, you might assert that the calculated speed is less than the speed of light. In the example code below if the interval isn't between 0 and and 1,00 the an error of "Invalid value?" would be thrown. private void setInterval(int interval) { assert interval > 0 && interval <= 1000 : "Invalid value?" } Generics - Generics add stability to your code by making more of your bugs detectable at compile time. Code that uses generics has many benefits over non-generic code with: Stronger type checks at compile time. Elimination of casts. Enabling programming to implement generic algorithms. Enhanced for Loop - the enhanced for loop allows you to iterate through a collection without having to create an Iterator or without having to calculate beginning and end conditions for a counter variable. The enhanced for loop is the easiest of the new features to immediately incorporate in your code. In this tip you will see how the enhanced for loop replaces more traditional ways of sequentially accessing elements in a collection. void processList(Vector<string> list) { for (String item : list) { ... Autoboxing/Unboxing - This facility eliminates the drudgery of manual conversion between primitive types, such as int and wrapper types, such as Integer.  Hashtable<Integer, string=""> data = new Hashtable<>(); void add(int id, String value) { data.put(id, value); } Enumeration - Prior to JDK 5 enumerations were not typesafe, had no namespace, were brittle because they were compile time constants, and provided no informative print values. JDK 5 added support for enumerated types as a full-fledged class (dubbed an enum type). In addition to solving all the problems mentioned above, it allows you to add arbitrary methods and fields to an enum type, to implement arbitrary interfaces, and more. Enum types provide high-quality implementations of all the Object methods. They are Comparable and Serializable, and the serial form is designed to withstand arbitrary changes in the enum type. enum Season {WINTER, SPRING, SUMMER, FALL}; } private Season season; void setSeason(Season newSeason) { season = newSeason; } Varargs - Varargs eliminates the need for manually boxing up argument lists into an array when invoking methods that accept variable-length argument lists. The three periods after the final parameter's type indicate that the final argument may be passed as an array or as a sequence of arguments. Varargs can be used only in the final argument position. void warning(String format, String... parameters) { .. for(String p : parameters) { ...process(p);... } ... } Static Imports -The static import construct allows unqualified access to static members without inheriting from the type containing the static members. Instead, the program imports the members either individually or en masse. Once the static members have been imported, they may be used without qualification. The static import declaration is analogous to the normal import declaration. Where the normal import declaration imports classes from packages, allowing them to be used without package qualification, the static import declaration imports static members from classes, allowing them to be used without class qualification. import static data.Constants.RATIO; ... double r = Math.cos(RATIO * theta); Annotations - Annotations provide data about a program that is not part of the program itself. They have no direct effect on the operation of the code they annotate. There are a number of uses for annotations including information for the compiler, compiler-time and deployment-time processing, and run-time processing. They can be applied to a program's declarations of classes, fields, methods, and other program elements. @Deprecated public void clear(); The language changes from JDK 7 are little more familiar as they are mostly the changes from Project Coin: String in switch - Hey it only took us 18 years but the String class can be used in the expression of a switch statement. Fortunately for us it won't take that long for JavaME to adopt it. switch (arg) { case "-data": ... case "-out": ... Binary integral literals and underscores in numeric literals - Largely for readability, the integral types (byte, short, int, and long) can also be expressed using the binary number system. and any number of underscore characters (_) can appear anywhere between digits in a numerical literal. byte flags = 0b01001111; long mask = 0xfff0_ff08_4fff_0fffl; Multi-catch and more precise rethrow - A single catch block can handle more than one type of exception. In addition, the compiler performs more precise analysis of rethrown exceptions than earlier releases of Java SE. This enables you to specify more specific exception types in the throws clause of a method declaration. catch (IOException | InterruptedException ex) { logger.log(ex); throw ex; } Type Inference for Generic Instance Creation - Otherwise known as the diamond operator, the type arguments required to invoke the constructor of a generic class can be replaced with an empty set of type parameters (<>) as long as the compiler can infer the type arguments from the context.  map = new Hashtable<>(); Try-with-resource statement - The try-with-resources statement is a try statement that declares one or more resources. A resource is an object that must be closed after the program is finished with it. The try-with-resources statement ensures that each resource is closed at the end of the statement.  try (DataInputStream is = new DataInputStream(...)) { return is.readDouble(); } Simplified varargs method invocation - The Java compiler generates a warning at the declaration site of a varargs method or constructor with a non-reifiable varargs formal parameter. Java SE 7 introduced a compiler option -Xlint:varargs and the annotations @SafeVarargs and @SuppressWarnings({"unchecked", "varargs"}) to supress these warnings. On the library side there are new features that will be added to satisfy the language requirements above and some to improve the currently available set of APIs.  The library changes include: Collections update - New Collection, List, Set and Map, Iterable and Iteratator as well as implementations including Hashtable and Vector. Most of the work is too support generics String - New StringBuilder and CharSequence as well as a Stirng formatter. The javac compiler  now uses the the StringBuilder instead of String Buffer. Since StringBuilder is synchronized there is a performance increase which has necessitated the wahat String constructor works. Comparable interface - The comparable interface works with Collections, making it easier to reuse. Try with resources - Closeable and AutoCloseable Annotations - While support for Annotations is provided it will only be a compile time support. SuppressWarnings, Deprecated, Override NIO - There is a subset of NIO Buffer that have been in use on the of the graphics packages and needs to be pulled in and also support for NIO File IO subset. Platform extensibility via Service Providers (ServiceLoader) - ServiceLoader interface dos late bindings of interface to existing implementations. It helpe to package an interface and behavior of the implementation at a later point in time.Provider classes must have a zero-argument constructor so that they can be instantiated during loading. They are located and instantiated on demand and are identified via a provider-configuration file in the METAINF/services resource directory. This is a mechansim from Java SE. import com.XYZ.ServiceA; ServiceLoader<ServiceA> sl1= new ServiceLoader(ServiceA.class); Resources: META-INF/services/com.XYZ.ServiceA: ServiceAProvider1 ServiceAProvider2 ServiceAProvider3 META-INF/services/ServiceB: ServiceBProvider1 ServiceBProvider2 From JSR - I would rather use this list I think The Generic Connection Framework (GCF) was previously specified in a number of different JSRs including CLDC, MIDP, CDC 1.2, and JSR 197. JSR 360 represents a rare opportunity to consolidated and reintegrate parts that were duplicated in other specifications into a single specification, upgrade the APIs as well provide new functionality. The proposal is to specify a combined GCF specification that can be used with Java ME or Java SE and be backwards compatible with previous implementations. Because of size limitations as well as the complexity of the some features like InvokeDynamic and Unicode 6 will not be included. Additionally, any language or library changes in JDK 8 will be not be included. On the upside, with all the changes being made, backwards compatibility will still be maintained. JSR 360 is a major step forward for Java ME in terms of platform modernization, language alignment, and embedded support. If you're interested in following the progress of this JSR see the JSR's java.net project for details of the email lists, discussions groups.

    Read the article

  • Making the most of next weeks SharePoint 2010 developer training

    - by Eric Nelson
    [you can still register if you are free on the afternoons of 9th to 11th – UK time] We have 50+ registrations with more coming in – which is fantastic. Please read on to make the most of the training. Background We have structured the training to make sure that you can still learn lots during the three days even if you do not have SharePoint 2010 installed. Additionally the course is based around a subset of the channel 9 training to allow you to easily dig deeper or look again at specific areas. Which means if you have zero time between now and next Wednesday then you are still good to go. But if you can do some pre-work you will likely get even more out of the three days. Step 1: Check out the topics and resources available on-demand The course is based around a subset of the channel 9 training to allow you to easily dig deeper or look again at specific areas. Take a lap around the SharePoint 2010 Training Course on Channel 9 Download the SharePoint Developer Training Kit Step 2: Use a pre-configured Virtual Machine which you can download (best start today – it is large!) Consider using the VM we created If you don't have access to SharePoint 2010. You will need a 64bit host OS and bare minimum of 4GB of RAM. 8GB recommended. Virtual PC can not be used with this VM – Virtual PC only supports 32bit guests. The 2010-7a Information Worker VM gives you everything you need to develop for SharePoint 2010. Watch the Video on how to use this VM Download the VM Remember you only need to download the “parts” for the 2010-7a VM. There are 3 subtly different ways of using this VM: Easiest is to follow the advice of the video and get yourself a host OS of Windows Server 2008 R2 with Hyper-V and simply use the VM Alternatively you can take the VHD and create a “Boot to VHD” if you have Windows 7 Ultimate or Enterprise Edition. This works really well – especially if you are already familiar with “Boot to VHD” (This post I did will help you get started) Or you can take the VHD and use an alternative VM tool such as VirtualBox if you have a different host OS. NB: This tends to involve some work to get everything running fine. Check out parts 1 to 3 from Rolly and if you go with Virtual Box use an IDE controller not SATA. SATA will blue screen. Note in the screenshot below I also converted the vhd to a vmdk. I used the FREE Starwind Converter to do this whilst I was fighting blue screens – not sure its necessary as VirtualBox does now work with VHDs. or Step 3 – Install SharePoint 2010 on a 64bit Windows 7 or Vista Host I haven’t tried this but it is now supported. Check out MSDN. Final notes: I am in the process of securing a number of hosted VMs for ISVs directly managed by my team. Your Architect Evangelist will have details once I have them! Else we can sort out on the Wed. Regrettably I am unable to give folks 1:1 support on any issues around Boot to VHD, 3rd party VM products etc. Related Links: Check you are fully plugged into the work of my team – have you done these simple steps including joining our new LinkedIn group?

    Read the article

  • Efficiently separating Read/Compute/Write steps for concurrent processing of entities in Entity/Component systems

    - by TravisG
    Setup I have an entity-component architecture where Entities can have a set of attributes (which are pure data with no behavior) and there exist systems that run the entity logic which act on that data. Essentially, in somewhat pseudo-code: Entity { id; map<id_type, Attribute> attributes; } System { update(); vector<Entity> entities; } A system that just moves along all entities at a constant rate might be MovementSystem extends System { update() { for each entity in entities position = entity.attributes["position"]; position += vec3(1,1,1); } } Essentially, I'm trying to parallelise update() as efficiently as possible. This can be done by running entire systems in parallel, or by giving each update() of one system a couple of components so different threads can execute the update of the same system, but for a different subset of entities registered with that system. Problem In reality, these systems sometimes require that entities interact(/read/write data from/to) each other, sometimes within the same system (e.g. an AI system that reads state from other entities surrounding the current processed entity), but sometimes between different systems that depend on each other (i.e. a movement system that requires data from a system that processes user input). Now, when trying to parallelize the update phases of entity/component systems, the phases in which data (components/attributes) from Entities are read and used to compute something, and the phase where the modified data is written back to entities need to be separated in order to avoid data races. Otherwise the only way (not taking into account just "critical section"ing everything) to avoid them is to serialize parts of the update process that depend on other parts. This seems ugly. To me it would seem more elegant to be able to (ideally) have all processing running in parallel, where a system may read data from all entities as it wishes, but doesn't write modifications to that data back until some later point. The fact that this is even possible is based on the assumption that modification write-backs are usually very small in complexity, and don't require much performance, whereas computations are very expensive (relatively). So the overhead added by a delayed-write phase might be evened out by more efficient updating of entities (by having threads work more % of the time instead of waiting). A concrete example of this might be a system that updates physics. The system needs to both read and write a lot of data to and from entities. Optimally, there would be a system in place where all available threads update a subset of all entities registered with the physics system. In the case of the physics system this isn't trivially possible because of race conditions. So without a workaround, we would have to find other systems to run in parallel (which don't modify the same data as the physics system), other wise the remaining threads are waiting and wasting time. However, that has disadvantages Practically, the L3 cache is pretty much always better utilized when updating a large system with multiple threads, as opposed to multiple systems at once, which all act on different sets of data. Finding and assembling other systems to run in parallel can be extremely time consuming to design well enough to optimize performance. Sometimes, it might even not be possible at all because a system just depends on data that is touched by all other systems. Solution? In my thinking, a possible solution would be a system where reading/updating and writing of data is separated, so that in one expensive phase, systems only read data and compute what they need to compute, and then in a separate, performance-wise cheap, write phase, attributes of entities that needed to be modified are finally written back to the entities. The Question How might such a system be implemented to achieve optimal performance, as well as making programmer life easier? What are the implementation details of such a system and what might have to be changed in the existing EC-architecture to accommodate this solution?

    Read the article

  • Should I expose IObservable<T> on my interfaces?

    - by Alex
    My colleague and I have dispute. We are writing a .NET application that processes massive amounts of data. It receives data elements, groups subsets of them into blocks according to some criterion and processes those blocks. Let's say we have data items of type Foo arriving some source (from the network, for example) one by one. We wish to gather subsets of related objects of type Foo, construct an object of type Bar from each such subset and process objects of type Bar. One of us suggested the following design. Its main theme is exposing IObservable objects directly from the interfaces of our components. // ********* Interfaces ********** interface IFooSource { // this is the event-stream of objects of type Foo IObservable<Foo> FooArrivals { get; } } interface IBarSource { // this is the event-stream of objects of type Bar IObservable<Bar> BarArrivals { get; } } / ********* Implementations ********* class FooSource : IFooSource { // Here we put logic that receives Foo objects from the network and publishes them to the FooArrivals event stream. } class FooSubsetsToBarConverter : IBarSource { IFooSource fooSource; IObservable<Bar> BarArrivals { get { // Do some fancy Rx operators on fooSource.FooArrivals, like Buffer, Window, Join and others and return IObservable<Bar> } } } // this class will subscribe to the bar source and do processing class BarsProcessor { BarsProcessor(IBarSource barSource); void Subscribe(); } // ******************* Main ************************ class Program { public static void Main(string[] args) { var fooSource = FooSourceFactory.Create(); var barsProcessor = BarsProcessorFactory.Create(fooSource) // this will create FooSubsetToBarConverter and BarsProcessor barsProcessor.Subscribe(); fooSource.Run(); // this enters a loop of listening for Foo objects from the network and notifying about their arrival. } } The other suggested another design that its main theme is using our own publisher/subscriber interfaces and using Rx inside the implementations only when needed. //********** interfaces ********* interface IPublisher<T> { void Subscribe(ISubscriber<T> subscriber); } interface ISubscriber<T> { Action<T> Callback { get; } } //********** implementations ********* class FooSource : IPublisher<Foo> { public void Subscribe(ISubscriber<Foo> subscriber) { /* ... */ } // here we put logic that receives Foo objects from some source (the network?) publishes them to the registered subscribers } class FooSubsetsToBarConverter : ISubscriber<Foo>, IPublisher<Bar> { void Callback(Foo foo) { // here we put logic that aggregates Foo objects and publishes Bars when we have received a subset of Foos that match our criteria // maybe we use Rx here internally. } public void Subscribe(ISubscriber<Bar> subscriber) { /* ... */ } } class BarsProcessor : ISubscriber<Bar> { void Callback(Bar bar) { // here we put code that processes Bar objects } } //********** program ********* class Program { public static void Main(string[] args) { var fooSource = fooSourceFactory.Create(); var barsProcessor = barsProcessorFactory.Create(fooSource) // this will create BarsProcessor and perform all the necessary subscriptions fooSource.Run(); // this enters a loop of listening for Foo objects from the network and notifying about their arrival. } } Which one do you think is better? Exposing IObservable and making our components create new event streams from Rx operators, or defining our own publisher/subscriber interfaces and using Rx internally if needed? Here are some things to consider about the designs: In the first design the consumer of our interfaces has the whole power of Rx at his/her fingertips and can perform any Rx operators. One of us claims this is an advantage and the other claims that this is a drawback. The second design allows us to use any publisher/subscriber architecture under the hood. The first design ties us to Rx. If we wish to use the power of Rx, it requires more work in the second design because we need to translate the custom publisher/subscriber implementation to Rx and back. It requires writing glue code for every class that wishes to do some event processing.

    Read the article

  • Interpolation and Morphing of an image in labview and/or openCV

    - by Marc
    I am working on an image manipulation problem. I have an overhead projector that projects onto a screen, and I have a camera that takes pictures of that. I can establish a 1:1 correspondence between a subset of projector coordinates and a subset of camera pixels by projecting dots on the screen and finding the centers of mass of the resulting regions on the camera. I thus have a map proj_x, proj_y <-- cam_x, cam_y for scattered point pairs My original plan was to regularize this map using the Mathscript function griddata. This would work fine in MATLAB, as follows [pgridx, pgridy] = meshgrid(allprojxpts, allprojypts) fitcx = griddata (proj_x, proj_y, cam_x, pgridx, pgridy); fitcy = griddata (proj_x, proj_y, cam_y, pgridx, pgridy); and the reverse for the camera to projector mapping Unfortunately, this code causes Labview to run out of memory on the meshgrid step (the camera is 5 megapixels, which apparently is too much for labview to handle) I then started looking through openCV, and found the cvRemap function. Unfortunately, this function takes as its starting point a regularized pixel-pixel map like the one I was trying to generate above. However, it made me hope that functions for creating such a map might be available in openCV. I couldn't find it in the openCV 1.0 API (I am stuck with 1.0 for legacy reasons), but I was hoping it's there or that someone has an easy trick. So my question is one of the following 1) How can I interpolate from scattered points to a grid in openCV; (i.e., given z = f(x,y) for scattered values of x and y, how to fill an image with f(im_x, im_y) ? 2) How can I perform an image transform that maps image 1 to image 2, given that I know a scattered mapping of points in coordinate system 1 to coordinate system 2. This could be implemented either in Labview or OpenCV. Note: I am tagging this post delaunay, because that's one method of doing a scattered interpolation, but the better tag would be "scattered interpolation"

    Read the article

  • What is the correct way to implement a massive hierarchical, geographical search for news?

    - by Philip Brocoum
    The company I work for is in the business of sending press releases. We want to make it possible for interested parties to search for press releases based on a number of criteria, the most important being location. For example, someone might search for all news sent to New York City, Massachusetts, or ZIP code 89134, sent from a governmental institution, under the topic of "traffic". Or whatever. The problem is, we've sent, literally, hundreds of thousands of press releases. Searching is slow and complex. For example, a press release sent to Queens, NY should show up in the search I mentioned above even though it wasn't specifically sent to New York City, because Queens is a subset of New York City. We may also want to implement "and" and "or" and negation and text search to the query to create complex searches. These searches also have to be fast enough to function as dynamic RSS feeds. I really don't know anything about search theory, or how it's properly done. The way we are getting by right now is using a data mart to store the locations the releases were sent to in a single table. However, because of the subset thing mentioned above, the data mart is gigantic with millions of rows. And we haven't even implemented cities yet, and there are about 50,000 cities in the United States, which will exponentially increase the size of the data mart by so much I'm afraid it just won't work anymore. Anyway, I realize this is not a simple question and there won't be a "do this" answer. However, I'm hoping one of you can point me in the right direction where I can learn about how massive searches are done? Because I really know nothing about it. And such a search engine is turning out to be incredibly difficult to make. Thanks! I know there must be a way because if Google can search the entire internet we must be able to search our own database :-)

    Read the article

< Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15  | Next Page >