Search Results

Search found 1155 results on 47 pages for 'relational algebra'.

Page 16/47 | < Previous Page | 12 13 14 15 16 17 18 19 20 21 22 23  | Next Page >

  • How to Set Customer Table with Multiple Phone Numbers? - Relational Database Design

    - by user311509
    CREATE TABLE Phone ( phoneID - PK . . . ); CREATE TABLE PhoneDetail ( phoneDetailID - PK phoneID - FK points to Phone phoneTypeID ... phoneNumber ... . . . ); CREATE TABLE Customer ( customerID - PK firstName phoneID - Unique FK points to Phone . . . ); A customer can have multiple phone numbers e.g. Cell, Work, etc. phoneID in Customer table is unique and points to PhoneID in Phone table. If customer record is deleted, phoneID in Phone table should also be deleted. Do you have any concerns on my design? Is this designed properly? My problem is phoneID in Customer table is a child and if child record is deleted then i can not delete the parent (Phone) record automatically.

    Read the article

  • What is the best way to archive data in a relational database?

    - by GenericTypeTea
    I have a bit of an issue with a particular aspect of a program I'm working on. I need the ability to archive (fix) a table so that a change anywhere in the system will not affect the results it returns. This is the basic structure of what I need to fix: Recipe --> Recipe (as sub recipe) Recipe --> Ingredients So, if I fix a Recipe, I need to ensure all the sub recipes (including all the sub recipes sub recipes and so forth) are fixed and all its ingredients are fixed. The problem is that the sub recipe and ingredients still need to be modifiable as they are used by other recipes that are not fixed. I came up with a solution whereby I serialize (with protobuf-net) a master object that deals with the recipe and all the sub recipes and ingredients and save the archive data to a table like follows: Archive{ ReferenceId, (i.e. RecipeId) ReferenceTypeId, (i.e. Recipe) ArchiveData varbinary(max) } Now, this works great and is almost perfect... however I totally forgot (I'd love to blame the agile development mentally, however this was just short sighted) that this information needs to be reported on. As far as I'm aware I can't think how I could inflate the serialized data back into my Recipe Object and use it in a Report. I'm using the standard SQL 2005 report services at the moment. Alternatively, I guess I could do the following: Duplicate every table and tag the word "Archive" on the end of the table name. This would then give me an area of specific archive data... but ignoring my simplified example, there'd actually be about 15 tables duplicated. Add a nullable, non-foreign key property called "CopiedFromId" to every table that contains fixed data and duplicate every record that the recipe (and all it's sub recipes and all their sub recipes) touches. Create some sort of denormalised structure that could be restored from at a later date to the original, unfixed recipe. Although I think this would be like option 1 and involve a lot of extra tables. Anyway, I'm at a total loss and do not like any of the ideas particularly. Can anyone please advise the best course of action? EDIT: Or 4) Create tables specific to what the report requires and populate them with the data when the user clicks the report button? This would cause about 4 extra tables for the report in question.

    Read the article

  • What resources will help me understand the fundamentals of Relational Database Systems.

    - by Rachel
    This are few of the fundamental database questions which has always given me trouble. I have tried using google and wiki but I somehow I miss out on understanding the functionality rather than terminology. If possible would really appreciate if someone can share more insights on this questions using some visual representative examples. What is a key? A candidate key? A primary key? An alternate key? A foreign key? What is an index and how does it help your database? What are the data types available and when to use which ones?

    Read the article

  • Any tips of how to handle hierarchical trees in relational model?

    - by George
    Hello all. I have a tree structure that can be n-levels deep, without restriction. That means that each node can have another n nodes. What is the best way to retrieve a tree like that without issuing thousands of queries to the database? I looked at a few other models, like flat table model, Preorder Tree Traversal Algorithm, and so. Do you guys have any tips or suggestions of how to implement a efficient tree model? My objective in the real end is to have one or two queries that would spit the whole tree for me. With enough processing i can display the tree in dot net, but that would be in client machine, so, not much of a big deal. Thanks for the attention

    Read the article

  • What are the Options for Storing Hierarchical Data in a Relational Database?

    - by orangepips
    Good Overviews One more Nested Intervals vs. Adjacency List comparison: the best comparison of Adjacency List, Materialized Path, Nested Set and Nested Interval I've found. Models for hierarchical data: slides with good explanations of tradeoffs and example usage Representing hierarchies in MySQL: very good overview of Nested Set in particular Hierarchical data in RDBMSs: most comprehensive and well organized set of links I've seen, but not much in the way on explanation Options Ones I am aware of and general features: Adjacency List: Columns: ID, ParentID Easy to implement. Cheap node moves, inserts, and deletes. Expensive to find level (can store as a computed column), ancestry & descendants (Bridge Hierarchy combined with level column can solve), path (Lineage Column can solve). Use Common Table Expressions in those databases that support them to traverse. Nested Set (a.k.a Modified Preorder Tree Traversal) First described by Joe Celko - covered in depth in his book Trees and Hierarchies in SQL for Smarties Columns: Left, Right Cheap level, ancestry, descendants Compared to Adjacency List, moves, inserts, deletes more expensive. Requires a specific sort order (e.g. created). So sorting all descendants in a different order requires additional work. Nested Intervals Combination of Nested Sets and Materialized Path where left/right columns are floating point decimals instead of integers and encode the path information. Bridge Table (a.k.a. Closure Table: some good ideas about how to use triggers for maintaining this approach) Columns: ancestor, descendant Stands apart from table it describes. Can include some nodes in more than one hierarchy. Cheap ancestry and descendants (albeit not in what order) For complete knowledge of a hierarchy needs to be combined with another option. Flat Table A modification of the Adjacency List that adds a Level and Rank (e.g. ordering) column to each record. Expensive move and delete Cheap ancestry and descendants Good Use: threaded discussion - forums / blog comments Lineage Column (a.k.a. Materialized Path, Path Enumeration) Column: lineage (e.g. /parent/child/grandchild/etc...) Limit to how deep the hierarchy can be. Descendants cheap (e.g. LEFT(lineage, #) = '/enumerated/path') Ancestry tricky (database specific queries) Database Specific Notes MySQL Use session variables for Adjacency List Oracle Use CONNECT BY to traverse Adjacency Lists PostgreSQL ltree datatype for Materialized Path SQL Server General summary 2008 offers HierarchyId data type appears to help with Lineage Column approach and expand the depth that can be represented.

    Read the article

  • What resources will help me understand the fundamentals of Relational Database Design.

    - by Rachel
    This are few of the fundamental database questions which has always given me trouble. I have tried using google and wiki but I somehow I miss out on understanding the functionality rather than terminology. If possible would really appreciate if someone can share more insights on this questions using some visual representative examples. What is a key? A candidate key? A primary key? An alternate key? A foreign key? What is an index and how does it help your database? What are the data types available and when to use which ones?

    Read the article

  • How can I create a small relational database in MySQL?

    - by Sergio Tapia
    I need to make a small database in MySQL that has two tables. Clothes and ClotheType Clothes has: ID, Name, Color, Brand, Price, ClotheTypeID ClotheType has: ID, Description In Microsoft SQL it would be: create table Clothes( id int, primary key(id), name varchar(200), color varchar(200), brand varchar(200), price varchar(200), clothetypeid int, foreign key clothetypeid references ClotheType(id) ) I need this in MySQL.

    Read the article

  • Is it better to use a relational database or document-based database for an app like Wufoo?

    - by mboyle
    I'm working on an application that's similar to Wufoo in that it allows our users to create their own databases and collect/present records with auto generated forms and views. Since every user is creating a different schema (one user might have a database of their baseball card collection, another might have a database of their recipes) our current approach is using MySQL to create separate databases for every user with its own tables. So in other words, the databases our MySQL server contains look like: main-web-app-db (our web app containing tables for users account info, billing, etc) user_1_db (baseball_cards_table) user_2_db (recipes_table) .... And so on. If a user wants to set up a new database to keep track of their DVD collection, we'd do a "create database ..." with "create table ...". If they enter some data in and then decide they want to change a column we'd do an "alter table ....". Now, the further along I get with building this out the more it seems like MySQL is poorly suited to handling this. 1) My first concern is that switching databases every request, first to our main app's database for authentication etc, and then to the user's personal database, is going to be inefficient. 2) The second concern I have is that there's going to be a limit to the number of databases a single MySQL server can host. Pretending for a moment this application had 500,000 user databases, is MySQL designed to operate this way? What if it were a million, or more? 3) Lastly, is this method going to be a nightmare to support and scale? I've never heard of MySQL being used in this way so I do worry about how this affects things like replication and other methods of scaling. To me, it seems like MySQL wasn't built to be used in this way but what do I know. I've been looking at document-based databases like MongoDB, CouchDB, and Redis as alternatives because it seems like a schema-less approach to this particular problem makes a lot of sense. Can anyone offer some advice on this?

    Read the article

  • What's The Best Object-Relational Mapping Tool For .NET?

    - by Icono123
    I've worked on a few Java web projects and we've always used Hibernate for our data object layer. I haven't worked on a large scale ASP.NET site and I'm unsure which solution to choose. I'm tempted to try NHibernate, but I don't like the fact that they use so many third party libraries. I found this list on Wikipedia of available ORM software: http://en.wikipedia.org/wiki/List_of_object-relational_mapping_software#.NET What ORM have you used? Was it easy to use? Would you recommend using it again? Was it used on a small, medium, or large project? Would you write your own? Thanks.

    Read the article

  • How to map combinations of things to a relational database?

    - by Space_C0wb0y
    I have a table whose records represent certain objects. For the sake of simplicity I am going to assume that the table only has one row, and that is the unique ObjectId. Now I need a way to store combinations of objects from that table. The combinations have to be unique, but can be of arbitrary length. For example, if I have the ObjectIds 1,2,3,4 I want to store the following combinations: {1,2}, {1,3,4}, {2,4}, {1,2,3,4} The ordering is not necessary. My current implementation is to have a table Combinations that maps ObjectIds to CombinationIds. So every combination receives a unique Id: ObjectId | CombinationId ------------------------ 1 | 1 2 | 1 1 | 2 3 | 2 4 | 2 This is the mapping for the first two combinations of the example above. The problem is, that the query for finding the CombinationId of a specific Combination seems to be very complex. The two main usage scenarios for this table will be to iterate over all combinations, and the retrieve a specific combination. The table will be created once and never be updated. I am using SQLite through JDBC. Is there any simpler way or a best practice to implement such a mapping?

    Read the article

  • Are there any less costly alternatives to Amazon's Relational Database Services (RDS)?

    - by swapnonil
    Hi All, I have the following requirement. I have with me a database containing the contact and address details of at least 2000 members of my school alumni organization. We want to store all that information in a relation model so that This data can be created and edited on demand. This data is always backed up and should be simple to restore in case the master copy becomes unusable. All sensitive personal information residing in this database is guaranteed to be available only to authorized users. This database won't be online in the first 6 months. It will become online only after a website is built on top of it. I am not a DBA and I don't want to spend time doing things like backups. I thought Amazon's RDS with it's automatic backup facility was the perfect solution for our needs. The only problem is that being a voluntary organization we cannot spare the monthly $100 to $150 fees this service demands. So my question is, are there any less costlier alternatives to Amazon's RDS?

    Read the article

  • Are there any less costlier alternatives to Amazon's Relational Database Services (RDS)?

    - by swapnonil
    Hi All, I have the following requirement. I have with me a database containing the contact and address details of at least 2000 members of my school alumni organization. We want to store all that information in a relation model so that This data can be created and edited on demand. This data is always backed up and should be simple to restore in case the master copy becomes unusable. All sensitive personal information residing in this database is guaranteed to be available only to authorized users. This database won't be online in the first 6 months. It will become online only after a website is built on top of it. I am not a DBA and I don't want to spend time doing things like backups. I thought Amazon's RDS with it's automatic backup facility was the perfect solution for our needs. The only problem is that being a voluntary organization we cannot spare the monthly $100 to $150 fees this service demands. So my question is, are there any less costlier alternatives to Amazon's RDS?

    Read the article

  • Any tips of how to handle hierarchial trees in relational model?

    - by George
    Hello all. I have a tree structure that can be n-levels deep, without restriction. That means that each node can have another n nodes. What is the best way to retrieve a tree like that without issuing thousands of queries to the database? I looked at a few other models, like flat table model, Preorder Tree Traversal Algorithm, and so. Do you guys have any tips or suggestions of how to implement a efficient tree model? My objective in the real end is to have one or two queries that would spit the whole tree for me. With enough processing i can display the tree in dot net, but that would be in client machine, so, not much of a big deal. Thanks for the attention

    Read the article

  • Which relational databases exist with a public API for a high level language?

    - by Jens Schauder
    We typically interface with a RDBMS through SQL. I.e. we create a sql string and send it to the server through JDBC or ODBC or something similar. Are there any RDBMS that allow direct interfacing with the database engine through some API in Java, C#, C or similar? I would expect an API that allows constructs like this (in some arbitrary pseudo code): Iterator iter = engine.getIndex("myIndex").getReferencesForValue("23"); for (Reference ref: iter){ Row row = engine.getTable("mytable").getRow(ref); } I guess something like this is hidden somewhere in (and available from) open source databases, but I am looking for something that is officially supported as a public API, so one finds at least a note in the release notes, when it changes. In order to make this a question that actually has a 'best' answer: I prefer languages in the order given above and I will prefer mature APIs over prototypes and research work, although these are welcome as well.

    Read the article

  • NoSQL as file meta database

    - by fga
    I am trying to implement a virtual file system structure in front of an object storage (Openstack). For availability reasons we initially chose Cassandra, however while designing file system data model, it looked like a tree structure similar to a relational model. Here is the dilemma for availability and partition tolerance we need NoSQL, but our data model is relational. The intended file system must be able to handle filtered search based on date, name etc. as fast as possible. So what path should i take? Stick to relational with some indexing mechanism backed by 3 rd tools like Apache Solr or dig deeper into NoSQL and find a suitable model and database satisfying the model? P.S: Currently from NoSQL Cassandra or MongoDB are choices proposed by my colleagues.

    Read the article

  • Should certain math classes be required for a Computer Science degree?

    - by sunpech
    For a Computer Science (CS) degree at many colleges and universities, certain math courses are required: Calculus, Linear Algebra, and Discrete Mathematics are few examples. However, since I've started working in the real world as a software developer, I have yet to truly use some the knowledge I had at once acquired from taking those classes. Discrete Math might be the only exception. My questions: Should these math classes be required to obtain a computer science degree? Or would they be better served as electives? I'm challenging even that the certain math classes even help with required CS classes. For example, I never used linear algebra outside of the math class itself. I hear it's used in Computer Graphics, but I never took those classes-- yet linear algebra was required for a CS degree. I personally think it could be better served as an elective rather than requirement because it's more specific to a branch of CS rather than general CS. From a Slashdot post CS Profs Debate Role of Math In CS Education: 'For too long, we have taught computer science as an academic discipline (as though all of our students will go on to get PhDs and then become CS faculty members) even though for most of us, our students are overwhelmingly seeking careers in which they apply computer science.'

    Read the article

  • Big Data – Basics of Big Data Architecture – Day 4 of 21

    - by Pinal Dave
    In yesterday’s blog post we understood how Big Data evolution happened. Today we will understand basics of the Big Data Architecture. Big Data Cycle Just like every other database related applications, bit data project have its development cycle. Though three Vs (link) for sure plays an important role in deciding the architecture of the Big Data projects. Just like every other project Big Data project also goes to similar phases of the data capturing, transforming, integrating, analyzing and building actionable reporting on the top of  the data. While the process looks almost same but due to the nature of the data the architecture is often totally different. Here are few of the question which everyone should ask before going ahead with Big Data architecture. Questions to Ask How big is your total database? What is your requirement of the reporting in terms of time – real time, semi real time or at frequent interval? How important is the data availability and what is the plan for disaster recovery? What are the plans for network and physical security of the data? What platform will be the driving force behind data and what are different service level agreements for the infrastructure? This are just basic questions but based on your application and business need you should come up with the custom list of the question to ask. As I mentioned earlier this question may look quite simple but the answer will not be simple. When we are talking about Big Data implementation there are many other important aspects which we have to consider when we decide to go for the architecture. Building Blocks of Big Data Architecture It is absolutely impossible to discuss and nail down the most optimal architecture for any Big Data Solution in a single blog post, however, we can discuss the basic building blocks of big data architecture. Here is the image which I have built to explain how the building blocks of the Big Data architecture works. Above image gives good overview of how in Big Data Architecture various components are associated with each other. In Big Data various different data sources are part of the architecture hence extract, transform and integration are one of the most essential layers of the architecture. Most of the data is stored in relational as well as non relational data marts and data warehousing solutions. As per the business need various data are processed as well converted to proper reports and visualizations for end users. Just like software the hardware is almost the most important part of the Big Data Architecture. In the big data architecture hardware infrastructure is extremely important and failure over instances as well as redundant physical infrastructure is usually implemented. NoSQL in Data Management NoSQL is a very famous buzz word and it really means Not Relational SQL or Not Only SQL. This is because in Big Data Architecture the data is in any format. It can be unstructured, relational or in any other format or from any other data source. To bring all the data together relational technology is not enough, hence new tools, architecture and other algorithms are invented which takes care of all the kind of data. This is collectively called NoSQL. Tomorrow Next four days we will answer the Buzz Words – Hadoop. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Don&rsquo;t Forget! In-Memory Databases are Hot

    - by andrewbrust
    If you’re left scratching your head over SAP’s intention to acquire Sybase for almost $6 million, you’re not alone.  Despite Sybase’s 1990s reign as the supreme database standard in certain sectors (including Wall Street), the company’s flagship product has certainly fallen from grace.  Why would SAP pay a greater than 50% premium over Sybase’s closing price on the day of the announcement just to acquire a relational database which is firmly stuck in maintenance mode? Well there’s more to Sybase than the relational database product.  Take, for example, its mobile application platform.  It hit Gartner’s “Leaders’ Quadrant” in January of last year, and SAP needs a good mobile play.  Beyond the platform itself, Sybase has a slew of mobile services; click this link to look them over. There’s a second major asset that Sybase has though, and I wonder if it figured prominently into SAP’s bid: Sybase IQ.  Sybase IQ is a columnar database.  Columnar databases place values from a given database column contiguously, unlike conventional relational databases, which store all of a row’s data in close proximity.  Storing column values together works well in aggregation reporting scenarios, because the figures to be aggregated can be scanned in one efficient step.  It also makes for high rates of compression because values from a single column tend to be close to each other in magnitude and may contain long sequences of repeating values.  Highly compressible databases use much less disk storage and can be largely or wholly loaded into memory, resulting in lighting fast query performance.  For an ERP company like SAP, with its own legacy BI platform (SAP BW) and the entire range of Business Objects and Crystal Reports BI products (which it acquired in 2007) query performance is extremely important. And it’s a competitive necessity too.  QlikTech has built an entire company on a columnar, in-memory BI product (QlikView).  So too has startup company Vertica.  IBM’s TM1 product has been doing in-memory OLAP for years.  And guess who else has the in-memory religion?  Microsoft does, in the form of its new PowerPivot product.  I expect the technology in PowerPivot to become strategic to the full-blown SQL Server Analysis Services product and the entire Microsoft BI stack.  I sure don’t blame SAP for jumping on the in-memory bandwagon, if indeed the Sybase acquisition is, at least in part, motivated by that. It will be interesting to watch and see what SAP does with Sybase’s product line-up (assuming the acquisition closes), including the core database, the mobile platform, IQ, and even tools like PowerBuilder.  It is also fascinating to watch columnar’s encroachment on relational.  Perhaps this acquisition will be columnar’s tipping point and people will no longer see it as a fad.  Are you listening Larry Ellison?

    Read the article

  • LLBLGen Pro v3.1 released!

    - by FransBouma
    Yesterday we released LLBLGen Pro v3.1! Version 3.1 comes with new features and enhancements, which I'll describe briefly below. v3.1 is a free upgrade for v3.x licensees. What's new / changed? Designer Extensible Import system. An extensible import system has been added to the designer to import project data from external sources. Importers are plug-ins which import project meta-data (like entity definitions, mappings and relational model data) from an external source into the loaded project. In v3.1, an importer plug-in for importing project elements from existing LLBLGen Pro v3.x project files has been included. You can use this importer to create source projects from which you import parts of models to build your actual project with. Model-only relationships. In v3.1, relationships of the type 1:1, m:1 and 1:n can be marked as model-only. A model-only relationship isn't required to have a backing foreign key constraint in the relational model data. They're ideal for projects which have to work with relational databases where changes can't always be made or some relationships can't be added to (e.g. the ones which are important for the entity model, but are not allowed to be added to the relational model for some reason). Custom field ordering. Although fields in an entity definition don't really have an ordering, it can be important for some situations to have the entity fields in a given order, e.g. when you use compound primary keys. Field ordering can be defined using a pop-up dialog which can be opened through various ways, e.g. inside the project explorer, model view and entity editor. It can also be set automatically during refreshes based on new settings. Command line relational model data refresher tool, CliRefresher.exe. The command line refresh tool shipped with v2.6 is now available for v3.1 as well Navigation enhancements in various designer elements. It's now easier to find elements like entities, typed views etc. in the project explorer from editors, to navigate to related entities in the project explorer by right clicking a relationship, navigate to the super-type in the project explorer when right-clicking an entity and navigate to the sub-type in the project explorer when right-clicking a sub-type node in the project explorer. Minor visual enhancements / tweaks LLBLGen Pro Runtime Framework Entity creation is now up to 30% faster and takes 5% less memory. Creating an entity object has been optimized further by tweaks inside the framework to make instantiating an entity object up to 30% faster. It now also takes up to 5% less memory than in v3.0 Prefetch Path node merging is now up to 20-25% faster. Setting entity references required the creation of a new relationship object. As this relationship object is always used internally it could be cached (as it's used for syncing only). This increases performance by 20-25% in the merging functionality. Entity fetches are now up to 20% faster. A large number of tweaks have been applied to make entity fetches up to 20% faster than in v3.0. Full WCF RIA support. It's now possible to use your LLBLGen Pro runtime framework powered domain layer in a WCF RIA application using the VS.NET tools for WCF RIA services. WCF RIA services is a Microsoft technology for .NET 4 and typically used within silverlight applications. SQL Server DQE compatibility level is now per instance. (Usable in Adapter). It's now possible to set the compatibility level of the SQL Server Dynamic Query Engine (DQE) per instance of the DQE instead of the global setting it was before. The global setting is still available and is used as the default value for the compatibility level per-instance. You can use this to switch between CE Desktop and normal SQL Server compatibility per DataAccessAdapter instance. Support for COUNT_BIG aggregate function (SQL Server specific). The aggregate function COUNT_BIG has been added to the list of available aggregate functions to be used in the framework. Minor changes / tweaks I'm especially pleased with the import system, as that makes working with entity models a lot easier. The import system lets you import from another LLBLGen Pro v3 project any entity definition, mapping and / or meta-data like table definitions. This way you can build repository projects where you store model fragments, e.g. the building blocks for a customer-order system, a user credential model etc., any model you can think of. In most projects, you'll recognize that some parts of your new model look familiar. In these cases it would have been easier if you would have been able to import these parts from projects you had pre-created. With LLBLGen Pro v3.1 you can. For example, say you have an Oracle schema called CRM which contains the bread 'n' butter customer-order-product kind of model. You create an entity model from that schema and save it in a project file. Now you start working on another project for another customer and you have to use SQL Server. You also start using model-first development, so develop the entity model from scratch as there's no existing database. As this customer also requires some CRM like entity model, you import the entities from your saved Oracle project into this new SQL Server targeting project. Because you don't work with Oracle this time, you don't import the relational meta-data, just the entities, their relationships and possibly their inheritance hierarchies, if any. As they're now entities in your project you can change them a bit to match the new customer's requirements. This can save you a lot of time, because you can re-use pre-fab model fragments for new projects. In the example above there are no tables yet (as you work model first) so using the forward mapping capabilities of LLBLGen Pro v3 creates the tables, PK constraints, Unique Constraints and FK constraints for you. This way you can build a nice repository of model fragments which you can re-use in new projects.

    Read the article

  • SQL SERVER – Guest Post – Architecting Data Warehouse – Niraj Bhatt

    - by pinaldave
    Niraj Bhatt works as an Enterprise Architect for a Fortune 500 company and has an innate passion for building / studying software systems. He is a top rated speaker at various technical forums including Tech·Ed, MCT Summit, Developer Summit, and Virtual Tech Days, among others. Having run a successful startup for four years Niraj enjoys working on – IT innovations that can impact an enterprise bottom line, streamlining IT budgets through IT consolidation, architecture and integration of systems, performance tuning, and review of enterprise applications. He has received Microsoft MVP award for ASP.NET, Connected Systems and most recently on Windows Azure. When he is away from his laptop, you will find him taking deep dives in automobiles, pottery, rafting, photography, cooking and financial statements though not necessarily in that order. He is also a manager/speaker at BDOTNET, Asia’s largest .NET user group. Here is the guest post by Niraj Bhatt. As data in your applications grows it’s the database that usually becomes a bottleneck. It’s hard to scale a relational DB and the preferred approach for large scale applications is to create separate databases for writes and reads. These databases are referred as transactional database and reporting database. Though there are tools / techniques which can allow you to create snapshot of your transactional database for reporting purpose, sometimes they don’t quite fit the reporting requirements of an enterprise. These requirements typically are data analytics, effective schema (for an Information worker to self-service herself), historical data, better performance (flat data, no joins) etc. This is where a need for data warehouse or an OLAP system arises. A Key point to remember is a data warehouse is mostly a relational database. It’s built on top of same concepts like Tables, Rows, Columns, Primary keys, Foreign Keys, etc. Before we talk about how data warehouses are typically structured let’s understand key components that can create a data flow between OLTP systems and OLAP systems. There are 3 major areas to it: a) OLTP system should be capable of tracking its changes as all these changes should go back to data warehouse for historical recording. For e.g. if an OLTP transaction moves a customer from silver to gold category, OLTP system needs to ensure that this change is tracked and send to data warehouse for reporting purpose. A report in context could be how many customers divided by geographies moved from sliver to gold category. In data warehouse terminology this process is called Change Data Capture. There are quite a few systems that leverage database triggers to move these changes to corresponding tracking tables. There are also out of box features provided by some databases e.g. SQL Server 2008 offers Change Data Capture and Change Tracking for addressing such requirements. b) After we make the OLTP system capable of tracking its changes we need to provision a batch process that can run periodically and takes these changes from OLTP system and dump them into data warehouse. There are many tools out there that can help you fill this gap – SQL Server Integration Services happens to be one of them. c) So we have an OLTP system that knows how to track its changes, we have jobs that run periodically to move these changes to warehouse. The question though remains is how warehouse will record these changes? This structural change in data warehouse arena is often covered under something called Slowly Changing Dimension (SCD). While we will talk about dimensions in a while, SCD can be applied to pure relational tables too. SCD enables a database structure to capture historical data. This would create multiple records for a given entity in relational database and data warehouses prefer having their own primary key, often known as surrogate key. As I mentioned a data warehouse is just a relational database but industry often attributes a specific schema style to data warehouses. These styles are Star Schema or Snowflake Schema. The motivation behind these styles is to create a flat database structure (as opposed to normalized one), which is easy to understand / use, easy to query and easy to slice / dice. Star schema is a database structure made up of dimensions and facts. Facts are generally the numbers (sales, quantity, etc.) that you want to slice and dice. Fact tables have these numbers and have references (foreign keys) to set of tables that provide context around those facts. E.g. if you have recorded 10,000 USD as sales that number would go in a sales fact table and could have foreign keys attached to it that refers to the sales agent responsible for sale and to time table which contains the dates between which that sale was made. These agent and time tables are called dimensions which provide context to the numbers stored in fact tables. This schema structure of fact being at center surrounded by dimensions is called Star schema. A similar structure with difference of dimension tables being normalized is called a Snowflake schema. This relational structure of facts and dimensions serves as an input for another analysis structure called Cube. Though physically Cube is a special structure supported by commercial databases like SQL Server Analysis Services, logically it’s a multidimensional structure where dimensions define the sides of cube and facts define the content. Facts are often called as Measures inside a cube. Dimensions often tend to form a hierarchy. E.g. Product may be broken into categories and categories in turn to individual items. Category and Items are often referred as Levels and their constituents as Members with their overall structure called as Hierarchy. Measures are rolled up as per dimensional hierarchy. These rolled up measures are called Aggregates. Now this may seem like an overwhelming vocabulary to deal with but don’t worry it will sink in as you start working with Cubes and others. Let’s see few other terms that we would run into while talking about data warehouses. ODS or an Operational Data Store is a frequently misused term. There would be few users in your organization that want to report on most current data and can’t afford to miss a single transaction for their report. Then there is another set of users that typically don’t care how current the data is. Mostly senior level executives who are interesting in trending, mining, forecasting, strategizing, etc. don’t care for that one specific transaction. This is where an ODS can come in handy. ODS can use the same star schema and the OLAP cubes we saw earlier. The only difference is that the data inside an ODS would be short lived, i.e. for few months and ODS would sync with OLTP system every few minutes. Data warehouse can periodically sync with ODS either daily or weekly depending on business drivers. Data marts are another frequently talked about topic in data warehousing. They are subject-specific data warehouse. Data warehouses that try to span over an enterprise are normally too big to scope, build, manage, track, etc. Hence they are often scaled down to something called Data mart that supports a specific segment of business like sales, marketing, or support. Data marts too, are often designed using star schema model discussed earlier. Industry is divided when it comes to use of data marts. Some experts prefer having data marts along with a central data warehouse. Data warehouse here acts as information staging and distribution hub with spokes being data marts connected via data feeds serving summarized data. Others eliminate the need for a centralized data warehouse citing that most users want to report on detailed data. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Best Practices, Business Intelligence, Data Warehousing, Database, Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • NoSQL is not about object databases

    - by Bertrand Le Roy
    NoSQL as a movement is an interesting beast. I kinda like that it’s negatively defined (I happen to belong myself to at least one other such a-community). It’s not in its roots about proposing one specific new silver bullet to kill an old problem. it’s about challenging the consensus. Actually, blindly and systematically replacing relational databases with object databases would just replace one set of issues with another. No, the point is to recognize that relational databases are not a universal answer -although they have been used as one for so long- and recognize instead that there’s a whole spectrum of data storage solutions out there. Why is it so hard to recognize, by the way? You are already using some of those other data storage solutions every day. Let me cite a few: The file system Active Directory XML / JSON documents The Web e-mail Logs Excel files EXIF blobs in your photos Relational databases And yes, object databases It’s just a fact of modern life. Notice by the way that most of the data that you use every day is unstructured and thus mostly unsuitable for relational storage. It really is more a matter of recognizing it: you are already doing NoSQL. So what happens when for any reason you need to simultaneously query two or more of these heterogeneous data stores? Well, you build an index of sorts combining them, and that’s what you query instead. Of course, there’s not much distance to travel from that to realizing that querying is better done when completely separated from storage. So why am I writing about this today? Well, that’s something I’ve been giving lots of thought, on and off, over the last ten years. When I built my first CMS all that time ago, one of the main problems my customers were facing was to manage and make sense of the mountain of unstructured data that was constituting most of their business. The central entity of that system was the file system because we were dealing with lots of Word documents, PDFs, OCR’d articles, photos and static web pages. We could have stored all that in SQL Server. It would have worked. Ew. I’m so glad we didn’t. Today, I’m working on Orchard (another CMS ;). It’s a pretty young project but already one of the questions we get the most is how to integrate existing data. One of the ideas I’ll be trying hard to sell to the rest of the team in the next few months is to completely split the querying from the storage. Not only does this provide great opportunities for performance optimizations, it gives you homogeneous access to heterogeneous and existing data sources. For free.

    Read the article

  • Big Data – Buzz Words: What is NoSQL – Day 5 of 21

    - by Pinal Dave
    In yesterday’s blog post we explored the basic architecture of Big Data . In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – NoSQL. What is NoSQL? NoSQL stands for Not Relational SQL or Not Only SQL. Lots of people think that NoSQL means there is No SQL, which is not true – they both sound same but the meaning is totally different. NoSQL does use SQL but it uses more than SQL to achieve its goal. As per Wikipedia’s NoSQL Database Definition – “A NoSQL database provides a mechanism for storage and retrieval of data that uses looser consistency models than traditional relational databases.“ Why use NoSQL? A traditional relation database usually deals with predictable structured data. Whereas as the world has moved forward with unstructured data we often see the limitations of the traditional relational database in dealing with them. For example, nowadays we have data in format of SMS, wave files, photos and video format. It is a bit difficult to manage them by using a traditional relational database. I often see people using BLOB filed to store such a data. BLOB can store the data but when we have to retrieve them or even process them the same BLOB is extremely slow in processing the unstructured data. A NoSQL database is the type of database that can handle unstructured, unorganized and unpredictable data that our business needs it. Along with the support to unstructured data, the other advantage of NoSQL Database is high performance and high availability. Eventual Consistency Additionally to note that NoSQL Database may not provided 100% ACID (Atomicity, Consistency, Isolation, Durability) compliance.  Though, NoSQL Database does not support ACID they provide eventual consistency. That means over the long period of time all updates can be expected to propagate eventually through the system and data will be consistent. Taxonomy Taxonomy is the practice of classification of things or concepts and the principles. The NoSQL taxonomy supports column store, document store, key-value stores, and graph databases. We will discuss the taxonomy in detail in later blog posts. Here are few of the examples of the each of the No SQL Category. Column: Hbase, Cassandra, Accumulo Document: MongoDB, Couchbase, Raven Key-value : Dynamo, Riak, Azure, Redis, Cache, GT.m Graph: Neo4J, Allegro, Virtuoso, Bigdata As of now there are over 150 NoSQL Database and you can read everything about them in this single link. Tomorrow In tomorrow’s blog post we will discuss Buzz Word – Hadoop. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • The Shift: how Orchard painlessly shifted to document storage, and how it’ll affect you

    - by Bertrand Le Roy
    We’ve known it all along. The storage for Orchard content items would be much more efficient using a document database than a relational one. Orchard content items are composed of parts that serialize naturally into infoset kinds of documents. Storing them as relational data like we’ve done so far was unnatural and requires the data for a single item to span multiple tables, related through 1-1 relationships. This means lots of joins in queries, and a great potential for Select N+1 problems. Document databases, unfortunately, are still a tough sell in many places that prefer the more familiar relational model. Being able to x-copy Orchard to hosters has also been a basic constraint in the design of Orchard. Combine those with the necessity at the time to run in medium trust, and with license compatibility issues, and you’ll find yourself with very few reasonable choices. So we went, a little reluctantly, for relational SQL stores, with the dream of one day transitioning to document storage. We have played for a while with the idea of building our own document storage on top of SQL databases, and Sébastien implemented something more than decent along those lines, but we had a better way all along that we didn’t notice until recently… In Orchard, there are fields, which are named properties that you can add dynamically to a content part. Because they are so dynamic, we have been storing them as XML into a column on the main content item table. This infoset storage and its associated API are fairly generic, but were only used for fields. The breakthrough was when Sébastien realized how this existing storage could give us the advantages of document storage with minimal changes, while continuing to use relational databases as the substrate. public bool CommercialPrices { get { return this.Retrieve(p => p.CommercialPrices); } set { this.Store(p => p.CommercialPrices, value); } } This code is very compact and efficient because the API can infer from the expression what the type and name of the property are. It is then able to do the proper conversions for you. For this code to work in a content part, there is no need for a record at all. This is particularly nice for site settings: one query on one table and you get everything you need. This shows how the existing infoset solves the data storage problem, but you still need to query. Well, for those properties that need to be filtered and sorted on, you can still use the current record-based relational system. This of course continues to work. We do however provide APIs that make it trivial to store into both record properties and the infoset storage in one operation: public double Price { get { return Retrieve(r => r.Price); } set { Store(r => r.Price, value); } } This code looks strikingly similar to the non-record case above. The difference is that it will manage both the infoset and the record-based storages. The call to the Store method will send the data in both places, keeping them in sync. The call to the Retrieve method does something even cooler: if the property you’re looking for exists in the infoset, it will return it, but if it doesn’t, it will automatically look into the record for it. And if that wasn’t cool enough, it will take that value from the record and store it into the infoset for the next time it’s required. This means that your data will start automagically migrating to infoset storage just by virtue of using the code above instead of the usual: public double Price { get { return Record.Price; } set { Record.Price = value; } } As your users browse the site, it will get faster and faster as Select N+1 issues will optimize themselves away. If you preferred, you could still have explicit migration code, but it really shouldn’t be necessary most of the time. If you do already have code using QueryHints to mitigate Select N+1 issues, you might want to reconsider those, as with the new system, you’ll want to avoid joins that you don’t need for filtering or sorting, further optimizing your queries. There are some rare cases where the storage of the property must be handled differently. Check out this string[] property on SearchSettingsPart for example: public string[] SearchedFields { get { return (Retrieve<string>("SearchedFields") ?? "") .Split(new[] {',', ' '}, StringSplitOptions.RemoveEmptyEntries); } set { Store("SearchedFields", String.Join(", ", value)); } } The array of strings is transformed by the property accessors into and from a comma-separated list stored in a string. The Retrieve and Store overloads used in this case are lower-level versions that explicitly specify the type and name of the attribute to retrieve or store. You may be wondering what this means for code or operations that look directly at the database tables instead of going through the new infoset APIs. Even if there is a record, the infoset version of the property will win if it exists, so it is necessary to keep the infoset up-to-date. It’s not very complicated, but definitely something to keep in mind. Here is what a product record looks like in Nwazet.Commerce for example: And here is the same data in the infoset: The infoset is stored in Orchard_Framework_ContentItemRecord or Orchard_Framework_ContentItemVersionRecord, depending on whether the content type is versionable or not. A good way to find what you’re looking for is to inspect the record table first, as it’s usually easier to read, and then get the item record of the same id. Here is the detailed XML document for this product: <Data> <ProductPart Inventory="40" Price="18" Sku="pi-camera-box" OutOfStockMessage="" AllowBackOrder="false" Weight="0.2" Size="" ShippingCost="null" IsDigital="false" /> <ProductAttributesPart Attributes="" /> <AutoroutePart DisplayAlias="camera-box" /> <TitlePart Title="Nwazet Pi Camera Box" /> <BodyPart Text="[...]" /> <CommonPart CreatedUtc="2013-09-10T00:39:00Z" PublishedUtc="2013-09-14T01:07:47Z" /> </Data> The data is neatly organized under each part. It is easy to see how that document is all you need to know about that content item, all in one table. If you want to modify that data directly in the database, you should be careful to do it in both the record table and the infoset in the content item record. In this configuration, the record is now nothing more than an index, and will only be used for sorting and filtering. Of course, it’s perfectly fine to mix record-backed properties and record-less properties on the same part. It really depends what you think must be sorted and filtered on. In turn, this potentially simplifies migrations considerably. So here it is, the great shift of Orchard to document storage, something that Orchard has been designed for all along, and that we were able to implement with a satisfying and surprising economy of resources. Expect this code to make its way into the 1.8 version of Orchard when that’s available.

    Read the article

< Previous Page | 12 13 14 15 16 17 18 19 20 21 22 23  | Next Page >