This is the third (and last) post in a series that explains different approaches to map an inheritance hierarchy with EF Code First. I've described these strategies in previous posts:
Part 1 – Table per Hierarchy (TPH)
Part 2 – Table per Type (TPT)In today’s blog post I am going to discuss Table per Concrete Type (TPC) which completes the inheritance mapping strategies supported by EF Code First. At the end of this post I will provide some guidelines to choose an inheritance strategy mainly based on what we've learned in this series.
TPC and Entity Framework in the Past Table per Concrete type is somehow the simplest approach suggested, yet using TPC with EF is one of those concepts that has not been covered very well so far and I've seen in some resources that it was even discouraged. The reason for that is just because Entity Data Model Designer in VS2010 doesn't support TPC (even though the EF runtime does). That basically means if you are following EF's Database-First or Model-First approaches then configuring TPC requires manually writing XML in the EDMX file which is not considered to be a fun practice. Well, no more. You'll see that with Code First, creating TPC is perfectly possible with fluent API just like other strategies and you don't need to avoid TPC due to the lack of designer support as you would probably do in other EF approaches.
Table per Concrete Type (TPC)In Table per Concrete type (aka Table per Concrete class) we use exactly one table for each (nonabstract) class. All properties of a class, including inherited properties, can be mapped to columns of this table, as shown in the following figure:
As you can see, the SQL schema is not aware of the inheritance; effectively, we’ve mapped two unrelated tables to a more expressive class structure. If the base class was concrete, then an additional table would be needed to hold instances of that class. I have to emphasize that there is no relationship between the database tables, except for the fact that they share some similar columns.
TPC Implementation in Code First Just like the TPT implementation, we need to specify a separate table for each of the subclasses. We also need to tell Code First that we want all of the inherited properties to be mapped as part of this table. In CTP5, there is a new helper method on EntityMappingConfiguration class called MapInheritedProperties that exactly does this for us. Here is the complete object model as well as the fluent API to create a TPC mapping:
public abstract class BillingDetail
{
public int BillingDetailId { get; set; }
public string Owner { get; set; }
public string Number { get; set; }
}
public class BankAccount : BillingDetail
{
public string BankName { get; set; }
public string Swift { get; set; }
}
public class CreditCard : BillingDetail
{
public int CardType { get; set; }
public string ExpiryMonth { get; set; }
public string ExpiryYear { get; set; }
}
public class InheritanceMappingContext : DbContext
{
public DbSet<BillingDetail> BillingDetails { get; set; }
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<BankAccount>().Map(m =>
{
m.MapInheritedProperties();
m.ToTable("BankAccounts");
});
modelBuilder.Entity<CreditCard>().Map(m =>
{
m.MapInheritedProperties();
m.ToTable("CreditCards");
});
}
}
The Importance of EntityMappingConfiguration ClassAs a side note, it worth mentioning that EntityMappingConfiguration class turns out to be a key type for inheritance mapping in Code First. Here is an snapshot of this class:
namespace System.Data.Entity.ModelConfiguration.Configuration.Mapping
{
public class EntityMappingConfiguration<TEntityType> where TEntityType : class
{
public ValueConditionConfiguration Requires(string discriminator);
public void ToTable(string tableName);
public void MapInheritedProperties();
}
}
As you have seen so far, we used its Requires method to customize TPH. We also used its ToTable method to create a TPT and now we are using its MapInheritedProperties along with ToTable method to create our TPC mapping.
TPC Configuration is Not Done Yet!We are not quite done with our TPC configuration and there is more into this story even though the fluent API we saw perfectly created a TPC mapping for us in the database. To see why, let's start working with our object model. For example, the following code creates two new objects of BankAccount and CreditCard types and tries to add them to the database:
using (var context = new InheritanceMappingContext())
{
BankAccount bankAccount = new BankAccount();
CreditCard creditCard = new CreditCard() { CardType = 1 };
context.BillingDetails.Add(bankAccount);
context.BillingDetails.Add(creditCard);
context.SaveChanges();
}
Running this code throws an InvalidOperationException with this message:
The changes to the database were committed successfully, but an error occurred while updating the object context. The ObjectContext might be in an inconsistent state. Inner exception message: AcceptChanges cannot continue because the object's key values conflict with another object in the ObjectStateManager. Make sure that the key values are unique before calling AcceptChanges.
The reason we got this exception is because DbContext.SaveChanges() internally invokes SaveChanges method of its internal ObjectContext. ObjectContext's SaveChanges method on its turn by default calls AcceptAllChanges after it has performed the database modifications. AcceptAllChanges method merely iterates over all entries in ObjectStateManager and invokes AcceptChanges on each of them. Since the entities are in Added state, AcceptChanges method replaces their temporary EntityKey with a regular EntityKey based on the primary key values (i.e. BillingDetailId) that come back from the database and that's where the problem occurs since both the entities have been assigned the same value for their primary key by the database (i.e. on both BillingDetailId = 1) and the problem is that ObjectStateManager cannot track objects of the same type (i.e. BillingDetail) with the same EntityKey value hence it throws. If you take a closer look at the TPC's SQL schema above, you'll see why the database generated the same values for the primary keys: the BillingDetailId column in both BankAccounts and CreditCards table has been marked as identity.
How to Solve The Identity Problem in TPC As you saw, using SQL Server’s int identity columns doesn't work very well together with TPC since there will be duplicate entity keys when inserting in subclasses tables with all having the same identity seed. Therefore, to solve this, either a spread seed (where each table has its own initial seed value) will be needed, or a mechanism other than SQL Server’s int identity should be used. Some other RDBMSes have other mechanisms allowing a sequence (identity) to be shared by multiple tables, and something similar can be achieved with GUID keys in SQL Server. While using GUID keys, or int identity keys with different starting seeds will solve the problem but yet another solution would be to completely switch off identity on the primary key property. As a result, we need to take the responsibility of providing unique keys when inserting records to the database. We will go with this solution since it works regardless of which database engine is used.
Switching Off Identity in Code First We can switch off identity simply by placing DatabaseGenerated attribute on the primary key property and pass DatabaseGenerationOption.None to its constructor. DatabaseGenerated attribute is a new data annotation which has been added to System.ComponentModel.DataAnnotations namespace in CTP5:
public abstract class BillingDetail
{
[DatabaseGenerated(DatabaseGenerationOption.None)]
public int BillingDetailId { get; set; }
public string Owner { get; set; }
public string Number { get; set; }
}
As always, we can achieve the same result by using fluent API, if you prefer that:
modelBuilder.Entity<BillingDetail>()
.Property(p => p.BillingDetailId)
.HasDatabaseGenerationOption(DatabaseGenerationOption.None);
Working With The Object Model Our TPC mapping is ready and we can try adding new records to the database. But, like I said, now we need to take care of providing unique keys when creating new objects:
using (var context = new InheritanceMappingContext())
{
BankAccount bankAccount = new BankAccount()
{
BillingDetailId = 1
};
CreditCard creditCard = new CreditCard()
{
BillingDetailId = 2,
CardType = 1
};
context.BillingDetails.Add(bankAccount);
context.BillingDetails.Add(creditCard);
context.SaveChanges();
}
Polymorphic Associations with TPC is Problematic The main problem with this approach is that it doesn’t support Polymorphic Associations very well. After all, in the database, associations are represented as foreign key relationships and in TPC, the subclasses are all mapped to different tables so a polymorphic association to their base class (abstract BillingDetail in our example) cannot be represented as a simple foreign key relationship. For example, consider the the domain model we introduced here where User has a polymorphic association with BillingDetail. This would be problematic in our TPC Schema, because if User has a many-to-one relationship with BillingDetail, the Users table would need a single foreign key column, which would have to refer both concrete subclass tables. This isn’t possible with regular foreign key constraints.
Schema Evolution with TPC is Complex A further conceptual problem with this mapping strategy is that several different columns, of different tables, share exactly the same semantics. This makes schema evolution more complex. For example, a change to a base class property results in changes to multiple columns. It also makes it much more difficult to implement database integrity constraints that apply to all subclasses.
Generated SQLLet's examine SQL output for polymorphic queries in TPC mapping. For example, consider this polymorphic query for all BillingDetails and the resulting SQL statements that being executed in the database:
var query = from b in context.BillingDetails select b;
Just like the SQL query generated by TPT mapping, the CASE statements that you see in the beginning of the query is merely to ensure columns that are irrelevant for a particular row have NULL values in the returning flattened table. (e.g. BankName for a row that represents a CreditCard type).
TPC's SQL Queries are Union Based As you can see in the above screenshot, the first SELECT uses a FROM-clause subquery (which is selected with a red rectangle) to retrieve all instances of BillingDetails from all concrete class tables. The tables are combined with a UNION operator, and a literal (in this case, 0 and 1) is inserted into the intermediate result; (look at the lines highlighted in yellow.) EF reads this to instantiate the correct class given the data from a particular row. A union requires that the queries that are combined, project over the same columns; hence, EF has to pad and fill up nonexistent columns with NULL. This query will really perform well since here we can let the database optimizer find the best execution plan to combine rows from several tables. There is also no Joins involved so it has a better performance than the SQL queries generated by TPT where a Join is required between the base and subclasses tables.
Choosing Strategy GuidelinesBefore we get into this discussion, I want to emphasize that there is no one single "best strategy fits all scenarios" exists. As you saw, each of the approaches have their own advantages and drawbacks. Here are some rules of thumb to identify the best strategy in a particular scenario:
If you don’t require polymorphic associations or queries, lean toward TPC—in other words, if you never or rarely query for BillingDetails and you have no class that has an association to BillingDetail base class. I recommend TPC (only) for the top level of your class hierarchy, where polymorphism isn’t usually required, and when modification of the base class in the future is unlikely.
If you do require polymorphic associations or queries, and subclasses declare relatively few properties (particularly if the main difference between subclasses is in their behavior), lean toward TPH. Your goal is to minimize the number of nullable columns and to convince yourself (and your DBA) that a denormalized schema won’t create problems in the long run.
If you do require polymorphic associations or queries, and subclasses declare many properties (subclasses differ mainly by the data they hold), lean toward TPT. Or, depending on the width and depth of your inheritance hierarchy and the possible cost of joins versus unions, use TPC. By default, choose TPH only for simple problems. For more complex cases (or when you’re overruled by a data modeler insisting on the importance of nullability constraints and normalization), you should consider the TPT strategy. But at that point, ask yourself whether it may not be better to remodel inheritance as delegation in the object model (delegation is a way of making composition as powerful for reuse as inheritance). Complex inheritance is often best avoided for all sorts of reasons unrelated to persistence or ORM. EF acts as a buffer between the domain and relational models, but that doesn’t mean you can ignore persistence concerns when designing your classes.
SummaryIn this series, we focused on one of the main structural aspect of the object/relational paradigm mismatch which is inheritance and discussed how EF solve this problem as an ORM solution. We learned about the three well-known inheritance mapping strategies and their implementations in EF Code First. Hopefully it gives you a better insight about the mapping of inheritance hierarchies as well as choosing the best strategy for your particular scenario. Happy New Year and Happy Code-Firsting!
References
ADO.NET team blog
Java Persistence with Hibernate book
a
{
color: #5A99FF;
}
a:visited
{
color: #5A99FF;
}
.title
{
padding-bottom: 5px;
font-family: Segoe UI;
font-size: 11pt;
font-weight: bold;
padding-top: 15px;
}
.code, .typeName
{
font-family: consolas;
}
.typeName
{
color: #2b91af;
}
.padTop5
{
padding-top: 5px;
}
.padTop10
{
padding-top: 10px;
}
.exception
{
background-color: #f0f0f0;
font-style: italic;
padding-bottom: 5px;
padding-left: 5px;
padding-top: 5px;
padding-right: 5px;
}