Before we look any further at the CLR metadata, we need a quick diversion to understand how the metadata is actually stored.
Encoding table information
As an example, we'll have a look at a row in the TypeDef table. According to the spec, each TypeDef consists of the following:
Flags specifying various properties of the class, including visibility.
The name of the type.
The namespace of the type.
What type this type extends.
The field list of this type.
The method list of this type.
How is all this data actually represented?
Offset & RID encoding
Most assemblies don't need to use a 4 byte value to specify heap offsets and RIDs everywhere, however we can't hard-code every offset and RID to be 2 bytes long as there could conceivably be more than 65535 items in a heap or more than 65535 fields or types defined in an assembly.
So heap offsets and RIDs are only represented in the full 4 bytes if it is required; in the header information at the top of the #~ stream are 3 bits indicating if the #Strings, #GUID, or #Blob heaps use 2 or 4 bytes (the #US stream is not accessed from metadata), and the rowcount of each table. If the rowcount for a particular table is greater than 65535 then all RIDs referencing that table throughout the metadata use 4 bytes, else only 2 bytes are used.
Coded tokens
Not every field in a table row references a single predefined table. For example, in the TypeDef extends field, a type can extend another TypeDef (a type in the same assembly), a TypeRef (a type in a different assembly), or a TypeSpec (an instantiation of a generic type). A token would have to be used to let us specify the table along with the RID. Tokens are always 4 bytes long; again, this is rather wasteful of space. Cutting the RID down to 2 bytes would make each token 3 bytes long, which isn't really an optimum size for computers to read from memory or disk.
However, every use of a token in the metadata tables can only point to a limited subset of the metadata tables. For the extends field, we only need to be able to specify one of 3 tables, which we can do using 2 bits:
0x0: TypeDef
0x1: TypeRef
0x2: TypeSpec
We could therefore compress the 4-byte token that would otherwise be needed into a coded token of type TypeDefOrRef. For each type of coded token, the least significant bits encode the table the token points to, and the rest of the bits encode the RID within that table. We can work out whether each type of coded token needs 2 or 4 bytes to represent it by working out whether the maximum RID of every table that the coded token type can point to will fit in the space available.
The space available for the RID depends on the type of coded token; a TypeOrMethodDef coded token only needs 1 bit to specify the table, leaving 15 bits available for the RID before a 4-byte representation is needed, whereas a HasCustomAttribute coded token can point to one of 18 different tables, and so needs 5 bits to specify the table, only leaving 11 bits for the RID before 4 bytes are needed to represent that coded token type.
For example, a 2-byte TypeDefOrRef coded token with the value 0x0321 has the following bit pattern:
0 3 2 1
0000 0011 0010 0001
The first two bits specify the table - TypeRef; the other bits specify the RID. Because we've used the first two bits, we've got to shift everything along two bits:
000000 1100 1000
This gives us a RID of 0xc8. If any one of the TypeDef, TypeRef or TypeSpec tables had more than 16383 rows (2^14 - 1), then 4 bytes would need to be used to represent all TypeDefOrRef coded tokens throughout the metadata tables.
Lists
The third representation we need to consider is 1-to-many references; each TypeDef refers to a list of FieldDef and MethodDef belonging to that type. If we were to specify every FieldDef and MethodDef individually then each TypeDef would be very large and a variable size, which isn't ideal.
There is a way of specifying a list of references without explicitly specifying every item; if we order the MethodDef and FieldDef tables by the owning type, then the field list and method list in a TypeDef only have to be a single RID pointing at the first FieldDef or MethodDef belonging to that type; the end of the list can be inferred by the field list and method list RIDs of the next row in the TypeDef table.
Going back to the TypeDef
If we have a look back at the definition of a TypeDef, we end up with the following reprensentation for each row:
Flags - always 4 bytes
Name - a #Strings heap offset.
Namespace - a #Strings heap offset.
Extends - a TypeDefOrRef coded token.
FieldList - a single RID to the FieldDef table.
MethodList - a single RID to the MethodDef table.
So, depending on the number of entries in the heaps and tables within the assembly, the rows in the TypeDef table can be as small as 14 bytes, or as large as 24 bytes.
Now we've had a look at how information is encoded within the metadata tables, in the next post we can see how they are arranged on disk.