Getting at your database metadata is a pain in the ass. In my casual development of NDeavor, I’m also writing some providers for various database platforms, currently focused around MySql because it has good .NET support and it’s not Sql Server, which I actually work with every day (in the professional sense).  It became clear pretty quickly that without a fully-featured API like SMO that using the functionality in .GetSchema() off of an ADO.NET-compliant vendor’s connection class is the best bet all around.

Being that I want a solution I can use for all the providers I end up doing for NDeavor, and be as lightweight as possible, T4 templates was the best choice. Oh, and I want it LINQ-enabled as well. It will basically do the following:
- Call .GetSchema() on a DbConnection instance to get available metadata (as well as specific collections you tell it to).
- Ouput a C# POCO class with properties mapping to each column of the metadata’s DataTable that GetSchema() returns
- Output a DataContext-like class allowing for LINQ goodness.

You might think:

“Really? You’re getting the schema of a DataTable returned from DbConnection.GetSchema() so it can used to create the schema for a C# class which will be used to query the schema of a relational database?”

and you’d be right. So?

The idea of DBConnection.GetSchema(), in Microsoft terms, was to allow querying of vendor-specific metadata about the underlying data source, so as a lowest-common-denominator it returns instances of the ubiquitous DataTable and DataSet. Passing around loosely typed blobs of data won’t cut it, so we’ll be mapping these to strongly types classes, similar to that of an ORM framework. That sounds tedious and like a job for some lightweight code generation, in this case it will be some T4 template magic in Visual Studio 2008. These classes will just be thin wrappers over DataRows, they won’t have the associations/relationship semantics you’re used to seeing with Linq to Sql or Entity Framework, they’re just glue to support building up a larger more api-friendly object model. Any of the joins you do from the resulting object collections will be in-memory, there’s nothing dynamically querying the underlying schema. It’s all smoke and mirrors.

5 Steps to LINQ-ifying to your Db Schema

Download LinqToSchema

  1. Add LinqToSchema.tt to your Visual Studio 2008 project, and remove the ‘Custom Tool’ setting – this is not the template itself, just for inclusion inside the ‘host’ template.
  2. Create a new T4 template by adding a text file to your project and renaming the extension to .tt
  3. Write the template to look similar to this:
    <#@ template language="C#v3.5" hostspecific="True" debug="true" #>
    <#@ output extension="cs" #>
    <#@ include file="LinqToSchema.tt" #>
    <# 
        LinqToSchemaSettings settings = new LinqToSchemaSettings
        {
            ConnectionStringName = "Chinook_MySql",    // name of connection string in local App.Config/Web.Config - must be in same dir as this file
            TargetNamespace = "Chinook.Schema.MySql",    // what namespace to put the generated code in
            ClassNamePrefix = "MySql",                        // what prefix, if any, to put on the generated class names
            ClassVisibility = Visibility.Public                // visibility of the generated classes (Public or Internal)
        };
       
        LinqToSchemaClassesGenerator generator = new LinqToSchemaClassesGenerator(settings);
        generator.Run();
    #>
  4. Save your just-created template for it to be processed by the T4 engine.
  5. Compile/write other code that uses the generated classes. Such as:
    using (MySqlSchemaContext ctx = new MySqlSchemaContext(@"server=servername;user id=userid;persist security info=True;database=chinook"))
    {
        var query = from tbl in ctx.Tables
                    join col in ctx.Columns on tbl.TableName equals col.TableName
                    where tbl.TableName.ToLower() == "track"
                    select new
                    {
                        tbl.TableCatalog,
                        tbl.TableName,
                        col.ColumnName,
                        col.DataType,
                        col.IsNullable
                    };
        query.ToList().ForEach(p => Console.WriteLine(p));
    }

There are some more options such as explicitly setting the connection string, which are explained in the readme. And if it wasn’t already clear, because this is using DbConnection.GetSchema(), it’s usable on any database platform that returns meaningful metadata using from their platform-specific DbConnection implementation.


 
Categories: database | Linq To Schema | meta | metadata | NDeavor

For most ORM or data layer utilities there is some component of code generation involved and somewhere along the line retrieving some metadata about the underlying structure of a database is usually involved. Seeing as it relates to NDeavor, I figured I’d write about the ins and outs of getting metadata when running close to the database. Basically, there are a few basic ways to access database specific metadata directly through .Net.

Vendor-specific API.

I only know of one comprehensive API for this and it’s Sql Management Objects (SMO) which is Sql-Server only. A tremendously useful API for any Microsoft-dominated shops, though. Enumerating through basic structures like tables and columns is incredibly easy, if a little slow. Getting the table name, column name, and data type name of every column in every table in the chinook database might look like this:

public void PrintChinookTableColumns()
{
    ServerConnection sc = new ServerConnection(@".\SQLEXPRESS");
    sc.Connect();
    Server server = new Server(sc);
    Database chinookDb = server.Databases["chinook"];
    server.SetDefaultInitFields(typeof(Table), "IsSystemObject");
    foreach (Table table in chinookDb.Tables)
    {
        if (table.IsSystemObject)
            continue;

        string tableName = table.Name;

        foreach (Column col in table.Columns)
        {
            string columnName = col.Name;
            string dataTypeName = col.DataType.Name;

            Console.WriteLine("{0} {1} {2}", tableName, columnName, dataTypeName);
        }
    }
    sc.Disconnect();
}

 

Querying vendor-specific system tables.

Vendor lock-in at it’s finest, and typically unsupported at that. Usefulness is dependent on what level of information is available and the amount of time you’re willing to spend figuring out what the various columns and values mean. Getting table, column, and data type names in a specific database in Sql Server Express 2005 might look like this:

use chinook
go

SELECT  T.name as TableName, 
        C.name as ColumnName,
        ST.name as DataType
FROM 
    sys.tables T 
    INNER JOIN sys.columns C on T.object_id = C.object_id
    INNER JOIN sys.systypes ST on C.system_type_id = ST.type
 
Querying standard metadata tables.

Emphasis on ‘standard’, because there really is none. There is a spec as part of the SQL standards that came along sometime around SQL92/SQL99 (not sure) which dictates the INFORMATION_SCHEMA tables which store very basic metadata about the tables, views, columns, procedures, functions, parameters, constraints, etc, of a specific relational database. Support for these is relative to how much the vendor cares about doing so. Sql Server and MySql both have pretty strong support for these. But even then, the spec is very loose and while simple queries to say, get all table names and column names, might be portable, more complex ones likely won’t be. There’s other discrepancies as well, for instance the verbiage around ‘schema’ and ‘catalog’. In MySql terms schema is analogous to a Sql Server database, and this translates down into their implementation of these metadata tables. Back to the example of tables, columns, and data types in the chinook database:

Sql Server: 
use chinook 
go 

SELECT 
    T.TABLE_NAME as TableName,    
    C.COLUMN_NAME as ColumnName, 
    C.DATA_TYPE as DataType 
FROM 
    INFORMATION_SCHEMA.TABLES T 
    INNER JOIN INFORMATION_SCHEMA.COLUMNS C ON 
        T.TABLE_NAME = C.TABLE_NAME AND 
        T.TABLE_SCHEMA = C.TABLE_SCHEMA AND 
        T.TABLE_CATALOG = C.TABLE_CATALOG

 

MySql: 
SELECT 
    T.TABLE_NAME as TableName, 
    C.COLUMN_NAME as ColumnName, 
    C.DATA_TYPE as DataType 
FROM 
    INFORMATION_SCHEMA.TABLES T 
    INNER JOIN INFORMATION_SCHEMA.COLUMNS C ON 
        T.TABLE_NAME = C.TABLE_NAME AND 
        T.TABLE_SCHEMA = C.TABLE_SCHEMA 
WHERE T.TABLE_SCHEMA = 'chinook';
 
Using the schema-related functionality in Ado.Net 2.0.

Introduced in .Net 2.0, the DbConnection base class has a method with a few overloads called GetSchema(). This is supposed to allow implementers of Ado.Net providers a way to provide metadata about the underlying connected data store. Since it’s not strongly typed, and consists of returning DataTables, it’s entirely up to the implementer as far as what information is provided. Typically calling the parameter-less version of GetSchema() would return a DataTable that describes what metadata is available, indicated by a string column called something like MetaDataCollection in which you can pass to subsequent calls to GetSchema(string) to get that specific chunk of metadata. There’s a decent write up of what’s available in the native Ado.Net providers located here. Here’s an extension method to get all the metadata into one dataset for further manipulation, which should work for the built-in Ado.Net providers, as well as MySql.Data.MysqlClient using the latest MySql Net Connector:

using System.Collections.Generic;
using System.Data;
using System.Data.Common;
using System.Linq;

namespace System.Data
{
    public static class DbConnectionExtensions
    {
        public static DataSet GetSchemaSet(this DbConnection connection)
        {
            DataSet result = new DataSet();

            DataTable collections = connection.GetSchema();

            List<string> availableCollectionNames = (from row in collections.AsEnumerable()
                                                     select row.Field<string>("CollectionName"))
                                                     .Distinct()
                                                     .ToList();

            foreach (string collectionName in availableCollectionNames)
            {
                try
                {
                    DataTable schemaTable = connection.GetSchema(collectionName);
                    result.Tables.Add(schemaTable);
                }
                catch { }
                // this is bad form and shouldn't be necessary but SqlClient seems to try to give Sql2008 metadata for 
                // Sql2005 connections which results in unneeded exceptions
            }

            return result;
        }
    }
}

The other mechanism for getting metadata is the GetSchemaTable() method on the System.Data.IDataReader interface, which provides some limited information on the metadata for the columns that exist in a particular instance of IDataReader, typically from a call to DbCommand.ExecuteReader(). Again, GetSchemaTable(), as you might have guessed, returns a DataTable, this time with structural information only relating to the columns of the IDataReader itself. This, combined with DbDataReader.GetName(int) and DbDataReader.GetDataTypeName(int) can give some pretty useful information for table and column structure returned from a select statement. That being said, unless you’re reverse-engineering result sets from stored procedures, you’ll likely find the DbConnection.GetSchema() functionality much more effective.

If you’re using the System.Data.OleDb provider, there’s the GetOleDbSchemaTable method on the OleDbConnection class which provides its own metadata about the underlying database. It works a little differently and takes a System.Guid value which refers to the type of metadata being requested, all of which are static fields in the System.Data.OleDb.OleDbSchemaGuid class.

using System.Collections.Generic;
using System.Data.OleDb;
using System.Linq;
using System.Reflection;

namespace System.Data
{
    public static class OleDbConnectionExtensions
    {
        public static DataSet GetOleDbSchemaSet(this OleDbConnection connection)
        {
            DataSet result = new DataSet();
            List<FieldInfo> guidMembers = typeof(OleDbSchemaGuid).GetFields(BindingFlags.Static | BindingFlags.Public).ToList();

            foreach (FieldInfo field in guidMembers)
            {
                if (field.FieldType == typeof(Guid))
                {
                    Guid val = (Guid)field.GetValue(null);

                    try
                    {
                        DataTable schemaTable = connection.GetOleDbSchemaTable(val, new object[] { });
                        result.Tables.Add(schemaTable);
                    }
                    catch { } // unfortunately not all schema guids supported by all oledb connections necessarily
                }
            }

            return result;
        }
    }
}
Oddities and Errata

Under .NET 3.5 (maybe SP1 only), if you execute .GetSchema() on a System.Data.SqlClient.SqlConnection connected to a Sql Server 2005, you’ll notice there's a row indicating that the metadatacollection called StructuredTypeMembers is available. However, this is Sql Server 2008-specific and will throw an exception if you call .GetSchema(“StructuredTypeMembers”) on that same Sql 2005 connection. Why Microsoft let it even be exposed during calls to .GetSchema() on non-2008 connections, who knows. They already have a precedent for exposing Sql 2008-specific metadata depending on 2005/2008 server version as indicated here so it seems pretty stupid on their part. That’s the reason for the empty catch statement in the code block above.

Using MySql Connector Net 5.2.5, the schema collection of Foreign Key Columns is available but not indicated by the resulting DataTable returned by MySqlConnection.GetSchema(). I notified Reggie Burnett, who maintains the Net Connector, and he says it’s been fixed, so I’d expect to see that in a future release.

Getting the structure of result sets from stored procedures or functions is even more difficult than any of the basic table/column structure stuff I’ve mentioned. Using Sql Server, you can use the SET FMTONLY ON statement before the stored procedure call, and SET FMTONLY OFF afterwards, in order to get an empty result set from which you can derive schema without executing against live data. However, to do that in code, you’d have to derive their parameters using some other means like GetSchema() and build a statement block with the FMTONLY lines surrounding the procedure call. Oh, and none of this works if the stored procedure uses temporary tables to return that result set. And again, it’s Sql Server specific. For MySql, the only way I’ve figured out is to:

  1. Derive the parameter information using MySqlConnection.GetSchema(“Procedure Parameters”)
  2. Construct a MySqlCommand with command text equal to the procedure name and CommandType set to CommandType.StoredProcedure
  3. Add phony parameters using the previously-derived information and set their values to DBNull.Value.
  4. Execute the command via ExecuteReader()
  5. Reverse engineer the schema using GetName(int), GetDataTypeName(int) on the MySqlDataReader, in tandem with GetSchemaTable(). Theoretically, if the information in the schema table indicates the column came from a physical table, one could go to the source and look at the values of MySqlConnection.GetSchema(“Columns”) for the source column definitions.

That’s likely how I’ll be doing it in the NDeavor providers where stored procedures are supported like Oracle and Postgres. It’s less than ideal, but if you’re still following this, you’d realize that’s my point!

Adding It Up

There’s a pattern to getting at your metadata and that pattern is it’s a pain in the ass. I’m hoping NDeavor will provide an easy means of getting database schema out of various platforms for the purposes of artifact-generation. But even so, I’m not exactly planning on writing each NDeavor provider as its own one-off implementation because there’s such inconsistency between metadata across platforms. I have a way forward for a more lower-level means of getting at this stuff that solves a lot of the cross-cutting discrepancies, and that will be in part 2.


 
Categories: Ado.Net | database | metadata | MySql | NDeavor | Sql Server

December 11, 2008
@ 01:54 AM

NDeavor is a hobby project of mine that I’ve been working on for a couple months now on and off. It’s basically a way for me to explore the things I’m interested in but don’t get to explore in my day job, specifically the latest and greatest from Net like LINQ, extensibility patterns like MEF or System.AddIn, working with database metadata, dependency injection, and code generation. It’s a set of libraries and code that puts a relational database schema in a straight forward object model for the purposes of generating some other artifacts. Boring, I know. Been done before, I know. But never done simply enough (and free enough) to where I could say that my take is unnecessary. Extensibility is currently being provided using the Managed Extensibility Framework (MEF) in which I’ve defined some extremely simple contracts for a provider kind of model for support of all the different database platforms out there. I started with System.AddIn then switched to MEF, but that’s a post in and of it’s own.

Object Model

These are the interfaces, they’re purposely simple, and yet to be finalized:

NDeavor_ObjectModel

The actual contracts aren’t defined on these, as the interfaces are mostly for the purposes of mocking and testing. That, and they are left over from when everything had to be a contract when I was using System.AddIn. Thinking about ditching them in favor of concrete classes.

That’s great, what does it do?

Well on the surface, it’s just a library where any external developer can implement these two interfaces to support a specific database platform. Of course, I’ll be writing providers for the major platforms as that’s the part I”m most interested in.

NDeavor_ProviderInterfaces

Each ISchemaProvider, who’s GetSchema method returns an object that implements IDatabase, can be associated to one IDataTypeProvider, which is used to get more common System.Data.DbType and System.Type mappings from vendor-specific data types. I did say it was simple.

Once you have a provider for NDeavor that supports your database platform of choice, you’ll be able to put it in some directory somewhere and be able to call some code in the NDeavor.Core assembly which will instantiate the plugin, provided you know it’s associated plugin name. After that, you can do anything with the object model that you can do programmatically: use CodeDOM to generate C# or VB code, reverse-engieer SQL scripts, whatever.

The current example of putting NDeavor to work that I’ll be providing is invoking NDeavor via T4 Templates. I wrote a custom T4 Directive Processor which allows you to put a bit of angle bracket text in a T4 template which adds a couple properties to the generated template-class, allowing you to manipulate the model in the ‘code-behind’ of the template itself. For example, this bit of T4 template text:

<#@ template debug="True" language="C#v3.5" #>
<#@ output extension="cs" #>
<#@ assembly name="System.Data" #>
<#@ assembly name="C:\Program Files\SubSonic\SubSonic 2.1 Final\SubSonic\SubSonic.dll" #>
<#@ import namespace="System.Linq" #>
<#@ import namespace="System.Diagnostics" #>
<#@ import namespace="System.Data" #>
<#@ import namespace="SubSonic" #>
<#@ import namespace="System.Collections.Generic" #>
<#@ include file="TemplateInclude.tt" #>
<#@ DataModel Processor="DataModelDirectiveProcessor" Database="chinook" Name="Chinook" ConnectionStringName="Chinook_MySql" SchemaProvider="NDeavor.SchemaProviders.MySql" #>
<#  IDatabase db = this.ChinookDataModel; 
    IDataTypeProvider prov = this.ChinookDataTypeProvider; #>
using System;
using System.Collections.Generic;

namespace Model
{
<#foreach(ITable table in db.Tables) {#>
    public class <#=GetClassName(table)#>
    {        
<#foreach(IColumn col in table.Columns) {#>
        public <#=prov.GetSystemType(col).FullName#> <#=GetSingularPropertyName(col)#> { get; set; } 
<#}#>
    }
    
<#}#>
}

Generates a bunch of code like this:

    public class Customer
    {        
        public System.Int32 CustomerId { get; set; } 
        public System.String FirstName { get; set; } 
        public System.String LastName { get; set; } 
        public System.String Company { get; set; } 
        public System.String Address { get; set; } 
        public System.String City { get; set; } 
        public System.String State { get; set; } 
        public System.String Country { get; set; } 
        public System.String PostalCode { get; set; } 
        public System.String Phone { get; set; } 
        public System.String Fax { get; set; } 
        public System.String Email { get; set; } 
    }
    
    public class Employee
    {        
        public System.Int32 EmployeeId { get; set; } 
        public System.String LastName { get; set; } 
        public System.String FirstName { get; set; } 
        public System.String Title { get; set; } 
        public System.Int32 ReportsTo { get; set; } 
        public System.DateTime BirthDate { get; set; } 
        public System.DateTime HireDate { get; set; } 
        public System.String Address { get; set; } 
        public System.String City { get; set; } 
        public System.String State { get; set; } 
        public System.String Country { get; set; } 
        public System.String PostalCode { get; set; } 
        public System.String Phone { get; set; } 
        public System.String Fax { get; set; } 
        public System.String Email { get; set; } 
    }

Now that’s what I’m talking about. No querying system tables, no custom vendor-specific api, it’s all done for you in that bit of angle brackets. Booyah. Oh, and I’m using SubSonic solely for the Inflector class to singularlize class names, it’s totally not required for NDeavor. I may provide my own ‘extra’s like that with the T4 functionality, not sure.

Tentative To-Do List

  • Get object model to more polished and developer-friendly state. Includes ditching some of the POCO-ness in favor of read-only collections backed by specific Add/Remove methods, or custom collections all together, I haven’t nailed that down yet.
  • Get 2 providers written. I’m writing the MySql provider in tandem, first as a break from Sql Server which I use daily, and second as a way to get a view of how another platform does things (you flag an integer column to make it unsigned? seriously?), and how those things translate into a generalized framework like this.
  • Figure out configuration of standard plugin directory and how to implement subdirectory watching using MEF.
  • Test. Test. Test.
  • Blog about the ways to access metadata in various database platforms. (MySql 5.x has good, yet incomplete, support of the INFORMATION_SCHEMA ‘standard’, yet I’m using the .GetSchema() methods on the MySqlDataConnection class, via MySql Net Connector 5.2. )
  • Publish to CodePlex when all’s good. (License?)

That’s it for now. More to come!


 
Categories: NDeavor