April 11, 2009
@ 11:40 PM

schemaservice

What, you mean you don’t expose your database metadata via an Ado.Net data service?


 
Categories: meta | metadata

 

Getting at your database metadata is a pain in the ass. In my casual development of NDeavor, I’m also writing some providers for various database platforms, currently focused around MySql because it has good .NET support and it’s not Sql Server, which I actually work with every day (in the professional sense).  It became clear pretty quickly that without a fully-featured API like SMO that using the functionality in .GetSchema() off of an ADO.NET-compliant vendor’s connection class is the best bet all around.

Being that I want a solution I can use for all the providers I end up doing for NDeavor, and be as lightweight as possible, T4 templates was the best choice. Oh, and I want it LINQ-enabled as well. It will basically do the following:
- Call .GetSchema() on a DbConnection instance to get available metadata (as well as specific collections you tell it to).
- Ouput a C# POCO class with properties mapping to each column of the metadata’s DataTable that GetSchema() returns
- Output a DataContext-like class allowing for LINQ goodness.

You might think:

“Really? You’re getting the schema of a DataTable returned from DbConnection.GetSchema() so it can used to create the schema for a C# class which will be used to query the schema of a relational database?”

and you’d be right. So?

The idea of DBConnection.GetSchema(), in Microsoft terms, was to allow querying of vendor-specific metadata about the underlying data source, so as a lowest-common-denominator it returns instances of the ubiquitous DataTable and DataSet. Passing around loosely typed blobs of data won’t cut it, so we’ll be mapping these to strongly types classes, similar to that of an ORM framework. That sounds tedious and like a job for some lightweight code generation, in this case it will be some T4 template magic in Visual Studio 2008. These classes will just be thin wrappers over DataRows, they won’t have the associations/relationship semantics you’re used to seeing with Linq to Sql or Entity Framework, they’re just glue to support building up a larger more api-friendly object model. Any of the joins you do from the resulting object collections will be in-memory, there’s nothing dynamically querying the underlying schema. It’s all smoke and mirrors.

5 Steps to LINQ-ifying to your Db Schema

Download LinqToSchema

  1. Add LinqToSchema.tt to your Visual Studio 2008 project, and remove the ‘Custom Tool’ setting – this is not the template itself, just for inclusion inside the ‘host’ template.
  2. Create a new T4 template by adding a text file to your project and renaming the extension to .tt
  3. Write the template to look similar to this:
    <#@ template language="C#v3.5" hostspecific="True" debug="true" #>
    <#@ output extension="cs" #>
    <#@ include file="LinqToSchema.tt" #>
    <# 
        LinqToSchemaSettings settings = new LinqToSchemaSettings
        {
            ConnectionStringName = "Chinook_MySql",    // name of connection string in local App.Config/Web.Config - must be in same dir as this file
            TargetNamespace = "Chinook.Schema.MySql",    // what namespace to put the generated code in
            ClassNamePrefix = "MySql",                        // what prefix, if any, to put on the generated class names
            ClassVisibility = Visibility.Public                // visibility of the generated classes (Public or Internal)
        };
       
        LinqToSchemaClassesGenerator generator = new LinqToSchemaClassesGenerator(settings);
        generator.Run();
    #>
  4. Save your just-created template for it to be processed by the T4 engine.
  5. Compile/write other code that uses the generated classes. Such as:
    using (MySqlSchemaContext ctx = new MySqlSchemaContext(@"server=servername;user id=userid;persist security info=True;database=chinook"))
    {
        var query = from tbl in ctx.Tables
                    join col in ctx.Columns on tbl.TableName equals col.TableName
                    where tbl.TableName.ToLower() == "track"
                    select new
                    {
                        tbl.TableCatalog,
                        tbl.TableName,
                        col.ColumnName,
                        col.DataType,
                        col.IsNullable
                    };
        query.ToList().ForEach(p => Console.WriteLine(p));
    }

There are some more options such as explicitly setting the connection string, which are explained in the readme. And if it wasn’t already clear, because this is using DbConnection.GetSchema(), it’s usable on any database platform that returns meaningful metadata using from their platform-specific DbConnection implementation.


 
Categories: database | Linq To Schema | meta | metadata | NDeavor

For most ORM or data layer utilities there is some component of code generation involved and somewhere along the line retrieving some metadata about the underlying structure of a database is usually involved. Seeing as it relates to NDeavor, I figured I’d write about the ins and outs of getting metadata when running close to the database. Basically, there are a few basic ways to access database specific metadata directly through .Net.

Vendor-specific API.

I only know of one comprehensive API for this and it’s Sql Management Objects (SMO) which is Sql-Server only. A tremendously useful API for any Microsoft-dominated shops, though. Enumerating through basic structures like tables and columns is incredibly easy, if a little slow. Getting the table name, column name, and data type name of every column in every table in the chinook database might look like this:

public void PrintChinookTableColumns()
{
    ServerConnection sc = new ServerConnection(@".\SQLEXPRESS");
    sc.Connect();
    Server server = new Server(sc);
    Database chinookDb = server.Databases["chinook"];
    server.SetDefaultInitFields(typeof(Table), "IsSystemObject");
    foreach (Table table in chinookDb.Tables)
    {
        if (table.IsSystemObject)
            continue;

        string tableName = table.Name;

        foreach (Column col in table.Columns)
        {
            string columnName = col.Name;
            string dataTypeName = col.DataType.Name;

            Console.WriteLine("{0} {1} {2}", tableName, columnName, dataTypeName);
        }
    }
    sc.Disconnect();
}

 

Querying vendor-specific system tables.

Vendor lock-in at it’s finest, and typically unsupported at that. Usefulness is dependent on what level of information is available and the amount of time you’re willing to spend figuring out what the various columns and values mean. Getting table, column, and data type names in a specific database in Sql Server Express 2005 might look like this:

use chinook
go

SELECT  T.name as TableName, 
        C.name as ColumnName,
        ST.name as DataType
FROM 
    sys.tables T 
    INNER JOIN sys.columns C on T.object_id = C.object_id
    INNER JOIN sys.systypes ST on C.system_type_id = ST.type
 
Querying standard metadata tables.

Emphasis on ‘standard’, because there really is none. There is a spec as part of the SQL standards that came along sometime around SQL92/SQL99 (not sure) which dictates the INFORMATION_SCHEMA tables which store very basic metadata about the tables, views, columns, procedures, functions, parameters, constraints, etc, of a specific relational database. Support for these is relative to how much the vendor cares about doing so. Sql Server and MySql both have pretty strong support for these. But even then, the spec is very loose and while simple queries to say, get all table names and column names, might be portable, more complex ones likely won’t be. There’s other discrepancies as well, for instance the verbiage around ‘schema’ and ‘catalog’. In MySql terms schema is analogous to a Sql Server database, and this translates down into their implementation of these metadata tables. Back to the example of tables, columns, and data types in the chinook database:

Sql Server: 
use chinook 
go 

SELECT 
    T.TABLE_NAME as TableName,    
    C.COLUMN_NAME as ColumnName, 
    C.DATA_TYPE as DataType 
FROM 
    INFORMATION_SCHEMA.TABLES T 
    INNER JOIN INFORMATION_SCHEMA.COLUMNS C ON 
        T.TABLE_NAME = C.TABLE_NAME AND 
        T.TABLE_SCHEMA = C.TABLE_SCHEMA AND 
        T.TABLE_CATALOG = C.TABLE_CATALOG

 

MySql: 
SELECT 
    T.TABLE_NAME as TableName, 
    C.COLUMN_NAME as ColumnName, 
    C.DATA_TYPE as DataType 
FROM 
    INFORMATION_SCHEMA.TABLES T 
    INNER JOIN INFORMATION_SCHEMA.COLUMNS C ON 
        T.TABLE_NAME = C.TABLE_NAME AND 
        T.TABLE_SCHEMA = C.TABLE_SCHEMA 
WHERE T.TABLE_SCHEMA = 'chinook';
 
Using the schema-related functionality in Ado.Net 2.0.

Introduced in .Net 2.0, the DbConnection base class has a method with a few overloads called GetSchema(). This is supposed to allow implementers of Ado.Net providers a way to provide metadata about the underlying connected data store. Since it’s not strongly typed, and consists of returning DataTables, it’s entirely up to the implementer as far as what information is provided. Typically calling the parameter-less version of GetSchema() would return a DataTable that describes what metadata is available, indicated by a string column called something like MetaDataCollection in which you can pass to subsequent calls to GetSchema(string) to get that specific chunk of metadata. There’s a decent write up of what’s available in the native Ado.Net providers located here. Here’s an extension method to get all the metadata into one dataset for further manipulation, which should work for the built-in Ado.Net providers, as well as MySql.Data.MysqlClient using the latest MySql Net Connector:

using System.Collections.Generic;
using System.Data;
using System.Data.Common;
using System.Linq;

namespace System.Data
{
    public static class DbConnectionExtensions
    {
        public static DataSet GetSchemaSet(this DbConnection connection)
        {
            DataSet result = new DataSet();

            DataTable collections = connection.GetSchema();

            List<string> availableCollectionNames = (from row in collections.AsEnumerable()
                                                     select row.Field<string>("CollectionName"))
                                                     .Distinct()
                                                     .ToList();

            foreach (string collectionName in availableCollectionNames)
            {
                try
                {
                    DataTable schemaTable = connection.GetSchema(collectionName);
                    result.Tables.Add(schemaTable);
                }
                catch { }
                // this is bad form and shouldn't be necessary but SqlClient seems to try to give Sql2008 metadata for 
                // Sql2005 connections which results in unneeded exceptions
            }

            return result;
        }
    }
}

The other mechanism for getting metadata is the GetSchemaTable() method on the System.Data.IDataReader interface, which provides some limited information on the metadata for the columns that exist in a particular instance of IDataReader, typically from a call to DbCommand.ExecuteReader(). Again, GetSchemaTable(), as you might have guessed, returns a DataTable, this time with structural information only relating to the columns of the IDataReader itself. This, combined with DbDataReader.GetName(int) and DbDataReader.GetDataTypeName(int) can give some pretty useful information for table and column structure returned from a select statement. That being said, unless you’re reverse-engineering result sets from stored procedures, you’ll likely find the DbConnection.GetSchema() functionality much more effective.

If you’re using the System.Data.OleDb provider, there’s the GetOleDbSchemaTable method on the OleDbConnection class which provides its own metadata about the underlying database. It works a little differently and takes a System.Guid value which refers to the type of metadata being requested, all of which are static fields in the System.Data.OleDb.OleDbSchemaGuid class.

using System.Collections.Generic;
using System.Data.OleDb;
using System.Linq;
using System.Reflection;

namespace System.Data
{
    public static class OleDbConnectionExtensions
    {
        public static DataSet GetOleDbSchemaSet(this OleDbConnection connection)
        {
            DataSet result = new DataSet();
            List<FieldInfo> guidMembers = typeof(OleDbSchemaGuid).GetFields(BindingFlags.Static | BindingFlags.Public).ToList();

            foreach (FieldInfo field in guidMembers)
            {
                if (field.FieldType == typeof(Guid))
                {
                    Guid val = (Guid)field.GetValue(null);

                    try
                    {
                        DataTable schemaTable = connection.GetOleDbSchemaTable(val, new object[] { });
                        result.Tables.Add(schemaTable);
                    }
                    catch { } // unfortunately not all schema guids supported by all oledb connections necessarily
                }
            }

            return result;
        }
    }
}
Oddities and Errata

Under .NET 3.5 (maybe SP1 only), if you execute .GetSchema() on a System.Data.SqlClient.SqlConnection connected to a Sql Server 2005, you’ll notice there's a row indicating that the metadatacollection called StructuredTypeMembers is available. However, this is Sql Server 2008-specific and will throw an exception if you call .GetSchema(“StructuredTypeMembers”) on that same Sql 2005 connection. Why Microsoft let it even be exposed during calls to .GetSchema() on non-2008 connections, who knows. They already have a precedent for exposing Sql 2008-specific metadata depending on 2005/2008 server version as indicated here so it seems pretty stupid on their part. That’s the reason for the empty catch statement in the code block above.

Using MySql Connector Net 5.2.5, the schema collection of Foreign Key Columns is available but not indicated by the resulting DataTable returned by MySqlConnection.GetSchema(). I notified Reggie Burnett, who maintains the Net Connector, and he says it’s been fixed, so I’d expect to see that in a future release.

Getting the structure of result sets from stored procedures or functions is even more difficult than any of the basic table/column structure stuff I’ve mentioned. Using Sql Server, you can use the SET FMTONLY ON statement before the stored procedure call, and SET FMTONLY OFF afterwards, in order to get an empty result set from which you can derive schema without executing against live data. However, to do that in code, you’d have to derive their parameters using some other means like GetSchema() and build a statement block with the FMTONLY lines surrounding the procedure call. Oh, and none of this works if the stored procedure uses temporary tables to return that result set. And again, it’s Sql Server specific. For MySql, the only way I’ve figured out is to:

  1. Derive the parameter information using MySqlConnection.GetSchema(“Procedure Parameters”)
  2. Construct a MySqlCommand with command text equal to the procedure name and CommandType set to CommandType.StoredProcedure
  3. Add phony parameters using the previously-derived information and set their values to DBNull.Value.
  4. Execute the command via ExecuteReader()
  5. Reverse engineer the schema using GetName(int), GetDataTypeName(int) on the MySqlDataReader, in tandem with GetSchemaTable(). Theoretically, if the information in the schema table indicates the column came from a physical table, one could go to the source and look at the values of MySqlConnection.GetSchema(“Columns”) for the source column definitions.

That’s likely how I’ll be doing it in the NDeavor providers where stored procedures are supported like Oracle and Postgres. It’s less than ideal, but if you’re still following this, you’d realize that’s my point!

Adding It Up

There’s a pattern to getting at your metadata and that pattern is it’s a pain in the ass. I’m hoping NDeavor will provide an easy means of getting database schema out of various platforms for the purposes of artifact-generation. But even so, I’m not exactly planning on writing each NDeavor provider as its own one-off implementation because there’s such inconsistency between metadata across platforms. I have a way forward for a more lower-level means of getting at this stuff that solves a lot of the cross-cutting discrepancies, and that will be in part 2.


 
Categories: Ado.Net | database | metadata | MySql | NDeavor | Sql Server