Most ORM and data-layer utilities involve some code generation, and somewhere along the line that usually means retrieving metadata about the underlying structure of a database. Since it relates to NDeavor, I figured I’d write about the ins and outs of getting metadata when running close to the database. There are a few basic ways to access database-specific metadata directly through .Net.

Vendor-specific API.

I only know of one comprehensive API for this, and it’s Sql Server Management Objects (SMO), which is Sql Server-only. It’s a tremendously useful API for Microsoft-dominated shops, though. Enumerating basic structures like tables and columns is incredibly easy, if a little slow. Getting the table name, column name, and data type name of every column in every table in the chinook database might look like this:

using System;
using Microsoft.SqlServer.Management.Common; // ServerConnection
using Microsoft.SqlServer.Management.Smo;    // Server, Database, Table, Column

public void PrintChinookTableColumns()
{
    ServerConnection sc = new ServerConnection(@".\SQLEXPRESS");
    try
    {
        sc.Connect();
        Server server = new Server(sc);

        // Load IsSystemObject up front so checking it doesn't cost an extra round trip per table
        server.SetDefaultInitFields(typeof(Table), "IsSystemObject");
        Database chinookDb = server.Databases["chinook"];

        foreach (Table table in chinookDb.Tables)
        {
            if (table.IsSystemObject)
                continue;

            foreach (Column col in table.Columns)
            {
                Console.WriteLine("{0} {1} {2}", table.Name, col.Name, col.DataType.Name);
            }
        }
    }
    finally
    {
        // Always release the connection, even if enumeration fails
        sc.Disconnect();
    }
}

 

Querying vendor-specific system tables.

Vendor lock-in at its finest, and typically unsupported at that. How useful this is depends on what level of information is available and how much time you’re willing to spend figuring out what the various columns and values mean. Getting table, column, and data type names in a specific database in Sql Server Express 2005 might look like this:

use chinook
go

SELECT  T.name as TableName, 
        C.name as ColumnName,
        TY.name as DataType
FROM 
    sys.tables T 
    INNER JOIN sys.columns C on T.object_id = C.object_id
    INNER JOIN sys.types TY on C.user_type_id = TY.user_type_id
ORDER BY T.name, C.column_id
 
Querying standard metadata tables.

Emphasis on ‘standard’, because there really isn’t one. The SQL standard (as of SQL-92) does specify the INFORMATION_SCHEMA views, which expose very basic metadata about the tables, views, columns, procedures, functions, parameters, constraints, etc. of a relational database. Support for them depends on how much the vendor cares about doing so. Sql Server and MySql both have pretty strong support. But even then, the spec is very loose, and while simple queries (say, getting all table and column names) might be portable, more complex ones likely won’t be. There are other discrepancies as well, for instance the terminology around ‘schema’ and ‘catalog’. In MySql terms, a schema is analogous to a Sql Server database, and this carries down into their implementations of these metadata tables: Sql Server reports the database name in TABLE_CATALOG, while MySql reports it in TABLE_SCHEMA. Back to the example of tables, columns, and data types in the chinook database:

Sql Server: 
use chinook 
go 

SELECT 
    T.TABLE_NAME as TableName,    
    C.COLUMN_NAME as ColumnName, 
    C.DATA_TYPE as DataType 
FROM 
    INFORMATION_SCHEMA.TABLES T 
    INNER JOIN INFORMATION_SCHEMA.COLUMNS C ON 
        T.TABLE_NAME = C.TABLE_NAME AND 
        T.TABLE_SCHEMA = C.TABLE_SCHEMA AND 
        T.TABLE_CATALOG = C.TABLE_CATALOG
WHERE T.TABLE_TYPE = 'BASE TABLE'

 

MySql: 
SELECT 
    T.TABLE_NAME as TableName, 
    C.COLUMN_NAME as ColumnName, 
    C.DATA_TYPE as DataType 
FROM 
    INFORMATION_SCHEMA.TABLES T 
    INNER JOIN INFORMATION_SCHEMA.COLUMNS C ON 
        T.TABLE_NAME = C.TABLE_NAME AND 
        T.TABLE_SCHEMA = C.TABLE_SCHEMA 
WHERE T.TABLE_SCHEMA = 'chinook' AND T.TABLE_TYPE = 'BASE TABLE';
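To make the schema/catalog difference concrete, here’s a minimal C# sketch that builds one version of the query above per vendor and only swaps the column that holds the database name. The MetadataVendor enum, the InformationSchemaQueries class, and the @database parameter name are hypothetical, made up for illustration; the caller is assumed to bind @database through the provider’s own parameter API.

```csharp
// Hypothetical names throughout: MetadataVendor, InformationSchemaQueries, @database.
public enum MetadataVendor
{
    SqlServer,
    MySql
}

public static class InformationSchemaQueries
{
    // Builds the tables/columns/data types query for one database.
    // Sql Server puts the database name in TABLE_CATALOG (and INFORMATION_SCHEMA only covers
    // the current database), while MySql puts it in TABLE_SCHEMA (its TABLE_CATALOG is always 'def').
    public static string TablesAndColumns(MetadataVendor vendor)
    {
        string databaseColumn = vendor == MetadataVendor.SqlServer ? "TABLE_CATALOG" : "TABLE_SCHEMA";

        return
            "SELECT T.TABLE_NAME AS TableName, C.COLUMN_NAME AS ColumnName, C.DATA_TYPE AS DataType " +
            "FROM INFORMATION_SCHEMA.TABLES T " +
            "INNER JOIN INFORMATION_SCHEMA.COLUMNS C ON " +
            "T.TABLE_CATALOG = C.TABLE_CATALOG AND T.TABLE_SCHEMA = C.TABLE_SCHEMA AND T.TABLE_NAME = C.TABLE_NAME " +
            // Only base tables; INFORMATION_SCHEMA.TABLES also lists views
            "WHERE T.TABLE_TYPE = 'BASE TABLE' AND T." + databaseColumn + " = @database " +
            "ORDER BY T.TABLE_NAME, C.ORDINAL_POSITION";
    }
}
```

The join itself is identical for both vendors; only the WHERE clause moves, which is about as far as the portability of these views goes for anything non-trivial.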
 
Using the schema-related functionality in Ado.Net 2.0.

Introduced in .Net 2.0, the DbConnection base class has a method with a few overloads called GetSchema(). It is meant to give implementers of Ado.Net providers a way to expose metadata about the underlying data store. Since it isn’t strongly typed and simply returns DataTables, what information is provided is entirely up to the implementer. Calling the parameterless GetSchema() returns the MetaDataCollections collection: a DataTable describing which metadata is available, with a string column called CollectionName whose values you can pass to subsequent calls to GetSchema(string) to get that specific chunk of metadata. There’s a decent write-up of what’s available in the native Ado.Net providers located here. Here’s an extension method that gets all the metadata into one DataSet for further manipulation; it should work for the built-in Ado.Net providers, as well as MySql.Data.MySqlClient using the latest MySql Connector/Net:

using System.Collections.Generic;
using System.Data;
using System.Data.Common;
using System.Linq;

namespace System.Data
{
    public static class DbConnectionExtensions
    {
        public static DataSet GetSchemaSet(this DbConnection connection)
        {
            DataSet result = new DataSet();

            DataTable collections = connection.GetSchema();

            List<string> availableCollectionNames = (from row in collections.AsEnumerable()
                                                     select row.Field<string>("CollectionName"))
                                                     .Distinct()
                                                     .ToList();

            foreach (string collectionName in availableCollectionNames)
            {
                try
                {
                    DataTable schemaTable = connection.GetSchema(collectionName);
                    result.Tables.Add(schemaTable);
                }
                catch
                {
                    // This is bad form and shouldn't be necessary, but SqlClient seems to offer Sql 2008
                    // metadata for Sql 2005 connections, which results in unneeded exceptions
                }

            return result;
        }
    }
}
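Back to the running chinook example: once everything is in one DataSet, the table/column/data type listing falls out of the "Columns" collection. This is a sketch, assuming the returned tables are named after their collections and use the TABLE_NAME, COLUMN_NAME, and DATA_TYPE column names that SqlClient and MySql.Data use; other providers may name things differently, so check the actual columns first. SchemaSetReaders and DescribeColumns are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Data;

public static class SchemaSetReaders
{
    // Produces "table column datatype" lines from the "Columns" table of a DataSet built by
    // GetSchemaSet(). Assumes the SqlClient/MySql.Data column names TABLE_NAME, COLUMN_NAME, DATA_TYPE.
    public static List<string> DescribeColumns(DataSet schemaSet)
    {
        var lines = new List<string>();

        // DataTableCollection's string indexer returns null when no table has that name
        DataTable columns = schemaSet.Tables["Columns"];
        if (columns == null)
            return lines;

        foreach (DataRow row in columns.Rows)
        {
            lines.Add(string.Format("{0} {1} {2}",
                row["TABLE_NAME"], row["COLUMN_NAME"], row["DATA_TYPE"]));
        }
        return lines;
    }
}
```

Usage would be along the lines of SchemaSetReaders.DescribeColumns(connection.GetSchemaSet()), which mirrors the output of the SMO example without tying you to Sql Server.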

The other mechanism for getting metadata is the GetSchemaTable() method on the System.Data.IDataReader interface, which provides limited metadata about the columns in a particular IDataReader instance, typically one obtained from a call to DbCommand.ExecuteReader(). As you might have guessed, GetSchemaTable() also returns a DataTable, this time with structural information only about the columns of the reader itself. This, combined with DbDataReader.GetName(int) and DbDataReader.GetDataTypeName(int), can give some pretty useful information about the table and column structure returned from a select statement. That being said, unless you’re reverse-engineering result sets from stored procedures, you’ll likely find the DbConnection.GetSchema() functionality much more effective.
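As a rough illustration, the helper below combines those three calls into one line per column. It’s written against IDataReader, so it should apply to any provider’s reader; the class and method names are hypothetical, and which columns GetSchemaTable() actually fills in (AllowDBNull here) varies by provider. For a quick check without a database, a DataTableReader from DataTable.CreateDataReader() works as a stand-in.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.Common;

public static class ReaderSchema
{
    // Describes each column of a reader as "name type allowsNull", combining GetName(int),
    // GetDataTypeName(int) and the AllowDBNull column of GetSchemaTable() when present.
    public static List<string> DescribeColumns(IDataReader reader)
    {
        var result = new List<string>();
        DataTable schema = reader.GetSchemaTable();
        bool hasAllowNull = schema.Columns.Contains(SchemaTableColumn.AllowDBNull);

        for (int i = 0; i < reader.FieldCount; i++)
        {
            // Schema table rows are normally in column-ordinal order, one row per column
            DataRow columnInfo = schema.Rows[i];
            object allowNull = hasAllowNull ? columnInfo[SchemaTableColumn.AllowDBNull] : "?";

            result.Add(string.Format("{0} {1} {2}",
                reader.GetName(i), reader.GetDataTypeName(i), allowNull));
        }
        return result;
    }
}
```

Note that a DataTableReader reports .Net type names (Int32, String) from GetDataTypeName(int), whereas a SqlDataReader reports the Sql Server type names (int, nvarchar), which is exactly the kind of cross-provider inconsistency this post is about.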

If you’re using the System.Data.OleDb provider, there’s also the GetOleDbSchemaTable method on the OleDbConnection class, which provides its own metadata about the underlying database. It works a little differently: it takes a System.Guid identifying the type of metadata being requested (each one is a static field on the System.Data.OleDb.OleDbSchemaGuid class), plus an array of restriction values.

using System.Collections.Generic;
using System.Data.OleDb;
using System.Linq;
using System.Reflection;

namespace System.Data
{
    public static class OleDbConnectionExtensions
    {
        public static DataSet GetOleDbSchemaSet(this OleDbConnection connection)
        {
            DataSet result = new DataSet();
            List<FieldInfo> guidMembers = typeof(OleDbSchemaGuid).GetFields(BindingFlags.Static | BindingFlags.Public).ToList();

            foreach (FieldInfo field in guidMembers)
            {
                if (field.FieldType == typeof(Guid))
                {
                    Guid val = (Guid)field.GetValue(null);

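                    try
                    {
                        DataTable schemaTable = connection.GetOleDbSchemaTable(val, new object[] { });
                        // Name each table after its OleDbSchemaGuid field (e.g. "Tables", "Columns") so names don't collide
                        schemaTable.TableName = field.Name;
                        result.Tables.Add(schemaTable);
                    }
                    catch { } // unfortunately, not every schema guid is supported by every OleDb provider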
                }
            }

            return result;
        }
    }
}

Oddities and Errata

Under .NET 3.5 (maybe SP1 only), if you call GetSchema() on a System.Data.SqlClient.SqlConnection connected to Sql Server 2005, you’ll notice a row indicating that a metadata collection called StructuredTypeMembers is available. However, this collection is Sql Server 2008-specific and will throw an exception if you call GetSchema("StructuredTypeMembers") on that same Sql 2005 connection. Why Microsoft exposes it at all in calls to GetSchema() on non-2008 connections, who knows. They already have a precedent for exposing Sql 2008-specific metadata depending on the 2005/2008 server version, as indicated here, so it seems pretty stupid on their part. That’s the reason for the empty catch block in the GetSchemaSet code above.
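If you’d rather not swallow every exception, one alternative is to skip the collection up front based on the server version. This is a minimal sketch, assuming SqlConnection.ServerVersion’s "09.00.xxxx"-style string (Sql Server 2005 reports major version 9, 2008 reports 10); SqlClientSchemaGuards and IsCollectionSupported are hypothetical names, and StructuredTypeMembers is the only 2008-only collection it knows about, since it’s the only one covered here.

```csharp
using System;
using System.Globalization;

public static class SqlClientSchemaGuards
{
    // Hypothetical guard: returns false for collections known to require Sql Server 2008
    // (major version 10) when the server reports an older version.
    // serverVersion is expected in the "09.00.4035" form of SqlConnection.ServerVersion.
    public static bool IsCollectionSupported(string collectionName, string serverVersion)
    {
        int major = int.Parse(serverVersion.Split('.')[0], CultureInfo.InvariantCulture);

        // Only collection known (from the issue described above) to be 2008-only
        bool requires2008 = string.Equals(collectionName, "StructuredTypeMembers",
                                          StringComparison.OrdinalIgnoreCase);

        return !requires2008 || major >= 10;
    }
}
```

Inside GetSchemaSet’s loop you’d check this for SqlClient connections, passing ((SqlConnection)connection).ServerVersion, and continue when it returns false; the catch can then stay narrow or go away for that case.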

Using MySql Connector/Net 5.2.5, the Foreign Key Columns schema collection is available but isn’t listed in the DataTable returned by MySqlConnection.GetSchema(). I notified Reggie Burnett, who maintains the connector, and he says it’s been fixed, so I’d expect to see that in a future release.

Getting the structure of result sets from stored procedures or functions is even more difficult than any of the basic table/column structure stuff I’ve mentioned. With Sql Server, you can issue SET FMTONLY ON before the stored procedure call and SET FMTONLY OFF afterwards to get an empty result set from which you can derive the schema without executing against live data. However, to do that in code, you’d have to derive the procedure’s parameters by some other means, like GetSchema(), and build a statement block with the FMTONLY lines surrounding the procedure call. Oh, and none of this works if the stored procedure uses temporary tables to return that result set. And again, it’s Sql Server-specific. For MySql, the only way I’ve figured out is to:

  1. Derive the parameter information using MySqlConnection.GetSchema("Procedure Parameters").
  2. Construct a MySqlCommand with its command text set to the procedure name and its CommandType set to CommandType.StoredProcedure.
  3. Add phony parameters using the previously derived information and set their values to DBNull.Value.
  4. Execute the command via ExecuteReader().
  5. Reverse-engineer the schema using GetName(int) and GetDataTypeName(int) on the MySqlDataReader, in tandem with GetSchemaTable(). Theoretically, if the information in the schema table indicates the column came from a physical table, one could go to the source and look at the values of MySqlConnection.GetSchema("Columns") for the source column definitions.
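Here’s a rough C# sketch of both routes: a batch builder for the Sql Server FMTONLY approach, and the MySql steps above written against the provider-neutral DbConnection/DbCommand classes. All names are hypothetical. GetParameterNames assumes the "Procedure Parameters" table uses INFORMATION_SCHEMA.PARAMETERS-style column names (SPECIFIC_NAME, ORDINAL_POSITION, PARAMETER_NAME), which you should verify against your connector version, and DescribeResult really executes the procedure, so any side effects will happen.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Common;
using System.Linq;

public static class ProcedureResultSchema
{
    // Sql Server route: wrap the call in SET FMTONLY so only column metadata comes back.
    // Every parameter is passed as NULL; names are assumed trusted and already quoted.
    public static string BuildFmtOnlyBatch(string procedureName, IEnumerable<string> parameterNames)
    {
        string args = string.Join(", ", parameterNames.Select(p => p + " = NULL"));
        return "SET FMTONLY ON; EXEC " + procedureName + (args.Length > 0 ? " " + args : "") + "; SET FMTONLY OFF;";
    }

    // MySql step 1 (post-processing): ordered parameter names for one procedure, from the
    // DataTable returned by GetSchema("Procedure Parameters"). Column names are an assumption.
    public static List<string> GetParameterNames(DataTable procedureParameters, string procedureName)
    {
        return procedureParameters.Rows.Cast<DataRow>()
            .Where(r => string.Equals(Convert.ToString(r["SPECIFIC_NAME"]), procedureName, StringComparison.OrdinalIgnoreCase))
            .Where(r => Convert.ToInt64(r["ORDINAL_POSITION"]) > 0) // position 0 would be a function's return value
            .OrderBy(r => Convert.ToInt64(r["ORDINAL_POSITION"]))
            .Select(r => Convert.ToString(r["PARAMETER_NAME"]))
            .ToList();
    }

    // MySql steps 2-5: call the procedure with DBNull for every parameter and read back
    // "name type" pairs. Parameter names are passed through as-is; check whether your
    // connector expects a prefix such as '@'.
    public static List<string> DescribeResult(DbConnection connection, string procedureName, IEnumerable<string> parameterNames)
    {
        using (DbCommand command = connection.CreateCommand())
        {
            command.CommandText = procedureName;
            command.CommandType = CommandType.StoredProcedure;

            foreach (string name in parameterNames)
            {
                DbParameter parameter = command.CreateParameter();
                parameter.ParameterName = name;
                parameter.Value = DBNull.Value; // phony value, per step 3
                command.Parameters.Add(parameter);
            }

            var columns = new List<string>();
            using (DbDataReader reader = command.ExecuteReader())
            {
                for (int i = 0; i < reader.FieldCount; i++)
                    columns.Add(reader.GetName(i) + " " + reader.GetDataTypeName(i));
            }
            return columns;
        }
    }
}
```

GetSchemaTable() could be layered onto DescribeResult the same way as in the IDataReader helper earlier if you want to chase columns back to their source tables.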

That’s likely how I’ll be doing it in the NDeavor providers for platforms that support stored procedures, like Oracle and Postgres. It’s less than ideal, but if you’re still following this, you’ll realize that’s my point!

Adding It Up

There’s a pattern to getting at your metadata, and that pattern is that it’s a pain in the ass. I’m hoping NDeavor will provide an easy means of getting database schema out of various platforms for the purposes of artifact generation. But even with this much inconsistency in metadata across platforms, I’m not planning on writing each NDeavor provider as its own one-off implementation. I have a way forward using a lower-level means of getting at this stuff that solves a lot of the cross-cutting discrepancies, and that will be in part 2.


 
Categories: Ado.Net | database | metadata | MySql | NDeavor | Sql Server