
Archive for the ‘Uncategorized’ Category

Grasp, A .NET Analysis Engine: GitHub

April 19th, 2012 No comments

In part 9 we wrapped up the initial implementation of Grasp, focusing on the specification, compilation, and execution of calculations. I have ambitious plans for Grasp from here but this is a natural stopping point for the introductory series.

In the meantime, the source is on GitHub, including a test suite that provides some insight into consuming the API. There is also a NuGet package containing ready-to-reference assemblies. All future development will be available via these channels.

Thanks for reading and happy Grasping!


Grasp, A .NET Analysis Engine – Part 9: Dependency Sorting

March 17th, 2012 4 comments

In part 8, we completed the GraspCompiler class and set ourselves up to sort calculations in the order required by their dependencies. In this post, we will implement the sorting.

Interdependent Calculations

Here is an example of a set of calculations with dependencies between them:

[Figure: dependencies between calculations A, B, and C]

A, B, and C are output variables for calculations, and the arrows denote dependencies. They extend from the calculation which needs the data to the calculation which produces it. Analyzing this setup, we see that:

  • B has no dependencies on any calculations
  • C depends on B
  • A depends on both B and C

In order to get correct results, we must execute these calculations such that variables are available before they are needed. In order to calculate A, we first need to calculate B and C. In order to calculate C, we must first calculate B. This means the order in which we should execute the calculations is B, then C, then A.

We can determine this order by applying a little graph theory to our calculations. This might sound imposing, but we are going to limit ourselves to very basic concepts and one well-documented algorithm. Specifically, we are going to treat the setup above as a directed acyclic graph, where each node represents a calculation and the arrows represent dependencies.

The first step is to create a way to manipulate the structure above, known as a graph. We will represent each node in the graph, then the graph itself. Once we have that data structure in place, we will use a straightforward algorithm called a topological sort to order the nodes such that calculations which produce data occur before calculations which need that data.

Nodes

Each node in a dependency graph represents a single calculation and all of its dependencies. The graph above has these nodes:

  • A {B, C}
  • B {}
  • C {B}

This is a more formal statement of the same observations we made before. We can create a class to represent this data structure:

internal sealed class DependencyNode
{
  internal DependencyNode(
    CalculationSchema calculation,
    IEnumerable<CalculationSchema> dependencies)
  {
    Calculation = calculation;
    Dependencies = dependencies.ToList().AsReadOnly();
  }

  internal CalculationSchema Calculation { get; private set; }

  internal ReadOnlyCollection<CalculationSchema> Dependencies { get; private set; }
}

The next step is to create a set of these nodes from a set of calculations. To do this, we need to pair every calculation with every other calculation and determine if there is a dependency between each pair. Our example would produce these comparisons:

Pairing    Dependency?
A –> B     Yes
A –> C     Yes
B –> A     No
B –> C     No
C –> A     No
C –> B     Yes

We check both directions of each pairing because eventually we will guard against cycles. For example, if A depends on B, B depends on C, and C depends on A, there is no way to execute that set of calculations due to an infinite loop. Any graph with a cycle will result in a compilation error from Grasp.

To create the nodes, we add a method to the DependencyAnalyzer class defined in part 8:

private static IEnumerable<DependencyNode>
  GetNodes(IEnumerable<CalculationSchema> calculations)
{
  return
    from calculation in calculations
    let dependencies =
      from possibleDependency in calculations
      where possibleDependency != calculation
      where IsDependency(calculation, possibleDependency)
      select possibleDependency
    select new DependencyNode(calculation, dependencies);
}

We create a LINQ query which selects all of the calculations in the sequence, then for each one does the same thing but filters out the one we are already considering. This gives us all calculation pairs in both directions. We check the original calculation against each possible dependency to determine if there is an actual dependency between them:

private static bool IsDependency(
  CalculationSchema calculation,
  CalculationSchema possibleDependency)
{
  return calculation.Variables.Contains(possibleDependency.OutputVariable);
}

This is simply a matter of checking whether the possible dependency’s output variable is referenced by the calculation. This repeated Contains call is why we made the CalculationSchema.Variables property a HashSet<> instead of a List<>.
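
The pairing can be tried out in isolation with a minimal sketch. The Calc class and the PairingSketch.GetDependencies method are hypothetical stand-ins invented for this illustration (they are not part of Grasp); each calculation is reduced to an output name and the set of names it references:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for CalculationSchema: an output name plus the names it references
public sealed class Calc
{
  public Calc(string output, params string[] variables)
  {
    Output = output;
    Variables = new HashSet<string>(variables);
  }

  public string Output { get; private set; }

  public HashSet<string> Variables { get; private set; }
}

public static class PairingSketch
{
  // Pairs every calculation with every other one and keeps the pairs with a dependency
  public static Dictionary<string, List<string>> GetDependencies(IEnumerable<Calc> calculations)
  {
    var all = calculations.ToList();

    return all.ToDictionary(
      calculation => calculation.Output,
      calculation => all
        .Where(possible => possible != calculation)
        .Where(possible => calculation.Variables.Contains(possible.Output))
        .Select(possible => possible.Output)
        .ToList());
  }
}
```

Passing new Calc("A", "B", "C"), new Calc("B"), and new Calc("C", "B") should pair A with B and C, B with nothing, and C with B, matching the table above.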

Graph

Once we have all the nodes in hand, we can represent the entire graph:

internal sealed class DependencyGraph
{
  private readonly Dictionary<CalculationSchema, DependencyNode> _nodes;

  internal DependencyGraph(IEnumerable<DependencyNode> nodes)
  {
    _nodes = nodes.ToDictionary(node => node.Calculation);
  }

  internal IEnumerable<CalculationSchema> OrderCalculations()
  {
    return new TopologicalSort(this).SortNodes().Select(node => node.Calculation);
  }
}

We create a dictionary which associates each node with its calculation. This will be important later when we want to look up the nodes on which a node depends, as there is no direct association between nodes; the DependencyNode.Dependencies property is expressed in terms of calculations, not nodes.

The OrderCalculations method is the entry point of our graph class. It creates a topological sort (discussed below), sorts the nodes in the graph, then grabs the calculation from each. The result is the ordered set of calculations, which we use to implement the DependencyAnalyzer.OrderByDependency method we left unfinished in part 8:

internal static IEnumerable<CalculationSchema>
  OrderByDependency(this IEnumerable<CalculationSchema> calculations)
{
  return GetGraph(calculations).OrderCalculations();
}

private static DependencyGraph
  GetGraph(IEnumerable<CalculationSchema> calculations)
{
  return new DependencyGraph(GetNodes(calculations));
}

This creates a graph by calling the GetNodes method we defined above, then returns the ordered set of calculations to be compiled by GraspCompiler. This set of methods is simply how we weave the data and algorithm together; the truly interesting logic is in the sorting itself.

Topological Sorting

A topological sort is a way of ordering a set of nodes such that every node appears after all of the nodes on which it depends. Applying this sort to our example would yield the calculations in the order B, C, A.

The general idea is to start from each node in the graph and walk through every path described by its dependencies. Each time we visit a node we haven’t seen before, we walk through its dependencies as well. Only after visiting all of a node’s dependencies do we add it to a list that contains the sorted nodes.

The end result is that we find all of the leaf nodes first (those without any dependencies) and add those to the list initially. After that, the nodes that depend on the leaf nodes get added to the list, then the ones that depend on those, etc., until we have added all the nodes to the list. As we visit leaf nodes first and work our way back from there, this is a depth-first search.

Let’s apply this to our example:

  • A –> {B, C}
  • B –> {}
  • C –> {B}

Root  Dependency  Dependency  Visit                                     Result
A                             First visit – visit dependencies
      B                       First visit – no dependencies to visit    Add B to list
      C                       First visit – visit dependencies
                  B           Already visited                           Add C to list
                                                                        Add A to list
B                             Already visited
C                             Already visited

After the algorithm runs, the list contains the nodes B, C, and A, as expected.
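
The walk above can be sketched in a few lines. This is a simplified illustration, not Grasp's implementation: SortSketch is a hypothetical name, nodes are plain strings, and the graph is assumed to be free of cycles with every dependency also present as a key:

```csharp
using System;
using System.Collections.Generic;

public static class SortSketch
{
  // Returns the keys of the graph ordered so that every name follows its dependencies.
  // Assumes the graph has no cycles and that every dependency is also a key.
  public static List<string> Sort(IDictionary<string, string[]> graph)
  {
    var visited = new HashSet<string>();
    var sorted = new List<string>();

    foreach(var root in graph.Keys)
    {
      Visit(root, graph, visited, sorted);
    }

    return sorted;
  }

  private static void Visit(
    string node,
    IDictionary<string, string[]> graph,
    HashSet<string> visited,
    List<string> sorted)
  {
    // HashSet.Add returns false if the node was already visited
    if(!visited.Add(node))
    {
      return;
    }

    foreach(var dependency in graph[node])
    {
      Visit(dependency, graph, visited, sorted);
    }

    // All dependencies are in the list, so the node itself can follow them
    sorted.Add(node);
  }
}
```

For the graph A –> {B, C}, B –> {}, C –> {B}, the result is B, C, A no matter which root is visited first, because that is the only order which satisfies every dependency.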

Detecting Cycles

A cycle is a set of nodes which are all interdependent:

[Figure: circular dependencies]

There is no valid topological sort for a graph with even a single cycle. The reason is clear: where would we start, and where would we end? This is a form of an infinite loop which would be useful to detect. Grasp should make it easy to find and fix calculation cycles.

We can modify the topological sort to encompass this new requirement. The trick is to keep track of the chain of nodes we are in the middle of visiting beneath a root node; if we arrive at a node that is already in that chain, we have identified a cycle. (A root node is one which we are visiting on its own, outside the context of any other node. In the example above, the leftmost column contains the root nodes.)

Let’s apply this to our circular example:

  • A {C}
  • B {A}
  • C {B}

A           First visit – visit dependencies
  C         First visit – visit dependencies
    B       First visit – visit dependencies
      A     Already visited in context of A – cycle detected

When we see A again, we know that some part of the graph is cyclical. We stop sorting nodes and raise an error containing the repeated node and all those above it. This gives schema designers plenty of debugging information.
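
As a rough sketch of the idea, again with plain strings and a hypothetical CycleSketch class rather than Grasp's types: a node reached twice by different routes (B in the first example) is a shared dependency rather than a cycle, so the sketch only remembers the nodes on the path currently being walked and forgets each one when its visit completes.

```csharp
using System;
using System.Collections.Generic;

public static class CycleSketch
{
  // Returns the path from a root node to the repeated node (for example A, C, B, A),
  // or null if the graph has no cycle. Assumes every dependency is also a key.
  public static List<string> FindCycle(IDictionary<string, string[]> graph)
  {
    var finished = new HashSet<string>();
    var path = new List<string>();

    foreach(var root in graph.Keys)
    {
      if(Visit(root, graph, finished, path))
      {
        return path;
      }
    }

    return null;
  }

  private static bool Visit(
    string node,
    IDictionary<string, string[]> graph,
    HashSet<string> finished,
    List<string> path)
  {
    if(path.Contains(node))
    {
      // The node is still being visited further up the path: a cycle
      path.Add(node);
      return true;
    }

    if(finished.Contains(node))
    {
      // Fully explored earlier; reaching it again is a shared dependency, not a cycle
      return false;
    }

    path.Add(node);

    foreach(var dependency in graph[node])
    {
      if(Visit(dependency, graph, finished, path))
      {
        return true;
      }
    }

    // The visit is complete, so the node leaves the current path
    path.RemoveAt(path.Count - 1);
    finished.Add(node);
    return false;
  }
}
```

For the circular example the returned path starts and ends with the same node; for the first example (A, B, and C without a cycle) the method returns null.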

Visit History

We have identified two pieces of context while sorting nodes:

  • All nodes we have visited
  • The nodes we are in the middle of visiting beneath the current root node

We can pair this data and logic via a class representing the visit history, nesting it privately within DependencyGraph:

private sealed class VisitHistory
{
  private readonly HashSet<DependencyNode> _visitedNodes = new HashSet<DependencyNode>();
  private HashSet<DependencyNode> _visitedNodesFromRoot;
  private List<DependencyNode> _visitedNodesFromRootInOrder;

  internal void OnVisitingRootNode()
  {
    _visitedNodesFromRoot = new HashSet<DependencyNode>();
    _visitedNodesFromRootInOrder = new List<DependencyNode>();
  }

  internal bool OnVisitingNode(DependencyNode node)
  {
    if(_visitedNodesFromRoot.Contains(node))
    {
      throw new CalculationCycleException(_visitedNodesFromRootInOrder, node);
    }

    var firstVisit = !_visitedNodes.Contains(node);

    if(firstVisit)
    {
      _visitedNodes.Add(node);

      _visitedNodesFromRoot.Add(node);
      _visitedNodesFromRootInOrder.Add(node);
    }

    return firstVisit;
  }

  internal void OnVisitedNode(DependencyNode node)
  {
    // The node and its dependencies are sorted; it is no longer part of the current path
    _visitedNodesFromRoot.Remove(node);
    _visitedNodesFromRootInOrder.Remove(node);
  }
}

The first method signals we are starting a visit of a root node. We create a new set to track the nodes we are visiting underneath it.

The second method signals that we are visiting some node in the graph. The first thing we do is check whether we are already in the middle of visiting the same node in the context of the current root node; if so, we have detected a cycle and stop the sort by throwing an exception.

After that, we determine if we have seen the node before. If not, we track it in the overall visited node set as well as the set of nodes under the current root node. We also keep an ordered list so we can provide the exact cycle sequence (sets have an undefined order). Finally, we return whether this is the first visit, since we will use that to determine whether we visit the node’s dependencies.

The third method signals that we have finished visiting a node and all of its dependencies. We remove it from the root node’s set and list so that reaching it again through a different path is not mistaken for a cycle. In our first example, B is reached from both A and C, yet there is no cycle.

Algorithm

The TopologicalSort class is also private to the DependencyGraph class. It exposes the SortNodes method we used to implement the DependencyGraph.OrderCalculations method above:

private sealed class TopologicalSort
{
  private readonly VisitHistory _visitHistory = new VisitHistory();
  private readonly List<DependencyNode> _sortedNodes = new List<DependencyNode>();
  private readonly DependencyGraph _graph;

  internal TopologicalSort(DependencyGraph graph)
  {
    _graph = graph;
  }

  internal IEnumerable<DependencyNode> SortNodes()
  {
    foreach(var rootNode in _graph.GetNodes())
    {
      _visitHistory.OnVisitingRootNode();

      VisitNode(rootNode);
    }

    return _sortedNodes;
  }

  private void VisitNode(DependencyNode node)
  {
    var firstVisit = _visitHistory.OnVisitingNode(node);

    if(firstVisit)
    {
      foreach(var dependencyNode in _graph.GetDependencyNodes(node))
      {
        VisitNode(dependencyNode);
      }

      _sortedNodes.Add(node);

      _visitHistory.OnVisitedNode(node);
    }
  }
}

We create a visit history to track visits during the sort, and a list to contain the sorted nodes. The SortNodes method implements the algorithm we described above: it iterates through all of the nodes in the graph, signals the history for each of them, and visits them. It simply returns the list of sorted nodes when done.

The VisitNode method is the workhorse. It first signals to the history that it is visiting a node; it receives in response a flag indicating whether the node is being visited for the first time. If so, it gets the nodes for each of the current node’s dependencies and visits those as well. Only after all of the dependencies are visited does it add the current node to the list of sorted nodes and signal to the history that the visit is complete. This recursion implements the depth-first search as described earlier.

TopologicalSort uses two methods we haven’t yet defined on DependencyGraph: GetNodes and GetDependencyNodes. These are fairly straightforward:

private IEnumerable<DependencyNode> GetNodes()
{
  return _nodes.Values;
}

private IEnumerable<DependencyNode> GetDependencyNodes(DependencyNode node)
{
  return node.Dependencies.Select(dependency => _nodes[dependency]);
}

GetNodes gets all of the values in the calculation->node dictionary. GetDependencyNodes translates the values in the Dependencies property, which are calculations, into the nodes associated with those calculations.

Summary

We identified a data structure that can represent dependencies between calculations: the graph. We turned a set of calculations into a set of nodes and sorted them according to the well-known topological sort algorithm. We also detected cycles and reported all relevant information so Grasp users can debug their schemas. We ultimately produced the compiled calculations, in order of dependency, that GraspRuntime applies to a set of variables.

That’s it! We have seen everything that goes into defining, compiling, and executing a set of calculations on a data set. This is the starting point for a family of application types which otherwise require lots of custom code; it frees developers to worry about business problems instead of the mechanics of analysis.

I plan on putting the entire codebase on GitHub and posting the URI soon. I also want to create some usage examples and discuss future possibilities (an example: frame a UI as a set of variables, validation rules as a set of boolean-valued calculations, and Grasp would do nicely at the core of a widely-applicable validation system.)

It promises to be a fun ride. And if you have read this far, thanks for indulging me!


Grasp, A .NET Analysis Engine – Part 8: Calculation Dependencies

March 11th, 2012 No comments

In part 7, we compiled individual calculations into executable code in the form of a delegate. In this post, we will take the next step and compile all of the calculations associated with a GraspSchema.

Cascades

The key difference in compiling multiple calculations is that there may be dependencies between them. A calculation can reference variables, and since variables may represent the results of other calculations, it is possible to have a cascading effect where the output of one calculation turns into the input of another. In this scenario, we need to execute the calculations in the right order to guarantee the correct result.

We can extend our OperatingProfit example to demonstrate this. Let’s say we want to calculate NetProfit, which applies a known tax rate to the OperatingProfit figure. We could use a set of calculations that look like this (namespaces omitted for clarity):

OperatingProfit = TotalIncome – TotalExpenses

NetProfit = OperatingProfit * (1 – TaxRate)

Here, OperatingProfit obviously needs to be available before NetProfit is calculated. We call a cross-calculation reference like this a dependency; we say that NetProfit is dependent upon OperatingProfit. Compiling a set of calculations requires us to identify these dependencies and order the calculations so they are all satisfied.
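
To make the ordering concrete, here is a small sketch that applies the two calculations by hand. The CascadeSketch class and its use of a dictionary of named values are assumptions made for illustration; Grasp itself stores values in a GraspRuntime:

```csharp
using System;
using System.Collections.Generic;

public static class CascadeSketch
{
  // Applies the two calculations in dependency order and returns the resulting values
  public static Dictionary<string, decimal> Run(
    decimal totalIncome,
    decimal totalExpenses,
    decimal taxRate)
  {
    var values = new Dictionary<string, decimal>
    {
      { "TotalIncome", totalIncome },
      { "TotalExpenses", totalExpenses },
      { "TaxRate", taxRate }
    };

    // OperatingProfit must be stored before NetProfit reads it
    values["OperatingProfit"] = values["TotalIncome"] - values["TotalExpenses"];
    values["NetProfit"] = values["OperatingProfit"] * (1 - values["TaxRate"]);

    return values;
  }
}
```

With an income of 1000, expenses of 600, and a tax rate of 0.25, OperatingProfit is 400 and NetProfit is 300. Swapping the two assignments would fail with a KeyNotFoundException, because NetProfit would read OperatingProfit before it exists.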

Completing the Compiler

In part 6, we left one piece of unfinished business: the GraspCompiler.Compile method. We took a detour in part 7 to lay the groundwork for compiling calculations; we can now complete the implementation of Compile:

internal GraspExecutable Compile()
{
  ValidateCalculations();

  return new GraspExecutable(_schema, GetCalculator());
}

We create an instance of GraspExecutable, defined in part 5, and provide it the instance of GraspSchema we are compiling. We also provide a calculator, which is what we call an instance of ICalculator. This second argument is the output of compiling the set of calculations associated with the schema.

The core method on which we build GetCalculator is an overload which takes a CalculationSchema, defined in part 6. This is where we use the CalculationCompiler class, defined in part 7, to create a function which applies a single calculation to a runtime:

private static ICalculator GetCalculator(CalculationSchema schema)
{
  return new CalculationCompiler().CompileCalculation(schema);
}

This visits all of the nodes in the calculation expression, replaces the variable nodes with calls that retrieve their values, and returns a function which applies the calculation to a runtime. This is the unit of a compiled GraspSchema.

The GetCalculator overload with no parameters is responsible for taking all of the calculations and producing a single calculator which applies them. The first thing we do is attempt to optimize a simple scenario: a schema with a single calculation, by definition, cannot have any dependencies. In this case, we can just create a calculator for it; otherwise, we need to create a calculator which applies a set of calculations:

private ICalculator GetCalculator()
{
  return _calculations.Count == 1
    ? GetCalculator(_calculations.Single())
    : GetCalculators();
}

The GetCalculators method produces an implementation of ICalculator which applies a set of calculators in order. We can use the CompositeCalculator class here, defined in part 4:

private ICalculator GetCalculators()
{
  return new CompositeCalculator(OrderCalculatorsByDependency());
}

It encapsulates the individual calculators we create for each calculation, ordered by dependency:

private IEnumerable<ICalculator> OrderCalculatorsByDependency()
{
  return _calculations.OrderByDependency().Select(GetCalculator);
}

We order the _calculations sequence, defined in part 6, by dependency, then for each one select its calculator using the GetCalculator method. This produces the sequence we pass to the CompositeCalculator. (The syntax works because the C# compiler can infer that the GetCalculator method has the signature Func<CalculationSchema, ICalculator> of the parameter expected by the Select method. This is simpler than writing out the equivalent lambda expression schema => GetCalculator(schema).)

OrderByDependency is an extension method which operates on a sequence of calculation schemas and returns the same thing. This is similar to the LINQ OrderBy methods, except there is no function parameter because we are encapsulating the sorting logic:

internal static class DependencyAnalyzer
{
  internal static IEnumerable<CalculationSchema>
    OrderByDependency(this IEnumerable<CalculationSchema> calculations)
  {
    // Next time
  }
}

This is the entry point to analyzing the dependencies between calculations. We are set up nicely to do the analysis, but it is a decent amount of code and deserves a post of its own.

Summary

We identified the concept of cross-calculation dependencies and determined that we must order the calculations so all variable values are available when needed. We also finished the implementation of GraspCompiler and set up a context in which we can perform the ordering.

Next time, we will complete the dependency analysis logic.

Continue to Part 9: Dependency Sorting


Grasp, A .NET Analysis Engine – Part 7: Compiling Calculations

March 4th, 2012 No comments

In part 6, we started to define the compilation process and determined how to find all variable references in a calculation’s expression tree. In this post, we will see how to generate executable code for a calculation in the form of a delegate.

Rewriting Calculation Expressions

As discussed in part 6, expression trees don’t inherently know what variable nodes mean, only Grasp does. The first step we need to take in compiling an expression is to transform variable references into something meaningful. For example, consider the example calculation from part 3 (namespaces omitted for clarity):

OperatingProfit = TotalIncome – TotalExpenses

Here, OperatingProfit is the output variable and TotalIncome – TotalExpenses is the expression tree that produces the result. It looks like this:

 

[Figure: calculation expression tree before the rewrite]

Our goal is to locate the nodes that represent TotalIncome and TotalExpenses and completely replace them with different nodes that represent how to access their values. The variables are merely placeholders for more complex logic. This is known as rewriting an expression tree.

We already know how to ask for variable values: the GraspRuntime.GetVariableValue method. Thus, given a parameter of type GraspRuntime, we simply need to call GetVariableValue and pass in the variable represented by the node. (We will worry about where we get the runtime parameter later; for now, assume we have one in scope.)

This means we will turn each variable node into the appropriate method call. The resulting code will be:

runtime.GetVariableValue(TotalIncome) – runtime.GetVariableValue(TotalExpenses)

The data structure that represents this expression looks like:

 

[Figure: calculation expression tree after the rewrite]

We replaced each VariableExpression with a MethodCallExpression that invokes the GetVariableValue method of the runtime parameter, packaging the corresponding variable in a constant and passing it as the single argument. We now have a data structure that represents fully-executable code and can be compiled to a delegate we can invoke.

Visiting Variables

In order to perform the rewrite, we will create another implementation of CalculationExpressionVisitor:

internal sealed class CalculationCompiler : CalculationExpressionVisitor
{
  internal FunctionCalculator CompileCalculation(CalculationSchema schema)
  {
    // Implemented below
  }
}

It encapsulates the transformation of a CalculationSchema (defined in part 6) to a FunctionCalculator (defined in part 4). In essence, it takes a calculation expression and compiles a Func<GraspRuntime, object> representing a method that takes a runtime parameter (for variable value lookups) and returns the calculated value.

The first step is to get a reference to GraspRuntime.GetVariableValue:

private static readonly MethodInfo _getVariableValueMethod =
  typeof(GraspRuntime).GetMethod(
    "GetVariableValue",
    BindingFlags.Public | BindingFlags.Instance);

We use reflection to get an instance of MethodInfo representing GetVariableValue, specifying that we want the public instance method of that name. We make the variable static so we only pay the reflection tax once per application domain, no matter how many instances of CalculationCompiler we create.

The next step is to define the parameter representing the runtime on which we make calls to GetVariableValue (using the Expression.Parameter factory method):

private readonly ParameterExpression _runtimeParameter =
  Expression.Parameter(typeof(GraspRuntime), "runtime");

With the GetVariableValue method and runtime parameter in hand, we can define a method which turns a VariableExpression into the corresponding method call (using the Expression.Call and Expression.Constant factory methods):

private Expression GetGetVariableValueCall(VariableExpression variableNode)
{
  return Expression.Call(
    _runtimeParameter,
    _getVariableValueMethod,
    Expression.Constant(variableNode.Variable));
}

Now we can override the VisitVariable method and define what happens whenever we see a VariableExpression in the tree:

protected override Expression VisitVariable(VariableExpression node)
{
  return Expression.Convert(GetGetVariableValueCall(node), node.Variable.Type);
}

We get the call to GetVariableValue for the variable, then cast the result to the variable’s type (using the Expression.Convert factory method). This is necessary because expression trees are type-safe, but GetVariableValue returns object. Luckily, we have easy access to the variable’s type.

Compiling the Rewritten Expression

We have defined the process of rewriting variable nodes as corresponding calls to GetVariableValue. This will produce expressions that look like the second figure above. Now we can implement the CompileCalculation method:

internal FunctionCalculator CompileCalculation(CalculationSchema schema)
{
  var body = schema.Expression;

  try
  {
    body = Visit(body);

    return new FunctionCalculator(schema.OutputVariable, CompileFunction(body));
  }
  catch(Exception ex)
  {
    throw new CalculationCompilationException(schema, body, ex);
  }
}

First, we ask the base class to visit the expression represented by the calculation. This will walk through the entire tree, rewriting variable nodes whenever they are encountered. Once we have the rewritten expression, we call the CompileFunction method, which turns it into a Func<GraspRuntime, object> delegate; this is the executable form of the calculation. We enclose the process in a try/catch so we can provide detailed error information in the case that a calculation’s expression is invalid.

Here is the lambda expression for our example in C# syntax:

runtime =>

  runtime.GetVariableValue(TotalIncome) – runtime.GetVariableValue(TotalExpenses)

This is the same expression we saw before, but now we have defined the runtime parameter. This is conceptually an inline method; it has a set of parameters and a body. Code defined as an expression tree must take the form of a method so we can invoke it. In expression trees, lambda expressions are what represent method definitions.

We can define a lambda expression with the Expression.Lambda factory method, passing in the body (the rewritten expression), the runtime parameter we defined earlier, and the type of delegate we want to create. Finally, we can call Compile to have .NET dynamically generate a method that executes the code represented by the expression tree:

private Func<GraspRuntime, object> CompileFunction(Expression body)
{
  if(body.Type != typeof(object))
  {
    body = Expression.Convert(body, typeof(object));
  }

  var lambda = Expression.Lambda<Func<GraspRuntime, object>>(
    body,
    _runtimeParameter);

  return lambda.Compile();
}

We do a little bookkeeping to ensure that the lambda body returns object instead of the variable’s type; for C# source code, the compiler would infer this for us, but since we are building an expression tree by hand, we need to be explicit.

What we have at the end of CompileFunction is a delegate that wraps a method created by .NET to execute exactly the code represented by the rewritten expression. The beauty of this system is that the delegate can contain code of arbitrary complexity; anything that represents a valid expression tree can be used to define a calculation. Combined with the ability to use variables anywhere within an expression tree, Grasp supports any conceivable logic that operates on a data set.
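
The rewrite and compile steps can be exercised end to end with a self-contained sketch. It makes two assumptions that differ from Grasp: SketchRuntime is a hypothetical stand-in for GraspRuntime that looks values up by name, and ordinary ParameterExpression nodes stand in for VariableExpression so the example needs nothing beyond System.Linq.Expressions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical stand-in for GraspRuntime: values are looked up by name
public sealed class SketchRuntime
{
  private readonly Dictionary<string, object> _values;

  public SketchRuntime(Dictionary<string, object> values)
  {
    _values = values;
  }

  public object GetVariableValue(string name)
  {
    return _values[name];
  }
}

// Rewrites every parameter node (standing in for VariableExpression) into
// (T) runtime.GetVariableValue("name"), then compiles the result
public sealed class RewriteSketch : ExpressionVisitor
{
  private static readonly MethodInfo _getVariableValueMethod =
    typeof(SketchRuntime).GetMethod("GetVariableValue");

  private readonly ParameterExpression _runtimeParameter =
    Expression.Parameter(typeof(SketchRuntime), "runtime");

  public Func<SketchRuntime, object> Compile(Expression body)
  {
    var rewritten = Visit(body);

    var lambda = Expression.Lambda<Func<SketchRuntime, object>>(
      Expression.Convert(rewritten, typeof(object)),
      _runtimeParameter);

    return lambda.Compile();
  }

  protected override Expression VisitParameter(ParameterExpression node)
  {
    var call = Expression.Call(
      _runtimeParameter,
      _getVariableValueMethod,
      Expression.Constant(node.Name));

    // GetVariableValue returns object, so convert back to the variable's type
    return Expression.Convert(call, node.Type);
  }
}
```

Building Expression.Subtract over two int parameters named TotalIncome and TotalExpenses, compiling it with new RewriteSketch().Compile(...), and invoking the delegate on a runtime holding 100 and 60 yields a boxed 40.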

Summary

We defined the transformation from variable nodes to nodes which access their values. We also created a visitor which performs the replacement and compiles the rewritten expression to executable code.

Next time, we will tackle a more gnarly problem: dependencies between calculations.

Continue to Part 8: Calculation Dependencies


Grasp, A .NET Analysis Engine – Part 6: Validating Calculations

March 3rd, 2012 No comments

In part 5, we saw how to create runtime instances by providing an initial set of values to an executable. In this post, we will look at the first step in creating an executable from a schema: validating that its calculations are semantically correct.

Foundation

The GraspCompiler class is the context in which validation and compilation takes place. Based on what we saw in part 5, we would expect it to look like this:

internal sealed class GraspCompiler
{
  private readonly GraspSchema _schema;

  internal GraspCompiler(GraspSchema schema)
  {
    _schema = schema;
  }

  internal GraspExecutable Compile()
  {
    // Not quite yet…
  }
}

It is internal because we don’t want to expose it as part of the public API (we do that through the GraspSchema.Compile method). It is sealed because Grasp has a single definition of compilation and is not intended for extension. It is an implementation detail. If in the future we decide it should be a base class, that decision will be easier because we did not expose it publicly. This is true of all classes involved in the compilation process.

The Compile method is blank for now. This is the basic skeleton, but before we flesh it out we need to lay some groundwork. Specifically, before we can compile a set of calculations, we must be able to compile a single calculation.

Variable References

The defining characteristic of a calculation expression is that it contains nodes which are instances of the VariableExpression class (defined in part 2). Expression trees have no idea what these nodes mean; we grafted them on to represent a concept that only Grasp understands. This means we are going to need to do something with them before we can turn expressions into executable code.

In order to do meaningful work with the variable nodes, we first need to find them. Expression trees are complex beasts; they can describe any .NET expression you can dream up, which might be a massive number of nodes. How do we locate variables in all of that?

Luckily, .NET gives us the ExpressionVisitor class. Its job is to sift through all nodes in an expression and give us a chance to inspect them. If a node has child nodes, it will sift through those as well. For example, an operator references expressions for its left and right operands, and a method call may reference expressions for its arguments. The knowledge of how to visit each kind of node and its children is baked into the ExpressionVisitor base class; all we need to do is derive from it and override its methods.

To add support for variables, we can extend ExpressionVisitor with a base class that adds a single method for visiting VariableExpression nodes:

internal abstract class CalculationExpressionVisitor : ExpressionVisitor
{
  public override Expression Visit(Expression node)
  {
    // Child nodes can be null (such as the target of a static method call)
    return node != null && node.NodeType == VariableExpression.ExpressionType
      ? VisitVariable((VariableExpression) node)
      : base.Visit(node);
  }

  protected virtual Expression VisitVariable(VariableExpression node)
  {
    return node;
  }
}

We override the method which visits any given expression and check its node type; if it is a variable, we allow derived classes to inspect it via the VisitVariable method. Otherwise, we let the base class determine how to visit the node. This gives us a context in which we can process calculation expressions.

Finding Variables

The most basic use of CalculationExpressionVisitor is to find all of the variables referenced by a calculation:

internal sealed class VariableSearch : CalculationExpressionVisitor
{
  private ISet<Variable> _variables;

  internal ISet<Variable> GetVariables(Calculation calculation)
  {
    _variables = new HashSet<Variable>();

    Visit(calculation.Expression);

    return _variables;
  }

  protected override Expression VisitVariable(VariableExpression node)
  {
    _variables.Add(node.Variable);

    return node;
  }
}

We create an implementation which exposes a single method named GetVariables; it takes a calculation and returns all of the variables in its expression. It does this by passing the expression to the same Visit method we overrode in CalculationExpressionVisitor, then keeping track of every variable it encounters. This is an incredibly small amount of code to walk any arbitrary expression for a calculation; once again, thanks .NET!
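
To see the visitor mechanics in isolation, here is a minimal sketch that does not depend on Grasp at all. It assumes the built-in ParameterExpression as a stand-in for VariableExpression; the names ParameterNameSearch and ParameterNameSearchExample are hypothetical and exist only for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Illustrative stand-in: collects the names of ParameterExpression nodes,
// the built-in analogue of Grasp's VariableExpression
public sealed class ParameterNameSearch : ExpressionVisitor
{
  private readonly HashSet<string> _names = new HashSet<string>();

  public static ISet<string> GetNames(Expression expression)
  {
    var search = new ParameterNameSearch();

    // Visit walks the node and, recursively, all of its children
    search.Visit(expression);

    return search._names;
  }

  protected override Expression VisitParameter(ParameterExpression node)
  {
    // Called once per occurrence; the set removes duplicates
    _names.Add(node.Name);

    return node;
  }
}

public static class ParameterNameSearchExample
{
  public static ISet<string> Run()
  {
    var income = Expression.Parameter(typeof(decimal), "income");
    var expenses = Expression.Parameter(typeof(decimal), "expenses");

    // (income - expenses) - expenses: "expenses" appears twice in the tree
    var body = Expression.Subtract(Expression.Subtract(income, expenses), expenses);

    return ParameterNameSearch.GetNames(body);
  }
}
```

Even though the expenses parameter appears twice in the tree, the returned set holds each name once, which is the same reason VariableSearch stores its results in a set.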

Validating a Calculation

Now that we know how to determine all of the unique variables referenced by a calculation, we can put that information to use in validating that its structure is correct:

  • All referenced variables must exist in the schema
  • The result must be assignable to the output variable

In order to make these assessments, we first need to associate a calculation with all of its referenced variables. We can call this pairing the schema of a calculation:

internal sealed class CalculationSchema
{
  private readonly Calculation _calculation;

  internal CalculationSchema(Calculation calculation)
  {
    _calculation = calculation;

    Variables = new VariableSearch().GetVariables(calculation);
  }

  internal Expression Expression
  {
    get { return _calculation.Expression; }
  }

  internal Variable OutputVariable
  {
    get { return _calculation.OutputVariable; }
  }

  internal ISet<Variable> Variables { get; private set; }
}

We expose the existing elements of a calculation; we also use our VariableSearch visitor to find all of the referenced variables and expose them. This is the general usage pattern for a visitor: instantiate and use one whenever needed. A visitor encapsulates some algorithm, exposes a single entry point, and is most often used a single time.

With the ability to find all referenced variables, we can begin to flesh out the compiler. The first thing we do is create schemas for each of the calculations:

internal sealed class GraspCompiler
{
  private readonly GraspSchema _schema;
  private readonly ISet<Variable> _variables;
  private readonly IList<CalculationSchema> _calculations;

  internal GraspCompiler(GraspSchema schema)
  {
    _schema = schema;

    _calculations = schema.Calculations.Select(
      calculation => new CalculationSchema(calculation)).ToList();

    var effectiveVariables = schema.Variables.Concat(
      _calculations.Select(calculation => calculation.OutputVariable));

    _variables = new HashSet<Variable>(effectiveVariables);
  }

  internal GraspExecutable Compile()
  {
    // Not quite yet…
  }
}

We also determine the effective set of variables, which includes the variables in the schema as well as all of the calculations’ output variables. By automatically including the output variables, Grasp users don’t have to explicitly include them in the variables they pass to the schema.

Next, we validate all of the calculations before we continue the compilation process:

private void ValidateCalculations()
{
  foreach(var calculation in _calculations)
  {
    EnsureVariablesExistInSchema(calculation);

    EnsureAssignableToOutputVariable(calculation);
  }
}

Ensuring all of a calculation’s variables are part of the GraspSchema we are compiling is straightforward. The key here is that we created a HashSet<> in the constructor to hold its variables, increasing performance during repeated lookups:

private void EnsureVariablesExistInSchema(CalculationSchema calculation)
{
  foreach(var variable in calculation.Variables)
  {
    if(!_variables.Contains(variable))
    {
      throw new InvalidCalculationVariableException(calculation, variable);
    }
  }
}

We also ensure that the result of a calculation’s expression can be assigned to its output variable by using the Type.IsAssignableFrom method:

private void EnsureAssignableToOutputVariable(CalculationSchema calculation)
{
  var variableType = calculation.OutputVariable.Type;
  var resultType = calculation.Expression.Type;

  if(!variableType.IsAssignableFrom(resultType))
  {
    throw new InvalidCalculationResultTypeException(calculation);
  }
}

These conditions guard against the various calculation-related errors that might occur (expression trees take care of validating the structure of the code they represent). We throw custom exception types to facilitate better reporting to consumers of the API. This will also be very useful when we create a UI for building and compiling runtimes (more on that later).
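
It helps to know exactly what Type.IsAssignableFrom accepts. The following is a small sketch, not part of Grasp; AssignabilityExample and CanAssign are hypothetical names that mirror the check in EnsureAssignableToOutputVariable:

```csharp
using System;
using System.Collections.Generic;

public static class AssignabilityExample
{
  // Mirrors the check in EnsureAssignableToOutputVariable
  public static bool CanAssign(Type variableType, Type resultType)
  {
    return variableType.IsAssignableFrom(resultType);
  }

  public static void Run()
  {
    // Identical types are assignable
    Console.WriteLine(CanAssign(typeof(decimal), typeof(decimal)));

    // A base type or interface accepts a derived or implementing type
    Console.WriteLine(CanAssign(typeof(object), typeof(string)));
    Console.WriteLine(CanAssign(typeof(IEnumerable<int>), typeof(List<int>)));

    // Numeric conversions are a C# language feature, not a type relationship,
    // so an int result is not assignable to a decimal variable
    Console.WriteLine(CanAssign(typeof(decimal), typeof(int)));
  }
}
```

The last case is the one to watch: a calculation whose expression produces an int would fail validation against a decimal output variable unless the expression converts the value explicitly, for example with an Expression.Convert node.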

Summary

We created the foundation of the compilation process, GraspCompiler. We also added some infrastructure for visiting variable nodes in expression trees and created a visitor which finds all variable references in a calculation. We then validated that each calculation’s structure is correct.

Next time, we will finally turn a calculation’s expression into executable code.

Continue to Part 7: Compiling Calculations

Grasp, A .NET Analysis Engine – Part 5: Executable

March 1st, 2012 No comments

In part 4, we started outlining the execution of the Grasp engine by defining elements for representing a system’s schema and runtime. In this post, we take another step toward generating runtimes from a schema.

Between Schema and Runtime

A schema represents the raw ingredients for a runtime: the variables and the calculations which apply to them. However, the calculations are in the form of expression trees, which are just data structures; we cannot use them to actually carry out the logic they represent. We need some notion of a compiler to turn the expression trees into something executable.

In part 4, we defined the ICalculator interface, which exposes the ability to operate on a GraspRuntime to perform a calculation. Implementations of this interface, specifically FunctionCalculator, would be the output of our hypothetical compiler. By associating an instance of ICalculator with the schema from which it originated, we get the executable form of a system:

public class GraspExecutable
{
  public GraspExecutable(GraspSchema schema, ICalculator calculator)
  {
    Contract.Requires(schema != null);
    Contract.Requires(calculator != null);

    Schema = schema;
    Calculator = calculator;
  }

  public GraspSchema Schema { get; private set; }

  public ICalculator Calculator { get; private set; }
}

This represents the potential to run calculations, but without any specific data. This is much like a program executable, which defines the potential to run the program but is not an instance of that program.

Now that we’ve defined executables, we can add the ability to compile to them right on the GraspSchema class we defined in part 2:

public class GraspSchema
{
  // …

  public GraspExecutable Compile()
  {
    return new GraspCompiler(this).Compile();
  }
}

We will explore the GraspCompiler class later; the key takeaway here is that the compilation process takes a schema as input and produces an executable as output. If we replace "schema" with "source files", we would be describing the traditional definition of a compiler. Modeling Grasp after this well-known process lets us leverage existing concepts and language to facilitate understanding.

Generating Runtimes

The defining attribute of an executable is the ability to create instances of itself. Each of these instances is called a runtime (as defined in part 4). What differentiates one runtime from another is the data that lives within; for example, two students taking the same test will have different sets of answers, thus requiring each to have a separate runtime.

This implies that, in order to generate a runtime, we must seed it with its own specific data. This might be persistent data if a user saved a test or survey to come back later; it could also simply be the default values for each variable. In any case, we need a mechanism that encapsulates the mapping of an executable’s variables to their initial values:

public interface IRuntimeSnapshot
{
  object GetValue(Variable variable);
}

This straightforward interface represents the state of a runtime at a given point; we can use it to initialize the variable bindings of a new runtime. To do this, we add the GetRuntime method to the GraspExecutable class:

public GraspRuntime GetRuntime(IRuntimeSnapshot initialState)
{
  Contract.Requires(initialState != null);

  return new GraspRuntime(Schema, Calculator, GetBindings(initialState));
}

private IEnumerable<VariableBinding> GetBindings(IRuntimeSnapshot initialState)
{
  return Schema.Variables.Select(
    variable => new VariableBinding(variable, initialState.GetValue(variable)));
}

This is how we create instances of executables for a specific data set. The entire workflow for creating a runtime and applying calculations, then, looks like:

var schema = new GraspSchema(…variables and calculations…);

var executable = schema.Compile();

var runtime = executable.GetRuntime(…initial state…);

runtime.ApplyCalculations();

This is Grasp’s external API. In a typical application, we would compile the schema into an executable once, then use it to generate many runtimes. Following the student/test example, an application may take a test defined in XML, build the schema, compile it, and store the executable at the application level. We would then get a new runtime for each student who takes the test, sandboxing each student’s data while paying the performance cost of compiling the schema only once.
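
The IRuntimeSnapshot interface leaves the source of the initial values open. As one possibility, here is a minimal sketch of a dictionary-backed snapshot that returns saved values where they exist and default values otherwise. DictionarySnapshot is a hypothetical name, and the Variable class and interface are trimmed stand-ins repeated only so the sketch compiles on its own:

```csharp
using System;
using System.Collections.Generic;

// Trimmed stand-ins for the Grasp types, included only so the sketch is self-contained
public sealed class Variable
{
  public Variable(string @namespace, string name, Type type)
  {
    Namespace = @namespace;
    Name = name;
    Type = type;
  }

  public string Namespace { get; private set; }

  public string Name { get; private set; }

  public Type Type { get; private set; }
}

public interface IRuntimeSnapshot
{
  object GetValue(Variable variable);
}

// Hypothetical implementation: saved values win, otherwise fall back to the type's default
public sealed class DictionarySnapshot : IRuntimeSnapshot
{
  private readonly IDictionary<Variable, object> _savedValues;

  public DictionarySnapshot(IDictionary<Variable, object> savedValues)
  {
    if(savedValues == null) throw new ArgumentNullException("savedValues");

    _savedValues = savedValues;
  }

  public object GetValue(Variable variable)
  {
    object value;

    if(_savedValues.TryGetValue(variable, out value))
    {
      return value;
    }

    // Value types get a zeroed instance (0 for decimal); reference types get null
    return variable.Type.IsValueType ? Activator.CreateInstance(variable.Type) : null;
  }
}
```

Note that this sketch looks variables up by reference, since the stand-in Variable class does not override Equals or GetHashCode; the snapshot must be given the same Variable instances that the schema holds.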

Summary

We identified the need for an executable form of a schema and added the ability to create instances of it called runtimes. We then created an abstraction that maps variables to their initial values and saw the process of generating runtimes from a schema.

Next time, we will look at GraspCompiler and see how it turns Calculation objects into executable code.

Continue to Part 6: Validating Calculations

Grasp, A .NET Analysis Engine – Part 4: Runtime

February 28th, 2012 No comments

In parts 2 and 3 we laid out the initial elements which model the analysis of a data set: variables and calculations. In this post, we will start to define a context in which calculations can operate on variables.

Schema

As stated in part 2, we can describe the set of all variables known to a system as its schema. However, that isn’t totally accurate: just as a database schema has tables and logic, such as stored procedures, a Grasp schema is really the set of all variables and calculations that apply to them. We can model this with a collection of each:

public class GraspSchema
{
  public GraspSchema(
    IEnumerable<Variable> variables,
    IEnumerable<Calculation> calculations)
  {
    Contract.Requires(variables != null);
    Contract.Requires(calculations != null);

    Variables = variables.ToList().AsReadOnly();
    Calculations = calculations.ToList().AsReadOnly();
  }

  public ReadOnlyCollection<Variable> Variables { get; private set; }

  public ReadOnlyCollection<Calculation> Calculations { get; private set; }
}

An instance of this class describes the entire context in which the variables and calculations are relevant. This could be a survey, a math test, a quality checklist, or a college’s financial report. A schema is the container for a system’s design; it is the static portion of the engine.

Runtime

The dynamic portion of the engine is where we associate values with variables and actually apply calculations. This is known as the runtime. It is the context in which values live and change; we can consider a runtime as a particular instance of a system described by GraspSchema. For example, if a schema represents a math test, then a runtime would represent an instance of that test for a particular student.

The core concept within the runtime is the binding of variables to data. Binding is a highly overloaded term in the field of software, but in this case, it simply means giving a variable a value:

public class VariableBinding
{
  public VariableBinding(Variable variable, object value)
  {
    Contract.Requires(variable != null);

    Variable = variable;
    Value = value;
  }

  public Variable Variable { get; private set; }

  public object Value { get; set; }
}

We give the Value property a public setter so the runtime can change its value as it applies calculations. A runtime simply associates a schema with a set of these bindings:

public class GraspRuntime
{
  private readonly Dictionary<Variable, VariableBinding> _bindingsByVariable;

  public GraspRuntime(GraspSchema schema, IEnumerable<VariableBinding> bindings)
  {
    Contract.Requires(schema != null);
    Contract.Requires(bindings != null);

    Schema = schema;

    _bindingsByVariable = bindings.ToDictionary(binding => binding.Variable);
  }

  public GraspSchema Schema { get; private set; }
}

This establishes the base state of the runtime. We index the bindings by their associated variables so that we can perform efficient lookups when applying calculations. In order to find the binding associated with a variable, we use the TryGetBinding method:

private VariableBinding TryGetBinding(Variable variable)
{
  VariableBinding binding;

  _bindingsByVariable.TryGetValue(variable, out binding);

  return binding;
}

Here, we use the TryGetValue method of the dictionary to look up the binding. If it is not found, binding will be assigned null. (This is in contrast to using the indexer, which would throw an exception if the key is not found.) We use the null sentinel value to create bindings for unbound variables:

public void SetVariableValue(Variable variable, object value)
{
  Contract.Requires(variable != null);

  var binding = TryGetBinding(variable);

  if(binding != null)
  {
    binding.Value = value;
  }
  else
  {
    binding = new VariableBinding(variable, value);

    _bindingsByVariable[variable] = binding;
  }
}

This is a simple update or on-demand initialization of a binding. This allows the runtime to grow along with calculations, as we may not have values for variables until they are calculated (such as Acme.Bookstore.OperatingProfit).

The other side of the coin from setting values is getting values:

public object GetVariableValue(Variable variable)
{
  Contract.Requires(variable != null);

  var binding = TryGetBinding(variable);

  if(binding == null)
  {
    throw new UnboundVariableException(variable);
  }

  return binding.Value;
}

Here, we also attempt to retrieve the binding associated with the variable. However, we throw an exception when the variable is not bound. This highlights algorithm errors which may reference a variable before its calculation or otherwise be expecting a variable which is not known to the runtime.
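
Both accessors rest on the difference between TryGetValue and the dictionary indexer. The following sketch shows that difference on a plain dictionary of strings; LookupExample and its methods are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class LookupExample
{
  // Returns null when the key is missing, in the same way as TryGetBinding
  public static string TryGet(Dictionary<string, string> values, string key)
  {
    string value;

    // On a miss, TryGetValue returns false and sets value to default(string), which is null
    values.TryGetValue(key, out value);

    return value;
  }

  // Returns true when the indexer throws for a missing key
  public static bool IndexerThrows(Dictionary<string, string> values, string key)
  {
    try
    {
      var ignored = values[key];

      return false;
    }
    catch(KeyNotFoundException)
    {
      return true;
    }
  }
}
```

Using the null result as a sentinel lets GraspRuntime decide for itself what a missing binding means: SetVariableValue creates one, while GetVariableValue reports the problem with a more descriptive exception than KeyNotFoundException.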

Calculators

We have discussed how to model variables within a runtime. Next, we move on to calculations. A calculation is the process of reading variables from the data set, performing some computation, and setting an output variable with the result. We can model this with an interface that represents an operation on a runtime instance:

public interface ICalculator
{
  void ApplyCalculation(GraspRuntime runtime);
}

This encapsulates all logic associated with reading and updating variables. With this abstraction in place, we can add an instance of it to GraspRuntime and expose a method which invokes it:

public ICalculator Calculator { get; private set; }

public void ApplyCalculations()
{
  Calculator.ApplyCalculation(this);
}

We also need to add it to the constructor of GraspRuntime (not shown). The ApplyCalculations method simply asks the calculator to apply its calculations to the runtime which owns it. Obviously we will need to apply multiple calculations to a particular runtime, so first we create a composite implementation of ICalculator to do so:

public sealed class CompositeCalculator : ICalculator
{
  public CompositeCalculator(IEnumerable<ICalculator> calculators)
  {
    Contract.Requires(calculators != null);

    Calculators = calculators.ToList().AsReadOnly();
  }

  public ReadOnlyCollection<ICalculator> Calculators { get; private set; }

  public void ApplyCalculation(GraspRuntime runtime)
  {
    foreach(var calculator in Calculators)
    {
      calculator.ApplyCalculation(runtime);
    }
  }
}

We simply apply a collection of ICalculator implementations as though they are a single implementation (the basic tenet of a composite object). Things get interesting when we define a calculation in terms of a function which accepts a GraspRuntime and returns the output value:

public sealed class FunctionCalculator : ICalculator
{
  public FunctionCalculator(
    Variable outputVariable,
    Func<GraspRuntime, object> function)
  {
    Contract.Requires(outputVariable != null);
    Contract.Requires(function != null);

    OutputVariable = outputVariable;
    Function = function;
  }

  public Variable OutputVariable { get; private set; }

  public Func<GraspRuntime, object> Function { get; private set; }

  public void ApplyCalculation(GraspRuntime runtime)
  {
    runtime.SetVariableValue(OutputVariable, Function(runtime));
  }
}

Here, we implement ApplyCalculation by applying a function to the specified runtime and assigning the result to the output variable. This is a very straightforward implementation which places the majority of the heavy lifting on the function. We create the function by compiling the Expression instance we used to model the Calculation class; if that does not sound familiar, don’t worry, we will see more in the next post.

Summary

We codified the representation of a system’s schema, the static portion of the engine. We also modeled the dynamic portion of the engine by associating variables with values in a context called a runtime. We then implemented the getting/setting of those values and created an abstraction for applying calculations.

Next time, we will start examining the compilation process, which generates a runtime from a schema.

Continue to Part 5: Executable

Grasp, A .NET Analysis Engine – Part 3: Calculations

February 23rd, 2012 No comments

In part 2, we created a class that models a single piece of data, a step toward the first goal of representing any data set. In this post, we will work toward the second goal, representing a set of rules which act on a data set.

Data Begets Data

Analysis is the process of generating data by examining existing data. For example, if we take the total income of the Acme bookstore and subtract its expenses, we have derived a new piece of data, its operating profit. That value becomes part of the data set and a new possible input for further analysis. For example, after generating the operating profit value, we might then apply taxes, generating another data point: the net profit.

A rule which defines the generation of data has a specific profile: it can act on any data in the data set, it encodes arbitrarily complex logic, and it results in a single value. Grasp calls this a calculation; we can thus define the analysis of a data set as a series of calculations.

Expressing Calculations

We already have a decent idea of how to model the result of a calculation: it is just another piece of data, which we defined in part 2 as a variable. This means every calculation will have a variable to represent its result.

The other tenets of a calculation are harder to model: acting on data in a data set and encoding arbitrarily complex logic. In the solutions I have seen, these are easily the stickiest parts. Generally, they involve a data structure that can represent simple constructs, such as add, subtract, multiply, divide, and boolean operations. They also have algorithms for executing the logic described by the data structure.

If the system requires more involved capabilities, such as exponents, nesting, or order-of-operations, those must be coded into the core calculation engine: the more supported concepts, the more complexity in the engine. This part often has the most code and highest levels of risk and change in the whole codebase.

Rather than repeat this line of reasoning in Grasp, we can take advantage of a built-in version of a logic system: Expression Trees. Debuting in .NET 3.5, they model code as a tree-like data structure, where each node represents a particular kind of .NET expression. This covers all of the cases we discussed before, such as operators, nesting, order of operations, etc. It also covers more advanced scenarios, such as method calls, unary operators, modulo, and any other kind of expression supported by .NET.

This is extraordinarily useful because it gives us a ready-made data type to represent the "arbitrarily complex logic" portion of a calculation: Expression. Even better, though, it also provides the means for carrying out the logic. Expressions, once wrapped in a lambda, can be compiled at runtime to a delegate containing the executable version of the code they represent. Not only does this remove the burden of writing our own algorithm, the logic will run as fast as if we had written and compiled it ourselves. That’s a win-win-win. Thanks .NET!
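
As a quick illustration of compiling a tree, here is a minimal sketch that uses only the built-in expression types. It assumes plain parameters in place of Grasp variables, and CompileExample and BuildSubtraction are hypothetical names:

```csharp
using System;
using System.Linq.Expressions;

public static class CompileExample
{
  public static Func<decimal, decimal, decimal> BuildSubtraction()
  {
    // Placeholders for the two operands
    var income = Expression.Parameter(typeof(decimal), "income");
    var expenses = Expression.Parameter(typeof(decimal), "expenses");

    // A tree describing "income - expenses"; nothing is computed yet
    var body = Expression.Subtract(income, expenses);

    // Wrapping the body in a lambda is what makes it compilable
    var lambda = Expression.Lambda<Func<decimal, decimal, decimal>>(body, income, expenses);

    // Compile emits a delegate we can call like any other function
    return lambda.Compile();
  }
}
```

Calling BuildSubtraction()(100m, 60m) performs the subtraction with ordinary compiled code. In practice the delegate would be built once and reused, since compilation is the expensive step.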

From Concept to Code

Now that we’ve defined the major components of calculations, we can represent them:

public class Calculation
{
  public Calculation(Variable outputVariable, Expression expression)
  {
    Contract.Requires(outputVariable != null);
    Contract.Requires(expression != null);

    OutputVariable = outputVariable;
    Expression = expression;
  }

  public Variable OutputVariable { get; private set; }

  public Expression Expression { get; private set; }

  public override string ToString()
  {
    return String.Format("{0} = {1}", OutputVariable, Expression);
  }
}

We also override ToString to provide a simple visualization. The variable will output its fully-qualified name, and Expression provides nice text for all of the expression types (another win).

Now we can tackle the final tenet of calculations: act on any data in a data set. We have a data structure that can represent any kind of logic, but it does not know about variables as we’ve defined them. We need to teach expression trees about variables so we can use them as operands.

To do so, we create a new kind of node, specific to Grasp, that represents a variable:

public sealed class VariableExpression : Expression
{
  public static readonly ExpressionType ExpressionType = (ExpressionType) 1000;

  internal VariableExpression(Variable variable)
  {
    Variable = variable;
  }

  public override ExpressionType NodeType
  {
    get { return ExpressionType; }
  }

  public override Type Type
  {
    get { return Variable.Type; }
  }

  public new Variable Variable { get; private set; }

  public override string ToString()
  {
    return Variable.ToString();
  }
}

First, we derive from Expression, allowing variable nodes to exist in a tree like any other node. We then override the NodeType property, providing a value of the ExpressionType enumeration that is far above any of the base values. We need to make sure Grasp does not interfere with the existing expression system.

We also accept a variable in the constructor and store it in a property*. We override the Type property to indicate that the expression’s result type is the variable’s type, such as integer or decimal**. We also return the variable’s fully-qualified name in ToString so it appears in an expression’s text.

Notice that VariableExpression’s constructor is internal. This is because I chose to replicate the Expression class’s factory pattern for variable expressions as well. So, instead of creating a VariableExpression instance directly, we can use the Variable.Expression factory method declared on the Variable class:

public static VariableExpression Expression(Variable variable)
{
  Contract.Requires(variable != null);

  return new VariableExpression(variable);
}

This is solely an aesthetic choice and could be done either way.

Modeling Operating Profit

Now that we’ve done the prep work, we can cook the Operating Profit calculation. At this stage, our goal is to create a data structure that accurately describes this equation:

Acme.Bookstore.OperatingProfit =

    Acme.Bookstore.TotalIncome - Acme.Bookstore.TotalExpenses

Later in the series, we will discuss how to assign values to these variables and perform the subtraction. This post defines and builds the underlying structure.

First, we create the variables in the calculation, giving them the decimal type since we are dealing with money:

var operatingProfit =
  new Variable("Acme.Bookstore", "OperatingProfit", typeof(decimal));

var totalIncome =
  new Variable("Acme.Bookstore", "TotalIncome", typeof(decimal));

var totalExpenses =
  new Variable("Acme.Bookstore", "TotalExpenses", typeof(decimal));

This uses the constructor we defined in part 2. Next, we need to create nodes which represent the input variables in an expression tree:

var totalIncomeExpression = Variable.Expression(totalIncome);

var totalExpensesExpression = Variable.Expression(totalExpenses);

Now comes the interesting part: creating a node that represents the subtraction. This is as easy as using the static factory on the Expression class:

var operatingProfitExpression =
  Expression.Subtract(totalIncomeExpression, totalExpensesExpression);

This produces a BinaryExpression whose NodeType property is ExpressionType.Subtract, whose Left value is totalIncomeExpression, and whose Right value is totalExpensesExpression. Since we overrode VariableExpression.Type to return the variable’s type, the Subtract node will see a decimal on each side and determine that its return type should also be decimal (just as if we had written it in code). This is fortunate, as we are attempting to assign it to a decimal variable.

The final step is to create the calculation object that associates the subtraction with the operatingProfit variable. This is straightforward using the constructor we defined earlier:

var operatingProfitCalculation =
  new Calculation(operatingProfit, operatingProfitExpression);

This is an example of constructing logic to operate on arbitrary data points. The kicker is that the code operatingProfitCalculation.ToString() gives us the same simple text representation as we saw in the beginning of this section.

Summary

We explored the nature of analysis as a data generation process and determined what constitutes a calculation. We also wove expression trees into our definitions and created an object to represent an example. Grasp hopes to provide a simple usage model on top of complex reasoning, the hallmark of a solid abstraction.

Next time, we will get into some aspects of the runtime and take a look at making calculations actually do something.

Continue to Part 4: Runtime

* I should note that there is already an Expression.Variable factory method, which is why we need the "new" modifier on the declaration of the Variable property. However, that node only represents a name and type, sans namespace; it also doesn’t allow us to store the variable instance, which is why we instead define an entirely new node type.

** At first the Expression.Constant node seemed like it could work, but its return type would be Variable, which wouldn’t be allowed as, say, the operand of an Add node. We need a node representing a variable to look like it is a value of the variable’s type.

Grasp, A .NET Analysis Engine – Part 2: Variables

February 23rd, 2012 No comments

In part 1, we identified a family of systems at whose core is a data set and its analysis. We also set out two goals for the Grasp engine: represent a structured collection of data points and the rules which analyze it. In this post, we will explore how we can represent any set of domain-specific data.

Data Points

To describe a data set, we first need to define a unit of data. The variable is a well-known concept we can leverage here: it represents a value, consisting of a name and type. In Grasp, a type is a CLR type, so like a variable in a program, a Grasp variable can represent any manner of data.

A variable is a design-time construct; it communicates that, at some point, there will be a concrete value associated with it. Just like writing a program, this separates the rules of a system from the runtime which carries them out.

Schema

We can describe the set of all variables known to a system as its schema. This is similar in concept to a database schema, which also describes an organization of data. It is the "shape" of the data set.

A Grasp schema is similar to a database schema in another important respect: its data is always available. This is different from a program, where a variable’s scope, and thus availability, is determined by the extent its name appears in the source. A schema is effectively a single scope in which all variables reside.

This poses an interesting challenge: how can we effectively partition variables if they all live in the same bucket? We can’t have two variables named, say, TotalIncome, that mean different things in different contexts. Any decent-sized data set would have conflicts pretty quickly. Relational databases solve this issue using tables: a table qualifies a piece of data, making it uniquely identifiable within the schema.

Variables, though, are more fluid than the strict structure of tables; they are more akin to organizing types within an assembly. This implies we can borrow another well-known concept: the namespace. Its hierarchical nature allows us to fully qualify any variable in a data set, allowing us to get as fine-grained as necessary in describing data.

For example, let’s say we are accrediting the Acme School of Anvil Design. We may ask for the total income of the school as well as the total income of its bookstore. We can represent both of these values by qualifying them with meaningful namespaces:

Acme.TotalIncome

Acme.Bookstore.TotalIncome

This is easier to understand, and will evolve better, than if we chose arbitrary names to differentiate them, such as TotalSchoolIncome/TotalBookstoreIncome or SchoolTotalIncome/BookstoreTotalIncome. It is more obvious that the variables represent similar values, and leaves room for other values to be organized at the school or bookstore level. Perhaps the bookstore also has a coffee shop; we can further organize the data along these lines:

Acme.Bookstore.CoffeeShop.TotalIncome

This approach organizes data along the contours of the problem domain, facilitating discoverability and learnability.

Let’s See Some Code

A variable is straightforward to represent. For starters, we create properties for the namespace, name, and type. These values do not change for the lifetime of an instance, so we can make them immutable via private setters:

public class Variable
{
  public string Namespace { get; private set; }

  public string Name { get; private set; }

  public Type Type { get; private set; }

  public override string ToString()
  {
    return Namespace + "." + Name;
  }
}

We also override ToString so it returns the fully-qualified name.

Next, we need to initialize these properties. The key here is to ensure the namespace and name are formatted correctly. For Grasp, this means following the .NET Framework’s definition of a namespace, which is a series of identifiers separated by the "." character. An identifier is a token composed of a combination of letters, numbers, and/or the "_" character, and does not start with a number.

We can encode these formatting rules as a set of static methods on the Variable class:

public static bool IsNamespace(string value)
{
  Contract.Requires(value != null);

  return Regex.IsMatch(value, @"^([_A-Za-z]+\w*)+(\.[_A-Za-z]+\w*)*$");
}

public static bool IsName(string value)
{
  Contract.Requires(value != null);

  return Regex.IsMatch(value, @"^[_A-Za-z]+\w*$");
}

Phew! Those are some imposing regular expressions at first glance. They are actually pretty straightforward, though, as regular expressions go. Here is a breakdown:

Namespace
^   Start of string
(   Start a group to match the first namespace identifier
  [_A-Za-z]+ Match one or more underscores or letters to start (no digits)
  \w* Match zero or more "word" characters (letters, digits, or underscores)
)+   Match the group one or more times; since no dot is allowed here, this amounts to a single leading identifier
(   Start a group to match the subsequent identifiers
  \. Match a single separating dot
  [_A-Za-z]+ Match one or more underscores or letters to start (no digits)
  \w* Match zero or more "word" characters (letters, digits, or underscores)
)*   Match zero or more subsequent identifiers
$   End of string
Name
^ Start of string
[_A-Za-z]+ Match one or more underscores or letters to start (no digits)
\w* Match zero or more "word" characters (letters, digits, or underscores)
$ End of string

Together these checks ensure that all namespaces and names for variables follow the well-known pattern for .NET namespaces. This enables a text-based calculation editor, where we would reference variable names in a parseable manner. But, we’ll get to that later.
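As an aside, the same rules can be written more compactly. The following is a minimal sketch rather than the Grasp source: the VariableFormat class and the Identifier constant are hypothetical names, the "+" quantifiers are dropped because a single leading letter or underscore followed by \w* accepts the same strings, and \z replaces $ so that a value ending in a newline is rejected. Note that \w in .NET also matches Unicode letters and digits unless RegexOptions.ECMAScript is specified, so this sketch is as permissive as the patterns above in that respect.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; in the post these methods live on Variable.
public static class VariableFormat
{
  // One identifier: a letter or underscore, then any number of word characters.
  private const string Identifier = @"[_A-Za-z]\w*";

  public static bool IsName(string value)
  {
    if (value == null) throw new ArgumentNullException("value");

    return Regex.IsMatch(value, "^" + Identifier + @"\z");
  }

  public static bool IsNamespace(string value)
  {
    if (value == null) throw new ArgumentNullException("value");

    // One identifier, followed by zero or more ".identifier" segments.
    return Regex.IsMatch(value, "^" + Identifier + @"(\." + Identifier + @")*\z");
  }
}

public static class Program
{
  public static void Main()
  {
    Console.WriteLine(VariableFormat.IsNamespace("Acme.Bookstore.CoffeeShop")); // True
    Console.WriteLine(VariableFormat.IsNamespace("Acme..Bookstore"));           // False
    Console.WriteLine(VariableFormat.IsName("TotalIncome"));                    // True
    Console.WriteLine(VariableFormat.IsName("2ndQuarter"));                     // False
  }
}
```

Building both patterns from one Identifier constant keeps the name and namespace rules from drifting apart if the definition of an identifier ever changes.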

Now that the Variable class has the ability to validate the format of its values, we can create a constructor that initializes the Namespace, Name, and Type properties:

public Variable(string @namespace, string name, Type type)
{
  Contract.Requires(IsNamespace(@namespace));
  Contract.Requires(IsName(name));
  Contract.Requires(type != null);

  Namespace = @namespace;
  Name = name;
  Type = type;
}

In the constructor, we ensure that the namespace and name values have the correct format, and that the type is not null. (If you don’t recognize the syntax, Contract.Requires is part of .NET Code Contracts. I use it throughout Grasp for argument checking.)

I used the "@" prefix for the namespace parameter because "namespace" is the best name but also happens to be a C# keyword. In these cases, we also have the option to compromise the name somehow, e.g. "ns", "nmespace", or "theNamespace". However, each of these is an end run around the issue and does not tell the reader why the author chose that identifier. Rather than have the next developer try to change it to "namespace", realize it won’t work, and go through the same decision process, I chose to make the decision explicit. This happens frequently with "@event" as well. Your mileage may vary.
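To see the pieces working together, here is a minimal, self-contained sketch. It is an approximation rather than the Grasp source: explicit exceptions stand in for Contract.Requires so that the checks run without the Code Contracts tooling, and the sample variable (including its decimal type) is a hypothetical value chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public class Variable
{
  public Variable(string @namespace, string name, Type type)
  {
    // Assumption: explicit exceptions stand in for Contract.Requires.
    if (@namespace == null || !IsNamespace(@namespace))
      throw new ArgumentException("Not a valid namespace", "namespace");
    if (name == null || !IsName(name))
      throw new ArgumentException("Not a valid name", "name");
    if (type == null)
      throw new ArgumentNullException("type");

    Namespace = @namespace;
    Name = name;
    Type = type;
  }

  public string Namespace { get; private set; }
  public string Name { get; private set; }
  public Type Type { get; private set; }

  public static bool IsNamespace(string value)
  {
    return Regex.IsMatch(value, @"^([_A-Za-z]+\w*)+(\.[_A-Za-z]+\w*)*$");
  }

  public static bool IsName(string value)
  {
    return Regex.IsMatch(value, @"^[_A-Za-z]+\w*$");
  }

  // Returns the fully-qualified name.
  public override string ToString()
  {
    return Namespace + "." + Name;
  }
}

public static class Program
{
  public static void Main()
  {
    var totalIncome = new Variable("Acme.Bookstore", "TotalIncome", typeof(decimal));
    Console.WriteLine(totalIncome); // Acme.Bookstore.TotalIncome

    try
    {
      // The doubled dot leaves an empty identifier, so the constructor rejects it.
      new Variable("Acme..Bookstore", "TotalIncome", typeof(decimal));
    }
    catch (ArgumentException)
    {
      Console.WriteLine("Rejected malformed namespace");
    }
  }
}
```

Notice that the "@" only appears in the source: the parameter's actual name is "namespace", which is what the exception reports.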

Summary

We addressed the first goal of Grasp: represent the data of any data set. We were able to do this by combining the concepts of namespaces and variables to uniquely identify any piece of data. We also created a class to represent a namespace-qualified variable and ensured the namespace and name have the proper format.

Next time, we will tackle the other goal: rules which analyze the data.

Continue to Part 3: Calculations


Grasp, A .NET Analysis Engine – Part 1: Overview

February 23rd, 2012 1 comment

A frequent scenario I see as a developer is to collect a data set and analyze it. Many application types operate on this core principle, however subtly. I have often wondered what it would look like to generalize and unify these systems. I have seen/worked on a few:

  • Surveys/tests/quizzes
  • Operational data for medical offices, labs, and other organizations
  • Financial forms
  • 360-degree feedback
  • Conformance to standards
  • Accreditation and assessment

These problem domains share many traits but vary widely in purpose. Each has unique needs for data and analysis, which together form a schema for interpreting meaning. By defining things, these systems add value in the space between what and why.

Solutions in this area tend to overlap in form and function. I thought it would be interesting to capture the common elements in a library, at a low level. I named it Grasp, reflecting the need to both collect and understand data.

Workflow

Data in this context means a set of uniquely-identifiable values. It may represent a math test, a safety checklist, a web survey, a quarterly financial report, or a site assessment.

Analysis mines the raw material for business value by generating data from existing data:

  • Are the student’s answers correct?
  • Is the safety inspection fully filled out?
  • What percentage of respondents answered "No" for question 5?
  • What was the bookstore’s operating profit?
  • How many residents are graduating this year?

The more thorough the analysis, the more we know about a data set. The answers become new data we can throw on top of what we’ve already got, for future use. Grasp is a language for defining, executing, and reporting on this cycle.

What to Expect

This series will cover many aspects of an analysis engine. We will represent data and calculations involving that data. We will compile those into a fully-functional runtime that addresses details such as data types, interdependent calculations, and extensibility. At the end we will have a library that fits at the core of many application types.

In the next post we will start with our first goal: represent data.

Continue to Part 2: Variables
