Grasp, A .NET Analysis Engine – Part 3: Calculations
- Part 1: Overview
- Part 2: Variables
- Part 3: Calculations
- Part 4: Runtime
- Part 5: Executable
- Part 6: Validating Calculations
- Part 7: Compiling Calculations
- Part 8: Calculation Dependencies
- Part 9: Dependency Sorting
- GitHub
In part 2, we created a class that models a single piece of data, a step toward the first goal of representing any data set. In this post, we will work toward the second goal, representing a set of rules which act on a data set.
Data Begets Data
Analysis is the process of generating data by examining existing data. For example, if we take the total income of the Acme bookstore and subtract its expenses, we have derived a new piece of data, its operating profit. That value becomes part of the data set and a new possible input for further analysis. For example, after generating the operating profit value, we might then apply taxes, generating another data point: the net profit.
A rule which defines the generation of data has a specific profile: it can act on any data in the data set, it encodes arbitrarily complex logic, and it results in a single value. Grasp calls this a calculation; we can thus define the analysis of a data set as a series of calculations.
Expressing Calculations
We already have a decent idea of how to model the result of a calculation: it is just another piece of data, which we defined in part 2 as a variable. This means every calculation will have a variable to represent its result.
The other tenets of a calculation are harder to model: acting on data in a data set and encoding arbitrarily complex logic. In the solutions I have seen, these are easily the stickiest parts. Generally, they involve a data structure that can represent simple constructs, such as add, subtract, multiply, divide, and boolean operations. They also have algorithms for executing the logic described by the data structure.
If the system requires more involved capabilities, such as exponents, nesting, or order of operations, those must be coded into the core calculation engine: the more concepts it supports, the more complex the engine becomes. This part often has the most code and the highest levels of risk and change in the whole codebase.
Rather than repeat this line of reasoning in Grasp, we can take advantage of a built-in version of a logic system: Expression Trees. Debuting in .NET 3.5, they model code as a tree-like data structure, where each node represents a particular kind of .NET expression. This covers all of the cases we discussed before, such as operators, nesting, order of operations, etc. It also covers more advanced scenarios, such as method calls, unary operators, modulo, and any other kind of expression supported by .NET.
This is extraordinarily useful because it gives us a ready-made data type to represent the "arbitrarily complex logic" portion of a calculation: Expression. Even better, it also provides the means for carrying out the logic. An expression tree, once wrapped in a lambda expression, can be compiled at runtime to a delegate containing the executable version of the code it represents. Not only does this remove the burden of writing our own algorithm, the logic will also run as fast as if we had written and compiled it ourselves. That’s a win-win-win. Thanks .NET!
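As a rough illustration of the build-then-compile idea, here is a minimal sketch using only the standard System.Linq.Expressions API. The class name ExpressionTreeSketch and the parameter names are made up for this example and are not part of Grasp; plain parameters stand in for the inputs.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper for illustration only; not part of Grasp.
public static class ExpressionTreeSketch
{
    // Builds (income - expenses) as an expression tree and compiles it to a delegate.
    public static Func<decimal, decimal, decimal> BuildSubtract()
    {
        // Parameter nodes stand in for the two inputs in this sketch.
        ParameterExpression income = Expression.Parameter(typeof(decimal), "income");
        ParameterExpression expenses = Expression.Parameter(typeof(decimal), "expenses");

        // The subtraction is a data structure at this point, not executable code.
        BinaryExpression body = Expression.Subtract(income, expenses);

        // Only a lambda expression exposes Compile, so the body is wrapped in one.
        Expression<Func<decimal, decimal, decimal>> lambda =
            Expression.Lambda<Func<decimal, decimal, decimal>>(body, income, expenses);

        return lambda.Compile();
    }
}
```

Calling ExpressionTreeSketch.BuildSubtract()(1000m, 400m) evaluates the compiled subtraction and yields 600m.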
From Concept to Code
Now that we’ve defined the major components of calculations, we can represent them:
using System;
using System.Diagnostics.Contracts;
using System.Linq.Expressions;

public class Calculation
{
    public Calculation(Variable outputVariable, Expression expression)
    {
        Contract.Requires(outputVariable != null);
        Contract.Requires(expression != null);

        OutputVariable = outputVariable;
        Expression = expression;
    }

    // The variable that receives the result
    public Variable OutputVariable { get; private set; }

    // The logic that produces the result
    public Expression Expression { get; private set; }

    public override string ToString()
    {
        return String.Format("{0} = {1}", OutputVariable, Expression);
    }
}
We also override ToString to provide a simple visualization. The variable will output its fully-qualified name, and Expression provides nice text for all of the expression types (another win).
Now we can tackle the final tenet of calculations: act on any data in a data set. We have a data structure that can represent any kind of logic, but it does not know about variables as we’ve defined them. We need to teach expression trees about variables so we can use them as operands.
To do so, we create a new kind of node, specific to Grasp, that represents a variable:
public class VariableExpression : Expression
{
    // Far above the built-in ExpressionType values
    public static readonly ExpressionType ExpressionType = (ExpressionType) 1000;

    internal VariableExpression(Variable variable)
    {
        Variable = variable;
    }

    public override ExpressionType NodeType
    {
        get { return ExpressionType; }
    }

    // The node looks like a value of the variable's type
    public override Type Type
    {
        get { return Variable.Type; }
    }

    public new Variable Variable { get; private set; }

    public override string ToString()
    {
        return Variable.ToString();
    }
}
First, we derive from Expression, allowing variable nodes to exist in a tree like any other node. We then override the NodeType property, providing a value of the ExpressionType enumeration that is far above any of the base values, so that Grasp does not collide with the existing expression system.
We also accept a variable in the constructor and store it in a property*. We override the Type property to indicate that the expression’s result type is the variable’s type, such as integer or decimal**. We also return the variable’s fully-qualified name in ToString so it appears in an expression’s text.
Notice that VariableExpression’s constructor is internal. This is because I chose to replicate the Expression class’s factory pattern for variable expressions as well. So, instead of creating a VariableExpression instance directly, we use the Variable.Expression factory method declared on the Variable class:
public static VariableExpression Expression(Variable variable)
{
    Contract.Requires(variable != null);

    return new VariableExpression(variable);
}
This is solely an aesthetic choice and could be done either way.
Modeling Operating Profit
Now that we’ve done the prep work, we can cook the Operating Profit calculation. At this stage, our goal is to create a data structure that accurately describes this equation:
Acme.Bookstore.OperatingProfit =
    Acme.Bookstore.TotalIncome - Acme.Bookstore.TotalExpenses
Later in the series, we will discuss how to assign values to these variables and perform the subtraction. This post defines and builds the underlying structure.
First, we create the variables in the calculation, giving them the decimal type since we are dealing with money:
var operatingProfit =
    new Variable("Acme.Bookstore", "OperatingProfit", typeof(decimal));
var totalIncome =
    new Variable("Acme.Bookstore", "TotalIncome", typeof(decimal));
var totalExpenses =
    new Variable("Acme.Bookstore", "TotalExpenses", typeof(decimal));
This uses the constructor we defined in part 2. Next, we need to create nodes which represent the input variables in an expression tree:
var totalIncomeExpression = Variable.Expression(totalIncome);
var totalExpensesExpression = Variable.Expression(totalExpenses);
Now comes the interesting part: creating a node that represents the subtraction. This is as easy as using the static factory on the Expression class:
var operatingProfitExpression =
    Expression.Subtract(totalIncomeExpression, totalExpensesExpression);
This produces a BinaryExpression whose NodeType property is ExpressionType.Subtract, whose Left value is totalIncomeExpression, and whose Right value is totalExpensesExpression. Since we overrode VariableExpression.Type to return the variable’s type, the Subtract node will see a decimal on each side and determine that its return type should also be decimal (just as if we had written it in code). This is fortunate, as we are attempting to assign it to a decimal variable.
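The shape of that node can be checked with a small self-contained sketch. The Variable class here is a hypothetical stand-in for the one from part 2 (its property names are assumed), VariableExpression is a trimmed copy of the node defined earlier with a public constructor for brevity, and SubtractNodeSketch is a name invented for this example.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for the Variable class from part 2; member names are assumed.
public class Variable
{
    public Variable(string @namespace, string name, Type type)
    {
        Namespace = @namespace;
        Name = name;
        Type = type;
    }

    public string Namespace { get; private set; }
    public string Name { get; private set; }
    public Type Type { get; private set; }

    public override string ToString()
    {
        return Namespace + "." + Name;
    }
}

// Trimmed copy of the variable node; the constructor is public only to keep the sketch short.
public class VariableExpression : Expression
{
    public VariableExpression(Variable variable)
    {
        Variable = variable;
    }

    public override ExpressionType NodeType
    {
        get { return (ExpressionType) 1000; }
    }

    // Reporting the variable's type lets operator nodes treat this node as a value.
    public override Type Type
    {
        get { return Variable.Type; }
    }

    public new Variable Variable { get; private set; }
}

// Hypothetical helper for illustration only.
public static class SubtractNodeSketch
{
    public static BinaryExpression Build()
    {
        var income = new VariableExpression(
            new Variable("Acme.Bookstore", "TotalIncome", typeof(decimal)));
        var expenses = new VariableExpression(
            new Variable("Acme.Bookstore", "TotalExpenses", typeof(decimal)));

        // Subtract resolves decimal's subtraction operator from the operand types.
        return Expression.Subtract(income, expenses);
    }
}
```

For the node returned by SubtractNodeSketch.Build(), NodeType is ExpressionType.Subtract, Type is typeof(decimal), and both Left and Right are VariableExpression instances.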
The final step is to create the calculation object that associates the subtraction with the operatingProfit variable. This is straightforward using the constructor we defined earlier:
var operatingProfitCalculation =
    new Calculation(operatingProfit, operatingProfitExpression);
This is an example of constructing logic to operate on arbitrary data points. The kicker is that calling operatingProfitCalculation.ToString() gives us the same simple text representation we saw at the beginning of this section.
Summary
We explored the nature of analysis as a data generation process and determined what constitutes a calculation. We also wove expression trees into our definitions and created an object to represent an example. Grasp hopes to provide a simple usage model on top of complex reasoning, the hallmark of a solid abstraction.
Next time, we will get into some aspects of the runtime and take a look at making calculations actually do something.
—
* I should note that there is already an Expression.Variable factory method, which is why we need the "new" modifier on the declaration of the Variable property. However, that node only represents a name and type, sans namespace; it also doesn’t allow us to store the variable instance, which is why we instead define an entirely new node type.
** At first the Expression.Constant node seemed like it could work, but its return type would be Variable, which wouldn’t be allowed as, say, the operand of an Add node. We need a node representing a variable to look like it is a value of the variable’s type.