# Sunday, June 02, 2013

A "Mini" workflow engine. Part 2: on the serialization of Bookmarks

Last time, we saw that Activities are composed in a way that allows them to be paused and resumed, using continuations in the form of bookmarks:
```csharp
internal class Bookmark
{
    public string Name { get; set; }
    public Action<ActivityExecutionContext, object> ContinueAt { get; set; }
    public ActivityExecutionContext ActivityExecutionContext { get; set; }
}
```
A Bookmark has a relatively simple structure: a Name, a continuation (in the form of an Action) and a Context. Bookmarks are created by the Context, and the continuation is always a ContinueAt method implemented in an activity. Thus, serializing a Bookmark (and, transitively, its Context) should lead to the serialization of an object graph that holds the entire workflow code, its state and its "Program Counter".

However, serializing a delegate sounds kind of scary to me: are we sure that we can safely serialize something that is, essentially, a function pointer?
That's why I started looking for an answer to a particular question: what happens when I serialize a delegate, a Func<T> or an Action<T>?

I found some good questions and answers on StackOverflow; in particular, a question about the serialization of lambdas and closures (related to prevalence, a technique that provides transactional behavior by implementing checkpoints and redo journals in terms of serialization) and a couple of questions about the serialization of delegates, anonymous and not (1, 2 and 3).

It looks like:
  • delegate objects (and Func<>) are not XML-serializable, but they are binary-serializable (via the BinaryFormatter).
  • even anonymous delegates, and delegates with closures, are serializable, provided that you do some work to serialize the compiler-generated classes that back them (see for example this MSDN article, and this blog post)
  • still, there is general agreement that the whole idea of serializing a delegate is very risky
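A quick way to convince yourself of the second point is to round-trip a delegate through the BinaryFormatter. A minimal sketch (the Counter class is made up for the example); it works because the delegate's target lives in a [Serializable] type:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
class Counter
{
    public int Count;
    public void Increment(int by) { Count += by; }
}

class Program
{
    static void Main()
    {
        var counter = new Counter();
        Action<int> increment = counter.Increment;

        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, increment);
            stream.Position = 0;

            // The deserialized delegate targets a new copy of the Counter
            var restored = (Action<int>)formatter.Deserialize(stream);
            restored(5);
            Console.WriteLine(((Counter)restored.Target).Count); // prints 5
        }
    }
}
```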

The BinaryFormatter looks like the only directly applicable approach; in our case, we are on the safe side: every function we "point to" from delegates and bookmarks lives in a class that is serialized as well, and the objects and data accessed by the delegates are serialized too (they touch only data defined in the activities themselves, which is saved completely in the context). Binary serialization will save everything we need to pause and resume a workflow in a different process.

I implemented a very simple persistence of workflow instances using the BinaryFormatter for MiniWorkflow, and it does indeed work as expected.
You can grab the code on github and see it yourself.
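The gist of it fits in a few lines; a minimal sketch of the idea (not necessarily the exact MiniWorkflow code), assuming the WorkflowHandle and everything it references are marked [Serializable]:

```csharp
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static class WorkflowPersistence
{
    // Serialize the whole instance graph (activities, context, bookmarks)
    public static void Save(string path, WorkflowHandle instance)
    {
        var formatter = new BinaryFormatter();
        using (var stream = File.Create(path))
            formatter.Serialize(stream, instance);
    }

    // Rebuild the instance, ready to be resumed, possibly in another process
    public static WorkflowHandle Load(string path)
    {
        var formatter = new BinaryFormatter();
        using (var stream = File.OpenRead(path))
            return (WorkflowHandle)formatter.Deserialize(stream);
    }
}
```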

However, I would like to provide an alternative, less "risky" serialization mechanism for MiniWorkflow. I do not like not knowing how my Bookmarks are serialized; I would like something more "visible", to verify that what I need really goes from memory to disk, and to be more in control.
Also, I am really curious to see how WF itself serializes bookmarks; workflow definitions are serialized/deserialized as XAML, but a workflow instance is something else. In fact, this is a crucial point: in the end, I will probably end up using WF4 (MiniWorkflow is -probably- just an experiment to understand the mechanics better), but I will need to implement serialization anyway. I do not want to take a direct dependency on SQL Server, so I will need to write a custom instance store as an alternative to the standard SQL instance store.

I will explore which alternatives we have to the BinaryFormatter; but before digging into this problem, I want to work out a better way to handle the flow of execution. Therefore, next time, calling continuations: recursion, a work item queue or... call/cc?

# Saturday, June 01, 2013

A "Mini" workflow engine. Part 1: resumable programs

WF achieves durability of programs in a rather clever way: following a "code-as-data" philosophy, it represents a workflow program as a series of classes (activities), whose state of execution is recorded at predefined points in time.

It then saves the whole program (code and status) when needed, through serialization.
The principle is similar to the APM/IAsyncResult pattern: an async function says to its caller "do not wait for me to complete, do something else", and to the OS/Framework "call me when you are done here (the callback), and I will resume what I have to do". The program becomes "thread agile": it does not hold onto a single thread, but can let a thread go and continue later on another one.
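In code, the pattern looks something like this; a minimal sketch using FileStream.BeginRead/EndRead (input.txt is a made-up file name):

```csharp
using System;
using System.IO;
using System.Text;

class ApmExample
{
    static void Main()
    {
        // "Do not wait for me": start the read and return immediately
        var stream = File.OpenRead("input.txt");
        var buffer = new byte[1024];
        stream.BeginRead(buffer, 0, buffer.Length, asyncResult =>
        {
            // "Call me when you are done": resume here, possibly on another thread
            int bytesRead = stream.EndRead(asyncResult);
            Console.WriteLine(Encoding.UTF8.GetString(buffer, 0, bytesRead));
            stream.Dispose();
        }, null);

        Console.WriteLine("Doing something else in the meantime...");
        Console.ReadLine(); // keep the process alive until the callback fires
    }
}
```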

Workflows are more than that: they are process agile. They can let their whole process terminate, and resume later on another one. This is achieved by serializing the whole program on a durable storage.
The state is recorded using continuations; each activity represents a step, a "statement" in the workflow. But an activity does not call the following statement directly: it executes, and then says to the runtime "execute the following, then get back to me", providing a delegate to call upon completion, a continuation.
The runtime sees all these delegates, and can either execute them or save them; it can use a delegate to build a bookmark. Serializing the bookmark will save it to disk, along with all the program code and state: basically, the whole workflow object graph, plus one delegate that holds a sort of "program pointer" (the bookmark) from which execution can be resumed.

The same mechanism is used for a (potentially long) wait for input: the activity tells the runtime "I am waiting for this, call me back here when done" using a delegate, and the runtime can use it to passivate (persist and unload) the whole workflow program.

I found it fascinating, but I wanted to understand better how it worked, and I was dubious about one or two bits. So, I built a very limited, but functional, workflow runtime around the same principles. I have to say that, having used WF3 (and WF4 from the early CTPs too), I already had quite a good idea of how it could have been implemented, but I found Dharma Shukla and Bob Schmidt's "Essential Windows Workflow Foundation" very useful to cement my ideas and fill some gaps. Still, building a clone was the best way to fill in the last ones.

The main classes are:
  • an Activity class, the base class for all the workflow steps/statements; a workflow is a set of activities;
  • a WorkflowHandle, which represents an instance of a workflow;
  • a Bookmark, which holds a continuation, a "program pointer" inside the workflow instance;
  • a Context, which will hold all the bookmarks for the current workflow instance;
  • a WorkflowRuntime, which handles the lifecycle of workflow instances and dispatches inputs (resuming the appropriate Bookmark in the process).

A basic Activity is pretty simple:

```csharp
public abstract class Activity
{
    public Activity()
    {
        this.name = this.GetType().Name;
    }

    protected internal abstract ActivityExecutionStatus Execute(ActivityExecutionContext context);

    protected readonly string name;
    public string Name
    {
        get { return name; }
    }
}
```
It just has a name, and an "Execute" method that subclasses will implement; it is how activities are composed that is interesting:

```csharp
public class Sequence : Activity
{
    int currentIndex;
    List<Activity> statements = new List<Activity>();
    public IList<Activity> Statements
    {
        get { return statements; }
    }

    protected internal override ActivityExecutionStatus Execute(ActivityExecutionContext context)
    {
        currentIndex = 0;
        // Empty statement block: we are already done
        if (statements.Count == 0)
            return ActivityExecutionStatus.Closed;

        context.RunProgramStatement(statements[0], ContinueAt);
        return ActivityExecutionStatus.Executing;
    }

    public void ContinueAt(ActivityExecutionContext context, object value)
    {
        // If we've run all the statements, we're done
        if (++currentIndex == statements.Count)
            context.CloseActivity();
        else // Else, run the next statement
            context.RunProgramStatement(statements[currentIndex], ContinueAt);
    }
}
```
Now the Execute method needs to call Execute on its child activities, one at a time. But it does not do so in a single step, looping through the statements list and calling their Execute methods: that would not let the runtime handle them properly. Instead, the Execute method says to the runtime "run the first statement, then call me back". This is a pattern you see both in WF3 and WF4, even if in WF3 it is more explicit. It is called an "internal bookmark": you ask the runtime to set a bookmark for you, execute an activity, and the activity will resume the bookmark once done:
```csharp
// This method says: the next statement to run is this.
// When you are done, continue with that (call me back there)
internal void RunProgramStatement(Activity activity, Action<ActivityExecutionContext, object> continueAt)
{
    // This code replaces
    // context.Add(new Bookmark(activity.Name, ContinueAt));
    var result = activity.Execute(this);
    // The activity already completed?
    if (result == ActivityExecutionStatus.Closed)
        continueAt(this, null);
    else
    {
        // Save for later...
        InternalBookmark = new Bookmark
        {
            ContinueAt = continueAt,
            Name = "",
            ActivityExecutionContext = this
        };
    }
}
```
When an Activity completes, returning ActivityExecutionStatus.Closed or explicitly calling CloseActivity, the runtime looks for a previously set bookmark and, if it finds one, resumes execution from it:
```csharp
public void CloseActivity()
{
    // Someone just completed an activity.
    // Do we need to resume something?
    if (InternalBookmark != null)
    {
        var continuation = InternalBookmark.ContinueAt;
        var context = InternalBookmark.ActivityExecutionContext;
        var value = InternalBookmark.Payload;
        InternalBookmark = null;
        continuation(context, value);
    }
}
```
Do you already spot a problem with this way of handling the continuations and the control flow of the workflow program? Yes: it is recursive, and the recursion is broken only if an Activity explicitly sets a bookmark. In that case, the runtime simply returns to the main control point, waits for input, and resumes the bookmark after receiving it.
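To make the recursion visible, this is roughly how the call stack grows for a Sequence whose child activities all complete synchronously:

```
Sequence.Execute
└─ context.RunProgramStatement(statements[0], ContinueAt)
   └─ statements[0].Execute(context)      // returns Closed
      └─ ContinueAt(context, null)        // invoked synchronously
         └─ context.RunProgramStatement(statements[1], ContinueAt)
            └─ ...                        // the stack keeps growing
```

An explicit bookmark is what breaks the chain: Execute returns without invoking any continuation, and the stack unwinds.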
For example, in this "ReadLine" activity:
```csharp
public class ReadLine : Activity
{
    public OutArgument<string> Text = new OutArgument<string>();

    protected internal override ActivityExecutionStatus Execute(ActivityExecutionContext context)
    {
        // Waits for user input (from the command line, or from
        // wherever it may come)
        context.CreateBookmark(this.Name, this.ContinueAt);
        return ActivityExecutionStatus.Executing;
    }

    void ContinueAt(ActivityExecutionContext context, object value)
    {
        this.Text.Value = (string)value;
    }
}
```
When the bookmark is created the runtime stores it, and then the Execute method returns without any call to RunProgramStatement. The runtime knows the Activity is waiting for user input, so it does not call any continuation: it just stores the bookmark and waits for input.
```csharp
public void CreateBookmark(string name, Action<ActivityExecutionContext, object> continuation)
{
    // var q = queuingService.GetWorkflowQueue(name);
    // q += continuation;
    bookmarks.Add(name, new Bookmark
    {
        ContinueAt = continuation,
        Name = name,
        ActivityExecutionContext = this
    });
}
```
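Putting the pieces together, a driver program could look like the following sketch. Be warned that RunWorkflow and ResumeBookmark are my guesses at the runtime's surface, to show the mechanics; they are not necessarily the real MiniWorkflow API:

```csharp
// Hypothetical usage (RunWorkflow/ResumeBookmark are assumed names):
var workflow = new Sequence();
var read = new ReadLine();
workflow.Statements.Add(read);

var runtime = new WorkflowRuntime();
var instance = runtime.RunWorkflow(workflow);    // executes until CreateBookmark

// The instance is now idle; when input arrives, resume the bookmark:
runtime.ResumeBookmark(instance, read.Name, Console.ReadLine());
Console.WriteLine(read.Text.Value);              // the value set in ContinueAt
```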
The next logical step is to use Bookmarks to save the program state to durable storage. This led me to ask what ended up being a rather controversial question: is it possible, or advisable, to serialize continuations (in any of the .NET variants in which they appear, i.e. delegates, EventHandlers, Func<> or Action<>)?

Therefore, next time: serializing Bookmarks

# Friday, May 31, 2013

Building a Sharepoint "clone"

Well, from my previous post it should be clear that it will not be a "full" Sharepoint clone :) That would be a daunting task!
Anyway, for the kind of application I have in mind, I need to replicate the following functionalities:
  •     Everything is accessible via a web browser
  •     The basic data collection will be a list of "things" (more on that later)
  •     It will be possible to perform complex, collaborative, persisted, long-running operations to create and modify "things" and "list of things"
  •     Operations, lists and "things" need to be secured (have different accesses, with permissions assignable to users and roles)
  •     It is possible to define a "thing" and lists of things (but only for users in the correct role)
  •     It is possible to define operations using predefined basic blocks ("steps")
  •     Steps will have automatically generated UI to gather user input, if necessary
For example: one of these "things" may be a project proposal.

Someone starts a project proposal, fills up some fields, and submit it to a manager. The manager can send it back, asking more details. Then someone is assigned to it, to explore the feasibility, for example. And many other steps can follow.

Each of these steps fills out a little part of the proposal (our "thing"). I will discuss how to build "things", and which shape they will have, later on. For now, let's focus on steps.
First of all, note that I have not used the word "workflow" up to now.
Yes, this kind of step-by-step operation falls nicely into the definition of a workflow; if you look at company documentation, or even if you ask an employee to describe the procedure for submitting a proposal, these operations are described in a graphical way, with boxes and arrows and diamonds: a flowchart.

WF (the .NET workflow engine) reinforces this feeling by using a graphical language very similar to flowcharts as one of the ways to author a workflow. Besides the familiar authoring experience (which is, unfortunately, not without shortcomings in SP), WF adds many nice features we want for this kind of operation. Above all, we get resilience, durability and persistence. This is really useful in our scenario: the delays introduced by human interaction are very long, and a way to persist and resume programs gives us resilience and also scalability (programs can pause and resume, and migrate from one machine to another too).

The plan (for this blog):
- A series of posts about workflows
- Comparison with an interesting alternative: dataflows
- Building and persisting "things" (Dynamic types)
- Some nice ASP.NET MVC features used to put everything together (and make the actual users of all of this happy!)

Everything is accompanied by code.
I will try to put most of what I am discussing here on github, pushing new stuff as I blog about it.
Eventually, I hope to push all the code I wrote (and will write) for this project on github, but I will have to discuss it with my "investor" before doing that.

Next time: Part 1 of the workflow series!

# Sunday, May 26, 2013

Back to C#

Three years ago, give or take, I was asked to help in assessing the technical choices for an interesting project.
The project aimed at realizing a platform (intranet, web) to help a company deal with all its quality-related processes and procedures; in particular, it should help in preparing for and maintaining the high quality levels required by international standards (ISO-9001 and the like).

A very good friend of mine was asked to act as a program manager/product owner, and work as a proxy between the initial set of companies interested in the product (the customers) and the ISV that was in charge of developing the software.
My friend designed and detailed the specification for every aspect needed in the software, which was composed of three main modules: storage, review and approval processes for corporate documents and handbooks; collection, review and resolution of quality-related issues and process enhancement proposals; tracking and monitoring of supplies and suppliers.
All these processes follow the pattern of a workflow nicely: clear steps, with well-defined responsibilities, and long spans of time between steps. In short, they are reactive programs that need to be durable, recorded (for auditing) and persisted.

That said, we decided (in agreement with the ISV) to use Windows Workflow Foundation as the basic engine: it seemed a very good fit for the problem at hand. In particular, the ISV decided to build on Sharepoint 2010 (which was in Beta at that time). Personally, I think it was a good choice: I already had experience with WF, and SP seemed to offer all the missing bits and a lot of bells and whistles.

However, the ISV had no prior knowledge of SP 2010, and it showed in how they implemented the solution: it worked, but it took time to implement and was not nearly as flexible as expected. When faced with something more complex, something that required some effort to customize, they fell back to writing custom code. A lot of code was written, and a little too much was hard-wired: for example, modifying a workflow (adding an activity like "send an email", or swapping two steps) should have been easily done through the Sharepoint Designer, but ended up being doable only in code, because most of the activities, and the code composing them, were custom. The same goes for the web forms, for the workflow data (stored as XML in custom SQL Server tables instead of SP lists), and for the Excel reports (which are provided out of the box... but only for SP lists).

Nonetheless, the software did its job, and did it quite well; the design phase had been done carefully, and the product met expectations. More companies wanted to use it, and of course they wanted customizations and improvements.
I know all these details because I was asked to take over from where the ISV left off, and to help with bug fixing, modifications and the deployment of new installations. It was hard, more than it should have been; some things that should have been doable directly by the customers were possible only through code.
And deployment of Sharepoint solutions on top of an existing, running instance is a real, royal pain.
In short, we arrived at a point where some of the requests were simply impossible to fulfill. Some of the shortcomings are due to SP 2010 itself (we needed more fine-grained permissions and access control, a lighter footprint and resource usage, a more customizable user interface, easier and more flexible deployment options), and we could not address them easily. Two requirements especially were a no-go for further development: deployment in a shared hosting environment (cloud?) and, linked to it, the ability to use a storage engine that was NOT SQL Server.

Earlier this year, my friend and I met, and we decided to try to build a solution independent from SP, one that retained the good bits (workflows, lists, reports) and improved upon them (finer-grained permissions and access control, workflows that are easier to author, persistence over different storage engines, a more modern UI).
We also decided to try to do it in what we thought was the right way, more for "fun" than for "profit": that meant no heavy pressure on schedules, the chance to learn new stuff (ASP.NET MVC, websockets, ...) and to re-build some parts of the stack that we saw as necessary (I have a daily job for a living in a software company, so this project was just for fun and to keep my abilities sharp).
I was happy to have an opportunity to go back to C#, to keep my training on .NET going, and to stay up-to-date on the latest .NET technologies: it has been almost two years now without using C# professionally, and I was missing it! So, I said yes.
I also decided to keep a journal here, explaining what I have done and what I will do, what is working and what I will learn.

Next time: building a Sharepoint "clone"
# Monday, May 20, 2013

I'm back!

After an interval of something like 7 years (yikes!)... I decided it was time to go back and blog again!

The first step was to refresh my old blog: the domain was still there, registered and paid for. And unbelievably, after upgrading dasBlog with the right tools, the old posts were there too!

Disclaimer: take them for what they are; old posts about my past coding experiences! :)

They were a fun read; some of them look funny now (wow, my writing style has really changed - I hope for the better - in the last few years!), but I decided to leave them there anyway: someone may still find them useful.
However, I am definitely not going to blog about the same things.

In fact, the main reason I wanted to resume blogging is because I am going to start a new project, and I want to keep a record, a journal of how it will evolve. Even if it is only for myself. Hence, the title: coding for fun. I will talk mainly about coding, and the fun that it (still) gives to me.

# Saturday, July 08, 2006

One thing I really don't like...

...is when search engines (like Google, but also MSN) "automatically" sense which language I want to use. I live in Italy, therefore Google assumes I want results in Italian, or from Italian web pages. Well, that's NOT true. I want results and pages in English (like I want to type on an English keyboard, and I also want my OS and programs to be in English). Especially if I type google.com instead of google.it. Also, when I search for development-related answers, this behaviour gives me really poor search results.

Speaking of translations: one of the worst things I have ever seen? Compiler errors (C++) translated into Italian. Even worse, VBA keywords used to be translated into Italian in some old releases of Office. THAT was really crazy! :)

# Thursday, July 06, 2006

Windows Vista and Office 2007 Beta 2 (2)

Overall, I liked Office 2007 very much. The only thing I could not test was weblog posting, since dasBlog is not currently supported in the Beta version (it will be in the final version). I surely hope it will work well!

Instead, I have mixed feelings about Windows Vista. I followed Longhorn closely from the first PDC (2003?) and I was really looking forward to it. Remember the “native kernel + managed subsystems” part? The three pillars, Indigo, Avalon and WinFS? Well, Indigo and Avalon are great, but they will be part of .NET 3.0, also available for XP, and WinFS is dead.

So, what’s the point of Windows Vista? With the three pillars gone, what remains is the new UI, the improvements to the kernel and window manager, and the improved security model.

Running as a simple User, and having to use tedious runas commands to do very common tasks on my notebook (such as changing the IP address, the power profile or directory permissions), I thought the new LUA model of Vista would be great for me. The default user is still marked as “Administrator”, but I think (hope?) it is a simple User account in disguise: when performing security-related operations (i.e. clicking on buttons with the little shield) the security token is upgraded and substituted with one from the Administrator group, if the user grants permission.
This is my first complaint: why did they do that? Be clear: use a default account from the Users group, and simply ask for an Administrator password before running administrative programs, or on their first security-related operation; then make the admin take ownership of the whole program. Surely it is safest to ask every time whether a program can do this or that… or is it? People get bored very easily, and do not always read what is written on dialog boxes. Normal users almost never do; they only try to get rid of that annoying (or, for some, scary) box that prevents them from “using my computer”. Despite this, the new LUA is still better than the previous situation.

The new window manager and UI, instead, are great. And I’m not only speaking about the eye-candy (transparencies, the new Flip 3D I already love, the shiny new icons, the sidebar etc.) but also about usability. I love the new Explorer UI and the new Open and Save dialog boxes. Finally we went a step further, stripping away the file menu where it is no longer useful (like in Office 2007, where it “evolved” into the ribbon) or necessary (like in Explorer and Internet Explorer, where it is… no more!). The clickable locations on the address bar, the search box in the toolbar (yes! No more stupid search dogs…) and the new copy/move progress dialogs are things I have been waiting for, and they are really great. And the Sidebar is both useful and beautiful to see (only one complaint: why is it not easy to hide and show it? Maybe with an F-something key, à la Exposé?).
On the negative side, I have found the new UI very poorly configurable and customizable: if you choose Aero, you can’t even change colors. Very little can be done, but maybe this is the price to pay for a mainstream and “standardized” OS.

Finally, I know this is only a Beta, but I had a LOT of problems installing programs: cygwin does not work, it is impossible to register file extensions for some programs (7-zip comes to mind), other programs crash without reason. Even SQL Server 2005 needs a patch to work correctly! There is still much work to be done, and a lot of time has passed already. Maybe Mini is right, and the Windows team needs to change direction.
The course events have taken really disappoints me. Vista is great, but not so great, and not so radically different, as to justify for me the switch from XP (a 5-year-old OS!). I love .NET, and the managed and secure world, and I’m with Gary McGraw when he says that Vista is a missed opportunity for a new, modern and secure OS; the Longhorn of three years ago still looks better to me than today's Vista. I’ll have to wait for Singularity… :)

# Tuesday, July 04, 2006

Windows Vista and Office 2007 Beta 2

I could not resist, and installed both :) I am very pleased with Office 2007. I think it is the best release of Office I have ever used! The ribbon is great; there are some commands that are “misplaced”, but I think this is just my opinion, based on how I use Office (Word, in this case). Everybody uses Word or Excel in their own way, so the Office team has done a great job examining usage patterns, correlations between command usage and position, and so on. Just read Jensen Harris' blog; as an example, I found his post on “command lingering” particularly amusing.

I have used Word, Excel and PowerPoint for about a month now, and I have to say I really like the ribbon, the mini-toolbar, even the Office “Start” button I initially feared so much. And it looks great on both XP and Vista! I have found myself using and discovering more and more Office commands, which I am sure were there even before Office 2007, but were essentially “ghost commands”, buried in menus and not readily accessible. A really great piece of software, even in the Beta version.

I'm here again!

Sorry, I have few excuses for not posting for such a long time (more than one month!). The only justification is that I have been abroad: I spent the last month in Salt Lake City, UT, working on the project I am currently employed on at IASMA.

In the last year, from July 2005 when I started working here on the Grape Genome Project, we prepared all the programs, databases and informatics infrastructure to house and analyze data from the grape genome. This included databases for handling huge data volumes, applications for the mass-analysis of data (gene predictions, automatic annotation of as many genes as possible using different sources, etc.) and a first infrastructure to query and retrieve data on the genome.

This part involved a lot of scripting, being more a matter of handling data and gluing existing programs together, and this was the reason I had to learn Perl. But there was also room for building interesting programs that make good use of structured data, such as the Gene Ontology. Ontologies have a lot of limitations (I will probably dedicate a post to this subject), but they are a great leap forward from the current state of biological data handling. The structure of the Gene Ontology allows us to do some very interesting things, like inferring functions in a more precise manner, taking into consideration more than one source, the “informativeness” of the assigned term, and the degree of confidence we have; it is possible, for example, to see if a term is supported consistently among different sources. We also used the Gene Ontology to create a new and flexible tool for the analysis of expression in different classes of biological processes from microarray experiments, and a new data-mining tool that queries genes not based on textual searches (as most of the public databases do) or on sequence similarity searches, but on “semantic” queries, i.e. queries based on the “type” of the gene.

Be aware that ontologies and their terms are not type systems and types; personally I think they are a starting point on which to build more formal and complete techniques. But this will be part of future work, maybe even my future work… :)

# Wednesday, April 26, 2006

Did you know...

...that there are (more than) 21 algorithms to compute the factorial of a number?
I discovered this fact today, while implementing the Fisher's exact test for a 2x2 contingency table.
(For those interested: Fisher's exact test for a 2x2 contingency table, like the more famous Chi-square test, is used for the analysis of categorical frequency data; its aim is to determine whether the two categorical variables are associated.
We use it to extract the most significant genes from a gene set experiment: the two categories are "clue present/clue absent", while the two data sets are the control set and the experiment set.)
Fisher's exact test requires the computation of 9 factorials for each iteration, and one of them is on the total. Typically, the applets and code you can find on the internet can handle data up to a total of about 100-200 before giving up due to overflow problems. Even code that uses libraries for arithmetic with arbitrary precision can only handle numbers up to 6000-12000 before becoming unusable.
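For reference: for a 2x2 table with cells $a, b, c, d$ and total $n = a + b + c + d$, the probability computed at each iteration is the hypergeometric

$$p = \frac{(a+b)!\;(c+d)!\;(a+c)!\;(b+d)!}{a!\;b!\;c!\;d!\;n!}$$

which is exactly where the 9 factorials come from, one of them ($n!$) being on the total.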

Here, we are facing two problems:
  • numbers (integers and doubles) provided by the machine (physical CPU or VM) have finite precision and range
  • the simple and dumb algorithm (the one that performs all the multiplications, one by one) is very SLOW
The solution is to adopt a good library that handles multiplication of very large numbers in an efficient way (using Karatsuba multiplication or an FFT-based multiplication), and to forget about the naive algorithm. At this page you can find implementations and benchmarks for the 21 different algorithms I mentioned. The most efficient ones use prime numbers (or, better, primorials, i.e. the product of the first n primes), but even the simple and elegant Split Recursive algorithm performs a lot better than the naive one!
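To give an idea of the latter, here is a minimal C# sketch of the Split Recursive idea, using System.Numerics.BigInteger for arbitrary precision (this is my own illustration, not the code from the benchmarks page): instead of accumulating the product left to right, it multiplies ranges of similar size, so the partial products stay balanced in magnitude and the library's fast multiplication of big numbers can do its work.

```csharp
using System;
using System.Numerics;

static class Factorial
{
    // n! computed as a balanced product over the range 1..n
    public static BigInteger Compute(int n)
    {
        if (n < 0) throw new ArgumentOutOfRangeException("n");
        return Product(1, n);
    }

    // Split the range in two halves and multiply the partial products:
    // the recursion keeps the operands of each multiplication comparable in size
    static BigInteger Product(int low, int high)
    {
        if (low > high) return BigInteger.One;   // empty range (0! = 1)
        if (low == high) return low;
        int mid = (low + high) / 2;
        return Product(low, mid) * Product(mid + 1, high);
    }
}
```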