# Wednesday, November 09, 2005

More on syntax highlight

As you may have noticed, I have "solved" my problem of posting code. Now, if you look back at last Sunday's post, I have cool area limited by a dashed board for my code, with syntax coloring for C++ and correct indetation. And it works on Firefox too.
My effort was very limited, since I did not coded the entire solution by myself as originally wanted. I'm not entirely satisfied, so maybe in the future I could do it, or simply replace the syntax highlight engine. DasBlog uses the syntax highlight engine from Thomas Johansen. His highlight engine (here)  uses a simple api that takes the source code, a language definition file (written in XML), and spits out html. The code must be enclosed in <pre> tags, since it do not codify spaces as &nbsp. This was the first bug in dasBlog, and I got rid of it easily. The second problem was: why the popup window that could let me enter the code does not show?
The problem was in the showModal javascript function. It is a very pretty function for showing modal windows in a browser, very good for creating rich-content dialogs. Unfortunately, Firefox do not implement it. I could go for two solutions: using an alternate function (or better, a set o functionality like those implemented by SubImage) or not using a popup dialog at all. I went for the second one.
So, I was able to have the syntax highlighter I wanted in three steps:
  • redesign the Add Entry form;

  • Add some html to the code returned by Thomas Johansen's highlighter (for the border and the indentation);
  • Add an XML file for the C++ syntax.
  
It works pretty well; however, the highlighter has some problems: I wasn't able to highlight curly braces, for example, and it isn't able to recognize tokens not separated by a whitespace:

// Separated tokens
if (a != b)

// Not working
if (a!=b)



For now it works pretty well for me, but in the future I could code my own syntax highlight engine (for example follwing this article) or I could repalce it with a more professional one, for example CodeHighlighter

Here is a screenshot of my modified form; if you like it, send me an email and I'll post here the source code.
# Tuesday, November 08, 2005

Managed OS and contracts

Today I had some time to flush my blog queue, and I found two interesting posts: the last episode of Mark Russinovich "war" against Sony and its buggy rootkit used as a DRM system (surely something agaist trused computing.. I'd like to know if anyone knows Microsoft's official position) and Adi Oltean's Sunday post on the Singularity OS at http://research.microsoft.com/os/singularity/

I think the first one is surely worth discussion, but many (too many, since it reached the mainstream media) words have already been spent.

The second one came unexpected, and with a little sense of envy.. for not being there. It's since my conversion to the CLR that I'm dreaming and speculating on how to build an OS in "managed" code (or using as much managed code as possible). And now they have done it!
In the technical report there are many interesting ideas: from contracts (not so different from the one's we'll be using in Indingo, it seems) to the use of MSIL to produce verifialble code, the global optimizing compiler (better than a JIT) or the global garbage collector. And versioning, trusted components, and all the benefits the CLR brought to us... OS wide!
As my MSC thesis was on concurrency contracts for the CLR, I think a very interesting part of the project is the Sing# programming language. I had experience of the Spec# programming language (a good project from another MSR team). I actually had the luck to speak with Schulte and Tillmann from the Spec# team at SAPS last year. Their work inspired mine on the CxC language. Now I really want to go and see how Sing# works with concurrency, and if it is possible to fit it with my idea of concurrency contracts. I will do further reading on the subject in the following days, time permitting as always. For those interested, you can find the paper here, and my thesis here.

# Sunday, November 06, 2005

C#, C++ and Rotor (2)


To see how the C# compiler works, what is better to go and look at the sources? However, if you have ever written a compiler, you can understand that figure out who calls who is not so simple. I decided to simplify my investigation a little with the help of.. a debugger =)
I set two breakpoints in two functions that I believed to be correlated, but that seemed to not came in touch nowhere in the code. It was strange, because the two are parse function and are in the same class.

I placed the first breakpoint in ParseNamespaceBody. As you can see in parser.cpp, this function will in turn call ParseNamaspace, ParseClass,
which in turn will call ParseMethod, ParseAccessor, ParseConstructor...
The second breakpoint was placed instead in ParseBlock. This is the function that parses a block of code, i.e. all the statements enclosed by two curly braces { }.
The interesting thing is that the two breakpoints are not hit subsequentely, but in different moments with different stack traces.

This is the stack trace for the first breakpoint:
   
0006efe8 531fa3ad cscomp!CParser::ParseNamespaceBody+0x74 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\parser.cpp @ 1627]
0006effc 53216891 cscomp!CParser::ParseSourceModule+0x8d [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\parser.cpp @ 1602]
0006f18c 5320c226 cscomp!CSourceModule::ParseTopLevel+0x691 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\srcmod.cpp @ 3331]
0006f19c 53175940 cscomp!CSourceData::ParseTopLevel+0x16 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\srcdata.cpp @ 171]
0006f234 53175bd7 cscomp!COMPILER::ParseOneFile+0x130 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 1197]
0006f284 53174599 cscomp!COMPILER::DeclareOneFile+0x27 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 1228]
0006f324 5317767a cscomp!COMPILER::DeclareTypes+0x89 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 821]
0006f4d0 53173987 cscomp!COMPILER::CompileAll+0x13a [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 1676]
0006f65c 5317d0dd cscomp!COMPILER::Compile+0xd7 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 704]
0006fe48 0040db29 cscomp!CController::Compile+0xdd [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\controller.cpp @ 225]
0006ff50 0040e7ac csc!main+0x349 [d:\cxc\sscli\clr\src\csharp\csharp\scc\scc.cpp @ 2270]
0006ff64 79c8138d csc!run_main+0x1c [d:\cxc\sscli\pal\win32\crtstartup.c @ 49]
0006ff88 0040e70b rotor_pal!PAL_LocalFrame+0x3d [d:\cxc\sscli\pal\win32\exception.c @ 600]
0006ffc0 7c816d4f csc!mainCRTStartup+0x7b [d:\cxc\sscli\pal\win32\crtstartup.c @ 66]
0006fff0 00000000 kernel32!BaseProcessStart+0x23

   
And this is the trace for the second one:
   
0006eed4 5321849c cscomp!CSourceModule::ParseInteriorNode+0xbc [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\srcmod.cpp @ 4576]
0006ef7c 531e163b cscomp!CSourceModule::GetInteriorNode+0x32c [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\srcmod.cpp @ 3863]
0006ef98 531e14fb cscomp!CInteriorTree::Initialize+0x4b [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\inttree.cpp @ 80]
0006efc8 53217fe4 cscomp!CInteriorTree::CreateInstance+0x6b [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\inttree.cpp @ 61]
0006f068 5320c28a cscomp!CSourceModule::GetInteriorParseTree+0xd4 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\srcmod.cpp @ 3728]
0006f07c 5316d1e8 cscomp!CSourceData::GetInteriorParseTree+0x1a [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\srcdata.cpp @ 187]
0006f1f0 5316c44a cscomp!CLSDREC::compileMethod+0x6c8 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\clsdrec.cpp @ 6832]
0006f220 5316b728 cscomp!CLSDREC::CompileMember+0x9a [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\clsdrec.cpp @ 6569]
0006f268 5316c7b2 cscomp!CLSDREC::EnumMembersInEmitOrder+0x228 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\clsdrec.cpp @ 6327]
0006f2f0 5316c084 cscomp!CLSDREC::compileAggregate+0x252 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\clsdrec.cpp @ 6657]
0006f324 53177f1b cscomp!CLSDREC::compileNamespace+0xc4 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\clsdrec.cpp @ 6523]
0006f4d0 53173987 cscomp!COMPILER::CompileAll+0x9db [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 1922]
0006f65c 5317d0dd cscomp!COMPILER::Compile+0xd7 [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\compiler.cpp @ 704]
0006fe48 0040db29 cscomp!CController::Compile+0xdd [d:\cxc\sscli\clr\src\csharp\csharp\sccomp\controller.cpp @ 225]
0006ff50 0040e7ac csc!main+0x349 [d:\cxc\sscli\clr\src\csharp\csharp\scc\scc.cpp @ 2270]
0006ff64 79c8138d csc!run_main+0x1c [d:\cxc\sscli\pal\win32\crtstartup.c @ 49]
0006ff88 0040e70b rotor_pal!PAL_LocalFrame+0x3d [d:\cxc\sscli\pal\win32\exception.c @ 600]
0006ffc0 7c816d4f csc!mainCRTStartup+0x7b [d:\cxc\sscli\pal\win32\crtstartup.c @ 66]
0006fff0 00000000 kernel32!BaseProcessStart+0x23


As you can see, the lowest common ancestor is COMPILER::CompileAll. Then the traces diverge: the first one calls DeclareTypes, the second one compileNamespace. I started to guess what was happening, but I wanted to see it clearly.

So I examined the two functions and the functions called by them. The ParseMethod, ParseAccessor, ParseConstructor functions parse the declaration of classes, functions and properties. But then, when they reach the open curly brace, they stop. Or, better, they call a SkipBlock function that searches the body of a class/method/etc for other declaration (inner classes, for example) but skips the statements. They record the start of the method (the position af the first curly) and return.

Digging into ParseBlock, I saw that it will in turn call the function responsible to generate the intermediate code, a parse tree made of STATEMENTNODEs

   STATEMENTNODE   **ppNext = &pBlock->pStatements;
while (CurToken() != TID_ENDFILE)
{
if ((iClose != -1 && Mark() >= iClose) ||
(iClose == -1 &&
(CurToken() == TID_CLOSECURLY ||
CurToken() == TID_FINALLY ||
CurToken() == TID_CATCH)))
break;
if (CurToken() == TID_CLOSECURLY) {
Eat (TID_OPENCURLY);
NextToken();
continue;
}
*ppNext = ParseStatement (pBlock);
ppNext = &(*ppNext)->pNext;
ASSERT (*ppNext == NULL);
}

    
The ParseStatement function will drive the parsing calling every time the right function among
CallStatementParser
ParseLabeledStatement
ParseDeclarationStatement
ParseExpressionStatement

    
So this second phase will finally parse the statements. The pertinent code is in the ParseInteriorNode function:

m_pParser->Rewind (pMethod->iCond);
pMethod->pBody = (BLOCKNODE *)m_pParser->ParseBlock (pMethod);
 

Now, we can finally see what happens and why a C# compiler works without forward declarations or without the separation of declaration and implementation: it is a two-phase parser. The first phase collects informations about symbols (classes and function names) and builds the symbol table.
The second phase uses the symbol table (already complete) for name resolution, than builds the complete parse tree.
            
           


Blog engine, Code and firefox

DasBlog have a nice feature: since it uses the FreeTextBox component to allow editing of the blog, it also has a "Insert Code" button. However, this component is less then satisfactory, the main problems being:
  • Lack of C++ language support (as you can see for yesterday's post: highlight is for C#)
  • It does not work on Firefox (mozilla browsers do not have the showModalDialog funtion)
  • It does not mantain indentation (very bad!)
  • I wish to separate code and free text, maybe enclosing code in a nice box
So, I think I am going to implement something myself, that supports at least C++ (maybe /CLI)! Another pet project for my list =)
# Saturday, November 05, 2005

Blog engine

Finally I managed to setup a blog in my web space.
The first problem was: find a ready solution, or craft a simple blog engine by myself? Since, unfortunately, time is finite, I had to go for the first one.
I have already declared my love for .NET (and I will go into details on why I found ASP.NET the coolest solution for web applications, in a future post), so I had to go for an ASP.NET solution. I have found two good blog engines: CommunityServer/.Text/Subtext and dasBlog?
I think that CommunityServer is well over my need, however I like .Text/Subtext. Unfortunately, my hosting provider do not have MsSql...at least, not without paying too much (and even in that case, without direct access to the server! Which means, no stored procedures).
I have found a pretty article by Robert Love on customizing .Text for using it with Borland Interbase. And Phillip J Haack have plans in the future to port Subtext to MySql. So this became one of my (many) pet-projects. I plan to port .Text/Subtext to pure SQL, by writing a neutral DataProvider. It should be feasible, then, use it with MySql or even with a good embedded SQL engine like FireBird or SharpHSQL... For now, I am using the excellent dasBlog, that requires very little effort to deploy and setup!


C#, C++ and Rotor

As a C++ programmer, initially I was very disapponited when Microsoft took the .NET way. And I didn't really liked C#. Today, I have a different opinion (especially thanks to anonymous delegates, generics, all the other version 2 goodies and LINQ), but at the begininnig it was very hard for me to leave my loved COM+/C++ for .NET/C#.

I love a lot of C++ features, among them the possibility of separate declaration and definition, even phisically in two separate files (.h and .cpp). This is a need due to the early C++ compilers design, but it is also a way to guarantee a correct forma mentis (and to increase readability of the code).

Consider the following classes:

#pragma once

#include <iostream>

class Bar
{
public:

   void bar()
   {
      Foo* foo = new Foo();
      std::cout << "Bar" << std::endl;
   }
};

class Foo
{
public:
   void foo(Bar* owner)
   {
      bar->bar();
   }
};


There are two problems in this file: the obvious one, that it does't compile:

d:\VC\Test\Test.h(13) : error C2065: 'Foo' : undeclared identifier
d:\VC\Test\Test.h(13) : error C2065: 'foo' : undeclared identifier

the second one, that it does introduce a dependendency on iostream on anybody that is going to include it.
Now, if we separate definition

class Bar
{
public:
   void bar();
};

class Foo
{
public:
   void foo(Bar* owner);
};

and implementation

#include ".\test.h"
#include <iostream>

void Bar::bar()
{
   Foo* foo = new Foo();
   std::cout << "Bar" << std::endl;
}

void Foo::foo(Bar* bar)
{
   bar->bar();
}

the two problems are solved. Do you think my example is a stupid one? Sure, but think about the Model View Controller pattern, or about every system involving notifications, like a Window manager...

In C# there is no distinction: when you declare a type you must also give its implementation, the only exceptions being interfaces and abstract methods. So, how can a C# compiler deal with the above case? We will discover it in the next blog: time to do a little digging in Rotor!

Blog topics

For this blog, I have four topics in mind:
  • Speak about my work as a "bioinformatician"
  • Rotor
  • Tanscribe here my security lessons
  • Random thoughts
I plan to start today with somthing about Rotor, the Microsoft Shared Source CLI (a good article on MSDN here)
I have found Rotor one of the great things Microsoft did in the last years (the others being the msdn blogs and the shift towards security). For who don't know what I am speaking about, Rotor is the source code of .NET. It is actually mainly the production code of the clr (the execution engine, fusion, ...) of the C# compiler and of the J# compiler. There is even a limited BCL. The only different things (for copyright reasons, I suppose) are the Just In Time compiler and the Garbage Collector. And it compiles and runs on BSD and Mac OSX too!
I have found it invaluable for my work at the university, both on the security side and on the compiler side.
# Friday, November 04, 2005

Hello Blog World!

My name is Lorenzo, and I am an Italian developer. I am currently employed in a public institute for bioinformatics, in a small town not too far from where I live. At work, I focus on developing web applications for the management of laboratory data, as well as on tecniques to annotate (i.e. predict and give a function) to genes and genes products. This includes mainly algorithms on graphs and text mining.
My real interests, however, are Windows, .NET and programming languages.
Why? Well.. Windows was my first OS. I grew with him, gathering more knowledge in the years. I really like exploring the inners of the NT kernel, the Win32 API, the COM(+) programming model, and now .NET.
I am really fond of C++, its multi-paradigm nature, and I also like very much C# and ML.
I thought about writing a blog to share with others knowledge and experience, both on bioinformatics and on the other subjects. I'd love to know if anyone will read me. So please, comment! Thank you.

Test

finally, my blog!