# Sunday, 11 December 2005

Sorry for the delay

Wow! It's been a long time since my last blog entry.. Too much to do at work, plus a little holiday with a trip to Florence.. =)
Here in Italy we had a pleasant long weekend (we call it a "bridge" when a holiday falls close to the weekend), and I especially enjoyed it at my grandma's in Florence.
It snowed a little the week before; it was only about 20 cm, but I think we are no longer used to snow: traffic was a mess, and the children were so happy =) It used to snow a lot more when I was a child: when I was five we had a huge snowfall that brought us almost 1 meter of snow!

So, back to this blog's subject: programming! At work we had a lot to do because our servers finally arrived: four beautiful AMD Opterons at 2.6GHz that I had to assemble into an HPC cluster. An HPC cluster is not for redundancy or fail-over, nor for load distribution in a web-server fashion: it is about high performance computing. So, I had to learn about ways of configuring and using the cluster, and since I found the information both interesting and amusing, the next few entries will be about clusters!
Bye

# Sunday, 27 November 2005

Jingle Bells time again...

Yesterday we had a snow shower, and today is the first Sunday of Advent.. What could be more appropriate for a practical example than Jingle Bells? =)

Trono designed the Santa Claus problem as a new exercise in concurrent programming. He wanted an exercise harder to solve than the usual mutual exclusion problems, and indeed the Santa Claus problem involves three kinds of processes that need to synchronize and cooperate at certain moments. The problem may be stated as follows:

Santa repeatedly sleeps until wakened by either all of his nine reindeer,
back from their holidays, or by a group of three of his ten elves.
If awakened by the reindeer, he harnesses each of them to his sleigh,
delivers toys with them and finally unharnesses them (allowing them
to go off on holiday). If awakened by a group of elves, he shows each
of the group into his study, consults with them on toy R&D and
finally shows them each out (allowing them to go back to work).
Santa should give priority to the reindeer in the case that there is
both a group of elves and a group of reindeer waiting.

Some parts of the problem, if solved with simple primitives like monitors or semaphores, are very tricky: Trono's original implementation itself had a bug, as pointed out by Ben-Ari. Ben-Ari solved the problem in a quite simple way using Ada95 synchronization primitives, and he also showed how Java monitors are not much better at this task than the simple P-V semaphores used by Trono.
Unlike Java, Ada95 does not use monitors with condition variables which must be explicitly signalled and waited upon. Instead, the language uses a construct similar to our wait preconditions: a boolean expression (called a barrier) associated with each method. If the barrier evaluates to true, the thread calling the method is allowed to execute it, provided nobody is already 'holding' that object (access to objects is mutually exclusive).
If the barrier is false, threads calling the method are queued. When a method exits, the other threads are signalled, so they can re-evaluate their barrier and see if they are allowed to execute the method.

Wait preconditions act in the same manner, with two exceptions:
  1. it is possible to use temporal logic
  2. it is possible to refer to meta-variables representing methods.
Later, Nick Benton solved the problem using Polyphonic C# (now Comega). Comega is based on a paradigm derived from the join calculus, whose building blocks are chords and async methods. In short, synchronization is message-based, and execution derives from the ability of a chord to 'consume' messages.

For example:

static void reindeerBack() & static async reinWaiting(int r)
{
    if (r == 8)
        reindeerReady(); // last reindeer
    else
        reinWaiting(r + 1);
}


The method reindeerBack will be executed only if there is a reinWaiting 'message' ready to be consumed. It will then send the reindeerReady 'message', which will be consumed by Santa's thread; Santa will wake up and do his job with the reindeer.
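The elves' side can be handled with a chord of the same shape. This is my own sketch in the Comega style, with hypothetical method names; Benton's actual solution differs in the details:

static void elfBack() & static async elvesWaiting(int e)
{
    if (e == 2)
        elvesReady(); // the third elf completes a group
    else
        elvesWaiting(e + 1);
}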

The Polyphonic C# solution is nice, and the code very compact. However, it can be quite hard to understand if you are not familiar with process algebras, or with concurrency based on messages.

For our solution, we take an approach that combines the two. Synchronization contracts let us do so, since it is possible to use wait preconditions like Ada95 barriers. On the other hand, we can mimic (in a very restricted way, but sufficient for this example) the method/message consumption of Comega.

We use the Room class from Ben-Ari's solution (we'll call it Queue; this class also replaces class nway in the Comega solution, since they solve the same problem).

class Queue
{
    private Collection queue;    // the members of the current group (any collection with add/clear/size)
    private int queueSize;       // required group size (9 for reindeer, 3 for elves)
    private int waiting;         // members still waiting for the group to complete
    private bool enterPossible;
    private bool roomDoor;
    private Santa santa;         // the Santa to wake when the group is complete

    public Queue(int size)
    {
        queueSize = size;
        enterPossible = true;
    }

    void Accept() //Open_Door
    {
        queue.clear();
        enterPossible = true;
    }

    public void Entry(Object o)
    requires_wait(enterPossible == true)
    {
        queue.add(o);
        waiting = waiting + 1;
        if (waiting < queueSize)
        {
            waitForLastMember();
        }
        else
        {
            waiting = waiting - 1;
            enterPossible = false;
            roomDoor = true;
        }
    }

    private void waitForLastMember()
    requires_wait(queue.size() == queueSize)
    {
        waiting = waiting - 1;
        if (waiting == 0)
        {
            //I'm the last one
            roomDoor = false;
            santa.Wake();
        }
    }

    public bool Full()
    {
        return queue.size() == queueSize;
    }

    public void WaitFull()
    requires_wait(queue.size() == queueSize);

}

Then, to wake Santa, we do not use a special keyword as Ada95 does (we do not have one, at this point), but something similar to Comega:

class Santa
{
    //...

    void WakedByReindeer()
    {
        harness.WaitFull();     // blocks until all the reindeer are harnessed
        DeliverToys();
        unharness.WaitFull();   // blocks until they are all unharnessed
        ReindeerQueue.Accept(); // open the door for the next group
    }

    void Wake() { }             // empty: called by a Queue to make pre(@Wake) true

    void Sleep()
    requires_wait(pre(@Wake))
    {
        if (ReindeerQueue.Full())
        {
            //Santa's awake, and goes to deliver (when all are harnessed..)
            WakedByReindeer();
        }
        else
            WakedByElves();
    }
}
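Putting it all together, the threads themselves stay trivial. Here is a sketch of mine (not code from the thesis; Holiday(), Work() and the ElvesQueue instance are hypothetical names):

// a reindeer thread
while (true)
{
    Holiday();                   // time off until Christmas approaches
    ReindeerQueue.Entry(this);   // blocks on the door, then on the group
}

// an elf thread
while (true)
{
    Work();
    ElvesQueue.Entry(this);      // the Queue forms groups of three
}

// Santa's thread
while (true)
{
    santa.Sleep();               // blocks on pre(@Wake), then serves a group
}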

With the current implementation we do not allow declaring methods with the same signature but different wait preconditions; otherwise the Comega solution:

static void waittobewoken() & static async reindeerready() { }
static void waittobewoken() & static async elvesready() { }

could be mapped to:

void Sleep()
requires_wait(pre(@ReindeerWake));
void Sleep()
requires_wait(pre(@ElvesWake));


Admittedly, the Comega solution is more elegant on this point, but ours is not so bad, and we did not need to introduce a keyword, like select in Ada95.

The key point is that synchronization contracts are flexible; they allow the programmer to use the style best tailored to the task, or best tailored to her own experience.

References:

J. A. Trono
A New Exercise in Concurrency
SIGCSE Bulletin, 1994

N. Benton
Jingle Bells: Solving the Santa Claus Problem in Polyphonic C#
Microsoft Research, 2003

M. Ben-Ari
How to Solve the Santa Claus Problem
Concurrency: Practice & Experience, 1998

# Saturday, 26 November 2005

Synchronization contracts: an example of syntax

I'd like to introduce very briefly the syntax of synchronization contracts.
As we have seen in the previous two posts, we use them to express
  • assertions on temporal behaviour
  • synchronization
In the first group we have three keywords, borrowed from DbC (those familiar with the concept will feel at home here): requires, ensures and invariant.

In the second group, we have the variants of preconditions and postconditions we introduced last time: requires_wait and ensures_wait. In the basic form of a synchronization contract language, these keywords are all we need. Well, plus the LTL keywords, of course.
Their meaning is pretty obvious; for a complete introduction, see chapter 5 of my thesis. Here I will introduce only some of the temporal operators:
  • pre(cond): cond was true at the previous step
  • once(cond): cond was true once
  • cond1 since cond2: cond2 became true, and from that moment cond1 was true also
  • cond1 until cond2: cond1 must hold until cond2 is true
These operators will be useful to express conditions on the flow of the program: the first three are PLTL operators, the last one is a classical LTL operator.

class Stream
{
    bool ready;

    invariant(@receive -> (!@send until @endReceive));
    invariant(@receive -> finally(@endReceive));

    void init();

    void receive()
    requires(ready && once(@init));

    void endReceive();
    void send();
    void endSend();
}
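To see the contracts at work, consider a few traces (my own illustration of the intended semantics):

// init(); receive(); endReceive(); send();  -- fine
// receive();                  -- requires fails: once(@init) does not hold yet
// init(); receive(); send();  -- invariant violated: @send before @endReceive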

Pretty simple, huh? The @method construct is used to express the concept "the method has been executed", and is a handy shortcut that saves us from introducing a ton of state variables. Here synchronization contracts are used to specify behaviour, and to check it. But let's see another example:

public class HistoryBuffer
{
    private object[] buf;
    private int current;    // number of elements in the buffer
    private int max;        // buffer capacity

    invariant(current == max -> (!@Put until @Get));

    public virtual void Put(object o)
    requires_wait(current < max)
    {
        buf[current] = o;
        ++current;
    }

    public virtual object Get()
    requires_wait(current > 0)
    {
        --current;
        object ret = buf[current];
        return ret;
    }

    public virtual object GGet()
    requires_wait(pre(@Put))
    {
        return Get();
    }
}

Now we have checks (the invariant) and also synchronization (the guards on Put, Get and GGet). The GGet method is required to execute only if the previous operation was a Put (i.e. it does not permit two subsequent Get operations). See how simple it is to express such a concept, much simpler than with a standard programming language.
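For instance, under these guards (an illustrative trace of mine):

// Put(a); GGet();         -- fine: the previous operation was a Put
// Put(a); Get(); GGet();  -- GGet blocks until another Put is executed
// Get() on an empty buffer blocks until a Put makes current > 0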
Next time we'll see an interesting synchronization problem and how it can be solved with synchronization contracts.

# Friday, 25 November 2005

Contracts and Inheritance anomaly.. A solution?

Despite the dozens of concurrent object languages born in research laboratories, few have seen practical use: objects and concurrency constructs have failed to work well together.

This happens because the object oriented methodology and the synchronization of concurrent activities are orthogonal concepts. The purpose of a Concurrent Object Oriented Language is to find a way to reconcile this orthogonality automatically.

The famous mismatch between synchronisation of concurrent activities and inheritance is well known as the Inheritance Anomaly: when inheriting from a base class to add a new method, unrelated to the other methods in the class, concurrency constructs force the redefinition of all the other methods of the class.

There is no "solution" to the inheritance anomaly: as long as synchronization constructs and application code are intertwined, the problem stands.
However, we can mitigate it by separating the two as much as possible: if method bodies contained no synchronization code at all, the anomaly would obviously be solved.
Therefore, we use synchronization contracts not only to check the temporal behaviour of objects, but to "orchestrate" them. To put it simply, a method is not executed until its precondition is satisfied; assertions are turned into synchronization primitives. A method will not return until its postcondition is satisfied (so we have a rendez-vous point). This approach makes it possible to write monitor-like code using only preconditions and postconditions, without writing synchronization code in the method body.
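As a minimal example of this style (a sketch of mine in the contract dialect of the posts above, not code from the thesis), a single-slot buffer needs no locks and no synchronization code in its bodies:

class Slot
{
    private object item;
    private bool full;

    public void Put(object o)
    requires_wait(!full)    // a producer blocks while the slot is occupied
    {
        item = o;
        full = true;
    }

    public object Get()
    requires_wait(full)     // a consumer blocks until an item is available
    {
        full = false;
        return item;
    }
}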

Of course, not all scenarios can be modeled using only synchronization contracts, but temporal logic should be powerful enough for many common ones. In the next post, after all this talking, we will finally see a real application in a common scenario.

# Thursday, 24 November 2005

Introducing synchronization contracts

In the last year, during my thesis period and afterwards, I have seen a lot of interest in the software research community around contracts. Maybe I'm biased, since my thesis was on contracts, but Don Box claimed more than once that Indigo (now WCF) is based on contracts, and Singularity, the new managed OS prototype from Microsoft Research, is based on contracts as well.
In the same period, we also had an increasing interest in concurrency; the most evident sign is the interest raised by Herb Sutter's article "The free lunch is over".

The idea expressed in my thesis is: why not use contracts to specify concurrency constraints? We can use them to check whether a desired temporal behaviour is met, or whether an undesirable sequence of events happens. But we can do even more: we can enforce a desired temporal behaviour, generating synchronization code from the contract itself.

If you are interested in this technique, let me explain it in a little more detail: in a contract-aware language, the correctness of a program can be expressed and enforced through contracts. Contracts are assertions on a single method (pre-conditions and post-conditions) or over a whole class (class or object invariants). In their simplest form, those assertions are expressed in boolean logic, but there are various "levels" of contracts:
  • basic or syntactic contracts (for example: XML schemas, IDL);
  • assertion-based contracts (for example: Eiffel);
  • behavioral contracts (assertions are not only on methods or classes, but on the control flow as well);
  • synchronization contracts (like the previous ones, but able to work in a concurrent environment, i.e. to check temporal behaviour);
  • QoS contracts (like the previous ones, but with the possibility to specify also numerical constraints (throughput, mean response time, etc.)).
Therefore a contract-aware language is often a dialect of an existing programming language with a few additional keywords for expressing assertions.

A compiler for a language supporting concurrent contracts, therefore, will automatically generate checks to detect at run-time whether an undesirable sequence of events happens. To express the assertions from which the checks are generated we use temporal logics, an extension of boolean logic that adds operators for specifying temporal behaviour.

How does it work? We borrow techniques from the field of formal checking; the temporal logics we use (LTL - linear temporal logic - and PLTL - LTL with Past -) come from this field as well.
The life of an object can be seen as an execution trace, i.e. a sequence of traversed states. From its "birth" (construction) to its "death" (finalization and destruction) an object has an internal state, and this state is altered by external events through calls to its methods, using the interface it provides.

We can check whether a particular assertion holds by examining the sequence of states the object traversed. Moreover, since the satisfaction of a Past LTL formula can be defined in a recursive manner, for those formulas we can do it by looking only at the previous execution step.
For LTL formulae, instead, we make use of well-known algorithms that translate every formula into a finite state automaton.
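To make the recursive evaluation concrete, here is a minimal sketch of mine (plain C#-like code, not the checker generated by the thesis) of monitors for once and since; each carries a single boolean from one execution step to the next:

// once(cond): cond was true at some step so far
// recursion: once = cond OR (once at the previous step)
class OnceMonitor
{
    private bool seen = false;

    public bool Step(bool cond)   // invoked at every execution step
    {
        seen = cond || seen;
        return seen;
    }
}

// cond1 since cond2: cond2 became true, and cond1 has held from that moment on
// recursion: since = cond2 OR (cond1 AND (since at the previous step))
class SinceMonitor
{
    private bool holds = false;

    public bool Step(bool cond1, bool cond2)
    {
        holds = cond2 || (cond1 && holds);
        return holds;
    }
}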

This is for the checking side, but we wished to push it a little further...

# Tuesday, 22 November 2005

MSDN search on Firefox

I use Mozilla Firefox as my primary browser, mainly for its great extensibility through extensions/plugins/whatever. They are a big plus for every developer! Everybody knows that customization is not a primary goal for the mainstream market (most end-users prefer standardization); however, if you are a developer you try to tailor everything to your needs, from the development IDE to the web browser.
Having heard from Antimail of the new MSDN search engine, I could not resist adding it to the search engines in Firefox.. =) Here are the code and png images for it.

BTW, I hope Microsoft will add a managed model to IE 7. Exposing the browser features to .NET languages would be great. I'm not talking about the HTML rendering component: that one is already an ActiveX object re-usable in every application. I'm thinking about the whole browser: context menus, search bar, progress, address and menu bars.. It would be a pleasure to write browser plugins with .NET: for now, you can do that only for the excellent new Visual Studio 2005!

msdnplugin.src (.89 KB)
msdnplugin.png (.88 KB)

# Saturday, 19 November 2005

My first lesson was...

It has been over a year since my first lesson as an assistant professor at the University of Trento, held while I was writing my thesis. The course was on Security and Privacy, and my lessons were on Software Attacks and how to prevent them. I do not claim to be a security expert, I'm not, but my love for compilers and OS internals, and my will to know "how it works inside", provided me with some knowledge to show others what happens when a cracker gets control of your computer.
I always promised myself I would publish my lessons on the web. I also have some examples, but I don't think I'm going to publish them. They are all innocuous, since they only affect themselves, but maybe it's not so responsible to leave them "free" in the open air.. =). My first lesson was on general vulnerabilities, an introduction to the lexicon and to the first and simplest type of software attack: the Buffer Overflow. If you want to read the whole lesson you can find it here (in PowerPoint format).

If you, like me, are lazy, here is a little appetizer of what you will find, to see if it's worth your time. I hope so!
When you compile a C program, the instructions are translated into ASM code (and then to machine code, but that is almost a 1-1 mapping). As we saw some days ago, on an x86 machine most of the data (parameters, local space) is held on the stack. As an example, consider the following C code:

void f(int a, int b)
{
    char buffer[20];
    int i;
    int j;

    j = a;
    i = a + b;
}

int main()
{
    f(2, 3);
    return 0;
}

When we compile it, it is translated to the following ASM instructions:

; 3    : {

   push  ebp
   mov   ebp, esp
   sub   esp, 28              
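                               ; 28 bytes of locals: buffer[20] plus i and j (4 bytes each)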
   
; 4    :    char buffer[20];
; 5    :    int i;
; 6    :    int j;   
; 7    :    j = a;
; 8    :    i = a + b;

   mov   eax, DWORD PTR [ebp + 8h] ; a
   mov   DWORD PTR [ebp - 8], eax ; j EBP - 8
   add   eax, DWORD PTR [ebp + 0Ch] ; b
   mov   DWORD PTR [ebp - 4], eax ; i EBP - 4
 
; 9    : }

   mov   esp, ebp
   pop   ebp
   ret   0

; 12   : {

   push  ebp
   mov   ebp, esp
   sub   esp, 0

; 13   :    f(2, 3);

   push  3
   push  2
   call  _f

; 14   :    return 0;

   xor   eax, eax

; 15   : }

   mov   esp, ebp
   pop   ebp
   ret   0

From the assembly code, you can see how parameters and locals are translated into stack space. The return address (the point from which the function is called, and to which we should return) is also saved on the stack. Can you see it? buffer sits at the lowest address of the frame, so if we fill it with more than 20 characters we will spill upward and "invade" the space of the other locals, of the return address, and of the parameters.

void func()
{
    char buffer[20];
    char* current = buffer;
    int i;

    for (i = 0; i < 256; ++i)
    {
        *current = 'A';
        ++current;
    }
}

int main()
{
    func();
    return 0;
}

What will happen? A memory access error ("This program will be terminated" on Win32, "Segmentation fault" on Linux), surely. But, if you compile and execute the code, at which address will the fault happen? (Suggestion: the ASCII code for 'A', repeated four times, is a memory address in the 32-bit virtual memory space of your process.)

Now with another example program we try to spill the buffer in a clever way: knowing that at memory address 0x00401090 there is an "interesting" piece of code, we can try to "return" to it, instead of returning to main.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// the run of 'A's fills buffer (and the saved EBP just above it); the last
// four bytes land on the return address, overwriting it with 0x00401090 --
// the address of doNastyThings, written in little-endian byte order
static char message[] = "AAAAAAAAAAAAAAAAAAAAAAAA\x90\x10\x40\x00";

void handleRequest()
{
    char buffer[20];
    strcpy(buffer, message);
    printf("Request: %s\n", buffer);
}

void doNastyThings()
{
    printf("He he!!\n");
}

int main()
{
    while (1)
    {
        handleRequest();
    }
    return 0;
}

Compile it with VC++ 6.0 (if you have another compiler, the address of the doNastyThings function will obviously change: fix it accordingly). Surprised to see "He he!!" on the console? Have I tickled your brain? So, to see how it works.. let's read my lesson!

BufferOverflow.ppt (162 KB)

# Friday, 18 November 2005

More on SPARC


Yesterday we saw briefly the architecture of SUN SPARC processors, and how the presence of a big register array and its organization into register windows affected the calling convention and the assembly programming model.
The idea seemed clever, and C programs should benefit greatly from this design (we know that function calls are a major performance issue on x86). However, it is not without reason that the SPARC-Solaris platform was nicknamed Slowlaris.. =)
The register window was designed without much consideration for real-world scenarios: the presence of all those registers is a terrible slowdown on a multitasking operating system; when a context switch happens, all the registers of the thread being switched out must be dumped to memory. It's not like on x86, where only the general purpose registers need to be pushed to memory; the other ones are "hidden", and used cleverly by the processor.
For SPARC processors, the number of registers is (for an 8-window processor) 136, i.e. about 1Kb. With ever increasing CPU frequencies, memory bandwidth is a major bottleneck, and with a time slice of 50 ms (a quite high value) the memory wasted is about 20Kb every second.
It may not seem much, but as an example I have read that IPC would be 5 times slower on a SPARC processor than on a comparable 8-register processor, if all SPARC registers are saved and restored on every context switch.
Algorithms for lazy context switching have been proposed, but they are applicable only through a specially designed kernel or through hardware modifications.
Hence the "friendly" name Slowlaris for the SPARC-Solaris pair.

# Thursday, 17 November 2005

Porting Bugs


Today I worked with a developer from another company on porting some C programs they wrote for SPARC (with Solaris) to our machines, which are all based on Intel x86. It went pretty much straightforwardly; we were ready to face the endianness problems (for network communication and binary file reading). But we found a very interesting bug that didn't manifest on SPARC but fired immediately on Intel. The source was more or less:
int Function1(int paramA, int paramB)
{
    if (paramA == 0)
    {
        //bad value
        return 0;
    }

    if (paramB != 1 && paramB != 2)
    {
        //bad value
        return 0;
    }

    //do stuff
    //call another function
    Function2();
}

int main()
{
    //...
    if (Function1(a, b) == 0)
        printf("Error: Function1 returned 0\n");
    //...
}

Can you already spot the bug? We did, after a little "tracing" (printf) (I know, it is the worst kind of debugging.. but we were compiling over ssh, in a shell, and it proved to be a quick solution).
If you have already guessed what's next, you can stop reading and go see the solution. If not, you can enjoy a trip through the calling conventions of two very different platforms: a CISC and a register-rich RISC.

I know, defining a Pentium IV as a CISC is not quite correct. Pentium processors now have a superscalar architecture and out-of-order execution, and internally they use micro-ops that are very RISC-like. And they have tons of registers (anybody know how many, exactly?). But the "programming interface", i.e. the instruction set and register set the CPU exposes to the world, is CISC. It has only four general purpose registers and a rich instruction set.

On the other side, the SPARC architecture, instruction set and register set are very different. Even function calling conventions on SPARC are deeply influenced by its early hardware design. Since the instruction set was reduced (in the first version there wasn't even an instruction to divide two numbers), the engineers thought of using the spare die space to fill it with a lot of registers, and in order to use them efficiently, they organized them into several "register windows".
On x86, every function has its own stack space (the stack frame). On SPARC, every function also has its own "register space", a register window. A register window is a set of 24 registers: 8 %i (input) registers, 8 %o (output) and 8 %l (local). The registers are mapped in a clever way: out registers in the caller become in registers in the callee, so that parameters are passed through the out-in registers (six of them, in practice: %o6 and %o7 are reserved for the stack pointer and the return address). The first one has a special additional semantic:

%o0  (r08)  outgoing parameter 0 / return value from callee
%i0  (r24)  incoming parameter 0 / return value to caller

This register, unlike the other input registers, is assumed by the caller to be preserved across a procedure call.
This mapping is done with the help of the SAVE and RESTORE instructions; the prolog/epilog structure on SPARC thus looks like:

function:
   save  %sp, -K, %sp

   ; perform function
   ; put return value in register %i0 (if the function returns a value)

   ret        
   restore

   
It is important to note that in the SAVE instruction the source operands (first two parameters) are read from the old register window, and the destination operand (the rightmost parameter) is written to the new window. "%sp" is used as both source and destination, but the result is written into the stack pointer of the new window, which is a different register (the stack pointer of the old window is renamed and becomes the frame pointer).
   
The call sequence is:

call <function>
mov 10, %o0
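; the mov executes in the delay slot, before the first instruction of <function>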

    
The delay slot is often filled with an instruction that sets some parameters; in this example it loads the first parameter into %o0.

What is a delay slot?
When the control flow is altered (by a transfer instruction, like call, jmpl or a branch) the order of execution is inverted: the instruction after the transfer instruction is fetched simultaneously with it.
This is done to implement a simple pipeline solution: when performing a jump instruction, there is already another instruction in the pipeline: the instruction in the delay slot, which will be executed before the processor actually has a chance to jump to the new location. This is the simplest scheme for the chip designer to implement, since the general pipelining mechanism can be used without making any exceptions for transfer instructions; and it is the fastest way of arranging things, since no instructions are discarded.
The SPARC assembly language programmer must be aware of delay slots constantly while coding any change in flow of control, since the order of instruction execution is reversed from the order in which the instructions appear.


Returning to our problem: when function a calls function b, the %o registers of a become the %i registers of b.
If b does not touch %i0 (i.e. it reaches the end of the function without a return statement), %i0 / the return value remains equal to the incoming parameter.
In the original code, paramA is always != 0 at that point (otherwise the function would have returned 0 explicitly), so the function returns to the caller with a return value != 0 (equal to the first parameter).

What happens when porting to Intel? Intel calling conventions are very different. Remember, x86 has only 4 general purpose registers accessible through assembly language, so it has to pass parameters using the stack:

  • Arguments are pushed on the stack in reverse order.
  • The caller pops arguments after return.
  • Primitive data types, except floating point values, are returned in EAX or EAX:EDX depending on the size.
  • float and double are returned in fp0, i.e. the first floating point register.
  • Simple data structures with 8 bytes or less in size are returned in EAX:EDX.
  • Complex Class objects are returned in memory.
  • When a return is made in memory the caller passes a pointer to the memory location as the first parameter. The callee populates the memory, and returns the pointer. The caller pops the hidden pointer together with the rest of the arguments.

If no C return statement is found, when the control flow reaches the end of the function the ret instruction is executed and control returns to the caller. The return value, i.e. the one in the EAX register, is the last value that register assumed... possibly the return value of the last function called.
In our case, the last function called before the end (Function2) returned 0, and so did our Function1. The fix is simply to end every path of Function1 with an explicit return statement (for example, return Function2();).

What is the lesson? Compiling and testing on one platform (even with a cross compiler, like the GCC we used) isn't enough. What surprised me was the complete absence of compiler errors or warnings. Maybe I am too used to tools that give you warnings if you do something wrong =) but that's the way I like them. When a compiler seems too strict or too chatty with warnings, remember that an error detected by the compiler is one less bug in your product.

# Friday, 11 November 2005

Visual Studio 2005 installed!

Yesterday I was at the Microsoft 2005 Technical Conference, part of the 2005 Ready to Launch tour. We all got a Visual Studio 2005 Standard NFR; a very welcome present. It's true: Visual Studio 2005 rocks! Thanks to all the team for the great work, and to Cyrus for the beautiful IDE!