# Saturday, 06 August 2016

Old school code writing (sort of)

As I mentioned in my previous post, online resources on Hosting are pretty scarce. 

Also, writing a Host for the CLR requires in-depth knowledge of topics you do not usually meet in your day-to-day programming, like for example IO completion ports. The same goes for AppDomains: compared to Hosting there is plenty of documentation and resources, but the more advanced features and the underlying mechanisms (how does it all work? how does a thread interact with, and know about, AppDomains?) are not something you can find in a forum. 

Luckily, I have been coding for enough time to have a programming library at home. Also, I have always been the kind of guy that wants to know not only how to use stuff, but how it really works, so I had plenty of books on how the CLR (and Windows) work at a low level. All the books I am going to list were already in my library!

The first one, a mandatory read, THE book on hosting the CLR:



Then, a couple of books from Richter:

  

The first one is very famous. I have the third edition (in Italian! :) ) which used to be titled "Advanced Windows". It is THE reference for the Win32 API.
If you go anywhere near CreateProcess and CreateThread, you need to have and read this book.

The second one has a title which is a bit misleading. It is actually a "part 2" for the first one, focused on highly threaded, concurrent applications. It is the best explanation I have ever read of APCs and IO Completion Ports.

  

A couple of very good books on the CLR to understand Type Loading and AppDomains.
A "softer" read before digging into...

  

...the Internals. You need to know what a TEB is and how it works when you are chasing Threads as they cross AppDomains.
And you need all the insider knowledge you can get if you have to debug cross-thread, managed-to-unmanaged transitions, and bugs spanning asynchronous calls. 

My edition of the first book is actually called "Inside Windows NT". It is the second edition of the same book, which described the internals of NT 3.1 (which was, despite the name, the first Windows running on the NT kernel) and was originally authored by Helen Custer. Helen worked closely with Dave Cutler's original NT team. My edition covers NT4, but it is still valid today. Actually, it is kind of fun to see how things evolved over the years: you can really see the evolution, how things changed with the transition from 32 to 64 bits (which my edition already covers; NT4 used to run on 64-bit Alphas), and how they changed for security reasons. But the foundations and concepts are there: evolution, not revolution.

  

And finally, two books that really helped me while writing replacements for the ITask API. The first one told me how it should work, the second one told me how to look inside the SSCLI for the relevant parts (how and when the Hosting code is called).

Of course, I did not read all these books before setting to work! But I have read them over the years, and having them on my bookshelf provided a quick and valuable reference during the development of my host for Pumpkin.
This is one of the (few) times when I am grateful to have learned to program "before Google", in the late '90s/early '00s. Reading a book was the only way to learn. It was slow, but it really fixed the concepts in my mind. 

Or maybe I was just younger :)


# Friday, 15 August 2014

Android NFC service and "thin client": one problem, and one hack

Lately (in the last year or so), Android work intensified at my company. So I finally took the time to study it in depth, and I discovered how MUCH Android differs from what I was expecting. It really starts to make sense when you dig under the covers. And you start to discover how much better your apps behave when you are using the SDK the way it should be used (and you also start to pick up defects in other apps and say "Ha! You did that! Gotcha!").
But this is the topic for another post... :)

Today I want to concentrate on an issue I was experiencing using the NFC framework in Android to read our contactless cards.
Using the NFC framework as a general-purpose card reader is a bit on the "stretchy" side: the framework, after all, is mainly there to read Ndef tags, which have a precise structure. Fortunately, Android allows you to go deeper and interact directly with a card using a "transceive" method.

In this way, you can send commands and data to the card (in the form of a byte[]) and receive a response from the card (again, in the form of a byte[]).
So far, so good: this means that we can read our Mifare Desfire cards, using the Desfire authentication and our keys.
I implemented the commands to authenticate a card, select an application (i.e. a protected directory inside the card memory) and read the data.
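
Just to give the idea, here is a minimal sketch of one of those commands: selecting a DESFire application through IsoDep.transceive. The 0x5A SelectApplication opcode and the ISO 7816 wrapping (CLA 0x90, Le 0x00) are how I remember them from the DESFire documentation, so treat the bytes as illustrative; the real authentication is a 3-pass challenge-response that I omit here.

import android.nfc.Tag;
import android.nfc.tech.IsoDep;
import java.io.IOException;

public class DesfireReader {

    // Select a DESFire application (a protected "directory" on the card).
    // The AID is 3 bytes, LSB first. Illustrative bytes, see note above.
    public byte[] selectApplication(Tag tag, byte[] aid) throws IOException {
        IsoDep isoDep = IsoDep.get(tag);       // we get an ISO-A compatible card
        isoDep.connect();
        try {
            // Wrap the native command in an ISO 7816-4 APDU:
            // CLA=0x90, INS=0x5A (SelectApplication), P1=P2=0x00, Lc, data, Le=0x00
            byte[] apdu = new byte[6 + aid.length];
            apdu[0] = (byte) 0x90;
            apdu[1] = (byte) 0x5A;
            apdu[4] = (byte) aid.length;
            System.arraycopy(aid, 0, apdu, 5, aid.length);
            return isoDep.transceive(apdu);    // the response ends with the status word
        } finally {
            isoDep.close();
        }
    }
}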

All is working well, but... you have to store your key on the phone, and the storage on your phone is not secure.
In theory, every application has a local storage that cannot be read by other applications. In practice, you just have to have root access to your phone (which is mighty easy to get on Android handsets) and you are done.

This is not a particular problem for some scenarios (e.g. if you provide an app that uses the user differentiated key, so that the user can read his own card), but it is a problem when you need to read multiple cards, and therefore to use the master key.

Suppose you are a third-party company. You are my friend, and you want to provide a discount for my subscribers (people that have my smart-card).
How can you check that the card is real, and that the card is not expired? Easy, you authenticate with the card and read its content: the expiration date is written right there.
But I do not trust you enough to let you have my read keys!

Maybe you even want to top up my card with "reward points": if my users buy something from you, they will have discount on my services. Super cool!
But I will not let you have my write keys... that's out of the question!

Sure, you can read just the UID, and use that to look up user info on my web-service. And use the same service to POST reward points. But my network is sparsely connected, and it might take long before a card is used on one of my terminals and I can update them.
And we have seen that a UID can be faked..
So?

The answer is "thin-client". You use your NFC phone as an antenna, nothing more. What you read from the card is sent as a (hex-encoded) string to a web service. The web service contains the logic and data to interpret the request and prepare the right response. The response is sent back to the phone, and then transmitted to the card.

You can authenticate with the card, but your keys are safely stored away on your server, and they never transit on the phone!
The phone does not even see the personalized key, so the user is safe against cloning.
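
To make the idea concrete, here is a minimal sketch of the relay loop, under stated assumptions: the endpoint URL and the line-based hex protocol (the server sends the next command, an empty reply ends the session) are hypothetical, and the networking must run off the UI thread.

import android.nfc.tech.IsoDep;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class NfcRelay {

    // Hypothetical endpoint: the server holds the keys and the protocol logic.
    private static final String SERVER = "https://example.com/nfc/session";

    public void relay(IsoDep card) throws IOException {
        card.connect();
        String command = post("");                 // empty body: ask for the first command
        while (!command.isEmpty()) {               // empty reply: session is over
            byte[] answer = card.transceive(fromHex(command));
            command = post(toHex(answer));         // card answer up, next command down
        }
        card.close();
    }

    private String post(String hexBody) throws IOException {
        HttpURLConnection c = (HttpURLConnection) new URL(SERVER).openConnection();
        c.setDoOutput(true);
        c.getOutputStream().write(hexBody.getBytes("US-ASCII"));
        BufferedReader r = new BufferedReader(new InputStreamReader(c.getInputStream()));
        String line = r.readLine();
        c.disconnect();
        return line == null ? "" : line;
    }

    private static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) sb.append(String.format("%02X", b));
        return sb.toString();
    }

    private static byte[] fromHex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++)
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        return out;
    }
}
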
I built a prototype, and it worked great on our WiFi network.
Then I tried to use it on a cellular network, and it failed (almost) regularly. Why?

My suspicion was that after a (very short) while, the card was reset.
The answer I was getting back from the card was something like "operation not supported in this state". It was as if the card somehow forgot that we were in the middle of an authentication challenge-response before the protocol was over.
I decided to investigate, to see if my suspicion was confirmed.
Fortunately, Android is OSS and source code is available! So I dug into the Android source code, looking for clues in the NFC implementation.

Android implements NFC using a mix of libraries and processes; most of the NFC stack is native, and managed by the OS. Then there is a Service (provided by the OS) that handles communication with the native NFC stack, and some client-side classes you can use inside your application, which communicate with the Service, hiding it from you.
I started to dig into the source by following a "transceive" call.

On the application side, you receive an Intent when a card is presented to the reader. Inside the Intent payload there is a class derived from BasicTagTechnology; in our case, we use an ISO-A compatible card, so we get an IsoDep object.

The most important method of this class is, as I mentioned, transceive:

IsoDep.transceive
BasicTagTechnology.transceive

The method inside is just a thin wrapper for a remote invocation to a service, which is NfcService or NfcApplication (the name has changed between Android releases):

Tag.getTagService().transceive(mTag.getServiceHandle(), data, raw)

class Tag ...

    public INfcTag getTagService() {
        return mTagService;
    }
 
INfcTag is an AIDL interface, which is used to forward data and commands to NfcService.
We can follow the transceive implementation inside NfcService:

public TransceiveResult transceive(int nativeHandle, byte[] data, boolean raw)   
 
 tag = (TagEndpoint) findObject(nativeHandle);
 response = tag.transceive(data, raw, targetLost);
 ...
 Object findObject(int key) {
        synchronized (this) {
            Object device = mObjectMap.get(key);
            if (device == null) {
                Log.w(TAG, "Handle not found");
            }
            return device;
        }
    }
  ...

So, there is another "Tag" class inside the service; all known (in range) tags are held by the NfcService class in a map.
This "Tag" is named NativeNfcTag:
    
public class NativeNfcTag implements TagEndpoint
   ...
   private native byte[] doTransceive(byte[] data);
   public synchronized byte[] transceive(byte[] data) {
      if (mWatchdog != null) {
         mWatchdog.reset();
      }
      return doTransceive(data);
   }
   ...

The implementation of doTransceive is native, and it varies from one card technology to another.
We have found the end of the flow. Have we also found any clue about the card reset?

The answer is there, inside NativeNfcTag. You should have noticed the "mWatchdog.reset()" statement inside transceive. What is mWatchdog?

private PresenceCheckWatchdog mWatchdog;
    class PresenceCheckWatchdog extends Thread {

        private int watchdogTimeout = 125;

        ...

        @Override
        public synchronized void run() {
            if (DBG) Log.d(TAG, "Starting background presence check");
            while (isPresent && !isStopped) {
                try {
                    if (!isPaused) {
                        doCheck = true;
                    }
                    this.wait(watchdogTimeout);
                    if (doCheck) {
                        isPresent = doPresenceCheck();
                    } else {
                        // 1) We are paused, waiting for unpause
                        // 2) We just unpaused, do pres check in next iteration
                        //       (after watchdogTimeout ms sleep)
                        // 3) We just set the timeout, wait for this timeout
                        //       to expire once first.
                        // 4) We just stopped, exit loop anyway
                    }
                } catch (InterruptedException e) {
                    // Activity detected, loop
                }
            }
            // Restart the polling loop

            Log.d(TAG, "Tag lost, restarting polling loop");
            doDisconnect();
            if (DBG) Log.d(TAG, "Stopping background presence check");
        }
    }

    
    
The "watchdog" is a thread that at short intervals (125ms) checks if the card is still in range, using the "doPresenceCheck()" function. Which is native, and card-dependent.

This function could therefore be an innocuous instruction (a no-op), or a new select that resets the card to its non-authenticated state.
Guess which one it is for DESFire cards?

So, if the watchdog is not reset periodically by transmitting something to the card, a presence check will be triggered and the card will be selected again, resetting the authentication process, while you are still waiting for the cellular network to answer (125 ms is a short time on 3G).

I started to think of ways to work around it: from suspending the thread (inside another process, the service, in Android? Root necessary), to setting the timeout (by invoking a method on NativeNfcTag using reflection... again, another process was out of my reach), to substituting the code of doPresenceCheck() (which you can do with things like Xposed, but... that requires root access too).

You just cannot access anything inside another process in Android if you don't have root access. Which is usually a very good thing indeed, but it was getting in our way in this case.
But what about our process? Sure, we can do almost anything inside it... but how can it do any good?

Well, there is a function inside NativeNfcTag which we can use. This function is "exposed" by "Tag" (the non-public class used on the "client" side, see above), but not by BasicTagTechnology.
So we cannot call it directly (like transceive), but from the Tag class onwards it follows the same flow as transceive. This function is "connect":

class Tag {
   ...
   public int connect(int nativeHandle, int technology)
   public synchronized int connectWithStatus(int technology)
   ...
}

If we examine the source code of "doConnect" on the other side (its implementation inside NativeNfcTag) we can see that this function keeps the watchdog at bay too (it pauses it, where transceive resets it). Moreover, we can turn "connect" into a no-op:
private native boolean doConnect(int handle);
    public synchronized boolean connect(int technology) {
        if (mWatchdog != null) {
            mWatchdog.pause();
        }
        boolean isSuccess = false;
        for (int i = 0; i < mTechList.length; i++) {
            if (mTechList[i] == technology) {
                // Get the handle and connect, if not already connected
                if (mConnectedTechnology != i) {
                    ...
                } else {
                    isSuccess = true; // Already connected to this tech
                }
                break;
            }
        }
        ...
    }
If the technology we specify is the same one we are already using, or if it is a non-existent technology, the function does nothing.

We can just grab the Tag class inside our code, call connect on our side (using reflection, as it is not exposed by the API), and wait for it to forward the command to the service, resetting the watchdog. Do this regularly, and we can "buy" as much time as we want to complete our authentication protocol!
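
Here is roughly what that keep-alive looks like. Everything internal is an assumption taken from the sources above: the hidden method name (connectWithStatus) and the ISO-DEP technology constant (3 in the releases I read) may change between Android versions.

import android.nfc.Tag;
import android.os.Handler;
import java.lang.reflect.Method;

public class WatchdogKeepAlive {

    private static final int TECH_ISO_DEP = 3;  // TagTechnology.ISO_DEP in the sources I read
    private static final long PERIOD_MS = 100;  // safely below the 125 ms watchdog timeout

    private final Handler handler = new Handler();
    private boolean running;

    public void start(final Tag tag) {
        running = true;
        handler.post(new Runnable() {
            @Override
            public void run() {
                if (!running) return;
                try {
                    // Call the hidden Tag.connectWithStatus(int) via reflection, with the
                    // technology we are already connected to: on the service side,
                    // NativeNfcTag.connect() becomes a no-op that still pauses the watchdog.
                    Method connect = tag.getClass().getMethod("connectWithStatus", int.class);
                    connect.invoke(tag, TECH_ISO_DEP);
                } catch (Exception e) {
                    running = false;  // internal API not there on this release: hack unavailable
                }
                handler.postDelayed(this, PERIOD_MS);
            }
        });
    }

    public void stop() {
        running = false;
    }
}

Start it when the authentication begins, stop it when the protocol is complete (or the web service call fails).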

This is obviously a hack. But I tested it with every version of Android we support (2.3.6, 3.x, 4.x up to 4.4.3) and it just works. It relies on an internal mechanism which is subject to change even at the next internal revision, but the code I examined seems to have been stable for a while. And maybe, by the time it changes, they will fix the main issue (using a select to check for the presence of a card) as well!

# Thursday, 06 July 2006

Windows Vista and Office 2007 Beta 2 (2)

Overall, I liked Office 2007 very much. The only thing I could not test was weblog posting, since dasBlog is not currently supported in the Beta version (it will be in the final version). I surely hope it will work well!

Instead, I have mixed feelings about Windows Vista. I followed Longhorn closely from the first PDC (2003?) and I was really looking forward to seeing it. Remember the “native kernel + managed subsystems” part? The three pillars, Indigo, Avalon and WinFS? Well, Indigo and Avalon are great, but they will be part of .NET 3.0, also available for XP, and WinFS is dead.

So, what’s the point of Windows Vista? With the three pillars gone, what remains is the new UI, the improvements to the kernel and the window manager, and the improved security model.

Running as a simple User and having to use tedious runas commands to do very common tasks on my notebook (such as changing IP, power profile or directory permissions), I thought the new LUA model of Vista would be great for me. The default user is still marked as “Administrator”, but I think (hope?) it is a simple User account in disguise, and when performing security-related operations (i.e. clicking on buttons with the little shield) the security token is upgraded and substituted with one of the Administrator group, if the user grants permission.
This is my first complaint: why did they do that? Be clear: use a default account from the Users group, and simply ask for an Administrator password before running administrative programs, or on their first security-related operation; then make the admin take ownership of the whole program. Surely, it is safest to ask every time whether a program can do this or that... or is it? People get bored very easily, and do not always read what is written on dialog boxes. Normal users almost never do: they only try to get rid of that annoying (or, for some, scary) box that prevents them from “using my computer”. Despite this, the new LUA is still better than the previous situation.

The new window manager and UI, instead, are great. And I’m not only speaking about all the eye candy (transparencies, the new Flip 3D I already love, the shiny new icons, the sidebar etc.) but also about usability. I love the new Explorer UI and the new Open and Save dialog boxes. Finally we went a step further, stripping away the File menu where it is no longer useful (like in Office 2007, where it “evolved” into the ribbon) or necessary (like in Explorer and Internet Explorer, where it is... no more!). The clickable locations on the address bar, the search box in the toolbar (yes! No more stupid search dogs...) and the new copy/move progress dialogs are things I have been waiting for, and they are really great. And the Sidebar is both useful and beautiful to see (only one complaint: why is it not easy to hide and show it? Maybe with an F-something key, à la Exposé?).
On the negative side, I have found the new UI very poorly configurable and customizable: if you choose Aero, you can’t even change colors. Very little can be done, but maybe this is the price to pay for a mainstream and “standardized” OS.

Finally, I know this is only a Beta, but I had a LOT of problems installing programs: cygwin does not work, it is impossible to register file extensions for some programs (7-zip comes to mind), other programs crash without reason. Even SQL Server 2005 needs a patch to work correctly! There is still much work to be done, and a lot of time has passed already. Maybe Mini is right, and the Windows team needs to change direction.
The course events took really disappoints me. Vista is great, but not so great, and not so radically different as to justify, for me, the switch from XP (a 5-year-old OS!). I love .NET, and the managed and secure world, and I’m with Gary McGraw when he says that Vista is a missed opportunity for a new, modern and secure OS; the Longhorn of three years ago still looks better to me than today's Vista. I’ll have to wait for Singularity... :)

# Saturday, 04 March 2006

Intercepting Windows APIs

As I described in a previous entry, one of the few games I really enjoyed playing was Enemy Territory. It is a free, multi-player FPS based on the Quake 3 engine. It is class based: you choose a class, and that dictates the abilities of your soldier (and what he can do). I played with my fellow university mates: some of them created a clan (they even did one or two official tournaments) and they wanted to train (I was not particularly good... I received the "easy frag" attribute!). Besides, it was good to relax for an hour after lunch, before attending other lessons. 

However, we had a hard time playing it... The admin wouldn't let us use the computer lab for non-didactic purposes. It was silly, if you ask me, especially since it was not explicitly forbidden by college rules: for example, students and professors alike were allowed to use empty classrooms to play card games. So why couldn't we use an empty lab to play a free game? Since the labs were not under CCTV surveillance, we took the risk and played nonetheless (we were young... :) ). But one day, an email from the admin warned me not to use that particular game anymore. How did they know? Simple: someone was checking all the files in the public directories (where the game was installed), which user owned them (using ACLs) and what kind of files they were. 

A friend of mine and I started to think about the problem. Initially we thought about manipulating the ACLs to change the ownership of the game files (maybe to the Administrator... it would have been ironic!), but it was impractical, it required privileges above those granted to students, and we didn't want to do anything illegal (like an escalation of privileges). Our solution was simple: hide your programs, not only your data.

Once upon a time, programs consisted of a single .exe (or .com) file. Nowadays, instead, an average application has thousands of files and DLLs in its installation directory. Think of Office, or of a game like Quake 3. We wanted to execute a complete program out of a single packed data file, possibly compressed or encrypted. I'll discuss our ideas and the techniques we used, namely DLL injection and API interception and forwarding. We began to discuss the topic seriously. Our first idea was to provide a DLL that was a proxy/interceptor for msvcrt.dll, the C runtime of the MS C++ compiler. This DLL contains the implementation of the C file handling functions, such as fopen, fread, fseek. We could make a DLL with the same name, put it in the app directory (which comes first in the loader search path), export all the functions of the original msvcrt.dll, implementing the file handling functions ourselves and forwarding the other functions to the original DLL. Phew, a lot of work... msvcrt.dll exports 780 functions! We could already sense the calluses on our fingers! Furthermore, the C runtime can be statically linked into the exe, or the program could directly call Win32 API functions.

But wait, even fopen, fread, fseek and friends call Win32 API functions! So, plan B: intercept kernel32 functions! Despite its name, kernel32 is not a kernel module: it is a simple user-mode DLL that provides a nice API for the real kernel calls. So it can be intercepted... Calling the application we want to execute out of the compound file the "victim", all we have to do is:

  1. Place some code in the victim process address space.
  2. Execute this code in order to:
    1. locate the IAT (Import Address Table) of the exe
    2. patch pointers in the IAT to point to OUR functions
  3. From now on, all calls to the patched functions will cause a jump not to the original kernel32 code, but to our functions.
The advantages of this approach? It is more economic (we have to write only the functions we need), it works with (almost) every app (even non-C apps), and it's fun to code!

DLL injection

We need to place code and execute it in the address space of another process. At first this can seem impossible: every Win32 process has its own Virtual Address Space, and pointers range over this space, so it's impossible to access another process's space [1][2].

The virtual address space: the lower 2GB are the user-mode space, and they are private for each process (see [1][2] for details)

Well, not really: how can debuggers work, then? With the help of the OS, of course! We'll ask the OS for help too. Our goal is to load a DLL in the victim's address space: when a DLL is loaded, the DllMain function in the DLL is called, with dwReason equal to DLL_PROCESS_ATTACH. There are several methods to load a DLL in a process [3]:

  1. Windows HOOKS (the most ancient one). A Hook is a callback function called by Windows every time a particular event occurs; the most interesting one is when a top-level window is created or destroyed. We can then see if the application is interesting, and decide what to do with it. The nice thing is that the DLL that contains the hook code is loaded into the other application's address space.
  2. The registry. There is a key in the registry (HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLs) in which you can list DLLs that will be loaded in every process address space. This is how mouse or video card DLLs end up in your address space. Drawbacks: you must have Admin rights to write to the registry, and your DLL is loaded in a lot of non-interesting processes. What a waste.
  3. Two magic Win32 functions: CreateRemoteThread and WriteProcessMemory [4].
Richter in [4] explains the magic very well. To summarize:
  1. Obtain the HPROCESS of the victim (via CreateProcess or via its pid).
  2. Reserve some space in the virtual address space of the victim with VirtualAllocEx.
  3. Use WriteProcessMemory to write the name of the DLL to load into the memory just reserved.
  4. Use CreateRemoteThread to load the library.

The virtual address space: the lower 2GB of the user-mode space, with the kernel32.dll loaded at the same address.

At first, we believed we needed to write shell code to execute LoadLibrary, and this is bad for two reasons:
(a) it is difficult to write,
(b) with the new XP-SP2 NX (no-execute) page protection flag we could run into trouble.
Fortunately, we realized a fact: DLLs are mapped into every process's address space, which is private to each process. However, when you create a DLL you specify a "preferred load address" at link time, and the OS loader will load the DLL at that address if it is free. Otherwise, the loader must relocate the DLL, which is a time-consuming operation. This is particularly true for system DLLs, which are always loaded at the same address in every process. So, if we do a GetProcAddress for LoadLibrary in our process, we obtain the same address it has in the victim process.

Scheme of the steps that lead zdll.dll to be loaded in victim's address space

We can pass to CreateRemoteThread the address of LoadLibrary as the startup routine, and the name we wrote in the victim's address space as its parameter, as in the figure.

IAT patching

Now we have our own code running in a thread in the victim's address space. What can we do now? Everything. In particular, we can access the PE data directories of our "host", the victim. Executables in Win32 (DLLs, exes, and even device drivers) follow a format called PE (Portable Executable). Every PE is divided into sections: export, import, resources, debug data, delayload, bound modules... [5][6].

The section we are interested in is the import section, with its IMAGE_IMPORT_DESCRIPTOR structure.

The import section, with its two parallel arrays of function pointers

The import section after the loader has done its work. The IAT now points to function entries in kernel32.dll

There's one IMAGE_IMPORT_DESCRIPTOR for each imported PE (executable or, most commonly, DLL). Each IMAGE_IMPORT_DESCRIPTOR points to two essentially identical arrays. The first one is the Import Address Table (IAT). The second one is called the Import Name Table (INT) and is used by the loader as a backup copy in case the IAT is overwritten in the binding process. Binding is an optimization performed on PE files after the link step, but it goes beyond the scope of this article; Matt Pietrek in [5] covers all the details. The IMAGE_THUNK_DATA structures in the IAT have two roles:

  • In the executable file, they contain either the ordinal of the imported API or an RVA (Relative Virtual Address, an offset from the base address at which the PE is loaded) to an IMAGE_IMPORT_BY_NAME structure. The functions we need to patch are those imported by name, so we look at the entries that contain an RVA. The IMAGE_IMPORT_BY_NAME structure is just a WORD (a "hint"), followed by a string naming the imported API.
  • When the loader starts the executable, it overwrites each IAT entry with the actual virtual address of the imported function. 

 

The import section after zdll's DllMain has done its work. The IAT now points to function entries in zdll.dll

So we need to replace the addresses placed in the IAT by the loader with the addresses of our functions. Here the INT becomes important: how do we know which entry in the IAT we need to overwrite for, say, CreateFileA? We iterate through the entries of the IAT and the INT together: the INT provides the name of the n-th entry, the IAT its virtual address. When the name matches, we simply overwrite the entry in the IAT with our own.

void patchIAT(PIMAGE_THUNK_DATA32 pINT, PIMAGE_THUNK_DATA32 pIAT)
{
   PIMAGE_IMPORT_BY_NAME ordinalName;

   while (1) // Loop until we reach the null terminator of the thunk arrays
   {
      if ( pINT->u1.AddressOfData == 0 )
         break;

      if ( IMAGE_SNAP_BY_ORDINAL32(pINT->u1.Ordinal) )
      {
         // Imported by ordinal: we don't consider un-named functions
      }
      else
      {
         // getPtrFromRVA is our helper: it translates an RVA into a pointer
         // we can dereference
         ordinalName = (PIMAGE_IMPORT_BY_NAME)getPtrFromRVA((DWORD)(pINT->u1.AddressOfData));

         const char* funcName = (const char*)ordinalName->Name;

         // Compare the contents of the strings, not the pointers!
         if ( strcmp(funcName, "CreateFileA") == 0 )
         {
            // Note: the IAT page may be read-only; a VirtualProtect call
            // may be needed before writing to it
            pIAT->u1.Function = (DWORD)MyCreateFile;
            break;
         }
      }

      pINT++;   // Advance to next thunk
      pIAT++;   // Advance to next thunk
   }
}
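
The on-disk view of the same structures helps to fix the layout in mind. Here is a small standalone sketch that walks the import section of a 32-bit PE file and prints, per imported DLL, the names the loader will resolve into the IAT. It assumes a well-formed PE32 file; PE32+ has 8-byte thunks and its data directories start at offset 112 of the optional header instead of 96.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PeImports {

    public static void main(String[] args) throws IOException {
        ByteBuffer pe = ByteBuffer.wrap(Files.readAllBytes(Paths.get(args[0])))
                                  .order(ByteOrder.LITTLE_ENDIAN);

        int peHdr = pe.getInt(0x3C);                // e_lfanew -> "PE\0\0" signature
        int nSections = pe.getShort(peHdr + 6) & 0xFFFF;
        int optSize = pe.getShort(peHdr + 20) & 0xFFFF;
        int optHdr = peHdr + 24;                    // after signature + COFF header
        int sectionTable = optHdr + optSize;

        int importRva = pe.getInt(optHdr + 96 + 8); // data directory entry #1: import table

        int desc = rvaToOffset(pe, sectionTable, nSections, importRva);
        while (pe.getInt(desc + 12) != 0) {         // Name == 0 terminates the descriptor array
            int nameRva = pe.getInt(desc + 12);
            int intRva = pe.getInt(desc);           // OriginalFirstThunk: the INT
            if (intRva == 0) intRva = pe.getInt(desc + 16); // bound image: fall back to the IAT
            System.out.println(cString(pe, rvaToOffset(pe, sectionTable, nSections, nameRva)));

            int thunk = rvaToOffset(pe, sectionTable, nSections, intRva);
            for (int t = pe.getInt(thunk); t != 0; thunk += 4, t = pe.getInt(thunk)) {
                if ((t & 0x80000000) != 0)          // imported by ordinal: no name to patch
                    System.out.println("   ordinal " + (t & 0xFFFF));
                else                                // RVA to IMAGE_IMPORT_BY_NAME: hint + name
                    System.out.println("   " + cString(pe,
                        rvaToOffset(pe, sectionTable, nSections, t) + 2));
            }
            desc += 20;                             // next IMAGE_IMPORT_DESCRIPTOR
        }
    }

    // Map an RVA to a file offset by finding the section that contains it.
    static int rvaToOffset(ByteBuffer pe, int sectionTable, int n, int rva) {
        for (int i = 0; i < n; i++) {
            int s = sectionTable + i * 40;          // section headers are 40 bytes each
            int va = pe.getInt(s + 12), rawSize = pe.getInt(s + 16);
            if (rva >= va && rva < va + rawSize)
                return rva - va + pe.getInt(s + 20); // PointerToRawData
        }
        throw new IllegalArgumentException("RVA not mapped: " + rva);
    }

    static String cString(ByteBuffer pe, int off) {
        StringBuilder sb = new StringBuilder();
        byte b = pe.get(off);
        while (b != 0) { sb.append((char) b); b = pe.get(++off); }
        return sb.toString();
    }
}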


Compound file

So, at this point the only thing left to do was to provide our own implementation of functions like CreateFile, WriteFile, SetFilePointer, FindFirstFile... and patch the kernel32 IAT entries with them. But how can we implement a file system inside a single file? After some searching, I suggested that maybe Structured Storage, the way Microsoft calls its compound files, could be used: Word and PowerPoint use them, for example.
It was only a suggestion, but the day after, my mate came up with an almost complete implementation based on Structured Storage functions and COM interfaces. Amazing! The last things to do were an application for building a compound file, and some cryptography to hide the content of the file. After all, this was the original goal :)

The final product worked. It was great! A piece of software as complex as a video game was able to run with our own file APIs. We never used it (it was a bit too slow on startup, and we found a much simpler solution: networking our notebooks), but it was fun, and I used the intercepting library we created for more interesting stuff!



[1] Mark Russinovich. Inside Memory Management, Part 1. Windows and .NET Magazine, August 1998.

[2] Mark Russinovich. Inside Memory Management, Part 2. Windows and .NET Magazine, September 1998.

[3] Jeffrey Richter. Advanced Windows, 3rd edition. Microsoft Press, 1997.

[4] Jeffrey Richter. Load Your 32-bit DLL into Another Process's Address Space Using INJLIB. Microsoft Systems Journal, May 1994.

[5] Matt Pietrek. Inside Windows: An In-Depth Look into the Win32 Portable Executable File Format. MSDN Magazine, February 2002.

[6] Matt Pietrek. Inside Windows: An In-Depth Look into the Win32 Portable Executable File Format, Part 2. MSDN Magazine, March 2002.

[7] Microsoft Corp. Platform SDK: Structured Storage. MSDN Library, April 2004.


# Friday, 20 January 2006

More on filesystems

Last time I posted an opinion on one of Jeff Atwood's posts. This time I'd like to elaborate a little more on the good points Jeff made in his article, on filesystems not being a feature but a mere implementation detail.

The post goes on with several proposals, all aimed at reaching the definitive solution: getting rid of the hierarchical file system. This was done by Jef Raskin with his experimental LEAP interface. A less radical solution is then discussed: do everything you can to hide the file system from the user, using no filenames, no prompts on save, and automatic versioning. A sort of milestone-based mechanism. While this is great for multiple saves and versioning of the same document, and for recovering from the too-frequent overwrite errors, I don't see "restore points" as a substitute for filesystems, but as a complementary feature (although it can really help a lot in naming revised documents; how many times have you seen a file named "thesis14MarTry2.doc"?). (NB: this technology already exists in Windows (2003 and Vista): based on VSS, Shadow Copy preserves multiple copies of the same file transparently, as reported here through AdiOltean.)

I don't much like either of the solutions; however, the problem is real and must somehow be addressed. Hierarchical file systems, partitions and hard drives are not only implementation details we (developers) must hide from inexperienced users. They are a pain also for people who work on a computer all day. How many times in the last month did you think "where the heck is that file named something like that?" or "where did I install that application, a year ago? I can find neither a link nor the executable!"?

One of the comments posted to the original blog post, and Jeff's response, reflect my own opinion on the subject. The commenter objects that the proposed solution, using the contents of the file instead of its name, isn't really adequate. He observes that it may make it easier to search for files and simpler to save them, but it makes browsing through the file system quite difficult. Jeff's response was: "When was the last time you "browsed" through the Internet, e.g., you used a YAHOO or DMOZ style directory?"
Search technologies are the future of file systems. In many cases they are already the present, and they are continuously growing. What they still lack is integration. A lot of integration. Windows desktop search engines are the worst in this sense. Both Google Desktop Search and Windows Desktop Search are completely separate applications, only poorly integrated into the OS. In Windows Desktop Search, you can't even customize your view by adding and removing columns, or reordering them.

Among current OSes, OS X has a point (with its Spotlight). You can even save search folders, organizing your items in a way that is independent from the file system's real organization. The find-as-you-type behavior and the subdivision of results into different categories make Spotlight a very well done system.



Windows Vista does the same things, sometimes better (especially for virtual folders, see picture). The search is pervasive: in every folder, and in many other places, even in the open and save dialogs.



Open dialog: pressing keywords on the left you can make queries.



Searching a folder in Explorer



Meta (virtual) folders, stacked, in Explorer.

However, this point is still a little weak in my opinion. I don't have a Vista beta, but from what I have seen and read, the search function could be integrated better. Take the open and save dialogs, the search for applications in the Start menu (which is different! Why??) and, for example, the create shortcut wizard and the Run... dialog. Then look at these screenshots of SkyOS or, even better, go and watch the Search movie from here. I think it is impressive how search functionality is spread throughout the system, using a consistent and standard interface. I'd really love to open a dialog, type a few keywords and find my file. Or to do the same thing in Explorer, or on the desktop...




Searching for a file in the storage manager...



... an application from the Run dialog (note the incremental, not excremental, search function)...




...and from the Open common dialog.

The Explorer view and the common dialog view are also pretty similar (even if not identical, as I was hoping), making life easier for end users. And the search pane, with its subdivision into categories, is consistent across the Run dialog, create shortcut, locate icon, open with...

# Thursday, 19 January 2006

Main user interface problem: consistency!

I recently read a very interesting post by Jeff Atwood: Filesystems Aren't a Feature. He starts by pointing out an observation made by a developer watching his relatives use the PC:
When I observe how my wife and son uses the family computer, I can't help noticing how little use they have for the desktop. They look bewildered when I open the Windows Explorer. To them, file open or file save dialog is where the files go. My Documents? It's just an icon they never touch.
I don't even know why the open dialog (and the save dialog) and the file manager (Explorer) should be different. They have the same function: locate a file. In his post, Jeff considers alternatives to the direct exposure of the file system to the user. I agree that the file system's hierarchical structure, with files, folders and different partitions, is confusing, but we should wonder in the first place why the open/save dialogs ("where the files go") are different from My Computer. If they were identical, sharing the same interface, would it be so confusing? I don't think so...

# Tuesday, 10 January 2006

Ah-ah! [1]


From Sam Gentile:
“Reported by CNET, of all the CERT security vulnerabilities of the year 2005, 218 belonged to the Windows OS. But get this - there were 2,328 CERT security vulnerabilities for UNIX/Linux systems.”
That's great news, but it only confirms that Windows is now an OS that takes security really seriously.
Why are even clever people, like Paul Graham, sometimes so biased against Windows and Microsoft?


On openMosix


The first clustering architecture I am going to speak about is openMosix. openMosix is a Linux-only solution, for reasons that will become clear, but the concepts are applicable to every OS with a virtual memory architecture. I think that a port of these ideas to the Windows OSes could be very interesting, but enormously challenging (at least for developers who cannot access the sources) and maybe not worth the effort: other architectures, which require a shift in the concurrent/distributed programming paradigm, may bring more benefits in the end.

Anyway, openMosix is unique for its (almost) complete transparency: processes can be migrated to other nodes, and distributed computing can happen, without any intervention from the user or the programmer. openMosix turns a cluster into a big multi-processor machine.

The openMosix architecture consists of two parts:
  • a Preemptive Process Migration (PPM) mechanism and
  • a set of algorithms for adaptive resource sharing. 
Both parts are implemented at the kernel level, thus they are completely transparent to the application level.
The PPM can migrate any process, at any time, to any available node. Usually, migrations are based on information provided by one of the resource sharing algorithms.
Each process has a home node, the machine where it was created. Every process seems to run at its home node, and all the processes of a user's session share the execution environment of the home node. Processes that migrate to other nodes use the new node's resources (memory, files, etc.) whenever possible, but interact with the user's environment through the home node.
Until recently, the granularity of work distribution in openMosix was the process. Users were able to run parallel applications by starting multiple processes in one node; the system then distributed these processes to the best available nodes at that time, and the load-balancing algorithm running on each node decided when to relocate resources due to changes in node load. Thus, openMosix has no central control or master/slave relationship between nodes.

This model makes openMosix not so different from MPI-Beowulf clusters. Fortunately, recent work brought openMosix granularity down to the thread level, enabling "migration of shared memory", i.e. the migration of pages of the process address space to other nodes. This feature makes it possible to migrate multi-threaded applications.

Processes and threads in Linux
(Figures from the MigShm technical report and presentation: The MAASK team (Maya, Asmita, Anuradha, Snehal, Krushna) designed and implemented the migration of shared memory on openMosix)


For process migration, openMosix creates a new memory descriptor on the remote node. This is fine for normal processes, but could cause problems for threads. Because a thread shares almost all of its memory pages with its parent (all but the thread stack and TLS), when threads of the same parent process are migrated they need to share a common memory descriptor: if they had different descriptors, these threads could point to false segments.
When a thread is migrated, openMosix migrates only the user-mode stack of that particular thread. The heap is migrated "on demand", paying attention to the case in which the same node is already executing threads of the same process, to ensure consistency.




openMosix + MigShm control flow
Other features of the project are the redefinition of shared-memory primitives (shalloc() etc.) and Linux thread primitives, a transparent Eager Release consistency policy, and the addition of an algorithm for adaptive resource sharing based on the frequency of shared memory usage and the load across the cluster, so that threads are migrated in a way that decreases remote accesses to the shared memory.

Processes, Threads and Memory space

This piece of software is a very interesting and impressive technical quest; however, the question is: is it really worth the effort? Can it scale well? Letting programs, and above all developers, assume that they only have to add threads can be misleading. Multi-threaded programming requires locking, explicit synchronization and, to scale well, thoughtful administration of the running threads. Threads and semaphores are starting to become uncomfortable even for multi-threaded programming on a single machine.
My personal opinion is that the future is going in the other direction. There will be no shared memory, and distributed, multi-threaded or clustered computation will all have the same interface, with no shared memory. The problem is that memory is lagging behind.

Processes were created to have different units of execution on the same CPU. When they were introduced, we had multiple processes all running in the same address space (directly in the physical address space, at that time).
Then, fortunately, came the advent of Virtual Memory and of private virtual address spaces. We had a balanced situation: every process thought it was the only one in the machine, with a whole address space for its own purposes. Communication with other processes was possible, mainly message based. At that time, IPC was substantially the same whether processes were on the same machine or on different machines: the main methods were sockets and named pipes.
The introduction of threads put the system out of balance again: every process had many threads of execution, sharing the same address space.

According to my historic Operating System textbook, a process is a program in execution
"with its current values of program counter, registers and variables; conceptually every process has its own virtual CPU" - A. S. Tanenbaum.
This is very close to the way modern OSes treat processes, running them in a context of execution virtually independent from the others.
Threads instead
"allow multiple executions in the environment of a process, largely independent of one another" - A. S. Tanenbaum.
However, this definition of threads is not so close to reality: threads are not so independent among themselves, because they always share a primary resource (the common address space of the parent process).
openMosix "solves" this problem (making threads "independent" again) by transparently migrating the required memory pages. 
But is it possible to restore the balance? What about changing the affinity of memory from process to thread? Notice that I am not talking about reintroducing the concept of a virtual memory space for threads: modern OSes use the processor architecture to enforce and enable virtual memory for processes, at the overhead we all know; furthermore, you can't "box" address spaces one inside the other. What I am thinking about is a "light" thread that encapsulates its code, its state (the stack) AND its data. If another thread wants those data, it must ask for them, and the thread that owns the data must be willing to share them. Like in the IPC case back in the '80s, but without the burden of a context switch unless necessary (i.e. when the thread and its data reside in another process or on another machine). I'll sketch this idea below.

Application Domains

To complicate this design, .NET brought us Application Domains. Application Domains are designed to be "light-weight" processes, as Chris Brumme explains. But they are "unrelated" to threads.

Wires?

In my opinion, we need light threads, let's call them wires, that live in the managed space (so they do not clobber the scheduler), have their own memory, and have message-based primitives for communication. They should be simpler to use than threads; a good starting point may be join calculus, or C-omega, or any other language that supports asynchronous or active functions. Those functions would map directly to wires, and the runtime would map them to native "tasks" (processes, threads or fibers), so that users can finally stop worrying about hacks to mitigate thread performance limitations (number of threads, thread pools, completion ports) and explicit synchronization (semaphores, mutexes, race conditions).
Wires could also adapt very well to a distributed environment: since they carry their data with them, they can be "detached" from a computational node and "re-attached" to a different destination node.
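
To make the idea a bit more tangible, here is a toy sketch of a wire emulated on today's primitives. Everything here is hypothetical, my own invention for illustration: the point is only that the wire owns its data outright, and a single message loop replaces all explicit locking.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

public class Wire {

    // Owned outright by this wire: no other wire can reach it except by message.
    private final Map<String, String> ownedData = new HashMap<>();
    private final BlockingQueue<Request> mailbox = new LinkedBlockingQueue<>();

    public Wire(ExecutorService runtime) {
        // The runtime decides how wires map to native tasks (threads, fibers...):
        // a wire is a logical activity, not an OS thread.
        runtime.submit(this::loop);
    }

    // Asynchronous "ask": the only way in. The owner answers when it pleases.
    public Future<String> ask(String key) {
        Request r = new Request(key);
        mailbox.add(r);
        return r.reply;
    }

    private void loop() {
        try {
            while (true) {
                // Messages serialize all access to ownedData: no locks, no races.
                // (Writes would simply be another message type.)
                Request r = mailbox.take();
                r.reply.complete(ownedData.get(r.key));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // the wire was shut down
        }
    }

    private static final class Request {
        final String key;
        final CompletableFuture<String> reply = new CompletableFuture<>();
        Request(String key) { this.key = key; }
    }
}

Detaching such a wire from one node and re-attaching it to another would then "just" be a matter of draining the mailbox and shipping its data and stack along with it.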




# Wednesday, 04 January 2006

What am I reading right now?

  • What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg. Eh, I have known since my first Fortran program that floats (the standard 32-bit REALs in that language) will give you rounding errors. And they accumulate pretty well (see the little demonstration after this list)! Very interesting reading, though.

  • Practical Common Lisp: my struggle to become good at this (for me, and for now) weird language. Very good up to this point.

  • Language Support for Lightweight Transactions, by Tim Harris and Keir Fraser. I am always interested in concurrency and distributed computing, so this is a must after MSDN Magazine January's End Bracket, by Joe Duffy.

  • Speaking of parallel computing, I learned a lot at work about current standards for grid computing. Basically, we are building a computing cluster, and two different projects come into the scene: openMosix and MPI (the second one is the protocol chosen for Windows Server 2003 CCE, too). The two use very different approaches, each with its own drawbacks and strengths. I want to study some more, especially on the openMosix front, and then expose here what I learned and my own ideas.

  • And, last but not least, Hackers and Painters, by Paul Graham. Very interesting and stimulating reading, made even more stimulating by the fact that I agree with many of his opinions, but totally disagree with many others. I find it difficult to understand how an open-minded person could fall into the same traps he points out in other people. But maybe Paul wrote some of his pages only to please an audience... he was going to sell his book, after all. I want to discuss this topic more deeply in the future; it surely deserves a post.
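
The floating-point demonstration promised above: a minimal example of rounding errors accumulating. 0.1 has no exact binary representation, and the error compounds as the running sum grows.

public class FloatDrift {
    public static void main(String[] args) {
        float sum = 0.0f;
        for (int i = 0; i < 10_000_000; i++) {
            sum += 0.1f;  // each addition rounds to the nearest representable float
        }
        // The exact result would be 1000000.0; with IEEE-754 single precision
        // this prints something around 1.09e6 instead - the error is not even small.
        System.out.println(sum);
    }
}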