# Saturday, August 06, 2016

Old school code writing (sort of)

As I mentioned in my previous post, online resources on Hosting are pretty scarce. 

Also, writing an Host for the CLR requires some in-depth knowledge of topics you do not usually see in your day-to-day programming, like for example IO completion ports. Same for AppDomains: there is plenty of documentation and resources compared to Hosting, but still some more advanced feature, and the underlying mechanisms (how does it work? How does a thread interact and knows of AppDomains?) are not something you can find in a forum. 

Luckily, I have been coding for enough time to have a programming library at home. Also, I have always been the kind of guy that wants to know not only how to use stuff, but how they really work, so I had plenty of books on how the CLR (and Windows) work at a low level. All the books I am going to list were already in my library!

The first one, a mandatory read, THE book on hosting the CLR:



Then, a couple of books from Richter:

  

The first one is very famous. I have the third edition (in Italian! :) ) which used to be titled "Advanced Windows". It is THE reference for the Win32 API.
If you go anywhere near CreateProcess and CreateThread, you need to have and read this book.

The second one has a title which is a bit misleading. It is acutally a "part 2" for the first one, focused on highly threaded, concurrent applications. It is the best explanation I have ever read of APCs and IO Completion Ports.

  

A couple of very good books on the CLR to understand Type Loading and AppDomains.
A "softer" read before digging into...

  

...the Internals. You need to know what a TEB is and how it works when you are chasing Threads as they cross AppDomains.
And you need all the insider knowledge you may get, if you need to debug cross-thread, managed-unmanaged transitions. And bugs spanning over asynchronous calls. 

My edition of the first book is actually called "Inside Windows NT". It is the second edition of the same book, which described the internals of NT3.1 (which was, despite the name, the first Windows running on the NT kernel), and was originally authored by Helen Custer. Helen worked closely with Dave Cutler's original NT team. My edition covers NT4, but it is still valid today. Actually, it is kind of fun to see how things evolved over the years: you can really see the evolution, how things changed with the transition from 32 to 64 bits (which my edition already covers, NT4 used to run on 64 bit Alphas), and how they changed it for security reasons. But the foundations and concepts are there: evolution, not revolution.

  

And finally two books that really helped me while writing replacements for the ITaks API. The first one to tell me how it should work, the second one telling me how to look inside the SSLCI for the relevant parts (how and when the Hosting code is called).

Of course, I did not read all these books before setting to work! But I have read them over the years, and having them in my bookshelf provided a quick and valuable reference during the development of my host for Pumpkin.
This is one of the (few) times when I'm grateful to have learned to program "before google", in the late '90/early '00. Reading a book was the only way to learn. It was slow, but it really fixed the concepts in my mind. 

Or maybe I was just younger :)


# Thursday, August 04, 2016

IL rewriting + AppDomain sandboxing + Hosting

So, in the end, what went into Pumpkin?

Control was performed at compilation time or execution time? And if it is execution, using which technique?

In general, compilation has a big pro (you can notify immediately the snippet creator that he did something wrong, and even preventing the code block from becoming an executable snippet) and a big con (you control only the code that is written. What if the user code calls down some (legitimate) path in the BCL that results in a undesired behaviour?)

AppDomain sandboxing has some big pros (simple, designed with security in mind) and a big con (no "direct" way to control some resource usage, like thread time or CPU time).
Hosting has a big advantage (fine control of everything, also of "third" assemblies like the BCL) which is also the big disadvantage (you HAVE to do anything by yourself).

So each of them can handle the same issue with different efficacy. Consider the issue of controlling thread creation:
  • at compilation, you "catch" constructs that create a new thread (new Thread, Task.Factory.StartNew, ThreadPool.QueueUserWorkItem, ...)
    • you have to find all of them, and live with the code that creates a thread indirectly.
    • but you can do wonderful things, like intercepting calls to thread and sync primitives and substitute them - run them on your own scheduler!
  • at runtime, you:
    • (AppDomain) check periodically. Count new threads from the last check.
    • (hosting) you are notified of thread creation, so you monitor it.
    • (debugger) you are notified as well, and you can even suspend the user code immediately before/after.

Another example:
  • at compilation, you control which namespaces can be used (indirectly controlling the assembly)
  • at runtime you can control which assemblies are really loaded (you are either notified OR asked to load them - and you can prevent the loading)

What I ended up doing is to use a mix of techniques. 

In particular, I implemented some compiler checks.
Then, run the compiled IL on a separate AppDomain with a restricted PermissionSet (sandboxing).
Then, run all the managed code in an hosted CLR.

I am not crazy...
 

Guess who is using the same technique? (well, not compiler checks/rewriting, but AppDomain sandboxing + Hosting?)
A piece of software that has the same problem, i.e. running unknown, third party pieces of code from different parties in a reliable, efficient way: IIS.
There is very little information on the subject; it is not one of those things for which you have extensive documentation already available. Sure, MSDN has documented it (MSDN has documentation for everything, thankfully), but there is no tutorial, or Q&As on the subject on StackOverflow. But the pieces of information you find in blogs and articles suggests that this technology is used in two Microsoft products: SQL Server, for which the Hosting API was created, and IIS.

Also, this is a POC, so one of the goals is to let me explore different ways of doing the same thing, and assess robustness and speed of execution. Testing different technologies is part of the game :)

Building barriers: compilation time VS execution time

In order to obtain what we want, i.e. fine grained resource control for our "snippets", we can act at two levels:

  • compilation time
  • execution time

Furthermore, we can control execution in three ways:

  1. AppDomain sandboxing: "classical" way, tested, good for security
  2. Hosting the CLR: greater control on resource allocation
  3. Execute in a debugger: even greater control on the executed program. Can be slower, can be complex

Let's examine all the alternatives.

Control at compilation time

Here, as I mentioned, the perfect choice would be to use the new (and open-source) C# compiler.

It divides well compilation phases, has a nice API, and can be used to recognize "unsafe" or undesired code, like unsafe blocks, pointers, creation of unwanted classes or call to undesired methods.

Basically, the idea is to parse the program text into a SyntaxTree, extract the node matching some criteria (e.g. DeclarationModifiers.Unsafe, calls to File.Read, ...), and raise an error. Also, it a possibility is to write a CSharpSyntaxRewriter that encapsulates (for diagnostic) or completely replace some classes or methods.

Unfortunately, Roslyn is not an option: StackOverflow requirements prevents the usage of this new compiler. Why? Well, users may want to show a bug, or ask for a particular behaviour they are seeing in version 1 of C# (no generics), or version 2 (No extension methods, no anonymous delegates, etc.). So, for the sake of fidelity, it is required that the snippet can be compiled with an older version of the compiler (and no, the /langversion switch is not really the same thing).

An alternative is to act at a lower level: IL bytecode. 
It is possible to compile the program, and then inspect the bytecode and even modify it. You can detect all the kind of unsafe code you do not want to execute (unsafe, pointers, ...), detect the usage of Types you do not want to load (e.g. through a whitelist), insert "probes" into the code to help you catch runaway code.

I'm definitely NOT thinking about "solving" the halting problem with some fancy new static analysis technique... :) Don't worry!

I'm talking about intercepting calls to "problematic" methods and wrap them. So for example:

static void ThreadMethod() {
   while (1) {
      new Thread(ThreadMethod).Start();
   }
}
This is a sort of fork bomb

(funny aside: I really coded a fork bomb once, 15 years ago. It was on an old Digital Alpha machine running Digital UNIX we had at the university. The problem was that the machine was used as a terminal server powering all the dumb terminals in the class, so bringing it down meant the whole class halted... whoops!)

After passing it through the IL analyser/transpiler, the method is rewritted (compiled) to:


static void ThreadMethod() {
   while (1) {
      new Wrapped_Thread(ThreadMethod).Start();
   }
}

And in Wrapped_Thread.Start() you can add "probes", perform every check you need, and allow or disallow certain behaviours or patterns. For example, something like: 

if (Monitor[currentSnippet].ThreadCount > MAX_THREADS)
  throw new TooManyThreadException();

if (OtherConditionThatWeWantToEnforce)
  ...

innerThread.Start();


You intercept all the code that deals with threads and wrap it: thread creation, synchronization object creation (and wait), setting thread priority ... and replace them with wrappers that do checks before actually calling the original code.

You can even insert "probes" at predefined points: inside loops (when you parse a while, or a for, or (at IL level), before you jump), before functions calls (to have the ability to check execution status before recursion). These "probes" may be used to perform checks, to yield the thread quantum more often (Thread.Sleep(0)), and/or to check execution time, so you are sure snippets will not take the CPU all by themselves. 

An initial version of Pumpkin used this very approach. I used the great Cecil project from Mono/Xamarin. IL rewriting is not trivial, but at least Cecil makes it less cumbersome. This sub-project is also on GitHub as ManagedPumpkin.

And obviously, whatever solution we may chose, we do not let the user change thread priorities: we may even run all the snippets in a thread with *lower* priority, so the "snippet" manager/supervisor classes are always guaranteed to run.

Control at execution time

Let's start with the basics: AppDomain sandboxing is the bare minimum. We want to run the snippets in a separate AppDomain, with a custom PermissionSet. Possibly starting with an almost empty one. 

Why? Because AppDomains are a unit of isolation in the .NET CLI used to control the scope of execution and resource ownership. It is already there, with the explicit mission of isolating "questionable" assemblies into "partially trusted" AppDomains. You can select from a set of well-known permissions or customize them as appropriate. Sometimes you will hear this approach referred to as sandboxing.

There are plenty of examples on how to do that, it should be simple to implement (for example, the PTRunner project).

AppDomain sandboxing helps with the security aspect, but can do little about resource control. For that, we should look into some form of CLR hosting.

Hosting the CLR

"Hosting" the CLR means running it inside an executable, which is notified of several events and acts as a proxy between the managed code and the unmanaged runtime for some aspects of the execution. It can actually be done in two ways:

1. "Proper" hosting of the CLR, like ASP.NET and SQL Server do

Looking a what you can control through the hosting interface  you see that, for example, you can control and replace all the native implementations of "task-related" (thread) functions.
It MAY seem overkill. But it gives you complete control. For example, there was a time (a beta of CLR v2 IIRC) in which it was possible to run the CLR on fibers, instead of threads. This was dropped, but gives you an idea of the level of control that can be obtained.

2. Hosting through the CLR Profiling API (link1, link2)

You can monitor (and DO!) a lot of things with it: I used it in the past to do on-the-fly IL rewriting (you are notified when a method is JIT-ed and you can modify the IL stream before JIT) (my past project used it for a similar thing, monitor thread synchronization... I should have talked about it on this blog years ago!)

In particular, you can intercept all kind of events relative to memory usage, CPU usage, thread creation, assembly loading, ... (it is a profiler, after all!).
An hypothetical snippet manager running alongside the profiler (which you control, as it is part of your own executable) can then use a set of policies to say "enough!" and terminate the offending snippet's threads.

Debugging

Another project I did in the past involved using the managed debugging API to run code step-by-step.

This gives you plenty of control, even if you do not do step-by-step execution: you can make the debugger code "break into" the debugger at thread creation, exit, ... And you can issue a "break" any time, effectively gaining complete control on the debugged process (after all, you are a debugger: it is your raison d'etre to inspect running code). It can be done at regular intervals, preventing resource depletion by the snippet.

# Sunday, July 31, 2016

Choices, choices, choices...

How would you design and write a system that takes some C# code and runs it "in the browser"?

In general, my answer would be: Roslyn. Roslyn was already quite hot and mature at the end of 2014; having something like scriptcs would give you complete control on each line of code you are going to execute.

But this particular project, being something that must work for StackOverflow, had several constraints, most of which were in stern contrast one with the other:
  • High fidelity: if I am asking a question about a peculiar problem I am having with C# 1 on .NET 1.1, I want my "snippet" to behave as if it is compiled with C# 1 and run on .NET CLR 1.1
  • Safe: can you just compile and execute your snippet inside your IIS? Mmmm.. not a great idea...
  • High performance: can you spin up a VM (or a container), wait for it to be ready, "deploy" the snippet, execute it, get it back? That would be very safe, but a bit slow.

Safety/security is particularly important. For example: you do not want users to use WMI to shutdown the machine, or open a random port, install a torrent server, read configuration files from your machine, erase files...
For safety, we want to be able to handle dependencies in a sensible way. Also, some assemblies/classes/methods just do no make any sense in this scenario: Windows Forms? Workflow Foundations? Sql?
For safety and performace, we want to monitor and cap resource usage (no snippet that does not terminate).

Going a deep further, I stared to dash out some constraints. It turns out that we need to disallow something, even if this means going againt the goal of "high-fidelity":
  • no "unsafe", no pointers
  • no p/invoke or unmanaged code
  • nothing from the server that runs the snippet is accessible: no file read, no access to local registry (read OR write!)
  • no arbitrary external dependency (assemblies): whitelist assemblies

Also, we need control over some "resources". We cannot allow snippets to get a unlimited or uncontrolled amount of them.
  1. limit execution time
    • per process/per thread?
    • running time/execution time
  2. limit kernel objects
    • thread creation (avoid "fork-bombs")
    • limit other too? Events, mutexes, semaphores...
    • deny (or handle in a sensible way) access to named kernel objects (e.g. named semaphores.. you do not want some casual interaction with them!)
  3. limit process creation (zero?)
  4. limit memory usage
  5. limit file usage (no files)
  6. limit network usage (no network)
    • in the future: virtual network, virtual files?
  7. limit output (Console.WriteLine, Debug.out...)
    • and of course redirect it
Does it sounds familiar? For me, it was when I learned about something called cgroups. Too bad we don't have it in windows! Yes, there are Job Objects, but they do not cover every aspect.

Could we have cgroups-like control for .NET applications?

# Saturday, July 30, 2016

Pumpink: a .NET "container"

More or less 20 months ago (gosh, time flies!) I started a side-project for a very famous company (StackExchange).
StackExchange had just launched, a few months before, a new feature on the main site of their network (StackOverflow).
This feature is called "Code Snippets", and it allows you to embed some sample HTML + JS code in a Question, or an Answer, and let the visitors of the page run it.
Of course, being JS it would run inside the browser, and with few focused precautions it can be made safe for both servers and clients (you do not want to leave an attack vector open on your servers, but you also don't want your visitors to be attacked/exploited as well!)

More details on how they implemented and safeguarded it can be found on their meta.stackoverflow.com site.

The feature got a lot of attention, and of course there where requests to extend it to other languages.
I was one of those that wanted more languages added, but I understood that JS was a very particular case.
Snippets in any other language would have meant an entirely different approach and an entirely different scale of complexity.

In October 2014 I visited NYC; before my visit I got in touch with David Fullerton, the "big boss" of SO development team. We were in touch since my previous "adventure", a few years before, when I interviewed for a position on their Q&A team. We discussed briefly about my past interview, and then he asked me a very interesting question: what would I add to StackOverflow? C# snippets immediately come to my mind.

We discussed briefly about it, drafted up some ideas, added requirements in the process (discarding most of the ideas) and finally David asked if I would like to try it out, as an Open Source experiment sponsored buy StackExchange.

... well, of course! Fun and challenging software, exchanging ideas with some of the most brilliant devs in the .NET ecosystem, and I get paid too! :)

So "Pumpink" was born. If you are curious, you can find it on my GitHub. It already contains a rather in-depth analysis about the structure of the project, and how it works.

Or, if you want to know "why?" instead of simply "how?", you can wait for the upcoming blog posts, in which I will detail some of the choices, problems, headaches that shaped the problem.

# Monday, April 06, 2015

jQuery UI datepicker: being notified of every date change

Recently, I was fighting with a Primefaces component: calendar. I was using it with my keyboard, and I immediately noticed that the ajax update that keeps the control synchronized with the bean was not working. Selecting a date with the mouse was fine, editing and pressing "Enter" was OK, but changing it and tabbing or clicking away was not.

Curious, I dug in and found that primefaces uses the standart jQuery UI datepicker, and wires itself to its standard onSelect event. But this event is fired only when you explicitly select a date, not when a date changes (through the keyboard, or code).

So, I decided to write a quick js to call onSelect when a date is updated. Here it is: I also keep track of the previous date, to avoid double notifications. And it works very well for us!
# Monday, December 15, 2014

Strangest bug of the day

System.TypeLoadException:
Could not load type 'System.Diagnostic.Debug'
from assembly 'MyAssembly'

Uhm... Hello?
Why are you looking for 'System.Diagnostic.Debug' inside my assembly? Go look inside System!

To be fair, I was fiddling with the CLR Hosting APIs, in particular IHostAssemblyStore.
Using this interface, you can override the CLR assemby probing logic, and load you own assemblies from wherever you want.
This is particularly helpful if you are using your own AppDomainManager, and you want to load it inside your sandboxed domain, which you carefully configured to disallow loading of any external assembly.
If you try to do that, binding fails, obviously. And since the AppDomainManager is created and loaded by the unmanaged portion of the CLR, you don't get a chance to use AssemblyResolve (tried, didn't work).

But that does not really justify this behavior: the Fusion log viewer was reporting no errors, and the exception was wrong (or, at least, not one of the usual exceptions you should look for when debugging binging failures).

There is a tiny parameter, called pAssemblyId. The docs are quite clear: it is used internally to see if that assembly was already loaded.
If it is, do not try to load it again. Sounds like a good optimization BUT:

The docs don't say it, but if you pass back "0", something very peculiar happens.
In my case, I had the Jitter looking for types inside my assembly. Others had different issues.

That was definitely the strangest bug of the day :)

# Saturday, August 16, 2014

AngularJS and ASP.NET MVC validation

Not really a blog post, just a collection of links and articles I used when I was writing validation code for Angular with ASP.NET MVC.
More a "reminder to self" than anything else, as they are a bit outdated (they refer to ASP.NET MVC version 4, mostly) but they could be helpful to others as well!


# Friday, August 15, 2014

Android NFC service and "thin client": one problem, and one hack

Lately (in the last year or so), Android work intensified at my company. So, I finally took the time to study it in depth and I discovered how MUCH Android differs from what I was expecting. It really starts to make sense when you dig under the cover. And you start to discover how much better your apps behave when you are using the SDK the way it should be used (and you also start to pick up defects in other apps and say "Ha! You did that! Gotcha!).
But this is the topic for another post... :)

Today I want to concentrate on an issue I was experiencing using the NFC framework in Android to read our contactless cards.
Using the NFC framework as a general purpose card reader is a bit on the "stretchy" side: the framework, after all, is mainly there to read Ndef tags, which have a precise structure. Fortunately, Android allows you do go deeper, and interact directly with a card using a "transceive" method.

In this way, you can send commands and data to the card (in the form of a byte[]) and receive a response from the card (again, in the form of a byte[]).
So far, so good: this means that we can read our Mifare Desfire cards, using the Desfire authentication and our keys.
I implemented the commands to authenticate a card, select an application (i.e. a protected directory inside the card memory) and read the data.

All is working well, but.. you have to store your key on the phone, and the storage on your phone is not secure.
In theory, every application has a local storage that cannot be read by other applications. In practice, you just have to have root access to your phone (which is mighty easy with Android handsets) and you are done.

This is not a particular problem for some scenarios (e.g. if you provide an app that uses the user differentiated key, so that the user can read his own card), but it is a problem when you need to read multiple cards, and therefore to use the master key.

Suppose you are a third-party company. You are my friend, and you want to provide a discount for my subscribers (people that have my smart-card).
How can you check that the card is real, and that the card is not expired? Easy, you authenticate with the card and read its content: the expiration date is written right there.
But I do not trust you enough to let you have my read keys!

Maybe you even want to top up my card with "reward points": if my users buy something from you, they will have discount on my services. Super cool!
But I will not let you have my write keys.. that's out of question!

Sure, you can read just the UID, and use that to look up user info on my web-service. And use the same service to POST reward points. But my network is sparsely connected, and it might take long before a card is used on one of my terminals and I can update them.
And we have seen that a UID can be faked..
So?

The answer is "thin-client". You use your NFC phone as an antenna, nothing more. What you read from the card is sent as a (hex-encoded) string to a web service. The web service contains the logic and data to interpret the request and prepare the right response. The response is sent back to the phone, and then transmitted to the card.

You can authenticate with the card, but your keys are safely stored away on your server and they never transit on the the phone!
The phone does not even see the personalized key, so the user is safe against cloning.
I build a prototype, and it worked great on our WiFi network.
They I tried to use it on a cellular network and it failed (almost) regularly. Why?

My suspect was that after a (very short) while the card was reset.
The answer I was getting back from the card was something like "operation not supported in this state". It was like somehow the card forgot that we were in the middle of an authentication challenge-response before the protocol was over.
I decided to investigate, to see if my suspicion was confirmed.
Fortunately, Android is OSS and source code is available! So I dug into the Android source code, looking for clues in the NFC implementation.

Android implements NFC using a mix of libraries and processes; most of the NFC stack is native, and managed by the OS. Then, there is a Service (provided by the OS) that handle communication with the native NFC stack. And some client-side classes you can use inside your application, which will communicate with the Service, hiding it from you.
I started to dig into the source by following a "tranceive" call.

On the application side, you receive an Intent when a card is presented to the reader. Inside the intent payload there is a class derived from BasicTagTechnology; in our case, we use a ISO-A compatible card, so we get a IsoDep object.

The most important method of this class is, as I mentioned, tranceive:

IsoDep.transceive
BasicTagTechnology.transceive

The method inside is just a thin wrapper for remote invocation to a service, which is the NfcService or NfcApplication (the name has changed between Android releases:

Tag.getTagService().transceive(mTag.getServiceHandle(), data, raw)

class Tag ...

    public INfcTag getTagService() {
        return mTagService;
    }
 
INfcTag is an aidl interface, which is used to forward data and commands to NfcService.
We can follow the transceive implementation inside NfcService:

public TransceiveResult transceive(int nativeHandle, byte[] data, boolean raw)   
 
 tag = (TagEndpoint) findObject(nativeHandle);
 response = tag.transceive(data, raw, targetLost);
 ...
 Object findObject(int key) {
        synchronized (this) {
            Object device = mObjectMap.get(key);
            if (device == null) {
                Log.w(TAG, "Handle not found");
            }
            return device;
        }
    }
  ...

So, there is another "Tag" class inside the service; all known (in range) tags are held by the NfcService class in a map.
This "Tag" is named NativeNfcTag:
    
public class NativeNfcTag implements TagEndpoint
   ...
   private native byte[] doTransceive(byte[] data);
   public synchronized byte[] transceive(byte[] data) {
      if (mWatchdog != null) {
         mWatchdog.reset();
      }
      return doTransceive(data);
   }
   ...

The implementation of doTransceive is native, and it varies from a card tech to another.
We have found the end of the flow. Have we also found any clue about the card reset?

The answer is there, inside NativeNfcTag. You should have notice the "mWatchdog.reset()" statemente inside doConnect. What is mWatchdog?

private PresenceCheckWatchdog mWatchdog;
    class PresenceCheckWatchdog extends Thread {

        private int watchdogTimeout = 125;

        ...

        @Override
        public synchronized void run() {
            if (DBG) Log.d(TAG, "Starting background presence check");
            while (isPresent && !isStopped) {
                try {
                    if (!isPaused) {
                        doCheck = true;
                    }
                    this.wait(watchdogTimeout);
                    if (doCheck) {
                        isPresent = doPresenceCheck();
                    } else {
                        // 1) We are paused, waiting for unpause
                        // 2) We just unpaused, do pres check in next iteration
                        //       (after watchdogTimeout ms sleep)
                        // 3) We just set the timeout, wait for this timeout
                        //       to expire once first.
                        // 4) We just stopped, exit loop anyway
                    }
                } catch (InterruptedException e) {
                    // Activity detected, loop
                }
            }
            // Restart the polling loop

            Log.d(TAG, "Tag lost, restarting polling loop");
            doDisconnect();
            if (DBG) Log.d(TAG, "Stopping background presence check");
        }
    }

    
    
The "watchdog" is a thread that at short intervals (125ms) checks if the card is still in range, using the "doPresenceCheck()" function. Which is native, and card-dependent.

The function could be therefore an innocuous instruction (a no-op), or a new select that will reset the card to its not-authenticated state.
Guess which one is for Desfire cards?

So, if the watchdog is not reset periodically by transmitting something to the card, a presence check will be triggered and the card will be selected again, resetting the authentication process. While you are still waiting for the cellular network to answer (125ms is a short time on 3G).

I started to think on ways to work around it, from suspending the thread (inside another process - the service - in Android? Root necessary), to set the timeout (by invoking a method on NativeNfcTag using reflection... again, another process was out of my reach) to substitute the code for "doPresenceCheck()" (which you can do with things like Xposed, but.. that will require Root access too).

You just cannot access anything inside another process in Android, if you don't have root access. Which is usually a very good thing indeed, but it getting in our way in this case.
But what about our process? Sure, we can do almost anything inside it... but how can it do any good?

Well, there is a function inside NativeNfcCard which we can use. This function is "exposed" from "Tag" (the non-public class used at "client" side, see above), but not by BasicTagTechnology.
So we cannot call it directly (like transceive), but from the Tag class onwards it follows the same flow as transceive. This function is "connect":

class Tag {
   ...
   public int connect(int nativeHandle, int technology)
   public synchronized int connectWithStatus(int technology)
   ...
}

If we examine the source code of "doConnect" on the other side (its implementation inside NativeNfcCard) we can see that this function will reset the watchdog too (like transceive). Moreover, we can turn "connect" into a no-op:
private native boolean doConnect(int handle);
    public synchronized boolean connect(int technology) {
        if (mWatchdog != null) {
            mWatchdog.pause();
        }
        boolean isSuccess = false;
        for (int i = 0; i < mTechList.length; i++) {
            if (mTechList[i] == technology) {
                // Get the handle and connect, if not already connected
                if (mConnectedTechnology != i) {
                    ...
                } else {
                    isSuccess = true; // Already connect to this tech
                }
                break;
If the technology we specify is the same one we are already using, or if it is a non-existing technology, the function will do nothing.

We can just grab the Tag class inside our code, call connect on our side (using reflection, as it is not exposed by the API), and wait for it to forward the command to the service, resetting the watchdog. Do this regularly, and we can "buy" as much time as we want to complete our authentication protocol!

This is obviously a hack. But I tested it with every version of Android we support (2.3.6, 3.x, 4.x up to 4.4.3) and it just works. It uses knowledge of an internal mechanism which is subject to change even at the next internal revision, but it seems that the code I examined has been stable for a while. And maybe, by the time it changes, they will fix the main issue (use a select function to control presence of a card) as well!

# Sunday, August 10, 2014

Smartcard authentication and access control

DISCLAIMER: I am NOT a crypto expert. Of course I use cryptography, authentication protocols, etc.
I understand them (and their cleverness amazes me) and I have even taken 2/3 courses on cryptography during my Master and my PhD. But the first lesson you ought to learn when you are working on authentication, and on cryptography in particular, is: you are not, and the courses will not make you, an expert. And so you better never, ever design your own crypto algorithm, or your own authentication protocol. Stick with what others have done (and studied for years) and follow best practices. Not blindly, of course, but one matter is understanding what is going on, and another matter entirely is designing them flawlessly.

Last time I mentioned that one of the possible usage scenario for a smart-card is to act as an access and "presence" token. You can think about them as a read-only ID.
Well, you usually go further than that: writing on cards has advantages (you can update them with new information, for example, or "stamp" them when you see them, so that officials can  check that you have validated it and therefore you are authorized, which is a necessity if you do not have gates at any access point); however for the sake of discussion, we can think about them as read-only IDs.

Couplers, i.e. card readers/writers, will recognize the card (and optionally "stamp" it), and record your passage (via LAN or WiFi or UMTS).

Each card has a unique identifier (UID). This ID is hard-wired when the card is produced. Think about it as the MAC address of a network card.
The UID is always transmitted in the initial phases of the communication between the card and the coupler. Many applications just stop there: read the UID, whitelist them, let them through if they are on the white list.

Does it work? Sure. Is it a good approach? No, absolutely NOT. And I was (and I still am) amazed at how many contactless systems just stop there. These systems are not secure, at all. You just have to read the UID (any NFC reader can do that, you just need -for example- to touch the card with an Android phone) and then configure a device (an NFC phone, or a programmable card) with the same UID and... presto! You "cloned" the card.
You didn't, of course: these cards are not clonable, not without a (very) substantial effort, time, and expensive machinery.

The UID is just not designed for this purpose. It is used to identify a card, not to protect against cloning.
Let's remind us which are the typical requirement in this scenario. You want to:
  1. identify each user uniquely
  2. prevent the user from "faking" a card - either a non-existing one, or a clone of another user
  3. let each user read his own card content, if they want to, but not the content of other cards,
  4. let the officials (security guards?) read the content of the card,
  5. let only the authorized couplers write the card: nobody else can do that (or the officials could read "fake" validations, i.e. records not produced by authorized couplers, and which the systems knows nothing about - if you can do that, you can pass a casual inspection unpunished).

For doing this, many cards (in our case: Mifare Desfire EV1 cards) provide some mechanism out of the box that already cover most of the points. Furthermore, there are a couple of best practices that let us cover all the points.

Desfire EV1 cards use T-DES keys to provide security and authentication. These cards are essentially file-storage cards, with files organized in a one level file system. The top "directories" of this file system are called applications.

The card, every application, and every file can have three keys each. The keys will be used for access control, and can be viewed as the read-key, write-key, and modify-key.
So, if I authenticate with the card using KeyA, and KeyA is the read-key for App1/File1, I can open and read the content of that file, but not write it.

Keys are symmetric, and are stored on the card and on the coupler (the RFID reader/writer device).
Cards are considered a secure storage element, as keys can NOT be read from cards. There is no way to retrieve them, and cards are engineered to be tamper resistant. Most of them even protected from attempts to directly read their content using differential power analysis.

Keys are stored on couplers using essentially the same technique: they are not stored on disk or memory, but inside a Secure Element (SE) (sometimes called SAM). It is essentially the same piece of silicon and plastic you find in the smart-cards or in your phone SIM card (the have the same physical format as well), and it is in fact a smart-card, with the same tamper-resistence properties. So, even if someone steals one of your couplers, he/she still hasn't got the key.

Obviously, only some couplers have the write key. They are also the only devices using a Secure Element: readers (couplers with only a read key) do not have them, for cost reasons.

How do you authenticate with the card? This is all covered by the smart-card protocol: in the case of Desfire cards it is a basically a challenge-response where you "prove" to each other that you know the key. You to that by generating a random number, encrypt it, and send it to the other party. The other party decrypts it, generate another random number and send both of them back, encypted. The first party decrypts it, see that its number is included (good, the other party has the correct key), and sends back the other number, encrypted. So the other party knows we have the key as well (we can encrypt and decrypt).

Previously I mentioned some best practices you ought to add to the basic framework. These are commonly accepted as standard and easy to implement: key differentiation and blacklisting.
Each card you issue should hold a different set of keys. This can seem like a problem, if you have, like us, half a million cards (and therefore 1 million keys).
Wow.. how can we store in an efficient way millions of keys? And check all of them when a card is presented to our couplers?
Well.. we do not store them. We derive them.

The UID of the card is used with a master key to derive a "personalized" key, using a one-way hash.
So even if a card is defective and compromised, we have lost its set of keys, nothing more.
This has several advantages:
  • each card holds different keys, so if one is compromised, you just have to throw away (blacklist) that card
  • (assuming you have chosen a proper hash algorithm) it is impossible to reconstruct the master key from the derived key
  • you can just store the master keys on you "official" devices, in a secure way (i.e. in the SAMs) and they will be able to read and write any card (by deriving the right key on the fly)
  • you can hand off to the user their own, "personalized" key, and they will be able to read their own card. But not the card of their friends (or spouse).

By using what the cards manufacturer offers and two little (and simple to implement) best practices, you have a very good level of security.
We added a little more, just to be on the safe side, but these best practices already are a very good point. I am amazed at how many companies fail to realize the issues and threats they are facing with the extremely naive UID-only implementation.
But we still have sites that store passwords in clear text, or that force the user to a numeric PIN of 5 digits, or to 8-char passwords.. so I should know better by now :)