Data, information and inference

There’s an old and mildly sexist comment I remember from a training course back in the 80s.

“362436”, the instructor told us, “is data: 36-24-36 is information”. Ho ho.

James Governor tweets to point to this piece about data models, data reuse and ontology. He asserts, correctly I think, that a comprehensive ontology is not the missing piece without which a data model cannot be the basis for data reuse.

By one of those happy Twitter confluences, Trent Adams (Internet Society) posts from the IDTrust conference to quote Ken Klingenstein (Internet2) as follows: “Ken Klingenstein points out that assertions themselves will require LOA on top of what is applied to the ID.”
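For anyone who prefers to see that point as code, here is a minimal sketch of what it might mean for an assertion to carry its own LOA (Level Of Assurance), separate from the one attached to the identity. Everything in it is my own assumption for illustration: the `Assertion` class, the `authorise` function, and the simple integer scale where a higher number means more assurance. None of it comes from the conference or from any particular standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """A claim plus the level of assurance attached to that claim.

    Hypothetical structure: loa is an integer, higher meaning more assurance.
    """
    claim: str
    loa: int


def authorise(identity: Assertion, attribute: Assertion, required_loa: int) -> bool:
    """Allow the action only if BOTH assertions meet the required level.

    The weaker of the two assertions limits the decision, so a strongly
    assured identity cannot make up for a weakly assured attribute.
    """
    return min(identity.loa, attribute.loa) >= required_loa


# A well-verified identity paired with a poorly verified attribute.
who = Assertion(claim="identity=alice", loa=3)
what = Assertion(claim="role=auditor", loa=1)

decision = authorise(who, what, required_loa=2)
```

Here `decision` comes out as `False`: the identity is assured well enough, but the attribute is not, and it is the attribute that the authorisation decision actually rests on.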

And just to round off the picture, back to James’ Twitter feed again for a pointer to the Guardian’s article describing the apparent assault by a police officer (since suspended pending further investigation) of a protester at the G20 summit.

The theme which ties all these threads together – albeit loosely – is that initial distinction between data and information. Another way of expressing it is that “data” is, broadly, “raw”: uninterpreted and context-free.

Information, on the other hand, is contextual and subject to interpretation and inference. Relating this to the various threads I’ve mentioned:

– A thorough ontology of a data model may explain the relation of one element to another within that model, but cannot realistically account for either context or inference – both of which are external to the data model.

– Assertions of identity are commonly expected to have an LOA (Level Of Assurance), based on factors such as the robustness of the credentials presented, the reliability of the binding between the credential and the holder of the credential, the trustworthiness of the processes for registration, verification and enrolment (RVE), and the accuracy of the credential verification step (authentication). However, in many cases authentication is only the first gating step towards authorisation, which is more often based on assertions of attributes, rather than just identity. (“Now that I know who you claim to be, what inferences can I make about what you are entitled to do?”).

– And so back to the apparent G20 assault. There is a certain level of risk involved in making inferences from a limited set of data (such as the video clip showing the apparent assault).

The police themselves used to illustrate a similar problem back in the 80s by showing a picture of a (white) uniformed officer chasing a (black) man in jeans and a t-shirt. Viewers were asked to interpret the picture, and tended to infer that it showed a policeman chasing a criminal. They were then shown the whole picture, in which a third (white) man could be seen, apparently being chased by the other two. This, they were told, was the actual criminal – the man in jeans being a plain-clothes policeman.

I mention this just to reiterate the difference between data and inferences drawn from that data. The Guardian piece describes the G20 police officer in question as having covered up his badge number before striking the protester. I couldn’t see that in the video clip but, on the other hand, I did notice another piece of data which the article did not mention.

In the clip, the protester can be heard shouting at the police officer before he strikes her: “What [are] you doing, punching a f***ing woman? You scum!”. The implication is that the officer had already been seen to strike someone – though not necessarily the same protester.

Law-breakers like to conceal identifiers such as their faces or their real credentials, because it breaks the link between the identifier and the individual, or between the credential and the inferences drawn from it (i.e. you can see that someone stole the car, but not who it was). It is in their interest to do that, because it prevents them from being held accountable for unlawful behaviour.

The same chain of reasoning is not supposed to apply to those responsible for law enforcement. Having made a commitment to uphold the law, they are supposed to act in accordance with it and therefore, by implication, behave in a way which can be audited transparently and without recourse to anonymity. If they have nothing to hide, one might say, they have nothing to fear from being identifiable in the course of their duties.