Introduction to Channel Theory
Posted on Mon 06 June 2011 in Situation Theory
The channel theory of Jon Barwise and Jerry Seligman [1] is intended to help us understand information flows of the following sort: a's being F carries the information that b is G. For example, we might want a general framework in which to understand how a piece of fruit's bitterness may carry the information that it is toxic, or how a mountainside's having a particular distribution of flora can carry information about the local micro-climate, or how a war leader's generous gift-giving may carry information about the success of a recent campaign, or how the sighting of a gull can carry the information that land is near. A classification is a simple sort of data structure classifying a set of particulars by a set of types. But as my examples above were intended to show, classifications are not only intended to model 'categorical' data, as usually construed.
Def 1. A classification is a triple \( A = \langle tok(A), typ(A), \vDash_{A} \rangle\) such that for every token \( a \in tok(A)\) and every type \( \alpha \in typ(A)\), \( a \vDash_{A} \alpha\) if and only if \( a\) is of type \( \alpha\).
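To fix intuitions, here is a minimal sketch of Def 1 in Python. The class name `Classification` and the toy data are my own, not the book's: a classification is just two sets and a binary relation between them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Classification:
    tokens: frozenset   # tok(A)
    types: frozenset    # typ(A)
    rel: frozenset      # the pairs (a, alpha) such that a |=_A alpha

    def classifies(self, token, typ):
        """a |=_A alpha: is this token of this type?"""
        return (token, typ) in self.rel

# A toy classification of fruit, echoing the bitterness/toxicity example.
fruit = Classification(
    tokens=frozenset({"fruit1", "fruit2"}),
    types=frozenset({"bitter", "toxic"}),
    rel=frozenset({("fruit1", "bitter"), ("fruit1", "toxic")}),
)
assert fruit.classifies("fruit1", "bitter")
assert not fruit.classifies("fruit2", "toxic")
```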
One might remark that a classification is not much more than a table whose attributes have only two possible values, a sort of degenerate relational database. However, unlike a record or row in a relational database, channel theory treats each token as a first-class object. Relational databases require keys to guarantee that each tuple is unique, and key constraints to model relationships between records in tables. By treating tokens as first-class objects, we may model relationships using an infomorphism:
Def 2. Let \( A\) and \( B\) be two classifications. An infomorphism \( f : A \rightleftarrows B\) is a pair of functions \( f = \langle f^{\wedge}, f^{\vee} \rangle\), with \( f^{\wedge}: typ(A) \rightarrow typ(B)\) and \( f^{\vee}: tok(B) \rightarrow tok(A)\), satisfying the following property: for every type \( \alpha \in typ(A)\) and every token \( b \in tok(B)\), \( b \vDash_{B} f^{\wedge}(\alpha)\) if and only if \( f^{\vee}(b) \vDash_{A} \alpha\).
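Continuing the sketch, the defining property of Def 2 can be checked by brute force over all type/token pairs. This helper (my own, assuming the `Classification` class above) will be exercised on concrete infomorphisms below.

```python
def is_infomorphism(A, B, f_up, f_down):
    """Check Def 2: b |=_B f^(alpha) iff f_v(b) |=_A alpha,
    for every type alpha of A and every token b of B."""
    return all(
        B.classifies(b, f_up(alpha)) == A.classifies(f_down(b), alpha)
        for alpha in A.types
        for b in B.tokens
    )
```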
An infomorphism is more general than an isomorphism between classifications; i.e. an isomorphism is a special case of an infomorphism. For example, an infomorphism \( f : A \rightleftarrows B\) between classifications A and B might map two or more types in A onto a single type \( \beta\) in B, provided that from B's point of view those types are indistinguishable, or more precisely that for all tokens b in B, \( f^{\vee}(b) \vDash_{A} \alpha\) if and only if \( f^{\vee}(b) \vDash_{A} \alpha^{\prime}\) whenever \( f^{\wedge}(\alpha) = f^{\wedge}(\alpha^{\prime})\). Note that this requirement does not mean that those types are indistinguishable in A (or, more technically, that they are co-extensional in A). There may be tokens in A outside the range of \( f^{\vee}\) for which, for example, \( a \vDash_{A} \alpha\) but not \( a \vDash_{A} \alpha^{\prime}\). A dual observation may be made about the tokens of B. Two tokens of B may be mapped onto the same token in A, provided that those tokens are indistinguishable with respect to the set of types \( \beta\) in B for which there exists some \( \alpha\) such that \( f^{\wedge}(\alpha) = \beta\). Again, this does not mean that these tokens are wholly indistinguishable in B; there may be types outside the range of \( f^{\wedge}\) classifying them differently. Thus, an infomorphism may be thought of as a kind of view or filter into the other classification.
It is actually rather difficult to find infomorphisms between arbitrary classifications; in many cases there will be none. This is as it should be: if infomorphisms were too easy to come by, the morphism would not be particularly meaningful, and if the requirements were too stringent, it would not be very applicable. However, two classifications may always be joined in a fairly standard way. For example, we can add them together:
Def 3. Given two classifications A and B, the sum of A and B is the classification A+B such that:
- \( tok(A + B)=tok(A)\times tok(B)\),
- \( typ(A + B)\) is the disjoint union of \( typ(A)\) and \( typ(B)\), given by \( \langle 0,\alpha \rangle\) for each type \( \alpha \in typ(A)\) and \( \langle 1,\beta \rangle\) for each type \( \beta \in typ(B)\), and
- for each token \( \langle a,b\rangle \in tok(A+B)\), \( \langle a,b \rangle \vDash_{A+B} \langle 0,\alpha \rangle\) iff \( a \vDash_{A} \alpha\), and \( \langle a,b\rangle \vDash_{A+B} \langle 1,\beta \rangle\) iff \( b \vDash_{B} \beta\).
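Def 3 translates directly into the sketch (tagging with 0 and 1 keeps the two sets of types disjoint; `sum_classification` is my name for it):

```python
from itertools import product

def sum_classification(A, B):
    """The sum A + B: pair tokens, tagged types, componentwise classification."""
    tokens = frozenset(product(A.tokens, B.tokens))
    types = frozenset({(0, a) for a in A.types} | {(1, b) for b in B.types})
    rel = frozenset(
        {((a, b), (0, alpha)) for (a, b) in tokens
         for alpha in A.types if A.classifies(a, alpha)} |
        {((a, b), (1, beta)) for (a, b) in tokens
         for beta in B.types if B.classifies(b, beta)}
    )
    return Classification(tokens, types, rel)

FF = sum_classification(fruit, fruit)   # summing the toy classification with itself
assert FF.classifies(("fruit1", "fruit2"), (0, "bitter"))   # fruit1 is bitter
```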
Remark. For any two classifications A and B there exist infomorphisms \( \varepsilon_{A} : A \rightleftarrows A+B\) and \( \varepsilon_{B} : B \rightleftarrows A+B\) defined such that \( \varepsilon_{A}^{\wedge}(\alpha) = \langle 0,\alpha \rangle\) and \( \varepsilon_{B}^{\wedge}(\beta) = \langle 1,\beta \rangle\) for all types \( \alpha \in typ(A)\) and \( \beta \in typ(B)\), and \( \varepsilon_{A}^{\vee}(\langle a,b\rangle) = a\) and \( \varepsilon_{B}^{\vee}(\langle a,b\rangle) = b\) for each token \( \langle a,b\rangle \in tok(A+B)\).
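These canonical infomorphisms pass the brute-force check from earlier (a quick sanity test on the toy data above):

```python
eps_A_up   = lambda alpha: (0, alpha)   # alpha  ->  <0, alpha>
eps_A_down = lambda pair: pair[0]       # <a,b>  ->  a
eps_B_up   = lambda beta: (1, beta)     # beta   ->  <1, beta>
eps_B_down = lambda pair: pair[1]       # <a,b>  ->  b

assert is_infomorphism(fruit, FF, eps_A_up, eps_A_down)
assert is_infomorphism(fruit, FF, eps_B_up, eps_B_down)
```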
To see how this is useful, we turn now to Barwise and Seligman's notion of an information channel.
Def 4. A channel is an indexed family of infomorphisms \( \{ f_{i} : A_{i} \rightleftarrows C \}_{i \in I}\), each having as its codomain a common classification C, called the core of the channel.
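To make Def 4 concrete, here is a toy binary channel in the same Python sketch, anticipating the flashlight example below. The core classifies whole flashlights; the particular tokens (f1, bulb_a, sw_1, and so on) are invented for illustration.

```python
# Peripheral classifications: bulbs and switches.
bulbs = Classification(
    tokens=frozenset({"bulb_a", "bulb_b"}),
    types=frozenset({"Lit", "Unlit"}),
    rel=frozenset({("bulb_a", "Lit"), ("bulb_b", "Unlit")}),
)
switches = Classification(
    tokens=frozenset({"sw_1", "sw_2"}),
    types=frozenset({"On", "Off"}),
    rel=frozenset({("sw_1", "On"), ("sw_2", "Off")}),
)

# The core: flashlight f1 contains bulb_a and sw_1; f2 contains bulb_b and sw_2.
flashlights = Classification(
    tokens=frozenset({"f1", "f2"}),
    types=frozenset({"Lit", "Unlit", "On", "Off"}),
    rel=frozenset({("f1", "Lit"), ("f1", "On"),
                   ("f2", "Unlit"), ("f2", "Off")}),
)

# The channel: an infomorphism from each peripheral classification into the core.
bulb_up,   bulb_down   = (lambda t: t), {"f1": "bulb_a", "f2": "bulb_b"}.get
switch_up, switch_down = (lambda t: t), {"f1": "sw_1", "f2": "sw_2"}.get

assert is_infomorphism(bulbs, flashlights, bulb_up, bulb_down)
assert is_infomorphism(switches, flashlights, switch_up, switch_down)
```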
As it turns out, in a result known as the Universal Mapping Property of Sums, given a binary channel \( C = \{ f : A \rightleftarrows C, g : B \rightleftarrows C \}\) and the infomorphisms \( \varepsilon_{A} : A \rightleftarrows A+B\) and \( \varepsilon_{B} : B \rightleftarrows A+B\), there is a unique infomorphism \( h : A+B \rightleftarrows C\) such that \( h \circ \varepsilon_{A} = f\) and \( h \circ \varepsilon_{B} = g\); that is, the evident triangular diagrams commute.
The result is general and can be applied to arbitrary channels and sums.
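In the running sketch, the mediating infomorphism sends a tagged type to its image under the corresponding peripheral infomorphism, and a core token to the pair of its components. We can check both the infomorphism property and the commuting triangles directly (the helper names `h_up` and `h_down` are mine):

```python
bs = sum_classification(bulbs, switches)   # the sum bulbs + switches

def h_up(tagged):
    """h^: <0, alpha> -> bulb_up(alpha), <1, beta> -> switch_up(beta)."""
    tag, t = tagged
    return bulb_up(t) if tag == 0 else switch_up(t)

def h_down(c):
    """h_v: a core token goes to the pair of its components."""
    return (bulb_down(c), switch_down(c))

assert is_infomorphism(bs, flashlights, h_up, h_down)
# The triangles commute: h . eps_A = f and h . eps_B = g, in both directions.
assert all(h_up(eps_A_up(t)) == bulb_up(t) for t in bulbs.types)
assert all(h_up(eps_B_up(t)) == switch_up(t) for t in switches.types)
assert all(eps_A_down(h_down(c)) == bulb_down(c) for c in flashlights.tokens)
assert all(eps_B_down(h_down(c)) == switch_down(c) for c in flashlights.tokens)
```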
I still haven't exactly shown how this is useful. To do that, we introduce some inference rules that can be used to reason from the periphery of a channel to its core and back again.
A sequent \( \langle \Gamma ,\Delta \rangle\) is a pair of sets of types. A sequent \( \langle \Gamma ,\Delta \rangle\) is a sequent of a classification \( A\) if all the types in \( \Gamma\) and \( \Delta\) are in \( typ(A)\).
Def 5. Given a classification \( A\), a token \( a \in tok(A)\) is said to satisfy a sequent \( \langle \Gamma, \Delta \rangle\) of \( A\) if, whenever \( a \vDash_{A} \alpha\) for every type \( \alpha \in \Gamma\), then \( a \vDash_{A} \alpha\) for some type \( \alpha \in \Delta\). If every \( a \in tok(A)\) satisfies \( \langle \Gamma, \Delta \rangle\), then we say that \( \Gamma\) entails \( \Delta\) in \( A\), written \( \Gamma \vdash_{A} \Delta\), and \( \langle \Gamma, \Delta \rangle\) is called a constraint of \( A\).
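Def 5 is again a brute-force check in the sketch. Note the conditional reading: a token that fails some type in \( \Gamma\) satisfies the sequent vacuously.

```python
def satisfies(A, a, gamma, delta):
    """Token a satisfies <Gamma, Delta> in A: if a is of every type in
    Gamma, then a is of some type in Delta."""
    if all(A.classifies(a, t) for t in gamma):
        return any(A.classifies(a, t) for t in delta)
    return True

def entails(A, gamma, delta):
    """Gamma |-_A Delta: every token of A satisfies <Gamma, Delta>."""
    return all(satisfies(A, a, gamma, delta) for a in A.tokens)

# On the toy switches: nothing is both On and Off, but On does not entail Off.
assert entails(switches, {"On", "Off"}, set())   # a constraint of switches
assert not entails(switches, {"On"}, {"Off"})    # sw_1 is On but not Off
```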
Barwise and Seligman introduce two inference rules: f-Intro and f-Elim. Given an infomorphism \( f : A \rightleftarrows C\) from a classification A to a classification C, write \( \Gamma^{f} = \{ f^{\wedge}(\alpha) : \alpha \in \Gamma \}\) for the image under \( f^{\wedge}\) of a set \( \Gamma \subseteq typ(A)\), and \( \Gamma^{-f} = \{ \alpha \in typ(A) : f^{\wedge}(\alpha) \in \Gamma \}\) for the preimage of a set \( \Gamma \subseteq typ(C)\). The rules are:
\( f\text{-Intro: } \frac{\Gamma^{-f} \vdash_{A} \Delta^{-f}}{\Gamma \vdash_{C} \Delta} \)

\( f\text{-Elim: } \frac{\Gamma^{f} \vdash_{C} \Delta^{f}}{\Gamma \vdash_{A} \Delta} \)
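In the sketch, \( \Gamma^{f}\) and \( \Gamma^{-f}\) are an image and a preimage under \( f^{\wedge}\), and f-Intro can be watched in action on the toy channel (the helper names are mine, and the example is my own, not the book's):

```python
def shift_up(f_up, gamma):
    """Gamma^f: the image of Gamma under f^."""
    return {f_up(t) for t in gamma}

def shift_down(A, f_up, gamma):
    """Gamma^{-f}: the types of A whose image under f^ lies in Gamma."""
    return {t for t in A.types if f_up(t) in gamma}

# f-Intro for the switches -> flashlights infomorphism: the premise
# Gamma^{-f} |-_switches Delta^{-f} licenses the conclusion in the core.
gamma, delta = {"On", "Off"}, set()
assert entails(switches, shift_down(switches, switch_up, gamma),
               shift_down(switches, switch_up, delta))
assert entails(flashlights, gamma, delta)   # guaranteed by f-Intro
```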
The two rules have different properties. f-Intro preserves validity but fails to preserve invalidity; f-Elim preserves invalidity but fails to preserve validity. The conclusion of f-Elim is, however, guaranteed to hold for those tokens a of A that are connected to the core, i.e. for which there is a token c of C with \( f^{\vee}(c) = a\).
Suppose then that we have a channel: at the core is a classification of flashlights, and at the periphery are classifications of bulbs and switches. We can take the sum of the classifications of bulbs and switches. We know that there are infomorphisms from these classifications into the sum (so this too makes up a channel), and using f-Intro, we know that any sequents of the classifications of bulbs and switches will still hold in the sum classification bulbs + switches. But note that since the sum classification pairs every bulb token with every switch token, sequents that properly hold only between connected bulbs and switches will not hold in it. Similarly, all the sequents holding in the classification bulbs + switches will hold in the core of the flashlight channel, but there will be constraints in the core (namely those holding between bulbs and switches) not holding in the sum classification bulbs + switches.
In brief: suppose that we know that a particular switch is in the On position, and that it is a constraint of switches that a switch's being in the On position precludes its being in the Off position. We can project this constraint into the core of the flashlight channel reliably. But in the core additional constraints hold (the ones we are interested in). Suppose that in the core of the channel there is a constraint that if a switch is On in a flashlight, then the bulb is Lit in that flashlight. We would like to know that because *this* switch is in the On position, a particular bulb will be Lit. How can we do it? Using f-Elim, we can pull the constraint of the core back to the sum classification. But note that this constraint is *not valid* in the sum classification: it fails for precisely those bulb/switch pairs that are not connected in the channel. In this way, we can reason from local information to a distant component of a system, but in so doing we lose the guarantee that our reasoning is valid: the conclusion is assured only for those tokens actually connected by the channel.
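Running the whole story through the toy model above: the core constraint "On entails Lit" pulls back along the mediating infomorphism h via h-Elim, and the pulled-back sequent fails exactly on the bulb/switch pairs that no actual flashlight connects. (The constraint and tokens are my toy data, not the book's.)

```python
# 1. 'On' entails 'Lit' is a constraint of the core.
assert entails(flashlights, {"On"}, {"Lit"})

# 2. Pull it back to the sum bulbs + switches (h-Elim).
gamma, delta = {(1, "On")}, {(0, "Lit")}
assert shift_up(h_up, gamma) == {"On"} and shift_up(h_up, delta) == {"Lit"}

# 3. The pulled-back sequent is NOT a constraint of the sum ...
assert not entails(bs, gamma, delta)

# 4. ... and its counterexamples are exactly the unconnected pairs.
connected = {h_down(c) for c in flashlights.tokens}  # pairs realized by a flashlight
failures = {p for p in bs.tokens if not satisfies(bs, p, gamma, delta)}
assert failures == {("bulb_b", "sw_1")} and failures.isdisjoint(connected)
```

The unlit bulb_b paired with the on switch sw_1 is exactly the exception the prose describes: no flashlight connects them, so the channel offers no guarantee about that pair.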
[1] Barwise, Jon, and Jerry Seligman. 1997. Information Flow: The Logic of Distributed Systems. Cambridge Tracts in Theoretical Computer Science 44. Cambridge: Cambridge University Press.