Re: Subject reduction fails in Java
Here's my take on the issue that Haruo et al. have raised.
It describes the method I currently use to tackle this kind of problem
in my theorem-prover-based verification of subject reduction
for Sophia and Susan's subset of Java (I'm sure others
have used the method before). I hope it's of use.
I'm away for the next week, so please be patient with follow-ups.
First a general point:
(a) Sooner or later we want to be able to reason about
dynamic semantics that do not have access to additional type
annotations.
That is, ultimately we would like to be able to formulate
arguments for the type correctness of "real" implementations of
languages, or at least close approximations to them. These do not
carry around much type information, and so further annotating
our runtime apparatus with types is, from my perspective, "cheating".
It is in a way potentially giving us theoreticians access to more
information than is really available to those poor
compiler writers. (A separate argument could be formulated to
demonstrate that we don't really use this information to modify
the course of a computation, but as it turns out we don't need the
annotations anyway).
[ Note I prefer to call the dynamic semantics the "runtime machine" to
emphasise that it must be an idealised interpreter, or, even better, a formal
description of a real implementation of, say, a JVM. ]
Now two specific points:
(b) In general, it is impossible to give *exact* types to terms that
occur in runtime machines. We should use a compatibility relation
instead.
(c) It is a mistake to assume that static typing rules
should also apply to *all* terms that appear at runtime.
That is, we should not expect the rules for the compatibility
relation to be just the rules for the static type system plus
widening.
Let's discuss (b) first. Typically a lot of information may be thrown
away when compiling to runtime terms (e.g. assembly code), and execution
itself may muddy the waters further. For example an "honest"
dynamic semantics for a JVM would use the same term to represent
the null object and the 0 integer (of the correct length): we
can't tell, by looking at the dynamic structures alone, what types
the terms have. Type erasure and other optimizations that occur in real
compilers pretty much guarantee there will be ambiguity in the
types of runtime terms.
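To make the ambiguity concrete, here is a small, hypothetical Java sketch of the point about null: the single runtime value null inhabits every reference type, so nothing in the runtime structure tells us which static type it came from (the class and variable names are mine, not from the original example).

```java
// Sketch: the runtime value null carries no type information, so the
// static type of a null-valued term cannot be recovered at runtime.
public class NullAmbiguity {
    public static void main(String[] args) {
        String s = null;
        Integer i = null;
        // Both variables hold the identical runtime representation:
        System.out.println(s == (Object) i);  // true
    }
}
```

(The same effect shows up with type erasure in later versions of Java: differently instantiated generic types share a single runtime class.)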
[ Aside: There is a similar problem with the runtime treatment of datatype
constructors in any faithful description of Standard
ML: in most implementations datatype constructors get mapped
to a very ambiguous integer tag! I believe (and forgive
me Myra if I'm wrong) that Myra van Inwegen hit a related
difficulty in her thesis [2], where she failed to use a compatibility
relation and instead tried to recover an exact type from insufficient
runtime information. ]
The solution is to come up with a compatibility relation
between types and runtime terms.
Thus,
-- to type runtime structures we use compatibility
-- to type statically we use exact types
So, to verify the type correctness of a runtime machine,
we must demonstrate the continuing compatibility
of a term (and its derivatives) with its static type.
Compatibility must simply be strict enough to ensure:
-- the machine doesn't get stuck (i.e. can always make a transition
unless the program terminates)
-- any outputs from the runtime machine are compatible with their
expected types.
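A minimal sketch of what such a compatibility check might look like on runtime values, assuming reference types are modelled by java.lang.Class (the helper `compatible` is hypothetical, not the relation used in the proofs):

```java
public class Compat {
    // Sketch: a runtime term is compatible with a reference type if it
    // is null (no type information survives at runtime) or its runtime
    // class widens to the given type.
    static boolean compatible(Object term, Class<?> type) {
        return term == null || type.isInstance(term);
    }

    public static void main(String[] args) {
        System.out.println(compatible("x", Object.class));                // String widens to Object
        System.out.println(compatible(null, String.class));               // null fits any reference type
        System.out.println(compatible(Integer.valueOf(1), String.class)); // false
    }
}
```

Note the relation is deliberately not a function: a single runtime term can be compatible with many types, which is exactly point (b).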
Applying this methodology to our two examples...
1. Conditionals
Say we have statically typed (b ? e1 : e2) to T, using the standard rule
b : boolean
e_1 : T_1
e_2 : T_2
(T_1 widens to T_2 and T_2=T)
or (T_2 widens to T_1 and T_1=T)
----------------------------------------
( b ? e_1 : e_2 ) : T
The compatibility (<:) rule for dynamic terms of this form will be:
b <: boolean
e_1 <: T_1
e_2 <: T_2
T_1 widens to T
T_2 widens to T
------------------------------------------------
( b ? e_1 : e_2 ) <: T
T is not uniquely determined by this rule and need not be minimal:
we just need to show that the type bounds derived during
static checking are not exceeded during reduction.
(The rule is like the one Haruo et al. suggest except without the explicit
annotation, which we can fill in as part of the subject reduction
proof, based on the static type judgment for the corresponding original
term).
Thus, to use the original example,
m (i, h) <: Object
certainly holds (the compatibility rule for terms containing
procedure calls will simply check compatibility of arguments and results
in the obvious way, since it has access to the types of the method "m")
as does
(b ? i : h) <: Object
because Object is a common supertype.
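For concreteness, a small sketch in Java itself (the method `pick` and its parameter types are hypothetical stand-ins for the i and h of the original example): neither branch type widens to the other, yet both are compatible with the common supertype Object, which is all the compatibility rule demands.

```java
public class CondCompat {
    // Sketch: Integer and String stand in for the two branch types.
    // Neither widens to the other, but both widen to Object, so the
    // conditional is compatible with Object.
    static Object pick(boolean b, Integer i, String h) {
        return b ? (Object) i : (Object) h;
    }

    public static void main(String[] args) {
        System.out.println(pick(true, Integer.valueOf(1), "h"));  // 1
        System.out.println(pick(false, Integer.valueOf(1), "h")); // h
    }
}
```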
Note it is a mistake to suppose that static typing rules should also apply
to all terms that appear at runtime: looser rules must be used for
the terms that appear along the way. This is as we would expect:
compiled code can do whatever it likes for a while, as long as all
paths eventually lead to a state where type sanity is restored; the
compatibility relation captures the invariant along the way.
[
This approach is roughly Sophia's (2.3) suggestion, applied to
the notion of compatibility, which makes the
statement of subject reduction more succinct:
If \Gamma |- e <: T, e is non-ground, and e ---> e', then either
1. \Gamma |- e' <: T, or
2. an exception was raised
(plus various invariants needed to make the proof go through...)
]
2. Array Assignment
In the Java static semantics the array rule is
arr : T[]
idx : INT
v : T'
T' widens_to T
-------------------------
arr[idx] := v : void
Of course a runtime typecheck is needed for array assignments because,
as execution progresses, "arr" might become narrower, as might "v", and
no one knows a priori what situation we'll end up in. So (presuming the
evaluation of "arr" and "v" actually terminates) we will arrive at the
point where
- "arr" has runtime-type T''[] (via a real runtime type annotation)
- "v" has runtime-type T'''
and we dynamically check that T''' widens to T''.
The question is: what terms appear along the way, and can we use the
static rule above to type them? The answer is no: as
execution progresses, "arr" might become too narrow for this rule,
even though "v" might subsequently reduce to something even narrower
so the runtime typecheck will pass.
Again the solution is not to add more type annotations, but to
weaken the rule for compatibility for runtime terms:
arr <: T[]
idx <: INT
v <: T'
-------------------------
arr[idx] := v <: void
We assert no relation between T and T' (actually we may need some
weak relation depending on the strength of the runtime checks: e.g.
if one of T and T' is a primitive type then they must be identical,
since no runtime typechecks are carried out for primitive type
array assignments).
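The runtime check in question is observable in Java today as ArrayStoreException; the following sketch (class and method names are mine) shows the dynamic check passing for a store that widens to the array's runtime element type and rejecting one that does not:

```java
public class ArrayStore {
    // Sketch: attempt a store into position 0 and report whether the
    // JVM's dynamic array-store check accepted it.
    static boolean tryStore(Object[] arr, Object v) {
        try {
            arr[0] = v;
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Object[] arr = new String[1];  // static type Object[], runtime type String[]
        System.out.println(tryStore(arr, "ok"));                // true: String widens to String
        System.out.println(tryStore(arr, Integer.valueOf(42))); // false: Integer does not widen to String
    }
}
```

Both calls are statically well-typed against the weakened rule; only the dynamic check distinguishes them.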
Finally, note that we do sometimes need a notion of "exact typing"
for runtime terms, when reducing the left hand side of an assignment.
In Java we must prove that the "exact types" of field accesses and local
variable mutable cells are maintained by execution prior to
dereferencing, because there are no runtime checks for these.
Cheers,
Don
-----------------------------------------------------------------------------
At the lab:
The Computer Laboratory
New Museums Site
University of Cambridge
CB2 3QG, Cambridge, UK
Ph: +44 (0) 1223 334688

At home:
Trinity Hall
CB2 1TJ, Cambridge, UK
Ph: +44 (0) 1223 563783
http://www.cl.cam.ac.uk/users/drs1004/
email: drs1004@cl.cam.ac.uk
-----------------------------------------------------------------------------