Re: Subject reduction fails in Java
Here's my take on the issue that Haruo et al. have raised.
It describes the method I currently use to tackle this kind of problem
in my theorem-prover-based verification of subject reduction
for Sophia and Susan's subset of Java (I'm sure others
have used the method before). I hope it's of use.
I'm away for the next week, so please be patient with follow-ups.
First a general point:
(a) Sooner or later we want to be able to reason about
dynamic semantics that do not have access to additional type
annotations.
That is, ultimately we would like to be able to formulate
arguments for the type correctness of "real" implementations of
languages, or at least close approximations to them. These do not
carry around much type information, and so further annotating
our runtime apparatus with types is, from my perspective, "cheating".
It is in a way potentially giving us theoreticians access to more
information than is really available to those poor
compiler writers. (A separate argument could be formulated to
demonstrate that we don't really use this information to modify
the course of a computation, but as it turns out we don't need the
annotations anyway).
[ Note I prefer to call the dynamic semantics the "runtime machine" to
emphasise that it must be an idealised interpreter, or, even better, a formal
description of a real implementation of, say, a JVM. ]
Now two specific points:
(b) In general, it is impossible to give *exact* types to terms that
occur in runtime machines. We should use a compatibility relation
instead.
(c) It is a mistake to assume that static typing rules
should also apply to *all* terms that appear at runtime.
That is, we should not expect the rules for the compatibility
relation to be just the rules for the static type system plus
widening.
Let's discuss (b) first. Typically a lot of information may be thrown
away when compiling to runtime terms (e.g. assembly code), and execution
itself may muddy the waters further. For example an "honest"
dynamic semantics for a JVM would use the same term to represent
the null object and the 0 integer (of the correct length): we
can't tell, by looking at the dynamic structures alone, what types
the terms have. Type erasure and other optimizations that occur in real
compilers pretty much guarantee there will be ambiguity in the
types of runtime terms.
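To make the ambiguity concrete, here is a small, hypothetical Java sketch of the point about null: the single runtime value null inhabits every reference type, so nothing in the runtime structure tells us which static type it came from (the class and variable names are mine, not from the original example).

```java
// Sketch: the runtime value null carries no type information, so the
// static type of a null-valued term cannot be recovered at runtime.
public class NullAmbiguity {
    public static void main(String[] args) {
        String s = null;
        Integer i = null;
        // Both variables hold the identical runtime representation:
        System.out.println(s == (Object) i);  // true
    }
}
```

(The same effect shows up with type erasure in later versions of Java: differently instantiated generic types share a single runtime class.)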
[ Aside: There is a similar problem with the runtime treatment of datatype
constructors in any faithful description of Standard
ML: in most implementations datatype constructors get mapped
to a very ambiguous integer tag! I believe (and forgive
me Myra if I'm wrong) that Myra van Inwegen hit a related
difficulty in her thesis [2], where she failed to use a compatibility
relation and instead tried to recover an exact type from insufficient
runtime information. ]
The solution is to come up with a compatibility relation
between types and runtime terms.
Thus,
-- to type runtime structures we use compatibility
-- to type statically we use exact types
So, to verify the type correctness of a runtime machine,
we must demonstrate the continuing compatibility
of a term (and its derivatives) with its static type.
Compatibility must simply be strict enough to ensure:
-- the machine doesn't get stuck (i.e. can always make a transition
unless the program terminates)
-- any outputs from the runtime machine are compatible with their
expected types.
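A minimal sketch of what such a compatibility check might look like on runtime values, assuming reference types are modelled by java.lang.Class (the helper `compatible` is hypothetical, not the relation used in the proofs):

```java
public class Compat {
    // Sketch: a runtime term is compatible with a reference type if it
    // is null (no type information survives at runtime) or its runtime
    // class widens to the given type.
    static boolean compatible(Object term, Class<?> type) {
        return term == null || type.isInstance(term);
    }

    public static void main(String[] args) {
        System.out.println(compatible("x", Object.class));                // String widens to Object
        System.out.println(compatible(null, String.class));               // null fits any reference type
        System.out.println(compatible(Integer.valueOf(1), String.class)); // false
    }
}
```

Note the relation is deliberately not a function: a single runtime term can be compatible with many types, which is exactly point (b).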
Applying this methodology to our two examples...
1. Conditionals
Say we have statically typed (b ? e1 : e2) to T, using the standard rule
b : boolean
e_1 : T_1
e_2 : T_2
(T_1 widens to T_2 and T_2=T)
or (T_2 widens to T_1 and T_1=T)
----------------------------------------
( b ? e_1 : e_2 ) : T
The compatibility (<:) rule for dynamic terms of this form will be:
b <: boolean
e_1 <: T_1
e_2 <: T_2
T_1 widens to T
T_2 widens to T
------------------------------------------------
( b ? e_1 : e_2 ) <: T
T is not uniquely determined by this rule and need not be minimal:
we just need to show that the type bounds derived during
static checking are not exceeded during reduction.
(The rule is like the one Haruo et al. suggest except without the explicit
annotation, which we can fill in as part of the subject reduction
proof, based on the static type judgment for the corresponding original
term).
Thus, to use the original example,
m (i, h) <: Object
certainly holds (the compatibility rule for terms containing
procedure calls will simply check compatibility of arguments and results
in the obvious way, since it has access to the types of the method "m")
as does
(b ? i : h) <: Object
because Object is a common supertype.
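For concreteness, a small sketch in Java itself (the method `pick` and its parameter types are hypothetical stand-ins for the i and h of the original example): neither branch type widens to the other, yet both are compatible with the common supertype Object, which is all the compatibility rule demands.

```java
public class CondCompat {
    // Sketch: Integer and String stand in for the two branch types.
    // Neither widens to the other, but both widen to Object, so the
    // conditional is compatible with Object.
    static Object pick(boolean b, Integer i, String h) {
        return b ? (Object) i : (Object) h;
    }

    public static void main(String[] args) {
        System.out.println(pick(true, Integer.valueOf(1), "h"));  // 1
        System.out.println(pick(false, Integer.valueOf(1), "h")); // h
    }
}
```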
Note it is a mistake to suppose that static typing rules should also apply
to all terms that appear at runtime: looser rules must be used for
the terms that appear along the way. This is as we would expect:
compiled code can do whatever it likes for a while, as long as all
paths eventually lead to a state where type sanity is restored; the
compatibility relation captures the invariant along the way.
[
This approach is roughly Sophia's (2.3) suggestion, applied to
the notion of compatibility, which makes the
statement of subject reduction more succinct:
If \Gamma |- e <: T, e is non-ground, and e ---> e', then either
1. \Gamma |- e' <: T, or
2. an exception was raised
(plus various invariants needed to make the proof go through...)
]
2. Array Assignment
In the Java static semantics the array rule is
arr : T[]
idx : INT
v : T'
T' widens_to T
-------------------------
arr[idx] := v : void
Of course a runtime typecheck is needed for array assignments because,
as execution progresses, "arr" might become narrower, as might "v", and
no one knows a priori what situation we'll end up in. So (presuming the
evaluation of "arr" and "v" actually terminates) we will arrive at the
point where
- "arr" has runtime-type T''[] (via a real runtime type annotation)
- "v" has runtime-type T'''
and we dynamically check that T''' widens to T''.
The question is: what terms appear along the way, and can we use the
static rule above to type them? The answer is no: as
execution progresses, "arr" might become too narrow for this rule,
even though "v" might subsequently reduce to something even narrower
so the runtime typecheck will pass.
Again the solution is not to add more type annotations, but to
weaken the rule for compatibility for runtime terms:
arr <: T[]
idx <: INT
v <: T'
-------------------------
arr[idx] := v <: void
We assert no relation between T and T' (actually we may need some
weak relation depending on the strength of the runtime checks: e.g.
if one of T and T' is a primitive type then they must be identical,
since no runtime typechecks are carried out for primitive type
array assignments).
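The runtime check in question is observable in Java today as ArrayStoreException; the following sketch (class and method names are mine) shows the dynamic check passing for a store that widens to the array's runtime element type and rejecting one that does not:

```java
public class ArrayStore {
    // Sketch: attempt a store into position 0 and report whether the
    // JVM's dynamic array-store check accepted it.
    static boolean tryStore(Object[] arr, Object v) {
        try {
            arr[0] = v;
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Object[] arr = new String[1];  // static type Object[], runtime type String[]
        System.out.println(tryStore(arr, "ok"));                // true: String widens to String
        System.out.println(tryStore(arr, Integer.valueOf(42))); // false: Integer does not widen to String
    }
}
```

Both calls are statically well-typed against the weakened rule; only the dynamic check distinguishes them.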
Finally, note that we do sometimes need a notion of "exact typing"
for runtime terms, when reducing the left hand side of an assignment.
In Java we must prove that the "exact types" of field accesses and local
variable mutable cells are maintained by execution prior to
dereferencing, because there are no runtime checks for these.
Cheers,
Don
-----------------------------------------------------------------------------
At the lab:
The Computer Laboratory
New Museums Site
University of Cambridge
CB2 3QG, Cambridge, UK
Ph: +44 (0) 1223 334688

At home:
Trinity Hall
CB2 1TJ, Cambridge, UK
Ph: +44 (0) 1223 563783
http://www.cl.cam.ac.uk/users/drs1004/
email: drs1004@cl.cam.ac.uk
-----------------------------------------------------------------------------