`Linguistic annotation' is a term covering any transcription, translation or annotation of textual data or recorded linguistic signals. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focused on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of annotation formats and demonstrate a common conceptual core. This provides the foundation for an algebraic framework which encompasses the representation, archiving and query of linguistic annotations, while remaining consistent with many alternative file formats.
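The common conceptual core identified here can be pictured as labeled annotations anchored to points in the underlying signal. Below is a minimal sketch of one way such a logical structure might be represented; the class names and the query are ours, for illustration only, not the paper's notation.

```python
from dataclasses import dataclass
from typing import Optional

# Minimal sketch of a logical structure for linguistic annotation:
# labeled arcs anchored to time points in a signal. Illustrative only.

@dataclass(frozen=True)
class Node:
    id: str
    offset: Optional[float] = None   # time offset into the signal, if known

@dataclass(frozen=True)
class Arc:
    src: Node
    dst: Node
    tier: str    # annotation type, e.g. "word" or "phone"
    label: str   # the annotation content itself

n0, n1, n2 = Node("0", 0.00), Node("1", 0.32), Node("2", 0.55)
annotation = [Arc(n0, n1, "word", "hello"), Arc(n1, n2, "word", "world")]

# Queries become ordinary operations over the arc set, e.g. all words
# overlapping the interval [0.1, 0.4]:
words = [a.label for a in annotation
         if a.tier == "word" and a.src.offset < 0.4 and a.dst.offset > 0.1]
print(words)   # ['hello', 'world']
```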
Although photometric data is a readily available dense source of information in intensity images, it is not widely used in computer vision. A major drawback is its dependence on viewpoint and incident illumination. A novel methodology is presented which extracts reflectivity information of the various materials in the scene independent of incident light and scene geometry. A scene is captured under three different narrow-band color filters and the spectral derivatives of the scene are computed. The resulting spectral derivatives form a spectral gradient at each pixel. This spectral gradient is a surface reflectance descriptor which is invariant to scene geometry and incident illumination for smooth diffuse surfaces. The invariant properties of the spectral gradients make them a particularly appealing tool in many diverse areas of computer vision such as color constancy, tracking, scene classification, material classification, stereo correspondence, and even re-illumination of a scene.
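As a rough numerical sketch of the idea (our own simplification, not the paper's derivation): if each spectral derivative is approximated by a finite difference of log images taken through adjacent narrow-band filters, any shading factor shared by all bands cancels in the gradient.

```python
import numpy as np

# A rough sketch (our simplification, not the paper's derivation):
# under a multiplicative image model I_k = shading * reflectance_k,
# where the shading term is shared by all bands, differences of log
# images across adjacent narrow bands cancel the shading exactly.

def spectral_gradient(i1, i2, i3, eps=1e-9):
    """i1, i2, i3: images taken under three narrow-band filters."""
    l1, l2, l3 = (np.log(i + eps) for i in (i1, i2, i3))
    return np.stack([l2 - l1, l3 - l2], axis=-1)   # 2-vector per pixel

# Toy check: rescaling the shading (geometry/illumination strength)
# leaves the spectral gradient essentially unchanged.
rng = np.random.default_rng(0)
shading = rng.uniform(0.2, 1.0, (4, 4))
bands = [shading * r for r in (0.8, 0.5, 0.3)]     # per-band reflectance
g1 = spectral_gradient(*bands)
g2 = spectral_gradient(*(2.0 * b for b in bands))  # brighter scene
assert np.allclose(g1, g2, atol=1e-6)
```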
Adaptive protocols are protocols which automatically adjust their behavior to runtime phenomena such as traffic or link characteristics. For such protocols, the behavioral adjustment is accomplished with some mechanism; the decision as to how much (if any) adjustment is needed is made under the control of a policy. The design and implementation of policies have often proven more challenging than those of mechanisms.
We make three contributions in this paper. First, we develop a rule of thumb for policy/mechanism separation and lay out a general set of challenges in policy module design. Second, as an illustration, we analyze a forward error correction (FEC) code for ATM supporting TCP/IP, and analytically identify a small, robust set of "tunable parameters" that delineate the regions where the code should be applied. Third, we exploit the analytic results and the policy/mechanism separation to implement an adaptive protocol for network errors, using programmable hardware to obtain high performance.
Measurements of TCP/IP traffic over a 155 Mbps ATM link augmented with two cooperating Programmable Protocol Processing Pipelines (P4s) were made with ttcp. The new policy module improves throughput by up to 50% over an unaugmented TCP/IP in the face of increasing bit error rates, and continues operation into a range of bit error rates where an unmodified TCP practically ceases to function.
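In its simplest form, a policy module of the kind described might consult a bit-error-rate estimate and enable the FEC mechanism only inside an analytically delineated operating region. The toy sketch below illustrates the policy/mechanism split; the threshold values are invented for illustration, not taken from the paper's analysis.

```python
# Toy sketch of policy/mechanism separation for adaptive FEC.
# Threshold values are invented; the paper derives the actual
# operating regions analytically.

class FecMechanism:
    """Mechanism: knows *how* to encode, not *when*."""
    def __init__(self):
        self.enabled = False
    def set_enabled(self, on: bool):
        self.enabled = on

class FecPolicy:
    """Policy: decides *when* to apply the mechanism, from a BER estimate."""
    BER_ON = 1e-6    # enable FEC above this bit error rate (illustrative)
    BER_OFF = 1e-7   # disable below this; hysteresis avoids oscillation
    def __init__(self, mech: FecMechanism):
        self.mech = mech
    def observe(self, ber: float):
        if ber >= self.BER_ON:
            self.mech.set_enabled(True)
        elif ber <= self.BER_OFF:
            self.mech.set_enabled(False)

mech, policy = FecMechanism(), FecPolicy(mech)
for ber in (1e-9, 5e-6, 5e-7, 5e-8):
    policy.observe(ber)
    print(f"BER={ber:.0e} -> FEC {'on' if mech.enabled else 'off'}")
```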
Lack of security is a major threat to ``Active Networking,'' as programmability creates numerous opportunities for mischief. The point at which programmability is exposed, e.g. through the loading of code into network elements, must therefore be carefully crafted to ensure security.
This paper makes two contributions. First, it describes the implementation of a solution, the Secure Active Network Environment (SANE), which is intended to operate on an active network router. The SANE architecture provides a secure bootstrap process, which includes cryptographic certificate exchange and results in execution of a module loader for introducing new code, as well as a packet execution environment. SANE thus permits a direct comparison of security implications of active packets (such as ``capsules'') with active extensions (used for ``flows'' of packets).
The second contribution of the paper is a performance study using a combination of execution traces and end-to-end throughput measurements. The example code performs an ``active ping'' and allows us to break down costs into categories such as authentication. In our SANE implementation on 533 MHz Alpha PCs, securing active packets effectively increases the time required to process a packet by a third. This result implies that the majority of packets must remain unauthenticated in high-performance active networking solutions. We discuss some solutions which preserve security.
Proposals for programmable network infrastructures, such as Active Networks and Open Signaling, provide programmers with access to network resources and data structures. The motivation for providing these interfaces is accelerated introduction of new services, but exposure of the interfaces introduces many new security risks. The risks can be reduced or eliminated via appropriate restrictions on the exported interfaces, as we demonstrate in our Secure Active Network Environment (SANE). SANE restricts the actions that loaded modules (including ``capsules'') can perform by restricting the resources that can be named; this model is extended to remote invocation by means of cryptographic certificates.
We have extended SANE to support restricted control of Quality of Service in a programmable network element. The Piglet lightweight device kernel provides a ``virtual clock''-like scheduling discipline for network traffic, and exports several tuning knobs with which the clock can be adjusted. The ALIEN active loader provides safe access to these knobs to modules which operate on the network element. Thus, the SQoSH architecture is able to provide safe, secure access to network resources, while allowing these resources to be managed by end users needing customized networking services. A desirable consequence of SQoSH's integration of access control and resource control is that a large class of denial of service attacks, unaddressed solely with access control and cryptographic protocols, can now be prevented.
We provide some performance measurements to illustrate the cost of security, and demonstrate that these costs are minor in the context of managing a multimedia stream.
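A ``virtual clock''-style discipline of the kind Piglet exports can be sketched compactly: each flow's clock advances by the packet size divided by the flow's allocated rate, and the packet with the smallest stamp is transmitted first. The per-flow rates below play the role of the tuning knobs; this is a simplified illustration, not Piglet's actual interface.

```python
import heapq

# Simplified sketch of a ``virtual clock''-style scheduler: each flow's
# clock advances by bytes/rate per packet, and the packet with the
# smallest stamp is sent first. The per-flow rates play the role of
# exported tuning knobs; this is an illustration, not Piglet's API.

class VirtualClockScheduler:
    def __init__(self, rates):           # rates: flow id -> bytes/sec
        self.rates = dict(rates)
        self.clocks = {f: 0.0 for f in rates}
        self.queue = []                  # (stamp, seq, flow, packet)
        self.seq = 0
    def enqueue(self, flow, packet):
        self.clocks[flow] += len(packet) / self.rates[flow]
        heapq.heappush(self.queue, (self.clocks[flow], self.seq, flow, packet))
        self.seq += 1
    def dequeue(self):
        _, _, flow, packet = heapq.heappop(self.queue)
        return flow, packet

sched = VirtualClockScheduler({"video": 8000, "best-effort": 1000})
for i in range(3):
    sched.enqueue("video", b"v" * 800)
    sched.enqueue("best-effort", b"b" * 800)
while sched.queue:
    print(sched.dequeue()[0])   # the high-rate flow dominates the schedule
```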
We begin the paper with an outline of the general properties of FPGAs that create risks. We then explain how to exploit these risks, and demonstrate through directed experiments that they are exploitable even in the absence of detailed layout information. We prove our point by demonstrating the first known FPGA virus and its effect on the current absorbed by the device: the device is destroyed. We close by outlining possible methods of defense and by pointing out the similarities and differences between FPGA and software viruses.
Any large scale security architecture that uses certificates to provide security in a distributed system will need some automated support for moving certificates around in the network. We believe that for efficiency, this automated support should be tied closely to the consumer of the certificates: the policy verifier. As a proof of concept, we have built QCM, a prototype policy language and verifier that can direct a retrieval mechanism to obtain certificates from the network. Like previous verifiers, QCM takes a policy and certificates supplied by a requester and determines whether the policy is satisfied. Unlike previous verifiers, QCM can take further action if the policy is not satisfied: QCM can examine the policy to decide what certificates might help satisfy it and obtain them from remote servers on behalf of the requester. This takes place automatically, without intervention by the requester; there is no additional burden placed on the requester or the policy writer for the retrieval service we provide. We present examples that show how our technique greatly simplifies certificate-based secure applications ranging from key distribution to ratings systems, and that QCM policies are simple to write. We describe our implementation, and illustrate the operation of the prototype.
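The control flow QCM automates can be caricatured in a few lines: verify against the certificates at hand, then fetch exactly what the policy still lacks. The representation below (policies and certificates as plain sets of assertions, with a fetch callback standing in for remote servers) is ours and far simpler than QCM's language.

```python
# Caricature of verifier-directed certificate retrieval. Policies here
# are just sets of required assertions; QCM's actual language is far
# richer. fetch() stands in for a remote certificate server.

def verify(policy, certs, fetch):
    """Return True iff every assertion in `policy` is certified,
    fetching missing certificates on the requester's behalf."""
    have = set(certs)
    missing = policy - have
    if missing:
        have |= fetch(missing)       # directed retrieval: ask only for
    return policy <= have            # what the policy still needs

server = {("alice", "member"), ("alice", "key-ok")}
def fetch(needed):
    print("fetching:", needed)
    return {c for c in needed if c in server}

policy = {("alice", "member"), ("alice", "key-ok")}
print(verify(policy, {("alice", "member")}, fetch))   # True, one fetch
```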
Virtual environment (VE) designs have evolved from text-based to immersive graphical systems. The next logical step of this evolution is to have a fully immersive environment in which thousands of widely distributed users will be able to move around and interact. This requires a VE architecture that can scale well for a large number of participants while providing the necessary support for quality of service, security and flexibility. Current VE architectures are unable to fully meet these requirements and a new network/protocol architecture is needed.
The VENUS approach addresses these problems by creating a network architecture which is scalable and flexible. We define a new architecture consisting of a transmit-only satellite/server and bi-directional links which will be capable of sustaining a wide-area virtual environment. We then present preliminary results from our experiments.
While the topic of object-oriented databases is by now fairly mature -- a standard for data definition and query languages has been developed -- the question of updates has largely been ignored. Updates on object-oriented databases are relegated to methods, which are written in third-generation languages such as C++ and therefore appear as ``black boxes'' to the query processor. This poses two problems: First, it is difficult to reason about whether an update is non-deterministic. Non-determinism arises due to iteration over sets and potential sharing between elements of the set. Second, these updates are system-specific and cannot be optimized. We propose an optimizable language for updating object-oriented databases called OQL+. The look and feel of OQL+ is similar to OQL, with the addition of insert, update and delete primitives. OQL+ is expressive enough to capture most updates that users might want to perform on an object-oriented database while maintaining the simplicity of extensions for updates as in SQL. We then discuss the problem of non-deterministic updates in the language, and give a proof technique for detecting potential non-determinism.
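The non-determinism problem is easy to reproduce outside any database: when an update iterates over a collection whose elements share state, the final result can depend on visit order. A small sketch of the phenomenon (plain Python, not OQL+ syntax):

```python
# Iterating over a collection whose elements share state makes an
# update order-dependent. Both elements below refer to the same
# Account object, and the update both reads and writes the shared
# balance. (A list is used so both visit orders can be exhibited; in
# the database setting the collection is a set with no defined order.)

class Account:
    def __init__(self, balance):
        self.balance = balance

shared = Account(100)
elems = [("double", shared), ("add10", shared)]   # sharing!

def apply_update(order):
    shared.balance = 100                          # reset, then update
    for op, acct in order:
        if op == "double":
            acct.balance = acct.balance * 2
        else:
            acct.balance = acct.balance + 10
    return shared.balance

print(apply_update(elems))         # (100 * 2) + 10 = 210
print(apply_update(elems[::-1]))   # (100 + 10) * 2 = 220
```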
Right ventricular (RV) dysfunction can serve as an indicator of heart and lung disease and can adversely affect the left ventricle (LV). However, normal RV function must be characterized before abnormal states can be detected. We describe a method for reconstructing the 3D motion of the RV by fitting a deformable model to tag and contour data extracted from multiview tagged magnetic resonance images (MRI). The deformable model is a biventricular finite element mesh built directly from the contours. Our approach accommodates the geometrically complex RV by using the entire lengths of the tags, localized degrees of freedom (DOFs), and finite elements for geometric modeling. We convert the results of the reconstruction into potentially useful motion variables, such as strains and displacements. The fitting technique is applied to synthetic data, two normal hearts, and a heart with right ventricular hypertrophy (RVH). The results in this paper are limited to the RV free wall and septum. We find noticeable differences between the motion variables calculated for the normal volunteers and the RVH patient.
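Once a displacement field has been fitted, strain extraction is a standard continuum-mechanics step. As a generic sketch (not the paper's FEM pipeline), the Green-Lagrange strain follows directly from the deformation gradient:

```python
import numpy as np

# Generic strain computation from a fitted displacement field (a
# standard continuum-mechanics step, not the paper's specific code).
# F = I + du/dX is the deformation gradient; E = (F^T F - I)/2 is the
# Green-Lagrange strain tensor.

def green_lagrange_strain(grad_u):
    F = np.eye(3) + grad_u
    return 0.5 * (F.T @ F - np.eye(3))

# Toy displacement gradient: 5% stretch along x, small xy shear.
grad_u = np.array([[0.05, 0.02, 0.0],
                   [0.0,  0.0,  0.0],
                   [0.0,  0.0,  0.0]])
E = green_lagrange_strain(grad_u)
print(np.round(E, 4))   # E[0,0] ~ 0.0513 reflects the x stretch
```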
When doing high field (1.5T) magnetic resonance breast imaging, the use of a compression plate during imaging after a contrast-agent injection may critically change the enhancement characteristics of the tumor, making the tracking of its boundaries very difficult. A new method for clinical breast biopsy is presented, based on a deformable finite element model of the breast. The geometry of the model is constructed from MR data, and its mechanical properties are based on a non-linear material model. This method allows imaging the breast without compression before the procedure, then compressing the breast and using the finite element model to predict the tumor's position. The axial breast contours and the segmented slices are ported to a custom-written MR-image contour analysis program, which generates a finite element model (FEM) input file readable by a commercial FEM software. A deformable silicone gel phantom was built to study the movement of an inclusion inside a deformable environment. The hyperelastic properties of the phantom materials were evaluated on an Instron Model 1331 mechanical testing machine. The phantom was placed in a custom-built pressure device, where a pressure plate caused a 14% (9.8mm) compression. The phantom was imaged in a 1.5T magnet (axial and coronal), in the undeformed and deformed states. An FEM of the phantom was built using the custom-written software from the MR data, and another FEM of the phantom was built using a commercial pre-processor from the phantom's directly measured dimensions. The displacements of the inclusion center and its boundaries were calculated, both from the experimental and FEM results. The calculated displacements from both models are within 0.5mm of each other, and agree within 1.0mm with the experimental results. This difference is within the imaging error.
Active Networks promise greater flexibility than current networks, but threaten safety and security by virtue of their programmability. In this paper, we describe the design and implementation of a security architecture for the active network PLANet. Security is obtained with a two-level architecture that combines a functionally restricted packet language, PLAN, with an environment of general-purpose service routines governed by trust management. In particular, we employ a technique which expands or contracts a packet's service environment based on its level of privilege, termed namespace-based security. As an application of our security architecture, we present the design and implementation of an active-network firewall. We find that the addition of the firewall imposes an approximately 34% latency overhead and as little as a 6.7% space overhead to incoming packets.
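Namespace-based security amounts to resolving a packet's service calls against a privilege-dependent table: an unprivileged packet cannot even name a restricted routine. A toy rendering (the service names and two-level privilege model are invented for illustration):

```python
# Toy rendering of namespace-based security: a packet's privilege level
# determines which service routines it can even name. The service names
# and two-level privilege model are invented for illustration.

CORE = {"get_src": lambda: "10.0.0.1", "get_ttl": lambda: 16}
PRIVILEGED = {**CORE, "update_route": lambda: "route table updated"}

def environment(privileged: bool):
    """Expand or contract the packet's service namespace by privilege."""
    return PRIVILEGED if privileged else CORE

def run_service(privileged, name):
    env = environment(privileged)
    if name not in env:
        raise NameError(f"service {name!r} not in this packet's namespace")
    return env[name]()

print(run_service(True, "update_route"))    # privileged packet: allowed
print(run_service(False, "get_ttl"))        # core service: allowed
try:
    run_service(False, "update_route")      # firewall-style denial
except NameError as e:
    print("denied:", e)
```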
Chunks are a programming construct in PLAN, the Packet Language for Active Networks, comprising a code segment and a suspended function call. In PLAN, chunks provide support for encapsulation and other packet programming techniques. This paper begins by explaining the semantics and implementation of chunks. We proceed, using several PLAN source code examples, to demonstrate the usefulness of chunks for micro-protocols, for asynchronous adaptation, and as units of authentication granularity.
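The essence of a chunk, a code segment paired with a suspended call into it, can be modeled directly. The sketch below is our reading of that definition in Python, not PLAN's concrete syntax:

```python
from dataclasses import dataclass

# A chunk pairs a code segment with a suspended function call: the call
# is named and its arguments are bound, but it does not run until the
# chunk is explicitly evaluated (e.g., at another node). Our reading of
# the definition, not PLAN's concrete syntax.

@dataclass
class Chunk:
    code: dict          # code segment: name -> function
    entry: str          # name of the suspended call
    args: tuple         # arguments already bound to it
    def eval(self):
        return self.code[self.entry](*self.args)

code = {"ping_reply": lambda src: f"reply to {src}"}
c = Chunk(code, "ping_reply", ("10.0.0.1",))
# The chunk can be encapsulated in a packet, shipped, and only then
# evaluated at its destination:
print(c.eval())   # 'reply to 10.0.0.1'
```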
This paper addresses the basic principles, performance measures and applications associated with laser scanner technologies. The objective of this report is to communicate and disseminate pertinent information related to state-of-the-art laser measurement systems that are currently available through commercial and research means. This paper is intended to serve two purposes: (1) as a basic tutorial on laser scanning technology and (2) as a guide to current manufacturers and researchers of this technology.
A partial type inference technique should come with a simple and precise specification, so that users can predict its behavior and understand the error messages it produces. Local type inference techniques attain this simplicity by inferring missing type information only from the types of adjacent syntax nodes, without using global mechanisms such as unification variables. The paper reports on our experience with programming in a full-featured programming language including higher-order polymorphism, subtyping, parametric datatypes, and local type inference. On the positive side, our experiments on several nontrivial examples confirm previous hopes for the practicality of the type inference method. On the negative side, some proposed extensions mitigating known expressiveness problems turn out to be unsatisfactory on close examination.
This paper investigates the problem of recovering information about the configuration of an articulated object, such as a human figure, from point correspondences in a single image. Unlike previous approaches, the proposed reconstruction method does not assume that the imagery was acquired with a calibrated camera. An analysis is presented which demonstrates that there is a family of solutions to this reconstruction problem, parameterized by a single variable. A simple and effective algorithm is proposed for recovering the entire set of solutions by considering the foreshortening of the segments of the model in the image. Results obtained by applying this algorithm to real images are presented.
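Under scaled orthographic projection the foreshortening argument takes a simple closed form: a segment of known length L whose image projection spans (du, dv) has relative depth dZ = +/- sqrt(L^2 - (du^2 + dv^2)/s^2), leaving the scale s as the single free parameter plus a per-segment sign ambiguity. A sketch of the arithmetic (our formulation of the standard projection model, not the paper's notation):

```python
import math

# Under scaled orthographic projection with scale s, a segment of known
# length L whose image projection spans (du, dv) has relative depth
#   dZ = +/- sqrt(L**2 - (du**2 + dv**2) / s**2),
# so the reconstruction is a one-parameter family in s, with a binary
# sign ambiguity per segment. Our formulation of the standard model.

def relative_depth(L, du, dv, s):
    foreshortened = (du * du + dv * dv) / (s * s)
    if foreshortened > L * L:
        raise ValueError("scale s too small for this segment")
    return math.sqrt(L * L - foreshortened)

# The same image segment under two candidate scales:
for s in (1.0, 1.5):
    dz = relative_depth(L=0.5, du=0.3, dv=0.2, s=s)
    print(f"s={s}: dZ = +/-{dz:.3f}")
```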
TinkerType is a framework for compact and modular description of formal systems (type systems, operational semantics, logics, etc.). A family of related systems is broken down into a set of CLAUSES---individual inference rules---and a set of FEATURES controlling the inclusion of clauses in particular systems. Dependency relations on both clauses and features and a simple form of ``judgement signatures'' are used to check the consistency of the generated systems.
As an application, we present a substantial repository of typed lambda-calculi, including systems with subtyping, polymorphism, type operators and kinding, computational effects, and dependent and recursive types. The repository describes both declarative and algorithmic aspects of the systems, and can be used with our tool, the TinkerType Assembler, to generate calculi either in the form of typeset collections of inference rules or as executable ML typecheckers.
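The core assembly step can be pictured as selecting, for each system, exactly those clauses whose feature requirements are covered by the chosen feature set. A toy version (the clause and feature names are invented; TinkerType's real checks also involve the dependency relations and judgement signatures described above):

```python
# Toy version of feature-driven assembly: each clause carries the set
# of features under which it applies, and a system is generated by
# selecting the clauses compatible with a chosen feature set. The
# clause and feature names are invented for illustration.

CLAUSES = [
    ("T-Var",  frozenset()),                 # always included
    ("T-Abs",  frozenset()),
    ("T-Sub",  frozenset({"subtyping"})),
    ("T-TAbs", frozenset({"polymorphism"})),
    ("S-Top",  frozenset({"subtyping"})),
]

def assemble(features):
    """Keep a clause iff all the features it requires are selected."""
    return [name for name, needs in CLAUSES if needs <= features]

print(assemble(frozenset()))                    # simply typed core
print(assemble(frozenset({"subtyping"})))       # adds T-Sub, S-Top
print(assemble(frozenset({"subtyping", "polymorphism"})))
```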
In many ways, the Web has become the largest knowledge base known to us. The problem facing the user now is not that the information he seeks is not available, but that it is not easy for him to extract exactly what he needs from what is available. It is also becoming clear that a top-down approach of gathering all the information and structuring it will not work, except in some special cases. Indeed, most of the information is present in HTML documents structured only for visual content. Instead, new tools are being developed that attack this problem from a different angle. XML is a language that allows the publisher of the data to structure it using markup tags. These markup tags clarify not only the visual structure of the document, but also the semantic structure. Additionally, one can make use of a query language, XML-QL, to query XML pages for information and to merge information from disparate XML sources. However, most of the content of the web is published in HTML. The W4F system allows us to construct wrappers that retrieve web pages, extract desired information using the HTML structure and regular expression search, and map it automatically to XML with its XML-Gateway feature. In this thesis, we investigate the W4F/XML-QL paradigm to query the web. Two examples are presented. The first is the Internet Movie Database, which we query with the idea of understanding the power of these systems. The second is the NCBI BLAST server, which is a suite of programs for biomolecular sequence analysis. We demonstrate that there are real life instances where this paradigm promises to be extremely useful.
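The wrapper paradigm itself is easy to demonstrate: extract fields from HTML using structure-aware patterns and re-emit them as XML. The snippet below is a toy stand-in for what W4F's extraction rules and XML-Gateway automate; the patterns, data and output tags are ours.

```python
import re

# Toy stand-in for the wrapper paradigm: extract fields from HTML via
# regular expressions and re-emit them as XML. W4F's extraction rules
# and XML-Gateway automate this; patterns, data and tags here are ours.

html = """<table>
<tr><td class="title">Vertigo</td><td class="year">1958</td></tr>
<tr><td class="title">Rear Window</td><td class="year">1954</td></tr>
</table>"""

rows = re.findall(
    r'<td class="title">(.*?)</td><td class="year">(\d{4})</td>', html)

xml = "<movies>\n" + "".join(
    f"  <movie><title>{t}</title><year>{y}</year></movie>\n"
    for t, y in rows) + "</movies>"
print(xml)
# The XML result can then be queried, or merged with other sources,
# using an XML query language such as XML-QL.
```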
Inner classes in object-oriented languages play a role similar to nested function definitions in functional languages, allowing an object to export other objects with direct access to its own methods and instance variables. However, the similarity is deceptive: a close look at inner classes reveals significant subtleties arising from their interactions with inheritance.
The goal of this work is a precise understanding of the essential features of inner classes; our object of study is a fragment of Java with inner classes and inheritance (and almost nothing else). We begin by giving a direct reduction semantics for this language. We then give an alternative semantics by translation into a yet smaller language with only top-level classes, closely following Java's Inner Classes Specification. We prove that the two semantics coincide, in the sense that translation commutes with reduction, and that both are type-safe.
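The flavor of the translation can be shown even in Python: direct-style access to the enclosing instance becomes an explicit enclosing-instance field, the analogue of the compiler-introduced this$0 reference in Java's Inner Classes Specification. A sketch of the idea, not the paper's calculus:

```python
# Direct style: an inner object reads the enclosing instance's state.
class Outer:
    def __init__(self, x):
        self.x = x
    def make_inner(self):
        outer = self
        class Inner:                 # conceptually a member class
            def get_x(self):
                return outer.x       # implicit link to the enclosing instance
        return Inner()

# Translated style: a top-level class with an explicit enclosing-
# instance field, the analogue of the compiler-introduced this$0.
class InnerTop:
    def __init__(self, outer):
        self._outer = outer          # plays the role of this$0
    def get_x(self):
        return self._outer.x

o = Outer(42)
print(o.make_inner().get_x())        # 42, via the implicit link
print(InnerTop(o).get_x())           # 42, via the explicit field
```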
We present a new scheme for _active_, or programmable, packets based upon a new packet language, SNAP (Safe Networking with Active Packets). SNAP's semantics permit us to prove that all SNAP programs are safe with respect to network resource usage and evaluation isolation. Furthermore, we describe an implementation of a SNAP interpreter, _snapd_, which achieves high performance for standard networking tasks. This work represents the first active packet system that is demonstrated to be _both_ safe and efficient.
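The resource-safety idea admits a very small model: every instruction a packet executes draws on a budget the packet carries, so evaluation cost is bounded by construction. A toy interpreter in this spirit (the instruction set is invented and far simpler than SNAP's):

```python
# Toy model of resource-safe packet execution: each instruction costs
# one unit from a budget the packet carries, so total work is bounded
# by construction. The instruction set is invented, not SNAP's.

class OutOfResources(Exception):
    pass

def run(program, budget):
    stack = []
    for instr, *args in program:
        if budget <= 0:
            raise OutOfResources("packet exceeded its resource bound")
        budget -= 1
        if instr == "push":
            stack.append(args[0])
        elif instr == "add":
            stack.append(stack.pop() + stack.pop())
        elif instr == "print":
            print(stack[-1])
    return budget

left = run([("push", 1), ("push", 2), ("add",), ("print",)], budget=10)
print("budget remaining:", left)   # 6: cost is linear in instructions
```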
Several recent studies have introduced lightweight versions of Java: reduced languages in which complex features like threads and reflection are dropped to enable rigorous arguments about key properties such as type safety. We carry this process a step further, omitting almost all features of the full language (including interfaces and even assignment) to obtain a small calculus, Featherweight Java, for which rigorous proofs are not only possible but easy.
Featherweight Java bears a similar relation to full Java as the lambda-calculus does to languages such as ML and Haskell. It offers a similar computational ``feel,'' providing classes, methods, fields, inheritance, and dynamic typecasts, with a semantics closely following Java's. A proof of type safety for Featherweight Java thus illustrates many of the interesting features of a safety proof for the full language, while remaining pleasingly compact. The syntax, type rules, and operational semantics of Featherweight Java fit on one page, making it easier to understand the consequences of extensions and variations.
As an illustration of its utility in this regard, we extend Featherweight Java with GENERIC CLASSES in the style of GJ (Bracha, Odersky, Stoutamire, and Wadler) and give a detailed proof of type safety. The extended system formalizes for the first time some of the key features of GJ.
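The computational core of such a calculus is small enough to render in a few lines: an object is just a ``new C(v...)'' term, and field access projects the constructor argument in the field's position. Below is our Python sketch of that single reduction rule, not the paper's formal system:

```python
from dataclasses import dataclass

# Python rendering of Featherweight-Java-style field projection: an
# object is a ``new C(v...)'' term, and reading a field returns the
# constructor argument in that field's position. Our sketch, not the
# paper's formal rules.

CLASS_FIELDS = {"Pair": ("fst", "snd")}   # class table: C -> its fields

@dataclass
class New:
    cls: str
    args: tuple

def project(obj: New, field: str):
    """The field-access reduction: new C(v..).f_i -> v_i."""
    return obj.args[CLASS_FIELDS[obj.cls].index(field)]

p = New("Pair", (New("Pair", (1, 2)), 3))
print(project(p, "snd"))                  # 3
print(project(project(p, "fst"), "fst"))  # 1: nested projection
```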
We describe a tool for compressing XML data, called XMill, that usually achieves about twice the compression ratio of gzip at roughly the same speed. The intended applications are XML data exchange and archiving. XMill does not need schema information (such as a DTD or an XML-Schema), but can exploit hints about such a schema in order to further improve the compression ratio. XMill incorporates and combines existing compressors in order to compress heterogeneous XML data: it uses zlib, the library function for gzip, as well as a collection of datatype-specific compressors. XMill can be extended with new specialized compressors: this is useful in applications managing XML data with highly specialized data types, such as DNA sequences, images, etc. The paper presents a theoretical justification for the method used, the XMill architecture and implementation, a new language for expressing hints about the XML schema, and a series of experiments validating XMill on several real data sets.
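XMill's central idea, separating structure from content and routing values into homogeneous containers before applying a general-purpose compressor, can be caricatured briefly. The sketch below is ours and ignores XMill's actual container format and datatype-specific compressors; it shows only the split-then-compress structure.

```python
import re
import zlib

# Caricature of XMill's container idea: split a document into a
# structure stream plus one text container per element name, then
# compress each stream with zlib. A sketch, not XMill's actual format.

def split_containers(xml):
    structure, containers = [], {}
    for tag, text in re.findall(r"<(\w+)>([^<]*)</\1>", xml):
        structure.append(tag)
        containers.setdefault(tag, []).append(text)
    return structure, containers

xml = ("<name>Ann</name><phone>555-0001</phone>"
       "<name>Bob</name><phone>555-0002</phone>") * 50
structure, containers = split_containers(xml)

# Grouping similar values together lets the general-purpose compressor
# find redundancy that is interleaved in the original document.
split_size = len(zlib.compress(" ".join(structure).encode())) + sum(
    len(zlib.compress("\n".join(v).encode())) for v in containers.values())
print("plain zlib:", len(zlib.compress(xml.encode())), "bytes;",
      "containers:", split_size, "bytes")
```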
XML has become an important standard for the representation and the exchange of data over the Internet. While several key aspects of XML have been investigated, such as query languages, type systems, and storage models, the issue of incrementally maintaining XML views is largely unstudied. XML views differ from relational views in two essential ways: 1) There is no rigid type system, and 2) The query definition often performs complex restructuring far beyond the typical select-project-join query definition in relational views.
In this paper, we address the problem of incrementally maintaining views over XML data with key constraints. We describe a system called WHAX (Warehouse Architecture for XML) that allows the definition and incremental maintenance of views over existing relational and XML data sources with keys. Our query language supports important operations, such as joins, aggregations, regrouping, and restructuring operations such as flattening. We generalize several well-known results about view maintenance in the relational model based on the notion of "multi-linearity".
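Linearity is what makes incremental maintenance work: for a view linear in its source, the view of an insertion batch can be computed on the delta alone and merged in, rather than recomputing from scratch. A relational-flavored sketch of the principle (ours, not the WHAX system itself):

```python
from collections import defaultdict

# Sketch of maintenance via linearity: a view that groups values by key
# satisfies V(R + dR) = merge(V(R), V(dR)), so an insertion batch is
# absorbed by computing the view of the delta alone. Our illustration
# of the principle, not the WHAX system.

def view(rows):                      # group amounts by key
    v = defaultdict(list)
    for key, amount in rows:
        v[key].append(amount)
    return v

def merge(v, dv):                    # key-wise union of two views
    for key, amounts in dv.items():
        v[key].extend(amounts)
    return v

base = [("a", 1), ("b", 2)]
delta = [("a", 3), ("c", 4)]
incremental = merge(view(base), view(delta))
assert incremental == view(base + delta)   # same as full recomputation
print(dict(incremental))   # {'a': [1, 3], 'b': [2], 'c': [4]}
```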
In a previous paper we proposed a novel method for generating alternative query plans that uses chasing (and back-chasing) with logical constraints. The method brings together the use of indexes, the use of materialized views, semantic optimization, and join elimination (minimization). Each of these techniques is known separately to be beneficial to query optimization. The novelty of our approach is in allowing these techniques to interact systematically, e.g., non-trivial use of indexes and materialized views may be enabled only by semantic constraints. We have implemented our method for a variety of schemas and queries. We examine how far we can push the method in terms of complexity of both schemas and queries. We propose a technique for reducing the size of the search space by "stratifying" the sets of constraints used in the (back)chase. The experimental results demonstrate that our method is practical (i.e., feasible and worthwhile).