This paper examines some of the issues that arise in XML publishing of mixed-storage proprietary data. We argue that such data will typically reside in relational databases, LDAP directories, and similar stores, augmented with a set of native XML documents. An additional challenge is to take advantage of redundancy in the storage schema, such as mixed materialized views that are stored to enhance performance.
We argue that such systems need to take into account mappings in both directions between the proprietary schema and the published schema. Thus, reformulating queries on the (published) XML schema into executable queries on the stored data requires the effect of both composition-with-views (as in SilkRoute and XPERANTO) and rewriting-with-views (as in the Information Manifold and Agora).
Using any of the simple encodings of relational data as XML, the mappings between schemas and the materialized views can be expressed in XQuery, just like the queries on the published schema. For query reformulation we give an algorithm that uses logical assertions to formally capture the semantics of a large part of XQuery. We also give a completeness theorem for our reformulation algorithm. The algorithm was implemented in an XML query rewriting system, and we present a suite of experiments that validate this technique.
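As an illustration of the kind of encoding the reformulation works over (a generic sketch, not the paper's specific schema or mappings), one simple encoding turns each relational table into an element with one child element per tuple; the schema mappings and materialized-view definitions can then be written as XQuery expressions over such documents:

    # A minimal Python sketch of one generic "simple encoding" of relational
    # data as XML; the table and column names here are hypothetical examples.
    import xml.etree.ElementTree as ET

    def encode_table(name, columns, rows):
        """Encode a relational table as an XML element tree."""
        table = ET.Element(name)
        for values in rows:
            row = ET.SubElement(table, "row")
            for col, val in zip(columns, values):
                ET.SubElement(row, col).text = str(val)
        return table

    doc = encode_table("employee", ["id", "name", "dept"],
                       [(1, "Alice", "R&D"), (2, "Bob", "Sales")])
    print(ET.tostring(doc, encoding="unicode"))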
The DARPA MoBIES Automotive Vehicle-Vehicle Open Experimental Platform [14] defines a longitudinal controller for the leader car of a platoon moving autonomously in an Intelligent Vehicle Highway System (IVHS). The challenge is to verify that cars using this longitudinal controller provide a safe (that is, collision-free) ride. This report presents the process of verifying this particular controller using our CHARON [2] toolkit. In particular, it involves modeling and simulating the system in CHARON, and verifying the controller using our predicate abstraction technique for hybrid systems [3].
DTDs have proved important in a variety of areas: transformations between XML and databases, XML storage, XML publishing, consistency analysis of XML specifications, typechecking, and optimization of XML queries. Much of this work depends on certain assumptions about DTDs, e.g., the absence of recursion and nondeterminism. With this comes the need to justify these assumptions against DTDs in the real world. This paper surveys a number of DTDs collected from the Web, and provides statistics with respect to a variety of criteria commonly discussed in XML research.
In this paper we describe how the connections between affine forms, zonotopes, and Gaussian distributions help us devise an automated cue integration technique for tracking deformable models. This integration technique is based on the confidence estimates of each cue.
We use affine forms to bound these confidences. Affine forms represent bounded intervals, with a well-defined set of arithmetic operations. They are constructed from the sum of several independent components. An n-dimensional affine form describes a complex convex polytope, called a zonotope. Because these components lie in bounded intervals, Lindeberg's theorem, a modified version of the central limit theorem, can be used to justify a Gaussian approximation of the affine form.
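To make the construction concrete, the following minimal sketch assumes the usual affine-arithmetic convention that each independent component ("noise symbol") ranges over [-1, 1]; under the linear operations shown, an affine form stays an affine form, and a vector of such forms ranges over a zonotope:

    # A minimal affine-arithmetic sketch; each independent component
    # is assumed to range over [-1, 1].
    class AffineForm:
        def __init__(self, center, terms=None):
            self.center = center            # constant part x0
            self.terms = dict(terms or {})  # component id -> coefficient xi

        def __add__(self, other):
            terms = dict(self.terms)
            for k, v in other.terms.items():
                terms[k] = terms.get(k, 0.0) + v
            return AffineForm(self.center + other.center, terms)

        def scale(self, a):
            return AffineForm(a * self.center,
                              {k: a * v for k, v in self.terms.items()})

        def interval(self):
            # The bounded interval described by the form.
            radius = sum(abs(v) for v in self.terms.values())
            return self.center - radius, self.center + radius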
We present a new expectation-based algorithm to find the best Gaussian approximation of an affine form. Both the new and the previous algorithm run in O(n^2m) time, where n is the dimension of the affine form and m is the number of independent components. The constants in the running time of the new algorithm, however, are much smaller, and as a result it runs 40 times faster than the previous one on the same inputs.
We show that, using the Berry-Esseen theorem, it is possible to calculate an upper bound on the error in the Gaussian approximation. Using affine forms and the conversion algorithm, we create a method for automatically integrating cues in the tracking process of a deformable model. The tracking process is described as a dynamical system, in which we model the force contribution of each cue as an affine form. We integrate their Gaussian approximations using a Kalman filter as a maximum likelihood estimator. This method not only provides an integrated result that depends on the quality of each cue, but also provides a measure of confidence in the final result. We evaluate our new estimation algorithm in experiments, and we demonstrate our deformable model-based face tracking system as an application of the algorithm.
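The sketch below illustrates the overall pipeline under simplifying assumptions (independent components uniform on [-1, 1], scalar cues, no process dynamics); it is not the paper's exact algorithm. Each cue's affine form is approximated by a Gaussian, and the Gaussians are fused by inverse-variance weighting, which is the maximum-likelihood combination that a Kalman measurement update reduces to in this scalar case:

    # Gaussian approximation of an AffineForm (see the sketch above), assuming
    # each independent component is uniform on [-1, 1], hence has variance 1/3.
    def gaussian_approx(form):
        mean = form.center
        var = sum(c * c for c in form.terms.values()) / 3.0
        return mean, var

    # Maximum-likelihood fusion of independent Gaussian cue estimates.
    def fuse(cues):
        """cues: list of (mean, variance) pairs, one per tracking cue."""
        weights = [1.0 / var for _, var in cues]
        total = sum(weights)
        mean = sum(w * m for w, (m, _) in zip(weights, cues)) / total
        return mean, 1.0 / total   # fused estimate and its variance (confidence)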
Packet monitoring arguably needs the flexibility of open architectures and active networking. A significant challenge in the design of open packet monitoring systems is how to effectively strike a balance between flexibility, safety and performance. In this paper we investigate the performance of FLAME, a system that emphasizes flexibility by allowing applications to execute arbitrary code for each packet received. Our system attempts to achieve high performance without sacrificing safety by combining the use of a type-safe language, lightweight run-time checks, and fine-grained policy restrictions. Experiments with our prototype implementation demonstrate the ability of our system to support representative application workloads on Gbit/s links. Such performance indicates the overall efficiency of our approach; more narrowly targeted experiments demonstrate that the overhead required to provide safety is acceptable.
We extend the formal developments for message sequence charts (MSCs) to support scenarios with lost and found messages. We define a notion of extended compositional message sequence charts (ECMSCs), which subsumes the notion of compositional message sequence charts in expressive power but additionally allows lost and found messages to be defined explicitly. As usual, ECMSCs may be combined by means of choice and repetition into (extended) compositional message sequence graphs. We show that, despite the extended expressive power, model checking of monadic second-order logic (MSO) for this framework remains decidable. The key technique used to achieve our results is an extended notion of linearizations.
In this report, we summarize our approach for testing the Electronic Throttle Control (ETC) system. We reformulate the ETC model based on the MATLAB/SIMULINK model provided by the Berkeley group. We specify the ETC model using the hybrid modeling language called Charon. From the Charon model, we generate test sequences based on the control-flow and data-flow criteria. These are transformed into test cases which may be used to test an implementation of the ETC system.
The paper describes a unified formal framework for designing and reasoning about power-constrained, timed systems. The framework is based on process algebra, a formalism that has been developed to describe and analyze communicating, concurrent systems. The proposed extension allows the modeling of probabilistic resource failures, priorities of resource usages, and power consumption by resources within the same formalism. Thus, it is possible to study several alternative power-consumption behaviors and tradeoffs in their timing and other characteristics. This paper describes the modeling and analysis techniques, and illustrates them with examples, including a dynamic voltage-scaling algorithm.
While network simulations for congestion control studies have often varied traffic loads and protocol parameters, they have typically investigated only a few topologies. By far the most common is the so-called ``barbell'' topology. In this paper we argue, first, that the barbell topology is not representative of the Internet. In particular, we report that a measurable fraction of packets pass through multiple congestion points. Second, we argue that the distinction between the ``barbell'' topology and more complex topologies matters, by presenting a scenario with multiple congestion points that exhibits behavior that seems unexpected based on intuition derived from the barbell topology (in particular, a TCP-only system that exhibits behavior technically considered ``congestion collapse''). We make the larger argument that the methodology currently accepted for evaluating network protocols is flawed. Finally, we briefly comment on some issues that arise in designing a simulation methodology better suited to comparing network protocol performance.
We present a technique for refining the design of relational storage for XML data. The technique is based on XML key propagation: given a set of keys on XML data, what functional dependencies must hold on a relational view of the data? With these functional dependencies one can convert the relational design into, e.g., 3NF or BCNF, and thus develop efficient relational storage for XML data. We provide several algorithms for computing XML key propagation. One algorithm checks whether a functional dependency is propagated from XML keys via a predefined view, which helps one determine whether the design is in a normal form; the others compute a minimum cover for all functional dependencies on a universal relation given certain XML keys, which provides guidance on how to design a view. Our experimental results show that these algorithms are efficient in practice. We also investigate the complexity of propagating other XML constraints to relations. The ability to compute XML key propagation is a first step toward establishing the connection between XML data and its relational representation at the semantic level. Our algorithms can be incorporated into other relational storage techniques. They can also be generalized to study transformations of hierarchically structured data such as scientific databases.
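As a small, hypothetical illustration of key propagation (not one of the paper's algorithms): an XML key asserting that @isbn identifies book elements, together with a view R(isbn, title) that extracts one row per book, forces the functional dependency isbn -> title to hold on R. The check below merely verifies an FD on one materialized view instance; the paper's algorithms reason about all instances satisfying the XML keys:

    # Hypothetical view rows and attribute names, used only for illustration.
    view_rows = [
        {"isbn": "isbn-1", "title": "Title A"},
        {"isbn": "isbn-2", "title": "Title B"},
    ]

    def fd_holds(rows, lhs, rhs):
        """Check whether the functional dependency lhs -> rhs holds on rows."""
        seen = {}
        for r in rows:
            if r[lhs] in seen and seen[r[lhs]] != r[rhs]:
                return False
            seen[r[lhs]] = r[rhs]
        return True

    print(fd_holds(view_rows, "isbn", "title"))   # True on this instance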
We have implemented a real-time front end for detecting voiced speech and estimating its fundamental frequency. The front end performs the signal processing for voice-driven agents that attend to the pitch contours of human speech and provide continuous audiovisual feedback. The algorithm we use for pitch tracking has several distinguishing features: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates; and it works reliably over a four-octave range, in real time, without the need for postprocessing to produce smooth contours. The algorithm is based on two simple ideas in neural computation: the introduction of a purposeful nonlinearity, and the error signal of a least squares fit. The pitch tracker is used in two real-time multimedia applications: a voice-to-MIDI player that synthesizes electronic music from vocalized melodies, and an audiovisual karaoke machine with multimodal feedback. Both applications run on a laptop and display the user's pitch scrolling across the screen as he or she sings into the computer.
The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation. Here we describe locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data. The data, assumed to lie on a nonlinear manifold, is mapped into a single global coordinate system of lower dimensionality. The mapping is derived from the symmetries of locally linear reconstructions, and the actual computation of the embedding reduces to a sparse eigenvalue problem. Notably, the optimizations in LLE - though capable of generating highly nonlinear embeddings - are simple to implement, and they do not involve local minima. We describe the implementation of the algorithm in detail and discuss several extensions that enhance its performance. The algorithm is applied to manifolds of known structure, as well as real data sets consisting of images of faces, digits, and lips. We provide extensive illustrations of the algorithm's performance.
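For reference, here is a compact numpy sketch of the two LLE steps mentioned above, locally linear reconstruction weights followed by a (here dense, in practice sparse) eigenvalue problem; a tuned implementation would use sparse solvers and more careful neighborhood regularization:

    import numpy as np

    def lle(X, k, d, reg=1e-3):
        """X: (N, D) data; k: number of neighbors; d: target dimension."""
        N = X.shape[0]
        W = np.zeros((N, N))
        for i in range(N):
            # k nearest neighbors of point i (excluding the point itself).
            dist = np.sum((X - X[i]) ** 2, axis=1)
            nbrs = np.argsort(dist)[1:k + 1]
            # Weights that best reconstruct X[i] from its neighbors,
            # constrained to sum to one.
            Z = X[nbrs] - X[i]
            C = Z @ Z.T
            C += reg * np.trace(C) * np.eye(k)   # stabilizes the solve when k > D
            w = np.linalg.solve(C, np.ones(k))
            W[i, nbrs] = w / w.sum()
        # Embedding from the bottom eigenvectors of (I - W)^T (I - W),
        # discarding the constant eigenvector with eigenvalue near zero.
        M = (np.eye(N) - W).T @ (np.eye(N) - W)
        _, vecs = np.linalg.eigh(M)
        return vecs[:, 1:d + 1]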
We derive multiplicative updates for solving the nonnegative quadratic programming problem in support vector machines (SVMs). The updates have a simple closed form, and we prove that they converge monotonically to the solution of the maximum margin hyperplane. The updates optimize the traditionally proposed objective function for SVMs. They do not involve any heuristics such as choosing a learning rate or deciding which variables to update at each iteration. They can be used to adjust all the quadratic programming variables in parallel with a guarantee of improvement at each iteration. We analyze the asymptotic convergence of the updates and show that the coefficients of non-support vectors decay geometrically to zero at a rate that depends on their margins. In practice, the updates converge very rapidly to good classifiers.
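One standard form of such a multiplicative update is sketched below for the generic problem: minimize F(v) = 0.5*v'Av + b'v subject to v >= 0, where for the SVM dual without a bias term A_ij = y_i y_j K(x_i, x_j) and b_i = -1. This is a sketch of the general pattern; the paper's exact update and its handling of constraints may differ in details:

    import numpy as np

    def multiplicative_nqp(A, b, iters=1000, v0=None):
        # Minimize 0.5*v'Av + b'v over v >= 0 with multiplicative updates.
        Ap = np.maximum(A, 0.0)     # elementwise positive part of A
        Am = np.maximum(-A, 0.0)    # elementwise negative part of A
        v = np.ones(len(b)) if v0 is None else v0.copy()
        for _ in range(iters):
            a, c = Ap @ v, Am @ v
            # Closed-form rescaling factor; nonnegativity is preserved automatically.
            v = v * (-b + np.sqrt(b * b + 4.0 * a * c)) / (2.0 * a + 1e-12)
        return v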
This paper is concerned with the group-theoretical background of integral transforms and the role they play in signal processing on the sphere. An overview of topological groups, measure and representation theories, and an overview of the Windowed Fourier Transform and the Continuous Wavelet Transform are presented. A group-theoretical framework within which these transforms arise is presented. The connection between integral transforms and square-integrable group representations is explored. The theory is also generalized beyond groups to homogeneous spaces. The abstract theory is then applied to signal processing on the sphere with a discussion of both global and local methods. A detailed derivation of the continuous spherical harmonic transform is presented. Global methods such as the spherical Fourier transform and convolution are presented in an appendix as well as some background material from group theory and topology.
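For concreteness, in one common normalization (with orthonormal spherical harmonics Y_\ell^m; sign and normalization conventions vary by author) the continuous spherical harmonic transform and its inversion take the form

    \hat{f}(\ell, m) = \int_{S^2} f(\omega)\, \overline{Y_\ell^m(\omega)}\, d\omega,
    \qquad
    f(\omega) = \sum_{\ell \ge 0} \sum_{m=-\ell}^{\ell} \hat{f}(\ell, m)\, Y_\ell^m(\omega).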
Register integration (or just integration) is a register renaming discipline that implements instruction reuse via physical register sharing. Initially developed to perform squash reuse, the integration mechanism is a powerful reuse tool that can exploit more reuse scenarios. In this paper, we describe three extensions to the initial integration mechanism that expand its applicability and boost its performance impact. First, we extend squash reuse to general reuse. Whereas squash reuse maintains the superscalar concept of an instruction instance "owning" its output physical register, we allow multiple instructions to simultaneously and seamlessly share a single physical register. Next, we replace the PC-indexing scheme used by squash reuse with an opcode-based indexing scheme that exposes more integration opportunities. Finally, we introduce an extension called reverse integration, in which we speculatively create integration entries for the inverses of operations: for instance, when renaming an add, we create an entry for the inverse subtract. Reverse integration allows us to reuse operations that were not specified by the original program. We use reverse integration to obtain a free implementation of speculative memory bypassing for stack-pointer based loads (register fills and restores).
Our evaluation shows that these extensions increase the integration rate (the number of retired instructions that integrate older results and bypass the execution engine) to an average of 17% on the SPEC2000 integer benchmarks. On a 4-way superscalar processor with an aggressive memory system, this translates into an average IPC improvement of 8%. The fact that integrating instructions completely bypass the execution engine raises the possibility of using integration as a low-complexity substitute for execution bandwidth and issue buffering. Our experiments show that such a trade-off is possible, enabling a range of IPC/complexity designs.
Pre-execution attacks cache misses for which conventional address-prediction driven prefetching is ineffective. In pre-execution, copies of cache miss computations are isolated from the main program and launched as separate threads called p-threads whenever the processor anticipates an upcoming miss. P-thread selection is the task of deciding what computations should execute on p-threads and when they should be launched such that total execution time is minimized. P-thread selection is central to the success of pre-execution.
We introduce a framework for automated static p-thread selection, a static p-thread being one whose dynamic instances are repeatedly launched during the course of program execution. Our approach is to formalize the problem quantitatively and then apply standard techniques to solve it analytically. The framework has two novel components. The slice tree is a new data structure that compactly represents the space of all possible static p-threads. Aggregate advantage is a formula that uses raw program statistics and computation structure to assign each candidate static p-thread a numeric score based on estimated latency tolerance and overhead aggregated over its expected dynamic executions. Our framework finds the set of p-threads whose aggregate advantages sum to a maximum. The framework is simple and intuitively parameterized to model the salient microarchitecture features.
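Purely as an illustration of the selection objective (the paper's actual aggregate-advantage formula and program statistics are not reproduced here), a scoring scheme of this flavor weighs the latency each candidate static p-thread is estimated to tolerate against the overhead it adds, aggregated over its expected dynamic launches:

    # All per-candidate statistics below are hypothetical, illustrative inputs.
    def aggregate_advantage(expected_launches, tolerated_latency_per_launch,
                            overhead_per_launch):
        return expected_launches * (tolerated_latency_per_launch
                                    - overhead_per_launch)

    def select_pthreads(candidates):
        """candidates: list of (name, launches, tolerance, overhead) tuples."""
        scored = [(aggregate_advantage(l, t, o), name)
                  for name, l, t, o in candidates]
        return [name for score, name in sorted(scored, reverse=True) if score > 0]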
We apply our framework to the task of choosing p-threads that cover L2 cache misses. Using detailed simulation, we study the effectiveness of our framework, and of pre-execution in general, under different conditions. We measure the effect of constraining p-thread length, of adding localized optimization to p-threads, and of using various program samples as a statistical basis for p-thread selection, and show that our framework responds to these changes in an intuitive way. In the microarchitecture dimension, we measure the effect of varying memory latency and processor width and observe that our framework adapts well to these changes. Each experiment includes a validation component which checks that the formal model presented to our framework correctly represents actual execution.
XQuery is a strongly typed, functional language that supports the common processing, transformation, and querying tasks of a wide variety of XML applications. Following the tradition of other functional languages, XQuery includes a complete formal semantics. In this paper, we argue that basing an XQuery implementation on the XQuery Formal Semantics not only ensures correctness, but also provides a good foundation for optimization. We describe an architecture that we have implemented and that is based on the XQuery Formal Semantics, and we describe several logical and physical optimizations that can be easily integrated into this architecture.
The Template Relation (TR) task is an information extraction problem introduced in the 7th Message Understanding Conference (MUC-7). In this paper, we propose an approach that converts the problem into a discriminative learning task. We obtain an F-measure of 78% on sentence-level relations, which is comparable to the best systems presented at MUC-7, while requiring almost no extra annotation work. In our approach, we first use a Supertagger and a Lightweight Dependency Analyzer to perform shallow parsing on the text. Features are then extracted from the shallow parsing result. We use these features as weak hypotheses and combine them into a final hypothesis with a boosting algorithm.
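For reference, a generic AdaBoost-style sketch of combining binary indicator features as weak hypotheses is shown below; the paper's exact boosting variant and feature set are not reproduced, and the feature names and labels here are hypothetical:

    import math

    def adaboost(examples, features, rounds):
        """examples: list of (feature_set, label) pairs with label in {-1, +1};
           each weak hypothesis predicts +1 if its feature is present, else -1."""
        n = len(examples)
        weights = [1.0 / n] * n
        ensemble = []                                  # list of (alpha, feature)
        for _ in range(rounds):
            def weighted_error(f):
                return sum(w for w, (feats, y) in zip(weights, examples)
                           if (1 if f in feats else -1) != y)
            f = min(features, key=weighted_error)      # best weak hypothesis
            err = max(weighted_error(f), 1e-10)
            alpha = 0.5 * math.log((1.0 - err) / err)
            ensemble.append((alpha, f))
            # Re-weight: misclassified examples gain weight for the next round.
            weights = [w * math.exp(-alpha * y * (1 if f in feats else -1))
                       for w, (feats, y) in zip(weights, examples)]
            z = sum(weights)
            weights = [w / z for w in weights]
        return ensemble

    def predict(ensemble, feats):
        score = sum(alpha * (1 if f in feats else -1) for alpha, f in ensemble)
        return 1 if score >= 0 else -1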
The features of ATM offered many attractions to the application community, such as fine-grained multiplexing and high-throughput links. These created considerable challenges for the operating system designer, since a small protocol data unit size (the 48-byte "cell") and link bandwidths within a (binary) order of magnitude of memory bandwidths demanded considerable rethinking of operating system structure.
Using an historical and personal perspective, this paper describes two aspects of that rethinking in which I participated directly, namely, new event signalling and memory buffering schemes. Ideas and techniques stemming from ATM network research influenced first research operating systems and then commercial operating systems. The positive results of ATM networking, although indirect, have benefitted applications and systems far beyond the original design goals.
Interactions among telecommunications networks, computers, and other peripheral devices have been of interest since the earliest distributed computing systems. A key architectural question is the location (and nature) of programmability. One perspective, that examined in this paper, is that network elements should be as programmable as possible, in order to build the most flexible distributed computing systems.
This paper presents my personal view of the history of programmable networking over the last two decades, and in the spirit of "vox audita perit, littera scripta manet", includes an account of how what is now called "Active Networking" came into being. It demonstrates the deep roots Active Networking has in the programming languages, networking and operating systems communities, and shows how interdisciplinary approaches can have impacts greater than the sums of their parts. Lessons are drawn both from the broader research agenda, and the specific goals pursued in the SwitchWare project. I close by speculating on possible futures for Active Networking.