Outline
The first project involved making extensions to a simple
(non-distributed) file synchronizer. This project goes further,
asking you to analyze and improve the behavior of a distributed
realization of a more refined synchronization algorithm.
Our Synchronizer
Sundar and I have put together a fairly straightforward realization of
a distributed file synchronizer, closely following the draft specification discussed in class.
The files are available here. As soon as you've copied them into your own filespace, please change the MYPORT field in the Makefile to some random number between 1000 and 8000, to avoid possible clashes with others running RMI servers on the same machine.
By default, the Snc program will attempt to start its own RMI registry and FileSystem server processes on the remote machine (using rsh). You can also run these processes explicitly and tell Snc which port to use to contact the registry, as in the sketch below.
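If you do start the registry and server yourself, the client side just needs to contact the registry on your chosen port. The following is a minimal sketch of such a lookup, not code from the handout; in particular, the binding name "FileSystem" is an assumption for illustration, so check the handout sources for the name actually used.

    import java.rmi.registry.LocateRegistry;
    import java.rmi.registry.Registry;

    // Minimal sketch of looking up a remote object in an explicitly started
    // registry. The binding name "FileSystem" is hypothetical; use the name
    // under which the handout's server actually registers itself.
    public class RegistryLookupSketch {
        public static void main(String[] args) throws Exception {
            String host = args[0];                  // remote machine
            int port = Integer.parseInt(args[1]);   // the MYPORT value from the Makefile

            Registry registry = LocateRegistry.getRegistry(host, port);
            Object server = registry.lookup("FileSystem");
            System.out.println("Found remote object: " + server);
        }
    }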
Assignment
- Part 1: Empirical analysis of the performance of the Java RMI implementation.
- Build a collection of small test programs to measure the
following aspects of RMI performance (and any others that you think
are relevant to the next part of the exercise).
- the cost of creating a remote object
- the cost of sending a remote object to a remote machine for the
first time
- the cost of a remote message-send and reply (i.e. the
minimum round-trip cost for a trivial remote method invocation)
- the cost of serializing a larger data structure such as a
Hashtable when sending it across the network.
- the cost of computing the MD5 signature of a file (as a
function of file size)
Include the numbers you obtain in your project write-up, as well as a justification of your measurement methodology. (A minimal timing-harness sketch appears at the end of this section.)
- Part 2: Tuning.
- Typing make time will measure the total time required to synchronize a small directory hierarchy over the network (but using the same machine as both local and remote host), under two different sets of assumptions: when every file has been touched (because both sets of files were just created), and when no files have been touched. The important figure is the elapsed (wall-clock) time from start to finish.
- Improve the synchronizer's performance as measured by this
benchmark as much as you can without changing the behavior
observed by the user (i.e., it should still satisfy the same
abstract specification).
- Possible improvements may include...
- changing the program's internal data structures so that
fewer network communications are required;
- parallelizing the algorithm so that parts of it can make progress while other parts are waiting for network communications (a rough sketch of this pattern appears at the end of this section);
- reducing the time taken to start remote servers in ServerImpl.createRemote (though doing this just by reducing the timeout periods is a cheap solution);
- etc.
- Describe your improvements in the project write-up. Argue (informally, but carefully) that you have not changed the program's observable behavior.
- This part of the project will require that you build a fairly
intimate understanding of the distributed data structures
used by the present implementation and the patterns of
communication that they give rise to.
It may be helpful to run the synchronizer with the
-log
and/or -trace
options to gain an
understanding of where it is spending its time.
- Part 3: Extensions. Choose one of the following (or, for extra credit, more than one):
- Discuss ways in which the algorithm itself might be improved
(e.g. by caching additional information between runs of the synchronizer,
choosing a completely different specification of the
synchronization task, etc.) to achieve better performance in
realistic situations.
Make sure to identify your assumptions (small, large, or huge sets of files being synchronized; high- or low-bandwidth network connections; frequent or rare synchronizations; etc.). Implement an improved algorithm and measure its performance.
- Discuss the security issues in the present realization of the
synchronizer.
Propose and implement a more secure synchronizer. Measure its
performance.
- (If you want to propose another kind of extension, make me an
offer... there are many possibilities.)
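For part 1, a sufficient measurement harness can be quite small: wrap the operation of interest in a loop, time it with System.currentTimeMillis, and average. The sketch below shows the idea for two of the measurements, the round-trip cost of a trivial remote call and the MD5 signature of a file. The Ping interface is an invented stand-in, not part of the handout; you would use the handout's own remote interfaces, or small test classes of your own, in its place.

    import java.io.FileInputStream;
    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.security.MessageDigest;

    // Sketch of two micro-benchmarks for part 1. The Ping interface is an
    // invented stand-in for "a trivial remote method"; substitute your own
    // test classes or the handout's remote interfaces.
    public class TimingSketch {

        // Hypothetical trivial remote interface (not in the handout).
        public interface Ping extends Remote {
            void ping() throws RemoteException;
        }

        // Minimum round-trip cost: average over many calls so that clock
        // granularity and warm-up effects do not dominate the measurement.
        static double roundTripMillis(Ping p, int iterations) throws RemoteException {
            long start = System.currentTimeMillis();
            for (int i = 0; i < iterations; i++) {
                p.ping();
            }
            return (double) (System.currentTimeMillis() - start) / iterations;
        }

        // Cost of computing the MD5 signature of a file, as a function of size.
        static long md5Millis(String path) throws Exception {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[8192];
            long start = System.currentTimeMillis();
            FileInputStream in = new FileInputStream(path);
            int n;
            while ((n = in.read(buf)) > 0) {
                md.update(buf, 0, n);
            }
            in.close();
            md.digest();
            return System.currentTimeMillis() - start;
        }
    }

Whatever methodology you settle on (number of iterations, warm-up runs, how you separate connection setup from per-call cost), describe and justify it in the write-up; the justification is part of the deliverable.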
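For part 2, one way to hide network latency is to issue independent remote calls from separate threads and collect the replies afterwards, so that several round trips overlap. The sketch below only illustrates the pattern: the RemoteFS interface and its checksum method are invented stand-ins for whatever per-file remote call the synchronizer actually makes, and you must still argue that any such change leaves the observable behavior intact.

    import java.rmi.Remote;
    import java.rmi.RemoteException;

    // Sketch of overlapping independent remote requests. RemoteFS and its
    // checksum method are hypothetical stand-ins for the handout's own
    // remote interface.
    public class ParallelCallsSketch {

        public interface RemoteFS extends Remote {
            String checksum(String path) throws RemoteException;
        }

        // Issue one remote call per path from its own thread, then wait for
        // all replies; the total latency is roughly one round trip rather
        // than one per file.
        public static String[] checksumsInParallel(final RemoteFS fs, final String[] paths)
                throws InterruptedException {
            final String[] results = new String[paths.length];
            Thread[] workers = new Thread[paths.length];
            for (int i = 0; i < paths.length; i++) {
                final int k = i;
                workers[k] = new Thread() {
                    public void run() {
                        try {
                            results[k] = fs.checksum(paths[k]);
                        } catch (RemoteException e) {
                            e.printStackTrace();
                        }
                    }
                };
                workers[k].start();
            }
            for (int i = 0; i < workers.length; i++) {
                workers[i].join();
            }
            return results;
        }
    }

Spawning a thread per file will not scale to large trees, of course; batching several requests into a single remote call (by reworking the data structures, as suggested above) may well be the bigger win.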
Deliverables
A finished project consists of the following files:
- A file README (or, if you prefer, README.html) describing the design and execution of your changes. Since there is some more serious design work to be done in this exercise, I'll expect a deeper discussion than in previous projects.
- Java source files. (I will look at these, but the grading will be
based on the writeup in the README file, so make sure you describe all
your work coherently there.)
- A file TIMINGS containing the results of executing make time (which calls the testharness script provided in the handout directory) using your solution to part 2. (I'll re-run the timings on my own machine for consistency; this file is just for comparison.)
- If your solution to part 3 of the assignment is different from
your solution to part 2 (e.g., because it performs badly on the
particular dataset used by the test harness), put the part-3 sources
in a separate subdirectory.
Submission Procedure
Same as usual.
Due date
The project is due at the beginning of class on
Monday, October 27th.
B629: Languages for Programming the Web
Benjamin Pierce
(pierce@cs.indiana.edu)