Using Pthreads in Parallel Computing
"Pthread Parallel K-means" [pdf]
December 2001
K-means is a popular non-hierarchical method for clustering large data sets. The time requirements increase linearly with the size of the data set which make it paticulary suited for extremely large data sets.such as those found in digital libraries. The method was developed by McQueen in 1967. In our project we take a uniprocessor k-means algorithm and implement a parallel k-means algorithm using pthreads. The algorithm we use is from the Normalized Cuts (Ncuts) code base of the Vision Group at UC Berkeley. The parellel implementation demonstrated good speedup and improved performance.
"Active Learning for Video
Annotation" [pdf],
technical report TJ Watson, August
2001
In this paper, we present the design and implemention of the Video Annotation Library. VAL interfaces with the IBM VideoAnn annotation tool and is designed to aide in the semantic labeling of video and image databases using active learning techniques. A list of confidence values and nearest neighbbor values is maintained for each video segment in the VideoAnn database. A variety of similarity measures are used interchangeably to create hierarchies (or cluster trees) of features of each video. The learning approach proposes sample video segments to the user for annotation and updates the database with the new annotations. It then uses its accumulative knowledge to label the rest of the database, after which it proposes new samples for the user to annotate. The proposed samples are selected by their knowledge gain to the active learner.
"A B-tree For Distributed Data Structures" [ps]
(errata)
[ppt]
[code]
This paper presents the design and implementation of a single node
b-tree storage layer for Ninja Distributed Data Structures (Gribble et
al.). The implementation follows the finite state machine (FSM)
programming model used in DDS and described by the Ninja Project. The
key features of the b-tree design are a lock-free and non-blocking
implementation. Control-flow based concurrency control (as
opposed to lock-based concurrency control) manages concurrency using
non-blocking FSMs and queues. Performance measurements show the b-tree
scales well and exhibits graceful degradation under heavy loads.
The in-and-out-of-core measurements on the DDS Buffer Cache showed the
primary contributor to latency was not I/O, but rather
computation. The CPU costs for traversing the b-tree index are the
dominating costs.
"Wide Area Events Using Distributed Data Structures" ( ppt
) ( PSN
interfaces
)
The Distributed Data Structure (DDS) API is a
powerful abstraction on which to easily build cluster applications
using
reusable components. We show how the DDS can support the
Publish-Subscribe-Notify
class of internet services efficiently. As an example, the "Food in the
Woz" application constructs a channel that notifies subscribers when
food
is available in the Woz lounge. The channel records user-preferences
(such
as the device that the user wishes to be contacted on) and delivers
food
events to subscribers after a transducer has recieved events from
Ninjamail
and a filter has generated events for each user.
"Using Programmable Routers to Protect Intranets"(ppt)
( pdf)
(html)
December 2000
We are facing a trend towards ubiquitous connectivity where users
demand
access at anytime, anywhere. This has lead to the deployment of public
network ports and wireless networks. Current solutions to network
access
control are inflexible and only provide all-or-nothing access. It is
also
increasing important to protect Intranet hosts from other mobile and
static
hosts on the same Intranet, in order to contain damages in the case
that
a host gets compromised.
We present an architecture that addresses these issues by using a programmable router to provide dynamic fine-grained network access control. The Java-enabled router dynamically generates and enforces access control rules using policies and user profiles as input, reducing administrative overhead. Our modular design integrates well with existing authentication and directory servers, further reducing admininstrative costs. Our prototype is implemented using Nortel's Accelar router and moves users to VLANs with the appropriate access privilege.
Automatic Path Creation
("Ninja Paths")
"System Support for Multi-Modal
Information Access and Device Control" [ps]
WMCSA '99, February 1999
[download]
Available in the Iceberg release.
[ppt]
A talk on the prototype.
Automatic Path Creation (APC) is a data-flow based framework
for dynamically composing services. The original prototype was used to
implement an interactive voice response (IVR) system for remote
audio/visual devices. Given a set of strongly typed input and
output devices, transcoding operators, and data streams, APC
automatically creates a path from the input device to the output
device. APC involves two steps: choosing the transcoding operators for
the path and instantiating the path. Path instaniation involves taking
a path that has been created and instantiating the operators and
connecting them together.