Abstracts

A Trade-Off for Worst-Case Efficient Dictionaries
Abstract: We consider dynamic dictionaries over the universe U={0,1}w on a unit-cost RAM with word size w and a standard instruction set, and present a linear space deterministic dictionary accommodating membership queries in time (log log n)O(1) and updates in time (log n)O(1), where n is the size of the set stored. Previous solutions either had query time (log n)Omega(1) or update time 2omega(sqrt(log n)) in the worst case.

Low Redundancy in Static Dictionaries with Constant Query Time
Abstract: A static dictionary is a data structure storing subsets of a finite universe U, answering membership queries. We show that on a unit cost RAM with word size Theta(log |U|), a static dictionary for n-element sets with constant worst case query time can be obtained using B+O(log log |U|)+o(n) bits of storage, where B=ceiling(log2 binom(|U|,n) is the minimum number of bits needed to represent all n-element subsets of U.

Deterministic Dictionaries
Abstract: It is shown that a static dictionary that offers constant-time access to n elements with w-bit keys and occupies O(n) words of memory can be constructed deterministically in O(n log n) time on a unit-cost RAM with word length w and a standard instruction set including multiplication. Whereas a randomized construction working in linear expected time was known, the running time of the best previous deterministic algorithm was Omega(n2). Using a standard dynamization technique, the first deterministic dynamic dictionary with constant lookup time and sublinear update time is derived. The new algorithms are weakly nonuniform; i.e., they require access to a fixed number of precomputed constants dependent on w. The main technical tools employed are unit-cost error-correcting codes, word parallelism, and derandomization using conditional expectations.

Fast Random Access to Wavelet Compressed Volumetric Data Using Hashing
Abstract: We present a new approach to lossy storage of the coefficients of wavelet transformed data. While it is common to store the coefficients of largest magnitude (and let all other coefficients be zero), we allow a slightly different set of coefficients to be stored. This brings into play a recently proposed hashing technique that allows space efficient storage and very efficient retrieval of coefficients. Our approach is applied to compression of volumetric data sets. For the ``Visible Man'' volume we obtain up to 80% improvement in compression ratio over previously suggested schemes. Further, the time for accessing a random voxel is considerably reduced.

Hash and Displace: Efficient Evaluation of Minimal Perfect Hash Functions
Abstract: A new way of constructing (minimal) perfect hash functions is described. The technique is a mixture of two well-known ones, and yields simple as well as efficient hashing schemes. More specifically, the overhead associated with resolving buckets in two-level hashing schemes is reduced considerably. Evaluation of a hash function requires just one multiplication and a few additions apart from primitive bit operations. The number of accesses to memory is two, one of which is to a fixed location. This improves the probe performance of previous minimal perfect hashing schemes, and is shown to be optimal. The hash function description ("program") for a set of size n occupies O(n) words, and can be constructed in expected O(n) time.

Dispersing Hash Functions
Abstract: A new hashing primitive is introduced: dispersing hash functions. A family of hash functions F is dispersing if, for any set S of a certain size and random h in F, the expected value of |S|-|h[S]| is ``close'' to the expectancy if h had been chosen at random from the set of all functions.
We give tight, up to a logarithmic factor, upper and lower bounds on the size of dispersing families. Such families previously studied, for example universal families, are significantly larger than the smallest dispersing families, making them less suitable for derandomization. We present several applications of dispersing families to derandomization (fast element distinctness, set inclusion, and static dictionary initialization). Also, a tight relationship between dispersing families and extractors, which may be of independent interest, is exhibited.
We also investigate the related issue of program size for hash functions which are nearly perfect. In particular, we exhibit a dramatic increase in program size for hash functions more dispersing than a random function.


On the Cell Probe Complexity of Membership and Perfect Hashing
Abstract: We study two fundamental static data structure problems, membership and perfect hashing, in Yao's cell probe model. The first space and bit probe optimal upper bound is given for the membership problem. We also give a new efficient membership scheme where the query algorithm makes just one adaptive choice, and probes a total of three words. A lower bound shows that two word probes generally do not suffice, regardless of the adaptivity. For minimal perfect hashing we show a tight bit probe lower bound, and give a simple scheme achieving this performance, making just one adaptive choice. Linear range perfect hashing is shown to be implementable with the same number of bit probes, of which just one is adaptive. In contrast, we establish that for sufficiently sparse sets, non-adaptive perfect hashing needs exponentially more bit probes. This is the first such separation of adaptivity and non-adaptivity.

Cuckoo Hashing
Abstract: We present a simple dictionary with worst case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. (Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4):738--761, 1994). The space usage is similar to that of binary search trees, i.e., three words per key on average. Besides being conceptually much simpler than previous dynamic dictionaries with worst case constant lookup time, our data structure is interesting in that it does not use perfect hashing, but rather a variant of open addressing where keys can be moved back in their probe sequences. An implementation inspired by our algorithm, but using weaker hash functions, is found to be quite practical. It is competitive with the best known dictionaries having an average case (but no nontrivial worst case) guarantee.

Lossy Dictionaries
Abstract: Bloom filtering is an important technique for space efficient storage of a conservative approximation of a set S. The set stored may have up to some specified number of ``false positive'' members, but all elements of S are included. In this paper we consider ``lossy dictionaries'' that are allowed to also have false negatives. This relaxation allows a very fast and simple data structure making almost optimal use of memory. Being more time efficient than Bloom filters, we believe our data structure to be well suited for replacing Bloom filters in some applications. Also, the fact that our data structure supports information associated to keys paves the way for new applications, as illustrated by an application in lossy image compression.

Optimal Time-Space Trade-Offs for Non-Comparison-Based Sorting
Abstract: We study the fundamental problem of sorting n integers of w bits on a unit-cost RAM with word size w, and in particular consider the time-space trade-off (product of time and space in bits) for this problem. For comparison-based algorithms, the time-space complexity is known to be Theta(n2). A result of Beame shows that the lower bound also holds for non-comparison-based algorithms, but no algorithm has met this for time below the comparison-based Omega(n lg n) lower bound.
We show that if sorting within some time bound T' is possible, then time T = O(T'+ n lg* n) can be achieved with high probability using space S = O(n2/T+w), which is optimal. Given a deterministic priority queue using amortized time t(n) per operation and space nO(1), we provide a deterministic algorithm sorting in time T = O(n (t(n) + lg* n)) with S = O(n2/T+w). Both results require that w <= n1-Omega(1). Using existing priority queues and sorting algorithms, this implies that we can deterministically sort time-space optimally in time Theta(T) for T>= n (lg lg n)2, and with high probability for T>= n lg lg n.
Our results imply that recent lower bounds for deciding element distinctness in o(n lg n) time are nearly tight.


One Probe Search
Abstract: We consider dictionaries that perform lookups by probing a single word of memory, knowing only the size of the data structure. We describe a randomized dictionary where a lookup returns the correct answer with probability 1-epsilon, and otherwise returns "dont know". The lookup procedure uses an expander graph to select the memory location to probe. Recent explicit expander constructions are shown to yield space usage much smaller than what would be required using a deterministic lookup procedure. Our data structure supports efficient deterministic updates, exhibiting new probabilistic guarantees on dictionary running time.

Space Efficient Hash Tables With Worst Case Constant Access Time
Abstract: We generalize Cuckoo Hashing to d-ary Cuckoo Hashing and show how this yields a simple hash table data structure that stores n elements in (1+epsilon)n memory cells, for any constant epsilon > 0. Assuming uniform hashing, accessing or deleting table entries takes at most d=O(ln(1/epsilon)) probes and the expected amortized insertion time is constant. This is the first dictionary that has worst case constant access time and expected constant update time, works with (1+epsilon)n space, and supports satellite information. Experiments indicate that d=4 choices suffice for epsilon around 0.03. We also describe a hash table data structure using explicit constant time hash functions, using at most d=O(ln2(1/epsilon)) probes in the worst case. A corollary is an expected linear time algorithm for finding maximum cardinality matchings in a rather natural model of sparse random bipartite graphs.

Basic External Memory Data Structures
Abstract: This paper gives an overview of the most important basic data structures in external memory, introducing fundamental principles for designing data structures that make efficient use of the memory hierarchy. We show how to implement elementary abstract data structures (queues, stacks, linked lists), B-trees, buffer trees (including their use for priority queues), and hashing based dictionaries.

Uniform Hashing in Constant Time and Linear Space
Abstract: Many algorithms and data structures employing hashing have been analyzed under the uniform hashing assumption, i.e., the assumption that hash functions behave like truly random functions. Starting with the discovery of universal hash functions, many researchers have studied to what extent this theoretical ideal can be realized by hash functions that do not take up too much space and can be evaluated quickly. In this paper we present an almost ideal solution to this problem: A hash function that, on any set of n inputs, behaves like a truly random function with high probability, can be evaluated in constant time on a RAM, and can be stored in O(n) words, which is optimal. For many hashing schemes this is the first hash function that makes their uniform hashing analysis come true, with high probability, without incurring overhead in time or space.

On Adaptive Integer Sorting
Abstract: This paper considers integer sorting on a RAM. We show that adaptive sorting of a sequence with qn inversions is asymptotically equivalent to multisorting groups of at most q keys, and a total of n keys. Using the recent O(n sqrt(log log n)) expected time sorting of Han and Thorup on each set, we immediately get an adaptive expected sorting time of O(n sqrt(log log q)). Interestingly, for any positive constant epsilon, we show that multisorting and adaptive inversion sorting can be performed in linear time if q<= 2(log n)(1-epsilon). We also show how to asymptotically improve the running time of any traditional sorting algorithm on a class of inputs much broader than those with few inversions.

An Optimal Bloom Filter Replacement
Abstract: This paper considers space-efficient data structures for storing an approximation S' to a set S such that S is a superset of S' and any element not in S belongs to S' with probability at most epsilon. The Bloom filter data structure, solving this problem, has found widespread use. Our main result is a new RAM data structure that improves Bloom filters in several ways: The main technical ingredient is a succinct representation of dynamic multisets. We also consider three recent generalizations of Bloom filters.


Home page










































:-)