Database project and thesis proposals
This page contains some proposals for projects and theses in the area
of databases. It is currently mainly targeted at students having
taken the course "Advanced Database Technology", but may also be of
interest to other students with an interest in technical aspects of
databases and other large data sets. For most proposals it is possible
to emphasize either theoretical or practical aspects. The proposals
are only sketched - contact Rasmus Pagh (pagh@it-c.dk) for more
information. Variants of the projects or other proposals are
welcome. We recommend project/thesis groups of at least two
members, as the amount of supervision allocated by ITU to singleton groups
is rather small.
Proposals
- Filtering. Bloom filters have several applications in databases, e.g., for "Bloom join" and in refinements of the A-Priori data mining algorithm. This project considers methods for constructing filters, including a new method proposed by Ostlin, Pagh, and Rao, which has the potential of outperforming existing methods.
- Join size estimation. One of the most important aspects of
query optimization is estimations of join sizes. This project looks at
statistical methods for estimating join sizes. Ref 1.
- Keyword search. Most search engines allow searching for
documents containing a certain set of words. This project would study
the methods used to implement this in more detail. (There may be a
possibility for doing this as a thesis with a private company involved
- however this is not clear at the moment.) Ref 1, ref 2.
- Using multiple disks efficiently. This project considers
recent randomized algorithms for using multiple disks very efficiently
- combining the throughput and space efficiency of striping with the
flexibility of mirroring. Ref 1, ref 2.
- Adaptive sorting with applications for sort join. Often there will be some structure in the order of relations which may be exploited when sorting according to one or more attributes. This project considers such adaptive external sorting algorithms that perform better on inputs that are "partly sorted", and their application to sort join. Reference (by Ostlin, Pagh, and Thorup) not yet published.