FU-VIASM Joint Mini-course: Database Joins: Worst-Case and Beyond

Time: 9:00 – 11:00, Jan 05 – Jan 09, 2015

+ 9:00 – 9:50: lecture

+ 9:50 – 10:10: break

+ 10:10 -11:00: lecture

Location: VIASM Lecture Hall C2.

Lecturer: Hung Q. Ngo (State University of New York at Buffalo)

This is the first joint academic activity between FPT University and VIASM.

Abstract:

The join query evaluation problem is a fundamental problem in database management. In theory, the problem has been studied extensively in the past 40+ years. One can cast a wide variety of problems from seemingly different areas such as SAT solving, permanent evaluation, list decoding, exact inference in probabilistic graphical models as join evaluation and cardinality computation. In practice, a variety of algorithms and heuristics have been widely implemented in relational database management systems (RDBMS), dating back to IBM System R. Surprisingly, known algorithms for joins were sub-optimal until very recently. This mini-course covers some of the core ideas behind a new class of join algorithms that not only are optimal in theory, but also work well in practice.

Tentative Outline:

1. The natural join evaluation problem. Connections to PGMs, CSP, logic, permanent, list-decoding, matrix multiplication, sub-graph enumeration, etc.

2. Problem formulation under different settings: truth-table encoding, sparse encoding, functional encoding, and examples.

3. Basic bounds and a connection to geometric inequalities. Loomis-Whitney, Bollobas-Thomassen, Grohe-Marx and the linear programming bound, NPRR algorithm.

4. Hypergraph widths and acyclicity notions. Tree decomposition, treewidth, hypertree width, fractional hypertree width, adaptive width and submodular width. Berge-acyclicity, alpha/beta/gamma-acyclicity.

5. Variable elimination family of algorithms. VE for graphical models. Davis-Putnam procedure in proof complexity. Characterizing widths using elimination ordering.

6. The message passing family of algorithms. Yannakakis algorithm for join and logic. Message passing for PGMs. The fractional hypertree width bound.

7. Beyond worst-case with Minesweeper. Minesweeper join algorithm. From join evaluation to geometric coverage

8. Beyond worst-case with Tetris. Tetris join algorithm. Connection to Davis-Putnam-Logemann-Loveland (DPLL) with clause-learning for SAT solvers

9. Open problems. Certificates for cardinality estimation, capturing functional dependencies, etc.

Watch the lectures online:

Registration: Closed. Deadline for registration: 31/12/2014.