Principles Of Distributed Database Systems Exercise Solutions Jun 2026

Mastering the Principles of Distributed Database Systems: A Comprehensive Guide to Exercise Solutions

Distributed Database Systems (DDBS) represent a core pillar of modern data management. From Google Spanner to Amazon DynamoDB, the principles of fragmentation, replication, distributed query processing, and concurrency control are essential knowledge for any data professional. However, the theoretical rigor of courses like Principles of Distributed Database Systems (often based on the classic textbook by Özsu and Valduriez) means that exercises can be challenging.

This article provides a structured approach to solving common exercises in this domain. We will break down solutions by topic, explain the underlying reasoning, and offer strategies to tackle problems ranging from fragmentation to distributed deadlock detection.

1. Data Fragmentation: Horizontal, Vertical, and Hybrid

One of the first exercises students encounter involves designing correct and complete fragmentation schemas.

The Problem Type

You are given a global relation (e.g., EMPLOYEE(EmpID, Name, DeptID, Salary, ManagerID)) and a set of applications/queries. Your task is to propose horizontal, vertical, or hybrid fragments.

Key Principles for Solutions

Horizontal Fragmentation (HF): Uses selection predicates (e.g., DeptID = 10). Ensure completeness (every tuple goes to some fragment) and disjointness (a tuple belongs to at most one HF fragment, unless replication is used).
Vertical Fragmentation (VF): Uses projection over attribute subsets. Every fragment must include the primary key so that the original relation can be reconstructed by joining the fragments on that key without losing or inventing tuples (the lossless-join property).
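The completeness and disjointness rules for horizontal fragments can be spot-checked on sample data. The following is a minimal sketch, not a proof: it only tests the rows it is given, and the helper name `check_horizontal`, the fragment names, and the sample EMPLOYEE rows are all made up for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]

def check_horizontal(rows: List[Row], predicates: Dict[str, Predicate]) -> Dict[str, List[Row]]:
    """Return the rows that violate completeness or disjointness."""
    uncovered, duplicated = [], []
    for row in rows:
        hits = [name for name, pred in predicates.items() if pred(row)]
        if not hits:
            uncovered.append(row)    # completeness violated: row is in no fragment
        elif len(hits) > 1:
            duplicated.append(row)   # disjointness violated: row is in several fragments
    return {"uncovered": uncovered, "duplicated": duplicated}

# Hypothetical sample tuples of EMPLOYEE(EmpID, Name, DeptID, Salary, ManagerID).
employees = [
    {"EmpID": 1, "Name": "Ana", "DeptID": 10, "Salary": 5000, "ManagerID": None},
    {"EmpID": 2, "Name": "Bo", "DeptID": 20, "Salary": 4200, "ManagerID": 1},
    {"EmpID": 3, "Name": "Cy", "DeptID": 30, "Salary": 3900, "ManagerID": 1},
]

# A predicate and its negation are complete and disjoint by construction.
good = {"E1": lambda r: r["DeptID"] == 10, "E2": lambda r: r["DeptID"] != 10}
# This scheme forgets every department other than 10 and 20.
bad = {"E1": lambda r: r["DeptID"] == 10, "E2": lambda r: r["DeptID"] == 20}

good_report = check_horizontal(employees, good)
bad_report = check_horizontal(employees, bad)
print(good_report)
print(bad_report)
```

In an exam answer the same argument is made on the predicates themselves: a predicate together with its negation covers every tuple exactly once.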

Sample Exercise & Solution

Exercise: Given PROJECT(Pno, Pname, Budget, Location). Applications:

Query on Budget and Location for projects with Budget > 100000.
Query on Pno, Pname only for reporting.

Solution:

Vertical Fragmentation: F1 = {Pno, Pname} and F2 = {Pno, Budget, Location}. The key Pno is present in both.
Horizontal Fragmentation on F2: F2a = σ_{Budget > 100000} F2, F2b = σ_{Budget ≤ 100000} F2.
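To see that this hybrid scheme loses nothing, it can help to fragment a few tuples and rebuild the relation. The sketch below uses plain Python lists of dictionaries; the sample PROJECT rows and the helper names `project` and `select` are invented for illustration.

```python
# Hypothetical sample tuples of PROJECT(Pno, Pname, Budget, Location).
projects = [
    {"Pno": "P1", "Pname": "Alpha", "Budget": 150000, "Location": "Paris"},
    {"Pno": "P2", "Pname": "Beta", "Budget": 100000, "Location": "Rome"},
    {"Pno": "P3", "Pname": "Gamma", "Budget": 80000, "Location": "Oslo"},
]

def project(rows, attrs):
    """Relational projection: keep only the listed attributes."""
    return [{a: r[a] for a in attrs} for r in rows]

def select(rows, pred):
    """Relational selection: keep only the rows satisfying pred."""
    return [r for r in rows if pred(r)]

# Vertical fragmentation: both fragments carry the key Pno.
f1 = project(projects, ["Pno", "Pname"])
f2 = project(projects, ["Pno", "Budget", "Location"])

# Horizontal fragmentation of F2 on complementary predicates.
f2a = select(f2, lambda r: r["Budget"] > 100000)
f2b = select(f2, lambda r: r["Budget"] <= 100000)

# Reconstruction: union the horizontal fragments, then join with F1 on Pno.
f2_rebuilt = f2a + f2b
by_pno = {r["Pno"]: r for r in f2_rebuilt}
rebuilt = [{**r, **by_pno[r["Pno"]]} for r in f1]

print(rebuilt == projects)
```

The boundary tuple with Budget exactly 100000 lands in F2b, which is why the second predicate uses ≤ rather than <.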

Solution Strategy: Always start by identifying the primary key. For vertical fragmentation, check that every attribute appears in at least one fragment. For horizontal fragmentation, ensure the predicates are complete and mutually exclusive.

2. Distributed Query Processing: Semi-Join Reduction

A classic exercise is to optimize a distributed join between two relations stored at different sites using semi-joins.

The Problem Type

Relation R at Site 1, relation S at Site 2. You need to answer R ⋈ S while minimizing communication cost.

Key Principle

A semi-join (R ⋉ S) keeps only the tuples of R that join with at least one tuple of S. To compute it, the site of S projects its join attributes and ships only those to the site of R, which reduces R before it is shipped for the full join.

Sample Exercise & Solution

Exercise:

R(A,B) with 10,000 tuples, size 100 bytes each.
S(A,C) with 5,000 tuples, size 80 bytes each.
Join attribute: A.
Only 10% of the tuples in R have an A value that appears in S.A.
Cost = number of bytes shipped between sites.

Compute cost without semi-join: Ship entire R (10,000 × 100 bytes = 1 MB) or S (5,000 × 80 bytes = 0.4 MB). Better to ship S to R’s site: 0.4 MB.

Compute semi-join solution:

Site 2 projects S[A] (assuming A takes 10 bytes per value → 5,000 × 10 = 50 KB) and ships it to Site 1.
Site 1 computes the semi-join R' = R ⋉ S. Only 10% of R tuples match → 1,000 tuples × 100 bytes = 100 KB.
Site 1 ships R' to Site 2.
Site 2 performs the final join R' ⋈ S locally, at no communication cost.
Total cost = 50 KB + 100 KB = 150 KB, much less than 400 KB.
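The arithmetic in this solution can be reproduced with a few lines of Python. This is a sketch under the exercise's own cost model (only bytes shipped count, local work is free); the function name `semi_join_costs` and its parameter names are hypothetical, and the projection S[A] is costed at one value per S tuple, as in the solution above.

```python
def semi_join_costs(r_tuples, r_bytes, s_tuples, s_bytes, key_bytes, r_match_fraction):
    """Bytes shipped by the naive plan and by the semi-join plan."""
    # Naive plan: ship the smaller relation to the other site.
    naive = min(r_tuples * r_bytes, s_tuples * s_bytes)
    # Semi-join plan, step 1: ship the projection S[A] to R's site.
    projection = s_tuples * key_bytes
    # Semi-join plan, step 2: ship only the matching R tuples to S's site.
    reduced_r = round(r_tuples * r_match_fraction) * r_bytes
    return {
        "naive": naive,
        "projection": projection,
        "reduced_r": reduced_r,
        "semi_join": projection + reduced_r,
    }

# Figures from the exercise: R has 10,000 x 100 B, S has 5,000 x 80 B,
# A is 10 B wide, and 10% of R's tuples find a match in S.
costs = semi_join_costs(10_000, 100, 5_000, 80, 10, 0.10)
print(costs)
```

Raising `r_match_fraction` shows where the plan stops paying off: at 0.35 and above, the semi-join ships at least as many bytes as simply sending S.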

Solution Strategy: Always compare the total cost of the semi-join plan (shipping the projection plus the reduced relation) against the naive transfer. The semi-join wins when only a small fraction of the tuples participate in the join.

3. Distributed Concurrency Control: Locking and Timestamp Ordering

Exercises often present a schedule of operations across sites and ask: is this schedule serializable under 2PL (Two-Phase Locking) or T/O (Timestamp Ordering)?

The Problem Type

Given read and write operations from transactions T1, T2, T3 on data items X, Y, Z stored at different sites, determine whether the schedule is conflict-serializable and whether the protocol would allow it.

Key Principles for Solutions

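Centralized 2PL: All locks are managed by one lock manager. Check that each transaction has a growing phase followed by a shrinking phase, i.e., it acquires no lock after releasing one.
Distributed 2PL: Each site manages locks for its own data. Must still obey 2PL rules globally.
Timestamp Ordering: Ensure that for every pair of conflicting operations (R/W, W/R, W/W), the operation of the transaction with the smaller timestamp executes first. If an operation arrives too late, abort its transaction and restart it with a new timestamp.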
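The timestamp ordering rule can be traced mechanically on a given schedule. Below is a simplified sketch of a basic T/O check, assuming positive integer timestamps and a single logical scheduler. It only reports which transactions would be rejected; it does not undo the earlier operations of an aborted transaction or restart it. The names `Item` and `basic_to` and the sample schedule are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Item:
    read_ts: int = 0    # largest timestamp that has read the item
    write_ts: int = 0   # largest timestamp that has written the item

def basic_to(schedule):
    """schedule: list of (timestamp, op, item_name) with op in {'R', 'W'}.
    Returns (accepted operations, timestamps of aborted transactions)."""
    items = {}
    accepted, aborted = [], set()
    for ts, op, name in schedule:
        if ts in aborted:
            continue  # later operations of an aborted transaction are ignored
        item = items.setdefault(name, Item())
        if op == "R":
            if ts < item.write_ts:          # a younger transaction already wrote
                aborted.add(ts)
                continue
            item.read_ts = max(item.read_ts, ts)
        else:
            if ts < item.read_ts or ts < item.write_ts:  # write arrives too late
                aborted.add(ts)
                continue
            item.write_ts = ts
        accepted.append((ts, op, name))
    return accepted, aborted

# T1 has timestamp 1, T2 has timestamp 2.
# R2(X) W1(X) W2(Y) R1(Y): T1's write on X comes after a younger read.
accepted, aborted = basic_to([(2, "R", "X"), (1, "W", "X"), (2, "W", "Y"), (1, "R", "Y")])
print(accepted, aborted)
```

When solving by hand, the same bookkeeping applies: track the read and write timestamp of each item and compare every incoming operation against them.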