Introduction
In this article, we’ll learn about the methods to find the relationships between people.
We were finding the relation between people in previous article, where we were given some basic information about them. We took the whole set of people, and divided them into disjoint subsets based on their relationships.
Does Haaris relate to Ahmad, if Haaris is the cousin of Saad and Saad is the cousin of Ahmad? Ofcourse, yes. If you are asked to tell instantly that if their relation exist or not. Then, you can do this using following methods.
Solution through 2D array of Booleans
Storing the basic relation information about people in 2D array of Booleans will give the answer in constant time. Look at the Figure-1. People’s name are written in both columns and rows. Crossing these rows and columns make cells of this matrix. T (True) represents that relation exists.
The diagonal of the matrix will be “T”, as every person is related to himself.
Diagonal: All the cells at locations (1, 1), (2, 2), (3, 3)…(n, n). Where n=1, 2, 3…
- Haaris is the cousin of Saad = T at cell (1, 2).
- Saad is the cousin of Ahmad = T at cell (2, 3).
Based on above given information:
- Haaris relates to Ahmad = T will come at cell (1, 3).
Problem: If you were given 1000 people, then matrix’s size will be 1000 x 1000. But what if you were given 1 million names, then size will be million by million. And we know that we’ll run out of memory space. This method is fast though, but not good for memory allocation. Prioritize space over speed, based on this we’ll use another approach.
Solution through Union/Find Method
Suppose the equivalence relation is defined over the five-element set {a1, a2, a3, a4, a5}. We have a pair a1-a1, a1-a2, a1-a3, a1-a4, a1-a5, a2-a2, a2-a3 etc. There are 25 pairs of elements, each of which is related or not (30 pairs – 5 self-pairs = 25). These are the total pairs and we may not have as much relations. The size of array with 5 elements will be 5×5=25. Suppose, we’re given the following relations:
- a1 R a2
- a3 R a4
- a5 R a1
- a4 R a2
So, we made the nodes and connection between them is showing relationship.
The input is initially a collection of n sets, each with one element. Means, if there are 1000 people then there will be 1000 sets, each set with one person. This initial representation is that all relations (except reflexive relations) are false. Now mathematically speaking, each set has a different element so that Si ∩ Sj = Ø which makes the sets disjoint.
Operations in the Sets
There are two permissible operations in these sets i.e. find and union.
- Find returns the name of the set (equivalence class) that contains a given element, i.e., Si = find(a).
- Union merges two sets to create a new set Sk = Si U Sj.
If we want to add the relation a R b, there is need to see whether a and b are already related. If we want to add the relation a R b, we need to see whether a and b are already related. We check whether they are in the same set by performing find on both a and b. If they are not in the same set, we will apply union to merge the two sets into a new set. This process is called the Union/Find algorithm.
We can assume that elements are numbered sequentially from 1 to n. Initially we have Si = {i} for i = 1 through n. The person will be recognized through number, it makes things a lot easier. Secondly, the name of the set returned by find is fairly arbitrary. All that really matters is that find(x) = find(y) if and only if x and y are in the same set.
Any sequence of at most m finds and up to n-1 unions will require time proportional to (m + n).
A set contains unique elements, and a tree represents a set. Each element in a tree shares the same root, so the root names the set.
Due to presence of many sets, there will be a collection of trees which is called forest. The trees used for the sets are not necessarily binary.
- Union: To execute the union operation in two sets, we merge the two trees of these sets in such a manner that the root of one tree points to the root of other.
- Find: In case of find(x), we find this x in a tree in the forest. When this x is found in a tree the find returns the number at root node (the name of the set) of that tree
Let’s come to the examples.
Example No. 1
We’ve numbers from 1 to 8 in Figure-3, these numbers can be the names of people. Each number is acting as a set.
Now, let’s perform the union.
Before union(5, 6), these were two sets. But after union, it has become one set with the root/set name “5”.
See Figure-8, it’s not like that we combine “4” and “5” directly. First, we’ve to look that in which sets “4” and “5” lie. And then, we combine those sets.
Typical tree traversal (like inorder, preorder or postorder) is not required. So there is no need for pointers to point to the children. Instead, we need a pointer to parent, as it’s an up-tree. Parent pointers can be stored in an array. In the array, we will set the parent of root to –1.
Parent[i] = -1
Example No. 2
Initialization
We’ll fill the array with “-1” using for loop, which represents a number as root.
for ( i = 0; i < n ; i ++)
Parent [i] = -1 ;
The numbers at the bottom of array in Figure-9 represent the index of array.
Union Operation
The union finds the roots of “i” and “j”. If they are different, it will merge them.
root_i = find(i); // find the set having “i”
root_j = find(j); // find the set having “j”
if (root_i != root_j) // if they are different
parent[root_j] = root_i; //merge those sets
In the above case, root_i becomes the root of final set.
Find Operation
In case of find(i), this operation looks for the parent of element “i” or the set that contains that element “i”.
// traverse to the root (-1)
for(j=i; parent[j] >= 0; j=parent[j])
;
return j;
We used a ‘for loop. This ‘for loop’ starts from the position that given to the find function. Here it is 8. The condition in the for loop was that as long as parent [j] is greater than zero, set this parent [j] to j. Now we execute the loop with the value of j equal to 8. First of all, the loop goes to position 8 and looks for the value of parent of 8. This value is 7, which sets the value of j to 7 and goes to position 7.
At that position, the value of parent of 7 is 5. It goes to position 5. At position 5, the value of parent is 3. So the loop sets the value of j equal to 3 and goes to the position 3. Here it finds that the parent of 3 is –1 i.e. less than zero so the loop ends. This position 3 is the root and name of the set as parent of it is –1. Thus the name of the set that contains 8 is 3. The find will return 3 which mean that 8 is a member of set 3.
Running Time Analysis
- Union is clearly a constant time operation.
- Running time of find(i) is proportional to the height of the tree containing node i.
- This can be proportional to n in the worst case (but not always).
To make the disjoint set data structure efficient, please refer the Union-by-size/weight and union-by-height methods article.
REFERENCE: CS301 Handouts (page 388 to 403)