In this tutorial you are going to learn about Data Replication in Distributed Systems.
Introduction
Data replication is the process of copying data to multiple locations, such as different computers or servers, to improve the availability of the data.
Replica Management: Replicas are of great importance in distributed database systems. Different sites hold their own copies of data objects. Site failures are common, so keeping replicas improves the availability of the system, but it also introduces the problem of data inconsistency. Whenever replicas exist, the copies at all sites must be kept consistent with one another; this property is called mutual consistency.
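To make the availability claim concrete, here is a minimal sketch. It assumes that sites fail independently and that a data object is readable as long as at least one replica is up. The function name `read_availability` and the 0.9 per-site availability are illustrative assumptions, not values from this tutorial.

```python
def read_availability(site_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent sites is up.

    Assumes independent site failures (a simplification).
    """
    if replicas < 1:
        raise ValueError("at least one replica is required")
    probability_all_down = (1 - site_availability) ** replicas
    return 1 - probability_all_down


# Hypothetical per-site availability of 0.9, with 1 to 3 replicas.
for n in (1, 2, 3):
    print(n, round(read_availability(0.9, n), 3))
```

Under these assumptions each extra replica shrinks the chance that every copy is unavailable, which is the gain that replica management pays for with the extra work of keeping copies mutually consistent.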
Main features of Data Replication:
- Increased reliability.
- Lower access time.
- No directory management.
- Easy load balancing.
Group Communication: It occurs when one source process sends a message to a group of processes at once. Types of group communication include:
- Broadcast – Destination is every process.
- Multicast – Destination is a designated group.
- Unicast – Destination is a single process.
Example: Group communication is used in fault tolerance based on replicated servers, locating objects in distributed servers, and updating replicated data.
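The three destination types can be sketched with a toy in-memory network. This is only an illustration: the `Process` and `Network` classes, the group name `replicas`, and the message strings are hypothetical and do not correspond to any real messaging library.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """A hypothetical process that records every message it receives."""
    name: str
    inbox: list = field(default_factory=list)

    def receive(self, message: str) -> None:
        self.inbox.append(message)


class Network:
    """Toy in-memory network; the API is illustrative only."""

    def __init__(self, processes):
        self.processes = {p.name: p for p in processes}
        self.groups = {}  # group name -> set of member process names

    def join(self, group: str, name: str) -> None:
        self.groups.setdefault(group, set()).add(name)

    def unicast(self, target: str, message: str) -> None:
        # Destination is a single process.
        self.processes[target].receive(message)

    def multicast(self, group: str, message: str) -> None:
        # Destination is every member of one designated group.
        for name in sorted(self.groups.get(group, ())):
            self.processes[name].receive(message)

    def broadcast(self, message: str) -> None:
        # Destination is every process on the network.
        for process in self.processes.values():
            process.receive(message)


procs = [Process("p1"), Process("p2"), Process("p3")]
net = Network(procs)
net.join("replicas", "p1")
net.join("replicas", "p2")

net.unicast("p3", "ping")
net.multicast("replicas", "update x=1")
net.broadcast("shutdown")

for p in procs:
    print(p.name, p.inbox)
```

Updating replicated data maps naturally onto multicast here: only the processes that joined the `replicas` group see the update, while the broadcast reaches all three.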
Objectives of Data Replication:
- Data replication is used to increase the availability of data.
- It is used to speed up query evaluation.
- Data integration: One-point storage and access.
- Real time streaming analytics: Fraud detection and real time stock trades.
- Data change subscription: Event driven architecture.
- Primary and secondary replication: High availability, read and write separation, disaster recovery.
Types of Data Replication:
- Synchronous replication: The replica is modified immediately whenever changes are made to the relation (table). So, there is no difference between the original data and the replica.
- Asynchronous replication: The replica is modified only after the commit has been issued on the database, so it may briefly lag behind the original data.
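The difference between the two types can be sketched as follows. This is a simplified single-process model, not a real database protocol: the class names `SyncPrimary` and `AsyncPrimary`, the `ship_log` method, and the `balance` key are all assumptions made for illustration.

```python
from collections import deque


class Replica:
    """Holds a copy of the data and applies changes sent by the primary."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class SyncPrimary:
    """Every write is applied to all replicas before write() returns."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        for replica in self.replicas:
            replica.apply(key, value)


class AsyncPrimary:
    """Writes commit locally; replicas catch up when the log is shipped."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = deque()  # committed changes not yet sent to replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def ship_log(self):
        # Hypothetical background step that propagates committed changes.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.apply(key, value)


sync_replica = Replica()
sync_primary = SyncPrimary([sync_replica])
sync_primary.write("balance", 100)
sync_matches = sync_replica.data == sync_primary.data

async_replica = Replica()
async_primary = AsyncPrimary([async_replica])
async_primary.write("balance", 100)
stale_before_ship = async_replica.data.get("balance")  # not yet propagated
async_primary.ship_log()
fresh_after_ship = async_replica.data.get("balance")

print(sync_matches, stale_before_ship, fresh_after_ship)
```

In this sketch the synchronous replica matches the primary as soon as the write returns, while the asynchronous replica has no value for `balance` until `ship_log` runs. That window is where readers of the replica can observe stale data.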
Pros of Data Replication | Cons of Data Replication |
1. It is highly customizable. | 1. Application invasiveness. |
2. Reusable for application. | 2. Inconsistent. |
3. High availability. | 3. Extra development. |
4. Easy to extend. | 4. Extra domain knowledge. |
5. Deployment flexibility. | 5. Extra latency. |
6. No extra write latency. | 6. At most once semantic. |
Replication Schemes:
Full Replication: A complete copy of the database is stored at every site in the communication network.
Advantages of Full Replication | Disadvantages of Full Replication |
1. High availability of data, as the database is available at every location. | 1. Concurrency control is difficult to achieve in full replication. |
2. Faster execution of queries. | 2. Update operations are slower. |
No Replication:
In no replication, each fragment is stored at exactly one location.
Advantages of No Replication | Disadvantages of No Replication |
1. Concurrency control is simpler, because each fragment has only one copy. | 1. Poor availability of data. |
2. Easy recovery of data. | 2. Query execution slows down when multiple clients access the same server. |
Partial Replication:
Partial replication means that only some fragments of the database are replicated. The number of replicas created for each fragment depends on the importance of the data in that fragment.
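The three schemes differ only in how many sites hold a copy of each fragment, which can be sketched as a small placement function. This is a minimal illustration: the function `place_fragments`, the `importance` mapping (desired number of copies per fragment), the round-robin placement, and the fragment and site names are all assumptions, not part of any particular database system.

```python
def place_fragments(fragments, sites, scheme, importance=None):
    """Return {fragment: [sites holding a copy]} for a replication scheme.

    `importance` maps a fragment to the number of copies wanted. It is a
    hypothetical knob used only by the "partial" scheme.
    """
    placement = {}
    for i, fragment in enumerate(fragments):
        if scheme == "full":
            copies = len(sites)          # every site holds every fragment
        elif scheme == "none":
            copies = 1                   # exactly one site per fragment
        elif scheme == "partial":
            wanted = (importance or {}).get(fragment, 1)
            copies = max(1, min(wanted, len(sites)))
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        # Round-robin start so fragments are spread across the sites.
        placement[fragment] = [sites[(i + k) % len(sites)] for k in range(copies)]
    return placement


fragments = ["orders", "customers", "logs"]
sites = ["A", "B", "C"]

full = place_fragments(fragments, sites, "full")
none = place_fragments(fragments, sites, "none")
partial = place_fragments(
    fragments, sites, "partial", importance={"orders": 3, "customers": 2}
)

print(full)
print(none)
print(partial)
```

Here the more important `orders` fragment ends up on all three sites, `customers` on two, and `logs` on one, which mirrors the idea that the replica count follows the importance of the data.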
This article on Data Replication in Distributed Systems is contributed by Hemalatha P.