Relational databases such as MySQL or Oracle have associated query languages such as SQL. These languages consist of statements that read, write or modify records within the database. One way to maintain a replication log on the master node is to simply record every write statement (e.g. INSERT, UPDATE or DELETE) that is received from clients. Note that we don't need to record statements that only perform reads, since they don't mutate the database. The logged SQL statements can then be sent to the followers, which execute them on their own replica of the data to get in sync with the data on the master.
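To make the idea concrete, here is a minimal sketch of statement-based replication in Python. It uses the standard-library sqlite3 module with in-memory databases as stand-ins for the master and a follower. The class names, the `users` table and the `catch_up` method are hypothetical and chosen only for illustration; a real database ships its log over the network and handles failures, which this sketch ignores. It also assumes every node starts with the same schema.

```python
import sqlite3

# Assumption: every node is created with the same schema.
SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE")


class Master:
    """Hypothetical master node: executes statements and logs the writes."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(SCHEMA)
        self.replication_log = []  # ordered list of (sql, params) tuples

    def execute(self, sql, params=()):
        rows = self.db.execute(sql, params).fetchall()
        self.db.commit()
        # Only statements that mutate data are logged; reads are skipped.
        if sql.lstrip().upper().startswith(WRITE_KEYWORDS):
            self.replication_log.append((sql, params))
        return rows


class Follower:
    """Hypothetical follower node: replays the master's log in order."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(SCHEMA)
        self.applied = 0  # number of log entries already replayed

    def catch_up(self, log):
        # Replay only the entries this follower has not seen yet.
        for sql, params in log[self.applied:]:
            self.db.execute(sql, params)
        self.db.commit()
        self.applied = len(log)


master = Master()
follower = Follower()

master.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
master.execute("UPDATE users SET name = ? WHERE name = ?", ("bob", "alice"))
master.execute("SELECT * FROM users")  # a read, so it is not logged

follower.catch_up(master.replication_log)
```

After `catch_up` returns, the follower holds the same row as the master, and the log contains only the two write statements.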
Problems
This approach to replication may sound simple and effective, but it comes with its own set of problems. Some of these are:
SQL statements can contain non-deterministic functions such as NOW(), which returns the current time, or RAND(), which returns a random number. These functions are likely to evaluate to different values on different nodes. This issue can be overcome by having the master replace each call to a non-deterministic function with a fixed value before passing the statement to its followers.
If a SQL statement involves an auto-incrementing column or depends on data already present in the database, then it must be executed on each follower in the same order as it was executed on the master. This can restrict the concurrent execution of multiple transactions on the system.
Database constructs such as triggers, stored procedures, or user-defined functions can produce different outcomes on different follower nodes. Care must be taken to ensure that statements with side effects are deterministic.
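The fix described for the first problem, where the master replaces non-deterministic calls with fixed values, can be sketched as follows. The function name `pin_nondeterministic_calls` and its parameters are hypothetical. The sketch uses a naive regular expression that would also rewrite NOW() or RAND() appearing inside a string literal, so it illustrates the idea rather than being a safe SQL rewriter.

```python
import random
import re
from datetime import datetime, timezone


def pin_nondeterministic_calls(sql, now=None, rng=random):
    """Replace NOW() and RAND() with literals chosen once, on the master."""
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("'%Y-%m-%d %H:%M:%S'")

    # Every NOW() in the statement gets the same timestamp literal.
    sql = re.sub(r"\bNOW\(\s*\)", lambda m: timestamp, sql, flags=re.IGNORECASE)

    # Each RAND() call gets its own freshly drawn value.
    sql = re.sub(
        r"\bRAND\(\s*\)",
        lambda m: repr(rng.random()),
        sql,
        flags=re.IGNORECASE,
    )
    return sql


original = "INSERT INTO events (created_at, score) VALUES (NOW(), RAND())"
logged = pin_nondeterministic_calls(original)
print(logged)
```

The master would write `logged`, rather than `original`, to the replication log. Because the rewritten statement contains only literals, every follower that replays it stores the same values as the master.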