batur.dev

Data Consistency, Anomalies, and the Role of Isolation - Chapter 1

image 🌄 Image source: Unsplash

E nsuring data integrity in modern software systems is not only related to safeguards at the application layer, but also directly tied to the concurrency controls provided by the underlying database system. Especially in high-traffic, multi-user, and event-driven systems (such as IoT-based smart building management systems, which we’ll use as the basis for our examples), concurrent access to the same data by multiple transactions is inevitable. This brings concepts like data consistency, concurrency anomalies, and isolation levels into focus.

In this three-part article series, I plan to explore in detail the most common anomalies that can arise from inter-transaction interactions, the isolation levels provided by database systems to mitigate them, and additional measures that can be taken at the application level.

Throughout the article, I designed the examples to reflect the dynamic nature of real-time systems. While the examples are mostly based on IoT smart building systems scenarios, I’ve grounded the explanations in system design principles, making them easily applicable to similar issues across different domains.

So let’s begin by discussing data consistency and then move on to anomalies and isolation levels from a high-level perspective before diving into each anomaly type in detail.

🔍 Data Consistency and the ACID Principle

A transaction consists of a series of read and write operations on a database. The ACID properties assure the safe and consistent execution of these transactions:

This article primarily focuses on Isolation, which is crucial for preventing anomalies in concurrent scenarios.

🧠 What Are Anomalies?

If proper safeguards are not in place when multiple transactions access the same data, the following inconsistencies (anomalies) may arise:

  1. Dirty Read: A transaction reads data that another transaction has written but not yet committed.
  2. Non-Repeatable Read: A transaction reads the same data twice and gets different results.
  3. Phantom Read: Re-running the same query within a transaction yields new rows that didn’t exist in the initial result.
  4. Lost Update: Two transactions overwrite each other's changes. So one of the update is lost.
  5. Write Skew: Two transactions make conflicting but individually valid changes without reading each other's data.
  6. Read Skew: A transaction reads a partially inconsistent state due to concurrent updates from another transaction.

Each anomaly arises under specific conditions and requires different mitigation strategies—both at the database and the application layer (e.g., using ORM tools like Hibernate).

🔒 Isolation Levels and Their Relationship to Anomalies

The SQL standard defines four core isolation levels for database transactions:

Isolation Level Prevented Anomalies Performance Impact
Read Uncommitted Very high performance
Read Committed Dirty Read High
Repeatable Read Dirty Read, Non-Repeatable Read Medium / Low
Serializable All anomalies (in theory) Low (high overhead)

Some modern databases attempt to provide strong isolation with lower cost using techniques like Serializable Snapshot Isolation (SSI)—PostgreSQL is a notable example. However, even SSI isn’t a silver bullet—every approach comes with trade-offs.

🔧 What Can We Do at the Application Layer?

While database isolation levels offer a solid foundation in many cases, additional strategies may be required at the application level to fully prevent certain anomalies:

Okay then, we’ve introduced the general concepts, now we can dive into the details of specific anomaly types.

📌 Dirty Read

📖 Definition

Dirty Read occurs when a transaction reads data that has not yet been committed by another transaction that may the data later be rolled back. This leads to faulty decisions based on temporary states.

🧪 Example Scenario

Imagine a smart building system where sensors automatically trigger ventilation appliances based on temperature readings. At a certain time t, the following situation can illustrate a dirty read anomaly:

Result: The system has acted on data that never officially existed—leading to unnecessary energy usage or inconsistencies in other system components.

⚠️ Why It Occurs

Dirty reads are only possible under the Read Uncommitted isolation level, which allows access to uncommitted (dirty) data.

🔒 Resolution via Isolation Levels

Isolation Level Effect on Dirty Read
Read Uncommitted May occur
Read Committed ✅ Prevented
Repeatable Read ✅ Prevented
Serializable ✅ Prevented

Most modern databases (e.g., PostgreSQL, Oracle, SQL Server) default to Read Committed, eliminating dirty read risk.

🛠️ Additional Measures at Application Level

Since dirty reads are generally prevented through isolation levels, additional safeguards are rarely needed at the application layer. However, attention is needed in these cases:

⚖️ Pros and Cons

When raising the isolation level to prevent dirty reads, the following trade-off applies:

Advantage Disadvantage
Preserves data integrity Minimal performance cost (negligible in most cases)
Hides rolled-back changes from other transactions

Note: Dirty Read is the easiest anomaly to prevent and is rarely seen in modern systems. However, it may still be deliberately used in speed-focused, low-integrity systems (e.g., temporary caches).

📌 Non-Repeatable Read

📖 Definition

A Non-Repeatable Read occurs when a transaction reads the same value twice but gets different results. This happens if another transaction modifies and commits the value between the two reads.

🧪 Example Scenario

Continuing with the smart building example, but this time, sensors are monitoring the energy consumption of devices:

Result: The same data is queried twice within a single transaction, but two different results are returned. This can cause inconsistencies in decision-making or business workflows.

⚠️ Why It Occurs

This anomaly arises when other transactions are allowed to update and commit changes to the same data during an ongoing transaction.

🔒 Resolution via Isolation Levels

Isolation Level Effect on Non-Repeatable Read
Read Uncommitted ❌ Not prevented
Read Committed ❌ Not prevented
Repeatable Read ✅ Prevented
Serializable ✅ Prevented

Note: In ORM tools like Hibernate, @Transactional typically defaults to Read Committed, meaning Non-Repeatable Reads are still possible.

🛠️ Additional Measures at Application Level

  1. Raise the isolation level: Can be addressed in frameworks like Spring Boot via @Transactional(isolation = Isolation.REPEATABLE_READ).

  2. Versioning (Optimistic Locking): Adding a version column to the table allows detection of transaction conflicts. However, this method is more effective for write-write conflicts.

  3. Local Caching: Storing initially read values in memory during a transaction and checking them at the end can help in some cases, though it is not a general solution.

⚖️ Pros and Cons

When addressing the issue by setting the isolation level to higher:

Advantage Disadvantage
Ensures consistency in decision processes Increased locking, potential deadlocks
Safe repeated access to the same data Performance impact, especially under high load

🎯 End of the Part 1

In this first part, we covered the most common types of anomalies and the isolation levels used to prevent them. We then dove into the details of Dirty Read and Non-Repeatable Read anomalies.

In the next part, I plan to explore Write Skew, Read Skew, Phantom Read, and Lost Update in more detail. Thanks for reading.

Until next time, take care!

📚 Further Reading

Resources that I used in this article and that I think you can benefit from too:

  1. Martin Kleppmann Designing Data-Intensive Applications.
    https://www.oreilly.com/library/view/designing-data-intensive-applications/9781491903063/

  2. PostgreSQL DocumentationTransaction Isolation and Consistency.
    https://www.postgresql.org/docs/current/transaction-iso.html

  3. SQL Server DocumentationIsolation Levels in SQL Server.
    https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql

  4. Hibernate ORM DocumentationTransaction and Concurrency Control.
    https://docs.jboss.org/hibernate/orm/current/userguide/html_single/Hibernate_User_Guide.html#locking

#database #distributed systems #software development