Data Consistency, Anomalies, and the Role of Isolation - Chapter 1
Ensuring data integrity in modern software systems is not only a matter of safeguards at the application layer; it is also directly tied to the concurrency controls provided by the underlying database system. Especially in high-traffic, multi-user, and event-driven systems (such as IoT-based smart building management systems, which we’ll use as the basis for our examples), concurrent access to the same data by multiple transactions is inevitable. This brings concepts like data consistency, concurrency anomalies, and isolation levels into focus.
In this three-part article series, I plan to explore in detail the most common anomalies that can arise from inter-transaction interactions, the isolation levels provided by database systems to mitigate them, and additional measures that can be taken at the application level.
Throughout the article, I’ve designed the examples to reflect the dynamic nature of real-time systems. While the examples are mostly based on IoT smart building system scenarios, I’ve grounded the explanations in system design principles, making them easily applicable to similar issues across different domains.
So let’s begin by discussing data consistency and then move on to anomalies and isolation levels from a high-level perspective before diving into each anomaly type in detail.
🔍 Data Consistency and the ACID Principle
A transaction consists of a series of read and write operations on a database. The ACID properties ensure that these transactions execute safely and consistently (illustrated in the sketch after the list):
- Atomicity: The transaction is either completed entirely (commit) or rolled back with no effect.
- Consistency: Transactions transition the database from one valid state to another.
- Isolation: Concurrent transactions should not interfere with each other.
- Durability: Committed transactions are permanently stored.
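To make these properties concrete, here is a minimal JDBC sketch of an explicit transaction. The connection string and the `devices` / `audit_log` tables are illustrative placeholders, not part of any real schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class TransactionSketch {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection details for a hypothetical smart-building database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/smartbuilding", "app", "secret")) {
            conn.setAutoCommit(false); // begin an explicit transaction
            try (Statement stmt = conn.createStatement()) {
                // Atomicity: both statements succeed together or not at all.
                stmt.executeUpdate("UPDATE devices SET status = 'ON' WHERE id = 1");
                stmt.executeUpdate("INSERT INTO audit_log (device_id, action) VALUES (1, 'ON')");
                conn.commit();     // Durability: the changes are now permanent.
            } catch (SQLException e) {
                conn.rollback();   // Atomicity again: undo all partial work on failure.
                throw e;
            }
        }
    }
}
```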
This article primarily focuses on Isolation, which is crucial for preventing anomalies in concurrent scenarios.
🧠 What Are Anomalies?
If proper safeguards are not in place when multiple transactions access the same data, the following inconsistencies (anomalies) may arise:
- Dirty Read: A transaction reads data that another transaction has written but not yet committed.
- Non-Repeatable Read: A transaction reads the same data twice and gets different results.
- Phantom Read: Re-running the same query within a transaction yields new rows that didn’t exist in the initial result.
- Lost Update: Two transactions overwrite each other's changes, so one of the updates is lost.
- Write Skew: Two transactions make conflicting but individually valid changes without reading each other's data.
- Read Skew: A transaction reads a partially inconsistent state due to concurrent updates from another transaction.
Each anomaly arises under specific conditions and requires different mitigation strategies—both at the database and the application layer (e.g., using ORM tools like Hibernate).
🔒 Isolation Levels and Their Relationship to Anomalies
The SQL standard defines four core isolation levels for database transactions:
| Isolation Level | Prevented Anomalies | Performance |
| --- | --- | --- |
| Read Uncommitted | — | Very high |
| Read Committed | Dirty Read | High |
| Repeatable Read | Dirty Read, Non-Repeatable Read | Medium / Low |
| Serializable | All anomalies (in theory) | Low (high overhead) |
Some modern databases attempt to provide strong isolation with lower cost using techniques like Serializable Snapshot Isolation (SSI)—PostgreSQL is a notable example. However, even SSI isn’t a silver bullet—every approach comes with trade-offs.
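At the driver level, the desired isolation level can be requested per connection. A minimal JDBC sketch follows (connection details are placeholders); note that under SERIALIZABLE, PostgreSQL may abort a transaction with a serialization failure that the application is expected to retry:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class IsolationLevelSketch {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/smartbuilding", "app", "secret")) {
            conn.setAutoCommit(false);
            // Request the strongest standard level; PostgreSQL implements it via SSI.
            conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            try {
                // ... reads and writes ...
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                // SQLSTATE 40001 signals a serialization failure: retry the transaction.
                if ("40001".equals(e.getSQLState())) {
                    System.out.println("Serialization conflict; retrying is expected.");
                }
            }
        }
    }
}
```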
🔧 What Can We Do at the Application Layer?
While database isolation levels offer a solid foundation in many cases, additional strategies may be required at the application level to fully prevent certain anomalies:
- Optimistic Locking (e.g., with Hibernate’s `@Version`): Detects conflicts (especially concurrent updates to the same row) at commit time and throws an exception; see the sketch after this list.
- Pessimistic Locking: Locks resources in advance to block concurrent access (e.g., `SELECT ... FOR UPDATE`).
- Predicate Locking (logical constraints in code): Controls not only existing data but also specific conditions to prevent conflicts.
- Unique Constraints + Business Rules: Prevent specific invalid states at the database or business-logic level.
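As a minimal sketch of the optimistic approach, here is a JPA entity with a version column; the entity and its fields are illustrative, not from a real schema:

```java
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Version;

// Illustrative entity for a smart-building sensor reading.
@Entity
public class SensorReading {

    @Id
    private Long id;

    private double temperature;

    // Hibernate increments this column on every update. If two transactions
    // modify the same row, the one that commits second fails with an
    // OptimisticLockException instead of silently overwriting the first.
    @Version
    private int version;

    protected SensorReading() { } // no-arg constructor required by JPA

    // getters and setters omitted for brevity
}
```

Pessimistic locking inverts this trade-off: conflicts are blocked up front, at the cost of reduced concurrency while locks are held.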
Now that we’ve introduced the general concepts, we can dive into the details of specific anomaly types.
📌 Dirty Read
📖 Definition
A Dirty Read occurs when a transaction reads data written by another transaction that has not yet committed and may later be rolled back. This can lead to faulty decisions based on temporary states.
🧪 Example Scenario
Imagine a smart building system where sensors automatically trigger ventilation appliances based on temperature readings. At a certain time `t`, the following sequence illustrates a dirty read anomaly:
- Transaction A writes a new temperature value to the database but hasn’t committed yet.
- Transaction B reads this value and decides to activate the ventilation.
- Transaction A later rolls back due to an error.
Result: The system has acted on data that never officially existed—leading to unnecessary energy usage or inconsistencies in other system components.
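Here is a hedged two-connection sketch of that sequence in JDBC. It assumes a database that actually honors `READ UNCOMMITTED` (e.g., SQL Server or MySQL; PostgreSQL silently upgrades it to Read Committed), and the `sensors` table is a placeholder:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DirtyReadDemo {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/smartbuilding"; // placeholder
        try (Connection writer = DriverManager.getConnection(url, "app", "secret");
             Connection reader = DriverManager.getConnection(url, "app", "secret")) {

            writer.setAutoCommit(false);
            reader.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);

            // Transaction A: writes a new temperature but does not commit.
            try (Statement stmt = writer.createStatement()) {
                stmt.executeUpdate("UPDATE sensors SET temperature = 35.0 WHERE id = 1");
            }

            // Transaction B: reads the uncommitted value (a dirty read).
            try (Statement stmt = reader.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT temperature FROM sensors WHERE id = 1")) {
                if (rs.next()) {
                    System.out.println("B sees: " + rs.getDouble(1)); // 35.0, never committed
                }
            }

            writer.rollback(); // A rolls back: B acted on data that never existed.
        }
    }
}
```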
⚠️ Why It Occurs
Dirty reads are only possible under the Read Uncommitted isolation level, which allows access to uncommitted (dirty) data.
🔒 Resolution via Isolation Levels
| Isolation Level | Effect on Dirty Read |
| --- | --- |
| Read Uncommitted | ❌ May occur |
| Read Committed | ✅ Prevented |
| Repeatable Read | ✅ Prevented |
| Serializable | ✅ Prevented |
Most modern databases (e.g., PostgreSQL, Oracle, SQL Server) default to Read Committed, eliminating dirty read risk.
🛠️ Additional Measures at Application Level
Since dirty reads are generally prevented through isolation levels, additional safeguards are rarely needed at the application layer. However, attention is needed in these cases:
- ORM-Based Transaction Management: In Spring/Hibernate applications, for example, the isolation level applied to methods annotated with `@Transactional` matters; see the sketch after this list.
- Legacy Systems: In older systems or manually managed transactions, `READ UNCOMMITTED` might be used inadvertently.
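A minimal Spring sketch, assuming a service in a Spring-managed application; the class and method names are illustrative:

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class VentilationService {

    // Pinning the isolation level explicitly documents intent and guards
    // against inheriting READ_UNCOMMITTED from a misconfigured data source.
    @Transactional(isolation = Isolation.READ_COMMITTED)
    public void adjustVentilation(long deviceId) {
        // Read the committed temperature and act on it here.
    }
}
```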
⚖️ Pros and Cons
When raising the isolation level to prevent dirty reads, the following trade-off applies:
| Advantage | Disadvantage |
| --- | --- |
| Preserves data integrity | Minimal performance cost (negligible in most cases) |
| Hides rolled-back changes from other transactions | — |
Note: Dirty Read is the easiest anomaly to prevent and is rarely seen in modern systems. However, it may still be deliberately used in speed-focused, low-integrity systems (e.g., temporary caches).
📌 Non-Repeatable Read
📖 Definition
A Non-Repeatable Read occurs when a transaction reads the same value twice but gets different results. This happens if another transaction modifies and commits the value between the two reads.
🧪 Example Scenario
Continuing with the smart building example, but this time, sensors are monitoring the energy consumption of devices:
- Transaction A reads the current energy usage of a device (e.g., 1200 watts).
- Transaction B updates this value to 900 watts and commits while A is still running.
- Transaction A reads the value again and sees 900 watts.
Result: The same data is queried twice within a single transaction, but two different results are returned. This can cause inconsistencies in decision-making or business workflows.
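This sequence can be sketched with two JDBC connections; the `devices` table and its `usage_watts` column are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class NonRepeatableReadDemo {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost:5432/smartbuilding"; // placeholder
        try (Connection a = DriverManager.getConnection(url, "app", "secret");
             Connection b = DriverManager.getConnection(url, "app", "secret")) {

            a.setAutoCommit(false);
            a.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);

            double first = readUsage(a);   // e.g., 1200 W

            // Transaction B updates and commits while A is still open.
            try (Statement stmt = b.createStatement()) {
                stmt.executeUpdate("UPDATE devices SET usage_watts = 900 WHERE id = 1");
            }

            double second = readUsage(a);  // now 900 W: a non-repeatable read
            System.out.printf("first=%.0f, second=%.0f%n", first, second);
            a.commit();
        }
    }

    private static double readUsage(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT usage_watts FROM devices WHERE id = 1")) {
            rs.next();
            return rs.getDouble(1);
        }
    }
}
```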
⚠️ Why It Occurs
This anomaly arises when other transactions are allowed to update and commit changes to the same data during an ongoing transaction.
🔒 Resolution via Isolation Levels
| Isolation Level | Effect on Non-Repeatable Read |
| --- | --- |
| Read Uncommitted | ❌ Not prevented |
| Read Committed | ❌ Not prevented |
| Repeatable Read | ✅ Prevented |
| Serializable | ✅ Prevented |
Note: In Spring/Hibernate applications, `@Transactional` defaults to the database’s own isolation level, which is typically Read Committed, meaning Non-Repeatable Reads are still possible.
🛠️ Additional Measures at Application Level
- Raise the isolation level: In frameworks like Spring Boot, this can be done via `@Transactional(isolation = Isolation.REPEATABLE_READ)`; see the sketch after this list.
- Versioning (Optimistic Locking): Adding a version column to the table allows detection of conflicting transactions. However, this method is more effective for write-write conflicts.
- Local Caching: Storing initially read values in memory during a transaction and checking them at the end can help in some cases, though it is not a general solution.
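A minimal sketch of the first option, revisiting the energy-usage scenario; `DeviceRepository` and its `findUsageWatts` method are hypothetical helpers, not a real API:

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical repository abstraction; in a real app this would be a
// Spring Data repository or a DAO.
interface DeviceRepository {
    double findUsageWatts(long deviceId);
}

@Service
public class EnergyReportService {

    private final DeviceRepository deviceRepository;

    public EnergyReportService(DeviceRepository deviceRepository) {
        this.deviceRepository = deviceRepository;
    }

    @Transactional(isolation = Isolation.REPEATABLE_READ)
    public double readUsageTwice(long deviceId) {
        double first = deviceRepository.findUsageWatts(deviceId);
        // ... other work inside the same transaction ...
        double second = deviceRepository.findUsageWatts(deviceId);
        // first == second here under snapshot-based REPEATABLE_READ
        // (e.g., PostgreSQL, MySQL InnoDB): both reads see the same snapshot.
        return second;
    }
}
```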
⚖️ Pros and Cons
When addressing the issue by raising the isolation level, the following trade-offs apply:
| Advantage | Disadvantage |
| --- | --- |
| Ensures consistency in decision processes | Increased locking, potential deadlocks |
| Safe repeated access to the same data | Performance impact, especially under high load |
🎯 End of Part 1
In this first part, we covered the most common types of anomalies and the isolation levels used to prevent them. We then dove into the details of Dirty Read and Non-Repeatable Read anomalies.
In the next part, I plan to explore Write Skew, Read Skew, Phantom Read, and Lost Update in more detail. Thanks for reading.
Until next time, take care!
📚 Further Reading
Resources that I used in this article and that I think you can benefit from too:
- Martin Kleppmann, Designing Data-Intensive Applications. https://www.oreilly.com/library/view/designing-data-intensive-applications/9781491903063/
- PostgreSQL Documentation: Transaction Isolation and Consistency. https://www.postgresql.org/docs/current/transaction-iso.html
- SQL Server Documentation: Isolation Levels in SQL Server. https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql
- Hibernate ORM Documentation: Transaction and Concurrency Control. https://docs.jboss.org/hibernate/orm/current/userguide/html_single/Hibernate_User_Guide.html#locking