CQRS Pattern
- Separates reads from writes.
- The write model focuses only on data validity, domain logic, and data consistency.
- The read model focuses only on query performance.
- The read and write models are architecturally decoupled and can scale independently.
Command Query Responsibility Segregation (CQRS) is a design pattern that segregates read and write operations for a data store into separate data models. This allows each model to be optimized independently and can improve performance, scalability, and security of an application.
Context and problem
In traditional architectures, a single data model is often used for both read and write operations. This approach is straightforward and works well for basic CRUD operations (see figure 1).
Diagram that shows a traditional CRUD architecture.
Figure 1. A traditional CRUD architecture.
However, as applications grow, optimizing read and write operations on a single data model becomes increasingly challenging. Read and write operations often have different performance and scaling needs. A traditional CRUD architecture doesn’t account for this asymmetry. It leads to several challenges:
Data mismatch: The read and write representations of data often differ. Some fields required during updates might be unnecessary during reads.
Lock contention: Parallel operations on the same data set can cause lock contention.
Performance issues: The traditional approach can have a negative effect on performance due to load on the data store and data access layer, and the complexity of queries required to retrieve information.
Security concerns: Managing security becomes difficult when entities are subject to read and write operations. This overlap can expose data in unintended contexts.
Combining these responsibilities can result in an overly complicated model that tries to do too much.
Use the CQRS pattern to separate write operations (commands) from read operations (queries). Commands are responsible for updating data. Queries are responsible for retrieving data.
Understand commands. Commands should represent specific business tasks rather than low-level data updates. For example, in a hotel-booking app, use “Book hotel room” instead of “Set ReservationStatus to Reserved.” This approach better reflects the intent behind user actions and aligns commands with business processes. To ensure commands succeed, you might need to refine the user interaction flow and server-side logic, and consider asynchronous processing.
Understand queries. Queries never alter data. Instead, they return Data Transfer Objects (DTOs) that present the required data in a convenient format, without any domain logic. This clear separation of concerns simplifies the design and implementation of the system.
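The command/query split above can be sketched as follows. This is a minimal Python sketch assuming an in-memory store; names such as `BookHotelRoom` and `ReservationDto` are hypothetical, taken from the hotel-booking example:

```python
from dataclasses import dataclass

# Write side: a command expresses business intent, not a low-level update.
@dataclass(frozen=True)
class BookHotelRoom:
    reservation_id: str
    room_number: int

class ReservationWriteModel:
    def __init__(self):
        self._reservations = {}  # stand-in for the write store

    def handle(self, cmd: BookHotelRoom) -> None:
        # Validation and domain logic live only on the write side.
        if cmd.reservation_id in self._reservations:
            raise ValueError("reservation already exists")
        self._reservations[cmd.reservation_id] = {
            "room": cmd.room_number,
            "status": "Reserved",
        }

# Read side: queries return plain DTOs and never alter state.
@dataclass(frozen=True)
class ReservationDto:
    reservation_id: str
    room_number: int

class ReservationReadModel:
    def __init__(self, write_model: ReservationWriteModel):
        self._source = write_model._reservations  # shared store, for brevity only

    def get(self, reservation_id: str) -> ReservationDto:
        row = self._source[reservation_id]
        return ReservationDto(reservation_id, row["room"])
```

In a fuller implementation the read model would have its own store, kept up to date by events published from the write side.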
Understand read and write model separation
Separating the read model from the write model simplifies system design and implementation by addressing distinct concerns for data writes and reads. This separation improves clarity, scalability, and performance but introduces some trade-offs. For example, scaffolding tools like O/RM frameworks can’t automatically generate CQRS code from a database schema, requiring custom logic for bridging the gap.
Benefits of CQRS
- Independent scaling. CQRS enables the read and write models to scale independently, which can help minimize lock contention and improve system performance under load.
- Optimized data schemas. Read operations can use a schema optimized for queries. Write operations use a schema optimized for updates.
- Security. By separating reads and writes, you can ensure that only the appropriate domain entities or operations have permission to perform write actions on the data.
- Separation of concerns. Splitting the read and write responsibilities results in cleaner, more maintainable models. The write side typically handles complex business logic, while the read side can remain simple and focused on query efficiency.
- Simpler queries. When you store a materialized view in the read database, the application can avoid complex joins when querying.
Implementation issues and considerations
Some challenges of implementing this pattern include:
- Increased complexity. While the core concept of CQRS is straightforward, it can introduce significant complexity into the application design, particularly when combined with the Event Sourcing pattern.
- Messaging challenges. Although messaging isn’t a requirement for CQRS, you often use it to process commands and publish update events. When messaging is involved, the system must account for potential issues such as message failures, duplicates, and retries. See the guidance on Priority Queues for strategies to handle commands with varying priorities.
- Eventual consistency. When the read and write databases are separated, the read data might not reflect the most recent changes immediately, leading to stale data. Ensuring the read model store stays up-to-date with changes in the write model store can be challenging. Additionally, detecting and handling scenarios where a user acts on stale data requires careful consideration.
Materialized View pattern
- Similar to a snapshot view of the data: a subset containing everything a query needs.
- Useful when the format a query needs differs from the stored format; a cache is the classic example.
Context and problem
When storing data, the priority for developers and data administrators is often focused on how the data is stored, as opposed to how it’s read. The chosen storage format is usually closely related to the format of the data, requirements for managing data size and data integrity, and the kind of store in use. For example, when using a NoSQL document store, the data is often represented as a series of aggregates, each containing all of the information for that entity.
However, this can have a negative effect on queries. When a query only needs a subset of the data from some entities, such as a summary of orders for several customers without all of the order details, it must extract all of the data for the relevant entities in order to obtain the required information.
Generate prepopulated views over the data in one or more data stores when the data isn’t ideally formatted for required query operations. This can help support efficient querying and data extraction, and improve application performance.
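A minimal sketch of the idea, assuming an in-memory list of order documents and a hypothetical `OrderSummaryView` that precomputes per-customer summaries so queries avoid scanning full aggregates:

```python
from collections import defaultdict

class OrderSummaryView:
    """Prepopulated view: order count and total amount per customer."""

    def __init__(self, source_orders):
        self.rows = {}
        self.rebuild(source_orders)

    def rebuild(self, source_orders):
        # Regenerate the whole view from the source data. A real system
        # might instead update incrementally in response to change events,
        # or rebuild on a schedule.
        totals = defaultdict(lambda: {"count": 0, "total": 0.0})
        for order in source_orders:
            row = totals[order["customer"]]
            row["count"] += 1
            row["total"] += order["total"]
        self.rows = dict(totals)
```

Queries against `view.rows` then read precomputed summaries directly instead of extracting every order document.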
Issues and considerations
Consider the following points when deciding how to implement this pattern:
- How and when the view will be updated. Ideally it’ll regenerate in response to an event indicating a change to the source data, although this can lead to excessive overhead if the source data changes rapidly. Alternatively, consider using a scheduled task, an external trigger, or a manual action to regenerate the view.
- In some systems, such as when using the Event Sourcing pattern to maintain a store of only the events that modified the data, materialized views are necessary. Prepopulating views by examining all events to determine the current state might be the only way to obtain information from the event store. If you’re not using Event Sourcing, you need to consider whether a materialized view is helpful or not. Materialized views tend to be specifically tailored to one, or a small number of queries. If many queries are used, materialized views can result in unacceptable storage capacity requirements and storage cost.
- Consider the impact on data consistency when generating the view, and when updating the view if this occurs on a schedule. If the source data is changing at the point when the view is generated, the copy of the data in the view won’t be fully consistent with the original data.
- Consider where you’ll store the view. The view doesn’t have to be located in the same store or partition as the original data. It can be a subset from a few different partitions combined.
- A view can be rebuilt if lost. Because of that, if the view is transient and is only used to improve query performance by reflecting the current state of the data, or to improve scalability, it can be stored in a cache or in a less reliable location.
- When defining a materialized view, maximize its value by adding data items or columns to it based on computation or transformation of existing data items, on values passed in the query, or on combinations of these values when appropriate.
- Where the storage mechanism supports it, consider indexing the materialized view to further increase performance. Most relational databases support indexing for views, as do big data solutions based on Apache Hadoop.
Competing Consumers pattern
Enable multiple concurrent consumers to process messages received on the same messaging channel. With multiple concurrent consumers, a system can process multiple messages concurrently to optimize throughput, to improve scalability and availability, and to balance the workload.
Context and problem
An application running in the cloud is expected to handle a large number of requests. Rather than process each request synchronously, a common technique is for the application to pass them through a messaging system to another service (a consumer service) that handles them asynchronously. This strategy helps to ensure that the business logic in the application isn’t blocked, while the requests are being processed.
The number of requests can vary significantly over time for many reasons. A sudden increase in user activity or aggregated requests coming from multiple tenants can cause an unpredictable workload. At peak hours, a system might need to process many hundreds of requests per second, while at other times the number could be very small. Additionally, the nature of the work performed to handle these requests might be highly variable. By using a single instance of the consumer service, you can cause that instance to become flooded with requests. Or, the messaging system might be overloaded by an influx of messages that come from the application. To handle this fluctuating workload, the system can run multiple instances of the consumer service. However, these consumers must be coordinated to ensure that each message is only delivered to a single consumer. The workload also needs to be load balanced across consumers to prevent an instance from becoming a bottleneck.
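The coordination described above, one delivery per message across several concurrent consumers, can be sketched with Python’s thread-safe `queue.Queue` as an in-process stand-in for a real messaging system; the names and the doubling "work" are illustrative:

```python
import queue
import threading

def run_competing_consumers(messages, consumer_count=3):
    """Process messages from one channel with several concurrent consumers.

    Each message is taken by exactly one consumer, so the workload is
    spread across instances without duplicate processing.
    """
    channel = queue.Queue()
    for message in messages:
        channel.put(message)

    results = []
    lock = threading.Lock()

    def consume():
        while True:
            try:
                msg = channel.get_nowait()  # one delivery per message
            except queue.Empty:
                return  # channel drained; this consumer stops
            processed = msg * 2  # stand-in for real work
            with lock:
                results.append(processed)

    workers = [threading.Thread(target=consume) for _ in range(consumer_count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

A production system would use a durable queue with acknowledgements so that a message taken by a crashed consumer is redelivered.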
Issues and considerations
Consider the following points when deciding how to implement this pattern:
- Message ordering. The order in which consumer service instances receive messages isn’t guaranteed, and doesn’t necessarily reflect the order in which the messages were created. Design the system to ensure that message processing is idempotent because this will help to eliminate any dependency on the order in which messages are handled. For more information, see Idempotency Patterns on Jonathon Oliver’s blog.
- Designing services for resiliency. If the system is designed to detect and restart failed service instances, it might be necessary to implement the processing performed by the service instances as idempotent operations to minimize the effects of a single message being retrieved and processed more than once.
- Detecting poison messages. A malformed message, or a task that requires access to resources that aren’t available, can cause a service instance to fail. The system should prevent such messages being returned to the queue, and instead capture and store the details of these messages elsewhere so that they can be analyzed if necessary.
- Handling results. The service instance handling a message is fully decoupled from the application logic that generates the message, and they might not be able to communicate directly. If the service instance generates results that must be passed back to the application logic, this information must be stored in a location that’s accessible to both. To prevent the application logic from retrieving incomplete data, the system must indicate when processing is complete.
- Scaling the messaging system. In a large-scale solution, a single message queue could be overwhelmed by the number of messages and become a bottleneck in the system. In this situation, consider partitioning the messaging system to send messages from specific producers to a particular queue, or use load balancing to distribute messages across multiple message queues.
- Ensuring reliability of the messaging system. A reliable messaging system is needed to guarantee that after the application enqueues a message it won’t be lost. This system is essential for ensuring that all messages are delivered at least once.
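Two of the points above, idempotent processing and poison-message capture, can be combined in one consumer sketch. The dedup set and dead-letter list here are hypothetical in-memory stand-ins for durable stores:

```python
class IdempotentConsumer:
    """Skips duplicate deliveries and diverts failing (poison) messages
    to a dead-letter list instead of returning them to the queue."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()        # IDs of messages already received
        self.dead_letters = []    # captured poison messages for later analysis

    def receive(self, message_id, payload):
        if message_id in self._seen:
            return  # duplicate delivery: processing is idempotent
        self._seen.add(message_id)
        try:
            self._handler(payload)
        except Exception as exc:
            # Capture the details rather than retrying forever.
            self.dead_letters.append((message_id, payload, str(exc)))
```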
Event-driven architecture style
An event-driven architecture consists of event producers that generate a stream of events, event consumers that listen for these events, and event channels that transfer events from producers to consumers.
Events are delivered in near real time, so consumers can respond immediately to events as they occur. Producers are decoupled from consumers: A producer doesn’t know which consumers are listening. Consumers are also decoupled from each other, and every consumer sees all of the events. This process differs from a Competing Consumers pattern where consumers pull messages from a queue and a message is processed only one time, assuming that there are no errors. In some systems, such as Azure Internet of Things (IoT), events must be ingested at high volumes.
Event Sourcing pattern
Instead of storing just the current state of the data in a relational database, store the full series of actions taken on an object in an append-only store. The store acts as the system of record and can be used to materialize the domain objects. This approach can improve performance, scalability, and auditability in complex systems.
Important
Event sourcing is a complex pattern that permeates the entire architecture and introduces trade-offs to achieve increased performance, scalability, and auditability. Once your system becomes an event sourcing system, all future design decisions are constrained by that choice, and there is a high cost to migrate to or from it. This pattern is best suited for systems where performance and scalability are top requirements. The complexity that event sourcing adds is not justified for most systems.
Context and problem
Most applications work with data, and the typical approach is for the application to store the latest state of the data in a relational database, inserting or updating data as required. For example, in the traditional create, read, update, and delete (CRUD) model, a typical data process is to read data from the store, make some modifications to it, and update the current state of the data with the new values—often by using transactions that lock the data.
The CRUD approach is straightforward and fast for most scenarios. However, in high-load systems, this approach has some challenges:
- Performance: As the system scales, the performance will degrade due to contention for resources and locking issues.
- Scalability: CRUD systems are synchronous and data operations block on updates. This can lead to bottlenecks and higher latency when the system is under load.
- Auditability: CRUD systems only store the latest state of the data. Unless there’s an auditing mechanism that records the details of each operation in a separate log, history is lost.
Solution
The Event Sourcing pattern defines an approach to handling operations on data that’s driven by a sequence of events, each of which is recorded in an append-only store. Application code raises events that imperatively describe the action taken on the object. The events are generally sent to a queue where a separate process, an event handler, listens to the queue and persists the events in an event store. Each event represents a logical change to the object, such as AddedItemToOrder or OrderCanceled.
The events are persisted in an event store that acts as the system of record (the authoritative data source) about the current state of the data. Additional event handlers can listen for events they are interested in and take an appropriate action. Consumers could, for example, initiate tasks that apply the operations in the events to other systems, or perform any other associated action that’s required to complete the operation. Notice that the application code that generates the events is decoupled from the systems that subscribe to the events.
At any point, it’s possible for applications to read the history of events. You can then use the events to materialize the current state of an entity by playing back and consuming all the events that are related to that entity. This process can occur on demand to materialize a domain object when handling a request.
Because it is relatively expensive to read and replay events, applications typically implement materialized views, read-only projections of the event store that are optimized for querying. For example, a system can maintain a materialized view of all customer orders that’s used to populate the UI. As the application adds new orders, adds or removes items on the order, or adds shipping information, events are raised and a handler updates the materialized view.
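Materializing current state by replaying events might look like the following sketch. The event types `AddedItemToOrder`, `RemovedItemFromOrder`, and `OrderCanceled` come from the examples above; the state shape is an assumption:

```python
def replay(events):
    """Fold the full event stream for an order into its current state."""
    state = {"items": [], "canceled": False}
    for event in events:
        kind = event["type"]
        if kind == "AddedItemToOrder":
            state["items"].append(event["item"])
        elif kind == "RemovedItemFromOrder":
            state["items"].remove(event["item"])
        elif kind == "OrderCanceled":
            state["canceled"] = True
    return state
```

A materialized view handler would apply the same logic incrementally, updating the stored projection as each new event arrives instead of replaying from the start.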
Pattern advantages
The Event Sourcing pattern provides the following advantages:
- Events are immutable and can be stored using an append-only operation. The user interface, workflow, or process that initiated an event can continue, and tasks that handle the events can run in the background. This process, combined with the fact that there’s no contention during the processing of transactions, can vastly improve performance and scalability for applications, especially for the presentation layer.
- Events are simple objects that describe some action that occurred, together with any associated data that’s required to describe the action represented by the event. Events don’t directly update a data store. They’re simply recorded for handling at the appropriate time. Using events can simplify implementation and management.
- Events typically have meaning for a domain expert, whereas object-relational impedance mismatch can make complex database tables hard to understand. Tables are artificial constructs that represent the current state of the system, not the events that occurred.
- Event sourcing can help prevent concurrent updates from causing conflicts because it avoids the requirement to directly update objects in the data store. However, the domain model must still be designed to protect itself from requests that might result in an inconsistent state.
- The append-only storage of events provides an audit trail that can be used to monitor actions taken against a data store. It can regenerate the current state as materialized views or projections by replaying the events at any time, and it can assist in testing and debugging the system. In addition, the requirement to use compensating events to cancel changes can provide a history of changes that were reversed. This capability wouldn’t be the case if the model stored the current state. The list of events can also be used to analyze application performance and to detect user behavior trends. Or, it can be used to obtain other useful business information.
- The command handlers raise events, and tasks perform operations in response to those events. This decoupling of the tasks from the events provides flexibility and extensibility. Tasks know about the type of event and the event data, but not about the operation that triggered the event. In addition, multiple tasks can handle each event. This enables easy integration with other services and systems that only listen for new events raised by the event store. However, the event sourcing events tend to be very low level, and it might be necessary to generate specific integration events instead.
- Event sourcing is commonly combined with the CQRS pattern by performing the data management tasks in response to the events, and by materializing views from the stored events.
Issues and considerations
Consider the following points when deciding how to implement this pattern:
- Eventual consistency - The system will only be eventually consistent when creating materialized views or generating projections of data by replaying events. There’s some delay between an application adding events to the event store as the result of handling a request, the events being published, and the consumers of the events handling them. During this period, new events that describe further changes to entities might have arrived at the event store. Your customers must be okay with the fact that data is eventually consistent and the system should be designed to account for eventual consistency in these scenarios.
- Versioning events - The event store is the permanent source of information, and so the event data should never be updated. The only way to update an entity or undo a change is to add a compensating event to the event store. If the schema (rather than the data) of the persisted events needs to change, perhaps during a migration, it can be difficult to combine existing events in the store with the new version. Your application will need to support changes to events structures. This can be done in several ways.
- Ensure your event handlers support all versions of events. This can be a challenge to maintain and test. This requires implementing a version stamp on each version of the event schema to maintain both the old and the new event formats.
- Implement an event handler to handle specific event versions. This can be a maintenance challenge in that bug fix changes might have to be made across multiple handlers. This requires implementing a version stamp on each version of the event schema to maintain both the old and the new event formats.
- Update historical events to the new schema when a new schema is implemented. This breaks the immutability of events.
- Event ordering - Multi-threaded applications and multiple instances of applications might be storing events in the event store. The consistency of events in the event store is vital, as is the order of events that affect a specific entity (the order that changes occur to an entity affects its current state). Adding a timestamp to every event can help to avoid issues. Another common practice is to annotate each event resulting from a request with an incremental identifier. If two actions attempt to add events for the same entity at the same time, the event store can reject an event that matches an existing entity identifier and event identifier.
- Querying events - There’s no standard approach, or existing mechanisms such as SQL queries, for reading the events to obtain information. The only data that can be extracted is a stream of events using an event identifier as the criteria. The event ID typically maps to individual entities. The current state of an entity can be determined only by replaying all of the events that relate to it against the original state of that entity.
- Cost of recreating state for entities - The length of each event stream affects managing and updating the system. If the streams are large, consider creating snapshots at specific intervals such as a specified number of events. The current state of the entity can be obtained from the snapshot and by replaying any events that occurred after that point in time. For more information about creating snapshots of data, see Primary-Subordinate Snapshot Replication.
- Conflicts - Even though event sourcing minimizes the chance of conflicting updates to the data, the application must still be able to deal with inconsistencies that result from eventual consistency and the lack of transactions. For example, an event that indicates a reduction in stock inventory might arrive in the data store while an order for that item is being placed. This situation results in a requirement to reconcile the two operations, either by advising the customer or by creating a back order.
- Need for idempotency - Event publication might be at least once, and so consumers of the events must be idempotent. They must not reapply the update described in an event if the event is handled more than once. Multiple instances of a consumer can maintain and aggregate an entity’s property, such as the total number of orders placed. When an order-placed event occurs, only one instance must succeed in incrementing the aggregate. While this result isn’t a key characteristic of event sourcing, it’s the usual implementation decision.
- Circular logic - Be mindful of scenarios where the processing of one event involves the creation of one or more new events since this can cause an infinite loop.
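One of the versioning strategies above, translating old events into the newest schema before handling (often called an upcaster), might look like this sketch; the event fields and the v1-to-v2 split are hypothetical:

```python
def upcast(event):
    """Translate a v1 event into the v2 schema before it reaches handlers.

    Assumed schemas: v1 carries a single 'customer_name' field, while v2
    splits it into 'first_name' and 'last_name'. Handlers then only need
    to understand the latest version.
    """
    if event.get("version", 1) == 1:
        first, _, last = event["customer_name"].partition(" ")
        upgraded = dict(event, version=2, first_name=first, last_name=last)
        del upgraded["customer_name"]
        return upgraded
    return event  # already the latest version; pass through unchanged
```

The stored events stay immutable; only the in-memory copy handed to handlers is rewritten.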
DDD terms and concepts
Strategic modeling
- Strategic and tactical design are two perspectives within DDD. Strategic design focuses on partitioning and integrating bounded contexts at a high, macro level, while tactical design focuses on using concrete modeling tools to refine the inside of each context.
Domain
- In the real world, a domain contains both the problem space and the solution system. Software is generally regarded as a partial simulation of the real world. In DDD, the solution system maps to a set of bounded contexts: a bounded context is the software’s specific, bounded solution to part of the problem space.
Bounded context
- A specific responsibility enclosed by an explicit boundary. The domain model lives inside this boundary, where every model concept, including its attributes and operations, has a well-defined meaning.
Context map
Tactical modeling
Entity
- When an object is distinguished by its identity rather than its attributes, it is called an entity (Entity).
Value object
- When an object describes something and has no unique identity, it is called a value object (Value Object).
- Key properties: immutability, equality by value, and replaceability.
- Typical uses include wrapping and validating immutable concepts such as status enums and business types.
- A value object should never modify its internal state; if a change is needed, it should return a new value object instead.
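These properties can be sketched with a hypothetical `Money` value object; `frozen=True` enforces immutability, and dataclass equality compares by value:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # immutable: attempts to mutate raise FrozenInstanceError
class Money:
    amount: int    # minor units, e.g. cents
    currency: str

    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("currency mismatch")
        # Never mutate; return a new value object instead.
        return Money(self.amount + other.amount, self.currency)
```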
Aggregate root
- An aggregate (Aggregate) is a cluster of related objects that is accessed by the outside world as a single unit; the aggregate root (Aggregate Root) is the root node of that aggregate.
Domain service
- Important domain behaviors or operations that fit neither the entity nor the value object category can be classified as domain services.
Domain event
- A domain event models an activity that occurred within the domain.
Repository
- Domain objects need to be stored somewhere, and the storage medium can vary: typically a database, a distributed cache, a local cache, and so on. A repository (Repository) is the object that provides unified management of how domain objects are stored and accessed.
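A minimal repository sketch, assuming a hypothetical in-memory order store behind the interface; the domain code depends only on `save` and `find`, not on the storage medium:

```python
class InMemoryOrderRepository:
    """Unified access point for storing and retrieving order aggregates.

    The backing store here is a dict; swapping in a database or cache
    implementation would not change the interface the domain sees.
    """

    def __init__(self):
        self._store = {}

    def save(self, order):
        # Copy on write so callers can't mutate the stored aggregate.
        self._store[order["id"]] = dict(order)

    def find(self, order_id):
        found = self._store.get(order_id)
        return dict(found) if found is not None else None
```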
Anti-corruption layer
Also called an adaptation layer. When one context needs to access an external context, an anti-corruption layer is usually introduced to translate those external interactions.
Consider introducing an anti-corruption layer in the following situations:
- The external context’s model needs to be translated into a model this context understands.
- The team relationship between the contexts is an upstream supplier relationship; an anti-corruption layer is recommended to keep changes in the external context from eroding this one.
- The external access is used widely within this context, and you want to limit the impact radius of any change.
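A sketch of an anti-corruption layer translating a hypothetical external CRM model (field names like `CUST_NM` and the status codes are invented) into this context’s own customer model:

```python
class ExternalCrmClient:
    """Stand-in for the external context's API, with its own field names."""

    def fetch(self, customer_id):
        return {"CUST_ID": customer_id, "CUST_NM": "Alice", "STAT_CD": "A"}

class CrmAntiCorruptionLayer:
    """Translates the external model into this context's customer model,
    so upstream schema changes are contained in this one class."""

    _status_map = {"A": "active", "I": "inactive"}

    def __init__(self, client):
        self._client = client

    def customer(self, customer_id):
        raw = self._client.fetch(customer_id)
        return {
            "id": raw["CUST_ID"],
            "name": raw["CUST_NM"],
            "status": self._status_map[raw["STAT_CD"]],
        }
```

If the CRM renames `STAT_CD` tomorrow, only the anti-corruption layer changes; the rest of this context keeps working against its own model.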
Code practice
coding…
