Introduction to Data Access Patterns

INTRODUCTION

Summary

Data Access Patterns
Resource Patterns
Input-Output Patterns
Cache Patterns

Data Access Patterns

The following data access patterns are covered:

Pattern	Purpose
Data Accessor	Encapsulates physical data access in a separate component, exposing only logical operations. Application code is decoupled from data access operations.
Active Domain Object	Encapsulates the data model and data access details within a relevant domain object. In other words, an Active Domain Object abstracts the semantics of the underlying data store (e.g., SQL Server) and the underlying data access technology (e.g., ADO.NET) and provides a simple programmatic interface for retrieving and operating on data.
Object-Relational Map	Decouples active domain objects from the underlying data model and data access details. An Object-Relational Map object is responsible for mapping relational data to object-oriented concepts, so that the data model can change independently of the application and its domain objects.
Layers	Stacks orthogonal application features that access data with increasing levels of abstraction.

Resource Patterns

Logically, a resource is an abstraction that hides the low-level complexity of working with input and output. Physically, a resource is an entity that represents storage or devices reserved for use by an application. A file handle is a simple example: it represents a channel for reading from and writing to a file (the resource).

Resources and Context

In addition to encapsulating input/output details, a resource also serves a semantic purpose by storing contextual information and enabling controlled concurrent access to the underlying storage or device. For example, a file handle stores contextual information about how the file was opened, including whether it was opened for reading, writing, or appending. In general, resources that store contextual information save programmers from writing redundant code and from carrying state information from one operation to the next.

Resources and Concurrency

Resources often represent data or objects that are available to multiple applications distributed across a network. For example, a database table is available to a variety of applications including reporting, management tools and client-side applications. When multiple users/tools access the same table in incompatible ways, unpredictable errors occur. Resources play an important role in robust concurrency solutions. Resources offer some level of synchronization that restricts concurrent access.
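A file handle shows both ideas, context and controlled concurrent access, in a few lines. The sketch below assumes .NET's FileStream; the FileHandleSketch class and its AppendLine helper are hypothetical names introduced only for illustration. The mode and access flags supplied when the file is opened are the context the handle carries, and FileShare.None asks the operating system to refuse other handles on the same file while this one is open.

```csharp
using System;
using System.IO;

public static class FileHandleSketch
{
    // Appends one line to the file and returns the file length afterwards.
    public static long AppendLine(string path, string line)
    {
        // Context fixed at open time: append mode, write-only access.
        // FileShare.None requests exclusive access until the handle is closed.
        using (var stream = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.None))
        using (var writer = new StreamWriter(stream))
        {
            // No offset is passed: the handle remembers that writes go to the end.
            writer.WriteLine(line);
            writer.Flush();
            return stream.Length;
        }
    }
}
```

While one caller is inside AppendLine, a second attempt to open the same file should fail with an IOException rather than interleave writes, which is the kind of restricted concurrent access described above.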

Resources and Data Access

The following general-purpose low-level resources provide unstructured data access and are useful in a range of applications:

Buffer: This is an unstructured block of memory allocated from the heap for a specific purpose. The system's memory manager prevents other applications from allocating the same memory until it is freed.
File Handle: A file handle controls access to a physical file. File systems implement concurrency to prevent other users from making inconsistent updates to the same file.
Socket: A socket is a network connection handle to another application or server.

Resources and Management

Resources usually consume a significant amount of storage for as long as they remain open. For example, a database connection requires client memory to store contextual information, and it allocates server memory to maintain server-side context. In addition, a database connection keeps a socket open on both sides to enable fast communication between client and server. Resources also implement synchronization to restrict concurrent access to one or more objects.

Since open resources consume memory and reduce concurrency, it is important to understand how applications use and manage resources. This list describes a few ideas that apply to most scenarios:

Always release resources
Your code is responsible for closing/releasing any resource it opens. A resource leak happens when the application forgets to close/release a resource. For example, forgetting to release memory associated with a buffer will cause a resource (memory) leak.
Minimize the time interval that a resource is left open
Since it is common for resources to lock objects for the purpose of synchronization, it is best to minimize the time interval between opening and closing the resource. For example, instead of opening a database connection and then performing complex formatting or calculations while it is held, do that work while the connection is closed, either before opening it or after releasing it. The connection is then open just long enough to access the database.
Pool resources to preserve initialization expense
Some resources, such as database connections and sockets, are quite expensive to initialize. These resources consume client and server memory but do not necessarily restrict concurrency. In these cases, consider pooling: a set of resources (e.g., database connections) is left open at all times so that the application can use them quickly without incurring initialization overhead on each operation.
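The three guidelines above can be combined in one small sketch. SimplePool and Lease are hypothetical names, not types from any library, and the code is a minimal illustration under those assumptions rather than a production pool: it has no size limit, no validation of returned resources, and no idle timeout.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical pool: keeps released resources for reuse instead of re-creating them.
public sealed class SimplePool<T> where T : class
{
    private readonly ConcurrentBag<T> _idle = new ConcurrentBag<T>();
    private readonly Func<T> _create;
    private int _created;

    public SimplePool(Func<T> create)
    {
        _create = create ?? throw new ArgumentNullException(nameof(create));
    }

    // Number of resources that had to be initialized from scratch.
    public int Created => Volatile.Read(ref _created);

    // Hands out an idle resource if one exists, otherwise initializes a new one.
    public Lease Acquire()
    {
        if (!_idle.TryTake(out T resource))
        {
            resource = _create();
            Interlocked.Increment(ref _created);
        }
        return new Lease(this, resource);
    }

    // Disposing the lease returns the resource to the pool exactly once.
    public sealed class Lease : IDisposable
    {
        private readonly SimplePool<T> _pool;
        private T _resource;

        internal Lease(SimplePool<T> pool, T resource)
        {
            _pool = pool;
            _resource = resource;
        }

        public T Resource => _resource ?? throw new ObjectDisposedException(nameof(Lease));

        public void Dispose()
        {
            T resource = Interlocked.Exchange<T>(ref _resource, null);
            if (resource != null)
            {
                _pool._idle.Add(resource);
            }
        }
    }
}
```

Wrapping each use in a block such as using (var lease = pool.Acquire()) { ... } releases the resource when the block ends, even if an exception is thrown, and keeping that block short limits how long the resource is held.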

The following resource patterns define common design strategies for managing resources at the application or middleware level.

Patterns

The following resource patterns are covered:

Pattern	Purpose
Resource Pool	A Resource Pool recycles resources to minimize resource initialization overhead. A resource pool manages resources efficiently while allowing application code to freely allocate them.
Resource Decorator	A resource decorator dynamically attaches behavior to an existing resource with minimal disruption to application code. A resource decorator extends a resource's functionality without sub-classing or changing functionality.
Resource Timer	Automatically releases inactive resources. This pattern solves the problem of resources being allocated indefinitely.
Resource Descriptor	Isolates platform- and data-source-dependent behavior within a single component. A resource descriptor exposes platform and datasource specifics as generic logical operations. This allows the majority of data access code to remain independent of its physical environment.
Resource Retriever	Automatically retries operations whose failure is expected under certain defined conditions. This pattern enables fault-tolerance for data access operations.
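The retry behavior in the last row of the table can be sketched as follows. The Retry class, its Execute method, and the isTransient predicate are hypothetical names chosen for this illustration, and the fixed delay between attempts is an assumption; the source does not prescribe a particular retry policy.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs the operation, retrying only failures the caller declares as expected.
    public static T Execute<T>(
        Func<T> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan delay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            // The filter lets unexpected failures, and the final attempt's
            // failure, propagate to the caller unchanged.
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                Thread.Sleep(delay);
            }
        }
    }
}
```

A caller might treat only timeouts as retryable, for example Retry.Execute(() => LoadPrice(1234), ex => ex is TimeoutException, 3, TimeSpan.FromMilliseconds(200)), where LoadPrice stands for any data access call.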

Input-Output Patterns

Domain objects directly model application or business concepts rather than relational database entities, and enable you to decouple the physical data model and data access details from the application logic. When you design domain objects, you must also design their domain object mapping. A domain object mapping describes the translation between domain objects and corresponding relational data. For example, in the Active Domain Object pattern, each object is responsible for defining and encapsulating its own mapping.

Input-output patterns are used to define and implement database input and output operations. Database input and output are a primary function of domain object mapping:

When applications create new instances of domain objects and read their attributes, the mapping implementation issues analogous database read operations.
When applications alter instances of domain objects, the mapping implementation issues analogous database update operations.

Input / Output Operations

Here are some examples of input and output operations expressed in terms of domain objects:

Populate a domain object
Populating a domain object involves creating a new instance and finding relevant database data to initialize the object's attributes. In terms of data access, you need to issue SQL to read data and then copy the results into the attributes of the active domain object.
Persist/Delete a domain object
Persisting a domain object involves inserting or updating a row of data corresponding to the attributes of the domain object; deleting it involves removing that row. In terms of data access, you need to issue a SQL INSERT or UPDATE statement that copies the attributes of the active domain object into the database, or a DELETE statement that removes the corresponding row.
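The two operations above can be sketched against the provider-neutral ADO.NET interfaces in System.Data. The Product class, the ProductMapping helper, and the column types (int, string, decimal) are assumptions made for illustration; the column names follow the [Product] examples used later in this section, and the @name parameter syntax assumes SQL Server.

```csharp
using System;
using System.Data;

// Hypothetical domain object.
public sealed class Product
{
    public int ProductCode { get; set; }
    public string Category { get; set; }
    public decimal Price { get; set; }
}

public static class ProductMapping
{
    // Populate: copy one row of query results into a new domain object.
    public static Product FromRecord(IDataRecord record)
    {
        return new Product
        {
            ProductCode = record.GetInt32(record.GetOrdinal("ProductCode")),
            Category = record.GetString(record.GetOrdinal("category")),
            Price = record.GetDecimal(record.GetOrdinal("price"))
        };
    }

    // Persist: copy the object's attributes into the parameters of an UPDATE.
    public static void FillUpdate(IDbCommand command, Product product)
    {
        command.CommandText =
            "UPDATE Product SET category = @category, price = @price " +
            "WHERE ProductCode = @code";
        AddParameter(command, "@category", product.Category);
        AddParameter(command, "@price", product.Price);
        AddParameter(command, "@code", product.ProductCode);
    }

    private static void AddParameter(IDbCommand command, string name, object value)
    {
        IDbDataParameter parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.Value = value;
        command.Parameters.Add(parameter);
    }
}
```

Because FromRecord takes an IDataRecord, it works with any data reader, including the one returned by DataTable.CreateDataReader(), which makes the mapping easy to exercise without a live database.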

Identity Objects

In addition to translating data between objects and tables, another important factor for domain object mapping is the issue of identity objects. When an application invokes input/output operations using domain objects, it must identify target data.

To illustrate the issue of identity objects, assume that the [Product] table contains all products available for sale, and consider these two lookup scenarios:

Scenario	SQL
Look up a product by category	select * from Product where category = 'Baby Food'
Look up a product's price	select price from Product where ProductCode = 1234

These expressions should never appear in application code that otherwise uses domain objects. First, they explicitly mention the names of data model entities such as Product and ProductCode. Second, they include specific SQL syntax that may change if you decide to move your database to another platform.

Identity objects solve this problem by using domain concepts to identify the target relational data. An identity object identifies a domain object, just as a set of primary key values uniquely identifies a row of table data. In fact, it is common for identity objects to correspond to a table's primary key.

In the [Product] table, the primary key is likely to be the unique ProductCode column. The analogous identity object is simply a representation of the ProductCode value. The application code can then find any product using its ProductCode like this:

Product product = ProductInventory.Find( 1234 );

Identity objects do not always correspond directly to a table's primary key, especially in cases where applications may search on columns other than those included in the primary key. An alternative identity object could define multiple attributes that correspond to search criteria. In this case, a single identity object does not uniquely identify a single domain object, but rather the set of domain objects that match its criteria. For example, if you want to find all products whose category is vegetable and whose price is less than £1, you would express this using a ProductCriteria object:

ProductCriteria criteria = new ProductCriteria();
criteria.Category = Categories.Vegetable;
criteria.Price = " < 1.00";
Product[] products = ProductInventory.Find( criteria );

While the examples above use identity objects to query and read target data, identity objects are also important for output operations, since they identify the specific database rows to update, insert, or delete. For example, the following code reads and updates a domain object:

Product product = ProductInventory.Find( 1234 );
product.Price = product.Price * 1.1;
ProductInventory.Update( product );

Note that the application's code does not explicitly indicate an identity object when it calls the Update operation. The Update operation's implementation implicitly extracts the product's ID to issue the appropriate SQL UPDATE statement.

Patterns

The following input-output patterns are covered:

Pattern	Purpose
Selection Factory	Generates query selections based on identity object attributes.
Domain Object Factory	Populates domain objects based on query result data.
Update Factory	Generates update selections based on modified domain object attributes.
Domain Object Assembler	Populates, persists, and deletes domain objects using a uniform factory framework.

These patterns are often combined to build a robust domain object mapping framework that can decouple generic mapping logic from the customized conversion details for specific types. This separation allows you to introduce additional domain objects as your application requires them. It also allows you to apply common optimizations and enhancements that apply immediately to operations on all domain objects.
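As a rough illustration of the Selection Factory idea, the sketch below turns an identity object into a parameterized WHERE clause. Every name here is hypothetical: Selection and ProductSelectionFactory are invented for the example, and this ProductCriteria uses a string Category and a typed MaxPrice property as a simplifying assumption instead of the string-valued Price criterion shown earlier. The @name parameter syntax assumes SQL Server.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical identity object: each property that is set is one search criterion.
public sealed class ProductCriteria
{
    public string Category { get; set; }
    public decimal? MaxPrice { get; set; }
}

// The generated selection: SQL text plus the values to bind to it.
public sealed class Selection
{
    public Selection(string whereClause, IReadOnlyDictionary<string, object> parameters)
    {
        WhereClause = whereClause;
        Parameters = parameters;
    }

    public string WhereClause { get; }
    public IReadOnlyDictionary<string, object> Parameters { get; }
}

public static class ProductSelectionFactory
{
    // Builds one condition per criterion that is set; values travel as
    // parameters and are never concatenated into the SQL text.
    public static Selection Create(ProductCriteria criteria)
    {
        var conditions = new List<string>();
        var parameters = new Dictionary<string, object>();

        if (criteria.Category != null)
        {
            conditions.Add("category = @category");
            parameters["@category"] = criteria.Category;
        }
        if (criteria.MaxPrice.HasValue)
        {
            conditions.Add("price < @maxPrice");
            parameters["@maxPrice"] = criteria.MaxPrice.Value;
        }

        string where = conditions.Count == 0
            ? ""
            : "WHERE " + string.Join(" AND ", conditions);
        return new Selection(where, parameters);
    }
}
```

Keeping the SQL generation in one factory means table names, column names, and platform-specific syntax stay out of application code, which only ever constructs criteria objects.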

Cache Patterns

Cache patterns define strategies for integrating caching into your applications and middleware components. These patterns concentrate on improving data access performance and resource utilization by eliminating redundant data access operations. Data access operations are a common source of bottlenecks because they consume a significant portion of a system's memory. Recycling database resources and using indices go a long way toward reducing that cost, but one of the most effective strategies is to eliminate redundant data access operations altogether. Caching enables applications to avoid issuing multiple database read operations for the same data item. Caches usually reside in memory and enable fast access to their contents. Once an item is cached, applications do not need to issue further database operations to read it.

Cache Operations and Transparency

A cache starts empty, and at some point during application startup or initialization, applications or middleware components read data from the database to store in the cache. Strategies for populating caches vary, ranging from simple copying of entire ADO.NET DataSets to using strategic and selective decisions to populate only the most accessed data. Application code is the ultimate consumer of cached data. It is common for applications to access cached data using primary keys or identity objects, but some applications may require other semantics such as statement handles or a query language.

The semantics that cache operations define help achieve cache transparency. Cache transparency refers to the visibility of a cache to applications and middleware code. Consider the cases of non-transparent and transparent caches:

An example of a cache that is not transparent is one where applications must explicitly deal with it. Non-transparent caches often place a big burden on application code, complicating their data access and caching code (but they do allow an application to judge which data should be stored in a cache).
On the other hand, a transparent cache is one that is encapsulated within a single data access component. Applications do not directly interact with transparent caches. Cached data is accessible to applications, but the application code accesses it in the same way as it does when it reads directly from the database. Transparent caches encourage applications to remain focused on domain logic while keeping cache-access code well encapsulated within data-access code.
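A transparent cache can be sketched as a wrapper that exposes the same interface as the component it wraps. All names below are hypothetical: IProductPrices is an invented logical interface, CountingProductPrices is a stand-in for a database-backed accessor (it counts reads and computes a placeholder value instead of querying), and CachedProductPrices is the cache. The sketch never evicts or refreshes entries, which a real cache would need to consider.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical logical interface that application code depends on.
public interface IProductPrices
{
    decimal GetPrice(int productCode);
}

// Stand-in for a database-backed accessor; counts reads so the effect is visible.
public sealed class CountingProductPrices : IProductPrices
{
    public int Reads { get; private set; }

    public decimal GetPrice(int productCode)
    {
        Reads++;
        return productCode / 100m; // placeholder for a real query
    }
}

// Transparent cache: same interface as the wrapped accessor, filled on demand.
public sealed class CachedProductPrices : IProductPrices
{
    private readonly IProductPrices _inner;
    private readonly Dictionary<int, decimal> _cache = new Dictionary<int, decimal>();
    private readonly object _gate = new object();

    public CachedProductPrices(IProductPrices inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public decimal GetPrice(int productCode)
    {
        lock (_gate)
        {
            if (_cache.TryGetValue(productCode, out decimal cached))
            {
                return cached;
            }
        }

        // Read outside the lock so a slow query does not block other callers.
        // Two callers may both miss and both read; the second write is harmless.
        decimal loaded = _inner.GetPrice(productCode);
        lock (_gate)
        {
            _cache[productCode] = loaded;
        }
        return loaded;
    }
}
```

Application code that holds an IProductPrices reference calls GetPrice in exactly the same way whether or not the cache is present, so caching can be added or removed without touching domain logic.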

Cached Data

With caches you also need to consider the form of the cached data. You can store data in its physical database format using software representations of tables, rows, columns, and relationships. In ADO.NET this would correspond to using DataSets and DataTables. Another option is to convert cached data into the domain form that your application expects; in other words, the cache stores domain objects rather than raw data.

Patterns

The first group of cache patterns describes strategies for integrating cache storage and retrieval operations in applications and middleware components with various degrees of transparency. These patterns address how to use caches rather than how to implement them:

Pattern	Purpose
Cache Accessor	Decouples caching logic from the data model and the data access details.
Demand Cache	Populates a cache on-demand as applications request data. This is useful for data that is read frequently but unpredictably.
Primed Cache	Populates a cache with a predicted set of data. This is useful for data that is read frequently and predictably.
Cache Search Sequence	Inserts short cuts into a cache to optimize the number of operations that future searches require.

The second group of cache patterns describes strategies for efficient caching implementations:

Pattern	Purpose
Cache Collector	Purges unneeded entries, much as the .NET garbage collector reclaims unused objects.
Cache Replicator	Replicates operations across multiple caches.
Cache Statistics	Records and publishes cache and pool statistics.
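To make the Cache Collector and Cache Statistics rows more concrete, here is a minimal sketch of a cache that purges entries left idle for too long and counts hits and misses. ExpiringCache and all of its members are hypothetical names, and idle time is only one possible purge criterion, chosen here as an assumption. The current time is passed in explicitly so the behavior is deterministic, and the class is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

public sealed class ExpiringCache<TKey, TValue>
{
    private sealed class Entry
    {
        public TValue Value;
        public DateTime LastAccessUtc;
    }

    private readonly Dictionary<TKey, Entry> _entries = new Dictionary<TKey, Entry>();
    private readonly TimeSpan _maxIdle;

    public ExpiringCache(TimeSpan maxIdle)
    {
        _maxIdle = maxIdle;
    }

    // Statistics: simple counters that a monitoring component could publish.
    public int Hits { get; private set; }
    public int Misses { get; private set; }
    public int Count => _entries.Count;

    public void Put(TKey key, TValue value, DateTime nowUtc)
    {
        _entries[key] = new Entry { Value = value, LastAccessUtc = nowUtc };
    }

    public bool TryGet(TKey key, DateTime nowUtc, out TValue value)
    {
        if (_entries.TryGetValue(key, out Entry entry))
        {
            entry.LastAccessUtc = nowUtc; // each hit keeps the entry alive
            Hits++;
            value = entry.Value;
            return true;
        }
        Misses++;
        value = default(TValue);
        return false;
    }

    // Collector pass: removes entries idle for longer than the limit
    // and returns how many were purged.
    public int Collect(DateTime nowUtc)
    {
        var stale = new List<TKey>();
        foreach (KeyValuePair<TKey, Entry> pair in _entries)
        {
            if (nowUtc - pair.Value.LastAccessUtc > _maxIdle)
            {
                stale.Add(pair.Key);
            }
        }
        foreach (TKey key in stale)
        {
            _entries.Remove(key);
        }
        return stale.Count;
    }
}
```

In practice Collect would typically be called from a timer or background task, and the counters would feed whatever monitoring the application already uses.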

These caching patterns are independent of each other and can be mixed and matched to build a comprehensive caching solution.
