{{more footnotes|date=January 2012}}


'''Database caching''' is a process included in the design of computer applications which generate web pages on-demand (dynamically) by accessing backend databases.


When these applications are deployed on multi-tier environments that involve browser-based clients, web application servers and backend databases,<ref>{{cite journal | citeseerx = 10.1.1.95.875 | title = MTCache: Transparent mid-tier database caching | year = 2004 | first1 = Per-åke | last1 = Larson | first2 = Jonathan | last2 = Goldstein }}</ref><ref name=dbcache>{{cite journal | citeseerx = 10.1.1.104.8991 | title = DBCache: Database Caching For Web Application Servers | year = 2002 | first1 = Mehmet | last1 = Altinel | first2 = Qiong | last2 = Luo | first3 = Sailesh | last3 = Krishnamurthy | first4 = C. | last4 = Mohan | first5 = Hamid | last5 = Pirahesh | first6 = Bruce G. | last6 = Lindsay | first7 = Honguk | last7 = Woo | first8 = Larry | last8 = Brown | url = http://www.cse.ust.hk/catalac/papers/dbcache_demo_sigmod02.pdf }}</ref> middle-tier database caching is used to achieve high scalability and performance.<ref name=dbcache/>


In a [[three tier architecture]], the [[application software]] tier and [[Computer data storage|data storage]] tier can be in different hosts. Throughput of an application can be limited by the [[Computer network|network]] speed. This limitation can be minimized by having the [[database]] at the application tier. Because commercial database software makes extensive use of system resources, it is not always practical to have the application and the [[database]] at the same host. In this case, a more light-weight database application can be used to cache data from the commercial [[database management system]].
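This arrangement can be sketched with an in-process [[SQLite]] database standing in for the lightweight cache at the application tier; the table, data, and read-through logic below are illustrative only, not drawn from any particular product:

```python
import sqlite3

# Sketch: an in-process SQLite database acting as a lightweight
# middle-tier cache for rows fetched from a (simulated) backend DBMS.
# Table and column names are hypothetical.

backend = sqlite3.connect(":memory:")   # stands in for the commercial backend
backend.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
backend.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(1, "Ada"), (2, "Grace")])

cache = sqlite3.connect(":memory:")     # lightweight cache at the application tier
cache.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

def read_customer(cid):
    """Serve from the local cache; on a miss, fetch from the backend and populate."""
    row = cache.execute("SELECT name FROM customers WHERE id = ?", (cid,)).fetchone()
    if row is None:
        row = backend.execute("SELECT name FROM customers WHERE id = ?", (cid,)).fetchone()
        if row is not None:
            cache.execute("INSERT INTO customers VALUES (?, ?)", (cid, row[0]))
    return row[0] if row else None

read_customer(1)   # first call goes to the backend and fills the cache
read_customer(1)   # second call is served from the local cache, no network round-trip
```

In a real deployment the backend connection would cross the network while the cache stays in-process, which is where the latency saving comes from.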


== Benefits ==
Database caching improves scalability by distributing query workload from the backend to multiple cheap front-end systems. It allows flexibility in the processing of data; for example, the data of Platinum customers can be cached while that of ordinary customers is not. Caching can improve availability of data by providing continued service for applications that depend only on cached tables, even if the backend server is unavailable. Another benefit is improved data access speed, brought about by locality of data and by smoothing out load peaks through avoiding round-trips between the middle tier and the data tier.<ref>{{cite journal |citeseerx = 10.1.1.140.8455|title= Middle-tier Database Caching for e-Business}}</ref>
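The selective caching mentioned above can be sketched as a caching policy applied on each read; the customer data and tier names here are hypothetical:

```python
# Sketch of selective caching: only rows matching a caching policy
# (here, "Platinum" customers) are kept in the front-end cache;
# reads for everything else always go to the backend.

backend_rows = {                      # stands in for the backend database
    1: {"name": "Ada",   "tier": "Platinum"},
    2: {"name": "Grace", "tier": "Standard"},
}
cache = {}

def get_customer(cid):
    if cid in cache:                  # cache hit: no backend round-trip
        return cache[cid]
    row = backend_rows.get(cid)
    if row is not None and row["tier"] == "Platinum":
        cache[cid] = row              # policy: cache Platinum customers only
    return row

get_customer(1)                       # cached under the policy
get_customer(2)                       # served, but never cached
```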


== Potential design elements ==
*Updateable cache tables: Many cache systems are read-only, which limits their use to a small segment of applications, namely non-real-time applications.
*Bi-directional updates: For updateable caches, updates that happen in the cache should be propagated to the target database, and any updates that happen directly on the target database should appear in the cache automatically.
*Synchronous and asynchronous update propagation: Updates to a cache table can be propagated to the target database in two modes. Synchronous mode ensures that updates are applied at the target database as soon as the database operation completes; it gives high cache consistency and suits real-time applications. Asynchronous mode delays updates to the target database; it gives high throughput and suits near-real-time applications.
*Multiple cache granularities - database-level, table-level and result-set caching: Major portions of corporate databases are historical and infrequently accessed, but some information, such as premium customers' data, should be instantly accessible.
*Recovery for cached tables: In case of a system or power failure, all committed transactions on the cached tables should be recovered when the caching platform restarts.
*Tools to validate the coherence of the cache: With asynchronous update propagation, the caches at different cache nodes and the target database may diverge. This divergence needs to be resolved manually, with mismatches identified and corrective measures taken if required.
*Horizontally scalable: [[Cluster computing]] may increase availability and achieve load balancing. Caching in a clustered environment spans multiple nodes, keeping the cached data coherent across nodes.
*Transparent access to non-cached tables residing in the target database: The database cache should keep track of queries and be able to route them intelligently to the cache or to the origin database, based on data locality, without any [[application code]] modification.
*Transparent failover: A failure of the caching platform should not cause a service outage; client connections should be routed to the target database.
*No or very few changes to the application: Support for standard interfaces such as JDBC and ODBC lets the application work seamlessly without code changes. The cache should route all stored procedure calls to the target database so that they do not need to be migrated.
*Implement a specialized internal cache: Performance-oriented databases such as [[ScyllaDB]] completely bypass the [[Linux]] cache during reads and use a row-based integrated internal cache instead.<ref>{{cite web|url=https://thenewstack.io/why-databases-should-bypass-the-linux-page-cache/|title=Why Databases Should Bypass the Linux Page Cache|date=13 March 2024 |access-date=2 April 2024}}</ref>
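The synchronous and asynchronous propagation modes described above can be sketched as write-through versus write-behind updates; the data structures and names here are illustrative:

```python
from collections import deque

# Sketch of the two update-propagation modes for an updateable cache
# table: synchronous (write-through) applies each update to the target
# database before returning; asynchronous (write-behind) queues updates
# and flushes them later.

target_db = {}          # stands in for the target (backend) database
cache = {}
pending = deque()       # queued updates awaiting asynchronous propagation

def update(key, value, synchronous=True):
    cache[key] = value
    if synchronous:
        target_db[key] = value        # propagated before the call returns
    else:
        pending.append((key, value))  # delayed: higher throughput, weaker consistency

def flush():
    """Drain the write-behind queue to the target database."""
    while pending:
        key, value = pending.popleft()
        target_db[key] = value

update("a", 1, synchronous=True)      # target sees the update immediately
update("b", 2, synchronous=False)     # target lags behind until flush()
flush()
```

The window between an asynchronous `update` and the next `flush` is exactly the divergence that the coherence-validation tools above are meant to detect.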


== Pitfalls in implementations ==
* Cache walking on deletes or invalidation events: Cache designs that leverage external cache engines such as [[Redis]] or [[Hazelcast]] often trigger invalidation by issuing deletions against the invalidated objects. A single write operation can then trigger thousands of deletes, impacting performance.
* Lack of key tracking: When using an external cache engine, any request typically triggers a key lookup at the cache layer; a miss adds an extra round-trip to the overall latency of the request. Engines such as [[Redis]] and [[Hazelcast]] provide key change notification support, however, allowing local cache layers to be updated when keys are changed in a remote cache layer. By tracking these keys locally, remote lookups on a cache miss can be avoided.
* Invalidation as an instant event, not a time range: When a table is changed as part of a transaction, the SQL mode determines whether a query on another connection sees the changes. While a transaction has not yet been committed or rolled back, any change to a table during the transaction should cause the table to be considered volatile until the transaction completes. Often, cache engines only invalidate a result before or after the query is executed.
* Distributed caches with a lack of communication: If a cache design uses an underlying storage layer as a distributed cache, invalidations are done locally, based on which tables are written to at a given time. Other nodes may have written cache objects for the same table, and those objects will not be invalidated. For local session data with upstream client persistence this may be acceptable, but for shared data that must remain consistent across sessions it can cause data consistency problems.
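One common way to avoid the cache-walking pitfall is versioned keys: rather than deleting every cached result for a table, a per-table version number embedded in each key makes all old entries unreachable in a single write. The sketch below uses a plain dictionary and hypothetical table and query names:

```python
# Sketch of invalidation by key versioning: bumping a table's version
# orphans all of its cached results at once; stale entries age out
# instead of being deleted one by one.

cache = {}
table_version = {"customers": 0}

def cache_key(table, query):
    return (table, table_version[table], query)

def get_result(table, query, compute):
    key = cache_key(table, query)
    if key not in cache:
        cache[key] = compute()        # miss: run the real query
    return cache[key]

def invalidate(table):
    table_version[table] += 1         # one write, no mass deletes

get_result("customers", "count", lambda: 2)
invalidate("customers")               # old entry is now unreachable
result = get_result("customers", "count", lambda: 3)  # recomputed
```

In an external cache engine the same idea is usually expressed by prefixing keys with a version string, so invalidation is one increment rather than a scan-and-delete.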


==References==
{{Reflist}}


==External links==
*[http://www.codeproject.com/Articles/26616/Accelerate-database-performance-using-In-Memory-CS Requirements of Caching Solution]
*[http://pages.cs.wisc.edu/~naughton/includes/papers/middleTier.pdf Middle-Tier Database Caching for e-Business]

{{Database}}


