Wednesday, May 22, 2019
Clustering Techniques in OODBMS (Using ObjectStore)
Introduction
The performance of a database can be greatly affected by the manner in which data is loaded. This is true regardless of when the data is loaded: before the application(s) begin retrieving the data, or concurrently while the application(s) are accessing the data. This paper presents various strategies for locating data as it is loaded into the database and details the performance implications of those strategies.

Data Clustering, Working Sets, and Performance
With ObjectStore, access to persistent data can perform at in-memory speeds. Achieving in-memory speeds requires cache affinity. Cache affinity is the generic term that describes the degree to which data accessed within a program overlaps with data already retrieved on behalf of a previous request. Effective data clustering allows for better, if not optimal, cache affinity.

Data density is defined as the proportion of objects within a given storage block that are accessed by a client during some scope of activity. Clustering is a technique to achieve high data density. The working set is defined as the set of database pages a client needs at a given time.

ObjectStore is a page-based architecture which performs best when the following goals are met:
1) Minimize the number of pages transferred between the client and server
2) Maximize the use of pages already in the cache
To achieve these goals, the application's working set should be optimal, and the way to achieve an optimal working set is data clustering. With good data clustering, more data can be accessed in fewer pages, so a high data density is obtained. A higher data density results in a smaller working set as well as better cache affinity.
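The definitions above can be made concrete with a small model. The following is a minimal Python sketch, assuming a simplified layout in which a page is just a list of object ids; the `Layout` type and the functions `working_set`, `data_density`, and `page_transfers` are hypothetical names for illustration, not ObjectStore API. It computes the working set and the data density of a page for one use case, and counts page transfers under an idealized, unbounded client cache.

```python
from typing import Dict, List, Set

# Hypothetical model: page number -> ids of the objects stored on that page.
# It only tracks which objects share a page, not bytes or ObjectStore internals.
Layout = Dict[int, List[str]]


def page_of(layout: Layout) -> Dict[str, int]:
    """Invert a layout into object id -> page number."""
    return {obj: page for page, objs in layout.items() for obj in objs}


def working_set(layout: Layout, accessed: Set[str]) -> Set[int]:
    """The set of pages a client needs in order to touch every accessed object."""
    where = page_of(layout)
    return {where[obj] for obj in accessed}


def data_density(layout: Layout, page: int, accessed: Set[str]) -> float:
    """Proportion of the objects on `page` that the client accesses in this scope."""
    objs = layout[page]
    return sum(obj in accessed for obj in objs) / len(objs)


def page_transfers(layout: Layout, trace: List[str]) -> int:
    """Pages fetched from the server for an access trace.

    Assumes an unbounded client cache: a request whose page is already cached
    costs no transfer, which is the cache-affinity effect described above.
    """
    where = page_of(layout)
    cached: Set[int] = set()
    transfers = 0
    for obj in trace:
        page = where[obj]
        if page not in cached:
            cached.add(page)
            transfers += 1
    return transfers


if __name__ == "__main__":
    use_case = {"A", "B", "C"}
    scattered = {0: ["A", "x1", "x2"], 1: ["B", "x3", "x4"], 2: ["C", "x5", "x6"]}
    clustered = {0: ["A", "B", "C"], 1: ["x1", "x2", "x3"], 2: ["x4", "x5", "x6"]}
    for name, layout in (("scattered", scattered), ("clustered", clustered)):
        print(name, sorted(working_set(layout, use_case)),
              page_transfers(layout, ["A", "B", "C", "A"]))
```

Running it prints the working set and transfer count for each layout: the same three objects cost three page transfers when spread across pages and one when they share a page.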
A smaller working set results in fewer page transfers. The following sections explain several clustering patterns and techniques for achieving better performance via cache affinity, higher data density, and a smaller working set.

NOTE: In this paper, clustering is used as a concept of locality of reference. The term is not being used to refer to the physical storage unit available in ObjectStore. ObjectStore does give the user a choice of where to place allocations: within the database, within a particular segment, or within a particular cluster. For the remainder of this paper, the discussion of clusters is a conceptual one, not the ObjectStore physical one.

Database Design Process
Database design is one of the most important steps in designing and implementing an ObjectStore application. The following steps are prerequisites for a database design:
1) Identify key use cases (ones which need to be fast and/or are run frequently)
2) Identify the object(s) used by the use cases called out in step 1
3) Identify the object(s) that are read or updated during the use cases called out in step 1
The focus of clustering efforts should be on the database objects used in the high-priority use cases identified above. Begin to cluster based on one use case, and then validate with others.

The database design strategies which lend themselves to achieving the optimal working set are:
1) Clustering
2) Partitioning
There are several different types of techniques which result in data being well clustered:
1) Isolate Index
2) Pooling
3) Object Modeling

Data Clustering
Clustering is a technique used to achieve high data density. Another definition of clustering is a grouping of objects together. If a use case requires objects A, B and C to operate, then those objects should be co-located for optimal data density. If, upon loading the database, those objects are physically allocated close to one another, then we say we have clustered those objects.
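The design steps and the A, B, C example suggest a load order: take the highest-priority use case, allocate its objects next to one another, then check the result against the other use cases. Below is a minimal Python sketch of that idea, assuming a fixed, hypothetical 8 KB page and made-up object sizes. The names `cluster_by_use_case` and `pages_needed` are illustrative, and the greedy first-fit placement is one simple policy among many, not ObjectStore's allocator.

```python
from typing import Dict, List

# Hypothetical page size in bytes, chosen only for illustration.
PAGE_SIZE = 8192


def cluster_by_use_case(
    use_cases: List[List[str]],
    sizes: Dict[str, int],
    page_size: int = PAGE_SIZE,
) -> Dict[str, int]:
    """Decide a load order: allocate each use case's objects next to each other.

    `use_cases` is ordered from highest to lowest priority. An object already
    placed by a higher-priority use case keeps its page. Returns object -> page.
    """
    placement: Dict[str, int] = {}
    page, used = 0, 0
    for objects in use_cases:
        for obj in objects:
            if obj in placement:
                continue
            size = sizes[obj]
            if size > page_size:
                raise ValueError(f"{obj} is larger than a page")
            if used + size > page_size:  # current page is full: start a new one
                page, used = page + 1, 0
            placement[obj] = page
            used += size
    return placement


def pages_needed(placement: Dict[str, int], objects: List[str]) -> int:
    """Working-set size, in pages, of one use case under a given placement."""
    return len({placement[obj] for obj in objects})


if __name__ == "__main__":
    # Hypothetical object sizes in bytes; A + B + C fits on one page.
    sizes = {"A": 3000, "B": 2000, "C": 2500, "D": 1000, "X": 6000, "Y": 6000}
    # Naive load: a single "use case" that is just the order the loader sees objects.
    naive = cluster_by_use_case([["A", "X", "B", "Y", "C", "D"]], sizes)
    # Clustered load: key use case first, then a secondary use case, then the rest.
    clustered = cluster_by_use_case([["A", "B", "C"], ["A", "D"], ["X", "Y"]], sizes)
    for name, p in (("naive", naive), ("clustered", clustered)):
        print(name, pages_needed(p, ["A", "B", "C"]), pages_needed(p, ["A", "D"]))
```

With these sizes, the key use case (A, B, C) needs three pages under the naive load order and one page when clustered, while the secondary use case (A, D) needs two pages either way. That is the kind of trade-off that validating against other use cases is meant to expose.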
Assume that the combined size of the three objects is less than the size of a physical database page. The clustering leads to high data density because when we fetch the page with object A, we also get objects B and C. In this case, we need just one page transfer to get all the objects required for our use case. To accomplish good clustering, one must know the use cases and the objects involved in those use cases. Given that knowledge, the goals of clustering are:
1) Cluster together objects which are accessed together
2) Separate (de-cluster, or partition; we will discuss partitioning in detail later in this paper) objects which are never accessed together. This includes separating frequently accessed data from rarely accessed data.

Partitioning
Partitioning is a strategy to isolate subsets of objects in different physical storage units. By definition, if two objects are in different partitions, they are de-clustered. The two goals of partitioning are to gain isolation and to increase data density. Isolation is desirable when concurrent access is required. This paper is not intended to cover concurrency; for that reason, our discussion of partitioning will be rather brief.

Although partitioning is intended for isolating objects, its use can also improve data density. Given the definition, this may seem counterintuitive, so let us use an example to illustrate. Consider a grocery store. If you needed a box of cereal, you would go down the cereal aisle. If the grocer has done his job correctly, the aisle (or some number of shelves in the aisle) will be populated ONLY with boxes of cereal. Because other items have been located in their respective aisles and shelves, the entire cereal aisle is dense with cereal. If the grocer had not done the job correctly, a given section of a shelf might have (for instance) boxes of noodles, cans of vegetables, and bags of chips.
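The grocery analogy maps directly onto storage blocks. This minimal Python sketch models each block (a shelf, or a database page) as the list of item categories it holds, with a hypothetical block capacity. It compares loading items in arrival order against partitioning them by category first, then measures data density for the "cereal" scope over the blocks that must be visited. Grouping by a single category key is an illustrative assumption standing in for whatever partitioning rule an application would use; the function names are hypothetical.

```python
from typing import Dict, List

# A storage block (a shelf, or a database page) modeled as the category of each item on it.
Block = List[str]


def load_in_arrival_order(items: List[str], capacity: int) -> List[Block]:
    """No partitioning: each block is filled with whatever arrives next."""
    return [items[i:i + capacity] for i in range(0, len(items), capacity)]


def load_partitioned(items: List[str], capacity: int) -> List[Block]:
    """Partition by category first, so no block mixes categories."""
    groups: Dict[str, List[str]] = {}
    for item in items:
        groups.setdefault(item, []).append(item)
    blocks: List[Block] = []
    for group in groups.values():
        blocks.extend(load_in_arrival_order(group, capacity))
    return blocks


def density_for(blocks: List[Block], wanted: str) -> float:
    """Data density over the blocks that must be visited to collect every `wanted` item."""
    visited = [b for b in blocks if wanted in b]
    hits = sum(b.count(wanted) for b in visited)
    return hits / sum(len(b) for b in visited)


if __name__ == "__main__":
    items = ["cereal", "noodles", "vegetables", "chips"] * 3  # a mixed delivery
    for name, blocks in (("mixed", load_in_arrival_order(items, 4)),
                         ("partitioned", load_partitioned(items, 4))):
        visited = sum("cereal" in b for b in blocks)
        print(name, visited, density_for(blocks, "cereal"))
```

With this mixed delivery of twelve items, the unpartitioned load forces a visit to three blocks at a density of 0.25, while the partitioned load finds all the cereal in one block at a density of 1.0.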
In this scenario, the shelf does not have good data density for the goal of obtaining a box of cereal. Recall the definition of data density: the proportion of objects within a given storage block that are accessed by a client during some scope. Our scope is to obtain a box of cereal. Our storage block is the aisle or a shelf. If the shelf in question contains many items other than cereal, then we have poor data density. If, on the other hand, we partition the non-cereal items into different aisles, then the cereal aisle would contain only cereal, and thus a high data density would be achieved.

Conclusion
The way in which data is loaded into the database can have a significant impact on the performance of an application. Careful analysis of an application's use cases should allow its key objects to be identified. Once the key objects are identified, a clustering strategy can be planned. Several of the techniques presented here can yield a clustering strategy that boosts performance far beyond any tuning that might be done after the database is loaded and the application delivered. It is often the case that several techniques can be combined; an application need not restrict itself to just one technique. The goal of clustering is to reduce the working set size, yield higher data density, and reduce the number of pages which need to be transferred between the application and the ObjectStore server.
Posted by w at 7:05 PM