In this post I will give a documentation for the work I have done. The description for aim of this project is given in detail in MDEV-350 and in my earlier posts, so I am going to skip those here.

#### Table of Contents

- Getting the source code
- Interface for using cost factors
- Adding a new cost factor
- Under the hood
- Future Work
- Credits

#### Getting the source code

This code is currently not merged in `mariadb`

. You can get it from my github repository.

#### Interface for using cost factors

I have added two new files `sql/opt_costmodel.h`

and `sql/opt_costmodel.cc`

. The class of interest is `Cost_factors`

. This class is used to store all the cost factors (both global and engine specific). We have `Global_cost_factors`

class for storing global cost factors and similarly `Engine_cost_factors`

for engine specific factors.

In the current code, we have 2 global cost factors: `time_for_compare`

and `time_for_compare_rowid`

and two engine specific cost factors `read_time`

and `scan_time`

. These factors are accessible via a global `Cost_factors`

object `cost_factors`

. So you can get `time_for_compare`

by calling `cost_factors.time_for_compare()`

. For engine specific factors, the syntax is `cost_factors.read_time(handler *)`

.

The values of these cost factors are initialized from the `optimizer_cost_factors`

table of `mysql`

database, when the server starts. If the cost factor is not present, a default value is given (for example `time_for_compare`

has a default value of 5, and that of `time_for_compare_rowid`

is 500).

#### Adding a new cost factor

Suppose you want to add a new engine specific cost factor, say `new_engine_factor`

. Here is a step by step guide for doing that:

- In
`sql/opt_costmodel.h`

, each constant has been given a number so that easy to store them in an array (or whatever). Give your constant a new number, like`#define NEW_ENGINE_FACTOR 2`

. Also increase the value of`MAX_ENGINE_CONSTANTS`

by 1 and`MAX_CONSTANTS`

by 64 (`MAX_HA`

).`MAX_CONSTANTS`

is used while solving equations. - In the class
`Engine_cost_factors`

add a new variable for your constant, for example`Cost_factor new _engine_factor`

. - Give your cost factor a default value, for example
`static const double DEFAULT_NEW_ENGINE_FACTOR= 1`

. - Also, in the
`optimizer_cost_factors`

table, each constant is stored by it’s name, so we need a mapping of variable to it’s name. In the method`set_all_names()`

of`Engine_cost_factors`

, add a new entry for your variable. Example`all_names[YOUR_VARIABLE_NUMBER] = st_factor("NEW_ENGINE_FACTOR", &new_engine_factor)`

. - You will also need to modify the method
`update_engine_factor`

. It has two variants, you need to modify both of them. - The last step is to expose your cost factor. In the
`Cost_factors`

class, add a new method`double new_engine_factor(handler *)`

which will return the value of this factor based on the engine index.

Adding a new global cost factor is almost similar. (One important point to note is to increase the value of `MAX_CONSTANTS`

by 1 and not by 64).

#### Under the hood

So you want to understand how are these factors being calculated.
As I had mentioned in the first section that the cost factors values are initialized from the `optimizer_cost_factors`

table. This table currently has 5 columns, `const_name`

, `engine_name`

, `count`

, `sum`

, `sum_squared`

. (The schema is in `scripts/mysql_system_tables.sql`

). Engine name should be an empty string for global cost factors. The value for a cost factor is calculated as `sum/count`

. Let’s understand what are these `sum`

and `count`

in more detail.

Apart from a global `Cost_factors`

object, `cost_factors`

, each THD also has it’s own copy of `Cost_factors`

, `thd_cost_factors`

. In each `THD`

we measure the `query_time`

for each query and how many times an operation related to a cost factor was performed. So for each query we have a linear equation relating `cost_factors`

value, number of operations related to `cost_factor`

and total time. This has been discussed in a greater detail in my earlier blog post. We solve for cost factors when the `THD`

disconnects or when we have `MAX_EQUATIONS`

. I am using `lsqr`

for solving sparse linear equations. These values are written to the global `cost_factors`

object. The value is added to `sum`

and the `count`

is increased by 1 (Hence the equation `value=sum/count`

). The values in the global `cost_factors`

object are written to the `optimizer_cost_factors`

table when the server disconnects. You might want to check out `THD::solve_equation`

and `THD::build_equation`

in `sql/sql_class.cc`

.

#### Future work

- Since we have added only 4 cost factors, there is a lot of unaccounted time in each query. As a result of this, we get negative values for some of the coefficients when solved using
`lsqr`

. - The framework has not been tested completely. So, if you get an error while experimenting with it, please report to me. I will try to add more test cases.

#### Credits

The entire project was an idea of Sergei Golubchik and he was also my mentor for this project. I will like to thank him for his constant guidance and encouragement.

blog comments powered by Disqus