[Feature][Docs] Add metric docs (#23)

datavane · Oct 2, 2023 · bcc399e
1 parent 0366ce2
Showing 22 changed files with 1,022 additions and 10 deletions.
diff --git a/docs/04-features/01-catalog/02-connector/09-connector-impala.md b/docs/04-features/01-catalog/02-connector/09-connector-impala.md
@@ -5,7 +5,7 @@ title: 'Impala'
 
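 ## Introduction
 
-The `Presto` connector connects to `Presto` data sources to fetch metadata and run data quality checks.
+The `Impala` connector connects to `Impala` data sources to fetch metadata and run data quality checks.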
 
 ## Parameters
 ### Options

diff --git a/docs/04-features/01-catalog/02-connector/10-connector-databend.md b/docs/04-features/01-catalog/02-connector/10-connector-databend.md
@@ -1,11 +1,11 @@
 ---
-id: 'connector-impala'
-title: 'Impala'
+id: 'connector-databend'
+title: 'Databend'
 ---
 
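 ## Introduction
 
-The `Presto` connector connects to `Presto` data sources to fetch metadata and run data quality checks.
+The `Databend` connector connects to `Databend` data sources to fetch metadata and run data quality checks.
 
 ## Parameters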
 ### Options

diff --git a/docs/04-features/02-metric/01-single-table-metric/01-column-not-null.md b/docs/04-features/02-metric/01-single-table-metric/01-column-not-null.md
@@ -1,14 +1,12 @@
 ---
 id: 'column-not-null'
-title: 'column_not_null'
+title: 'Not-Null Check'
 ---
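 ## Usage
 - Click "Create rule job" and select "Data quality job"
 - On the job page, select the Not-Null Check rule
 - Select the data source to check
 
-![Custom aggregate SQL rule](/doc/image/metric_column_not_null.png)
-
 ## Parameters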
 ### Options
 
@@ -37,6 +35,21 @@ title: 'column_not_null'
 }
 ```
 
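+### `SQL` statements generated automatically during the check
+
+The check relies on a few automatically generated parameters that distinguish one check rule from another.
+- uniqueKey: a unique key generated from each rule's configuration
+- invalidate_items_table: a view created to hold the intermediate data, which is normally the rows that hit the rule, that is, the erroneous data. The view is named invalidate_items_${uniqueKey}
+
+Intermediate table invalidate_items_${uniqueKey}
+```
+select * from ${table} where ${column} is not null and ${filter}
+```
+`SQL` that computes the actual value
+```
+select count(1) as actual_value_${uniqueKey} from ${invalidate_items_table}
+```
+
 ## Use case
 
 ### Scenario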

diff --git a/...e-table-metric/02-custom_aggregate_sql.md → ...e-table-metric/02-custom-aggregate-sql.md b/...e-table-metric/02-custom_aggregate_sql.md → ...e-table-metric/02-custom-aggregate-sql.md
diff --git a/docs/04-features/02-metric/01-single-table-metric/03-column-null.md b/docs/04-features/02-metric/01-single-table-metric/03-column-null.md
@@ -0,0 +1,62 @@
+---
+id: 'column-null'
+title: 'column_null'
+---
+## Usage
+- Click "Create rule job" and select "Data quality job"
+- On the job page, select the Null Check rule
+- Select the data source to check
+
+## Parameters
+### Options
+
+|             name             |  type  |  required  | default value |
+|:----------------------------:|:------:|:----------:|:-------------:|
+| [database](#database-string) | string |    yes     |       -       |
+|    [table](#table-string)    | string |    yes     |       -       |
+|   [column](#column-string)   | string |    yes     |       -       |
+
+#### database [string]
+Name of the source table's database
+#### table [string]
+Name of the table in the source database
+#### column [string]
+The column to check
+
+### Configuration example
+```
+{
+    "metricType": "column_null",
+    "metricParameter": {
+        "database": "datavines",
+        "table": "dv_catalog_entity_instance",
+        "column": "type"
+    }
+}
+```
+
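+### `SQL` statements generated automatically during the check
+
+The check relies on a few automatically generated parameters that distinguish one check rule from another.
+- uniqueKey: a unique key generated from each rule's configuration
+- invalidate_items_table: a view created to hold the intermediate data, which is normally the rows that hit the rule, that is, the erroneous data. The view is named invalidate_items_${uniqueKey}
+
+Intermediate table invalidate_items_${uniqueKey}
+```
+select * from ${table} where ${column} is null and ${filter}
+```
+`SQL` that computes the actual value
+```
+select count(1) as actual_value_${uniqueKey} from ${invalidate_items_table}
+```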
+
+## Use case
+
+### Scenario
+...
+
+### Approach
+...
+
+### Steps
+...
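
The two-step flow documented for `column_null` (build the intermediate view, then count its rows) can be tried out locally. The following is a minimal sketch, not the project's implementation: it assumes SQLite as a stand-in data source, Python's `string.Template` to fill the `${...}` placeholders, a hypothetical unique key `abc123`, and `1 = 1` as the filter when none is configured.

```python
import sqlite3
from string import Template

# Templates follow the documented SQL; everything else here is illustrative.
INVALIDATE_ITEMS_SQL = Template(
    "select * from ${table} where ${column} is null and ${filter}"
)
ACTUAL_VALUE_SQL = Template(
    "select count(1) as actual_value_${uniqueKey} from ${invalidate_items_table}"
)


def run_column_null_check(conn, table, column, row_filter, unique_key):
    """Create the intermediate view, then count the rows it holds."""
    view = f"invalidate_items_{unique_key}"
    select_sql = INVALIDATE_ITEMS_SQL.substitute(
        table=table, column=column, filter=row_filter
    )
    conn.execute(f"create view {view} as {select_sql}")
    count_sql = ACTUAL_VALUE_SQL.substitute(
        uniqueKey=unique_key, invalidate_items_table=view
    )
    return conn.execute(count_sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("create table dv_catalog_entity_instance (id integer, type text)")
conn.executemany(
    "insert into dv_catalog_entity_instance values (?, ?)",
    [(1, "table"), (2, None), (3, "column"), (4, None)],
)

# "1 = 1" is an assumed stand-in for "no filter configured".
null_count = run_column_null_check(
    conn, "dv_catalog_entity_instance", "type", "1 = 1", "abc123"
)
print(null_count)  # 2
```

Since the count runs against the view, the rows that hit the rule can also be selected from `invalidate_items_abc123` directly for inspection.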
diff --git a/docs/04-features/02-metric/01-single-table-metric/04-column-avg.md b/docs/04-features/02-metric/01-single-table-metric/04-column-avg.md
@@ -0,0 +1,57 @@
+---
+id: 'column-avg'
+title: 'column_avg'
+---
+## Usage
+- Click "Create rule job" and select "Data quality job"
+- On the job page, select the Average Check rule
+- Select the data source to check
+
+## Parameters
+### Options
+
+|             name             |  type  |  required  | default value |
+|:----------------------------:|:------:|:----------:|:-------------:|
+| [database](#database-string) | string |    yes     |       -       |
+|    [table](#table-string)    | string |    yes     |       -       |
+|   [column](#column-string)   | string |    yes     |       -       |
+
+#### database [string]
+Name of the source table's database
+#### table [string]
+Name of the table in the source database
+#### column [string]
+The column to check
+
+### Configuration example
+```
+{
+    "metricType": "column_avg",
+    "metricParameter": {
+        "database": "datavines",
+        "table": "dv_catalog_entity_instance",
+        "column": "type"
+    }
+}
+```
+
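+### `SQL` statements generated automatically during the check
+
+The check relies on a few automatically generated parameters that distinguish one check rule from another.
+- uniqueKey: a unique key generated from each rule's configuration
+
+`SQL` that computes the actual value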
+```
+select avg(${column}) as actual_value_${uniqueKey} from ${table} where ${filter}
+```
+
+## Use case
+
+### Scenario
+...
+
+### Approach
+...
+
+### Steps
+...
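
For `column_avg`, the documented statement is a single aggregate with a filter and a result column suffixed by the unique key. The sketch below renders that template and runs it, assuming SQLite as a stand-in data source; the `orders` table, the key `k1`, and the filter `status = 'paid'` are hypothetical. It also shows that `avg` skips `NULL` values rather than treating them as zero.

```python
import sqlite3
from string import Template

# Template follows the documented SQL for column_avg.
AVG_SQL = Template(
    "select avg(${column}) as actual_value_${uniqueKey} from ${table} where ${filter}"
)

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (amount real, status text)")  # hypothetical table
conn.executemany(
    "insert into orders values (?, ?)",
    [(10.0, "paid"), (30.0, "paid"), (None, "paid"), (500.0, "cancelled")],
)

sql = AVG_SQL.substitute(
    column="amount", uniqueKey="k1", table="orders", filter="status = 'paid'"
)
cursor = conn.execute(sql)
avg_paid = cursor.fetchone()[0]
result_column = cursor.description[0][0]  # alias carries the unique key

# The NULL amount is ignored and the cancelled row is filtered out.
print(result_column, avg_paid)  # actual_value_k1 20.0
```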
diff --git a/docs/04-features/02-metric/01-single-table-metric/05-column-avg-length.md b/docs/04-features/02-metric/01-single-table-metric/05-column-avg-length.md
@@ -0,0 +1,57 @@
+---
+id: 'column-avg-length'
+title: 'column_avg_length'
+---
+## Usage
+- Click "Create rule job" and select "Data quality job"
+- On the job page, select the Average Length Check rule
+- Select the data source to check
+
+## Parameters
+### Options
+
+|             name             |  type  |  required  | default value |
+|:----------------------------:|:------:|:----------:|:-------------:|
+| [database](#database-string) | string |    yes     |       -       |
+|    [table](#table-string)    | string |    yes     |       -       |
+|   [column](#column-string)   | string |    yes     |       -       |
+
+#### database [string]
+Name of the source table's database
+#### table [string]
+Name of the table in the source database
+#### column [string]
+The column to check
+
+### Configuration example
+```
+{
+    "metricType": "column_avg_length",
+    "metricParameter": {
+        "database": "datavines",
+        "table": "dv_catalog_entity_instance",
+        "column": "type"
+    }
+}
+```
+
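+### `SQL` statements generated automatically during the check
+
+The check relies on a few automatically generated parameters that distinguish one check rule from another.
+- uniqueKey: a unique key generated from each rule's configuration
+
+`SQL` that computes the actual value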
+```
+select avg(length(${column})) as actual_value_${uniqueKey} from ${table} where ${filter}
+```
+
+## Use case
+
+### Scenario
+...
+
+### Approach
+...
+
+### Steps
+...
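
For `column_avg_length`, it helps to see how empty strings and `NULL` values affect the result of `avg(length(...))`. This is a small sketch assuming SQLite as a stand-in data source, with a hypothetical table `t`, key `k2`, and `1 = 1` as the filter. Note that `length` counts characters in SQLite, whereas some engines (MySQL's `LENGTH`, for example) count bytes, so results on multi-byte text may differ by connector.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (name text)")  # hypothetical table
conn.executemany(
    "insert into t values (?)",
    [("ab",), ("abcd",), ("",), (None,)],
)

# Documented template with ${column}=name, ${uniqueKey}=k2, ${table}=t, ${filter}=1 = 1.
avg_len = conn.execute(
    "select avg(length(name)) as actual_value_k2 from t where 1 = 1"
).fetchone()[0]

# Lengths are 2, 4 and 0; the NULL row contributes nothing, so (2 + 4 + 0) / 3.
print(avg_len)  # 2.0
```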
diff --git a/docs/04-features/02-metric/01-single-table-metric/06-column-blank.md b/docs/04-features/02-metric/01-single-table-metric/06-column-blank.md
@@ -0,0 +1,62 @@
+---
+id: 'column-blank'
+title: 'column_blank'
+---
+## Usage
+- Click "Create rule job" and select "Data quality job"
+- On the job page, select the Blank Check rule
+- Select the data source to check
+
+## Parameters
+### Options
+
+|             name             |  type  |  required  | default value |
+|:----------------------------:|:------:|:----------:|:-------------:|
+| [database](#database-string) | string |    yes     |       -       |
+|    [table](#table-string)    | string |    yes     |       -       |
+|   [column](#column-string)   | string |    yes     |       -       |
+
+#### database [string]
+Name of the source table's database
+#### table [string]
+Name of the table in the source database
+#### column [string]
+The column to check
+
+### Configuration example
+```
+{
+    "metricType": "column_blank",
+    "metricParameter": {
+        "database": "datavines",
+        "table": "dv_catalog_entity_instance",
+        "column": "type"
+    }
+}
+```
+
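+### `SQL` statements generated automatically during the check
+
+The check relies on a few automatically generated parameters that distinguish one check rule from another.
+- uniqueKey: a unique key generated from each rule's configuration
+- invalidate_items_table: a view created to hold the intermediate data, which is normally the rows that hit the rule, that is, the erroneous data. The view is named invalidate_items_${uniqueKey}
+
+Intermediate table invalidate_items_${uniqueKey}
+```
+select * from ${table} where (${column} is null or ${column} = '') and ${filter}
+```
+`SQL` that computes the actual value
+```
+select count(1) as actual_value_${uniqueKey} from ${invalidate_items_table}
+```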
+
+## Use case
+
+### Scenario
+...
+
+### Approach
+...
+
+### Steps
+...
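
The `column_blank` predicate differs from the `column_null` one only by the extra `= ''` comparison. The sketch below contrasts the two on the same rows, assuming SQLite as a stand-in data source, a hypothetical table `t`, and `1 = 1` as the filter. In SQLite a whitespace-only string is not equal to `''`; engines that pad trailing spaces when comparing strings may treat it as blank, so this part of the result depends on the connector.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, type text)")  # hypothetical table
conn.executemany(
    "insert into t values (?, ?)",
    [(1, "table"), (2, None), (3, ""), (4, "  ")],
)


def matching_ids(predicate):
    """Return the ids of rows hit by a predicate, using the documented where-clause shape."""
    rows = conn.execute(f"select id from t where ({predicate}) and 1 = 1 order by id")
    return [row[0] for row in rows]


null_ids = matching_ids("type is null")                # column_null predicate
blank_ids = matching_ids("type is null or type = ''")  # column_blank predicate

print(null_ids, blank_ids)  # [2] [2, 3]
```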
diff --git a/docs/04-features/02-metric/01-single-table-metric/07-column-distinct.md b/docs/04-features/02-metric/01-single-table-metric/07-column-distinct.md
@@ -0,0 +1,57 @@
+---
+id: 'column-distinct'
+title: 'column_distinct'
+---
+## Usage
+- Click "Create rule job" and select "Data quality job"
+- On the job page, select the Distinct Check rule
+- Select the data source to check
+
+## Parameters
+### Options
+
+|             name             |  type  |  required  | default value |
+|:----------------------------:|:------:|:----------:|:-------------:|
+| [database](#database-string) | string |    yes     |       -       |
+|    [table](#table-string)    | string |    yes     |       -       |
+|   [column](#column-string)   | string |    yes     |       -       |
+
+#### database [string]
+Name of the source table's database
+#### table [string]
+Name of the table in the source database
+#### column [string]
+The column to check
+
+### Configuration example
+```
+{
+    "metricType": "column_distinct",
+    "metricParameter": {
+        "database": "datavines",
+        "table": "dv_catalog_entity_instance",
+        "column": "type"
+    }
+}
+```
+
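+### `SQL` statements generated automatically during the check
+
+The check relies on a few automatically generated parameters that distinguish one check rule from another.
+- uniqueKey: a unique key generated from each rule's configuration
+
+`SQL` that computes the actual value; it outputs the number of distinct values in the column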
+```
+select count(distinct(${column})) as actual_value_${uniqueKey} from ${table} where ${filter}
+```
+
+## Use case
+
+### Scenario
+...
+
+### Approach
+...
+
+### Steps
+...
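
For `column_distinct`, the value returned by `count(distinct(...))` is the number of distinct non-`NULL` values, which can be well below the row count. The sketch below compares the two, assuming SQLite as a stand-in data source, a hypothetical table `t`, and `1 = 1` as the filter; the extra `count(1)` column is added only for comparison and is not part of the documented statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (type text)")  # hypothetical table
conn.executemany(
    "insert into t values (?)",
    [("table",), ("table",), ("column",), (None,), (None,)],
)

# count(distinct(type)) follows the documented SQL; count(1) is for comparison only.
distinct_count, row_count = conn.execute(
    "select count(distinct(type)), count(1) from t where 1 = 1"
).fetchone()

# Two distinct values ("table", "column"); duplicates and NULLs are not counted.
print(distinct_count, row_count)  # 2 5
```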