[Feature][Docs] Add some docs (#11)
zixi0825 authored Aug 27, 2023
1 parent 14293b2 commit 11e93ef
Showing 70 changed files with 1,042 additions and 77 deletions.
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
---
id: 'Catalog'
id: 'catalog'
title: 'Catalog'
---
3 changes: 3 additions & 0 deletions docs/04-features/01-catalog/02-connector/_category_.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
{
"label": "Catalog Connector"
}
3 changes: 3 additions & 0 deletions docs/04-features/01-catalog/_category_.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
{
"label": "Catalog"
}
4 changes: 0 additions & 4 deletions docs/04-features/02-data-quality/01-metric/01-metric-intro.md

This file was deleted.

3 changes: 0 additions & 3 deletions docs/04-features/02-data-quality/02-engine/_category_.json

This file was deleted.

3 changes: 0 additions & 3 deletions docs/04-features/02-data-quality/_category_.json

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
---
id: 'column-not-null'
title: 'column_not_null'
---
## Usage
- Click Create Rule Job and choose a data quality job
- On the job page, select the not-null check rule
- Select the data source to check

![Not-null check rule](/doc/image/metric_column_not_null.png)

## Parameters
### Options

| name | type | required | default value |
|:----------------------------:|:------:|:----------:|:-------------:|
| [database](#database-string) | string | yes | - |
| [table](#table-string) | string | yes | - |
| [column](#column-string) | string | yes | - |

#### database [string]
Name of the database that contains the source table
#### table [string]
Name of the source table in that database
#### column [string]
Column to check

### Configuration file example
```
{
    "metricType": "column_not_null",
    "metricParameter": {
        "database": "datavines",
        "table": "dv_catalog_entity_instance",
        "column": "type"
    }
}
```
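
The Options table marks all three parameters as required. As a minimal sketch, a configuration can be checked for missing parameters before it is submitted. The names `REQUIRED_PARAMETERS` and `missing_parameters` are hypothetical helpers written for this illustration and are not part of DataVines.

```python
import json

# Required parameters per metric type, copied from the Options table.
# This dictionary and the function below are hypothetical helpers, not DataVines APIs.
REQUIRED_PARAMETERS = {
    "column_not_null": ("database", "table", "column"),
}


def missing_parameters(config: dict) -> list:
    """Return the required parameter names that are absent or empty."""
    required = REQUIRED_PARAMETERS[config["metricType"]]
    parameters = config.get("metricParameter", {})
    return [name for name in required if not parameters.get(name)]


config = json.loads("""
{
    "metricType": "column_not_null",
    "metricParameter": {
        "database": "datavines",
        "table": "dv_catalog_entity_instance",
        "column": "type"
    }
}
""")

# An empty list means every required parameter is present.
print(missing_parameters(config))
```

For the configuration above the list is empty. Removing `column` from `metricParameter` would make the function return `["column"]`.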

## Use case

### Scenario
...

### Approach
...

### Steps
...
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
---
id: 'custom-aggregate-sql'
title: 'custom_aggregate_sql'
---

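## Usage
- Click Create Rule Job and choose a data quality job
- On the job page, select the custom aggregate SQL rule
- Select the data source to check and write the custom aggregate SQL statement

![Custom aggregate SQL rule](/doc/image/metric_custom_aggregate_sql.png)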

## Parameters
### Options

| name | type | required | default value |
|:---------------------------------------------------------------:|:------:|:----------:|:-------------:|
| [database](#database-string) | string | yes | - |
| [table](#table-string) | string | yes | - |
| [actual_aggregate_sql](#actual_aggregate_sql-string) | string | yes | - |

#### database [string]
Name of the database that contains the source table
#### table [string]
Name of the source table in that database
#### actual_aggregate_sql [string]
Custom aggregate SQL. The alias after `as` must be **actual_value**; otherwise the statistics will be wrong.

### Configuration file example
```
{
    "metricType": "custom_aggregate_sql",
    "metricParameter": {
        "database": "datavines",
        "table": "dv_actual_values",
        "actual_aggregate_sql": "select count(1) as actual_value from ${table}"
    }
}
```
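
The two rules for `actual_aggregate_sql` (the `${table}` placeholder and the mandatory `actual_value` alias) can be illustrated with a small sketch. It assumes plain string substitution and uses an in-memory SQLite table as a stand-in for the real data source. How DataVines performs the substitution internally is not described here, and `render_sql` is a hypothetical helper.

```python
import re
import sqlite3

# The aggregate must be aliased as actual_value.
ALIAS_PATTERN = re.compile(r"\bas\s+actual_value\b", re.IGNORECASE)


def render_sql(sql: str, table: str) -> str:
    """Replace ${table} and reject SQL that lacks the actual_value alias.

    Hypothetical helper for illustration, not a DataVines API.
    """
    if not ALIAS_PATTERN.search(sql):
        raise ValueError("the aggregate must be aliased as actual_value")
    return sql.replace("${table}", table)


# In-memory stand-in for the real data source, holding three sample rows.
connection = sqlite3.connect(":memory:")
connection.execute("create table dv_actual_values (id integer)")
connection.executemany("insert into dv_actual_values values (?)", [(1,), (2,), (3,)])

sql = render_sql("select count(1) as actual_value from ${table}", "dv_actual_values")
actual_value = connection.execute(sql).fetchone()[0]

print(sql)           # the statement with the table name filled in
print(actual_value)  # row count of the sample table
```

A statement that uses a different alias, such as `as total`, is rejected before it runs, which mirrors the warning in the parameter description.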

## 使用案例

### 场景
...

### 思路
...

### 步骤
...
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
{
"label": "Single Table Metric"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
---
id: 'multi-table-value-comparison'
title: 'multi_table_value_comparison'
---

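## Usage
- Click Create Rule Job and choose a data comparison job
- On the job page, select the two-table value comparison rule
- Configure the source table and the target table, and write the check SQL statement for each

![Two-table value comparison rule](/doc/image/metric_multi_table_value_comparison.png)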

## Parameters
### Options

| name | type | required | default value |
|:--------------------------------------------------------------:|:------:|:----------:|:-------------:|
| [database](#database-string) | string | yes | - |
| [table](#table-string) | string | yes | - |
| [actual_execute_sql](#actual_execute_sql-string) | string | yes | - |
| [database2](#database2-string) | string | yes | - |
| [table2](#table2-string) | string | yes | - |
| [expected_execute_sql](#expected_execute_sql-string) | string | yes | - |


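#### database [string]
Name of the database that contains the source table
#### table [string]
Name of the source table in that database
#### actual_execute_sql [string]
SQL that computes the actual value. The alias after `as` must be **actual_value**; otherwise the statistics will be wrong.
#### database2 [string]
Name of the database that contains the target table
#### table2 [string]
Name of the target table in that database
#### expected_execute_sql [string]
SQL that computes the expected value. The alias after `as` must be **expected_value**; otherwise the statistics will be wrong.

### Configuration file example
```
{
    "metricType": "multi_table_value_comparison",
    "metricParameter": {
        "database": "cbs",
        "table": "cbs_ratio",
        "actual_execute_sql": "select count(1) as actual_value from ${table}",
        "database2": "cbs",
        "table2": "cbs_ratio",
        "expected_execute_sql": "select count(1) as expected_value from ${table2}"
    }
}
```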

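## Use case

### Scenario

For a table with a time partition column, compare today's row count with yesterday's and raise an alert if today's count is lower than yesterday's.

### Approach
Use the two-table value comparison rule together with the built-in time parameters.

>Use the two-table value comparison rule: write SQL statements that count today's and yesterday's rows, then compare the two values. The check passes if today's value is greater than or equal to yesterday's, and fails otherwise.

### Steps
- Select the two-table value comparison rule in a data comparison job
- Select the database and table, and write the SQL statement that computes the actual value.
  - The statement uses the `$[today]` time variable, which the system replaces with today's date in the format `yyyy-MM-dd`. A custom format can also be configured, for example `$[today(yyyyMMdd)]`.
  - The alias after `as` must be **actual_value**; otherwise the statistics will be wrong.
  - The table name can be written as `${table}`, which the system replaces automatically, or as the literal table name.
  ```
  select count(1) as actual_value from ${table} where data_date='$[today]'
  ```
- Select the database and table, and write the SQL statement that computes the expected value.
  - The statement uses the `$[yesterday]` time variable, which the system replaces with yesterday's date in the format `yyyy-MM-dd`. A custom format can also be configured, for example `$[yesterday(yyyyMMdd)]`.
  - The alias after `as` must be **expected_value**; otherwise the statistics will be wrong.
  - The table name can be written as `${table2}`, which the system replaces automatically, or as the literal table name.
  ```
  select count(1) as expected_value from ${table2} where data_date='$[yesterday]'
  ```
- Configure the result formula
  - Result formula: actual value - expected value
  - Comparison operator: >=
  - Threshold: 0

If the formula `actual value - expected value >= 0` evaluates to true, today's row count is greater than or equal to yesterday's. Otherwise today's row count is lower than yesterday's, the result is abnormal, and an alert is required.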
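
The time variables and the result formula from the steps above can be sketched as follows. This is an illustration under stated assumptions, not the DataVines implementation: the regular expression, the handling of only the `yyyy`, `MM` and `dd` tokens, and the helper names `render_time_variables` and `check_passes` are all hypothetical. The run date is an arbitrary example.

```python
import re
from datetime import date, timedelta

# Matches $[today], $[yesterday], and the formatted form such as $[today(yyyyMMdd)].
TIME_VARIABLE = re.compile(r"\$\[(today|yesterday)(?:\(([^)]*)\))?\]")


def to_strftime(pattern: str) -> str:
    """Translate the yyyy/MM/dd tokens into strftime codes (only these three tokens)."""
    return pattern.replace("yyyy", "%Y").replace("MM", "%m").replace("dd", "%d")


def render_time_variables(sql: str, today: date) -> str:
    """Replace the time variables with concrete dates (hypothetical helper)."""
    days = {"today": today, "yesterday": today - timedelta(days=1)}

    def substitute(match):
        # Fall back to the default format when no custom format is given.
        pattern = match.group(2) or "yyyy-MM-dd"
        return days[match.group(1)].strftime(to_strftime(pattern))

    return TIME_VARIABLE.sub(substitute, sql)


def check_passes(actual_value: float, expected_value: float, threshold: float = 0) -> bool:
    """Evaluate the result formula: actual value - expected value >= threshold."""
    return actual_value - expected_value >= threshold


run_date = date(2023, 8, 27)  # arbitrary example date
actual_sql = render_time_variables(
    "select count(1) as actual_value from t where data_date='$[today]'", run_date
)
expected_sql = render_time_variables(
    "select count(1) as expected_value from t where data_date='$[yesterday(yyyyMMdd)]'", run_date
)
print(actual_sql)
print(expected_sql)
print(check_passes(120, 100))  # today has more rows than yesterday
print(check_passes(80, 100))   # today has fewer rows, so an alert is needed
```

With equal counts the formula still evaluates to true, because the comparison operator is `>=` rather than `>`.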
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
id: 'multi-table-accuracy'
title: 'multi_table_accuracy'
---
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
{
"label": "Multi Table Metric"
}
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
---
id: 'Engine Introduction'
id: 'engine-introduction'
title: 'Engine Introduction'
---
4 changes: 4 additions & 0 deletions docs/04-features/03-engine/02-local-engine.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
id: 'local-engine'
title: 'Local Engine'
---
4 changes: 4 additions & 0 deletions docs/04-features/03-engine/03-spark-engine.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
id: 'spark-engine'
title: 'Spark Engine'
---
3 changes: 3 additions & 0 deletions docs/04-features/03-engine/_category_.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
{
"label": "Job Execute Engine"
}
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
---
id: 'SLAs'
id: 'slas'
title: 'SLAs'
---
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
---
id: 'Error Data Storage'
id: 'error-data-storage'
title: 'Error Data Storage'
---
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
---
id: 'Issues'
id: 'issues'
title: 'Issues'
---
2 changes: 1 addition & 1 deletion docs/04-features/06-tag.md → docs/04-features/07-tag.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
---
id: 'Tag'
id: 'tag'
title: 'Tag'
---
2 changes: 1 addition & 1 deletion docs/05-integration/1-dolphinscheduler.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
---
id: 'DolphinScheduler'
id: 'dolphin-scheduler'
title: 'DolphinScheduler'
---
4 changes: 0 additions & 4 deletions docs/06-development/03-source-module-explanation.md
Original file line number Diff line number Diff line change
@@ -34,10 +34,6 @@ datavines-registry is the registry in datavines. datavines-registry-api defines

datavines-notification is the notification channel management in datavines. datavines-notification-api defines the notification channel's message notification, configuration parameters and other interfaces. The email type notification channel implementation is built into datavines-notification-plugins.

## datavines-storage

datavines-storage is the error data storage management module in datavines. datavines-storage-api defines interfaces such as error data query and configuration parameters. Two error data storage engines, MySQL and LocalFile, are built into datavines-storage-plugins. They store the data that the data quality checks find to violate the rules. The LocalFile engine is only applicable to the Local execution engine.

## datavines-runner

datavines-runner is a module in datavines responsible for running data quality checks in script mode. Its main functions are to read configuration files, parse configuration files, perform data quality checks, judge check results, and run alarm processing.
3 changes: 0 additions & 3 deletions docs/06-development/04-plugin/01-connector/_category_.json

This file was deleted.
