- log: remove APP_LOG_FILTER_CONFIG from log-processor
not really useful in practice; a log system should be designed to be capable of persisting massive data
- db: updated pool.maxIdleTime from 1 hour to 2 hours
with cloud auth, the metadata service could be down while the GKE control plane is updating (zonal GKE only, not regional GKE); increasing maxIdleTime holds db connections longer and improves db pool efficiency, especially in cloud env where there is no firewall between app and db
- mongo: update driver to 4.8.1
- search: improved ForEach validation and clear scroll error handling
- http: use actual body length as stats.response_body_length
- json: update jackson to 2.14.0
- html: HTMLTemplate supports data uri
- http: tweaked websocket handling, support more close codes
- log: finalize log-exporter design
- ext: update log-collector default sys.http.maxForwardedIPs to 2
to keep consistent with the framework default values; use ENV to override if needed
- http: if the x-forwarded-for header has more IPs than maxForwardedIPs, put x-forwarded-for in the action log context
for requests passing thru an http forward proxy, or x-forwarded-for spoofing, log the complete chain for troubleshooting
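the trusted-proxy selection behind maxForwardedIPs can be sketched like this (a simplified illustration; ForwardedIPParser is a hypothetical helper, not the framework's actual implementation):

```java
// pick the client IP from X-Forwarded-For, trusting at most maxForwardedIPs
// trailing entries (appended by known proxies); anything further left is
// client-supplied and spoofable, hence logging the complete chain when exceeded
public class ForwardedIPParser {
    public static String clientIP(String xForwardedFor, int maxForwardedIPs) {
        String[] ips = xForwardedFor.split(",");
        // take the entry maxForwardedIPs positions from the right, clamped to the first entry
        int index = Math.max(0, ips.length - maxForwardedIPs);
        return ips[index].trim();
    }
}
```

with the default maxForwardedIPs=2 and header "203.0.113.7, 10.0.0.1, 10.0.0.2", the second entry from the right would be taken as the client IP.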
- http: handle http request with empty path
browsers and regular http clients won't send an empty path; this only happens with raw http requests sent by low-level clients
- http: update undertow to 2.3.0
- db: update mysql driver to 8.0.31
the mysql maven group was updated; it is now 'com.mysql:mysql-connector-j:8.0.31'
- log: update slf4j api to 2.0.3
- log: added log-exporter to upload log to google storage
as archive, or to import into BigQuery for OLAP; currently only gs:// is supported, other clouds can be supported if needed in the future
- search: update es to 8.5.0
es 8.5.0 has a bug that breaks monitor (elastic/elasticsearch#91259), fixed in 8.5.3
- ext: updated dockerfile for security compliance
to enable kube "securityContext.runAsNonRoot: true", the docker image must use a numeric user (UID)
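for example, a compliant image can declare a numeric user (a minimal sketch; the uid 1000 and user name are arbitrary choices):

```dockerfile
# kube can only verify runAsNonRoot when the image user is numeric;
# a named USER would require inspecting /etc/passwd inside the image
RUN adduser --system --uid 1000 app
USER 1000
```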
- monitor: fixed kafka high disk alert message
kafka disk usage uses size as the threshold; the alert message should convert it to a percentage
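the conversion is straightforward (DiskAlert is a hypothetical helper to illustrate the fix, not the monitor's actual code):

```java
import java.util.Locale;

// report disk usage as a percentage in the alert message,
// even though the threshold itself is configured by size
public class DiskAlert {
    public static String usagePercentage(long usedBytes, long totalBytes) {
        return String.format(Locale.ROOT, "%.1f%%", 100.0 * usedBytes / totalBytes);
    }
}
```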
- log-processor: support action log trace filter
to filter out traces to reduce storage, e.g. under massive scanning
- kafka: update to 3.3.1
kraft mode is production ready
- log-processor: always use bulk index, to simplify
ES already unified the bulk index / single index thread pools; there is only one "write" pool, refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html
- http: update undertow to 2.2.19
- search: update es to 8.4.1
- db: support json column via @Column(json=true)
syntax sugar to avoid writing custom getters and setters that convert between a String column and a Bean
- kafka: update to 3.2.3
- mongo: json @Property should not be used on mongo entity class
same as with db, view and entity should be separated
- db: mysql jdbc driver updated to 8.0.30
- mock: fixed MockRedis.list().range(), with negative start or stop
only impacts unit tests
- mongo: update driver to 4.7.1
- kafka: update to 3.2.1
- kafka: removed KafkaConfig.publish(messageClass) without topic support
it proved not useful to support dynamic topics; topics are better designed like tables, preferring static naming/typing over dynamic. if something like a request/reply or fan-in/fan-out pattern is really needed, it can still be implemented in an explicit way
- monitor: monitoring kafka topic/message type changes
- mongo: improve entity decoding
use switch instead of if chains for field matching
- log-collector: stricter request validation
- mongo: add connection pool stats metrics
- db: add experimental postgresql support
many new db products use a postgres-compatible driver, e.g. GCloud AlloyDB, CockroachDB
PostgreSQL lacks many features we use with MySQL: 1. affected rows, 2. QueryInterceptor to track sql not using an index
- search: clear ForEach scroll once process done
- http: update undertow to 2.2.18
- search: update es to 8.3.2
- kafka: mark LONG_CONSUMER_DELAY as error if delay is longer than 15 mins
currently MAX_POLL_INTERVAL_MS_CONFIG = 30 mins; a larger delay triggers rebalance and message resending. the system should be designed to process polled messages fast
- mongo: update driver to 4.6.0
- warning: removed @DBWarning, replaced with @IOWarning
e.g. @IOWarning(operation="db", maxOperations=2000, maxElapsedInMs=5000, maxReads=2000, maxTotalReads=10_000, maxTotalWrites=10_000)
- kafka: updated default long consumer delay threshold from 60s to 30s
- mongo: supports BigDecimal (map to mongo Decimal128 type)
- mongo: supports LocalDate (map to String)
- monitor: for critical errors, use 1 min as timespan (only send one alert message per min for same kind error)
in reality, an ongoing critical error may flood the slack channel; a 1 min timespan is good enough to bring attention, with fewer messages
- redis: supports SortedSet.remove()
8.0.2 (06/02/2022 - 06/17/2022) !!! @DBWarning is replaced with @IOWarning in 8.0.3, better to upgrade to the next version
- db: redesign db max operations, removed Database.maxOperations(), introduced @DBWarning
one service may mix multiple types of workload: some could be async and call db many times, while other APIs need to be sync and responsive. the new design allows specifying warning thresholds at a granular level
currently @DBWarning only supports Controller/WS methods and MessageHandler methods; executor tasks inherit @DBWarning from the parent
- db: reduced default LONG_TRANSACTION threshold from 10s to 5s, log commit/rollback elapsed time
- json: expose JSONMapper.builder() method to allow app create its own JSON parser
e.g. to parse external json with custom DeserializationFeature/MapperFeature
- search: add validation, document class must not have default values
with partialUpdate, those default values could overwrite the es document
- search: update and partialUpdate return if target doc is updated
- log-processor: action forward supports ignoring actions
- db: query.count() and query.projectOne() honor sort/skip/limit !!! behavior changed !!!
repository.select() is designed as syntax sugar to make it easier to construct single-table SQL (ORM is not a design goal). since it supports groupBy() / projectOne() / project(), we removed all customized rules and it degraded to a plain SQL builder. if you use query.count(), it's better not to set skip/limit/orderBy before that. with this, projection works as intended, e.g. query.limit(1); query.projectOne("col1, col2", View.class); results in "select col1, col2 from table limit 1"
- log-processor: action forward supports by result
e.g. only forward OK actions
- json: disabled ALLOW_COERCION_OF_SCALARS, to make it stricter
thru JSONMapper.disable(MapperFeature.ALLOW_COERCION_OF_SCALARS); previously, [""] would be converted to [null] if the target type is List
- search: put detailed error in SearchException
- kafka: update to 3.2.0
- db: query.project() renamed to query.projectOne(), added query.project() to return List !!!
should be easy to fix with compiler errors
- log-collector: make http().maxForwardedIPs() read property "sys.http.maxForwardedIPs"
so it can be configured by env
- stats: replace cpu usage with JDK built-in OperatingSystemMXBean.getProcessCpuLoad()
since jdk 17, java is aware of containers; refer to https://developers.redhat.com/articles/2022/04/19/java-17-whats-new-openjdks-container-awareness
- rate-limit: update rate control config api to make it more intuitive
e.g. add("group", 10, 5, Duration.ofSeconds(1)) keeps at most 10 permits, fills 5 permits every second
e.g. add("group", 20, 10, Duration.ofMinutes(5)) keeps at most 20 permits, fills 10 permits every 5 minutes
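the described semantics are those of a token bucket; a minimal sketch (a hypothetical simplified class, not core-ng's actual implementation; time is passed in explicitly to keep it deterministic):

```java
// token bucket: at most maxPermits stored, fillPermits added per interval;
// TokenBucket(10, 5, 1_000_000_000L, now) mirrors add("group", 10, 5, Duration.ofSeconds(1))
public class TokenBucket {
    private final int maxPermits;
    private final double permitsPerNano;
    private double permits;
    private long lastRefillNanos;

    public TokenBucket(int maxPermits, int fillPermits, long intervalNanos, long nowNanos) {
        this.maxPermits = maxPermits;
        this.permitsPerNano = (double) fillPermits / intervalNanos;
        this.permits = maxPermits;      // start full
        this.lastRefillNanos = nowNanos;
    }

    // try to take one permit at the given time; returns false if rate limited
    public boolean acquire(long nowNanos) {
        permits = Math.min(maxPermits, permits + (nowNanos - lastRefillNanos) * permitsPerNano);
        lastRefillNanos = nowNanos;
        if (permits >= 1) {
            permits -= 1;
            return true;
        }
        return false;
    }
}
```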
- search: added retryOnConflict on UpdateRequest to handle version conflicts
- search: added partialUpdate
- db: mysql jdbc driver updated to 8.0.29 !!! framework depends on mysql api, must use matched jdbc driver
- search: update es to 8.2.0
- db: tweak gcloud iam auth provider expiration time, make CancelQueryTaskImpl aware of the gcloud auth provider
- db: tweak query timeout handling
increased db socket timeout so the MySQL cancel timer (CancelQueryTaskImpl) has more room to kill the query
- http: update undertow to 2.2.17
UNDERTOW-2041: error when X-Forwarded-For header contains an ipv6 address with leading zeroes
UNDERTOW-2022: FixedLengthStreamSourceConduit must overwrite resumeReads
- monitor: collect ES cpu usage
added highCPUUsageThreshold in monitor es config
- httpClient: verify hostname with trusting specific cert !!! behavior changed, check carefully
previously, HTTPClient.builder().trust(CERT) used an empty hostname verifier; now only trustAll() bypasses hostname verifying. so if you trust a specific cert, make sure the CN/ALT_NAME matches the dns name (wildcard domain names also work)
- httpClient: retry aware of maxProcessTime
to short circuit: e.g. a heavy-load request makes the remote service busy, and client timeouts trigger more retry requests, amplifying the load
- es: support refresh on bulk request
to specify whether to auto refresh after a bulk request
- es: added timeout in search/complete request
- es: added DeleteByQuery
- es: added perf trace for elasticSearch.refreshIndex
- monitor: treat high disk usage as error
a full disk requires immediate attention to expand capacity
- action: use id as correlationId for root action
to make kibana search easier, e.g. searching by correlationId lists all related actions; the log-processor action diagram is also simplified
- search: truncate es request log
bulk index request bodies can be huge; truncating reduces heap usage
- db: support gcloud IAM auth
gcloud mysql supports IAM service account auth, using an access token instead of user/password; set the db user to "iam/gcloud" to use gcloud iam auth
- monitor: add retry on kube client
to reduce failures when the kube cluster is upgraded by the cloud provider
- http: removed http().httpPort() and http().httpsPort(), replaced with http().listenHTTP(host) and http().listenHTTPS(host)
replaced sys.http.port / sys.https.port with sys.http.listen / sys.https.listen; hosts are in "host:port" format, e.g. 127.0.0.1:8080, or just 8080 (equivalent to 0.0.0.0:8080)
in cloud env, all envs have the same dns/service names, so to simplify properties config it's actually better to set up dns names locally to mimic cloud, e.g. point "customer-service" to 127.0.0.2 in local dns and hardcode "https://customer-service" as customerServiceURL
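parsing the listen value could look roughly like this (ListenAddress is a hypothetical helper illustrating the format, not the framework's actual parser):

```java
// parse a sys.http.listen value: "host:port", or just "port" meaning 0.0.0.0
public class ListenAddress {
    public final String host;
    public final int port;

    public ListenAddress(String host, int port) {
        this.host = host;
        this.port = port;
    }

    public static ListenAddress parse(String listen) {
        int index = listen.lastIndexOf(':');
        if (index < 0) return new ListenAddress("0.0.0.0", Integer.parseInt(listen)); // "8080" == "0.0.0.0:8080"
        return new ListenAddress(listen.substring(0, index), Integer.parseInt(listen.substring(index + 1)));
    }
}
```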
- log-collector: refer to above, use sys.http.listen and sys.https.listen env if needed
- kafka: redesigned /_sys/ controller, now they are /_sys/kafka/topic/:topic/key/:key/publish and /_sys/kafka/topic/:topic/key/:key/handle
publish publishes the message to kafka, visible to all consumers; handle handles the message on the current pod only, not visible to kafka, it just calls the handler to process the message (for manual recovery or replaying messages)
- maven: deleted published versions older than 7.9.0
- redis: replaced ZRANGEBYSCORE with ZRANGE, requires redis 6.2 !!!
- redis: for list.pop always use "LPOP count" to simplify, requires redis 6.2 !!!
- redis: added RedisSortedSet.popMin
- redis: improved RedisSortedSet.popByScore concurrency handling
- kafka: collect producer kafka_request_size_max, collect kafka_max_message_size
stats.kafka_request_size_max is from kafka producer metrics, which is the compressed size (including key/value/header/protocol); broker config "message.max.bytes" should be larger than this
action.stats.kafka_max_message_size is the uncompressed size of the value bytes (kafka().maxRequestSize() should be larger than this)
- log-processor: added stat-kafka_max_message_size vis, added to kafka dashboard
- log-processor: updated stat-kafka_request_size vis, added max request size
- log: limit max size of actionLog.context() to 5000 for one key
warn with "CONTEXT_TOO_LARGE" if too many context values
- log: tweaked action log/trace truncation
increased max request size to 2M; check the final action log json bytes before sending to log-kafka, and if it's more than 2M, print the log to console and truncate context/trace
with snappy compression, it's generally ok with broker message.max.bytes=1M; in the worst case, we can set log-kafka message.max.bytes=2M. will review the current setup and potentially adjust in future
- db: fix: reverted the previous change that warned with UNEXPECTED_UPDATE_RESULT if updated rows is 0
- executor: log task class in trace to make it easier to find the corresponding code of executor task
- log-processor: update action-* context dynamic index template to support dot in context key
it's not recommended to put dots in action log context keys
- log-processor: kibana json updated for kibana 8.1.0
the TSVB metrics separate left axis bug was fixed, so the GC diagrams are reverted back
- search: update es to 8.1.0
BTW: ES cannot upgrade a node from version [7.14.0] directly to version [8.1.0], upgrade to version [7.17.0] first.
- mongo: updated driver to 4.5.0
removed mapReduce in favor of aggregate; it is deprecated since mongo 5.0
refer to https://docs.mongodb.com/manual/reference/command/mapReduce/#mapreduce
- scheduler: replaced jobExecutor with unlimited cached thread pool
no impact in regular cases; normally a scheduler-service application should only send kafka messages
this change mainly simplifies test services or non-global jobs (e.g. no need to put real logic into Executors in a Job)
- jre: published neowu/jre:17.0.2
- search: update es to 8.0.0
Elastic dropped more modules from this version; now we have to include the transport-netty4 and mapper-extras libs, and it doesn't provide a standard way to do integration tests, refer to elastic/elasticsearch#55258
opensearch is doing the opposite: https://mvnrepository.com/artifact/org.opensearch.plugin
- db: updated batchInsertIgnore, batchUpsert, batchDelete return value from boolean[] to boolean
this is a drawback of the MySQL thin driver, though it's expected behavior with batch insert ignore (or insert on duplicate key): the driver fills the entire affectedRows array with the same value, java.sql.Statement.SUCCESS_NO_INFO, if the updated count > 0; refer to com.mysql.cj.jdbc.ClientPreparedStatement.executeBatchedInserts line 758
if you need to know the result for each entity, you have to use single operations one by one (Transaction may help performance a bit)
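collapsing the driver's per-row array into a single boolean can be sketched as follows (BatchResult is a hypothetical helper showing why boolean[] carried no extra information):

```java
import java.sql.Statement;

// with rewritten batch inserts, every slot of affectedRows holds the same value
// (SUCCESS_NO_INFO when the total updated count > 0), so the most the caller
// can learn is a single boolean: did the batch update anything at all
public class BatchResult {
    public static boolean updated(int[] affectedRows) {
        for (int rows : affectedRows) {
            if (rows > 0 || rows == Statement.SUCCESS_NO_INFO) return true;
        }
        return false;
    }
}
```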
- db: mysql jdbc driver updated to 8.0.28
one bug fixed: After calling Statement.setQueryTimeout(), when a query timeout was reached, a connection to the server was established to terminate the query, but the connection remained open afterward. With this fix, the new connection is closed after the query termination. (Bug #31189960, Bug #99260)
- known bugs:
db: due to using affected rows (not found rows), if repository updates an entity without any change, it warns with UNEXPECTED_UPDATE_RESULT; please ignore this, it will be fixed in the next version