Problems and future plans on token system #75

otakustay · 2020-01-13T10:43:27Z

Token system is a key feature of react-diff-view and is the source of its powerfulness.

I looked into current implementation of token system recently and discovered some vulnerabilities about it, mostly about performance.

Too many clones

To ensure token system works safely by default, we introduce a clone function to copy an entire path, that is clone every node inside this path. This function is used in almost every key functions like normalizeToLines and other manipulation utilities.

Actually, while we guarantee that token node is immutable, we can get rid of all these clones, boosting performance considerably.

In current situation we don't have a statement saying "you cannot mutate token node" and removing clone operations can introduce some potential breaking changes, but I think it should be safe in most cases.

Too many slices

Previously we represent a path as an array of token nodes, the last node in this array is always a text node:

[
  {type: 'element'},
  {type: 'element'},
  {type: 'mark'},
  {type: 'text', value: 'Hello World'}
]

In nearly all cases when we need to manipulate this path, we were not changing the text, we frequently insert a node before text node, the implement could be:

const parents = path.slice(0, -1);
const text = path[path.length - 1];
const newPath = [...parents, customNode, text];

We notice a .slice(0, -1) inside this code which can cause some dropdown on performance.

To optimize this, we can separate node path and its containing text into a tuple like:

[
  [
    {type: 'element'},
    {type: 'element'},
    {type: 'mark'},
  ],
  'Hello World'
]

We can insert a node before text more efficiently:

const [parents, text] = path;
const newPath = [[...parents, customNode], text];

In this solution we also unwrap text node into a single string to reduce the creation and garbage collection of node objects.

This is a breaking change since any custom enhancers written previously will fail to work on the new data structure.

Too dynamic nodes

For the lifetime of react-diff-view, we didn't have a strictly defined type for token node, that is, users can put any properties into a node and use them in custom renderToken function.

This is easy to use, but a dynamic node causes more than expected node clones and a poor performance node merge algorithm.

We decide to later define the node more strictly, leaving only type and a dynamic properties for it, the clone and compare can be much simpler:

const cloneNode = node => {
  return {
    type: node.type,
    properties: node.properties,
  }
};

const areNodesEquals = (x, y) => (x.type === y.type && shallowEquals(x.properties, y.properties));

This is also a breaking change since a more strict structure is forced.

Reuse token system

As a result of performance inspection and reusability consideration, I developed a tiny library source-tokenizer to handle these, on the initial implement we get about 25% performance boost:

highlighting a json file with 11029 lines, 10 times
legacy: 22.384s
current: 16.038s

I'm preparing to develop a react-source-view component based on this library to render source codes, react-diff-view which focused on render diff contents will have a major version to use this library in 2020, an upgrade guide will be published then.

The text was updated successfully, but these errors were encountered:

otakustay added the pinned label Jul 28, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problems and future plans on token system #75

Problems and future plans on token system #75

otakustay commented Jan 13, 2020

Problems and future plans on token system #75

Problems and future plans on token system #75

Comments

otakustay commented Jan 13, 2020

Too many clones

Too many slices

Too dynamic nodes

Reuse token system