Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Polyline rendering #7972

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

thedmd
Copy link
Contributor

@thedmd thedmd commented Sep 11, 2024

This is follow up to #2964 Render thick lines with correct thickness and beveled corners PR.

image

New implementation introduce more features, fixes some long standing issues and aim to be fast.

Code is feature complete, no todo's are left other than bugfixes.

Change is rather large and may look intimidating. Please read description, comment, criticque code and suggest improvements.

New Features

Fully Variable Thickness

New implementation support full range of thickness for Anti-Aliased and Aliased paths.

Imgui.Polyline.Mesh.Thickness.mp4

Support for ImDrawListFlags_AllowVtxOffset

With 16-bit indices single command can handle up to 65536 vertices. Huge polylines are now split across multiple draw commands if necessary.

Cap Types

Support for caps on open polylines.

ImDrawFlags_CapNone ImDrawFlags_CapButt
(new default)
ImDrawFlags_CapSquare ImDrawFlags_CapRound
AA
Mesh
image image image image
AA
1:1
image image image image
Aliased
Mesh
image image image image
Aliased
1:1
image image image image

Important notes:

  • ImDrawFlags_CapButt is new default
  • ImDrawFlags_CapNone match old behavior which leave anti-aliased polyline without a cap
  • ImDrawFlags_CapButt is new default, this make all edges on checkbox anti-aliased
    image
  • ImDrawFlags_CapRound is pretty yet expensive in relation to other caps types
  • ImDrawFlags_CapRound use adaptive rendering via PathArcTo

Join Types

Support for more join types.

Edit: Rounded joins (and caps) are not more refined, please see #7972 (comment) below.

ImDrawFlags_JoinMiter
(default)
ImDrawFlags_JoinMiterClip ImDrawFlags_JoinBevel ImDrawFlags_JoinRound
AA
Mesh
image image image image
AA
1:1
image image image image
Aliased
Mesh
image image image image
Aliased
1:1
image image image image

Important notes:

  • ImDrawFlags_JoinMiter collapse to ImDrawFlags_JoinBevel when miter limit is exceeded (see below)
  • ImDrawFlags_JoinMiterClip provide smooth transition between Miter and Bevel, it is bit more expensive
  • ImDrawFlags_JoinBevel is require bit more math for Anti-Aliased polyline
  • ImDrawFlags_JoinRound is pretty yet expensive in relation to other join types
  • ImDrawFlags_JoinRound use adaptive rendering via PathArcTo

Miter Limit

Miter distance count from the control point to very tip of the triangle forming a join. Sharpness of the join can be limited by using 'Miter Limit'. Miter does collapse to Bevel, MiterClip does smooth transition to Bevel.

Miter limit match SVG2 stroke-miterlimit behavior.

image
(borrowed from w3.org)

AddPolyline gained new parameter miter_limit:

IMGUI_API void  AddPolyline(const ImVec2* points, int num_points, ImU32 col, ImDrawFlags flags, float thickness, float miter_limit = -1.0f);

Default Miter Limit is set 4.0 in ImDrawListSharedData (internal API) and not exposed to the user.

Fallback

Previous implementation of polyline is still available under different name:

IMGUI_API void  AddPolylineLegacy(const ImVec2* points, int num_points, ImU32 col, ImDrawFlags flags, float thickness);

New implementation does change behavior. ImDrawListFlags_LegacyPolyline flag is a gateway to opt-in to old rendering.

New fancy tools are new and fancy, but also not mature. Setting ImDrawListFlags_LegacyPolyline on ImDrawList will cause PathStroke() to route all rendering to old AddPolyline(). This in turn for all practicall purposes will route rendering of all primitives to old code path.

Reasons to set ImDrawListFlags_LegacyPolyline flag:

  • When you see noticable dips in performance. New implementation does not use texture based rendering, which make AddPolylineLegacy faster and generate less geometry.
  • When new behavior cause issues in rendering. AddPolylineLegacy does not support caps and can generate wrong geometry on acute angles.

Related issues

#2183 Bug in drawing thick antialiased polylines

Status: Fixed ✅

Old New
image image

#3366 Polyline with sharp angles causes segments to nearly disappear

Status: Fixed ✅

Old New (default) New (large Miter Limit)
image image image

#4091 Optimize IM_NORMALIZE2F_OVER_ZERO

This PR does add IM_NORMALIZE2F_OVER_ZERO_PRECISE() that is built on top of SSE2 _mm_rsqrt intrinsics. SSE2 states that this function is an approximation of 1.0f / sqrt(x) and does have relatively large error. New macro does use new ImRsqrtPrecise() instead of ImRsqrt.

ImRsqrtPrecise() improve precission by converging on answer by performing single step Newton-Raphson method. Exackly like in famous rsqrt found in Quake source code. This single step is enough to keep geometry consistent and to not produce cosines larger than 1.

On debug (and non SSE2) builds ImRsqrtPrecise fallback to 1.0f / sqrt(x), because this is faster than to perform NR step in unoptimized code. On release SSE2 builds this does use method described above.

Performance

Early measurements compare old AddPolyline() with and without texture polylines against new implementation.
I was unable to quickly test #2964, because it crashed on some tests.

Observations:

Please draw your own conclusions, or beter do your own benchmark to see change in real world.

Debug Release
image image

Note:

  • Old AddPolyline() does sometimes produce broken meshes in some tests due to lack of ImDrawListFlags_AllowVtxOffset support

Implementation notes

General:

  • Implementation see rather heavey use of macros, all are names like IM_POLYLINE_xxx
  • They should improve readability and see code flow
  • IM_POLYLINE_TRIANGLE_xxx are overly verbose to experiment with different methods of filling index buffer
  • Will be simplified to drop unused arguments after we settle on one particular implementation
  • Uses as little local variables as possible
  • accessing them on Debug builds does generate extra mov instructions
  • computations are done as expressions with intent to end with single assignment to final location in the buffer
  • all _PolylineXXX are basically for loops with a bit of math at the beginning and 'switch' routing to selected join geometry generation code
  • core responsible for generating geometry is commented with 'ascii art' explaining where magic values in indices came from
  • code paths are annotated with IM_LIKELY and IM_UNLIKELY which are resolved to [[likely]] and [[unlikely]] when available
  • this does guide optimizer, but does not affect Debug builds

New implementation does split problem to three cases:

  • _PolylineThinAntiAliased - for thickness <= 1, technically less than AA fringe
  • skips generating solid core geometry
  • skips MiterClip logic all together (fringe is 1 pixel thick so jump from Miter to Bevel will not be noticable in most cases)
  • _PolylineThickAntiAliased - for thickness > 1
  • presence of AA fringe bump complexity of generated geometry, in most extreme cases can emit 17 vertices per join, it is rare
  • _PolylineAliased - used for any thickness, does not have AA fringe

AddPolyline():

  • act as dispatcher calling one of the methods above
  • is responsible of computing polyline normals / segment lengths used by all subroutines
  • With SSE2 computes 4 normals at once, then 2 normals at once and finally fall back to scalar implementation
  • is responsible for sanatization of input values, _PolylineXXX does not need to validate input

Rounted caps and joins:

  • _PolylineEmitArc and _PolylineEmitArcsWithFringe post-processing step responsible for generating adaptive arcs
  • ImAtan2 and Acos are use for every join to feed PathArcTo
  • ImAtan2 is used for caps PathArcTo
  • Implementation definitely can be improved in the future
  • All queued arcs are processed at the end of _PolylineXXX call
  • Arcs are queued in ImDrawList::_Path, they're appended at the end
  • Change format so place for encoded arcs can be pre-allocated to avoid memory corruption

@sergeyn
Copy link
Contributor

sergeyn commented Sep 11, 2024

Badass PR !! Quesiton - is it difficult to get rid of those 2 very small triangles in ImDrawFlags_CapRound joint type ?

@thedmd
Copy link
Contributor Author

thedmd commented Sep 11, 2024

Badass PR !! Quesiton - is it difficult to get rid of those 2 very small triangles in ImDrawFlags_CapRound joint type ?

Vertices in arc come from precomputed table PathArcTo use. That make their position fixed.
Those small triangles does fill the gap between polyline and first triangle.

Replacing PathArcTo with _PathArcToN will generate equally spaced triangles. It will also be alot slower, because it will call cos/sin for every vertex on the arc.

Does those two triangles cause any issues other tham esthetic of zoomed in mesh? : )

I didn't wrote that, that fringe occupy 1px on the screen in 1:1.

@thedmd
Copy link
Contributor Author

thedmd commented Sep 12, 2024

Arcs (rounded corners) was redone:
image

Changes:

  • all uses of ImAtan2() were removed
  • there is now single call to ImCos, ImSin and ImAcos per arch, previously there was two ImCos and ImSin calls
  • single _PolylineEmitArcs handle finge and solid rendering
  • _PolylineEmitArcs now support ImDrawListFlags_AllowVtxOffset too
  • vertices are evently distributted over arc
  • AddPolyline() allocate dedicated space for arcs (next to normals and segment lengths), so _Path is no longer used, which does eliminate potential crash due to reallocation
  • _PolylineEmitArcs generate points by rotating them, PathArcTo is no longer used

Overall this method should be faster, since it does have reduced complexity. Quick measurements show around ~15% performance gain in Release.

This was was how the cookie crumbles.

image

@sergeyn FYI

@thedmd
Copy link
Contributor Author

thedmd commented Sep 12, 2024

Long Jagged (Stroke) - one of test is imgui_test_engine.

Almost 20000 control points in polyline, with JoinRound and CapJoin take under 1 ms (2.3 ms in Debug).
~160k vertices (three separate draw commands), 360k indices (120k triangles).

With default join (miter) and cap (butt) 0.16 ms in Release and 0.5 ms in Debug. Visually there are no noticable changes for such thin lines.

image

@thedmd
Copy link
Contributor Author

thedmd commented Sep 12, 2024

New implementation does fix rendering of aliased polylines.

Old New
image image

Also emitting correct geometry takes precedence over generating overlapping geometry:

Old New
image image

@sergeyn
Copy link
Contributor

sergeyn commented Sep 12, 2024 via email

@thedmd
Copy link
Contributor Author

thedmd commented Sep 13, 2024

Update:

  • allocation of TempBuffer is rounded up to make enough space for SSE2 write operations
  • IM_POLYLINE_xxx macros are simplified
  • IM_POLYLINE_xxx now allow to compare using local copy of write pointers (ex. _VtxWritePtr) with direct access to ImDrawList

Observations:

  • using local copy of write pointers gained 15% performance in Release and 7% in Debug configuration
  • overall new implementation is 9% slower (both in Debug and Release)
  • majority of time difference is in AddRect(), which is potential

@thedmd
Copy link
Contributor Author

thedmd commented Sep 29, 2024

I did push dedicated implementation of AddRect() to the PR as requested. This is second iteration. Feedback to how it can be simplified/cleaned up is welcome.

Quick summary:

  • AddRect() does support Anti-Aliased (with and without texture AA) and Aliased drawing
  • Texture based AA is used when possible (and results will not differ from geometry based method)
  • Not-rounded rectangle does have own dedicated code path
  • Rounded rectangle has own function and generalized implementation

Overall new implementation is ~2x faster for 'square' rectangles in both Debug and Release modes.
Rounded rectangles are bit slower in Debug mode, but faster in Release (please see benchmark results below).

New implementation took care of few issues:

  • overdraw with thick rectangles
  • 'jagged' aliased drawing
Imgui.Polyline.Add.Rect.mp4

'Upstream (legacy)' is current state of the master branch, New V2 is new implementation present in this PR.

@ocornut FYI

Benchmark (Stress: x5, Build: Release, Samples: 8)

image

Test Name Branch Avg ms Min ms Max ms VS Baseline
perf_draw_prim_rect_rounded_stroke master 0.73 0.67 0.76 baseline
perf_draw_prim_rect_rounded_stroke_thick master 0.73 0.69 0.77 baseline
perf_draw_prim_rect_stroke master 0.22 0.16 0.25 baseline
perf_draw_prim_rect_stroke_thick master 0.22 0.19 0.26 baseline
perf_draw_prim_rect_rounded_stroke thedmd/polyline_new 0.76 0.72 0.78 +4.38% (slower)
perf_draw_prim_rect_rounded_stroke_thick thedmd/polyline_new 0.98 0.91 1.02 +34.09% (slower)
perf_draw_prim_rect_stroke thedmd/polyline_new 0.11 0.07 0.12 -52.27% (faster)
perf_draw_prim_rect_stroke_thick thedmd/polyline_new 0.13 0.11 0.16 -41.20% (faster)

Benchmark (Stress: x5, Build: Debug, Samples: 8)

image

Test Name Branch Avg ms Min ms Max ms VS Baseline
perf_draw_prim_rect_rounded_stroke master 3.68 3.57 3.79 baseline
perf_draw_prim_rect_rounded_stroke_thick master 3.72 3.60 3.81 baseline
perf_draw_prim_rect_stroke master 1.20 1.11 1.29 baseline
perf_draw_prim_rect_stroke_thick master 1.19 1.15 1.29 baseline
perf_draw_prim_rect_rounded_stroke thedmd/polyline_new 2.40 2.31 2.48 -34.76% (faster)
perf_draw_prim_rect_rounded_stroke_thick thedmd/polyline_new 3.15 3.06 3.28 -15.10% (faster)
perf_draw_prim_rect_stroke thedmd/polyline_new 0.41 0.26 0.45 -66.13% (faster)
perf_draw_prim_rect_stroke_thick thedmd/polyline_new 0.44 0.40 0.49 -63.44% (faster)

@ocornut ocornut force-pushed the thedmd/polyline_new branch 2 times, most recently from 0e6bb15 to 14f9075 Compare September 30, 2024 17:26
@soufianekhiat
Copy link

Out of curiosity, what is the direction of the fringe for AA. Based on the video and image. It's bidirectionnal.
It make sense for subpixel line. But less (to me) for thicker line, intuitively the fringe should be for thickline only outside.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants