Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arabic crown letters #833

Draft
wants to merge 37 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 34 commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
34ae5a4
UnicodeData.txt lines from the proposal
eggrobin May 23, 2024
7368399
No U+
eggrobin May 23, 2024
f6d3c49
Too many semicolons
eggrobin May 23, 2024
236d30e
lb=AL for the letters, lb=CM for the crown
eggrobin May 23, 2024
7341dfb
Arabic
eggrobin May 23, 2024
aeb473d
ArabicShaping.txt from the proposal
eggrobin May 23, 2024
b8d87c7
New Joining_Groups
eggrobin May 23, 2024
bfeaa08
Regenerate UCD
eggrobin May 23, 2024
9e03935
GenerateEnums
eggrobin May 23, 2024
d66387a
Updated ArabicShaping.txt
eggrobin May 23, 2024
b88f1ae
More Joining_Groups
eggrobin May 23, 2024
a22c9d5
Regenerate UCD
eggrobin May 23, 2024
262a782
GenerateEnums
eggrobin May 23, 2024
e80750d
Move the security invariants to their own CI check
eggrobin May 24, 2024
98a04f9
run
eggrobin May 24, 2024
a9f1fd9
cd
eggrobin May 24, 2024
de244d2
Now add the file
eggrobin May 24, 2024
dbd9631
Merge branch 'separate-security-invariants' into arabic-crown-letters
eggrobin May 24, 2024
c5790b1
Bring back accidentally removed ArabicShaping.txt lines
eggrobin May 24, 2024
f175b95
MCM
eggrobin May 24, 2024
6eeed98
Regenerate UCD
eggrobin May 24, 2024
f28d02c
EMIT_GITHUB_ERRORS
eggrobin May 24, 2024
799e0fb
Merge branch 'separate-security-invariants' into arabic-crown-letters
eggrobin May 24, 2024
bd8a202
Do not use ICU property values
eggrobin May 24, 2024
f02b9b9
emit errors for the right file
eggrobin May 24, 2024
1c4752a
Merge branch 'separate-security-invariants' into arabic-crown-letters
eggrobin May 24, 2024
536954f
Merge branch 'icu-version-mismatch' into arabic-crown-letters
eggrobin May 24, 2024
cb7ce67
hoist
eggrobin May 24, 2024
91ef3cb
Merge remote-tracking branch 'la-vache/main' into icu-version-mismatch
eggrobin May 24, 2024
af812b9
Merge branch 'icu-version-mismatch' into arabic-crown-letters
eggrobin May 24, 2024
f67290f
Merge remote-tracking branch 'la-vache/main' into arabic-crown-letters
eggrobin May 25, 2024
ddef6d4
Merge remote-tracking branch 'la-vache/main' into separate-security-i…
eggrobin May 25, 2024
c35bc5b
Merge branch 'separate-security-invariants' into arabic-crown-letters
eggrobin May 25, 2024
c94962a
deduplicate
eggrobin May 25, 2024
4d60844
Merge remote-tracking branch 'la-vache/main' into arabic-crown-letters
eggrobin Jun 6, 2024
835cdda
Merge remote-tracking branch 'la-vache/main' into arabic-crown-letters
eggrobin Jun 7, 2024
3e5b4f0
Merge remote-tracking branch 'la-vache/main' into arabic-crown-letters
eggrobin Oct 16, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 22 additions & 0 deletions unicodetools/data/ucd/dev/ArabicShaping.txt
Original file line number Diff line number Diff line change
Expand Up @@ -850,6 +850,28 @@ A873; PHAGS-PA CANDRABINDU; U; No_Joining_Group
10EC2; DAL WITH VERTICAL 2 DOTS BELOW; R; DAL
eggrobin marked this conversation as resolved.
Show resolved Hide resolved
10EC3; TAH WITH VERTICAL 2 DOTS BELOW; D; TAH
10EC4; KAF WITH VERTICAL 2 DOTS BELOW; D; KAF
10ED9; CROWN BEH; L; CROWN BEH
10EDA; DOTLESS CROWN BEH WITH 3 DOTS BELOW; L; CROWN BEH
10EDB; DOTLESS CROWN BEH WITH 2 DOTS ABOVE; L; CROWN BEH
10EDC; DOTLESS CROWN BEH WITH 3 DOTS ABOVE; L; CROWN BEH
10EDD; CROWN HAH WITH DOT BELOW; L; CROWN HAH
10EDE; CROWN HAH; L; CROWN HAH
10EDF; CROWN HAH WITH DOT ABOVE; L; CROWN HAH
10EE0; CROWN SEEN; L; CROWN SEEN
10EE1; CROWN SEEN WITH 3 DOTS ABOVE; L; CROWN SEEN
10EE2; CROWN SAD; L; CROWN SAD
10EE3; CROWN SAD WITH DOT ABOVE; L; CROWN SAD
10EE4; CROWN TAH; L; CROWN TAH
10EE5; CROWN TAH WITH DOT ABOVE; L; CROWN TAH
10EE6; CROWN AIN; L; CROWN AIN
10EE7; CROWN AIN WITH DOT ABOVE; L; CROWN AIN
10EE8; CROWN FEH; L; CROWN FEH
10EE9; DOTLESS CROWN FEH WITH TWO DOTS ABOVE; L; CROWN FEH
10EEA; CROWN KAF; L; CROWN KAF
10EEB; CROWN MEEM; L; CROWN MEEM
10EEC; DOTLESS CROWN BEH WITH DOT ABOVE; L; CROWN BEH
10EED; CROWN HEH; L; CROWN HEH
10EEE; DOTLESS CROWN BEH WITH 2 DOTS BELOW; L; CROWN BEH

# Sogdian Characters

Expand Down
6 changes: 4 additions & 2 deletions unicodetools/data/ucd/dev/DerivedAge.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# DerivedAge-16.0.0.txt

Check warning on line 1 in unicodetools/data/ucd/dev/DerivedAge.txt

View workflow job for this annotation

GitHub Actions / Draft unless approved

Not in the 16.0 pipeline

These characters are neither accepted for Unicode 16.0, nor for any specific version of Unicode, nor are they provisionally assigned. The Age property values for new characters are likely incorrect right now. They will be recomputed after the UTC accepts their encoding and this pull request is updated for the target version.

Check warning on line 1 in unicodetools/data/ucd/dev/DerivedAge.txt

View workflow job for this annotation

GitHub Actions / Draft unless approved

Not in the 16.0 pipeline

These characters are neither accepted for Unicode 16.0, nor for any specific version of Unicode, nor are they provisionally assigned. The Age property values for new characters are likely incorrect right now. They will be recomputed after the UTC accepts their encoding and this pull request is updated for the target version.

Check warning on line 1 in unicodetools/data/ucd/dev/DerivedAge.txt

View workflow job for this annotation

GitHub Actions / Draft unless approved

Not in the 16.0 pipeline

These characters are neither accepted for Unicode 16.0, nor for any specific version of Unicode, nor are they provisionally assigned. The Age property values for new characters are likely incorrect right now. They will be recomputed after the UTC accepts their encoding and this pull request is updated for the target version.
# Date: 2024-04-30, 21:48:12 GMT
# Date: 2024-05-23, 16:13:47 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -2022,6 +2022,8 @@
10D69..10D85 ; 16.0 # [29] GARAY VOWEL SIGN E..GARAY SMALL LETTER OLD NA
10D8E..10D8F ; 16.0 # [2] GARAY PLUS SIGN..GARAY MINUS SIGN
10EC2..10EC4 ; 16.0 # [3] ARABIC LETTER DAL WITH TWO DOTS VERTICALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS VERTICALLY BELOW
10ED9..10EEE ; 16.0 # [22] ARABIC CROWN LETTER BEH..ARABIC CROWN LETTER YEH
10EF9 ; 16.0 # ARABIC CROWN
10EFC ; 16.0 # ARABIC COMBINING ALEF OVERLAY
11380..11389 ; 16.0 # [10] TULU-TIGALARI LETTER A..TULU-TIGALARI LETTER VOCALIC LL
1138B ; 16.0 # TULU-TIGALARI LETTER EE
Expand Down Expand Up @@ -2057,6 +2059,6 @@
1FAE9 ; 16.0 # FACE WITH BAGS UNDER EYES
1FBCB..1FBEF ; 16.0 # [37] WHITE CROSS MARK..TOP LEFT JUSTIFIED LOWER RIGHT QUARTER BLACK CIRCLE

# Total code points: 5185
# Total code points: 5208

# EOF
31 changes: 21 additions & 10 deletions unicodetools/data/ucd/dev/DerivedCoreProperties.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# DerivedCoreProperties-16.0.0.txt
# Date: 2024-05-02, 15:02:37 GMT
# Date: 2024-05-23, 16:14:19 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -1053,6 +1053,7 @@ FFDA..FFDC ; Alphabetic # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANG
10EAB..10EAC ; Alphabetic # Mn [2] YEZIDI COMBINING HAMZA MARK..YEZIDI COMBINING MADDA MARK
10EB0..10EB1 ; Alphabetic # Lo [2] YEZIDI LETTER LAM WITH DOT ABOVE..YEZIDI LETTER YOT WITH CIRCUMFLEX ABOVE
10EC2..10EC4 ; Alphabetic # Lo [3] ARABIC LETTER DAL WITH TWO DOTS VERTICALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS VERTICALLY BELOW
10ED9..10EEE ; Alphabetic # Lo [22] ARABIC CROWN LETTER BEH..ARABIC CROWN LETTER YEH
10EFC ; Alphabetic # Mn ARABIC COMBINING ALEF OVERLAY
10F00..10F1C ; Alphabetic # Lo [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL
10F27 ; Alphabetic # Lo OLD SOGDIAN LIGATURE AYIN-DALETH
Expand Down Expand Up @@ -1441,7 +1442,7 @@ FFDA..FFDC ; Alphabetic # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANG
30000..3134A ; Alphabetic # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
31350..323AF ; Alphabetic # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF

# Total code points: 142759
# Total code points: 142781

# ================================================

Expand Down Expand Up @@ -3350,6 +3351,7 @@ FFF9..FFFB ; Case_Ignorable # Cf [3] INTERLINEAR ANNOTATION ANCHOR..INTERLI
10D69..10D6D ; Case_Ignorable # Mn [5] GARAY VOWEL SIGN E..GARAY CONSONANT NASALIZATION MARK
10D6F ; Case_Ignorable # Lm GARAY REDUPLICATION MARK
10EAB..10EAC ; Case_Ignorable # Mn [2] YEZIDI COMBINING HAMZA MARK..YEZIDI COMBINING MADDA MARK
10EF9 ; Case_Ignorable # Mn ARABIC CROWN
10EFC..10EFF ; Case_Ignorable # Mn [4] ARABIC COMBINING ALEF OVERLAY..ARABIC SMALL LOW WORD MADDA
10F46..10F50 ; Case_Ignorable # Mn [11] SOGDIAN COMBINING DOT BELOW..SOGDIAN COMBINING STROKE BELOW
10F82..10F85 ; Case_Ignorable # Mn [4] OLD UYGHUR COMBINING DOT ABOVE..OLD UYGHUR COMBINING TWO DOTS BELOW
Expand Down Expand Up @@ -3505,7 +3507,7 @@ E0001 ; Case_Ignorable # Cf LANGUAGE TAG
E0020..E007F ; Case_Ignorable # Cf [96] TAG SPACE..CANCEL TAG
E0100..E01EF ; Case_Ignorable # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256

# Total code points: 2749
# Total code points: 2750

# ================================================

Expand Down Expand Up @@ -6729,6 +6731,7 @@ FFDA..FFDC ; ID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL
10E80..10EA9 ; ID_Start # Lo [42] YEZIDI LETTER ELIF..YEZIDI LETTER ET
10EB0..10EB1 ; ID_Start # Lo [2] YEZIDI LETTER LAM WITH DOT ABOVE..YEZIDI LETTER YOT WITH CIRCUMFLEX ABOVE
10EC2..10EC4 ; ID_Start # Lo [3] ARABIC LETTER DAL WITH TWO DOTS VERTICALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS VERTICALLY BELOW
10ED9..10EEE ; ID_Start # Lo [22] ARABIC CROWN LETTER BEH..ARABIC CROWN LETTER YEH
10F00..10F1C ; ID_Start # Lo [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL
10F27 ; ID_Start # Lo OLD SOGDIAN LIGATURE AYIN-DALETH
10F30..10F45 ; ID_Start # Lo [22] SOGDIAN LETTER ALEPH..SOGDIAN INDEPENDENT SHIN
Expand Down Expand Up @@ -6962,7 +6965,7 @@ FFDA..FFDC ; ID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGUL
30000..3134A ; ID_Start # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
31350..323AF ; ID_Start # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF

# Total code points: 141269
# Total code points: 141291

# ================================================

Expand Down Expand Up @@ -7895,6 +7898,8 @@ FFDA..FFDC ; ID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HAN
10EAB..10EAC ; ID_Continue # Mn [2] YEZIDI COMBINING HAMZA MARK..YEZIDI COMBINING MADDA MARK
10EB0..10EB1 ; ID_Continue # Lo [2] YEZIDI LETTER LAM WITH DOT ABOVE..YEZIDI LETTER YOT WITH CIRCUMFLEX ABOVE
10EC2..10EC4 ; ID_Continue # Lo [3] ARABIC LETTER DAL WITH TWO DOTS VERTICALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS VERTICALLY BELOW
10ED9..10EEE ; ID_Continue # Lo [22] ARABIC CROWN LETTER BEH..ARABIC CROWN LETTER YEH
10EF9 ; ID_Continue # Mn ARABIC CROWN
10EFC..10EFF ; ID_Continue # Mn [4] ARABIC COMBINING ALEF OVERLAY..ARABIC SMALL LOW WORD MADDA
10F00..10F1C ; ID_Continue # Lo [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL
10F27 ; ID_Continue # Lo OLD SOGDIAN LIGATURE AYIN-DALETH
Expand Down Expand Up @@ -8370,7 +8375,7 @@ FFDA..FFDC ; ID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HAN
31350..323AF ; ID_Continue # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
E0100..E01EF ; ID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256

# Total code points: 144541
# Total code points: 144564

# ================================================

Expand Down Expand Up @@ -8915,6 +8920,7 @@ FFDA..FFDC ; XID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGU
10E80..10EA9 ; XID_Start # Lo [42] YEZIDI LETTER ELIF..YEZIDI LETTER ET
10EB0..10EB1 ; XID_Start # Lo [2] YEZIDI LETTER LAM WITH DOT ABOVE..YEZIDI LETTER YOT WITH CIRCUMFLEX ABOVE
10EC2..10EC4 ; XID_Start # Lo [3] ARABIC LETTER DAL WITH TWO DOTS VERTICALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS VERTICALLY BELOW
10ED9..10EEE ; XID_Start # Lo [22] ARABIC CROWN LETTER BEH..ARABIC CROWN LETTER YEH
10F00..10F1C ; XID_Start # Lo [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL
10F27 ; XID_Start # Lo OLD SOGDIAN LIGATURE AYIN-DALETH
10F30..10F45 ; XID_Start # Lo [22] SOGDIAN LETTER ALEPH..SOGDIAN INDEPENDENT SHIN
Expand Down Expand Up @@ -9148,7 +9154,7 @@ FFDA..FFDC ; XID_Start # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HANGU
30000..3134A ; XID_Start # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
31350..323AF ; XID_Start # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF

# Total code points: 141246
# Total code points: 141268

# ================================================

Expand Down Expand Up @@ -10082,6 +10088,8 @@ FFDA..FFDC ; XID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HA
10EAB..10EAC ; XID_Continue # Mn [2] YEZIDI COMBINING HAMZA MARK..YEZIDI COMBINING MADDA MARK
10EB0..10EB1 ; XID_Continue # Lo [2] YEZIDI LETTER LAM WITH DOT ABOVE..YEZIDI LETTER YOT WITH CIRCUMFLEX ABOVE
10EC2..10EC4 ; XID_Continue # Lo [3] ARABIC LETTER DAL WITH TWO DOTS VERTICALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS VERTICALLY BELOW
10ED9..10EEE ; XID_Continue # Lo [22] ARABIC CROWN LETTER BEH..ARABIC CROWN LETTER YEH
10EF9 ; XID_Continue # Mn ARABIC CROWN
10EFC..10EFF ; XID_Continue # Mn [4] ARABIC COMBINING ALEF OVERLAY..ARABIC SMALL LOW WORD MADDA
10F00..10F1C ; XID_Continue # Lo [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL
10F27 ; XID_Continue # Lo OLD SOGDIAN LIGATURE AYIN-DALETH
Expand Down Expand Up @@ -10557,7 +10565,7 @@ FFDA..FFDC ; XID_Continue # Lo [3] HALFWIDTH HANGUL LETTER EU..HALFWIDTH HA
31350..323AF ; XID_Continue # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF
E0100..E01EF ; XID_Continue # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256

# Total code points: 144522
# Total code points: 144545

# ================================================

Expand Down Expand Up @@ -10869,6 +10877,7 @@ FF9E..FF9F ; Grapheme_Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK.
10D24..10D27 ; Grapheme_Extend # Mn [4] HANIFI ROHINGYA SIGN HARBAHAY..HANIFI ROHINGYA SIGN TASSI
10D69..10D6D ; Grapheme_Extend # Mn [5] GARAY VOWEL SIGN E..GARAY CONSONANT NASALIZATION MARK
10EAB..10EAC ; Grapheme_Extend # Mn [2] YEZIDI COMBINING HAMZA MARK..YEZIDI COMBINING MADDA MARK
10EF9 ; Grapheme_Extend # Mn ARABIC CROWN
10EFC..10EFF ; Grapheme_Extend # Mn [4] ARABIC COMBINING ALEF OVERLAY..ARABIC SMALL LOW WORD MADDA
10F46..10F50 ; Grapheme_Extend # Mn [11] SOGDIAN COMBINING DOT BELOW..SOGDIAN COMBINING STROKE BELOW
10F82..10F85 ; Grapheme_Extend # Mn [4] OLD UYGHUR COMBINING DOT ABOVE..OLD UYGHUR COMBINING TWO DOTS BELOW
Expand Down Expand Up @@ -11024,7 +11033,7 @@ FF9E..FF9F ; Grapheme_Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK.
E0020..E007F ; Grapheme_Extend # Cf [96] TAG SPACE..CANCEL TAG
E0100..E01EF ; Grapheme_Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256

# Total code points: 2185
# Total code points: 2186

# ================================================

Expand Down Expand Up @@ -12329,6 +12338,7 @@ FFFC..FFFD ; Grapheme_Base # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEME
10EAD ; Grapheme_Base # Pd YEZIDI HYPHENATION MARK
10EB0..10EB1 ; Grapheme_Base # Lo [2] YEZIDI LETTER LAM WITH DOT ABOVE..YEZIDI LETTER YOT WITH CIRCUMFLEX ABOVE
10EC2..10EC4 ; Grapheme_Base # Lo [3] ARABIC LETTER DAL WITH TWO DOTS VERTICALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS VERTICALLY BELOW
10ED9..10EEE ; Grapheme_Base # Lo [22] ARABIC CROWN LETTER BEH..ARABIC CROWN LETTER YEH
10F00..10F1C ; Grapheme_Base # Lo [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL
10F1D..10F26 ; Grapheme_Base # No [10] OLD SOGDIAN NUMBER ONE..OLD SOGDIAN FRACTION ONE HALF
10F27 ; Grapheme_Base # Lo OLD SOGDIAN LIGATURE AYIN-DALETH
Expand Down Expand Up @@ -12811,7 +12821,7 @@ FFFC..FFFD ; Grapheme_Base # So [2] OBJECT REPLACEMENT CHARACTER..REPLACEME
30000..3134A ; Grapheme_Base # Lo [4939] CJK UNIFIED IDEOGRAPH-30000..CJK UNIFIED IDEOGRAPH-3134A
31350..323AF ; Grapheme_Base # Lo [4192] CJK UNIFIED IDEOGRAPH-31350..CJK UNIFIED IDEOGRAPH-323AF

# Total code points: 152738
# Total code points: 152760

# ================================================

Expand Down Expand Up @@ -13195,6 +13205,7 @@ FF9E..FF9F ; InCB; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HA
10D24..10D27 ; InCB; Extend # Mn [4] HANIFI ROHINGYA SIGN HARBAHAY..HANIFI ROHINGYA SIGN TASSI
10D69..10D6D ; InCB; Extend # Mn [5] GARAY VOWEL SIGN E..GARAY CONSONANT NASALIZATION MARK
10EAB..10EAC ; InCB; Extend # Mn [2] YEZIDI COMBINING HAMZA MARK..YEZIDI COMBINING MADDA MARK
10EF9 ; InCB; Extend # Mn ARABIC CROWN
10EFC..10EFF ; InCB; Extend # Mn [4] ARABIC COMBINING ALEF OVERLAY..ARABIC SMALL LOW WORD MADDA
10F46..10F50 ; InCB; Extend # Mn [11] SOGDIAN COMBINING DOT BELOW..SOGDIAN COMBINING STROKE BELOW
10F82..10F85 ; InCB; Extend # Mn [4] OLD UYGHUR COMBINING DOT ABOVE..OLD UYGHUR COMBINING TWO DOTS BELOW
Expand Down Expand Up @@ -13351,6 +13362,6 @@ FF9E..FF9F ; InCB; Extend # Lm [2] HALFWIDTH KATAKANA VOICED SOUND MARK..HA
E0020..E007F ; InCB; Extend # Cf [96] TAG SPACE..CANCEL TAG
E0100..E01EF ; InCB; Extend # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256

# Total code points: 2184
# Total code points: 2185

# EOF
4 changes: 3 additions & 1 deletion unicodetools/data/ucd/dev/EastAsianWidth.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# EastAsianWidth-16.0.0.txt
# Date: 2024-04-30, 21:48:20 GMT
# Date: 2024-05-23, 16:14:27 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -1964,6 +1964,8 @@ FFFD ; A # So REPLACEMENT CHARACTER
10EAD ; N # Pd YEZIDI HYPHENATION MARK
10EB0..10EB1 ; N # Lo [2] YEZIDI LETTER LAM WITH DOT ABOVE..YEZIDI LETTER YOT WITH CIRCUMFLEX ABOVE
10EC2..10EC4 ; N # Lo [3] ARABIC LETTER DAL WITH TWO DOTS VERTICALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS VERTICALLY BELOW
10ED9..10EEE ; N # Lo [22] ARABIC CROWN LETTER BEH..ARABIC CROWN LETTER YEH
10EF9 ; N # Mn ARABIC CROWN
10EFC..10EFF ; N # Mn [4] ARABIC COMBINING ALEF OVERLAY..ARABIC SMALL LOW WORD MADDA
10F00..10F1C ; N # Lo [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL
10F1D..10F26 ; N # No [10] OLD SOGDIAN NUMBER ONE..OLD SOGDIAN FRACTION ONE HALF
Expand Down
4 changes: 3 additions & 1 deletion unicodetools/data/ucd/dev/LineBreak.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# LineBreak-16.0.0.txt
# Date: 2024-05-11, 16:57:19 GMT
# Date: 2024-05-23, 15:30:57 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -2825,6 +2825,8 @@ FFFD ; AI # So REPLACEMENT CHARACTER
10EAD ; BA # Pd YEZIDI HYPHENATION MARK
10EB0..10EB1 ; AL # Lo [2] YEZIDI LETTER LAM WITH DOT ABOVE..YEZIDI LETTER YOT WITH CIRCUMFLEX ABOVE
10EC2..10EC4 ; AL # Lo [3] ARABIC LETTER DAL WITH TWO DOTS VERTICALLY BELOW..ARABIC LETTER KAF WITH TWO DOTS VERTICALLY BELOW
10ED9..10EEE ; AL # Lo [22] ARABIC CROWN LETTER BEH..ARABIC CROWN LETTER YEH
10EF9 ; CM # Mn ARABIC CROWN
10EFC..10EFF ; CM # Mn [4] ARABIC COMBINING ALEF OVERLAY..ARABIC SMALL LOW WORD MADDA
10F00..10F1C ; AL # Lo [29] OLD SOGDIAN LETTER ALEPH..OLD SOGDIAN LETTER FINAL TAW WITH VERTICAL TAIL
10F1D..10F26 ; AL # No [10] OLD SOGDIAN NUMBER ONE..OLD SOGDIAN FRACTION ONE HALF
Expand Down
4 changes: 3 additions & 1 deletion unicodetools/data/ucd/dev/NormalizationTest.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# NormalizationTest-16.0.0.txt
# Date: 2024-04-30, 21:48:23 GMT
# Date: 2024-05-23, 16:14:34 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -18646,6 +18646,8 @@ FFEE;FFEE;FFEE;25CB;25CB; # (○; ○; ○; ○; ○; ) HALFWIDTH WHITE CIRCLE
0061 10EAB 0315 0300 05AE 0062;0061 05AE 10EAB 0300 0315 0062;0061 05AE 10EAB 0300 0315 0062;0061 05AE 10EAB 0300 0315 0062;0061 05AE 10EAB 0300 0315 0062; # (a◌𐺫◌̕◌̀◌֮b; a◌֮◌𐺫◌̀◌̕b; a◌֮◌𐺫◌̀◌̕b; a◌֮◌𐺫◌̀◌̕b; a◌֮◌𐺫◌̀◌̕b; ) LATIN SMALL LETTER A, YEZIDI COMBINING HAMZA MARK, COMBINING COMMA ABOVE RIGHT, COMBINING GRAVE ACCENT, HEBREW ACCENT ZINOR, LATIN SMALL LETTER B
0061 0315 0300 05AE 10EAC 0062;00E0 05AE 10EAC 0315 0062;0061 05AE 0300 10EAC 0315 0062;00E0 05AE 10EAC 0315 0062;0061 05AE 0300 10EAC 0315 0062; # (a◌̕◌̀◌֮◌𐺬b; à◌֮◌𐺬◌̕b; a◌֮◌̀◌𐺬◌̕b; à◌֮◌𐺬◌̕b; a◌֮◌̀◌𐺬◌̕b; ) LATIN SMALL LETTER A, COMBINING COMMA ABOVE RIGHT, COMBINING GRAVE ACCENT, HEBREW ACCENT ZINOR, YEZIDI COMBINING MADDA MARK, LATIN SMALL LETTER B
0061 10EAC 0315 0300 05AE 0062;0061 05AE 10EAC 0300 0315 0062;0061 05AE 10EAC 0300 0315 0062;0061 05AE 10EAC 0300 0315 0062;0061 05AE 10EAC 0300 0315 0062; # (a◌𐺬◌̕◌̀◌֮b; a◌֮◌𐺬◌̀◌̕b; a◌֮◌𐺬◌̀◌̕b; a◌֮◌𐺬◌̀◌̕b; a◌֮◌𐺬◌̀◌̕b; ) LATIN SMALL LETTER A, YEZIDI COMBINING MADDA MARK, COMBINING COMMA ABOVE RIGHT, COMBINING GRAVE ACCENT, HEBREW ACCENT ZINOR, LATIN SMALL LETTER B
0061 0315 0300 05AE 10EF9 0062;00E0 05AE 10EF9 0315 0062;0061 05AE 0300 10EF9 0315 0062;00E0 05AE 10EF9 0315 0062;0061 05AE 0300 10EF9 0315 0062; # (a◌̕◌̀◌֮◌𐻹b; à◌֮◌𐻹◌̕b; a◌֮◌̀◌𐻹◌̕b; à◌֮◌𐻹◌̕b; a◌֮◌̀◌𐻹◌̕b; ) LATIN SMALL LETTER A, COMBINING COMMA ABOVE RIGHT, COMBINING GRAVE ACCENT, HEBREW ACCENT ZINOR, ARABIC CROWN, LATIN SMALL LETTER B
0061 10EF9 0315 0300 05AE 0062;0061 05AE 10EF9 0300 0315 0062;0061 05AE 10EF9 0300 0315 0062;0061 05AE 10EF9 0300 0315 0062;0061 05AE 10EF9 0300 0315 0062; # (a◌𐻹◌̕◌̀◌֮b; a◌֮◌𐻹◌̀◌̕b; a◌֮◌𐻹◌̀◌̕b; a◌֮◌𐻹◌̀◌̕b; a◌֮◌𐻹◌̀◌̕b; ) LATIN SMALL LETTER A, ARABIC CROWN, COMBINING COMMA ABOVE RIGHT, COMBINING GRAVE ACCENT, HEBREW ACCENT ZINOR, LATIN SMALL LETTER B
0061 059A 0316 1DFA 10EFD 0062;0061 1DFA 0316 10EFD 059A 0062;0061 1DFA 0316 10EFD 059A 0062;0061 1DFA 0316 10EFD 059A 0062;0061 1DFA 0316 10EFD 059A 0062; # (a◌֚◌̖◌᷺◌𐻽b; a◌᷺◌̖◌𐻽◌֚b; a◌᷺◌̖◌𐻽◌֚b; a◌᷺◌̖◌𐻽◌֚b; a◌᷺◌̖◌𐻽◌֚b; ) LATIN SMALL LETTER A, HEBREW ACCENT YETIV, COMBINING GRAVE ACCENT BELOW, COMBINING DOT BELOW LEFT, ARABIC SMALL LOW WORD SAKTA, LATIN SMALL LETTER B
0061 10EFD 059A 0316 1DFA 0062;0061 1DFA 10EFD 0316 059A 0062;0061 1DFA 10EFD 0316 059A 0062;0061 1DFA 10EFD 0316 059A 0062;0061 1DFA 10EFD 0316 059A 0062; # (a◌𐻽◌֚◌̖◌᷺b; a◌᷺◌𐻽◌̖◌֚b; a◌᷺◌𐻽◌̖◌֚b; a◌᷺◌𐻽◌̖◌֚b; a◌᷺◌𐻽◌̖◌֚b; ) LATIN SMALL LETTER A, ARABIC SMALL LOW WORD SAKTA, HEBREW ACCENT YETIV, COMBINING GRAVE ACCENT BELOW, COMBINING DOT BELOW LEFT, LATIN SMALL LETTER B
0061 059A 0316 1DFA 10EFE 0062;0061 1DFA 0316 10EFE 059A 0062;0061 1DFA 0316 10EFE 059A 0062;0061 1DFA 0316 10EFE 059A 0062;0061 1DFA 0316 10EFE 059A 0062; # (a◌֚◌̖◌᷺◌𐻾b; a◌᷺◌̖◌𐻾◌֚b; a◌᷺◌̖◌𐻾◌֚b; a◌᷺◌̖◌𐻾◌֚b; a◌᷺◌̖◌𐻾◌֚b; ) LATIN SMALL LETTER A, HEBREW ACCENT YETIV, COMBINING GRAVE ACCENT BELOW, COMBINING DOT BELOW LEFT, ARABIC SMALL LOW WORD QASR, LATIN SMALL LETTER B
Expand Down
Loading
Loading