From: Hong Minhee (洪 民憙)
Date: Sun, 19 Mar 2023 23:09:57 +0000 (+0900)
Subject: Let string splitters respect `East_Asian_Width` property (#3445)
X-Git-Url: https://git.madduck.net/etc/vim.git/commitdiff_plain/ef6e079901d53a42dfae4ab10b081ce7a73a47b5?ds=sidebyside;hp=ef6e079901d53a42dfae4ab10b081ce7a73a47b5

Let string splitters respect `East_Asian_Width` property (#3445)

This patch changes the preview style so that string splitters respect the Unicode East Asian Width[^1] property. If you are not familiar with CJK languages, the motivation may not be immediately clear. Let me elaborate with some examples.

Traditionally, East Asian characters (including punctuation) have taken up twice the space of European letters and stops when rendered in a monospace typeface. Compare the following characters:

```
abcdefg.
글、字。
```

The characters on the first line are half-width, and those on the second line are full-width. (Also note that the last character with a small circle, the East Asian period, is also full-width.) Therefore, if we want to prevent those full-width characters from exceeding the maximum columns per line, we need to count their *width* rather than the number of characters. Consider again the following characters:

```
글、字。
```

These are just 4 characters, but their total width is 8.

Suppose we want to maintain up to 4 columns per line with the following text:

```
abcdefg.
글、字。
```

How should it be split? We want it to look like:

```
abcd
efg.
글、
字。
```

However, Black currently turns it into this:

```
abcd
efg.
글、字。
```

This is because Black currently counts the number of characters in the line instead of measuring their width.

So, how can we measure the width? How can we tell if a character is full- or half-width? What if half-width and full-width characters are mixed in a line? That's why Unicode defined an attribute named `East_Asian_Width`. Unicode groups every single character according to its width in fixed-width typesetting.

This partially addresses #1197, but only for string splitters.
The other parts need to be fixed as well in future patches.

This was implemented by copying rich's own approach to handling wide characters: generate a table using wcwidth, check it into source control, and use it to drive helper functions in Black's logic. This gets us the best of both worlds: accuracy and performance (and lets us update as per our stability policy too!).

Co-authored-by: Jelle Zijlstra

---
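
For readers who want to see the width-counting idea from the message in code, here is a minimal sketch. It is not the code from this patch: the patch drives its helpers from a table generated with wcwidth, whereas this sketch swaps in the standard library's `unicodedata.east_asian_width` for brevity. The names `char_width`, `str_width`, and `split_by_width` are hypothetical and chosen only for illustration, and the sketch assumes every character is either one or two columns wide (zero-width and combining characters are ignored).

```python
import unicodedata
from typing import List


def char_width(ch: str) -> int:
    """Return 2 for Wide ("W") or Fullwidth ("F") characters, else 1.

    Simplifying assumption: zero-width and combining characters are
    treated as width 1 here.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def str_width(text: str) -> int:
    """Sum the display widths of all characters in text."""
    return sum(char_width(ch) for ch in text)


def split_by_width(text: str, max_width: int) -> List[str]:
    """Greedily cut text into chunks whose display width fits max_width."""
    chunks: List[str] = []
    current = ""
    current_width = 0
    for ch in text:
        w = char_width(ch)
        # Start a new chunk when adding this character would overflow.
        if current and current_width + w > max_width:
            chunks.append(current)
            current, current_width = "", 0
        current += ch
        current_width += w
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for line in ("abcdefg.", "글、字。"):
        print(len(line), str_width(line), split_by_width(line, 4))
```

With a limit of 4 columns, the second line is cut after two characters rather than four, which matches the desired layout shown in the message. Counting with `len()` instead would leave it as a single 4-character chunk that is 8 columns wide.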