Commit graph

6 commits

Author SHA1 Message Date
Linus Torvalds
231825b2e1 Revert "unicode: Don't special case ignorable code points"
This reverts commit 5c26d2f1d3.

It turns out that we can't do this, because while the old behavior of
ignoring ignorable code points was most definitely wrong, we have
case-folding filesystems with on-disk hash values with that wrong
behavior.

So now you can't look up those names, because they hash to something
different.

Of course, it's also entirely possible that in the meantime people have
created *new* files with the new ("more correct") case folding logic,
and reverting will just make other things break.

The correct solution is to not do case folding in filesystems, but
sadly, people seem to never really understand that.  People still see it
as a feature, not a bug.

Reported-by: Qi Han <hanqi@vivo.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586
Cc: Gabriel Krisman Bertazi <krisman@suse.de>
Requested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-11 14:11:23 -08:00
Linus Torvalds
060fc106b6 unicode updates
This update includes:
 
   - A patch by Thomas Weißschuh constifying a read-only struct.
 
   - A patch by André Almeida fixing the error path of unicode_load,
   which might trigger a kernel oops if it fails to find the unicode
   module.
 
   - One documentation fix by Gan Jie, updating a filename in the README.
 
   - A patch by André Almeida adding the link of my tree to MAINTAINERS.
 
 All but the MAINTAINERS patch have been sitting on my tree and in
 linux-next since early in the 6.12 cycle.
 
 Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS3XO7QfvpFoONBhH1OwQgI3t8RJgUCZ0D4JwAKCRBOwQgI3t8R
 JmjZAP988O9eB4ITF6KHKsHyY3pOhxSRXU5jpr78v7ofDDuGwAD/UBJZyF35wgJz
 S2q295kCAEP8bUKxj6RJtyMyQnamQg8=
 =irw2
 -----END PGP SIGNATURE-----

Merge tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode

Pull unicode updates from Gabriel Krisman Bertazi:

 - constify a read-only struct (Thomas Weißschuh)

 - fix the error path of unicode_load, avoiding a possible kernel oops
   if it fails to find the unicode module (André Almeida)

 - documentation fix, updating a filename in the README (Gan Jie)

 - add the link of my tree to MAINTAINERS (André Almeida)

* tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  MAINTAINERS: Add Unicode tree
  unicode: change the reference of database file
  unicode: Fix utf8_load() error path
  unicode: constify utf8 data table
2024-11-22 20:50:55 -08:00
Gabriel Krisman Bertazi
5c26d2f1d3 unicode: Don't special case ignorable code points
We don't need to handle them separately. Instead, just let them
decompose/casefold to themselves.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-10-09 13:34:01 -04:00
Thomas Weißschuh
43bf9d9755 unicode: constify utf8 data table
All users already handle the table as const data.
Move the table itself into .rodata to guard against accidental or
malicious modifications.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240809-unicode-const-v1-1-69968a258092@weissschuh.net
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-08-13 15:21:50 -04:00
Jeff Johnson
68318904a7 unicode: add MODULE_DESCRIPTION() macros
Currently 'make W=1' reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/unicode/utf8data.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/unicode/utf8-selftest.o

Add a MODULE_DESCRIPTION() to utf8-selftest.c and utf8data.c_shipped,
and update mkutf8data.c to add a MODULE_DESCRIPTION() to any future
generated utf8data file.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240524-md-unicode-v1-1-e2727ce8574d@quicinc.com
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-06-20 19:30:02 -04:00
Christoph Hellwig
2b3d047870 unicode: Add utf8-data module
utf8data.h contains a large database table which is an auto-generated
decodification trie for the unicode normalization functions.

Allow building it into a separate module.

Based on a patch from Shreeya Patel <shreeya.patel@collabora.com>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
2021-10-12 11:41:39 -03:00
Renamed from fs/unicode/utf8data.h_shipped (Browse further)