linux

1381147 commits 2 branches 894 tags 3.6 GiB

Author	SHA1	Message	Date
Linus Torvalds	231825b2e1	Revert "unicode: Don't special case ignorable code points" This reverts commit `5c26d2f1d3`. It turns out that we can't do this, because while the old behavior of ignoring ignorable code points was most definitely wrong, we have case-folding filesystems with on-disk hash values with that wrong behavior. So now you can't look up those names, because they hash to something different. Of course, it's also entirely possible that in the meantime people have created new files with the new ("more correct") case folding logic, and reverting will just make other things break. The correct solution is to not do case folding in filesystems, but sadly, people seem to never really understand that. People still see it as a feature, not a bug. Reported-by: Qi Han <hanqi@vivo.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219586 Cc: Gabriel Krisman Bertazi <krisman@suse.de> Requested-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-12-11 14:11:23 -08:00
Linus Torvalds	060fc106b6	unicode updates This update includes: - A patch by Thomas Weißschuh constifying a read-only struct. - A patch by André Almeida fixing the error path of unicode_load, which might trigger a kernel oops if it fails to find the unicode module. - One documentation fix by Gan Jie, updating a filename in the README. - A patch by André Almeida adding the link of my tree to MAINTAINERS. All but the MAINTAINERS patch have been sitting on my tree and in linux-next since early in the 6.12 cycle. Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQS3XO7QfvpFoONBhH1OwQgI3t8RJgUCZ0D4JwAKCRBOwQgI3t8R JmjZAP988O9eB4ITF6KHKsHyY3pOhxSRXU5jpr78v7ofDDuGwAD/UBJZyF35wgJz S2q295kCAEP8bUKxj6RJtyMyQnamQg8= =irw2 -----END PGP SIGNATURE----- Merge tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode Pull unicode updates from Gabriel Krisman Bertazi: - constify a read-only struct (Thomas Weißschuh) - fix the error path of unicode_load, avoiding a possible kernel oops if it fails to find the unicode module (André Almeida) - documentation fix, updating a filename in the README (Gan Jie) - add the link of my tree to MAINTAINERS (André Almeida) * tag 'unicode-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: MAINTAINERS: Add Unicode tree unicode: change the reference of database file unicode: Fix utf8_load() error path unicode: constify utf8 data table	2024-11-22 20:50:55 -08:00
Gabriel Krisman Bertazi	5c26d2f1d3	unicode: Don't special case ignorable code points We don't need to handle them separately. Instead, just let them decompose/casefold to themselves. Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>	2024-10-09 13:34:01 -04:00
Gan Jie	66715f005b	unicode: change the reference of database file Commit `2b3d047870` ("unicode: Add utf8-data module") changed the database file from 'utf8data.h' to 'utf8data.c' to build separate module, but it seems forgot to update README.utf8data , which may causes confusion. Update the README.utf8data and the default 'UTF8_NAME' in 'mkutf8data.c'. Signed-off-by: Gan Jie <ganjie182@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240912031932.1161-1-ganjie182@gmail.com Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>	2024-09-13 11:23:01 -04:00
Thomas Weißschuh	43bf9d9755	unicode: constify utf8 data table All users already handle the table as const data. Move the table itself into .rodata to guard against accidental or malicious modifications. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240809-unicode-const-v1-1-69968a258092@weissschuh.net Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>	2024-08-13 15:21:50 -04:00
Jeff Johnson	68318904a7	unicode: add MODULE_DESCRIPTION() macros Currently 'make W=1' reports: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/unicode/utf8data.o WARNING: modpost: missing MODULE_DESCRIPTION() in fs/unicode/utf8-selftest.o Add a MODULE_DESCRIPTION() to utf8-selftest.c and utf8data.c_shipped, and update mkutf8data.c to add a MODULE_DESCRIPTION() to any future generated utf8data file. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240524-md-unicode-v1-1-e2727ce8574d@quicinc.com Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>	2024-06-20 19:30:02 -04:00
Christoph Hellwig	2b3d047870	unicode: Add utf8-data module utf8data.h contains a large database table which is an auto-generated decodification trie for the unicode normalization functions. Allow building it into a separate module. Based on a patch from Shreeya Patel <shreeya.patel@collabora.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>	2021-10-12 11:41:39 -03:00
Masahiro Yamada	28ba53c076	unicode: refactor the rule for regenerating utf8data.h scripts/mkutf8data is used only when regenerating utf8data.h, which never happens in the normal kernel build. However, it is irrespectively built if CONFIG_UNICODE is enabled. Moreover, there is no good reason for it to reside in the scripts/ directory since it is only used in fs/unicode/. Hence, move it from scripts/ to fs/unicode/. In some cases, we bypass build artifacts in the normal build. The conventional way to do so is to surround the code with ifdef REGENERATE_. For example, - `7373f4f83c` ("kbuild: add implicit rules for parser generation") - `6aaf49b495` ("crypto: arm,arm64 - Fix random regeneration of S_shipped") I rewrote the rule in a more kbuild'ish style. In the normal build, utf8data.h is just shipped from the check-in file. $ make [ snip ] SHIPPED fs/unicode/utf8data.h CC fs/unicode/utf8-norm.o CC fs/unicode/utf8-core.o CC fs/unicode/utf8-selftest.o AR fs/unicode/built-in.a If you want to generate utf8data.h based on UCD, put .txt files into fs/unicode/, then pass REGENERATE_UTF8DATA=1 from the command line. The mkutf8data tool will be automatically compiled to generate the utf8data.h from the *.txt files. $ make REGENERATE_UTF8DATA=1 [ snip ] HOSTCC fs/unicode/mkutf8data GEN fs/unicode/utf8data.h CC fs/unicode/utf8-norm.o CC fs/unicode/utf8-core.o CC fs/unicode/utf8-selftest.o AR fs/unicode/built-in.a I renamed the check-in utf8data.h to utf8data.h_shipped so that this will work for the out-of-tree build. You can update it based on the latest UCD like this: $ make REGENERATE_UTF8DATA=1 fs/unicode/ $ cp fs/unicode/utf8data.h fs/unicode/utf8data.h_shipped Also, I added entries to .gitignore and dontdiff. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2019-04-28 13:45:36 -04:00

Renamed from scripts/mkutf8data.c (Browse further)

8 commits