linux/drivers/gpu/drm/vkms/vkms_formats.h

22 lines
593 B
C
Raw Permalink Normal View History

drm: vkms: Refactor the plane composer to accept new formats Currently the blend function only accepts XRGB_8888 and ARGB_8888 as a color input. This patch refactors all the functions related to the plane composition to overcome this limitation. The pixels blend is done using the new internal format. And new handlers are being added to convert a specific format to/from this internal format. So the blend operation depends on these handlers to convert to this common format. The blended result, if necessary, is converted to the writeback buffer format. This patch introduces three major differences to the blend function. 1 - All the planes are blended at once. 2 - The blend calculus is done as per line instead of per pixel. 3 - It is responsible to calculates the CRC and writing the writeback buffer(if necessary). These changes allow us to allocate way less memory in the intermediate buffer to compute these operations. Because now we don't need to have the entire intermediate image lines at once, just one line is enough. | Memory consumption (output dimensions) | |:--------------------------------------:| | Current | This patch | |:------------------:|:-----------------:| | Width * Heigth | 2 * Width | Beyond memory, we also have a minor performance benefit from all these changes. Results running the IGT[1] test `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times: | Frametime | |:------------------------------------------:| | Implementation | Current | This commit | |:---------------:|:---------:|:------------:| | frametime range | 9~22 ms | 5~17 ms | | Average | 11.4 ms | 7.8 ms | [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4 V2: Improves the performance drastically, by performing the operations per-line and not per-pixel(Pekka Paalanen). Minor improvements(Pekka Paalanen). V3: Changes the code to blend the planes all at once. This improves performance, memory consumption, and removes much of the weirdness of the V2(Pekka Paalanen and me). Minor improvements(Pekka Paalanen and me). V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant. V5: Minor checkpatch fixes and the removal of TO-DO item(Melissa Wen). Several security/robustness improvents(Pekka Paalanen). Removes check_planes_x_bounds function and allows partial partly off-screen(Pekka Paalanen). V6: Fix a mismatch of some variable sizes (Pekka Paalanen). Several minor improvements (Pekka Paalanen). Reviewed-by: Melissa Wen <mwen@igalia.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Igor Torrente <igormtorrente@gmail.com> Signed-off-by: Melissa Wen <melissa.srw@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220905190811.25024-7-igormtorrente@gmail.com
2022-09-05 16:08:08 -03:00
/* SPDX-License-Identifier: GPL-2.0+ */
#ifndef _VKMS_FORMATS_H_
#define _VKMS_FORMATS_H_
#include "vkms_drv.h"
drm/vkms: Re-introduce line-per-line composition algorithm Re-introduce a line-by-line composition algorithm for each pixel format. This allows more performance by not requiring an indirection per pixel read. This patch is focused on readability of the code. Line-by-line composition was introduced by [1] but rewritten back to pixel-by-pixel algorithm in [2]. At this time, nobody noticed the impact on performance, and it was merged. This patch is almost a revert of [2], but in addition efforts have been made to increase readability and maintainability of the rotation handling. The blend function is now divided in two parts: - Transformation of coordinates from the output referential to the source referential - Line conversion and blending Most of the complexity of the rotation management is avoided by using drm_rect_* helpers. The remaining complexity is around the clipping, to avoid reading/writing outside source/destination buffers. The pixel conversion is now done line-by-line, so the read_pixel_t was replaced with read_pixel_line_t callback. This way the indirection is only required once per line and per plane, instead of once per pixel and per plane. The read_line_t callbacks are very similar for most pixel format, but it is required to avoid performance impact. Some helpers for color conversion were introduced to avoid code repetition: - *_to_argb_u16: perform colors conversion. They should be inlined by the compiler, and they are used to avoid repetition between multiple variants of the same format (argb/xrgb and maybe in the future for formats like bgr formats). This new algorithm was tested with: - kms_plane (for color conversions) - kms_rotation_crc (for rotations of planes) - kms_cursor_crc (for translations of planes) - kms_rotation (for all rotations and formats combinations) [3] The performance gain was mesured with kms_fb_stress [4] with some modification to fix the writeback format. The performance improvement is around 5 to 10%. [1]: commit 8ba1648567e2 ("drm: vkms: Refactor the plane composer to accept new formats") https://lore.kernel.org/all/20220905190811.25024-7-igormtorrente@gmail.com/ [2]: commit 322d716a3e8a ("drm/vkms: isolate pixel conversion functionality") https://lore.kernel.org/all/20230418130525.128733-2-mcanal@igalia.com/ [3]: https://lore.kernel.org/igt-dev/20240313-new_rotation-v2-0-6230fd5cae59@bootlin.com/ [4]: https://lore.kernel.org/all/20240422-kms_fb_stress-dev-v5-0-0c577163dc88@riseup.net/ Reviewed-by: José Expósito <jose.exposito89@gmail.com> Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241118-yuv-v14-8-2dbc2f1e222c@bootlin.com Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
2024-11-18 19:28:23 +01:00
pixel_read_line_t get_pixel_read_line_function(u32 format);
drm: vkms: Refactor the plane composer to accept new formats Currently the blend function only accepts XRGB_8888 and ARGB_8888 as a color input. This patch refactors all the functions related to the plane composition to overcome this limitation. The pixels blend is done using the new internal format. And new handlers are being added to convert a specific format to/from this internal format. So the blend operation depends on these handlers to convert to this common format. The blended result, if necessary, is converted to the writeback buffer format. This patch introduces three major differences to the blend function. 1 - All the planes are blended at once. 2 - The blend calculus is done as per line instead of per pixel. 3 - It is responsible to calculates the CRC and writing the writeback buffer(if necessary). These changes allow us to allocate way less memory in the intermediate buffer to compute these operations. Because now we don't need to have the entire intermediate image lines at once, just one line is enough. | Memory consumption (output dimensions) | |:--------------------------------------:| | Current | This patch | |:------------------:|:-----------------:| | Width * Heigth | 2 * Width | Beyond memory, we also have a minor performance benefit from all these changes. Results running the IGT[1] test `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times: | Frametime | |:------------------------------------------:| | Implementation | Current | This commit | |:---------------:|:---------:|:------------:| | frametime range | 9~22 ms | 5~17 ms | | Average | 11.4 ms | 7.8 ms | [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4 V2: Improves the performance drastically, by performing the operations per-line and not per-pixel(Pekka Paalanen). Minor improvements(Pekka Paalanen). V3: Changes the code to blend the planes all at once. This improves performance, memory consumption, and removes much of the weirdness of the V2(Pekka Paalanen and me). Minor improvements(Pekka Paalanen and me). V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant. V5: Minor checkpatch fixes and the removal of TO-DO item(Melissa Wen). Several security/robustness improvents(Pekka Paalanen). Removes check_planes_x_bounds function and allows partial partly off-screen(Pekka Paalanen). V6: Fix a mismatch of some variable sizes (Pekka Paalanen). Several minor improvements (Pekka Paalanen). Reviewed-by: Melissa Wen <mwen@igalia.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Igor Torrente <igormtorrente@gmail.com> Signed-off-by: Melissa Wen <melissa.srw@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220905190811.25024-7-igormtorrente@gmail.com
2022-09-05 16:08:08 -03:00
pixel_write_t get_pixel_write_function(u32 format);
drm: vkms: Refactor the plane composer to accept new formats Currently the blend function only accepts XRGB_8888 and ARGB_8888 as a color input. This patch refactors all the functions related to the plane composition to overcome this limitation. The pixels blend is done using the new internal format. And new handlers are being added to convert a specific format to/from this internal format. So the blend operation depends on these handlers to convert to this common format. The blended result, if necessary, is converted to the writeback buffer format. This patch introduces three major differences to the blend function. 1 - All the planes are blended at once. 2 - The blend calculus is done as per line instead of per pixel. 3 - It is responsible to calculates the CRC and writing the writeback buffer(if necessary). These changes allow us to allocate way less memory in the intermediate buffer to compute these operations. Because now we don't need to have the entire intermediate image lines at once, just one line is enough. | Memory consumption (output dimensions) | |:--------------------------------------:| | Current | This patch | |:------------------:|:-----------------:| | Width * Heigth | 2 * Width | Beyond memory, we also have a minor performance benefit from all these changes. Results running the IGT[1] test `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times: | Frametime | |:------------------------------------------:| | Implementation | Current | This commit | |:---------------:|:---------:|:------------:| | frametime range | 9~22 ms | 5~17 ms | | Average | 11.4 ms | 7.8 ms | [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4 V2: Improves the performance drastically, by performing the operations per-line and not per-pixel(Pekka Paalanen). Minor improvements(Pekka Paalanen). V3: Changes the code to blend the planes all at once. This improves performance, memory consumption, and removes much of the weirdness of the V2(Pekka Paalanen and me). Minor improvements(Pekka Paalanen and me). V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant. V5: Minor checkpatch fixes and the removal of TO-DO item(Melissa Wen). Several security/robustness improvents(Pekka Paalanen). Removes check_planes_x_bounds function and allows partial partly off-screen(Pekka Paalanen). V6: Fix a mismatch of some variable sizes (Pekka Paalanen). Several minor improvements (Pekka Paalanen). Reviewed-by: Melissa Wen <mwen@igalia.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Igor Torrente <igormtorrente@gmail.com> Signed-off-by: Melissa Wen <melissa.srw@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220905190811.25024-7-igormtorrente@gmail.com
2022-09-05 16:08:08 -03:00
drm/vkms: Add YUV support Add support to the YUV formats bellow: - NV12/NV16/NV24 - NV21/NV61/NV42 - YUV420/YUV422/YUV444 - YVU420/YVU422/YVU444 The conversion from yuv to rgb is done with fixed-point arithmetic, using 32.32 fixed-point numbers and the drm_fixed helpers. To do the conversion, a specific matrix must be used for each color range (DRM_COLOR_*_RANGE) and encoding (DRM_COLOR_*). This matrix is stored in the `conversion_matrix` struct, along with the specific y_offset needed. This matrix is queried only once, in `vkms_plane_atomic_update` and stored in a `vkms_plane_state`. Those conversion matrices of each encoding and range were obtained by rounding the values of the original conversion matrices multiplied by 2^32. This is done to avoid the use of floating point operations. The same reading function is used for YUV and YVU formats. As the only difference between those two category of formats is the order of field, a simple swap in conversion matrix columns allows using the same function. [Louis Chauvet: - Adapted Arthur's work - Implemented the read_line_t callbacks for yuv - add struct conversion_matrix - store the whole conversion_matrix in the plane state - remove struct pixel_yuv_u8 - update the commit message - Merge the modifications from Arthur] Signed-off-by: Arthur Grillo <arthurgrillo@riseup.net> Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20250415-yuv-v18-2-f2918f71ec4b@bootlin.com Signed-off-by: Louis Chauvet <louis.chauvet@bootlin.com>
2025-04-15 15:55:33 +02:00
void get_conversion_matrix_to_argb_u16(u32 format, enum drm_color_encoding encoding,
enum drm_color_range range,
struct conversion_matrix *matrix);
#if IS_ENABLED(CONFIG_KUNIT)
struct pixel_argb_u16 argb_u16_from_yuv888(u8 y, u8 channel_1, u8 channel_2,
const struct conversion_matrix *matrix);
#endif
drm: vkms: Refactor the plane composer to accept new formats Currently the blend function only accepts XRGB_8888 and ARGB_8888 as a color input. This patch refactors all the functions related to the plane composition to overcome this limitation. The pixels blend is done using the new internal format. And new handlers are being added to convert a specific format to/from this internal format. So the blend operation depends on these handlers to convert to this common format. The blended result, if necessary, is converted to the writeback buffer format. This patch introduces three major differences to the blend function. 1 - All the planes are blended at once. 2 - The blend calculus is done as per line instead of per pixel. 3 - It is responsible to calculates the CRC and writing the writeback buffer(if necessary). These changes allow us to allocate way less memory in the intermediate buffer to compute these operations. Because now we don't need to have the entire intermediate image lines at once, just one line is enough. | Memory consumption (output dimensions) | |:--------------------------------------:| | Current | This patch | |:------------------:|:-----------------:| | Width * Heigth | 2 * Width | Beyond memory, we also have a minor performance benefit from all these changes. Results running the IGT[1] test `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times: | Frametime | |:------------------------------------------:| | Implementation | Current | This commit | |:---------------:|:---------:|:------------:| | frametime range | 9~22 ms | 5~17 ms | | Average | 11.4 ms | 7.8 ms | [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4 V2: Improves the performance drastically, by performing the operations per-line and not per-pixel(Pekka Paalanen). Minor improvements(Pekka Paalanen). V3: Changes the code to blend the planes all at once. This improves performance, memory consumption, and removes much of the weirdness of the V2(Pekka Paalanen and me). Minor improvements(Pekka Paalanen and me). V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant. V5: Minor checkpatch fixes and the removal of TO-DO item(Melissa Wen). Several security/robustness improvents(Pekka Paalanen). Removes check_planes_x_bounds function and allows partial partly off-screen(Pekka Paalanen). V6: Fix a mismatch of some variable sizes (Pekka Paalanen). Several minor improvements (Pekka Paalanen). Reviewed-by: Melissa Wen <mwen@igalia.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Igor Torrente <igormtorrente@gmail.com> Signed-off-by: Melissa Wen <melissa.srw@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220905190811.25024-7-igormtorrente@gmail.com
2022-09-05 16:08:08 -03:00
#endif /* _VKMS_FORMATS_H_ */