Commit graph

13 commits

Author SHA1 Message Date
Hugues Fruchet
3f6702e1be media: verisilicon: postproc: 4K support
Support input larger than 4096x2048 using extended input width/height
fields of swreg92.
This is needed to decode large WebP or JPEG pictures.

Signed-off-by: Hugues Fruchet <hugues.fruchet@foss.st.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-07-12 08:34:48 +02:00
Detlev Casanova
11beb0fc34 media: verisilicon: Free post processor buffers on error
During initialization, the post processor allocates the same number of
buffers as the buf queue.
As the init function is called in streamon(), if an allocation fails,
streamon will return an error and streamoff() will not be called, keeping
all post processor buffers allocated.

To avoid that, all post proc buffers are freed in case of an allocation
error.

Fixes: 26711491a8 ("media: verisilicon: Refactor postprocessor to store more buffers")
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2025-05-15 08:13:31 +02:00
Benjamin Gaignard
db300ab0e9 media: verisilicon: Store reference frames pixels format
The Hantro decoder always produces tiled pixel-formats, but when the
post-processor is used, the destination pixel-format is a non-tiled
pixel-format. This causes an incorrect computation of the reference
frame size and offsets. Get and save the correct tiled pixel-format for
8 and 10 bit streams to solve these computation issues.

Fluster VP9 score increase to 166/305 (vs 145/305).
HEVC score is still 141/147.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2024-12-13 19:48:25 +00:00
Benjamin Gaignard
5423e2d220 media: verisilicon: Add reference buffer compression feature
Reference frame compression is a feature added in the G2 decoder to
compress frame buffers so that the bandwidth of storing/loading
reference frames can be reduced, especially with high resolution decoded
streams.

The impact of compressed frames is confirmed when using perf to monitor
the number of memory accesses with or without the compression feature.
The following command:

perf stat -a -e \
imx8_ddr0/cycles/,imx8_ddr0/read-cycles/,imx8_ddr0/write-cycles/ \
gst-launch-1.0 filesrc \
location=Jockey_3840x2160_120fps_420_8bit_HEVC_RAW.hevc ! queue ! \
h265parse ! v4l2slh265dec ! video/x-raw,format=NV12 ! fakesink

Gives us these results without the compression feature:
Performance counter stats for 'system wide':

        1711300345      imx8_ddr0/cycles/
         892207924      imx8_ddr0/read-cycles/
        1291785864      imx8_ddr0/write-cycles/

      13.760048353 seconds time elapsed

With the compression feature:
Performance counter stats for 'system wide':

         274526799      imx8_ddr0/cycles/
         453120194      imx8_ddr0/read-cycles/
         833391434      imx8_ddr0/write-cycles/

      18.257831534 seconds time elapsed

As expected the number of read/write cycles are really lower when
compression is used.

Since storing the compression data requires more memory a module
parameter named 'hevc_use_compression' is used to enable/disable
this feature and, by default, compression isn't used.

Enabling the compression feature means that the output-frames of the
decoder
are stored with a specific compression pixel-format. Since this
pixel format is unknown, this patch restrains the compression feature
usage to the cases where post-processor pixel-formats (NV12 or NV15)
are selected by the applications.

The Fluster compliance HEVC test suite score is still 141/147 with this
patch.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Tested-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-08-28 10:01:10 +02:00
Benjamin Gaignard
26711491a8 media: verisilicon: Refactor postprocessor to store more buffers
Since vb2 queue can store more than VB2_MAX_FRAME buffers, the
postprocessor buffer storage must be capable to store more buffers too.
Change static dec_q array to allocated array to be capable to store
up to queue 'max_num_buffers'.
Keep allocating queue 'num_buffers' at queue setup time but also allows
to allocate postprocessors buffers on the fly.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Reviewed-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
CC: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
CC: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-11-23 12:42:56 +01:00
Benjamin Gaignard
8f661dc7e7 media: verisilicon: Stop direct calls to queue num_buffers field
Use vb2_get_num_buffers() to avoid using queue num_buffers field directly.
This allows us to change how the number of buffers is computed in the
future.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Reviewed-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
CC: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-11-23 12:31:27 +01:00
Marek Vasut
6e481d52d3 media: verisilicon: Do not enable G2 postproc downscale if source is narrower than destination
In case of encoded input VP9 data width that is not multiple of macroblock
size, which is 16 (e.g. 1080x1920 frames, where 1080 is multiple of 8), the
width is padded to be a multiple of macroblock size (for 1080x1920 frames,
that is 1088x1920).

The hantro_postproc_g2_enable() checks whether the encoded data width is
equal to decoded frame width, and if not, enables down-scale mode. For a
frame where input is 1080x1920 and output is 1088x1920, this is incorrect
as no down-scale happens, the frame is only padded. Enabling the down-scale
mode in this case results in corrupted frames.

Fix this by adjusting the check to test whether encoded data width is
greater than decoded frame width, and only in that case enable the
down-scale mode.

To generate input test data to trigger this bug, use e.g.:
$ gst-launch-1.0 videotestsrc ! video/x-raw,width=272,height=256,format=I420 ! \
                 vp9enc ! matroskamux ! filesink location=/tmp/test.vp9
To trigger the bug upon decoding (note that the NV12 must be forced, as
that assures the output data would pass the G2 postproc):
$ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! vp9parse ! \
                 v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert ! fbdevsink

Fixes: 79c987de8b ("media: hantro: Use post processor scaling capacities")
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-09-27 09:39:55 +02:00
Arnd Bergmann
0cb1d9c845 media: verisilicon: change confusingly named relaxed register access
The register abstraction has wrappers around both the normal writel()
and its writel_relaxed() counterpart, but this has led to a lot of users
ending up with the relaxed version.

There is sometimes a need to intentionally pick the relaxed accessor for
performance critical functions, but I noticed that each hantro_reg_write()
call also contains a non-relaxed readl(), which is typically much more
expensive than a writel, so there is little benefit here but an added
risk of missing a serialization against DMA.

To make this behave like other interfaces, use the normal accessor by
default and only provide the relaxed version as an alternative for
performance critical code. hantro_postproc.c is the only place that
used both the relaxed and normal writel, but this does not seem
cricital either, so change it all to the normal ones.

[hverkuil: fix function prototype alignment]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-07-14 09:14:10 +02:00
Benjamin Gaignard
80c7373a45 media: verisilicon: Conditionally ignore native formats
AV1 film grain feature requires to use the postprocessor to produce
valid frames. In such case the driver shouldn't propose native pixels
format but only post-processed pixels format.
Additionally if when setting a control a value could change capture
queue pixels formats it is needed to call hantro_reset_raw_fmt().

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-06-09 16:20:21 +01:00
Benjamin Gaignard
1b9ef2744c media: verisilicon: Compute motion vectors size for AV1 frames
Compute the additional space required to store motion vectors at
the end of the frames buffers.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-06-09 16:16:02 +01:00
Benjamin Gaignard
3c32d94c9c media: verisilicon: Do not change context bit depth before validating the format
It is needed to check if the proposed pixels format is valid before
updating context bit depth and other internal states.
Stop using ctx->bit_depth to check format depth match and return
result to the caller.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2023-04-10 14:18:13 +01:00
Benjamin Gaignard
5aa24d7299 media: hantro: postproc: Configure output regs to support 10bit
Move output format setting in postproc and make sure that
8/10bit configuration is correctly set.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-09-24 09:00:00 +02:00
Ezequiel Garcia
fbb6c848dd media: destage Hantro VPU driver
The Hantro mainline driver has been used in production
since several years and was only kept as a staging driver
due the stateless CODEC controls.

Now that all the stateless CODEC controls have been moved
out of staging, graduate the driver as well.

Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-08-31 10:23:50 +02:00
Renamed from drivers/staging/media/hantro/hantro_postproc.c (Browse further)