US-11962814

Indication of tiles in a video picture

Published: April 16, 2024

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

A method includes performing a conversion between a video including a video picture including one or more tiles and a bitstream of the video. The video picture refers to a picture parameter set, and the picture parameter set conforms to a format rule specifying that the picture parameter set includes a list of column widths for N tile columns, where N is an integer. An (N−1)-th tile column exists in the video picture and the (N−1)-th tile column has a width that is equal to an (N−1)-th entry in a list of explicitly included tile column widths plus one number of coding tree blocks.
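As a rough illustration of the "entry plus one" relation in the abstract, the sketch below turns a list of signaled entries into tile column widths in coding tree blocks. The function name and the plain Python list are assumptions made for illustration; neither comes from the patent.

```python
def explicit_column_widths(entries):
    """Return tile column widths in CTBs, assuming each entry stores the width minus one."""
    if any(e < 0 for e in entries):
        raise ValueError("entries must be non-negative")
    # Each explicitly listed column is one CTB wider than its entry.
    return [e + 1 for e in entries]


print(explicit_column_widths([3, 3, 1]))  # [4, 4, 2]
```

Storing the width minus one means an entry of zero still describes a column one coding tree block wide, so no entry can describe an empty column.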

Patent Claims

15 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 2

Original Legal Text

2. The method of claim 1, wherein a value of N is indicated by a second syntax element included in the picture parameter set.

Plain English Translation

In the method of claim 1, the picture parameter set carries a list of column widths for N tile columns, as the abstract describes. Claim 2 adds that the value of N is itself indicated by a second syntax element included in the same picture parameter set. The picture parameter set is a data structure holding information that applies to the pictures referring to it, so a decoder reading it learns both how many tile column widths are listed and what those widths are. Signaling N explicitly keeps the encoder and decoder in agreement about the length of the list.
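One way to picture this relationship is a small data structure, assuming N counts the tile columns whose widths are listed explicitly, as the abstract describes. The class and field names below are hypothetical; the claim text shown here names no syntax elements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PictureParameterSetSketch:
    # Hypothetical field names, chosen only for this illustration.
    num_explicit_tile_columns: int        # the value N (the "second syntax element")
    tile_column_width_entries: List[int]  # the list of N column width entries

    def validate(self) -> None:
        """Check that N agrees with the number of listed entries."""
        if self.num_explicit_tile_columns != len(self.tile_column_width_entries):
            raise ValueError("N must match the number of listed column widths")


pps = PictureParameterSetSketch(num_explicit_tile_columns=2,
                                tile_column_width_entries=[3, 1])
pps.validate()  # passes: two entries are listed for N = 2
```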

Claim 3

Original Legal Text

3. The method of claim 1, wherein the width of the N-th tile column of the N tile columns in units of coding tree blocks is disallowed to be reset.

Plain English Translation

This claim narrows the method of claim 1, which concerns partitioning a video picture into tiles. Tile column widths are expressed in units of coding tree blocks (CTBs). Claim 3 adds one constraint: the width of the N-th tile column, the last of the N tile columns, is not allowed to be reset. Once that width has been determined, it stays as it is and is not overwritten later in the encoding or decoding process.

Claim 4

Original Legal Text

4. The method of claim 1, wherein the first syntax element is a N-th entry in the list of syntax elements, wherein a uniform tile column width is set to the width of the N-th tile column of the N tile columns, and wherein when a difference between a picture width of a luma component in units of coding tree blocks and a sum of the tile column widths of the N tile columns is greater than or equal to the width of the N-th tile column of the N tile columns, a width of a (N+1)-th tile column is set to be equal to the width of the N-th tile column of the N tile columns.

Plain English Translation

This claim concerns partitioning a picture into tile columns when only some of the column widths are listed explicitly. The first syntax element is the N-th entry in the list of syntax elements, and the width of the N-th tile column serves as the uniform tile column width. If the difference between the picture width of the luma component (in coding tree block units) and the sum of the widths of the N tile columns is greater than or equal to the width of the N-th tile column, the (N+1)-th tile column is given the same width as the N-th tile column. For example, with listed widths of 4 and 2 CTBs in a picture 11 CTBs wide, 5 CTBs remain uncovered, which is at least 2, so the third column is also 2 CTBs wide.

Claim 5

Original Legal Text

5. The method of claim 1, wherein when a difference between a picture width of a luma component in units of coding tree blocks and a sum of the tile column widths for the N tile columns is less than the width of the N-th tile column of the N tile columns, a width of a (N+1)-th tile column is set to be equal to the difference.

Plain English Translation

This claim covers the case where the N tile columns do not span the whole picture and the leftover is narrower than the N-th tile column. The picture width of the luma component, in units of coding tree blocks (CTBs), is compared with the sum of the widths of the N tile columns. If the difference is less than the width of the N-th tile column, the width of the (N+1)-th tile column is set equal to that difference, so that the columns together cover the full picture width. For example, with a single listed width of 5 CTBs in a picture 8 CTBs wide, 3 CTBs remain, which is less than 5, so the second column is 3 CTBs wide.
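The two column rules, repeating the N-th width while it still fits (claim 4) and assigning the leftover to the next column (claim 5), can be sketched together as follows. This is a minimal sketch under stated assumptions rather than the patent's procedure: the function name is hypothetical, applying the uniform rule repeatedly beyond the (N+1)-th column is an assumption, and so is adding no column when nothing is left over.

```python
def derive_tile_column_widths(explicit_widths, pic_width_ctbs):
    """Derive all tile column widths (in CTBs) from N explicitly listed widths."""
    widths = list(explicit_widths)
    if not widths or min(widths) <= 0:
        raise ValueError("at least one positive tile column width is required")
    remaining = pic_width_ctbs - sum(widths)
    if remaining < 0:
        raise ValueError("listed widths exceed the picture width")

    uniform_width = widths[-1]  # width of the N-th tile column
    # Claim 4 rule: while the uncovered width is at least the uniform width,
    # the next column copies the N-th column's width (repetition is assumed).
    while remaining >= uniform_width:
        widths.append(uniform_width)
        remaining -= uniform_width
    # Claim 5 rule: a leftover narrower than the uniform width becomes the
    # next column. Skipping a zero leftover is an assumption of this sketch.
    if remaining > 0:
        widths.append(remaining)
    return widths


print(derive_tile_column_widths([4, 2], 11))  # [4, 2, 2, 2, 1]
print(derive_tile_column_widths([5], 8))      # [5, 3]
```

In both examples the derived widths sum to the picture width, which is the property the two rules are meant to guarantee.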

Claim 7

Original Legal Text

7. The method of claim 6, wherein a value of M is indicated by a fourth syntax element included in the picture parameter set.

Plain English Translation

Claim 7 is the tile row counterpart of claim 2. As the claims that depend on claim 6 indicate, claim 6 involves M tile rows whose heights, in units of coding tree blocks, are given by a second list of syntax elements. Claim 7 adds that the value of M is indicated by a fourth syntax element included in the picture parameter set (PPS). The PPS is a data structure holding information that applies to the pictures referring to it, so a decoder reading it learns how many tile row heights are listed. Signaling M explicitly keeps the encoder and decoder in agreement about the length of that list.

Claim 8

Original Legal Text

8. The method of claim 6, wherein the height of the M-th tile row of the M tile rows in units of coding tree blocks is disallowed to be reset.

Plain English Translation

This claim narrows the method of claim 6, in which a video picture is divided into tile rows whose heights are expressed in units of coding tree blocks. Claim 8 adds one constraint: the height of the M-th tile row, the last of the M tile rows, is not allowed to be reset. Once that height has been determined, it stays as it is and is not overwritten later in the encoding or decoding process. This mirrors the constraint that claim 3 places on the width of the N-th tile column.

Claim 9

Original Legal Text

9. The method of claim 6, wherein the third syntax element is a M-th entry in the second list of syntax elements, wherein a uniform tile row height is set to the height of the M-th tile row of the M tile rows, and wherein when a difference between a picture height of a luma component in units of coding tree blocks and a sum of the tile row heights for the M tile rows is greater than or equal to the height of the M-th tile row of the M tile rows, a height of a (M+1)-th tile row is set to be equal to the height of the M-th tile row of the M tile rows.

Plain English Translation

This claim applies the uniform-size rule of claim 4 to tile rows. The third syntax element is the M-th entry in the second list of syntax elements, and the height of the M-th tile row serves as the uniform tile row height. If the difference between the picture height of the luma component (in coding tree block units) and the sum of the heights of the M tile rows is greater than or equal to the height of the M-th tile row, the (M+1)-th tile row is given the same height as the M-th tile row. For example, with listed heights of 3 and 2 CTBs in a picture 10 CTBs tall, 5 CTBs remain uncovered, which is at least 2, so the third row is also 2 CTBs tall.
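The row rule can be sketched in the same spirit. The function below is hypothetical and covers only the uniform-height step; it returns any leftover height instead of assigning it to a row, and it assumes the rule is applied repeatedly past the (M+1)-th row.

```python
def extend_tile_rows(explicit_heights, pic_height_ctbs):
    """Apply the uniform-height rule to M explicitly listed tile row heights.

    Returns the extended list of heights and the picture height (in CTBs)
    still uncovered once no further uniform row fits.
    """
    heights = list(explicit_heights)
    if not heights or min(heights) <= 0:
        raise ValueError("at least one positive tile row height is required")
    leftover = pic_height_ctbs - sum(heights)
    if leftover < 0:
        raise ValueError("listed heights exceed the picture height")

    uniform_height = heights[-1]  # height of the M-th tile row
    while leftover >= uniform_height:
        heights.append(uniform_height)  # next row copies the M-th row's height
        leftover -= uniform_height
    return heights, leftover


print(extend_tile_rows([3, 2], 10))  # ([3, 2, 2, 2], 1)
```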

Claim 10

Original Legal Text

10. The method of claim 6, wherein when a difference between a picture height of a luma component in units of coding tree blocks and a sum of the tile row heights for the M tile rows is greater than or equal to the height of the M-th tile row of the M tile rows, a height of a (M+1)-th tile row is set to be equal to the difference.

Plain English Translation

Claim 10 addresses the picture height that remains after the M tile rows of claim 6. The difference is taken between the picture height of the luma component, in units of coding tree blocks, and the sum of the heights of the M tile rows. As the claim is worded, when that difference is greater than or equal to the height of the M-th tile row, the height of the (M+1)-th tile row is set equal to the difference. The remaining height is thereby assigned to a tile row, so the rows together cover the full picture height.

Claim 11

Original Legal Text

11. The method of claim 1, wherein the conversion includes encoding the video into the bitstream.

Plain English Translation

Claim 11 fixes the direction of the conversion in claim 1: the video is encoded into the bitstream. An encoder performing the method produces a bitstream in which the video picture refers to a picture parameter set that follows the format rule on tile column widths described in the abstract. The claim adds no further encoding steps beyond those of claim 1.

Claim 12

Original Legal Text

12. The method of claim 1, wherein the conversion includes decoding the video from the bitstream.

Plain English Translation

Claim 12 fixes the direction of the conversion in claim 1: the video is decoded from the bitstream. A decoder performing the method reads a bitstream in which the video picture refers to a picture parameter set that follows the format rule on tile column widths described in the abstract, and reconstructs the video from it. The claim adds no further decoding steps beyond those of claim 1.

Claim 14

Original Legal Text

14. The apparatus of claim 13, wherein a value of N is indicated by a second syntax element included in the picture parameter set.

Plain English Translation

Claim 14 is the apparatus counterpart of claim 2. In the apparatus of claim 13, the picture parameter set (PPS) carries a list of column widths for N tile columns, as the abstract describes. Claim 14 adds that the value of N is indicated by a second syntax element included in the same PPS. The PPS is a data structure holding information that applies to the pictures referring to it, so the apparatus can determine from it how many tile column widths are listed. This applies both when the apparatus encodes, and so writes the syntax element, and when it decodes, and so parses it.

Claim 15

Original Legal Text

15. The apparatus of claim 13, wherein the width of the N-th tile column of the N tile columns in units of coding tree blocks is disallowed to be reset.

Plain English Translation

Claim 15 is the apparatus counterpart of claim 3. The apparatus of claim 13 processes video pictures divided into tiles, with tile column widths expressed in units of coding tree blocks (CTBs). Claim 15 adds one constraint: the width of the N-th tile column, the last of the N tile columns, is not allowed to be reset. Once that width has been determined, it stays as it is and is not overwritten later in the encoding or decoding process.

Claim 16

Original Legal Text

16. The apparatus of claim 13, wherein the first syntax element is a N-th entry in the list of syntax elements, wherein a uniform tile column width is set to the width of the N-th tile column of the N tile columns, and wherein when a difference between a picture width of a luma component in units of coding tree blocks and a sum of the tile column widths of the N tile columns is greater than or equal to the width of the N-th tile column of the N tile columns, a width of a (N+1)-th tile column is set to be equal to the width of the N-th tile column of the N tile columns.

Plain English Translation

Claim 16 is the apparatus counterpart of claim 4. The first syntax element is the N-th entry in the list of syntax elements, and the width of the N-th tile column serves as the uniform tile column width. If the difference between the picture width of the luma component (in coding tree blocks) and the sum of the widths of the N tile columns is greater than or equal to the width of the N-th tile column, the (N+1)-th tile column is assigned the same width as the N-th tile column. This lets the apparatus continue the partitioning with columns of the uniform width when the N tile columns do not span the whole picture.

Claim 18

Original Legal Text

18. The non-transitory computer-readable storage medium of claim 17, wherein a value of N is indicated by a second syntax element included in the picture parameter set.

Plain English Translation

Claim 18 is the storage medium counterpart of claim 2. The non-transitory computer-readable storage medium of claim 17 relates to the same conversion, in which the picture parameter set carries a list of column widths for N tile columns, as the abstract describes. Claim 18 adds that the value of N is indicated by a second syntax element included in that picture parameter set. Reading the picture parameter set therefore reveals how many tile column widths are listed as well as the widths themselves.

Claim 20

Original Legal Text

20. The non-transitory computer-readable recording medium of claim 19, wherein a value of N is indicated by a second syntax element included in the picture parameter set.

Plain English Translation

Claim 20 is the recording medium counterpart of claim 2. The non-transitory computer-readable recording medium of claim 19 relates to a bitstream in which the picture parameter set (PPS) carries a list of column widths for N tile columns, as the abstract describes. Claim 20 adds that the value of N is indicated by a second syntax element included in that PPS. The value of N therefore applies to the pictures that refer to the PPS, and a decoder reading the PPS learns how many tile column widths are listed.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

H04N

Patent Metadata

Filing Date

August 22, 2022

Publication Date

April 16, 2024
