9823871

Performance of Coprocessor Assisted Memset() Through Heterogeneous Computing

Published: November 21, 2017
Assignee: not available in USPTO data we have
Technical Abstract

Patent Claims
21 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method comprising: receiving a request to fill a plurality of ranges of memory addresses with a particular value; selecting a first subset of said plurality of ranges; distributing said first subset of said plurality of ranges to a plurality of coprocessors; after said distributing said first subset, simultaneously: said plurality of coprocessors storing said particular value into memory at each memory address of said first subset of said plurality of ranges of memory addresses; and selecting a second subset of said plurality of ranges, wherein said second subset and said first subset are disjoint; distributing said second subset of said plurality of ranges to said plurality of coprocessors.

Plain English Translation

A method efficiently fills multiple memory ranges with a specific value using a pipelined approach with coprocessors. It receives a request to fill several memory ranges, selects a first group (subset) of these ranges, and distributes that subset to multiple coprocessors. While the coprocessors are filling their assigned ranges with the specified value, the method simultaneously selects a second, non-overlapping subset of ranges, which is then distributed to the same coprocessors. Overlapping the selection work with the actual filling forms a pipeline that keeps the coprocessors continuously supplied with ranges.
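The pipelining described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented implementation: Python threads stand in for the coprocessors, a `bytearray` stands in for memory, ranges are `(start, length)` pairs, and the names `pipelined_fill`, `num_workers`, and `batch_size` are hypothetical.

```python
import queue
import threading


def pipelined_fill(memory, ranges, value, num_workers=2, batch_size=2):
    """Fill every (start, length) range of `memory` with the byte `value`."""
    queues = [queue.Queue() for _ in range(num_workers)]

    def worker(q):
        # Stand-in for a coprocessor: process ranges until the None sentinel.
        while True:
            cmd = q.get()
            if cmd is None:
                break
            start, length = cmd
            memory[start:start + length] = bytes([value]) * length

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()

    # Select a subset and distribute it, then move straight on to selecting
    # the next subset while the workers are still filling the previous one.
    for i in range(0, len(ranges), batch_size):
        subset = ranges[i:i + batch_size]  # disjoint from earlier subsets
        for j, rng in enumerate(subset):
            queues[j % num_workers].put(rng)

    # Tell each worker there is no more work, then wait for completion.
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return memory
```

The selecting thread never waits for a subset to finish before choosing the next one; it only waits once, at the end, for all queued fills to complete.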

Claim 2

Original Legal Text

2. The method of claim 1 wherein distributing said first subset of said plurality of ranges comprises distributing multiple ranges of said first subset of said plurality of ranges to one of said plurality of coprocessors.

Plain English Translation

In the method of claim 1, distributing the first subset includes giving multiple ranges from that subset to a single coprocessor. One coprocessor may therefore be responsible for filling several distinct regions of memory within the first subset, rather than handling only one range. This allows efficient use of each coprocessor when the memory ranges are fragmented.
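One simple way a coprocessor could end up with several ranges is a round-robin assignment. The claim does not say how ranges are assigned, so the policy and the name `assign_round_robin` below are assumptions made purely for illustration.

```python
def assign_round_robin(ranges, num_coprocessors):
    """Return one list of (start, length) ranges per coprocessor."""
    assignments = [[] for _ in range(num_coprocessors)]
    for i, rng in enumerate(ranges):
        # With more ranges than coprocessors, some coprocessor gets several.
        assignments[i % num_coprocessors].append(rng)
    return assignments
```

With three ranges and two coprocessors, the first coprocessor receives two ranges and the second receives one.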

Claim 3

Original Legal Text

3. The method of claim 1 wherein: said plurality of ranges of memory addresses comprises a plurality of ranges of physical memory addresses; said plurality of ranges of physical memory addresses are associated with one range of virtual memory addresses; said particular value is zero.

Plain English Translation

In the method from the initial memory filling description, the multiple ranges of memory addresses consist of multiple physical memory address ranges. These physical memory ranges are mapped to a single virtual memory address range. Also, the particular value being written to the memory locations is zero. This is useful for zeroing out a large virtual memory region by operating on its underlying physical pages.
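To illustrate how one virtual range can correspond to several physical ranges, the sketch below walks a toy page table. The 4096-byte page size, the dictionary-based `page_table`, the page-aligned input, and the merging of physically adjacent frames are all assumptions for the example; the claim itself only states that the physical ranges are associated with one virtual range and that the fill value is zero.

```python
PAGE_SIZE = 4096  # assumed page size for this example


def physical_ranges(page_table, virt_start, length):
    """Return (phys_start, length) ranges backing a page-aligned virtual range.

    `page_table` maps a virtual page number to a physical frame number.
    """
    ranges = []
    first_page = virt_start // PAGE_SIZE
    last_page = (virt_start + length) // PAGE_SIZE
    for vpn in range(first_page, last_page):
        phys = page_table[vpn] * PAGE_SIZE
        if ranges and ranges[-1][0] + ranges[-1][1] == phys:
            # This frame directly follows the previous one: extend that range.
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + PAGE_SIZE)
        else:
            ranges.append((phys, PAGE_SIZE))
    return ranges
```

Each resulting physical range would then be filled with zero, which zeroes the whole virtual region.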

Claim 4

Original Legal Text

4. The method of claim 1 wherein: said distributing said first subset of said plurality of ranges to a plurality of coprocessors comprises for each range of said first subset of said plurality of ranges, adding a command to a command queue of a respective coprocessor of said plurality of coprocessors; said command identifies said each range and contains said particular value.

Plain English Translation

In the method from the initial memory filling description, distributing the first subset of memory ranges to the coprocessors includes adding a command to each coprocessor's command queue for each range it's assigned. This command specifies the memory range to be filled and contains the particular value to be used for filling. The coprocessors then process these commands from their queues to perform the memory filling operation, allowing for asynchronous operation.
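A command of this kind can be modeled as a small record holding the range and the fill value. In the sketch below, `FillCommand`, `enqueue_fill_commands`, and `drain` are hypothetical names, `deque` objects stand in for hardware command queues, and the round-robin choice of queue is an assumption.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FillCommand:
    start: int   # first address of the range to fill
    length: int  # number of bytes in the range
    value: int   # byte value to store at every address


def enqueue_fill_commands(subset, value, command_queues):
    """Add one command per range to the queue of its assigned coprocessor."""
    for i, (start, length) in enumerate(subset):
        target = command_queues[i % len(command_queues)]
        target.append(FillCommand(start, length, value))


def drain(command_queue, memory):
    """Simulate a coprocessor executing every command in its queue."""
    while command_queue:
        cmd = command_queue.popleft()
        memory[cmd.start:cmd.start + cmd.length] = bytes([cmd.value]) * cmd.length
```

Because each command carries both the range and the value, a coprocessor needs nothing beyond its own queue to do the work.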

Claim 5

Original Legal Text

5. The method of claim 1 wherein distributing said first subset of said plurality of ranges to a plurality of coprocessors comprises distributing at least one of said first subset of said plurality of ranges to a central processing unit (CPU).

Plain English Translation

In the method from the initial memory filling description, distributing the first subset of memory ranges to the coprocessors includes distributing at least one range of the subset to the main CPU, in addition to the coprocessors. This means the CPU participates in the memory filling operation alongside the coprocessors. This might be useful for handling small memory ranges or for scenarios where certain memory regions are more efficiently accessed by the CPU.

Claim 6

Original Legal Text

6. The method of claim 5 wherein distributing at least one of said first subset of said plurality of ranges to a CPU comprises distributing more memory addresses to said CPU than are distributed to at least one of said plurality of coprocessors.

Plain English Translation

In the method from the prior claim where at least one memory range is distributed to the CPU, the CPU is assigned more memory addresses to fill than at least one of the coprocessors. This might occur when the CPU is more efficient at filling a particularly large or contiguous memory region, or when certain coprocessors have limited memory access capabilities. This approach enables the system to use the optimal hardware for different memory regions.
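One way the CPU could end up with a larger share is a weighted assignment in which each device has a relative throughput weight. The claims do not specify any such policy; the weights, the greedy strategy, and the name `distribute_by_weight` are assumptions for this sketch.

```python
def distribute_by_weight(ranges, weights):
    """Assign (start, length) ranges to devices roughly in proportion to weights.

    `weights` maps a device name to a relative throughput weight.
    Returns (assigned ranges per device, total bytes per device).
    """
    assigned = {name: [] for name in weights}
    load = {name: 0 for name in weights}
    # Largest ranges first; each goes to the device that would have the
    # lowest weighted load after taking it.
    for start, length in sorted(ranges, key=lambda r: r[1], reverse=True):
        target = min(weights, key=lambda n: (load[n] + length) / weights[n])
        assigned[target].append((start, length))
        load[target] += length
    return assigned, load
```

For four equal 4096-byte ranges and weights `{"cpu": 2, "cop0": 1, "cop1": 1}`, the CPU receives two ranges and each coprocessor receives one, so the CPU fills more addresses than either coprocessor.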

Claim 7

Original Legal Text

7. The method of claim 1 wherein said selecting a first subset of said plurality of ranges comprises splitting at least one of said plurality of ranges into a plurality of subranges.

Plain English Translation

In the method from the initial memory filling description, selecting the first subset of memory ranges involves splitting one or more of the original memory ranges into smaller subranges. This splitting allows for finer-grained control over the distribution of work to the coprocessors, potentially balancing the load more effectively or improving memory access patterns. It enables the system to handle very large memory ranges by breaking them into smaller, more manageable chunks.
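Splitting a range into subranges can be sketched as follows. The fixed `max_chunk` size and the name `split_range` are assumptions; the claim only requires that at least one range is split into multiple subranges.

```python
def split_range(start, length, max_chunk):
    """Split (start, length) into consecutive subranges of at most max_chunk."""
    end = start + length
    return [(off, min(max_chunk, end - off))
            for off in range(start, end, max_chunk)]
```

A 10-byte range starting at address 0 with `max_chunk=4` yields `(0, 4)`, `(4, 4)`, and `(8, 2)`, which can then be handed to different coprocessors.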

Claim 8

Original Legal Text

8. One or more non-transitory computer readable media storing instructions that include: first instructions which, when executed by one or more processors, cause receiving a request to fill a plurality of ranges of memory addresses with a particular value; second instructions which, when executed by one or more processors, cause selecting a first subset of said plurality of ranges; third instructions which, when executed by one or more processors, cause distributing said first subset of said plurality of ranges to a plurality of coprocessors; fourth instructions which, when executed by one or more processors, cause after said distributing said first subset, simultaneously: said plurality of coprocessors storing said particular value into memory at each memory address of said first subset of said plurality of ranges of memory addresses; and selecting a second subset of said plurality of ranges, wherein said second subset and said first subset are disjoint; fifth instructions which, when executed by one or more processors, cause distributing said second subset of said plurality of ranges to said plurality of coprocessors.

Plain English Translation

A non-transitory computer-readable storage medium stores instructions for efficiently filling multiple memory ranges with a specific value using a pipelined approach with coprocessors. The instructions cause the system to receive a request to fill several memory ranges, select a first subset of these ranges, and distribute it to multiple coprocessors. While the coprocessors are filling their ranges, the system selects a second, non-overlapping subset, which it then distributes to the same coprocessors. This creates a pipeline that overlaps selection with the actual filling so that memory is filled continuously.

Claim 9

Original Legal Text

9. The one or more non-transitory computer readable media of claim 8 wherein distributing said first subset of said plurality of ranges comprises distributing multiple ranges of said first subset of said plurality of ranges to one of said plurality of coprocessors.

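Plain English Translation

The computer readable medium from the initial memory filling description stores instructions such that distributing the first subset includes giving multiple ranges from that subset to a single coprocessor. One coprocessor may therefore be responsible for filling several distinct regions of memory, rather than handling only one range.

Claim 10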

Original Legal Text

10. The one or more non-transitory computer readable media of claim 8 wherein: said plurality of ranges of memory addresses comprises a plurality of ranges of physical memory addresses; said plurality of ranges of physical memory addresses are associated with one range of virtual memory addresses; said particular value is zero.

Plain English Translation

The computer readable medium from the initial memory filling description stores instructions where the multiple ranges of memory addresses consist of multiple physical memory address ranges. These physical memory ranges are mapped to a single virtual memory address range and the value used to fill the memory locations is zero. This approach is useful for zeroing out a large virtual memory region by operating on its underlying physical pages.

Claim 11

Original Legal Text

11. The one or more non-transitory computer readable media of claim 8 wherein: said distributing said first subset of said plurality of ranges to a plurality of coprocessors comprises for each range of said first subset of said plurality of ranges, adding a command to a command queue of a respective coprocessor of said plurality of coprocessors; said command identifies said each range and contains said particular value.

Plain English Translation

The computer readable medium from the initial memory filling description stores instructions such that distributing the first subset of memory ranges to the coprocessors includes adding a command to each coprocessor's command queue for each range it's assigned. This command specifies the memory range to be filled and contains the particular value to be used for filling. The coprocessors then process these commands from their queues to perform the memory filling operation.

Claim 12

Original Legal Text

12. The one or more non-transitory computer readable media of claim 8 wherein distributing said first subset of said plurality of ranges to a plurality of coprocessors comprises distributing at least one of said first subset of said plurality of ranges to a central processing unit (CPU).

Plain English Translation

The computer readable medium from the initial memory filling description stores instructions such that distributing the first subset of memory ranges to the coprocessors includes distributing at least one range of the subset to the main CPU, in addition to the coprocessors. This means the CPU participates in the memory filling operation alongside the coprocessors.

Claim 13

Original Legal Text

13. The one or more non-transitory computer readable media of claim 12 wherein distributing at least one of said first subset of said plurality of ranges to a CPU comprises distributing more memory addresses to said CPU than are distributed to each of said plurality of coprocessors.

Plain English Translation

In the computer readable medium from the prior claim where at least one memory range is distributed to the CPU, the instructions cause the CPU to be assigned more memory addresses to fill than are assigned to each of the coprocessors.

Claim 14

Original Legal Text

14. The one or more non-transitory computer readable media of claim 8 wherein said selecting a first subset of said plurality of ranges comprises splitting at least one of said plurality of ranges into a plurality of subranges.

Plain English Translation

The computer readable medium from the initial memory filling description stores instructions such that selecting the first subset of memory ranges includes splitting one or more of the original memory ranges into smaller subranges. This enables finer-grained control over the distribution of work to the coprocessors, potentially balancing the load more effectively or improving memory access patterns.

Claim 15

Original Legal Text

15. A device comprising: a plurality of coprocessors capable of storing a value at memory addresses of a range of memory addresses; and a central processing unit (CPU) connected to said plurality of coprocessors and configured to: receive a request to fill a plurality of ranges of memory addresses with a value; select a first subset of said plurality of ranges; distribute said first subset of said plurality of ranges to a plurality of coprocessors; after said distributing said first subset, simultaneously: said plurality of coprocessors storing said particular value into memory at each memory address of said first subset of said plurality of ranges of memory addresses; and select a second subset of said plurality of ranges, wherein said second subset and said first subset are disjoint; distribute said second subset of said plurality of ranges to said plurality of coprocessors.

Plain English Translation

A device efficiently fills multiple memory ranges with a specific value. It contains multiple coprocessors capable of writing values to memory and a CPU connected to these coprocessors. The CPU receives a request to fill memory ranges, selects a first subset of them, and distributes that subset to the coprocessors. While the coprocessors are filling their assigned ranges, the CPU simultaneously selects a second, non-overlapping subset, which it then distributes to the coprocessors. This creates a pipeline that overlaps selection with the actual filling operation so that memory is filled continuously.

Claim 16

Original Legal Text

16. The device of claim 15 wherein distributing said first subset of said plurality of ranges comprises distributing multiple ranges of said first subset of said plurality of ranges to one of said plurality of coprocessors.

Plain English Translation

The device from the previous memory filling description distributes multiple memory ranges from the first subset to individual coprocessors. So, a single coprocessor may be responsible for filling several distinct regions of memory addresses within the initially selected subset, rather than each coprocessor handling only one range at a time.

Claim 17

Original Legal Text

17. The device of claim 15 wherein: said plurality of ranges of memory addresses comprises a plurality of ranges of physical memory addresses; said plurality of ranges of physical memory addresses are associated with one range of virtual memory addresses; said particular value is zero.

Plain English Translation

In the device from the initial memory filling description, the multiple ranges of memory addresses consist of multiple physical memory address ranges. These physical memory ranges are associated with a single virtual memory address range. Furthermore, the particular value being written to the memory locations is zero.

Claim 18

Original Legal Text

18. The device of claim 15 wherein: said distributing said first subset of said plurality of ranges to a plurality of coprocessors comprises for each range of said first subset of said plurality of ranges, adding a command to a command queue of a respective coprocessor of said plurality of coprocessors; said command identifies said each range and contains said particular value.

Plain English Translation

In the device from the initial memory filling description, distributing the first subset of memory ranges to the coprocessors includes adding, for each range in the subset, a command to the command queue of the coprocessor that is assigned that range. Each command identifies the memory range to be filled and contains the value to fill it with. The coprocessors then process the commands from their queues to perform the memory filling operation.

Claim 19

Original Legal Text

19. The device of claim 15 wherein distributing said first subset of said plurality of ranges to a plurality of coprocessors comprises distributing at least one of said first subset of said plurality of ranges to a central processing unit (CPU).

Plain English Translation

In the device from the initial memory filling description, distributing the first subset of memory ranges to the coprocessors involves distributing at least one range of the subset to the main CPU, in addition to the coprocessors.

Claim 20

Original Legal Text

20. The device of claim 19 wherein distributing at least one of said first subset of said plurality of ranges to a CPU comprises distributing more memory addresses to said CPU than are distributed to each of said plurality of coprocessors.

Plain English Translation

In the device from the prior claim where at least one memory range is distributed to the CPU, the CPU is assigned more memory addresses to fill than each of the coprocessors.

Claim 21

Original Legal Text

21. The device of claim 15 wherein said selecting a first subset of said plurality of ranges comprises splitting at least one of said plurality of ranges into a plurality of subranges.

Plain English Translation

In the device from the initial memory filling description, selecting the first subset of memory ranges involves splitting one or more of the original memory ranges into smaller subranges.

Patent Metadata

Filing Date

Unknown

Publication Date

November 21, 2017

Inventors

Kishore Pusukuri
Robert D. Gardner

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “PERFORMANCE OF COPROCESSOR ASSISTED MEMSET() THROUGH HETEROGENEOUS COMPUTING” (9823871). https://patentable.app/patents/9823871

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/9823871. See llms.txt for full attribution policy.