Methods for Facilitating Preservation and Retrieval of Heterogeneous Content and Devices Thereof

Published: July 7, 2020

Assignee: not available in the USPTO data we have

Inventors: Eric J. Leinberg, Clive R. Daunton, Jacob A. Constantinides

Patent Claims

15 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method for facilitating preservation and retrieval of heterogeneous content, comprising: establishing, with a content management apparatus, one or more user tags on behalf of an administrator having authenticated administrator login credentials; receiving, with the content management apparatus, a storage request from one of a plurality of users different from the administrator and having user login credentials different than the administrator login credentials, wherein the storage request includes content and context information associated with the received content and the received context information comprises at least metadata, a first current date, and information for association with one or more of the user tags; identifying, with the content management apparatus, one of a plurality of types of content for the received content as a whole; extracting, with the content management apparatus, searchable information from the received content using one of a plurality of different types of extraction techniques based on the identified one of the plurality of types of content as a whole; generating, with the content management apparatus, a searchable index for the received content based on at least the extracted searchable information and the context information associated with the received content; storing, with the content management apparatus, the received content in a manner which is retrievable based on one or more associations in the generated searchable index; periodically determining, with the content management apparatus, whether a second current date is equivalent to or after a retention date for the content, wherein the retention date is determined based on the first current date and a retention period established on behalf of the one of the plurality of users or determined based on an association of the received content with a category having a default retention period established by the administrator; and providing, with the content management apparatus, a notification to one or more of the administrator or the one of the plurality of users, when the determination indicates that the second current date is equivalent to or after the retention date.

Plain English Translation

This invention relates to a content management system designed to preserve and retrieve diverse types of content efficiently. The system addresses challenges in organizing, indexing, and managing heterogeneous content while ensuring compliance with retention policies. An administrator sets up user tags and retention rules, which can be based on user-specific periods or predefined category defaults. Users submit content along with metadata, timestamps, and tag associations. The system automatically identifies the content type, applies appropriate extraction techniques to generate searchable information, and creates an index incorporating both the extracted data and contextual metadata. The content is stored in a retrievable format, allowing users and administrators to search and access it based on the generated index. The system also monitors retention periods, comparing current dates against predefined retention dates derived from submission timestamps and configured retention rules. When a retention date is reached, notifications are sent to users or administrators, ensuring timely content management and compliance with retention policies. This approach streamlines content preservation, retrieval, and lifecycle management across different content types and user roles.
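
The retention check in claim 1 can be illustrated with a short sketch. The names `StoredItem`, `CATEGORY_DEFAULT_DAYS`, `retention_date`, and `items_due` are hypothetical, as are the example periods. The choice to prefer a user-specific period over the category default is an assumption, since the claim only says the period comes from one source or the other.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical category defaults an administrator might configure (in days).
CATEGORY_DEFAULT_DAYS = {"invoices": 7 * 365, "memos": 365}


@dataclass
class StoredItem:
    item_id: str
    stored_on: date  # the "first current date" from the storage request
    category: str
    user_retention_days: Optional[int] = None  # set when a period exists for the user


def retention_date(item: StoredItem) -> date:
    """Storage date plus the user-specific period, else the category default."""
    days = item.user_retention_days
    if days is None:  # assumption: fall back to the administrator's default
        days = CATEGORY_DEFAULT_DAYS[item.category]
    return item.stored_on + timedelta(days=days)


def items_due(items: List[StoredItem], today: date) -> List[StoredItem]:
    """Items whose retention date is on or before `today` (the "second current date")."""
    return [item for item in items if today >= retention_date(item)]
```

A scheduler would call `items_due` periodically and send a notification for each item returned; the notification channel is left out of the sketch.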

Claim 2

Original Legal Text

2. The method as set forth in claim 1, further comprising: generating, with the content management apparatus, from the received content, one or more preservation objects comprising an archival editable format file, an archival viewable format file or a text file including text included in the content; signing, with the content management apparatus, each of the received content and the one or more preservation objects with a respective digital signature; and storing, with the content management apparatus, the one or more preservation objects and the digital signatures.

Plain English Translation

This invention relates to digital content preservation, specifically ensuring long-term accessibility and integrity of digital content. The method involves receiving digital content from a user and generating preservation objects to maintain its usability over time. These preservation objects include an archival editable format file, an archival viewable format file, or a text file containing the content's text. The system then digitally signs both the original content and the preservation objects to verify their authenticity and integrity. Finally, the preservation objects and their corresponding digital signatures are stored for future retrieval. This approach addresses the challenge of preserving digital content in formats that remain accessible despite technological changes, while also ensuring the content's authenticity through cryptographic verification. The method supports multiple preservation formats to accommodate different use cases, such as editing, viewing, or text extraction, and maintains a secure record of the content's original state.
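
The "sign each object separately" step can be sketched as below. This is an illustration only: it uses an HMAC-SHA256 tag from the Python standard library as a stand-in, which is a keyed integrity check and not a true public-key digital signature of the kind the claim refers to. The function names and the `preservation:` key prefix are hypothetical.

```python
import hashlib
import hmac
from typing import Dict


def sign(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for `data` (stand-in for a digital signature)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, key: bytes, signature: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(data, key), signature)


def sign_all(original: bytes, preservation_objects: Dict[str, bytes], key: bytes) -> Dict[str, str]:
    """Sign the original content and each preservation object with its own tag."""
    signatures = {"original": sign(original, key)}
    for name, payload in preservation_objects.items():
        signatures["preservation:" + name] = sign(payload, key)
    return signatures
```

Because every object carries its own tag, a change to the text file, for example, is detectable without re-checking the other objects.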

Claim 3

Original Legal Text

3. The method as set forth in claim 1, wherein: the identified one of the plurality of types of content is a bitmap, image, graphic, or portable data format (PDF) and the extracting further comprises performing an optical character recognition technique on the received content to extract text included therein; the identified one of the plurality of types of content is text and the extracting further comprises parsing the received content to extract text included therein; and the identified one of the plurality of types of content is an electronic mail and the extracting further comprises retrieving an attachment to the electronic mail and extracting searchable information from both the electronic mail and the attachment.

Plain English Translation

This invention relates to content processing systems that extract text from various types of digital content so that it can be indexed and searched. The problem addressed is the difficulty of extracting searchable text from diverse content formats, such as images, documents, and emails, each of which calls for a different technique. The method selects the extraction technique according to the identified content type. For bitmaps, images, graphics, and PDFs, the system performs optical character recognition (OCR) to extract embedded text. For text content, the system parses the content to extract the text directly. For emails, the system retrieves any attachment and extracts searchable information from both the email itself and the attachment. The extracted information then feeds the searchable index described in claim 1, so that content of every supported format can be retrieved through the same index.
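
The type-based choice of extraction technique in claim 3 can be sketched as a dispatch table. All function names and the type labels (`"image"`, `"pdf"`, `"text"`, `"email"`) are hypothetical. OCR is left as a placeholder because the Python standard library has no OCR engine, and nested or multipart attachments are skipped for brevity.

```python
from email import message_from_bytes, policy


def ocr_extract(data: bytes) -> str:
    """Placeholder: a real system would call an OCR engine here."""
    raise NotImplementedError("plug in an OCR engine")


def text_extract(data: bytes) -> str:
    """Parse text content directly."""
    return data.decode("utf-8", errors="replace")


def email_extract(data: bytes) -> str:
    """Extract text from the email itself and from each simple attachment."""
    msg = message_from_bytes(data, policy=policy.default)
    parts = [str(msg.get("Subject", ""))]
    body = msg.get_body(preferencelist=("plain",))
    if body is not None:
        parts.append(body.get_content())
    for attachment in msg.iter_attachments():
        payload = attachment.get_payload(decode=True)
        if payload is None:  # nested multipart attachment, skipped in this sketch
            continue
        mime = attachment.get_content_type()
        if mime.startswith("text/"):
            parts.append(text_extract(payload))
        elif mime.startswith("image/") or mime == "application/pdf":
            parts.append(ocr_extract(payload))
    return "\n".join(p.strip() for p in parts if p.strip())


# One extraction technique per identified content type.
EXTRACTORS = {
    "image": ocr_extract,
    "pdf": ocr_extract,
    "text": text_extract,
    "email": email_extract,
}


def extract(content_type: str, data: bytes) -> str:
    """Apply the technique that matches the type identified for the content."""
    return EXTRACTORS[content_type](data)
```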

Claim 4

Original Legal Text

4. The method as set forth in claim 1, further comprising: sending, with the content management apparatus, a content management agent to a client computing device, the content management agent comprising machine executable code which, when executed by a processor of the client computing device, causes the processor to perform steps comprising: facilitating designation by a user of a folder in local or network storage accessible by the user as a watch folder associated with an established set of information for the plurality of user tags; and generating the storage request in response to the user storing the content in the watch folder.

Plain English Translation

This invention relates to content management systems that automate the submission and tagging of digital content through a watched folder. The problem addressed is the manual effort required to submit and categorize digital files, which leads to inefficiencies in organization and retrieval. The content management apparatus sends a content management agent to a client computing device. This agent is executable code that, when run on the client, lets a user designate a folder in local or network storage as a "watch folder." The watch folder is associated with an established set of information for the user tags, which are the labels used to categorize content. When the user stores content in the watch folder, the agent generates a storage request for that content, and the apparatus then handles the request as described in claim 1, with the folder's tag information attached. This reduces the need for manual tagging and gives content stored through the same folder consistent categorization, which is useful where large volumes of files must be organized, such as enterprise document management or media libraries.
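
A minimal sketch of the client-side agent's watch-folder step follows. It assumes simple polling, which is a swapped-in technique; the claim does not say how the agent notices new files. The function name `scan_watch_folder` and the request fields are hypothetical.

```python
import os
from datetime import date
from typing import Dict, List, Set


def scan_watch_folder(folder: str, tag_info: Dict[str, str], seen: Set[str]) -> List[dict]:
    """Return one storage request per file that appeared in `folder` since the last scan."""
    with os.scandir(folder) as entries:
        files = sorted((e for e in entries if e.is_file()), key=lambda e: e.name)
    requests = []
    for entry in files:
        if entry.path in seen:  # already submitted on an earlier scan
            continue
        seen.add(entry.path)
        with open(entry.path, "rb") as handle:
            content = handle.read()
        requests.append({
            "content": content,
            "metadata": {"filename": entry.name, "size": len(content)},
            "current_date": date.today().isoformat(),
            "user_tags": dict(tag_info),  # tag information tied to this watch folder
        })
    return requests
```

The agent would call this on a timer and send each returned request to the content management apparatus; the transport is omitted here.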

Claim 5

Original Legal Text

5. The method as set forth in claim 1, further comprising generating, with the content management apparatus, information for one or more system tags, wherein the processing further comprises processing the information for the one or more system tags and the storing further comprises storing the information for the one or more system tags.

Plain English Translation

This invention relates to content management systems, specifically to storing content together with system-generated metadata. In addition to the user tags established by the administrator, the content management apparatus generates information for one or more system tags. That information is then processed and stored by the apparatus. The claim does not specify what the system tags contain; it requires only that their information be generated, processed, and stored by the apparatus. Because the tag information is produced automatically, it adds metadata for categorization, search, and retrieval without extra manual effort, which is useful in large digital libraries, media archives, or enterprise content management systems.
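
As an illustration only, system tag generation might look like the sketch below. The claim does not say what system tags contain, so the tag names (`sys:size_bytes`, `sys:sha256`, and so on), both function names, and the in-memory `store` dictionary are all assumptions.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict


def generate_system_tags(content: bytes, content_type: str, received_at: datetime) -> Dict[str, object]:
    """Derive tag information from the content itself, with no user input."""
    return {
        "sys:content_type": content_type,
        "sys:size_bytes": len(content),
        "sys:sha256": hashlib.sha256(content).hexdigest(),
        "sys:received_at": received_at.astimezone(timezone.utc).isoformat(),
    }


def store_with_tags(store: dict, item_id: str, content: bytes,
                    user_tags: Dict[str, str], system_tags: Dict[str, object]) -> None:
    """Keep user tags and system tags side by side with the stored content."""
    store[item_id] = {
        "content": content,
        "user_tags": dict(user_tags),
        "system_tags": dict(system_tags),
    }
```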

Claim 6

Original Legal Text

6. A non-transitory computer readable medium having stored thereon instructions for facilitating preservation and retrieval of heterogeneous content comprising machine executable code which when executed by a processor, causes the processor to perform steps comprising: establishing one or more user tags on behalf of an administrator having authenticated administrator login credentials; receiving a storage request from one of a plurality of users different from the administrator and having user login credentials different than the administrator login credentials, wherein the storage request includes content and context information associated with the received content and the received context information comprises at least metadata, a first current date, and information for association with one or more of the user tags; identifying one of a plurality of types of content for the received content as a whole; extracting searchable information from the received content using one of a plurality of different types of extraction techniques based on the identified one of the plurality of types of content as a whole; generating a searchable index for the received content based on at least the extracted searchable information and the context information associated with the received content; storing the received content in a manner which is retrievable based on one or more associations in the generated searchable index; periodically determining whether a second current date is equivalent to or after a retention date for the content, wherein the retention date is determined based on the first current date and a retention period established on behalf of the one of the plurality of users or determined based on an association of the received content with a category having a default retention period established by the administrator; and providing a notification to one or more of the administrator or the one of the plurality of users, when the determination indicates that the second current date is equivalent to or after the retention date.

Plain English Translation

This invention relates to a non-transitory computer readable medium storing instructions for preserving and retrieving heterogeneous digital content with automated retention monitoring. The instructions address challenges in organizing, indexing, and managing diverse content types (e.g., text, images, PDFs, emails) while supporting compliance with retention policies. An administrator sets up user tags, and a retention period comes either from a period established for the submitting user or from the default period of the category the content belongs to. Users submit content along with metadata, the current date, and tag associations. The processor identifies the content type and applies the matching extraction technique, then generates a searchable index from the extracted information and the context information, enabling retrieval based on metadata, tags, or extracted text. The processor also checks periodically whether the current date has reached the retention date, which is calculated from the submission date and the retention period. When the retention date is reached, a notification is sent to the administrator or the user.
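
The searchable index built from extracted information plus context information can be sketched as a small inverted index. This is one possible data structure, not necessarily the one the patent has in mind; the class name `SearchIndex` and the tokenization rule are assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, Set

TOKEN = re.compile(r"[a-z0-9]+")


class SearchIndex:
    """Maps each term to the set of item ids whose text or context contains it."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)

    def add(self, item_id: str, extracted_text: str, context: Dict[str, object]) -> None:
        """Index both the extracted text and the values of the context information."""
        terms = set(TOKEN.findall(extracted_text.lower()))
        for value in context.values():
            terms.update(TOKEN.findall(str(value).lower()))
        for term in terms:
            self._postings[term].add(item_id)

    def search(self, query: str) -> Set[str]:
        """Return the ids of items that contain every term in the query."""
        terms = TOKEN.findall(query.lower())
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

Indexing the context values alongside the extracted text is what lets a query on a tag or date find content whose body never mentions that term.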

Claim 7

Original Legal Text

7. The medium as set forth in claim 6, further having stored thereon instructions comprising machine executable code which when executed by the processor causes the processor to perform steps further comprising: generating from the received content, one or more preservation objects comprising an archival editable format file, an archival viewable format file or a text file including text included in the content; signing each of the received content and the one or more preservation objects with a respective digital signature; and storing the one or more preservation objects and the digital signatures.

Plain English Translation

This invention relates to digital content preservation, specifically to creating and managing archival versions of digital content. The problem addressed is ensuring the long-term integrity, authenticity, and accessibility of digital content by generating preservation objects and securing them with digital signatures. The instructions cause the processor to generate one or more preservation objects from the received content. A preservation object can be an archival editable format file (a format suitable for future editing), an archival viewable format file (a format suitable for display or rendering), or a text file containing the text included in the content. The processor signs the received content and each preservation object with its own respective digital signature, so that later modification of any of them can be detected. The preservation objects and the digital signatures are then stored. This addresses digital preservation challenges such as format obsolescence and undetected alteration, making the approach suitable for records that require long-term trust, such as legal, historical, or scientific records.

Claim 8

Original Legal Text

8. The medium as set forth in claim 6, wherein: the identified one of the plurality of types of content is a bitmap, image, graphic, or portable data format (PDF) and the extracting further comprises performing an optical character recognition technique on the received content to extract text included therein; the identified one of the plurality of types of content is text and the extracting further comprises parsing the received content to extract text included therein; and the identified one of the plurality of types of content is an electronic mail and the extracting further comprises retrieving an attachment to the electronic mail and extracting searchable information from both the electronic mail and the attachment.

Plain English Translation

The invention relates to a computer-readable medium storing instructions for processing different types of digital content to extract searchable information. The system identifies the type of content received, which may include bitmaps, images, graphics, PDFs, plain text, or electronic mail (email). For bitmap, image, graphic, or PDF content, the system performs optical character recognition (OCR) to extract text from the visual elements. For text-based content, the system parses the content to directly extract the text. For email content, the system retrieves any attached files and extracts searchable information from both the email body and the attachments. This approach ensures that various forms of digital content are processed to make their information accessible for search and retrieval. The system enhances the ability to index and search diverse content types by converting them into a standardized, searchable format.

Claim 9

Original Legal Text

9. The medium as set forth in claim 6, further having stored thereon instructions comprising machine executable code which when executed by the processor causes the processor to perform steps further comprising: sending a content management agent to a client computing device, the content management agent comprising machine executable code which, when executed by a processor of the client computing device, causes the processor to perform steps comprising: facilitating designation by a user of a folder in local or network storage accessible by the user as a watch folder associated with an established set of information for the plurality of user tags; and generating the storage request in response to the user storing the content in the watch folder.

Plain English Translation

This invention relates to content management systems that use watched folders to automate content submission and tagging. The problem addressed is the inefficiency of manually categorizing and storing digital content, particularly where users generate or receive large volumes of files. The instructions cause the processor to send a content management agent to a client computing device. The agent lets a user designate a folder in local or network storage as a "watch folder" associated with an established set of information for the user tags. When the user stores content in the watch folder, the agent generates the storage request for that content. The user therefore does not need to submit or tag each file by hand, and content stored through the same folder carries the same tag information. This is particularly useful in enterprise settings where content must be managed systematically for compliance, searchability, or collaboration.

Claim 10

Original Legal Text

10. The medium as set forth in claim 6, further having stored thereon instructions comprising machine executable code which when executed by the processor causes the processor to perform steps further comprising generating information for one or more system tags, wherein the processing further comprises processing the information for the one or more system tags and the storing further comprises storing the information for the one or more system tags.

Plain English Translation

This invention relates to a computer readable medium storing instructions for handling system tags alongside stored content. System tags are metadata generated by the system itself, as distinct from the user tags established by the administrator. The stored instructions cause the processor to generate information for one or more system tags, to process that information, and to store it. The claim does not specify the contents of the system tags or any operations beyond generating, processing, and storing them. The stored tag information gives the system additional metadata for organizing and searching content.

Claim 11

Original Legal Text

11. A content management apparatus, comprising: a processor coupled to a memory and configured to be capable of executing programmed instructions comprising and stored in the memory to: establish one or more user tags on behalf of an administrator having authenticated administrator login credentials; receive a storage request from one of a plurality of users different from the administrator and having user login credentials different than the administrator login credentials, wherein the storage request includes content and context information associated with the received content and the received context information comprises at least metadata, a first current date, and information for association with one or more of the user tags; identify one of a plurality of types of content for the received content as a whole; extract searchable information from the received content using one of a plurality of different types of extraction techniques based on the identified one of the plurality of types of content as a whole; generate a searchable index for the received content based on at least the extracted searchable information and the context information associated with the received content; store the received content in a manner which is retrievable based on one or more associations in the generated searchable index; periodically determine whether a second current date is equivalent to or after a retention date for the content, wherein the retention date is determined based on the first current date and a retention period established on behalf of the one of the plurality of users or determined based on an association of the received content with a category having a default retention period established by the administrator; and provide a notification to one or more of the administrator or the one of the plurality of users, when the determination indicates that the second current date is equivalent to or after the retention date.

Plain English Translation

A content management system automates the organization, indexing, and lifecycle management of digital content. The system addresses challenges in efficiently categorizing, retrieving, and retaining content while supporting compliance with retention policies. An administrator sets up user tags and default retention periods for categories, while users submit content along with metadata, the current date, and tag associations. The system identifies the content type (e.g., text, images, PDFs, emails) and applies the matching extraction technique to obtain searchable information. This information, combined with the user-provided context, forms an index that enables retrieval. The system also tracks retention by periodically comparing the current date against a retention date computed from the storage date and a retention period, which is either user-specific or category-based. When the retention date is reached, the system notifies the administrator or the user. The solution streamlines content governance by automating indexing and retention monitoring while remaining flexible across content types and retention requirements.
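
The step of identifying a type for the content as a whole can be sketched with file signatures. The claims do not say how the type is determined, so the approach below is an assumption: it checks well-known leading bytes for PDF, PNG, JPEG, and GIF, treats a `.eml` filename as email (a heuristic), and falls back to a UTF-8 decoding test for text. The function name and type labels are hypothetical.

```python
# Leading byte sequences ("magic numbers") of some common image formats.
IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF
    b"GIF89a",             # GIF
)


def identify_content_type(data: bytes, filename: str = "") -> str:
    """Classify the content as a whole as 'pdf', 'image', 'email', 'text', or 'unknown'."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(IMAGE_SIGNATURES):
        return "image"
    if filename.lower().endswith(".eml"):  # heuristic: rely on the file extension
        return "email"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "text"
```

The returned label is what a later step would use to choose the extraction technique for that item.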

Claim 12

Original Legal Text

12. The apparatus as set forth in claim 11, wherein the processor is further configured to execute programmed instructions comprising and stored in the memory to: generate from the received content, one or more preservation objects comprising an archival editable format file, an archival viewable format file or a text file including text included in the content; sign each of the received content and the one or more preservation objects with a respective digital signature; and store the one or more preservation objects and the digital signatures.

Plain English Translation

This invention relates to digital content preservation systems, specifically addressing the need to securely archive and verify the integrity of digital content over time. The apparatus processes received content to generate one or more preservation objects, each of which is an archival editable format file, an archival viewable format file, or a text file containing the content's text. These preservation objects keep the content accessible and usable in different forms, such as editing, viewing, or text search. The apparatus signs the original content and each preservation object with a respective digital signature, providing cryptographic evidence of authenticity and integrity, and stores the preservation objects together with the signatures so that it can later be verified that the content has not been altered. This approach is particularly useful for legal, regulatory, or historical archiving where content integrity and authenticity are critical.

Claim 13

Original Legal Text

13. The apparatus as set forth in claim 11, wherein: the identified one of the plurality of types of content is a bitmap, image, graphic, or portable data format (PDF) and the extracting further comprises performing an optical character recognition technique on the received content to extract text included therein; the identified one of the plurality of types of content is text and the extracting further comprises parsing the received content to extract text included therein; and the identified one of the plurality of types of content is an electronic mail and the extracting further comprises retrieving an attachment to the electronic mail and extracting searchable information from both the electronic mail and the attachment.

Plain English Translation

This invention relates to content processing systems that extract and analyze text from various digital content types. The problem addressed is the difficulty in uniformly extracting searchable text from diverse content formats, such as images, documents, and emails, which often require different processing techniques. The apparatus identifies the type of content received, such as a bitmap, image, graphic, PDF, plain text, or email, and applies the appropriate extraction method. For bitmap, image, graphic, or PDF content, optical character recognition (OCR) is performed to convert visual elements into searchable text. For text-based content, parsing techniques are used to directly extract the text. For emails, the system retrieves attachments and extracts information from both the email body and the attached files. This ensures comprehensive text extraction across multiple formats, enabling efficient indexing and search capabilities. The solution enhances data accessibility by standardizing text extraction from heterogeneous digital content.

Claim 14

Original Legal Text

14. The apparatus as set forth in claim 11, wherein the processor is further configured to execute programmed instructions comprising and stored in the memory to: send a content management agent to a client computing device, the content management agent comprising machine executable code which, when executed by a processor of the client computing device, causes the processor to perform steps comprising: facilitate designation by a user of a folder in local or network storage accessible by the user as a watch folder associated with an established set of information for the plurality of user tags; and generate the storage request in response to the user storing the content in the watch folder.

Plain English Translation

This invention relates to content management systems that use watched folders to automate content submission and tagging. The problem addressed is the inefficiency of manually categorizing and storing digital content, which can be time-consuming and error-prone. The apparatus includes a processor and memory storing executable instructions. It sends a content management agent, which is machine-executable code, to a client computing device. The agent allows a user to designate a folder in local or network storage as a "watch folder" associated with an established set of information for the user tags. When the user stores content in this watch folder, the agent automatically generates a storage request. The apparatus then processes the request, storing and indexing the content with the tag information associated with the folder. This automation reduces manual effort and gives consistent categorization, which is particularly useful where large volumes of content must be organized, such as enterprise document management or media libraries.

Claim 15

Original Legal Text

15. The apparatus as set forth in claim 11, wherein the processor is further configured to execute programmed instructions comprising and stored in the memory to generate information for one or more system tags, wherein the processing further comprises processing the information for the one or more system tags and the storing further comprises storing the information for the one or more system tags.

Plain English Translation

This invention relates to a content management apparatus that handles system tags alongside stored content. The apparatus includes a processor and memory, and the processor executes programmed instructions to generate, process, and store information for one or more system tags. System tags are metadata generated by the apparatus itself, as distinct from the user tags established by the administrator. The claim does not specify the contents of the system tags or any components beyond the processor and memory. Generating the tag information automatically avoids manual tagging, which is impractical at large scale, and gives the apparatus additional metadata for organizing and retrieving content.

Patent Metadata

Filing Date

Unknown

Publication Date

July 7, 2020

Inventors

Eric J. Leinberg

Clive R. Daunton

Jacob A. Constantinides
