US-9594783

Index selection for XML database systems

Published: March 14, 2017
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

A method, computer-implemented system, and computer program product for creating indexes over XML data managed by a database system are provided. The method, computer-implemented system, and computer program product provide for receiving a workload for the XML data, the workload including one or more database statements, utilizing an optimizer of the database system to enumerate a set of one or more path expressions by creating a virtual universal index based on the workload received and matching a path expression to the virtual universal index, and recommending one or more path expressions from the set of one or more candidate path expressions to create the indexes over the XML data.

Patent Claims
20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method for creating indexes over XML data managed by a database system, the method comprising: receiving a workload for the XML data, the workload including one or more database statements; utilizing an optimizer of the database system to enumerate a set of one or more candidate path expressions by creating virtual universal indexes for an element, an attribute, and a data type in the XML data and matching a path expression to one of the virtual universal indexes; and recommending one or more path expressions from the set of one or more candidate path expressions to create the indexes over the XML data based on a system constraint, an estimated benefit associated with each candidate path expression, and an estimated size of an index to be created using each candidate path expression.

Plain English Translation

A method for optimizing XML database indexing involves analyzing database query workloads to suggest effective indexes. The process includes receiving a workload of database statements for XML data, then using the database system's optimizer to identify potential "path expressions" (access paths within the XML data) suitable for indexing. This is achieved by creating temporary, virtual indexes for elements, attributes, and data types, and matching query path expressions to these virtual indexes. Finally, the method recommends path expressions for actual index creation, based on system constraints like available disk space, the estimated performance benefit each index would provide, and the estimated size of each index.

Claim 2

Original Legal Text

2. The method of claim 1 , further comprising: generalizing the set of one or more candidate path expressions to generate additional candidate path expressions; utilizing the optimizer to estimate the benefit associated with each candidate path expression in relation to the workload received; and utilizing the optimizer to estimate a size of an index to be created using each candidate path expression.

Plain English Translation

The method widens the candidate set before choosing indexes. It generalizes the initial candidate path expressions to produce additional candidates, then uses the optimizer to estimate, for every candidate (original and generalized), the benefit relative to the received workload and the size of the index that would be built from it. The benefit estimate reflects how much cheaper the workload becomes with the index; the size estimate predicts the index's disk footprint.

Claim 3

Original Legal Text

3. The method of claim 1 , wherein utilizing the optimizer to enumerate the set of one or more candidate path expressions comprises: sending each of the one or more database statements of the workload to the optimizer for the optimizer to: create one or more virtual universal indexes based upon one or more path expressions over the XML data, a virtual universal index being created for each data type, element, and attribute in the database statement being processed, match one or more path expressions in the database statement being processed to the one or more virtual universal indexes, and enumerate each of the one or more path expressions matched as the set of one or more candidate path expressions.

Plain English Translation

The method uses the database optimizer to find indexing candidates by submitting each database statement from the workload to the optimizer. For each statement, the optimizer creates virtual "universal indexes" covering every data type, element, and attribute in the statement being processed. The optimizer then tries to match the statement's path expressions against these virtual indexes. Every path expression that matches a virtual index is added to the set of indexing candidates. This identifies the path expressions in the workload that an index could serve.
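
The enumeration step can be pictured with a small sketch. It is illustrative only: the claims do not say how the optimizer represents virtual universal indexes or performs matching, so the `//*` and `//@*` patterns, the `Predicate` and `VirtualIndex` types, and the matching rule below are all assumptions made for this example.

```python
from typing import NamedTuple


class Predicate(NamedTuple):
    """A path expression used in a statement, with the data type it is compared as."""
    path: str       # e.g. "/order/item/price"
    data_type: str  # e.g. "DOUBLE"


class VirtualIndex(NamedTuple):
    """Hypothetical virtual universal index: one pattern per node kind and data type."""
    pattern: str    # "//*" covers all elements, "//@*" covers all attributes
    data_type: str


def universal_indexes(data_types):
    # Assumption: one element index and one attribute index per data type.
    return [VirtualIndex(p, t) for t in data_types for p in ("//*", "//@*")]


def matches(pred, index):
    # Assumed matching rule: node kind and data type must both agree.
    last_step = pred.path.rsplit("/", 1)[-1]
    is_attribute = last_step.startswith("@")
    return is_attribute == (index.pattern == "//@*") and pred.data_type == index.data_type


def enumerate_candidates(workload, data_types=("VARCHAR", "DOUBLE")):
    """Collect every distinct path expression that matches some virtual index."""
    indexes = universal_indexes(data_types)
    candidates = []
    for statement in workload:      # each statement is a list of Predicate
        for pred in statement:
            if pred not in candidates and any(matches(pred, ix) for ix in indexes):
                candidates.append(pred)
    return candidates
```

In this sketch a path expression compared as a type with no virtual index (for example `DATE` under the default `data_types`) is simply not enumerated.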

Claim 4

Original Legal Text

4. The method of claim 2 , wherein generalizing the set of one or more candidate path expressions comprises: selecting a first candidate path expression and a second candidate path expression; determining whether the first candidate path expression and the second candidate path expression have one or more common sub-expressions; responsive to the first candidate path expression and the second candidate path expression having one or more common sub-expressions, returning each of the one or more common sub-expressions as an additional candidate path expression; and responsive to the first candidate path expression and the second candidate path expression not having one or more common sub-expressions, modifying each of the first and second candidate path expressions by replacing a last non-* navigating step in each of the first and second candidate path expressions with a * navigation step, and returning the modified first and second candidate path expressions as the additional candidate path expressions.

Plain English Translation

The method expands its candidate set by generalizing existing expressions. It selects two candidate path expressions and checks whether they share common sub-expressions. If they do, it returns each common sub-expression as a new candidate. If they do not, it modifies each of the two expressions by replacing its last non-wildcard step with a wildcard (`*`) navigation step and returns the two modified expressions as new candidates. This generalization aims to cover a broader range of similar queries with fewer indexes.
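
A minimal sketch of this pairwise generalization follows. The claim does not define "common sub-expression", so the sketch assumes it means the longest shared leading sequence of steps; it also assumes simple child-axis paths such as `/order/item/price`. All function names are hypothetical.

```python
def steps(path):
    """Split '/a/b/c' into ['a', 'b', 'c'] (child-axis paths only)."""
    return [s for s in path.split("/") if s]


def common_prefix(a, b):
    """Assumed notion of a common sub-expression: the shared leading steps."""
    shared = []
    for x, y in zip(steps(a), steps(b)):
        if x != y:
            break
        shared.append(x)
    return "/" + "/".join(shared) if shared else None


def wildcard_last_step(path):
    """Replace the last step that is not already '*' with '*'."""
    parts = steps(path)
    for i in range(len(parts) - 1, -1, -1):
        if parts[i] != "*":
            parts[i] = "*"
            break
    return "/" + "/".join(parts)


def generalize_pair(a, b):
    """Return the additional candidates produced from two candidates."""
    shared = common_prefix(a, b)
    if shared is not None:
        return [shared]
    return [wildcard_last_step(a), wildcard_last_step(b)]
```

For example, `/order/item/price` and `/order/item/qty` would yield `/order/item` under this definition, while `/order/price` and `/invoice/total` share nothing and would yield `/order/*` and `/invoice/*`.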

Claim 5

Original Legal Text

5. The method of claim 1 , wherein utilizing the optimizer to estimate the benefit associated with each candidate path expression comprises: sending each candidate path expression to the optimizer for the optimizer to: estimate a cardinality and a size of an index created over the XML data using the candidate path expression, calculate a cost of executing the one or more database statements in the workload when the index is available (a first cost), calculate a cost of executing the one or more database statements in the workload when the index is not available (a second cost), calculate a cost of maintaining the index with respect to an update, a deletion, or an insertion (a third cost), and subtract from the second cost a sum of the first cost and the third cost to obtain the estimated benefit associated with the candidate path expression.

Plain English Translation

To accurately estimate the benefit of creating an index from a path expression, the method sends each candidate path expression to the database optimizer. The optimizer estimates the cardinality and size of the index. It calculates the query execution cost with the hypothetical index enabled (first cost) and disabled (second cost). It also calculates the index maintenance cost considering updates, deletions, and insertions (third cost). The estimated benefit is then calculated by subtracting the sum of the first cost (execution with index) and the third cost (maintenance) from the second cost (execution without index).
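
The arithmetic in this claim is simple enough to state as a one-line function. The sketch below assumes the three costs have already been obtained from the optimizer and are expressed in the same unit; the function and parameter names are illustrative, not taken from the patent.

```python
def estimated_benefit(cost_with_index, cost_without_index, maintenance_cost):
    """Benefit = second cost - (first cost + third cost).

    cost_with_index    -- workload cost when the index is available (first cost)
    cost_without_index -- workload cost when the index is not available (second cost)
    maintenance_cost   -- cost of keeping the index current on updates,
                          deletions, and insertions (third cost)
    """
    return cost_without_index - (cost_with_index + maintenance_cost)
```

A negative result means the index costs more to maintain than it saves at query time, so an update-heavy workload can make an otherwise useful index unattractive.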

Claim 6

Original Legal Text

6. The method of claim 1 , wherein recommending one or more path expressions from the set of one or more candidate path expressions comprises: sorting the set of one or more candidate path expressions according to a benefit-to-size ratio from highest to lowest; and starting from a highest benefit-to-size ratio candidate path expression, adding a candidate path expression to the set of one or more candidate path expressions in sort order, unless an index to be created using the candidate path expression will not fit into an available disk budget, and until one or more indexes to be created using the set of one or more candidate path expressions will exhaust the available disk budget.

Plain English Translation

The method recommends indexes greedily by benefit-to-size ratio. It sorts candidate path expressions by this ratio from highest to lowest and walks the list in order, adding each candidate to the recommendation unless its index would not fit into the available disk budget, in which case that candidate is skipped. Selection stops once the chosen indexes exhaust the budget. This favors high-benefit, low-storage indexes within the storage limit.
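
The greedy selection can be sketched as below, assuming each candidate already carries an estimated benefit and a strictly positive estimated size. The `Candidate` type and `greedy_select` name are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    path: str
    benefit: float  # estimated benefit from the optimizer
    size: float     # estimated index size; assumed > 0


def greedy_select(candidates, disk_budget):
    """Pick candidates in descending benefit-to-size order within the budget."""
    ranked = sorted(candidates, key=lambda c: c.benefit / c.size, reverse=True)
    chosen, used = [], 0.0
    for cand in ranked:
        if used >= disk_budget:
            break                      # budget exhausted
        if used + cand.size > disk_budget:
            continue                   # this index does not fit; try the next one
        chosen.append(cand)
        used += cand.size
    return chosen
```

Note that a candidate that does not fit is skipped rather than ending the loop, so a smaller, lower-ratio candidate further down the list can still be selected.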

Claim 7

Original Legal Text

7. The method of claim 1 , wherein recommending one or more path expressions from the set of one or more candidate path expressions comprises: constructing a directed acyclic graph (DAG) using the set of one or more candidate path expressions, each node of the DAG corresponding to one candidate path expression; selecting one or more nodes of the DAG, wherein candidate path expressions corresponding to the one or more nodes selected are included in the one or more path expressions; iteratively replacing at least one of the one or more nodes selected with one or more child nodes until one or more indexes to be created using the one or more path expressions corresponding to the one or more child nodes will fit within an available disk budget; and replacing each candidate path expression included in the set of one or more candidate path expressions corresponding to the at least one of the one or more nodes with the one or more path expressions corresponding to the one or more child nodes replacing the at least one of the one or more nodes.

Plain English Translation

The method recommends indexes using a directed acyclic graph (DAG) in which each node corresponds to one candidate path expression. It selects an initial set of nodes, whose path expressions form the tentative recommendation, and then iteratively replaces selected nodes with their child nodes (more specific path expressions) until the indexes built from the resulting path expressions fit within the available disk budget. The recommendation is updated to match: each replaced node's path expression is swapped for those of its children.
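
One way to picture the top-down replacement is the sketch below. The claim does not say which selected node is replaced in each iteration or how the initial nodes are chosen, so this example assumes the selection starts from the DAG roots and always replaces the node with the largest estimated index. The dictionaries and the `dag_select` name are hypothetical.

```python
def dag_select(sizes, children, roots, disk_budget):
    """Replace selected nodes with their children until the selection fits.

    sizes    -- estimated index size for each path expression (node)
    children -- adjacency map: node -> list of child nodes
    roots    -- initial selection (assumed here to be the DAG roots)
    """
    selected = list(roots)
    while selected and sum(sizes[n] for n in selected) > disk_budget:
        # Assumed policy: replace the node whose index is largest.
        victim = max(selected, key=lambda n: sizes[n])
        selected.remove(victim)
        for child in children.get(victim, []):
            if child not in selected:   # a child may have several parents
                selected.append(child)
    return selected
```

A leaf node has no children, so replacing it simply drops it from the selection; this guarantees the loop ends even when nothing fits.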

Claim 8

Original Legal Text

8. A computer-implemented system for creating indexes over XML data managed by a database system, the system comprising: a processor; a database storing the XML data; an optimizer in communication with the database, the optimizer optimizing database statements seeking access to the XML data stored in the database; and an index advisor in communication with the optimizer, wherein the index advisor: receives a workload for the XML data, the workload including one or more database statements; utilizes an optimizer of the database system to enumerate a set of one or more candidate path expressions by creating virtual universal indexes for an element, an attribute, and a data type in the XML data and matching a path expression to one of the virtual universal indexes; and recommends one or more path expressions from the set of one or more candidate path expressions to create the indexes over the XML data based on a system constraint, an estimated benefit associated with each candidate path expression, and an estimated size of an index to be created using each candidate path expression.

Plain English Translation

A computer system automates XML database indexing. It contains a processor, a database storing XML data, a query optimizer, and an "index advisor." The index advisor receives a workload of database statements and uses the optimizer to enumerate potential path expressions for indexing. It creates virtual indexes for XML elements, attributes, and data types, and matches query path expressions against these. The advisor recommends path expressions for index creation, considering system constraints (like disk space), the estimated benefit of each potential index, and its estimated size.

Claim 9

Original Legal Text

9. The computer-implemented system of claim 8 , further wherein the index advisor: generalizes the set of one or more candidate path expressions to generate additional candidate path expressions; utilizes the optimizer to estimate the benefit associated with each candidate path expression in relation to the workload received; and utilizes the optimizer to estimate a size of an index to be created using each candidate path expression.

Plain English Translation

The system's index advisor generalizes the candidate path expressions to generate additional candidates. It then uses the query optimizer to estimate, for each candidate (original and generalized), the benefit relative to the received workload and the size of the index that would be created.

Claim 10

Original Legal Text

10. The computer-implemented system of claim 8 , wherein the index advisor utilizes the optimizer of the database system to enumerate the set of one or more candidate path expressions by: sending each of the one or more database statements of the workload to the optimizer for the optimizer to: create one or more virtual universal indexes based upon one or more path expressions over the XML data, a virtual universal index being created for each data type, element, and attribute in the database statement being processed, match one or more path expressions in the database statement being processed to the one or more virtual universal indexes, and enumerate each of the one or more path expressions matched as the set of one or more candidate path expressions.

Plain English Translation

The index advisor submits each database statement in the workload to the optimizer to find indexing candidates. For each statement, the optimizer creates virtual universal indexes covering every data type, element, and attribute in that statement, then tries to match the statement's path expressions against them. Every path expression that matches a virtual index becomes an indexing candidate, which identifies the access paths in the workload that an index could serve.

Claim 11

Original Legal Text

11. The computer-implemented system of claim 9 , wherein the index advisor generalizes the set of one or more candidate path expressions by: selecting a first candidate path expression and a second candidate path expression; determining whether the first candidate path expression and the second candidate path expression have one or more common sub-expressions; responsive to the first candidate path expression and the second candidate path expression having one or more common sub-expressions, returning each of the one or more common sub-expressions as an additional candidate path expression; and responsive to the first candidate path expression and the second candidate path expression not having one or more common sub-expressions, modifying each of the first and second candidate path expressions by replacing a last non-* navigating step in each of the first and second candidate path expressions with a * navigation step, and returning the modified first and second candidate path expressions as the additional candidate path expressions.

Plain English Translation

The index advisor expands its candidate set by generalizing existing expressions. It selects two candidate path expressions and checks whether they share common sub-expressions. If they do, it returns each common sub-expression as a new candidate. If they do not, it modifies each of the two expressions by replacing its last non-wildcard step with a wildcard (`*`) navigation step and returns the two modified expressions as new candidates. This generalization aims to cover a broader range of similar queries with fewer indexes.

Claim 12

Original Legal Text

12. The computer-implemented system of claim 8 , wherein the index advisor utilizes the optimizer to estimate the benefit associated with each candidate path expression by: sending each candidate path expression to the optimizer for the optimizer to: estimate a cardinality and a size of an index created over the XML data using the candidate path expression, calculate a cost of executing the one or more database statements in the workload when the index is available (a first cost), calculate a cost of executing the one or more database statements in the workload when the index is not available (a second cost), calculate a cost of maintaining the index with respect to an update, a deletion, or an insertion (a third cost), and subtract from the second cost a sum of the first cost and the third cost to obtain the estimated benefit associated with the candidate path expression.

Plain English Translation

To accurately estimate the benefit of creating an index from a path expression, the index advisor sends each candidate path expression to the database optimizer. The optimizer estimates the cardinality and size of the index. It calculates the query execution cost with the hypothetical index enabled (first cost) and disabled (second cost). It also calculates the index maintenance cost considering updates, deletions, and insertions (third cost). The estimated benefit is then calculated by subtracting the sum of the first cost (execution with index) and the third cost (maintenance) from the second cost (execution without index).

Claim 13

Original Legal Text

13. The computer-implemented system of claim 8 , wherein the index advisor recommends one or more path expressions from the set of one or more candidate path expressions by: sorting the set of one or more candidate path expressions according to a benefit-to-size ratio from highest to lowest; and starting from a highest benefit-to-size ratio candidate path expression, adding a candidate path expression to the set of one or more candidate path expressions in sort order, unless an index to be created using the candidate path expression will not fit into an available disk budget, and until one or more indexes to be created using the set of one or more candidate path expressions will exhaust the available disk budget.

Plain English Translation

The index advisor recommends indexes greedily by benefit-to-size ratio. It sorts candidate path expressions by this ratio from highest to lowest and walks the list in order, adding each candidate to the recommendation unless its index would not fit into the available disk budget, in which case that candidate is skipped. Selection stops once the chosen indexes exhaust the budget. This favors high-benefit, low-storage indexes within the storage limit.

Claim 14

Original Legal Text

14. The computer-implemented system of claim 8 , wherein the index advisor recommends one or more path expressions from the set of one or more candidate path expressions by: constructing a directed acyclic graph (DAG) using the set of one or more candidate path expressions, each node of the DAG corresponding to one candidate path expression; selecting one or more nodes of the DAG, wherein candidate path expressions corresponding to the one or more nodes selected are included in the one or more path expressions; iteratively replacing at least one of the one or more nodes selected with one or more child nodes until one or more indexes to be created using the one or more path expressions corresponding to the one or more child nodes will fit within an available disk budget; and replacing each candidate path expression included in the set of one or more candidate path expressions corresponding to the at least one of the one or more nodes with the one or more path expressions corresponding to the one or more child nodes replacing the at least one of the one or more nodes.

Plain English Translation

The index advisor recommends indexes using a directed acyclic graph (DAG) in which each node corresponds to one candidate path expression. It selects an initial set of nodes, whose path expressions form the tentative recommendation, and then iteratively replaces selected nodes with their child nodes (more specific path expressions) until the indexes built from the resulting path expressions fit within the available disk budget. The recommendation is updated to match: each replaced node's path expression is swapped for those of its children.

Claim 15

Original Legal Text

15. A computer program product comprising a non-transitory computer-readable medium, the computer-readable medium being encoded with a computer program for creating indexes over XML data managed by a database system, wherein the computer program, when executed on a computer, causes the computer to: receive a workload for the XML data, the workload including one or more database statements; utilize an optimizer of the database system to enumerate a set of one or more candidate path expressions by creating virtual universal indexes for an element, an attribute, and a data type in the XML data and matching a path expression to one of the virtual universal indexes; and recommend one or more path expressions from the set of one or more candidate path expressions to create the indexes over the XML data based on a system constraint, an estimated benefit associated with each candidate path expression, and an estimated size of an index to be created using each candidate path expression.

Plain English Translation

A computer program, stored on a non-transitory medium, automates XML database indexing. When executed, the program receives a workload of database statements, uses the database system's optimizer to identify potential path expressions for indexing by creating virtual indexes for XML elements, attributes, and data types, and matching query path expressions against these virtual indexes. Finally, the program recommends path expressions for actual index creation, considering system constraints like available disk space, the estimated performance benefit each index provides, and the estimated size of each index.

Claim 16

Original Legal Text

16. The computer program product of claim 15 , further wherein the computer program, when executed on the computer, causes the computer to: generalize the set of one or more candidate path expressions to generate additional candidate path expressions; utilize the optimizer to estimate the benefit associated with each candidate path expression in relation to the workload received; and utilize the optimizer to estimate a size of an index to be created using each candidate path expression.

Plain English Translation

The computer program extends its recommendations by generalizing candidate path expressions to generate additional candidates. It uses the database optimizer to estimate the benefit and the index size for each candidate (original and generalized) relative to the received workload, so that selection can weigh every candidate on the same terms.

Claim 17

Original Legal Text

17. The computer program product of claim 16 , wherein generalization of the set of one or more candidate path expressions comprises: selecting a first candidate path expression and a second candidate path expression; determining whether the first candidate path expression and the second candidate path expression have one or more common sub-expressions; responsive to the first candidate path expression and the second candidate path expression having one or more common sub-expressions, returning each of the one or more common sub-expressions as an additional candidate path expression; and responsive to the first candidate path expression and the second candidate path expression not having one or more common sub-expressions, modifying each of the first and second candidate path expressions by replacing a last non-* navigating step in each of the first and second candidate path expressions with a * navigation step, and returning the modified first and second candidate path expressions as the additional candidate path expressions.

Plain English Translation

The computer program expands its candidate set by generalizing existing expressions. It selects two candidate path expressions and checks whether they share common sub-expressions. If they do, it returns each common sub-expression as a new candidate. If they do not, it modifies each of the two expressions by replacing its last non-wildcard step with a wildcard (`*`) navigation step and returns the two modified expressions as new candidates. This generalization aims to cover a broader range of similar queries with fewer indexes.

Claim 18

Original Legal Text

18. The computer program product of claim 15 , wherein utilization of the optimizer to estimate the benefit associated with each candidate path expression comprises: sending each candidate path expression to the optimizer for the optimizer to: estimate a cardinality and a size of an index created over the XML data using the candidate path expression, calculate a cost of executing the one or more database statements in the workload when the index is available (a first cost), calculate a cost of executing the one or more database statements in the workload when the index is not available (a second cost), calculate a cost of maintaining the index with respect to an update, a deletion, or an insertion (a third cost), and subtract from the second cost a sum of the first cost and the third cost to obtain the estimated benefit associated with the candidate path expression.

Plain English Translation

To accurately estimate the benefit of creating an index from a path expression, the computer program sends each candidate path expression to the database optimizer. The optimizer estimates the cardinality and size of the index. It calculates the query execution cost with the hypothetical index enabled (first cost) and disabled (second cost). It also calculates the index maintenance cost considering updates, deletions, and insertions (third cost). The estimated benefit is then calculated by subtracting the sum of the first cost (execution with index) and the third cost (maintenance) from the second cost (execution without index).

Claim 19

Original Legal Text

19. The computer program product of claim 15 wherein to recommend one or more path expressions from the set of one or more candidate path expressions comprises: sorting the set of one or more candidate path expressions according to a benefit-to-size ratio from highest to lowest; and starting from a highest benefit-to-size ratio candidate path expression, adding a candidate path expression to the set of one or more candidate path expressions in sort order, unless an index to be created using the candidate path expression will not fit into an available disk budget, and until one or more indexes to be created using the set of one or more candidate path expressions will exhaust the available disk budget.

Plain English Translation

The computer program recommends indexes greedily by benefit-to-size ratio. It sorts candidate path expressions by this ratio from highest to lowest and walks the list in order, adding each candidate to the recommendation unless its index would not fit into the available disk budget, in which case that candidate is skipped. Selection stops once the chosen indexes exhaust the budget. This favors high-benefit, low-storage indexes within the storage limit.

Claim 20

Original Legal Text

20. The computer program product of claim 15 , wherein to recommend one or more path expressions from the set of one or more candidate path expressions comprises: constructing a directed acyclic graph (DAG) using the set of one or more candidate path expressions, each node of the DAG corresponding to one candidate path expression; selecting one or more nodes of the DAG, wherein candidate path expressions corresponding to the one or more nodes selected are included in the one or more path expressions; iteratively replacing at least one of the one or more nodes selected with one or more child nodes until one or more indexes to be created using the one or more path expressions corresponding to the one or more child nodes will fit within an available disk budget; and replacing each candidate path expression included in the set of one or more candidate path expressions corresponding to the at least one of the one or more nodes with the one or more path expressions corresponding to the one or more child nodes replacing the at least one of the one or more nodes.

Plain English Translation

The computer program recommends indexes using a directed acyclic graph (DAG) in which each node corresponds to one candidate path expression. It selects an initial set of nodes, whose path expressions form the tentative recommendation, and then iteratively replaces selected nodes with their child nodes (more specific path expressions) until the indexes built from the resulting path expressions fit within the available disk budget. The recommendation is updated to match: each replaced node's path expression is swapped for those of its children.

Patent Metadata

Filing Date

June 28, 2012

Publication Date

March 14, 2017

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Index selection for XML database systems” (US-9594783). https://patentable.app/patents/US-9594783
