US-9635049

Detection of suspicious domains through graph inference algorithm processing of host-domain contacts

Published: April 25, 2017

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

A processing device comprises a processor coupled to a memory and is configured to obtain data relating to communications initiated by host devices of a computer network of an enterprise, and to process the data to identify external domains contacted by the host devices. A graph inference algorithm is applied to analyze contacts of the host devices with the external domains in order to characterize one or more of the external domains as suspicious domains. The host devices are configured to counteract malware infection from the suspicious domains. The graph inference algorithm in some embodiments comprises a belief propagation algorithm, which may be initiated with one or more seeds corresponding to respective known suspicious domains or to respective ones of the external domains determined to be associated with command and control behavior. The processing device may be implemented in the computer network or an associated network security system.

Patent Claims

22 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method comprising steps of: obtaining data relating to communications initiated by host devices of a computer network of an enterprise; processing the data to identify external domains contacted by the host devices in conjunction with the communications; applying a graph inference algorithm to analyze contacts of the host devices with the external domains in order to characterize one or more of the external domains as suspicious domains; and configuring one or more of the host devices to counteract malware infection from the suspicious domains; wherein the graph inference algorithm comprises a belief propagation algorithm; wherein the belief propagation algorithm models the contacts of the host devices with the external domains using a bipartite graph structure comprising: host device vertices corresponding to respective ones of the host devices; external domain vertices corresponding to respective ones of the external domains; and edges connecting particular ones of the host device vertices with particular ones of the external domain vertices; wherein applying the graph inference algorithm comprises generating a score for each of at least a subset of the external domains in a given one of a plurality of iterations of the belief propagation algorithm; wherein the scores are utilized to characterize the one or more external domains as suspicious domains in the given iteration; and wherein the steps are performed by at least one processing device comprising a processor coupled to a memory.

Plain English Translation

A computer-implemented method identifies suspicious network domains. It gathers data on communications initiated by host devices in an enterprise network and determines which external domains those hosts contacted. A graph inference algorithm, specifically a belief propagation algorithm, then analyzes these contacts. The algorithm models the contacts as a bipartite graph in which host devices and external domains are the two sets of vertices, and each edge links a host to a domain it contacted. In a given iteration, the algorithm scores at least a subset of the external domains and uses the scores to characterize domains as suspicious. Finally, the method configures one or more of the hosts to counteract malware infection from the suspicious domains.
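The bipartite structure described above can be illustrated with a minimal Python sketch. The function name `build_bipartite_graph`, the `(host, domain)` pair format, and the sample data are assumptions made for illustration; the claim does not prescribe any particular data structure.

```python
from collections import defaultdict

def build_bipartite_graph(contacts):
    """Build both sides of a host/domain bipartite graph.

    contacts: iterable of (host, domain) pairs (hypothetical input format).
    Returns (host_to_domains, domain_to_hosts) as dicts of sets.
    """
    host_to_domains = defaultdict(set)
    domain_to_hosts = defaultdict(set)
    for host, domain in contacts:
        # One edge per distinct (host, domain) pair; repeats are absorbed by the sets.
        host_to_domains[host].add(domain)
        domain_to_hosts[domain].add(host)
    return dict(host_to_domains), dict(domain_to_hosts)

contacts = [("h1", "a.example"), ("h1", "b.example"), ("h2", "b.example")]
host_to_domains, domain_to_hosts = build_bipartite_graph(contacts)
```

Keeping both adjacency maps makes it cheap to walk from a domain to the hosts that contacted it and back again, which is the traversal an iterative scoring pass would need.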

Claim 2

Original Legal Text

2. The method of claim 1 wherein a given one of the edges connects a given host device vertex to a given external domain vertex if the corresponding host device has contacted the corresponding external domain at least once during a specified observation window.

Plain English Translation

In the method for identifying suspicious domains, a connection between a company computer and an external domain is established in the analysis if that computer contacted that domain at least once within a defined period of time. This timeframe is called the "specified observation window", and it dictates the period in which the contacts are logged for analysis.
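As a rough sketch of this edge rule, the Python below keeps a host-domain edge when at least one contact falls inside the window. The event tuple layout, the function name `edges_in_window`, and the half-open interval `[window_start, window_end)` are assumptions; the claim only requires "at least once during a specified observation window".

```python
from datetime import datetime

def edges_in_window(events, window_start, window_end):
    """Return the set of (host, domain) edges with at least one contact in the window.

    events: iterable of (timestamp, host, domain) tuples (hypothetical format).
    The window is treated as half-open: window_start <= timestamp < window_end.
    """
    return {
        (host, domain)
        for timestamp, host, domain in events
        if window_start <= timestamp < window_end
    }

events = [
    (datetime(2015, 3, 1, 9, 0), "h1", "a.example"),
    (datetime(2015, 3, 1, 9, 5), "h1", "a.example"),   # repeat contact, same edge
    (datetime(2015, 3, 2, 9, 0), "h2", "b.example"),   # outside the window below
]
edges = edges_in_window(events, datetime(2015, 3, 1), datetime(2015, 3, 2))
```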

Claim 3

Original Legal Text

3. The method of claim 1 wherein the belief propagation algorithm is configured to identify particular ones of the external domains that are associated with a common attack campaign against the enterprise.

Plain English Translation

In the suspicious domain identification method, the belief propagation algorithm is configured to identify external domains that are associated with a common attack campaign against the enterprise, so that related domains are surfaced together rather than as isolated findings.

Claim 4

Original Legal Text

4. The method of claim 1 wherein the belief propagation algorithm is configured to update a set of rare domains in each of at least a subset of the plurality of iterations.

Plain English Translation

During the iterative analysis of the suspicious domain identification method, the belief propagation algorithm maintains a set of "rare domains" and updates that set in at least some of its iterations.

Claim 5

Original Legal Text

5. The method of claim 4 wherein the set of rare domains comprises particular ones of the external domains that are contacted on only a relatively infrequent basis within a specified observation window by only a relatively small subset of the host devices.

Plain English Translation

Within the suspicious domain identification method, "rare domains" are defined as external domains that are contacted infrequently, within a specific timeframe, by only a small subset of computers in the network. These domains are unusual network contacts that warrant further investigation.
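One way to picture this definition is a filter with two thresholds, sketched below in Python. The thresholds `max_contacts` and `max_hosts` and the function name `find_rare_domains` are hypothetical; the claim speaks only of "relatively infrequent" contact by a "relatively small subset" of hosts and gives no numbers.

```python
from collections import Counter, defaultdict

def find_rare_domains(events, max_contacts=10, max_hosts=3):
    """Return domains contacted at most max_contacts times by at most max_hosts hosts.

    events: iterable of (host, domain) contacts from one observation window.
    Both thresholds are illustrative assumptions, not values from the patent.
    """
    contact_count = Counter()
    hosts_by_domain = defaultdict(set)
    for host, domain in events:
        contact_count[domain] += 1
        hosts_by_domain[domain].add(host)
    return {
        domain
        for domain, count in contact_count.items()
        if count <= max_contacts and len(hosts_by_domain[domain]) <= max_hosts
    }
```

For example, a domain contacted twice by a single host would pass this filter, while a domain contacted by fifty different hosts would not.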

Claim 6

Original Legal Text

6. The method of claim 4 wherein in a given one of the plurality of iterations of the belief propagation algorithm a score is generated for each of the domains in the set of rare domains based at least in part on one or more of: a first set of features indicative of command and control behavior; and a second set of features indicative of similarity between the domain and one or more suspicious domains as determined in a previous iteration of the belief propagation algorithm; wherein the scores are utilized to characterize a subset of the set of rare domains as suspicious domains in the given iteration.

Plain English Translation

In the suspicious domain identification method, the belief propagation algorithm generates a score for each "rare domain" in a given iteration. The score is based at least in part on one or both of two feature sets: features indicative of command and control behavior (e.g., automated connections), and features indicating similarity to domains characterized as suspicious in a previous iteration. The algorithm then characterizes a subset of the rare domains as suspicious based on their scores.

Claim 7

Original Legal Text

7. The method of claim 6 wherein the first set of features indicative of command and control behavior comprises one or more of: presence of automated connections with regular timing patterns; number of host devices contacting the domain; fraction of host devices contacting the domain without a web referrer; presence of rare user-agent strings; fraction of host devices contacting the domain with no user-agent string or a rare user-agent string; number of days since registration of the domain; and number of days until expiration of registration of the domain.

Plain English Translation

Within the suspicious domain identification method, the "command and control" features used to score rare domains include one or more of: the presence of automated connections with regular timing patterns, the number of computers contacting the domain, the fraction of computers contacting the domain without a web referrer, the presence of rare user-agent strings, the fraction of computers contacting the domain with no user-agent string or a rare one, the number of days since the domain was registered, and the number of days until its registration expires.
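The first feature in this list, automated connections with regular timing, could be approximated by checking how much the gaps between successive connections vary. The Python sketch below uses the coefficient of variation of those gaps; this heuristic, the function name `has_regular_timing`, and the defaults `max_cv` and `min_connections` are assumptions, since the claim does not say how regularity is measured.

```python
from statistics import mean, pstdev

def has_regular_timing(timestamps, max_cv=0.1, min_connections=4):
    """Heuristic check for beacon-like, evenly spaced connections.

    timestamps: connection times in seconds for one host-domain pair.
    Returns True when the coefficient of variation (std / mean) of the
    inter-arrival gaps is at most max_cv. Thresholds are illustrative.
    """
    if len(timestamps) < min_connections:
        return False  # too few connections to judge regularity
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap == 0:
        return False  # all connections at the same instant
    return pstdev(gaps) / average_gap <= max_cv
```

A host polling a domain roughly every 60 seconds would be flagged, whereas bursty, human-driven browsing would typically produce widely varying gaps and would not.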

Claim 8

Original Legal Text

8. The method of claim 6 wherein the second set of features indicative of similarity between the domain and one or more suspicious domains as determined in the previous iteration comprises one or more of: number of host devices contacting the domain; fraction of host devices contacting the domain without a web referrer; presence of rare user-agent strings; fraction of host devices contacting the domain with no user-agent string or a rare user-agent string; number of days since registration of the domain; number of days until expiration of registration of the domain; length of time between a given host device contacting one of the suspicious domains and the given host device contacting another one of the suspicious domains; and proximity in internet protocol address space between the domain and the one or more suspicious domains.

Plain English Translation

Within the suspicious domain identification method, the features indicating similarity between a rare domain and previously identified suspicious domains include one or more of: the number of computers contacting the domain, the fraction of computers contacting the domain without a web referrer, the presence of rare user-agent strings, the fraction of computers contacting the domain with no user-agent string or a rare one, the number of days since the domain was registered, the number of days until its registration expires, the length of time between a given host contacting one suspicious domain and the same host contacting another, and proximity in IP address space between the domain and the suspicious domains.
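The last feature, proximity in IP address space, might be approximated by testing whether two addresses share a network prefix. The Python sketch below uses the standard `ipaddress` module; the function name `same_subnet` and the default `/24` prefix are assumptions, as the claim does not define how proximity is measured.

```python
import ipaddress

def same_subnet(ip_a, ip_b, prefix_len=24):
    """Return True if ip_b falls in the prefix_len-bit network containing ip_a.

    prefix_len=24 is an illustrative choice for IPv4, not a value from the patent.
    """
    # strict=False lets us pass a host address and have the host bits masked off.
    network = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in network
```

For instance, `same_subnet("203.0.113.5", "203.0.113.200")` compares two addresses in the same /24 block, while passing `prefix_len=16` loosens the notion of proximity.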

Claim 9

Original Legal Text

9. The method of claim 6 wherein at least a subset of the first and second sets of features are weighted in accordance with weights determined by at least one linear regression model generated in a training phase.

Plain English Translation

In the suspicious domain identification method, at least some of the command-and-control and similarity features are weighted using weights determined by at least one linear regression model. The model is generated during a training phase, and its weights represent the relative contribution of each feature to a domain's score.
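Once weights are available, applying them amounts to a weighted sum of feature values. The Python sketch below shows only that scoring step; the function name `domain_score`, the feature names, and the weight values are hypothetical, and the training of the regression model itself is not shown.

```python
def domain_score(features, weights, intercept=0.0):
    """Combine feature values into one score using regression-style weights.

    features: dict of feature name -> numeric value for one domain.
    weights:  dict of feature name -> weight (assumed to come from a trained model).
    Features without a weight are ignored in this sketch.
    """
    return intercept + sum(
        weights[name] * value
        for name, value in features.items()
        if name in weights
    )

# Hypothetical weights and feature values, for illustration only.
weights = {"no_referrer_fraction": 0.5, "rare_user_agent_fraction": 0.25}
features = {"no_referrer_fraction": 1.0, "rare_user_agent_fraction": 0.5}
score = domain_score(features, weights)
```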

Claim 10

Original Legal Text

10. The method of claim 6 wherein the belief propagation algorithm returns a list of suspicious domains ranked in order of their respective scores.

Plain English Translation

In the suspicious domain identification method, the belief propagation algorithm outputs a ranked list of suspicious domains. This ranking is based on the scores each domain received during the analysis, allowing security analysts to prioritize investigation efforts.

Claim 11

Original Legal Text

11. The method of claim 10 wherein the belief propagation algorithm terminates responsive to at least one of: a highest score among the scores of the suspicious domains being below a threshold; and a maximum number of iterations being reached.

Plain English Translation

In the suspicious domain identification method, the belief propagation algorithm stops when either the highest score among suspicious domains falls below a defined threshold, indicating diminishing returns, or the algorithm reaches a maximum number of iterations, preventing infinite loops.
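The ranking and the two stopping conditions can be seen together in a simplified iterative loop. The Python below is a sketch under several assumptions, not the claimed algorithm: it labels only the single highest-scoring candidate per iteration, it draws candidates from domains contacted by hosts that reached an already labeled domain, and `score_fn`, `threshold`, and `max_iterations` are hypothetical parameters.

```python
def propagate(host_to_domains, domain_to_hosts, seeds, score_fn,
              threshold=0.5, max_iterations=10):
    """Simplified iterative labeling loop (illustrative only).

    score_fn(domain, labeled) -> float is supplied by the caller.
    Returns newly labeled domains as (domain, score) pairs, highest score first.
    """
    labeled = set(seeds)   # domains currently treated as suspicious
    found = {}             # newly labeled domain -> score
    hosts = set()          # hosts that contacted any labeled domain
    for domain in seeds:
        hosts |= domain_to_hosts.get(domain, set())
    for _ in range(max_iterations):          # stop: maximum iterations reached
        # Candidates: unlabeled domains contacted by the hosts collected so far.
        candidates = set()
        for host in hosts:
            candidates |= host_to_domains.get(host, set())
        candidates -= labeled
        if not candidates:
            break
        scores = {d: score_fn(d, labeled) for d in sorted(candidates)}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:         # stop: highest score below threshold
            break
        labeled.add(best)
        found[best] = scores[best]
        hosts |= domain_to_hosts.get(best, set())
    # Rank newly labeled domains by score, highest first.
    return sorted(found.items(), key=lambda item: item[1], reverse=True)
```

Because each newly labeled domain adds its contacting hosts to the pool, later iterations can reach domains that had no direct link to the original seeds.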

Claim 12

Original Legal Text

12. The method of claim 1 wherein the belief propagation algorithm is configured for operation in a hint mode in which the belief propagation algorithm is initiated with one or more seeds corresponding to respective known suspicious domains.

Plain English Translation

The belief propagation algorithm within the suspicious domain identification method can operate in a "hint mode". In this mode, the algorithm is initialized with known suspicious domains as seeds, giving it a starting point for identifying similar malicious domains in the network.

Claim 13

Original Legal Text

13. The method of claim 12 wherein a given one of the seeds utilized to initiate the belief propagation algorithm in the hint mode is provided by a security operations center of the enterprise based at least in part on a corresponding indicator of compromise.

Plain English Translation

In the "hint mode" of the suspicious domain identification method, known suspicious domains used as seeds are provided by the company's security team. These seeds are based on indicators of compromise (IOCs), which are pieces of forensic data that identify potentially malicious activity.

Claim 14

Original Legal Text

14. The method of claim 1 wherein the belief propagation algorithm is configured for operation in a no-hint mode in which the belief propagation algorithm is initiated without any seeds corresponding to respective known suspicious domains.

Plain English Translation

The belief propagation algorithm in the suspicious domain identification method can also operate in a "no-hint mode". In this mode, the algorithm begins without any prior knowledge of suspicious domains, allowing it to discover new, previously unknown threats.

Claim 15

Original Legal Text

15. The method of claim 14 wherein in the no-hint mode the belief propagation algorithm is initiated with one or more seeds corresponding to respective ones of the external domains determined to being associated with command and control behavior.

Plain English Translation

In the "no-hint mode" of the suspicious domain identification method, the algorithm starts with seeds corresponding to external domains exhibiting command and control-like behaviors. This directs the algorithm towards domains behaving suspiciously, even without prior threat intelligence.
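The difference between the two modes comes down to where the initial seeds come from. The Python sketch below makes that choice explicit; the function name `select_seeds`, the `cc_scores` mapping, and the `cc_threshold` cutoff are assumptions, since the claims do not state how a domain is "determined" to show command and control behavior.

```python
def select_seeds(known_suspicious, cc_scores, cc_threshold=0.8):
    """Pick initial seeds and report which mode is in effect.

    known_suspicious: domains supplied as hints (e.g., by a security team); may be empty.
    cc_scores: dict of domain -> command-and-control score (hypothetical detector output).
    Returns (mode, seeds) where mode is "hint" or "no-hint".
    """
    if known_suspicious:
        # Hint mode: start from domains already known to be suspicious.
        return "hint", set(known_suspicious)
    # No-hint mode: start from domains whose own behavior looks like command and control.
    seeds = {domain for domain, value in cc_scores.items() if value >= cc_threshold}
    return "no-hint", seeds
```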

Claim 16

Original Legal Text

16. The method of claim 1 wherein obtaining data relating to communications initiated by host devices comprises obtaining at least a portion of the data from security logs of the enterprise wherein said security logs comprise at least one of: domain name service logs of the enterprise; and web proxy logs of the enterprise.

Plain English Translation

In the suspicious domain identification method, the data about computer communications is obtained from the company's security logs, specifically DNS logs and web proxy logs. These logs provide records of domain name resolutions and web traffic, essential for tracking computer-domain interactions.
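To show how such logs could feed the analysis, the Python sketch below extracts host-domain contacts from web proxy log lines. The tab-separated layout (timestamp, source host, URL) and the function name `contacts_from_proxy_log` are assumptions; real proxy log formats vary by product and are not specified in the claim.

```python
from urllib.parse import urlsplit

def contacts_from_proxy_log(lines):
    """Yield (timestamp, host, domain) from tab-separated proxy log lines.

    Assumed line layout (hypothetical): timestamp<TAB>source_host<TAB>url
    Lines that do not match, or whose URL has no hostname, are skipped.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue
        timestamp, host, url = parts
        domain = urlsplit(url).hostname  # lowercased hostname, or None if absent
        if domain:
            yield timestamp, host, domain

log_lines = [
    "2015-03-01T09:00:00\t10.0.0.5\thttp://Updates.Example.com/check?id=1\n",
    "malformed line\n",
    "2015-03-01T09:00:02\t10.0.0.7\thttps://b.example/index.html\n",
]
parsed = list(contacts_from_proxy_log(log_lines))
```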

Claim 17

Original Legal Text

17. A non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device: to obtain data relating to communications initiated by host devices of a computer network of an enterprise; to process the data to identify external domains contacted by the host devices in conjunction with the communications; to apply a graph inference algorithm to analyze contacts of the host devices with the external domains in order to characterize one or more of the external domains as suspicious domains; and to configure one or more of the host devices to counteract malware infection from the suspicious domains; wherein the graph inference algorithm comprises a belief propagation algorithm; wherein the belief propagation algorithm models the contacts of the host devices with the external domains using a bipartite graph structure comprising: host device vertices corresponding to respective ones of the host devices; external domain vertices corresponding to respective ones of the external domains; and edges connecting particular ones of the host device vertices with particular ones of the external domain vertices; wherein applying the graph inference algorithm comprises generating a score for each of at least a subset of the external domains in a given one of a plurality of iterations of the belief propagation algorithm; and wherein the scores are utilized to characterize the one or more external domains as suspicious domains in the given iteration.

Plain English Translation

A computer-readable storage medium (e.g., a hard drive or memory stick) stores software instructions that, when executed, perform the method of identifying suspicious domains. This involves gathering communication data from computers on a company network, identifying external domains contacted, applying a belief propagation graph inference algorithm to analyze these contacts (modeling them as a bipartite graph), generating scores for domains in each iteration of the algorithm, characterizing the domains as suspicious using these scores, and configuring the computers to counteract malware from the domains.

Claim 18

Original Legal Text

18. The processor-readable storage medium of claim 17 wherein the storage medium comprises at least one of an electronic memory and a storage disk.

Plain English Translation

The processor-readable storage medium of claim 17 comprises at least one of an electronic memory and a storage disk.

Claim 19

Original Legal Text

19. An apparatus comprising: at least one processing device comprising a processor coupled to a memory; said at least one processing device being configured: to obtain data relating to communications initiated by host devices of a computer network of an enterprise; to process the data to identify external domains contacted by the host devices in conjunction with the communications; to apply a graph inference algorithm to analyze contacts of the host devices with the external domains in order to characterize one or more of the external domains as suspicious domains; and to configure one or more of the host devices to counteract malware infection from the suspicious domains; wherein the graph inference algorithm comprises a belief propagation algorithm; wherein the belief propagation algorithm models the contacts of the host devices with the external domains using a bipartite graph structure comprising: host device vertices corresponding to respective ones of the host devices; external domain vertices corresponding to respective ones of the external domains; and edges connecting particular ones of the host device vertices with particular ones of the external domain vertices; wherein applying the graph inference algorithm comprises generating a score for each of at least a subset of the external domains in a given one of a plurality of iterations of the belief propagation algorithm; and wherein the scores are utilized to characterize the one or more external domains as suspicious domains in the given iteration.

Plain English Translation

An apparatus for identifying suspicious domains comprises at least one processing device with a processor coupled to a memory. The processing device is configured to: obtain communication data from computers on the company network; identify the external domains they contacted; apply a belief propagation algorithm, modeling host-domain contacts as a bipartite graph, to analyze these contacts; score at least a subset of the domains in a given iteration and characterize domains as suspicious based on those scores; and configure one or more computers to counteract malware infection from the suspicious domains.

Claim 20

Original Legal Text

20. The apparatus of claim 19 wherein the apparatus is implemented in a network security system.

Plain English Translation

The suspicious domain identification apparatus is part of a larger network security system, protecting the enterprise network from malicious activity.

Claim 21

Original Legal Text

21. The apparatus of claim 19 wherein the belief propagation algorithm is configured for operation in a hint mode in which the belief propagation algorithm is initiated with one or more seeds corresponding to respective known suspicious domains.

Plain English Translation

The apparatus for suspicious domain identification uses the belief propagation algorithm in a "hint mode", starting with known suspicious domains as seeds to identify similar threats.

Claim 22

Original Legal Text

22. The apparatus of claim 21 wherein a given one of the seeds utilized to initiate the belief propagation algorithm in the hint mode is provided by a security operations center of the enterprise based at least in part on a corresponding indicator of compromise.

Plain English Translation

In the "hint mode" operation of the suspicious domain identification apparatus, the initial suspicious domains used as seeds come from the company's security team, based on indicators of compromise.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

H04L

Patent Metadata

Filing Date

March 31, 2015

Publication Date

April 25, 2017
