US-12008464

Neural network based face detection and landmark localization

Published: June 11, 2024

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

Approaches are described for determining facial landmarks in images. An input image is provided to at least one trained neural network that determines a face region (e.g., bounding box of a face) of the input image and initial facial landmark locations corresponding to the face region. The initial facial landmark locations are provided to a 3D face mapper that maps the initial facial landmark locations to a 3D face model. A set of facial landmark locations are determined from the 3D face model. The set of facial landmark locations are provided to a landmark location adjuster that adjusts positions of the set of facial landmark locations based on the input image. The input image is presented on a user device using the adjusted set of facial landmark locations.

Patent Claims

11 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 2

Original Legal Text

2. The method of claim 1, wherein generating the refined facial landmark locations comprises adjusting a first number of facial landmarks at the initial facial landmark locations to a second number of facial landmarks at the refined facial landmark locations, wherein the first number of facial landmarks is different than the second number of facial landmarks.

Plain English Translation

This invention relates to facial landmark detection, a technique used in computer vision to identify key points on a face, such as the corners of the eyes, nose, and mouth. The problem addressed is the variability in facial landmark detection accuracy, particularly when the initial set of landmarks does not align well with the facial features due to occlusions, lighting conditions, or facial expressions. The method involves refining an initial set of facial landmark locations to improve detection accuracy. The refinement process adjusts the number of landmarks, converting a first set of landmarks into a second set where the count differs. This adjustment allows the system to dynamically adapt to different facial structures or conditions, ensuring more precise landmark placement. For example, if the initial detection misses certain features due to occlusion, the refinement step may add landmarks to better represent the visible parts of the face. Conversely, if redundant landmarks are detected, the refinement step may reduce their number to eliminate noise. The refinement process may involve techniques such as interpolation, extrapolation, or machine learning-based adjustments to reposition or add/remove landmarks. This ensures that the final set of landmarks accurately represents the facial features, improving applications like facial recognition, emotion detection, or augmented reality. The method enhances the robustness of facial landmark detection systems by dynamically adapting the landmark count based on the input image's conditions.
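As a rough illustration of changing the landmark count, the sketch below uses plain linear interpolation, one of the techniques the translation mentions, to turn a first number of landmarks into a larger second number. The function name `densify` and the `per_segment` parameter are hypothetical, and nothing here is taken from the patent's actual refinement procedure.

```python
def densify(landmarks, per_segment=1):
    """Insert evenly spaced points between consecutive landmarks.

    landmarks: list of (x, y) points along a facial contour.
    per_segment: how many new points to add between each neighboring pair.
    Returns a longer list, so the output count differs from the input count.
    """
    if not landmarks:
        return []
    dense = []
    for (x0, y0), (x1, y1) in zip(landmarks, landmarks[1:]):
        dense.append((x0, y0))
        for k in range(1, per_segment + 1):
            t = k / (per_segment + 1)  # fraction of the way to the next point
            dense.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    dense.append(landmarks[-1])
    return dense


# Three initial landmarks become five refined ones.
refined = densify([(0, 0), (2, 0), (2, 2)])
```

With n input points the sketch returns n + (n - 1) * per_segment points, which is one simple way for the first and second landmark counts to differ.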

Claim 4

Original Legal Text

4. The method of claim 1, further comprising classifying, using a face region classifier neural network, a candidate region of the input image as containing the face and providing a representation of a corresponding boundary of the candidate region as the representation of the first boundary, to the neural network, based on the candidate region being classified as containing the face.

Plain English Translation

This invention relates to computer vision systems for detecting and classifying facial regions in images. The problem addressed is the accurate identification and localization of faces within digital images, which is essential for applications such as surveillance, biometric authentication, and facial recognition. The invention improves upon prior methods by incorporating a specialized neural network to classify candidate regions and refine boundary representations. The method involves analyzing an input image to identify potential regions that may contain a face. A face region classifier neural network is then used to evaluate these candidate regions. If the neural network determines that a candidate region contains a face, it generates a representation of the boundary of that region. This boundary representation is then used to define the face's location within the image. The neural network is trained to distinguish between regions that contain faces and those that do not, ensuring higher accuracy in face detection. The boundary representation may include coordinates or other geometric data that precisely outline the detected face. This approach enhances traditional face detection by leveraging deep learning techniques to improve classification accuracy and boundary precision. The method is particularly useful in scenarios where multiple faces or partial faces are present, as the classifier can distinguish valid face regions from non-face regions more effectively. The use of a neural network allows for adaptive learning and improved performance over time, making the system more robust in varying conditions.
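The gating step described above can be sketched as a simple filter: only candidate regions that the classifier accepts have their boundaries passed on to the next stage. The names `keep_face_regions` and `toy_classifier`, the (x, y, width, height) box layout, and the 0.5 threshold are all assumptions for illustration; the toy scoring rule stands in for a trained face region classifier neural network and is not one.

```python
def keep_face_regions(candidates, classify, threshold=0.5):
    """Return the boundaries of candidates that the classifier accepts as faces.

    candidates: list of (x, y, width, height) boxes.
    classify: callable returning a face score in [0, 1] for a box.
    """
    return [box for box in candidates if classify(box) >= threshold]


def toy_classifier(box):
    """Placeholder score: squarer boxes score higher. Not a real face model."""
    _, _, width, height = box
    return min(width, height) / max(width, height)


candidates = [(10, 10, 100, 100), (0, 0, 200, 20)]
face_boundaries = keep_face_regions(candidates, toy_classifier)
```

In a real system the `classify` callable would wrap the trained network's forward pass on the cropped region.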

Claim 5

Original Legal Text

5. The method of claim 1, wherein predicting the adjusted bounding box comprises predicting confidence scores for calibration patterns representing components of an adjustment to an initial bounding box of the face.

Plain English Translation

This invention relates to computer vision techniques for improving the accuracy of facial bounding box detection. The problem addressed is that an initial bounding box may not align with the actual face because of variations in pose, lighting, or occlusion. The solution predicts an adjusted bounding box using calibration patterns, each of which represents components of an adjustment to the initial bounding box, such as a scale change or a horizontal or vertical offset. The method predicts a confidence score for each calibration pattern, indicating how well that pattern's adjustment fits the face. These scores are then used to refine the initial bounding box, producing a final bounding box that better captures the face. The approach relies on machine learning models trained on annotated facial data to predict the scores, and it supports applications such as facial recognition, augmented reality, and biometric authentication.
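One way to picture calibration patterns is as a fixed table of candidate adjustments, with the network supplying one confidence score per entry. The sketch below averages the patterns whose score passes a threshold and applies the result to the box. The specific scale and offset values, the 0.5 threshold, the averaging rule, and the (x, y, width, height) box layout are assumptions chosen for illustration and are not stated in the claim.

```python
from itertools import product

# Hypothetical pattern table: each pattern is (scale, x_offset, y_offset),
# with offsets expressed as fractions of the box width and height.
SCALES = (0.83, 0.91, 1.0, 1.10, 1.21)
OFFSETS = (-0.17, 0.0, 0.17)
PATTERNS = list(product(SCALES, OFFSETS, OFFSETS))


def calibrate_box(box, scores, threshold=0.5):
    """Adjust (x, y, w, h) using the patterns whose confidence exceeds threshold."""
    x, y, w, h = box
    chosen = [p for p, score in zip(PATTERNS, scores) if score > threshold]
    if not chosen:
        return box  # no confident pattern: keep the initial box
    scale = sum(p[0] for p in chosen) / len(chosen)
    dx = sum(p[1] for p in chosen) / len(chosen)
    dy = sum(p[2] for p in chosen) / len(chosen)
    # Shift the box center, then rescale the box around the new center.
    cx = x + w / 2 + dx * w
    cy = y + h / 2 + dy * h
    new_w, new_h = w * scale, h * scale
    return (cx - new_w / 2, cy - new_h / 2, new_w, new_h)


# Example: only the "shift right, keep size" pattern is confident.
scores = [0.0] * len(PATTERNS)
scores[PATTERNS.index((1.0, 0.17, 0.0))] = 0.9
adjusted = calibrate_box((0.0, 0.0, 100.0, 100.0), scores)
```

Scoring a fixed pattern table turns box regression into a classification problem over a small set of adjustments; whether the patent combines the scores by averaging is not stated in the claim.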

Claim 6

Original Legal Text

6. The method of claim 1, wherein causing the presentation of the representation of the input image using the refined facial landmark locations comprises applying image processing to the face represented by the refined facial landmark locations.

Plain English Translation

This invention relates to facial landmark detection and image processing in computer vision systems. The problem addressed is the need for accurate facial landmark localization to enhance image processing tasks such as facial recognition, expression analysis, or augmented reality applications. Existing methods may struggle with precise landmark detection due to variations in lighting, pose, or occlusions, leading to suboptimal image processing results. The invention describes a method for refining facial landmark locations and applying image processing to the detected facial region. Initially, an input image containing a face is analyzed to detect initial facial landmark locations. These landmarks are then refined using a machine learning model trained to correct inaccuracies in the initial detection. The refined landmarks define the boundaries and key features of the face, such as eyes, nose, and mouth. Once the landmarks are refined, image processing techniques are applied to the face region defined by these landmarks. This processing may include enhancing facial features, applying filters, or generating a modified representation of the face. The refined landmarks ensure that the image processing is accurately targeted to the facial region, improving the quality and relevance of the output. The method is particularly useful in applications requiring high-precision facial analysis or augmentation.
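To show how refined landmarks can target image processing at the face, the sketch below derives a padded crop rectangle from a set of landmark points, which a later filter or effect could then operate on. The function name `face_crop_box`, the `margin` parameter, and the (left, top, right, bottom) return layout are hypothetical choices, not details from the patent.

```python
def face_crop_box(landmarks, image_size, margin=0.2):
    """Return (left, top, right, bottom) pixel bounds enclosing the landmarks.

    landmarks: list of (x, y) points in image coordinates.
    image_size: (width, height) of the image, used to clamp the box.
    margin: extra padding as a fraction of the landmark extent.
    """
    width, height = image_size
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    pad_x = (right - left) * margin
    pad_y = (bottom - top) * margin
    # Clamp to the image so the crop never reaches outside the pixel grid.
    return (
        max(0, int(left - pad_x)),
        max(0, int(top - pad_y)),
        min(width, int(right + pad_x) + 1),
        min(height, int(bottom + pad_y) + 1),
    )


crop = face_crop_box([(40, 50), (60, 50), (50, 70)], (100, 100), margin=0.5)
```

The right and bottom bounds are exclusive, matching the slicing convention most image libraries use for crops.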

Claim 7

Original Legal Text

7. The method of claim 1, wherein generating the refined facial landmark locations comprises predicting, for each of the refined facial landmark locations, a respective two-dimensional point in the input image.

Plain English Translation

The invention relates to facial landmark detection, a technique used in computer vision to identify key points on a face, such as eyes, nose, and mouth, for applications like facial recognition, animation, and emotion analysis. A common challenge is accurately locating these landmarks in varying lighting, poses, and expressions, where initial predictions may be imprecise. The method improves landmark detection by refining initial predictions. It processes an input image containing a face and generates refined facial landmark locations. For each landmark, the method predicts a two-dimensional point in the image, adjusting the position based on contextual or learned data to enhance accuracy. This refinement step corrects errors from initial detections, ensuring landmarks align more precisely with facial features. The method may involve deep learning models or geometric adjustments, where the refinement process leverages additional image features or pre-trained models to predict corrected landmark positions. By dynamically adjusting each landmark's location, the system achieves higher precision, making it suitable for real-time applications requiring reliable facial analysis. The technique addresses limitations in traditional landmark detection by iteratively improving predictions, ensuring robustness across diverse conditions.

Claim 11

Original Legal Text

11. The one or more non-transitory computer-readable media of claim 8, wherein determining the second set of predicted facial landmark locations comprises identifying a different number of facial landmarks locations than the initial facial landmark locations.

Plain English Translation

This invention relates to computer vision systems for facial landmark detection, addressing challenges in accurately identifying and tracking facial features under varying conditions. The system improves upon traditional facial landmark detection by dynamically adjusting the number of detected landmarks based on input data or processing requirements. The core process involves an initial detection of facial landmarks, followed by a refined prediction of a second set of landmarks. Unlike conventional methods that rely on a fixed number of landmarks, this approach identifies a different number of landmarks in the second set compared to the initial detection. This adaptability enhances accuracy and efficiency, particularly in scenarios where certain facial features may be obscured or less relevant. The system leverages machine learning models trained on diverse datasets to predict landmark locations, with the second set of predictions incorporating additional contextual or environmental factors. This dynamic adjustment allows the system to optimize performance for different applications, such as facial recognition, expression analysis, or augmented reality, where flexibility in landmark detection is critical. The invention improves upon prior art by providing a more robust and adaptable solution for facial landmark detection in real-world scenarios.

Claim 12

Original Legal Text

12. The one or more non-transitory computer-readable media of claim 8, wherein the neural network is configured to predict the adjusted bounding box as a scale change or offset vector applied to an initial bounding box of the face.

Plain English Translation

The invention relates to computer vision systems for face detection, specifically improving the accuracy of bounding box predictions for faces in images. The problem addressed is the challenge of precisely localizing faces in varying conditions, such as different lighting, angles, or occlusions, where initial bounding box estimates may be inaccurate. The solution involves a neural network configured to predict the adjusted bounding box as a scale change or an offset vector applied to the initial bounding box of the face. Applying the predicted scale change or offset yields a bounding box that better aligns with the actual face region. The system is particularly useful in applications like surveillance, augmented reality, and biometric authentication, where precise face localization is critical. Expressing the adjustment relative to the initial box, rather than predicting a new box from scratch, keeps the correction small and tied to the original detection.

Claim 13

Original Legal Text

13. The one or more non-transitory computer-readable media of claim 8, wherein causing the presentation of the representation of the input image comprises applying image processing to the face represented by the refined facial landmark locations.

Plain English Translation

This invention relates to computer vision and image processing, specifically for enhancing facial recognition and representation in digital images. The problem addressed is the need for accurate and refined facial landmark detection to improve the quality of facial representations in applications such as facial recognition, augmented reality, and biometric authentication. The invention involves a system that processes an input image containing a face to extract and refine facial landmark locations. These landmarks are key points on the face, such as the corners of the eyes, nose, and mouth, which are crucial for accurate facial analysis. The system applies image processing techniques to the face represented by these refined landmarks to generate a high-quality representation of the face. This processing may include normalization, alignment, or other enhancements to ensure the facial representation is optimized for further analysis or display. The refined facial landmarks are used to improve the accuracy of facial recognition algorithms or to generate a more precise facial model for applications like facial animation or identity verification. The system ensures that the facial representation is both visually accurate and computationally efficient, making it suitable for real-time applications. The invention enhances the reliability and performance of facial recognition systems by providing a more detailed and refined input for subsequent processing steps.

Claim 15

Original Legal Text

15. The system of claim 14, further comprising an image processor configured to use the one or more hardware processors to process the input image using the refined facial landmark locations to generate a processed input image, wherein the presentation component is configured to cause the presentation of the processed input image.

Plain English Translation

This system operates in the domain of facial landmark detection and image processing, addressing the challenge of accurately identifying and refining facial features in images for applications such as facial recognition, augmented reality, or biometric analysis. The system includes a facial landmark detector that analyzes an input image to identify initial facial landmark locations, which may include key points such as eyes, nose, and mouth. These initial landmarks are then refined using a refinement module that adjusts their positions based on additional image data or machine learning models to improve accuracy. The refined landmarks are used by an image processor to modify the input image, such as applying filters, enhancing features, or generating augmented reality overlays. The processed image is then presented to the user via a display or output interface. The system ensures precise facial feature detection and processing, enabling high-quality image manipulation and analysis. The refinement step enhances the robustness of the landmark detection, reducing errors caused by variations in lighting, pose, or occlusion. The image processor applies transformations or effects based on the refined landmarks, ensuring seamless integration of modifications with the original image. This system is particularly useful in applications requiring real-time facial analysis and processing, such as virtual try-on, emotion recognition, or security systems.

Claim 16

Original Legal Text

16. The system of claim 14, wherein the joint calibration and alignment neural network includes a common fully connected layer configured to generate a representation of both the adjusted bounding box and the initial facial landmark locations.

Plain English Translation

A system for facial landmark detection and alignment uses a joint calibration and alignment neural network to improve accuracy. The system addresses the difficulty of precisely locating facial landmarks, such as the eyes, nose, and mouth, when the detected face region is misaligned with the face. The network includes a common fully connected layer that generates a representation of both the adjusted bounding box and the initial facial landmark locations. Because both outputs come from this shared layer, the network calibrates the bounding box and estimates the initial landmarks together rather than in separate passes. The network may also include separate branches for the bounding box and the landmarks around the common layer. The initial landmark locations are refined in a later stage of the system. The approach is particularly useful in applications requiring high-precision facial analysis, such as biometric authentication, augmented reality, and emotion recognition.
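A minimal sketch of a common fully connected layer, assuming it emits one vector whose first four entries describe the bounding box adjustment and whose remaining entries are landmark coordinates. The layer sizes, the five-landmark count, the output layout, and the random untrained weights are all assumptions for illustration; the patent's actual architecture is not specified in this claim.

```python
import random

NUM_LANDMARKS = 5  # assumed count, for illustration only


def make_fc_layer(n_in, n_out, seed=0):
    """Create random (untrained) weights and zero biases for one dense layer."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def joint_output(features, layer):
    """Run one shared dense layer and split its output into box and landmarks."""
    weights, biases = layer
    out = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    box_adjustment = out[:4]  # first four values: bounding box part
    coords = out[4:]          # remaining values: x, y pairs for the landmarks
    landmarks = [(coords[i], coords[i + 1]) for i in range(0, len(coords), 2)]
    return box_adjustment, landmarks


layer = make_fc_layer(n_in=8, n_out=4 + 2 * NUM_LANDMARKS)
box_adjustment, initial_landmarks = joint_output([0.5] * 8, layer)
```

Because every output is computed from the same input features in a single layer, training signals for the box and for the landmarks would both update the features feeding that layer.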

Claim 17

Original Legal Text

17. The system of claim 14, wherein the landmark location refiner comprises a landmark location adjuster configured to use the one or more hardware processors to adjust positions of a set of facial landmark locations corresponding to the initial facial landmark locations to generate the refined facial landmark locations.

Plain English Translation

This invention relates to facial landmark detection and refinement in image processing. The system addresses the challenge of accurately identifying and refining key facial features, such as eyes, nose, and mouth, in digital images or video frames. Facial landmark detection is crucial for applications like facial recognition, emotion analysis, and augmented reality, but initial detections often contain errors due to occlusions, lighting variations, or low-resolution input. The system improves upon prior methods by refining these initial landmark locations to enhance precision. The system includes a landmark location refiner that processes initial facial landmark locations detected by a separate module. The refiner contains a landmark location adjuster, which uses one or more hardware processors to adjust the positions of these initial landmarks. The adjuster refines the coordinates of each landmark in a set, correcting misalignments or inaccuracies in the initial detection. This refinement process may involve techniques such as geometric constraints, statistical modeling, or machine learning-based adjustments to ensure the landmarks accurately represent the facial structure. The refined landmarks are then output for further use in applications requiring precise facial feature localization. The system ensures higher accuracy in facial landmark detection, improving downstream tasks like facial recognition and expression analysis.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06N G06F G06T G06V

Patent Metadata

Filing Date

November 16, 2017

Publication Date

June 11, 2024
