Let’s say you have two pictures of the same scene taken from different angles. Many things in both pictures are the same; you are just looking at them from different viewpoints. From a computer’s perspective, objects are described by specific features such as edges, corners, etc. Matching these features is important for many applications. But what does it take to match features between two images?
Finding correspondences between images is a prerequisite for estimating 3D structure and camera poses in computer vision tasks such as simultaneous localization and mapping (SLAM) and structure-from-motion (SfM). This is done by matching local features, which is difficult to achieve robustly due to changes in viewpoint, illumination, occlusion, and so on.
Traditionally, this is done in a two-step process. First, a front-end step extracts visual features from the images. Second, a back-end step performs bundle adjustment and pose estimation using the matched features. In between, once the features are extracted, feature matching itself is modeled as a linear assignment problem.
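To make the classical pipeline concrete, here is a minimal NumPy sketch of a common heuristic matcher: mutual nearest-neighbor matching in descriptor space. The toy descriptors and the `mutual_nearest_neighbors` helper are illustrative assumptions, not part of any specific library; real systems would use detectors such as SIFT or SuperPoint to produce the descriptors.

```python
import numpy as np

def mutual_nearest_neighbors(desc_a, desc_b):
    """Classical heuristic matcher: keypoint i in image A is matched to
    keypoint j in image B only if each is the other's nearest neighbor
    in descriptor space."""
    # Pairwise Euclidean distances between the two descriptor sets
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    nn_a = dists.argmin(axis=1)  # best candidate in B for each point in A
    nn_b = dists.argmin(axis=0)  # best candidate in A for each point in B
    return [(i, j) for i, j in enumerate(nn_a) if nn_b[j] == i]

# Toy data: descriptors in B are a shuffled, slightly noisy copy of A
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(5, 32))
perm = rng.permutation(5)
desc_b = desc_a[perm] + 0.01 * rng.normal(size=(5, 32))

matches = mutual_nearest_neighbors(desc_a, desc_b)
```

Because this rule compares descriptors independently of where the keypoints sit in the image, it is exactly the kind of task-agnostic heuristic that learned matchers aim to improve on.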
As in many other domains, deep neural networks have played an important role in matching problems in recent years. They have been used to learn better sparse keypoint detectors and local descriptors from data using convolutional neural networks (CNNs).
However, these networks typically addressed only parts of the feature matching pipeline rather than solving it end-to-end. What if a single neural network could perform context aggregation, matching, and filtering in one architecture? Time to introduce SuperGlue.
SuperGlue frames the matching problem in a different way. It learns the matching process for pre-existing local features using a graph neural network architecture. This replaces existing approaches in which task-agnostic features are learned first and then compared using simple heuristics. Being an end-to-end method gives SuperGlue a powerful advantage over existing methods. SuperGlue acts as a learnable middle-end that can be plugged in to improve existing pipelines.
So how does SuperGlue accomplish this? It approaches the problem from a new angle and formulates feature matching as a partial assignment between two sets of local features. Instead of solving a linear assignment problem directly, it relaxes it into an optimal transport problem. SuperGlue uses a graph neural network (GNN) to predict the cost function of this optimal transport problem.
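The optimal transport problem is solved with differentiable Sinkhorn iterations (SuperGlue also adds a "dustbin" row and column for unmatched keypoints, which is omitted here for brevity). A minimal NumPy sketch of log-space Sinkhorn normalization, using a toy score matrix rather than GNN-predicted scores:

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def log_sinkhorn(scores, n_iters=100):
    """Alternately normalize rows and columns in log-space so the result
    approaches a doubly-stochastic (soft assignment) matrix."""
    log_p = scores.astype(float)
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=1)  # rows sum to ~1
        log_p = log_p - logsumexp(log_p, axis=0)  # columns sum to ~1
    return np.exp(log_p)

# Toy score matrix: high scores on the diagonal should recover an
# assignment close to the identity permutation
scores = 5.0 * np.eye(4)
P = log_sinkhorn(scores)
```

Because every step is differentiable, the matching layer can be trained jointly with the network that produces the scores, which is what makes the end-to-end formulation possible.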
We all know how transformers have achieved great success in natural language processing and, more recently, in computer vision tasks. SuperGlue uses a transformer-style attention mechanism to leverage both the spatial relationships of keypoints and their visual appearance.
SuperGlue is trained in an end-to-end manner on image pairs. Priors for pose estimation are learned from a large annotated dataset, which lets SuperGlue reason about the underlying 3D scene.
SuperGlue can be used in many problems that require high-quality feature correspondences for multi-view geometry. It runs in real time on commodity hardware and works with both classical and learned features. You can find more information about SuperGlue at the links below.
Check out the paper, project page, and code. All credit for this research goes to the researchers of this project. Also, don’t forget to join our Reddit page and Discord channel, where we share the latest AI research news, cool AI projects, and more.
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image noise extraction using deep transform networks. He is currently pursuing a Ph.D. degree from the University of Klagenfurt, Austria, and works as a researcher for the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.