Our hands are our primary means of interacting with the world, whether we are driving a car, checking messages on a smartphone, or flipping a light switch. So, to create intelligent, interactive experiences with both our electronics and otherwise ordinary, non-electronic objects, we need to be able to determine exactly what someone is doing with their hands. Practical applications are numerous, running the gamut from enabling virtual reality experiences to sign language and gesture recognition. Solutions to the hand-tracking problem do exist, and some of them work very well, but traditional approaches often require lengthy setup procedures, controlled environments, and expensive hardware.
The most effective systems typically rely on handheld trackers that are localized in three-dimensional space by a set of anchor devices installed around the perimeter of a room. Other systems use fixed cameras to track the hands, which works as long as the hands remain within the cameras' view. While these methods are generally accurate, they cannot be used on the go, and they are often expensive. For these reasons, wearable solutions have been explored, but these have two significant drawbacks: the sensors tend to protrude from the body in ways that would be unacceptable to most real-world users, and they tend to rely on cameras. Cameras may be fine for some applications, but few people want a camera pointed at them all day.
Real time output from the processing pipeline (📷: N. DeVrio)
A team of engineers from Carnegie Mellon University has developed a new approach to unobtrusive, continuous hand tracking that may rival the effectiveness of existing methods without their drawbacks. Suffering from a bout of disco fever, the team named their invention the DiscoBand; luckily, the fever did not hinder their engineering skills. Their wristband-mounted device carries a total of 16 tiny depth cameras, each with a resolution of just 8 x 8 pixels. Half of the sensors face the hand, while the other half view the arm, the upper body, and the surrounding environment. This yields a total of 1,024 3D point measurements, which is enough to build a good picture of the hand's pose even when some sensors are obstructed. Because the sensors are so small, the wristband can sit nearly flush against the wrist. Moreover, the coarse data captured by the 64-pixel cameras amounts to little more than rough blobs, which protects the privacy of the wearer.
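To get a feel for the numbers involved, here is a minimal sketch of how 16 low-resolution depth frames could be back-projected and merged into a single 1,024-point cloud. The field of view, focal-length math, and simulated frames are illustrative assumptions, not the DiscoBand's actual firmware or API.

```python
import numpy as np

NUM_SENSORS = 16
RES = 8  # each depth camera is 8 x 8 pixels

def depth_to_points(depth, fov_deg=60.0):
    """Back-project an 8x8 depth frame into camera-space 3D points.

    Uses a simple pinhole model; fov_deg is an assumed field of view.
    """
    f = (RES / 2) / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    us, vs = np.meshgrid(np.arange(RES) - RES / 2 + 0.5,
                         np.arange(RES) - RES / 2 + 0.5)
    x = us * depth / f
    y = vs * depth / f
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Random depth values stand in for real sensor reads (5 cm to 50 cm range).
frames = [np.random.uniform(0.05, 0.5, (RES, RES)) for _ in range(NUM_SENSORS)]
cloud = np.concatenate([depth_to_points(d) for d in frames])
print(cloud.shape)  # (1024, 3): 16 sensors x 64 pixels each
```

Even this tiny cloud conveys why the approach preserves privacy: 64 pixels per sensor is far too coarse to resolve anything beyond rough shapes.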
The main use cases the team wanted to support with the DiscoBand were arm and hand tracking. Accordingly, they built a prototype of the wristband and worked out the details of the data processing pipeline. To enable arm tracking, they used data from the eight outward-facing depth cameras. This data was analyzed to extract the most salient features, which were then fed into a machine learning regression model that predicts the three-dimensional positions of the left hand, left elbow, left shoulder, right shoulder, left hip, and right hip.
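The pipeline above can be sketched as a regression from flattened depth features to keypoint coordinates. The shapes, stand-in data, and the use of plain linear least squares are all assumptions for illustration; the team's actual feature extraction and learned model are not specified here.

```python
import numpy as np

KEYPOINTS = ["l_hand", "l_elbow", "l_shoulder", "r_shoulder", "l_hip", "r_hip"]
N_FEATURES = 8 * 8 * 8          # 8 outward-facing sensors x 64 depth pixels
N_OUTPUTS = len(KEYPOINTS) * 3  # x, y, z per keypoint

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, N_FEATURES))  # stand-in depth features
Y_train = rng.normal(size=(200, N_OUTPUTS))   # stand-in 3D keypoint labels

# Fit a linear map W minimizing ||X_train @ W - Y_train||^2 as a toy
# substitute for the team's regression model.
W, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)

# Predict keypoints for one new frame, reshaped to (6 keypoints, 3 coords).
x_new = rng.normal(size=(1, N_FEATURES))
pred = (x_new @ W).reshape(len(KEYPOINTS), 3)
print(pred.shape)  # (6, 3)
```

In practice a nonlinear model would be used, but the input/output structure, 512 depth features in, 18 coordinates out, stays the same.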
To validate the device, the team conducted a study with ten participants, who were asked to perform a variety of predefined arm postures. To capture ground truth measurements, a web camera was set up and MediaPipe Pose software was used to extract the arm's keypoints. These were compared with the keypoints estimated by the DiscoBand, revealing an average error of 5.88 centimeters across all upper-body points.
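The evaluation metric amounts to a mean Euclidean distance between estimated and ground-truth keypoints. Below is a hedged sketch of that computation; the arrays are made-up stand-ins (the actual study reported 5.88 cm), and the function name is our own.

```python
import numpy as np

def mean_keypoint_error_cm(pred, truth):
    """Mean Euclidean distance between predicted and ground-truth keypoints.

    pred, truth: (n_frames, n_keypoints, 3) arrays in centimeters.
    """
    return float(np.linalg.norm(pred - truth, axis=-1).mean())

rng = np.random.default_rng(1)
truth = rng.uniform(0, 100, size=(50, 6, 3))            # fake ground truth
pred = truth + rng.normal(scale=3.0, size=truth.shape)  # fake estimates

print(round(mean_keypoint_error_cm(pred, truth), 2))
```

Averaging per-point distances over frames and keypoints is the standard way to summarize pose-estimation accuracy as a single number.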
In the future, the researchers hope to see their technology incorporated into existing smartwatches. From there, they envision it enabling many additional use cases, including ad hoc gesture tracking and object recognition.