Artificial intelligence excels at sifting through vast amounts of data and identifying patterns and trends. Training machine learning models, however, typically requires large datasets.
Researchers are exploring AI applications in areas such as analyzing X-ray images for rare conditions or identifying rare fish species, but in these domains the scarcity of data makes it difficult to train algorithms accurately.
Jenq-Neng Hwang, a professor of electrical and computer engineering at the University of Washington (UW), aims to address these challenges. His team developed a method to train AI to monitor the wide range of poses infants can achieve, which is crucial for assessing developmental milestones. Despite the limited data available on babies, the team devised a pipeline to ensure the accuracy and utility of their algorithm. The work was recently presented at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024.
In an interview, Hwang explains why tracking baby poses matters, particularly for the early detection of developmental disorders like autism. Traditional assessment relies on manual observation by doctors, which can be time-consuming and subjective; AI-powered baby monitors could instead provide continuous, consistent monitoring and valuable insights into infant development.
Hwang emphasizes that AI suits this task because it can learn and adapt: where traditional image processing struggles with variation, AI models can handle complex tasks like pose recognition. The scarcity of annotated data remains a challenge, however. To address it, Hwang's team pretrained a generic pose model on abundant data and then finetuned it with their limited annotated baby data, yielding reliable results.
Jenq-Neng Hwang: “We don’t have a lot of 3D pose annotations of baby videos to train the machine learning model for privacy reasons. It’s also difficult to create a dataset where a baby is performing all the possible potential poses that we would need. Our datasets are too small, meaning that a model trained with them would not estimate reliable poses. But we do have a lot of annotated 3D motion sequences of people in general. So, we developed this pipeline. First we used the large amount of 3D motion sequences of regular people to train a generic 3D pose generative AI model, which is similar to the model used in ChatGPT and other GPT-4 types of large language models. We then finetuned our generic model with our very limited dataset of annotated baby motion sequences. The generic model can then adapt to the small dataset and produce high quality results.”
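In code, the pretrain-then-finetune pattern Hwang describes looks roughly like the sketch below. This is a minimal illustration of the general technique, not the team's actual pipeline: the toy model, random stand-in datasets, and hyperparameters are all assumptions (the real system is a generative 3D pose model trained on motion sequences).

```python
# Minimal sketch of pretrain-then-finetune transfer learning. Illustrative only:
# the model, datasets, and hyperparameters are placeholders, not the UW team's code.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class PoseLifter(nn.Module):
    """Toy pose estimator: lifts 2D keypoints (17 x 2) to 3D joints (17 x 3)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(17 * 2, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 17 * 3),
        )

    def forward(self, keypoints_2d):
        return self.net(keypoints_2d.flatten(1))

def train(model, loader, epochs, lr):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x2d, y3d in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x2d), y3d.flatten(1))
            loss.backward()
            optimizer.step()

# Random stand-ins: a large corpus of adult motion data and a tiny baby dataset.
adult_data = TensorDataset(torch.randn(5000, 17, 2), torch.randn(5000, 17, 3))
baby_data = TensorDataset(torch.randn(200, 17, 2), torch.randn(200, 17, 3))

model = PoseLifter()
train(model, DataLoader(adult_data, batch_size=64), epochs=5, lr=1e-3)  # 1) pretrain on abundant data
train(model, DataLoader(baby_data, batch_size=16), epochs=5, lr=1e-4)   # 2) finetune on scarce data
```

The lower learning rate in the second pass is a common choice: it nudges the model toward the small baby dataset without erasing what it learned from the large one.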
Hwang also discusses other scenarios where AI shows promise but training data is scarce, such as diagnosing rare diseases from X-rays or teaching autonomous driving systems to handle unforeseen events. His team tackles these challenges by leveraging generative AI and combining data from multiple sources, potentially paving the way for new solutions.
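One common pattern for combining data from multiple sources, sketched below as a guess at the general technique rather than Hwang's specific method, is to pool several small real datasets and pad them with synthetic samples drawn from a trained generative model.

```python
# Hypothetical sketch: pooling scarce data from multiple sources and padding it
# with synthetic samples from a generative model. Names and shapes are assumptions.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-ins for small annotated 3D-pose datasets from two different sources.
source_a = TensorDataset(torch.randn(150, 17, 3))
source_b = TensorDataset(torch.randn(80, 17, 3))

# Stand-in for a trained generative model's decoder (e.g., the decoder of a
# pose VAE): here a single linear layer mapping random latents to 3D poses.
decoder = torch.nn.Linear(32, 17 * 3)
with torch.no_grad():
    synthetic_poses = decoder(torch.randn(1000, 32)).view(-1, 17, 3)

# Pool real and synthetic samples into one training set.
combined = ConcatDataset([source_a, source_b, TensorDataset(synthetic_poses)])
loader = DataLoader(combined, batch_size=64, shuffle=True)
```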
Things I’m reading today
Meta is set to sunset Facebook News in early April for users in the U.S. and Australia, extending its trend of reducing emphasis on news and politics (Via Sallee Ann Harrison/AP)
Reports indicate that Microsoft and OpenAI are planning a $100 billion data center project, including a U.S.-based supercomputer named "Stargate" to bolster OpenAI's product lineup (Via Arsheeya Bajwa, Paul Simao and David Gregorio/Reuters)
OpenAI introduces Voice Engine, a technology enabling users to create synthetic replicas of voices from a mere 15-second sample. This innovation is accessible to approximately 100 partners, including HeyGen (Via Kyle Wiggers/TechCrunch)