Why AI Models Are Now Training on Synthetic Data

Episode 51 of AI Business with Fexingo explores the shift from human-annotated to synthetic training data. Lucas and Luna break down a June 2026 report showing that 60% of AI model training data is now AI-generated. They discuss Anthropic's suspension of new model access in India, the KPMG hallucination scandal, and why companies like OpenAI and Meta are turning to synthetic data despite risks like model collapse. Specific examples include Microsoft's Phi-4 model and the concept of 'distillation loops.' The hosts also address the economics: synthetic data cuts annotation costs by up to 90%, but introduces new quality control challenges. Tune in for a grounded look at how AI is learning from itself.

#SyntheticData #AITraining #Anthropic #KPMG #ModelCollapse #DataAnnotation #MicrosoftPhi4 #OpenAI #Meta #Distillation #AIQuality #TrainingData #Business #Technology #FexingoBusiness #BusinessPodcast #AI #June2026

Keep every episode free: buymeacoffee.com/fexingo
