WAVEX Startup Challenge 2025

"Kalaa-Setu – Real-Time Content Generation Tech for Bharat"

(AI-Powered Text-to-Content (Video/Audio/Graphics) Generation for Government Communication)

Problem Statement

Develop an AI-powered solution that transforms user-input text into multimedia content (videos, audio narratives, and visual graphics) tailored for public communication, governance, and citizen engagement. The solution should provide the following:

  1. Text to Video generation based on the conditions, environment, tone, and domain provided by the user.
  2. Graphics generation based on different sets of parameters.
  3. Audio Generation based on the context, environment and usage defined by the user.

The solution should be modular, adaptive, and domain-aware — making it useful for public information campaigns, creative storytelling, journalism, education, and more.

Challenge Brief: The delivered solution should provide the following.

1. Text-to-Video Generation

  1. Convert user-input text into coherent video content.
  2. Must allow customization based on:
    • Tone: (e.g. formal, casual, emotional, documentary, etc.)
    • Domain: (e.g. education, health, governance, entertainment, etc.)
    • Environment/Setting: (e.g. rural, urban, futuristic, nature, indoors)
  3. Should support avatar-based narration or scene generation using AI (optional bonus).
  4. Optional: Include subtitles, background scores, and smooth transitions automatically.
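
The customization options above could be modeled as a small validated request object. The sketch below is illustrative only: the class name `VideoRequest`, its field names, its defaults, and the allowed-value sets (copied from the examples in this brief, which are not exhaustive) are assumptions, not part of the challenge specification.

```python
from dataclasses import dataclass

# Example values quoted from the brief; a real system would likely accept more.
TONES = {"formal", "casual", "emotional", "documentary"}
DOMAINS = {"education", "health", "governance", "entertainment"}
ENVIRONMENTS = {"rural", "urban", "futuristic", "nature", "indoors"}


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical text-to-video request carrying the three customization axes."""
    text: str
    tone: str = "formal"
    domain: str = "governance"
    environment: str = "urban"
    subtitles: bool = False  # subtitles are an optional feature in the brief

    def __post_init__(self):
        # Reject empty prompts and any value outside the example sets.
        if not self.text.strip():
            raise ValueError("text must not be empty")
        checks = (("tone", TONES), ("domain", DOMAINS), ("environment", ENVIRONMENTS))
        for name, allowed in checks:
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"unsupported {name}: {value!r}")


request = VideoRequest(
    "Vaccination drive begins Monday.",
    tone="formal",
    domain="health",
    environment="rural",
)
```

Validating at construction time means an unsupported tone or domain fails before any expensive generation step starts.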

2. Text-to-Graphics Generation

  1. Generate infographics, visual charts, illustrations, or storyboards based on input.
  2. Must allow parameter-based generation (e.g. data values, tone, color scheme, subject).

3. Text-to-Audio Generation

  1. Synthesize high-quality audio using:
    • Context-aware voice modulation
    • Language and accent options
    • Background ambience where required (e.g. a news studio, open environment)
  2. Should support emotion-aware speech synthesis (e.g. serious, excited, calm).

Prototype-to-Production Outcome Expectations (Desirable but not mandatory)

  1. Accuracy: Prompt-to-Output Alignment with CLIPScore or BLEU (≥95% for text; ≥85% semantic fidelity for visuals/audio)
  2. Latency: Generation Latency ≤ 1s (audio/graphics); ≤ 2–3s for short video; include runtime benchmarks
  3. Output Formats: Subtitles (SRT), voice overlays (MP3/WAV), graphics (PNG/SVG), REST API, multilingual PDFs (optional)
  4. Scalability: Multilingual, multi-format scalability; load-tested; API-integrated; optional container/cloud-ready.
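
Of the output formats listed above, SRT subtitles are simple enough to emit without any library. The following is a minimal sketch, assuming subtitle cues arrive as `(start_seconds, end_seconds, text)` tuples; the function names `srt_timestamp` and `to_srt` are illustrative.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Render (start_seconds, end_seconds, text) tuples as an SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Each cue: sequence number, time range, text, then a blank line.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


subtitles = to_srt([(0, 2.5, "Hello"), (2.5, 5, "World")])
```

Because the file is written as UTF-8 text, the same routine can carry Hindi or regional-language subtitle lines unchanged.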

Technical & Submission Requirements

Mandatory Features

Category | Specification
Language & Prompt Support | The system must accept input prompts in Hindi, English, and at least two regional languages (preferable but not mandatory). Prompts may include optional instructions on tone, environment, or domain (e.g., health, education, rural governance).
AI Models (Suggested) | Participants may use any of the following or equivalent open-source models (or similar engines with the same cost and technical benefits). Text-to-Video: Zeroscope v2, ModelScope T2V, CogVideo, AnimateDiff. Text-to-Audio: Bark, Tortoise, Coqui TTS, ESPnet, VITS. Text-to-Graphics: SDXL, DeepFloyd IF, Kandinsky 2.2, PixArt-α.
System Architecture | Preferred architecture is REST API or modular microservices with JSON-based output and timestamp mapping for generated frames or scenes. However, startups may choose any architecture that is scalable and testable.
Performance Metrics | Submit measured benchmarks of: (1) Inference latency (time per video segment); (2) Throughput (videos/minute or frames/sec); (3) Memory/compute footprint; (4) Scene-level prompt fidelity (manual or CLIPScore-based); (5) Cost of generating a 1-minute video clip that includes at least 10 different frames/visuals.
Integration Capability | Demonstrate system readiness to connect with AIR, DD, PIB, or other government platforms using standard formats such as MP4, WebM, or HLS streaming, and expose REST endpoints for automated input/output.
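
The "JSON-based output and timestamp mapping for generated frames or scenes" mentioned under System Architecture is not specified further in this brief. One possible shape is sketched below; the function name `build_manifest`, every field name, and the 24 fps default are assumptions made for illustration.

```python
import json


def build_manifest(scenes, fps=24):
    """Map each scene to start/end timestamps and a frame range.

    `scenes` is a list of (prompt, duration_seconds) pairs.
    All field names here are illustrative, not a mandated schema.
    """
    entries = []
    cursor = 0.0  # running start time of the next scene, in seconds
    for index, (prompt, duration) in enumerate(scenes):
        start, end = cursor, cursor + duration
        entries.append({
            "scene": index,
            "prompt": prompt,
            "start_s": round(start, 3),
            "end_s": round(end, 3),
            "first_frame": int(round(start * fps)),
            "last_frame": int(round(end * fps)) - 1,
        })
        cursor = end
    return {"fps": fps, "duration_s": round(cursor, 3), "scenes": entries}


manifest = build_manifest([
    ("Village health camp", 4.0),
    ("Doctor addressing crowd", 6.0),
])
print(json.dumps(manifest, indent=2, ensure_ascii=False))
```

A manifest like this lets a downstream consumer align subtitles or narration with scene boundaries without decoding the video.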

Minimum Viable Concept (MVC) Demo

Submit a 7-minute video showing:

  1. Live demo: Generate a short video (15–30 seconds) from a real PIB press release or a comparable public information brief.
  2. Metrics display: During the demo, clearly show:
    a. Inference time / latency per video
    b. Resource usage (RAM, GPU, CPU)
    c. Frame rate (FPS) and rendering speed
    d. Scene fidelity (manual scoring or CLIPScore/semantic match metrics)
  3. Architecture overview: Briefly walk through your stack, AI engine, pipeline, and pre-post processing (if any).
  4. Optional: Support for different languages.
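
The latency and frame-rate figures requested for the demo could be collected with a small timing harness. The sketch below uses only the Python standard library; `benchmark` and `fake_generate` are hypothetical names, and `fake_generate` merely sleeps in place of a real model. Note that `tracemalloc` only tracks Python-level allocations, so GPU and native memory would need a separate tool.

```python
import time
import tracemalloc


def benchmark(generate, prompt, frames_expected, runs=3):
    """Time a generation callable; report latency, FPS, and peak Python memory.

    `generate` is any callable taking a prompt; it stands in for a real model.
    """
    latencies = []
    tracemalloc.start()
    for _ in range(runs):
        started = time.perf_counter()
        generate(prompt)
        latencies.append(time.perf_counter() - started)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    mean_latency = sum(latencies) / len(latencies)
    return {
        "mean_latency_s": mean_latency,
        "max_latency_s": max(latencies),
        "fps": frames_expected / mean_latency,
        "peak_python_mem_mb": peak_bytes / 1_000_000,
    }


def fake_generate(prompt):
    """Placeholder generator: sleeps briefly instead of running a model."""
    time.sleep(0.01)
    return [b"frame"] * 24


report = benchmark(fake_generate, "PIB press release text", frames_expected=24)
```

Averaging over several runs reduces the effect of one-off warm-up costs such as model loading.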

Startup Submission Package (shared via Google Drive or public link)

File | Description
01_MVC_Demo_{startup_name}.mp4 | Demo video as per above specs
02_Tech_Deck_{startup_name}.pdf | Architecture, models used, optimizations, cost-efficiency
03_Cost_Sheet_{startup_name}.xlsx | Video generation cost breakdown (as per specs in this doc)
04_Team_Profile_{startup_name}.pdf | Startup team bios, GitHub/LinkedIn links, past experience
05_Roadmap_{startup_name}.pdf | 6-month go-to-production plan with milestones
Optional_{startup_name}.zip | GitHub repo link, extra demo clips, benchmarking logs
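
For the cost sheet, the per-minute generation cost can be estimated by scaling the cost of one clip to 60 seconds of output. This is a rough sketch, not a prescribed costing method: the function name, its parameters, and all numbers in the example are hypothetical, and it ignores costs such as bandwidth or idle GPU time.

```python
def cost_per_video_minute(gpu_seconds_per_clip, clip_seconds,
                          gpu_rate_per_hour, storage_cost_per_clip=0.0):
    """Estimate the cost of one minute of generated video.

    Computes the compute cost of a single clip from GPU time and hourly rate,
    adds per-clip storage, then scales by the number of clips in 60 seconds.
    """
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    compute_cost = gpu_seconds_per_clip / 3600 * gpu_rate_per_hour
    clips_per_minute = 60 / clip_seconds
    return (compute_cost + storage_cost_per_clip) * clips_per_minute


# Hypothetical figures: ten 6-second clips per minute, each needing
# 90 GPU-seconds on hardware billed at 100 currency units per hour.
estimate = cost_per_video_minute(
    gpu_seconds_per_clip=90, clip_seconds=6, gpu_rate_per_hour=100
)
```

With these made-up inputs the estimate comes to about 25 currency units per minute of video; the ten-clip split mirrors the requirement of at least 10 different frames/visuals in a 1-minute video.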

Upload all files to Google Drive / GitHub / Dropbox and share a public view-only link while submitting on https://wavex.wavesbazaar.com

Challenge Timeline (60 Days)

  1. Registration Opens: 7th July 2025
  2. Final Submission of POC: 30th July 2025
  3. Evaluation and Final Rounds: To be announced

Preliminary Startup Scoring Rubric (Total: 100 Points)

Criterion | Description | Max Score
Technical Competence | Quality of generated videos in terms of prompt alignment, scene accuracy, rendering speed, visual coherence, and multilingual handling. Includes effective use of state-of-the-art AI models. | 25
Scalability & Integration Readiness | Ability to scale across languages, domains (health, governance, etc.), and formats (video with subtitles, narration). Must support modular deployment and API integration with AIR/DD/PIB systems. | 20
Cost-Effectiveness | Estimated cost per minute of generated video (incl. compute, model inference, storage); use of open-source tools or optimized architecture for sustainable deployment. | 20
Innovation & Uniqueness | Creativity in approach: use of avatars, emotion-aware narration, scene composition techniques, or novel stylistic outputs (e.g., animated infographics). | 15
Team Capability & Background | Proven experience in AI/ML, media-tech, animation, or video generation; prior work on generative systems, scalable deployments, or GovTech projects. | 5
Clarity of Proposal & Demo | Clearly articulated objectives, milestones, and technical roadmap. Demo should be structured, visually informative, and compelling. | 5
Feasibility & Time to Pilot | Realistic timeline for pilot readiness, current TRL (Technology Readiness Level), and demonstrated ability to meet near-term deadlines. | 5
Policy/Compliance Awareness | Awareness of government data compliance (e.g., content filtering, language appropriateness), openness to data localization, and ethical generation norms. | 5

Based on the above, the 5 finally selected teams will be called to Delhi to showcase their final

Final Jury Evaluation (100 Points)

Criterion | Max Score
Live Demo Quality & Visual Accuracy | 25
Multilingual & Cultural Adaptability | 20
Scalability, Architecture & Integration | 15
Operational Cost Metrics | 10
Innovation in Video Generation | 10
Documentation & Technical Clarity | 5
Jury Q&A Performance | 15

Prize & Outcome

  1. One selected startup will receive a formal MoU to develop the full solution under WAVEX & Ministry supervision.
  2. Support for pilot integration with AIR, DD, and PIB.
  3. Showcase at national media & innovation forums.
  4. Incubation opportunity under WAVEX, including space, until development of the final product.

Application Information

Application Method: Register on the WAVEX portal and select the "Kalaa-Setu – Real-Time Content Generation Tech for Bharat" challenge.

Submission Deadline: July 30th, 2025, 23:59 IST