WAVEX Startup Challenge 2025
"Kalaa-Setu – Real-Time Content Generation Tech for Bharat"
(AI-Powered Text-to-Content (Video/Audio/Graphics) Generation for Government Communication)
Problem Statement
Develop an AI-powered solution that transforms user-input text into multimedia content (videos, audio narratives, and visual graphics) tailored for public communication, governance, and citizen engagement. The solution should provide the following:
- Text-to-Video generation based on the conditions, environment, tone, and domain provided by the user.
- Graphics generation based on different sets of parameters.
- Audio generation based on the context, environment, and usage defined by the user.
The solution should be modular, adaptive, and domain-aware — making it useful for public information campaigns, creative storytelling, journalism, education, and more.
Challenge Brief: The delivered solution should provide the following:
1. Text-to-Video Generation
- Convert user-input text into coherent video content.
- Must allow customization based on:
- Tone: (e.g. formal, casual, emotional, documentary, etc.)
- Domain: (e.g. education, health, governance, entertainment, etc.)
- Environment/Setting: (e.g. rural, urban, futuristic, nature, indoors)
- Should support avatar-based narration or scene generation using AI (optional bonus).
- Optional: Include subtitles, background scores, and smooth transitions automatically.
2. Text-to-Graphics Generation
- Generate infographics, visual charts, illustrations, or storyboards based on input.
- Must allow parameter-based generation (e.g. data values, tone, color scheme, subject).
3. Text-to-Audio Generation
- Synthesize high-quality audio using:
- Context-aware voice modulation
- Language and accent options
- Background ambience where required (e.g. a news studio, open environment)
- Should support emotion-aware speech synthesis (e.g. serious, excited, calm).
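The audio requirements above (context, language, emotion, ambience) map naturally onto a structured request object. The sketch below is a minimal, hypothetical JSON payload for a text-to-audio endpoint; the field names ("emotion", "ambience", etc.) are illustrative assumptions, not mandated by this brief.

```python
import json

# Hypothetical request payload for a text-to-audio endpoint.
# Field names are illustrative, not specified by the challenge.
audio_request = {
    "text": "The new health scheme covers all rural districts.",
    "language": "hi",            # ISO 639-1 code (here: Hindi)
    "accent": "standard",
    "emotion": "serious",        # emotion-aware synthesis hint
    "ambience": "news_studio",   # optional background ambience
}

payload = json.dumps(audio_request, ensure_ascii=False)
print(payload)
```

A schema like this keeps the tone/environment/usage parameters explicit and machine-checkable before they reach the synthesis model.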
Prototype-to-Production Outcome Expectations (Desirable but not mandatory)
- Accuracy: Prompt-to-output alignment, measured via BLEU for text (≥95%) and CLIPScore or similar for visuals/audio (≥85% semantic fidelity)
- Latency: Generation Latency ≤ 1s (audio/graphics); ≤ 2–3s for short video; include runtime benchmarks
- Output Formats: Subtitles (SRT), voice overlays (MP3/WAV), graphics (PNG/SVG), REST API, multilingual PDFs (optional)
- Scalability: Multilingual, multi-format scalability; load-tested; API-integrated; optional container/cloud-ready.
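Since the latency targets above call for included runtime benchmarks, a simple harness like the following can produce them. This is a generic sketch using only the standard library; `fn` stands in for whatever generation call (audio, graphics, or video segment) a team actually benchmarks.

```python
import statistics
import time

def benchmark(fn, runs=5):
    """Time a generation function over several runs and return the
    median wall-clock latency in seconds. Median is reported rather
    than mean to reduce the effect of one-off warm-up spikes."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return statistics.median(latencies)

# Stand-in workload; replace with a real model inference call.
median_s = benchmark(lambda: sum(range(100_000)))
print(f"median latency: {median_s:.4f}s")
```

Reporting the median over several warm runs gives a fairer comparison against the ≤1 s and ≤2-3 s targets than a single cold-start measurement.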
Technical & Submission Requirements
Mandatory Features
| Category | Specification |
|---|---|
| Language & Prompt Support | The system must accept input prompts in Hindi, English, and at least two regional languages (preferable but not mandatory). Prompts may include optional instructions on tone, environment, or domain (e.g., health, education, rural governance). |
| AI Models (Suggested) | Participants may use any of the following or equivalent open-source models. Text-to-Video: Zeroscope v2, ModelScope T2V, CogVideo, AnimateDiff. Text-to-Audio: Bark, Tortoise, Coqui TTS, ESPnet, VITS. Text-to-Graphics: SDXL, DeepFloyd IF, Kandinsky 2.2, PixArt-α (or similar engines with comparable cost and technical benefits). |
| System Architecture | Preferred architecture is REST API or modular microservices with JSON-based output and timestamp mapping for generated frames or scenes. However, startups may choose any architecture that is scalable and testable. |
| Performance Metrics | Submit measured benchmarks for: 1. Inference latency (time per video segment) 2. Throughput (videos/minute or frames/sec) 3. Memory/compute footprint 4. Scene-level prompt fidelity (manual or CLIPScore-based) 5. Cost of generating a 1-minute video clip containing at least 10 distinct frames/visuals. |
| Integration Capability | Demonstrate system readiness to connect with AIR, DD, PIB, or other government platforms using standard formats such as MP4, WebM, or HLS streaming, and expose REST endpoints for automated input/output. |
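The preferred architecture row above asks for JSON output with timestamp mapping for generated frames or scenes. One possible response shape is sketched below; the exact schema (field names, units) is an assumption for illustration, not part of the specification.

```python
import json

# Illustrative JSON response with per-scene timestamp mapping.
# The schema is an assumption; the brief only requires JSON output
# with timestamps for generated frames or scenes.
response = {
    "video": "output/press_release_001.mp4",
    "duration_s": 22.5,
    "scenes": [
        {"index": 0, "start_s": 0.0, "end_s": 7.5,
         "prompt": "Minister addressing a press conference, formal tone"},
        {"index": 1, "start_s": 7.5, "end_s": 15.0,
         "prompt": "Infographic of scheme coverage, rural setting"},
        {"index": 2, "start_s": 15.0, "end_s": 22.5,
         "prompt": "Citizens enrolling at a village service centre"},
    ],
}
print(json.dumps(response, indent=2))
```

Keeping per-scene prompts alongside timestamps also makes the scene-level fidelity metrics in the table straightforward to compute and audit.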
Minimum Viable Concept (MVC) Demo
Submit a 7-minute video showing:
- Live demo: Generate a short video (15–30 seconds) from a real PIB press release or a comparable public information brief.
- Metrics display: During the demo, clearly show: a. Inference time / Latency per video b. Resource usage (RAM, GPU, CPU) c. Frame rate (FPS) and rendering speed d. Scene fidelity (manual scoring or CLIPScore/semantic match metrics)
- Architecture overview: Briefly walk through your stack, AI engine, pipeline, and pre-post processing (if any).
- Optional: Support for additional languages.
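For the scene-fidelity metric in the demo checklist, a real submission would embed prompts and frames with a vision-language model (as CLIPScore does). As a purely illustrative stand-in that runs without any ML dependencies, the sketch below scores text-to-text overlap with bag-of-words cosine similarity; it is not a substitute for CLIPScore, only a shape for how a fidelity score plugs into the metrics display.

```python
from collections import Counter
from math import sqrt

def token_similarity(prompt: str, caption: str) -> float:
    """Crude bag-of-words cosine similarity between a scene prompt
    and a caption of the generated scene. A stand-in for
    CLIPScore-style prompt fidelity, for demonstration only."""
    a, b = Counter(prompt.lower().split()), Counter(caption.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

print(token_similarity("farmers receiving subsidy at a rural bank",
                       "rural bank scene with farmers collecting subsidy"))
```

Whatever metric is used, showing the per-scene score on screen during the demo satisfies item (d) of the metrics display.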
Startup Submission Package (shared via Google Drive or public link)
| File | Description |
|---|---|
| 01_MVC_Demo_{startup_name}.mp4 | Demo video as per above specs |
| 02_Tech_Deck_{startup_name}.pdf | Architecture, models used, optimizations, cost-efficiency |
| 03_Cost_Sheet_{startup_name}.xlsx | Video generation cost breakdown (as per specs in this doc) |
| 04_Team_Profile_{startup_name}.pdf | Startup team bios, GitHub/LinkedIn links, past experience |
| 05_Roadmap_{startup_name}.pdf | 6-month go-to-production plan with milestones |
| Optional_{startup_name}.zip | GitHub repo link, extra demo clips, benchmarking logs |
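Before uploading, the naming scheme in the table above can be validated mechanically. The checker below is a small sketch; the `\w+` stand-in for `{startup_name}` is an assumption (a name containing hyphens or spaces would need a broader pattern).

```python
import re

# Required filename patterns mirroring the submission table
# (the optional archive is excluded). "\w+" approximates
# {startup_name} and is an assumption about allowed characters.
REQUIRED = [
    r"01_MVC_Demo_\w+\.mp4",
    r"02_Tech_Deck_\w+\.pdf",
    r"03_Cost_Sheet_\w+\.xlsx",
    r"04_Team_Profile_\w+\.pdf",
    r"05_Roadmap_\w+\.pdf",
]

def check_package(filenames):
    """Return the required patterns that no supplied file matches."""
    return [p for p in REQUIRED
            if not any(re.fullmatch(p, f) for f in filenames)]

missing = check_package([
    "01_MVC_Demo_AcmeAI.mp4", "02_Tech_Deck_AcmeAI.pdf",
    "03_Cost_Sheet_AcmeAI.xlsx", "04_Team_Profile_AcmeAI.pdf",
])
print(missing)  # the roadmap PDF pattern is unmatched
```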
Upload all files to Google Drive / GitHub / Dropbox and share a public view-only link while submitting on https://wavex.wavesbazaar.com
Challenge Timeline (60 Days)
- Registration Opens: 7th July 2025
- Final Submission of POC: 30th July 2025
- Evaluation and Final Rounds: To be announced
Preliminary Startup Scoring Rubric (Total: 100 Points)
| Criterion | Description | Max Score |
|---|---|---|
| Technical Competence | Quality of generated videos in terms of prompt alignment, scene accuracy, rendering speed, visual coherence, and multilingual handling. Includes effective use of state-of-the-art AI models. | 25 |
| Scalability & Integration Readiness | Ability to scale across languages, domains (health, governance, etc.), and formats (video with subtitles, narration). Must support modular deployment and API integration with AIR/DD/PIB systems. | 20 |
| Cost-Effectiveness | Estimated cost per minute of generated video (incl. compute, model inference, storage); use of open-source tools or optimized architecture for sustainable deployment. | 20 |
| Innovation & Uniqueness | Creativity in approach: use of avatars, emotion-aware narration, scene composition techniques, or novel stylistic outputs (e.g., animated infographics). | 15 |
| Team Capability & Background | Proven experience in AI/ML, media-tech, animation, or video generation; prior work on generative systems, scalable deployments, or GovTech projects. | 5 |
| Clarity of Proposal & Demo | Clearly articulated objectives, milestones, and technical roadmap. Demo should be structured, visually informative, and compelling. | 5 |
| Feasibility & Time to Pilot | Realistic timeline for pilot readiness, current TRL (Technology Readiness Level), and demonstrated ability to meet near-term deadlines. | 5 |
| Policy/Compliance Awareness | Awareness of government data compliance (e.g., content filtering, language appropriateness), openness to data localization, and ethical generation norms. | 5 |
Based on the above scoring, the 5 finalist teams will be called to Delhi to showcase their final solution before the jury.
Final Jury Evaluation (100 Points)
| Criterion | Max Score |
|---|---|
| Live Demo Quality & Visual Accuracy | 25 |
| Multilingual & Cultural Adaptability | 20 |
| Scalability, Architecture & Integration | 15 |
| Operational Cost Metrics | 10 |
| Innovation in Video Generation | 10 |
| Documentation & Technical Clarity | 5 |
| Jury Q&A Performance | 15 |
Prize & Outcome
- One selected startup will receive a formal MoU to develop the full solution under WAVEX & Ministry supervision.
- Support for pilot integration with AIR, DD, PIB
- Showcase at national media & innovation forums.
- Incubation opportunity under WaveX, including workspace until the final product is developed.
Application Information
Application Method: Register on the WAVEX portal and select the "KalaaSetu – Real-Time Content Generation Tech for Bharat" challenge.
Submission Deadline: July 30th, 2025, 23:59 IST


