All startups registered for the Challenge are requested to update their details in the profile section, as new fields have been added. Startups that have not submitted their MVC and other required details are also requested to update their information and upload the necessary documents using the following link: Wavex Challenge. All details must be submitted by 9th August 2025, as evaluations will begin on 10th August 2025.
(AI-Powered Text to Content (Video/ Audio/ Graphics) Generation for Government Communication)
Develop an AI-powered solution that transforms user-input text into multimedia content (videos, audio narratives, and visual graphics) tailored for public communication, governance, and citizen engagement. The solution should provide the following:
The solution should be modular, adaptive, and domain-aware, making it useful for public information campaigns, creative storytelling, journalism, education, and more.
Mandatory Features
| Category | Specification |
|---|---|
| Language & Prompt Support | The system must accept input prompts in Hindi, English, and at least two regional languages (preferable but not mandatory). Prompts may include optional instructions on tone, environment, or domain (e.g., health, education, rural governance). |
| AI Models (Suggested) | Participants may use any of the following or equivalent open-source models. Text-to-Video: Zeroscope v2, ModelScope T2V, CogVideo, AnimateDiff. Text-to-Audio: Bark, Tortoise, Coqui TTS, ESPnet, VITS. Text-to-Graphics: SDXL, DeepFloyd IF, Kandinsky 2.2, PixArt-α (or similar engines with the same cost and technical benefits). |
| System Architecture | Preferred architecture is REST API or modular microservices with JSON-based output and timestamp mapping for generated frames or scenes. However, startups may choose any architecture that is scalable and testable. |
| Performance Metrics | Submit measured benchmarks of: 1. Inference latency (time per video segment); 2. Throughput (videos/minute or frames/sec); 3. Memory/compute footprint; 4. Scene-level prompt fidelity (manual or CLIPScore-based); 5. Cost of generating a 1-minute video clip that includes at least 10 different frames/visuals. |
| Integration Capability | Demonstrate system readiness to connect with AIR, DD, PIB, or other government platforms using standard formats such as MP4, WebM, or HLS streaming, and expose REST endpoints for automated input/output. |
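The System Architecture row asks for JSON-based output with timestamp mapping for generated frames or scenes. A minimal sketch of what such a response body might look like is shown here. The field names (`job_id`, `video_url`, `scenes`, `start_ms`, `end_ms`) and the `build_manifest` helper are illustrative assumptions, not a schema defined by the challenge.

```python
import json
from dataclasses import dataclass


@dataclass
class Scene:
    """One generated scene: the prompt text it came from and its length."""
    prompt: str
    duration_ms: int


def build_manifest(job_id: str, video_url: str, scenes: list[Scene]) -> dict:
    """Map each scene to its start/end timestamp in the final video.

    All field names are hypothetical; the challenge only asks for JSON
    output with timestamp mapping, not a specific schema.
    """
    entries = []
    cursor_ms = 0  # running offset into the rendered video
    for index, scene in enumerate(scenes):
        entries.append({
            "index": index,
            "prompt": scene.prompt,
            "start_ms": cursor_ms,
            "end_ms": cursor_ms + scene.duration_ms,
        })
        cursor_ms += scene.duration_ms
    return {
        "job_id": job_id,
        "video_url": video_url,
        "total_duration_ms": cursor_ms,
        "scenes": entries,
    }


if __name__ == "__main__":
    manifest = build_manifest(
        job_id="demo-001",
        video_url="https://example.invalid/demo-001.mp4",  # placeholder URL
        scenes=[
            Scene("Health worker explains the vaccination schedule", 6000),
            Scene("Village clinic exterior, morning", 4000),
        ],
    )
    print(json.dumps(manifest, indent=2, ensure_ascii=False))
```

A REST endpoint could return this dictionary as its response body, so a downstream platform can line up subtitles or narration with each scene by its millisecond offsets.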
Submit a 7-minute video showing:
| File | Description |
|---|---|
| 01_MVC_Demo_{startup_name}.mp4 | Demo video as per above specs |
| 02_Tech_Deck_{startup_name}.pdf | Architecture, models used, optimizations, cost-efficiency |
| 03_Cost_Sheet_{startup_name}.xlsx | Video generation cost breakdown (as per specs in this doc) |
| 04_Team_Profile_{startup_name}.pdf | Startup team bios, GitHub/LinkedIn links, past experience |
| 05_Roadmap_{startup_name}.pdf | 6-month go-to-production plan with milestones |
| Optional_{startup_name}.zip | GitHub repo link, extra demo clips, benchmarking logs |
Upload all files to Google Drive / GitHub / Dropbox and share a public view-only link while submitting on https://wavex.wavesbazaar.com
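The cost sheet and benchmarking logs listed above reduce to a few ratios named in the Performance Metrics row: latency per segment, throughput, and cost per minute of generated video. The sketch below shows one way these might be derived from run logs. The `SegmentRun` record, the `summarize` helper, and the example prices are hypothetical placeholders, and the calculation assumes segments are generated sequentially on a single GPU.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SegmentRun:
    """Hypothetical log record for one generated video segment."""
    wall_seconds: float    # time spent generating the segment
    output_seconds: float  # length of video the segment produced


def summarize(runs: list[SegmentRun], gpu_cost_per_hour: float,
              storage_cost_per_output_minute: float = 0.0) -> dict:
    """Derive latency, throughput and cost per output minute from run logs.

    Assumes sequential generation on one GPU, so total compute time is the
    sum of the wall-clock times. Costs are in whatever currency unit the
    caller uses for gpu_cost_per_hour.
    """
    if not runs:
        raise ValueError("need at least one run")
    total_wall_s = sum(r.wall_seconds for r in runs)
    total_output_min = sum(r.output_seconds for r in runs) / 60.0
    compute_cost = total_wall_s / 3600.0 * gpu_cost_per_hour
    return {
        "mean_latency_s_per_segment": mean(r.wall_seconds for r in runs),
        "segments_per_minute": len(runs) / (total_wall_s / 60.0),
        "cost_per_output_minute": (
            compute_cost / total_output_min + storage_cost_per_output_minute
        ),
    }


if __name__ == "__main__":
    # Ten 6-second segments (a 1-minute clip), each taking 36 s to generate,
    # on a GPU priced at a made-up 100 currency units per hour.
    runs = [SegmentRun(wall_seconds=36.0, output_seconds=6.0) for _ in range(10)]
    for name, value in summarize(runs, gpu_cost_per_hour=100.0).items():
        print(f"{name}: {value:.2f}")
```

Parallel or batched generation would need a different compute-time term, and memory footprint and prompt fidelity have to be measured separately.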
| Criterion | Description | Max Score |
|---|---|---|
| Technical Competence | Quality of generated videos in terms of prompt alignment, scene accuracy, rendering speed, visual coherence, and multilingual handling. Includes effective use of state-of-the-art AI models. | 25 |
| Scalability & Integration Readiness | Ability to scale across languages, domains (health, governance, etc.), and formats (video with subtitles, narration). Must support modular deployment and API integration with AIR/DD/PIB systems. | 20 |
| Cost-Effectiveness | Estimated cost per minute of generated video (incl. compute, model inference, storage); use of open-source tools or optimized architecture for sustainable deployment. | 20 |
| Innovation & Uniqueness | Creativity in approach: use of avatars, emotion-aware narration, scene composition techniques, or novel stylistic outputs (e.g., animated infographics). | 15 |
| Team Capability & Background | Proven experience in AI/ML, media-tech, animation, or video generation; prior work on generative systems, scalable deployments, or GovTech projects. | 5 |
| Clarity of Proposal & Demo | Clearly articulated objectives, milestones, and technical roadmap. Demo should be structured, visually informative, and compelling. | 5 |
| Feasibility & Time to Pilot | Realistic timeline for pilot readiness, current TRL (Technology Readiness Level), and demonstrated ability to meet near-term deadlines. | 5 |
| Policy/Compliance Awareness | Awareness of government data compliance (e.g., content filtering, language appropriateness), openness to data localization, and ethical generation norms. | 5 |
Based on the above, the final 5 selected teams will be called to Delhi to showcase their final
| Criterion | Max Score |
|---|---|
| Live Demo Quality & Visual Accuracy | 25 |
| Multilingual & Cultural Adaptability | 20 |
| Scalability, Architecture & Integration | 15 |
| Operational Cost Metrics | 10 |
| Innovation in Video Generation | 10 |
| Documentation & Technical Clarity | 5 |
| Jury Q&A Performance | 15 |
Application Method: Register on the WAVEX portal and select the "KalaaSetu - Real-Time Content Generation Tech for Bharat" challenge:
Submission Deadline: July 30th, 2025, 23:59 IST
Incomplete entries or inaccessible shared links will be summarily rejected. This is the last opportunity to revise or submit the required MVC and details. For queries, please reach out by WhatsApp message or WhatsApp call: 8860001444 (WAVES team).
For any assistance, please contact: