WAVEX Startup Challenge 2025
"Kalaa-Setu – Real-Time Content Generation Tech for Bharat"
(AI-Powered Text-to-Content (Video/Audio/Graphics) Generation for Government Communication)
Problem Statement
Develop an AI-powered solution that transforms user-input text into multimedia content (videos, audio narratives, and visual graphics) tailored for public communication, governance, and citizen engagement. The solution should provide the following:
- Text-to-Video generation based on the conditions, environment, tone, and domain provided by the user.
- Graphics generation based on different sets of parameters.
- Audio generation based on the context, environment, and usage defined by the user.
The solution should be modular, adaptive, and domain-aware — making it useful for public information campaigns, creative storytelling, journalism, education, and more.
Challenge Brief: The delivered solution should provide the following:
1. Text-to-Video Generation
- Convert user-input text into coherent video content.
- Must allow customization based on:
- Tone: (e.g. formal, casual, emotional, documentary, etc.)
- Domain: (e.g. education, health, governance, entertainment, etc.)
- Environment/Setting: (e.g. rural, urban, futuristic, nature, indoors)
- Should support avatar-based narration or scene generation using AI (optional bonus).
- Optional: Include subtitles, background scores, and smooth transitions automatically.
2. Text-to-Graphics Generation
- Generate infographics, visual charts, illustrations, or storyboards based on input.
- Must allow parameter-based generation (e.g. data values, tone, color scheme, subject).
3. Text-to-Audio Generation
- Synthesize high-quality audio using:
- Context-aware voice modulation
- Language and accent options
- Background ambience where required (e.g. a news studio, open environment)
- Should support emotion-aware speech synthesis (e.g. serious, excited, calm).
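The audio requirements above (context, language, emotion, ambience) map naturally onto a structured request object. The sketch below is a minimal, hypothetical JSON payload for a text-to-audio endpoint; the field names ("emotion", "ambience", etc.) are illustrative assumptions, not mandated by this brief.

```python
import json

# Hypothetical request payload for a text-to-audio endpoint.
# Field names are illustrative, not specified by the challenge.
audio_request = {
    "text": "The new health scheme covers all rural districts.",
    "language": "hi",            # ISO 639-1 code (here: Hindi)
    "accent": "standard",
    "emotion": "serious",        # emotion-aware synthesis hint
    "ambience": "news_studio",   # optional background ambience
}

payload = json.dumps(audio_request, ensure_ascii=False)
print(payload)
```

A schema like this keeps the tone/environment/usage parameters explicit and machine-checkable before they reach the synthesis model.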
Prototype-to-Production Outcome Expectations (Desirable but not mandatory)
- Accuracy: Prompt-to-output alignment, measured via BLEU for text (≥95%) and CLIPScore or similar for visuals/audio (≥85% semantic fidelity)
- Latency: Generation Latency ≤ 1s (audio/graphics); ≤ 2–3s for short video; include runtime benchmarks
- Output Formats: Subtitles (SRT), voice overlays (MP3/WAV), graphics (PNG/SVG), REST API, multilingual PDFs (optional)
- Scalability: Multilingual, multi-format scalability; load-tested; API-integrated; optional container/cloud-ready.
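Since the latency targets above call for included runtime benchmarks, a simple harness like the following can produce them. This is a generic sketch using only the standard library; `fn` stands in for whatever generation call (audio, graphics, or video segment) a team actually benchmarks.

```python
import statistics
import time

def benchmark(fn, runs=5):
    """Time a generation function over several runs and return the
    median wall-clock latency in seconds. Median is reported rather
    than mean to reduce the effect of one-off warm-up spikes."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return statistics.median(latencies)

# Stand-in workload; replace with a real model inference call.
median_s = benchmark(lambda: sum(range(100_000)))
print(f"median latency: {median_s:.4f}s")
```

Reporting the median over several warm runs gives a fairer comparison against the ≤1 s and ≤2-3 s targets than a single cold-start measurement.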
Technical & Submission Requirements
Mandatory Features
| Category | Specification |
|---|---|
| Language & Prompt Support | The system must accept input prompts in Hindi, English, and at least two regional languages (preferable but not mandatory). Prompts may include optional instructions on tone, environment, or domain (e.g., health, education, rural governance). |
| AI Models (Suggested) | Participants may use any of the following or equivalent open-source models. Text-to-Video: Zeroscope v2, ModelScope T2V, CogVideo, AnimateDiff. Text-to-Audio: Bark, Tortoise, Coqui TTS, ESPnet, VITS. Text-to-Graphics: SDXL, DeepFloyd IF, Kandinsky 2.2, PixArt-α (or similar engines with comparable cost and technical benefits). |
| System Architecture | Preferred architecture is REST API or modular microservices with JSON-based output and timestamp mapping for generated frames or scenes. However, startups may choose any architecture that is scalable and testable. |
| Performance Metrics | Submit measured benchmarks for: 1. Inference latency (time per video segment) 2. Throughput (videos/minute or frames/sec) 3. Memory/compute footprint 4. Scene-level prompt fidelity (manual or CLIPScore-based) 5. Cost of generating a 1-minute video clip containing at least 10 distinct frames/visuals. |
| Integration Capability | Demonstrate system readiness to connect with AIR, DD, PIB, or other government platforms using standard formats such as MP4, WebM, or HLS streaming, and expose REST endpoints for automated input/output. |
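The preferred architecture row above asks for JSON output with timestamp mapping for generated frames or scenes. One possible response shape is sketched below; the exact schema (field names, units) is an assumption for illustration, not part of the specification.

```python
import json

# Illustrative JSON response with per-scene timestamp mapping.
# The schema is an assumption; the brief only requires JSON output
# with timestamps for generated frames or scenes.
response = {
    "video": "output/press_release_001.mp4",
    "duration_s": 22.5,
    "scenes": [
        {"index": 0, "start_s": 0.0, "end_s": 7.5,
         "prompt": "Minister addressing a press conference, formal tone"},
        {"index": 1, "start_s": 7.5, "end_s": 15.0,
         "prompt": "Infographic of scheme coverage, rural setting"},
        {"index": 2, "start_s": 15.0, "end_s": 22.5,
         "prompt": "Citizens enrolling at a village service centre"},
    ],
}
print(json.dumps(response, indent=2))
```

Keeping per-scene prompts alongside timestamps also makes the scene-level fidelity metrics in the table straightforward to compute and audit.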
Minimum Viable Concept (MVC) Demo
Submit a 7-minute video showing:
- Live demo: Generate a short video (15–30 seconds) from a real PIB press release or a comparable public information brief.
- Metrics display: During the demo, clearly show: a. Inference time / Latency per video b. Resource usage (RAM, GPU, CPU) c. Frame rate (FPS) and rendering speed d. Scene fidelity (manual scoring or CLIPScore/semantic match metrics)
- Architecture overview: Briefly walk through your stack, AI engine, pipeline, and pre-post processing (if any).
- Optional: Support for additional languages.
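For the scene-fidelity metric in the demo checklist, a real submission would embed prompts and frames with a vision-language model (as CLIPScore does). As a purely illustrative stand-in that runs without any ML dependencies, the sketch below scores text-to-text overlap with bag-of-words cosine similarity; it is not a substitute for CLIPScore, only a shape for how a fidelity score plugs into the metrics display.

```python
from collections import Counter
from math import sqrt

def token_similarity(prompt: str, caption: str) -> float:
    """Crude bag-of-words cosine similarity between a scene prompt
    and a caption of the generated scene. A stand-in for
    CLIPScore-style prompt fidelity, for demonstration only."""
    a, b = Counter(prompt.lower().split()), Counter(caption.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

print(token_similarity("farmers receiving subsidy at a rural bank",
                       "rural bank scene with farmers collecting subsidy"))
```

Whatever metric is used, showing the per-scene score on screen during the demo satisfies item (d) of the metrics display.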
Startup Submission Package (shared via Google Drive or public link)
| File | Description |
|---|---|
| 01_MVC_Demo_{startup_name}.mp4 | Demo video as per above specs |
| 02_Tech_Deck_{startup_name}.pdf | Architecture, models used, optimizations, cost-efficiency |
| 03_Cost_Sheet_{startup_name}.xlsx | Video generation cost breakdown (as per specs in this doc) |
| 04_Team_Profile_{startup_name}.pdf | Startup team bios, GitHub/LinkedIn links, past experience |
| 05_Roadmap_{startup_name}.pdf | 6-month go-to-production plan with milestones |
| Optional_{startup_name}.zip | GitHub repo link, extra demo clips, benchmarking logs |
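Before uploading, the naming scheme in the table above can be validated mechanically. The checker below is a small sketch; the `\w+` stand-in for `{startup_name}` is an assumption (a name containing hyphens or spaces would need a broader pattern).

```python
import re

# Required filename patterns mirroring the submission table
# (the optional archive is excluded). "\w+" approximates
# {startup_name} and is an assumption about allowed characters.
REQUIRED = [
    r"01_MVC_Demo_\w+\.mp4",
    r"02_Tech_Deck_\w+\.pdf",
    r"03_Cost_Sheet_\w+\.xlsx",
    r"04_Team_Profile_\w+\.pdf",
    r"05_Roadmap_\w+\.pdf",
]

def check_package(filenames):
    """Return the required patterns that no supplied file matches."""
    return [p for p in REQUIRED
            if not any(re.fullmatch(p, f) for f in filenames)]

missing = check_package([
    "01_MVC_Demo_AcmeAI.mp4", "02_Tech_Deck_AcmeAI.pdf",
    "03_Cost_Sheet_AcmeAI.xlsx", "04_Team_Profile_AcmeAI.pdf",
])
print(missing)  # the roadmap PDF pattern is unmatched
```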
Upload all files to Google Drive / GitHub / Dropbox and share a public view-only link while submitting on https://wavex.wavesbazaar.com
Challenge Timeline (60 Days)
- Registration Opens: 7th July 2025
- Final Submission of POC: 30th July 2025
- Evaluation and Final Rounds: To be announced
Preliminary Startup Scoring Rubric (Total: 100 Points)
| Criterion | Description | Max Score |
|---|---|---|
| Technical Competence | Quality of generated videos in terms of prompt alignment, scene accuracy, rendering speed, visual coherence, and multilingual handling. Includes effective use of state-of-the-art AI models. | 25 |
| Scalability & Integration Readiness | Ability to scale across languages, domains (health, governance, etc.), and formats (video with subtitles, narration). Must support modular deployment and API integration with AIR/DD/PIB systems. | 20 |
| Cost-Effectiveness | Estimated cost per minute of generated video (incl. compute, model inference, storage); use of open-source tools or optimized architecture for sustainable deployment. | 20 |
| Innovation & Uniqueness | Creativity in approach: use of avatars, emotion-aware narration, scene composition techniques, or novel stylistic outputs (e.g., animated infographics). | 15 |
| Team Capability & Background | Proven experience in AI/ML, media-tech, animation, or video generation; prior work on generative systems, scalable deployments, or GovTech projects. | 5 |
| Clarity of Proposal & Demo | Clearly articulated objectives, milestones, and technical roadmap. Demo should be structured, visually informative, and compelling. | 5 |
| Feasibility & Time to Pilot | Realistic timeline for pilot readiness, current TRL (Technology Readiness Level), and demonstrated ability to meet near-term deadlines. | 5 |
| Policy/Compliance Awareness | Awareness of government data compliance (e.g., content filtering, language appropriateness), openness to data localization, and ethical generation norms. | 5 |
Based on the above scoring, the 5 finalist teams will be called to Delhi to showcase their final solution before the jury.
Final Jury Evaluation (100 Points)
| Criterion | Max Score |
|---|---|
| Live Demo Quality & Visual Accuracy | 25 |
| Multilingual & Cultural Adaptability | 20 |
| Scalability, Architecture & Integration | 15 |
| Operational Cost Metrics | 10 |
| Innovation in Video Generation | 10 |
| Documentation & Technical Clarity | 5 |
| Jury Q&A Performance | 15 |
Prize & Outcome
- One selected startup will receive a formal MoU to develop the full solution under WAVEX & Ministry supervision.
- Support for pilot integration with AIR, DD, PIB
- Showcase at national media & innovation forums.
- Incubation opportunity under WaveX, including workspace until the final product is developed.
Application Information
Application Method: Register on the WAVEX portal and select the "KalaaSetu – Real-Time Content Generation Tech for Bharat" challenge.
Submission Deadline: July 30th, 2025, 23:59 IST


