SmartMediaAI is an Android app that lets users select a video, summarize it using Gemini via Firebase, and listen to the summary using Text-to-Speech. Works with YouTube links and direct video URLs.
- In-App Video Player: Built-in playback using Jetpack Media3 (ExoPlayer) for direct video links and a dedicated YouTube player for YouTube links.
- AI-Powered Summaries: Generates smart summaries of video content using the Firebase Gemini API.
- Text-to-Speech Playback: Listen to the generated summaries with controls for different accents.
- Modern UI: A clean, responsive interface built entirely with Jetpack Compose.
- Predefined Video List: A dropdown menu to easily select from a list of sample videos.
Here is an overview of the key files and directories in the project:
com.anandgaur.smartmediaai
├── MainActivity.kt # Entry point of the app
├── player/ # Video playback components
│ ├── VideoPlayer.kt # Jetpack Compose-based video player
│ └── VideoSelectionDropdown.kt # UI for selecting videos
├── screen/
│ └── VideoSummarizationScreen.kt # Main UI screen for summarization
├── ui/ # UI components for output
│ ├── TextToSpeechControls.kt # Controls for TTS playback
│ └── OutputTextDisplay.kt # Displays the summarized output
├── util/ # Utility classes and helpers
│ └── VideoItem.kt # Data model for video entries
├── viewmodel/ # ViewModel and state management
│ ├── VideoSummarizationViewModel.kt # Handles UI logic and API interactions
│ └── OutputTextState.kt # UI state model for summarized output
└── SmartMediaAIApplication.kt # Application class for global setup

This project follows the MVVM (Model-View-ViewModel) architecture for a clean separation of concerns and uses Jetpack libraries in line with current best practices. Here's a breakdown of how the app works, file by file.
`SmartMediaAIApplication.kt`
- What it is: The first code that runs when the app launches, before any screen appears.
- What it does: Performs one-time, app-wide setup: initializing Firebase and setting up Hilt for dependency injection.
`MainActivity.kt`
- What it is: The main "window" and entry point for the user interface.
- What it does: Its only responsibility is to apply the app theme and display the main screen, `VideoSummarizationScreen`.
`VideoSummarizationScreen.kt` (in the `screen` package)
- What it is: The heart of the user interface, where the user interacts with the app.
- What it does: It coordinates all of the visual components:
  - Assembles the UI: Brings together the video player, the dropdown menu, the "Summarize" button, and the output text area.
  - Manages state: Keeps track of important information, such as which video is currently selected.
  - Handles video players: Decides which player to show, using a dedicated YouTube player for YouTube links and the custom `VideoPlayer` for other links (such as `.mp4` files).
  - Connects to the "brain": Tells the `ViewModel` to start working when the user taps the "Summarize" button.
  - Manages Text-to-Speech: Initializes and controls the Android Text-to-Speech engine.
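The screen's choice between the YouTube player and `VideoPlayer` comes down to classifying the link. Below is a minimal sketch of that decision, assuming a simple host-based check; `PlayerKind`, `isYouTubeUrl`, `choosePlayer`, and the set of hosts are hypothetical names for illustration, not the app's actual code.

```kotlin
import java.net.URI

// Hypothetical: which player the screen should host for a given link.
enum class PlayerKind { YOUTUBE, EXOPLAYER }

// Assumed set of YouTube hosts; the real app may match URLs differently.
val youTubeHosts = setOf("youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be")

// Returns true when the URL's host is one of the YouTube hosts above.
fun isYouTubeUrl(url: String): Boolean {
    // URI(...) throws on malformed input; treat that (or a missing host) as "not YouTube".
    val host = runCatching { URI(url.trim()).host }.getOrNull() ?: return false
    return host.lowercase() in youTubeHosts
}

// YouTube links go to the YouTube player; everything else (e.g. .mp4 files) goes to ExoPlayer.
fun choosePlayer(url: String): PlayerKind =
    if (isYouTubeUrl(url)) PlayerKind.YOUTUBE else PlayerKind.EXOPLAYER
```

Keeping this decision in a plain function, outside the composable, makes it easy to unit-test without an emulator.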
`VideoSummarizationViewModel.kt` (in the `viewmodel` package)
- What it is: The "brain" of the app. It handles the logic and heavy lifting behind the scenes, separate from the UI.
- What it does: Takes the video link, builds a request for the Gemini AI through Firebase, and sends it to Google's servers. As the response arrives, it updates its state, and the UI automatically reflects the change.
`OutputTextState.kt` (in the `viewmodel` package)
- What it is: A small file that defines the possible states of the summarization process: `Initial`, `Loading`, `Success`, and `Error`.
- What it does: Tells the UI what is happening, so it can show a loading spinner, an error message, or the final summary.
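The ViewModel and `OutputTextState` together behave like a small state machine: a request moves the state from `Initial` to `Loading`, then to `Success` or `Error`. The sketch below models this in plain Kotlin, assuming a sealed interface for the states. `SummarizationController` is a hypothetical stand-in for `VideoSummarizationViewModel`, and the `summarize` lambda stands in for the Firebase Gemini call. An Android ViewModel would typically run that call in a coroutine (for example in `viewModelScope`) and expose the state as a `StateFlow` that Compose collects.

```kotlin
// Assumed shape of the four states named above.
sealed interface OutputTextState {
    object Initial : OutputTextState
    object Loading : OutputTextState
    data class Success(val summary: String) : OutputTextState
    data class Error(val message: String) : OutputTextState
}

// Hypothetical stand-in for the ViewModel. `summarize` replaces the Gemini request,
// and `onState` replaces the observable state the UI would collect.
class SummarizationController(
    private val summarize: (videoUrl: String) -> String,
    private val onState: (OutputTextState) -> Unit,
) {
    var state: OutputTextState = OutputTextState.Initial
        private set

    private fun emit(newState: OutputTextState) {
        state = newState
        onState(newState)
    }

    // Synchronous here for simplicity; a real ViewModel would not block the UI thread.
    fun summarizeVideo(videoUrl: String) {
        emit(OutputTextState.Loading)
        val result = runCatching { summarize(videoUrl) }
        emit(
            result.fold(
                onSuccess = { OutputTextState.Success(it) },
                onFailure = { OutputTextState.Error(it.message ?: "Unknown error") },
            )
        )
    }
}
```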
`VideoPlayer.kt` (in the `player` package)
- What it is: The custom video player for standard video files (not YouTube).
- What it does: Uses Google's `ExoPlayer` (Jetpack Media3) to play videos and automatically shows a loading spinner while buffering.
`VideoSelectionDropdown.kt` (in the `player` package)
- What it is: The dropdown menu that lets the user pick a video.
- What it does: Displays the list of sample videos and informs the main screen of the user's choice.
`OutputTextDisplay.kt` (in the `ui` package)
- What it is: The text box at the bottom of the screen that shows the summary.
- What it does: Displays text based on the `OutputTextState` from the ViewModel. The success text is scrollable.
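Because `OutputTextState` is a closed set of cases, `OutputTextDisplay` can map each case to what it renders with an exhaustive `when`. A hedged sketch, assuming the sealed interface below mirrors the real one; `DisplayModel`, `toDisplayModel`, and the placeholder prompt text are hypothetical.

```kotlin
// Assumed shape of the app's state type.
sealed interface OutputTextState {
    object Initial : OutputTextState
    object Loading : OutputTextState
    data class Success(val summary: String) : OutputTextState
    data class Error(val message: String) : OutputTextState
}

// Hypothetical UI model: what to show, whether to show a spinner, and whether to scroll.
data class DisplayModel(val text: String, val showSpinner: Boolean, val scrollable: Boolean)

// Exhaustive `when`: adding a new state will not compile until it is handled here.
fun toDisplayModel(state: OutputTextState): DisplayModel = when (state) {
    OutputTextState.Initial -> DisplayModel("Select a video and tap Summarize.", showSpinner = false, scrollable = false)
    OutputTextState.Loading -> DisplayModel("", showSpinner = true, scrollable = false)
    is OutputTextState.Success -> DisplayModel(state.summary, showSpinner = false, scrollable = true)
    is OutputTextState.Error -> DisplayModel("Error: ${state.message}", showSpinner = false, scrollable = false)
}
```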
`TextToSpeechControls.kt` (in the `ui` package)
- What it is: The UI for the "Listen" feature.
- What it does: Provides the "Listen" / "Pause" button and the accent selection dropdown.
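An accent choice maps naturally to a `java.util.Locale`, which is what Android's `TextToSpeech.setLanguage(...)` accepts. The sketch below assumes three English accents and a simple Listen/Pause toggle; `Accent`, `TtsUiState`, and the chosen locales are hypothetical, not the app's actual options. Android's `TextToSpeech` has no pause method, so a "Pause" button like this is commonly backed by `stop()`.

```kotlin
import java.util.Locale

// Hypothetical accent options for the dropdown; the app's actual list may differ.
enum class Accent(val label: String, val locale: Locale) {
    US("English (US)", Locale.US),
    UK("English (UK)", Locale.UK),
    INDIA("English (India)", Locale.forLanguageTag("en-IN"))
}

// Minimal model of the controls: whether speech is playing and which accent is selected.
data class TtsUiState(val isSpeaking: Boolean = false, val accent: Accent = Accent.US) {
    // The button offers the opposite of the current state.
    val buttonLabel: String get() = if (isSpeaking) "Pause" else "Listen"
}

fun TtsUiState.toggle(): TtsUiState = copy(isSpeaking = !isSpeaking)

// On Android, the new accent.locale would then be passed to TextToSpeech.setLanguage(...).
fun TtsUiState.withAccent(accent: Accent): TtsUiState = copy(accent = accent)
```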
`VideoItem.kt` & `SampleVideoList.kt`
- What they are: A data blueprint for a video, and a predefined list of sample videos for the dropdown menu.
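A sketch of the video data model and what the dropdown needs from it, assuming `VideoItem` holds a title and a URL. The field names, the helper functions, and the placeholder entries (including the `XXXXXXXXXXX` video ID and the `example.com` link) are hypothetical, not the app's actual sample list.

```kotlin
// Hypothetical shape of VideoItem; the real field names may differ.
data class VideoItem(val title: String, val url: String)

// Placeholder entries: one YouTube link and one direct video file.
val sampleVideoList = listOf(
    VideoItem("Sample YouTube video", "https://www.youtube.com/watch?v=XXXXXXXXXXX"),
    VideoItem("Sample MP4 file", "https://example.com/sample.mp4")
)

// The labels the dropdown displays.
fun dropdownLabels(items: List<VideoItem>): List<String> = items.map { it.title }

// Maps the user's choice back to an item, or null if nothing matches.
fun findByTitle(items: List<VideoItem>, title: String): VideoItem? =
    items.firstOrNull { it.title == title }
```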
`extractYouTubeVideoId.kt`
- What it is: A specialized helper function.
- What it does: Its only job is to take a full YouTube URL and extract the unique 11-character video ID that the YouTube player needs.
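One way to pull the 11-character ID out of common YouTube URL shapes is a single regular expression. This is a hypothetical implementation, assuming `watch?v=`, `youtu.be/`, `embed/`, and `shorts/` links; the real helper may support a different set of formats.

```kotlin
// Matches the ID after watch?v= (anywhere in the query), youtu.be/, embed/, or shorts/.
// Group 1 captures exactly 11 characters from YouTube's ID alphabet (letters, digits, _ and -).
val youTubeIdRegex = Regex(
    """(?:youtube\.com/(?:watch\?(?:.*&)?v=|embed/|shorts/)|youtu\.be/)([A-Za-z0-9_-]{11})"""
)

// Returns the video ID, or null when the URL is not a recognized YouTube link.
fun extractYouTubeVideoId(url: String): String? =
    youTubeIdRegex.find(url)?.groupValues?.get(1)
```

Returning `null` instead of throwing lets a caller treat unrecognized links as non-YouTube videos.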
| Category | Technology |
|---|---|
| Architecture | MVVM |
| Language | Kotlin |
| UI | Jetpack Compose |
| Dependency Injection | Hilt |
| Video Playback | Jetpack Media3 |
| AI | Firebase Gemini AI APIs |
- Clone the repository: `git clone https://github.com/anandgaur22/SmartMediaAI.git`
- Add your `google-services.json` file to the `app/` directory.
- Enable the Gemini API and the Vertex AI Gemini API from Firebase > Build > Generative AI.
- Sync Gradle and run the project on a physical Android device or an emulator.
If you've found value in the Android Development in-depth industry-ready course, consider supporting the effort and dedication poured into its creation. Your contribution on Buy Me a Coffee will fuel the creation of more content, enable continuous improvement, and help build a community of motivated learners. It's not just about a cup of coffee; it's about fostering a culture of support and collaboration. Together, we can create something exceptional. Your support is not just appreciated; it's a cornerstone for the future of quality education. Thank you for being a part of this incredible journey!




