(I will add new projects below as I build them)
This project investigates the changing landscape of media consumption during the COVID-19 pandemic. It aims to identify trending genres and audience preferences between short-form and long-form content. The analysis combines sentiment analysis of IMDb reviews, viewership data analysis, and integration of additional datasets to provide a comprehensive view of media consumption trends.
Technical Approach:
Sentiment Analysis:
Conduct comprehensive sentiment analysis on IMDb reviews
Utilize natural language processing techniques to identify top-rated content
Implement machine learning models for sentiment classification
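As a rough illustration of the sentiment classification step, here is a minimal multinomial Naive Bayes classifier written with only the Python standard library. This is a sketch, not the project's actual model: the `NaiveBayesSentiment` class, the `tokenize` helper, and the four toy reviews are all made up for the example, and a real pipeline would more likely use scikit-learn or a neural model on the full IMDb review set.

```python
import math
from collections import Counter, defaultdict

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

class NaiveBayesSentiment:
    """Toy multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.class_counts = Counter(labels)      # label -> number of reviews
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

# Hypothetical training reviews, invented for this example.
reviews = [
    "A gripping, wonderful series with great acting",
    "Loved it, brilliant and funny",
    "Boring plot and terrible pacing",
    "Awful dialogue, a dull waste of time",
]
labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayesSentiment().fit(reviews, labels)
print(model.predict("brilliant acting, great fun"))
print(model.predict("dull and boring"))
```

Naive Bayes is used here only because it fits in a few lines. It treats each review as a bag of words, so it misses negation and sarcasm.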
Viewership Pattern Analysis:
Process and analyze detailed viewership data
Identify trends in content consumption
Compare popularity metrics with sentiment analysis results
Utilize data visualization techniques to represent consumption patterns
Data Integration and Complex Analysis:
Incorporate additional datasets such as box office revenues and social media sentiment
Develop data integration pipelines to merge diverse data sources
Perform statistical analysis to identify correlations and trends across datasets
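The merge-then-correlate idea can be sketched in a few lines of plain Python. The titles, sentiment scores, and viewing hours below are invented, and `merge_on_title` and `pearson` are hypothetical helper names. A production pipeline would more plausibly use pandas or Spark joins, but the logic is the same: join the sources on a shared key, then measure how the columns move together.

```python
import math

def merge_on_title(sentiment, viewership):
    """Inner join two {title: value} mappings on the titles they share."""
    shared = sorted(sentiment.keys() & viewership.keys())
    return [(title, sentiment[title], viewership[title]) for title in shared]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Hypothetical inputs: mean review sentiment and viewing hours (millions).
sentiment = {"Show A": 0.2, "Show B": 0.5, "Show C": 0.8, "Show D": 0.9}
viewership = {"Show A": 1.0, "Show B": 2.0, "Show C": 3.0, "Show E": 5.0}

merged = merge_on_title(sentiment, viewership)
r = pearson([row[1] for row in merged], [row[2] for row in merged])
print(len(merged), round(r, 3))
```

An inner join silently drops titles missing from either source ("Show D" and "Show E" here), so it is worth logging how many rows survive the merge before interpreting the correlation.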
Insight Generation:
Develop algorithms to extract actionable insights from the analyzed data
Create predictive models for future content preferences
Generate reports and visualizations for stakeholders in the media industry
Tech Stack:
Python for data processing and analysis
Natural Language Processing libraries (e.g., NLTK, spaCy)
Machine Learning frameworks (e.g., scikit-learn, TensorFlow)
Data visualization libraries (e.g., Matplotlib, Seaborn, Plotly)
Big Data tools for large-scale data processing (e.g., Apache Spark)
SQL and NoSQL databases for data storage and retrieval
Cloud computing platforms for scalable processing (e.g., AWS, Google Cloud)
Key Features:
Multi-dimensional analysis combining sentiment, viewership, and industry data
Natural language processing for sentiment analysis of reviews
Large-scale data processing and analysis of viewership patterns
Integration of diverse datasets including box office revenues and social media sentiment
Insight generation for production houses and media companies
This project focuses on semantic segmentation of satellite imagery to identify human settlements and electricity presence. It compares the performance of two deep learning models: a custom UNet implementation and DeepLabV3 with a ResNet-101 backbone. The project uses the IEEE GRSS 2021 Data Fusion Contest dataset, which includes multi-spectral satellite imagery from Sentinel-1, Sentinel-2, Landsat 8, and Suomi NPP VIIRS.
Technical Approach:
Data Preprocessing: Scripts preprocess the raw satellite imagery by splitting large images into subtiles and applying various augmentations.
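To make the subtiling step concrete, here is a small sketch that cuts a single-band image into non-overlapping square tiles and applies one simple augmentation, a horizontal flip. The function names and the 4x4 toy image are made up for illustration. The real scripts presumably work on multi-band arrays (for example with NumPy) and may handle image edges differently. This version simply drops any partial tiles at the right and bottom borders.

```python
def subtile_bounds(height, width, tile_size):
    """Yield (row_start, row_end, col_start, col_end) for each full tile."""
    for r in range(0, height - tile_size + 1, tile_size):
        for c in range(0, width - tile_size + 1, tile_size):
            yield r, r + tile_size, c, c + tile_size

def subtile(image, tile_size):
    """Split a 2D image (a list of rows) into tile_size x tile_size tiles."""
    height, width = len(image), len(image[0])
    return [
        [row[c0:c1] for row in image[r0:r1]]
        for r0, r1, c0, c1 in subtile_bounds(height, width, tile_size)
    ]

def hflip(tile):
    """Horizontal flip: reverse the pixel order within every row."""
    return [row[::-1] for row in tile]

# Toy 4x4 single-band image whose pixel values equal their flat index.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = subtile(image, 2)
print(len(tiles), tiles[0], hflip(tiles[0]))
```

For segmentation, any geometric augmentation has to be applied to the label mask as well as the image so the two stay aligned.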
Model Architecture:
UNet: A custom implementation with skip connections to preserve spatial information.
DeepLabV3: Uses the DeepLabV3 implementation from PyTorch's torchvision library, with a ResNet-101 backbone and pretrained weights.
Training Pipeline:
Uses PyTorch Lightning to streamline the training process
Implements custom data loaders and augmentations
Supports hyperparameter tuning through YAML-defined sweeps
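A sweep definition boils down to a set of parameter names with candidate values. The sketch below shows how such a definition, written here as a Python dict standing in for the parsed YAML file, could be expanded into individual training configurations. The parameter names and values are invented, and an exhaustive grid is only one possible strategy. The actual sweeps may use random or Bayesian search instead.

```python
from itertools import product

# Hypothetical sweep definition, as it might look once the YAML is parsed.
sweep = {
    "learning_rate": [1e-3, 1e-4],
    "batch_size": [8, 16],
    "model": ["unet", "deeplabv3"],
}

def expand_grid(parameters):
    """Yield one config dict per combination of the listed parameter values."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))

configs = list(expand_grid(sweep))
print(len(configs))
print(configs[0])
```

A grid grows multiplicatively with every parameter added, which is the usual reason to switch to random search once more than a few parameters are being tuned.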
Evaluation:
Includes scripts for model evaluation on validation and test sets
Generates visualizations comparing ground truth to model predictions
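Two common ways to score a predicted segmentation mask against ground truth are pixel accuracy and per-class intersection over union (IoU). The sketch below computes both on tiny hand-written masks. It assumes the accuracy figures reported for this project are pixel-level, which the description does not state explicitly, and the function names are illustrative.

```python
def _pixel_pairs(pred, truth):
    """Flatten two equally shaped 2D masks into (predicted, actual) pairs."""
    return [(p, t) for p_row, t_row in zip(pred, truth)
            for p, t in zip(p_row, t_row)]

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted class matches the ground truth."""
    pairs = _pixel_pairs(pred, truth)
    return sum(p == t for p, t in pairs) / len(pairs)

def iou(pred, truth, cls):
    """Intersection over union for one class; NaN if the class never occurs."""
    pairs = _pixel_pairs(pred, truth)
    intersection = sum(p == cls and t == cls for p, t in pairs)
    union = sum(p == cls or t == cls for p, t in pairs)
    return intersection / union if union else float("nan")

# Toy 2x3 masks: 1 = settlement, 0 = background (values are made up).
truth = [[0, 0, 1],
         [1, 1, 0]]
pred = [[0, 1, 1],
        [1, 0, 0]]

accuracy = pixel_accuracy(pred, truth)
settlement_iou = iou(pred, truth, cls=1)
print(round(accuracy, 3), settlement_iou)
```

Pixel accuracy can look high when one class dominates the image, so IoU is often reported alongside it for segmentation tasks.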
Visualization:
Custom plotting utilities for visualizing multi-spectral satellite imagery
Integration with Weights & Biases for real-time training monitoring
Tech Stack:
PyTorch & PyTorch Lightning
NumPy
Weights & Biases
Scikit-learn
Custom data processing and visualization tools
Results:
UNet achieved a peak validation accuracy of 67%
DeepLabV3 reached a validation accuracy of 72% with L1 regularization
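For readers unfamiliar with L1 regularization, the idea is to add the sum of the absolute values of the model weights, scaled by a small coefficient, to the training loss. The sketch below shows that arithmetic with plain Python lists. The weights, the base loss, and the coefficient `lam` are made-up numbers, not values from this project. In PyTorch the same penalty is typically accumulated over `model.parameters()`.

```python
def l1_penalty(weights, lam):
    """lam times the sum of absolute values of all weights in all layers."""
    return lam * sum(abs(w) for layer in weights for w in layer)

def regularized_loss(base_loss, weights, lam=1e-4):
    """Task loss plus the L1 penalty on the weights."""
    return base_loss + l1_penalty(weights, lam)

# Hypothetical weights for two tiny layers and a hypothetical base loss.
weights = [[0.5, -1.5], [2.0]]
total = regularized_loss(base_loss=0.30, weights=weights, lam=0.01)
print(round(total, 4))
```

Because the penalty grows with the magnitude of every weight, it pushes many weights toward zero, which can reduce overfitting.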
This project is an intelligent search bar for Google Calendar. It aims to make calendar management more intuitive and conversational by letting users query and manage their schedules in natural language, much as they would interact with an AI assistant such as ChatGPT.
Technical Approach:
Google Calendar API Integration
Natural Language Processing with FLAN-T5
Response Generation and Calendar Management
Feedback Loop and Continuous Improvement
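One way the pieces above could fit together is to turn the user's question and their upcoming events into a single text prompt for an instruction-tuned model such as FLAN-T5. The sketch below covers only that prompt-building step. The simplified event dictionaries, the prompt wording, and the function names are assumptions made for illustration, and the calls to the Google Calendar API and to the model are deliberately left out.

```python
def format_event(event):
    """Render one simplified event dict as a single prompt line."""
    return f"- {event['summary']}: {event['start']} to {event['end']}"

def build_prompt(question, events):
    """Combine the schedule and the user's question into one text prompt."""
    # ISO 8601 strings in the same time zone sort chronologically as text.
    ordered = sorted(events, key=lambda e: e["start"])
    schedule = "\n".join(format_event(e) for e in ordered) or "(no events)"
    return (
        "Answer the question using only the calendar below.\n\n"
        f"Calendar:\n{schedule}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical events, flattened from what a calendar API might return.
events = [
    {"summary": "Design review", "start": "2024-05-02T14:00", "end": "2024-05-02T15:00"},
    {"summary": "Team standup", "start": "2024-05-02T09:30", "end": "2024-05-02T09:45"},
]

prompt = build_prompt("When is my first meeting on May 2?", events)
print(prompt)
```

Keeping the prompt builder a pure function makes it easy to test without network access and to adjust the wording as user feedback comes in.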