Nestle USA

Senior Analyst (Dec '24 - Present)

Business Intelligence & Dashboarding: 

Designed and deployed 15+ interactive dashboards using Power BI, Tableau, Looker, and Excel, improving performance tracking and reporting efficiency by 25%. Developed reusable semantic models to support ad hoc analysis and self-service capabilities, saving over 200 hours annually through the Commercial Intelligence Hub initiative.

Cross-Functional Collaboration & Agile Delivery: 

Partnered with Sales, Finance, and Customer Operations teams to translate business needs into scalable reporting solutions. Ensured alignment with UX best practices, centralized design systems, and Agile/Scrum methodologies to deliver high-impact data products.

ETL & Data Engineering: 

Built and optimized data pipelines using SQL (T-SQL, PostgreSQL), Azure Data Factory, Python (Pandas), Power Query, and Alteryx, reducing manual processing by 40% and accelerating refresh times by 35%. Automated recurring workflows to support data readiness for analytics.
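A minimal sketch of the kind of Pandas cleaning step such a pipeline contains (the column names and data here are illustrative, not taken from the actual Nestle pipelines):

```python
import io
import pandas as pd

# Illustrative raw extract: inconsistent headers, a duplicate row,
# and string-typed amounts -- typical issues a transform step handles.
raw_csv = io.StringIO(
    "Order ID,Region , Amount\n"
    "1001,East,250.00\n"
    "1001,East,250.00\n"
    "1002,West,125.50\n"
)

def clean_orders(src) -> pd.DataFrame:
    df = pd.read_csv(src)
    # Normalize column names so downstream steps are stable.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.drop_duplicates(subset="order_id")
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    return df

orders = clean_orders(raw_csv)
```

In a production pipeline this function would sit behind an orchestrator (Azure Data Factory or Power Query in the roles above) rather than run ad hoc.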

Data Quality & Monitoring: 

Implemented data validation and anomaly detection routines using Snowflake, Python, and Alteryx, reducing inconsistencies and report rework by 30%. Established proactive data quality checks to ensure trust in reporting and insights.
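One common anomaly-detection routine of this kind is a z-score check on daily volumes; the sketch below uses only the standard library and illustrative numbers (the real routines ran against Snowflake data):

```python
import statistics

def flag_anomalies(values, threshold=2.0):
    """Return indices of points more than `threshold` standard
    deviations from the mean -- a simple z-score anomaly check."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]

daily_volumes = [100, 102, 98, 101, 99, 100, 500]  # last point is a spike
suspect = flag_anomalies(daily_volumes)
```

Flagged rows would then be quarantined or surfaced in a data-quality report before they reach downstream dashboards.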

Documentation & Delivery Support: 

Participated in UAT coordination and post-deployment hypercare; documented data architecture, business logic, source-to-target mappings, and data dictionaries using Confluence, JIRA, and Git, ensuring long-term maintainability and smooth knowledge transfer.




Montclair State University

Research Assistant - Machine Learning (Jan '24 - Dec '24)


As a Research Assistant specializing in Machine Learning, I worked on a project leveraging machine learning to detect malware in npm packages. The role covered the full pipeline, from data collection through model building to real-time deployment.


Malware Detection & Risk Scoring: 

Developed and deployed ML-based malware detection pipelines using decision trees and random forests in scikit-learn, achieving over 85% precision and an 88% F1-score. Designed real-time threat monitoring systems with Apache Kafka and TensorFlow, flagging 1,000+ suspicious npm packages monthly and automating risk scoring workflows.
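The core classification step can be sketched in a few lines of scikit-learn; the features below (eval usage, install scripts, string entropy) are hypothetical stand-ins for the real feature set, and the toy data is illustrative only:

```python
from sklearn.ensemble import RandomForestClassifier

# Toy feature matrix: [uses_eval, num_install_scripts, string_entropy]
# (hypothetical features; the production pipeline used a richer set).
X = [
    [0, 0, 2.1], [0, 0, 1.9], [0, 1, 2.4], [0, 0, 2.0],  # benign packages
    [1, 2, 6.8], [1, 1, 7.2], [1, 2, 6.5], [0, 2, 7.0],  # malicious packages
]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = benign, 1 = malicious

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
```

The reported 85%+ precision and 88% F1 came from evaluation on held-out real packages; the fitted model here merely shows the scoring interface used by the risk-scoring workflow.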

Large-Scale Data Collection & Processing: 

Collected and processed 100K+ npm package metadata entries, including 10M lines of JavaScript code and 50K+ dependency graphs. Utilized Python, BeautifulSoup, and custom scripts to parse and clean data for malware signature extraction.
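Extracting dependency-graph edges and install-hook signals from package metadata needs only the standard library; the package below is hypothetical:

```python
import json

# Illustrative package.json-style metadata (hypothetical package).
metadata = json.loads("""
{
  "name": "left-pad-utils",
  "version": "1.0.2",
  "scripts": {"postinstall": "node setup.js"},
  "dependencies": {"lodash": "^4.17.21", "request": "^2.88.0"}
}
""")

def extract_edges(pkg):
    """Return (package, dependency) edges for a dependency graph."""
    return [(pkg["name"], dep) for dep in pkg.get("dependencies", {})]

def has_install_hook(pkg):
    """Install-time scripts are a common malware signal in npm packages."""
    hooks = {"preinstall", "install", "postinstall"}
    return bool(hooks & set(pkg.get("scripts", {})))

edges = extract_edges(metadata)
```

Run over 100K+ packages, functions like these produce the dependency graphs and per-package signals fed to the classifiers above.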

Natural Language Processing & Static Analysis: 

Applied NLP techniques with spaCy on 500K+ README files, descriptions, and code comments to extract behavioral patterns and semantic features. Integrated insights into static code analysis pipelines for enhanced contextual threat detection.
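The actual pipeline used spaCy; as a dependency-free illustration of the idea, turning README text into numeric features can be sketched like this (the term list is hypothetical, not the real feature vocabulary):

```python
import re
from collections import Counter

# Illustrative cue terms (hypothetical list, not the real feature set).
SUSPICIOUS_TERMS = {"keylogger", "exfiltrate", "obfuscated", "credential"}

def readme_features(text: str) -> dict:
    """Tokenize a README and count behavioral cue terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {
        "num_tokens": len(tokens),
        "suspicious_hits": sum(counts[t] for t in SUSPICIOUS_TERMS),
    }

features = readme_features(
    "This package silently collects credential data and can exfiltrate logs."
)
```

spaCy replaces the regex tokenizer with proper linguistic processing (lemmas, entities, dependency parses), but the output is the same shape: numeric features merged into the static-analysis pipeline.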




BYJU'S - The Learning App

Data Engineer (Apr '20 - Jun '22) 

At BYJU'S, I worked on various data-driven projects aimed at improving student performance, engagement, and overall business efficiency. Key contributions include:

ETL Development & Pipeline Optimization:

Designed, developed, and maintained scalable ETL pipelines and data ingestion workflows using PySpark, SQL, and Apache Airflow, ensuring 100% SLA compliance for business-critical reporting and analytics. Improved data freshness and delivery accuracy across cross-functional analyst and product teams.

Big Data Transformation & Performance Engineering: 

Built and optimized complex data transformation processes to parse and cleanse millions of records daily using Hive and Amazon Redshift, increasing pipeline efficiency by 35% and meeting both functional and non-functional data quality requirements.

Data Visualization & Stakeholder Reporting: 

Collaborated with product managers and analysts to translate business needs into technical requirements, delivering 20+ executive dashboards and analytics reports in Power BI and Tableau. Enabled data-driven decisions for personalized learning modules and subscription optimization, improving forecasting accuracy and engagement metrics.

Monitoring, Automation & DataOps: 

Monitored pipeline health using custom logging, alerts, and SLA tracking, achieving 99.9% SLA adherence. Automated manual ETL tasks and implemented data partitioning and incremental load strategies, reducing data latency by 40% and enhancing overall query performance.
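The incremental-load strategy mentioned above follows a standard watermark pattern; a minimal sketch with illustrative data:

```python
# Watermark-based incremental load: each run processes only rows whose
# event time is newer than the last successfully loaded timestamp.
def incremental_load(rows, last_watermark):
    """rows: iterable of (event_time, payload).
    Returns (new_rows, new_watermark)."""
    new_rows = [(ts, p) for ts, p in rows if ts > last_watermark]
    new_watermark = max((ts for ts, _ in new_rows), default=last_watermark)
    return new_rows, new_watermark

source = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
batch1, wm = incremental_load(source, last_watermark=0)   # initial full load
source.append((5, "e"))
batch2, wm = incremental_load(source, last_watermark=wm)  # only the new row
```

Persisting the watermark between runs (in a control table or orchestrator variable) is what lets each execution skip already-loaded data, which is where the latency reduction comes from.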

Streaming & Real-Time Analytics: 

Developed and maintained real-time data pipelines integrating third-party APIs and internal microservices using Hadoop, Kafka, and Apache Spark. Enabled distributed data processing and real-time engagement tracking, improving insights into user behavior by 30% and accelerating feedback loops.




Enercast GmbH

Data Scientist (Oct '17 - Mar '20)

At Enercast GmbH, I worked on predictive analytics and renewable energy forecasting, delivering key improvements for various clients:

Renewable Energy Forecasting & Predictive Modeling: 

Wind Energy Forecasting (AP TRANSCO): Enhanced forecast accuracy from 94% to 97.6% across 1.3 GW of assets using Linear Regression, Random Forests, Gradient Boosting (GBM), and custom deep learning models (LSTM, CNN). Performed extensive EDA and feature engineering using Pandas and NumPy; leveraged RapidMiner for automated preprocessing, anomaly detection, and data quality assurance.

Solar Generation Analytics & Time Series Forecasting: 

Solar Forecasting (TS TRANSCO): Increased model accuracy by 20% by shifting to 15-minute interval time series and applying XGBoost, Support Vector Regression (SVR), and ensemble methods. Built real-time data ingestion and transformation pipelines using Apache Spark and designed actionable Power BI and Tableau dashboards to support decision-making for grid management.
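The shift to 15-minute intervals amounts to resampling the generation series; a minimal Pandas sketch with illustrative readings:

```python
import pandas as pd

# Hourly solar generation readings (illustrative values, MW).
hourly = pd.Series(
    [0.0, 12.0, 40.0],
    index=pd.date_range("2019-06-01 06:00", periods=3, freq="h"),
)

# Upsample to 15-minute intervals with linear interpolation, giving the
# forecasting models four times as many points per day to learn from.
quarter_hourly = hourly.resample("15min").interpolate()
```

In the real pipeline the finer-grained series was produced by Spark ingestion rather than in-memory Pandas, but the resampling logic is the same.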

Asset Performance Monitoring & Interactive Reporting: 

Designed and deployed an Asset Management Portal for the Greenko Group, integrating Enercast analytics APIs and historical SCADA data, which improved maintenance scheduling by 25% for over 3 GW of wind and solar assets. Created interactive dashboards and visualizations using Plotly, Matplotlib, and D3.js, enabling cross-functional stakeholders to monitor turbine health and performance in near real time.

Risk Assessment & Model Evaluation: 

Applied classification techniques (Decision Trees, Logistic Regression) via Scikit-Learn for predictive maintenance and component failure detection. Improved model robustness by 15% through A/B testing and evaluation using Precision, Recall, F1-score, and ROC-AUC metrics; automated model retraining pipelines for continuous learning.
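The evaluation metrics named above reduce to counts of true/false positives and negatives; a self-contained sketch with illustrative labels:

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall and F1 for a binary failure-detection task
    (1 = component failure, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f1 = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Tracking these per model variant during A/B testing is what makes robustness improvements measurable rather than anecdotal.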

Cloud Deployment & Big Data Processing: 

Containerized ML workflows with Docker and deployed at scale using Kubernetes on AWS and Google Cloud Platform (GCP), reducing model inference time by 30%. Utilized Hadoop, Spark, and Airflow for batch and streaming data workflows, handling terabyte-scale energy telemetry datasets for real-time forecasting and risk monitoring.