Data Scientist
ML Engineer
Cloud Enthusiast
Young Consultant
• A seasoned data scientist, ML engineer, cloud enthusiast, and passionate consultant with 5+ years of experience deriving insights from complex data using statistical and ML tools in large international teams.
• A calm, committed, open-minded, resilient, and focused individual with a can-do mindset, looking for exciting real-world challenges and collaborations.
Tech Skills:
• Python data science tools: pandas, NumPy, scikit-learn, Keras (TensorFlow), PyTorch, Hugging Face, DGL, LlamaIndex
• Containerization tools: Docker, Kubernetes
• Cloud technologies: mainly GCP, AWS
• ML model web deployment: Streamlit, FastAPI
• Git
• Web development (HTML, CSS, PHP) and other languages (SQL, Bash, R, basic C++)
I recently started writing the "AI in Gist" blog, which collects the gist of several data science topics in one place.
Feb 2018 - Present
• Lead research scientist in a study that statistically rejected the null hypothesis at a significance level of 6×10⁻⁶, confirming CERN’s first independent Higgs boson observation.
• Implemented an end-to-end data science pipeline (incl. decision tree ensembles & neural network classifiers) for TB-scale datasets on a distributed HPC cluster.
• Developed a custom Graph Neural Network node classification model with node and edge features, using an MLP as the message-passing function coupled with multiplicative attention, reducing the statistical error in our results by ~60%.
• Developed a custom Quantile Regression Algorithm to regress data quantiles to reduce systematic error in several associated scientific results.
Computer Vision, Classification, Segmentation
► Fine-tuned CNN models like VGG19, ResNet, and EfficientNet as well as transformer models like Vision Transformer to achieve an average recall of 0.81 for classification.
► Performed segmentation using U-Net and its variants with attention and residual connections, achieving an mIoU of 0.72.
► Deployed final web application with GradCAM insights for both segmentation and classification.
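For readers unfamiliar with the mIoU figure quoted above, the following is a minimal sketch of how mean intersection-over-union is commonly computed on flattened label masks. The function name, the toy masks, and the choice to skip classes absent from both masks are assumptions made for illustration, not the project's actual evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes that appear in either mask."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Flattened toy masks (hypothetical labels: 0 = background, 1 = foreground).
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, target, num_classes=2)
```

In practice the same per-class ratio is accumulated over a whole validation set before averaging.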
Computer Vision, Object Detection, Segmentation
► Zero-shot object (tumor) detection using Grounding DINO.
► Used its output as the layout input for segmentation by Meta’s Segment Anything Model, achieving an mIoU of 0.69.
NLP, NER, Multi-modal, CV
► Performed OCR to extract the text and bounding boxes of SBB tickets using Pytesseract and UBIAI, then fine-tuned Microsoft’s multimodal LayoutLM model to achieve an average accuracy of 0.90 on the NER task.
► Deployed final web application that saves the traveler’s data from the ticket directly to a database.
NLP, Classification, Multi-modal, CV
► Extracted bounding boxes and text using EasyOCR, then fine-tuned Microsoft’s multimodal LayoutLMv3 to classify Cash flow, Income statement, and Balance sheet pages from an Annual Report, achieving an F1 score of 0.9.
► Deployed the ML model as a FastAPI application on AWS through a GitHub Actions CI/CD pipeline.
LLM, NLP, RAG
► Built a scalable RAG pipeline in the LlamaIndex framework using the open-source BAAI general embeddings and the Mistral-7B LLM available on Hugging Face, obtaining a relevancy score of 0.9 against GPT-4’s score of 1.0.
Time Series, Regression
► Multi-seasonal, multivariate forecast of Zurich city’s energy consumption, also using weather data (temperature, cloud coverage, humidity, precipitation).
► Implemented the Holt-Winters model, SARIMA, hybrid models (linear regression to extrapolate the trend, XGBoost for seasonality), Meta’s Prophet model, and biLSTM models to achieve an R² score of 0.82.
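The hybrid idea mentioned above (one model for the trend, a second model for what the trend leaves behind) can be sketched as follows. This is a simplified illustration under stated assumptions: a per-phase mean of the residuals stands in for XGBoost, the series is univariate, and all function names and the toy data are hypothetical.

```python
def fit_linear_trend(y):
    """Ordinary least squares fit of y = intercept + slope * t for t = 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    cov = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return y_mean - slope * t_mean, slope


def fit_seasonal_means(residuals, period):
    """Average residual per position in the cycle (stand-in for XGBoost)."""
    sums = [0.0] * period
    counts = [0] * period
    for t, r in enumerate(residuals):
        sums[t % period] += r
        counts[t % period] += 1
    return [s / c for s, c in zip(sums, counts)]


def hybrid_forecast(y, period, horizon):
    """Extrapolate the linear trend, then add the learned seasonal component."""
    intercept, slope = fit_linear_trend(y)
    residuals = [v - (intercept + slope * t) for t, v in enumerate(y)]
    seasonal = fit_seasonal_means(residuals, period)
    n = len(y)
    return [intercept + slope * t + seasonal[t % period]
            for t in range(n, n + horizon)]


# Toy series: trend 2*t plus a repeating pattern of period 4.
pattern = [1.0, -1.0, -1.0, 1.0]
history = [2.0 * t + pattern[t % 4] for t in range(24)]
forecast = hybrid_forecast(history, period=4, horizon=4)
```

The split matters because tree ensembles cannot extrapolate beyond the range of values seen in training, so the trend is handed to a model that can.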
LLM, NLP, Summarization
► Performed advanced prompt engineering on OpenAI GPT models to classify their articles for constructiveness.
► Bundled the final product in a lightweight web application for their internal use.
Recommendation System, Clustering, NLP
► Extracted contextualized embeddings using pre-trained distil-RoBERTa model for its products, clustered them using K-means, stored centroids in FAISS, and provided visualization of clusters using PCA.
► Searched for the cluster closest to the test item and recommended that cluster’s items based on their popularity.
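The lookup step described above can be sketched as follows. This is a minimal illustration, assuming the centroids and cluster assignments already exist; a brute-force distance search stands in for FAISS, and the item names, popularity counts, and 2-D embeddings are hypothetical.

```python
def nearest_centroid(vector, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(vector, centroids[i]))


def recommend(vector, centroids, item_clusters, popularity, top_k=3):
    """Most popular items from the cluster closest to the query embedding."""
    cluster = nearest_centroid(vector, centroids)
    members = [item for item, c in item_clusters.items() if c == cluster]
    return sorted(members, key=lambda item: popularity[item], reverse=True)[:top_k]


# Hypothetical 2-D embeddings; real sentence embeddings have hundreds of dimensions.
centroids = [[0.0, 0.0], [5.0, 5.0]]
item_clusters = {"mug": 0, "kettle": 0, "teapot": 0, "drill": 1, "saw": 1}
popularity = {"mug": 120, "kettle": 45, "teapot": 80, "drill": 60, "saw": 15}
picks = recommend([0.4, -0.2], centroids, item_clusters, popularity, top_k=2)
```

Searching only the centroids keeps the query cost proportional to the number of clusters rather than the number of products.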
Graph Neural Network, Classification
► Trained a custom Graph Neural Network classification model with both node and edge features, using an MLP as the message-passing function, summation as the aggregation function, and an MLP as the update function, coupled with a graph attention (GATv2) model. This led to a ~60% reduction in simulation statistical error in our analysis.
► Work presented at the ML4Jets 2022 conference and published as a Detector Performance Note by CMS, CERN.
Regression
► Developed a new Stochastic Morphing Algorithm (Binary Classification + Quantile Regression) to match quantiles of discontinuous data and simulation leading to the reduction of simulation systematic error in several associated scientific measurements.
► Master's thesis at CERN, Geneva. Work presented at the SPS Conference 2018.
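The basic notion of matching quantiles between simulation and data can be sketched as below. This is not the Stochastic Morphing Algorithm itself, which combines binary classification with quantile regression; it is only a plain empirical quantile mapping on two small samples, with hypothetical function names and toy values.

```python
import bisect


def empirical_quantile(sorted_sample, q):
    """Linearly interpolated quantile of an already sorted sample, 0 <= q <= 1."""
    pos = q * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1 - frac) + sorted_sample[hi] * frac


def quantile_map(value, simulation, data):
    """Map a simulated value to the data value at the same empirical quantile."""
    sim_sorted = sorted(simulation)
    data_sorted = sorted(data)
    # Quantile position of the value within the simulated sample, clipped to [0, 1].
    rank = bisect.bisect_right(sim_sorted, value)
    q = min(max((rank - 1) / (len(sim_sorted) - 1), 0.0), 1.0)
    return empirical_quantile(data_sorted, q)


# Toy samples: the simulation is shifted and compressed relative to the data.
simulation = [0.0, 1.0, 2.0, 3.0, 4.0]
data = [10.0, 12.0, 14.0, 16.0, 18.0]
corrected = [quantile_map(v, simulation, data) for v in simulation]
```

After the mapping, the corrected simulation has the same quantiles as the data sample by construction.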
Regression, Combined Actuarial Neural Network, SHAP
► Performed supervised multi-class classification of risk rating using ML models like Logistic Regression, XGBoost, & Combined Actuarial Neural Network to achieve a Matthews correlation coefficient of 0.63.
► Provided interpretability for the classical ML models using SHAP values.
Strategy Consulting
► Strategy consulting for an HR-tech firm by creating hypotheses for client’s concerns, gathering market data, and generating insights to develop an effective strategy to address the client’s needs.
► The project was done in association with the Graduate Consulting Club, Zurich.
Google Cloud Platform, MLOps, Docker, Regression
► Performed ETL on raw data, trained a neural network classifier using Vertex AI custom training, and deployed it on:
● Vertex AI Endpoint: for online predictions.
● Cloud Run & FastAPI endpoint: containerized the FastAPI endpoint of the model and deployed the custom Docker image to Cloud Run for online predictions.
Google Cloud, DevOps, Docker, Kubernetes
► Sole developer of a website for file conversion and manipulation (HTML, CSS, PHP, JS).
Website deployed using:
● Google Compute Engine: Global External HTTP-Proxy Load Balancer to direct HTTP traffic to Managed Instance Groups at the backend created from an instance template with a custom machine image to host the website.
● Google Kubernetes Engine: HTTP traffic directed to an Ingress-managed External Load Balancer Service with backend pods maintained by a Deployment. A custom Docker image of the website is built for the containers managed by the Deployment.
Google Cloud, AWS, Solution Architect, Terraform, DevOps
► Migrated a real COVID testing web application to a scalable hybrid cloud environment using AWS and GCP.
► Used Terraform as Infrastructure as Code (IaC) for infrastructure set-up, deployed a Docker container on Google Kubernetes Engine with access to a Cloud SQL database, and migrated data from AWS by syncing an AWS S3 bucket with a local folder.
► This project was done as part of the Intensive Cloud Computing Hands-on Training conducted by The Cloud Bootcamp in Dec 2023.
Degree | Institution | Duration |
---|---|---|
Ph.D. in Physics | ETH Zürich | November 2018 - Present |
Master's in High Energy Physics | ETH Zürich | September 2017 - September 2018 |
Master's in High Energy Physics | École Polytechnique, Paris | September 2016 - September 2017 |
Bachelor's in Physics (Honours) | Delhi University | July 2013 - July 2016 |
Certification | Institution | Date |
---|---|---|
AWS Academy ML Foundation Badge | AWS Academy | December 2023 |
Intensive Cloud Computing Hands-on Training | The Cloud Bootcamp | December 2023 - December 2024 |
Tableau Essential Training | | February 2023 |
Advanced SQL for Data Scientists | | January 2023 |
Professional Cloud Architect | Google Cloud | August 2022 - August 2024 |
Associate Cloud Engineer | Google Cloud | August 2022 - August 2025 |
TensorFlow Developer Certificate | TensorFlow | June 2022 - June 2025 |
TensorFlow: Advanced Techniques Specialization | Coursera | June 2022 |
Advanced Machine Learning | ETH Zürich | January 2020 |
Computational Statistics with R | ETH Zürich | September 2019 |
Date | Achievement |
---|---|
2016 - 2018 | Paris-Saclay Scholarship for International Master's Students (1 of ~150 international students) |
2016 - 2017 | Full Master's Charpak (French Government) scholarship (1 of ~30 students from all over India) |
2016 | Rank 136 out of ~10,000 students in the IIT JAM exam (admission to Master's programmes in India) |
2016 | Highest marks in the Bachelor's programme in the college department (of ~150 students) |
2012 - 2013 | 5-year INSPIRE scholarship from the Indian Govt. for being in the top 1% in Class 12 in India |