About

I am an aspiring Artificial Intelligence professional with a Master's degree from Northeastern University, holding a 4.00 GPA and deep expertise in machine learning, natural language processing, and reinforcement learning. With work experience across industries, from developing AI-driven solutions at Ribbon Communications and Universal Music Group to contributing to machine learning research at Northeastern, I bring a combination of technical proficiency and creative problem-solving. My technical toolkit includes Python, cloud platforms, and tools such as Keras, PyTorch, and LangChain. My passion lies in building data-driven solutions that deliver tangible business impact.

  • Birthday: 20 January 1998
  • Phone: +1 857-540-6982
  • City: Boston, MA
  • Email: ghosh.anu@northeastern.edu

Interests

Machine Learning

Natural Language Processing

Computer Vision

Software Development

Visualization

Software Engineering

Algorithms

Image Processing

Education

MS in Artificial Intelligence

January 2021 - Present
Relevant Coursework
  • Machine Learning
  • Large Language Models
  • Pattern Recognition & Computer Vision

B.E. in Computer Engineering

July 2015 - May 2019
Relevant Coursework
  • Database Management Systems
  • Algorithms & Optimization for Big Data
  • Machine Learning

Certifications

Machine Learning

BigQuery

Generative AI

Algorithms Advance

Android Development

Experience

Ribbon Communications

January 2025 - Present

Data Scientist

  • Architected a generic end-to-end log analysis platform that ingests raw, unstructured logs and transforms them into feature-enriched, clustered streams of correlated events. Conducted an extensive literature review to integrate techniques such as real-time log template extraction via online learning, unsupervised time-series correlation analysis, and BERTopic modeling for dynamic grouping of semantically related log events. The solution’s effectiveness in automated root cause analysis and event correlation directly contributed to securing a significant purchase order from Altice, expanding enterprise adoption of Ribbon’s analytics capabilities.
  • Engineered an ML pipeline for optical performance monitoring, using in-band power signal features to train quantile XGBoost and Gaussian Process regressors for Optical Signal-to-Noise Ratio (OSNR) estimation. Applied statistical EDA and physics-informed feature engineering to mitigate data sparsity effects, resulting in a twofold increase in model predictive fidelity.
  • Built a multimodal AI framework in Langflow that identifies statistical correlations among network failure events and employs an LLM with LlamaIndex-based retrieval-augmented reasoning to evaluate which correlations are logically and technically meaningful, providing interpretable causal insights and prescriptive remediation strategies.
  • Built a chat-driven analytics platform for conversational EDA, automating chart/dashboard creation in Apache Superset using a multi-agent LlamaIndex workflow with a self-correcting SQL agent and MCP-exposed Superset APIs as agent tools.

Ribbon Communications

May 2024 - August 2024

Data Science and Machine Learning Intern

  • Developed a URL categorization service built on a Selenium-based web scraping engine. Used a fine-tuned RoBERTa model to categorize scraped webpage data, with a zero-shot DeBERTa model for further subcategorization.
  • Achieved an F1 score of 85% and reduced categorization and processing time for unknown/new URLs on the client (AT&T) side by 70%.
  • Developed an automated log preprocessing and enrichment system using template learning algorithms, NLP-based unsupervised clustering, and few-shot severity classification to structure raw logs into event streams, highlight temporal correlations, and accelerate system troubleshooting and root cause analysis.
  • Integrated the preprocessing pipeline with a Streamlit app, allowing users to zoom into anomaly windows and analyze log clusters to identify potential trigger events. Enabled chat with logs of interest using an Ollama chatbot for further insights.

Universal Music Group

July 2023 - December 2023

Data Science Analyst Intern

  • Designed LightGBM-based customer LTV models for predicting purchase propensity and identifying superfan (high-value) customers. Processed millions of online transaction records to train the models, and interpreted predictions using Shapley values. Raised leadership’s awareness of a data-driven approach to fanbase analysis and customer valuation, leading to its prioritization for the next quarter.
  • Prepared a feature extraction module for raw emails from artists’ release campaigns. Devised a BERT-based active learning technique to achieve agile labeling and training for models that tag emails with predefined categories and unsupervised labels. Achieved a 10% increase in performance of downstream in-production engagement assessment models with these features.
  • Developed a Bayesian linear regression model to predict an artist’s revenue from factors such as music releases, merch releases, and streaming data. Provided the business with a Streamlit portal to adjust these factors and use the model to analyze their effect on revenue.

WINES Lab Northeastern University

January 2023 - September 2023

Graduate Machine Learning Research Assistant

  • Explored machine learning-based radio fingerprinting by training neural networks such as AlexNet and Vision Transformers on sequences of in-phase (I) and quadrature (Q) data from LoRa devices to identify the devices in varying external scenarios.
  • Achieved 99% accuracy across all device configurations when training and testing data were collected on the same day.

TIAA

July 2019 - July 2022

Analyst, Software Developer

  • Led a team of 4 developers overseeing development and production of 3 Spring Boot microservice applications.
  • Independently supported and collaborated with business partners on production deployment for TIAA's retirement income evaluation application.
  • Achieved a 100% success rate across more than 20 releases within a year by making applications rapid-release compatible with OpenShift integration. Reduced production deployment time from hours to a few minutes.
  • Analyzed data and generated reports for insurance agents by creating KPI dashboards in Power BI. Streamlined user session and engagement tracking through automated SQL batch procedures, reducing manual effort for the insurance team.

Projects

  • Natural Language Processing
  • Generative AI
  • Reinforcement Learning
  • Data Science and Machine Learning

Store Sales Prediction

Bone Fracture Detection

Document Parser

Medical Query Matcher

Music Lyric Generator

Twitter Disaster Prediction

RL Paper Implementation

Tetris Using Deep-Q Learning

Custom NeuralNet Library with AutoGradient

Publications

Hybrid Image Encryption Technique Using Genetic Algorithm and Lorenz Chaotic System

Published in: ITM Web of Conferences 32, 03009 (2020)

Abstract: One of the important applications of image encryption is storing confidential and important images on a local device or in a database in such a way that only the authorized party can view or perceive them. The current image encryption technique employs the genetic algorithm to increase confusion in the image, but compromises on time and space complexity. The other method employs chaos or pseudo-random number generating systems, which have fast and highly sensitive keys but fail to make the image sufficiently noisy and are risky due to their deterministic nature. We propose a technique which combines the non-deterministic, optimizing power of the genetic algorithm with the space efficiency and key sensitivity of chaotic systems in a unified, efficient algorithm that retains the merits of both methods while minimizing their demerits in a software system. The encryption process proceeds in two steps, generating two keys. First, an encryption sequence is generated using the Lorenz chaotic system of differential equations. The seed values used are the user’s actual key, with a key sensitivity of 10^-14. Second, the encrypted image’s genetic encryption sequence is generated, which results in an encrypted image with an entropy value greater than 7.999, thus ensuring the image is very noisy. The proposed technique uses variations of Lorenz system seed sets to generate all random mutations and candidate solutions in the genetic encryption. Since only the seed sets leading to the desired solution are stored, space efficiency is higher than when storing the entire sequences. Using this image encryption technique, we ensure that the images are hidden securely under two layers of security, one chaotic and the other non-deterministic.
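
The first step of the abstract (a Lorenz-derived encryption sequence) and its entropy check can be illustrated with a minimal sketch. This is not the published implementation: the forward-Euler integrator, the step size, the burn-in length, the byte-extraction rule, and the names `lorenz_keystream`, `xor_bytes`, and `shannon_entropy` are all assumptions made for illustration, and the genetic-algorithm stage is omitted entirely.

```python
import math
from collections import Counter


def lorenz_keystream(seed, n_bytes, dt=0.01, burn_in=1000,
                     sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Return n_bytes derived from a Lorenz trajectory started at `seed`.

    Forward-Euler steps and the byte-extraction rule below are
    illustrative assumptions, not details taken from the paper.
    """
    x, y, z = seed
    stream = bytearray()
    for step in range(burn_in + n_bytes):
        # Lorenz system: dx/dt, dy/dt, dz/dt
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        if step >= burn_in:  # skip the initial transient before sampling
            stream.append(int(abs(x) * 1e6) % 256)
    return bytes(stream)


def xor_bytes(data, keystream):
    """XOR data with the keystream; applying it twice restores the data."""
    return bytes(d ^ k for d, k in zip(data, keystream))


def shannon_entropy(data):
    """Shannon entropy in bits per byte (0 for constant data, at most 8)."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    image = bytes(64 * 64)            # stand-in for a flat grayscale image
    key = (1.0, 1.0, 1.0)             # hypothetical user key (seed values)
    stream = lorenz_keystream(key, len(image))
    encrypted = xor_bytes(image, stream)
    print("entropy before:", shannon_entropy(image))
    print("entropy after: ", shannon_entropy(encrypted))
    print("round trip ok: ", xor_bytes(encrypted, stream) == image)
```

Because the keystream is fully determined by the seed, only the seed needs to be stored to decrypt, which is the space-efficiency argument the abstract makes. A simple XOR with this keystream is not claimed to reach the entropy figure reported in the paper.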

Link to Publication: View Publication

Skills

Languages and Databases

Frameworks

Tools

Contact

My Address

36 Egremont St

Unit 212

Brighton, MA 02135

Social Profiles

Email

ghosh.anu@northeastern.edu

Contact

+1 857-540-6982