Data Science — Zoia Panasenko

01

🚢

Optimal Routing

AI Agent - LangGraph - Claude Sonnet 4.5 - Tavily

Beta AI agent for ocean freight routing.

Conversational agent that recommends carrier services and estimates transit times and CO₂ emissions for any port pair. Built on LangGraph with Claude Sonnet 4.5, combining a live web search tool (Tavily) with custom tools for sea distance and emissions calculation. System prompt tuned to keep the agent from inventing carrier services: when a named service can't be verified, it falls back to typical trade-lane ranges. Streaming responses, per-session memory, deployed on Streamlit Cloud.
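
A rough idea of what the distance and emissions tools might compute is sketched below. This is not the agent's actual code: the function names, the great-circle (haversine) shortcut, and the emission factor are all assumptions for illustration. Real sea routes follow shipping lanes and canals, so they are longer than the great-circle distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees.

    Stand-in for a real sea-distance lookup (assumption, not the agent's tool).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def co2_kg(distance_km, cargo_tonnes, factor_g_per_tonne_km):
    """CO2 in kg = distance x cargo mass x emission factor (g CO2 per tonne-km)."""
    return distance_km * cargo_tonnes * factor_g_per_tonne_km / 1000.0


if __name__ == "__main__":
    # Approximate port coordinates; the factor of 10 g/tonne-km is a
    # placeholder, not a sourced value.
    rotterdam = (51.95, 4.14)
    singapore = (1.26, 103.82)
    distance = great_circle_km(*rotterdam, *singapore)
    print(f"lower-bound distance: {distance:.0f} km")
    print(f"estimated CO2: {co2_kg(distance, 20.0, 10.0):.0f} kg")
```

Keeping the emission factor as a parameter lets the caller swap in a value for the vessel type or trade lane instead of hard-coding one.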

Try the agent ↗ GitHub ↗

02

💬

Sentiment Analysis

NLP - OpenAI - NLTK - Twitter Data

OpenAI GPT vs NLTK sentiment comparison.

Compared GPT-3.5-turbo and NLTK on 500k tweets from the first 65 days of the Russia-Ukraine war (1.6 GB dataset). Correlation analysis found no significant relationship between tweet sentiment and engagement metrics. A daily heatmap showed that although the two tools agree on only 51% of results, both capture the same trend: a sharp rise in negative sentiment beginning February 24, 2022.
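
The overlap figure and the daily trend can be thought of as two small aggregations over per-tweet labels. The sketch below is a minimal illustration, not the notebook's code: the function names and the three-way label scheme ("negative", "neutral", "positive") are assumptions.

```python
from collections import defaultdict


def agreement_rate(labels_a, labels_b):
    """Fraction of items on which two labelers give the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if not labels_a:
        raise ValueError("label lists must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def daily_negative_share(dates, labels):
    """Map each date to the share of that day's items labeled 'negative'."""
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for day, label in zip(dates, labels):
        totals[day] += 1
        if label == "negative":
            negatives[day] += 1
    return {day: negatives[day] / totals[day] for day in totals}
```

Computing `daily_negative_share` once per tool and plotting the two series side by side is one way to check whether tools that disagree on individual tweets still move together over time.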

Open in Colab ↗ Project Narrative ↗

03

📩

Spam Predictor

Classification - KNN - SVM - Random Forest

Multi-model spam classification comparison.

Tested KNN, Logistic Regression, SVM, Decision Trees, and Random Forest on a spam dataset split 75/25 for training and testing. Used stratified k-fold cross-validation to keep class proportions consistent across folds despite class imbalance and to guard against overfitting to a single split. Decision Trees had the best cross-validation score; feature engineering was identified as a way to improve accuracy further.
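
What "stratified" means here can be shown with a small pure-Python sketch. It is an illustration under assumptions, not the project's code: the function name and the round-robin dealing are hypothetical, and in practice a library implementation such as scikit-learn's `StratifiedKFold` would normally be used.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds, keeping class shares similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's samples round-robin so every fold
        # receives a near-equal share of the class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds
```

Each fold then serves once as the validation set while the remaining folds form the training set, so a rare class such as spam appears in every validation fold at roughly its overall rate.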

Open in Colab ↗

04

🌺

Neural Network

Deep Learning - Keras - Flower Classification

Predictive neural network with Keras.

Built and compared two neural network architectures for flower type classification using Keras. Model 1, with 2 hidden layers and more neurons, outperformed Model 2 on both training and test accuracy, showing better generalization with no signs of overfitting. Model 2 exhibited a wider gap between training and test scores, indicating it struggled to generalize to unseen data.
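
The comparison above rests on two numbers per model: test accuracy and the gap between training and test accuracy. A minimal sketch of that reasoning follows. The helper names are hypothetical, and the accuracy values in the usage example are placeholders, not the project's results.

```python
def generalization_gap(train_accuracy, test_accuracy):
    """Training minus test accuracy; a larger positive gap suggests overfitting."""
    return train_accuracy - test_accuracy


def pick_model(scores):
    """Return the model with the highest test accuracy; ties go to the smaller gap."""
    def rank(name):
        train, test = scores[name]["train"], scores[name]["test"]
        return (test, -generalization_gap(train, test))
    return max(scores, key=rank)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    scores = {
        "model_1": {"train": 0.95, "test": 0.93},
        "model_2": {"train": 0.90, "test": 0.80},
    }
    for name, s in scores.items():
        print(name, "gap:", round(generalization_gap(s["train"], s["test"]), 3))
    print("selected:", pick_model(scores))
```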

Open in Colab ↗