I Built a Real AI Spam Classifier in Java — From Keywords to Naive Bayes (No Python Needed)

Most developers think AI requires Python. I proved it doesn’t — with a Spring Boot backend and a real trained model.

Last Week, I Had a Realization…

Last week, I was scrolling through AI tutorials.

Python everywhere.
TensorFlow.
Deep learning models.

And I caught myself thinking:

👉 “Why does AI always feel out of reach for backend developers?”

As a Java developer, it felt like stepping into AI meant starting from zero.

New language.
New tools.
New mindset.

So I decided to challenge that idea.

😤 The Problem No One Talks About

Every AI tutorial seemed to follow the same pattern:

  • Python
  • Complex models
  • Heavy data pipelines

And honestly?

It felt overwhelming.

Not because it’s impossible — but because it’s unnecessarily complicated for getting started.

The Question That Changed Everything

So I asked myself:

👉 “What if I stop overthinking… and just build something simple?”

Not perfect.
Not production-grade.
Just… something that works.

That’s when I decided:

👉 I’m going to build a spam classifier using Spring Boot.

🚨 Why Spam Detection?

Spam is everywhere:

  • 📧 Emails
  • 📱 SMS
  • 🔔 Notifications
  • 📝 Contact forms

And filtering it manually?

👉 Not scalable.

This is actually a classic AI problem called text classification:

  • Spam
  • Not Spam (Ham)

Perfect use case. Simple enough. Powerful enough.

🎯 The Goal (Keep It Simple)

I didn’t want a complex system.

I wanted something practical:

Client → Spring Boot API → Classifier → Response

🔄 What Happens Behind the Scenes

  • Client sends a message
  • Backend processes it
  • Classifier predicts
  • API returns result

That’s it.

🔨 I Started With What I Already Knew

No new frameworks.

No over-engineering.

Just:

  • Spring Boot
  • REST API
  • Simple logic

Start simple → Then scale

🧠 Step 1 — The Simple Version (Rule-Based)

At first, I thought:

“I need a real ML model.”

But then I realized something important…

👉 Spam detection doesn’t have to start complex.

So I began with a simple rule-based approach:

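import java.util.Locale;
import org.springframework.stereotype.Service;

@Service
public class SpamClassifierService {

    // Flags a message as spam if it contains a trigger word (case-insensitive substring match).
    public String classify(String message) {
        String text = message == null ? "" : message.toLowerCase(Locale.ROOT);
        if (text.contains("free") || text.contains("win")) {
            return "SPAM";
        }
        return "NOT_SPAM";
    }
}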

And I exposed it as a REST endpoint:

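import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/spam")
public class SpamController {

    private final SpamClassifierService service;

    // Constructor injection: Spring supplies the service bean.
    public SpamController(SpamClassifierService service) {
        this.service = service;
    }

    // @RequestBody String receives the raw request body as-is,
    // so the keyword check scans the whole payload text.
    @PostMapping("/check")
    public String checkSpam(@RequestBody String message) {
        return service.classify(message);
    }
}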

I tested it:

{ "message": "You won a free iPhone! Click now!" }

👉 Response: SPAM

It worked. But then I asked myself a harder question:

👉 “Is this actually AI… or just an if statement?”

Spoiler: it was just an if statement.
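
To make that concrete, here is a small sketch of where a substring rule misfires. The class name and the whole-word variant are my own illustration, not code from the repo; the keyword list just mirrors the two words used above.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of the project.
class KeywordRuleSketch {
    // Same two trigger words as the rule-based service.
    private static final Set<String> SPAM_WORDS = Set.of("free", "win");

    // Substring check (lowercased here): "window" and "freedom" also match.
    static boolean substringRule(String message) {
        String text = message.toLowerCase(Locale.ROOT);
        return text.contains("free") || text.contains("win");
    }

    // Variant: split on non-letters and compare whole words only.
    static boolean wholeWordRule(String message) {
        return Arrays.stream(message.toLowerCase(Locale.ROOT).split("[^a-z]+"))
                .anyMatch(SPAM_WORDS::contains);
    }

    public static void main(String[] args) {
        String harmless = "The window in the meeting room is stuck";
        System.out.println("substring rule: " + substringRule(harmless));
        System.out.println("whole-word rule: " + wholeWordRule(harmless));
    }
}
```

Whole-word matching removes the "window" false positive, but it still cannot catch spam that avoids the listed words. That gap is what a trained model is meant to close.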

⚡ Step 2 — The Real AI Version

I didn’t stop there. I built the actual upgrade.

Full source code: https://github.com/java-ai-portfolio/java-ai-spam-classifier

Here’s what changed.

🧠 From if Statements → Naive Bayes + OpenNLP

The real classifier uses Apache OpenNLP — a proper NLP library — and trains a Naive Bayes model at startup from labeled data. No external ML service. No Python. No API key. Just Java.

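The training data lives in a plain text file inside the project. Each line is one labeled sample: the label, a tab, then the message text. More (and more varied) labeled lines generally mean better accuracy, because the model learns its word statistics from these examples instead of from hand-written rules. That’s machine learning in its purest form.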

spam	Congratulations! You win a free iPhone. Click now!
ham	Hi, please find the attached report for review.
spam	URGENT: Your account has been compromised. Verify now.
ham	Can we reschedule Thursday's meeting to Friday?
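
If you want to inspect or validate a file in this format yourself, a minimal parser might look like the sketch below. It is plain Java with hypothetical names (TrainingDataSketch, Sample), and it assumes the label is simply the first whitespace-delimited token on each line; the project itself hands the file to OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the project.
class TrainingDataSketch {

    // One labeled line: "spam" or "ham" plus the message text.
    static final class Sample {
        final String label;
        final String text;

        Sample(String label, String text) {
            this.label = label;
            this.text = text;
        }
    }

    // Splits each line at the first run of whitespace (tab or spaces).
    // Blank lines and lines with a label but no text are skipped.
    static List<Sample> parse(List<String> lines) {
        List<Sample> samples = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length < 2) {
                continue;
            }
            samples.add(new Sample(parts[0], parts[1]));
        }
        return samples;
    }

    public static void main(String[] args) {
        List<Sample> samples = parse(List.of(
                "spam\tCongratulations! You win a free iPhone. Click now!",
                "ham\tHi, please find the attached report for review."));
        for (Sample s : samples) {
            System.out.println(s.label + " -> " + s.text);
        }
    }
}
```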

🏗️ The Project Structure

The architecture is intentionally simple — three Java files and a training data file do all the work.

SpamDetectorApp.java  → Entry point
SpamService.java      → OpenNLP training + classify()
SpamController.java   → REST endpoints
training-data.txt     → Your labeled dataset

SpamService trains the model once on startup. When you see this in your logs, the classifier is ready:

SpamService: model trained and stable.

From that point, every request gets a real prediction — not a keyword check.
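
To show what "a real prediction" means under the hood, here is a from-scratch sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. It is not the OpenNLP code from the repo: the class name, the tokenizer, and the choice to ignore words never seen in training are my own simplifying assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative only: a tiny hand-rolled classifier, not the project's OpenNLP model.
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordsPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercase, then split on anything that is not a letter, digit, or apostrophe.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+");
    }

    // Count one labeled document and the words it contains.
    void train(String label, String text) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
            wordsPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Returns label -> probability; the values sum to 1.
    Map<String, Double> predict(String text) {
        if (totalDocs == 0) {
            throw new IllegalStateException("train() must be called first");
        }
        Map<String, Double> logScores = new HashMap<>();
        for (String label : docsPerLabel.keySet()) {
            // Start from the log prior: share of training documents with this label.
            double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = wordsPerLabel.getOrDefault(label, 0);
            for (String token : tokenize(text)) {
                if (token.isEmpty() || !vocabulary.contains(token)) {
                    continue; // assumption: words unseen in training are ignored
                }
                int count = counts.getOrDefault(token, 0);
                // Add-one smoothing keeps a zero count from wiping out the score.
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            logScores.put(label, score);
        }
        // Turn log scores into probabilities (subtract the max for numeric stability).
        double max = Collections.max(logScores.values());
        double sum = 0.0;
        for (double s : logScores.values()) {
            sum += Math.exp(s - max);
        }
        Map<String, Double> probabilities = new HashMap<>();
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            probabilities.put(e.getKey(), Math.exp(e.getValue() - max) / sum);
        }
        return probabilities;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("spam", "Congratulations! You win a free iPhone. Click now!");
        nb.train("ham", "Hi, please find the attached report for review.");
        nb.train("spam", "URGENT: Your account has been compromised. Verify now.");
        nb.train("ham", "Can we reschedule Thursday's meeting to Friday?");
        System.out.println(nb.predict("Click now to win a free iPhone"));
    }
}
```

With only four training lines the probabilities are not meaningful on real traffic, but the mechanics are the same ones a library applies at scale: count words per label, smooth, multiply, normalize.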

🌐 The API Response — This Is the Difference

The old version returned a plain string. The upgraded version returns this:

{
  "category": "spam",
  "confidence": 0.9421,
  "spam": true
}

That confidence score is the model telling you how certain it is. A score of 0.94 means very likely spam. A score of 0.51 means borderline — worth a human review. That’s what separates a trained model from an if statement.
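
One way to act on that score is a simple triage step, sketched below. The 0.75 cut-off, the Action names, and the class itself are hypothetical choices for illustration; I am also assuming the confidence value refers to the predicted category, as in the response above.

```java
// Hypothetical helper for illustration; not part of the project.
class ConfidenceTriageSketch {

    enum Action { BLOCK, REVIEW, ALLOW }

    // Assumed threshold: anything less certain than this goes to a human.
    static final double REVIEW_BELOW = 0.75;

    // category is "spam" or "ham"; confidence is the score for that category.
    static Action triage(String category, double confidence) {
        if (confidence < REVIEW_BELOW) {
            return Action.REVIEW;
        }
        return "spam".equals(category) ? Action.BLOCK : Action.ALLOW;
    }

    public static void main(String[] args) {
        System.out.println(triage("spam", 0.9421));
        System.out.println(triage("spam", 0.51));
        System.out.println(triage("ham", 0.90));
    }
}
```

Where to put the threshold depends on how costly a missed spam message is compared with a blocked legitimate one, so treat 0.75 as a starting point to tune, not a recommendation.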

🔧 Run It Yourself in 5 Minutes

Clone the repo and run one command:

git clone https://github.com/java-ai-portfolio/java-ai-spam-classifier.git
cd java-ai-spam-classifier
./gradlew bootRun

Then test it:

curl -X POST http://localhost:8080/api/spam/classify \
  -H "Content-Type: application/json" \
  -d '{"text": "Congratulations! You win a free iPhone. Click now!"}'

You’ll get back the full JSON response with category, confidence score, and spam flag. The project also ships with Swagger UI at localhost:8080/swagger-ui.html — test every endpoint directly in the browser without Postman.
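
If you would rather call the endpoint from Java than from curl, the JDK's built-in HttpClient (Java 11+) is enough. This is a sketch: the class name is hypothetical, the URL and the "text" field are taken from the curl example, and the JSON escaping only covers quotes and backslashes.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for illustration; not part of the project.
class ClassifyRequestSketch {

    // Minimal escaping for a JSON string value (quotes and backslashes only).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the same POST request as the curl command.
    static HttpRequest build(String baseUrl, String text) {
        String json = "{\"text\": \"" + escape(text) + "\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/spam/classify"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:8080",
                "Congratulations! You win a free iPhone. Click now!");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        } catch (IOException e) {
            System.out.println("Request failed (is the app running?): " + e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a real client you would build the body with a JSON library such as Jackson instead of string concatenation.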

Want to go further? Spring AI lets you connect Spring Boot to OpenAI, Gemini, and other LLM providers by adding a starter dependency and an API key. That’s a natural next step after Naive Bayes.

📈 The Real Lesson

The if statement got me started. The Naive Bayes model got me somewhere real.

Rule-based logic is good for understanding the problem. Trained models are good for solving it in production. And the gap between the two was smaller than I expected — same Spring Boot structure, same REST controller, just a smarter brain inside the service layer.

🚀 What I Learned From This

I don’t need Python to start with AI. I don’t need complex systems. Architecture matters more than tools. And the best way to learn ML is to train something real, even if it’s tiny.

More and more APIs are getting AI features built in. Backend developers who understand this now have a serious edge.

🎯 Final Thought

You don’t need the perfect model, the perfect stack, or the perfect knowledge. You need two things — one simple project to start, and one real project to grow. Both are in this article. Both took less than a day combined.

💬 Your Turn

Have you integrated AI into a Java backend? What would you build next with Spring Boot + OpenNLP?

Drop your thoughts below 👇

Full source code: https://github.com/java-ai-portfolio/java-ai-spam-classifier


Originally published in Javarevisited on Medium.
