Playing with sound in Java
An exploration of the Java Sound API
Sound in an application isn’t always just about playing an MP3 file. What if we could see the music? Or analyze the ambient noise picked up by our microphone? The good news is that Java, through its Java Sound API, gives us all the tools we need to do exactly that.
In this article, we’ll build a Swing application that captures audio from the microphone in real time and displays its frequency spectrum — also known as an audio spectrum visualizer.
Let’s dive in!
The project: Capture / Analyze / Draw
Our goal may seem simple at first glance, but it actually breaks down into three major technical steps:
- Capture the audio — We’ll open an audio input line (the microphone) and continuously read the raw sound data.
- Analyze the frequencies — This is the heart of the project. Raw audio data is a waveform in the time domain. To find out which frequencies (bass, mid, treble) compose it, we must transform it. For that, we’ll use a powerful algorithm: the Fast Fourier Transform (FFT).
- Draw the spectrum — Once we’ve obtained the intensity of each frequency, we’ll draw them on screen as dynamic, colorful bars using Swing.
The Java Sound API: your audio toolbox
Before coding, let’s understand our tools. The Java Sound API (javax.sound.sampled) might look intimidating, but it revolves around a few key concepts — much like a recording studio.
- AudioSystem – The conductor. This is the main class that gives you access to everything else. You ask it for a microphone, speakers, or audio format info, and it provides them. It’s your single point of entry.
- Mixer – Represents an audio device. Your sound card is a mixer, your USB microphone is another. A mixer can have input lines (for capture) and/or output lines (for playback).
- Line – The channel through which audio data flows. It’s the most important concept. There are several types of lines:
- TargetDataLine – An input line (IN). It captures audio data from a mixer (e.g., a microphone). “Target” means your application is the target of the data. This is what we’ll use.
- SourceDataLine – An output line (OUT). It sends audio data to a mixer (e.g., speakers). “Source” means your application is the source of the data.
- Clip – A special line that loads audio data entirely into memory before playback. Ideal for short, repeated sounds such as game sound effects, since playback is instantaneous (see the short example after this list).
- AudioFormat – The identity card of sound. This object precisely describes the nature of the audio data flowing through a Line. Its main attributes are:
- Sample rate (sampleRate): the number of “snapshots” of sound per second (e.g., 44,100 Hz for CD quality).
- Sample size (sampleSizeInBits): the precision of each snapshot (e.g., 16 bits).
- Channels (channels): 1 for mono, 2 for stereo.
- Signed/Unsigned and Big/Little Endian: whether sample values are signed, and the byte order of multi-byte samples.
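To make the Clip concept concrete, here is a minimal sketch (not taken from the project; beep.wav is a placeholder file name):

import java.io.File;
import javax.sound.sampled.*;

File soundFile = new File("beep.wav");
try (AudioInputStream stream = AudioSystem.getAudioInputStream(soundFile)) {
    Clip clip = AudioSystem.getClip();
    clip.open(stream);   // loads the whole sound into memory
    clip.start();        // playback starts instantly, from memory
} catch (Exception e) {
    e.printStackTrace();
}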
In summary, the workflow is almost always the same:
- Define the desired AudioFormat.
- Ask AudioSystem for a Line (for example, a TargetDataLine) compatible with that format.
- Open the line, start it, and read (read()) or write (write()) the audio data.
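As a condensed sketch of that pattern (this one uses the AudioSystem.getTargetDataLine convenience method rather than DataLine.Info; exception handling omitted):

AudioFormat format = new AudioFormat(44100f, 16, 1, true, false); // 1. the desired format
TargetDataLine mic = AudioSystem.getTargetDataLine(format);       // 2. a compatible line
mic.open(format);                                                 // 3. open...
mic.start();                                                      //    ...start...
byte[] buf = new byte[4096];
int n = mic.read(buf, 0, buf.length);                             //    ...and read (blocks)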
Capturing sound from the microphone
The first step is to access the microphone. With our understanding of the Java Sound API, the code becomes quite readable. The capture will run in a separate Thread so as not to block the Swing interface.
Thread captureThread = new Thread(() -> {
    try {
        // 1. Define the audio format
        AudioFormat format = new AudioFormat(AudioConstants.SAMPLE_RATE, 16, 1, true, false);

        // 2. Get the microphone line
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
        line.open(format, AudioConstants.SAMPLE_COUNT * AudioConstants.BYTES_PER_SAMPLE * 2);
        line.start();

        byte[] buffer = new byte[AudioConstants.SAMPLE_COUNT * AudioConstants.BYTES_PER_SAMPLE];

        // 3. Read loop — `running` is an AtomicBoolean flag toggled when the app stops
        while (running.get()) {
            int bytesRead = line.read(buffer, 0, buffer.length);
            if (bytesRead <= 0) continue;
            // ... data processing ...
        }

        line.stop();
        line.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
});
captureThread.start();
captureThread.start();
- AudioFormat – Defines the sound quality. Here, we use a 96,000 Hz sample rate (high resolution), 16-bit samples, a single mono channel, signed samples (true), and little-endian byte order (false).
- TargetDataLine – Represents the input line, like our microphone. Obtained through AudioSystem.
- line.read(buffer, …) – The crucial call. It blocks until the buffer is filled with microphone data, then continues. The while loop ensures continuous capture.
Once the bytes (byte[]) are read, we convert them into numeric values (double[]) that can be processed. Each 16-bit sample consists of two bytes that we combine and normalize between -1.0 and 1.0.
// Convert bytes to samples (doubles); two bytes per 16-bit sample, little-endian
int samplesRead = bytesRead / 2;
for (int i = 0, s = 0; s < samplesRead; i += 2, s++) {
    int low = buffer[i] & 0xFF;     // unsigned low byte
    int high = buffer[i + 1];       // signed high byte (sign-extends)
    int value = (high << 8) | low;  // recombine into a signed 16-bit value
    samples[s] = value / 32768.0;   // normalize to [-1.0, 1.0]
}
The magic of FFT: seeing hidden frequencies
Now that we have an array of “samples” representing the sound wave, how do we extract the frequencies?
Imagine a piano chord. Your ear hears one note, but in reality, it’s a complex mix of a fundamental frequency and several harmonics. The Fourier Transform is a mathematical tool that acts like a prism: it takes a complex signal (white light, a sound wave) and breaks it down into its simple components (the colors of the rainbow, the frequencies).
The FFT (Fast Fourier Transform) is an efficient algorithm for performing this transformation.
// Inside the capture loop
double[] newMagnitudes = computeFFT(samples);
The computeFFT method implements this algorithm. You don’t need to understand every line of math — just remember what it does:
- Input: A double[] array representing sound amplitude over time.
- Output: A double[] array representing the intensity (magnitude) of each frequency band.
The first element corresponds to the lowest frequencies (bass), and the last to the highest (treble).
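The repository has its own implementation; as a rough idea of what computeFFT can look like, here is a classic iterative radix-2 Cooley-Tukey sketch — not the article's exact code, and it assumes the sample count is a power of two:

// A sketch of computeFFT: returns the magnitude of the first N/2 frequency bins
private double[] computeFFT(double[] samples) {
    int n = samples.length; // must be a power of two
    double[] re = samples.clone();
    double[] im = new double[n];
    fft(re, im);
    double[] magnitudes = new double[n / 2];
    for (int i = 0; i < n / 2; i++) {
        magnitudes[i] = Math.hypot(re[i], im[i]);
    }
    return magnitudes;
}

// In-place iterative radix-2 FFT (Cooley-Tukey)
private void fft(double[] re, double[] im) {
    int n = re.length;
    // Bit-reversal permutation
    for (int i = 1, j = 0; i < n; i++) {
        int bit = n >> 1;
        for (; (j & bit) != 0; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) {
            double t = re[i]; re[i] = re[j]; re[j] = t;
            t = im[i]; im[i] = im[j]; im[j] = t;
        }
    }
    // Butterfly passes, doubling the sub-transform length each time
    for (int len = 2; len <= n; len <<= 1) {
        double ang = -2 * Math.PI / len;
        for (int i = 0; i < n; i += len) {
            for (int k = 0; k < len / 2; k++) {
                double wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                double ur = re[i + k], ui = im[i + k];
                double vr = re[i + k + len / 2] * wr - im[i + k + len / 2] * wi;
                double vi = re[i + k + len / 2] * wi + im[i + k + len / 2] * wr;
                re[i + k] = ur + vr;           im[i + k] = ui + vi;
                re[i + k + len / 2] = ur - vr; im[i + k + len / 2] = ui - vi;
            }
        }
    }
}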
Setting the stage: visualization with Swing
Now that we have our frequency magnitudes, all that remains is to draw them! Our class extends JPanel, so we override paintComponent.
@Override
protected void paintComponent(Graphics g) {
    super.paintComponent(g);
    if (magnitudes == null) return;

    Graphics2D g2 = (Graphics2D) g;
    g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);

    int w = getWidth();
    int h = getHeight();
    g2.setColor(Color.BLACK);
    g2.fillRect(0, 0, w, h);

    int len = magnitudes.length;
    double max = Arrays.stream(magnitudes).max().orElse(1);

    // Using a logarithmic scale for the x-axis
    double logBase = Math.log(len);
    for (int i = 1; i < len; i++) { // start from 1 to avoid log(0)
        double norm = magnitudes[i] / max;
        int barHeight = (int) (norm * h);

        // X position on a logarithmic scale
        double log_i = Math.log(i);
        int x = (int) (w * log_i / logBase);

        // Dynamic color (green → yellow → red)
        float hue = (float) (0.33 - norm * 0.33);
        g2.setColor(Color.getHSBColor(hue, 1.0f, 1.0f));
        g2.drawLine(x, h, x, h - barHeight);
    }

    g2.setColor(Color.WHITE);
    g2.drawString("Audio Spectrum (Log) — " + (int) (AudioConstants.SAMPLE_RATE / 2) + " Hz", 10, 20);
}
A Swing Timer calls repaint() at a regular rate (about 30 times per second), which refreshes the component and creates the animation.
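That refresh loop is a one-liner (33 ms is roughly 30 fps; the interval is a tunable choice, not a requirement):

new javax.swing.Timer(33, e -> repaint()).start();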
Fine touches that make the difference
Our rendering code includes a few important tricks:
- Logarithmic scale — The human ear doesn’t perceive frequencies linearly. We’re far more sensitive to variations in lower frequencies than higher ones. To make the visualizer feel more “natural,” we use a logarithmic scale on the X-axis so that low frequencies occupy more space.
double log_i = Math.log(i);
int x = (int) (w * log_i / logBase);
- Dancing colors — A bar that changes color with intensity looks much better! Using the HSB (Hue, Saturation, Brightness) color model, we vary the hue from green (0.33) to red (0.0) as intensity increases, creating a smooth gradient.
float hue = (float) (0.33 - norm * 0.33);
g2.setColor(Color.getHSBColor(hue, 1.0f, 1.0f));
- Smoothing for fluidity — To prevent the bars from flickering too harshly, we apply exponential smoothing. Each new value is a weighted average of the previous one and the new one, softening the movement.
smoothedMagnitudes[i] = 0.8 * smoothedMagnitudes[i] + 0.2 * newMagnitudes[i];
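In context, that smoothing runs over every bin each time a new FFT frame arrives — a sketch using the 0.8/0.2 weights from above (they are tunable):

// Exponential smoothing: blend each new frame into the displayed values
for (int i = 0; i < newMagnitudes.length; i++) {
    smoothedMagnitudes[i] = 0.8 * smoothedMagnitudes[i] + 0.2 * newMagnitudes[i];
}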
The result
Now we just have to run our program, play some music, and let a wave of Windows Media Player nostalgia wash over us.
🎵 Our visualizer in action 🎵
Conclusion
We’ve seen how, with a bit of code, you can move from simple audio capture to complex and aesthetic data visualization. We’ve worked with the Java Sound API, implemented an FFT, and used Swing for dynamic rendering.
What’s next?
- Analyze an audio file instead of the microphone using AudioInputStream (a quick sketch follows this list).
- Change the visualization type — try an oscilloscope, a circle, or sound waves.
- Detect specific notes by identifying peaks at particular frequencies.
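For the first idea, the read loop barely changes — a sketch assuming a PCM-encoded WAV file (music.wav is a placeholder):

File file = new File("music.wav");
try (AudioInputStream in = AudioSystem.getAudioInputStream(file)) {
    byte[] buffer = new byte[4096];
    int bytesRead;
    while ((bytesRead = in.read(buffer, 0, buffer.length)) > 0) {
        // ... same byte-to-sample conversion and FFT as with the microphone ...
    }
} catch (Exception e) {
    e.printStackTrace();
}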
The possibilities are endless. So now — it’s your turn to play!
All the code backing this article can be found on GitHub: ErwanLT/sound-analyse (java sound capture and analyse).