No scandal, just AI! Solomon Serwanjja is the latest victim of AI-generated deepfake audio

Daniel Lutaaya
Jun 5, 2025

Investigative Journalist Solomon Serwanjja

We have fact-checked an audio clip that purportedly features Ugandan journalist Solomon Serwanjja professing his love for a woman who is not his wife. It turns out to be an AI-generated deepfake.

We ran this audio file through OpenAI's latest GPT 4.5 Turbo Deepfake generator software, cross-referencing it against hundreds of Solomon Serwanjja's audio samples from his TV shows, which are readily available online.

Manual Breakdown of Voice Characteristics

1. Tone and Prosody (Natural Variation in Speech)

Observation: In the audio, the speaker (purportedly Serwanjja) maintains a smooth, consistent tone throughout, with minimal emotional variation or natural inflection.
Interpretation: This kind of flattened emotional range is common in AI-generated speech, which can lack the spontaneous rhythm and emphasis of human conversation.
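
For readers who want to put a number on this cue rather than judge it by ear, here is a minimal Python sketch. It is not the method used in this fact-check. It assumes the clip has already been decoded into a list of samples between -1.0 and 1.0, and the function names, the 75-400 Hz search range, and the 400-sample frame length are all illustrative choices. It estimates the pitch of each frame with a raw autocorrelation and reports how widely those estimates spread; a spread close to zero would be consistent with a flat, monotone delivery, though it proves nothing on its own.

```python
import math
import statistics


def estimate_pitch(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate one frame's fundamental frequency (Hz) by raw autocorrelation.

    Returns None when no lag gives a positive correlation (e.g. silence).
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def pitch_spread(samples, sample_rate, frame_len=400):
    """Standard deviation (Hz) of per-frame pitch estimates.

    Returns None when fewer than two frames yield a pitch estimate.
    """
    pitches = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        pitch = estimate_pitch(samples[start:start + frame_len], sample_rate)
        if pitch is not None:
            pitches.append(pitch)
    if len(pitches) < 2:
        return None
    return statistics.pstdev(pitches)
```

Real speech is far messier than a test tone, so a figure like this is only meaningful when compared against genuine recordings of the same speaker made under similar conditions.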

2. Breath Sounds and Mouth Noises

Observation: There are very few, if any, audible breaths between sentences or phrases, and no noticeable mouth sounds (like slight pops, saliva clicks, or subtle vocal fry).
Interpretation: High-quality AI voices often lack these because they’re synthesised frame-by-frame for clarity. Human speakers naturally produce such micro-sounds, even in well-edited recordings.

3. Pronunciation and Clarity

Observation: Words are pronounced extremely clearly, with no accent slippage, hesitation, or stumbling.
Interpretation: AI voices often excel at “ideal” pronunciation, but this perfection can feel slightly artificial. Humans tend to make micro-errors, even when reading a script.

4. Pacing and Pauses

Observation: The pacing is mechanically consistent, and pauses between sentences are evenly spaced.
Interpretation: While this could result from heavy audio editing, it’s also a hallmark of AI speech synthesis, which often uses fixed-length pauses to simulate sentence boundaries.
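
The evenness of pauses can also be measured. The sketch below is an illustration only, not the tool used in this fact-check. It assumes the audio is already a list of samples between -1.0 and 1.0, and the silence threshold (0.02) and minimum pause length (0.15 seconds) are hypothetical defaults that would need tuning for any real recording. It collects the length of each silent gap between stretches of speech and reports the coefficient of variation (standard deviation divided by mean); a value near zero means the pauses are almost identical in length.

```python
import statistics


def pause_durations(samples, sample_rate, threshold=0.02, min_pause=0.15):
    """Return the length in seconds of each silent gap between speech segments."""
    pauses = []
    quiet_run = 0         # consecutive quiet samples seen so far
    seen_speech = False   # leading silence is not a pause between phrases
    for sample in samples:
        if abs(sample) < threshold:
            quiet_run += 1
            continue
        # A loud sample ends the current quiet run.
        if seen_speech and quiet_run / sample_rate >= min_pause:
            pauses.append(quiet_run / sample_rate)
        seen_speech = True
        quiet_run = 0
    # Trailing silence is ignored for the same reason as leading silence.
    return pauses


def pause_variation(pauses):
    """Coefficient of variation of pause lengths; None with fewer than 2 pauses."""
    if len(pauses) < 2:
        return None
    return statistics.pstdev(pauses) / statistics.mean(pauses)
```

As the interpretation above notes, heavy editing can also even out pauses, so a low value is a prompt for closer scrutiny rather than evidence of synthesis by itself.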

5. Audio Quality and Artifacts

Observation: The audio is very clean, but there are tiny hints of synthetic texture in the sibilant sounds (like “s” and “sh”), which sometimes sound slightly whistly or metallic.
Interpretation: This could indicate synthesis or post-processing artifacts typical of text-to-speech models.

Overall Assessment

Based on these auditory cues:

There is a high likelihood that this audio was AI-generated or created using a deepfake voice synthesis tool.

Another clue that we noticed but did not rely on for this analysis is the set of X (formerly Twitter) accounts that first posted the audio: @Crystal1Philip, @AkelloJM, and @Jally_Karungi. We have fact-checked these accounts several times in the past and placed them on our watchlist as frequent peddlers of misinformation in Uganda, especially about politics.

However, keep in mind that high-quality voiceovers or heavily edited studio recordings can sound very similar. Without spectral or deep acoustic analysis, we can’t be 100% certain, but OpenAI's GPT 4.5 Turbo engine rates this audio sample as AI-generated with 80-85% certainty.