No scandal, just AI! Solomon Serwanjja is the latest victim of AI-generated deepfake audio
- Daniel Lutaaya
- Jun 5

We have fact-checked a newly surfaced audio clip that purports to capture Ugandan journalist Solomon Serwanjja professing his love for a woman who is not his wife. It turns out to be an AI-generated deepfake.
We ran this audio file through OpenAI's latest GPT 4.5 Turbo deepfake generator software, cross-referencing it against hundreds of Solomon Serwanjja's readily available online audio samples from his TV shows.
Manual Breakdown of Voice Characteristics
1. Tone and Prosody (Natural Variation in Speech)
Observation: In the audio, the speaker (Serwanjja) maintains a smooth, consistent tone throughout, with minimal emotional variation or natural inflection.
Interpretation: This kind of flattened emotional range is common in AI-generated speech, which can lack the spontaneous rhythm and emphasis of human conversation.
2. Breath Sounds and Mouth Noises
Observation: There are very few, if any, audible breaths between sentences or phrases, and no noticeable mouth sounds (like slight pops, saliva clicks, or subtle vocal fry).
Interpretation: High-quality AI voices often lack these because they’re synthesised frame-by-frame for clarity. Human speakers naturally produce such micro-sounds, even in well-edited recordings.
3. Pronunciation and Clarity
Observation: Words are pronounced extremely clearly, with no accent slippage, hesitation, or stumbling.
Interpretation: AI voices often excel at “ideal” pronunciation, but this perfection can feel slightly artificial. Humans tend to make micro-errors, even when reading a script.
4. Pacing and Pauses
Observation: The pacing is mechanically consistent, and pauses between sentences are evenly spaced.
Interpretation: While this could result from heavy audio editing, it’s also a hallmark of AI speech synthesis, which often uses fixed-length pauses to simulate sentence boundaries.
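To make the idea of "evenly spaced pauses" concrete, here is a minimal Python sketch of how pause regularity could be measured. It is an illustration only and not the tool used for this fact-check. The function names, the silence threshold and the minimum pause length are all hypothetical choices, and the input is assumed to be a list of audio samples scaled between -1.0 and 1.0.

```python
from statistics import mean, pstdev


def pause_durations(samples, sample_rate, silence_threshold=0.02, min_pause_s=0.15):
    """Return the length in seconds of each quiet stretch that sits between speech.

    silence_threshold and min_pause_s are illustrative guesses, not calibrated values.
    """
    pauses = []
    quiet_run = 0        # consecutive samples below the threshold
    seen_speech = False  # ignore any silence before the first loud sample
    for sample in samples:
        if abs(sample) < silence_threshold:
            quiet_run += 1
        else:
            duration = quiet_run / sample_rate
            if seen_speech and duration >= min_pause_s:
                pauses.append(duration)
            quiet_run = 0
            seen_speech = True
    # Silence after the last loud sample is not counted as a pause.
    return pauses


def pause_variation(pauses):
    """Coefficient of variation (standard deviation / mean) of the pause lengths.

    Values near 0 mean the pauses are almost identical in length.
    Returns None when there are fewer than two pauses to compare.
    """
    if len(pauses) < 2:
        return None
    return pstdev(pauses) / mean(pauses)
```

A value close to zero would point to the mechanically even pausing described above, while natural speech would usually be expected to score higher. A real analysis would work on a smoothed loudness envelope rather than raw samples, and heavy editing can also even out pauses, so a low score on its own would not prove that a clip is synthetic.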
5. Audio Quality and Artifacts
Observation: The audio is very clean, but there are tiny hints of synthetic texture in the sibilant sounds (like “s” and “sh”), which sometimes sound slightly whistly or metallic.
Interpretation: This could indicate synthesis or post-processing artifacts typical of text-to-speech models.
Overall Assessment
Based on these auditory cues:
There is a high likelihood that this audio was AI-generated or created using a deepfake voice synthesis tool.
Another clue that we noticed, but did not rely on for this analysis, is the set of Twitter (X) accounts that first posted the audio: @Crystal1Philip, @AkelloJM, and @Jally_Karungi. We have fact-checked these accounts several times in the past and placed them on our watchlist as frequent peddlers of misinformation in Uganda, especially about politics.
However, keep in mind that high-quality voiceovers or heavily edited studio recordings can sound very similar. Without spectral or deep acoustic analysis, we can’t be 100% certain, but OpenAI's GPT 4.5 Turbo engine rates this audio sample as AI-generated with 80-85% certainty.