Voice and Emotional Analysis of the Profanity in Kim Ji-woong’s Video Call


 Request for Feedback on the Analysis Results and Controversy


We’re sorry for the delay in getting back to you. Many agencies turned down our requests because there were not enough samples, so the wait was longer than expected. The workflow was also disrupted by repeated inquiries from people outside the primary team handling the project, and as a result some agencies declined just before starting the emotional analysis. We’re keeping the agency names private to avoid causing any issues for them.

Main Points:

1. Before sharing the results: we also inquired about pronunciation, linguistic patterns, and the possibility of the voices belonging to the same individual. However, most agencies responded that emotional analysis would be challenging due to insufficient samples or other reasons. Therefore, we decided to focus solely on the issue considered most critical: whether the profanity was externally inserted or not. Please note that all agencies except the “National Forensic Service” and the “Prosecution’s Scientific Investigation Division” are private entities.

2. The essence of this request is to determine whether the profanity heard in the video originated from the phone call screen in the video or from an external source where the call was being recorded. The emotional analysis report is attached in the thread. The analysis concluded that the profanity was uttered by the male participant in the video call, and that the speaker was, with high probability, Kim Ji-woong, one of the parties involved in the video call.


1. It is officially confirmed that the profanity “씨발 (f*ck)” was indeed heard and that the video was not manipulated.

2. While acknowledging the possibility of the profanity being externally inserted, it is stated that Kim Ji-woong may not necessarily be the one who uttered the profanity.

3. The main team’s analysis confirms the profanity was from the video, not added from outside.

Therefore, we demand sincere apologies and responsible feedback from Kim Ji-woong, who engaged in inappropriate behavior during the video fan signing event, and from his agency, Wake One, which handled the controversy with falsehoods and jests to mitigate the situation.

Furthermore, despite clear evidence of who uttered the profanity in the video, Wake One, without offering an apology, continues its activities with casual disregard. We hope Wake One will reflect on whether they are contributing to the establishment of a healthy and mature K-pop culture.

<Spectrogram Analysis>

The image above is a spectrogram showing the volume and frequency range of the sound recorded in the commissioned file. The yellow arrows indicate laughter from a woman, captured on scene while she was filming a video call with a man.

In the areas marked in red, the brighter the color, the louder the volume at that frequency. Comparing the spectrogram of the woman’s laughter during the video call with the man’s shows a significant difference in the brightness of the colors.
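For readers unfamiliar with spectrograms, the relationship between brightness and volume can be sketched in a few lines of Python. This is only an illustration on a synthetic tone, not the analysts’ tooling or the commissioned file: the sample rate, frame size, tone frequency, and the helper names `magnitude_spectrum`, `spectrogram`, `tone`, and `peak` are all assumptions made for the example. Each column of a spectrogram is the magnitude spectrum of a short frame of audio, and a louder sound produces a larger magnitude (a brighter pixel) at the same frequency.

```python
import cmath
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz, chosen only for this illustration


def magnitude_spectrum(frame, sample_rate):
    """Return (frequency_hz, magnitude) pairs for one frame using a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k * sample_rate / n, abs(acc) / n))
    return spectrum


def spectrogram(samples, sample_rate, frame_size=200):
    """Split the signal into non-overlapping frames; each frame is one column."""
    return [
        magnitude_spectrum(samples[i:i + frame_size], sample_rate)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def tone(freq_hz, amplitude, n_samples):
    """Synthetic sine tone standing in for a recorded sound."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
        for t in range(n_samples)
    ]


def peak(column):
    """The (frequency, magnitude) pair with the largest magnitude in a column."""
    return max(column, key=lambda pair: pair[1])


# A loud 400 Hz tone followed by a quiet 400 Hz tone.
signal = tone(400, 1.0, 400) + tone(400, 0.2, 400)
columns = spectrogram(signal, SAMPLE_RATE)

loud_peak = peak(columns[0])    # first column: the loud tone
quiet_peak = peak(columns[-1])  # last column: the quiet tone
print(loud_peak, quiet_peak)
```

Both peaks sit at the same frequency, but the loud tone’s magnitude is several times larger, which is what a spectrogram renders as a brighter color.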

A formant is a peak in the frequency spectrum of human speech, produced by the resonances of the speaker’s vocal tract.

The three images above all represent the frequency of words and sentences spoken by the male participant in the video call, with the mountain-like shapes highlighted in yellow representing the formants in the male participant’s commissioned file.

The first of the three images represents the frequency of the word “thank you,” while the remaining two segments represent the frequencies of the words “시” and “팔,” which sound like profanities.

The formants of the three frequencies above show similarities.

The first formant, located furthest to the left, is around 70Hz, matching all three segments.

The second formant is positioned around 500~700Hz.

The third formant is consistent at 1500Hz across all three segments, while the fourth formant, except for “팔,” is located at 4000Hz for both segments.

The fourth formant forms a small peak around 3000Hz in all three segments, and considering that “팔” also forms a peak, albeit a slight one, around 4000Hz, it is highly likely that the person pronouncing “시” and “팔” is the male participant in the video call.
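The kind of comparison described above, checking whether the peaks of two segments line up, can be sketched as follows. This is a simplified illustration and not the method the analysts used: the helper names `matched_peaks` and `similarity`, the 100Hz tolerance, and all frequency values are hypothetical placeholders, not measurements from the commissioned file. Real speaker comparison relies on far more than peak positions.

```python
def matched_peaks(peaks_a, peaks_b, tolerance_hz=100.0):
    """Pair each peak in peaks_a with the closest unused peak in peaks_b
    that lies within tolerance_hz. Returns a list of (a, b) pairs."""
    remaining = sorted(peaks_b)
    pairs = []
    for freq in sorted(peaks_a):
        candidates = [p for p in remaining if abs(p - freq) <= tolerance_hz]
        if candidates:
            best = min(candidates, key=lambda p: abs(p - freq))
            pairs.append((freq, best))
            remaining.remove(best)
    return pairs


def similarity(peaks_a, peaks_b, tolerance_hz=100.0):
    """Fraction of peaks that found a partner, relative to the longer list."""
    longest = max(len(peaks_a), len(peaks_b))
    if longest == 0:
        return 0.0
    return len(matched_peaks(peaks_a, peaks_b, tolerance_hz)) / longest


# Hypothetical peak positions in Hz (placeholders for illustration only).
reference_segment = [80.0, 600.0, 1500.0, 3000.0]
similar_segment = [75.0, 650.0, 1500.0, 3050.0]
different_segment = [250.0, 1100.0, 2400.0, 3600.0]

same_speaker_score = similarity(reference_segment, similar_segment)
other_speaker_score = similarity(reference_segment, different_segment)
print(same_speaker_score, other_speaker_score)
```

A score near 1 means the peak positions of the two segments largely coincide, while a score near 0 means they do not. The tolerance is a judgment call, and a high score on its own would not prove that two segments come from the same speaker.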

The frequencies above represent the formants of laughter from the woman in the commissioned file during the video call.

Firstly, it can be observed that the form of the formants differs from those of the male participant in the video call.

Furthermore, the five formants from left to right show similar shapes, indicating that the laughter in the commissioned file is from the same individual. Additionally, there is a noticeable difference in the frequencies between the profanity segments and laughter segments.

This further suggests that the profanity did not originate from the location where the video call was recorded.

Auditory Analysis:

When the analyst listened to the segments containing profanity, the audio quality sounded muffled, similar to the speech of the male participant in the video call, while the laughter of the woman participating in the video call sounded clearer than the voice of the man.


The profanity heard in the commissioned file is determined to have originated from within the video call, and based on the formant analysis, it is highly likely that the male participant in the video call uttered it.



-If only he apologized, it would’ve been over.

-He should’ve apologized from the start.

-It’s unbelievable how he cursed at a devoted fan and remains defiant.

-Is it really that difficult to apologize?

-Instead of apologizing, they just blame the fans and act all innocent. Shameless!

-So shameless.

-If it were me, I’d be really embarrassed. But since they have no conscience, they’re still shamelessly continuing their activities, right?

-They’re just making excuses. If only they had apologized early on, things wouldn’t have escalated like this. Now there’s no turning back.

-Just admit it and apologize.

-They’re really talented at twisting things like this.

-The fans are doing better than the agencyㅋㅋㅋ Not even providing proper evidence and just saying “Oh, we know it wasn’t us, it must be external audio.” It’s really ridiculous. 

-He repays the fan who sincerely supported him with curses?

-He should’ve apologized from the beginning… It would’ve been better to just say it was a staff member’s voice and apologize to the fans. 

-Even without a request for analysis, it didn’t seem like external insertion. If only they had apologized right away, it would’ve been over. Seems like the agency really can’t handle their job.

-Apologize and refund, you heartless people!!!

-If only he had apologized, it would’ve been forgotten by now.

-Just apologize already, because of that one person, the group is on the verge of collapse.

-Just say sorry, ugh.

-He still hasn’t apologized? Wow, that’s really something.

-What’s so hard about apologizing when you know you did it?

-They should’ve just lied and said it was a staff member. People would’ve turned a blind eye.