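Implementation of Indonesian Speech Synthesis Using the Text-to-Speech Models Tacotron 2 and HiFi-GAN (Implementasi Sintesis Ucapan Bahasa Indonesia Menggunakan Model Text-to-Speech Tacotron 2 dan HIFI-GAN) - Submit Seminar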

Catherina, Angela (2025) Implementasi Sintesis Ucapan Bahasa Indonesia Menggunakan Model Text-to-Speech Tacotron 2 dan HIFI-GAN - Submit Seminar. Bachelor thesis, Institut Teknologi Kalimantan.

Available for download:
11211013_cover.pdf (150kB)
11211013_statement_of_authenticity.pdf (189kB)
11211013_publishing_agreement.pdf (62kB)
11211013_approval_sheet.pdf (186kB)
11211013_preface.pdf (241kB)
11211013_abstract_id.pdf (264kB)
11211013_bibliography.pdf (334kB)

Restricted to Repository staff only until 4 October 2027:
11211013_abstract_en.pdf (219kB)
11211013_table_of_content.pdf (384kB)
11211013_illustrations.pdf (256kB)
11211013_tables.pdf (258kB)
11211013_chapter_1.pdf (538kB)
11211013_chapter_2.pdf (1MB)
11211013_chapter_3.pdf (521kB)
11211013_chapter_4.pdf (1MB)
11211013_conclusions.pdf (294kB)
11211013_enclosure.pdf (747kB)
11211013_paper.pdf (1MB)
11211013_presentation.pdf (3MB)
11211013_Form.TA-020.pdf (227kB)

Abstract

This study aims to develop a high-quality Text-to-Speech (TTS) model for the Indonesian language, using the Tacotron 2 architecture as the mel-spectrogram synthesizer and HiFi-GAN as the vocoder. The dataset consists of Indonesian-language audiobooks that the researcher compiled and formatted in a structure similar to LJSpeech. Tacotron 2 is trained to convert text into mel spectrograms, and HiFi-GAN generates audio signals from those spectrograms. Training was conducted with the open-source SpeechBrain toolkit, which permits architectural modifications, including adjustments to the attention mechanism. Speech quality was evaluated with the Mean Opinion Score (MOS) method and a cross-similarity matrix, and model performance was further analyzed through attention-weight visualization and loss tracking. The variant with modified content-based attention and no phonemizer achieved the highest overall MOS, 4.09, while the variant with both modified content-based attention and a phonemizer achieved the highest embedding similarity to ground-truth speech (0.916) in the cross-similarity analysis. These findings indicate that architectural modifications can improve model performance and that subjective and objective evaluation methods complement each other in assessing TTS quality. The study demonstrates that Tacotron 2 combined with HiFi-GAN can be implemented effectively for Indonesian speech synthesis with competitive results.
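
The two evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the cross-similarity matrix was computed or which embeddings were used, so cosine similarity between utterance-level embedding vectors is an assumption here, as are the function names and all the numbers, which are made-up toy values and not data from the thesis.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return the Mean Opinion Score and a normal-approximation 95% half-width."""
    score = mean(ratings)
    if len(ratings) > 1:
        half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    else:
        half_width = 0.0
    return score, half_width


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_similarity(synth, ref):
    """Matrix whose entry [i][j] compares synthesized item i with reference item j."""
    return [[cosine(s, r) for r in ref] for s in synth]


# Hypothetical listener ratings on the usual 1-5 opinion scale.
ratings = [4, 5, 4, 3, 5]
mos, half_width = mos_with_ci(ratings)

# Hypothetical 2-dimensional embeddings; real speech embeddings are much larger.
synth_emb = [[1.0, 0.0], [0.6, 0.8]]
ref_emb = [[1.0, 0.0], [0.0, 1.0]]
matrix = cross_similarity(synth_emb, ref_emb)

print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
for row in matrix:
    print([round(x, 3) for x in row])
```

Under this assumption, a single similarity figure such as the one reported in the abstract could correspond to an average over matching synthesized and ground-truth pairs, that is, the diagonal of the matrix. The thesis itself should be consulted for the actual procedure.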

Item Type: Thesis (Bachelor)
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Jurusan Matematika dan Teknologi Informasi > Informatika
Depositing User: Angela Catherina
Date Deposited: 11 Jul 2025 06:05
Last Modified: 11 Jul 2025 06:05
URI: http://repository.itk.ac.id/id/eprint/23605
