Focused Web Crawler for Informatics Journals Using the Shark Search Algorithm with IndoBERT and KNN

Helmi (2025) Focused Web Crawler Untuk Jurnal Informatika Menggunakan Algoritma Shark Search Dengan IndoBERT dan KNN. Bachelor thesis, Institut Teknologi Kalimantan.

Attached files (all restricted as noted until 4 October 2027):

11211043_cover.pdf (305kB): registered users only
11211043_statement_of_aunthenticity.pdf (393kB): registered users only
11211043_publishing_agreement.pdf (440kB): registered users only
11211043_approval_sheet.pdf (399kB): registered users only
11211043_preface.pdf (627kB): registered users only
11211043_abstract_id.pdf (278kB): registered users only
11211043_abstract_en.pdf (276kB): repository staff only
11211043_table_of_content.pdf (297kB): repository staff only
11211043_ilustrations.pdf (251kB): repository staff only
11211043_tables.pdf (227kB): repository staff only
11211043_chapter_1.pdf (383kB): repository staff only
11211043_chapter_2.pdf (981kB): repository staff only
11211043_chapter_3.pdf (306kB): repository staff only
11211043_chapter_4.pdf (1MB): repository staff only
11211043_conclusions.pdf (256kB): repository staff only
11211043_bibliography.pdf (229kB): registered users only
11211043_enclosure.pdf (221kB): repository staff only
11211043_paper.pdf (544kB): repository staff only
11211043_Form. TA-020.pdf (740kB): repository staff only
11211043_presentation.pdf (1MB): repository staff only

Abstract

This study aims to develop a focused crawler for finding informatics journal articles. The system uses an optimized Shark Search algorithm, which searches while choosing which pages to visit next (crawling), together with the IndoBERT language model and the K-Nearest Neighbors (KNN) algorithm. The main focus is to improve the harvest rate and the accuracy with which documents relevant to informatics are recognized. The system is designed to collect documents efficiently from the internet using seed URLs from Google Scholar, and model performance is evaluated with metrics such as the confusion matrix, accuracy, and F1-score. The results show that integrating IndoBERT with KNN achieves an accuracy of up to 94.81% and an F1-score of 94.62%, with the best performance at K = 4. In addition, tests of the Shark Search parameters yield an optimal configuration of δ = 0.4, β = 0.5, and γ = 0.5, which gives a harvest rate of 92.89%. Compared with the BFS algorithm, Shark Search proves superior and more stable, keeping the harvest rate above 90% even as the number of crawled pages increases. These findings confirm that the selective, heuristic-based approach of Shark Search is more effective at identifying relevant informatics journal documents than conventional crawling methods.
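
To illustrate how the three Shark Search parameters named in the abstract interact, the following is a minimal sketch of link scoring in the classic Shark Search formulation, plus the harvest rate metric. It is not the thesis's implementation: the full text is restricted, so the function names, the example similarity values, and the assumption that the "optimized" variant keeps the classic scoring rules are all hypothetical. In the thesis the relevance signal comes from IndoBERT and KNN; here the similarity values are simply passed in as numbers between 0 and 1.

```python
# Sketch of classic Shark Search link scoring (assumed formulation, not the
# thesis's code). Parameter values are the ones reported in the abstract.
DELTA = 0.4  # decay applied to the score inherited from the parent page
BETA = 0.5   # weight of anchor text versus the text surrounding the anchor
GAMMA = 0.5  # weight of the inherited score versus the neighborhood score


def inherited_score(parent_relevance: float, parent_inherited: float,
                    delta: float = DELTA) -> float:
    """Score a link inherits from the page it was found on.

    A relevant parent passes on a decayed copy of its own relevance;
    an irrelevant parent passes on a decayed copy of what it inherited.
    """
    if parent_relevance > 0:
        return delta * parent_relevance
    return delta * parent_inherited


def neighborhood_score(anchor_sim: float, context_sim: float,
                       beta: float = BETA) -> float:
    """Combine anchor-text similarity with anchor-context similarity.

    In the classic formulation, a relevant anchor sets the context score to 1.
    """
    context = 1.0 if anchor_sim > 0 else context_sim
    return beta * anchor_sim + (1 - beta) * context


def potential_score(inherited: float, neighborhood: float,
                    gamma: float = GAMMA) -> float:
    """Priority of an unvisited link in the crawl frontier."""
    return gamma * inherited + (1 - gamma) * neighborhood


def harvest_rate(relevant_pages: int, crawled_pages: int) -> float:
    """Fraction of crawled pages that were judged relevant."""
    if crawled_pages == 0:
        return 0.0
    return relevant_pages / crawled_pages


if __name__ == "__main__":
    # Hypothetical link: relevant parent (0.8), anchor similarity 0.6,
    # context similarity 0.2. These numbers are illustrative only.
    inh = inherited_score(parent_relevance=0.8, parent_inherited=0.0)
    nbh = neighborhood_score(anchor_sim=0.6, context_sim=0.2)
    print(round(potential_score(inh, nbh), 4))
    print(harvest_rate(relevant_pages=9, crawled_pages=10))
```

A crawler built on this sketch would keep unvisited links in a priority queue ordered by `potential_score`, whereas BFS visits them in discovery order regardless of score. That difference is one plausible reading of why the abstract reports a more stable harvest rate for Shark Search.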

Item Type: Thesis (Bachelor)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Jurusan Matematika dan Teknologi Informasi > Informatika
Depositing User: Helmi
Date Deposited: 11 Jul 2025 08:09
Last Modified: 11 Jul 2025 08:09
URI: http://repository.itk.ac.id/id/eprint/23983
