CLASSIFICATION OF NEWS TYPES BY IMPLEMENTING ENHANCED CONFIX STRIPPING STEMMER

News has become a public need worldwide. Managing a large number of news articles is difficult and time-consuming. Indonesia has various media platforms that display news, one of which is the online news portal. An automated system capable of managing and grouping Indonesian-language news articles is therefore needed. This study designed and built a web-based application that classifies types of Indonesian-language news articles by implementing the Enhanced Confix Stripping Stemmer algorithm. The categories used in the system are entertainment, lifestyle, sports, technology, and economy.


Introduction
News has become a public need worldwide. One medium that displays news is the online news portal. News does not refer only to the press; it also appears on radio, television, film, and the internet. At first, news belonged only to newspapers, but it is now carried by radio, television, and the internet as well. Indonesia has many online news portals, such as liputan6, tribunnews, kompas, detik, kompasiana, kapanlagi, and others.
A survey by the Association of Indonesian Internet Service Providers (APJII) and Technologists found that the number of internet users reached 143.26 million in 2017, up from 132.7 million in 2016 [1]. This growth affects how information grows and is exchanged; one effect is that a very large number of news articles are uploaded to the internet within a short span of time. Managing so many news articles is difficult and time-consuming, so an automated system capable of managing and grouping Indonesian-language news articles is needed. Such grouping is expected to simplify news document management and save time.
Text mining is the processing of text by a computer to produce useful analysis [2]. It is generally divided into several stages: preprocessing, feature selection, and stemming [3]. Stemming maps the various forms of a word to their base word. Its purpose is to remove the affixes of each word, in the form of prefixes, suffixes, and confixes (a prefix and suffix attached together). Because Indonesian has its own morphological rules, the stemming process must follow Indonesian morphology [3].
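This section does not spell out what preprocessing involves. As an illustration only, the sketch below assumes three common preprocessing operations (case folding, tokenizing, and stopword filtering); the function name `preprocess` and the tiny stopword set are hypothetical, not taken from the paper.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stopword set; a real system would use a full Indonesian list.
STOPWORDS = {"dan", "yang", "di", "ke", "dari", "ini", "itu"}


def preprocess(text: str) -> List[str]:
    """Assumed preprocessing: case folding, tokenizing, stopword filtering."""
    folded = text.lower()                        # case folding
    tokens = re.findall(r"[a-z]+", folded)       # tokenizing: keep runs of ASCII letters only
    return [t for t in tokens if t not in STOPWORDS]  # drop stopwords


if __name__ == "__main__":
    print(preprocess("Timnas Indonesia menang di Jakarta, dan penonton bersorak!"))
```

Running it prints `['timnas', 'indonesia', 'menang', 'jakarta', 'penonton', 'bersorak']`: punctuation disappears during tokenizing, and "di" and "dan" are filtered as stopwords. The resulting tokens are what a stemmer such as ECS would then reduce to base words.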
Stemming has several methods; the one used in this study is the Enhanced Confix Stripping Stemmer (ECS). ECS is an Indonesian stemming method introduced by Agus Zainal Arifin, I Putu Kerta Mahendra, and Henning Titi Ciptaningtyas in 2014. It is a development of the Confix Stripping Stemmer (CS) method introduced by Jelita Asian in 2007 [4]. ECS improves on CS in two ways:
1) It modifies some of the rules in Tables 2 and 3 so that words with the constructions "mem + p...", "men + s...", "menge + ...", "penge + ...", and "peng + k..." can be stemmed. These modifications are listed in Table 6.
2) It adds a stemming step to resolve problems caused by suffix removal. This additional step is called LoopPengembalianAkhiran (Indonesian for "suffix-return loop"), and it is run when recoding (a step of the Confix Stripping Stemmer) fails [4].
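Because the extra ECS step exists to repair over-eager suffix removal, a minimal sketch of that removal may help show where the problem comes from. It is a simplification under stated assumptions, not the authors' code: it checks particles ("-lah", "-kah", "-tah", "-pun"), then possessive pronouns ("-ku", "-mu", "-nya"), then derivational suffixes ("-kan", "-an", "-i"), removes at most one of each, and omits the length and prefix-compatibility checks of the full Confix Stripping Stemmer. The function name `strip_suffixes` is hypothetical.

```python
from typing import Tuple

# Assumed suffix groups, checked in this order (particle, possessive, derivational).
PARTICLES = ("lah", "kah", "tah", "pun")
POSSESSIVES = ("ku", "mu", "nya")
DERIVATIONAL = ("kan", "an", "i")  # "kan" before "an" so the longer match wins


def strip_suffixes(word: str) -> Tuple[str, str, str, str]:
    """Return (base, ds, pp, p): the word without suffixes plus what was removed.

    Simplified sketch: no minimum-length or prefix/suffix-pair constraints.
    """
    p = pp = ds = ""
    for suffix in PARTICLES:
        if word.endswith(suffix):
            word, p = word[: -len(suffix)], suffix
            break
    for suffix in POSSESSIVES:
        if word.endswith(suffix):
            word, pp = word[: -len(suffix)], suffix
            break
    for suffix in DERIVATIONAL:
        if word.endswith(suffix):
            word, ds = word[: -len(suffix)], suffix
            break
    return word, ds, pp, p


if __name__ == "__main__":
    print(strip_suffixes("dimakan"))  # the "-kan" ending is removed, leaving "dima"
```

On "dimakan" ("eaten", root "makan") the sketch returns `('dima', 'kan', '', '')`: "-kan" is stripped even though "k" belongs to the root, so no prefix removal can reach a dictionary word. This is the kind of failure LoopPengembalianAkhiran is designed to repair.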

Enhanced Confix Stripping (ECS) Stemmer
In each LoopPengembalianAkhiran step, a dictionary lookup is performed to check the current word. The process is defined as follows:
1) Revert the word to its pre-recoding form and restore all prefixes that were removed in the last process, producing a word with the following model:

[DP + [DP + [DP]]] + Term
Here DP denotes a derivational prefix and Term the remaining stem candidate. Prefix removal is then attempted on this form. If the dictionary lookup succeeds, the process stops; otherwise, the next step is executed.
2) Restore the suffixes that were removed earlier, starting with the derivational suffix (DS: "-i", "-kan", "-an") if present, followed by the possessive pronoun (PP: "-ku", "-mu", "-nya"), and finally the particle (P: "-lah", "-kah", "-tah", "-pun"). After each restoration, steps 3 to 5 are attempted. The DS "-kan" is a special case: the character "k" is restored first and steps 3 to 5 are executed; if this still fails, "an" is restored.
3) Remove prefixes according to the rules defined in Tables 2, 3, 4, and 5 (with the modifications in Table 6).
4) Perform recoding.
5) If the dictionary lookup fails, revert the word to its pre-recoding form and restore all removed prefixes. Then restore the next suffix in the order given in step 2, and perform steps 3 to 5 on the current word [4].
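The control flow of the suffix-return loop can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the names `loop_pengembalian_akhiran`, `suffix_restorations`, and `toy_remove_prefixes` are hypothetical, the word arrives already split into a base (prefixes restored, suffixes removed) and the removed DS, PP, and P strings, and steps 3 and 4 (the rule tables and recoding) are replaced by a toy prefix remover that strips only "di-", "ke-", or "se-".

```python
from typing import Callable, List, Optional, Set


def toy_remove_prefixes(word: str) -> str:
    """Hypothetical stand-in for steps 3-4: strips one of "di-", "ke-", "se-"
    if at least three characters remain. Real ECS applies Tables 2-6 and recoding."""
    for prefix in ("di", "ke", "se"):
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            return word[len(prefix):]
    return word


def suffix_restorations(ds: str, pp: str, p: str) -> List[str]:
    """Suffix strings to re-attach, in the order DS, then PP, then P (step 2)."""
    restorations: List[str] = []
    if ds == "kan":
        # Special case: restore "k" first, then the full "kan" ("k" + "an").
        restorations += ["k", "kan"]
    elif ds:
        restorations.append(ds)
    if pp:
        restorations.append(ds + pp)
    if p:
        restorations.append(ds + pp + p)
    return restorations


def loop_pengembalian_akhiran(
    base: str,
    ds: str,
    pp: str,
    p: str,
    remove_prefixes: Callable[[str], str],
    dictionary: Set[str],
) -> Optional[str]:
    """Return the first dictionary root found, or None if every attempt fails.

    `base` is the [DP + [DP + [DP]]] + Term form from step 1.
    """
    # Step 1: try prefix removal on the word without any suffixes.
    stem = remove_prefixes(base)
    if stem in dictionary:
        return stem
    # Steps 2-5: every attempt restarts from `base` (prefixes restored),
    # re-attaches the next suffix group, and retries prefix removal.
    for suffix in suffix_restorations(ds, pp, p):
        stem = remove_prefixes(base + suffix)
        if stem in dictionary:
            return stem
    return None


if __name__ == "__main__":
    print(loop_pengembalian_akhiran("dima", "kan", "", "", toy_remove_prefixes, {"makan"}))
```

With the dictionary `{"makan"}`, the loop tries "dima", then "dimak" (giving "mak"), then "dimakan" (giving "makan") and prints `makan`. Restarting each attempt from `base` mirrors step 5, which returns the word to its pre-recoding form before the next suffix is restored.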

Rapid Application Development (RAD)
The method used to design and build the system that implements the Enhanced Confix Stripping Stemmer (ECS) for news-type classification is Rapid Application Development (RAD). RAD is a system development strategy that emphasizes development speed through rapid, iterative, and incremental user involvement, producing a series of prototypes that can evolve into the final system or a particular version of it [5].
The RAD stages are as follows:
The test data are secondary data in the form of news articles from the news portals kompas.com and cnnindonesia.com. A total of 30 news articles are used. The data were collected by quoting news articles already published on these portals. The news categories used are economy, technology, entertainment, lifestyle, and sports.

Results and Discussions
Building on previous studies of news-type classification, the system requires users only to upload a news article, which the system then classifies automatically.
The scope of the system is as follows:
1) The news is divided into five categories: entertainment, lifestyle, sports, technology, and economy.
2) The news data are obtained from online news portals.
3) Only Indonesian-language news is used in this study.
4) The tagging stage of text mining is not carried out, because the system does not handle English-language texts.
5) The system is not integrated with existing online news portals; instead, it is a standalone web-based application with its own homepage, running on an offline network.
The classification results are tested to measure the accuracy of the system's implementation of the Enhanced Confix Stripping Stemmer algorithm for news-type classification. The tests are run on the test data, and the resulting classes are shown in Table 7. Each test compares the category that a test article carries on its source news portal with the category assigned by the system. Accuracy is calculated for each of 5 iterations, and the average accuracy is then computed. The accuracy calculation for the Enhanced Confix Stripping Stemmer implementation is shown in Table 8.
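The per-iteration and average accuracy described above can be sketched as below. The section does not state the formula, so accuracy is assumed here to be the percentage of test articles whose system category matches the portal category; the function names are hypothetical, and the labels and numbers in the usage lines are invented for illustration, not taken from Tables 7 and 8.

```python
from statistics import mean
from typing import Sequence


def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Assumed metric: percentage of articles whose system category matches the portal's."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("expected two non-empty label sequences of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


def average_accuracy(per_iteration: Sequence[float]) -> float:
    """Mean of the per-iteration accuracies (the study averages 5 iterations)."""
    return mean(per_iteration)


if __name__ == "__main__":
    # Invented labels: portal category vs. system category for four articles.
    portal = ["sports", "economy", "technology", "lifestyle"]
    system = ["sports", "economy", "lifestyle", "lifestyle"]
    print(accuracy(portal, system))                # 3 of 4 match
    print(average_accuracy([80.0, 100.0, 90.0]))   # invented per-iteration values
```

The usage lines print `75.0` and `90.0`. Averaging per-iteration percentages, rather than pooling all articles, gives each iteration equal weight; the two coincide when every iteration uses the same number of articles.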
The Enhanced Confix Stripping Stemmer was developed from the Confix Stripping Stemmer by Agus Zainal Arifin, I Putu Kerta Mahendra, and Henning Titi Ciptaningtyas in 2014. Based on the failure cases of the Confix Stripping Stemmer, Arifin, Mahendra, and Ciptaningtyas set out to improve it and presented a modified version called the Enhanced Confix Stripping Stemmer. The improvements consist of modified prefix-removal rules (listed in Table 6) and an additional LoopPengembalianAkhiran step that restores wrongly removed suffixes.

Table 1: The flow of prefix removal for the "me-" prefix

Table 4: The flow of prefix removal for the "te-" prefix

[Medyawati et al., Vol. 6 (Iss. 5): May 2019] ISSN: 2454-1907 DOI: 10.5281/zenodo.3232918 Http://www.ijetmr.com ©International Journal of Engineering Technologies and Management Research [139]
Previous studies report that the Enhanced Confix Stripping Stemmer algorithm achieves the highest accuracy among stemming algorithms such as Porter Stemmer, Nazief-Adriani, and Confix Stripping Stemmer. The system built in this study implements the Enhanced Confix Stripping Stemmer algorithm as a web-based application that classifies types of news articles, and it is expected to achieve high accuracy in classifying news types.

Table 7: Classification Testing Table