Sentiment analysis for COVID-19 in Indonesia on Twitter with TF-IDF feature extraction and stochastic gradient descent

Document Type: Research Paper


Department of Master in Informatic Engineering, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, Medan, Indonesia


Twitter is an information platform open to any internet user, but the opinions its users post are unstructured and unclassified. Classifying their sentiment requires a classification algorithm, one of which is Stochastic Gradient Descent (SGD). The more training data the machine is given, the more accurate the resulting classification model becomes. However, representing text as numerical vectors produces high-dimensional data because of the large number of features. Feature optimization is therefore needed to reduce the dimensionality of the training data while maintaining high model accuracy. The feature optimization used here is TF-IDF (term frequency-inverse document frequency) feature extraction. Sentiment analysis using TF-IDF feature extraction and the stochastic gradient descent algorithm classifies Indonesian text into positive and negative sentiment appropriately, achieving a classification accuracy of 85.141%.
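The pipeline described in the abstract, TF-IDF weighting followed by a classifier trained with stochastic gradient descent, can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything specific here is an assumption: the whitespace tokenizer, the smoothed IDF formula, the logistic loss, the learning rate and epoch count, and the four toy Indonesian sentences, which are invented for illustration and are not the paper's data.

```python
import math
import random
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split; real tweets need more cleaning.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency: idf(t) = ln((1 + N) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    # Sparse TF-IDF vector as a dict, L2-normalised; unseen terms are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_sgd(vectors, labels, epochs=200, lr=0.5, seed=0):
    # Assumption: logistic loss, updated one example at a time (stochastic updates).
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            z = bias + sum(weights.get(t, 0.0) * v for t, v in x.items())
            error = sigmoid(z) - y  # gradient of the log loss with respect to z
            for t, v in x.items():
                weights[t] = weights.get(t, 0.0) - lr * error * v
            bias -= lr * error
    return weights, bias

def predict(doc, idf, weights, bias):
    # Returns 1 (positive) when the linear score is non-negative, else 0 (negative).
    x = tfidf(doc, idf)
    z = bias + sum(weights.get(t, 0.0) * v for t, v in x.items())
    return 1 if z >= 0 else 0

# Toy examples invented for illustration only (1 = positive, 0 = negative).
train_docs = [
    "vaksin covid aman dan bagus",
    "senang pasien covid sembuh",
    "covid buruk dan menakutkan",
    "sedih kasus covid naik",
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_docs)
vectors = [tfidf(d, idf) for d in train_docs]
weights, bias = train_sgd(vectors, train_labels)
predictions = [predict(d, idf, weights, bias) for d in train_docs]
```

In this sketch a term that appears in every document, such as "covid", receives the lowest IDF weight, so the terms that distinguish positive from negative tweets dominate each vector. Libraries such as scikit-learn provide equivalent building blocks (`TfidfVectorizer`, `SGDClassifier`); the abstract does not say which tools the authors used.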