Application of K-Means Clustering Method for City Grouping on Food Plant Productivity in North Sumatera

Received October 23, 2019; Revised November 14, 2019; Accepted December 15, 2019

The population grows every year, so food needs also increase. These needs can be met by increasing food crop productivity so that food availability is sufficient. Food crops consist of rice, corn, green beans, peanuts, cassava, and sweet potatoes. Productivity in each region has different characteristics, so the regions need to be grouped so that solutions can be implemented according to the characteristics of each region. The purpose of this study is to group the districts/cities in North Sumatera Province based on food crop productivity using the k-means clustering method. K-means clustering is a non-hierarchical method that partitions the existing data into one or more clusters (groups), so that data with the same characteristics are grouped into the same cluster and data with different characteristics are grouped into other clusters. The result of this study is the formation of 3 district/city clusters: cluster 1 with 1 regency/city, cluster 2 with 7 districts/cities, and cluster 3 with 25 districts/cities.


INTRODUCTION
As the population of Indonesia grows, Indonesia's food needs also increase. Food security in Indonesia rests on food availability, food utilization, and food access, which must be sufficient for all regions in Indonesia. North Sumatra Province has the largest population on the island of Sumatra and the fourth largest in Indonesia. Achieving food productivity goals is certainly not easy: each region faces constraints from important factors that influence the achievement of productivity goals. Productivity is derived from the harvested area and the crop production, that is, production per unit of harvested area. If this situation continues, it will threaten national food availability as the population keeps increasing every year. The government must therefore increase the productivity of food crops in each region so that food availability is sufficient. The aim of this study is to group regencies/cities based on food crop productivity using k-means clustering.

RESEARCH METHOD
Descriptive Analysis
Descriptive analysis aims to describe the data for each variable. The results are summarized by the mean, maximum value, minimum value, median value, and standard deviation (Ghozali, 2009). The mean and standard deviation are computed with equations (1) and (2).
Data Standardization
Data standardization is carried out when the variables being analysed differ greatly in their units. The original data are then transformed before further analysis: each relevant variable is converted into Z-scores (Supranto, 2004).
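For reference, equations (1) to (3) used later can be written in their conventional form. Here $x_{ij}$ denotes the productivity of crop $j$ in district/city $i$ and $n$ is the number of districts/cities (33 in this study, given the cluster sizes 1 + 7 + 25); these symbols are introduced for illustration. The $n - 1$ denominator in (2) is an assumption, chosen because it matches the sample standard deviation that R's `sd()` computes.

```latex
\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij} \tag{1}

s_j = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( x_{ij} - \bar{x}_j \right)^2} \tag{2}

z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j} \tag{3}
```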
Cluster Analysis
Cluster analysis is a multivariate technique whose aim is to classify objects or cases (respondents) into relatively homogeneous groups, commonly called clusters. The objects or cases in each group tend to be similar to each other and to differ from the objects in other clusters. Cluster analysis is also called numerical classification or taxonomy (Supranto, 2004).

Hierarchical Method
The hierarchical method first groups the two or more objects/data that are most similar, and the process then continues to the other objects/data with the next-closest similarity (Rencher, 2002).
According to Machfudhoh (2013), the agglomerative approach within the hierarchical clustering method is divided into several methods, namely: 1.
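The merge-the-closest-pair idea behind the hierarchical method can be sketched as below. Single linkage (the distance between two clusters is the distance between their two closest members) is used purely as an assumed example of an agglomerative method; the function names and points are hypothetical, and this is not the procedure used in this study, which relies on k-means.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, n_clusters):
    """Bottom-up hierarchical clustering with single linkage (assumed choice).

    Every object starts as its own cluster; the two closest clusters are
    merged repeatedly until n_clusters remain.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical 2-D points (indices 0..4)
pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(agglomerative(pts, 3))
```

On these made-up points the two tight pairs merge first, leaving the groups `[0, 1]`, `[2, 3]`, and `[4]`.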

Non-Hierarchical Method
The non-hierarchical method used here is the k-means method. This method begins by determining the desired number of clusters or groups (for example, two or three clusters). Once the number of clusters is known, each object of observation is assigned to one of the clusters.

K-Means Clustering
The steps in the k-means clustering method are as follows:
1. Determine the number of clusters, k (Madhulata, 2012).
2. Randomly choose k initial centroid values (cluster center points).
3. Calculate the distance from each object/data point to the center of each cluster using the Euclidean distance formula (Nugroho, 2008). The advantage of this distance is that the distance between two objects/data points is not disturbed by the addition of new objects that are outliers. However, the distance can be affected by differences in scale between the dimensions over which it is calculated (Dibya Jyoti Bora, 2014), which is why the variables are converted to z-scores before clustering.
4. Allocate each object/data point to the cluster whose center is at the minimum distance.
5. Recompute the position of each new cluster center from its current members.
6. If the cluster center points no longer change, the clustering process is complete; if some data still move between clusters, the steps are repeated from step 3.
The data were obtained from the results of the North Sumatra Agricultural Census of the North Sumatra Central Statistics Agency (BPS) in 2018. Seven variables are used: rice, corn, soybeans, green beans, peanuts, sweet potatoes, and cassava.
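The six k-means steps listed above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the R code used in the study: the function names and toy points are hypothetical, distances are Euclidean as in step 3, and the optional `init_idx` parameter (an assumption) lets specific objects serve as initial centers, as the study does with Sibolga, Gunung Sitoli, and Toba Samosir.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors (step 3)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, init_idx=None, seed=0, max_iter=100):
    """Plain k-means following steps 1-6; returns (labels, centroids).

    points   : list of equal-length numeric tuples (e.g. z-scores per region)
    k        : number of clusters (step 1)
    init_idx : optional list of k object indices used as initial centers;
               if omitted, k objects are drawn at random (step 2)
    """
    if init_idx is None:
        init_idx = random.Random(seed).sample(range(len(points)), k)
    centroids = [list(points[i]) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Steps 3-4: distance to every center, assign to the nearest one
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Step 6: stop once no object changes cluster
        if new_labels == labels:
            break
        labels = new_labels
        # Step 5: move each center to the mean of its current members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: three well-separated pairs of points
pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
labels, centers = kmeans(pts, 3, init_idx=[0, 2, 4])
print(labels, centers)
```

With one initial center placed in each pair, the pairs end up in separate clusters. With random initialization (`init_idx=None`), k-means can settle in a poorer local optimum, so results may depend on the seed.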

RESULT AND ANALYSIS
Descriptive Analysis
Before carrying out k-means clustering, the mean and standard deviation of each variable are calculated using equations 1 and 2, with the following results:
Data Standardization
Once the mean and standard deviation are known, the data are standardized, that is, converted into z-scores. The standardization formula is equation (3):
$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}$$
where $x_{ij}$ is the value of variable $j$ for district/city $i$, and $\bar{x}_j$ and $s_j$ are the mean and standard deviation of variable $j$. The following is the standardization of the data for Nias Regency:
The same calculation is repeated for every district/city up to Gunung Sitoli City.
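The descriptive statistics and the z-score standardization can be sketched with Python's standard `statistics` module. The function names and the two-variable, three-region data are hypothetical; `statistics.stdev` uses the n - 1 denominator, which is assumed to match the study's R computation.

```python
import statistics


def describe(values):
    """Mean, minimum, maximum, median and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1), like R's sd()
    }


def standardize(columns):
    """Convert each variable to z-scores: (x - mean) / sd.

    Assumes every variable has a nonzero standard deviation.
    """
    out = {}
    for name, values in columns.items():
        m = statistics.mean(values)
        s = statistics.stdev(values)
        out[name] = [(v - m) / s for v in values]
    return out


# Hypothetical productivity values for three regions (not the BPS data)
data = {"rice": [50.0, 52.0, 54.0], "corn": [60.0, 40.0, 50.0]}
print(describe(data["rice"]))
print(standardize(data))
```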
K-means Clustering
1. Determine the number of clusters (k)
According to Edmira Rivani (2010), the number of clusters can be determined by the researchers themselves. In this study the analysis was carried out with the R program, and the number of clusters was set to 3.
Three clusters were formed at random, so the three initial cluster centers of the k-means method were taken from three observation objects: Sibolga City as the first cluster center, Gunung Sitoli City as the second, and Toba Samosir Regency as the third. The results can be seen in the following table: The same calculation is carried out for every district/city against the third cluster center, and the results are:
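The first-iteration distance calculation against the three initial centers can be sketched as follows. The z-scores and the use of only two variables (rice, corn) are hypothetical simplifications; the actual analysis uses all seven standardized variables. `math.dist` (Python 3.8+) computes the Euclidean distance, and the helper name `distance_table` is illustrative.

```python
import math

# Hypothetical z-scores for two of the seven variables (rice, corn);
# these numbers are illustrative only, NOT the BPS 2018 data.
z_scores = {
    "Nias": (0.4, -0.2),
    "Sibolga": (-2.5, -1.8),
    "Gunung Sitoli": (0.1, 0.9),
    "Toba Samosir": (0.8, 0.3),
}

# Initial centers named in the text: clusters 1, 2 and 3 respectively.
initial_centers = ["Sibolga", "Gunung Sitoli", "Toba Samosir"]


def distance_table(data, centers):
    """Map each region to (distances to each center, nearest cluster number)."""
    table = {}
    for name, x in data.items():
        dists = [math.dist(x, data[c]) for c in centers]
        nearest = min(range(len(centers)), key=lambda i: dists[i])
        table[name] = (dists, nearest + 1)  # clusters numbered from 1
    return table


for region, (dists, cluster) in distance_table(z_scores, initial_centers).items():
    print(region, [round(d, 2) for d in dists], "-> cluster", cluster)
```

With these made-up values, Nias is closest to the Toba Samosir center and therefore joins cluster 3 in the first iteration.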

Average of Each Variable per Cluster
Next, the average of each variable within each cluster is calculated using the following equation:
$$\bar{x}_{kj} = \frac{1}{n_k} \sum_{i \in C_k} x_{ij}$$
where $C_k$ is the set of districts/cities in cluster $k$, $n_k$ is its number of members, and $x_{ij}$ is the value of variable $j$ for district/city $i$. Table 5 shows the 3 clusters that were formed. For each variable, the cluster averages range from smallest to largest, so each cluster can be described as a low, medium, or high area for that variable. The first cluster, which has only one member, has the lowest average productivity of all the clusters.
The second cluster consists of 7 members, with high averages for soybean and cassava productivity, medium averages for rice, peanut, and sweet potato productivity, and the lowest average for green bean productivity. The third cluster consists of 25 members, with high averages for rice, corn, peanut, green bean, and sweet potato productivity and medium averages for soybeans and cassava.
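The cluster-average equation and the low/medium/high reading of the cluster averages can be sketched as below. The rows, labels, and function names are hypothetical, and ranking the three cluster averages of each variable (smallest = low, largest = high) is an assumed interpretation of how the levels were assigned.

```python
def cluster_means(rows, labels, k):
    """Average of each variable over the members of each cluster.

    Assumes every cluster has at least one member.
    """
    means = []
    for c in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        means.append([sum(col) / len(members) for col in zip(*members)])
    return means


def level_labels(means, variables):
    """Rank the cluster averages of each variable; smallest -> 'low'.

    Assumes exactly three clusters, as in this study.
    """
    levels = ("low", "medium", "high")
    result = {}
    for j, var in enumerate(variables):
        order = sorted(range(len(means)), key=lambda c: means[c][j])
        for rank, c in enumerate(order):
            result[(c + 1, var)] = levels[rank]  # clusters numbered from 1
    return result


# Hypothetical productivity rows (rice, soybeans) and cluster labels 0..2
rows = [[1, 5], [2, 6], [10, 1], [11, 2], [5, 9]]
labels = [0, 0, 1, 1, 2]
means = cluster_means(rows, labels, 3)
print(means)
print(level_labels(means, ["rice", "soybeans"]))
```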

CONCLUSION
From the results and discussion of the food crop productivity data, it can be concluded that the districts/cities form three clusters, each with its own characteristics.
Cluster 1 consists only of Sibolga City, which has the lowest average food crop productivity. The second cluster consists of Central Tapanuli Regency, Labuhan Batu Regency, Labuhan Batu Utara Regency, West Nias Regency, Tanjung Balai City, Pematang Siantar City, and Gunung Sitoli City. It has high average productivity for soybeans and cassava, medium average productivity for rice, peanuts, and sweet potatoes, and the lowest average productivity for green beans.
These results show that the first cluster has only one member, Sibolga City, which has the lowest food crop productivity. The government should therefore continue to increase food crop productivity so that food availability is sufficient for every region in North Sumatra.