The Essentials of KNN Algorithm: Understanding with a Concrete Example
KNN with an example
K-Nearest Neighbors theory
K-Nearest Neighbors (KNN) is one of the simplest machine learning algorithms. KNN is an instance-based (lazy) supervised algorithm: rather than learning an explicit model, it stores the training data and compares new points against it. In this post, we will study what KNN is with a simple example. Supervised algorithms are classified as regression and classification.
The KNN algorithm can be used for both regression and classification. It is one of the simplest machine learning techniques: the algorithm classifies test data based on its similarity to the training data.
Suppose there are two categories, Class A and Class B. When KNN is applied to the data, a new data point is assigned to Class B if it is more similar (closer) to the Class B points than to the Class A points. The classification is based entirely on the similarities and dissimilarities between the objects.
In the above example, the new data point is closest to Class B, so it is classified as Class B. The similarity is quantified by a distance measure, most commonly Euclidean distance. Once the distance measure is fixed, the next step is to decide how many neighbors to compare against. This is the value K, and the prediction is made from these K nearest neighbors, which is where the name K-Nearest Neighbors comes from. K has to be chosen by the programmer: if K is too small, the model becomes sensitive to noise and the chance of error increases, while a very large K blurs the boundaries between classes. Common starting values are 3 and 5, but the optimum depends on the data and is usually found by trying several values, for example with cross-validation. For two-class problems, prefer odd values of K so that a vote between the neighbors cannot end in a tie.
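To make this concrete, here is a minimal from-scratch sketch of the idea (the function knn_predict and its interface are my own, not from any library): compute the Euclidean distance from the new point to every training point, then take a majority vote among the K closest labels.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

In practice you would use a tested implementation such as scikit-learn's KNeighborsClassifier, which is what the example below does.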
KNN example
In this example, I have used the classic Iris dataset.
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
dataset = pd.read_csv('../input/iris/Iris.csv')
Summarize the dataset
dataset.shape
Output - (150, 6)
dataset.head(5)
dataset.describe()
dataset.groupby('Species').size()
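The Iris dataset is balanced, with exactly 50 rows per species, so the expected output is:
Output -
Species
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
dtype: int64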
Dividing Data into features and labels
# Only the two sepal measurements are used in this example
feature_columns = ['SepalLengthCm', 'SepalWidthCm']
X = dataset[feature_columns].values
y = dataset['Species'].values
Label encoder
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
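It can be useful to check which integer each species received. LabelEncoder assigns the integers in alphabetical order of the class names:

# Iris-setosa -> 0, Iris-versicolor -> 1, Iris-virginica -> 2
print(le.classes_)
Output - ['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']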
Splitting dataset into training and testing
from sklearn.model_selection import train_test_split
# Hold out 20% of the rows (30 of the 150) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Render plots right below the notebook cell
%matplotlib inline
from pandas.plotting import parallel_coordinates
plt.figure(figsize=(15,10))
parallel_coordinates(dataset.drop("Id", axis=1), "Species")
plt.title('Parallel Coordinates Plot', fontsize=20, fontweight='bold')
plt.xlabel('Features', fontsize=15)
plt.ylabel('Features values', fontsize=15)
plt.legend(loc=1, prop={'size': 15}, frameon=True,shadow=True, facecolor="white", edgecolor="black")
plt.show()
Making predictions
# Fitting classifier to the Training set
# Loading libraries
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.model_selection import cross_val_score
# Instantiate learning model (k = 3)
classifier = KNeighborsClassifier(n_neighbors=3)
# Fitting the model
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
Evaluating predictions
# Calculating model accuracy
accuracy = accuracy_score(y_test, y_pred) * 100
print('Accuracy of our model is ' + str(round(accuracy, 2)) + ' %.')
Output - Accuracy of our model is 73.33 %.
The accuracy is modest because only the two sepal features were used; the petal measurements separate the three species much more cleanly.
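The confusion_matrix and cross_val_score imports above were never used. As a sketch of how they could round out the evaluation (the choice of 10 folds and the range of K values here are mine), the confusion matrix shows where the classifier goes wrong, and cross-validation offers a principled way to pick K:

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))

# Try odd values of K and keep the one with the best cross-validated accuracy
k_scores = {}
for k in range(1, 21, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    k_scores[k] = cross_val_score(knn, X_train, y_train, cv=10, scoring='accuracy').mean()
print('Best K:', max(k_scores, key=k_scores.get))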
# Refit on the full dataset (default n_neighbors=5) and classify a new point;
# the input is [sepal length, sepal width] in cm, and the values are purely illustrative
model = KNeighborsClassifier().fit(X, y)
pred = model.predict([[1, 2]])
pred
Output - array([0])
Conclusion
The predicted output is 0, which corresponds to the first class, 'Iris-setosa' (recall that the label encoder numbered the species alphabetically).
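Rather than decoding the number by hand, the fitted LabelEncoder can map it back to the species name:

# Map the numeric prediction back to the species name
print(le.inverse_transform(pred))
Output - ['Iris-setosa']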
Thank you for reading.
Have a nice day!