In machine learning, and specifically in statistical classification, a confusion matrix (also known as an error matrix) is a table layout that visualizes the performance of an algorithm, typically a supervised learning one.
It is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known.
The four basic terms used to compute the metrics:
- True positives (TP): the actual value is positive and the prediction is also positive.
- True negatives (TN): the actual value is negative and the prediction is also negative.
- False positives (FP): the actual value is negative but the prediction is positive. Also known as a Type I error.
- False negatives (FN): the actual value is positive but the prediction is negative. Also known as a Type II error.
Consider an example.
We have 16 animals, each either a Tiger or a Lion, and our model predicts whether each one is a Tiger or not. Tiger is treated as the positive class.
Actual values = [‘Lion’, ‘Tiger’, ‘Lion’, ‘Tiger’, ‘Lion’, ‘Lion’, ‘Tiger’, ‘Lion’, ‘Tiger’, ‘Lion’, ‘Lion’, ‘Lion’, ‘Lion’, ‘Tiger’, ‘Lion’, ‘Tiger’]
Predicted values = [‘Lion’, ‘Tiger’, ‘Lion’, ‘Tiger’, ‘Lion’, ‘Lion’, ‘Tiger’, ‘Lion’, ‘Tiger’, ‘Lion’, ‘Lion’, ‘Tiger’, ‘Lion’, ‘Tiger’, ‘Lion’, ‘Tiger’]
Comparing the two lists position by position, they differ only at the 12th animal (an actual Lion predicted as a Tiger), which gives:
- True Positive (TP) = 6 (predicted that an animal is a Tiger and it actually is.)
- True Negative (TN) = 9 (predicted that an animal is not a Tiger and it actually is not (it’s a Lion).)
- False Positive (Type I error) (FP) = 1 (predicted that an animal is a Tiger but it actually is not (it’s a Lion).)
- False Negative (Type II error) (FN) = 0 (predicted that an animal is not a Tiger but it actually is.)
Accuracy: the number of correct predictions (true positives plus true negatives) out of the total number of predictions.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
So here, Accuracy = (6 + 9) / (6 + 9 + 1 + 0) = 15/16 = 0.9375
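As a quick check, the four counts can be tallied directly from the two lists. The sketch below is a minimal pure-Python illustration, assuming Tiger is the positive class; the helper names `confusion_counts` and `accuracy` are hypothetical and not taken from any library.

```python
from collections import Counter

# The actual and predicted labels from the example above.
actual = ["Lion", "Tiger", "Lion", "Tiger", "Lion", "Lion", "Tiger", "Lion",
          "Tiger", "Lion", "Lion", "Lion", "Lion", "Tiger", "Lion", "Tiger"]
predicted = ["Lion", "Tiger", "Lion", "Tiger", "Lion", "Lion", "Tiger", "Lion",
             "Tiger", "Lion", "Lion", "Tiger", "Lion", "Tiger", "Lion", "Tiger"]


def confusion_counts(actual, predicted, positive="Tiger"):
    """Tally TP, TN, FP, FN for a binary problem with one positive label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the actual label is also positive.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the actual label is positive.
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}


def accuracy(c):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = c["TP"] + c["TN"] + c["FP"] + c["FN"]
    return (c["TP"] + c["TN"]) / total


counts = confusion_counts(actual, predicted)
print(counts, accuracy(counts))
```

Tallying from the raw lists rather than by eye makes it easy to re-check the counts whenever the labels change.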
Cybercrime, or computer crime, is a crime that involves a computer and a network. The computer may have been used in the commission of a crime, or it may be the target. Cybercrime may harm someone’s security and financial health.
Most, but not all, cybercrime is committed by cybercriminals or hackers seeking financial gain. It is carried out by both individuals and organizations.
Types of cybercrime:
- Cryptojacking (where hackers mine cryptocurrency using resources they do not own).
- Cyberespionage (where hackers access government or company data).
- Ransomware attacks (a type of cyber extortion).
- Cyberextortion (demanding money to prevent a threatened attack).
- Email and internet fraud.
- Theft of financial or card payment data.
- Theft and sale of corporate data.
Confusion Matrix in monitoring Cyber Attacks:
This section considers the data set used for the Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector: a predictive model capable of distinguishing between “bad” connections, called intrusions or attacks, and “good” normal connections. The data set contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.
In the KDD99 data set, the four attack categories (DoS, U2R, R2L, and probe) are divided into 22 different attack types, which are tabulated below:
In the KDD Cup 99, the criterion used to evaluate the participants’ entries is the Cost Per Test (CPT), computed from the confusion matrix and a given cost matrix.
- True Positive (TP): the number of connections detected as attacks that are actually attacks.
- True Negative (TN): the number of connections detected as normal that are actually normal.
- False Positive (FP): the number of connections detected as attacks that are actually normal (false alarms).
- False Negative (FN): the number of connections detected as normal that are actually attacks.
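To make the Cost Per Test idea concrete, here is a minimal sketch. It assumes CPT is the average cost per test example: each confusion matrix cell is multiplied by the matching cost matrix entry, the products are summed, and the sum is divided by the number of test examples. The function name `cost_per_test`, the two-class layout, and all numbers below are hypothetical illustrations, not the competition's actual matrices.

```python
def cost_per_test(confusion, cost):
    """Return the average misclassification cost per test example.

    Both arguments are lists of lists indexed as [actual][predicted].
    """
    if len(confusion) != len(cost) or any(
        len(cm_row) != len(cost_row) for cm_row, cost_row in zip(confusion, cost)
    ):
        raise ValueError("confusion and cost matrices must have the same shape")
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Weight every cell count by the cost of that (actual, predicted) pair.
    weighted = sum(
        n * c
        for cm_row, cost_row in zip(confusion, cost)
        for n, c in zip(cm_row, cost_row)
    )
    return weighted / total


# Hypothetical two-class example; rows are actual, columns are predicted,
# both in the order (normal, attack).
confusion = [
    [90, 10],  # actual normal: 90 TN, 10 FP (false alarms)
    [5, 95],   # actual attack: 5 FN (missed attacks), 95 TP
]
cost = [
    [0, 1],  # made-up cost of 1 for a false alarm
    [2, 0],  # made-up cost of 2 for a missed attack
]
cpt = cost_per_test(confusion, cost)
print(cpt)
```

A cost matrix lets different mistakes carry different penalties, so a detector that misses attacks can be scored more harshly than one that raises false alarms. A lower CPT is better.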
A confusion matrix contains information on the actual and predicted classifications made by a classifier. The performance of a cyber-attack detection system is commonly evaluated using the data in this matrix.
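As an illustration of how a detection system might be scored from these four counts, the sketch below derives a few standard ratios. The metric names (detection rate, false alarm rate, precision) are common conventions rather than something defined in this article, and the function names and input counts are hypothetical.

```python
def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp, tn, fp, fn):
    """Derive common intrusion-detection ratios from the four counts."""
    return {
        # Share of all connections that were classified correctly.
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        # Share of actual attacks that were detected (also called recall).
        "detection_rate": safe_ratio(tp, tp + fn),
        # Share of normal connections wrongly flagged as attacks.
        "false_alarm_rate": safe_ratio(fp, fp + tn),
        # Share of raised alarms that were real attacks.
        "precision": safe_ratio(tp, tp + fp),
    }


# Made-up counts for illustration only.
metrics = detection_metrics(tp=95, tn=90, fp=10, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Looking at detection rate and false alarm rate together is useful because accuracy alone can look high when attacks are rare in the traffic.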
In the present world, cybercrime offenses are happening at an alarming rate. As Internet use increases, many offenders use it as a means of communication in order to commit crimes. The framework developed in our work is essential to creating a model that can support analytics for the identification, detection, and classification of integrated cybercrime offenses (structured and unstructured). The main focus of our work is to find attacks that exploit security vulnerabilities and to analyze them using machine learning techniques. The aim is for the developed framework to provide society with essential, broad knowledge of cybercrime offenses, enable people to consider the threat landscape of such attacks, and help prevent such offenses from occurring.