跳转至

tool-多元数据-Pyod#

Pyod是一款专门做异常检测的python工具箱

github: https://github.com/yzhao062/pyod
文档说明: https://pyod.readthedocs.io/en/latest/index.html

基本API:

BaseDetector.fit(X, y)    # y可有可无
BaseDetector.decision_function()    # 原始的异常得分
BaseDetector.predict()
BaseDetector.predict_proba()    # 给出是异常点的概率

其属性

BaseDetector.decision_scores_      # 异常得分,score越大越异常
BaseDetector.labels_               # 1 代表异常

与sklearn区分
借鉴了sklearn的API形式,但是注意两者不通用

例子

(1) one model

# train the KNN detector
from pyod.models.knn import KNN
from pyod.utils.data import generate_data
from pyod.models.combination import aom, moa, average, maximization

## ! 输入的X必须是array [n_samples, n_features]
X, y = generate_data(train_only=True)  # load data
X.shape

clf = KNN()
clf.fit(X)

# get the prediction labels and outlier scores of the training data
y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
y_train_scores = clf.decision_scores_  # raw outlier scores

# get the prediction on the test data
y_pred = clf.predict(X)  # outlier labels (0 or 1)
y_scores = clf.decision_function(X) 

(2) 有时候单个model结果不一定可信,可以将多个model的结果进行combine
合并的方式主要有:
- average
- maximization
- average of maximization (AOM), 将所有的detector分成不同的组,每组先max, 然后对不同组再就mean
- maximization of average(MOA), 将所有的detector分成不同的组,每组先average, 然后对不同组取max

# initialize 20 base detectors for combination
import numpy as np
k_list = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140,
            150, 160, 170, 180, 190, 200]
n_clf = 10

train_scores = np.zeros([X.shape[0], n_clf])

for i in range(n_clf):
    k = k_list[i]
    clf = KNN(n_neighbors=k, method='largest')
    clf.fit(X)
    train_scores[:, i] = clf.decision_scores_
#
from pyod.utils.utility import standardizer
# scores have to be normalized before combination, 0均值,1 standard error
scores_norm = standardizer(train_scores)
## 不同的组合方法
comb_by_average = average(scores_norm)
comb_by_maximization = maximization(scores_norm)
comb_by_aom = aom(scores_norm, 5) # 5 groups
comb_by_moa = moa(scores_norm, 5) # 5 groups