Machine Learning in Action (Notes) — Logistic Regression

Author: 杨润炜 (Yang Runwei)
Date: 2018/2/23 16:07

Logistic Regression

What I Studied

  • 1. A Logistic regression case study

My Understanding

  • The main idea of classifying with Logistic regression: use the available data to fit a regression formula for the class decision boundary, then classify against that boundary. Training the classifier means searching for the best-fit parameters with an optimization algorithm; here that is stochastic gradient ascent (the update below climbs the gradient of the log-likelihood). Because the output must be a class probability, the linear result is squashed through the Sigmoid function.
    The Sigmoid function is:
    σ(z) = 1 / (1 + e^(−z))
    The model feeds it a linear function of the features: z = w·x.
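    As a quick sanity check (my own sketch, not from the book): the decision boundary w·x = 0 is exactly where the Sigmoid outputs 0.5, so thresholding the probability at 0.5 recovers a linear classifier. The weights below are made up for illustration, not learned.

    ```python
    import numpy as np

    def sigmoid(z):
        # Squash any real number into the interval (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    w = np.array([2.0, -1.0])           # illustrative weights, not learned
    on_boundary = np.array([1.0, 2.0])  # w . x = 2*1 - 1*2 = 0
    positive = np.array([3.0, 1.0])     # w . x = 2*3 - 1*1 = 5 > 0

    print(sigmoid(np.dot(w, on_boundary)))  # 0.5: exactly on the boundary
    print(sigmoid(np.dot(w, positive)))     # > 0.5: classified as 1
    ```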

    Below, the Logistic regression machinery is applied to a dataset with 21 features to predict whether a horse with colic will survive.
    The concrete implementation:
    Loading the data: code omitted.
    The Sigmoid function (the listing assumes the book's usual "from numpy import *" at the top of logRegres.py):

    from numpy import *

    def sigmoid(inX):
        return 1.0 / (1 + exp(-inX))
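    One caveat (my own note, not from the book): when inX is a large negative number, exp(-inX) overflows float64 — the run log at the end of this post shows exactly that RuntimeWarning. A simple stable variant clips the argument so exp never overflows, while leaving the returned probabilities unchanged for all practical purposes:

    ```python
    import numpy as np

    def sigmoid_stable(z):
        # exp overflows float64 near exp(710); clipping z to [-500, 500]
        # keeps exp(-z) finite, and sigmoid(+-500) already equals 0.0 / 1.0
        # to within ~7e-218, so the result is unaffected in practice.
        z = np.clip(z, -500.0, 500.0)
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid_stable(-1000.0))  # ~0.0, and no RuntimeWarning
    ```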

    Stochastic gradient ascent:
    The stochastic form updates the weights on one randomly chosen sample at a time, which makes each update cheap and lets the model converge quickly.

    def stocGradAscent1(dataMatrix, classLabels, numIter=150):
        m, n = shape(dataMatrix)
        weights = ones(n)
        for j in range(numIter):
            # list() so that del works below (range is not a list in Python 3)
            dataIndex = list(range(m))
            for i in range(m):
                # Anneal the learning rate at every update
                alpha = 4 / (1.0 + j + i) + 0.0001
                # Pick a random sample, without replacement within this pass
                randIndex = int(random.uniform(0, len(dataIndex)))
                idx = dataIndex[randIndex]
                # Gradient ascent update on the log-likelihood
                h = sigmoid(sum(dataMatrix[idx] * weights))
                error = classLabels[idx] - h
                weights = weights + alpha * error * dataMatrix[idx]
                del dataIndex[randIndex]
        return weights
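    To sanity-check the routine before pointing it at the colic data, here is a toy run (my own harness, not from the book) on a small linearly separable set; column 0 acts as the bias feature. The sigmoid and training function are repeated so the snippet runs standalone.

    ```python
    from numpy import *

    random.seed(42)  # make the stochastic run reproducible

    def sigmoid(inX):
        return 1.0 / (1 + exp(-inX))

    def stocGradAscent1(dataMatrix, classLabels, numIter=150):
        m, n = shape(dataMatrix)
        weights = ones(n)
        for j in range(numIter):
            dataIndex = list(range(m))
            for i in range(m):
                alpha = 4 / (1.0 + j + i) + 0.0001   # annealed learning rate
                randIndex = int(random.uniform(0, len(dataIndex)))
                idx = dataIndex[randIndex]           # sample without replacement
                h = sigmoid(sum(dataMatrix[idx] * weights))
                error = classLabels[idx] - h
                weights = weights + alpha * error * dataMatrix[idx]
                del dataIndex[randIndex]
        return weights

    # Toy data: class 1 iff x1 + x2 is large; column 0 is a constant bias term
    data = array([[1.0, 0.2, 0.3], [1.0, 0.5, 0.8], [1.0, 0.9, 0.4],
                  [1.0, 2.5, 2.0], [1.0, 1.8, 2.6], [1.0, 3.0, 1.5]])
    labels = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

    w = stocGradAscent1(data, labels, numIter=200)
    preds = [1.0 if sigmoid(sum(x * w)) > 0.5 else 0.0 for x in data]
    print(preds)  # the training labels should be recovered
    ```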

    The Logistic regression classifier:

    # Classification function
    def classifyVector(inX, weights):
        prob = sigmoid(sum(inX * weights))
        if prob > 0.5:
            return 1.0
        else:
            return 0.0

    # One train/test round
    def colicTest():
        frTrain = open('horseColicTraining.txt')
        frTest = open('horseColicTest.txt')
        trainingSet = []; trainingLabels = []
        for line in frTrain.readlines():
            currLine = line.strip().split('\t')
            lineArr = []
            for i in range(21):
                lineArr.append(float(currLine[i]))
            trainingSet.append(lineArr)
            trainingLabels.append(float(currLine[21]))
        trainWeights = stocGradAscent1(array(trainingSet), trainingLabels, 1000)
        errorCount = 0; numTestVec = 0.0
        for line in frTest.readlines():
            numTestVec += 1.0
            currLine = line.strip().split('\t')
            lineArr = []
            for i in range(21):
                lineArr.append(float(currLine[i]))
            if int(classifyVector(array(lineArr), trainWeights)) != int(currLine[21]):
                errorCount += 1
        errorRate = float(errorCount) / numTestVec
        print("the error rate of this test is: %f" % errorRate)
        return errorRate

    # Entry point: average the error rate over 10 rounds
    def multiTest():
        numTests = 10; errorSum = 0.0
        for k in range(numTests):
            errorSum += colicTest()
        print("after %d iterations the average error rate is: %f"
              % (numTests, errorSum / float(numTests)))

    Run results:

    >>> import logRegres
    >>> logRegres.multiTest()
    logRegres.py:18: RuntimeWarning: overflow encountered in exp
      return 1.0/(1+exp(-inX))
    the error rate of this test is: 0.343284
    the error rate of this test is: 0.268657
    the error rate of this test is: 0.373134
    the error rate of this test is: 0.373134
    the error rate of this test is: 0.388060
    the error rate of this test is: 0.373134
    the error rate of this test is: 0.432836
    the error rate of this test is: 0.298507
    the error rate of this test is: 0.417910
    the error rate of this test is: 0.388060
    after 10 iterations the average error rate is: 0.365672
    >>>

    For the complete code, see: Logistic Regression

Significance

Understood how Logistic regression works as a classifier, and a simple implementation of stochastic gradient ascent.

Thanks for reading!
If you have any questions after reading, feel free to point them out.
Reposting is welcome; please credit the source: http://www.yangrunwei.com/a/96.html
Email: glowrypauky@gmail.com
QQ: 892413924