Handwritten Digit Recognition: CNN Edition

  Anyone who has dabbled in deep learning will be familiar with the handwritten digit recognition task. It is an entry-level exercise and the basic first task for every deep learning beginner. The goal is to build a classification model that recognizes black-and-white images of the handwritten digits 0-9.
  The following walkthrough uses a CNN to extract features from the training set, with the final layer acting as a logistic (softmax) regression classifier for the prediction. The implementation uses the pyTorch library.

Code

import torch
import torchvision.transforms as transforms
import torch.nn as nn
import torchvision.datasets as dsets
from torch.autograd import Variable
  • Set the basic hyperparameters: 5 training epochs, a batch size of 100, and a learning rate of 0.001
num_epochs = 5
batch_size = 100
learning_rate = 0.001
  • For common datasets, pytorch ships ready-made data loading APIs
train_dataset = dsets.MNIST(root='../../data',
                            train=True,
                            transform=transforms.ToTensor(),
                            download=False)   # set download=True on the first run if the data is not yet on disk
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_dataset = dsets.MNIST(root='../../data',
                           train=False,
                           transform=transforms.ToTensor())
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
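  • As a quick sanity check (not in the original post), one batch can be pulled from the loader to confirm the tensor shapes; with a batch size of 100 and 28x28 grayscale MNIST images, the expected shapes are [100, 1, 28, 28] for the images and [100] for the labels.
# Sketch: inspect one batch from the training loader (assumes the cells above have run)
images, labels = next(iter(train_loader))
print(images.size())   # expected: torch.Size([100, 1, 28, 28])
print(labels.size())   # expected: torch.Size([100])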
  • Define a new class CNN
  • pytorch provides the torch.nn.Module base class; any network architecture can be implemented by subclassing it;
  • the subclass then overrides base-class methods such as forward to define the network's forward pass;
  • the hidden layers inside the network are built with the nn.Sequential class, stacking convolution and pooling layers into it;
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # conv block 1: 1x28x28 -> 16x28x28, pooled to 16x14x14
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        # conv block 2: 16x14x14 -> 32x14x14, pooled to 32x7x7
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        # fully connected classifier over the flattened 7*7*32 features
        self.fc = nn.Linear(7*7*32, 10)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)   # flatten to [batch, 7*7*32]
        out = self.fc(out)
        return out
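  • The in_features of the final linear layer come straight from the shapes: each 28x28 input is halved twice by MaxPool2d(2) (28 -> 14 -> 7) while the channel count grows to 32, giving 7*7*32 = 1568 flattened features; a minimal sketch (not in the original post) to verify this with a dummy input:
# Sketch: verify the flattened feature size that feeds the fc layer
dummy = Variable(torch.randn(1, 1, 28, 28))   # one fake 28x28 grayscale image
model = CNN()
feats = model.layer2(model.layer1(dummy))     # shape: [1, 32, 7, 7]
print(feats.view(1, -1).size())               # expected: torch.Size([1, 1568])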
  • Instantiate the CNN and move the model's parameters and buffers onto the GPU with cuda
  • Choose cross-entropy as the loss function
  • Choose the Adam optimizer
cnn = CNN()
cnn.cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn.parameters(), lr=learning_rate)
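  • Note that nn.CrossEntropyLoss combines LogSoftmax and NLLLoss, so forward returns raw logits with no softmax applied; the final fc layer plus this loss is exactly the logistic (softmax) regression mentioned in the introduction. A small illustrative sketch (not part of the original post):
# Sketch: CrossEntropyLoss on raw logits matches NLLLoss on log-softmax outputs
logits = Variable(torch.randn(4, 10))                 # fake scores for a batch of 4 images
targets = Variable(torch.LongTensor([0, 3, 7, 9]))    # fake labels
ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(ce, nll)                                        # the two values should be equal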
  • Start training the model
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # move the batch onto the GPU
        images = Variable(images).cuda()
        labels = Variable(labels).cuda()

        # forward pass, backward pass, parameter update
        optimizer.zero_grad()
        outputs = cnn(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print('epoch: {}/{}, step: {}/{}, loss: {}'.format(
                epoch+1, num_epochs,
                i+1, len(train_dataset)/batch_size,
                loss.data[0]
            ))
epoch: 1/5, step: 100/600.0, loss: 0.007191823795437813
epoch: 1/5, step: 200/600.0, loss: 0.029861945658922195
epoch: 1/5, step: 300/600.0, loss: 0.010123591870069504
epoch: 1/5, step: 400/600.0, loss: 0.005973990075290203
epoch: 1/5, step: 500/600.0, loss: 0.02123316377401352
epoch: 1/5, step: 600/600.0, loss: 0.043297529220581055
epoch: 2/5, step: 100/600.0, loss: 0.008047117851674557
epoch: 2/5, step: 200/600.0, loss: 0.0028986502438783646
epoch: 2/5, step: 300/600.0, loss: 0.06134017929434776
epoch: 2/5, step: 400/600.0, loss: 0.00527493841946125
epoch: 2/5, step: 500/600.0, loss: 0.0023949909955263138
epoch: 2/5, step: 600/600.0, loss: 0.04052555933594704
epoch: 3/5, step: 100/600.0, loss: 0.0025511933490633965
epoch: 3/5, step: 200/600.0, loss: 0.018858356401324272
epoch: 3/5, step: 300/600.0, loss: 0.007384308613836765
epoch: 3/5, step: 400/600.0, loss: 0.0013453364372253418
epoch: 3/5, step: 500/600.0, loss: 0.01689516380429268
epoch: 3/5, step: 600/600.0, loss: 0.014116305857896805
epoch: 4/5, step: 100/600.0, loss: 0.0011698532616719604
epoch: 4/5, step: 200/600.0, loss: 0.00758977048099041
epoch: 4/5, step: 300/600.0, loss: 0.004462475888431072
epoch: 4/5, step: 400/600.0, loss: 0.039119914174079895
epoch: 4/5, step: 500/600.0, loss: 0.004173200111836195
epoch: 4/5, step: 600/600.0, loss: 0.0031711505725979805
epoch: 5/5, step: 100/600.0, loss: 0.0017395401373505592
epoch: 5/5, step: 200/600.0, loss: 0.005144419614225626
epoch: 5/5, step: 300/600.0, loss: 0.002179956529289484
epoch: 5/5, step: 400/600.0, loss: 0.0020281272009015083
epoch: 5/5, step: 500/600.0, loss: 0.0009707188582979143
epoch: 5/5, step: 600/600.0, loss: 0.002221117028966546
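  • The loop above uses the pre-0.4 pyTorch API (Variable wrappers, loss.data[0]). On newer versions the same step can be written without Variable, reading the scalar loss with loss.item(); a hedged sketch of the equivalent inner loop, assuming a recent pyTorch:
# Sketch: the same update step on PyTorch >= 0.4 (assumes cnn, criterion, optimizer, train_loader from above)
for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()   # Variable is no longer needed
    optimizer.zero_grad()
    loss = criterion(cnn(images), labels)
    loss.backward()
    optimizer.step()
    print(loss.item())   # .item() replaces loss.data[0]
    break                # demonstrate a single step only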
  • Evaluate on the test set
cnn.eval()   # evaluation mode so BatchNorm uses its running statistics
correct = 0
total = 0
for images, labels in test_loader:
    images = Variable(images).cuda()
    outputs = cnn(images)
    _, predicted = torch.max(outputs.data, 1)   # index of the highest score = predicted digit
    total += labels.size(0)
    correct += (predicted.cpu() == labels).sum()
print('accuracy: {}'.format(correct/total*100))
accuracy: 98.89
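  • To see the model act on an individual sample, a single test image can be pushed through the trained network; this is a minimal sketch (not in the original post) that assumes the cells above have been executed:
# Sketch: predict a single test image with the trained model
image, label = test_dataset[0]                     # first test sample, shape [1, 28, 28]
output = cnn(Variable(image.unsqueeze(0)).cuda())  # add a batch dimension -> [1, 1, 28, 28]
_, pred = torch.max(output.data, 1)
print('predicted: {}, true: {}'.format(int(pred[0]), label))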

So easy! Mom no longer has to worry about me not knowing how to use deep learning.

enjoy it!
