Baidu AI Cloud BML Full-Featured AI Development Platform, Custom Job Modeling - Training Job Code Samples (PaddlePaddle 2.0.0rc)

Paddle

This page provides sample code for MNIST image classification based on the Paddle framework. Click here to download the dataset.

For single-node training (the number of compute nodes equals 1), the sample code is as follows:

import os
import numpy
import paddle  # import the paddle module
import paddle.fluid as fluid
import gzip
import struct

work_path = os.getcwd()
cluster_train_dir = "%s/train_data" % work_path


def load_data(file_dir, is_train=True):
    """
    Decompress and parse the MNIST idx/ubyte gzip files.

    :param file_dir: directory containing the MNIST .gz files
    :param is_train: load the training split if True, else the test split
    :return: (images, labels) numpy arrays
    """
    if is_train:
        image_path = file_dir + '/train-images-idx3-ubyte.gz'
        label_path = file_dir + '/train-labels-idx1-ubyte.gz'
    else:
        image_path = file_dir + '/t10k-images-idx3-ubyte.gz'
        label_path = file_dir + '/t10k-labels-idx1-ubyte.gz'
    with open(image_path.replace('.gz', ''), 'wb') as out_f, gzip.GzipFile(image_path) as zip_f:
        out_f.write(zip_f.read())
    os.unlink(image_path)
    with open(label_path.replace('.gz', ''), 'wb') as out_f, gzip.GzipFile(label_path) as zip_f:
        out_f.write(zip_f.read())
    os.unlink(label_path)
    with open(label_path[:-3], 'rb') as lbpath:
        magic, n = struct.unpack('>II', lbpath.read(8))
        labels = numpy.fromfile(lbpath, dtype=numpy.uint8)
    with open(image_path[:-3], 'rb') as imgpath:
        magic, num, rows, cols = struct.unpack('>IIII', imgpath.read(16))
        images = numpy.fromfile(imgpath, dtype=numpy.uint8).reshape(len(labels), 784)
    return images, labels


def reader_creator(file_dir, is_train=True, buffer_size=100):
    """
    Build a sample-level reader over the parsed dataset.

    :param file_dir: directory containing the MNIST .gz files
    :param is_train: use the training split if True, else the test split
    :param buffer_size: number of samples read per chunk
    :return: a reader function yielding (image, label) pairs
    """
    images, labels = load_data(file_dir, is_train)

    def reader():
        for num in range(int(len(labels) / buffer_size)):
            for i in range(buffer_size):
                yield images[num * buffer_size + i, :], int(labels[num * buffer_size + i])

    return reader


def softmax_regression():
    """
    Define the softmax classifier:
        a single fully connected layer with softmax as the activation function
    Return:
        predict -- the classification result
    """
    # Raw input image data, size 28*28*1
    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
    # Fully connected layer with softmax activation; the output size must be 10,
    # the number of digit classes
    predict = fluid.layers.fc(input=img, size=10, act='softmax')
    return predict


def multilayer_perceptron():
    """
    Define the multilayer perceptron classifier:
        two hidden (fully connected) layers with ReLU activations,
        followed by an output layer with softmax activation
    Return:
        predict -- the classification result
    """
    # Raw input image data, size 28*28*1
    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
    # First fully connected layer, ReLU activation
    hidden = fluid.layers.fc(input=img, size=200, act='relu')
    # Second fully connected layer, ReLU activation
    hidden = fluid.layers.fc(input=hidden, size=200, act='relu')
    # Fully connected output layer with softmax activation; output size must be 10
    prediction = fluid.layers.fc(input=hidden, size=10, act='softmax')
    return prediction


def convolutional_neural_network():
    """
    Define the convolutional neural network classifier:
        the 2-D input image passes through two conv-pool layers, followed by
        a fully connected output layer with softmax activation
    Return:
        predict -- the classification result
    """
    # Raw input image data, size 28*28*1
    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
    # First conv-pool layer: 20 5*5 filters, pool size 2, pool stride 2, ReLU activation
    conv_pool_1 = fluid.nets.simple_img_conv_pool(
        input=img, filter_size=5, num_filters=20,
        pool_size=2, pool_stride=2, act="relu")
    conv_pool_1 = fluid.layers.batch_norm(conv_pool_1)
    # Second conv-pool layer: 50 5*5 filters, pool size 2, pool stride 2, ReLU activation
    conv_pool_2 = fluid.nets.simple_img_conv_pool(
        input=conv_pool_1, filter_size=5, num_filters=50,
        pool_size=2, pool_stride=2, act="relu")
    # Fully connected output layer with softmax activation; output size must be 10
    prediction = fluid.layers.fc(input=conv_pool_2, size=10, act='softmax')
    return prediction


def train_program():
    """
    Configure the training program.
    Return:
        predict -- the classification result
        avg_cost -- the average loss
        acc -- the classification accuracy
    """
    paddle.enable_static()
    # Label layer, named label, holding the class label of each input image
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    # predict = softmax_regression()  # uncomment to use softmax regression
    # predict = multilayer_perceptron()  # uncomment to use the multilayer perceptron
    predict = convolutional_neural_network()  # use the LeNet-5-style convolutional network
    # Cross-entropy loss between predict and label
    cost = fluid.layers.cross_entropy(input=predict, label=label)
    # Average loss
    avg_cost = fluid.layers.mean(cost)
    # Classification accuracy
    acc = fluid.layers.accuracy(input=predict, label=label)
    return predict, [avg_cost, acc]


def optimizer_program():
    """
    :return: the Adam optimizer
    """
    return fluid.optimizer.Adam(learning_rate=0.001)


# One minibatch contains 64 samples
BATCH_SIZE = 64
# Read 500 training samples at a time into a shuffle buffer and pass them to the
# batched reader, which yields 64 samples per batch
train_reader = paddle.batch(
    paddle.reader.shuffle(
        reader_creator(cluster_train_dir, is_train=True, buffer_size=100),
        buf_size=500),
    batch_size=BATCH_SIZE)
# Read the test set, 64 samples per batch
test_reader = paddle.batch(
    reader_creator(cluster_train_dir, is_train=False, buffer_size=100),
    batch_size=BATCH_SIZE)


def event_handler(pass_id, batch_id, cost):
    # Print intermediate training results: pass, batch number, and loss
    print("Pass %d, Batch %d, Cost %f" % (pass_id, batch_id, cost))


# Run the model on a single CPU
place = fluid.CPUPlace()
# Call train_program to obtain the prediction and the loss/accuracy values
prediction, [avg_loss, acc] = train_program()
# Raw input image data, size 28*28*1
img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
# Label layer, named label, holding the class label of each input image
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
# Tell the network that the fed data has two parts: the img values and the label values
feeder = fluid.DataFeeder(feed_list=[img, label], place=place)
# Use the Adam optimizer
optimizer = fluid.optimizer.Adam(learning_rate=0.001)
optimizer.minimize(avg_loss)
PASS_NUM = 1  # train for one pass (epoch)
epochs = [epoch_id for epoch_id in range(PASS_NUM)]
# Store the model parameters under the directory named save_dirname
save_dirname = "./output/"


def train_test(train_test_program, train_test_feed, train_test_reader):
    # Collect classification accuracies in acc_set
    acc_set = []
    # Collect average losses in avg_loss_set
    avg_loss_set = []
    # Feed every batch yielded by the test reader into the network
    for test_data in train_test_reader():
        acc_np, avg_loss_np = exe.run(
            program=train_test_program,
            feed=train_test_feed.feed(test_data),
            fetch_list=[acc, avg_loss])
        acc_set.append(float(acc_np))
        avg_loss_set.append(float(avg_loss_np))
    # Mean accuracy and loss over the test data
    acc_val_mean = numpy.array(acc_set).mean()
    avg_loss_val_mean = numpy.array(avg_loss_set).mean()
    # Return the average loss and the average accuracy
    return avg_loss_val_mean, acc_val_mean


exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
main_program = fluid.default_main_program()
test_program = fluid.default_main_program().clone(for_test=True)
lists = []
step = 0
for epoch_id in epochs:
    for step_id, data in enumerate(train_reader()):
        metrics = exe.run(main_program,
                          feed=feeder.feed(data),
                          fetch_list=[avg_loss, acc])
        if step % 100 == 0:  # log once every 100 batches
            event_handler(epoch_id, step, metrics[0])
        step += 1
    # Evaluate the classifier after each epoch
    avg_loss_val, acc_val = train_test(train_test_program=test_program,
                                       train_test_reader=test_reader,
                                       train_test_feed=feeder)
    print("Test with Epoch %d, avg_cost: %s, acc: %s" % (epoch_id, avg_loss_val, acc_val))
    lists.append((epoch_id, avg_loss_val, acc_val))
    # Save the trained model parameters for later inference
    if save_dirname is not None:
        fluid.io.save_inference_model(save_dirname, ["img"], [prediction], exe,
                                      model_filename='model',
                                      params_filename='params')
# Pick the pass with the lowest test loss
best = sorted(lists, key=lambda x: float(x[1]))[0]
print('Best pass is %s, testing Avgcost is %s' % (best[0], best[1]))
print('The classification accuracy is %.2f%%' % (float(best[2]) * 100))
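
The script above writes an inference model (program plus parameters) into ./output/. As a minimal sketch of how that model could be consumed afterwards, not part of the original sample, the snippet below loads it back with fluid.io.load_inference_model and runs one prediction; the random input is only a stand-in for a properly normalized 28*28 test image:

import numpy
import paddle
import paddle.fluid as fluid

paddle.enable_static()
place = fluid.CPUPlace()
exe = fluid.Executor(place)
# Load the program and parameters written by save_inference_model above
[inference_program, feed_target_names, fetch_targets] = fluid.io.load_inference_model(
    "./output/", exe, model_filename='model', params_filename='params')
# One dummy 28*28 single-channel image; replace with real preprocessed data
img = numpy.random.rand(1, 1, 28, 28).astype('float32')
results = exe.run(inference_program,
                  feed={feed_target_names[0]: img},
                  fetch_list=fetch_targets)
print("predicted digit: %d" % numpy.argmax(results[0]))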

For distributed training (the number of compute nodes is greater than 1), the sample code is as follows. Note: this distributed demo does not shard the data across workers and is for reference only; a possible sharding approach is sketched after the listing.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Distributed MNIST training demo (fleet collective mode).
"""
import os
import gzip
import struct
import numpy as np
from PIL import Image
import time
import paddle
import paddle.distributed.fleet as fleet
import paddle.static.nn as nn
import paddle.fluid as fluid
from paddle.io import Dataset

TEST_IMAGE = 't10k-images-idx3-ubyte.gz'
TEST_LABEL = 't10k-labels-idx1-ubyte.gz'
TRAIN_IMAGE = 'train-images-idx3-ubyte.gz'
TRAIN_LABEL = 'train-labels-idx1-ubyte.gz'


class MNIST(Dataset):
    """
    In-memory MNIST dataset parsed from the idx/ubyte gzip files.
    """

    def __init__(self, data_dir=None, mode='train', transform=None, backend=None):
        assert mode.lower() in ['train', 'test'], \
            "mode should be 'train' or 'test', but got {}".format(mode)
        if backend is None:
            backend = paddle.vision.get_image_backend()
        if backend not in ['pil', 'cv2']:
            raise ValueError(
                "Expected backend are one of ['pil', 'cv2'], but got {}".format(backend))
        self.backend = backend
        self.mode = mode.lower()
        if self.mode == 'train':
            self.image_path = os.path.join(data_dir, TRAIN_IMAGE)
            self.label_path = os.path.join(data_dir, TRAIN_LABEL)
        else:
            self.image_path = os.path.join(data_dir, TEST_IMAGE)
            self.label_path = os.path.join(data_dir, TEST_LABEL)
        self.transform = transform
        # read dataset into memory
        self._parse_dataset()
        self.dtype = paddle.get_default_dtype()

    def _parse_dataset(self, buffer_size=100):
        self.images = []
        self.labels = []
        with gzip.GzipFile(self.image_path, 'rb') as image_file:
            img_buf = image_file.read()
            with gzip.GzipFile(self.label_path, 'rb') as label_file:
                lab_buf = label_file.read()

                step_label = 0
                offset_img = 0
                # read big-endian data; get file info from the magic bytes
                # image file header: 16B
                magic_byte_img = '>IIII'
                magic_img, image_num, rows, cols = struct.unpack_from(
                    magic_byte_img, img_buf, offset_img)
                offset_img += struct.calcsize(magic_byte_img)

                offset_lab = 0
                # label file header: 8B
                magic_byte_lab = '>II'
                magic_lab, label_num = struct.unpack_from(
                    magic_byte_lab, lab_buf, offset_lab)
                offset_lab += struct.calcsize(magic_byte_lab)

                while True:
                    if step_label >= label_num:
                        break
                    fmt_label = '>' + str(buffer_size) + 'B'
                    labels = struct.unpack_from(fmt_label, lab_buf, offset_lab)
                    offset_lab += struct.calcsize(fmt_label)
                    step_label += buffer_size

                    fmt_images = '>' + str(buffer_size * rows * cols) + 'B'
                    images_temp = struct.unpack_from(fmt_images, img_buf, offset_img)
                    images = np.reshape(
                        images_temp, (buffer_size, rows * cols)).astype('float32')
                    offset_img += struct.calcsize(fmt_images)

                    for i in range(buffer_size):
                        self.images.append(images[i, :])
                        self.labels.append(np.array([labels[i]]).astype('int64'))

    def __getitem__(self, idx):
        image, label = self.images[idx], self.labels[idx]
        image = np.reshape(image, [28, 28])
        if self.backend == 'pil':
            image = Image.fromarray(image.astype('uint8'), mode='L')
        if self.transform is not None:
            image = self.transform(image)
        if self.backend == 'pil':
            return image, label.astype('int64')
        return image.astype(self.dtype), label.astype('int64')

    def __len__(self):
        return len(self.labels)


def mlp_model():
    """
    Build a three-layer MLP and return its inputs, prediction, and metrics.
    """
    x = paddle.static.data(name="x", shape=[64, 28, 28], dtype='float32')
    y = paddle.static.data(name="y", shape=[64, 1], dtype='int64')
    x_flatten = paddle.reshape(x, [64, 784])
    fc_1 = nn.fc(x=x_flatten, size=128, activation='tanh')
    fc_2 = nn.fc(x=fc_1, size=128, activation='tanh')
    prediction = nn.fc(x=[fc_2], size=10, activation='softmax')
    cost = paddle.fluid.layers.cross_entropy(input=prediction, label=y)
    acc_top1 = paddle.metric.accuracy(input=prediction, label=y, k=1)
    avg_cost = paddle.mean(x=cost)
    res = [x, y, prediction, avg_cost, acc_top1]
    return res


def train(epoch, exe, train_dataloader, cost, acc):
    """
    Run one training epoch.
    """
    total_time = 0
    step = 0
    for data in train_dataloader():
        step += 1
        start_time = time.time()
        loss_val, acc_val = exe.run(
            paddle.static.default_main_program(),
            feed=data, fetch_list=[cost.name, acc.name])
        if step % 100 == 0:
            end_time = time.time()
            total_time += (end_time - start_time)
            print("epoch: %d, step:%d, train_loss: %f, train_acc: %f, "
                  "total time cost = %f, speed: %f"
                  % (epoch, step, loss_val[0], acc_val[0], total_time,
                     1 / (end_time - start_time)))


def test(exe, test_dataloader, cost, acc):
    """
    Evaluate on the validation set.
    """
    total_time = 0
    step = 0
    for data in test_dataloader():
        step += 1
        start_time = time.time()
        loss_val, acc_val = exe.run(
            paddle.static.default_main_program(),
            feed=data, fetch_list=[cost.name, acc.name])
        if step % 100 == 0:
            end_time = time.time()
            total_time += (end_time - start_time)
            print("step:%d, test_loss: %f, test_acc: %f, "
                  "total time cost = %f, speed: %f"
                  % (step, loss_val[0], acc_val[0], total_time,
                     1 / (end_time - start_time)))


def save(save_dir, feed_vars, fetch_vars, exe):
    """
    Save the inference model from the first worker only.
    """
    path_prefix = os.path.join(save_dir, 'model')
    if fleet.is_first_worker():
        paddle.static.save_inference_model(path_prefix, feed_vars, fetch_vars, exe)


if __name__ == '__main__':
    # training data path
    train_data = './train_data'
    # validation data path
    test_data = './test_data'
    # output path
    save_dir = './output'
    # number of epochs
    epochs = 10
    # run validation every test_interval epochs
    test_interval = 2
    # save the model every save_interval epochs
    save_interval = 2
    paddle.enable_static()
    paddle.vision.set_image_backend('cv2')
    # training dataset
    train_dataset = MNIST(data_dir=train_data, mode='train')
    # validation dataset
    test_dataset = MNIST(data_dir=test_data, mode='test')
    # build the model
    [x, y, pred, cost, acc] = mlp_model()
    place = paddle.CUDAPlace(int(os.environ.get('FLAGS_selected_gpus', 0)))
    # data loaders
    train_dataloader = paddle.io.DataLoader(
        train_dataset, feed_list=[x, y], drop_last=True, places=place,
        batch_size=64, shuffle=True, return_list=False)
    test_dataloader = paddle.io.DataLoader(
        test_dataset, feed_list=[x, y], drop_last=True, places=place,
        batch_size=64, return_list=False)
    # initialize fleet (collective mode)
    strategy = fleet.DistributedStrategy()
    fleet.init(is_collective=True, strategy=strategy)
    # set up the optimizer
    optimizer = paddle.optimizer.Adam()
    optimizer = fleet.distributed_optimizer(optimizer)
    optimizer.minimize(cost)
    exe = paddle.static.Executor(place)
    exe.run(paddle.static.default_startup_program())
    prog = paddle.static.default_main_program()
    for epoch in range(epochs):
        train(epoch, exe, train_dataloader, cost, acc)
        if epoch % test_interval == 0:
            test(exe, test_dataloader, cost, acc)
        # save model
        if epoch % save_interval == 0:
            save(save_dir, [x], [pred], exe)
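
On BML the platform starts the distributed worker processes for you. To experiment with the script outside the platform, PaddlePaddle's generic launcher can be used, for example (a usage sketch: depending on the exact 2.0 build the GPU flag is --gpus or --selected_gpus, and train.py is a placeholder for the script name):

python -m paddle.distributed.launch --gpus "0,1" train.py

Because the demo does not shard data, every worker iterates over the full dataset. One possible way to add the missing sharding (an assumption, not part of the original demo) is paddle.io.DistributedBatchSampler, which assigns each trainer a distinct subset of the samples:

from paddle.io import DataLoader, DistributedBatchSampler

# Hypothetical replacement for train_dataloader above: each worker draws
# batches only from its own shard of train_dataset.
train_sampler = DistributedBatchSampler(
    train_dataset, batch_size=64, shuffle=True, drop_last=True)
train_dataloader = DataLoader(
    train_dataset, feed_list=[x, y], places=place,
    batch_sampler=train_sampler, return_list=False)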
