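Document overview:
PaddleHub: a tool for using pre-trained models and fine-tuning
What can ten lines of code do? Most people would answer "print Hello world" or "build a simple calculator". This chapter offers another answer: build an AI application. With PaddleHub, about ten lines of code are enough to run mainstream AI tasks such as object detection, face recognition, and semantic segmentation.
PaddleHub is PaddlePaddle's tool for applying pre-trained models. It bundles high-quality models and aims to let developers complete complex deep learning tasks with minimal code. PaddleHub also provides a convenient Fine-tune API, so developers can combine a high-quality pre-trained model with the Fine-tune API to go quickly from model transfer to deployment.
Figure 1 shows a fun application built with PaddleHub: turning a street scene into anime style. Run the code below to try it.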

Figure 1: Street-scene anime stylization with PaddleHub
- Install PaddleHub, upgrading to version 2.1.
! pip install paddlehub==2.1
- Converting a street scene to anime style with PaddleHub takes only a few commands.
! hub install animegan_v2_hayao_64
import os
import cv2
import paddlehub as hub
import matplotlib.pyplot as plt
%matplotlib inline

os.environ['CUDA_VISIBLE_DEVICES'] = '0'
model = hub.Module(name='animegan_v2_hayao_64', use_gpu=True)
# Run inference
result = model.style_transfer(images=[cv2.imread('demo.jpg')])
plt.figure(figsize=(10,10))
# Reorder the channels from BGR (OpenCV) to RGB (matplotlib) before display
plt.imshow(result[0][:,:,[2,1,0]])
plt.show()
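This section introduces PaddleHub from the following angles:
- the background for applying pre-trained models;
- how to get started quickly with PaddleHub, and the list of models it supports;
- a complete case study showing how to fine-tune a PaddleHub pre-trained model with your own data.
Background: applying pre-trained models
Deep learning tasks need a large amount of data to train a neural network. In practice, the cost of data grows with its volume, so it is common to have too little text or image data to train a neural network model.
After continued exploration, two approaches have emerged for dealing with insufficient training data.
Multi-task learning and transfer learning
Many tasks rely on the same underlying features. For example, drawing a bounding box around a cat in an image and deciding whether an animal is a cat both require extracting features that identify a cat. This matches human cognition: when handling one task, people unconsciously reuse knowledge and methods learned from other tasks. When learning English, for instance, we carry over many grammar habits from the Chinese we already know.
Following the idea of transfer learning, a model is first trained on a data-rich task and then fine-tuned on the small dataset of the new task. Fine-tuning adjusts the network parameters while inheriting the knowledge learned on the data-rich task, and it ultimately yields good results.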
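The idea of reusing learned features and adjusting only a small part of the model can be sketched in plain Python. This is a toy illustration under stated assumptions, not PaddleHub code: `BACKBONE` is a made-up frozen layer standing in for pre-trained weights, the four samples are invented, and only the small classification head is trained.

```python
import math

# Hypothetical frozen "backbone": fixed weights standing in for pre-trained layers.
BACKBONE = ((1.0, -1.0), (0.5, 0.5))

def extract_features(x):
    """Apply the frozen backbone (linear layer + ReLU) to an input vector."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BACKBONE]

def predict_proba(head, bias, x):
    """Probability of class 1 from the trainable head on top of frozen features."""
    z = sum(h * f for h, f in zip(head, extract_features(x))) + bias
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune_head(samples, labels, lr=0.1, epochs=300):
    """Train only the head with per-sample gradient descent on the logistic loss."""
    head, bias = [0.0] * len(BACKBONE), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            grad = predict_proba(head, bias, x) - y  # derivative of the loss w.r.t. the logit
            feats = extract_features(x)
            head = [h - lr * grad * f for h, f in zip(head, feats)]
            bias -= lr * grad
    return head, bias

# Tiny invented dataset for the "new task".
samples = [(2.0, 0.0), (3.0, 1.0), (0.0, 2.0), (1.0, 3.0)]
labels = [1, 1, 0, 0]
head, bias = fine_tune_head(samples, labels)
predictions = [int(predict_proba(head, bias, x) > 0.5) for x in samples]
print(predictions)
```

Real fine-tuning usually also updates the backbone with a small learning rate. Freezing it here keeps the sketch short and makes the reuse of existing features explicit.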
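Figure 2 shows that different natural language tasks share much of the same underlying information and knowledge. NLP tasks such as part-of-speech tagging, syntactic constituent parsing, named entity recognition, and semantic role labeling are well suited to multi-task learning. PaddleHub provides ERNIE, a pre-trained semantic representation model, which is a leading example of this approach.

Figure 2: Multi-task learning and transfer learning
Self-supervised learning
With some clever techniques, unlabeled samples can be turned into supervised learning problems so that a model learns the knowledge contained in the data. As Figure 3 shows, an unlabeled image or a passage of natural language text is normally regarded as unsupervised data. However, we can mask part of an image, feed the unmasked part to the model as input, and use the masked part as the output the model must predict. Likewise, we can mask some phrases in a passage of text, feed the unmasked part to the model as input, and use the masked phrases as the output to predict.

Figure 3: Self-supervised learning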
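The text-masking idea can be sketched as follows. This is a minimal illustration, not how any particular PaddleHub model builds its training data: the `[MASK]` placeholder, the 30% mask ratio, and the helper name `make_masked_example` are all assumptions made for the example.

```python
import random

MASK = "[MASK]"  # placeholder token; the exact symbol is an assumption

def make_masked_example(tokens, mask_ratio=0.3, seed=0):
    """Turn an unlabeled token sequence into a supervised (input, target) pair."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_ratio))
    masked_positions = sorted(rng.sample(range(len(tokens)), n_mask))
    # The model sees the sequence with some tokens hidden ...
    model_input = [MASK if i in masked_positions else t for i, t in enumerate(tokens)]
    # ... and must predict the hidden tokens at those positions.
    targets = {i: tokens[i] for i in masked_positions}
    return model_input, targets

tokens = "the cat sat on the mat".split()
model_input, targets = make_masked_example(tokens)
print(model_input)
print(targets)
```

No human annotation is needed: the labels come from the data itself, which is why large unlabeled corpora become usable for training.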
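PaddleHub ships with a large number of pre-trained models that use both of the techniques above, combined with the massive proprietary data Baidu has accumulated from the internet. Dozens of models popular with developers are available only in PaddleHub.
Getting started with PaddleHub
PaddleHub is simple to use yet powerful. This section shows two quick ways to use it: calling it from Python code and calling it from the command line.
Using PaddleHub from Python code
Starting with computer vision, we take a test image, test.jpg, and run the following four tasks on it:
- Portrait matting (deeplabv3p_xception65_humanseg)
- Human part segmentation (ace2p)
- Face detection (ultra_light_fast_generic_face_detector_1mb_640)
- Keypoint detection (human_pose_estimation_resnet50_mpii)
Note: look up the model names used in these calls on the official website.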
# Image to run prediction on
test_img_path = ["./test.jpg"]
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

img = mpimg.imread(test_img_path[0])
# Show the input image
plt.figure(figsize=(10,10))
plt.imshow(img)
plt.axis('off')
plt.show()
# Install the pre-trained model
!hub install deeplabv3p_xception65_humanseg==1.1.2
import paddlehub as hub
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

module = hub.Module(name="deeplabv3p_xception65_humanseg")
res = module.segmentation(paths = ["./test.jpg"], visualization=True, output_dir='humanseg_output')
res_img_path = 'humanseg_output/test.png'
img = mpimg.imread(res_img_path)
plt.figure(figsize=(10, 10))
plt.imshow(img)
plt.axis('off')
plt.show()
# Install the pre-trained model
!hub install ace2p==1.1.0
import paddlehub as hub
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

module = hub.Module(name="ace2p")
res = module.segmentation(paths = ["./test.jpg"], visualization=True, output_dir='ace2p_output')
res_img_path = './ace2p_output/test.png'
img = mpimg.imread(res_img_path)
plt.figure(figsize=(10, 10))
plt.imshow(img)
plt.axis('off')
plt.show()
# Install the pre-trained model
!hub install ultra_light_fast_generic_face_detector_1mb_640==1.1.2
import paddlehub as hub
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

module = hub.Module(name="ultra_light_fast_generic_face_detector_1mb_640")
res = module.face_detection(paths = ["./test.jpg"], visualization=True, output_dir='face_detection_output')
res_img_path = './face_detection_output/test.jpg'
img = mpimg.imread(res_img_path)
plt.figure(figsize=(10, 10))
plt.imshow(img)
plt.axis('off')
plt.show()
# Install the pre-trained model
!hub install human_pose_estimation_resnet50_mpii==1.1.0
import paddlehub as hub
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

module = hub.Module(name="human_pose_estimation_resnet50_mpii")
res = module.keypoint_detection(paths = ["./test.jpg"], visualization=True, output_dir='keypoint_output')
res_img_path = './keypoint_output/test.jpg'
img = mpimg.imread(res_img_path)
plt.figure(figsize=(10, 10))
plt.imshow(img)
plt.axis('off')
plt.show()
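For natural language processing, the examples below use Chinese word segmentation and sentiment classification. The data to process is passed in as a function argument.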
# Install the pre-trained model
!hub install lac
import paddlehub as hub

lac = hub.Module(name="lac")
# Chinese input text (the lac module performs Chinese lexical analysis)
test_text = ["1996年,曾经是微软员工的加布·纽维尔和麦克·哈灵顿一同创建了Valve软件公司。他们在1996年下半年从id software取得了雷神之锤引擎的使用许可,用来开发半条命系列。"]
res = lac.lexical_analysis(texts = test_text)
print("Chinese lexical analysis result:", res)
# Install the pre-trained model
!hub install senta_bilstm
import paddlehub as hub

senta = hub.Module(name="senta_bilstm")
# Chinese input text (a restaurant review)
test_text = ["味道不错,确实不算太辣,适合不能吃辣的人。就在长江边上,抬头就能看到长江的风景。鸭肠、黄鳝都比较新鲜。"]
res = senta.sentiment_classify(texts = test_text)
print("Chinese sentiment classification result:", res)
# Run portrait segmentation from the command line
!hub run deeplabv3p_xception65_humanseg --input_path test.jpg
# Run Chinese word segmentation from the command line
!hub run lac --input_text "今天是个好日子"
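The commands above have four parts:
- hub: the PaddleHub command.
- run: invokes run to execute model prediction.
- deeplabv3p_xception65_humanseg, lac: the model to call.
- --input_path / --input_text: the model's input data; images and text are passed with different options.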
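The four parts map directly onto an argument list, which is convenient when the command is launched from a script rather than typed in a notebook. The helper below is a hypothetical sketch (the name `build_hub_run_command` is not part of PaddleHub); it only uses the `hub run`, `--input_path`, and `--input_text` pieces shown above and does not execute anything.

```python
import shlex

def build_hub_run_command(module, input_path=None, input_text=None):
    """Assemble the argument list for `hub run` (hypothetical helper)."""
    if (input_path is None) == (input_text is None):
        raise ValueError("pass exactly one of input_path or input_text")
    command = ["hub", "run", module]
    if input_path is not None:
        command += ["--input_path", input_path]   # image models take a file path
    else:
        command += ["--input_text", input_text]   # text models take the raw text
    return command

image_cmd = build_hub_run_command("deeplabv3p_xception65_humanseg", input_path="test.jpg")
text_cmd = build_hub_run_command("lac", input_text="今天是个好日子")
# Print shell-quoted versions for inspection.
print(" ".join(shlex.quote(part) for part in image_cmd))
print(" ".join(shlex.quote(part) for part in text_cmd))
```

With PaddleHub installed, such a list could be passed to `subprocess.run(command, check=True)`. Keeping the arguments as a list avoids shell-quoting problems with text input.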
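PaddleHub's command-line tool borrows ideas from package managers such as Anaconda and pip, making it quick and easy to search for, download, install, upgrade, and run predictions with models. See the project's GitHub page for details. The command-line tool currently supports the following 12 commands:
- install: installs a Module locally, by default under ${HUB_HOME}/.paddlehub/modules;
- uninstall: uninstalls a local Module;
- show: shows the attributes of a locally installed Module, or of a Module in a given directory, including its name, version, description, and author;
- download: downloads a Module provided by Baidu;
- search: searches the server for Modules matching a keyword. Use it to find the Module for a specific model quickly; for example, hub search ssd finds every Module whose name contains ssd;
- list: lists the Modules installed locally;
- run: runs prediction with a Module;
- version: shows PaddleHub version information;
- help: shows help information;
- clear: PaddleHub produces cache data while in use, stored by default under ${HUB_HOME}/.paddlehub/cache; the clear command empties this cache;
- config: views and changes PaddleHub settings, including the server address and log level;
- serving: deploys a Module prediction service in one step; see the PaddleHub Serving one-click deployment guide for details.
PaddleHub's product philosophy is "model as software": models are invoked through the Python API or the command line, so you can quickly try out or integrate PaddlePaddle's pre-trained models. In addition, when you want to improve a pre-trained model with a small amount of data, PaddleHub supports transfer learning. The Fine-tune API has several built-in optimization strategies, so fine-tuning a pre-trained model takes only a small amount of code.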
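Fine-tuning a PaddleHub pre-trained model with your own data
Fruit growers price their produce by size and quality, so every harvest season brings a heavy demand for manual fruit sorting. In an AI-based solution, the harvested fruit is loaded mechanically onto a conveyor belt, and a model uses camera images to control the equipment that sorts the fruit automatically, saving growers a great deal of labor.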

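Figure 5: Automatic fruit sorting on a factory conveyor belt
Suppose only a small amount of peach data has been collected. This section shows how to use PaddleHub to fine-tune a model pre-trained on ImageNet and obtain a more effective classifier. The peach classification data comes from the public AI Studio dataset "Peach Face Recognition" (桃脸识别). The dataset splits the peach images into two folders, one for training and one for testing, and each folder contains four classes: B1, M2, R0, and S3.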

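Figure 6: Example of automatic classification results
Transfer learning involves the following steps:
- Install PaddleHub
- Prepare the data
- Prepare the model
- Prepare for training
The rest of this section follows these four steps to show how to fine-tune with PaddleHub.
The data files supplied with this tutorial already include index and annotation files for the training, validation, and test sets. If you use your own data to transfer a CV task with PaddleHub, you must split the dataset yourself into training, validation, and test sets. Three text files record the image paths and labels for each split, and a separate label file records the label names. For details, see the PaddleHub documentation on custom dataset formats.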
├─data: data directory
  ├─train_list.txt: training set list
  ├─test_list.txt: test set list
  ├─validate_list.txt: validation set list
  ├─label_list.txt: label list
  └─……
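The list files for the training, validation, and test sets use the following format, with columns separated by a single space.
image_1_path image_1_label
image_2_path image_2_label
...
The format of label_list.txt is:
class_1_name
class_2_name
...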
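For a custom dataset, list files in this format can be generated with a short script. The sketch below makes several assumptions that are not stated in the tutorial: images are stored as `<split>/<class name>/<file>`, the label written in each line is the integer index of the class in `class_names`, and the helper names are invented. The demo runs on a temporary directory with empty placeholder files rather than on real images.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")

def write_list_file(dataset_dir, split, list_name, class_names):
    """Write '<relative path> <label index>' lines for one split folder."""
    lines = []
    for index, name in enumerate(class_names):
        class_dir = os.path.join(dataset_dir, split, name)
        for filename in sorted(os.listdir(class_dir)):
            if filename.lower().endswith(IMAGE_EXTENSIONS):
                lines.append(f"{split}/{name}/{filename} {index}")
    with open(os.path.join(dataset_dir, list_name), "w") as f:
        f.write("\n".join(lines) + "\n")  # end the file with a newline
    return lines

def write_label_list(dataset_dir, class_names):
    """Write one class name per line to label_list.txt."""
    with open(os.path.join(dataset_dir, "label_list.txt"), "w") as f:
        f.write("\n".join(class_names) + "\n")

# Demo on a throwaway directory with empty placeholder files.
class_names = ["R0", "B1", "M2", "S3"]
demo_dir = tempfile.mkdtemp()
for name in class_names:
    os.makedirs(os.path.join(demo_dir, "train", name))
    open(os.path.join(demo_dir, "train", name, "0.png"), "w").close()

train_lines = write_list_file(demo_dir, "train", "train_list.txt", class_names)
write_label_list(demo_dir, class_names)
print(train_lines)
```

The same function can be called once per split (for example with `validate_list.txt` and `test_list.txt`) after the images have been divided into split folders.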
!unzip -q -o ./data/data34445/peach.zip -d ./work
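Once the data is ready, build the data reader: define a Python class that inherits from paddle.io.Dataset. When defining the dataset, specify the preprocessing operations and the data mode in advance. The class must override three methods: __init__, __getitem__, and __len__. For example: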
import os
import paddle
import paddlehub as hub

class DemoDataset(paddle.io.Dataset):
    def __init__(self, transforms, num_classes=4, mode='train'):
        # Dataset location: set dataset_dir to the actual path of the dataset
        self.dataset_dir = "./work/peach-classification"
        self.transforms = transforms
        self.num_classes = num_classes
        self.mode = mode

        # Pick the list file for the requested mode; any other value selects the validation list
        if self.mode == 'train':
            self.file = 'train_list.txt'
        elif self.mode == 'test':
            self.file = 'test_list.txt'
        else:
            self.file = 'validate_list.txt'
        self.file = os.path.join(self.dataset_dir, self.file)
        with open(self.file, 'r') as file:
            # Drop the empty string that follows the final newline
            self.data = file.read().split('\n')[:-1]

    def __getitem__(self, idx):
        # Each line is "<image path> <label>"
        img_path, grt = self.data[idx].split(' ')
        img_path = os.path.join(self.dataset_dir, img_path)
        im = self.transforms(img_path)
        return im, int(grt)

    def __len__(self):
        return len(self.data)
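The class above reads the list file with `split('\n')[:-1]`, which drops the last sample if the file does not end with a newline, and `split(' ')` fails when an image path contains a space. A more tolerant parser could look like the sketch below; `parse_list_lines` is a hypothetical helper, not part of PaddleHub, and the sample paths are invented.

```python
def parse_list_lines(text):
    """Parse list-file text into (relative_path, label) pairs."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines and a missing trailing newline
        path, label = line.rsplit(" ", 1)  # split on the last space only
        samples.append((path, int(label)))
    return samples

# Invented example: a path with a space, a blank line, and no trailing newline.
example = "train/M2/0.png 2\ntrain/R0/my peach.png 0\n\ntrain/S3/7.png 3"
print(parse_list_lines(example))
```

Inside a dataset class, the result could replace `self.data`, with `__getitem__` unpacking each pair directly instead of splitting the line again.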
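Before feeding training data to the model, the raw data usually needs some processing, such as normalizing the data format or applying data augmentation.
The data reader for the image classification model preprocesses the peach dataset and organizes it in the format the model expects for training.
The preprocessing below performs only two operations:
- Fix the input image size and bring every sample to that size (here, a resize to 256x256 followed by a 224x224 center crop).
- Normalize all input images.
The following example preprocesses the data and loads the dataset: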
import paddlehub.vision.transforms as T
transforms = T.Compose(
[T.Resize((256, 256)),
T.CenterCrop(224),
T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])],
to_rgb=True)
peach_train = DemoDataset(transforms)
peach_validate = DemoDataset(transforms, mode='val')
PaddleHub provides many data preprocessing methods; see the preprocessing documentation for details.
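The arithmetic behind the two operations can be written out in a few lines. This sketch assumes that normalization first scales 8-bit pixel values to [0, 1] and then standardizes each channel with the mean and standard deviation used above; whether a given transform library scales by 255 internally should be checked against its documentation. The helper names are invented.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))

def center_crop_box(size, crop):
    """Return (left, top, right, bottom) of a centered square crop."""
    offset = (size - crop) // 2
    return offset, offset, offset + crop, offset + crop

print(normalize_pixel((124, 116, 104)))
print(center_crop_box(256, 224))
```

A pixel equal to the channel means maps to roughly zero, so after normalization the inputs are centered in the range the ImageNet pre-trained weights were trained on.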
# Install the pre-trained model
!hub install resnet50_vd_imagenet_ssld==1.1.0
import paddlehub as hub

model = hub.Module(name='resnet50_vd_imagenet_ssld', label_list=["R0", "B1", "M2", "S3"])
from paddlehub.finetune.trainer import Trainer
import paddle

optimizer = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
trainer = Trainer(model, optimizer, checkpoint_dir='img_classification_ckpt', use_gpu=True)
trainer.train(peach_train, epochs=10, batch_size=16, eval_dataset=peach_validate, save_interval=1)
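Parameters of the Adam optimizer:
- learning_rate: the global learning rate, 1e-3 by default;
- parameters: the model parameters to optimize.
Run configuration
Trainer controls the fine-tuning process and accepts the following parameters:
- model: the model to optimize;
- optimizer: the optimizer to use;
- use_gpu: whether to use the GPU;
- use_vdl: whether to visualize the training process with VisualDL;
- checkpoint_dir: the directory where model parameters are saved;
- compare_metrics: the metric used to select the best model to save.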
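trainer.train controls the training run itself and accepts the following parameters:
- train_dataset: the dataset used for training;
- epochs: the number of training epochs;
- batch_size: the training batch size; when using a GPU, adjust batch_size to fit the available memory;
- num_workers: the number of workers, 0 by default;
- eval_dataset: the validation set;
- log_interval: the logging interval, measured in training batches;
- save_interval: the checkpoint interval, measured in training epochs.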
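How these parameters interact is easiest to see with a little arithmetic. The sketch below is an estimate under stated assumptions, not the Trainer's actual bookkeeping: it assumes the last partial batch of each epoch is kept, that log messages are counted over all steps, and it uses an invented dataset size of 1000 samples and a log_interval of 10. The values for batch_size, epochs, and save_interval match the trainer.train call above.

```python
import math

def training_schedule(num_samples, batch_size, epochs, log_interval, save_interval):
    """Estimate step, log, and checkpoint counts for a training run."""
    steps_per_epoch = math.ceil(num_samples / batch_size)  # keep the last partial batch
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "log_messages": total_steps // log_interval,   # one message every log_interval batches
        "checkpoints": epochs // save_interval,        # one checkpoint every save_interval epochs
    }

# num_samples and log_interval are invented values for illustration.
schedule = training_schedule(num_samples=1000, batch_size=16, epochs=10,
                             log_interval=10, save_interval=1)
print(schedule)
```

Estimates like this help when choosing save_interval: with save_interval=1 every epoch writes a checkpoint, which can use considerable disk space for large models.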
After fine-tuning is complete, use the model for prediction as follows:
import paddle
import paddlehub as hub

result = model.predict(['./work/peach-classification/test/M2/0.png'])
print(result)
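The printed result is the prediction of the loaded model (only one image is tested here). The input is a photo of an M2-class peach, and the fine-tuned model also predicts M2, so the transfer learning for peach classification has succeeded.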