2016/04/16 Machine Learning Nagoya, 3rd Study Session
後藤 俊介 (@antimon2)
What Julia aims to be (excerpted and freely translated from "Why We Created Julia"):
- not a mess on the inside like R,
- not slow like Ruby,
- not primitive or elephantine like Lisp,
- not perverse like Prolog,
- not overly rigid like Java,
- and not too abstract like Haskell;
in short, a language that strikes just the right balance.
An introduction to Deep Learning packages for Julia.
(From the Julia console:)
Pkg.add("Mocha")
using Mocha
Configuring Mocha...
 * CUDA       disabled by default
 * Native Ext disabled by default
Mocha configured, continue loading module...
WARNING: Method definition info(Any...) in module Base at util.jl:334 overwritten in module Logging at /Users/antimon2/.julia/v0.4/Logging/src/Logging.jl:61.
WARNING: Method definition warn(Any...) in module Base at util.jl:364 overwritten in module Logging at /Users/antimon2/.julia/v0.4/Logging/src/Logging.jl:61.
DefaultBackend = Mocha.CPUBackend
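As the configuration banner above notes, CUDA is disabled by default, which is why DefaultBackend resolves to the CPU backend. A minimal sketch of enabling the GPU backend instead, assuming a CUDA-capable machine and that Mocha's CUDA extension builds there (the flag must be set before the package is loaded):

# Must be set before `using Mocha`; assumes a working CUDA toolchain.
ENV["MOCHA_USE_CUDA"] = "true"
using Mocha
backend = GPUBackend()  # in place of the DefaultBackend() used below
init(backend)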
println("train: ", open(readall, "train.txt"))
println("test: ", open(readall, "test.txt"))
train: MNIST_data/train.hdf5
test: MNIST_data/test.hdf5
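train.txt and test.txt are plain-text index files, each listing one HDF5 file per line; the HDF5 files themselves hold datasets named data and label, which is what Mocha's HDF5 data layers read. A rough sketch of how such files could be produced with HDF5.jl (the images and labels arrays here are hypothetical placeholders):

using HDF5  # Pkg.add("HDF5")

# Hypothetical inputs: `images` is a 28x28x1xN Float32 array, `labels` is 1xN.
h5open("MNIST_data/train.hdf5", "w") do f
    write(f, "data", images)
    write(f, "label", labels)
end

# The index file simply lists one HDF5 path per line.
open("train.txt", "w") do f
    println(f, "MNIST_data/train.hdf5")
end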
data_layer = AsyncHDF5DataLayer(name="train-data", source="train.txt", batch_size=64, shuffle=true)
Mocha.AsyncHDF5DataLayer(train-data)
h1_layer = InnerProductLayer(name="h1", output_dim=128, neuron=Neurons.ReLU(), bottoms=[:data], tops=[:h1])
h2_layer = InnerProductLayer(name="h2", output_dim=64, neuron=Neurons.ReLU(), bottoms=[:h1], tops=[:h2])
output_layer = InnerProductLayer(name="y", output_dim=10, bottoms=[:h2], tops=[:y])
loss_layer = SoftmaxLossLayer(name="loss", bottoms=[:y,:label])
Mocha.SoftmaxLossLayer(loss)
backend = DefaultBackend()
init(backend)
common_layers = [h1_layer, h2_layer, output_layer]
net = Net("MNIST-train", backend, [data_layer, common_layers..., loss_layer])
16- 4 20:44:49:INFO:root:Constructing net MNIST-train on Mocha.CPUBackend...
16- 4 20:44:49:INFO:root:Topological sorting 5 layers...
16- 4 20:44:49:INFO:root:Setup layers...
16- 4 20:44:50:INFO:root:Network constructed!
************************************************************
          NAME: MNIST-train
       BACKEND: Mocha.CPUBackend
  ARCHITECTURE: 5 layers
............................................................
 *** Mocha.AsyncHDF5DataLayer(train-data)
    Outputs ---------------------------
          data: Blob(28 x 28 x 1 x 64)
         label: Blob(1 x 64)
............................................................
 *** Mocha.InnerProductLayer(h1)
    Inputs ----------------------------
          data: Blob(28 x 28 x 1 x 64)
    Outputs ---------------------------
            h1: Blob(128 x 64)
............................................................
 *** Mocha.InnerProductLayer(h2)
    Inputs ----------------------------
            h1: Blob(128 x 64)
    Outputs ---------------------------
            h2: Blob(64 x 64)
............................................................
 *** Mocha.InnerProductLayer(y)
    Inputs ----------------------------
            h2: Blob(64 x 64)
    Outputs ---------------------------
             y: Blob(10 x 64)
............................................................
 *** Mocha.SoftmaxLossLayer(loss)
    Inputs ----------------------------
             y: Blob(10 x 64)
         label: Blob(1 x 64)
************************************************************
method = SGD()
params = make_solver_parameters(method, max_iter=3000, regu_coef=0.0005,
mom_policy=MomPolicy.Fixed(0.9),
lr_policy=LRPolicy.Inv(0.01, 0.0001, 0.75))
solver = Solver(method, params)
Mocha.Solver{Mocha.SGD}(Mocha.SGD(),Dict{Symbol,Any}(:regu_coef=>0.0005,:load_from=>"",:lr_policy=>Mocha.LRPolicy.Inv(0.01,0.0001,0.75),:mom_policy=>Mocha.MomPolicy.Fixed(0.9),:max_iter=>3000),Mocha.CoffeeLounge("",1,:merge,Dict{AbstractString,Dict{Int64,AbstractFloat}}(),Mocha.CoffeeBreak[],false,13183162560,13183323408))
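For reference, LRPolicy.Inv is the Caffe-style inverse-decay schedule, lr(t) = base_lr * (1 + gamma * t)^(-power). A quick sketch of the schedule at the endpoints of this run (plain Julia, nothing Mocha-specific; parameter names chosen here for illustration):

lr(t; base_lr=0.01, gamma=0.0001, power=0.75) = base_lr * (1 + gamma*t)^(-power)
lr(0)     # => 0.01, the initial learning rate
lr(3000)  # => ≈0.0082, the rate at the final iteration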
setup_coffee_lounge(solver, every_n_iter=1000)
:merge
# report training progress every 1000 iterations
add_coffee_break(solver, TrainingSummary(), every_n_iter=1000)
1-element Array{Mocha.CoffeeBreak,1}:
 Mocha.CoffeeBreak(Mocha.TrainingSummary(Any[:iter,:obj_val]),1000,0)
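Other coffee breaks can be added in the same style. For instance (a sketch, assuming a writable snapshots/ directory), periodic snapshots let a later run resume training via the solver's load_from parameter, visible in the solver's parameter dict above:

# save model snapshots every 1000 iterations
add_coffee_break(solver, Snapshot("snapshots"), every_n_iter=1000)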
# show performance on test data every 1000 iterations
data_layer_test = HDF5DataLayer(name="test-data", source="test.txt", batch_size=100)
acc_layer = AccuracyLayer(name="test-accuracy", bottoms=[:y, :label])
test_net = Net("MNIST-test", backend, [data_layer_test, common_layers..., acc_layer])
add_coffee_break(solver, ValidationPerformance(test_net), every_n_iter=1000)
16- 4 20:45:11:INFO:root:Constructing net MNIST-test on Mocha.CPUBackend...
16- 4 20:45:11:INFO:root:Topological sorting 5 layers...
16- 4 20:45:11:INFO:root:Setup layers...
16- 4 20:45:11:DEBUG:root:InnerProductLayer(h1): sharing weights and bias
16- 4 20:45:11:DEBUG:root:InnerProductLayer(h2): sharing weights and bias
16- 4 20:45:11:DEBUG:root:InnerProductLayer(y): sharing weights and bias
16- 4 20:45:11:INFO:root:Network constructed!
2-element Array{Mocha.CoffeeBreak,1}:
 Mocha.CoffeeBreak(Mocha.TrainingSummary(Any[:iter,:obj_val]),1000,0)
 Mocha.CoffeeBreak(Mocha.ValidationPerformance(
************************************************************
          NAME: MNIST-test
       BACKEND: Mocha.CPUBackend
  ARCHITECTURE: 5 layers
............................................................
 *** Mocha.HDF5DataLayer(test-data)
    Outputs ---------------------------
          data: Blob(28 x 28 x 1 x 100)
         label: Blob(1 x 100)
............................................................
 *** Mocha.InnerProductLayer(h1)
    Inputs ----------------------------
          data: Blob(28 x 28 x 1 x 100)
    Outputs ---------------------------
            h1: Blob(128 x 100)
............................................................
 *** Mocha.InnerProductLayer(h2)
    Inputs ----------------------------
            h1: Blob(128 x 100)
    Outputs ---------------------------
            h2: Blob(64 x 100)
............................................................
 *** Mocha.InnerProductLayer(y)
    Inputs ----------------------------
            h2: Blob(64 x 100)
    Outputs ---------------------------
             y: Blob(10 x 100)
............................................................
 *** Mocha.AccuracyLayer(test-accuracy)
    Inputs ----------------------------
             y: Blob(10 x 100)
         label: Blob(1 x 100)
************************************************************
 ,Function[]),1000,0)
solve(solver, net)
16- 4 20:45:16:DEBUG:root:#DEBUG Checking network topology for back-propagation
16- 4 20:45:16:DEBUG:root:Init network MNIST-train
16- 4 20:45:16:DEBUG:root:Init parameter weight for layer h1
16- 4 20:45:16:DEBUG:root:Init parameter bias for layer h1
16- 4 20:45:16:DEBUG:root:Init parameter weight for layer h2
16- 4 20:45:16:DEBUG:root:Init parameter bias for layer h2
16- 4 20:45:16:DEBUG:root:Init parameter weight for layer y
16- 4 20:45:16:DEBUG:root:Init parameter bias for layer y
16- 4 20:45:18:DEBUG:root:#DEBUG Initializing coffee breaks
16- 4 20:45:18:DEBUG:root:Init network MNIST-test
16- 4 20:45:18:INFO:root: TRAIN iter=000000 obj_val=2.30202174
16- 4 20:45:18:INFO:root:
16- 4 20:45:18:INFO:root:## Performance on Validation Set after 0 iterations
16- 4 20:45:18:INFO:root:---------------------------------------------------------
16- 4 20:45:18:INFO:root:  Accuracy (avg over 10000) = 11.9800%
16- 4 20:45:18:INFO:root:---------------------------------------------------------
16- 4 20:45:18:INFO:root:
16- 4 20:45:18:DEBUG:root:#DEBUG Entering solver loop
16- 4 20:45:20:INFO:root: TRAIN iter=001000 obj_val=0.11264999
16- 4 20:45:20:INFO:root:
16- 4 20:45:20:INFO:root:## Performance on Validation Set after 1000 iterations
16- 4 20:45:20:INFO:root:---------------------------------------------------------
16- 4 20:45:20:INFO:root:  Accuracy (avg over 10000) = 93.9900%
16- 4 20:45:20:INFO:root:---------------------------------------------------------
16- 4 20:45:20:INFO:root:
16- 4 20:45:22:INFO:root: TRAIN iter=002000 obj_val=0.11005990
16- 4 20:45:22:INFO:root:
16- 4 20:45:22:INFO:root:## Performance on Validation Set after 2000 iterations
16- 4 20:45:22:INFO:root:---------------------------------------------------------
16- 4 20:45:22:INFO:root:  Accuracy (avg over 10000) = 95.6500%
16- 4 20:45:22:INFO:root:---------------------------------------------------------
16- 4 20:45:22:INFO:root:
16- 4 20:45:23:INFO:root: TRAIN iter=003000 obj_val=0.09127413
16- 4 20:45:24:INFO:root:
16- 4 20:45:24:INFO:root:## Performance on Validation Set after 3000 iterations
16- 4 20:45:24:INFO:root:---------------------------------------------------------
16- 4 20:45:24:INFO:root:  Accuracy (avg over 10000) = 96.6300%
16- 4 20:45:24:INFO:root:---------------------------------------------------------
16- 4 20:45:24:INFO:root:
3-element Array{Array{Void,1},1}:
 [nothing,nothing]
 [nothing,nothing]
 [nothing,nothing]
destroy(net)
destroy(test_net)
shutdown(backend)
16- 4 20:45:27:DEBUG:root:Destroying network MNIST-train
16- 4 20:45:27:INFO:root:AsyncHDF5DataLayer: Stopping IO task...
16- 4 20:45:27:INFO:root:AsyncHDF5DataLayer: IO Task reaching the end...
16- 4 20:45:27:DEBUG:root:Destroying network MNIST-test
Dict{AbstractString,Array{Mocha.AbstractParameter,1}} with 0 entries
(From the Julia console:)
Pkg.add("MXNet")
using MXNet
# Build the 3LP (3-layer perceptron) network
mlp = @mx.chain mx.Variable(:data) =>
mx.FullyConnected(name=:fc1, num_hidden=128) =>
mx.Activation(name=:relu1, act_type=:relu) =>
mx.FullyConnected(name=:fc2, num_hidden=64) =>
mx.Activation(name=:relu2, act_type=:relu) =>
mx.FullyConnected(name=:fc3, num_hidden=10) =>
mx.SoftmaxOutput(name=:softmax)
MXNet.mx.SymbolicNode(MXNet.mx.MX_SymbolHandle(Ptr{Void} @0x00007ff1cd4f96b0))
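The @mx.chain macro is syntactic sugar that feeds each node into the next. Written out by hand, the same network would look roughly like this (a sketch in the keyword style used by 2016-era MXNet.jl tutorials; treat it as illustrative rather than the macro's exact expansion):

data = mx.Variable(:data)
fc1  = mx.FullyConnected(data=data, name=:fc1, num_hidden=128)
act1 = mx.Activation(data=fc1, name=:relu1, act_type=:relu)
fc2  = mx.FullyConnected(data=act1, name=:fc2, num_hidden=64)
act2 = mx.Activation(data=fc2, name=:relu2, act_type=:relu)
fc3  = mx.FullyConnected(data=act2, name=:fc3, num_hidden=10)
mlp  = mx.SoftmaxOutput(data=fc3, name=:softmax)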
# Load the data (create data providers)
batch_size = 100
# include(Pkg.dir("MXNet", "examples", "mnist", "mnist-data.jl"))
# train_provider, eval_provider = get_mnist_providers(batch_size)
data_name = :data
label_name = :softmax_label
flat=true
train_provider = mx.MNISTProvider(image="MNIST_data/train-images-idx3-ubyte",
label="MNIST_data/train-labels-idx1-ubyte",
data_name=data_name, label_name=label_name,
batch_size=batch_size, shuffle=true, flat=flat, silent=true)
eval_provider = mx.MNISTProvider(image="MNIST_data/t10k-images-idx3-ubyte",
label="MNIST_data/t10k-labels-idx1-ubyte",
data_name=data_name, label_name=label_name,
batch_size=batch_size, shuffle=false, flat=flat, silent=true)
MXNet.mx.MXDataProvider(MXNet.mx.MX_DataIterHandle(Ptr{Void} @0x00007ff1ced44130),Tuple{Symbol,Tuple}[(:data,(784,100))],Tuple{Symbol,Tuple}[(:softmax_label,(100,))],100,true,true)
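The shape tuples in the provider printout, (:data,(784,100)) and (:softmax_label,(100,)), reflect flat=true: each 28x28 image is flattened to a 784-vector and served 100 at a time. A quick way to peek at one batch (a sketch using the same mx.get accessor as the evaluation loop further below):

for batch in eval_provider
    data = mx.get(eval_provider, batch, :data)  # NDArray of size (784, 100)
    println(size(data))
    break
end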
# Build and optimize the model
# model setup
model = mx.FeedForward(mlp, context=mx.cpu())
# optimization algorithm
optimizer = mx.SGD(lr=0.1, momentum=0.9)
# fit parameters
mx.fit(model, optimizer, train_provider, n_epoch=4, eval_data=eval_provider)
16- 4 20:45:47:INFO:root:Start training on [CPU0]
16- 4 20:45:47:INFO:root:Initializing parameters...
16- 4 20:45:47:INFO:root:Creating KVStore...
16- 4 20:45:48:INFO:root:Start training...
16- 4 20:45:49:INFO:root:== Epoch 001 ==========
16- 4 20:45:49:INFO:root:## Training summary
16- 4 20:45:49:INFO:root:      accuracy = 0.7548
16- 4 20:45:49:INFO:root:          time = 1.2143 seconds
16- 4 20:45:49:INFO:root:## Validation summary
16- 4 20:45:49:INFO:root:      accuracy = 0.9498
16- 4 20:45:50:INFO:root:== Epoch 002 ==========
16- 4 20:45:50:INFO:root:## Training summary
16- 4 20:45:50:INFO:root:      accuracy = 0.9575
16- 4 20:45:50:INFO:root:          time = 0.8548 seconds
16- 4 20:45:50:INFO:root:## Validation summary
16- 4 20:45:50:INFO:root:      accuracy = 0.9678
16- 4 20:45:51:INFO:root:== Epoch 003 ==========
16- 4 20:45:51:INFO:root:## Training summary
16- 4 20:45:51:INFO:root:      accuracy = 0.9700
16- 4 20:45:51:INFO:root:          time = 0.8818 seconds
16- 4 20:45:51:INFO:root:## Validation summary
16- 4 20:45:51:INFO:root:      accuracy = 0.9689
16- 4 20:45:52:INFO:root:== Epoch 004 ==========
16- 4 20:45:52:INFO:root:## Training summary
16- 4 20:45:52:INFO:root:      accuracy = 0.9760
16- 4 20:45:52:INFO:root:          time = 0.8602 seconds
16- 4 20:45:52:INFO:root:## Validation summary
16- 4 20:45:52:INFO:root:      accuracy = 0.9694
# Predict
probs = mx.predict(model, eval_provider)
10x10000 Array{Float32,2}:
 1.86025e-9   4.02211e-9   1.15507e-5   …  9.35833e-7   2.54459e-7
 3.69469e-7   3.88762e-8   0.954931        2.50692e-8   3.24207e-11
 1.16894e-6   0.999966     0.00123508      4.8395e-8    1.07991e-8
 2.57024e-5   3.30698e-5   7.87967e-5      6.13717e-7   3.44376e-10
 8.22568e-10  7.09632e-11  0.0044216       1.03419e-9   2.8778e-8
 2.04646e-8   2.97188e-9   2.18991e-5   …  0.999847     2.60934e-6
 5.3756e-11   1.45219e-9   2.17161e-5      4.03898e-5   0.999997
 0.999967     5.11702e-7   0.0283429       2.44429e-8   2.21268e-13
 7.42553e-8   1.45197e-7   0.00969576      0.000110407  6.22504e-9
 5.96944e-6   1.04052e-13  0.00123922      1.27594e-7   2.23525e-11
# Check prediction accuracy
# collect all labels from eval data
labels = Array[]
for batch in eval_provider
push!(labels, copy(mx.get(eval_provider, batch, :softmax_label)))
end
labels = cat(1, labels...)
# Now compute the accuracy
correct = 0
for i = 1:length(labels)
# labels are 0...9
if indmax(probs[:,i]) == labels[i]+1
correct += 1
end
end
accuracy = 100 * correct / length(labels)
println(mx.format("Accuracy on eval set: {1:.2f}%", accuracy))
Accuracy on eval set: 96.94%
With PyCall, machine learning packages (and anything else) installed on the Python side can also be used from Julia, though the calling syntax has a few quirks of its own. Let's try TensorFlow.
# Set the environment variable to the Python environment where the desired
# packages are installed (when environments are split with pyenv or virtualenv)
ENV["PYTHON"] = "/path/to/user_home/.pyenv/versions/2.7.11/envs/TensorFlow/bin/python"
# Install PyCall itself
Pkg.add("PyCall")
# If already installed, delete the cached dependency file and rebuild:
# rm(Pkg.dir("PyCall","deps","PYTHON"))
# Pkg.build("PyCall")
using PyCall
@pyimport tensorflow as tf
# ↑ Reads just like a Python import statement.
# Load the data
@pyimport tensorflow.examples.tutorials.mnist.input_data as input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=true)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
PyObject <tensorflow.examples.tutorials.mnist.input_data.DataSets object at 0x357cd7150>
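Attribute access on a PyObject uses the obj[:attr] indexing syntax, one of the quirks mentioned above. For example (values assuming the tutorial's standard MNIST split of 55000 train / 10000 test images):

mnist[:train][:num_examples]  # => 55000
mnist[:test][:num_examples]   # => 10000
size(mnist[:train][:images])  # => (55000,784); numpy arrays convert to Julia arrays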
x = tf.placeholder(tf.float32, [nothing, 784])
d = tf.placeholder(tf.float32, [nothing, 10])
PyObject <tensorflow.python.framework.ops.Tensor object at 0x357d05510>
# Build the 3LP network
W1 = tf.Variable(tf.random_normal(Int32[784, 128], mean=0.0, stddev=0.05))
b1 = tf.Variable(tf.zeros(Int32[128]))
W2 = tf.Variable(tf.random_normal(Int32[128, 64], mean=0.0, stddev=0.05))
b2 = tf.Variable(tf.zeros(Int32[64]))
W3 = tf.Variable(tf.random_normal(Int32[64, 10], mean=0.0, stddev=0.05))
b3 = tf.Variable(tf.zeros(Int32[10]))
h1 = tf.nn[:relu](tf.add(tf.matmul(x, W1), b1))
h2 = tf.nn[:relu](tf.add(tf.matmul(h1, W2), b2))
y = tf.nn[:softmax](tf.add(tf.matmul(h2, W3), b3))
PyObject <tensorflow.python.framework.ops.Tensor object at 0x31bf65050>
# cross_entropy = tf.neg(tf.reduce_sum(tf.mul(d, tf.log(y))))
# Clamp y away from zero so tf.log never sees log(0):
cross_entropy = tf.neg(tf.reduce_sum(tf.mul(d, tf.log(tf.maximum(y, 1e-10)))))
PyObject <tensorflow.python.framework.ops.Tensor object at 0x31bf71690>
# optimizer = tf.train[:GradientDescentOptimizer](0.01)
optimizer = tf.train[:MomentumOptimizer](0.001, 0.9)
train_step = optimizer[:minimize](cross_entropy)
PyObject <tensorflow.python.framework.ops.Operation object at 0x35410c410>
tf_init = tf.initialize_all_variables()
PyObject <tensorflow.python.framework.ops.Operation object at 0x3540d53d0>
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(d,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
PyObject <tensorflow.python.framework.ops.Tensor object at 0x357e8e550>
sess = tf.Session()
sess[:run](tf_init)
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
for i in 1:2000
batch_xs, batch_ys = mnist[:train][:next_batch](100)
sess[:run](train_step, feed_dict=Dict(x => batch_xs, d => batch_ys))
if i % 500 == 0
train_accuracy = sess[:run](accuracy, feed_dict=Dict(x => batch_xs, d => batch_ys))
@printf(" step, accuracy = %6d: %6.3f\n", i, train_accuracy)
end
end
println("accuracy:$(sess[:run](accuracy, feed_dict=Dict(x => mnist[:test][:images], d => mnist[:test][:labels])))")
 step, accuracy =    500:  0.940
 step, accuracy =   1000:  0.970
 step, accuracy =   1500:  0.950
 step, accuracy =   2000:  1.000
accuracy:0.9743000268936157
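When finished, the session can be released explicitly; the same obj[:method] calling convention applies:

sess[:close]()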
Thank you for your attention.