Introduction to Julia, Part 2
Deep Learning with Julia

2016/04/16 Machine Learning Nagoya, 3rd Study Session
Shunsuke Goto (@antimon2)

About Me

  • Name: Shunsuke Goto
  • Communities: Machine Learning Nagoya (one of the organizers), Python Tokai, Ruby Tokai
  • Languages: Julia, Python, Ruby, JavaScript, …
  • twitter: @antimon2
  • Facebook: antimon2
  • GitHub: antimon2

What is Julia?

  • The Julia Language
  • v0.4.0 was released on 2015/10/04 (the latest as of 2016/04/16 is v0.4.5)
  • A "best of all worlds" language drawing on Python/Ruby/R and others (details below)
  • Fast! (LLVM-based JIT compilation; see the sketch below)
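
A minimal sketch (not from the talk) of what this speed claim means in practice: the first call to a function JIT-compiles it to native code via LLVM, and subsequent calls run the compiled code.

# illustrative only: a plain scalar loop, no vectorization needed
function sumto(n)
    s = 0
    for i in 1:n
        s += i
    end
    s
end

@time sumto(10^8)  # first call: timing includes JIT compilation
@time sumto(10^8)  # second call: compiled native code only
# @code_llvm sumto(10)  # prints the LLVM IR Julia generates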

Features of Julia

  • Not messy inside like R,
  • not slow like Ruby,
  • not primitive or elephantine like Lisp,
  • not perverse like Prolog,
  • not overly rigid like Java,
  • and not overly abstract like Haskell

A language that gets the balance just right

Quoted from: http://www.slideshare.net/Nikoriks/julia-28059489

What Julia aims for:

  • As fast as C,
    yet a dynamically typed language like Ruby
  • Macros written in the language's own syntax, like Lisp, and at the same time
    intuitive mathematical notation like Matlab (sketched below)
  • Usable for general programming like Python,
    good at statistics like R,
    natural for string processing like Perl,
    powerful for linear algebra like Matlab,
    and good at gluing programs together like the shell
  • Simple enough for complete beginners to learn,
    yet able to satisfy the most advanced users
  • Works interactively, and can also be compiled

(Excerpted and loosely translated from Why We Created Julia)
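
A minimal sketch (not from the talk) of two of the goals above: macros written in Julia's own syntax, and Matlab-like mathematical notation.

# illustrative only: a macro receives its argument as an unevaluated expression
macro twice(ex)
    quote
        $(esc(ex))
        $(esc(ex))
    end
end

@twice println("hello")  # expands into two println calls

x = 3
y = 2x^2 + 4x + 1   # literal coefficients multiply, as on paper
A = [1 2; 3 4]      # Matlab-style matrix literal
b = A \ [5.0, 6.0]  # backslash solves the linear system, as in Matlab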

Applying Julia to Deep Learning

An introduction to Julia's Deep Learning packages.

  • Mocha (a framework inspired by Caffe (C++); its selling points are compatibility, portability, and speed)
  • MXNet (a brand-new framework released in 2015/10; its selling points are being lightweight, efficient, and flexible)
  • PyCall (a package for calling Python from Julia; lets you use machine learning packages installed on the Python side, e.g. TensorFlow)

Mocha

  • A Deep Learning framework written entirely in Julia.
  • Inherits much of its design and notation from Caffe.
  • Interoperable with other frameworks, e.g. in the data formats it can handle.

Installing Mocha

(from the Julia console:)

In [ ]:
Pkg.add("Mocha")

Checking that it works

In [1]:
using Mocha
Configuring Mocha...
 * CUDA       disabled by default
 * Native Ext disabled by default
Mocha configured, continue loading module...
WARNING: Method definition info(Any...) in module Base at util.jl:334 overwritten in module Logging at /Users/antimon2/.julia/v0.4/Logging/src/Logging.jl:61.
WARNING: Method definition warn(Any...) in module Base at util.jl:364 overwritten in module Logging at /Users/antimon2/.julia/v0.4/Logging/src/Logging.jl:61.
DefaultBackend = Mocha.CPUBackend
In [2]:
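# train.txt / test.txt each list the HDF5 data files, one path per line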
println("train: ", open(readall, "train.txt"))
println("test: ", open(readall, "test.txt"))
train: MNIST_data/train.hdf5

test: MNIST_data/test.hdf5

In [3]:
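# data layer: reads 64-example minibatches from the HDF5 files listed in train.txt, shuffled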
data_layer  = AsyncHDF5DataLayer(name="train-data", source="train.txt", batch_size=64, shuffle=true)
Out[3]:
Mocha.AsyncHDF5DataLayer(train-data)
In [4]:
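# two fully connected ReLU hidden layers (128 and 64 units), a 10-unit output layer, and a softmax loss on top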
h1_layer = InnerProductLayer(name="h1", output_dim=128, neuron=Neurons.ReLU(), bottoms=[:data], tops=[:h1])
h2_layer = InnerProductLayer(name="h2", output_dim=64, neuron=Neurons.ReLU(), bottoms=[:h1], tops=[:h2])
output_layer = InnerProductLayer(name="y", output_dim=10, bottoms=[:h2], tops=[:y])
loss_layer = SoftmaxLossLayer(name="loss", bottoms=[:y,:label])
Out[4]:
Mocha.SoftmaxLossLayer(loss)
In [5]:
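# create and initialize the compute backend (CPU here, per the DefaultBackend reported above)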
backend = DefaultBackend()
init(backend)
In [6]:
common_layers = [h1_layer, h2_layer, output_layer]
net = Net("MNIST-train", backend, [data_layer, common_layers..., loss_layer])
16- 4 20:44:49:INFO:root:Constructing net MNIST-train on Mocha.CPUBackend...
16- 4 20:44:49:INFO:root:Topological sorting 5 layers...
16- 4 20:44:49:INFO:root:Setup layers...
16- 4 20:44:50:INFO:root:Network constructed!
Out[6]:
************************************************************
          NAME: MNIST-train
       BACKEND: Mocha.CPUBackend
  ARCHITECTURE: 5 layers
............................................................
 *** Mocha.AsyncHDF5DataLayer(train-data)
    Outputs ---------------------------
          data: Blob(28 x 28 x 1 x 64)
         label: Blob(1 x 64)
............................................................
 *** Mocha.InnerProductLayer(h1)
    Inputs ----------------------------
          data: Blob(28 x 28 x 1 x 64)
    Outputs ---------------------------
            h1: Blob(128 x 64)
............................................................
 *** Mocha.InnerProductLayer(h2)
    Inputs ----------------------------
            h1: Blob(128 x 64)
    Outputs ---------------------------
            h2: Blob(64 x 64)
............................................................
 *** Mocha.InnerProductLayer(y)
    Inputs ----------------------------
            h2: Blob(64 x 64)
    Outputs ---------------------------
             y: Blob(10 x 64)
............................................................
 *** Mocha.SoftmaxLossLayer(loss)
    Inputs ----------------------------
             y: Blob(10 x 64)
         label: Blob(1 x 64)
************************************************************
In [7]:
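# SGD for 3000 iterations: fixed momentum 0.9, weight regularization, inverse-decay learning rate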
method = SGD()
params = make_solver_parameters(method, max_iter=3000, regu_coef=0.0005,
                                mom_policy=MomPolicy.Fixed(0.9),
                                lr_policy=LRPolicy.Inv(0.01, 0.0001, 0.75))
solver = Solver(method, params)
Out[7]:
Mocha.Solver{Mocha.SGD}(Mocha.SGD(),Dict{Symbol,Any}(:regu_coef=>0.0005,:load_from=>"",:lr_policy=>Mocha.LRPolicy.Inv(0.01,0.0001,0.75),:mom_policy=>Mocha.MomPolicy.Fixed(0.9),:max_iter=>3000),Mocha.CoffeeLounge("",1,:merge,Dict{AbstractString,Dict{Int64,AbstractFloat}}(),Mocha.CoffeeBreak[],false,13183162560,13183323408))
In [8]:
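# the "coffee lounge" collects training statistics; snapshot every 1000 iterations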
setup_coffee_lounge(solver, every_n_iter=1000)
Out[8]:
:merge
In [9]:
# report training progress every 1000 iterations
add_coffee_break(solver, TrainingSummary(), every_n_iter=1000)
Out[9]:
1-element Array{Mocha.CoffeeBreak,1}:
 Mocha.CoffeeBreak(Mocha.TrainingSummary(Any[:iter,:obj_val]),1000,0)
In [10]:
# show performance on test data every 1000 iterations
data_layer_test = HDF5DataLayer(name="test-data", source="test.txt", batch_size=100)
acc_layer = AccuracyLayer(name="test-accuracy", bottoms=[:y, :label])
test_net = Net("MNIST-test", backend, [data_layer_test, common_layers..., acc_layer])
add_coffee_break(solver, ValidationPerformance(test_net), every_n_iter=1000)
16- 4 20:45:11:INFO:root:Constructing net MNIST-test on Mocha.CPUBackend...
16- 4 20:45:11:INFO:root:Topological sorting 5 layers...
16- 4 20:45:11:INFO:root:Setup layers...
16- 4 20:45:11:DEBUG:root:InnerProductLayer(h1): sharing weights and bias
16- 4 20:45:11:DEBUG:root:InnerProductLayer(h2): sharing weights and bias
16- 4 20:45:11:DEBUG:root:InnerProductLayer(y): sharing weights and bias
16- 4 20:45:11:INFO:root:Network constructed!
Out[10]:
2-element Array{Mocha.CoffeeBreak,1}:
 Mocha.CoffeeBreak(Mocha.TrainingSummary(Any[:iter,:obj_val]),1000,0)
 Mocha.CoffeeBreak(Mocha.ValidationPerformance(************************************************************
          NAME: MNIST-test
       BACKEND: Mocha.CPUBackend
  ARCHITECTURE: 5 layers
............................................................
 *** Mocha.HDF5DataLayer(test-data)
    Outputs ---------------------------
          data: Blob(28 x 28 x 1 x 100)
         label: Blob(1 x 100)
............................................................
 *** Mocha.InnerProductLayer(h1)
    Inputs ----------------------------
          data: Blob(28 x 28 x 1 x 100)
    Outputs ---------------------------
            h1: Blob(128 x 100)
............................................................
 *** Mocha.InnerProductLayer(h2)
    Inputs ----------------------------
            h1: Blob(128 x 100)
    Outputs ---------------------------
            h2: Blob(64 x 100)
............................................................
 *** Mocha.InnerProductLayer(y)
    Inputs ----------------------------
            h2: Blob(64 x 100)
    Outputs ---------------------------
             y: Blob(10 x 100)
............................................................
 *** Mocha.AccuracyLayer(test-accuracy)
    Inputs ----------------------------
             y: Blob(10 x 100)
         label: Blob(1 x 100)
************************************************************
,Function[]),1000,0)
In [11]:
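# run the solver loop: 3000 iterations, with both coffee breaks firing every 1000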
solve(solver, net)
16- 4 20:45:16:DEBUG:root:#DEBUG Checking network topology for back-propagation
16- 4 20:45:16:DEBUG:root:Init network MNIST-train
16- 4 20:45:16:DEBUG:root:Init parameter weight for layer h1
16- 4 20:45:16:DEBUG:root:Init parameter bias for layer h1
16- 4 20:45:16:DEBUG:root:Init parameter weight for layer h2
16- 4 20:45:16:DEBUG:root:Init parameter bias for layer h2
16- 4 20:45:16:DEBUG:root:Init parameter weight for layer y
16- 4 20:45:16:DEBUG:root:Init parameter bias for layer y
16- 4 20:45:18:DEBUG:root:#DEBUG Initializing coffee breaks
16- 4 20:45:18:DEBUG:root:Init network MNIST-test
16- 4 20:45:18:INFO:root: TRAIN iter=000000 obj_val=2.30202174
16- 4 20:45:18:INFO:root:
16- 4 20:45:18:INFO:root:## Performance on Validation Set after 0 iterations
16- 4 20:45:18:INFO:root:---------------------------------------------------------
16- 4 20:45:18:INFO:root:  Accuracy (avg over 10000) = 11.9800%
16- 4 20:45:18:INFO:root:---------------------------------------------------------
16- 4 20:45:18:INFO:root:
16- 4 20:45:18:DEBUG:root:#DEBUG Entering solver loop
16- 4 20:45:20:INFO:root: TRAIN iter=001000 obj_val=0.11264999
16- 4 20:45:20:INFO:root:
16- 4 20:45:20:INFO:root:## Performance on Validation Set after 1000 iterations
16- 4 20:45:20:INFO:root:---------------------------------------------------------
16- 4 20:45:20:INFO:root:  Accuracy (avg over 10000) = 93.9900%
16- 4 20:45:20:INFO:root:---------------------------------------------------------
16- 4 20:45:20:INFO:root:
16- 4 20:45:22:INFO:root: TRAIN iter=002000 obj_val=0.11005990
16- 4 20:45:22:INFO:root:
16- 4 20:45:22:INFO:root:## Performance on Validation Set after 2000 iterations
16- 4 20:45:22:INFO:root:---------------------------------------------------------
16- 4 20:45:22:INFO:root:  Accuracy (avg over 10000) = 95.6500%
16- 4 20:45:22:INFO:root:---------------------------------------------------------
16- 4 20:45:22:INFO:root:
16- 4 20:45:23:INFO:root: TRAIN iter=003000 obj_val=0.09127413
16- 4 20:45:24:INFO:root:
16- 4 20:45:24:INFO:root:## Performance on Validation Set after 3000 iterations
16- 4 20:45:24:INFO:root:---------------------------------------------------------
16- 4 20:45:24:INFO:root:  Accuracy (avg over 10000) = 96.6300%
16- 4 20:45:24:INFO:root:---------------------------------------------------------
16- 4 20:45:24:INFO:root:
Out[11]:
3-element Array{Array{Void,1},1}:
 [nothing,nothing]
 [nothing,nothing]
 [nothing,nothing]
In [12]:
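# tear down: free both networks and shut the backend down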
destroy(net)
destroy(test_net)
shutdown(backend)
16- 4 20:45:27:DEBUG:root:Destroying network MNIST-train
16- 4 20:45:27:INFO:root:AsyncHDF5DataLayer: Stopping IO task...
16- 4 20:45:27:INFO:root:AsyncHDF5DataLayer: IO Task reaching the end...
16- 4 20:45:27:DEBUG:root:Destroying network MNIST-test
Out[12]:
Dict{AbstractString,Array{Mocha.AbstractParameter,1}} with 0 entries

MXNet

  • A Deep Learning framework with bindings for Julia, Python, R, Go, JavaScript, and more.
  • Concise notation, and the "efficiency" and "flexibility" that come with it.
  • The processing core is written in C/C++ (which is how it stays lightweight and supports many languages).

Installing MXNet

(from the Julia console:)

In [ ]:
Pkg.add("MXNet")

Checking that it works

In [13]:
using MXNet
In [14]:
# build a 3-layer perceptron: 128- and 64-unit ReLU layers, 10-way softmax output
mlp = @mx.chain mx.Variable(:data)             =>
  mx.FullyConnected(name=:fc1, num_hidden=128) =>
  mx.Activation(name=:relu1, act_type=:relu)   =>
  mx.FullyConnected(name=:fc2, num_hidden=64)  =>
  mx.Activation(name=:relu2, act_type=:relu)   =>
  mx.FullyConnected(name=:fc3, num_hidden=10)  =>
  mx.SoftmaxOutput(name=:softmax)
Out[14]:
MXNet.mx.SymbolicNode(MXNet.mx.MX_SymbolHandle(Ptr{Void} @0x00007ff1cd4f96b0))
In [15]:
# fetch the data (create the data providers)
batch_size = 100
# include(Pkg.dir("MXNet", "examples", "mnist", "mnist-data.jl"))
# train_provider, eval_provider = get_mnist_providers(batch_size)
data_name = :data
label_name = :softmax_label
flat = true
train_provider = mx.MNISTProvider(image="MNIST_data/train-images-idx3-ubyte",
                                  label="MNIST_data/train-labels-idx1-ubyte",
                                  data_name=data_name, label_name=label_name,
                                  batch_size=batch_size, shuffle=true, flat=flat, silent=true)
eval_provider = mx.MNISTProvider(image="MNIST_data/t10k-images-idx3-ubyte",
                                 label="MNIST_data/t10k-labels-idx1-ubyte",
                                 data_name=data_name, label_name=label_name,
                                 batch_size=batch_size, shuffle=false, flat=flat, silent=true)
Out[15]:
MXNet.mx.MXDataProvider(MXNet.mx.MX_DataIterHandle(Ptr{Void} @0x00007ff1ced44130),Tuple{Symbol,Tuple}[(:data,(784,100))],Tuple{Symbol,Tuple}[(:softmax_label,(100,))],100,true,true)
In [16]:
# build and train the model

# model setup
model = mx.FeedForward(mlp, context=mx.cpu())

# optimization algorithm
optimizer = mx.SGD(lr=0.1, momentum=0.9)

# fit parameters
mx.fit(model, optimizer, train_provider, n_epoch=4, eval_data=eval_provider)
16- 4 20:45:47:INFO:root:Start training on [CPU0]
16- 4 20:45:47:INFO:root:Initializing parameters...
16- 4 20:45:47:INFO:root:Creating KVStore...
16- 4 20:45:48:INFO:root:Start training...
16- 4 20:45:49:INFO:root:== Epoch 001 ==========
16- 4 20:45:49:INFO:root:## Training summary
16- 4 20:45:49:INFO:root:          accuracy = 0.7548
16- 4 20:45:49:INFO:root:              time = 1.2143 seconds
16- 4 20:45:49:INFO:root:## Validation summary
16- 4 20:45:49:INFO:root:          accuracy = 0.9498
16- 4 20:45:50:INFO:root:== Epoch 002 ==========
16- 4 20:45:50:INFO:root:## Training summary
16- 4 20:45:50:INFO:root:          accuracy = 0.9575
16- 4 20:45:50:INFO:root:              time = 0.8548 seconds
16- 4 20:45:50:INFO:root:## Validation summary
16- 4 20:45:50:INFO:root:          accuracy = 0.9678
16- 4 20:45:51:INFO:root:== Epoch 003 ==========
16- 4 20:45:51:INFO:root:## Training summary
16- 4 20:45:51:INFO:root:          accuracy = 0.9700
16- 4 20:45:51:INFO:root:              time = 0.8818 seconds
16- 4 20:45:51:INFO:root:## Validation summary
16- 4 20:45:51:INFO:root:          accuracy = 0.9689
16- 4 20:45:52:INFO:root:== Epoch 004 ==========
16- 4 20:45:52:INFO:root:## Training summary
16- 4 20:45:52:INFO:root:          accuracy = 0.9760
16- 4 20:45:52:INFO:root:              time = 0.8602 seconds
16- 4 20:45:52:INFO:root:## Validation summary
16- 4 20:45:52:INFO:root:          accuracy = 0.9694
In [17]:
# predict (class probabilities for the eval set)
probs = mx.predict(model, eval_provider)
Out[17]:
10x10000 Array{Float32,2}:
 1.86025e-9   4.02211e-9   1.15507e-5  …  9.35833e-7   2.54459e-7 
 3.69469e-7   3.88762e-8   0.954931       2.50692e-8   3.24207e-11
 1.16894e-6   0.999966     0.00123508     4.8395e-8    1.07991e-8 
 2.57024e-5   3.30698e-5   7.87967e-5     6.13717e-7   3.44376e-10
 8.22568e-10  7.09632e-11  0.0044216      1.03419e-9   2.8778e-8  
 2.04646e-8   2.97188e-9   2.18991e-5  …  0.999847     2.60934e-6 
 5.3756e-11   1.45219e-9   2.17161e-5     4.03898e-5   0.999997   
 0.999967     5.11702e-7   0.0283429      2.44429e-8   2.21268e-13
 7.42553e-8   1.45197e-7   0.00969576     0.000110407  6.22504e-9 
 5.96944e-6   1.04052e-13  0.00123922     1.27594e-7   2.23525e-11
In [18]:
# check prediction accuracy

# collect all labels from eval data
labels = Array[]
for batch in eval_provider
    push!(labels, copy(mx.get(eval_provider, batch, :softmax_label)))
end
labels = cat(1, labels...)

# now compute the accuracy by hand
correct = 0
for i = 1:length(labels)
    # labels are 0...9
    if indmax(probs[:,i]) == labels[i]+1
        correct += 1
    end
end
accuracy = 100correct/length(labels)
println(mx.format("Accuracy on eval set: {1:.2f}%", accuracy))
Accuracy on eval set: 96.94%

PyCall + TensorFlow

  • With PyCall, machine learning packages installed on the Python side become usable from Julia (the notation has some quirks of its own; see the sketch below).
  • As an example, let's try TensorFlow.
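
A minimal sketch (not from the talk) of those quirks, using the conventions of 2016-era PyCall: @pyimport-ed module members are called directly, while attributes and methods of Python objects are accessed with the obj[:name] syntax (as in sess[:run] further below).

# illustrative only; the datetime example is hypothetical
using PyCall
@pyimport math                  # works like Python's `import math`
println(math.sqrt(2))           # module functions are called directly

@pyimport datetime
d = datetime.date(2016, 4, 16)  # calling a Python class yields a PyObject
println(d[:isoformat]())        # object methods need the [:name] syntax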

Installation and setup

In [ ]:
# point PyCall at the Python environment where the packages you want are installed
# (needed if you separate environments with pyenv or virtualenv)
ENV["PYTHON"] = "/path/to/user_home/.pyenv/versions/2.7.11/envs/TensorFlow/bin/python"

# install PyCall itself
Pkg.add("PyCall")

# if it is already installed, delete the dependency file and rebuild:
# rm(Pkg.dir("PyCall","deps","PYTHON"))
# Pkg.build("PyCall")

Checking that it works

In [19]:
using PyCall
In [20]:
@pyimport tensorflow as tf
# ^ this can be written just like a Python import statement.
In [21]:
# fetch the data
@pyimport tensorflow.examples.tutorials.mnist.input_data as input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=true)
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Out[21]:
PyObject <tensorflow.examples.tutorials.mnist.input_data.DataSets object at 0x357cd7150>
In [22]:
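# placeholders for input images (784 = 28*28, flattened) and one-hot labels;
# Julia's `nothing` is converted to Python's None (batch size left unspecified)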
x = tf.placeholder(tf.float32, [nothing, 784])
d = tf.placeholder(tf.float32, [nothing, 10])
Out[22]:
PyObject <tensorflow.python.framework.ops.Tensor object at 0x357d05510>
In [23]:
# build the 3-layer perceptron
W1 = tf.Variable(tf.random_normal(Int32[784, 128], mean=0.0, stddev=0.05))
b1 = tf.Variable(tf.zeros(Int32[128]))
W2 = tf.Variable(tf.random_normal(Int32[128, 64], mean=0.0, stddev=0.05))
b2 = tf.Variable(tf.zeros(Int32[64]))
W3 = tf.Variable(tf.random_normal(Int32[64, 10], mean=0.0, stddev=0.05))
b3 = tf.Variable(tf.zeros(Int32[10]))

h1 = tf.nn[:relu](tf.add(tf.matmul(x,  W1), b1))
h2 = tf.nn[:relu](tf.add(tf.matmul(h1, W2), b2))
y  = tf.nn[:softmax](tf.add(tf.matmul(h2, W3), b3))
Out[23]:
PyObject <tensorflow.python.framework.ops.Tensor object at 0x31bf65050>
In [24]:
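# the naive cross-entropy (commented out below) can hit log(0);
# the live version clamps y to at least 1e-10 for numerical stability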
# cross_entropy = tf.neg(tf.reduce_sum(tf.mul(d, tf.log(y))))
cross_entropy = tf.neg(tf.reduce_sum(tf.mul(d, tf.log(tf.maximum(y, 1e-10)))))
Out[24]:
PyObject <tensorflow.python.framework.ops.Tensor object at 0x31bf71690>
In [25]:
# optimizer = tf.train[:GradientDescentOptimizer](0.01)
optimizer = tf.train[:MomentumOptimizer](0.001, 0.9)
train_step = optimizer[:minimize](cross_entropy)
Out[25]:
PyObject <tensorflow.python.framework.ops.Operation object at 0x35410c410>
In [26]:
tf_init = tf.initialize_all_variables()
Out[26]:
PyObject <tensorflow.python.framework.ops.Operation object at 0x3540d53d0>
In [27]:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(d,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Out[27]:
PyObject <tensorflow.python.framework.ops.Tensor object at 0x357e8e550>
In [28]:
sess = tf.Session()

sess[:run](tf_init)
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
In [29]:
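# training loop: 2000 minibatches of 100 samples;
# feed_dict maps the placeholders x and d to actual batch data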
for i in 1:2000
    batch_xs, batch_ys = mnist[:train][:next_batch](100)
    sess[:run](train_step, feed_dict=Dict(x => batch_xs, d => batch_ys))
    if i % 500 == 0
        train_accuracy = sess[:run](accuracy, feed_dict=Dict(x => batch_xs, d => batch_ys))
        @printf("  step, accuracy = %6d: %6.3f\n", i, train_accuracy)
    end
end

println("accuracy:$(sess[:run](accuracy, feed_dict=Dict(x => mnist[:test][:images], d => mnist[:test][:labels])))")
  step, accuracy =    500:  0.940
  step, accuracy =   1000:  0.970
  step, accuracy =   1500:  0.950
  step, accuracy =   2000:  1.000
accuracy:0.9743000268936157

References

Thank you for your attention.