Building My Own Deep Learning Framework in C#, Part 3 (Implementing the Softmax Layer)

Softmax Layer

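This layer computes the softmax function. Softmax is an activation function commonly used in the output layer, but it works a little differently from the others, so I left it out of the previous article.
The activation functions implemented so far (sigmoid, tanh, ReLU) each take one variable and output one value. The softmax function, by contrast, takes $n$ variables and outputs $n$ values. With inputs $x_1, x_2, \ldots, x_n$ and outputs $y_1, y_2, \ldots, y_n$, the $k$-th output $y_k$ is: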

$y_k = \frac{\exp(x_k)}{\sum_{i} \exp(x_i)}$
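The formula translates almost directly into code. The following is a minimal sketch for a single input vector; it assumes plain double[] arrays instead of the Math.NET DenseMatrix type the layer uses later, and the class name SoftmaxDemo is hypothetical.

```csharp
using System;

// Hypothetical helper class; plain arrays are used only to keep the sketch self-contained.
static class SoftmaxDemo
{
    // Direct translation of y_k = exp(x_k) / Σ_i exp(x_i) for one input vector.
    // No overflow guard: a very large x_k makes Math.Exp return +Infinity.
    public static double[] Softmax(double[] x)
    {
        var y = new double[x.Length];
        double sum = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            y[i] = Math.Exp(x[i]); // numerator exp(x_i)
            sum += y[i];           // shared denominator Σ_i exp(x_i)
        }
        for (int k = 0; k < y.Length; k++)
            y[k] /= sum;
        return y;
    }
}
```

For the input {1, 2, 3} the outputs come to roughly {0.090, 0.245, 0.665}, which add up to 1.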

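Every output shares the same denominator, so the outputs always sum to 1 ($\sum_{i} y_i = 1$) and each $y_k$ is positive; this is why softmax is commonly used to represent a probability distribution. Because softmax is a function of several variables, we take partial derivatives. The partial derivative of $y_k$ with respect to $x_i$ is: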

$\frac{\partial y_k}{\partial x_i}= \begin{cases} y_k(1 - y_k)& (k = i)\\ -y_k y_i & (k \neq i) \end{cases}$
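Both cases can be checked numerically. This sketch builds the full $n \times n$ matrix of partial derivatives (the Jacobian) from the outputs and compares each entry with a central finite difference. The class and method names are hypothetical, and plain double[] arrays are assumed.

```csharp
using System;

// Hypothetical helper class for checking the softmax derivative.
static class SoftmaxJacobian
{
    public static double[] Softmax(double[] x)
    {
        var y = new double[x.Length];
        double sum = 0.0;
        for (int i = 0; i < x.Length; i++) { y[i] = Math.Exp(x[i]); sum += y[i]; }
        for (int i = 0; i < y.Length; i++) y[i] /= sum;
        return y;
    }

    // J[k, i] = ∂y_k/∂x_i: y_k(1 - y_k) when k == i, otherwise -y_k y_i.
    // Both cases collapse to y_k * ((1 if k == i else 0) - y_i).
    public static double[,] Jacobian(double[] y)
    {
        int n = y.Length;
        var J = new double[n, n];
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                J[k, i] = y[k] * ((k == i ? 1.0 : 0.0) - y[i]);
        return J;
    }

    // Central-difference estimate of ∂y_k/∂x_i, used only to check Jacobian().
    public static double NumericPartial(double[] x, int k, int i, double h = 1e-5)
    {
        var xp = (double[])x.Clone(); xp[i] += h;
        var xm = (double[])x.Clone(); xm[i] -= h;
        return (Softmax(xp)[k] - Softmax(xm)[k]) / (2 * h);
    }
}
```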

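When $k = i$, this looks exactly like the derivative of the sigmoid function, $\sigma'(x) = \sigma(x)(1 - \sigma(x))$. That is because softmax is the multivariable generalization of the sigmoid function: in fact, with a little algebra, the two-variable version of softmax reduces to the sigmoid function.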
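For the two-variable case, dividing the numerator and denominator by $\exp(x_1)$ shows the correspondence directly, writing $\sigma(t) = \frac{1}{1 + \exp(-t)}$ for the sigmoid function:

$y_1 = \frac{\exp(x_1)}{\exp(x_1) + \exp(x_2)} = \frac{1}{1 + \exp(-(x_1 - x_2))} = \sigma(x_1 - x_2), \qquad y_2 = 1 - y_1 = \sigma(x_2 - x_1)$

By the chain rule, $\frac{\partial y_1}{\partial x_1} = \sigma'(x_1 - x_2) = y_1(1 - y_1)$, which is the $k = i$ case of the derivative.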

Backward Pass of the Softmax Layer

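During backpropagation, the softmax layer passes a gradient back to each of its inputs $x_1, x_2, \ldots, x_n$. Here we derive the gradient $\delta_{x_k}$ with respect to the input $x_k$.
Let $\delta_{y_1}, \delta_{y_2}, \ldots, \delta_{y_n}$ be the gradients flowing back into the softmax layer from its output side. Since $x_k$ affects every output $y_i$, the chain rule sums over all of them: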

$\delta_{x_k} = \sum_{i} \frac{\partial y_i}{\partial x_k} \delta_{y_i}$

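Substituting the softmax derivative, separating the $i = k$ term from the others, and then folding the $-y_k\delta_{y_k}$ term back into the sum gives: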

$\sum_{i} \frac{\partial y_i}{\partial x_k} \delta_{y_i} = y_k(1 - y_k)\delta_{y_k} - \sum_{i \neq k} y_i y_k \delta_{y_i} = y_k\left(-\sum_{i \neq k} y_i \delta_{y_i} + (1 - y_k)\delta_{y_k}\right) = y_k\left(\delta_{y_k} - \sum_{i}y_i\delta_{y_i}\right)$

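Therefore, $\delta_{x_k} = y_k\left(\delta_{y_k} - \sum_{i}y_i\delta_{y_i}\right)$. The sum $\sum_{i}y_i\delta_{y_i}$ does not depend on $k$, so it only needs to be computed once per sample.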
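To make the result concrete, this sketch (plain double[] arrays, hypothetical names) compares the closed-form gradient with the explicit chain-rule sum over $\frac{\partial y_i}{\partial x_k}$. The closed form needs O(n) work per sample, while going through the Jacobian needs O(n²).

```csharp
using System;

// Hypothetical helper class; y is the softmax output, dy the incoming gradient.
static class SoftmaxBackward
{
    // δx_k = y_k (δy_k - Σ_i y_i δy_i)
    public static double[] Backward(double[] y, double[] dy)
    {
        double dot = 0.0;                   // Σ_i y_i δy_i, shared by every k
        for (int i = 0; i < y.Length; i++) dot += y[i] * dy[i];

        var dx = new double[y.Length];
        for (int k = 0; k < y.Length; k++) dx[k] = y[k] * (dy[k] - dot);
        return dx;
    }

    // Reference: δx_k = Σ_i (∂y_i/∂x_k) δy_i with ∂y_i/∂x_k = y_i((1 if i == k else 0) - y_k).
    public static double[] BackwardViaJacobian(double[] y, double[] dy)
    {
        int n = y.Length;
        var dx = new double[n];
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                dx[k] += y[i] * ((i == k ? 1.0 : 0.0) - y[k]) * dy[i];
        return dx;
    }
}
```

With y = {0.2, 0.3, 0.5} and δy = {1, -2, 0.5}, both methods give δx = {0.23, -0.555, 0.325}. The components sum to zero whenever $\sum_i y_i = 1$, consistent with softmax being unchanged when the same constant is added to every input.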

Implementation

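Now let's implement the softmax layer. To support batch processing, the layer takes a matrix as input and applies the softmax function to each column of that matrix independently, so each column corresponds to one sample.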

using System;
using MathNet.Numerics.LinearAlgebra.Single;

namespace NeuralNET.Layers.Activation
{
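    /// <summary>
    /// Softmax layer: applies the softmax function to each column of the input matrix
    /// (one column per sample in the batch).
    /// </summary>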
    public class SoftmaxLayer : IActivationLayer
    {
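        DenseMatrix? output;            // output y of the last Forward call, needed by Backward
        readonly bool SAVE_OUTPUT_REF;  // true: keep a reference to y; false: keep a copy of y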

        public SoftmaxLayer() : this(false) { }

        public SoftmaxLayer(bool saveOutputRef) => this.SAVE_OUTPUT_REF = saveOutputRef;

        public DenseMatrix Forward(DenseMatrix x, DenseMatrix y)
        {
            // Column-wise softmax of x written into y (ColumnSoftmax is a custom extension method).
            x.ColumnSoftmax(y);
            SaveOutput(y);
            return y;
        }

        public DenseMatrix Forward(DenseMatrix x)
        {
            var y = DenseMatrix.Create(x.RowCount, x.ColumnCount, 0.0f);
            // Forward(x, y) already calls SaveOutput(y), so a second call would only copy y again.
            return Forward(x, y);
        }

        public DenseMatrix Backward(DenseMatrix dOutput, DenseMatrix res)
        {
            if (this.output is null)
                throw new InvalidOperationException("Backward method must be called after forward.");

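            // Note: res must be a different matrix from dOutput, because step 1 overwrites res
            // and step 3 reads dOutput again.
            // 1. res = y ⊙ δy (element-wise product)
            this.output.PointwiseMultiply(dOutput, res);
            // 2. colSums[j] = Σ_i y_ij δy_ij, computed once per sample (column j)
            var colSums = (DenseVector)res.ColumnSums();
            // 3. res = δy - colSums, with colSums subtracted from every row (custom helper)
            dOutput.SubtractRowVector(colSums, res);
            // 4. res = y ⊙ (δy - colSums), i.e. δx_k = y_k(δy_k - Σ_i y_i δy_i)
            res.PointwiseMultiply(this.output, res);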
            return res;
        }

        public DenseMatrix Backward(DenseMatrix dOutput)
        {
            if (this.output is null)
                throw new InvalidOperationException("Backward method must be called after forward.");

            var res = DenseMatrix.Create(dOutput.RowCount, dOutput.ColumnCount, 0.0f);
            return Backward(dOutput, res);
        }

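        void SaveOutput(DenseMatrix output)
        {
            if (this.SAVE_OUTPUT_REF)
            {
                // No copy: the caller must not overwrite y before Backward is called.
                this.output = output;
                return;
            }

            // CopyToOrClone is not a Math.NET method; judging by its name, it presumably copies
            // into the existing buffer and clones when this.output is still null.
            this.output = output.CopyToOrClone(this.output);
        }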
    }
}

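The code above relies on a few functions of my own. First, the DenseMatrix.ColumnSoftmax method applies the softmax function to each column of a DenseMatrix and stores the result in the matrix passed as its argument (y in Forward).
Next, the DenseMatrix.SubtractRowVector method subtracts a row vector from a matrix: the row vector is subtracted from every row of the matrix, which corresponds to broadcasting in NumPy. In Backward, it subtracts colSums from every row of dOutput and writes the result into res.
Posting all of the code would make this article too long, so see the repository for the details.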
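The real ColumnSoftmax and SubtractRowVector live in the repository. Purely as an illustration of what they compute, here are hypothetical stand-ins on plain double[rows, cols] arrays (rows are classes, columns are samples, as assumed from the layer's description), together with the four steps of Backward written on top of them. This is not the author's code.

```csharp
using System;

// Hypothetical stand-ins for the custom helpers; layout: rows = classes, columns = samples.
static class BatchSoftmax
{
    // Applies softmax to each column of x independently and writes the result into y.
    public static void ColumnSoftmax(double[,] x, double[,] y)
    {
        int rows = x.GetLength(0), cols = x.GetLength(1);
        for (int j = 0; j < cols; j++)
        {
            double sum = 0.0;
            for (int i = 0; i < rows; i++) { y[i, j] = Math.Exp(x[i, j]); sum += y[i, j]; }
            for (int i = 0; i < rows; i++) y[i, j] /= sum;
        }
    }

    // result[i, j] = m[i, j] - v[j]: v (one entry per column) is subtracted from every row.
    public static void SubtractRowVector(double[,] m, double[] v, double[,] result)
    {
        for (int i = 0; i < m.GetLength(0); i++)
            for (int j = 0; j < m.GetLength(1); j++)
                result[i, j] = m[i, j] - v[j];
    }

    // Same four steps as SoftmaxLayer.Backward, one column per sample.
    public static double[,] Backward(double[,] y, double[,] dy)
    {
        int rows = y.GetLength(0), cols = y.GetLength(1);
        var res = new double[rows, cols];
        for (int i = 0; i < rows; i++)              // 1. res = y ⊙ dy
            for (int j = 0; j < cols; j++) res[i, j] = y[i, j] * dy[i, j];
        var colSums = new double[cols];             // 2. colSums[j] = Σ_i y_ij dy_ij
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++) colSums[j] += res[i, j];
        SubtractRowVector(dy, colSums, res);        // 3. res = dy - colSums (broadcast over rows)
        for (int i = 0; i < rows; i++)              // 4. res = y ⊙ res
            for (int j = 0; j < cols; j++) res[i, j] *= y[i, j];
        return res;
    }
}
```

Because each column is handled independently, a batch of two samples gives the same gradients as running the single-sample formula on each column separately.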

Next Time

Next time, I will implement loss functions.

The commit for this article is below.
(Added 2024/04/08) I have since also implemented a safeguard against overflow in the softmax function.

github.com
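The addendum does not describe how the overflow safeguard was implemented; the repository has the details. A common approach, sketched here as an assumption rather than as the author's code, is to subtract the largest input before exponentiating. The class name StableSoftmax is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper class contrasting the direct formula with max-subtraction.
static class StableSoftmax
{
    // Direct formula: Math.Exp(1000) is +Infinity, so Infinity / Infinity gives NaN.
    public static double[] Naive(double[] x)
    {
        var e = x.Select(v => Math.Exp(v)).ToArray();
        double sum = e.Sum();
        return e.Select(v => v / sum).ToArray();
    }

    // exp(x_k - m) / Σ_i exp(x_i - m) equals the original ratio because the common
    // factor exp(-m) cancels, and every exponent is now <= 0, so Math.Exp cannot overflow.
    public static double[] Stable(double[] x)
    {
        double m = x.Max();
        var e = x.Select(v => Math.Exp(v - m)).ToArray();
        double sum = e.Sum(); // >= 1, since the maximum element contributes exp(0) = 1
        return e.Select(v => v / sum).ToArray();
    }
}
```

For the input {1000, 1000}, Naive returns NaN for both entries, while Stable returns {0.5, 0.5}. For moderate inputs such as {1, 2, 3} the two agree up to rounding.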