Blog

  • Why Predicting the Next Day Price of a Stock Using the Stock’s Price History Is Impossible, with a DNN.

    It can always be a fascinating dream to predict the next day price of a stock by looking at the stock’s price curve. However, by thinking it through, I doubt the meaningfulness of do so, even with a very powerful technology, deep neural network.

    A stock’s price, as time goes, rises or falls, from minute to minute. To view it abstractly, one can think of it can a continuous curve. A prediction task usually shifts a time window from past time toward current time. In each shift of the window, the oldest value is removed, and a newest value is added. In this process, one problem happens, that is the subsequence of any window is very similar to the subsequence of the next window. And next day’s price can either rise or fall, seemingly randomly. So, the input dataset is like assigning different labels to the same input data. Hence, the best that any model can do it to make a random guess on whether the next day’s price will rise or fall.

    Some people have made some videos on predicting stock price using LSTM, or other models. As long as they use the stock price history to predict price of the same stock, it will not work, just because the inputs do not distinguish labels.

    However, does it mean that stock prices cannot be predicted? Asserting this is still too early. There can be some relationships among various stocks, so there is still some possibility that one stock’s price can be predicted by other stocks.

  • Understand Relu as a Piecewise Function

    Relu is an activation function defined as:$$
    Relu(x)=\begin{cases}
    x, & x > 0 \\
    0, & otherwise
    \end{cases}
    $$. This function has an interesting property that it can act like an open/close switch. To see why, suppose $\vec{x}$ is a vector, its entries are $x_i$’s. When a Relu layer is applied to $\vec{x}$, it will be an entry-wise check. Only those $x_i$’s that are greater than $0$ are passed, just as if electricity passes through a closed switch. All non-positive values are zeroed out, just as if electricity is blocked by an open switch.

    When a neural network uses Relu layers inside it, it can be thought as each of the Relu layers is selecting some outputs of its preceding layer, therefore behaves as a feature selection. The training process of the neural network is then training a feature selection model. When the model has multiple layers, Relu is in fact selecting computed features.

    So, one can try to design a feature selecting process using Relu. Like $$\text{Input} \rightarrow \text{D} \rightarrow \text{Relu} \rightarrow \text{More DNN}$$. Here, the $\text{D}$ is a layer that is equivalent to multiplying a diagonal matrix. After several epochs of training, all entries in the input that correspond to non-positive entries in $\text{D}$ can be discarded.

  • Understand Why A Sole Linear Layer Does Not Classify Well

    In deep neural network, the most frequently used component may be the linear layer. However, the linear layer itself does not work well as a classifier. In this article, I intend to explain why from my own point of view.

    A linear layer is a linear function. Just as in the following form: $$F(A) = A (\vec{x})$$, where $A$ is a matrix, $\vec{x}$ is the input vector. The training process of a DNN is to find an optimal $A$ to maximize an objective function. In this process, $\frac{\partial{F}}{\partial{A}}$ is computed, and it is used to update $A$. To view $\frac{\partial{F}}{\partial{A}}$, we can rewrite $F$ as $$
    F'(A)=M vec(A)
    $$, where $vec(A)$ is the vectorized $A$, and $M$ is a corresponding matrix which contains suitably distributed entries from $\vec{x}$ so that $$
    F(A) = F'(vec(A)) = M vec(A)
    $$.
    Now, it is clear that $$
    \frac{\partial{F}}{\partial{A}} = \frac{\partial{F’}}{\partial{vec(A)}} = M
    $$. When an optimizer uses a multiple of $M$ to update $A$ as in a gradient descending algorithm, $A \leftarrow \vec{\Lambda} M$, where $\vec{\Lambda}$ is some vector that summarizes backpropagation of gradients from previous layers. Hence, it is clear that $A$ eventually only changes along some vector in the column space of $M$. When the optimal $A$ is not in the column space of $M$, no matter how $A$ is updated, the optimal point will never be reached.

    The above findings can help us improve design of even simple networks. For example, assume a layer $G(x) = A \vec{x}$, where $A$ is a $2 \times m$ matrix. When $\vec{x}$s are almost colinear, $A$ will only move along the $\vec{x}$s. Hence, it is quite difficult to achieve an effect like switching two entries of $A$.

    To solve the problem, the effective dimension of $\vec{x}$s must be increased, either directly by adding more various samples, or indirectly by expanding the model to multiple layers. For example, define $$
    F(\vec{x}) = A v( B \vec{x} )
    $$, where $v$ is some activation function. This is in fact two linear layers. $B \vec{x}$ first expands $\vec{x}$ into more various values, then $v$ introduces more linear-independency, before right-multiplying with $A$. With this result, Relu is clearly a good choice as an activation function, because it behaves far from a linear function, therefore, produces higher dimensions. Then, the high dimension makes columns of A combine with more variations.

    On the contrary, if $v$ is a sigmoid function, it changes smoothly everywhere, and behaves closely to a linear function at the origin. Hence, its ability of adding dimension is low, as compared to Relu.

    That is all! ……Wait! Not all!

    There is still a problem. The nonlinearity of Relu only occurs at the 0 point. But if the input to Relu is far from 0, then Relu will be virtually linear, hence lose the merits of Relu. This is where a good initialization needed. A good initialization will cover both + and – side of a Relu input for every input entry if the layer is large enough, hence will give enough dimensions for the column space of $B$. However, if $B$ is too large, it will be wasteful. A natural question is at least how large a layer must be. Support the input $\vec{x}$ has $p$ entries. Then to give chance to each of the entries to be passed and blocked by the Relu function, there must be at least two entries of $B$. So, there must be $2p$ entries in each row of B. Still, this does not guarantee that each $x$ is both passed and blocked in every row of $B$, because an initialization process is very likely a random process, it is only probable that the initialization goes perfectly. Hence, adding more than $2p$ entries can also be practical. Since then, $A$ can give enough combinations that can cover the whole space of $F$.

    In addition, to avoid overfitting, dropout has been the de facto method. If the dropout ratio is $d$, then $B$ must have at least $2p/d$ entries.

    Conclusion: A least practical DNN need have the form $F(\vec{x}) = A \circ B \circ \vec{x}$, $B$ should be at least double number of columns of $\vec{x}$.

  • バイブコーディングのブームがそろそろ過ぎ去ろうとしているのではないか

    動画サイトでは、まだまだバイブコーディングの話題で盛り上げています。しかし、すでにいくつかの楽観視できない結果でデータとして出ています。最近は下記の結果が目に飛び込んできました。

    The AI Productivity Boom that Wasn’t | L&DeepDive

    要約しますと、ベテランプログラマーであれば、生成AIを使って、20%早くなったという錯覚が生じて、実際は20%遅くなったという結果になった。

    これでは、バイブコーディングのきれいな泡はほぼ弾けてしまいました。

    バイブコーディングでは、もともとテクにある負債を神速に積み上げることが予見されています。今回の結果を加えて、生産性を落としていることも再度確認されました。

    もちろん、生成AIは一つの発展方向ですが、実際のビジネス環境というすごく複雑な文脈(コンテキスト)をどうやって一つの生成AIに低コストに詰め込むことは相当難しいでしょう。

    Claude Code社では、80%-90%のコードはAIに書かせていると宣言したことですが、あれば、自社製品作成の一環としてもみなすことができ、通常のプロジェクト開発と比べて、コスト上昇に対する容認度はかなり高いでしょう。

    バイブコーディングはプロダクションコードの主力になるまでまだまだ時間がかかりそうですね。

  • I made a browser extension that helps generate low-code test cases.

    Are you a pure programming lover? Or you are someone who dares not to challenge programming?

    I’d prefer to choose the latter as testers. The reason is that the people who are not programmers tend to focus more on how users would look at the system. The problem is then how to let them make automated tests.

    Letting testers do even low code tests could be a bit overwhelming. The only real solution seems to record and replay. But most of the tools just record to Python or JS codes. And I do not want source codes presented in faces of testers.

    How to solve this dilemma?

    My answer is a self-made browser plugin! I call it the CommandRecorder.
    The CommandRecorder simply records user actions and replay them in order. The codes that are generated are in Japanese! Accompanied is a test runner which is hidden behind a GUI program. The testers just need to select which case file and which sheet to run.

    The test case books are in Excel file format. An Excel file has some good features that a plain text file cannot provide.
    E.g.,
    – Excel sheets can use powerful formulae.
    – Cells are well aligned.
    – To repeat a pattern, you just need to drag.
    – Data areas by purpose can be cleanly separated by sheets.
    – Editing are straight forward, you don’t have to know structuring grammars.

    The only difficulty of using Excel files is there lacks an effective comparison tool for finding version differences. But what the hell! I just need to find time to make one.

    And more, have you noticed the output of the CommandRecorder is in Japanese? Yes, a natural language! This makes the resulting code very easily understandable. Even one who is completely unaware of the system and unaware of any programming language can understand what a test does. This largely flattens the learning curve.

  • Rust練習問題:数の整除

    問題文

    定理:二桁以上の正整数であれば、その一の位を取り除いて、残った数を前記一の位の数字の五倍で割ったら、残りの数が17の倍数の場合かつその場合に限って、元の数も17の倍数である。

    例えば、34が17の場合である。なぜなら、3-20=-17が17の倍数である。201は17の倍数ではない。なぜなら、20-5=15は17の倍数ではない。一個正整数nを入力して、あなたの任務はこれが17の倍数であるかを判断することだ。

    入力フォーマット

    入力ファイルは最多で10セットのテストデータを有する。一セットのテストデータは一行を占める。その行は一個の整数n(1<=n<=10^100)のみがあり、判断待ちの正整数を表す。n=0であれば、終了を意味し、この行を処理すべきではない。

    出力フォーマット

    一セットのテストデータに対して一行を出力して、相応のnが17の倍数であるかを表す。1が肯定を表し、0が否定を表す。

    入力サンプル

    34
    201
    2098765413
    1717171717171717171717171717171717171717171717171718
    0

    出力サンプル

    1
    0
    1
    0

    分析

    この問題の難点は、nの範囲にある。64-bitの整数型を使っても、表示範囲が最大で20桁の十進数くらいになる。100桁まで昇るnとしても使えない。つまり、プログラミング言語のビルトイン型では、nを表現できない。したがって、nを表現できる型またはそれに類するものを作らないといけない。

    また、前記定理によると、nが17の倍数であるかは、nより一桁小さい別の整数で判定できる。

    つまり、$n=n_0$が17の倍数という問題が下記の問題に相当する。
    ・$n_1$が17の倍数であるか、さらに次の問題に相当する。
    ・$n_2$が17の倍数であるか、さらに次の問題に相当する。
    ・$n_3$が17の倍数であるか、さらに次の問題に相当する。
    ・……
    そして、$n_1 > 10 n_2 > 10^2 n_3 > 10^3 n_4 > ……$

    すると、ある$n_p$がプログラミング言語のビルトイン型で表現できる大きさになったら、通常の割り算で17の倍数であるかは判定できるようになる。

    前記の定理の中に、掛け算と引き算がある。掛け算に参加する数字は小さく、引き算に参加する数字が大きい。このため、大きい数字の引き算を実装する必要があることになる。

    回答案

    use std::io;
    
    fn main() {
        let mut line = String::new();
        while io::stdin().read_line(&mut line).unwrap_or(0) > 0 {
            line = String::from(line.trim());
            if line == "0" {
                break;
            }
            let bn = BigNumber { digits: line.clone() };
            println!("{}", if is_divisible_by_17(&bn) { 1 } else { 0 });
            line.clear()
        }
    }
    
    struct BigNumber {
        digits: String
    }
    
    impl BigNumber {
        fn is_negative(&self) -> bool {
            if let Some(first_char) = self.digits.chars().next() {
                match first_char {
                    '-' => true,
                    _ => false
                }
            }
            else {
                false
            }
        }
    
        fn subtract(&self, other: &BigNumber) -> BigNumber {
            let mut result_digits = String::new();
            let mut ards : Vec<i8> = Vec::new();
            let mut brds : Vec<i8> = Vec::new();
            for c in self.digits.chars().rev() {
                match c {
                    '0'..='9' => ards.push((c as u8 - '0' as u8) as i8),
                    _ => panic!("Only unsigned numbers are allowed in minuend")
                }
            }
            for c in other.digits.chars().rev() {
                match c {
                    '0'..='9' => brds.push((c as u8 - '0' as u8) as i8),
                    _ => panic!("Only unsigned numbers are allowed in subtrahend")
                }
            }
            let mut diffds : Vec<i8> = Vec::new();
            let mut carry : i8 = 0;
            let maxlen = if ards.len() > brds.len() {ards.len()} else {brds.len()};
            for di in 0..maxlen {
                let mut df = 
                    if di >= ards.len() {0} else {ards[di]}
                    - if di >= brds.len() {0} else {brds[di]}
                    - carry;
                if df < 0 {
                    carry = 1;
                    df += 10;
                }
                else {
                    carry = 0;
                }
                diffds.push(df);
            }
            if carry == 1 {
                // The result is negative, and need be complemented.
                for d in diffds.iter_mut() {
                    *d = 9 - *d + carry;
                    if *d < 10 {
                        carry = 0;
                    }
                    else {
                        *d -= 10;
                    }
                }
                result_digits.push('-');
            }
            for d in diffds.iter().rev() {
                result_digits.push(('0' as u8 + *d as u8) as char);
            }
            BigNumber {
                digits: result_digits
            }
        }
    
        fn last_digit(&self) -> Option<i8> {
            if let Some(c) = self.digits.chars().rev().next() {
                Some((c as u32 - '0' as u32) as i8)
            }
            else {
                None
            }
        }
    
        fn remove_last_digit(&mut self) {
            if self.is_negative() {
                if self.digits.len() > 2 {
                    self.digits.pop();
                }
                else {
                    self.digits.clear();
                    self.digits.push('0');
                }
            }
            else {
                if self.digits.len() > 1 {
                    self.digits.pop();
                }
                else {
                    self.digits.clear();
                    self.digits.push('0');
                }
            }
        }
    }
    
    fn is_divisible_by_17(number: &BigNumber) -> bool {
        if number.digits.len() < 4 {
            let intval = number.digits.parse::<i32>().unwrap();
            return intval % 17 == 0;
        }
        let last_digit_times_5: u32 = number.last_digit().unwrap() as u32 * 5;
        let mut shortened = BigNumber {digits: number.digits.clone()};
        shortened.remove_last_digit();
        let subtrahend = BigNumber {digits: last_digit_times_5.to_string()};
        let difference = shortened.subtract(&subtrahend);
        return is_divisible_by_17(&difference);
    }
  • Rust練習問題:弟の算数検査

    問題文

    弟は100以内の足し算と引き算をやりました。チャックしてあげてください。核問題の形式はa+b=c或いはa-b=c、どれも100を超えない非負整数です。cは弟が算出した回答で、200以内の比数整数であるか、一個の「?」かです。「?」は回答不能を意味します。

    入力形式

    入力は100行以内とし、EOF記号で終了します。一行ごとに、一問があります。形式は前述規定に則し、いかなるスペースを含みません。入力されたすべての整数に左に不要な0をつけていません。

    出力形式

    一行のみを出力します。一個の非負整数のみが出て、つまり、弟が正解した問題の数。

    入力サンプル

    1+2=3
    3-1=5
    6+7=?
    99-0=99

    回答案

    use std::io;
    use core::iter::Peekable;
    use core::str::Chars;
    pub struct Expression {
        a: i32,
        b: i32,
        op: char,
        answer: i32,
        unanswered: bool
    }
    
    fn main() {
        let mut line: String = String::new();
        let mut correct_count: i32 = 0;
        while io::stdin().read_line(&mut line).unwrap() > 0 {
            let mut char_seq: Peekable<Chars<'_>> = line.trim().chars().peekable();
            let a: i32   = read_number(&mut char_seq).unwrap().unwrap();
            let op: char = char_seq.next().unwrap();
            let b: i32   = read_number(&mut char_seq).unwrap().unwrap();
            assert!(char_seq.next().unwrap()=='=');
            let got_answer = read_number(&mut char_seq).unwrap();
            let expression = Expression {
                a: a,
                op: op,
                b: b,
                answer: got_answer.unwrap_or_default(),
                unanswered: got_answer.is_none()
            };
            match check_expression(&expression) {
                Ok(r) => correct_count += if r {1} else {0},
                Err(e) => println!("Error: {}", e)
            }
            line.clear();
        }
        println!("{}", correct_count);
    }
    
    fn read_number(text_iter: &mut Peekable<Chars<'_>>) -> Result<Option<i32>, &'static str> {
        let mut number_str: String = String::new();
        while let Some(&c) = text_iter.peek() {
            match c {
                c if c >= '0' && c <= '9' => {
                    number_str.push(c);
                    text_iter.next();
                },
                '?' => {
                    number_str.push(c);
                    text_iter.next();
                    break;
                },
                _ => break
            }
        }
        if number_str == "?" {
            Ok(None)
        } else if number_str.len() > 0 {
            Ok(Some(number_str.parse().unwrap()))
        }
        else {
            Err("Reading a number while not at a number.")
        }
    }
    
    fn check_expression(expr: &Expression) -> Result<bool, &'static str> {
        let expected: i32;
        match expr.op {
            '+' => expected = expr.a + expr.b,
            '-' => expected = expr.a - expr.b,
            '*' => expected = expr.a * expr.b,
            '/' => expected = expr.a / expr.b,
            _ => return Err("Invalid operator.")
        }
        Ok(!expr.unanswered && expr.answer == expected)
    }
  • $\frac{1}{2022}+\frac{5}{2022}+\frac{7}{2022}+…+$約分できない真分数を合計する。

    約分できない真分数を加算するので、分数が分母の素因数の倍数の場合は、加算されないことになる。

    2022に三つの素因数がある。2と3と337である。つまり、$U=\{1,2,3,…,2021\}$の中に、2の倍数でもなく、3の倍数でもなく、337の倍数でもない数字がどのくらいあるかを数える問題になる。式を書きやすくするために、記号を定義する。$U$の中の2の倍数の集合を$A$となる。$U$の中の3の倍数の集合を$B$となる。Uの中の337の倍数の集合を$C$となる。すると、求めたい数は$|U|-|A \cup B \cup C|$。

    $$\begin{align}
    |A|&=\lfloor 2021 \div 2 \rfloor = 1010 \\
    |B|&=\lfloor 2021 \div 3 \rfloor = 673 \\
    |C|&=\lfloor 2021 \div 337 \rfloor = 5 \\
    |A \cap B|&=\lfloor 2021 \div 2 \div 3 \rfloor = 336 \\
    |A \cap C|&=\lfloor 2021 \div 2 \div 337 \rfloor = 2 \\
    |B \cap C|&=\lfloor 2021 \div 3 \div 337 \rfloor = 1 \\
    |A \cap B \cap C|&=\lfloor 2021 \div 2 \div 3 \div 337 \rfloor = 0 \\
    \end{align}$$

    $$\begin{align}
    |A \cup B \cup C|&=|A|+|B|+|C|-|A \cap B|-|A \cap C|-|B \cap C|+|A \cap B \cap C| \\
    &=1010+673+5-336-2-1+0 \\
    &=1349
    \end{align}$$

    $$\begin{align}
    |U|-|A \cup B \cup C|=2021-1349=672
    \end{align}$$

    つまり、$\frac{1}{2022}+\frac{5}{2022}+\frac{7}{2022}+…+$の合計式に、全部で672項がある。

    この合計式の構造をもっと詳しく観察する。仮に$gcd(a, 2022)=f$であれば、仮に$gcd(2022-a, 2022)=k$とする。$k$を求める。仮に、$d$を整数とすると、

    $$\begin{align}
    & (d \mid 2022-a) \wedge (d \mid 2022) \\
    & \iff (d \mid a) \wedge (d \mid 2022)
    \end{align}$$

    即ち、$a$又は$2022$又は$2022-a$の任意の約数$d$が常に、同時に、$gcd(a, 2022)$の約数でありながら、$gcd(2022-a, 2022)$の約数である。即ち、$gcd(a, 2022)=gcd(2022-a, 2022)$.

    問題の合算式は、分子の小さいほうを左に置き、右へ分子が大きくなる一方である。つまり、$\frac{a}{2022}$が左から$n$番目であれば、$\frac{2022-a}{2022}$が右から$n$番目である。$\frac{a}{2022}$と$\frac{2022-a}{2022}$を対にすれば、この対の合計が$\frac{a}{2022}+\frac{2022-a}{2022}=1$である。全部で672項あるから、全部で、336対ある。つまり、問題の式の合計値が336である。

    簡単なPythonプログラムで足して結果を確かめよう。

    >>> from math import gcd
    >>> sum=0
    >>> for i in range(1, 2022):
    ...     sum+=0 if gcd(i, 2022)>1 else i
    >>> print(sum/2022)
    336.0

    結果がぴったりだ。

  • Compute Sigmoid Function Smart

    The sigmoid function frequently appears in machine learning contexts. It has a simple form: $$
    \sigma(x)=\frac{1}{1+e^{-x}}
    $$. A naive implementation in C would be:

    double sigma(double x)
    {
        return 1.0/(1.0+exp(-x));
    }

    It looks good, right? No, it is not. When $x << 0$, say $x=1000$, exp(-x) will yield NaN, even though $\sigma(x)$ approaches $0$. We need a way to bypass the NaN. Let’s look at the graph of $\sigma(x)$.

    It looks symmetric about $(0, \frac{1}{2})$. If it is the case, $$\begin{equation}
    \sigma(x)=1-\sigma(-x)
    \end{equation}$$ will have to hold. Let’s try to prove it.

    $$\begin{align}
    1-\sigma(-x) &= 1-\frac{1}{1+e^x} \\
    &=\frac{1+e^x}{1+e^x}-\frac{1}{1+e^x} \\
    &=\frac{1+e^x-1}{1+e^x} \\
    &=\frac{e^x}{1+e^x} \\
    &=\frac{1}{\frac{1}{e^x}+\frac{e^x}{e^x}} \\
    &=\frac{1}{e^{-x}+1} \\
    &=\sigma(x)
    \end{align}$$

    Done. It’s proven. Now, whenever $x<0$, we can switch $\sigma(x)$ to $1-\sigma(-x)$, which yields the same value and completely avoids the NaN problem. The following is the code.

    double sigma(double x)
    {
        if (x>=0)
            return 1.0/(1.0+exp(-x));
        return 1.0 - 1.0/(1.0+exp(x));
    }
  • HOWTO: Eclipse 2023-12 + Cygwin + GDB 13 — A Work Around

    It was said that GDB 11 or above does not work well with Eclipse with Cygwin. The true reason seems to be that Eclipse sends Windows-style path of your executable to GDB of Cygwin, while GDB of cygwin does not recognize “C:\” as beginning of an absolute path. So GDB will start end up some ugly combined path like

    /cygdrive/c/Users/yourname/Desktop/C:/store/your-project-name/C:/store/your-project-name/Debug/your-project-name.exe

    I haven’t found a beautiful way to work around it as GDB does not fix this problem, and Eclipse developers insist this as a problem of GDB. So, stalemate.

    But, I am not going to get jammed here. GDB wants that path? I give GDB the path.

    So, under Cygwin, I make a symbolic link like the following:

    mkdir -p "/cygdrive/c/Users/yourname/Desktop/C:/store/your-project-name/C:"
    cd "/cygdrive/c/Users/yourname/Desktop/C:/store/your-project-name/C:"
    ln -s /cygdrive/c/store store

    This will let GDB find your executable.

    What if you cannot be sure the path that GDB is looking at? Don’t worry! Follow this operation path. Window > Preferences > C/C++ > Debug > GDB, check “Show the GDB traces consoles with character limit:”. Run debugging again. Then in the console tab, open the console named “… gdb traces”. The path that GDB looks for will be in it.

    Eclipse cannot find source files when debugging? Don’t worry! I bet you see window reporting “Can’t find a source file at …”. Then click the “Edit Source Lookup Path …” button. Then Add > Path Mapping > OK >(enter a new mapping name, e.g. “cygwin”) > Add . Enter “/cygdrive/c/” as the “Compilcation path”, select “C:\” as Local file system path. Repeat this adding operation until all source files are found.

    By the way, if you get some strange exit code when starting GDB, like 0xc0000135 or something alike, be sure to look into environment variable settings. Your system PATH variable, project environment variables in Debug settings. Be sure to set a complete list of paths or nothing.

    But there is still a drawback. I still have not found how to make breakpoint work. …

    So, the better solution may still be installing gdb 9.2-1. If you cannot find it in your Cygwin setup program, try using another mirror.