Saturday, April 07, 2007

Erlang and neural networks, part II

Two weeks ago, I did a post about Erlang (Part I), and how a simple feed-forward neural network might be a nice little project to do on the side, just to learn about Erlang. Here's what came next.

State of the Purely Functional

In the transition from imperative/procedural programming to functional programming, there are obviously things that you have to get over. You'll hear this from a lot of people just learning functional programming for the first time (myself included). The hardest thing for me to get over in a pure functional language is the absence of state. My first reaction was, "Well, how do you get anything done?"

Not having state has its advantages, and you'll hear plenty about side effects and referential transparency. I like to think of it this way: things that don't have state can't be broken--they just exist. However, state is useful in computation, and different languages have different ways of working around its absence. With Haskell, you use monads. At first, I figured it was the same with Erlang. But this short tutorial on Erlang simply states that Erlang uses threads (processes) to keep state.

This maps pretty well onto what I'm trying to do. Each perceptron will be a thread, sending messages back and forth to the others as they fire and stimulate each other.
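To make that concrete before getting to perceptrons, here's a minimal sketch of the pattern (the counter module is just for illustration; it's not part of the neural network code). A thread keeps its "state" in the arguments of a recursive function, and "changes" it by calling itself with new values:
-module(counter).
-export([loop/1]).

% The current count lives in the argument. Each message handler
% ends by calling loop again, passing along the updated count.
loop(Count) ->
    receive
        increment ->
            loop(Count + 1);
        {get, From} ->
            From ! {count, Count},
            loop(Count)
    end.
Spawn it with spawn(counter, loop, [0]), send it increment messages, and ask for the running total with a {get, self()} message. The perceptron below works exactly the same way, just with more interesting state.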

The essence of a perceptron

[Figure: a perceptron--its weighted inputs are summed and passed through the threshold function f(e).]

So once again, this is a perceptron. It takes a weighted sum (a dot product) of its inputs and weights, and the result is then passed through a threshold function f(e). So we'll write a thresholding function and a weighted sum in Erlang.
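Spelled out, with f being the sigmoid function we'll define in a moment, the output is just:
output = f(w1*x1 + w2*x2 + ... + wn*xn)
f(x) = 1 / (1 + e^-x)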

We start by declaring the name of the module, and the functions to export from the module.
-module(ann).
-export([perceptron/3, sigmoid/1, dot_prod/2, feed_forward/2,
         replace_input/2, convert_to_list/1]).
I exported most of the functions so I can run them from the command line; I'll trim the export list down later on.

First we write our thresholding function. We will use the sigmoid function as our thresholding function. It's pretty easy to explain: a value X goes in, another value comes out. It's a math function.
sigmoid(X) ->
    1 / (1 + math:exp(-X)).
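Once the whole module below is written and compiled, it's handy to poke at sigmoid from the shell as a sanity check (exact float formatting varies by Erlang release, so treat the printed digits as approximate). It squashes any input into the range (0, 1), and an input of 0 lands right in the middle:
1> c(ann).
{ok,ann}
2> ann:sigmoid(0).
0.5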
Since I wasn't yet familiar with all the libraries in Erlang, I wrote my own dot product function, and it wasn't too bad. Erlang, for the most part, doesn't use loops, just as Ruby doesn't. Both can, if you really want to write a FOR control structure, but the common way is to use library functions for list processing, list comprehensions, or recursion. In the recursive version below, the first clause is the base case, and the second clause is what you'd do if the "recursion fairy" took care of the rest.
dot_prod([], []) ->
    0;
dot_prod([X_head | X_tail], [Y_head | Y_tail]) ->
    X_head * Y_head + dot_prod(X_tail, Y_tail).
Simple, so far, right? So to calculate the feed forward output of a perceptron, we'll do this:
feed_forward(Weights, Inputs) ->
    sigmoid(dot_prod(Weights, Inputs)).
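Continuing in the same shell, dot_prod and feed_forward compose the way you'd expect; these are also the same weights and inputs used in the test drive at the end (floating-point rounding may tack a few extra digits onto the results):
3> ann:dot_prod([0.5, 0.2], [0.3, 0.9]).
0.33
4> ann:feed_forward([0.5, 0.2], [0.3, 0.9]).
0.581759...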

The body of a nerve

So far, so good. But we still need to create the actual perceptron! This is where the threads and state-keeping come in.
perceptron(Weights, Inputs, Output_PIDs) ->
    receive
        {stimulate, Input} ->
            % add Input to Inputs to get New_inputs...
            % calculate output of perceptron...
            perceptron(Weights, New_inputs, Output_PIDs)
    end.
This is a thread, and it receives messages from other threads. Currently, it only accepts one message, {stimulate, Input}, from other threads. This is the message other perceptrons will use to send their output to this perceptron's inputs. Notice that at the end of the message handler, we call perceptron again with New_inputs. That's how we will maintain and change state.

Note this won't result in a stack overflow, because the call to perceptron is the last thing the message handler does--a tail call--so Erlang doesn't keep the old call stack around. It can safely do this because no state is kept between messages other than what's passed into perceptron, so the previous instance of the call can simply be thrown away.

We do hit a snag, though. How do we know which other perceptron the incoming input is from? We need to know this so we can weight it correctly. My solution is that Input is actually a tuple, consisting of {Process_ID_of_sender, Input_value}. I then keep a list of these tuples, like a hash of PIDs to input values, and convert them to a plain list of input values when I need to calculate the output. Therefore, we end up with:
perceptron(Weights, Inputs, Output_PIDs) ->
    receive
        {stimulate, Input} ->
            % add Input to Inputs to get New_inputs...
            New_inputs = replace_input(Inputs, Input),

            % calculate output of perceptron...
            Output = feed_forward(Weights, convert_to_list(New_inputs)),

            perceptron(Weights, New_inputs, Output_PIDs)
    end.

replace_input(Inputs, Input) ->
    {Input_PID, _} = Input,
    lists:keyreplace(Input_PID, 1, Inputs, Input).

convert_to_list(Inputs) ->
    lists:map(fun(Tup) ->
                  {_, Val} = Tup,
                  Val
              end,
              Inputs).
The map function you see in convert_to_list() is the same as Ruby's map, which would go:
def convert_to_list(inputs)
  inputs.map { |tup| tup.last }
end
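These two helpers are also easy to try out from the shell. The tuples here use the dummy keys 1 and 2 in place of real sender PIDs, just like the test drive below (again, float formatting will vary):
5> ann:replace_input([{1,0.6}, {2,0.9}], {1,0.3}).
[{1,0.3},{2,0.9}]
6> ann:convert_to_list([{1,0.3}, {2,0.9}]).
[0.3,0.9]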
Now, there's one last thing that needs to be done. Once we calculate an output, we need to fire it off to the other perceptrons that take this perceptron's output as one of their inputs. And if this perceptron isn't connected to any others, then it should just print out its value. So then we end up with:
perceptron(Weights, Inputs, Output_PIDs) ->
    receive
        {stimulate, Input} ->
            New_inputs = replace_input(Inputs, Input),
            Output = feed_forward(Weights, convert_to_list(New_inputs)),
            if Output_PIDs =/= [] ->
                   lists:foreach(fun(Output_PID) ->
                                     Output_PID ! {stimulate, {self(), Output}}
                                 end,
                                 Output_PIDs);
               Output_PIDs =:= [] ->
                   io:format("~n~w outputs: ~w", [self(), Output])
            end,
            perceptron(Weights, New_inputs, Output_PIDs)
    end.
We know which perceptrons to output to because we keep a list of perceptron PIDs that registered with us. So if the list Output_PIDs is not empty, then for each PID in it, we send a message containing a tuple of this perceptron's PID and the calculated Output value; otherwise, we just print the output. Let's try it out:

Test Drive


1> c(ann).
{ok,ann}
2> Pid = spawn(ann, perceptron, [[0.5, 0.2], [{1,0.6}, {2,0.9}], []]).
<0.39.0>
3> Pid ! {stimulate, {1,0.3}}.

<0.39.0> outputs: 0.581759
{stimulate,{1,0.300000}}
4>
So you can see, we got an output of 0.581759. We can verify this on our TI-85 calculator:
x = 0.5 * 0.3 + 0.2 * 0.9
Done
1 / (1 + e^-x)
.581759376842
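Or we could just as well check it with the module's own feed_forward from the shell, continuing the session above (the shell may print more digits than io:format's ~w does):
5> ann:feed_forward([0.5, 0.2], [0.3, 0.9]).
0.581759...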
And so we know our perceptron is working, feeding forward! Next time, we'll figure out how to propagate its error back to adjust the weights, and how to connect perceptrons up to each other.

Erlang and Neural Networks Part I
Erlang and Neural Networks Part II
Erlang and Neural Networks Part III

6 comments:

  1. Anonymous (7:05 AM)

    Sweet, when is part 3 coming?

  2. Huh, it's been a while hasn't it? I'll post on it this week! I got sidetracked with meta programming and all.

  3. The pattern matching in dot_product is incomplete. If the two lists aren't the same length, you'll get an exception.

  4. Anonymous (3:57 PM)

    The dot_product function is not an efficient implementation, as it is not truly tail recursive.

    I would rather change it to:

    dot_product(X, Y) -> dot_product(X, Y, 0).

    dot_product([], [], Acc) -> Acc;
    dot_product([X_Head | X_Tail], [Y_Head | Y_Tail], Acc) ->
        dot_product(X_Tail, Y_Tail, Acc + X_Head * Y_Head).

  5. Anonymous (11:01 PM)

    I've just recently begun picking up Erlang myself, and have recently been interested in machine learning, which led me to this site. Anyway, wouldn't it make more sense to use the standard modules for more stuff? For instance, my dot_product:
    dot_product(A, B) when length(A) =:= length(B) ->
        lists:sum([ X * Y || {X, Y} <- lists:zip(A, B) ]).

  6. Anonymous (8:06 AM)

    Thanks Wil, this was a great reference.
