Friday, May 04, 2007

Erlang and Neural Networks Part III

I had meant to get back to Erlang more quickly, but I got sidetracked by meta-programming. Here's Part III of Erlang and Neural Networks!

Last time, in Erlang and Neural Networks Part II, we saw that a neural network is basically made up of interconnected perceptrons (or neurons), each modeled as a linear combination of inputs and weights passed through a non-linear function that modifies the output.
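To recap in symbols (this matches the numbers in the session at the end of this post): a perceptron with weights w1...wn and inputs x1...xn computes

output = f(w1*x1 + w2*x2 + ... + wn*xn), with f(x) = 1 / (1 + e^-x)

that is, a weighted sum squashed by a sigmoid. A single input of 0.5 on a weight of 0.5 gives f(0.25) ≈ 0.562177, which is exactly what gets printed in the example below.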

Drawing a line in the sand

Classifiers often do very well working strictly with probabilities. But oftentimes we don't know the underlying probabilities of the data, and on top of that, we don't have enough training data to build accurate probability densities. One way around that is to draw a line in the data space that acts as the decision boundary between two classes. That way, you only have to find the parameters (i.e. weights) of the line, which are usually far fewer in number than the parameters of an entire probability distribution.

This is exactly what a perceptron does. It creates a decision boundary in data space. If the data space is a plane (2D, i.e. two inputs), it draws a line; with three inputs it draws a plane; and in higher dimensions it draws a hyperplane.
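To make the geometry concrete, here's a minimal sketch of that idea as plain Erlang functions -- the module and function names are my own, and this is not the process-based perceptron we're building in this series. classify/3 just reports which side of the boundary w . x + b = 0 a point falls on:

%% A minimal sketch, not the process-based perceptron from this series:
%% classify a point by which side of the decision boundary w . x + b = 0 it falls on.
-module(boundary).
-export([classify/3]).

%% Weighted sum of inputs (the linear combination part of a perceptron).
dot(Weights, Inputs) ->
    lists:sum([W * X || {W, X} <- lists:zip(Weights, Inputs)]).

%% With two inputs this boundary is literally a line in the plane;
%% with more inputs it's a hyperplane.
classify(Weights, Bias, Inputs) ->
    case dot(Weights, Inputs) + Bias >= 0 of
        true  -> above;
        false -> below
    end.

For example, boundary:classify([1.0, -1.0], 0.0, [0.3, 0.7]) returns below, because the point (0.3, 0.7) lies on the negative side of the line x1 - x2 = 0.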

So Why Not Go Linear?

The problem with using just one perceptron is that it can only classify data that is linearly separable--meaning data you can separate with a line (or hyperplane). The XOR problem is a simple illustration: there is no way to draw a single line that separates the "on" outputs of XOR from the "off" ones. Minsky and Papert pointed this out in their famous 1969 book Perceptrons, which kinda killed off research in this field for about a decade.

So to get around this linearity, smart people eventually figured out that you can chain perceptrons together in layers, and that gives the network the ability to approximate just about ANY non-linear function, given enough hidden units.
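To see that claim in action, here's a minimal sketch -- hand-picked weights, a made-up module name, and plain functions rather than the message-passing perceptrons we're building in this series -- of a two-layer network of threshold units that computes XOR, exactly the function a single perceptron can't handle:

%% A sketch with hand-picked weights (not learned!) showing that two layers
%% of threshold units can compute XOR, which no single perceptron can.
-module(xor_net).
-export([test/0]).

step(X) when X >= 0 -> 1;
step(_)             -> 0.

%% One threshold perceptron: step(weights . inputs + bias).
unit(Weights, Bias, Inputs) ->
    step(lists:sum([W * I || {W, I} <- lists:zip(Weights, Inputs)]) + Bias).

xor_net(A, B) ->
    H1 = unit([1, 1],  -0.5, [A, B]),   % hidden unit 1: fires if A or B is on (OR)
    H2 = unit([-1, -1], 1.5, [A, B]),   % hidden unit 2: fires unless both are on (NAND)
    unit([1, 1], -1.5, [H1, H2]).       % output unit: fires only if both hidden units fire (AND)

test() ->
    [{A, B, xor_net(A, B)} || A <- [0, 1], B <- [0, 1]].

xor_net:test() returns [{0,0,0},{0,1,1},{1,0,1},{1,1,0}] -- the XOR truth table. The hidden layer carves out two half-planes (OR and NAND), and the output unit intersects them, which no single line through the plane can do.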

Shake my hand and link up to form Voltron

Let's try linking our perceptrons together. We're going to add two more messages to our perceptrons:
perceptron(Weights, Inputs, Output_PIDs) ->
  receive
    % The other messages from part II

    % Remember another perceptron as one of our outputs
    {connect_to_output, Receiver_PID} ->
      Combined_output = [Receiver_PID | Output_PIDs],
      io:format("~w output connected to ~w: ~w~n", [self(), Receiver_PID, Combined_output]),
      perceptron(Weights, Inputs, Combined_output);
    % Remember another perceptron as one of our inputs, starting it off with a weight of 0.5
    {connect_to_input, Sender_PID} ->
      Combined_input = [{Sender_PID, 0.5} | Inputs],
      io:format("~w inputs connected to ~w: ~w~n", [self(), Sender_PID, Combined_input]),
      perceptron([0.5 | Weights], Combined_input, Output_PIDs)
  end.

connect(Sender_PID, Receiver_PID) ->
  Sender_PID ! {connect_to_output, Receiver_PID},
  Receiver_PID ! {connect_to_input, Sender_PID}.
We would never send connect_to_output or connect_to_input directly [1]. We'd just use connect(). It simply adds each perceptron's process ID to the other's state, so they know who to send messages to when they have an output.

We can now connect up our perceptrons, but as things currently stand, we'd have to send a separate message to every perceptron sitting at an input of the network. This is tedious. We are programmers and we are lazy. Let's make a perceptron also double as a source node. A source node simply passes its input on to its outputs.
perceptron(Weights, Inputs, Output_PIDs) ->
  receive
    % previous messages above and in part II

    % Act as a source node: forward the raw input value to all of our outputs
    {pass, Input_value} ->
      lists:foreach(fun(Output_PID) ->
                      io:format("Stimulating ~w with ~w~n", [Output_PID, Input_value]),
                      Output_PID ! {stimulate, {self(), Input_value}}
                    end,
                    Output_PIDs),
      perceptron(Weights, Inputs, Output_PIDs)   % loop so the node can pass more values later
  end.
Now we can start creating perceptrons.
64> N1_pid = spawn(ann, perceptron, [[],[],[]]).
<0.325.0>
65> N2_pid = spawn(ann, perceptron, [[],[],[]]).
<0.327.0>
66> N3_pid = spawn(ann, perceptron, [[],[],[]]).
<0.329.0>
Note that we get back the process IDs of the three perceptrons we created. Then we start connecting them.
67> ann:connect(N1_pid, N2_pid).
<0.325.0> output connected to <0.327.0>: [<0.327.0>]
<0.327.0> inputs connected to <0.325.0>: [{<0.325.0>,0.500000}]
{connect_to_input,<0.325.0>}
68> ann:connect(N1_pid, N3_pid).
<0.325.0> output connected to <0.329.0>: [<0.329.0>,<0.327.0>]
<0.329.0> inputs connected to <0.325.0>: [{<0.325.0>,0.500000}]
{connect_to_input,<0.325.0>}
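Two connect calls are manageable, but for a bigger network -- say a whole input layer feeding a whole hidden layer -- you'd want a helper. Here's a small sketch (connect_layers/2 is my own name, not part of the module so far) that fully connects one list of perceptron processes to another using the connect/2 we defined above:

%% Sketch of a convenience helper: fully connect every perceptron in Senders
%% to every perceptron in Receivers, using connect/2 from above.
connect_layers(Senders, Receivers) ->
    [connect(Sender_PID, Receiver_PID) ||
        Sender_PID <- Senders, Receiver_PID <- Receivers].

With that added to the module (and exported), ann:connect_layers([N1_pid], [N2_pid, N3_pid]) would set up the same two connections we just made by hand.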
We used N1 as an input node connected to perceptrons 2 and 3. So if N1 is passed a value, N2 and N3 should be stimulated with that value.
69> N1_pid ! {pass, 0.5}.
Stimulating <0.329.0> with 0.500000
{pass,0.500000}Stimulating <0.327.0> with 0.500000

<0.329.0> outputs: 0.562177

<0.327.0> outputs: 0.562177
Hurray! So now the network's got tentacles that we can connect all over the place, writhing and wiggling with glee. However, this is currently a DUMB network. It can't classify anything because we haven't told it how to learn yet. How does it learn to classify things? By adjusting the weights on the inputs of each perceptron in the network. And this is the crux of neural networks in all their glory. But you'll have to wait 'til next time!

(1) Note that the last clause in the receive block (connect_to_input in the first listing, pass in the second) isn't followed by a semicolon, while every clause before it needs to end with one. So if you've been following along, the stimulate message from Part II needs a semicolon at the end of it now.
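If you're newer to Erlang, the rule that footnote describes is just the general syntax of receive: clauses are separated by semicolons, and the final clause has no separator before end. A tiny standalone illustration (the message names here are made up):

loop() ->
    receive
        {msg_a, X} ->
            io:format("got a: ~w~n", [X]),
            loop();                        % every clause but the last ends with ';'
        {msg_b, Y} ->
            io:format("got b: ~w~n", [Y]),
            loop()                         % the last clause has no ';' before 'end'
    end.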

Erlang and Neural Networks Part I
Erlang and Neural Networks Part II
Erlang and Neural Networks Part III

5 comments:

  1. Anonymous, 11:56 AM

    Oh...Aaahh...Ooooh!

  2. Anonymous, 2:59 PM

    Thanks, please get the next article in the series up ASAP!!!!

  3. Haha, I had written this a while back and submitted it to reddit, and there was little interest. But it seems like someone else submitted it and now there are some people coming in to read it.

    Yeah, since there seems to be some interest, I'll finish up the next part by the end of this next week.

  4. Anonymous, 11:34 AM

    Hi Wilhelm,

    any news on the neural networks front?
    Like a Part IV?

    Thanks.

  5. Part IV is up, though it mostly explains the math of back propagation (no proof). Part V is going to get around to the code for back propagation. Since that'd be easier to understand if the math was understood, I figured it'd be the right thing to do. Enjoy!