Hi Cybernetics people,
 I'm taking the liberty of sending with this email a brief description of my 
 AI robot brain design ideas.  My purpose with this is to ask all of you, if 
 you can, to point me to existing data that overlaps with this, and above 
 all to people and groups who might at this moment be working on these 
 kinds of things.  Thanks very much for any pointers you can give me. 
-----
 * Goal : 
   ~~~~
 To create a computer program (called the ''Brain Program'') that would make 
 a PC operate like the brain of an intelligent robot. 
 The PC is connected to
  1. a set of robot arms (''Motor'') controlled by the brain program, and
  2. a set of Sensors that provide the brain with data about its environment.
 Therefore the assembly { Brain + Motors + Sensors } acts like an 
 intelligent robot. 
 
 * Further analysis of the goal :
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The Brain Program, in my opinion, is the essential part of the 
 intelligent robot.  The Brain Program should be constructed in such a 
 way that it automatically learns to make use of input Sensor data and 
 learns to operate its Motors, even in the case where e.g. new 
 Sensors/Motors are plugged into the robot in the course of its life. 
 That is: the Brain Program compensates for the idiosyncrasies of the
 particular Sensors/Motors to which it is connected.
 I see the above problem in evolutionary terms :  The Brain Program 
 should control the robot ( = { Brain+Motors+Sensors} ) in such a way 
 that the robot survives.  The ''intelligence'' of the robot consists in 
 the fact that it learns autonomously how to survive optimally. 
 For the time being, I only look at this survival of the robot as 
 survival of the individual robot itself.  My goal is to make a robot 
 that learns selfishly useful behaviour.  The last word ''useful'' is 
 here to be interpreted as a synonym for behaviour that is favourable 
 for the survival of the robot. 

 * General design :
   ~~~~~~~~~~~~~~
 I've created a minimally simple Brain Program that consists basically 
 of a neural network.  This simple Brain Program is constructed
 as follows (source code and some documentation at
 http://www.rubingscience.org/aiclub/toc.html) :
 
 The data maintained by the Brain Program is a list of items, each of 
 which is arbitrarily called a ''Brain Cell''.  The contents of each 
 Brain Cell is as follows: 
                                          Motor   Priority
     Sensor values                        value   value
      s1  s2  s3  s4  s5             sN     m       p
    +---+---+---+---+---+--- ... --+---+  +----+  +---+
    | a | b | c | d | e |          | z |  | m1 |  | P |
    +---+---+---+---+---+--- ... --+---+  +----+  +---+
  
 which means: if the Sensors are in states (a,b,c,d,e,f,...,z), then
 actuate Motor 'm1'.
 This presupposes that the robot has N Sensors (labelled s1...sN) and M 
 Motors (labelled m1...mM).  The Sensor states are discrete, and might be 
 e.g. binary values (e.g. 1 = light sensor receives light, 0 = doesn't 
 receive light). 
 Each Brain Cell must be unique.  The Brain Program operates in 
 discrete, successive moves, each of which consists of : 
      1. Receive the states of the Sensors.
      2. Choose from all Brain Cells the one cell whose sensor values
         are most ''like'' the input Sensor states.
         The probability that a certain Brain Cell is selected *also*
         depends on the ''Priority'' value p of the Brain Cell.
         Higher p means higher likelihood of being selected.  
      3. Get the Motor value from the chosen Brain Cell, and
         send a signal to that Motor that makes that motor go active
         for a (short) time.
      4. Get the pain/pleasure feedback signal that results from the
         physical action of the activated Motor, and change the Priority
         values of the Brain Cells on the basis of the value of that 
         pain/pleasure signal.  (More on this below.)
 In step 2, if the degree of alike-ness between the input Sensor states 
 and the chosen Brain Cell is below a certain threshold, the Sensor input 
 is considered a ''new'' thing, and a different action than the above is 
 executed, namely: a set of new Brain Cells is inserted, with that 
 ''new'' set of Sensor input values as their Sensor states, and with all 
 possible values for the Motor ('m'). 
 Each of the Brain Cells is a PROGRAM in which is encoded the action that 
 the robot executes when the sensors see the pattern (a,b,c,...,z). That 
 is: all behaviours that the robot can potentially execute are stored in 
 the Brain Cells.  The sensor input values, in this minimally complex 
 design, fairly directly control the robot's actions. 
 The set of all the Brain Cells in the robot's brain is a population of 
 competing programs.  (I suspect that this may overlap with some of 
 Koza's ideas/designs.) 
 The Priority value of a Brain Cell is a non-negative real value, and 
 represents the population size of the program (= behaviour) coded in 
 that Brain Cell. 
 
 In addition to the input Sensors, the robot is also equipped with 
 sensors that sense the pain/pleasure state of the robot.  Pain might 
 mean that the robot has bumped into a wall and has thereby damaged 
 itself, or that the robot's fuel level is becoming uncomfortably low.  
 Pleasure might mean the robot's fuel level has just increased. All 
 pain/pleasure sensor inputs are summed (pain with negative weights, 
 pleasure with positive weights) by a fixed (probably hard-wired) 
 function, and the resulting signal of that fixed function is fed to the Brain. 
 The set of Brain Cells is a *list*, in which the Brain Cell that has 
 just been executed is always removed from its old place and moved to the 
 top of the list.  Near the top of the list are thus always the Brain 
 Cells that have been used most recently. When a pain/pleasure signal is 
 received, the Priority values of the Brain Cells near the top of the 
 list are multiplied by a non-negative quantity that depends on the value 
 v of the pain/pleasure signal, e.g. as C * exp( v ) or a similar function.  The effect 
 of this is that a net-positive feedback signal (meaning ''Pleasure'') 
 rewards the most-recently executed Brain Cells, by increasing the 
 Priority value (population size) of these Brain cells; analogously, a 
 net-negative feedback signal (meaning ''Pain'') decreases the Priority 
 value (population size) of those Brain Cells. 
 Good behaviour, i.e. behaviour that results in Pleasure signals, 
 therefore reaches ever higher Priority, and is therefore more and more 
 likely to be re-executed.  Bad behaviour, i.e. behaviour that results in 
 Pain signals, gets lower and lower Priority, and thereby gets executed 
 less and less. 
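 The move-to-front bookkeeping and the multiplicative Priority update 
 might be sketched like this (an illustrative Python sketch of mine; the 
 depth of the ''near the top'' window and the choice C = 1 are my own 
 assumptions, not fixed parts of the design):

```python
import math

# Step 4, sketched: the just-executed cell moves to the top of the list,
# and the priorities of the most recently used cells are rescaled by
# C * exp(v), where v is the summed pain/pleasure feedback value.
# The factor exceeds 1 for net pleasure (v > 0), rewarding recent cells,
# and is below 1 for net pain (v < 0), punishing them.
def reinforce(cells, executed_cell, v, depth=3, C=1.0):
    cells.remove(executed_cell)      # move the executed cell ...
    cells.insert(0, executed_cell)   # ... to the top of the list
    factor = C * math.exp(v)         # non-negative multiplier
    for cell in cells[:depth]:       # the most recently used cells
        cell.priority *= factor
```

 Because the multiplier is applied only near the top of the list, the 
 reward or punishment lands on the behaviours that were executed most 
 recently, i.e. the ones most likely responsible for the feedback.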

 * Ideas for extending the above simple design :
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 In my perception, the key idea in the above is that the Brain Cells are 
 programs in which *behaviour* is encoded, and that these programs 
 compete with each other via an evolutionary process.  This evolutionary 
 process is what makes the robot adapt and learn, and what makes the 
 robot ''intelligent'' and also *autonomous*.  
 The very simple and minimal version of the Brain Program that I have 
 running right now, which can effectively control a simple simulated 
 robot that learns to avoid walls and to ''seek out'' food pellets in a 
 simulated environment it walks around in, doesn't yet do genetic 
 mutation or crossover of the contents of Brain Cells.  This would IMO 
 be the next interesting thing to extend the above design with.  
 Crossover could IMO be included e.g. as an extra step in the cyclical 
 operation of the Brain: e.g. in each cycle, select two high-Priority 
 Brain Cells, let them mate, and insert the created offspring behaviour 
 pattern as a new Brain Cell with a certain (small) population size 
 (= Priority).  In this way, it would seem possible to create in the 
 operation of the Brain an infrastructure through which successful 
 sub-patterns in Brain Cells can be communicated between Brain Cells. 
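 That mating step could be sketched like so (an illustrative Python sketch; 
 the one-point crossover scheme, the priority-weighted parent selection, 
 and the starting priority of 0.1 are all my own assumptions):

```python
import random

class Cell:
    """Illustrative Brain Cell: sensor pattern, motor value, priority."""
    def __init__(self, sensors, motor, priority):
        self.sensors = tuple(sensors)
        self.motor = motor
        self.priority = priority

def crossover_step(cells, child_priority=0.1):
    """One hypothetical extra step per Brain cycle: mate two high-Priority
    cells and insert the offspring with a small starting population size."""
    # Select two parents, weighted by priority (high-Priority cells
    # are the most likely to mate).
    weights = [c.priority for c in cells]
    parent1, parent2 = random.choices(cells, weights=weights, k=2)
    # One-point crossover of the two sensor patterns.
    cut = random.randrange(1, len(parent1.sensors))
    child = Cell(parent1.sensors[:cut] + parent2.sensors[cut:],
                 random.choice([parent1.motor, parent2.motor]),
                 child_priority)
    cells.append(child)
    return child
```

 The offspring then competes with all other Brain Cells through the 
 normal selection and Priority-update machinery, so a successful 
 sub-pattern can spread while an unsuccessful one dies out.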
 One of my prominent longer-term goals is to extend the above simple 
 neural-network design into a Brain in which the programs (which are the 
 entities that undergo evolution) consist of ''memes'', in the sense of 
 being pieces of data/information that the robot communicates to/from 
 other intelligent entities in its environment.  I mean that in such a 
 meme-communicating robot, the ''*thinking*'' going on in the Brain 
 consists of an evolutionary process on these ''memes''; that is : 
 ''thinking'' is nothing else than a dumb evolutionary process on these 
 ''memes'' (which are themselves only dumb pieces of data).  The 
 meme-communicating robot would have one or more buttons, providing to 
 the Brain the pain/pleasure signals, that the entities with which the 
 robot communicates can press when they are pleased or disgusted with 
 what the robot says to them.  The result would be that the robot learns 
 to communicate in a way that these parties find ''pleasant'' -- or in 
 other words : the robot learns that the behaviour that is in that case 
 optimal for its survival is to communicate in an interesting and 
 pleasant way with those people empowered to press its buttons. 
 
 An even more interesting variant on the latter would be to eliminate the 
 buttons, and to replace them by a pain/pleasure signal derived from the 
 amount of new data that the robot learns.  This would result in a robot 
 that -- independently of whether it ''pleases'' persons in its 
 surroundings -- autonomously and independently seeks to maximize its own 
 knowledge. 
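 A minimal sketch of such a knowledge-maximizing feedback signal (this is 
 entirely my own illustrative guess at one way to measure ''amount of new 
 data learned''; the scale and cost values are assumptions):

```python
# Sketch of a button-free pain/pleasure signal: pleasure is derived from
# the amount of new data learned, here measured as the number of Brain
# Cells inserted during the last cycle.  A small constant cost makes a
# cycle in which nothing new was learned mildly painful, pushing the
# robot to keep seeking out novelty.
def curiosity_signal(cells_before, cells_after, scale=1.0, cost=0.1):
    new_cells = cells_after - cells_before   # cell counts, not lists
    return scale * new_cells - cost
```

 Fed into the usual Priority-update step, this would reward whatever 
 recent behaviours led the robot to encounter new Sensor patterns.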
 
 (A further thesis of mine is of course that scientists are already 
 robots of that last kind :-).) 
---
The above is a fairly minimal description of my ideas.  (But enough for now, given this medium of emails in a mailing list, I think.)
Again, anyone who can point me to people who are already working on things like this, please inform me of those, if you will.  At the moment, I have a feeling that I cannot proceed easily without interaction with others with overlapping ideas.  It IMO just has to be the case that there exist research groups which do things that overlap with the above, but I find it a very strenuous task to find them.  Thanks very much !
---
Best regards, Menno (rubingh@delftnet.nl)
Ir. Menno Rubingh, Scientific programmer, Software designer, & Software documentation writer
Doelenstraat 62, 2611 NV Delft, Netherlands
phone +31 15 2146915 (answering machine backup)
email rubingh@delftnet.nl
http://www.rubinghscience.org/
========================================
Posting to pcp-discuss@lanl.gov from "Menno RUBINGH" <rubingh@delftnet.nl>
This archive was generated by hypermail 2b29 : Thu Nov 09 2000 - 22:22:48 GMT