Five Working Assumptions:
The goal of this blog is to aid understanding of the quantum of information by working through an ontology engineering example. Building an ontology from a blank slate is the most human-intensive task today. We would like machines to do this. The notion of interaction-based computing, which underlies the quantum metaphor, is to build theories using an ontology induced from raw data and then to reconstruct the hitherto unknown ontology from theories abduced from that data and back-tested using a criterion of informativeness: the most informative theories will reconstruct a probability density matrix that represents the most precise ontology states with the best utility for decision making or reasoning.
Earlier, we introduced a few concepts about what you know or don't know, and the idea of interaction. Now, I would like to commit to a working set of assumptions about what I think Quantum Artificial Intelligence (AI) is. The objective of Quantum AI, in my personal viewpoint, is to enable computers to function at a higher level of value in the knowledge production chain: that is, to reach towards creativity and innovation.
Quantum AI will need to perform the following five core functions:
1. Analysis: ability to process, re-represent and learn from data
2. Synthesis: transform data into knowledge and then into re-usable wisdom
3. Context: hypotheses of plausible futures as interpretations with least information "on the fly".
4. Fusion: represent higher-order concepts (or ideas) from different, possibly incommensurate semiotic paradigms into a single unified paradigm.
5. Consciousness: self-awareness and then beyond the “self” the ability to explore higher-order thinking within a "meta-self".
What computational models and representations could we use, and how can a system reach towards creativity and innovate by itself? Is that possible, or would it violate Gödel's Incompleteness Theorem? Is this still Turing computation: is it a Turing machine?
The short answer is that no, due to the nature of computation by interaction it would not violate Gödel's theorem and, yes, this is actually still Turing machine computation. To be precise, it fits the model of a Turing Choice Machine.
Let's attend to the Quantum of Information, which underlies the first point about data "re-representation": note, I did not say data representation, but rather, re-representation, to encourage thinking about the representation itself as an interactive coming about of how data becomes usable and what its minimal bounds must be in order to be useful (from a Quantum AI point of view).
In order to ground the ideas that I am presenting, let me start by stating that whatever we as humans exchange as thoughts, and that also whatever any sort of intelligence has to exchange in order to express itself must somehow be conveyed by having some concrete statement to make. Let us take as an axiom of our system that First Order Predicate Logic is the modality of that expression and that this can be rewritten in natural languages (Arabic, French, Chinese, English or Ancient Egyptian) as well as computer languages (C, Prolog, COBOL, Java, etc...).
Of course, we are not making the statement that the mental representation is first-order logic, but only that statements made by the mental machinery can be rewritten in some variant of (modal) predicate logic.
Therefore, we can now set the stage for a preliminary discussion of state-vectors:
1) That the intelligent machinery has some kind of internal state vector or state representation; and,
2) That the output of the mental machinery must serve its basic survival which means that it ought to behave efficiently in the world; and,
3) That the mental machinery approximates the state vector of the world by making good-enough approximations of the true probability distribution of its states; so that,
4) Good decisions can be made; and,
5) Future survivability is increased by taking utilitarian actions.
Therefore, from a mathematical perspective, the fewer the interactions needed and the more informative each exchange (which in turn implies well-formed, rich statements), the higher the fidelity of the approximation of the true state of the world and the better the outcomes of decision making.
An interaction, seen as the exchange of a logical statement, will be valid with some probability between 0 and 1: this leads to the notion of refinements of the interactions, so that each current estimate of the world-state is updated by rewriting the statements to augment the estimates.
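One way to picture such a refinement step is as a Bayesian update. This is only a minimal sketch under that assumption (the text does not commit to Bayes' rule), and the states, the statement and all the probabilities below are made up for illustration.

```python
def update_belief(prior, likelihood):
    """One refinement step, assuming the update follows Bayes' rule.

    prior:      dict mapping each candidate world-state to its probability
    likelihood: dict mapping each state to P(statement is valid | state)
    """
    unnormalized = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the statement is impossible under every candidate state")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical example: two world-states and one exchanged statement,
# "the street is wet", valid with probability 0.9 if raining and 0.2 if dry.
prior = {"raining": 0.5, "dry": 0.5}
likelihood = {"raining": 0.9, "dry": 0.2}
posterior = update_belief(prior, likelihood)
print(posterior)
```

Each further exchanged statement would reuse the posterior as the next prior, which is one concrete reading of "rewriting the statements to augment the estimates".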
Now we can write this as a mathematical model (of course, as you can see, I have started to sneak in the ideas of Minimum Message Length as well as its complementary idea of Minimum Interaction Distance).
In simple terms, a formula in first order predicate logic (FOPL) will have a level of informativeness that, through an interaction will be rewritten to the point that it represents a close enough approximation to the true state of the world, or at least, that part that is consistently observable of the true state vector.
Therefore, the question becomes: what is the difference between the initial proposition in FOPL and the smallest rewrite step forward, taken as a quantum of information, that moves informativeness towards the true state? And what can be used to measure it?
Here is where we will gingerly suggest an approach with some testable hypotheses - not as a right answer (as I don't know the answer, yet) but as a place to begin, and, as a placeholder for a preliminary model that is executable.
For the sake of discourse, let us call our intelligent machinery an agent.
The agent will create a statement about the world and let us call that statement a theory. We are justified in using the word theory as it implies some kind of ontology and also that the agent has not actually gotten a full understanding of the totality of the world, but theorizes some understanding of some part of the world.
The agent, obviously, needs to test the theory: in this case, the agent conducts an experiment. Doesn't this sound just like the Scientific Method? Well, my intention is to draw attention to it and to the work of Charles S. Peirce and his pragmatism of inquiry.
If the agent, as a result of the experiment, which could take the form of a conversation with another agent, is able to rewrite the original theory by enriching its information, then, the agent can produce a new theory (which is usually, though not always, a modification of the original theory).
A simple way to progress is to think in terms of adding one new relation at a minimum or, instead, one new concept and relation to the existent theory, T1, to produce a new theory, T2 where the informativeness of T2 is strictly greater than T1.
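To make the step from T1 to T2 concrete, here is a small sketch. It assumes one particular measure of informativeness that the text does not prescribe: the number of bits by which a theory narrows down the set of possible worlds. The concepts and relations are hypothetical, and relations are simplified to propositional constraints rather than full FOPL.

```python
from itertools import product
from math import log2

# Hypothetical concepts; a "world" assigns True/False to each of them.
ATOMS = ("bird", "flies", "penguin")


def worlds():
    """Enumerate every truth assignment over ATOMS."""
    for values in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


def informativeness(theory):
    """Bits of uncertainty removed: log2(all worlds) - log2(consistent worlds)."""
    consistent = sum(1 for w in worlds() if all(rel(w) for rel in theory))
    if consistent == 0:
        raise ValueError("theory is inconsistent: no world satisfies it")
    return len(ATOMS) - log2(consistent)


# Relations written as constraints on a world (illustrative only).
def penguins_are_birds(w):
    return (not w["penguin"]) or w["bird"]


def penguins_do_not_fly(w):
    return (not w["penguin"]) or (not w["flies"])


T1 = [penguins_are_birds]
T2 = T1 + [penguins_do_not_fly]  # T1 plus exactly one new relation

print(informativeness(T1), informativeness(T2))
```

Under this measure T2 is strictly more informative than T1 only because the new relation rules out at least one world that T1 still allowed; a redundant relation would leave the value unchanged.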
The temptation is to linearize thinking and to assume that the steps are somehow linear. They are not. The reason is that we do not know what form the steps take. One notion is that adding a new relationship would at most expand the space by a factorial in the number of concepts related by the relationship, and that the combinations of the relationships in concert with the concepts would add a further increase to the descriptive or informative power.
If we take a distinctly Quantum approach and liken the ur-elements (atoms or literals) of some ontology to a probability distribution calculable in terms of a complex probability amplitude, then we can make the statement: let Ω be an ontology with probability amplitude distributions amongst its atoms. Furthermore, let the complex plane determine the order structure and the real plane determine the conceptual association structure, so that a given theory is defined by the ordered associations between concepts from the ontology.
It should be clear that, using this approach, the two theories T2 and T1 can be distinguished by a divergence measure, which we could justifiably state as being in inverse proportion to the informativeness increase from T1 to T2.
Candidates for such measures include quantum Jensen-Shannon, Rényi relative entropy, Kullback-Leibler, and Bregman divergence measures.
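As a rough illustration of two of these candidates, the sketch below computes the classical Kullback-Leibler and Jensen-Shannon divergences between two discrete distributions. It is the classical case only; the quantum Jensen-Shannon version would replace Shannon entropy with von Neumann entropy over density matrices. The two example distributions standing in for T1 and T2 are invented.

```python
from math import inf, log2


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0:
            return inf  # p puts mass where q puts none
        total += pi * log2(pi / qi)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded by 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical state distributions induced by two theories.
t1_states = [0.5, 0.5]
t2_states = [0.9, 0.1]
print(kl_divergence(t1_states, t2_states), js_divergence(t1_states, t2_states))
```

Kullback-Leibler is asymmetric and can be infinite, whereas Jensen-Shannon is symmetric and always finite, which may matter when comparing theories whose supports differ.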
The way I think about this is that an atom of the ontology is represented as a virtual particle that is defined by its wavefunction ψ(x). The quantum state corresponds to the configuration in which the ontology is meaningfully constructed such that the Theories T1 and T2 are expressible.
However, in an unknown quantum state (i.e. we do not know the ontological structure) we are given data for which we need to determine a particle in an unknown quantum state as well as to measure observables from which to induce its wavefunction ψ(x): here is an example in which quantum inspired views can work forwards and backwards.
But the problem is that we do not know what bases are appropriate so we have to try out several - hence, from these efforts we can determine the wavefunctions and compute the density matrix which then tells us everything we need to know.
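The idea of trying several bases and then computing the density matrix can be sketched with the simplest possible case, a single qubit measured in the X, Y and Z bases. This uses standard linear-inversion tomography, swapped in here purely as an illustration and not as the method proposed in this post; the measurement counts are made up.

```python
from math import log2, sqrt


def expectation(counts_plus, counts_minus):
    """Estimate <P> for a Pauli observable from counts of +1 and -1 outcomes."""
    return (counts_plus - counts_minus) / (counts_plus + counts_minus)


def density_matrix(x, y, z):
    """Linear inversion: rho = (I + x*X + y*Y + z*Z) / 2 as a 2x2 nested list."""
    return [[(1 + z) / 2, complex(x, -y) / 2],
            [complex(x, y) / 2, (1 - z) / 2]]


def von_neumann_entropy(x, y, z):
    """Entropy in bits; the eigenvalues of rho are (1 +/- r) / 2 with r = |(x, y, z)|."""
    r = min(1.0, sqrt(x * x + y * y + z * z))  # clip: noisy estimates can exceed 1
    eigenvalues = [(1 + r) / 2, (1 - r) / 2]
    return -sum(ev * log2(ev) for ev in eigenvalues if ev > 0)


# Hypothetical counts of (+1, -1) outcomes in each measurement basis.
x = expectation(500, 500)
y = expectation(500, 500)
z = expectation(900, 100)

rho = density_matrix(x, y, z)
entropy = von_neumann_entropy(x, y, z)
print(rho, entropy)
```

A single basis would not be enough here: the Z counts alone fix only the diagonal of the matrix, which is a small-scale version of why several bases have to be tried.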
The key here is that the different roles of the measures correspond to the different roles the particles play in the definition of the ontology. Only when truly new particles contribute state information (i.e. non-degenerate copies and measurements) can we reconstruct the state, and from this compute the divergence between states and therefore the informativeness between T1 (made up of some configuration of particles) and T2.
If your head hurts, well, this is because where we are headed with this blog is Quantum Tomography.
I offer instead, the concept of Quantum Inspired Semantic Tomography: that is the use of quantum mathematical and operational methods to model and execute ontology synthesis from raw data.
In this sense, ontology synthesis from data is the analog of reconstructing a quantum state from experimental data inputs, and this in turn enables theories to be produced and driven by their informativeness, computed via the divergence measures.
To put it another way, in order to synthesize the ontology, which is the understanding of a domain or the world, a quantum system model which is self-emergent is needed. That is, the quantum system must correspond to separate interacting quantum systems that produce theories about the world, subject to revision via divergence considerations until a good-enough result is achieved where further quantum effects make no tangible difference to the theory revision.
Mathematically, we could call this a fixed-point.
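A toy version of this stopping rule is sketched below. It assumes a simple damped revision rule and uses the classical Jensen-Shannon divergence between successive estimates as the "no tangible difference" test; both choices, along with the rate, the tolerance and the example numbers, are illustrative assumptions rather than anything fixed by the discussion above.

```python
from math import log2


def js_divergence(p, q):
    """Classical Jensen-Shannon divergence in bits between two distributions."""
    def kl(a, b):
        return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def revise(estimate, evidence, rate=0.5):
    """Hypothetical revision rule: move the estimate part-way towards the evidence."""
    return [(1 - rate) * e + rate * v for e, v in zip(estimate, evidence)]


def iterate_to_fixed_point(estimate, evidence, tol=1e-9, max_steps=1000):
    """Revise until successive estimates differ by less than tol (in JS divergence)."""
    for step in range(1, max_steps + 1):
        new_estimate = revise(estimate, evidence)
        if js_divergence(estimate, new_estimate) < tol:
            return new_estimate, step
        estimate = new_estimate
    return estimate, max_steps


# Hypothetical starting theory and observed state frequencies.
final, steps = iterate_to_fixed_point([0.5, 0.5], [0.8, 0.2])
print(final, steps)
```

The sequence of divergences recorded at each step would give the kind of profile of "jumps" that a development curve for a domain could be built from.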
If we measure the quantized "jumps" between divergences according to some optimality criterion (and I know I have not answered what the optimality is) then we would have a profile of ontological development for any given domain and know where we can make the biggest difference to the biggest gaps within that domain.
Therefore, the theory that produces the largest informativeness from T1 to T2 allows one to reconstruct an unknown distribution of states characterizing an ontology as determined by the probability density matrix. This is the Quantum Tomography part of the semantics.
At present, this process is done by many talented humans.
But quantum inspired methods might do this too!
Until next time.