Cogprints: No conditions. Results ordered -Date, Title.

A Unified Quantitative Model of Vision and Audition

2014-08-24T21:06:36Z

We have put forwards a unified quantitative framework of vision and audition, based on existing data and theories. According to this model, the retina is a feedforward network self-adaptive to inputs in a specific period. After fully grown, cells become specialized detectors based on statistics of stimulus history. This model has provided explanations for perception mechanisms of colour, shape, depth and motion. Moreover, based on this ground we have put forwards a bold conjecture that single ear can detect sound’s direction. This is complementary to existing theories and has provided better explanations for sound localization.

A Survey on Image Retrieval Methods

2017-02-18T20:30:54Z

The Image retrieval plays a key role in day-to-days world. This work is a review of various references of various image retrieval methods. This paper starts with discussing the working conditions of text based image retrieval then the content-based retrieval: patterns of use, levels, the role of semantics, and the semantic gap. We briefly discuss about various techniques of content based image retrieval such as retrieval by color, shape and the texture and the various algorithms involved in content based image retrieval. Then the semantic based image retrieval aspects are discussed using local content descriptors the regions are segmented and retrieved the semantic regions of image

The emergence of choice: Decision-making and strategic thinking through analogies

2013-11-18T21:03:14Z

Consider the chess game: When faced with a complex scenario, how does understanding arise in one’s mind? How does one integrate disparate cues into a global, meaningful whole? how do humans avoid the combinatorial explosion? How are abstract ideas represented? The purpose of this paper is to propose a new computational model of human chess intuition and intelligence. We suggest that analogies and abstract roles are crucial to solving these landmark problems. We present a proof-of-concept model, in the form of a computational architecture, which may be able to account for many crucial aspects of human intuition, such as (i) concentration of attention to relevant aspects, (ii) how humans may avoid the combinatorial explosion, (iii) perception of similarity at a strategic level, and (iv) a state of meaningful anticipation over how a global scenario may evolve.

How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

2013-05-04T23:24:45Z

Supervised learning from high-dimensional data, e.g., multimedia data, is a challenging task. We propose an extension of slow feature analysis (SFA) for supervised dimensionality reduction called graph-based SFA (GSFA). The algorithm extracts a label-predictive low-dimensional set of features that can be post-processed by typical supervised algorithms to generate the ﬁnal label or class estimation. GSFA is trained with a so-called training graph, in which the vertices are the samples and the edges represent similarities of the corresponding labels. A new weighted SFA optimization problem is introduced, generalizing the notion of slowness from sequences of samples to such training graphs. We show that GSFA computes an optimal solution to this problem in the considered function space, and propose several types of training graphs. For classiﬁcation, the most straightforward graph yields features equivalent to those of (nonlinear) Fisher discriminant analysis. Emphasis is on regression, where four different graphs were evaluated experimentally with a subproblem of face detection on photographs. The method proposed is promising particularly when linear models are insufficient, as well as when feature selection is difficult.

The International Conference on Information and Communication Systems (ICICS 2011)

2011-09-17T17:40:15Z

he International Conference on Information and Communication Systems (ICICS 2011) is a forum for scientists, engineers, and practitioners to present their latest research results, ideas, developments, and applications in all areas of Computer and Information Sciences.

Computing with space: a tangle formalism for chora and difference

2011-05-02T17:10:41Z

What is space computing,simulation, or understanding? Converging from several sources, this seems to be something more primitive than what is meant nowadays by computation, something that was along with us since antiquity (the word "choros", "chora", denotes "space" or "place" and is seemingly the most mysterious notion from Plato, described in Timaeus 48e - 53c) which has to do with cybernetics and with the understanding of the front end visual system. It may have some unexpected applications, also. Here, inspired by Bateson (see Supplementary Material), I explore from the mathematical side the point of view that there is no difference between the map and the territory, but instead the transformation of one into another can be understood by using a formalism of tangle diagrams.

Pattern Recognition by Hierarchical Temporal Memory

2014-02-25T12:49:04Z

Hierarchical Temporal Memory (HTM) is still largely unknown by the pattern recognition community and only a few studies have been published in the scientific literature. This paper reviews HTM architecture and related learning algorithms by using formal notation and pseudocode description. Novel approaches are then proposed to encode coincidence-group membership (fuzzy grouping) and to derive temporal groups (maxstab temporal clustering). Systematic experiments on three line-drawing datasets have been carried out to better understand HTM peculiarities and to extensively compare it against other well-know pattern recognition approaches. Our results prove the effectiveness of the new algorithms introduced and that HTM, even if still in its infancy, compares favorably with other existing technologies.

Affine Registration of label maps in Label Space

2010-05-21T15:01:44Z

Two key aspects of coupled multi-object shape analysis and atlas generation are the choice of representation and subsequent registration methods used to align the sample set. For example, a typical brain image can be labeled into three structures: grey matter, white matter and cerebrospinal fluid. Many manipulations such as interpolation, transformation, smoothing, or registration need to be performed on these images before they can be used in further analysis. Current techniques for such analysis tend to trade off performance between the two tasks, performing well for one task but developing problems when used for the other. This article proposes to use a representation that is both flexible and well suited for both tasks. We propose to map object labels to vertices of a regular simplex, e.g. the unit interval for two labels, a triangle for three labels, a tetrahedron for four labels, etc. This representation, which is routinely used in fuzzy classification, is ideally suited for representing and registering multiple shapes. On closer examination, this representation reveals several desirable properties: algebraic operations may be done directly, label uncertainty is expressed as a weighted mixture of labels (probabilistic interpretation), interpolation is unbiased toward any label or the background, and registration may be performed directly. We demonstrate these properties by using label space in a gradient descent based registration scheme to obtain a probabilistic atlas. While straightforward, this iterative method is very slow, could get stuck in local minima, and depends heavily on the initial conditions. To address these issues, two fast methods are proposed which serve as coarse registration schemes following which the iterative descent method can be used to refine the results. Further, we derive an analytical formulation for direct computation of the "group mean" from the parameters of pairwise registration of all the images in the sample set. We show results on richly labeled 2D and 3D data sets.

Domain Decomposition Based High Performance Parallel Computing

2009-11-14T11:28:38Z

The study deals with the parallelization of finite element based Navier-Stokes codes using domain decomposition and state-ofart sparse direct solvers. There has been significant improvement in the performance of sparse direct solvers. Parallel sparse direct solvers are not found to exhibit good scalability. Hence, the parallelization of sparse direct solvers is done using domain decomposition techniques. A highly efficient sparse direct solver PARDISO is used in this study. The scalability of both Newton and modified Newton algorithms are tested.

Slowness and Sparseness Lead to Place, Head-Direction, and Spatial-View Cells

2007-09-12Z

We present a model for the self-organized formation of place cells, head-direction cells, and spatial-view cells in the hippocampal formation based on unsupervised learning on quasi-natural visual stimuli. The model comprises a hierarchy of Slow Feature Analysis (SFA) nodes, which were recently shown to reproduce many properties of complex cells in the early visual system. The system extracts a distributed grid-like representation of position and orientation, which is transcoded into a localized place-field, head-direction, or view representation, by sparse coding. The type of cells that develops depends solely on the relevant input statistics, i.e., the movement pattern of the simulated animal. The numerical simulations are complemented by a mathematical analysis that allows us to accurately predict the output of the top SFA layer

Learning image components for object recognition

2006-05-25Z

In order to perform object recognition it is necessary to learn representations of the underlying components of images. Such components correspond to objects, object-parts, or features. Non-negative matrix factorisation is a generative model that has been specifically proposed for finding such meaningful representations of image data, through the use of non-negativity constraints on the factors. This article reports on an empirical investigation of the performance of non-negative matrix factorisation algorithms. It is found that such algorithms need to impose additional constraints on the sparseness of the factors in order to successfully deal with occlusion. However, these constraints can themselves result in these algorithms failing to identify image components under certain conditions. In contrast, a recognition model (a competitive learning neural network algorithm) reliably and accurately learns representations of elementary image features without such constraints.

SEPARATING NONLINEAR IMAGE MIXTURES USING A PHYSICAL MODEL TRAINED WITH ICA

2006-06-10Z

This work addresses the separation of real-life nonlinear mixtures of images, which occur when a paper document is scanned and the image from the back page shows through. A physical model of the mixing process, based on the consideration of the halftoning process used to print grayscale images, is presented. The corresponding inverse model is then used to perform image separation. The parameters of the inverse model are optimized through the MISEP technique of nonlinear ICA, which uses an independence criterion based on minimal mutual information. The quality of the separated images is competitive with the one achieved by other techniques, namely by MISEP with a generic MLP-based separation network and by Denoising Source Separation. The separation results show that MISEP is an appropriate technique for training the parameters and that the model fits the mixing process well, although not perfectly. Prospects for improvement of the model are presented.

WAVELET BASED NONLINEAR SEPARATION OF IMAGES

2006-06-10Z

This work addresses a real-life problem corresponding to the separation of the nonlinear mixture of images which arises when we scan a paper document and the image from the back page shows through. The proposed solution consists of a non-iterative procedure that is based on two simple observations: (1) the high frequency content of images is sparse, and (2) the image printed on each side of the paper appears more strongly in the mixture acquired from that side than in the mixture acquired from the opposite side. These ideas had already been used in the context of nonlinear denoising source separation (DSS). However, in that method the degree of separation achieved by applying these ideas was relatively weak, and the separation had to be improved by iterating within the DSS scheme. In this paper the application of these ideas is improved by changing the competition function and the wavelet transform that is used. These improvements allow us to achieve a good separation in one shot, without the need to integrate the process into an iterative DSS scheme. The resulting separation process is both nonlinear and non-local. We present experimental results that show that the method achieves a good separation quality.

Accurate and robust image superresolution by neural processing of local image representations

2005-10-20Z

Image superresolution involves the processing of an image sequence to generate a still image with higher resolution. Classical approaches, such as bayesian MAP methods, require iterative minimization procedures, with high computational costs. Recently, the authors proposed a method to tackle this problem, based on the use of a hybrid MLP-PNN architecture. In this paper, we present a novel superresolution method, based on an evolution of this concept, to incorporate the use of local image models. A neural processing stage receives as input the value of model coefficients on local windows. The data dimension-ality is firstly reduced by application of PCA. An MLP, trained on synthetic se-quences with various amounts of noise, estimates the high-resolution image data. The effect of varying the dimension of the network input space is exam-ined, showing a complex, structured behavior. Quantitative results are presented showing the accuracy and robustness of the proposed method.

Event Prediction and Object Motion Estimation in the Development of Visual Attention

2006-07-16Z

A model of gaze control is describes that includes mechanisms for predictive control using a forward model and event driven expectations of target behavior. The model roughly undergoes stages similar to those of human infants if the influence of the predictive systems is gradually increased.

Learning viewpoint invariant perceptual representations from cluttered images

2006-05-25Z

In order to perform object recognition, it is necessary to form perceptual representations that are sufficiently specific to distinguish between objects, but that are also sufficiently flexible to generalise across changes in location, rotation and scale. A standard method for learning perceptual representations that are invariant to viewpoint is to form temporal associations across image sequences showing object transformations. However, this method requires that individual stimuli are presented in isolation and is therefore unlikely to succeed in real-world applications where multiple objects can co-occur in the visual input. This article proposes a simple modification to the learning method, that can overcome this limitation, and results in more robust learning of invariant representations.

Out in the World: What Did The Robot Hear And See?

2006-07-23Z

Segmentation Stability: a Key Component for Joint Attention

2006-07-23Z

Integrated 2-D Optical Flow Sensor

2004-01-13Z

I present a new focal-plane analog VLSI sensor that estimates optical flow in two visual dimensions. The chip significantly improves previous approaches both with respect to the applied model of optical flow estimation as well as the actual hardware implementation. Its distributed computational architecture consists of an array of locally connected motion units that collectively solve for the unique optimal optical flow estimate. The novel gradient-based motion model assumes visual motion to be translational, smooth and biased. The model guarantees that the estimation problem is computationally well-posed regardless of the visual input. Model parameters can be globally adjusted, leading to a rich output behavior. Varying the smoothness strength, for example, can provide a continuous spectrum of motion estimates, ranging from normal to global optical flow. Unlike approaches that rely on the explicit matching of brightness edges in space or time, the applied gradient-based model assures spatiotemporal continuity on visual information. The non-linear coupling of the individual motion units improves the resulting optical flow estimate because it reduces spatial smoothing across large velocity differences. Extended measurements of a 30x30 array prototype sensor under real-world conditions demonstrate the validity of the model and the robustness and functionality of the implementation.

Blind man’s bluff and the Turing test

2004-03-07Z

It seems plausible that under the conditions of the Turing test, congenitally blind people could nevertheless, with sufficient preparation, successfully represent themselves to remotely located interrogators as sighted. Having never experienced normal visual sensations, the successful blind player can prevail in this test only by playing a ‘lying game’—imitating the phenomenological claims of sighted people, in the absence of the qualitative visual experiences to which such statements purportedly refer. This suggests that a computer or robot might pass the Turing test in the same way, in the absence not only of visual experience, but qualitative consciousness in general. Hence, the standard Turing test does not provide a valid criterion for the presence of consciousness. A ‘sensorimetric’ version of the Turing test fares no better, for the apparent correlations we observe between cognitive functions and qualitative conscious experiences seems to be contingent, not necessary. We must therefore define consciousness not in terms of its causes and effects, but rather, in terms of the distinctive properties of its content, such as its possession of qualitative character and apparent intrinsic value—the property which confers upon consciousness its moral significance. As a means of determining whether or nor a machine is conscious, in this sense, an alternative to the standard Turing test is proposed.

The Effects on Visual Information in a Robot in Environments with Oriented Contours

2005-04-14Z

For several decades experiments have been performed where animals have been reared in environments with orientationally restricted contours. The aim has been to find out what effects the visual field has on the development of the visual system in the brain. In this paper we describe similar experiments performed with a robot acting in an environment with only vertical contours and compare the results with the same robot in an ordinary office environment. Using metric projections of the informational distances between sensors it is shown that all visual sensors in the same vertical column are clustered together in the environment with only vertical contours. We also show how the informational structure of the sensors unfold when the robot moves from the environment with oriented contours to a normal environment.

Feel the beat: using cross-modal rhythm to integrate perception of objects, others, and self

2005-04-14Z

For a robot to be capable of development, it must be able to explore its environment and learn from its experiences. It must find (or create) opportunities to experience the unfamiliar in ways that reveal properties valid beyond the immediate context. In this paper, we develop a novel method for using the rhythm of everyday actions as a basis for identifying the characteristic appearance and sounds associated with objects, people, and the robot itself. Our approach is to identify and segment groups of signals in individual modalities (sight, hearing, and proprioception) based on their rhythmic variation, then to identify and bind causally-related groups of signals across different modalities. By including proprioception as a modality, this cross-modal binding method applies to the robot itself, and we report a series of experiments in which the robot learns about the characteristics of its own body.

An Ontogenetic Model of Perceptual Organization for a developmental Robot

2005-04-14Z

This paper presents an ontogenetic model of self-organization for robotic intermediary vision. Two mechanisms are under concern. First, the development of low-level local feature detectors that perform a piecewise categorization of the sensory signal. Second, the hierarchical grouping of these local features in a holistic perception. While the grouping mechanism is expressed as a classical agglomerative clustering, underlying similarity measures are not pre-given but developed from the signal statistics.

Taking Synchrony Seriously: A Perceptual-Level Model of Infant Synchrony Detection

2005-04-14Z

Synchrony detection between different sensory and/or motor channels appears critically important for young infant learning and cognitive development. For example, empirical studies demonstrate that audio-visual synchrony aids in language acquisition. In this paper we compare these infant studies with a model of synchrony detection based on the Hershey and Movellan (2000) algorithm augmented with methods for quantitative synchrony estimation. Four infant-model comparisons are presented, using audio-visual stimuli of increasing complexity. While infants and the model showed learning or discrimination with each type of stimuli used, the model was most successful with stimuli comprised of one audio and one visual source, and also with two audio sources and a dynamic-face visual motion source. More difficult for the model were stimuli conditions with two motion sources, and more abstract visual dynamics—an oscilloscope instead of a face. Future research should model the developmental pathway of synchrony detection. Normal audio-visual synchrony detection in infants may be experience-dependent (e.g., Bergeson, et al., 2004).

Variations and Application Conditions Of the Data Type »Image« - The Foundation of Computational Visualistics

2006-06-10Z

Few years ago, the department of computer science of the University Magdeburg invented a completely new diploma programme called 'computational visualistics', a curriculum dealing with all aspects of computational pictures. Only isolated aspects had been studied so far in computer science, particularly in the independent domains of computer graphics, image processing, information visualization, and computer vision. So is there indeed a coherent domain of research behind such a curriculum? The answer to that question depends crucially on a data structure that acts as a mediator between general visualistics and computer science: the data structure "image". The present text investigates that data structure, its components, and its application conditions, and thus elaborates the very foundations of computational visualistics as a unique and homogenous field of research. Before concentrating on that data structure, the theory of pictures in general and the definition of pictures as perceptoid signs in particular are closely examined. This includes an act-theoretic consideration about resemblance as the crucial link between image and object, the communicative function of context building as the central concept for comparing pictures and language, and several modes of reflection underlying the relation between image and image user. In the main chapter, the data structure "image" is extendedly analyzed under the perspectives of syntax, semantics, and pragmatics. While syntactic aspects mostly concern image processing, semantic questions form the core of computer graphics and computer vision. Pragmatic considerations are particularly involved with interactive pictures but also extend to the field of information visualization and even to computer art. Four case studies provide practical applications of various aspects of the analysis.

Slow feature analysis yields a rich repertoire of complex cell properties

2003-03-04Z

In this study, we investigate temporal slowness as a learning principle for receptive fields using slow feature analysis, a new algorithm to determine functions that extract slowly varying signals from the input data. We find that the learned functions trained on image sequences develop many properties found also experimentally in complex cells of primary visual cortex, such as direction selectivity, non-orthogonal inhibition, end-inhibition and side-inhibition. Our results demonstrate that a single unsupervised learning principle can account for such a rich repertoire of receptive field properties.

Compact Integrated Transconductance Amplifier Circuit for Temporal Differentiation

2003-11-03Z

A compact integrated CMOS circuit for temporal differentiation is presented. It consists of a high-gain inverting amplifier, an active non-linear transconductance and a capacitor and requires only 4 transistors in its minimal configuration.The circuit provides two rectified current outputs that are proportional to the temporal derivative of the input voltage signal. Besides the compactness of its design, the presented circuit is not dependent on the DC-value of the input signal, as compared with known integrated differentiator circuits. Measured chip results show that the circuit operates on a large input frequency range for which it provides nearideal temporal differentiation. The circuit is particularly suited for focal-plane implementations of gradient-based visual motion systems.

Concept Acquisition Using Isomap on Sensorimotor Experiences of a Mobiole Robot

2004-02-12Z

We present results about the application of a novel method for multi dimensional scaling (Isomap) for concept acquisition in mobile robotics. The aim of this work is to develop a general architecture for Symbol Anchoring in the context of research to enable artefacts to grow-up. We describe Isomap functionality, results of using it in a real robot and breifly discuss implications of using this technique for concept acquisition in the mobile robot domain.

Motivational principles for visual know-how development

2004-02-12Z

What dynamics can enable a robot to continuously develop new visual know-how? We present a first experimental investigation where an AIBO robot develops visual competences from scratch driven only by internal motivations. The motivational principles used by the robot are independent of any particular task. As a consequence, they can constitute the basis for a general approach to sensory-motor development.

Robots, language, and meaning

2004-02-12Z

People use language to exchange ideas and influence the actions of others through shared conceptions of word meanings, and through a shared understanding of how word meanings are combined. Under the surface form of words lie complex networks of mental structures and processes that give rise to the richly textured semantics of natural language. Machines, in contrast, are unable to use language in human-like ways due to fundamental limitations of current computational approaches to semantic representation. To address these limitations, and to serve as a catalyst for exploring alternative approaches to language and meaning, we are developing conversational robots. The problem of endowing robots with language highlights the impossibility of isolating language from other cognitive processes. Instead, we embrace a holistic approach in which various non-linguistic elements of perception, action, and memory, provide the foundations for grounding word meaning. I will review recent results in grounding language in perception and action and sketch ongoing work for grounding a wider range of words including social terms such as "I" and "my".

Sparse visual models for biologically inspired sensorimotor control

2004-02-12Z

Given the importance of using resources efficiently in the competition for survival, it is reasonable to think that natural evolution has discovered efficient cortical coding strategies for representing natural visual information. Sparse representations have intrinsic advantages in terms of fault-tolerance and low-power consumption potential, and can therefore be attractive for robot sensorimotor control with powerful dispositions for decision-making. Inspired by the mammalian brain and its visual ventral pathway, we present in this paper a hierarchical sparse coding network architecture that extracts visual features for use in sensorimotor control. Testing with natural images demonstrates that this sparse coding facilitates processing and learning in subsequent layers. Previous studies have shown how the responses of complex cells could be sparsely represented by a higher-order neural layer. Here we extend sparse coding in each network layer, showing that detailed modeling of earlier stages in the visual pathway enhances the characteristics of the receptive fields developed in subsequent stages. The yield network is more dynamic with richer and more biologically plausible input and output representation.

Visual binding, reentry, and neuronal synchrony in a physically situated brain-based device

2004-02-12Z

By constructing and analyzing a physically situated brain-based device (i.e. a device with sensors and actuators whose behavior is guided by a simulated nervous system), we show that reentrant connectivity and dynamic synchronization can provide an effective mechanism for binding the visual features of objects.

Visual Expectations in Infants: Evaluating the Gaze-Direction Model

2004-02-12Z

Schlesinger (in press) recently proposed a model of eye movements as a tool for investigating infants’ visual expectations. In the present study, this gaze-direction model was evaluated by (a) generating a set of predictions concerning how infants distribute their attention during possible and impossible events, and (b) testing these predictions in a replication of Baillargeon’s "car study" (1986; Baillargeon & DeVos, 1991). We find that the model successfully predicts general features of infants’ gaze direction, but not specific differences obtained during the possible and impossible events. The implications of these results for infant cognition research and theory are discussed.

The Whole World in Your Hand: Active and Interactive Segmentation

2004-02-12Z

Object segmentation is a fundamental problem in computer vision and a powerful resource for development. This paper presents three embodied approaches to the visual segmentation of objects. Each approach to segmentation is aided by the presence of a hand or arm in the proximity of the object to be segmented. The first approach is suitable for a robotic system, where the robot can use its arm to evoke object motion. The second method operates on a wearable system, viewing the world from a human's perspective, with instrumentation to help detect and segment objects that are held in the wearer's hand. The third method operates when observing a human teacher, locating periodic motion (finger/arm/object waving or tapping) and using it as a seed for segmentation. We show that object segmentation can serve as a key resource for development by demonstrating methods that exploit high-quality object segmentations to develop both low-level vision capabilities (specialized feature detectors) and high-level vision capabilities (object recognition and localization).

Applying Slow Feature Analysis to Image Sequences Yields a Rich Repertoire of Complex Cell Properties

2003-01-09Z

We apply Slow Feature Analysis (SFA) to image sequences generated from natural images using a range of spatial transformations. An analysis of the resulting receptive fields shows that they have a rich spectrum of invariances and share many properties with complex and hypercomplex cells of the primary visual cortex. Furthermore, the dependence of the solutions on the statistics of the transformations is investigated.

Better Vision Through Manipulation

2003-10-04Z

For the purposes of manipulation, we would like to know what parts of the environment are physically coherent ensembles - that is, which parts will move together, and which are more or less independent. It takes a great deal of experience before this judgement can be made from purely visual information. This paper develops active strategies for acquiring that experience through experimental manipulation, using tight correlations between arm motion and optic flow to detect both the arm itself and the boundaries of objects with which it comes into contact. We argue that following causal chains of events out from the robot's body into the environment allows for a very natural developmental progression of visual competence, and relate this idea to results in neuroscience.

An improved 2D optical flow sensor for motion segmentation

2002-07-18Z

A functional focal-plane implementation of a 2D optical flow system is presented that detects an preserves motion discontinuities. The system is composed of two different network layers of analog computational units arranged in a retinotopical order. The units in the first layer (the optical flow network) estimate the local optical flow field in two visual dimensions, where the strength of their nearest-neighbor connections determines the amount of motion integration. Whereas in an earlier implementation \cite{Stocker_Douglas99} the connection strength was set constant in the complete image space, it is now \emph{dynamically and locally} controlled by the second network layer (the motion discontinuities network) that is recurrently connected to the optical flow network. The connection strengths in the optical flow network are modulated such that visual motion integration is ideally only facilitated within image areas that are likely to represent common motion sources. Results of an experimental aVLSI chip illustrate the potential of the approach and its functionality under real-world conditions.

Probabilistic Search for Object Segmentation and Recognition

2002-08-09Z

The problem of searching for a model-based scene interpretation is analyzed within a probabilistic framework. Object models are formulated as generative models for range data of the scene. A new statistical criterion, the truncated object probability, is introduced to infer an optimal sequence of object hypotheses to be evaluated for their match to the data. The truncated probability is partly determined by prior knowledge of the objects and partly learned from data. Some experiments on sequence quality and object segmentation and recognition from stereo data are presented. The article recovers classic concepts from object recognition (grouping, geometric hashing, alignment) from the probabilistic perspective and adds insight into the optimal ordering of object hypotheses for evaluation. Moreover, it introduces point-relation densities, a key component of the truncated probability, as statistical models of local surface shape.

Humanoid Theory Grounding

2001-11-27Z

In this paper we consider the importance of using a humanoid physical form for a certain proposed kind of robotics, that of theory grounding. Theory grounding involves grounding the theory skills and knowledge of an embodied artificially intelligent (AI) system by developing theory skills and knowledge from the bottom up. Theory grounding can potentially occur in a variety of domains, and the particular domain considered here is that of language. Language is taken to be another problem space in which a system can explore and discover solutions. We argue that because theory grounding necessitates robots experiencing domain information, certain behavioral-form aspects, such as abilities to socially smile, point, follow gaze, and generate manual gestures, are necessary for robots grounding a humanoid theory of language.

The theory of the organism-environment system: III. Role of efferent influences on receptors in the formation of knowledge.

2000-02-09Z

The present article is an attempt to give - in the frame of the theory of the organism-environment system (Jarvilehto 1998a) - a new interpretation to the role of efferent influences on receptor activity and to the functions of senses in the formation of knowledge. It is argued, on the basis of experimental evidence and theoretical considerations, that the senses are not transmitters of environmental information, but they create a direct connection between the organism and the environment, which makes the development of a dynamic living system, the organism-environment system, possible. In this connection process the efferent influences on receptor activity are of particular significance, because with their help the receptors may be adjusted in relation to the parts of the environment which are most important in the achievement of behavioral results. Perception is the process of joining of new parts of the environment to the organism-environment system; thus, the formation of knowledge by perception is based on reorganization (widening and differentiation) of the organism-environment system, and not on transmission of information from the environment. With the help of the efferent influences on receptors each organism creates its own peculiar world which is simultaneously subjective and objective. The present considerations have far reaching influences as well on experimental work in neurophysiology and psychology of perception as on philosophical considerations of knowledge formation.

Facial beauty and fractal geometry

1998-06-18Z

What is it that makes a face beautiful? Average faces obtained by photographic (Galton 1878) or digital (Langlois & Roggman 1990) blending are judged attractive but not optimally attractive (Alley & Cunningham 1991) --- digital exaggerations of deviations from average face blends can lead to higher attractiveness ratings (Perrett, May, & Yoshikawa 1994). My novel approach to face design does not involve blending at all. Instead, the image of a female face with high ratings is composed from a fractal geometry based on rotated squares and powers of two. The corresponding geometric rules are more specific than those previously used by artists such as Leonardo and Duerer. They yield a short algorithmic description of all facial characteristics, many of which are compactly encodable with the help of simple feature detectors similar to those found in mammalian brains. This suggests that a face's beauty correlates with simplicity relative to the subjective observer's way of encoding it.

What sort of architecture is required for a human-like agent?

1998-06-22Z

This paper is about how to give human-like powers to complete agents. For this the most important design choice concerns the overall architecture. Questions regarding detailed mechanisms, forms of representations, inference capabilities, knowledge etc. are best addressed in the context of a global architecture in which different design decisions need to be linked. Such a design would assemble various kinds of functionality into a complete coherent working system, in which there are many concurrent, partly independent, partly mutually supportive, partly potentially incompatible processes, addressing a multitude of issues on different time scales, including asynchronous, concurrent, motive generators. Designing human like agents is part of the more general problem of understanding design space, niche space and their interrelations, for, in the abstract, there is no one optimal design, as biological diversity on earth shows.

The evolution of what?

1998-06-22Z

There is now a huge amount of interest in consciousness among scientists as well as philosophers, yet there is so much confusion and ambiguity in the claims and counter-claims that it is hard to tell whether any progress is being made. This ``position paper'' suggests that we can make progress by temporarily putting to one side questions about what consciousness is or which animals or machines have it or how it evolved. Instead we should focus on questions about the sorts of architectures that are possible for behaving systems and ask what sorts of capabilities, states and processes, might be supported by different sorts of architectures. We can then ask which organisms and machines have which sorts of architectures. This combines the standpoint of philosopher, biologist and engineer. If we can find a general theory of the variety of possible architectures (a characterisation of ``design space'') and the variety of environments, tasks and roles to which such architectures are well suited (a characterisation of ``niche space'') we may be able to use such a theory as a basis for formulating new more precisely defined concepts with which to articulate less ambiguous questions about the space of possible minds. For instance our initially ill-defined concept (``consciousness'') might split into a collection of more precisely defined concepts which can be used to ask unambiguous questions with definite answers. As a first step this paper explores a collection of conjectures regarding architectures and their evolution. In particular we explore architectures involving a combination of coexisting architectural levels including: (a) reactive mechanisms which evolved very early, (b) deliberative mechanisms which evolved later in response to pressures on information processing resources and (c) meta-management mechanisms that can explicitly inspect evaluate and modify some of the contents of various internal information structures. It is conjectured that in response to the needs of these layers, perceptual and action subsystems also developed layers, and also that an ``alarm'' system which initially existed only within the reactive layer may have become increasingly sophisticated and extensive as its inputs and outputs were linked to the newer layers. Processes involving the meta-management layer in the architecture could explain the origin of the notion of ``qualia''. Processes involving the ``alarm'' mechanism and mechanisms concerned with resource limits in the second and third layers gives us an explanation of three main forms of emotion, helping to account for some of the ambiguities which have bedevilled the study of emotion. Further theoretical and practical benefits may come from further work based on this design-based approach to consciousness. A deeper longer term implication is the possibility of a new science investigating laws governing possible trajectories in design space and niche space, as these form parts of high order feedback loops in the biosphere.

Neural model of transfer-of-binding in visual relative motion perception.

1998-04-28Z

A new way of measuring generalization in unsupervised learning is presented. The measure is based on an exclusive allocation, or credit assignment, criterion. In a classifier that satisfies the criterion, input patterns are parsed so that the credit for each input feature is assigned exclusively to one of multiple, possibly overlapping, output categories. Such a classifier achieves context-sensitive, global representations of pattern data. Two additional constraints, sequence masking and uncertainty multiplexing, are described; these can be used to refine the measure of generalization. The generalization performance of EXIN networks, winner-take-all competitive learning networks, linear decorrelator networks, and Nigrin's SONNET-2 network is compared.

Computation of Smooth Optical Flow in a Feedback Connected Analog Network

1999-04-26Z

In 1986, Tanner and Mead \cite{Tanner_Mead86} implemented an interesting constraint satisfaction circuit for global motion sensing in aVLSI. We report here a new and improved aVLSI implementation that provides smooth optical flow as well as global motion in a two dimensional visual field. The computation of optical flow is an ill-posed problem, which expresses itself as the aperture problem. However, the optical flow can be estimated by the use of regularization methods, in which additional constraints are introduced in terms of a global energy functional that must be minimized. We show how the algorithmic constraints of Horn and Schunck \cite{Horn_Schunck81} on computing smooth optical flow can be mapped onto the physical constraints of an equivalent electronic network.

Neural networks as a model for visual perception: what is lacking?

1998-08-05Z

A central mystery of visual perception is the classical problem of invariant object recognition: Different appearances of an object can be perceived as ``the same'', despite, e.g., changes in position or illumination, distortions, or partial occlusion by other objects. This article reports on a recent email discussion over the question whether a neural network can learn the simplest of these invariances, i.e. generalize over the position of a pattern on the input layer, including the author's view on what ``learning shift-invariance'' could mean. That definition leaves the problem unsolved. A similar problem is the one of learning to detect symmetries present in an input pattern. It has been solved by a standard neural network requiring some 70000 input examples. Both leave some doubt if backpropagation learning is a realistic model for perceptual processes. Abandoning the view that a stimulus-response system showing the desired behavior must be learned from scratch, yields a radically different solution. Perception can be seen as an active process that rapidly converges from some initial state to an ordered state, which in itself codes for a percept. As an example, I will present a solution to the visual correspondence problem, which greatly alleviates both problems mentioned above.

Object recognition by matching symbolic edge graphs

1998-07-31Z

We present an object recognition system based on symbolic graphs with object corners as vertices and outlines as edges. Corners are determined in a robust way by a multiscale combination of an operator modeling cortical end-stopped cells. Graphs are constructed by line-following between corners. Model matching is then done by finding subgraph isomorphisms in the image graph. The complexity is reduced by adding labels to corners and edges. The choice of labels makes the recognition system invariant under translation, rotation, and scaling.

Object Recognition Using Spatiotemporal Signatures

1998-06-15Z

The sequence of images generated by motion between observer and object specifies a spatiotemporal signature for that object. Evidence is presented that such spatiotemporal signatures are used in object recognition. Subjects learned novel, three-dimensional, rotating objects from image sequences in a continuous recognition task. During learning, the temporal order of images of a given object was constant. During testing, the order of images in each sequence was reversed, relative to its order during learning. This image sequence reversal produced significant reaction time increases and recognition rate decreases. Results are interpreted in terms of object-specific spatiotemporal signatures.

The ``Semantics'' of Evolution: Trajectories and Trade-offs in Design Space and Niche Space

1998-06-22Z

This paper attempts to characterise a unifying overview of the practice of software engineers, AI designers, developers of evolutionary forms of computation, designers of adaptive systems, etc. The topic overlaps with theoretical biology, developmental psychology and perhaps some aspects of social theory. Just as much of theoretical computer science follows the lead of engineering intuitions and tries to formalise them, there are also some important emerging high level cross disciplinary ideas about natural information processing architectures and evolutionary mechanisms and that can perhaps be unified and formalised in the future. There is some speculation about the evolution of human cognitive architectures and consciousness.

The ``Semantics'' of Evolution: Trajectories and Trade-offs in Design Space and Niche Space.

1998-07-09Z

Coarse blobs or fine edges? Evidence that information diagnosticity changes the perception of complex visual stimuli

1998-06-24Z

Efficient categorizations of complex visual stimuli require effective encodings of their distinctive properties. However, the question remains of how processes of object and scene categorization use the information associated with different perceptual spatial scales. The psychophysics of scale perception suggests a scenario in which recognition uses coarse blobs before fine scale edges, because the former is perceptually available before the latter. Although possible, this perceptually determined scenario neglects the nature of the task the recognition system must solve. If different spatial scales transmit different information about the input, an identical scene might be flexibly encoded and perceived at the scale that optimizes information for the considered task. This paper tests the hypothesis that scale diagnosticity can determine scale selection for recognition. Experiment 1 tested whether coarse and fine spatial scales were both available at the onset of scene categorization. The second experiment tested that the selection of one scale could change depending on the diagnostic information present at this scale. The third and fourth experiments investigated whether scale-specific cues were independently processed, or whether they perceptually cooperated in the recognition of the input scene. Results suggest that a mandatory perception of multiple spatial scales promotes flexible scene encodings, perceptions and categorizations.

Context dependent feature groups, a proposal for object representation.

1998-07-31Z

The usefulness of contextually guided processors is investigated a little further. A more general use for binding V1 cell responses than the one in the target article is proposed, which takes into account that strong responses of these cells can mean more than the presence of lines and edges. The possibility for different grouping depending on the activities of neighboring cells is essential for the approach.

Corner detection in color images by multiscale combination of end-stopped cortical cells.

1998-07-31Z

We present a corner-detection algorithm based on a model for end-stopping cells in the visual cortex. Shortcomings of this model are overcome by a combination over several scales. The notion of an end-stopped cell and the resulting corner detector is generalized to color channels in a biologically plausible way. The resulting corner detection method yields good results in the presence of high frequency texture, noise, varying contrast, and rounded corners. This compares favorably with known corner detectors.

Modeling dynamic receptive field changes in primary visual cortex using inhibitory learning

1998-04-28Z

The position, size, and shape of the visual receptive field (RF) of some primary visual cortical neurons change dynamically, in response to artificial scotoma conditioning in cats (Pettet & Gilbert, 1992) and to retinal lesions in cats and monkeys (Darian-Smith & Gilbert, 1995). The "EXIN" learning rules (Marshall, 1995) are used to model dynamic RF changes. The EXIN model is compared with an adaptation model (Xing & Gerstein, 1994) and the LISSOM model (Sirosh & Miikkulainen, 1994; Sirosh et al., 1996). To emphasize the role of the lateral inhibitory learning rules, the EXIN and the LISSOM simulations were done with only lateral inhibitory learning. During scotoma conditioning, the EXIN model without feedforward learning produces centrifugal expansion of RFs initially inside the scotoma region, accompanied by increased responsiveness, without changes in spontaneous activation. The EXIN model without feedforward learning is more consistent with the neurophysiological data than are the adaptation model and the LISSOM model. The comparison between the EXIN and the LISSOM models suggests experiments to determine the role of feedforward excitatory and lateral inhibitory learning in producing dynamic RF changes during scotoma conditioning.

Object Recognition Robust Under Translations, Deformations, and Changes in Background.

1998-07-24Z

Recognition systems based on model matching using low level features often fail due to a variation in background. As a solution I present a system for the recognition of human faces independent of hairstyle. Correspondence maps between an image and a model are established by coarse-fine matching in a Gabor pyramid. These are used for hierarchical recognition.

Self-Organization, Plasticity, and Low-level Visual Phenomena in a Laterally Connected Map Model of the Primary Visual Cortex

1999-01-03Z

Based on a Hebbian adaptation process, the afferent and lateral connections in the RF-LISSOM model organize simultaneously and cooperatively, and form structures such as those observed in the primary visual cortex. The neurons in the model develop local receptive fields that are organized into orientation, ocular dominance, and size selectivity columns. At the same time, patterned lateral connections form between neurons that follow the receptive field organization. This structure is in a continuously-adapting dynamic equilibrium with the external and intrinsic input, and can account for reorganization of the adult cortex following retinal and cortical lesions. The same learning processes may be responsible for a number of low-level functional phenomena such as tilt aftereffects, and combined with the leaky integrator model of the spiking neuron, for segmentation and binding. The model can also be used to verify quantitatively the hypothesis that the visual cortex forms a sparse, redundancy-reduced encoding of the input, which allows it to process massive amounts of visual information efficiently.

Teaching a robot to see how it moves

1998-07-03Z

The positioning of a robot hand in order to grasp an object is a problem fundamental to robotics. The task we want to perform can be described as follows: given a visual scene the robot arm must reach an indicated point in that visual scene. This marked point indicates the observed object that has to be grasped. In order to accomplish this task, a mapping from the visual scene to the corresponding robot joint values must be available. The task set out in this chapter is to design a self-learning controller that constructs that mapping without knowledge of the geometry of the camera-robot system.

Visual feedback in motion

1998-07-03Z

In this chapter we introduce a method for model-free monocular visual guidance of a robot arm. The robot arm, with a single camera in its end effector, should be positioned above a stationary target. It is shown that a trajectory can be planned in visual space by using components of the optic flow, and this trajector can be translated to joint torques by a self-learning neural network. No model of the robot, camera, or environment is used. The method reaches a high grasping accuracy after only a few trials.

Visual Schemas in Neural Networks for Object Recognition and Scene Analysis

1999-01-03Z

VISOR is a large connectionist system that shows how visual schemas can be learned, represented, and used through mechanisms natural to neural networks. Processing in VISOR is based on cooperation, competition, and parallel bottom-up and top-down activation of schema representations. Simulations show that VISOR is robust against noise and variations in the inputs and parameters. It can indicate the confidence of its analysis, pay attention to important minor differences, and use context to recognize ambiguous objects. Experiments also suggest that the representation and learning are stable, and its behavior is consistent with human processes such as priming, perceptual reversal, and circular reaction in learning. The schema mechanisms of VISOR can serve as a starting point for building robust high-level vision systems, and perhaps for schema-based motor control and natural language processing systems as well.

Neural model of visual stereomatching: Slant, transparency, and clouds

1998-04-28Z

Stereomatching of oblique and transparent surfaces is described using a model of cortical binocular "tuned" neurons selective for disparities of individual visual features and neurons selective for the position, depth, and 3-D orientation of local surface patches. The model is based on a simple set of learning rules. In the model, monocular neurons project excitatory connection pathways to binocular neurons at appropriate disparities. Binocular neurons project excitatory connection pathways to appropriately tuned "surface patch" neurons. The surface patch neurons project reciprocal excitatory connection pathways to the binocular neurons. Anisotropic intralayer inhibitory connection pathways project between neurons with overlapping receptive fields. The model's responses to simulated stereo image pairs depicting a variety of oblique surfaces and transparently overlaid surfaces are presented. For all the surfaces, the model (1) assigns disparity matches and surface patch representations based on global surface coherence and uniqueness, (2) permits coactivation of neurons representing multiple disparities within the same image location, (3) represents oblique slanted and tilted surfaces directly, rather than approximating them with a series of frontoparallel steps, (4) assigns disparities to a cloud of points at random depths, like human observers and unlike Prazdny's (1985) method, and (5) causes globally consistent matches to override greedy local matches. The model represents transparency, unlike the model of Marr and Poggio (1976), and it assigns unique disparities, unlike the model of Prazdny (1985).

Occlusion edge blur: A cue to relative visual depth

1998-04-28Z

We studied whether the blur/sharpness of an occlusion boundary between a sharply focused surface and a blurred surface is used as a relative depth cue. Observers judged relative depth in pairs of images that differed only in the blurriness of the common boundary between two adjoining texture regions, one blurred and one sharply focused. Two experiments were conducted; in both, observers consistently used the blur of the boundary as a cue to relative depth. However, the strength of the cue, relative to other cues, varied across observers. The occlusion edge blur cue can resolve the near/far ambiguity inherent in depth-from-focus computations.

Actual Possibilities

1998-07-18Z

This is a philosophical `position paper', starting from the observation that we have an intuitive grasp of a family of related concepts of ``possibility'', ``causation'' and ``constraint'' which we often use in thinking about complex mechanisms, and perhaps also in perceptual processes, which according to Gibson are primarily concerned with detecting positive and negative affordances, such as support, obstruction, graspability, etc. We are able to talk about, think about, and perceive possibilities, such as possible shapes, possible pressures, possible motions, and also risks, opportunities and dangers. We can also think about constraints linking such possibilities. If such abilities are useful to us (and perhaps other animals) they may be equally useful to intelligent artefacts. All this bears on a collection of different more technical topics, including modal logic, constraint analysis, qualitative reasoning, naive physics, the analysis of functionality, and the modelling design processes. The paper suggests that our ability to use knowledge about ``de-re'' modality is more primitive than the ability to use ``de-dicto'' modalities, in which modal operators are applied to sentences. The paper explores these ideas, links them to notions of ``causation'' and ``machine'', suggests that they are applicable to virtual or abstract machines as well as physical machines. Some conclusions are drawn regarding the nature of mind and consciousness.

Learning to predict visibility and invisibility from occlusion events

1998-04-28Z

Visual occlusion events constitute a major source of depth information. This paper presents a self-organizing neural network that learns to detect, represent, and predict the visibility and invisibility relationships that arise during occlusion events, after a period of exposure to motion sequences containing occlusion and disocclusion events. The network develops two parallel opponent channels or "chains" of lateral excitatory connections for every resolvable motion trajectory. One channel, the "On" chain or "visible" chain, is activated when a moving stimulus is visible. The other channel, the "Off" chain or "invisible" chain, carries a persistent, amodal representation that predicts the motion of a formerly visible stimulus that becomes invisible due to occlusion. The learning rule uses disinhibition from the On chain to trigger learning in the Off chain. The On and Off chain neurons can learn separate associations with object depth ordering. The results are closely related to the recent discovery (Assad & Maunsell, 1995) of neurons in macaque monkey posterior parietal cortex that respond selectively to inferred motion of invisible stimuli.

Uncalibrated visual servoing

2002-08-08Z

Visual servoing is a process to enable a robot to position a camera with respect to known landmarks using the visual data obtained by the camera itself to guide camera motion. A solution is described which requires very little a priori information freeing it from being specific to a particular configuration of robot and camera. The solution is based on closed loop control together with deliberate perturbations of the trajectory to provide calibration movements for refining that trajectory. Results from experiments in simulation and on a physical robot arm (camera-in-hand configuration) are presented.

Adaptive perceptual pattern recognition by self-organizing neural networks: Context, uncertainty, multiplicity, and scale

1998-04-28Z

A new context-sensitive neural network, called an "EXIN" (excitatory+ inhibitory) network, is described. EXIN networks self-organize in complex perceptual environments, in the presence of multiple superimposed patterns, multiple scales, and uncertainty. The networks use a new inhibitory learning rule, in addition to an excitatory learning rule, to allow superposition of multiple simultaneous neural activations (multiple winners), under strictly regulated circumstances, instead of forcing winner-take-all pattern classifications. The multiple activations represent uncertainty or multiplicity in perception and pattern recognition. Perceptual scission (breaking of linkages) between independent category groupings thus arises and allows effective global context-sensitive segmentation constraint satisfaction, and exclusive credit attribution. A Weber Law neuron-growth rule lets the network learn and classify input patterns despite variations in their spatial scale. Applications of the new techniques include segmentation of superimposed auditory or biosonar signals, segmentation of visual regions, and representation of visual transparency.

Face Recognition and Gender Determination

2001-05-09Z

The system presented here is a specialized version of a general object recognition system. Images of faces are represented as graphs, labeled with topographical information and local templates. Different poses are represented by different graphs. New graphs of faces are generated by an elastic graph matching procedure comparing the new face with a set of precomputed graphs: the "general face knowledge". The final phase of the matching process can be used to generate composite images of faces and to determine certain features represented in the general face knowledge, such as gender or the presence of glasses or a beard. The graphs can be compared by a similarity function which makes the system efficient in recognizing faces.

A self-organizing neural network that learns to detect and represent visual depth from occlusion events

1998-04-28Z

Visual occlusion events constitute a major source of depth information. We have developed a neural network model that learns to detect and represent depth relations, after a period of exposure to motion sequences containing occlusion and disocclusion events. The network's learning is governed by a new set of learning and activation rules. The network develops two parallel opponent channels or ``chains'' of lateral excitatory connections for every resolvable motion trajectory. One channel, the ``On'' chain or ``visible'' chain, is activated when a moving stimulus is visible. The other channel, the ``Off'' chain or ``invisible'' chain, is activated when a formerly visible stimulus becomes invisible due to occlusion. The On chain carries a predictive modal representation of the visible stimulus. The Off chain carries a persistent, amodal representation that predicts the motion of the invisible stimulus. The new learning rule uses disinhibitory signals emitted from the On chain to trigger learning in the Off chain. The Off chain neurons learn to interact reciprocally with other neurons that indicate the presence of occluders. The interactions let the network predict the disappearance and reappearance of stimuli moving behind occluders, and they let the unexpected disappearance or appearance of stimuli excite the representation of an inferred occluder at that location. Two results that have emerged from this research suggest how visual systems may learn to represent visual depth information. First, a visual system can learn a nonmetric representation of the depth relations arising from occlusion events. Second, parallel opponent On and Off channels that represent both modal and amodal stimuli can also be learned through the same proce

Unsmearing visual motion: Development of long-range horizontal intrinsic connections

1998-04-28Z

Human vision systems integrate information nonlocally, across long spatial ranges. For example, a moving stimulus appears smeared when viewed briefly (30 ms), yet sharp when viewed for a longer exposure (100 ms) (Burr, 1980). This suggests that visual systems combine information along a trajectory that matches the motion of the stimulus. Our self-organizing neural network model shows how developmental exposure to moving stimuli can direct the formation of horizontal trajectory-specific motion integration pathways that unsmear representations of moving stimuli. These results account for Burr's data and can potentially also model other phenomena, such as visual inertia.