Training
A common criticism of neural networks, particularly in robotics, is that they require a large diversity of training samples for real-world operation. This is not surprising, since any learning machine needs sufficient representative examples in order to capture the underlying structure that allows it to generalize to new cases. Dean A. Pomerleau, in his research presented in the paper "Knowledge-Based Training of Artificial Neural Networks for Autonomous Robot Driving," uses a neural network to train a robotic vehicle to drive on multiple types of roads (single lane, multi-lane, dirt, etc.). A large amount of his research is devoted to (1) extrapolating multiple training scenarios from a single training experience, and (2) preserving past training diversity so that the system does not become overtrained (if, for example, it is presented with a series of right turns, it should not learn to always turn right). These issues are common in neural networks that must decide from among a wide variety of responses, but they can be dealt with in several ways, for example by randomly shuffling the training examples, by using a numerical optimization algorithm that does not take too large steps when changing the network connections following an example, or by grouping examples into so-called mini-batches.
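The mitigations listed above (shuffling, small step sizes, mini-batches) can be sketched together in a few lines. The following is a minimal illustration, not Pomerleau's method; the toy linear-regression task, layer sizes, and all variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 256 examples, 4 features, exact linear target (hypothetical setup).
X = rng.normal(size=(256, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true

w = np.zeros(4)     # model weights
lr = 0.05           # a small step size, so no single batch dominates
batch_size = 32     # examples grouped into mini-batches

for epoch in range(200):
    order = rng.permutation(len(X))  # reshuffle so runs of similar examples are spread out
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)  # mean-squared-error gradient
        w -= lr * grad  # one small gradient step per mini-batch

print(np.round(w, 2))  # recovers approximately w_true
```

The key point is that each update sees a different, randomly composed mini-batch and moves the weights only slightly, so a long run of one kind of example (e.g. right turns) cannot pull the weights far in one direction before contrary examples are seen again.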
Theoretical issues
A. K. Dewdney, a mathematician and computer scientist at the University of Western Ontario and former Scientific American columnist, wrote in 1997, "Although neural nets do solve a few toy problems, their powers of computation are so limited that I am surprised anyone takes them seriously as a general problem-solving tool." No neural network has ever been shown to solve computationally difficult problems, such as the n-Queens problem, the traveling salesman problem, or the problem of factoring large integers.
Aside from their utility, a fundamental objection to artificial neural networks is that they fail to reflect how real neurons function. Backpropagation is at the heart of most artificial neural networks, and not only is there no evidence of any such mechanism in natural neural networks,[54] it seems to contradict the fundamental principle of real neurons that information can only flow forward along the axon. How information is coded by real neurons is not yet known. What is known is that sensory neurons fire action potentials more frequently with sensor activation, and muscle cells pull more strongly when their associated motor neurons receive action potentials more frequently.[55] Apart from the simplest case of relaying information from a sensory neuron to a motor neuron, almost nothing is known of the general underlying principles of how information is handled by real neural networks.
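For concreteness, the backpropagation referred to above is simply the chain rule applied backwards through the network: the forward pass sends signals from input to output, while the backward pass sends error derivatives from output to input. A minimal sketch for a one-hidden-layer network, with all sizes and names hypothetical, and a finite-difference check that the hand-derived gradient is correct:

```python
import numpy as np

rng = np.random.default_rng(1)

# One hidden layer with a tanh nonlinearity; squared-error loss.
x = rng.normal(size=3)        # input
W1 = rng.normal(size=(4, 3))  # input -> hidden weights
W2 = rng.normal(size=(1, 4))  # hidden -> output weights
t = np.array([0.5])           # target

# Forward pass: signals flow input to output, as along an axon.
h = np.tanh(W1 @ x)
yhat = W2 @ h
loss = 0.5 * np.sum((yhat - t) ** 2)

# Backward pass: error signals flow output to input -- the step with
# no known counterpart in biological neurons.
d_yhat = yhat - t                      # dL/dyhat
dW2 = np.outer(d_yhat, h)              # dL/dW2
d_h = W2.T @ d_yhat                    # error propagated back to the hidden layer
dW1 = np.outer(d_h * (1 - h ** 2), x)  # tanh'(z) = 1 - tanh(z)^2

# Check one weight's gradient against a finite difference.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
num = (0.5 * np.sum((W2 @ np.tanh(W1p @ x) - t) ** 2) - loss) / eps
print(abs(num - dW1[0, 0]) < 1e-4)  # True
```

The `d_h = W2.T @ d_yhat` line is the biologically contentious step: it requires each hidden unit to know the outgoing weights and the downstream errors, i.e. information travelling in the reverse direction.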
The motivation behind artificial neural networks is not necessarily to replicate real neural function, but to use natural neural networks as an inspiration for an approach to inherently parallel computing that provides solutions to problems that have so far been intractable. A central claim of artificial neural networks is therefore that they embody some new and powerful general principle for processing information. Unfortunately, these general principles are ill-defined, and are often claimed to be emergent from the network itself. This allows simple statistical association (the basic function of artificial neural networks) to be described as learning or recognition. As a result, artificial neural networks have, according to Dewdney, "a something-for-nothing quality, one that imparts a peculiar aura of laziness and a distinct lack of curiosity about just how good these computing systems are. No human hand (or mind) intervenes; solutions are found as if by magic; and no one, it seems, has learned anything."[56]
Hardware issues
Implementing large, efficient neural network software requires considerable processing and storage resources. While the brain has hardware tailored to the task of processing signals through a graph of neurons, simulating even a simplified form on a von Neumann architecture can force a neural network designer to fill many millions of database rows for its connections, which can consume large amounts of RAM and disk space. Furthermore, the designer of a neural network system will often need to devote a remarkable amount of processing power and CPU time to simulate the transmission of signals through many of these connections and their associated neurons.
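A back-of-the-envelope estimate illustrates the storage cost described above. For a single fully connected layer, one weight must be stored per connection, so memory grows with the product of the layer sizes; the layer sizes below are hypothetical:

```python
# Memory for the connection weights of one fully connected layer,
# stored as 32-bit floats (4 bytes each). Layer sizes are hypothetical.
inputs, outputs = 4096, 4096
bytes_per_weight = 4

weights = inputs * outputs        # one weight per connection
mem_bytes = weights * bytes_per_weight
print(weights, mem_bytes / 2**20)  # 16777216 64.0  (16.8M weights, 64 MiB)
```

A single such layer already needs 64 MiB, and a network has many layers; on a von Neumann machine every one of those weights must also be fetched from memory for each signal propagated, which is where the CPU-time cost comes from.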
Jürgen Schmidhuber notes that the resurgence of neural networks in the twenty-first century, and their renewed success in image recognition tasks, is largely attributable to advances in hardware: from 1991 to 2015, computing power, especially as delivered by GPGPUs (on GPUs), increased around a million-fold, making the standard backpropagation algorithm feasible for training networks that are several layers deeper than before (though he adds that this does not overcome algorithmic problems such as the vanishing gradient problem "in a fundamental way"). The use of GPUs instead of ordinary CPUs can bring training times for some networks down from months to mere days.
Computing power continues to grow more or less in accordance with Moore's Law, which can provide sufficient resources to carry out new tasks. Neuromorphic engineering addresses the hardware difficulty directly, by building non-von Neumann chips with circuits designed to implement neural networks from scratch. Google has also designed a chip optimized for neural network processing called the Tensor Processing Unit, or TPU.
Practical counterexamples to criticisms
Arguments against Dewdney's position are that neural networks have been successfully used to solve many complex and diverse tasks, ranging from autonomously flying airplanes to credit card fraud detection.
Technology writer Roger Bridgman commented on Dewdney's statements about neural networks:
Neural networks, for instance, are in the dock not only because they have been hyped to high heaven (what hasn't?), but also because you could create a successful net without understanding how it worked: the bunch of numbers that captures its behaviour would in all probability be "an opaque, unreadable table... of no value as a scientific resource."
In spite of his emphatic declaration that science is not technology, Dewdney seems here to pillory neural networks as bad science when most of those devising them are just trying to be good engineers. An unreadable table that a useful machine could read would still be well worth having.
While it is true that analyzing what has been learned by an artificial neural network is difficult, it is much easier to do so than to analyze what has been learned by a biological neural network. Furthermore, researchers involved in exploring learning algorithms for neural networks are gradually uncovering generic principles that allow a learning machine to be successful. For example, Bengio and LeCun (2007) wrote an article regarding local versus non-local learning, as well as shallow versus deep architecture.
Hybrid approaches
Some other criticism comes from advocates of hybrid models (combining neural networks and symbolic approaches), who believe that a mixture of these two approaches can better capture the mechanisms of the human mind.