[10050010] |Artificial neural network
[10050020] |An '''artificial neural network (ANN)''', often just called a "neural network" (NN), is a [[mathematical model]] or [[computational model]] based on [[biological neural networks]].
[10050030] |It consists of an interconnected group of [[artificial neuron]]s and processes information using a [[connectionism|connectionist]] approach to [[computation]].
[10050040] |In most cases an ANN is an [[adaptive system]] that changes its structure based on external or internal information that flows through the network during the learning phase.
[10050050] |In more practical terms neural networks are [[non-linear]] [[statistical]] [[data modeling]] tools.
[10050060] |They can be used to model complex relationships between inputs and outputs or to [[Pattern recognition|find patterns]] in data.
[10050070] |==Background==
[10050080] |There is no precise agreed-upon definition among researchers as to what a [[neural network]] is, but most would agree that it involves a network of simple processing elements ([[artificial neuron|neurons]]), which can exhibit complex global behavior, determined by the connections between the processing elements and element parameters.
[10050090] |The original inspiration for the technique was from examination of the [[central nervous system]] and the neurons (and their [[axons]], [[dendrites]] and [[synapses]]) which constitute one of its most significant information processing elements (see [[Neuroscience]]).
[10050100] |In a neural network model, simple [[Node (neural networks)|nodes]] (called variously "neurons", "neurodes", "PEs" ("processing elements") or "units") are connected together to form a network of nodes — hence the term "neural network."
[10050110] |While a neural network does not have to be adaptive per se, its practical use comes with algorithms designed to alter the strength (weights) of the connections in the network to produce a desired signal flow.
[10050120] |These networks are also similar to the [[biological neural networks]] in the sense that functions are performed collectively and in parallel by the units, rather than there being a clear delineation of subtasks to which various units are assigned (see also [[connectionism]]).
[10050130] |Currently, the term Artificial Neural Network (ANN) tends to refer mostly to neural network models employed in [[statistics]], [[cognitive psychology]] and [[artificial intelligence]].
[10050140] |[[Neural network]] models designed with emulation of the [[central nervous system]] (CNS) in mind are a subject of [[theoretical neuroscience]] ([[computational neuroscience]]).
[10050150] |In modern [[Neural network software|software implementations]] of artificial neural networks the approach inspired by biology has more or less been abandoned for a more practical approach based on statistics and signal processing.
[10050160] |In some of these systems neural networks, or parts of neural networks (such as [[artificial neuron]]s) are used as components in larger systems that combine both adaptive and non-adaptive elements.
[10050170] |While the more general approach of such [[adaptive systems]] is more suitable for real-world problem solving, it has far less to do with the traditional artificial intelligence connectionist models.
[10050180] |What they do, however, have in common is the principle of non-linear, distributed, parallel and local processing and adaptation.
[10050190] |===Models===
[10050200] |Neural network models in artificial intelligence are usually referred to as artificial neural networks (ANNs); these are essentially simple mathematical models defining a function .
[10050210] |Each type of ANN model corresponds to a ''class'' of such functions.
[10050220] |====The ''network'' in ''artificial neural network''====
[10050230] |The word ''network'' in the term 'artificial neural network' arises because the function is defined as a composition of other functions , which can further be defined as a composition of other functions.
[10050240] |This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables.
[10050250] |A widely used type of composition is the ''nonlinear weighted sum'', where , where is some predefined function, such as the [[hyperbolic tangent]].
[10050260] |It will be convenient for the following to refer to a collection of functions as simply a vector .
[10050270] |This figure depicts such a decomposition of , with dependencies between variables indicated by arrows.
[10050280] |These can be interpreted in two ways.
[10050290] |The first view is the functional view: the input is transformed into a 3-dimensional vector , which is then transformed into a 2-dimensional vector , which is finally transformed into .
[10050300] |This view is most commonly encountered in the context of [[Optimization (mathematics)|optimization]].
[10050310] |The second view is the probabilistic view: the [[random variable]] depends upon the random variable , which depends upon , which depends upon the random variable .
[10050320] |This view is most commonly encountered in the context of [[graphical models]].
[10050330] |The two views are largely equivalent.
[10050340] |In either case, for this particular network architecture, the components of individual layers are independent of each other (e.g., the components of are independent of each other given their input ).
[10050350] |This naturally enables a degree of parallelism in the implementation.
[10050360] |Networks such as the previous one are commonly called [[feedforward]], because their graph is a [[directed acyclic graph]].
[10050370] |Networks with [[path (graph theory)|cycles]] are commonly called [[Recurrent_neural_network|recurrent]].
[10050380] |Such networks are commonly depicted in the manner shown at the top of the figure, where is shown as being dependent upon itself.
[10050390] |However, there is an implied temporal dependence which is not shown.
[10050400] |What this actually means in practice is that the value of at some point in time depends upon the values of at zero or at one or more other points in time.
[10050410] |The graphical model at the bottom of the figure illustrates the case: the value of at time only depends upon its last value.
[10050420] |===Learning===
[10050430] |However interesting such functions may be in themselves, what has attracted the most interest in neural networks is the possibility of ''learning'', which in practice means the following:
[10050440] |Given a specific ''task'' to solve, and a ''class'' of functions , learning means using a set of ''observations'', in order to find which solves the task in an ''optimal sense''.
[10050450] |This entails defining a [[cost function]] such that, for the optimal solution , (no solution has a cost less than the cost of the optimal solution).
[10050460] |The [[cost function]] is an important concept in learning, as it is a measure of how far away we are from an optimal solution to the problem that we want to solve.
[10050470] |Learning algorithms search through the solution space in order to find a function that has the smallest possible cost.
[10050480] |For applications where the solution is dependent on some data, the cost must necessarily be a ''function of the observations'', otherwise we would not be modelling anything related to the data.
[10050490] |It is frequently defined as a [[statistic]] to which only approximations can be made.
[10050500] |As a simple example consider the problem of finding the model which minimizes , for data pairs drawn from some distribution .
[10050510] |In practical situations we would only have samples from and thus, for the above example, we would only minimize .
[10050520] |Thus, the cost is minimized over a sample of the data rather than the true data distribution.
[10050530] |When some form of online learning must be used, where the cost is partially minimized as each new example is seen.
[10050540] |While online learning is often used when is fixed, it is most useful in the case where the distribution changes slowly over time.
[10050550] |In neural network methods, some form of online learning is frequently also used for finite datasets.
[10050560] |====Choosing a cost function====
[10050570] |While it is possible to arbitrarily define some [[ad hoc]] cost function, frequently a particular cost will be used either because it has desirable properties (such as convexity) or because it arises naturally from a particular formulation of the problem (i.e., In a probabilistic formulation the posterior probability of the model can be used as an inverse cost).
[10050580] |'''Ultimately, the cost function will depend on the task we wish to perform'''.
[10050590] |The three main categories of learning tasks are overviewed below.
[10050600] |===Learning paradigms===
[10050610] |There are three major learning paradigms, each corresponding to a particular abstract learning task.
[10050620] |These are [[supervised learning]], [[unsupervised learning]] and [[reinforcement learning]].
[10050630] |Usually any given type of network architecture can be employed in any of those tasks.
[10050640] |====Supervised learning====
[10050650] |In [[supervised learning]], we are given a set of example pairs and the aim is to find a function in the allowed class of functions that matches the examples.
[10050660] |In other words, we wish to ''infer'' the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data and it implicitly contains prior knowledge about the problem domain.
[10050670] |A commonly used cost is the [[mean-squared error]] which tries to minimize the average error between the network's output, f(x), and the target value y over all the example pairs.
[10050680] |When one tries to minimise this cost using [[gradient descent]] for the class of neural networks called [[Multilayer perceptron|Multi-Layer Perceptrons]], one obtains the common and well-known [[Backpropagation|backpropagation algorithm]] for training neural networks.
[10050690] |Tasks that fall within the paradigm of supervised learning are [[pattern recognition]] (also known as classification) and [[Regression analysis|regression]] (also known as function approximation).
[10050700] |The supervised learning paradigm is also applicable to sequential data (e.g., for speech and gesture recognition).
[10050710] |This can be thought of as learning with a "teacher," in the form of a function that provides continuous feedback on the quality of solutions obtained thus far.
[10050720] |====Unsupervised learning====
[10050730] |In [[unsupervised learning]] we are given some data , and the cost function to be minimized can be any function of the data and the network's output, .
[10050740] |The cost function is dependent on the task (what we are trying to model) and our ''a priori'' assumptions (the implicit properties of our model, its parameters and the observed variables).
[10050750] |As a trivial example, consider the model , where is a constant and the cost .
[10050760] |Minimizing this cost will give us a value of that is equal to the mean of the data.
[10050770] |The cost function can be much more complicated.
[10050780] |Its form depends on the application: For example in compression it could be related to the [[mutual information]] between x and y.
[10050790] |In statistical modelling, it could be related to the [[posterior probability]] of the model given the data.
[10050800] |(Note that in both of those examples those quantities would be maximized rather than minimised).
[10050810] |Tasks that fall within the paradigm of unsupervised learning are in general [[estimation]] problems; the applications include [[Data clustering|clustering]], the estimation of [[statistical distributions]], [[Data compression|compression]] and [[Bayesian spam filtering|filtering]].
[10050820] |====Reinforcement learning====
[10050830] |In [[reinforcement learning]], data is usually not given, but generated by an agent's interactions with the environment.
[10050840] |At each point in time , the agent performs an action and the environment generates an observation and an instantaneous cost , according to some (usually unknown) dynamics.
[10050850] |The aim is to discover a ''policy'' for selecting actions that minimizes some measure of a long-term cost, i.e. the expected cumulative cost.
[10050860] |The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated.
[10050870] |More formally, the environment is modeled as a [[Markov decision process]] (MDP) with states S and actions with the following probability distributions: the instantaneous cost distribution , the observation distribution and the transition , while a policy is defined as conditional distribution over actions given the observations.
[10050880] |Taken together, the two define a [[Markov chain]] (MC).
[10050890] |The aim is to discover the policy that minimizes the cost, i.e. the MC for which the cost is minimal.
[10050900] |ANNs are frequently used in reinforcement learning as part of the overall algorithm.
[10050910] |Tasks that fall within the paradigm of reinforcement learning are control problems, [[game|games]] and other [[sequential decision making]] tasks.
[10050920] |See also: [[dynamic programming]], [[stochastic control]]
[10050930] |===Learning algorithms===
[10050940] |Training a neural network model essentially means selecting one model from the set of allowed models (or, in a [[Bayesian]] framework, determining a distribution over the set of allowed models) that minimises the cost criterion.
[10050950] |There are numerous algorithms available for training neural network models; most of them can be viewed as a straightforward application of [[Optimization (mathematics)|optimization]] theory and [[statistical estimation]].
[10050960] |Most of the algorithms used in training artificial neural networks are employing some form of [[gradient descent]].
[10050970] |This is done by simply taking the derivative of the cost function with respect to the network parameters and then changing those parameters in a [[gradient-related]] direction.
[10050980] |[[Evolutionary methods]], [[simulated annealing]], and [[expectation-maximization]] and [[non-parametric methods]] are among other commonly used methods for training neural networks.
[10050990] |See also [[machine learning]].
[10051000] |Temporal perceptual learning rely on finding temporal relationships in sensory signal streams.
[10051010] |In an environment, statistically salient temporal correlations can be found by monitoring the arrival times of sensory signals.
[10051020] |This is done by the [[perceptual network]].
[10051030] |==Employing artificial neural networks==
[10051040] |Perhaps the greatest advantage of ANNs is their ability to be used as an arbitrary function approximation mechanism which 'learns' from observed data.
[10051050] |However, using them is not so straightforward and a relatively good understanding of the underlying theory is essential.
[10051060] |*Choice of model: This will depend on the data representation and the application.
[10051070] |Overly complex models tend to lead to problems with learning.
[10051080] |*Learning algorithm: There are numerous tradeoffs between learning algorithms.
[10051090] |Almost any algorithm will work well with the ''correct [[hyperparameter]]s'' for training on a particular fixed dataset.
[10051100] |However selecting and tuning an algorithm for training on unseen data requires a significant amount of experimentation.
[10051110] |*Robustness: If the model, cost function and learning algorithm are selected appropriately the resulting ANN can be extremely robust.
[10051120] |With the correct implementation ANNs can be used naturally in [[online algorithm|online learning]] and large dataset applications.
[10051130] |Their simple implementation and the existence of mostly local dependencies exhibited in the structure allows for fast, parallel implementations in hardware.
[10051140] |==Applications==
[10051150] |The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations.
[10051160] |This is particularly useful in applications where the complexity of the data or task makes the design of such a function by hand impractical.
[10051170] |===Real life applications===
[10051180] |The tasks to which artificial neural networks are applied tend to fall within the following broad categories:
[10051190] |*[[Function approximation]], or [[regression analysis]], including [[time series prediction]] and modeling.
[10051200] |*[[Statistical classification|Classification]], including [[Pattern recognition|pattern]] and sequence recognition, [[novelty detection]] and sequential decision making.
[10051210] |*[[Data processing]], including filtering, clustering, blind source separation and compression.
[10051220] |Application areas include system identification and control (vehicle control, process control), game-playing and decision making (backgammon, chess, racing), pattern recognition (radar systems, face identification, object recognition and more), sequence recognition (gesture, speech, handwritten text recognition), medical diagnosis, financial applications (automated trading systems), [[data mining]] (or knowledge discovery in databases, "KDD"), visualization and [[e-mail spam]] filtering.
[10051230] |==Neural network software==
[10051240] |'''Neural network software''' is used to [[Simulation|simulate]], [[research]], [[technology development|develop]] and apply artificial neural networks, [[biological neural network]]s and in some cases a wider array of [[adaptive system]]s.
[10051250] |See also [[logistic regression]].
[10051260] |==Types of neural networks==
[10051270] |===Feedforward neural network===
[10051280] |The feedforward neural network was the first and arguably simplest type of artificial neural network devised.
[10051290] |In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes.
[10051300] |There are no cycles or loops in the network.
[10051310] |===Radial basis function (RBF) network===
[10051320] |Radial Basis Functions are powerful techniques for interpolation in multidimensional space.
[10051330] |A RBF is a function which has built into a distance criterion with respect to a centre.
[10051340] |Radial basis functions have been applied in the area of neural networks where they may be used as a replacement for the sigmoidal hidden layer transfer characteristic in Multi-Layer Perceptrons.
[10051350] |RBF networks have two layers of processing: In the first, input is mapped onto each RBF in the 'hidden' layer.
[10051360] |The RBF chosen is usually a Gaussian.
[10051370] |In regression problems the output layer is then a linear combination of hidden layer values representing mean predicted output.
[10051380] |The interpretation of this output layer value is the same as a regression model in statistics.
[10051390] |In classification problems the output layer is typically a [[sigmoid function]] of a linear combination of hidden layer values, representing a posterior probability.
[10051400] |Performance in both cases is often improved by shrinkage techniques, known as [[ridge regression]] in classical statistics and known to correspond to a prior belief in small parameter values (and therefore smooth output functions) in a Bayesian framework.
[10051410] |RBF networks have the advantage of not suffering from local minima in the same way as Multi-Layer Perceptrons.
[10051420] |This is because the only parameters that are adjusted in the learning process are the linear mapping from hidden layer to output layer.
[10051430] |Linearity ensures that the error surface is quadratic and therefore has a single easily found minimum.
[10051440] |In regression problems this can be found in one matrix operation.
[10051450] |In classification problems the fixed non-linearity introduced by the sigmoid output function is most efficiently dealt with using [[iteratively re-weighted least squares]].
[10051460] |RBF networks have the disadvantage of requiring good coverage of the input space by radial basis functions.
[10051470] |RBF centres are determined with reference to the distribution of the input data, but without reference to the prediction task.
[10051480] |As a result, representational resources may be wasted on areas of the input space that are irrelevant to the learning task.
[10051490] |A common solution is to associate each data point with its own centre, although this can make the linear system to be solved in the final layer rather large, and requires shrinkage techniques to avoid overfitting.
[10051500] |Associating each input datum with an RBF leads naturally to kernel methods such as [[Support Vector Machine]]s and Gaussian Processes (the RBF is the kernel function).
[10051510] |All three approaches use a non-linear kernel function to project the input data into a space where the learning problem can be solved using a linear model.
[10051520] |Like Gaussian Processes, and unlike SVMs, RBF networks are typically trained in a Maximum Likelihood framework by maximizing the probability (minimizing the error) of the data under the model.
[10051530] |SVMs take a different approach to avoiding overfitting by maximizing instead a margin.
[10051540] |RBF networks are outperformed in most classification applications by SVMs.
[10051550] |In regression applications they can be competitive when the dimensionality of the input space is relatively small.
[10051560] |===Kohonen self-organizings network===
[10051570] |The self-organizing map (SOM) invented by [[Teuvo Kohonen]] uses a form of [[unsupervised learning]].
[10051580] |A set of artificial neurons learn to map points in an input space to coordinates in an output space.
[10051590] |The input space can have different dimensions and topology from the output space, and the SOM will attempt to preserve these.
[10051600] |===Recurrent network===
[10051610] |Contrary to feedforward networks, [[recurrent neural network]]s (RNs) are models with bi-directional data flow.
[10051620] |While a feedforward network propagates data linearly from input to output, RNs also propagate data from later processing stages to earlier stages.
[10051630] |====Simple recurrent network====
[10051640] |A ''simple recurrent network'' (SRN) is a variation on the Multi-Layer Perceptron, sometimes called an "Elman network" due to its invention by [[Jeff Elman]].
[10051650] |A three-layer network is used, with the addition of a set of "context units" in the input layer.
[10051660] |There are connections from the middle (hidden) layer to these context units fixed with a weight of one.
[10051670] |At each time step, the input is propagated in a standard feed-forward fashion, and then a learning rule (usually back-propagation) is applied.
[10051680] |The fixed back connections result in the context units always maintaining a copy of the previous values of the hidden units (since they propagate over the connections before the learning rule is applied).
[10051690] |Thus the network can maintain a sort of state, allowing it to perform such tasks as sequence-prediction that are beyond the power of a standard Multi-Layer Perceptron.
[10051700] |In a ''fully recurrent network'', every neuron receives inputs from every other neuron in the network.
[10051710] |These networks are not arranged in layers.
[10051720] |Usually only a subset of the neurons receive external inputs in addition to the inputs from all the other neurons, and another disjunct subset of neurons report their output externally as well as sending it to all the neurons.
[10051730] |These distinctive inputs and outputs perform the function of the input and output layers of a feed-forward or simple recurrent network, and also join all the other neurons in the recurrent processing.
[10051740] |====Hopfield network====
[10051750] |The [[Hopfield network]] is a recurrent neural network in which all connections are symmetric.
[10051760] |Invented by [[John Hopfield]] in 1982, this network guarantees that its dynamics will converge.
[10051770] |If the connections are trained using [[Hebbian learning]] then the Hopfield network can perform as robust content-addressable (or [[associative memory|associative]]) memory, resistant to connection alteration.
[10051780] |====Echo state network====
[10051790] |The [[echo state network]] (ESN) is a [[recurrent neural network]] with a sparsely connected random hidden layer.
[10051800] |The weights of output neurons are the only part of the network that can change and be learned.
[10051810] |ESN are good to (re)produce temporal patterns.
[10051820] |====Long short term memory network====
[10051830] |The [[Long short term memory]] is an artificial neural net structure that unlike traditional RNNs doesn't have the problem of vanishing gradients.
[10051840] |It can therefore use long delays and can handle signals that have a mix of low and high frequency components.
[10051850] |===Stochastic neural networks===
[10051860] |A [[stochastic neural network]] differs from a typical neural network because it introduces random variations into the network.
[10051870] |In a probabilistic view of neural networks, such random variations can be viewed as a form of [[statistical sampling]], such as [[Monte Carlo sampling]].
[10051880] |====Boltzmann machine====
[10051890] |The [[Boltzmann machine]] can be thought of as a noisy Hopfield network.
[10051900] |Invented by [[Geoff Hinton]] and [[Terry Sejnowski]] in 1985, the Boltzmann machine is important because it is one of the first neural networks to demonstrate learning of latent variables (hidden units).
[10051910] |Boltzmann machine learning was at first slow to simulate, but the [[contrastive divergence algorithm]] of Geoff Hinton (circa 2000) allows models such as Boltzmann machines and ''products of experts'' to be trained much faster.
[10051920] |===Modular neural networks===
[10051930] |Biological studies showed that the human brain functions not as a single massive network, but as a collection of small networks.
[10051940] |This realisation gave birth to the concept of [[modular neural networks]], in which several small networks cooperate or compete to solve problems.
[10051950] |====Committee of machines====
[10051960] |A [[Committee machine|committee of machines]] (CoM) is a collection of different neural networks that together "vote" on a given example.
[10051970] |This generally gives a much better result compared to other neural network models.
[10051980] |In fact in many cases, starting with the same architecture and training but using different initial random weights gives vastly different networks.
[10051990] |A CoM tends to stabilize the result.
[10052000] |The CoM is similar to the general [[machine learning]] ''[[bootstrap Aggregating|bagging]]'' method, except that the necessary variety of machines in the committee is obtained by training from different random starting weights rather than training on different randomly selected subsets of the training data.
[10052010] |====Associative neural network (ASNN)====
[10052020] |The ASNN is an extension of the ''committee of machines'' that goes beyond a simple/weighted average of different models. [http://cogprints.soton.ac.uk/documents/disk0/00/00/14/41/index.html ASNN] represents a combination of an ensemble of feed-forward neural networks and the k-nearest neighbor technique ([[Nearest_neighbor_(pattern_recognition)|kNN]]).
[10052030] |It uses the correlation between ensemble responses as a measure of '''distance''' amid the analyzed cases for the kNN.
[10052040] |This corrects the bias of the neural network ensemble.
[10052050] |An associative neural network has a memory that can coincide with the training set.
[10052060] |If new data becomes available, the network instantly improves its predictive ability and provides data approximation (self-learn the data) without a need to retrain the ensemble.
[10052070] |Another important feature of ASNN is the possibility to interpret neural network results by analysis of correlations between data cases in the space of models.
[10052080] |The method is demonstrated at [http://www.vcclab.org/lab/asnn www.vcclab.org], where you can either use it online or download it.
[10052090] |===Other types of networks===
[10052100] |These special networks do not fit in any of the previous categories.
[10052110] |====Holographic associative memory====
[10052120] |[[Holographic associative memory|''Holographic associative memory'']] represents a family of analog, correlation-based, associative, stimulus-response memories, where information is mapped onto the phase orientation of complex numbers operating.
[10052130] |====Instantaneously trained networks====
[10052140] |''[[Instantaneously trained neural networks]]'' (ITNNs) were inspired by the phenomenon of short-term learning that seems to occur instantaneously.
[10052150] |In these networks the weights of the hidden and the output layers are mapped directly from the training vector data.
[10052160] |Ordinarily, they work on binary data, but versions for continuous data that require small additional processing are also available.
[10052170] |====Spiking neural networks====
[10052180] |[[Spiking neural network]]s (SNNs) are models which explicitly take into account the timing of inputs.
[10052190] |The network input and output are usually represented as series of spikes (delta function or more complex shapes).
[10052200] |SNNs have an advantage of being able to process information in the [[time domain]] (signals that vary over time).
[10052210] |They are often implemented as recurrent networks.
[10052220] |SNNs are also a form of [[pulse computer]].
[10052230] |Networks of spiking neurons — and the temporal correlations of neural assemblies in such networks — have been used to model figure/ground separation and region linking in the visual system (see e.g. Reitboeck et.al.in Haken and Stadler: Synergetics of the Brain.
[10052240] |Berlin, 1989).
[10052250] |Gerstner and Kistler have a freely available online textbook on [http://diwww.epfl.ch/~gerstner/BUCH.html Spiking Neuron Models].
[10052260] |Spiking neural networks with axonal conduction delays exhibit polychronization, and hence could have a potentially unlimited memory capacity.
[10052270] |In June 2005 [[IBM]] announced construction of a [[Blue Gene]] [[supercomputer]] dedicated to the simulation of a large recurrent spiking neural network [http://domino.research.ibm.com/comm/pr.nsf/pages/news.20050606_CognitiveIntelligence.html].
[10052280] |====Dynamic neural networks====
[10052290] |[[Dynamic neural network]]s not only deal with nonlinear multivariate behaviour, but also include (learning of) time-dependent behaviour such as various transient phenomena and delay effects.
[10052300] |====Cascading neural networks====
[10052310] |''Cascade-Correlation'' is an architecture and [[supervised learning]] [[algorithm]] developed by [[Scott Fahlman]] and [[Christian Lebiere]].
[10052320] |Instead of just adjusting the weights in a network of fixed topology, Cascade-Correlation begins with a minimal network, then automatically trains and adds new hidden units one by one, creating a multi-layer structure.
[10052330] |Once a new hidden unit has been added to the network, its input-side weights are frozen.
[10052340] |This unit then becomes a permanent feature-detector in the network, available for producing outputs or for creating other, more complex feature detectors.
[10052350] |The Cascade-Correlation architecture has several advantages over existing algorithms: it learns very quickly, the network determines its own size and topology, it retains the structures it has built even if the training set changes, and it requires no [[back-propagation]] of error signals through the connections of the network.
[10052360] |See: [[Cascade correlation algorithm]].
[10052370] |====Neuro-fuzzy networks====
[10052380] |A neuro-fuzzy network is a [[fuzzy inference system]] in the body of an artificial neural network.
[10052390] |Depending on the ''FIS'' type, there are several layers that simulate the processes involved in a ''fuzzy inference'' like fuzzification, inference, aggregation and defuzzification.
[10052400] |Embedding an ''FIS'' in a general structure of an ''ANN'' has the benefit of using available ''ANN'' training methods to find the parameters of a fuzzy system.
[10052410] |====Holosemantic neural networks====
[10052420] |The holosemantic neural network invented by Manfred Hoffleisch uses a kind a genetic algorithm to build a multidimensional structure.
[10052430] |It takes into account the timing of inputs.
[10052440] |====Compositional pattern-producing networks====
[10052450] |[[Compositional pattern-producing network]]s (CPPNs) are a variation of ANNs which differ in their set of activation functions and how they are applied.
[10052460] |While typical ANNs often contain only [[sigmoid function]]s (and sometimes [[Gaussian function]]s), CPPNs can include both types of functions and many others.
[10052470] |Furthermore, unlike typical ANNs, CPPNs are applied across the entire space of possible inputs so that they can represent a complete image.
[10052480] |Since they are compositions of functions, CPPNs in effect encode images at infinite resolution and can be sampled for a particular display at whatever resolution is optimal.
[10052490] |==Theoretical properties==
[10052500] |===Computational power===
[10052510] |The multi-layer perceptron (MLP) is a universal function approximator, as proven by the [[Cybenko theorem]].
[10052520] |However, the proof is not constructive regarding the number of neurons required or the settings of the weights.
[10052530] |Work by [[Hava T. Siegelmann]] and [[Eduardo D. Sontag]] has provided a proof that a specific recurrent architecture with rational valued weights (as opposed to the commonly used floating point approximations) has the full power of a [[Universal Turing Machine]].
[10052540] |They have further shown that the use of irrational values for weights results in a machine with trans-Turing power.
[10052550] |===Capacity===
[10052560] |Artificial neural network models have a property called 'capacity', which roughly corresponds to their ability to model any given function.
[10052570] |It is related to the amount of information that can be stored in the network and to the notion of complexity.
[10052580] |===Convergence===
[10052590] |Nothing can be said in general about convergence since it depends on a number of factors.
[10052600] |Firstly, there may exist many local minima.
[10052610] |This depends on the cost function and the model.
[10052620] |Secondly, the optimization method used might not be guaranteed to converge when far away from a local minimum.
[10052630] |Thirdly, for a very large amount of data or parameters, some methods become impractical.
[10052640] |In general, it has been found that theoretical guarantees regarding convergence are not always a very reliable guide to practical application.
[10052650] |===Generalisation and statistics===
[10052660] |In applications where the goal is to create a system that generalises well in unseen examples, the problem of overtraining has emerged.
[10052670] |This arises in overcomplex or overspecified systems when the capacity of the network significantly exceeds the needed free parameters.
[10052680] |There are two schools of thought for avoiding this problem: The first is to use cross-validation and similar techniques to check for the presence of overtraining and optimally select hyperparameters such as to minimise the generalisation error.
[10052690] |The second is to use some form of ''regularisation''.
[10052700] |This is a concept that emerges naturally in a probabilistic (Bayesian) framework, where the regularisation can be performed by putting a larger prior probability over simpler models; but also in statistical learning theory, where the goal is to minimise over two quantities: the 'empirical risk' and the 'structural risk', which roughly correspond to the error over the training set and the predicted error in unseen data due to overfitting.
[10052710] |Supervised neural networks that use an [[Mean squared error|MSE]] cost function can use formal statistical methods to determine the confidence of the trained model.
[10052720] |The MSE on a validation set can be used as an estimate for variance.
[10052730] |This value can then be used to calculate the [[confidence interval]] of the output of the network, assuming a [[normal distribution]].
[10052740] |A confidence analysis made this way is statistically valid as long as the output [[probability distribution]] stays the same and the network is not modified.
[10052750] |By assigning a softmax activation function on the output layer of the neural network (or a softmax component in a component-based neural network) for categorical target variables, the outputs can be interpreted as posterior probabilities.
[10052760] |This is very useful in classification as it gives a certainty measure on classifications.
[10052770] |The softmax activation function:
[10052780] |===Dynamic properties===
[10052790] |Various techniques originally developed for studying disordered magnetic systems (i.e. the [[spin glass]]) have been successfully applied to simple neural network architectures, such as the Hopfield network.
[10052800] |Influential work by E. Gardner and B. Derrida has revealed many interesting properties about perceptrons with real-valued synaptic weights, while later work by W. Krauth and M. Mezard has extended these principles to binary-valued synapses.
[10060010] |Association for Computational Linguistics
[10060020] |The '''Association for Computational Linguistics''' ('''ACL''') is the international scientific and professional society for people working on problems involving [[natural language and computation]].
[10060030] |An annual meeting is held each summer in locations where significant [[computational linguistics]] research is carried out.
[10060040] |It was founded in [[1962]], originally named the '''Association for Machine Translation and Computational Linguistics''' ('''AMTCL''').
[10060050] |It became the ACL in [[1968]].
[10060060] |The ACL has [[Europe]]an and [[North America]]n chapters, the [[European Chapter of the Association for Computational Linguistics]] (EACL) and the [[North American Chapter of the Association for Computational Linguistics]] (NAACL).
[10060070] |The ACL journal, [[Computational Linguistics (journal)|''Computational Linguistics'']], continues to be the primary forum for research on computational linguistics and [[natural language processing]].
[10060080] |Since [[1988]], the journal has been published for the ACL by [[MIT Press]].
[10060090] |The ACL book series, [[Studies_in_NLP|''Studies in Natural Language Processing'']], is published by [[Cambridge University Press]].
[10060100] |==Special Interest Groups==
[10060110] |ACL has a large number of Special Interest Groups (SIGs), focusing on specific areas of natural language processing.
[10060120] |Some current SIGs within ACL are:
[10060130] |*Linguistic data and corpus-based approaches: [http://www.aclweb.org/sigdat SIGDAT]
[10060140] |*Dialogue Processing: [http://www.aclweb.org/sigdial SIGDIAL]
[10060150] |*Natural Language Generation: [http://www.siggen.org SIGGEN]
[10060160] |*Lexicon: [http://www.aclweb.org/siglex SIGLEX]
[10060170] |*Mathematics of Language: [http://molweb.org SIGMOL]
[10060180] |*Computational Morphology and Phonology: [http://salad.cs.swarthmore.edu/sigphon SIGMORPHON]
[10060190] |*Computational Semantics: [http://www.aclweb.org/sigsem/ SIGSEM]
[10070010] |Babel Fish (website)
[10070020] |'''Babel Fish''' is a [[World Wide Web|web]]-based application on [[Yahoo!]] that [[machine translation|machine translates]] text or web pages from one of several languages into another.
[10070030] |Developed by [[AltaVista]], the application is named after the fictional animal used for instantaneous [[language]] [[translation]] in [[Douglas Adams]]'s series ''[[The Hitchhiker's Guide to the Galaxy]].''
[10070040] |In turn the fish is a reference to the [[biblical]] account of the city of [[Babel]] and the [[Confusion of tongues|various languages]] said to have arisen there.
[10070050] |The translation technology for Babel Fish is provided by [[SYSTRAN]], whose technology also powers a number of other sites and portals.
[10070060] |It translates among [[English language|English]], [[Simplified Chinese]], [[Traditional Chinese]], [[Dutch language|Dutch]], [[French language|French]], [[German language|German]], [[Greek language|Greek]], [[Italian language|Italian]], [[Japanese language|Japanese]], [[Korean language|Korean]], [[Portuguese language|Portuguese]], [[Russian language|Russian]], and [[Spanish language|Spanish]].
[10070070] |The service makes no claim to produce a perfect translation.
[10070080] |A number of humour sites have sprung up that use the Babel Fish service to translate back and forth between one or more languages (a so-called [[round-trip translation]]).
[10070090] |After a long existence at babelfish.altavista.com, the site was moved on May 9 2008 to babelfish.yahoo.com.
[10080010] |Bioinformatics
[10080020] |'''Bioinformatics''' and '''computational biology''' involve the use of techniques including [[applied mathematics]], [[informatics]], [[statistics]], [[computer science]], [[artificial intelligence]], [[chemistry]], and [[biochemistry]] to solve [[biology|biological]] problems usually on the [[molecular]] level.
[10080030] |The core principle of these techniques is using computing resources in order to solve problems on scales of magnitude far too great for human discernment.
[10080040] |Research in computational biology often overlaps with [[systems biology]].
[10080050] |Major research efforts in the field include [[sequence alignment]], [[gene finding]], [[genome assembly]], [[protein structural alignment|protein structure alignment]], [[protein structure prediction]], prediction of [[gene expression]] and [[protein-protein interactions]], and the modeling of [[evolution]].
[10080060] |==Introduction==
[10080070] |The terms '''''bioinformatics''''' and ''[[computational biology]]'' are often used interchangeably.
[10080080] |However ''bioinformatics'' more properly refers to the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data.
[10080090] |Computational biology, on the other hand, refers to hypothesis-driven investigation of a specific biological problem using computers, carried out with experimental or simulated data, with the primary goal of discovery and the advancement of biological knowledge.
[10080100] |Put more simply, bioinformatics is concerned with the information while computational biology is concerned with the hypotheses.
[10080110] |A similar distinction is made by [[NIH|National Institutes of Health]] in their [http://www.bisti.nih.gov/CompuBioDef.pdf working definitions of Bioinformatics and Computational Biology], where it is further emphasized that there is a tight coupling of developments and knowledge between the more hypothesis-driven research in computational biology and technique-driven research in bioinformatics.
[10080120] |Bioinformatics is also often specified as an applied subfield of the more general discipline of [[Biomedical informatics]].
[10080130] |A common thread in projects in bioinformatics and computational biology is the use of mathematical tools to extract useful information from data produced by high-throughput biological techniques such as [[genome sequencing]].
[10080140] |A representative problem in bioinformatics is the assembly of high-quality genome sequences from fragmentary "shotgun" DNA [[sequencing]].
[10080150] |Other common problems include the study of [[gene regulation]] to perform [[expression profiling]] using data from [[DNA microarray|microarrays]] or [[mass spectrometry]].
[10080160] |==Major research areas==
[10080170] |===Sequence analysis===
[10080180] |Since the [[Phi-X174 phage|Phage Φ-X174]] was [[sequencing|sequenced]] in 1977, the [[DNA sequence]]s of hundreds of organisms have been decoded and stored in databases.
[10080190] |The information is analyzed to determine genes that encode [[polypeptides]], as well as regulatory sequences.
[10080200] |A comparison of genes within a [[species]] or between different species can show similarities between protein functions, or relations between species (the use of [[molecular systematics]] to construct [[phylogenetic tree]]s).
[10080210] |With the growing amount of data, it long ago became impractical to analyze DNA sequences manually.
[10080220] |Today, [[computer program]]s are used to search the [[genome]] of thousands of organisms, containing billions of [[nucleotide]]s.
[10080230] |These programs would compensate for mutations (exchanged, deleted or inserted bases) in the DNA sequence, in order to identify sequences that are related, but not identical.
[10080240] |A variant of this [[sequence alignment]] is used in the sequencing process itself.
[10080250] |The so-called [[shotgun sequencing]] technique (which was used, for example, by [[The Institute for Genomic Research]] to sequence the first bacterial genome, ''Haemophilus influenzae'') does not give a sequential list of nucleotides, but instead the sequences of thousands of small DNA fragments (each about 600-800 nucleotides long).
[10080260] |The ends of these fragments overlap and, when aligned in the right way, make up the complete genome.
[10080270] |Shotgun sequencing yields sequence data quickly, but the task of assembling the fragments can be quite complicated for larger genomes.
[10080280] |In the case of the [[Human Genome Project]], it took several months of CPU time (on a circa-2000 vintage [[DEC Alpha]] computer) to assemble the fragments.
[10080290] |Shotgun sequencing is the method of choice for virtually all genomes sequenced today, and genome assembly algorithms are a critical area of bioinformatics research.
[10080300] |Another aspect of bioinformatics in sequence analysis is the automatic [[gene finding|search for genes]] and regulatory sequences within a genome.
[10080310] |Not all of the nucleotides within a genome are genes.
[10080320] |Within the genome of higher organisms, large parts of the DNA do not serve any obvious purpose.
[10080330] |This so-called [[junk DNA]] may, however, contain unrecognized functional elements.
[10080340] |Bioinformatics helps to bridge the gap between genome and [[proteome]] projects--for example, in the use of DNA sequences for protein identification.
[10080350] |''See also:'' [[sequence analysis]], [[sequence profiling tool]], [[sequence motif]].
[10080360] |===Genome annotation===
[10080370] |In the context of [[genomics]], '''annotation''' is the process of marking the genes and other biological features in a DNA sequence.
[10080380] |The first genome annotation software system was designed in 1995 by Dr. Owen White, who was part of the team that sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium ''[[Haemophilus influenzae]]''.
[10080390] |Dr. White built a software system to find the genes (places in the DNA sequence that encode a protein), the transfer RNA, and other features, and to make initial assignments of function to those genes.
[10080400] |Most current genome annotation systems work similarly, but the programs available for analysis of genomic DNA are constantly changing and improving.
[10080410] |===Computational evolutionary biology===
[10080420] |[[Evolutionary biology]] is the study of the origin and descent of [[species]], as well as their change over time.
[10080430] |Informatics has assisted evolutionary biologists in several key ways; it has enabled researchers to:
[10080440] |*trace the evolution of a large number of organisms by measuring changes in their [[DNA]], rather than through [[physical taxonomy]] or physiological observations alone,
[10080450] |*more recently, compare entire [[genomes]], which permits the study of more complex evolutionary events, such as [[gene duplication]], [[lateral gene transfer]], and the prediction of factors important in bacterial [[speciation]],
[10080460] |*build complex computational models of populations to predict the outcome of the system over time
[10080470] |*track and share information on an increasingly large number of species and organisms
[10080480] |Future work endeavours to reconstruct the now more complex [[Evolutionary tree|tree of life]].
[10080490] |The area of research within [[computer science]] that uses [[genetic algorithm]]s is sometimes confused with [[computational evolutionary biology]], but the two areas are unrelated.
[10080500] |===Measuring biodiversity===
[10080510] |[[Biodiversity]] of an ecosystem might be defined as the total genomic complement of a particular environment, from all of the species present, whether it is a biofilm in an abandoned mine, a drop of sea water, a scoop of soil, or the entire [[biosphere]] of the planet [[Earth]].
[10080520] |Databases are used to collect the [[species]] names, descriptions, distributions, genetic information, status and size of [[population]]s, [[Habitat (ecology)|habitat]] needs, and how each organism interacts with other species.
[10080530] |Specialized [[computer software|software]] programs are used to find, visualize, and analyze the information, and most importantly, communicate it to other people.
[10080540] |Computer simulations model such things as population dynamics, or calculate the cumulative genetic health of a breeding pool (in [[agriculture]]) or endangered population (in [[conservation ecology|conservation]]).
[10080550] |One very exciting potential of this field is that entire [[DNA]] sequences, or [[genome]]s of [[endangered species]] can be preserved, allowing the results of Nature's genetic experiment to be remembered ''[[in silico]]'', and possibly reused in the future, even if that species is eventually lost.
[10080560] |''Important projects:'' [http://www.sp2000.org/ Species 2000 project]; [http://www.ubio.org/ uBio Project].
[10080570] |===Analysis of gene expression===
[10080580] |The [[gene expression|expression]] of many genes can be determined by measuring [[mRNA]] levels with multiple techniques including [[DNA microarray|microarrays]], [[expressed sequence tag|expressed cDNA sequence tag]] (EST) sequencing, [[serial analysis of gene expression]] (SAGE) tag sequencing, [[massively parallel signature sequencing]] (MPSS), or various applications of multiplexed in-situ hybridization.
[10080590] |All of these techniques are extremely noise-prone and/or subject to bias in the biological measurement, and a major research area in computational biology involves developing statistical tools to separate [[signal (information theory)|signal]] from [[noise]] in high-throughput gene expression studies.
[10080600] |Such studies are often used to determine the genes implicated in a disorder: one might compare microarray data from cancerous [[epithelial]] cells to data from non-cancerous cells to determine the transcripts that are up-regulated and down-regulated in a particular population of cancer cells.
[10080610] |===Analysis of regulation===
[10080620] |Regulation is the complex orchestration of events starting with an extracellular signal such as a [[hormone]] and leading to an increase or decrease in the activity of one or more [[protein]]s.
[10080630] |Bioinformatics techniques have been applied to explore various steps in this process.
[10080640] |For example, [[promoter analysis]] involves the identification and study of [[sequence motif]]s in the DNA surrounding the coding region of a gene.
[10080650] |These motifs influence the extent to which that region is transcribed into mRNA.
[10080660] |Expression data can be used to infer gene regulation: one might compare [[microarray]] data from a wide variety of states of an organism to form hypotheses about the genes involved in each state.
[10080670] |In a single-cell organism, one might compare stages of the [[cell cycle]], along with various stress conditions (heat shock, starvation, etc.).
[10080680] |One can then apply [[cluster analysis|clustering algorithms]] to that expression data to determine which genes are co-expressed.
[10080690] |For example, the upstream regions (promoters) of co-expressed genes can be searched for over-represented [[regulatory elements]].
[10080700] |===Analysis of protein expression===
[10080710] |[[Protein microarray]]s and high throughput (HT) [[mass spectrometry]] (MS) can provide a snapshot of the proteins present in a biological sample.
[10080720] |Bioinformatics is very much involved in making sense of protein microarray and HT MS data; the former approach faces similar problems as with microarrays targeted at mRNA, the latter involves the problem of matching large amounts of mass data against predicted masses from protein sequence databases, and the complicated statistical analysis of samples where multiple, but incomplete peptides from each protein are detected.
[10080730] |===Analysis of mutations in cancer===
[10080740] |In cancer, the genomes of affected cells are rearranged in complex or even unpredictable ways.
[10080750] |Massive sequencing efforts are used to identify previously unknown [[point mutation]]s in a variety of [[gene]]s in [[cancer]].
[10080760] |Bioinformaticians continue to produce specialized automated systems to manage the sheer volume of sequence data produced, and they create new algorithms and software to compare the sequencing results to the growing collection of [[human genome]] sequences and [[germline]] polymorphisms.
[10080770] |New physical detection technology are employed, such as [[oligonucleotide]] microarrays to identify chromosomal gains and losses (called [[comparative genomic hybridization]]), and [[single nucleotide polymorphism]] arrays to detect known ''point mutations''.
[10080780] |These detection methods simultaneously measure several hundred thousand sites throughout the genome, and when used in high-throughput to measure thousands of samples, generate [[terabyte]]s of data per experiment.
[10080790] |Again the massive amounts and new types of data generate new opportunities for bioinformaticians.
[10080800] |The data is often found to contain considerable variability, or [[noise]], and thus [[Hidden Markov model]] and [[change-point analysis]] methods are being developed to infer real [[copy number variation|copy number]] changes.
[10080810] |Another type of data that requires novel informatics development is the analysis of lesions found to be recurrent among many tumors .
[10080820] |===Prediction of protein structure===
[10080830] |Protein structure prediction is another important application of bioinformatics.
[10080840] |The [[amino acid]] sequence of a protein, the so-called [[primary structure]], can be easily determined from the sequence on the gene that codes for it.
[10080850] |In the vast majority of cases, this primary structure uniquely determines a structure in its native environment.
[10080860] |(Of course, there are exceptions, such as the [[bovine spongiform encephalopathy]] - aka [[Mad Cow Disease]] - [[prion]].)
[10080870] |Knowledge of this structure is vital in understanding the function of the protein.
[10080880] |For lack of better terms, structural information is usually classified as one of ''[[secondary structure|secondary]]'', ''[[tertiary structure|tertiary]]'' and ''[[quaternary structure|quaternary]]'' structure.
[10080890] |A viable general solution to such predictions remains an open problem.
[10080900] |As of now, most efforts have been directed towards heuristics that work most of the time.
[10080910] |One of the key ideas in bioinformatics is the notion of [[homology (biology)|homology]].
[10080920] |In the genomic branch of bioinformatics, homology is used to predict the function of a gene: if the sequence of gene ''A'', whose function is known, is homologous to the sequence of gene ''B,'' whose function is unknown, one could infer that B may share A's function.
[10080930] |In the structural branch of bioinformatics, homology is used to determine which parts of a protein are important in structure formation and interaction with other proteins.
[10080940] |In a technique called homology modeling, this information is used to predict the structure of a protein once the structure of a homologous protein is known.
[10080950] |This currently remains the only way to predict protein structures reliably.
[10080960] |One example of this is the similar protein homology between hemoglobin in humans and the hemoglobin in legumes ([[leghemoglobin]]).
[10080970] |Both serve the same purpose of transporting oxygen in the organism.
[10080980] |Though both of these proteins have completely different amino acid sequences, their protein structures are virtually identical, which reflects their near identical purposes.
[10080990] |Other techniques for predicting protein structure include protein threading and ''de novo'' (from scratch) physics-based modeling.
[10081000] |''See also:'' [[structural motif]] and [[structural domain]].
[10081010] |=== Comparative genomics ===
[10081020] |The core of comparative genome analysis is the establishment of the correspondence between [[genes]] (orthology analysis) or other genomic features in different organisms.
[10081030] |It is these intergenomic maps that make it possible to trace the evolutionary processes responsible for the divergence of two genomes.
[10081040] |A multitude of evolutionary events acting at various organizational levels shape genome evolution.
[10081050] |At the lowest level, point mutations affect individual nucleotides.
[10081060] |At a higher level, large chromosomal segments undergo duplication, lateral transfer, inversion, transposition, deletion and insertion.
[10081070] |Ultimately, whole genomes are involved in processes of hybridization, polyploidization and [[endosymbiosis]], often leading to rapid speciation.
[10081080] |The complexity of genome evolution poses many exciting challenges to developers of mathematical models and algorithms, who have recourse to a spectra of algorithmic, statistical and mathematical techniques, ranging from exact, [[heuristics]], fixed parameter and [[approximation algorithms]] for problems based on parsimony models to [[Markov Chain Monte Carlo]] algorithms for [[Bayesian analysis]] of problems based on probabilistic models.
[10081090] |Many of these studies are based on the homology detection and protein families computation.
[10081100] |===Modeling biological systems===
[10081110] |Systems biology involves the use of [[computer simulation]]s of [[cell (biology)|cellular]] subsystems (such as the [[metabolic network|networks of metabolites]] and [[enzyme]]s which comprise [[metabolism]], [[signal transduction]] pathways and [[gene regulatory network]]s) to both analyze and visualize the complex connections of these cellular processes.
[10081120] |[[Artificial life]] or virtual evolution attempts to understand evolutionary processes via the computer simulation of simple (artificial) life forms.
[10081130] |===High-throughput image analysis===
[10081140] |Computational technologies are used to accelerate or fully automate the processing, quantification and analysis of large amounts of high-information-content [[biomedical imagery]].
[10081150] |Modern image analysis systems augment an observer's ability to make measurements from a large or complex set of images, by improving [[accuracy]], [[Objectivity (science)|objectivity]], or speed.
[10081160] |A fully developed analysis system may completely replace the observer.
[10081170] |Although these systems are not unique to biomedical imagery, biomedical imaging is becoming more important for both [[diagnostics]] and research.
[10081180] |Some examples are:
[10081190] |* high-throughput and high-fidelity quantification and sub-cellular localization ([[high-content screening]], [[cytohistopathology]])
[10081200] |* [[morphometrics]]
[10081210] |* clinical image analysis and visualization
[10081220] |* determining the real-time air-flow patterns in breathing lungs of living animals
[10081230] |* quantifying occlusion size in real-time imagery from the development of and recovery during arterial injury
[10081240] |* making behavioral observations from extended video recordings of laboratory animals
[10081250] |* infrared measurements for metabolic activity determination
[10081260] |===Protein-protein docking===
[10081270] |In the last two decades, tens of thousands of protein three-dimensional structures have been determined by [[X-ray crystallography]] and [[Protein nuclear magnetic resonance spectroscopy]] (protein NMR).
[10081280] |One central question for the biological scientist is whether it is practical to predict possible protein-protein interactions only based on these 3D shapes, without doing [[protein-protein interaction]] experiments.
[10081290] |A variety of methods have been developed to tackle the [[Protein-protein docking]] problem, though it seems that there is still much place to work on in this field.
[10081300] |===Software and Tools===
[10081310] |Software tools for bioinformatics range from simple command-line tools, to more complex graphical programs and standalone web-services.
[10081320] |The computational biology tool best-known among biologists is probably [[BLAST]], an algorithm for determining the similarity of arbitrary sequences against other sequences, possibly from curated databases of protein or DNA sequences.
[10081330] |The [[National Center for Biotechnology Information|NCBI]] provides a popular web-based implementation that searches their databases.
[10081340] |BLAST is one of a number of [[List of sequence alignment software|generally available programs]] for doing sequence alignment.
[10081350] |===Web Services in Bioinformatics===
[10081360] |[[SOAP]] and [[REST]]-based interfaces have been developed for a wide variety of bioinformatics applications allowing an application running on one computer in one part of the world to use algorithms, data and computing resources on servers in other parts of the world.
[10081370] |The main advantages lay in the end user not having to deal with software and database maintenance overheads Basic bioinformatics services are classified by the [[EBI]] into three categories: [[Sequence alignment software|SSS]] (Sequence Search Services), [[Multiple sequence alignment|MSA]] (Multiple Sequence Alignment) and [[Bioinformatics#Sequence_analysis|BSA]] (Biological Sequence Analysis).
[10081380] |The availability of these [[service-oriented]] bioinformatics resources demonstrate the applicability of web based bioinformatics solutions, and range from a collection of standalone tools with a common data format under a single, standalone or web-based interface, to integrative, distributed and extensible [[bioinformatics workflow management systems]].
[10090010] |BLEU
[10090020] |:''This page is about the evaluation metric for machine translation.
[10090030] |For other meanings, please see [[Bleu]].''
[10090040] |'''BLEU''' ('''Bilingual Evaluation Understudy''') is a method for evaluating the quality of text which has been translated from one [[natural language]] to another using [[machine translation]].
[10090050] |BLEU was one of the first [[software metric]]s to report high [[correlation]] with human judgements of quality.
[10090060] |The metric is currently one of the most popular in the field.
[10090070] |The central idea behind the metric is that, "the closer a machine translation is to a professional human translation, the better it is".
[10090080] |The metric calculates scores for individual segments, generally [[Sentence (linguistics)|sentence]]s, and then averages these scores over the whole [[corpus]] in order to reach a final score.
[10090090] |It has been shown to correlate highly with human judgements of quality at the corpus level.
[10090100] |The quality of translation is indicated as a number between 0 and 1 and is measured as statistical closeness to a given set of good quality human reference translations.
[10090110] |Therefore, it does not directly take into account translation intelligibility or grammatical correctness.
[10090120] |The metric works by measuring the [[n-gram]] co-occurrence between a given translation and the set of reference translations and then taking the weighted [[geometric mean]].
[10090130] |BLEU is specifically designed to approximate human judgement on a [[corpus]] level and performs badly if used to evaluate the quality of isolated sentences.
[10090140] |==Algorithm==
[10090150] |BLEU uses a modified form of [[precision]] to compare a candidate translation against multiple reference translations.
[10090160] |The metric modifies simple precision since machine translation systems have been known to generate more words than appear in a reference text.
[10090170] |This is illustrated in the following example from Papineni et al. (2002),
[10090180] |In this example, the candidate text is given a unigram precision of,
[10090190] |:
[10090200] |Of the seven words in the candidate translation, all of them appear in the reference translations.
[10090210] |This presents a problem for a metric, as the candidate translation above is complete nonsense, retaining none of the content of either of the references.
[10090220] |The modification that BLEU makes is fairly straightforward.
[10090230] |For each word in the candidate translation, the algorithm takes the maximum total count in the reference translations.
[10090240] |Taking the example above, the word 'the' appears twice in reference 1, and once in reference 2.
[10090250] |The largest value is taken, in this case '2' as the "maximum reference count".
[10090260] |For each of the words in the candidate translation, the count of the word is compared against the maximum reference count, and the lowest value is taken.
[10090270] |In this case, the count of the word 'the' in the candidate translation is '7', while the maximum reference count for the word is '2'.
[10090280] |This "modified count" is then divided by the total number of words in the candidate translation.
[10090290] |In the above example, the modified unigram precision score would be,
[10090300] |:
[10090310] |The above method is used to calculate scores for each .
[10090320] |The value of which has the "highest correlation with monolingual human judgements" was found to be 4.
[10090330] |The unigram scores are found to account for the adequacy of the translation, in other words, how much information is retained in the translation.
[10090340] |The longer -gram scores account for the fluency of the translation, or to what extent it reads like "good English".
[10090350] |The modification made to precision does not solve the problem of short translations.
[10090360] |Short translations can produce very high precision scores, even using modified precision.
[10090370] |An example of a candidate translation for the same references as above might be:
[10090380] |:the cat
[10090390] |In this example, the modified unigram precision would be,
[10090400] |:
[10090410] |as the word 'the' and the word 'cat' appear once each in the candidate, and the total number of words is two.
[10090420] |The modified bigram precision would be as the bigram, "the cat" appears once in the candidate.
[10090430] |It has been pointed out that precision is usually twinned with [[recall]] to overcome this problem , as the unigram recall of this example would be or .
[10090440] |The problem being that as there are multiple reference translations, a bad translation could easily have an inflated recall, such as a translation which consisted of all the words in each of the references.
[10090450] |In order to produce a score for the whole corpus, the modified precision scores for the segments are combined using the [[geometric mean]], multiplied by a brevity penalty, whose purpose is to prevent very short candidates from receiving too high a score.
[10090460] |Let be the total length of the reference corpus, and the total length of the translation corpus.
[10090470] |If , the brevity penalty applies and is defined to be .
[10090480] |(In the case of multiple reference sentences, is taken to be the sum of the lengths of the sentences whose lengths are closest to the lengths of the candidate sentences.
[10090490] |However, in the version of the metric used by [[NIST]], the short reference sentence is used.)
[10090500] |==Performance==
[10090510] |BLEU has frequently been reported as correlating well with human judgement, and certainly remains a benchmark for any new evaluation metric to beat.
[10090520] |There are however a number of criticisms that have been voiced.
[10090530] |It has been noted that while in theory capable of evaluating any language, BLEU does not in the present form work on languages without word boundaries.
[10090540] |It has been argued that although BLEU certainly has significant advantages, there is no guarantee that an increase in BLEU score is an indicator of improved translation quality.
[10090550] |As BLEU scores are taken at the corpus level, it is difficult to give a textual example.
[10090560] |Nevertheless, they highlight two instances where BLEU seriously underperformed.
[10090570] |These were the 2005 [[NIST]] evaluations where a number of different machine translation systems were tested, and their study of the [[SYSTRAN]] engine versus two engines using [[statistical machine translation]] (SMT) techniques.
[10090580] |In the 2005 NIST evaluation, they report that the scores generated by BLEU failed to correspond to the scores produced in the human evaluations.
[10090590] |The system which was ranked highest by the human judges was only ranked 6th by BLEU.
[10090600] |In their study, they compared SMT systems with SYSTRAN, a knowledge based system.
[10090610] |The scores from BLEU for SYSTRAN were substantially worse than the scores given to SYSTRAN by the human judges.
[10090620] |They note that the SMT systems were trained using BLEU minimum error rate training, and point out that this could be one of the reasons behind the difference.
[10090630] |They conclude by recommending that BLEU be used in a more restricted manner, for comparing the results from two similar systems, and for tracking "broad, incremental changes to a single system".
[10100010] |Business intelligence
[10100020] |'''Business intelligence''' ('''BI''') refers to technologies, applications and practices for the collection, integration, analysis, and presentation of business [[information]] and sometimes to the information itself.
[10100030] |The purpose of business intelligence--a term that dates at least to 1958--is to support better business decision making.
[10100040] |Thus, BI is also described as a [[decision support system]] (DSS):
[10100050] |
BI is sometimes used interchangeably with briefing books, report and query tools and executive information systems.
[10100060] |In general, business intelligence systems are data-driven DSS.
[10100070] |BI systems provide historical, current, and predictive views of business operations, most often using data that has been gathered into a [[data warehouse]] or a [[data mart]] and occasionally working from operational data.
[10100080] |Software elements support the use of this information by assisting in the extraction, analysis, and reporting of information.
[10100090] |Applications tackle sales, production, financial, and many other sources of business data for purposes that include, notably, [[business performance management]].
[10100100] |Information may be gathered on comparable companies to produce [[benchmarking|benchmarks]].
[10100110] |==History==
[10100120] |Prior to the start of the [[Information Age]] in the late 20th century, businesses had to collect data from non-automated sources.
[10100130] |Businesses then lacked the computing resources necessary to properly analyze the data, and as a result, companies often made business decisions primarily on the basis of [[intuition (knowledge)|intuition]].
[10100140] |As businesses automated systems the amount of data increased but its collection remained difficult due to the inability of information to be moved between or within systems.
[10100150] |Analysis of information informed for long-term decision making, but was slow and often required the use of instinct or expertise to make short-term decisions.
[10100160] |Business intelligence was defined in 1958 by [[Hans Peter Luhn]], who wrote,
[10100170] | In this paper, business is a collection of activities carried on for whatever purpose, be it science, technology, commerce, industry, law, government, defense, et cetera.
[10100180] |The communication facility serving the conduct of a business (in the broad sense) may be referred to as an intelligence system.
[10100190] |The notion of intelligence is also defined here, in a more general sense, as "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."
[10100200] |In 1989 Howard Dresner, later a [[Gartner Group]] analyst, popularized BI as an umbrella term to describe "concepts and methods to improve business decision making by using fact-based support systems."
[10100210] |In modern businesses the use of standards, automation and specialized software, including [[Online analytical processing|analytical tools]], allows large volumes of data to be [[Extract, transform, load|extracted, transformed, loaded]] and [[Data warehouse|warehoused]] to greatly increase the speed at which information becomes available for decision-making.
[10100220] |===Key intelligence topics===
[10100230] |Business intelligence often uses [[key performance indicators]] (KPIs) to assess the present state of business and to prescribe a course of action.
[10100240] |Examples of KPIs are things such as lead conversion rate (in sales) and inventory turnover (in inventory management).
[10100250] |Prior to the widespread adoption of computer and web applications, when information had to be manually input and calculated, performance data was often not available for weeks or months.
[10100260] |Recently, banks have tried to make data available at shorter intervals and have reduced delays.
[10100270] |The KPI methodology was further expanded with the Chief Performance Officer methodology which incorporated KPIs and root cause analysis into a single methodology.
[10100280] |Businesses that face higher operational/[[credit risk]] loading, such as [[credit card]] companies and "wealth management" services, often make KPI-related data available weekly.
[10100290] |In some cases, companies may even offer a daily analysis of data.
[10100300] |This fast pace requires analysts to use [[information technology|IT]] [[system]]s to process this large volume of data.
[10110010] |Chatterbot
[10110020] |A '''chatterbot''' (or chatbot) is a type of conversational agent, a [[computer program]] designed to simulate an intelligent [[conversation]] with one or more human users via auditory or textual methods.
[10110030] |In other words, a chatterbot is a computer program with artificial intelligence to talk to people through voices or typed words.
[10110040] |Though many appear to be intelligently interpreting the human input prior to providing a response, most chatterbots simply scan for keywords within the input and pull a reply with the most matching keywords or the most similar wording pattern from a local [[database]].
[10110050] |Chatterbots may also be referred to as ''talk bots'', ''chat bots'', or ''chatterboxes''.
[10110060] |== Method of operation ==
[10110070] |A good understanding of a conversation is required to carry on a meaningful dialog but most chatterbots do not attempt this.
[10110080] |Instead they "converse" by recognizing cue words or phrases from the human user, which allows them to use pre-prepared or pre-calculated responses which can move the conversation on in an apparently meaningful way without requiring them to know what they are talking about.
[10110090] |For example, if a human types, "I am feeling very worried lately," the chatterbot may be programmed to recognize the phrase "I am" and respond by replacing it with "Why are you" plus a question mark at the end, giving the answer, "Why are you feeling very worried lately?"
[10110100] |A similar approach using keywords would be for the program to answer any comment including ''(Name of celebrity)'' with "I think they're great, don't you?"
[10110110] |Humans, especially those unfamiliar with chatterbots, sometimes find the resulting conversations engaging.
[10110120] |Critics of chatterbots call this engagement the [[ELIZA effect]].
[10110130] |Some programs classified as chatterbots use other principles.
[10110140] |One example is [[Jabberwacky]], which attempts to model the way humans learn new facts and language.
[10110150] |[[Ellaz Systems|ELLA]] attempts to use [[natural language processing]] to make more useful responses from a human's input.
[10110160] |Some programs that use natural language conversation, such as [[SHRDLU]], are not generally classified as chatterbots because they link their speech ability to knowledge of a simulated world.
[10110170] |This type of link requires a more complex [[artificial intelligence]] (eg., a "vision" system) than standard chatterbots have.
[10110180] |== Early chatterbots ==
[10110190] |The classic early chatterbots are [[ELIZA]] and [[PARRY]].
[10110200] |More recent programs are [[Racter]], [[Verbot]]s, [[Artificial Linguistic Internet Computer Entity|A.L.I.C.E.]], and [[Ellaz Systems|ELLA]].
[10110210] |The growth of chatterbots as a research field has created an expansion in their purposes.
[10110220] |While ELIZA and PARRY were used exclusively to simulate typed conversation, [[Racter]] was used to "write" a story called ''The Policeman's Beard is Half Constructed''.
[10110230] |ELLA includes a collection of games and functional features to further extend the potential of chatterbots.
[10110240] |The term "ChatterBot" was coined by [[Michael Loren Mauldin|Michael Mauldin]] (Creator of the first [[Verbot]], Julia) in 1994 to describe these conversational programs.
[10110250] |== Malicious chatterbots ==
[10110260] |Malicious chatterbots are frequently used to fill chat rooms with spam and advertising, or to entice people into revealing personal information, such as bank account numbers.
[10110270] |They are commonly found on [[Yahoo! Messenger]], [[Windows Live Messenger]], [[AOL Instant Messenger]] and other [[instant messaging]] protocols.
[10110280] |There has been a published report of a chatterbot used in a fake personal ad on a dating service's website.
[10110290] |==Chatterbots in modern AI==
[10110300] |Most modern AI research focuses on practical engineering tasks.
[10110310] |This is known as weak AI and is distinguished from [[strong AI]], which would require [[sapience]] and reasoning abilities.
[10110320] |One pertinent field of AI research is natural language.
[10110330] |Usually weak AI fields employ specialised software or programming languages created for them.
[10110340] |For example, one of the 'most-human' natural language chatterbots, [[Artificial Linguistic Internet Computer Entity|A.L.I.C.E.]], uses a programming language called AIML that is specific to its program, and its various clones, named Alicebots.
[10110350] |Nevertheless, A.L.I.C.E. is still based on pattern matching without any reasoning.
[10110360] |This is the same technique [[ELIZA]], the first chatterbot, was using back in 1966.
[10110370] |Australian company MyCyberTwin also deals in strong AI, allowing users to create and sustain their own virtual personalities online.
[10110380] |MyCyberTwin.com also works in a corporate setting, allowing companies to set up Virtual AI Assistants.
[10110390] |Another notable program, known as [[Jabberwacky]], also deals in strong AI, as it is claimed to learn new responses based on user interactions, rather than being driven from a static database like many other existing chatterbots.
[10110400] |Although such programs show initial promise, many of the existing results in trying to tackle the problem of natural language still appear fairly poor, and it seems reasonable to state that there is currently no general purpose conversational artificial intelligence.
[10110410] |This has led some software developers to focus more on the practical aspect of chatterbot technology - information retrieval.
[10110420] |A common rebuttal often used within the AI community against criticism of such approaches asks, "How do we know that humans don't also just follow some cleverly devised rules?" (in the way that Chatterbots do).
[10110430] |Two famous examples of this line of argument against the rationale for the basis of the Turing test are John Searle's [[Chinese room]] argument and Ned Block's [[Intentional stance|Blockhead argument]].
[10110440] |==Chatterbots/Virtual Assistants in Commercial Environments==
[10110450] |Automated Conversational Systems have progressed and evolved far from the original designs of the first widely used chatbots.
[10110460] |In the UK, large commercial entities such as Lloyds TSB, Royal Bank of Scotland, Renault, Citroën and One Railway are already utilizing Virtual Assistants to reduce expenditures on Call Centres and provide a first point of contact that can inform the user exactly of points of interest, provide support, capture data from the user and promote products for sale.
[10110470] |In the UK, new projects and research are being conducted to introduce a Virtual Assistant into the classroom to assist the teacher.
[10110480] |This project is the first of its kind and the chatbot VA in question is based on the Yhaken [http://www.elzware.com] chatbot design.
[10110490] |The Yhaken template provides a further move forward in Automated Conversational Systems with features such as complex conversational routing and responses, well defined personality, a complex hierarchical construct with additional external reference points, emotional responses and in depth small talk, all to make the experience more interactive and involving for the user.
[10110500] |==Annual contests for chatterbots==
[10110510] |Many organizations tries to encourage and support developers all over the world to develop chatterbots that able to do variety of tasks and compete with each other through [[turing test]]s and more.
[10110520] |Annual contests are organized at the following links:
[10110530] |*[http://www.chatterboxchallenge.com The Chatterbox Challenge]
[10110540] |*[http://www.loebner.net/Prizef/loebner-prize.html The Loebner Prize]
[10120010] |Computational linguistics
[10120020] |'''Computational linguistics''' is an [[interdisciplinary]] field dealing with the [[Statistics|statistical]] and/or rule-based modeling of [[natural language]] from a computational perspective.
[10120030] |This modeling is not limited to any particular field of [[linguistics]].
[10120040] |Traditionally, computational linguistics was usually performed by [[computer scientist]]s who had specialized in the application of computers to the processing of a [[natural language]].
[10120050] |Recent research has shown that human language is much more complex than previously thought, so computational linguists often work as members of interdisciplinary teams, including linguists (specifically trained in linguistics), language experts (persons with some level of ability in the languages relevant to a given project), and computer scientists.
[10120060] |In general computational linguistics draws upon the involvement of linguists, [[computer science|computer scientists]], experts in [[artificial intelligence]], [[cognitive psychology|cognitive psychologists]], [[math]]ematicians, and [[logic]]ians, amongst others.
[10120070] |==Origins==
[10120080] |Computational linguistics as a field predates [[artificial intelligence]], a field under which it is often grouped.
[10120090] |Computational linguistics originated with efforts in the [[United States]] in the 1950s to use computers to automatically translate texts from foreign languages, particularly [[Russian language|Russian]] scientific journals, into English.
[10120100] |Since computers had proven their ability to do [[arithmetic]] much faster and more accurately than humans, it was thought to be only a short matter of time before the technical details could be taken care of that would allow them the same remarkable capacity to process language.
[10120110] |When [[machine translation]] (also known as mechanical translation) failed to yield accurate translations right away, automated processing of human languages was recognized as far more complex than had originally been assumed.
[10120120] |Computational linguistics was born as the name of the new field of study devoted to developing [[algorithm]]s and [[software]] for intelligently processing language data.
[10120130] |When artificial intelligence came into existence in the 1960s, the field of computational linguistics became that sub-division of artificial intelligence dealing with human-level comprehension and production of natural languages.
[10120140] |In order to translate one language into another, it was observed that one had to understand the [[grammar]] of both languages, including both [[morphology (linguistics)|morphology]] (the grammar of word forms) and [[syntax]] (the grammar of sentence structure).
[10120150] |In order to understand syntax, one had to also understand the [[semantics]] and the [[lexicon]] (or 'vocabulary'), and even to understand something of the [[pragmatics]] of language use.
[10120160] |Thus, what started as an effort to translate between languages evolved into an entire discipline devoted to understanding how to represent and process natural languages using computers.
[10120170] |==Subfields==
[10120180] |Computational linguistics can be divided into major areas depending upon the medium of the language being processed, whether spoken or textual; and upon the task being performed, whether analyzing language (recognition) or synthesizing language (generation).
[10120190] |[[Speech recognition]] and [[speech synthesis]] deal with how spoken language can be understood or created using computers.
[10120200] |Parsing and generation are sub-divisions of computational linguistics dealing respectively with taking language apart and putting it together.
[10120210] |Machine translation remains the sub-division of computational linguistics dealing with having computers translate between languages.
[10120220] |Some of the areas of research that are studied by computational linguistics include:
[10120230] |*Computer aided [[corpus linguistics]]
[10120240] |*Design of [[parser]]s or [[phrase chunking|chunkers]] for [[natural language]]s
[10120250] |*Design of taggers like [[Part-of-speech tagging|POS-taggers (part-of-speech taggers)]]
[10120260] |*Definition of specialized logics like resource logics for [[Natural language processing|NLP]]
[10120270] |*Research in the relation between formal and natural languages in general
[10120280] |*[[Machine translation]], e.g. by a translating computer
[10120290] |*[[Computational complexity]] of natural language, largely modeled on [[automata theory]], with the application of [[context-sensitive grammar]] and [[Linear bounded automaton|linearly-bounded]] [[Turing machine]]s.
[10120300] |The [[Association for Computational Linguistics]] defines computational linguistics as:
[10120310] |:...the scientific study of [[language]] from a computational perspective.
[10120320] |Computational linguists are interested in providing [[computational model]]s of various kinds of linguistic phenomena.
[10130010] |Computer program
[10130020] |'''Computer programs''' (also '''[[Computer software|software programs]]''', or just '''programs''') are [[Instruction (computer science)|instructions]] for a [[computer]].
[10130030] |A computer requires programs to function, and a computer program does nothing unless its instructions are executed by a [[Central processing unit|central processor]].
[10130040] |Computer programs are usually [[executable]] programs or the [[source code]] from which executable programs are derived (e.g., [[compiler|compiled]]).
[10130050] |Computer source code is often written by professional [[computer programmer]]s.
[10130060] |Source code is written in a [[programming language]] that usually follows one of two main [[Programming paradigm|paradigms]]: [[imperative programming|imperative]] or [[declarative language|declarative]] programming.
[10130070] |Source code may be converted into an [[executable file]] (sometimes called an executable program) by a [[compiler]].
[10130080] |Alternatively, computer programs may be executed by a [[central processing unit]] with the aid of an [[Interpreter (computing)|interpreter]], or may be [[firmware|embedded]] directly into [[Computer hardware|hardware]].
[10130090] |Computer programs may be categorized along functional lines: [[system software]] and [[application software]].
[10130100] |And many computer programs may run simultaneously on a single computer, a process known as [[computer multitasking|multitasking]].
[10130110] |==Programming==
[10130120] |main()
[10130130] |{
[10130140] |output_string("Hello world!");
[10130150] |}
[10130160] |Source code of a program written in the [[C programming language]]
[10130170] |[[Computer programming]] is the iterative process of writing or editing [[source code]].
[10130180] |Editing source code involves testing, analyzing, and refining.
[10130190] |A person who practices this skill is referred to as a computer [[programmer]] or software developer.
[10130200] |The sometimes lengthy process of computer programming is usually referred to as [[software development]].
[10130210] |The term [[software engineering]] is becoming popular as the process is seen as an [[engineering]] discipline.
[10130220] |=== Paradigms ===
[10130230] |Computer programs can be categorized by the [[programming language]] [[Programming paradigm|paradigm]] used to produce them.
[10130240] |Two of the main paradigms are [[imperative programming|imperative]] and [[declarative language|declarative]].
[10130250] |Programs written using an imperative language specify an [[algorithm]] using declarations, expressions, and statements.
[10130260] |A declaration associates a [[variable]] name with a [[datatype]].
[10130270] |For example: var x: integer;
.
[10130280] |An expression yields a value.
[10130290] |For example: 2 + 2
yields 4.
[10130300] |Finally, a statement might assign an expression to a variable or use the value of a variable to alter the program's control flow.
[10130310] |For example: x := 2 + 2; if x = 4 then do_something();
One criticism of imperative languages is the side-effect of an assignment statement on a class of variables called non-local variables.
[10130320] |Programs written using a declarative language specify the properties that have to be met by the output and do not specify any implementation details.
[10130330] |Two broad categories of declarative languages are [[functional language]]s and [[logical language]]s.
[10130340] |The principle behind functional languages (like [[Haskell (programming language)|Haskell]]) is to not allow side-effects, which makes it easier to reason about programs like mathematical functions.
[10130350] |The principle behind logical languages (like [[Prolog]]) is to define the problem to be solved — the goal — and leave the detailed solution to the Prolog system itself.
[10130360] |The goal is defined by providing a list of subgoals.
[10130370] |Then each subgoal is defined by further providing a list of its subgoals, etc.
[10130380] |If a path of subgoals fails to find a solution, then that subgoal is [[Backtracking|backtracked]] and another path is systematically attempted.
[10130390] |The form in which a program is created may be textual or visual.
[10130400] |In a [[visual language]] program, elements are graphically manipulated rather than textually specified.
[10130410] |===Compilation or interpretation===
[10130420] |A ''computer program'' in the form of a [[human-readable]], computer programming language is called [[source code]].
[10130430] |Source code may be converted into an [[executable file|executable image]] by a [[compiler]] or executed immediately with the aid of an [[Interpreter (computing)|interpreter]].
[10130440] |Compiled computer programs are commonly referred to as executables, binary images, or simply as [[binary file|binaries]] — a reference to the [[binary numeral system|binary]] [[file format]] used to store the executable code.
[10130450] |Compilers are used to translate source code from a programming language into either [[object file|object code]] or [[machine code]].
[10130460] |Object code needs further processing to become machine code, and machine code is the [[Central processing unit|Central Processing Unit]]'s native [[microcode|code]], ready for execution.
[10130470] |Interpreted computer programs are either decoded and then immediately executed or are decoded into some efficient intermediate representation for future execution.
[10130480] |[[BASIC]], [[Perl]], and [[Python (programming language)|Python]] are examples of immediately executed computer programs.
[10130490] |Alternatively, [[Java (programming language)|Java]] computer programs are compiled ahead of time and stored as a machine independent code called [[bytecode]].
[10130500] |Bytecode is then executed upon request by an interpreter called a [[virtual machine]].
[10130510] |The main disadvantage of interpreters is computer programs run slower than if compiled.
[10130520] |Interpreting code is slower than running the compiled version because the interpreter must [[decode]] each [[Statement (programming)|statement]] each time it is loaded and then perform the desired action.
[10130530] |On the other hand, software development may be quicker using an interpreter because testing is immediate when the compilation step is omitted.
[10130540] |Another disadvantage of interpreters is the interpreter must be present on the computer at the time the computer program is executed.
[10130550] |Alternatively, compiled computer programs need not have the compiler present at the time of execution.
[10130560] |No properties of a programming language require it to be exclusively compiled or exclusively interpreted.
[10130570] |The categorization usually reflects the most popular method of language execution.
[10130580] |For example, BASIC is thought of as an interpreted language and C a compiled language, despite the existence of BASIC compilers and C interpreters.
[10130590] |===Self-modifying programs===
[10130600] |A computer program in [[execution (computers)|execution]] is normally treated as being different from the [[data (computing)|data]] the program operates on.
[10130610] |However, in some cases this distinction is blurred when a computer program modifies itself.
[10130620] |The modified computer program is subsequently executed as part of the same program.
[10130630] |[[Self-modifying code]] is possible for programs written in [[Lisp programming language|Lisp]], [[cobol|COBOL]], and [[Prolog]].
[10130640] |==Execution and storage==
[10130650] |Typically, computer programs are stored in [[non-volatile memory]] until requested either directly or indirectly to be [[execution (computers)|executed]] by the computer user.
[10130660] |Upon such a request, the program is loaded into [[random access memory]], by a computer program called an [[operating system]], where it can be accessed directly by the central processor.
[10130670] |The central processor then executes ("runs") the program, instruction by instruction, until termination.
[10130680] |A program in execution is called a [[Process (computing)|process]].
[10130690] |Termination is either by normal self-termination or by error — software or hardware error.
[10130700] |===Embedded programs===
[10130710] |Some computer programs are embedded into hardware.
[10130720] |A [[stored-program computer]] requires an initial computer program stored in its [[read-only memory]] to [[booting|boot]].
[10130730] |The boot process is to identify and initialize all aspects of the system, from [[Processor register|CPU registers]] to [[Device driver|device controllers]] to [[Volatile memory|memory]] contents.
[10130740] |Following the initialization process, this initial computer program loads the [[operating system]] and sets the [[program counter]] to begin normal operations.
[10130750] |Independent of the host computer, a [[Peripheral|hardware device]] might have embedded [[firmware]] to control its operation.
[10130760] |Firmware is used when the computer program is rarely or never expected to change, or when the program must not be lost when the power is off.
[10130770] |===Manual programming===
[10130780] |Computer programs historically were manually input to the central processor via switches.
[10130790] |An instruction was represented by a configuration of on/off settings.
[10130800] |After setting the configuration, an execute button was pressed.
[10130810] |This process was then repeated.
[10130820] |Computer programs also historically were manually input via [[paper tape]] or [[punched cards]].
[10130830] |After the medium was loaded, the starting address was set via switches and the execute button pressed.
[10130840] |===Automatic program generation===
[10130850] |[[Generative programming]] is a style of [[computer programming]] that creates [[source code]] through [[generic programming|generic]] [[class (computer science)|classes]], [[Prototype-based programming|prototypes]], [[template (programming)|template]]s, [[aspect (computer science)|aspect]]s, and [[Code generation (compiler)|code generator]]s to improve [[programmer]] productivity.
[10130860] |Source code is generated with [[programming tool]]s such as a [[template processor]] or an [[Integrated development environment|Integrated Development Environment]].
[10130870] |The simplest form of source code generator is a [[Macro (computer science)|macro]] processor, such as the [[C preprocessor]], which replaces patterns in source code according to relatively simple rules.
[10130880] |[[Software engine]]s output source code or [[Markup language|markup code]] that simultaneously become the input to another [[Process (computing)|computer process]].
[10130890] |The analogy is that of one process driving another process, with the computer code being burned as fuel.
[10130900] |[[Application server]]s are software engines that deliver applications to [[client computer]]s.
[10130910] |For example, a [[Wiki software|Wiki]] is an application server that allows users to build [[dynamic web page|dynamic content]] assembled from [[article (publishing)|articles]].
[10130920] |Wikis generate [[HTML]], [[CSS]], [[Java (programming language)|Java]], and [[Javascript]] which are then [[Interpreter (computing)|interpreted]] by a [[web browser]].
[10130930] |=== Simultaneous execution===
[10130940] |Many operating systems support [[computer multitasking|multitasking]] which enables many computer programs to appear to be running simultaneously on a single computer.
[10130950] |Operating systems may run multiple programs through [[process scheduling]] — a software mechanism to [[Context switch|switch]] the CPU among processes frequently so that users can [[Time-sharing|interact]] with each program while it is running.
[10130960] |Within hardware, modern day multiprocessor computers or computers with multicore processors may run multiple programs.
[10130970] |== Functional categories ==
[10130980] |Computer programs may be categorized along functional lines.
[10130990] |These functional categories are [[system software]] and [[application software]].
[10131000] |System software includes the [[operating system]] which couples the [[computer hardware|computer's hardware]] with the application software.
[10131010] |The purpose of the operating system is to provide an environment in which application software executes in a convenient and efficient manner.
[10131020] |In addition to the operating system, system software includes [[Utility software|utility programs]] that help manage and tune the computer.
[10131030] |If a computer program is not system software then it is application software.
[10131040] |Application software includes [[middleware]], which couples the system software with the [[user interface]].
[10131050] |Application software also includes utility programs that help users solve application problems, like the need for sorting.
[10140010] |Computer science
[10140020] |'''Computer science''' (or '''computing science''') is the study and the [[science]] of the theoretical foundations of [[information]] and [[computation]] and their implementation and application in [[computer|computer system]]s.
[10140030] |Computer science has many sub-fields; some emphasize the computation of specific results (such as [[computer graphics]]), while others relate to properties of [[computational problem]]s (such as [[computational complexity theory]]).
[10140040] |Still others focus on the challenges in implementing computations.
[10140050] |For example, [[programming language theory]] studies approaches to describing computations, while [[computer programming]] applies specific [[programming language]]s to solve specific computational problems.
[10140060] |A further subfield, [[human-computer interaction]], focuses on the challenges in making computers and computations useful, usable and universally accessible to [[humans|people]].
[10140070] |== History ==
[10140080] |The early foundations of what would become computer science predate the invention of the modern [[digital computer]].
[10140090] |Machines for calculating fixed numerical tasks, such as the [[abacus]], have existed since antiquity.
[10140100] |[[Wilhelm Schickard]] built the first mechanical calculator in 1623.
[10140110] |[[Charles Babbage]] designed a [[difference engine]] in [[Victorian era|Victorian]] times (between 1837 and 1901) helped by [[Ada Lovelace]].
[10140120] |Around 1900, the [[IBM]] corporation sold [[Key_punch|punch-card machines]].
[10140130] |However, all of these machines were constrained to perform a single task, or at best some subset of all possible tasks.
[10140140] |During the 1940s, as newer and more powerful computing machines were developed, the term ''computer'' came to refer to the machines rather than their human predecessors.
[10140150] |As it became clear that computers could be used for more than just mathematical calculations, the field of computer science broadened to study [[computation]] in general.
[10140160] |Computer science began to be established as a distinct academic discipline in the 1960s, with the creation of the first computer science departments and degree programs.
[10140170] |Since practical computers became available, many applications of computing have become distinct areas of study in their own right.
[10140180] |Many initially believed it impossible that "computers themselves could actually be a scientific field of study" (Levy 1984, p. 11), though it was in the "late fifties" (Levy 1984, p.11) that it gradually became accepted among the greater academic population.
[10140190] |It is the now well-known IBM brand that formed part of the computer science revolution during this time.
[10140200] |'IBM' (short for International Business Machines) released the IBM 704 and later the IBM 709 computers, which were widely used during the exploration period of such devices.
[10140210] |"Still, working with the IBM [computer] was frustrating...if you had misplaced as much as one letter in one instruction, the program would crash, and you would have to start the whole process over again" (Levy 1984, p.13).
[10140220] |During the late 1950s, the computer science discipline was very much in its developmental stages, and such issues were commonplace.
[10140230] |Time has seen significant improvements in the useability and effectiveness of computer science technology.
[10140240] |Modern society has seen a significant shift from computers being used solely by experts or professionals to a more widespread user base.
[10140250] |By the 1990s, computers became accepted as being the norm within everyday life.
[10140260] |During this time data entry was a primary component of the use of computers, many preferring to streamline their business practices through the use of a computer.
[10140270] |This also gave the additional benefit of removing the need of large amounts of documentation and file records which consumed much-needed physical space within offices.
[10140280] |== Major achievements ==
[10140290] |Despite its relatively short history as a formal academic discipline, computer science has made a number of fundamental contributions to [[science]] and [[society]].
[10140300] |These include:
[10140310] |;Applications within computer science
[10140320] |* A formal definition of [[computation]] and [[computability]], and proof that there are computationally [[Undecidable problem|unsolvable]] and [[Intractable#Intractability|intractable]] problems.
[10140330] |* The concept of a [[programming language]], a tool for the precise expression of methodological information at various levels of abstraction.
[10140340] |;Applications outside of computing
[10140350] |* Sparked the [[Digital Revolution]] which led to the current [[Information Age]] and the [[Internet]].
[10140360] |* In [[cryptography]], [[Cryptanalysis of the Enigma|breaking the Enigma machine]] was an important factor contributing to the Allied victory in World War II.
[10140370] |* [[Scientific computing]] enabled advanced study of the mind and mapping the human genome was possible with [[Human Genome Project]].
[10140380] |[[Distributed computing]] projects like [[Folding@home]] explore [[protein folding]].
[10140390] |* [[Algorithmic trading]] has increased the [[Economic efficiency|efficiency]] and [[Market liquidity|liquidity]] of financial markets by using [[artificial intelligence]], [[machine learning]] and other [[statistics|statistical]] and [[Numerical analysis|numerical]] techniques on a large scale.
[10140400] |== Relationship with other fields ==
[10140410] |Despite its name, a significant amount of computer science does not involve the study of computers themselves.
[10140420] |Because of this, several alternative names have been proposed.
[10140430] |Danish scientist [[Peter Naur]] suggested the term ''datalogy'', to reflect the fact that the scientific discipline revolves around data and data treatment, while not necessarily involving computers.
[10140440] |The first scientific institution to use the term was the Department of Datalogy at the University of Copenhagen, founded in 1969, with Peter Naur being the first professor in datalogy.
[10140450] |The term is used mainly in the Scandinavian countries.
[10140460] |Also, in the early days of computing, a number of terms for the and practitioners of the field of computing were suggested in the ''Communications are of the ACM''—''turingineer'', ''turologist'', ''flow-charts-man'', ''applied meta-mathematician'', and ''applied epistemologist''.
[10140470] |Three months later in the same journal, ''comptologist'' was suggested, followed next year by ''hypologist''.
[10140480] |Recently the term ''computics'' has been suggested.
[10140490] |''Informatik'' was a term used in Europe with more frequency.
[10140500] |The renowned computer scientist [[Edsger W. Dijkstra|Edsger Dijkstra]] stated, "Computer science is no more about computers than astronomy is about telescopes."
[10140510] |The design and deployment of computers and computer systems is generally considered the province of disciplines other than computer science.
[10140520] |For example, the study of [[computer hardware]] is usually considered part of [[computer engineering]], while the study of commercial [[computer system]]s and their deployment is often called [[information technology]] or [[information systems]].
[10140530] |Computer science is sometimes criticized as being insufficiently scientific, a view espoused in the statement "Science is to computer science as hydrodynamics is to plumbing", credited to [[Stan Kelly-Bootle]] and others.
[10140540] |However, there has been much cross-fertilization of ideas between the various computer-related disciplines.
[10140550] |Computer science research has also often crossed into other disciplines, such as [[cognitive science]], [[economics]], [[mathematics]], [[physics]] (see [[quantum computing]]), and [[linguistics]].
[10140560] |Computer science is considered by some to have a much closer relationship with [[mathematics]] than many scientific disciplines.
[10140570] |Early computer science was strongly influenced by the work of mathematicians such as [[Kurt Gödel]] and [[Alan Turing]], and there continues to be a useful interchange of ideas between the two fields in areas such as [[mathematical logic]], [[category theory]], [[domain theory]], and [[algebra]].
[10140580] |The relationship between computer science and [[software engineering]] is a contentious issue, which is further muddied by [[Debates within software engineering|disputes]] over what the term "software engineering" means, and how computer science is defined.
[10140590] |[[David Parnas]], taking a cue from the relationship between other engineering and science disciplines, has claimed that the principal focus of computer science is studying the properties of computation in general, while the principal focus of software engineering is the design of specific computations to achieve practical goals, making the two separate but complementary disciplines.
[10140600] |The academic, political, and funding aspects of computer science tend to have roots as to whether a department in the U.S. formed with either a mathematical emphasis or an engineering emphasis.
[10140610] |In general, electrical engineering-based computer science departments have tended to succeed as computer science and/or engineering departments.
[10140620] |Computer science departments with a mathematics emphasis and with a numerical orientation consider alignment [[computational science]].
[10140630] |Both types of departments tend to make efforts to bridge the field educationally if not across all research.
[10140640] |== Fields of computer science ==
[10140650] |Computer science searches for concepts and [[formal proof]]s to explain and describe computational systems of interest.
[10140660] |As with all sciences, these theories can then be utilised to synthesize practical engineering applications, which in turn may suggest new systems to be studied and analysed.
[10140670] |While the [[ACM Computing Classification System]] can be used to split computer science up into different topics of fields, a more descriptive breakdown follows:
[10140680] |=== Mathematical foundations ===
[10140690] |; [[Mathematical logic]]
[10140700] |: Boolean logic and other ways of modeling logical queries; the uses and limitations of formal proof methods.
[10140710] |; [[Number theory]]
[10140720] |: Theory of proofs and heuristics for finding proofs in the simple domain of integers.
[10140730] |Used in [[cryptography]] as well as a test domain in [[artificial intelligence]].
[10140740] |; [[Graph theory]]
[10140750] |: Foundations for data structures and searching algorithms.
[10140760] |; [[Type theory]]
[10140770] |: Formal analysis of the types of data, and the use of these types to understand properties of programs, especially program safety.
[10140780] |; [[Category theory]]
[10140790] |: Category theory provides a means of capturing all of math and computation in a single synthesis.
[10140800] |; [[Computational geometry]]
[10140810] |: The study of [[algorithm]]s to solve problems stated in terms of [[geometry]].
[10140820] |; [[Numerical analysis]]
[10140830] |: Foundations for algorithms in discrete mathematics, as well as the study of the limitations of floating point computation, including [[round-off]] errors.
[10140840] |=== Theory of computation ===
[10140850] |; [[Automata theory]]
[10140860] |: Different logical structures for solving problems.
[10140870] |; [[Computability theory (computer science)|Computability theory]]
[10140880] |: What is calculable with the current models of computers.
[10140890] |Proofs developed by [[Alan Turing]] and others provide insight into the possibilities of what can be computed and what cannot.
[10140900] |; [[Computational complexity theory]]
[10140910] |: Fundamental bounds (especially time and storage space) on classes of computations; in practice, study of which problems a computer can solve with reasonable resources (while computability theory studies which problems can be solved at all).
[10140920] |; [[Quantum computing|Quantum computing theory]]
[10140930] |: Representation and manipulation of data using the quantum properties of particles and quantum mechanism.
[10140940] |=== Algorithms and data structures ===
[10140950] |; [[Analysis of algorithms]]
[10140960] |: Time and space complexity of algorithms.
[10140970] |; [[Algorithms]]
[10140980] |: Formal logical processes used for computation, and the efficiency of these processes.
[10140990] |=== Programming languages and compilers ===
[10141000] |; [[Compiler]]s
[10141010] |: Ways of translating computer programs, usually from [[high-level programming language|higher level]] languages to [[low-level programming language|lower level]] ones.
[10141020] |; [[Interpreter (computing)|Interpreter]]s
[10141030] |: A program that takes in as input a computer program and executes it.
[10141040] |; [[Programming language]]s
[10141050] |: Formal language paradigms for expressing algorithms, and the properties of these languages (e.g., what problems they are suited to solve).
[10141060] |=== Concurrent, parallel, and distributed systems ===
[10141070] |; [[Concurrency (computer science)|Concurrency]]
[10141080] |: The theory and practice of simultaneous computation; data safety in any multitasking or multithreaded environment.
[10141090] |; [[Distributed computing]]
[10141100] |: Computing using multiple computing devices over a network to accomplish a common objective or task and thereby reducing the latency involved in single processor contributions for any task.
[10141110] |; [[Parallel computing]]
[10141120] |: Computing using multiple concurrent threads of execution.
[10141130] |=== Software engineering ===
[10141140] |; [[Algorithm design]]
[10141150] |: Using ideas from algorithm theory to creatively design solutions to real tasks
[10141160] |; [[Computer programming]]
[10141170] |: The practice of using a programming language to implement algorithms
[10141180] |; [[Formal methods]]
[10141190] |: Mathematical approaches for describing and reasoning about software designs.
[10141200] |; [[Reverse engineering]]
[10141210] |: The application of the scientific method to the understanding of arbitrary existing software
[10141220] |; [[Software development]]
[10141230] |: The principles and practice of designing, developing, and testing programs, as well as proper engineering practices.
[10141240] |=== System architecture ===
[10141250] |; [[Computer architecture]]
[10141260] |: The design, organization, optimization and verification of a computer system, mostly about [[CPU]]s and [[memory (computers)|memory]] subsystems (and the bus connecting them).
[10141270] |; [[Computer organization]]
[10141280] |: The implementation of computer architectures, in terms of descriptions of their specific [[electrical circuit]]ry
[10141290] |; [[Operating system]]s
[10141300] |: Systems for managing computer programs and providing the basis of a useable system.
[10141310] |=== Communications ===
[10141320] |; [[Computer audio]]
[10141330] |: Algorithms and data structures for the creation, manipulation, storage, and transmission of [[digital audio]] recordings.
[10141340] |Also important in [[voice recognition]] applications.
[10141350] |; [[Computer networking|Networking]]
[10141360] |: Algorithms and protocols for communicating data across different shared or dedicated media, often including [[error correction]].
[10141370] |; [[Cryptography]]
[10141380] |: Applies results from complexity, probability and number theory to invent and break codes.
[10141390] |=== Databases ===
[10141400] |; [[Data mining]]
[10141410] |: Data mining is the extraction of relevant data from all sources of data.
[10141420] |; [[Relational databases]]
[10141430] |: Study of algorithms for searching and processing information in documents and databases; closely related to [[information retrieval]].
[10141440] |; [[OLAP]]
[10141450] |: Online Analytical Processing, or OLAP, is an approach to quickly provide answers to analytical queries that are multi-dimensional in nature.
[10141460] |OLAP is part of the broader category [[business intelligence]], which also encompasses relational reporting and data mining.
[10141470] |=== Artificial intelligence ===
[10141480] |; [[Artificial intelligence]]
[10141490] |: The implementation and study of systems that exhibit an autonomous intelligence or behaviour of their own.
[10141500] |; [[Artificial life]]
[10141510] |: The study of digital organisms to learn about biological systems and evolution.
[10141520] |; [[Automated reasoning]]
[10141530] |: Solving engines, such as used in [[Prolog]], which produce steps to a result given a query on a fact and rule database.
[10141540] |; [[Computer vision]]
[10141550] |: Algorithms for identifying three dimensional objects from one or more two dimensional pictures.
[10141560] |; [[Machine learning]]
[10141570] |: Automated creation of a set of rules and axioms based on input.
[10141580] |; [[Natural language processing]]/[[Computational linguistics]]
[10141590] |: Automated understanding and generation of human language
[10141600] |; [[Robotics]]
[10141610] |: Algorithms for controlling the behavior of robots.
[10141620] |=== Visual rendering (or Computer graphics) ===
[10141630] |; [[Computer graphics]]
[10141640] |: Algorithms both for generating visual images synthetically, and for integrating or altering visual and spatial information sampled from the real world.
[10141650] |; [[Image processing]]
[10141660] |: Determining information from an image through computation.
[10141670] |=== Human-Computer Interaction ===
[10141680] |; [[Human computer interaction]]
[10141690] |: The study of making computers and computations useful, usable and universally accessible to [[user (computing)|people]], including the study and design of computer interfaces through which people use computers.
[10141700] |=== Scientific computing ===
[10141710] |; [[Bioinformatics]]
[10141720] |: The use of computer science to maintain, analyse, and store [[biological data]], and to assist in solving biological problems such as [[protein folding]], function prediction and [[phylogeny]].
[10141730] |; [[Cognitive Science]]
[10141740] |: Computational modelling of real minds
[10141750] |; [[Computational chemistry]]
[10141760] |: Computational modelling of theoretical chemistry in order to determine chemical structures and properties
[10141770] |; [[Computational neuroscience]]
[10141780] |: Computational modelling of real brains
[10141790] |; [[Computational physics]]
[10141800] |: Numerical simulations of large non-analytic systems
[10141810] |; [[Numerical analysis|Numerical algorithms]]
[10141820] |: Algorithms for the numerical solution of mathematical problems such as [[Root-finding algorithm|root-finding]], [[Numerical integration|integration]], the [[Numerical ordinary differential equations|solution of ordinary differential equations]] and the approximation/evaluation of [[special functions]].
[10141830] |; [[Symbolic mathematics]]
[10141840] |: Manipulation and solution of expressions in symbolic form, also known as [[Computer algebra]].
[10141850] |=== Didactics of computer science/informatics ===
[10141860] |The subfield didactics of computer science focuses on cognitive approaches of developing competencies of computer science and specific strategies for analysis, design, implementation and evaluation of excellent lessons in computer science.
[10141870] |== Computer science education ==
[10141880] |Some universities teach computer science as a theoretical study of computation and algorithmic reasoning.
[10141890] |These programs often feature the [[theory of computation]], [[analysis of algorithms]], [[formal methods]], [[Concurrency (computer science)|concurrency theory]], [[databases]], [[computer graphics]] and [[systems analysis]], among others.
[10141900] |They typically also teach [[computer programming]], but treat it as a vessel for the support of other fields of computer science rather than a central focus of high-level study.
[10141910] |Other colleges and universities, as well as [[secondary school]]s and vocational programs that teach computer science, emphasize the practice of advanced [[computer programming]] rather than the theory of algorithms and computation in their computer science curricula.
[10141920] |Such curricula tend to focus on those skills that are important to workers entering the software industry.
[10141930] |The practical aspects of computer programming are often referred to as [[software engineering]].
[10141940] |However, there is a lot of [[Debates within software engineering|disagreement]] over what the term "software engineering" actually means, and whether it is the same thing as programming.
[10150010] |Corpus linguistics
[10150020] |'''Corpus linguistics''' is the [[study of language]] as expressed in [[sample]]s ''([[Text corpus|corpora]])'' or "real world" text.
[10150030] |This method represents a [[digest]]ive approach to deriving a set of abstract rules by which a [[natural language]] is governed or else relates to another language.
[10150040] |Originally done by hand, corpora are largely derived by an automated process, which is corrected.
[10150050] |Computational methods had once been viewed as a [[holy grail]] of [[linguistics|linguistic]] research, which would ultimately manifest a [[ruleset]] for [[natural language processing]] and [[machine translation]] at a high level.
[10150060] |Such has not been the case, and since the [[cognitive revolution]], cognitive linguistics has been largely critical of many claimed practical uses for corpora.
[10150070] |However, as [[computation]] capacity and speed have increased, the use of corpora to study language and term relationships en masse has gained some respectability.
[10150080] |The corpus approach runs counter to [[Noam Chomsky]]'s view that real language is riddled with performance-related errors, thus requiring careful analysis of small speech samples obtained in a highly controlled laboratory setting.
[10150090] |Corpus linguistics does away with Chomsky's ''competence/performance'' split; adherents believe that reliable language analysis best occurs on field-collected samples, in natural contexts and with minimal experimental interference.
[10150100] |== History ==
[10150110] |A landmark in modern corpus linguistics was the publication by [[Henry Kucera]] and [[Nelson Francis]] of ''Computational Analysis of Present-Day American English'' in 1967, a work based on the analysis of the [[Brown Corpus]], a carefully compiled selection of current American English, totalling about a million words drawn from a wide variety of sources.
[10150120] |Kucera and Francis subjected it to a variety of computational analyses, from which they compiled a rich and variegated opus, combining elements of linguistics, language teaching, [[psychology]], [[statistics]], and [[sociology]].
[10150130] |A further key publication was [[Randolph Quirk]]'s 'Towards a description of English Usage' (1960, Transactions of the Philological Society, 40-61) in which he introduced ''The Survey of English Usage''.
[10150140] |Shortly thereafter, Boston publisher [[Houghton-Mifflin]] approached Kucera to supply a million word, three-line citation base for its new ''[[The American Heritage Dictionary of the English Language|American Heritage Dictionary]]'', the first [[dictionary]] to be compiled using corpus linguistics.
[10150150] |The AHD made the innovative step of combining prescriptive elements (how language ''should'' be used) with descriptive information (how it actually ''is'' used).
[10150160] |Other publishers followed suit.
[10150170] |The British publisher Collins' [[COBUILD]] [[monolingual learner's dictionary]], designed for users learning [[English language learning and teaching|English as a foreign language]], was compiled using the [[Bank of English]].
[10150180] |The [[Brown Corpus]] has also spawned a number of similarly structured corpora: the [[LOB Corpus]] (1960s [[British English]]), Kolhapur ([[Indian English]]), Wellington ([[New Zealand English]]), Australian Corpus of English ([[Australian English]]), the Frown Corpus ([[early 1990s]] [[American English]]), and the FLOB Corpus (1990s British English).
[10150190] |Other corpora represent many languages, varieties and modes, and include the [[International Corpus of English]], and the [[British National Corpus]], a 100 million word collection of a range of spoken and written texts, created in the 1990s by a consortium of publishers, universities ([[Oxford University|Oxford]] and [[Lancaster University|Lancaster]]) and the [[British Library]].
[10150200] |For contemporary American English, work has stalled on the [[American National Corpus]], but the 360 million word [[Corpus of Contemporary American English (COCA)]] (1990-present) is now available.
[10150210] |== Methods ==
[10150220] |This means dealing with real input data, where descriptions based on a linguist's intuition are not usually helpful.